Field Verification and Comparison of Field Applicability with Different Meteorological Data Sources of Species Distribution Models for American Bullfrogs (Aquarana catesbeiana)
Abstract
Species distribution models (SDMs) can be helpful for managing invasive species. Although prevalence of SDMs has increased considerably, they require further field verification. Data within each country could further improve regional outcomes. To test the accuracy of these models, we developed an SDM for Aquarana catesbeiana (American Bullfrogs) based on WorldClim and Korea Meteorological Administration (KMA) data and verified it through field surveys at 430 sites. For comparison of WorldClim and KMA data, we evaluated the SDM and field applicability of the two data sets and compared them with the values of respective variables. Predicted probability was slightly higher than observed probability in most areas. However, the opposite occurred in areas with lower predicted probabilities. Predicted frequency and observed frequency showed a similar probability trend in the field verification. WorldClim and KMA data showed numerically different results, and the two models displayed high field applicability overall, with some areas being exceptions. We recommend vigorous surveys to obtain more presence data and field verification for highly accurate models and maximum field applicability. Although our two data-based SDMs showed high field applicability, using meteorological data from multiple local sources should be considered in regional studies because the WorldClim data showed some inaccurate values. Our work offers a case study in evaluating the suitability of distribution models for assessing the potential distribution range of a target invasive species, informing associated management plans accordingly.
When invasive species are introduced into nonnative environments, they can cause economic and environmental harm to native ecosystems (Pimentel, 2001; Beck et al., 2008; Park et al., 2022a). Hence, invasive species are one of the leading causes of biodiversity loss worldwide (Ficetola et al., 2007). Currently, with increases in international trade and pet markets, the spread of exotic invasive species is increasing worldwide (Gollasch and Leppäkoski, 2007; Park et al., 2022a, b, 2023; Cheon et al., 2023). Accordingly, the International Union for Conservation of Nature has published a list of “100 of the World's Worst Invasive Alien Species” to highlight the complexity and negative impacts of invasive species (Lowe et al., 2000).
American Bullfrogs (Aquarana catesbeiana, hereafter referred to as “bullfrogs”), one of the “100 of the World's Worst Invasive Alien Species”, are large amphibians native to eastern North America that have been introduced to more than 40 countries (Lowe et al., 2000; Lever, 2003; Ficetola et al., 2007; Park et al., 2022a). Although bullfrogs have been introduced to many countries, mainly as a food source, escapes and deliberate releases have created naturalized populations and posed serious threats to native species (Dejean et al., 2012). As an invasive species and a carrier of Batrachochytrium dendrobatidis, bullfrogs have spread disease and caused ecological disturbances through predation and disruption of native species breeding in countries where they have been introduced (Kraus, 2015; Borzée et al., 2017; Park et al., 2018; Gobel et al., 2019; Park et al., 2022a).
As in other countries, bullfrogs were introduced and cultivated in South Korea for food in 1957 and 1971 (Kim, 1971, 1975). Initially, many bullfrogs were released into natural environments because of a lack of economic resources and ecological awareness (Kil et al., 2011). Currently, bullfrogs are widely distributed across the country, including on islands, and have been designated as an “ecosystem disturbance species” by the South Korean Ministry of Environment (Park et al., 2018). However, the agency has been concerned that little research on bullfrogs is being done, and thus the management and eradication program has not shown much success (Kim, 2020). Therefore, it is important to continuously conduct research on bullfrogs using robust methods, including species distribution models (SDMs), to manage bullfrogs efficiently as an invasive species in South Korea.
SDMs aim to predict the probability of a species' spatial distribution and reveal relationships between distribution and influential geographic and climatic factors (Guisan and Thuiller, 2005; Franklin, 2010). Recently, the prevalence of SDMs has increased (Hof et al., 2012) in attempts to predict impacts of climate change on biodiversity in various fields (e.g., biogeography, ecology, and conservation biology; Elith and Leathwick, 2009). Use of species distribution modeling, especially for invasive species, has increased because it is effective in predicting and managing potential species distributions by applying many multifactorial algorithms (Ficetola et al., 2007).
Various models and their algorithms, such as the Maximum Entropy Model (Maxent), Generalized Additive Model, Generalized Linear Model, and Random Forest, are used to produce SDMs (Austin, 2007; Elith et al., 2011). Among the different models, Maxent, which requires only occurrence (presence) data, is one of the most widely used, with high reliability and accuracy (Elith et al., 2006; West et al., 2016; Phillips, 2017). However, because Maxent relies on pseudo-absence (background) data rather than true absence records, the likelihood of an actual absence cannot be estimated (Evangelista et al., 2008; Kumar et al., 2009). In addition, Maxent evaluates model results by using part of the species occurrence data as verification data; it therefore has a weakness in evaluating a wider range than the available species occurrence data cover (Fielding and Bell, 1997; Elith et al., 2006; West et al., 2016). Thus, to improve reliability of model accuracy, independent field verification is required (West et al., 2016).
Ecological and environmental models mainly use gridded climate data (Poggio et al., 2018). WorldClim's global climate grid data have been widely used in environmental, agricultural, and ecological fields (Fick and Hijmans, 2017). WorldClim's data are considered a reliable source that provides readily available information about current and future climate change scenarios (Poggio et al., 2018). They are mainly applied to bioclimate modeling and are used in research on movement and spread of invasive species and pests (Poggio et al., 2018). However, WorldClim data consist of versions 1.4 (from 1960 to 1990) and 2.0 (from 1970 to 2000), resulting in a time lag when they are used in studies based on recent data (WorldClim, 2019). Problems can arise when global datasets are used for regional studies, because they can incorrectly represent climate space patterns in the study region (Soria-Auza et al., 2010; Bedia et al., 2013; Pliscoff et al., 2014). Furthermore, climate data obtained in each individual country are based more on meteorological station data than are WorldClim data and therefore can further improve national and regional outcomes (Poggio et al., 2018). The Korea Meteorological Administration (KMA) provides nationwide meteorological data for South Korea (KMA, 2022). However, KMA data have been overlooked in recent studies from South Korea (e.g., Koo and Choe, 2021; Lee et al., 2021).
In this study, we aimed to address two questions. First, we wanted to elucidate whether the predicted potential distribution model was applicable in field conditions. Second, we wanted to determine whether model results using WorldClim or KMA data had higher field applicability. To address these questions, we performed Maxent modeling and a field survey based on the results of the Maxent model for bullfrogs, and the two meteorological model results were verified with field survey data. Thus, this work offers a case study in evaluating suitability of distribution modeling for assessments of potential distribution range of a target invasive species and for setting management plans accordingly. This study also evaluates considerations for selection of climate data sources to be used in regional distribution prediction models.
Materials and Methods
In general, use of data from both the native and invasive range in SDMs limits the possibility that a species’ invasion potential is underestimated. However, to reduce potential biases associated with integrating data sources, and to allow field verification and comparison between two climate models, we used a total of 2,716 bullfrog distribution records from South Korea only. These data were obtained from the results of Kang et al. (2019, Table S1) and comprise records from the third National Natural Environment Survey (2006–2012, 2,527 sites), the Nationwide Survey of Nonnative Species (2015–2017, 146 sites), and Primary Monitoring for National Wetlands (2013–2017, 43 sites). Together with occurrence data of American Bullfrogs in South Korea, we used bioclimatic variables as environmental inputs to establish the SDM. We considered 19 biologically meaningful climate variables that have often been used in SDMs (O’Donnell and Ignizio, 2012). We obtained data from WorldClim 1.4 (WorldClim, 2019) as the baseline current climate at a 30-arc-second spatial resolution (degrees of longitude/latitude). We used ArcMap 10.6 (ESRI, USA) to limit data within the scope of South Korea based on the 2019 administrative district map provided by the Geospatial Information System Based Technology Research Institute (Geoservice, 2019). Among the 19 variables, we excluded Bio09 and Bio17 because they showed a correlation of over 95% with the corresponding data in Bio11 and Bio19 (Table 1). In choosing which variable of each correlated pair to retain, we considered the wintering of bullfrog larvae in permanent water (Ficetola et al., 2007) and selected the variables that represented the cold quarter rather than the dry quarter.
To predict potential distribution of bullfrogs, we used Maxent Program 3.4.1 (Steven et al., 2019) with default settings: random test percentage = 25, regularization multiplier = 1, and the maximum number of background points = 10,000. We replicated the model 15 times using a bootstrap method, with each replicate having 5,000 iterations (Phillips, 2017), and set the output file type to the ASC extension which is applicable to ArcMap 10.6. We evaluated model fitness with the area under the curve (AUC) of the receiver operating characteristics (ROC) curve (Jiménez-Valverde, 2012). We performed model evaluation using the partial ROC-AUC (Peterson et al., 2008), which we calculated with the “kuenm” R package (Cobos et al., 2019).
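As an illustration of what the AUC statistic measures, the following minimal Python sketch computes AUC as the probability that a randomly chosen presence site receives a higher suitability score than a randomly chosen background site, with ties counted as one half. This is not Maxent's or kuenm's implementation, and it does not compute the partial ROC; the function name and the toy scores are hypothetical.

```python
def auc_from_scores(presence_scores, background_scores):
    """AUC as the share of presence/background pairs in which the presence
    site scores higher (ties count one half). Illustrative only."""
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))


# Hypothetical suitability scores, not values from the study.
presence = [0.9, 0.8, 0.6, 0.4]
background = [0.7, 0.5, 0.3, 0.2, 0.1]
auc = auc_from_scores(presence, background)
print(auc)
```

A value of 0.5 corresponds to a model that ranks presence and background sites no better than chance, and a value of 1.0 to perfect ranking.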
We conducted field verifications in two provinces (Jeolla-do and Chungcheong-do) of southern South Korea between March and July 2019 (Fig. 1). Jeolla-do and Chungcheong-do were considered appropriate provinces to identify changes because abundant data on bullfrogs were available. We randomly selected survey points from each province, for which we converted the ASC file produced by the Maxent model into center points of approximately 1 km² grid units using the “Raster to Point” function of ArcMap 10.6 (ESRI, USA). We obtained the potential distribution probability at the center point of each grid unit from the Maxent model results. Next, we conducted a field survey at each selected center point, following a water system within approximately 1.6 km, which is the maximum movement distance of bullfrogs (Ingram and Raney, 1943). If a water system did not exist within 1.6 km of the center point, we randomly selected and investigated the nearest waterway or reservoir. In those cases, we matched the coordinates of the reservoir where we conducted the survey with model results to extract the predicted probability at those coordinates. In the field, two experts spent 10 min at the edge of the water investigating the presence of bullfrog adults, subadults, larvae, and eggs. We performed field surveys at 430 points (243 in Jeolla-do and 187 in Chungcheong-do; Table S1). We produced the predicted probability (PP) from Maxent model results and allocated it into Classes 1–5 (0–20%, 20–40%, 40–60%, 60–80%, and 80–100%). However, Class 5 was excluded from further analysis because it did not contain any allocated points. In addition, we calculated the observed probability (OP) from survey data (percentage of survey sites at which bullfrogs occurred).
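The class allocation and OP calculation described above can be sketched in Python as follows. This is a minimal illustration, not the authors' workflow: the function names and toy survey records are hypothetical, and the handling of boundary values (for example, assigning exactly 20% to Class 2) is an assumption, because the text does not state which class a boundary value belongs to.

```python
def assign_class(pp_percent):
    """Map a predicted probability (0-100 %) to Class 1-5 in 20-point bins.
    Assumption: a boundary value such as 20 % falls in the upper class."""
    if not 0 <= pp_percent <= 100:
        raise ValueError("predicted probability must be within 0-100 %")
    return min(int(pp_percent // 20) + 1, 5)


def observed_probability(sites):
    """sites: iterable of (pp_percent, detected) pairs.
    Returns {class: percentage of surveyed sites with bullfrogs detected}."""
    totals, hits = {}, {}
    for pp_percent, detected in sites:
        c = assign_class(pp_percent)
        totals[c] = totals.get(c, 0) + 1
        hits[c] = hits.get(c, 0) + int(detected)
    return {c: 100.0 * hits[c] / totals[c] for c in sorted(totals)}


# Hypothetical survey records: (predicted probability in %, bullfrogs detected?)
survey = [(5, False), (12, True), (25, True), (33, False),
          (38, True), (47, True), (71, True), (65, False)]
op = observed_probability(survey)
print(op)
```

Classes with no surveyed sites, as was the case for Class 5 in this study, simply do not appear in the result.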


Citation: Journal of Herpetology 59, 3; 10.1670/22-084
To compare PP and OP trends in all classes, predicted frequency (PF) and observed frequency (OF) were calculated from the distribution model and observed points in the field, respectively. We calculated PF, the number of sites in a class expected to hold bullfrogs, using the following formula: (sample size of class × median value of class percent category)/100. Using a binomial test, we tested for differences between OF and PF (Chow et al., 2017).
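A minimal Python sketch of this comparison follows, assuming PF is the number of occupied sites expected at the class median (sample size × median percentage / 100) and that the binomial test is an exact two-sided test of the observed count against that median proportion. The text does not specify the test variant or software, so both choices, the function names, and the example counts are assumptions for illustration.

```python
from math import comb

# Median predicted probability (%) of Classes 1-4 (0-20, 20-40, 40-60, 60-80 %).
CLASS_MEDIAN = {1: 10, 2: 30, 3: 50, 4: 70}


def predicted_frequency(n_sites, class_id):
    """Expected number of occupied sites if occupancy equals the class median."""
    return n_sites * CLASS_MEDIAN[class_id] / 100


def binomial_two_sided_p(k, n, p):
    """Exact two-sided binomial P value: total probability of all outcomes
    that are no more likely than the observed count k."""
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    threshold = pmf[k] * (1 + 1e-7)  # small tolerance for floating-point ties
    return min(1.0, sum(x for x in pmf if x <= threshold))


# Hypothetical example: 20 Class 1 sites surveyed, bullfrogs found at 6.
n_sites, observed = 20, 6
pf = predicted_frequency(n_sites, 1)
p_value = binomial_two_sided_p(observed, n_sites, CLASS_MEDIAN[1] / 100)
print(pf, p_value)
```

In this toy case the observed frequency (6) is well above the predicted frequency (2), and the small P value would indicate that more sites were occupied than the class median implies.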
Nineteen bioclimatic variables at 30-arc-second (~1 km²) resolution from WorldClim 1.4 (current, 1960–1990) were collected for increased spatial resolution. Using ArcMap 10.6 (ESRI, USA), we extracted only the South Korean range from the global bioclimatic variable data. Because bioclimatic variables are derived from temperature and precipitation, multicollinearity among variables may occur, resulting in distorted predictive results (Watling et al., 2012; Guo et al., 2013). Therefore, after exclusion of variables with a strong Pearson correlation (>0.75), six variables were used: Bio01, Bio02, Bio04, Bio12, Bio13, and Bio14. We used data from KMA to produce domestic bioclimatic variables. We downloaded data from 2006 to 2017 from KMA's climate information portal (KMA, 2022). We attempted to gather MK-PRISM 1.2 data for a Representative Concentration Pathway scenario, which is the only available model data provided in raster format, to match the same scenario and time series as those of the WorldClim data. Based on average monthly temperature, maximum temperature, minimum temperature, and precipitation, we calculated Bio01, Bio02, Bio04, Bio12, Bio13, and Bio14, the same variables as in the WorldClim data (O’Donnell and Ignizio, 2012). To compare WorldClim data to KMA data, we compared the mean and standard deviation of each variable. Additionally, to assess relationships between corresponding variables, we calculated the Pearson correlation coefficient. We produced the models with Maxent 3.4.4, using 25% of the data as test data and 15 replicates; the remaining settings followed the default values (Phillips, 2017). We compared field applicability by analyzing field verification and model results.
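The correlation-based variable screening can be illustrated with the Python sketch below. The text does not say how the retained variable of each correlated pair was chosen, so the greedy rule used here (keep a variable only if its absolute Pearson correlation with every previously kept variable is at most 0.75) is an assumption, and the grid values are made up for demonstration. The same `pearson` function could be applied to paired WorldClim and KMA values of one variable.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def drop_correlated(variables, threshold=0.75):
    """Greedy screening (an assumed rule): walk the variables in order and keep
    one only if |r| with every already-kept variable is <= threshold."""
    kept = []
    for name, values in variables.items():
        if all(abs(pearson(values, variables[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Made-up values at five grid cells, for illustration only.
grid = {
    "Bio01": [10.0, 11.0, 12.0, 13.0, 14.0],
    "Bio05": [25.1, 26.0, 27.2, 27.9, 29.0],   # tracks Bio01 closely
    "Bio12": [1300.0, 1100.0, 1250.0, 1050.0, 1200.0],
}
kept = drop_correlated(grid)
print(kept)
```

Because the outcome of a greedy rule depends on the order in which variables are considered, ecological relevance is a reasonable basis for choosing that order.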
Results
Our SDM showed a high potential distribution probability of bullfrogs in southern South Korea, especially in coastal areas, and a low potential distribution probability in central and northern South Korea as well as in inland regions of southern South Korea (Fig. 2). The AUC of our model was 0.808, and the mean AUC ratio at 5% was 1.47 (t499 = 962.7, P < 0.0001). Four major environmental variables contributed more than 5% each to the distribution of bullfrogs in South Korea. Results showed that Bio11 (mean temperature of coldest quarter) was the most influential environmental variable (28.1%), followed by Bio08 (mean temperature of wettest quarter; 24.8%), Bio06 (minimum temperature of coldest month; 23.0%), and Bio05 (maximum temperature of warmest month; 9.4%). Response curves indicated that potential distribution probability of bullfrogs was positively correlated with major environmental variables; increases in variables resulted in increases in distribution potential (Fig. 3). Jackknife analysis showed that Bio08 was the variable with the highest importance in terms of its significance and contribution to the model, followed by Bio01 (Fig. 4).


Across all survey sites for Classes 1–4, OP was higher than PP for Classes 1 and 2, and OP and PP were similar for Classes 3 and 4 (Fig. 5). OF was significantly higher than PF for Classes 1 (P = 0.0004) and 2 (P = 0.003), and PF and OF were not different for Classes 3 (P = 0.198) and 4 (P = 0.979; Table 2). In Jeolla-do, OP was considerably higher than PP for Classes 1 and 3, and PP and OP were similar for Classes 2 and 4 (Fig. 5). OF was significantly higher than PF for Class 3 (P = 0.00003), and PF and OF were not different for Classes 1 (P = 0.1), 2 (P = 0.057), and 4 (P = 0.299; Table 2). In Chungcheong-do, OP was considerably higher than PP for Classes 1 and 2, and OP and PP were similar for Classes 3 and 4 (Fig. 5). Additionally, OF was significantly greater than PF for Classes 1 (P = 0.002) and 2 (P = 0.002), and PF and OF were not different for Classes 3 and 4 (P = 0.999; Table 2).


Means of WorldClim and KMA data differed for each variable (Table 3). Correlation analysis between the two datasets yielded the following correlation coefficients: Bio01, 0.91; Bio02, 0.75; Bio04, 0.9; Bio12, 0.74; Bio13, 0.79; Bio14, 0.59 (Table 4). AUC of the bullfrog Maxent model with WorldClim data was 0.809, and the mean AUC ratio at 5% was 1.4 (t499 = 594.6, P < 0.0001); AUC with KMA data was 0.796, and the mean AUC ratio at 5% was 1.44 (t499 = 1424.9, P < 0.0001). For both the WorldClim and KMA models, OP was higher than PP for Class 3, but PP and OP were similar for Classes 1, 2, and 4 (Fig. 6). For WorldClim data, OF was significantly higher than PF for Class 3 (P = 0.005), and PF and OF were not different for Classes 1, 2, and 4 (P = 0.55, 0.17, and 0.14, respectively; Table 5). For KMA data, OF was significantly higher than PF in Class 3 (P = 0.02), and PF and OF were not different in Classes 1, 2, and 4 (P = 0.45, 0.71, and 0.11, respectively; Table 5).


Discussion
In SDMs, even small changes in data can make a large difference in model predictions (Williams et al., 2009). Changes in presence data may alter probability predictions of species distributions. In accordance with many previous studies that focused only on South Korea (Kim et al., 2021; Koo and Choe, 2021), we found that distribution probability of invasive bullfrogs is concentrated in southern areas and is relatively low in northern mountainous areas. In contrast, many studies, especially those that examined distribution probability of bullfrogs from a global perspective and used different presence data, showed different results (Ficetola et al., 2007; Andersen et al., 2021). We used distribution data from various monitoring efforts conducted in South Korea for our modeling, unlike Ficetola et al. (2007) and Andersen et al. (2021), who used distribution data from global sources. This discrepancy could be attributed to predictions of distributions based on native fundamental niches of invasive species. For species distribution models, distributions of target species commonly expand into wider fundamental niches in the invasive range, even beyond limits present in the native range (Le Maitre et al., 2008). Therefore, it is expected that our model predictions differ from previous studies. To predict expanded fundamental niches of bullfrogs, it is essential to use unbiased invasive and native data.
Among the 17 bioclimatic variables we analyzed, four variables were identified as the most important contributors to the model. Bioclimatic variables Bio11, Bio08, Bio06, and Bio05—all of which were related to temperature—contributed strongly to potential distribution of bullfrogs. Bioclimatic variable Bio11 was the most significant, whereas Bio05 was the least significant. In contrast to our results, Giovanelli et al. (2008) found that Bio02 was the most significant variable for their model, followed by Bio01 and Bio17 with the least effects. Bioclimatic variable Bio17 (Bio19 here), related to precipitation, was also not a significant predictor in our study. Differences between these studies may be attributed to discrepancies in time since introduction and associated adaptations (Prentis et al., 2008). Furthermore, differences may have arisen due to different data sources and the associated spatiotemporal variability. In addition to field observations and comparison to previous literature, AUC values may also give an indication of model accuracy (Araújo et al., 2005; Hosmer et al., 2013). The partial ROC-AUC results also indicated that the model demonstrated significant accuracy.
Except for Class 1, our results showed a positive correlation between PP and OP. In most cases, OP was higher when PP increased and vice versa. Many previous studies that focused on field verification of SDMs showed similar trends between predicted probability of species distribution and observed distribution (Greaves et al., 2006; Rebelo and Jones, 2010; West et al., 2016). Unlike the overall trend, PF and OF were statistically different for Classes 1 and 2, in which the predicted habitat suitability for bullfrogs was lower than in Classes 3 and 4. Differences in PF and OF for Classes 1 and 2 indicate that bullfrogs inhabit more areas than predicted. Also, in Jeolla-do, OF was statistically higher than PF for Class 3, further confirming that there are more bullfrogs in the field than the model predicted. Nevertheless, we used 430 localities for field verification of the SDM, which is a larger dataset than in previous studies (Greaves et al., 2006; Rebelo and Jones, 2010; West et al., 2016; Table S1). To acquire an accurate estimation of the actual niche in the field relative to the modeled niche, it is important to collect a large dataset that can sufficiently reflect the ecological distribution of the target species (Wisz et al., 2008). Collection of distribution data often exhibits a strong bias in investigative efforts, with some sites being more likely to be sampled than others (Dennis and Thomas, 2000; Schulman et al., 2007; Phillips et al., 2009). Therefore, continuous monitoring of areas where bullfrog habitats have not been identified, using multiple methods such as auditory call sampling and eDNA, will be required for unbiased sampling. In addition, it is difficult to investigate distribution of invasive species except in areas where many environmental and civic groups participate, and there is often a lack of cooperation between local governments and management groups (Jung and Jang, 2019). Establishing an integrated management system for invasive species will allow us to monitor changes in distributions across regions.
This study compared the results of species distribution modeling using KMA data, which reflect domestic climate characteristics, and WorldClim data, which are produced for global use. Among the bioclimatic variable data, Bio01, Bio02, and Bio04 were temperature-related variables, and Bio12, Bio13, and Bio14 were precipitation-related variables (O’Donnell and Ignizio, 2012). Bio01, Bio02, and Bio04 of the WorldClim data showed numerically similar results to the KMA bioclimatic variables Bio01, Bio02, and Bio04, but there were different numerical outcomes in Bio12, Bio13, and Bio14. Correlation results between the WorldClim and KMA data indicated that the precipitation-related variables Bio12 and Bio14 showed the lowest correlations. WorldClim data are known to be less accurate in precipitation-related variables than in temperature-related variables (Fick and Hijmans, 2017), and previous studies have also shown discrepancies in precipitation data from WorldClim relative to regional data (Soria-Auza et al., 2010; Bedia et al., 2013; Marchi et al., 2019). WorldClim 2.1, which has been improved and incorporates a larger dataset of weather station data, has addressed limitations of previous versions of WorldClim (Bobrowski et al., 2021). We evaluated model results according to the AUC value, which indicates fit of the model. An AUC value >0.9 indicates high fit, and an AUC between 0.7 and 0.9 indicates an appropriate fit (Manel et al., 2001; Franklin et al., 2009). Because the AUC values were ≥0.7, models were considered to fit appropriately. AUC values do not indicate relative superiority among models (Jiménez-Valverde et al., 2021). Thus, although the WorldClim model showed a slightly higher AUC value (0.809) than the KMA model (0.796), this does not signify a better model. Additionally, partial ROC-AUC results indicated that each of the two models demonstrated significant accuracy.
Comparing results of the two models based on field verification data, we confirmed high field applicability in all classes except for Class 3. Differences in PF and OF for Class 3 indicate that bullfrogs inhabit more areas than predicted. This difference was true for SDM results using both climate datasets, suggesting that distribution data collection is insufficient. Therefore, when comparing SDMs using WorldClim and regional data, the SDMs showed similar field applicability; however, we recommend use of regional data rather than WorldClim data.
We recommend SDMs as a useful tool for predicting potential distribution probability, especially for invasive species. Although several studies have questioned whether modeling results could accurately predict distributions of target species (Hirzel and Le Lay, 2008), we found that the Maxent model had high accuracy and high field applicability. In addition, our results indicated the importance of continuous data collection and field verification. Our study cross-checked large amounts of presence data for bullfrogs in South Korea, verified the SDM through the AUC value and field observations, and highlighted the applicability of SDMs in estimating distribution probability and managing invasive species. Although AUC values were similar, we recommend using regional data in regional studies, because WorldClim data may have discrepancies. Overall, this study can provide guidelines for using species distribution modeling to manage and control invasive bullfrogs in South Korea as well as important information for future research.

Fig. 1. Map of South Korea. Provinces marked with a star were surveyed for field verification of the distribution model of Aquarana catesbeiana. Source of world map: https://hub.arcgis.com/; source of South Korea map: Geoservice, 2019.

Fig. 2. Predicted distribution map of Aquarana catesbeiana in South Korea, with field verification sites.

Fig. 3. Response curves of the most influential bioclimatic variables for the potential distribution of Aquarana catesbeiana; (A) bioclimatic variable 11 (mean temperature of coldest quarter), (B) bioclimatic variable 08 (mean temperature of wettest quarter), (C) bioclimatic variable 06 (minimum temperature of coldest month), and (D) bioclimatic variable 05 (maximum temperature of warmest month).

Fig. 4. The Maxent jackknife method was employed to assess the impact of each of the 19 bioclimatic variables on the overall model, providing detailed insights into the functionality and importance of each variable. The figure displays the results, where light blue bars represent the model's impact when the respective variable is excluded and dark blue bars indicate the independent contribution of each variable to the model. The numbers of bioclimatic variables correspond to the numbers and descriptions mentioned in Table 1.

Fig. 5. Predicted probability and observed probability of distribution of Aquarana catesbeiana, estimated separately for the entire survey, Jeolla-do, and Chungcheong-do.

Fig. 6. Predicted probability and observed probability of distribution of Aquarana catesbeiana, estimated separately for the KMA and WorldClim models.