The effects of data adequacy and calibration size on the accuracy of presence-only species distribution models

2019
Author(s): Truly Santika, Michael F. Hutchinson, Kerrie A. Wilson

Abstract: Presence-only data used to develop species distribution models are often biased towards areas that are frequently surveyed. Furthermore, the size of the calibration area relative to the area covered by the species occurrences has been shown to affect model accuracy. However, existing assessments of the effect of data inadequacy and calibration size on model accuracy have predominantly been conducted using empirical studies. These studies can give ambiguous results, since the data used to train and test the model can both be biased. These limitations were addressed by applying simulated data to assess how inadequate data coverage and the size of the calibration area affect the accuracy of species distribution models generated by MaxEnt and BIOCLIM. The validity of four presence-only performance measures, Contrast Validation Index (CVI), Boyce index, AUC and AUCratio, was also assessed. CVI, AUC and AUCratio ranked the accuracy of univariate models correctly according to the true importance of their defining environmental variable, a desirable property of an accuracy measure. In contrast, the Boyce index failed to rank the accuracy of univariate models correctly, and a high percentage of irrelevant variables produced models with a high Boyce index. Inadequate data coverage and increased calibration area reduced model accuracy by reducing the correct identification of the dominant environmental determinant. BIOCLIM outperformed MaxEnt in predicting the true distribution of simulated species with a symmetric dominant response. However, MaxEnt outperformed BIOCLIM in predicting the true distribution of simulated species with skew and linear dominant responses. Despite this, the standard performance measures consistently overestimated the performance of MaxEnt models and showed them as always having higher model accuracy than the BIOCLIM models. It has been acknowledged that research should be directed towards testing and improving species distribution modelling tools, particularly how to handle the inevitable bias and scarcity of species occurrence data. Simulated data, as demonstrated here, provide a powerful approach to comprehensively test the performance of modelling tools and to disentangle the effects of data properties and modelling options on model accuracy. This may be impossible to achieve using real-world data.
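
The idea of ranking univariate models by a presence-only AUC can be illustrated with a small simulation. The sketch below is not the authors' procedure: the virtual species, its Gaussian response to a hypothetical `temp` variable, the irrelevant `noise` variable, and the crude "distance from the mean of presences" score are all assumptions made for illustration. It only shows that a rank-based AUC computed against background cells should rate the truly relevant variable above an irrelevant one.

```python
import math
import random


def auc_rank(pos_scores, neg_scores):
    """AUC as the chance that a random presence outscores a random
    background cell, counting ties as one half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def simulate_virtual_species(n_cells=1000, seed=0):
    """Hypothetical virtual species with a symmetric (Gaussian) response
    to `temp`; `noise` has no effect on occurrence."""
    rng = random.Random(seed)
    temp = [rng.uniform(0, 30) for _ in range(n_cells)]
    noise = [rng.uniform(0, 30) for _ in range(n_cells)]
    present = [rng.random() < math.exp(-0.5 * ((t - 15) / 3) ** 2) for t in temp]
    return temp, noise, present


def univariate_auc(values, present):
    """Score every cell by closeness to the mean value at presences, then
    compare presences with all cells (the background)."""
    pres = [v for v, p in zip(values, present) if p]
    centre = sum(pres) / len(pres)
    scores_pres = [-abs(v - centre) for v in pres]
    scores_background = [-abs(v - centre) for v in values]
    return auc_rank(scores_pres, scores_background)


temp, noise, present = simulate_virtual_species()
auc_temp = univariate_auc(temp, present)
auc_noise = univariate_auc(noise, present)
print(f"AUC (relevant variable): {auc_temp:.2f}")
print(f"AUC (irrelevant variable): {auc_noise:.2f}")
```

Because the true response is known by construction, any accuracy measure can be checked against it, which is the advantage of simulated data that the abstract emphasises.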

2018
Author(s): Roozbeh Valavi, Jane Elith, José J. Lahoz-Monfort, Gurutzeta Guillera-Arroita

Summary: When applied to structured data, conventional random cross-validation techniques can lead to underestimation of prediction error, and may result in inappropriate model selection. We present the R package blockCV, a new toolbox for cross-validation of species distribution modelling. The package can generate spatially or environmentally separated folds. It includes tools to measure spatial autocorrelation ranges in candidate covariates, providing the user with insights into the spatial structure in these data. It also offers interactive graphical capabilities for creating spatial blocks and exploring data folds. Package blockCV enables modellers to more easily implement a range of evaluation approaches. It will help the modelling community learn more about the impacts of evaluation approaches on our understanding of predictive performance of species distribution models.
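
The general idea of spatially separated folds can be sketched in a few lines. The following Python sketch is not the blockCV API (blockCV is an R package); the function name `spatial_block_folds` and its parameters are hypothetical. It assumes planar coordinates, square blocks of a fixed size, and random assignment of whole blocks to folds, so that nearby points never end up on both sides of a train/test split.

```python
import math
import random


def spatial_block_folds(coords, block_size, k, seed=0):
    """Assign each (x, y) point to one of k folds so that all points in
    the same square block share a fold.

    Returns (folds, blocks): the fold index and the block (col, row)
    for every point, in input order.
    """
    min_x = min(x for x, _ in coords)
    min_y = min(y for _, y in coords)
    # Index of the square block containing each point
    blocks = [
        (math.floor((x - min_x) / block_size), math.floor((y - min_y) / block_size))
        for x, y in coords
    ]
    # Shuffle the occupied blocks, then deal them out to folds in turn
    unique_blocks = sorted(set(blocks))
    random.Random(seed).shuffle(unique_blocks)
    fold_of_block = {b: i % k for i, b in enumerate(unique_blocks)}
    return [fold_of_block[b] for b in blocks], blocks


coords = [(0, 0), (1, 1), (10, 10), (11, 11), (20, 0), (21, 1)]
folds, blocks = spatial_block_folds(coords, block_size=5, k=3)
print(folds)
```

In practice the block size would be chosen with the spatial autocorrelation range of the covariates in mind, which is the kind of diagnostic the abstract says the package provides.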


2020
Author(s): Daijiang Li, Russell Dinnage, Lucas Nell, Matthew R. Helmus, Anthony Ives

Summary: Model-based approaches are increasingly popular in ecological studies. A good example of this trend is the use of joint species distribution models to ask questions about ecological communities. However, most current applications of model-based methods do not include phylogenies despite the well-known importance of phylogenetic relationships in shaping species distributions and community composition. In part, this is due to a lack of accessible tools allowing ecologists to fit phylogenetic species distribution models easily. To fill this gap, the R package phyr (pronounced fire) implements a suite of metrics, comparative methods and mixed models that use phylogenies to understand and predict community composition and other ecological and evolutionary phenomena. The phyr workhorse functions are implemented in C++, making all calculations and model estimations fast. phyr can fit a variety of models such as phylogenetic joint-species distribution models, spatiotemporal-phylogenetic autocorrelation models, and phylogenetic trait-based bipartite network models. phyr also estimates phylogenetically independent trait correlations with measurement error to test for adaptive syndromes and performs fast calculations of common alpha and beta phylogenetic diversity metrics. All phyr methods are united under Brownian motion or Ornstein-Uhlenbeck models of evolution, and phylogenetic terms are modelled as phylogenetic covariance matrices. The functions and model formula syntax we propose in phyr serve as a simple and unified framework that ignites the use of phylogenies to address a variety of ecological questions.


2019
Author(s): Simon Croft, Graham C. Smith

Abstract: Species distribution models (SDMs) are an increasingly popular tool in ecology which, together with a vast wealth of data from citizen science projects, have the potential to dramatically improve our understanding of species behaviour for applications such as conservation and wildlife management. However, many of the best performing models require information regarding survey effort, specifically absence, which is typically lacking in opportunistic datasets. To facilitate the use of such models, pseudo-absences from locations without recorded presence must be assumed. Several studies have suggested that survey effort, and hence likely absence, could be estimated from presence-only data by considering records across “target groups” of species defined according to taxonomy. We performed a probabilistic analysis, computing the conditional probability of recording a species given that a particular set of species is also recorded, to test the validity of defining target groups by taxonomic order and to explore other potential groupings. Based on this quantification of recording associations, we outline a new method to inform pseudo-absence selection, comparing its predictive performance, measured by the area under the curve (AUC) statistic, against the standard method of selection across a series of SDMs. Our findings show some support for target grouping classification based on taxonomy but indicate that an alternative classification using survey method may be more appropriate for informing effort and consequently absence. Across 49 terrestrial mammal species, pseudo-absence selection using our proposed method outperformed that of the standard method, showing an improvement in the predictive performance of presence-absence models for 17 out of 22 with sufficient data to elicit a significant difference. Based on our method we also observed a substantial improvement in the performance of presence-absence models compared to that of presence-only models (MaxEnt), with a higher AUC for all 22 species showing a significant difference between approaches. We conclude that our method produces sensible, robust pseudo-absences which either complement patterns in known presences or, where conflicts occur, are explainable in terms of ecological variables, potentially improving our understanding of species behaviour. Furthermore, we suggest that presence-absence models using these pseudo-absences provide a viable alternative to MaxEnt when modelling using presence-only data.


2019
Author(s): Emy Guilbault, Ian Renner, Michael Mahony, Eric Beh

Abstract: Species distribution modelling, which allows users to predict the spatial distribution of species with the use of environmental covariates, has become increasingly popular, with many software platforms providing tools to fit species distribution models. However, the species observations used in species distribution models can have varying levels of quality and can have incomplete information, such as uncertain species identity. In this paper, we develop two algorithms to reclassify observations with unknown species identities which simultaneously predict different species distributions using spatial point processes. Through simulations, we compare the performance of the different algorithms, using different initializations and parameters, with models fitted using only the observations with known species identity. We show that performance varies with differences in correlation among species distributions, species abundance, and the proportion of observations with unknown species identities. Additionally, some of the methods developed here outperformed the models that did not use the misspecified data. These models represent a helpful and promising tool for opportunistic surveys where misidentification happens or for the distribution of species newly separated in their taxonomy.


2021
Author(s): Stephanie Hogg, Yan Wang, Lewi Stone

Abstract: Joint species distribution models (JSDMs) are a recent development in biogeography and enable the spatial modelling of multiple species and their interactions and dependencies. However, most models do not consider imperfect detection, which can significantly bias estimates. This is one of the first papers to account for imperfect detection when fitting data with JSDMs and to explore the complications that may arise. A multivariate probit JSDM that explicitly accounts for imperfect detection is proposed and implemented using a Bayesian hierarchical approach. We investigate the performance of the JSDM in the presence of imperfect detection for a range of factors, including varied levels of detection and species occupancy, and varied numbers of survey sites and replications. To understand how effective this JSDM is in practice, we also compare results to those from a JSDM that does not explicitly model detection but instead makes use of “collapsed data”. A case study of owls and gliders in Victoria, Australia is also presented. Using simulations, we found that the JSDMs explicitly accounting for detection can accurately estimate intrinsic correlation between species with enough survey sites and replications. Reducing the number of survey sites decreases the precision of estimates, while reducing the number of survey replications can lead to biased estimates. For low probabilities of detection, the model may require a large number of survey replications to remove bias from estimates. However, JSDMs not explicitly accounting for detection may have a limited ability to disentangle detection from occupancy, which substantially reduces their ability to accurately infer the species distribution spatially. Our case study showed positive correlation between Sooty Owls and Greater Gliders, despite a low number of survey replications. To avoid biased estimates of inter-species correlations and species distributions, imperfect detection needs to be considered. However, for low probability of detection, the JSDMs explicitly accounting for detection are data hungry. Estimates from such models may still be subject to bias. To overcome the bias, researchers need to carefully design surveys and choose appropriate modelling approaches. The survey design should ensure sufficient survey replications for unbiased inferences on species inter-dependencies and occupancy.
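
Why low detection probability makes replicated surveys so important can be seen with a back-of-the-envelope calculation. The sketch below is not the multivariate probit JSDM described in the abstract; it assumes the simplest textbook setting, in which a species occupying a site is detected independently on each visit with a constant probability `p`. The function names are hypothetical.

```python
def prob_detected_at_least_once(p, k):
    """Probability that a species present at a site is detected on at
    least one of k independent visits, each with detection probability p."""
    return 1.0 - (1.0 - p) ** k


def replications_needed(p, target, max_visits=10_000):
    """Smallest number of visits for which the cumulative detection
    probability reaches `target` (assumes 0 < p <= 1 and 0 < target < 1)."""
    for k in range(1, max_visits + 1):
        if prob_detected_at_least_once(p, k) >= target:
            return k
    raise ValueError("target not reached within max_visits")


# With a detection probability of 0.2 per visit, many visits are needed
# before an occupied site is unlikely to be recorded as empty.
print(replications_needed(0.2, 0.95))
```

Under these assumptions, a site surveyed only a few times with low `p` will often look unoccupied even when the species is present, which is the source of bias that a model ignoring detection cannot separate from true absence.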


2017
Author(s): Miguel Berdugo, Fernando T. Maestre, Sonia Kéfi, Nicolas Gross, Yoann Le Bagousse-Pinguet, ...

Abstract: Despite being a core ecological question, disentangling individual and interacting effects of plant-plant interactions, abiotic factors and species-specific adaptations as drivers of community assembly is challenging. Studies addressing this issue are growing rapidly, but they generally lack empirical data regarding species interactions and local abundances, or cover a narrow range of environmental conditions. We analysed species distribution models and local spatial patterns to isolate the relative importance of key abiotic (aridity) and biotic (facilitation and competition) drivers of plant community assembly in drylands worldwide. We examined the relative importance of these drivers along aridity gradients and used information derived from the niches of species to understand the role that species-specific adaptations to aridity play in modulating the importance of community assembly drivers. Facilitation, together with aridity, was the major driver of plant community assembly in global drylands. Due to community specialization, the importance of facilitation as an assembly driver decreased with aridity, and became non-significant at the border between arid and semiarid climates. Under the most arid conditions, competition affected species abundances in communities dominated by specialist species. Due to community specialization, the importance of aridity in shaping dryland plant communities peaked at moderate aridity levels. Synthesis: We showed that competition is an important driver of community assembly even under harsh environments, and that the effect of facilitation as a driver of species relative abundances collapses under high aridity because of the specialization of the species pool to extremely dry conditions. Our findings pave the way to develop more robust species distribution models aiming to predict the consequences of ongoing climate change on community assembly in drylands, the largest biome on Earth.


2021
Author(s): Jaime Carrasco, Fugencio Lison, Andres Weintraub

Traditional Species Distribution Models (SDMs) may not be appropriate when examples of one class (e.g. absences or pseudo-absences) greatly outnumber examples of the other class (e.g. presences or observations), because they tend to favor the more frequent class during learning. We present an ensemble method called Random UnderSampling and Boosting (RUSBoost), which was designed to address the case where the numbers of presence and absence records are imbalanced, and we opened the "black-box" of the algorithm to interpret its results and applicability in ecology. We applied our methodology to a case study of twenty-five species of bats from the Iberian Peninsula and we built a RUSBoost model for each species. Furthermore, in order to build tighter models, we optimized their hyperparameters using Bayesian Optimization. In particular, we implemented an objective function that represents the cross-validation loss: kFoldLoss(z), with z representing the hyperparameters Maximum Number of Splits, Number of Learners and Learning Rate. The models reached average values for Area Under the ROC Curve (AUC), specificity, sensitivity, and overall accuracy of 0.84±0.05, 79.5±4.87%, 74.9±6.05%, and 78.8±5.0%, respectively. We also obtained values of variable importance and we analyzed the relationships between explanatory variables and bat presence probability. The results of our study showed that RUSBoost could be a useful tool to develop SDMs with good performance when the presence/absence databases are imbalanced. The application of this algorithm could improve the prediction of SDMs and help in conservation biology and management.
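
The random undersampling step that gives RUSBoost its name can be sketched on its own. The Python below is an illustrative assumption, not the authors' implementation: it covers only the resampling (every class is reduced to the size of the rarest class), not the boosting rounds or the Bayesian hyperparameter search, and the function name `random_undersample` is hypothetical.

```python
import random
from collections import defaultdict


def random_undersample(labels, seed=0):
    """Return sorted indices of a class-balanced subsample.

    Every class is randomly reduced, without replacement, to the size
    of the rarest class, so all records of the rarest class are kept.
    """
    rng = random.Random(seed)
    indices_by_class = defaultdict(list)
    for i, label in enumerate(labels):
        indices_by_class[label].append(i)
    n_min = min(len(ix) for ix in indices_by_class.values())
    keep = []
    for ix in indices_by_class.values():
        keep.extend(rng.sample(ix, n_min))
    return sorted(keep)


# 95 pseudo-absences (0) and 5 presences (1): a heavily imbalanced dataset
labels = [0] * 95 + [1] * 5
keep = random_undersample(labels)
print(len(keep), sum(labels[i] for i in keep))
```

In a boosting setting, a fresh balanced subsample like this would typically be drawn for each weak learner, so that different absences are seen across rounds while every presence is retained.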


2020, Vol 431, pp. 109180
Author(s): Poliana Mendes, Santiago José Elías Velazco, André Felipe Alves de Andrade, Paulo De Marco

2014, Vol 38 (1), pp. 117-128
Author(s): Jennifer A. Miller

Species distribution models (SDMs) have become a dominant paradigm for quantifying species-environment relationships, and both the models and their outcomes have seen widespread use in conservation studies, particularly in the context of climate change research. With the growing interest in SDMs, extensive comparative studies have been undertaken. However, few generalizations and recommendations have resulted from these empirical studies, largely due to the confounding effects of differences in and interactions among the statistical methods, species traits, data characteristics, and accuracy metrics considered. This progress report addresses ‘virtual species distribution models’: the use of spatially explicit simulated data to represent a ‘true’ species distribution in order to evaluate aspects of model conceptualization and implementation. Simulating a ‘true’ species distribution, or a virtual species distribution, and systematically testing how these aspects affect SDMs, can provide an important baseline and generate new insights into how these issues affect model outcomes.

