Assessing the effect of prevalence on the predictive performance of species distribution models using simulated data

Abstract

Presence-only data used to develop species distribution models are often biased towards areas that are frequently surveyed. Furthermore, the size of the calibration area relative to the area covered by the species occurrences has been shown to affect model accuracy. However, existing assessments of the effect of data inadequacy and calibration size on model accuracy have predominantly been conducted using empirical studies. These studies can give ambiguous results, since the data used to train and test the model can both be biased.

These limitations were addressed by applying simulated data to assess how inadequate data coverage and the size of the calibration area affect the accuracy of species distribution models generated by MaxEnt and BIOCLIM. The validity of four presence-only performance measures, Contrast Validation Index (CVI), Boyce index, AUC and AUCratio, was also assessed.

CVI, AUC and AUCratio ranked the accuracy of univariate models correctly according to the true importance of their defining environmental variable, a desirable property of an accuracy measure. In contrast, the Boyce index failed to rank the accuracy of univariate models correctly, and a high percentage of irrelevant variables produced models with a high Boyce index.

Inadequate data coverage and increased calibration area reduced model accuracy by reducing the correct identification of the dominant environmental determinant. BIOCLIM outperformed MaxEnt in predicting the true distribution of simulated species with a symmetric dominant response. However, MaxEnt outperformed BIOCLIM in predicting the true distribution of simulated species with skewed and linear dominant responses. Despite this, the standard performance measures consistently overestimated the performance of MaxEnt models and showed them as always having higher model accuracy than the BIOCLIM models.

It has been acknowledged that research should be directed towards testing and improving species distribution modelling tools, particularly how to handle the inevitable bias and scarcity of species occurrence data. Simulated data, as demonstrated here, provide a powerful approach to comprehensively test the performance of modelling tools and to disentangle the effects of data properties and modelling options on model accuracy. This may be impossible to achieve using real-world data.
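
For readers unfamiliar with how AUC is computed when only presences are available, the following is a minimal sketch, not the procedure used in the study. It assumes the common convention of comparing model scores at presence sites against scores at background sites; the function name `auc_presence_background` and the example scores are hypothetical.

```python
def auc_presence_background(presence_scores, background_scores):
    """Rank-based AUC: the probability that a randomly chosen presence
    site receives a higher model score than a randomly chosen background
    site. Ties count as half a win (Mann-Whitney formulation)."""
    if not presence_scores or not background_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))


# Hypothetical suitability scores predicted by some model.
example_auc = auc_presence_background([0.9, 0.3], [0.5, 0.1])
```

Because background sites may include locations where the species actually occurs, an AUC computed this way depends on the extent of the calibration area, which is one reason a measure like this can mislead when calibration size changes.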

Predictive performance of plant species distribution models depends on species traits

Perspectives in Plant Ecology Evolution and Systematics ◽

10.1016/j.ppees.2010.04.002 ◽

2010 ◽

Vol 12 (3) ◽

pp. 219-225 ◽

Cited By ~ 36

Author(s):

Jan Hanspach ◽

Ingolf Kühn ◽

Sven Pompe ◽

Stefan Klotz

Keyword(s):

Plant Species ◽

Species Distribution ◽

Species Distribution Models ◽

Predictive Performance ◽

Species Traits ◽

Distribution Models

The Effects of Sampling Bias and Model Complexity on the Predictive Performance of MaxEnt Species Distribution Models

PLoS ONE ◽

10.1371/journal.pone.0055158 ◽

2013 ◽

Vol 8 (2) ◽

pp. e55158 ◽

Cited By ~ 224

Author(s):

Mindy M. Syfert ◽

Matthew J. Smith ◽

David A. Coomes

Keyword(s):

Species Distribution ◽

Species Distribution Models ◽

Predictive Performance ◽

Sampling Bias ◽

Model Complexity ◽

Distribution Models

The impact of modelling choices in the predictive performance of richness maps derived from species-distribution models: guidelines to build better diversity models

Methods in Ecology and Evolution ◽

10.1111/2041-210x.12022 ◽

2013 ◽

Vol 4 (4) ◽

pp. 327-335 ◽

Cited By ~ 34

Author(s):

Blas M. Benito ◽

Luis Cayuela ◽

Fabio S. Albuquerque

Keyword(s):

Species Distribution ◽

Species Distribution Models ◽

Predictive Performance ◽

Distribution Models ◽

The Impact

blockCV: an R package for generating spatially or environmentally separated folds for k-fold cross-validation of species distribution models

10.1101/357798 ◽

2018 ◽

Cited By ~ 3

Author(s):

Roozbeh Valavi ◽

Jane Elith ◽

José J. Lahoz-Monfort ◽

Gurutzeta Guillera-Arroita

Keyword(s):

Species Distribution ◽

Cross Validation ◽

Species Distribution Models ◽

Predictive Performance ◽

R Package ◽

Species Distribution Modelling ◽

List Type ◽

Distribution Models ◽

Distribution Modelling ◽

Evaluation Approaches

Summary

When applied to structured data, conventional random cross-validation techniques can lead to underestimation of prediction error, and may result in inappropriate model selection.

We present the R package blockCV, a new toolbox for cross-validation of species distribution modelling.

The package can generate spatially or environmentally separated folds. It includes tools to measure spatial autocorrelation ranges in candidate covariates, providing the user with insights into the spatial structure in these data. It also offers interactive graphical capabilities for creating spatial blocks and exploring data folds.

Package blockCV enables modellers to more easily implement a range of evaluation approaches. It will help the modelling community learn more about the impacts of evaluation approaches on our understanding of predictive performance of species distribution models.
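
The idea of spatially separated folds can be illustrated with a short sketch. This is not the blockCV API (blockCV is an R package); it is a simplified Python illustration that assumes planar coordinates, square grid blocks, and a systematic (round-robin) assignment of blocks to folds. The function name `spatial_block_folds` and its parameters are hypothetical.

```python
import math


def spatial_block_folds(coords, block_size, k):
    """Assign each (x, y) point to one of k folds so that all points in
    the same square grid block share a fold. Blocks are sorted and dealt
    out to folds in round-robin order (an assumed, simple scheme)."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    if k < 1:
        raise ValueError("k must be at least 1")
    # Grid block index of every point.
    block_of = [
        (math.floor(x / block_size), math.floor(y / block_size))
        for x, y in coords
    ]
    # Deterministic block-to-fold mapping.
    blocks = sorted(set(block_of))
    fold_of_block = {block: i % k for i, block in enumerate(blocks)}
    return [fold_of_block[b] for b in block_of]


# Hypothetical occurrence coordinates in arbitrary planar units.
points = [(0.1, 0.1), (0.2, 0.3), (1.5, 0.2), (1.7, 0.9), (0.4, 1.2)]
folds = spatial_block_folds(points, block_size=1.0, k=3)
```

Holding out whole blocks rather than random points keeps nearby, spatially autocorrelated records on the same side of the train/test split, which is the property that random cross-validation lacks. In practice the block size would be chosen with reference to the autocorrelation range of the covariates.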

Testing whether ensemble modelling is advantageous for maximising predictive performance of species distribution models

Ecography ◽

10.1111/ecog.04890 ◽

2020 ◽

Vol 43 (4) ◽

pp. 549-558 ◽

Cited By ~ 10

Author(s):

Tianxiao Hao ◽

Jane Elith ◽

José J. Lahoz‐Monfort ◽

Gurutzeta Guillera‐Arroita

Keyword(s):

Species Distribution ◽

Species Distribution Models ◽

Predictive Performance ◽

Distribution Models ◽

Ensemble Modelling

The Predictive Performance and Stability of Six Species Distribution Models

PLoS ONE ◽

10.1371/journal.pone.0112764 ◽

2014 ◽

Vol 9 (11) ◽

pp. e112764 ◽

Cited By ~ 52

Author(s):

Ren-Yan Duan ◽

Xiao-Quan Kong ◽

Min-Yi Huang ◽

Wei-Yi Fan ◽

Zhi-Gao Wang

Keyword(s):

Species Distribution ◽

Species Distribution Models ◽

Predictive Performance ◽

Distribution Models

Structuring the unstructured: estimating species-specific absence from multi-species presence data to inform pseudo-absence selection in species distribution models

10.1101/656629 ◽

2019 ◽

Cited By ~ 1

Author(s):

Simon Croft ◽

Graham C. Smith

Keyword(s):

Species Distribution ◽

Standard Method ◽

Species Distribution Models ◽

Predictive Performance ◽

Survey Method ◽

List Type ◽

Distribution Models ◽

Target Groups ◽

Significant Difference ◽

Survey Effort

Abstract

Species distribution models (SDMs) are an increasingly popular tool in ecology which, together with a vast wealth of data from citizen science projects, have the potential to dramatically improve our understanding of species behaviour for applications such as conservation and wildlife management. However, many of the best performing models require information regarding survey effort, specifically absence, which is typically lacking in opportunistic datasets. To facilitate the use of such models, pseudo-absences from locations without recorded presence must be assumed. Several studies have suggested that survey effort, and hence likely absence, could be estimated from presence-only data by considering records across "target groups" of species defined according to taxonomy.

We performed a probabilistic analysis, computing the conditional probability of recording a species given that a particular set of species is also recorded, to test the validity of defining target groups by taxonomic order and to explore other potential groupings. Based on this quantification of recording associations, we outline a new method to inform pseudo-absence selection, comparing predictive performance, measured by the area under the curve (AUC) statistic, against the standard method of selection across a series of SDMs.

Our findings show some support for target group classification based on taxonomy but indicate that an alternative classification using survey method may be more appropriate for informing effort and consequently absence. Across 49 terrestrial mammal species, pseudo-absence selection using our proposed method outperformed that of the standard method, showing an improvement in the predictive performance of presence-absence models for 17 of the 22 species with sufficient data to elicit a significant difference. Based on our method, we also observed a substantial improvement in the performance of presence-absence models compared to that of presence-only models (MaxEnt), with a higher AUC for all 22 species showing a significant difference between approaches.

We conclude that our method produces sensible, robust pseudo-absences which either complement patterns in known presences or, where conflicts occur, are explainable in terms of ecological variables, potentially improving our understanding of species behaviour. Furthermore, we suggest that presence-absence models using these pseudo-absences provide a viable alternative to MaxEnt when modelling using presence-only data.
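
The conditional recording probability mentioned in the abstract can be sketched as a simple frequency estimate. This is an illustration only, not the authors' implementation: it assumes each survey visit is represented as a set of recorded species, and the function name `conditional_recording_probability` and the example records are hypothetical.

```python
def conditional_recording_probability(visits, target, given):
    """Estimate P(target recorded | every species in `given` recorded)
    as the share of qualifying visits that also list the target.
    Returns None when no visit contains all species in `given`."""
    given = set(given)
    qualifying = [v for v in visits if given <= set(v)]
    if not qualifying:
        return None
    with_target = sum(1 for v in qualifying if target in v)
    return with_target / len(qualifying)


# Hypothetical visit records; each set is one survey visit.
visits = [
    {"fox", "badger"},
    {"fox"},
    {"badger", "hare"},
    {"fox", "badger", "hare"},
]
p_fox_given_badger = conditional_recording_probability(visits, "fox", {"badger"})
```

A high value suggests that visits recording the conditioning species would usually have recorded the target as well, so a visit that lists those species but not the target is a more plausible pseudo-absence than an arbitrary unsurveyed location.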
