Assessing the effect of prevalence on the predictive performance of species distribution models using simulated data

2010 ◽  
Vol 20 (1) ◽  
pp. 181-192 ◽  
Author(s):  
Truly Santika
2019 ◽  
Author(s):  
Truly Santika ◽  
Michael F. Hutchinson ◽  
Kerrie A. Wilson

ABSTRACT

Presence-only data used to develop species distribution models are often biased towards areas that are frequently surveyed. Furthermore, the size of the calibration area relative to the area covered by the species occurrences has been shown to affect model accuracy. However, existing assessments of the effect of data inadequacy and calibration size on model accuracy have predominantly been conducted using empirical studies. These studies can give ambiguous results, since the data used to train and test the model can both be biased.

These limitations were addressed by applying simulated data to assess how inadequate data coverage and the size of the calibration area affect the accuracy of species distribution models generated by MaxEnt and BIOCLIM. The validity of four presence-only performance measures, Contrast Validation Index (CVI), Boyce index, AUC and AUCratio, was also assessed.

CVI, AUC and AUCratio ranked the accuracy of univariate models correctly according to the true importance of their defining environmental variable, a desirable property of an accuracy measure. In contrast, the Boyce index failed to rank the accuracy of univariate models correctly, and a high percentage of irrelevant variables produced models with a high Boyce index.

Inadequate data coverage and increased calibration area reduced model accuracy by reducing the correct identification of the dominant environmental determinant. BIOCLIM outperformed MaxEnt in predicting the true distribution of simulated species with a symmetric dominant response. However, MaxEnt outperformed BIOCLIM in predicting the true distribution of simulated species with skewed and linear dominant responses. Despite this, the standard performance measures consistently overestimated the performance of MaxEnt models and showed them as always having higher model accuracy than the BIOCLIM models.

It has been acknowledged that research should be directed towards testing and improving species distribution modelling tools, particularly how to handle the inevitable bias and scarcity of species occurrence data. Simulated data, as demonstrated here, provide a powerful approach to comprehensively test the performance of modelling tools and to disentangle the effects of data properties and modelling options on model accuracy. This may be impossible to achieve using real-world data.
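As a rough illustration of one of the performance measures named in this abstract, the sketch below computes AUC in its rank-based form: the probability that a randomly chosen presence site receives a higher model score than a randomly chosen comparison site, with ties counted as half. It assumes presence sites are compared against background sites, and the function name `auc` and all scores are hypothetical; it is not the evaluation code used in the study.

```python
def auc(presence_scores, background_scores):
    """Rank-based AUC: chance that a presence site outscores a background site."""
    if not presence_scores or not background_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0   # presence site ranked above background site
            elif p == b:
                wins += 0.5   # ties count as half a win
    return wins / (len(presence_scores) * len(background_scores))


# Hypothetical suitability scores predicted by some model.
presence = [0.9, 0.8, 0.4]
background = [0.7, 0.3, 0.4, 0.1]
auc_value = auc(presence, background)
print(auc_value)
```

A value near 0.5 means the scores do not separate presence sites from background sites, while a value near 1 means almost every presence site outranks the background sites.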


2018 ◽  
Author(s):  
Roozbeh Valavi ◽  
Jane Elith ◽  
José J. Lahoz-Monfort ◽  
Gurutzeta Guillera-Arroita

Summary

When applied to structured data, conventional random cross-validation techniques can lead to underestimation of prediction error, and may result in inappropriate model selection.

We present the R package blockCV, a new toolbox for cross-validation of species distribution modelling.

The package can generate spatially or environmentally separated folds. It includes tools to measure spatial autocorrelation ranges in candidate covariates, providing the user with insights into the spatial structure in these data. It also offers interactive graphical capabilities for creating spatial blocks and exploring data folds.

Package blockCV enables modellers to more easily implement a range of evaluation approaches. It will help the modelling community learn more about the impacts of evaluation approaches on our understanding of predictive performance of species distribution models.
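The idea of spatially separated folds can be sketched in a few lines. The example below is a Python illustration of the general technique, not the blockCV API (blockCV is an R package): points are grouped into square blocks, and whole blocks, rather than individual points, are assigned to folds. The function name `spatial_block_folds`, the round-robin assignment of blocks to folds, and the coordinates are all assumptions made for this sketch.

```python
import math


def spatial_block_folds(points, block_size, k):
    """Assign each (x, y) point to one of k folds by square spatial block.

    Points in the same block always share a fold, so held-out data are
    separated from training data at the scale of a block.
    """
    if block_size <= 0 or k < 1:
        raise ValueError("block_size must be positive and k at least 1")
    # Integer (column, row) index of the block containing each point.
    block_of = [(math.floor(x / block_size), math.floor(y / block_size))
                for x, y in points]
    # Sort occupied blocks for a deterministic result, then deal them
    # out to folds in turn (a simple systematic assignment).
    blocks = sorted(set(block_of))
    fold_of_block = {block: i % k for i, block in enumerate(blocks)}
    return [fold_of_block[block] for block in block_of]


# Hypothetical occurrence coordinates in arbitrary map units.
points = [(0.5, 0.5), (0.7, 0.2), (1.5, 0.5), (1.6, 0.4), (0.2, 1.8), (1.9, 1.9)]
folds = spatial_block_folds(points, block_size=1.0, k=2)
print(folds)
```

In practice the block size would be chosen with the spatial autocorrelation range of the covariates in mind, which is the kind of diagnostic the summary says the package provides.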


Ecography ◽  
2020 ◽  
Vol 43 (4) ◽  
pp. 549-558 ◽  
Author(s):  
Tianxiao Hao ◽  
Jane Elith ◽  
José J. Lahoz‐Monfort ◽  
Gurutzeta Guillera‐Arroita

PLoS ONE ◽  
2014 ◽  
Vol 9 (11) ◽  
pp. e112764 ◽  
Author(s):  
Ren-Yan Duan ◽  
Xiao-Quan Kong ◽  
Min-Yi Huang ◽  
Wei-Yi Fan ◽  
Zhi-Gao Wang

2019 ◽  
Author(s):  
Simon Croft ◽  
Graham C. Smith

Abstract

Species distribution models (SDMs) are an increasingly popular tool in ecology which, together with a vast wealth of data from citizen science projects, have the potential to dramatically improve our understanding of species behaviour for applications such as conservation and wildlife management. However, many of the best performing models require information regarding survey effort, specifically absence, which is typically lacking in opportunistic datasets. To facilitate the use of such models, pseudo-absences from locations without recorded presence must be assumed. Several studies have suggested that survey effort, and hence likely absence, could be estimated from presence-only data by considering records across “target groups” of species defined according to taxonomy.

We performed a probabilistic analysis, computing the conditional probability of recording a species given that a particular set of species is also recorded, to test the validity of defining target groups by taxonomic order and to explore other potential groupings. Based on this quantification of recording associations, we outline a new method to inform pseudo-absence selection, comparing its predictive performance, measured by the area under the curve (AUC) statistic, against that of the standard method of selection across a series of SDMs.

Our findings show some support for target group classification based on taxonomy but indicate that an alternative classification using survey method may be more appropriate for informing effort and consequently absence. Across 49 terrestrial mammal species, pseudo-absence selection using our proposed method outperformed the standard method, improving the predictive performance of presence-absence models for 17 of the 22 species with sufficient data to elicit a significant difference. Based on our method we also observed a substantial improvement in the performance of presence-absence models compared to that of presence-only models (MaxEnt), with a higher AUC for all 22 species showing a significant difference between approaches.

We conclude that our method produces sensible, robust pseudo-absences which either complement patterns in known presences or, where conflicts occur, are explainable in terms of ecological variables, potentially improving our understanding of species behaviour. Furthermore, we suggest that presence-absence models using these pseudo-absences provide a viable alternative to MaxEnt when modelling using presence-only data.
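The conditional probability described in this abstract can be illustrated with a small sketch. It assumes records have already been grouped into survey visits, each represented as the set of species recorded together; that grouping, the function name `conditional_recording_probability`, and the example species are hypothetical, and the authors' actual computation may differ.

```python
def conditional_recording_probability(visits, target, given):
    """Estimate P(target recorded | every species in `given` recorded).

    `visits` is a list of sets, one per visit, holding the species recorded.
    Returns None when no visit records all of the `given` species.
    """
    given = set(given)
    # Keep only visits in which all conditioning species were recorded.
    matching = [visit for visit in visits if given <= set(visit)]
    if not matching:
        return None
    # Fraction of those visits that also recorded the target species.
    return sum(1 for visit in matching if target in visit) / len(matching)


# Hypothetical visits: each set lists the species recorded together.
visits = [
    {"fox", "badger"},
    {"fox", "badger", "hedgehog"},
    {"fox"},
    {"badger"},
    {"hedgehog"},
]
p_badger_given_fox = conditional_recording_probability(visits, "badger", {"fox"})
print(p_badger_given_fox)
```

A high value suggests that a record of the conditioning species is a reasonable indicator that the target species would also have been recorded if present, which is the kind of association a target group relies on.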

