blockCV: an R package for generating spatially or environmentally separated folds for k-fold cross-validation of species distribution models

2018 ◽  
Author(s):  
Roozbeh Valavi ◽  
Jane Elith ◽  
José J. Lahoz-Monfort ◽  
Gurutzeta Guillera-Arroita

Summary: When applied to structured data, conventional random cross-validation techniques can lead to underestimation of prediction error, and may result in inappropriate model selection. We present the R package blockCV, a new toolbox for cross-validation of species distribution modelling. The package can generate spatially or environmentally separated folds. It includes tools to measure spatial autocorrelation ranges in candidate covariates, providing the user with insights into the spatial structure in these data. It also offers interactive graphical capabilities for creating spatial blocks and exploring data folds. Package blockCV enables modellers to more easily implement a range of evaluation approaches. It will help the modelling community learn more about the impacts of evaluation approaches on our understanding of predictive performance of species distribution models.
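The idea of spatially separated folds described in this abstract can be illustrated with a minimal sketch. It is written in Python rather than R and does not reproduce the blockCV API; the function name `assign_spatial_folds`, the square-block layout, and the round-robin assignment of blocks to folds are all assumptions made for illustration.

```python
import math


def assign_spatial_folds(points, block_size, k):
    """Assign each (x, y) point to one of k folds by square spatial block.

    Hypothetical helper for illustration only; not part of blockCV.
    """
    if block_size <= 0 or k < 1:
        raise ValueError("block_size must be positive and k at least 1")
    # The block id is the integer grid cell that contains the point.
    block_of = [
        (math.floor(x / block_size), math.floor(y / block_size))
        for x, y in points
    ]
    # Systematic assignment: sorted blocks are dealt to folds round-robin,
    # so every point in a block lands in the same fold.
    fold_of_block = {b: i % k for i, b in enumerate(sorted(set(block_of)))}
    return [fold_of_block[b] for b in block_of]


points = [(0.5, 0.5), (0.9, 0.1), (1.5, 0.5), (2.5, 2.5), (0.2, 1.7)]
folds = assign_spatial_folds(points, block_size=1.0, k=2)
```

Because whole blocks are held out together, test points are never interleaved with training points inside the same block, which is what separates this scheme from random k-fold splitting. Choosing `block_size` relative to the spatial autocorrelation range of the covariates is left to the user in this sketch.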

2019 ◽  
Author(s):  
Emy Guilbault ◽  
Ian Renner ◽  
Michael Mahony ◽  
Eric Beh

Abstract: Species distribution modelling, which allows users to predict the spatial distribution of species with the use of environmental covariates, has become increasingly popular, with many software platforms providing tools to fit species distribution models. However, the species observations used in species distribution models can have varying levels of quality and can have incomplete information, such as uncertain species identity. In this paper, we develop two algorithms that reclassify observations with unknown species identities while simultaneously predicting the distributions of different species using spatial point processes. Through simulations, we compare the performance of the algorithms under different initializations and parameters with that of models fitted using only the observations with known species identity. We show that performance varies with differences in correlation among species distributions, species abundance, and the proportion of observations with unknown species identities. Additionally, some of the methods developed here outperformed the models that did not use the misspecified data. These models represent a helpful and promising tool for opportunistic surveys where misidentification happens or for the distribution of species newly separated in their taxonomy.


2009 ◽  
Vol 21 (1) ◽  
pp. 39-49
Author(s):  
Karla Donato Fook ◽  
Silvana Amaral ◽  
Antônio Miguel Vieira Monteiro ◽  
Gilberto Câmara ◽  
Arimatéa de Carvalho Ximenes ◽  
...  

Currently, biodiversity conservation is one of the most urgent and important themes. Biodiversity researchers use species distribution models to make inferences about species occurrences and locations. These models are fundamental for fauna and flora preservation, as well as for decision-making processes in urban and regional planning and development. Species distribution modelling tools use large biodiversity datasets that are globally distributed, can reside on different computational platforms, and are hard to access and manipulate. The scientific community needs infrastructures in which biodiversity researchers can collaborate and share knowledge. In this context, we present a computational environment that supports collaboration in a species distribution modelling network on the Web. This environment is based on a modelling experiment catalogue and on a set of geoweb services, the Web Biodiversity Collaborative Modelling Services (WBCMS).


2020 ◽  
Author(s):  
Daijiang Li ◽  
Russell Dinnage ◽  
Lucas Nell ◽  
Matthew R. Helmus ◽  
Anthony Ives

Summary: Model-based approaches are increasingly popular in ecological studies. A good example of this trend is the use of joint species distribution models to ask questions about ecological communities. However, most current applications of model-based methods do not include phylogenies despite the well-known importance of phylogenetic relationships in shaping species distributions and community composition. In part, this is due to a lack of accessible tools allowing ecologists to fit phylogenetic species distribution models easily. To fill this gap, the R package phyr (pronounced fire) implements a suite of metrics, comparative methods and mixed models that use phylogenies to understand and predict community composition and other ecological and evolutionary phenomena. The phyr workhorse functions are implemented in C++, making all calculations and model estimations fast. phyr can fit a variety of models such as phylogenetic joint-species distribution models, spatiotemporal-phylogenetic autocorrelation models, and phylogenetic trait-based bipartite network models. phyr also estimates phylogenetically independent trait correlations with measurement error to test for adaptive syndromes and performs fast calculations of common alpha and beta phylogenetic diversity metrics. All phyr methods are united under Brownian motion or Ornstein-Uhlenbeck models of evolution, and phylogenetic terms are modelled as phylogenetic covariance matrices. The functions and model formula syntax we propose in phyr serve as a simple and unified framework that ignites the use of phylogenies to address a variety of ecological questions.


2019 ◽  
Author(s):  
Simon Croft ◽  
Graham C. Smith

Abstract: Species distribution models (SDMs) are an increasingly popular tool in ecology which, together with a vast wealth of data from citizen science projects, have the potential to dramatically improve our understanding of species behaviour for applications such as conservation and wildlife management. However, many of the best performing models require information regarding survey effort, specifically absence, which is typically lacking in opportunistic datasets. To facilitate the use of such models, pseudo-absences from locations without recorded presence must be assumed. Several studies have suggested that survey effort, and hence likely absence, could be estimated from presence-only data by considering records across “target groups” of species defined according to taxonomy. We performed a probabilistic analysis, computing the conditional probability of recording a species given that a particular set of species is also recorded, to test the validity of defining target groups by taxonomic order and to explore other potential groupings. Based on this quantification of recording associations, we outline a new method to inform pseudo-absence selection, comparing its predictive performance, measured by the area under the curve (AUC) statistic, against the standard method of selection across a series of SDMs. Our findings show some support for target grouping classification based on taxonomy but indicate that an alternative classification using survey method may be more appropriate for informing effort and consequently absence. Across 49 terrestrial mammal species, pseudo-absence selection using our proposed method outperformed that of the standard method, showing an improvement in the predictive performance of presence-absence models for 17 of the 22 species with sufficient data to elicit a significant difference. Based on our method we also observed a substantial improvement in the performance of presence-absence models compared to that of presence-only models (MaxEnt), with a higher AUC for all 22 species showing a significant difference between approaches. We conclude that our method produces sensible, robust pseudo-absences which either complement patterns in known presences or, where conflicts occur, are explainable in terms of ecological variables, potentially improving our understanding of species behaviour. Furthermore, we suggest that presence-absence models using these pseudo-absences provide a viable alternative to MaxEnt when modelling using presence-only data.
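The conditional probability of recording a species given that a set of other species is also recorded can be estimated from co-occurrence counts. The following is a minimal sketch, not the authors' implementation: it assumes each survey visit is available as a set of recorded species names, and the function name and the example data are hypothetical.

```python
def conditional_recording_probability(visits, target, given):
    """Estimate P(target recorded | every species in `given` recorded).

    `visits` is a list of sets of species names, one set per visit
    (an assumed data layout). Returns None when no visit contains
    all species in `given`, because the estimate is then undefined.
    """
    given = set(given)
    # Keep only the visits in which all conditioning species were recorded.
    matching = [v for v in visits if given <= set(v)]
    if not matching:
        return None
    return sum(1 for v in matching if target in v) / len(matching)


# Hypothetical example data: four visits and the species recorded on each.
visits = [
    {"fox", "badger"},
    {"fox"},
    {"badger", "hare"},
    {"fox", "badger", "hare"},
]
p = conditional_recording_probability(visits, "fox", ["badger"])
```

In this toy data, badger is recorded on three visits and fox on two of those, so the estimate is 2/3. A high value suggests the two species tend to be recorded by the same surveys, which is the kind of association the abstract uses to define target groups.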


2012 ◽  
Vol 367 (1586) ◽  
pp. 247-258 ◽  
Author(s):  
Colin M. Beale ◽  
Jack J. Lennon

Motivated by the need to solve ecological problems (climate change, habitat fragmentation and biological invasions), there has been increasing interest in species distribution models (SDMs). Predictions from these models inform conservation policy, invasive species management and disease-control measures. However, predictions are subject to uncertainty, the degree and source of which is often unrecognized. Here, we review the SDM literature in the context of uncertainty, focusing on three main classes of SDM: niche-based models, demographic models and process-based models. We identify sources of uncertainty for each class and discuss how uncertainty can be minimized or included in the modelling process to give realistic measures of confidence around predictions. Because this has typically not been performed, we conclude that uncertainty in SDMs has often been underestimated and a false precision assigned to predictions of geographical distribution. We identify areas where development of new statistical tools will improve predictions from distribution models, notably the development of hierarchical models that link different types of distribution model and their attendant uncertainties across spatial scales. Finally, we discuss the need to develop more defensible methods for assessing predictive performance, quantifying model goodness-of-fit and for assessing the significance of model covariates.


2021 ◽  
Author(s):  
Jaime Carrasco ◽  
Fugencio Lison ◽  
Andres Weintraub

Traditional Species Distribution Models (SDMs) may not be appropriate when examples of one class (e.g. absences or pseudo-absences) greatly outnumber examples of the other class (e.g. presences or observations), because they tend to favor learning the more frequent class. We present an ensemble method called Random UnderSampling and Boosting (RUSBoost), which was designed to address the case where the numbers of presence and absence records are imbalanced, and we opened the "black-box" of the algorithm to interpret its results and applicability in ecology. We applied our methodology to a case study of twenty-five species of bats from the Iberian Peninsula and built a RUSBoost model for each species. Furthermore, to build tighter models, we optimized their hyperparameters using Bayesian Optimization. In particular, we implemented an objective function that represents the cross-validation loss: kFoldLoss(z), with z representing the hyper-parameters Maximum Number of Splits, Number of Learners and Learning Rate. The models reached average values for Area Under the ROC Curve (AUC), specificity, sensitivity, and overall accuracy of 0.84±0.05%, 79.5±4.87%, 74.9±6.05%, and 78.8±5.0%, respectively. We also obtained values of variable importance and we analyzed the relationships between explanatory variables and bat presence probability. The results of our study showed that RUSBoost could be a useful tool to develop SDMs with good performance when the presence/absence databases are imbalanced. The application of this algorithm could improve the prediction of SDMs and help in conservation biology and management.
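The random undersampling step that gives RUSBoost its name can be sketched in isolation. The code below is an illustrative assumption, not the authors' implementation: it balances the classes once, whereas RUSBoost repeats such a draw inside each boosting round, and the boosting part is omitted. The function name and the 0/1 label encoding are hypothetical.

```python
import random


def random_undersample(labels, seed=0):
    """Return sorted indices of a class-balanced subsample.

    Keeps every record of the minority class and an equally sized
    random sample of the majority class. Labels are assumed to be
    1 (presence) or 0 (absence or pseudo-absence).
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    # Draw without replacement so no majority record is duplicated.
    kept = rng.sample(majority, len(minority))
    return sorted(minority + kept)


labels = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0]  # 2 presences, 8 absences
idx = random_undersample(labels, seed=42)
```

The returned indices cover both presences and two randomly chosen absences, so a learner trained on them sees the two classes in equal proportion.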


2018 ◽  
Vol 373 (1761) ◽  
pp. 20170446 ◽  
Author(s):  
Scott Jarvie ◽  
Jens-Christian Svenning

Trophic rewilding, the (re)introduction of species to promote self-regulating biodiverse ecosystems, is a future-oriented approach to ecological restoration. In the twenty-first century and beyond, human-mediated climate change looms as a major threat to global biodiversity and ecosystem function. A critical aspect in planning trophic rewilding projects is the selection of suitable sites that match the needs of the focal species under both current and future climates. Species distribution models (SDMs) are currently the main tools to derive spatially explicit predictions of environmental suitability for species, but the extent of their adoption for trophic rewilding projects has been limited. Here, we provide an overview of applications of SDMs to trophic rewilding projects, outline methodological choices and issues, and provide a synthesis and outlook. We then predict the potential distribution of 17 large-bodied taxa proposed as trophic rewilding candidates and which represent different continents and habitats. We identified widespread climatic suitability for these species in the discussed (re)introduction regions under current climates. Climatic conditions generally remain suitable in the future, although some species will experience reduced suitability in parts of these regions. We conclude that climate change is not a major barrier to trophic rewilding as currently discussed in the literature. This article is part of the theme issue ‘Trophic rewilding: consequences for ecosystems under global change’.


2021 ◽  
Author(s):  
Gabriel Dansereau ◽  
Pierre Legendre ◽  
Timothée Poisot

Aim: Local contributions to beta diversity (LCBD) can be used to identify sites with high ecological uniqueness and exceptional species composition within a region of interest. Yet, these indices are typically used on local or regional scales with relatively few sites, as they require information on complete community compositions difficult to acquire on larger scales. Here, we investigate how LCBD indices can be used to predict ecological uniqueness over broad spatial extents using species distribution modelling and citizen science data. Location: North America. Time period: 2000s. Major taxa studied: Parulidae. Methods: We used Bayesian additive regression trees (BARTs) to predict warbler species distributions in North America based on observations recorded in the eBird database. We then calculated LCBD indices for observed and predicted data and examined the site-wise difference using direct comparison, a spatial autocorrelation test, and generalized linear regression. We also investigated the relationship between LCBD values and species richness in different regions and at various spatial extents and the effect of the proportion of rare species on the relationship. Results: Our results showed that the relationship between richness and LCBD values varies according to the region and the spatial extent at which it is applied. It is also affected by the proportion of rare species in the community. Species distribution models provided highly correlated estimates with observed data, although spatially autocorrelated. Main conclusions: Sites identified as unique over broad spatial extents may vary according to the regional richness, total extent size, and the proportion of rare species. Species distribution modelling can be used to predict ecological uniqueness over broad spatial extents, which could help identify beta diversity hotspots and important targets for conservation purposes in unsampled locations.
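The LCBD index mentioned in this abstract is commonly defined as each site's share of the total sum of squared deviations from the species column means. The sketch below assumes that definition and a plain site-by-species matrix; a transformation such as Hellinger is usually applied to the matrix beforehand and is omitted here. The function name and the example matrix are hypothetical.

```python
def lcbd(site_by_species):
    """Local contributions to beta diversity for a site-by-species matrix.

    Each value is the site's sum of squared deviations from the column
    (species) means, divided by the total over all sites.
    """
    n_sites = len(site_by_species)
    n_species = len(site_by_species[0])
    means = [
        sum(row[j] for row in site_by_species) / n_sites
        for j in range(n_species)
    ]
    ss_site = [
        sum((row[j] - means[j]) ** 2 for j in range(n_species))
        for row in site_by_species
    ]
    ss_total = sum(ss_site)
    if ss_total == 0:
        raise ValueError("all sites are identical; LCBD is undefined")
    return [s / ss_total for s in ss_site]


# Hypothetical presence-absence data: sites 1 and 2 share a composition,
# site 3 holds a species found nowhere else.
Y = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
]
values = lcbd(Y)
```

The values sum to 1, and the third site receives the largest contribution because its composition differs most from the regional average.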


2019 ◽  
Author(s):  
Truly Santika ◽  
Michael F. Hutchinson ◽  
Kerrie A. Wilson

Abstract: Presence-only data used to develop species distribution models are often biased towards areas that are frequently surveyed. Furthermore, the size of calibration area with respect to the area covered by the species occurrences has been shown to affect model accuracy. However, existing assessments of the effect of data inadequacy and calibration size on model accuracy have predominately been conducted using empirical studies. These studies can give ambiguous results, since the data used to train and test the model can both be biased. These limitations were addressed by applying simulated data to assess how inadequate data coverage and the size of calibration area affect the accuracy of species distribution models generated by MaxEnt and BIOCLIM. The validity of four presence-only performance measures, Contrast Validation Index (CVI), Boyce index, AUC and AUCratio, was also assessed. CVI, AUC and AUCratio ranked the accuracy of univariate models correctly according to the true importance of their defining environmental variable, a desirable property of an accuracy measure. Contrastingly, Boyce index failed to rank the accuracy of univariate models correctly and a high percentage of irrelevant variables produced models with a high Boyce index. Inadequate data coverage and increased calibration area reduced model accuracy by reducing the correct identification of the dominant environmental determinant. BIOCLIM outperformed MaxEnt models in predicting the true distribution of simulated species with a symmetric dominant response. However, MaxEnt outperformed BIOCLIM in predicting the true distribution of simulated species with skew and linear dominant responses. Despite this, the standard performance measures consistently overestimated the performance of MaxEnt models and showed them as always having higher model accuracy than the BIOCLIM models. It has been acknowledged that research should be directed towards testing and improving species distribution modelling tools, particularly how to handle the inevitable bias and scarcity of species occurrence data. Simulated data, as demonstrated here, provides a powerful approach to comprehensively test the performance of modelling tools and to disentangle the effects of data properties and modelling options on model accuracy. This may be impossible to achieve using real-world data.
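Of the performance measures named in this abstract, AUC has a simple rank-based reading: it is the probability that a randomly chosen presence site receives a higher predicted score than a randomly chosen background or absence site, with ties counted as one half. The sketch below illustrates that reading only; the function name and the scores are hypothetical, and it does not reproduce the study's CVI, Boyce index or AUCratio calculations.

```python
def auc(presence_scores, background_scores):
    """Area under the ROC curve from pairwise score comparisons.

    Counts the fraction of (presence, background) pairs in which the
    presence site scores higher; a tie contributes one half.
    """
    if not presence_scores or not background_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))


# Hypothetical model scores at three presence and two background sites.
score = auc([0.9, 0.7, 0.4], [0.5, 0.3])
```

Here five of the six pairs rank the presence site higher, giving 5/6. The pairwise loop is quadratic in the number of sites, which is acceptable for a small illustration but not for large datasets.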

