embarcadero: Species distribution modelling with Bayesian additive regression trees in R

2019 ◽  
Author(s):  
Colin J. Carlson

embarcadero is an R package of convenience tools for species distribution modelling with Bayesian additive regression trees (BART), a powerful machine learning approach that has rarely been applied to ecological problems. Like other classification and regression tree methods, BART estimates the probability of a binary outcome based on a set of decision trees. Unlike other methods, BART iteratively generates sets of trees based on a set of priors about tree structure and nodes, and builds a posterior distribution of estimated classification probabilities. So far, BARTs have yet to be applied to species distribution modelling. embarcadero is a workflow wrapper for BART species distribution models, and includes functionality for easy spatial prediction, an automated variable selection procedure, several types of partial dependence visualization, and other tools for ecological application. The embarcadero package is available open source on GitHub and intended for eventual CRAN release. To show how embarcadero can be used by ecologists, I illustrate a BART workflow for a virtual species distribution model. The supplement includes a more advanced vignette showing how BART can be used for mapping disease transmission risk, using the example of Crimean-Congo haemorrhagic fever in Africa.
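The idea of a posterior distribution over classification probabilities can be illustrated with a toy sketch. It is not the embarcadero API, which is an R package. It assumes the probit link commonly used for binary BART, and the covariate names (`temp`, `rain`), split thresholds, leaf values and the three "posterior draws" are all hypothetical numbers chosen for illustration, not fitted by any sampler.

```python
from statistics import NormalDist, mean


def stump(feature, threshold, left, right):
    """Build a one-split tree: `left` if x[feature] <= threshold, else `right`."""
    def tree(x):
        return left if x[feature] <= threshold else right
    return tree


def ensemble_probability(trees, x):
    """Sum the trees' outputs and map the sum to (0, 1) with a probit link."""
    return NormalDist().cdf(sum(tree(x) for tree in trees))


# Three hypothetical posterior draws; each draw is one set of small trees.
posterior_draws = [
    [stump("temp", 20.0, -0.6, 0.5), stump("rain", 800.0, -0.2, 0.4)],
    [stump("temp", 22.0, -0.5, 0.3), stump("rain", 750.0, -0.3, 0.2)],
    [stump("temp", 19.0, -0.7, 0.8), stump("rain", 900.0, -0.1, 0.4)],
]

# One hypothetical site; each draw yields its own probability of presence.
site = {"temp": 25.0, "rain": 1000.0}
probs = [ensemble_probability(trees, site) for trees in posterior_draws]

# Summarise the draws: the mean as a point estimate, the range as spread.
posterior_mean = mean(probs)
lowest, highest = min(probs), max(probs)
print(posterior_mean, lowest, highest)
```

With a real sampler there would be hundreds or thousands of draws, and the spread would usually be summarised with quantiles rather than a minimum and maximum.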

2020 ◽  
Author(s):  
Willson Gaul ◽  
Dinara Sadykova ◽  
Hannah J. White ◽  
Lupe León-Sánchez ◽  
Paul Caplat ◽  
...  

Abstract. Biological records are often the data of choice for training predictive species distribution models (SDMs), but spatial sampling bias is pervasive in biological records data at multiple spatial scales and is thought to impair the performance of SDMs. We simulated presences and absences of virtual species as well as the process of recording these species to evaluate the effect on species distribution model prediction performance of 1) spatial bias in training data, 2) sample size (the average number of observations per species), and 3) the choice of species distribution modelling method. Our approach is novel in quantifying and applying real-world spatial sampling biases to simulated data. Spatial bias in training data decreased species distribution model prediction performance, but only when the bias was relatively strong. Sample size and the choice of modelling method were more important than spatial bias in determining the prediction performance of species distribution models.


PeerJ ◽  
2020 ◽  
Vol 8 ◽  
pp. e10411
Author(s):  
Willson Gaul ◽  
Dinara Sadykova ◽  
Hannah J. White ◽  
Lupe Leon-Sanchez ◽  
Paul Caplat ◽  
...  

Biological records are often the data of choice for training predictive species distribution models (SDMs), but spatial sampling bias is pervasive in biological records data at multiple spatial scales and is thought to impair the performance of SDMs. We simulated presences and absences of virtual species as well as the process of recording these species to evaluate the effect on species distribution model prediction performance of (1) spatial bias in training data, (2) sample size (the average number of observations per species), and (3) the choice of species distribution modelling method. Our approach is novel in quantifying and applying real-world spatial sampling biases to simulated data. Spatial bias in training data decreased species distribution model prediction performance, but sample size and the choice of modelling method were more important than spatial bias in determining the prediction performance of species distribution models.
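Simulating the recording process with spatial bias can be sketched in a few lines. This is a minimal illustration, not the authors' simulation: the 4 × 4 grid, the effort surface (recorders visiting only the western half) and the function name `sample_records` are assumptions made for the example.

```python
import random


def sample_records(cells, effort, n_records, seed=0):
    """Draw recorded cells with probability proportional to recording effort."""
    rng = random.Random(seed)
    return rng.choices(cells, weights=effort, k=n_records)


# Hypothetical study area: a 4 x 4 grid of cells indexed by (x, y).
cells = [(x, y) for x in range(4) for y in range(4)]

# Hypothetical effort surfaces: recorders visit only cells with x < 2,
# versus equal effort everywhere.
biased_effort = [1.0 if x < 2 else 0.0 for x, _ in cells]
uniform_effort = [1.0] * len(cells)

biased = sample_records(cells, biased_effort, n_records=200)
unbiased = sample_records(cells, uniform_effort, n_records=200)

# Count the distinct cells that each recording scheme covers.
print(len(set(biased)), len(set(unbiased)))
```

Training the same model on `biased` and `unbiased` records of one virtual species, then comparing predictions against the known truth, is the kind of comparison the abstract describes.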


Author(s):  
M. Z. G. Untalan ◽  
D. F. M. Burgos ◽  
K. P. Martinez

Abstract. Maxent is a machine learning model used for species distribution modelling (SDM) that is rising in popularity. As with any species distribution model, it needs to be validated for certain species before being used to generate insights and trusted predictions. Using Maxent, SDMs of two species endemic to the Philippines, Varanus palawanensis (Palawan monitor lizard) and Caprimulgus manillensis (Philippine nightjar), were created from presence-only data (14 V. palawanensis and 771 C. manillensis occurrences) and 19 bioclimatic variables from BIOCLIM. This study shows that Maxent's predictions for two Philippine endemic species of differing nature are consistent with historical facts. The applicability of Maxent to these two very different species suggests that it is likely to give good results for other species. Showing that Maxent is applicable to Philippine species gives ecologists and national administrators an additional tool for steering the development of the Philippines in a direction that conserves its biodiversity and increases productivity and quality of life.


2020 ◽  
Author(s):  
V. Tytar ◽  
O. Baidashnikov

Species distribution models (SDMs) are generally thought to be good indicators of habitat suitability, and thus of species’ performance. Consequently, SDMs can be validated by checking whether the areas projected to have the greatest habitat quality are occupied by individuals or populations with higher than average fitness. We hypothesized a positive and statistically significant relationship between the field-observed body size of the snail V. turgida and modelled habitat suitability, tested this relationship with linear mixed models, and found that indeed, larger individuals tend to occupy high-quality areas, as predicted by the SDMs. However, by testing several SDM algorithms, we found varied levels of performance in terms of expounding this relationship. Marginal R², expressing the variance explained by the fixed terms in the regression models, was adopted as a measure of functional accuracy, and used to rank the SDMs accordingly. In this respect, the Bayesian additive regression trees (BART) algorithm (Carlson, 2020) gave the best result, despite the low AUC and TSS. By restricting our analysis to the BART algorithm only, a variety of sets of environmental variables commonly or less used in the construction of SDMs were explored and tested according to their functional accuracy. In this respect, the SDM produced using the ENVIREM data set (Title, Bemmels, 2018) gave the best result.
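For readers unfamiliar with the two discrimination metrics named here, the following sketch computes them from their standard definitions: TSS (true skill statistic) is sensitivity plus specificity minus one at a chosen threshold, and AUC is the probability that a randomly chosen presence scores higher than a randomly chosen absence. The 0.5 threshold and the example data are assumptions for illustration, not values from the study.

```python
def tss(observed, predicted, threshold=0.5):
    """True skill statistic: sensitivity + specificity - 1 at a threshold."""
    tp = fn = tn = fp = 0
    for y, p in zip(observed, predicted):
        if y == 1:
            if p >= threshold:
                tp += 1
            else:
                fn += 1
        else:
            if p >= threshold:
                fp += 1
            else:
                tn += 1
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1


def auc(observed, predicted):
    """AUC as the share of presence/absence pairs ranked correctly (ties count half)."""
    pos = [p for y, p in zip(observed, predicted) if y == 1]
    neg = [p for y, p in zip(observed, predicted) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical presences (1) and absences (0) with modelled suitability scores.
observed = [1, 1, 0, 0]
predicted = [0.9, 0.4, 0.6, 0.1]
print(tss(observed, predicted), auc(observed, predicted))  # prints 0.0 0.75
```

Both metrics measure how well predictions separate presences from absences, which is a different question from whether modelled suitability tracks fitness. That difference is why a model can rank well on one criterion and poorly on the other.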


2020 ◽  
Vol 77 (5) ◽  
pp. 1841-1853
Author(s):  
Chongliang Zhang ◽  
Yong Chen ◽  
Binduo Xu ◽  
Ying Xue ◽  
Yiping Ren

Abstract. Varying catchability is a common feature in fisheries and has great impacts on fisheries assessments and species distribution models. However, spatial variations in catchability have rarely been evaluated, especially in the multispecies context. We argue that the need for multispecies models presents both challenges and opportunities for handling spatial catchability. This study evaluated the influence of spatially varying catchability on the performance of a novel joint species distribution model (JSDM), namely Hierarchical Modelling of Species Communities (HMSC). We implemented the model under nine simulation scenarios to account for diverse spatial patterns of catchability and conducted empirical tests using survey data from the Yellow Sea, China. Our results showed that ignoring variability in catchability could lead to substantial errors in the inferences of species response to environment. Meanwhile, the models’ predictive power was less impacted, yielding proper predictions of relative abundance. Incorporating a spatially autocorrelated structure substantially improved the predictability of HMSC in both simulation and empirical tests. Nevertheless, combined sources of spatial catchabilities could largely diminish the advantage of HMSC in inference and prediction. We highlight situations where catchability needs to be explicitly accounted for in modelling fish distributions, and suggest directions for future applications and development of JSDMs.


Zoodiversity ◽  
2021 ◽  
Vol 55 (1) ◽  
pp. 25-40
Author(s):  
V. Tytar

Species distribution models (SDMs) are generally thought to be good indicators of habitat suitability, and thus of species’ performance. Consequently, SDMs can be validated by checking whether the areas projected to have the greatest habitat quality are occupied by individuals or populations with higher than average fitness. We hypothesized a positive and statistically significant relationship between the field-observed body size of the snail V. turgida (Rossmässler, 1836) and modelled habitat suitability, tested this relationship with linear mixed models, and found that indeed, larger individuals tend to occupy high-quality areas, as predicted by the SDMs. However, by testing several SDM algorithms, we found varied levels of performance in terms of expounding this relationship. Marginal R², expressing the variance explained by the fixed terms in the regression models, was adopted as a measure of functional accuracy, and used to rank the SDMs accordingly. In this respect, the Bayesian additive regression trees (BART) algorithm gave the best result, despite the low AUC and TSS. By restricting our analysis to the BART algorithm only, a variety of sets of environmental variables commonly or less used in the construction of SDMs were explored and tested according to their functional accuracy. In this respect, the SDM produced using the ENVIREM data set gave the best result.
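The marginal R² referred to here is commonly written, for a Gaussian linear mixed model, as the share of total variance attributable to the fixed effects. The abstract does not state which formulation was used, so the expression below is the widely used variance-partitioning form, given as an assumption. Here $\sigma^2_f$ is the variance of the fixed-effect predictions, $\sigma^2_\alpha$ the random-effect variance and $\sigma^2_\varepsilon$ the residual variance.

```latex
R^2_{\text{marginal}} = \frac{\sigma^2_f}{\sigma^2_f + \sigma^2_\alpha + \sigma^2_\varepsilon}
```

In this setting the fixed term would be modelled habitat suitability, so a higher value indicates that suitability alone explains more of the variation in body size.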


2012 ◽  
Vol 367 (1586) ◽  
pp. 247-258 ◽  
Author(s):  
Colin M. Beale ◽  
Jack J. Lennon

Motivated by the need to solve ecological problems (climate change, habitat fragmentation and biological invasions), there has been increasing interest in species distribution models (SDMs). Predictions from these models inform conservation policy, invasive species management and disease-control measures. However, predictions are subject to uncertainty, the degree and source of which is often unrecognized. Here, we review the SDM literature in the context of uncertainty, focusing on three main classes of SDM: niche-based models, demographic models and process-based models. We identify sources of uncertainty for each class and discuss how uncertainty can be minimized or included in the modelling process to give realistic measures of confidence around predictions. Because this has typically not been performed, we conclude that uncertainty in SDMs has often been underestimated and a false precision assigned to predictions of geographical distribution. We identify areas where development of new statistical tools will improve predictions from distribution models, notably the development of hierarchical models that link different types of distribution model and their attendant uncertainties across spatial scales. Finally, we discuss the need to develop more defensible methods for assessing predictive performance, quantifying model goodness-of-fit and for assessing the significance of model covariates.


2015 ◽  
Vol 37 (2) ◽  
pp. 182 ◽  
Author(s):  
Martin Predavec ◽  
Daniel Lunney ◽  
Ian Shannon ◽  
Dave Scotts ◽  
John Turbill ◽  
...  

In Private Native Forestry in New South Wales, species-specific provisions in the code of practice are triggered by the presence of koalas (Phascolarctos cinereus), based on existing database records in the Atlas of NSW Wildlife. Whereas Species Distribution Modelling allows questions to be posed regarding the distribution of a species, and how it relates to environmental variables and threats, the key question, in many management situations, is whether or not a species is, or has been, present at a particular location, rather than the overall predicted distribution of the species. This is particularly the case for such a high-profile species as the koala. In this project, we developed a simple distribution model for the koala in New South Wales based on the proportion of koala records from within a suite of mammal records in 10 km × 10 km cells. This provides a measure of the likelihood of koalas being present. At the same time it allows deficiencies in the data to be highlighted, and recommendations made for further survey. This model and map will allow the potential for more robust and transparent decisions to be made regarding koala protection in areas proposed for private native forestry.
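The proportion-of-records approach described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the record format (species name, easting and northing in metres), the coordinates and the function names are hypothetical.

```python
from collections import defaultdict

CELL_SIZE_M = 10_000  # 10 km x 10 km cells, as in the abstract


def cell_of(easting, northing):
    """Map projected coordinates in metres to a grid-cell index."""
    return (int(easting // CELL_SIZE_M), int(northing // CELL_SIZE_M))


def koala_proportion(records, target="Phascolarctos cinereus"):
    """Return, per cell, the share of mammal records that are koala records."""
    totals = defaultdict(int)
    koalas = defaultdict(int)
    for species, easting, northing in records:
        cell = cell_of(easting, northing)
        totals[cell] += 1
        if species == target:
            koalas[cell] += 1
    return {cell: koalas[cell] / totals[cell] for cell in totals}


# Hypothetical mammal records: (species, easting in m, northing in m).
records = [
    ("Phascolarctos cinereus", 1_500, 2_500),
    ("Trichosurus vulpecula", 9_000, 9_999),
    ("Phascolarctos cinereus", 8_000, 100),
    ("Phascolarctos cinereus", 12_000, 500),
    ("Wallabia bicolor", 25_000, 25_000),
]
proportions = koala_proportion(records)
print(proportions)
```

Cells with few mammal records give unstable proportions. Reporting the per-cell record count alongside the proportion is one way to expose the data deficiencies the abstract mentions.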

