phyr: An R package for phylogenetic species-distribution modelling in ecological communities

Summary: Model-based approaches are increasingly popular in ecological studies. A good example of this trend is the use of joint species distribution models to ask questions about ecological communities. However, most current applications of model-based methods do not include phylogenies despite the well-known importance of phylogenetic relationships in shaping species distributions and community composition. In part, this is due to a lack of accessible tools allowing ecologists to fit phylogenetic species distribution models easily. To fill this gap, the R package phyr (pronounced fire) implements a suite of metrics, comparative methods and mixed models that use phylogenies to understand and predict community composition and other ecological and evolutionary phenomena. The phyr workhorse functions are implemented in C++, making all calculations and model estimations fast. phyr can fit a variety of models such as phylogenetic joint-species distribution models, spatiotemporal-phylogenetic autocorrelation models, and phylogenetic trait-based bipartite network models. phyr also estimates phylogenetically independent trait correlations with measurement error to test for adaptive syndromes and performs fast calculations of common alpha and beta phylogenetic diversity metrics. All phyr methods are united under Brownian motion or Ornstein-Uhlenbeck models of evolution, and phylogenetic terms are modelled as phylogenetic covariance matrices. The functions and model formula syntax we propose in phyr serve as a simple and unified framework that ignites the use of phylogenies to address a variety of ecological questions.
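To make the idea of a phylogenetic covariance matrix concrete: under Brownian motion, the covariance between two species' trait values is proportional to the branch length they share from the root to their most recent common ancestor. The following is a minimal sketch of that calculation in Python. It is not phyr code; the function `bm_covariance`, the edge labels, and the three-tip tree are hypothetical and exist only for illustration.

```python
def bm_covariance(paths):
    """Return tip names and the Brownian-motion covariance between tips.

    `paths` maps each tip to a dict of {edge_label: branch_length} for the
    edges on its root-to-tip path. This input format is an assumption made
    for the example, not a format used by phyr.
    """
    tips = sorted(paths)
    cov = {}
    for a in tips:
        for b in tips:
            # Edges on both root-to-tip paths are the shared evolutionary history.
            shared_edges = set(paths[a]) & set(paths[b])
            cov[(a, b)] = sum(paths[a][edge] for edge in shared_edges)
    return tips, cov


# Hypothetical tree ((A:1,B:1):1,C:2): A and B share one unit of history.
paths = {
    "A": {"root-n1": 1.0, "n1-A": 1.0},
    "B": {"root-n1": 1.0, "n1-B": 1.0},
    "C": {"root-C": 2.0},
}
tips, cov = bm_covariance(paths)
for a in tips:
    print(a, [cov[(a, b)] for b in tips])
```

Each diagonal entry equals the root-to-tip distance, and unrelated lineages (here A and C) have zero covariance.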

blockCV: an R package for generating spatially or environmentally separated folds for k-fold cross-validation of species distribution models

10.1101/357798 ◽

2018 ◽

Cited By ~ 3

Author(s):

Roozbeh Valavi ◽

Jane Elith ◽

José J. Lahoz-Monfort ◽

Gurutzeta Guillera-Arroita

Keyword(s):

Species Distribution ◽

Cross Validation ◽

Species Distribution Models ◽

Predictive Performance ◽

R Package ◽

Species Distribution Modelling ◽

Distribution Models ◽

Distribution Modelling ◽

Evaluation Approaches

Summary: When applied to structured data, conventional random cross-validation techniques can lead to underestimation of prediction error, and may result in inappropriate model selection. We present the R package blockCV, a new toolbox for cross-validation of species distribution modelling. The package can generate spatially or environmentally separated folds. It includes tools to measure spatial autocorrelation ranges in candidate covariates, providing the user with insights into the spatial structure in these data. It also offers interactive graphical capabilities for creating spatial blocks and exploring data folds. Package blockCV enables modellers to more easily implement a range of evaluation approaches. It will help the modelling community learn more about the impacts of evaluation approaches on our understanding of predictive performance of species distribution models.
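The idea of spatially separated folds can be sketched in a few lines of Python: points are grouped into square grid blocks, and whole blocks (rather than individual points) are assigned to folds, so neighbouring points never straddle a training/testing split within a block. This is an illustrative sketch only and does not reproduce the blockCV API; the function `spatial_block_folds`, its parameters, and the systematic block-to-fold assignment are assumptions.

```python
from collections import defaultdict


def spatial_block_folds(points, block_size, k):
    """Assign each (x, y) point to one of k folds via square spatial blocks.

    Hypothetical helper for illustration; it is not part of blockCV.
    """
    # Group point indices by the grid block that contains them.
    blocks = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        block_id = (int(x // block_size), int(y // block_size))
        blocks[block_id].append(idx)

    # Deal blocks to folds in sorted order (a systematic assignment), so
    # every point in a block lands in the same fold.
    folds = [None] * len(points)
    for n, block_id in enumerate(sorted(blocks)):
        for idx in blocks[block_id]:
            folds[idx] = n % k
    return folds


points = [(0.5, 0.5), (0.9, 0.2), (1.5, 0.5), (2.5, 2.5), (2.1, 2.9), (0.2, 1.7)]
folds = spatial_block_folds(points, block_size=1.0, k=2)
print(folds)
```

In practice the block size would be chosen with the spatial autocorrelation range of the covariates in mind, which is the kind of diagnostic the abstract says the package provides.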

Confronting preferential sampling in wildlife surveys: diagnosis and model-based triage

10.1101/080879 ◽

2016 ◽

Author(s):

Paul B. Conn ◽

James T. Thorson ◽

Devin S. Johnson

Keyword(s):

Survey Data ◽

Species Distribution ◽

Species Distribution Models ◽

Implicit Assumption ◽

Distribution Models ◽

Model Based ◽

Preferential Sampling ◽

Animal Populations ◽

Biasing Effects

Summary: Wildlife surveys are often used to estimate the density, abundance, or distribution of animal populations. Recently, model-based approaches to analyzing survey data have become popular because one can more readily accommodate departures from pre-planned survey routes and construct more detailed maps than one can with design-based procedures. Species distribution models fitted to wildlife survey data often make the implicit assumption that locations chosen for sampling and animal abundance at those locations are conditionally independent given modeled covariates. However, this assumption is likely violated in many cases when survey effort is non-randomized, leading to preferential sampling. We develop a hierarchical statistical modeling framework for detecting and alleviating the biasing effects of preferential sampling in species distribution models fitted to count data. The approach works by jointly modeling wildlife state variables and the locations selected for sampling, and specifying a dependent correlation structure between the two models. Using simulation, we show that moderate levels of preferential sampling can lead to large (e.g. 40%) bias in estimates of animal density, and that our modeling approach can considerably reduce this bias. We apply our approach to aerial survey counts of bearded seals (Erignathus barbatus) in the eastern Bering Sea. Models that included a preferential sampling effect led to lower estimates of abundance than models without, but the effect size of the preferential sampling parameter decreased in models that included explanatory environmental covariates. When wildlife surveys are conducted without a well-defined sampling frame, ecologists should recognize the potentially biasing effects of preferential sampling. Joint models, such as those described in this paper, can be used to test and correct for such biases. Predictive covariates are also useful for bias reduction, but ultimately the best way to avoid preferential sampling bias is to incorporate design-based principles such as randomization and/or systematic sampling into survey design.
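A small worked example shows why preferential sampling biases a naive density estimate. If sites are visited with probability proportional to their true density, the expected density at a sampled site is the weighted mean of the densities, which exceeds the plain mean whenever densities vary. The sketch below is not the authors' hierarchical model; the function `expected_sample_mean` and the five site densities are hypothetical values chosen for illustration.

```python
def expected_sample_mean(densities, weights):
    """Expected density at a sampled site when site i is chosen with
    probability proportional to weights[i]. Hypothetical helper."""
    total_weight = sum(weights)
    return sum(d * w for d, w in zip(densities, weights)) / total_weight


# Made-up true densities at five sites (animals per unit area).
densities = [1.0, 2.0, 3.0, 4.0, 10.0]
true_mean = sum(densities) / len(densities)

# Randomized design: every site is equally likely to be surveyed.
random_design = expected_sample_mean(densities, [1.0] * len(densities))

# Preferential design: effort is proportional to the density itself.
preferential = expected_sample_mean(densities, densities)

relative_bias = (preferential - true_mean) / true_mean
print(true_mean, random_design, preferential, relative_bias)
```

With these made-up numbers the randomized design recovers the true mean while the preferential design overestimates it, which is consistent with the abstract's recommendation to build randomization into survey design.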

Assessing Historical Fish Community Composition Using Surveys, Historical Collection Data, and Species Distribution Models

PLoS ONE ◽

10.1371/journal.pone.0025145 ◽

2011 ◽

Vol 6 (9) ◽

pp. e25145 ◽

Cited By ~ 27

Author(s):

Ben Labay ◽

Adam E. Cohen ◽

Blake Sissel ◽

Dean A. Hendrickson ◽

F. Douglas Martin ◽

...

Keyword(s):

Community Composition ◽

Species Distribution ◽

Fish Community ◽

Species Distribution Models ◽

Distribution Models ◽

Fish Community Composition ◽

Collection Data

The effects of data adequacy and calibration size on the accuracy of presence-only species distribution models

10.1101/775700 ◽

2019 ◽

Author(s):

Truly Santika ◽

Michael F. Hutchinson ◽

Kerrie A. Wilson

Keyword(s):

Performance Measures ◽

Species Distribution ◽

Species Distribution Models ◽

Simulated Data ◽

Model Accuracy ◽

Distribution Models ◽

Accuracy Measure ◽

Data Coverage ◽

True Distribution

Abstract: Presence-only data used to develop species distribution models are often biased towards areas that are frequently surveyed. Furthermore, the size of the calibration area relative to the area covered by the species occurrences has been shown to affect model accuracy. However, existing assessments of the effect of data inadequacy and calibration size on model accuracy have predominantly been conducted using empirical studies. These studies can give ambiguous results, since the data used to train and test the model can both be biased. These limitations were addressed by applying simulated data to assess how inadequate data coverage and the size of the calibration area affect the accuracy of species distribution models generated by MaxEnt and BIOCLIM. The validity of four presence-only performance measures, Contrast Validation Index (CVI), Boyce index, AUC and AUCratio, was also assessed. CVI, AUC and AUCratio ranked the accuracy of univariate models correctly according to the true importance of their defining environmental variable, a desirable property of an accuracy measure. In contrast, the Boyce index failed to rank the accuracy of univariate models correctly, and a high percentage of irrelevant variables produced models with a high Boyce index. Inadequate data coverage and increased calibration area reduced model accuracy by reducing the correct identification of the dominant environmental determinant. BIOCLIM outperformed MaxEnt models in predicting the true distribution of simulated species with a symmetric dominant response. However, MaxEnt outperformed BIOCLIM in predicting the true distribution of simulated species with skew and linear dominant responses. Despite this, the standard performance measures consistently overestimated the performance of MaxEnt models and showed them as always having higher model accuracy than the BIOCLIM models. It has been acknowledged that research should be directed towards testing and improving species distribution modelling tools, particularly how to handle the inevitable bias and scarcity of species occurrence data. Simulated data, as demonstrated here, provide a powerful approach to comprehensively test the performance of modelling tools and to disentangle the effects of data properties and modelling options on model accuracy. This may be impossible to achieve using real-world data.

Review for "phyr: An R package for phylogenetic species‐distribution modelling in ecological communities"

10.1111/2041-210x.13471/v1/review2 ◽

2020 ◽

Keyword(s):

Species Distribution ◽

R Package ◽

Species Distribution Modelling ◽

Ecological Communities ◽

Phylogenetic Species ◽

Distribution Modelling

sdmbench: R package for benchmarking species distribution models

The Journal of Open Source Software ◽

10.21105/joss.00847 ◽

2018 ◽

Vol 3 (29) ◽

pp. 847 ◽

Cited By ~ 2

Author(s):

Boyan Angelov

Keyword(s):

Species Distribution ◽

Species Distribution Models ◽

R Package ◽

Distribution Models

Structuring the unstructured: estimating species-specific absence from multi-species presence data to inform pseudo-absence selection in species distribution models

10.1101/656629 ◽

2019 ◽

Cited By ~ 1

Author(s):

Simon Croft ◽

Graham C. Smith

Keyword(s):

Species Distribution ◽

Standard Method ◽

Species Distribution Models ◽

Predictive Performance ◽

Survey Method ◽

Distribution Models ◽

Target Groups ◽

Significant Difference ◽

Survey Effort

Abstract: Species distribution models (SDMs) are an increasingly popular tool in ecology which, together with a vast wealth of data from citizen science projects, have the potential to dramatically improve our understanding of species behaviour for applications such as conservation and wildlife management. However, many of the best performing models require information regarding survey effort, specifically absence, which is typically lacking in opportunistic datasets. To facilitate the use of such models, pseudo-absences from locations without recorded presence must be assumed. Several studies have suggested that survey effort, and hence likely absence, could be estimated from presence-only data by considering records across “target groups” of species defined according to taxonomy. We performed a probabilistic analysis, computing the conditional probability of recording a species given that a particular set of species is also recorded, to test the validity of defining target groups by taxonomic order and to explore other potential groupings. Based on this quantification of recording associations, we outline a new method to inform pseudo-absence selection, comparing predictive performance, measured by the area under the curve (AUC) statistic, against the standard method of selection across a series of SDMs. Our findings show some support for target grouping classification based on taxonomy but indicate that an alternative classification using survey method may be more appropriate for informing effort and consequently absence. Across 49 terrestrial mammal species, pseudo-absence selection using our proposed method outperformed that of the standard method, showing an improvement in the predictive performance of presence-absence models for 17 out of 22 with sufficient data to elicit a significant difference. Based on our method we also observed a substantial improvement in the performance of presence-absence models compared to that of presence-only models (MaxEnt), with a higher AUC for all 22 species showing a significant difference between approaches. We conclude that our method produces sensible, robust pseudo-absences which either complement patterns in known presences or, where conflicts occur, are explainable in terms of ecological variables, potentially improving our understanding of species behaviour. Furthermore, we suggest that presence-absence models using these pseudo-absences provide a viable alternative to MaxEnt when modelling using presence-only data.
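The conditional probability of recording a species given that a particular set of species is also recorded can be estimated directly from co-occurrence counts. The Python sketch below shows one simple way to do this under the assumption that each survey record is available as a set of species names. The function `conditional_recording_probability` and the example records are hypothetical and are not taken from the study.

```python
def conditional_recording_probability(records, target, given):
    """Estimate P(target recorded | all species in `given` recorded).

    `records` is a list of sets, one per survey record. Returns None when
    no record contains every species in `given`. Hypothetical helper.
    """
    given = set(given)
    # Keep only the records in which the whole conditioning set was seen.
    matching = [record for record in records if given <= set(record)]
    if not matching:
        return None
    hits = sum(1 for record in matching if target in record)
    return hits / len(matching)


# Made-up records for illustration.
records = [
    {"fox", "badger"},
    {"fox"},
    {"badger", "hedgehog"},
    {"fox", "badger", "hedgehog"},
]
p_fox_given_badger = conditional_recording_probability(records, "fox", {"badger"})
p_fox_given_both = conditional_recording_probability(
    records, "fox", {"badger", "hedgehog"}
)
print(p_fox_given_badger, p_fox_given_both)
```

A high conditional probability suggests that a record of the conditioning species without the target species is more informative about likely absence, which is the intuition behind using recording associations to choose pseudo-absences.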

Classification of unlabeled observations in Species Distribution Modelling using Point Process Models

10.1101/651125 ◽

2019 ◽

Author(s):

Emy Guilbault ◽

Ian Renner ◽

Michael Mahony ◽

Eric Beh

Keyword(s):

Species Distribution ◽

Species Distribution Models ◽

Species Distributions ◽

Species Distribution Modelling ◽

Process Models ◽

Distribution Models ◽

Species Identity ◽

Distribution Modelling ◽

Unknown Species

Abstract: Species distribution modelling, which allows users to predict the spatial distribution of species with the use of environmental covariates, has become increasingly popular, with many software platforms providing tools to fit species distribution models. However, the species observations used in species distribution models can have varying levels of quality and can have incomplete information, such as uncertain species identity. In this paper, we develop two algorithms to reclassify observations with unknown species identities which simultaneously predict different species distributions using spatial point processes. We compare the performance of the different algorithms using different initializations and parameters with models fitted using only the observations with known species identity through simulations. We show that performance varies with differences in correlation among species distributions, species abundance, and the proportion of observations with unknown species identities. Additionally, some of the methods developed here outperformed the models that did not use the misspecified data. These models represent a helpful and promising tool for opportunistic surveys where misidentification happens or for the distribution of species newly separated in their taxonomy.
