Presence-only and Presence-absence Data for Comparing Species Distribution Modeling Methods

Jane Elith; Catherine Graham; Roozbeh Valavi; Meinrad Abegg; Caroline Bruce; Andrew Ford; Antoine Guisan; Robert J. Hijmans; Falk Huettmann; Lucia Lohmann; Bette Loiselle; Craig Moritz; Jake Overton; A. Townsend Peterson; Steven Phillips; Karen Richardson; Stephen Williams; Susan K. Wiser; Thomas Wohlgemuth; Niklaus E. Zimmermann

doi:10.17161/bi.v15i2.13384

Biodiversity Informatics ◽

10.17161/bi.v15i2.13384 ◽

2020 ◽

Vol 15 (2) ◽

pp. 69-80

Author(s):

Jane Elith ◽

Catherine Graham ◽

Roozbeh Valavi ◽

Meinrad Abegg ◽

Caroline Bruce ◽

...

Keyword(s):

Species Distribution ◽

Large Scale ◽

R Package ◽

Open Science ◽

Species Occurrence ◽

Distribution Models ◽

Algorithm Performance ◽

Modeling Methods ◽

Science Framework ◽

Occurrence Records

Species distribution models (SDMs) are widely used to predict and study distributions of species. Many different modeling methods and associated algorithms are used and continue to emerge. It is important to understand how different approaches perform, particularly when applied to species occurrence records that were not gathered in structured surveys (e.g. opportunistic records). This need motivated a large-scale, collaborative effort, published in 2006, that aimed to create objective comparisons of algorithm performance. As a benchmark, and to facilitate future comparisons of approaches, here we publish that dataset: point location records for 226 anonymized species from six regions of the world, with accompanying predictor variables in raster (grid) and point formats. A particularly interesting characteristic of this dataset is that independent presence-absence survey data are available for evaluation alongside the presence-only species occurrence data intended for modeling. The dataset is available on Open Science Framework and as an R package and can be used as a benchmark for modeling approaches and for testing new ways to evaluate the accuracy of SDMs.
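The evaluation setup described above, where a model is fitted on presence-only records and scored against independent presence-absence surveys, can be illustrated with a rank-based AUC. This is only a sketch: the abstract does not name an evaluation metric, so AUC is an assumption here, and the function name and inputs are hypothetical.

```python
def auc(scores_presence, scores_absence):
    """Area under the ROC curve from model scores at surveyed sites.

    Equals the probability that a randomly chosen presence site receives
    a higher score than a randomly chosen absence site (ties count half).
    """
    if not scores_presence or not scores_absence:
        raise ValueError("need at least one presence and one absence score")
    wins = 0.0
    # Compare every presence site with every absence site.
    for p in scores_presence:
        for a in scores_absence:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(scores_presence) * len(scores_absence))
```

A value near 1 would mean the model ranks surveyed presences above surveyed absences; a value near 0.5 would mean it does no better than chance.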

Genetic and Transcriptomic Biomarkers in Neurodegenerative Diseases: Current Situation and the Road Ahead

Cells ◽

10.3390/cells10051030 ◽

2021 ◽

Vol 10 (5) ◽

pp. 1030

Author(s):

Julie Lake ◽

Catherine S. Storm ◽

Mary B. Makarious ◽

Sara Bandres-Ciga

Keyword(s):

Neurodegenerative Diseases ◽

Large Scale ◽

Disease Diagnosis ◽

Open Science ◽

Future Directions ◽

The Road ◽

Current State ◽

Molecular Complexity ◽

Science Framework ◽

Lateral Sclerosis

Neurodegenerative diseases are etiologically and clinically heterogeneous conditions, often reflecting a spectrum of disease rather than well-defined disorders. The underlying molecular complexity of these diseases has made the discovery and validation of useful biomarkers challenging. The search for characteristic genetic and transcriptomic indicators for preclinical disease diagnosis, prognosis, or subtyping is an area of ongoing effort and interest. The next generation of biomarker studies holds promise by implementing meaningful longitudinal and multi-modal approaches in large-scale biobank and healthcare-system-scale datasets. This work will only be possible in an open science framework. This review summarizes the current state of genetic and transcriptomic biomarkers in Parkinson’s disease, Alzheimer’s disease, and amyotrophic lateral sclerosis, providing a comprehensive landscape of recent literature and future directions.

The ghost of past species occurrence: improving species distribution models for presence-only data

Journal of Applied Ecology ◽

10.1111/j.1365-2664.2006.01191.x ◽

2006 ◽

Vol 43 (4) ◽

pp. 802-815 ◽

Cited By ~ 61

Author(s):

Michael Lütolf ◽

Felix Kienast ◽

Antoine Guisan

Keyword(s):

Species Distribution ◽

Species Distribution Models ◽

Species Occurrence ◽

Distribution Models

blockCV: an R package for generating spatially or environmentally separated folds for k-fold cross-validation of species distribution models

10.1101/357798 ◽

2018 ◽

Cited By ~ 3

Author(s):

Roozbeh Valavi ◽

Jane Elith ◽

José J. Lahoz-Monfort ◽

Gurutzeta Guillera-Arroita

Keyword(s):

Species Distribution ◽

Cross Validation ◽

Species Distribution Models ◽

Predictive Performance ◽

R Package ◽

Species Distribution Modelling ◽

List Type ◽

Distribution Models ◽

Distribution Modelling ◽

Evaluation Approaches

Summary: When applied to structured data, conventional random cross-validation techniques can lead to underestimation of prediction error, and may result in inappropriate model selection. We present the R package blockCV, a new toolbox for cross-validation of species distribution modelling. The package can generate spatially or environmentally separated folds. It includes tools to measure spatial autocorrelation ranges in candidate covariates, providing the user with insights into the spatial structure in these data. It also offers interactive graphical capabilities for creating spatial blocks and exploring data folds. Package blockCV enables modellers to more easily implement a range of evaluation approaches. It will help the modelling community learn more about the impacts of evaluation approaches on our understanding of predictive performance of species distribution models.
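The idea of spatially separated folds can be sketched in a few lines of Python. This is not the blockCV API (blockCV is an R package); it is a minimal illustration that assumes planar coordinates and square blocks, and the function name and parameters are hypothetical.

```python
import math
import random


def spatial_block_folds(points, block_size, k, seed=0):
    """Return a fold index (0..k-1) for each (x, y) point.

    Points that fall in the same square block always share a fold, so
    nearby records are never split between training and testing data.
    """
    # Map each point to the grid cell (block) that contains it.
    block_of = [(math.floor(x / block_size), math.floor(y / block_size))
                for x, y in points]
    blocks = sorted(set(block_of))
    if k > len(blocks):
        raise ValueError("need at least as many occupied blocks as folds")
    # Shuffle the blocks reproducibly, then deal them out to folds in turn.
    rng = random.Random(seed)
    rng.shuffle(blocks)
    fold_of_block = {b: i % k for i, b in enumerate(blocks)}
    return [fold_of_block[b] for b in block_of]
```

In practice the block size would be chosen with the spatial autocorrelation range of the covariates in mind, which is the kind of diagnostic the abstract says the package provides.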

Trends and gaps in the use of citizen science derived data as input for species distribution models: A quantitative review

PLoS ONE ◽

10.1371/journal.pone.0234587 ◽

2021 ◽

Vol 16 (3) ◽

pp. e0234587

Author(s):

Mariano J. Feldman ◽

Louis Imbeau ◽

Philippe Marchand ◽

Marc J. Mazerolle ◽

Marcel Darveau ◽

...

Keyword(s):

Citizen Science ◽

Species Distribution ◽

Species Distribution Models ◽

Spatial Scales ◽

Species Conservation ◽

Innovative Technology ◽

Data Types ◽

Species Occurrence ◽

Distribution Models ◽

Digital Platforms

Citizen science (CS) currently refers to the participation of non-scientist volunteers in any discipline of conventional scientific research. Over the last two decades, nature-based CS has flourished due to innovative technology, novel devices, and widespread digital platforms used to collect and classify species occurrence data. For scientists, CS offers a low-cost approach to collecting species occurrence information at large spatial scales that otherwise would be prohibitively expensive. We examined the trends and gaps linked to the use of CS as a source of data for species distribution models (SDMs), in order to propose guidelines and highlight solutions. We conducted a quantitative literature review of 207 peer-reviewed articles to measure how the representation of different taxa, regions, and data types has changed in SDM publications since the 2010s. Our review shows that the number of papers using CS for SDMs has increased at approximately double the rate of the overall number of SDM papers. However, disparities in taxonomic and geographic coverage remain in studies using CS. Western Europe and North America were the regions with the most coverage (73%). Papers on birds (49%) and mammals (19.3%) outnumbered other taxa. Among invertebrates, flying insects including Lepidoptera, Odonata and Hymenoptera received the most attention. Discrepancies between research interest and availability of data were especially important for amphibians, reptiles and fishes. Compared to studies on animal taxa, papers on plants using CS data remain rare. Although the aims and scope of papers are diverse, species conservation remained the central theme of SDM using CS data. We present examples of the use of CS and highlight recommendations to motivate further research, such as combining multiple data sources and promoting local and traditional knowledge. We hope our findings will strengthen citizen-researcher partnerships to better inform SDMs, especially for less-studied taxa and regions. Researchers stand to benefit from the large quantity of data available from CS sources to improve global predictions of species distributions.

occAssess: An R package for assessing potential biases in species occurrence data

10.1101/2021.04.19.440441 ◽

2021 ◽

Author(s):

Robin James Boyd ◽

Gary Powney ◽

Claire Carvell ◽

Oliver Pescott

Keyword(s):

Target Population ◽

R Package ◽

Heterogeneous Databases ◽

Ecological Research ◽

Species Occurrence ◽

Worked Example ◽

Occurrence Data ◽

Data Coverage ◽

Discrete Functions ◽

Occurrence Records

Species occurrence records from a variety of sources are increasingly aggregated into heterogeneous databases and made available to ecologists for immediate analytical use. However, these data are typically biased, i.e. they are not a representative sample of the target population of interest, meaning that the information they provide may not be an accurate reflection of reality. It is therefore crucial that species occurrence data are properly scrutinised before they are used for research. In this article, we introduce occAssess, an R package that enables quick and easy screening of species occurrence data for potential biases. The package contains a number of discrete functions, each of which returns a measure of the potential for bias in one or more of the taxonomic, temporal, spatial and environmental dimensions. The outputs are provided visually (as ggplot2 objects) and do not include a formal recommendation as to whether data are of sufficient quality for any given inferential use. Instead, they should be used as ancillary information and viewed in the context of the question that is being asked, and the methods that are being used to answer it. We demonstrate the utility of occAssess by applying it to data on two key pollinator taxa in South America: leaf-nosed bats (Phyllostomidae) and hoverflies (Syrphidae). In this worked example, we briefly assess the degree to which various aspects of data coverage appear to have changed over time. We then discuss additional ways in which the package could be used, highlight its limitations, and point to where it could be improved in the future. Going forward, we hope that occAssess will help to improve the quality, and transparency, of assessments of species occurrence data as a necessary first step where they are being used for ecological research at large scales.

speciesgeocodeR: An R package for linking species occurrences, user-defined regions and phylogenetic trees for biogeography, ecology and evolution

10.1101/032755 ◽

2015 ◽

Cited By ~ 6

Author(s):

Alexander Zizka ◽

Alexandre Antonelli

Keyword(s):

Data Quality ◽

Phylogenetic Trees ◽

Large Scale ◽

Data Cleaning ◽

R Package ◽

Species Occurrence ◽

Occurrence Data ◽

User Friendly ◽

Species Occurrences

1. Large-scale species occurrence data from geo-referenced observations and collected specimens are crucial for analyses in ecology, evolution and biogeography. Despite the rapidly growing availability of such data, their use in evolutionary analyses is often hampered by tedious manual classification of point occurrences into operational areas, leading to a lack of reproducibility and concerns regarding data quality. 2. Here we present speciesgeocodeR, a user-friendly R package for data cleaning, data exploration and data visualization of species point occurrences using discrete operational areas, and linking them to analyses invoking phylogenetic trees. 3. The three core functions of the package are 1) automated and reproducible data cleaning, 2) rapid and reproducible classification of point occurrences into discrete operational areas in an adequate format for subsequent biogeographic analyses, and 3) a comprehensive summary and visualization of species distributions to explore large datasets and ensure data quality. In addition, speciesgeocodeR facilitates the access and analysis of publicly available species occurrence data, widely used operational areas and elevation ranges. Other functionalities include the implementation of minimum occurrence thresholds and the visualization of coexistence patterns and range sizes. speciesgeocodeR is accompanied by a richly illustrated, easy-to-follow tutorial and help functions.

Exploring snake occurrence records: Spatial biases and marginal gains from accessible social media

PeerJ ◽

10.7717/peerj.8059 ◽

2019 ◽

Vol 7 ◽

pp. e8059 ◽

Cited By ~ 2

Author(s):

Benjamin M. Marshall ◽

Colin T. Strine

Keyword(s):

Social Media ◽

North America ◽

South America ◽

Species Distribution ◽

Species Distribution Models ◽

Conservation Status ◽

Model Performance ◽

Distribution Models ◽

Occurrence Data ◽

Occurrence Records

A species’ distribution provides fundamental information on climatic niche, biogeography, and conservation status. Species distribution models often use occurrence records from biodiversity databases, which are subject to spatial and taxonomic biases. Deficiencies in occurrence data can lead to incomplete species distribution estimates. We can incorporate other data sources to supplement occurrence datasets. The general public is creating (by photographing wildlife with GPS-enabled cameras) incidental occurrence records that may present an opportunity to improve species distribution models. We investigated (1) occurrence data of a cryptic group of animals, non-marine snakes, in a biodiversity database (Global Biodiversity Information Facility (GBIF)) and determined (2) whether incidental occurrence records extracted from geo-tagged social media images (Flickr) could improve distribution models for 18 tropical snake species. We provide R code to search for and extract data from images using Flickr’s API. We show the biodiversity database’s 302,386 records disproportionately originate from North America, Europe and Oceania (250,063, 82.7%), with substantial gaps in tropical areas that host the highest snake diversity. North America, Europe and Oceania averaged several hundred records per species, whereas Asia, Africa and South America averaged fewer than 35 per species. Occurrence density showed similar patterns; Asia, Africa and South America have roughly ten-fold fewer records per 100 km² than other regions. Social media provided 44,687 potential records. However, including them in distribution models only marginally impacted niche estimations; niche overlap indices were consistently over 0.9. Similarly, we show negligible differences in Maxent model performance between models trained using GBIF-only and Flickr-supplemented datasets. Model performance appeared dependent on species, rather than number of occurrences or training dataset. We suggest that for tropical snakes, accessible social media currently fails to deliver appreciable benefits for estimating species distributions; but due to the variation between species and the rapid growth in social media data, may still be worth considering in future contexts.
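To give a sense of what a niche overlap index above 0.9 means, the sketch below computes Schoener's D between two suitability surfaces. The abstract does not say which overlap index was used, so Schoener's D is an assumption chosen because it is a common option, and the function name and inputs are hypothetical.

```python
def schoeners_d(suitability_a, suitability_b):
    """Schoener's D between two suitability surfaces on the same grid cells.

    Each surface is rescaled to sum to 1; D is then 1 minus half the
    summed absolute difference, giving 0 (no overlap) to 1 (identical).
    """
    if len(suitability_a) != len(suitability_b):
        raise ValueError("surfaces must cover the same cells")
    total_a = sum(suitability_a)
    total_b = sum(suitability_b)
    if total_a <= 0 or total_b <= 0:
        raise ValueError("each surface needs a positive total suitability")
    diff = sum(abs(a / total_a - b / total_b)
               for a, b in zip(suitability_a, suitability_b))
    return 1.0 - 0.5 * diff
```

Under this index, two models whose predictions differ only slightly from cell to cell score close to 1, which is consistent with the small effect of the supplementary records reported above.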

Multispecies and Watershed Approaches to Freshwater Fish Conservation

Multispecies and Watershed Approaches to Freshwater Fish Conservation ◽

10.47886/9781934874578.ch17 ◽

2019 ◽

Keyword(s):

Climate Change ◽

Land Use ◽

Fisheries Management ◽

Species Distribution ◽

Large Scale ◽

Species Distribution Models ◽

Distribution Models ◽

Climate Conditions ◽

Resource Sustainability ◽

Riparian Protection

Abstract. Increasingly, fisheries managers must make important decisions in complex environments where rapidly changing landscape and climate conditions interact with historical impacts to influence resource sustainability. Successful fisheries management in this setting will require that we adapt traditional management approaches to incorporate information on these complex interacting factors, a process referred to as resilient fisheries management. Large-scale species distribution data and predictive models have the potential to enhance the management of freshwater fishes through improved understanding of how past, present, and future natural and anthropogenic factors combine to determine species vulnerability and resiliency. Here we describe a resilient fisheries management framework that provides guidance on how and when these models can be incorporated into traditional approaches to meet specific goals and objectives for resource sustainability. In addition to elucidating complex drivers of distributional patterns and change, species distribution models can inform the prioritization, application, and implementation of management activities such as restoration (e.g., instream habitat and riparian), protection (e.g., areas where additional land use would result in a change in species distribution), and regulations (e.g., harvest restriction) in a way that informs resiliency to land use and climate change. Although considerable progress has been made with respect to applying species distribution models to the management of Brook Trout Salvelinus fontinalis and other aquatic species, there are several areas where a more unified research and management effort could increase the ability of distribution models to inform resilient management. Future efforts should aim to improve (1) data availability, consistency (sampling methodology), and quality (accounting for detection); (2) partnerships among researchers, agencies, and managers; and (3) model accessibility and understanding of limitations and potential benefits to managers (e.g., incorporation into publicly available decision support systems). The information and recommendations provided herein can be used to promote and advance the use of models in resilient fisheries management in the face of continued large-scale land use and climate change.

Applications and limitations of museum data for conservation and ecology, with particular attention to species distribution models

Progress in Physical Geography Earth and Environment ◽

10.1177/0309133309355630 ◽

2010 ◽

Vol 34 (1) ◽

pp. 3-22 ◽

Cited By ~ 205

Author(s):

Tim Newbold

Keyword(s):

Species Distribution ◽

Species Distribution Models ◽

The Internet ◽

Limited Information ◽

Species Occurrence ◽

Distribution Models ◽

Museum Data ◽

Private Collections ◽

Source Of Information ◽

Occurrence Of Species

To conserve biodiversity, it is necessary to understand how species are distributed and which aspects of the environment determine distributions. In large parts of the world and for the majority of species, data describing distributions are very scarce. Museums, private collections and the historical literature offer a vast source of information on distributions. Records of the occurrence of species from these sources are increasingly being captured in electronic databases and made available over the internet. These records may be very valuable in conservation efforts. However, there are a number of limitations with museum data. These limitations are dealt with in the first part of this review. Even if the limitations of museum data can be overcome, these data present a far-from-complete picture of the distributions of species. Species distribution models offer a means to extrapolate limited information in order to estimate the distributions of species over large areas. The second part of this paper reviews the challenges of developing species distribution models for use with museum data and describes some of the questions that species distribution models have been used to address. Given the rapidly increasing number of museum records of species occurrence available over the internet, a review of their usefulness in conservation and ecology is timely.

Ritual, Identity Fusion, and the Inauguration of President Trump: A pseudo-experiment of Ritual Modes theory

10.31234/osf.io/53waf ◽

2018 ◽

Author(s):

Rohan Kapitány ◽

Christopher Michael Kavanagh ◽

Michael Buhrmester ◽

Martha Newson ◽

Harvey Whitehouse

Keyword(s):

Large Scale ◽

Emotional Responses ◽

Open Science ◽

Social Identities ◽

Self Reflection ◽

The US ◽

Identity Fusion ◽

Science Framework ◽

Fusion Theory ◽

Diverse Groups

The US Presidential Inauguration is a symbolic event which arouses significant emotional responses among diverse groups, and is of considerable significance to Americans’ personal and social identities. We argue that the inauguration qualifies as an Imagistic Ritual (Whitehouse, 2004). Such ritual experiences are thought to produce identity fusion: a visceral sense of oneness with the group. The 2017 Inauguration of President Trump was a unique opportunity to examine how a large-scale naturalistic imagistic ritual influences the social identities of Americans who supported and opposed President Trump. We conducted a pre-registered 7-week longitudinal investigation among an online sample of Americans in order to examine how President Trump’s Inauguration influenced identity fusion. One core prediction was that the affective responses to the inauguration would predict positive changes in fusion, mediated by self-reflection. We did not find support for this. However, the inauguration was associated with flashbulb-like memories, and positive emotional response at the time of the event predicted changes in fusion to both ingroup and outgroup targets. Finally, both positive and negative emotional responses inspired self-reflection, but did not mediate the relationship with fusion. We discuss the implications of our findings for models linking group psychology, fusion theory, and ritual modes. All material available at the Open Science Framework: https://bit.ly/2Qu0G37.
