Do pseudo-absence selection strategies influence species distribution models and their predictions? An information-theoretic approach based on simulated data

BMC Ecology ◽  
2009 ◽  
Vol 9 (1) ◽  
pp. 8 ◽  
Author(s):  
Mary S Wisz ◽  
Antoine Guisan
2019 ◽  
Author(s):  
Truly Santika ◽  
Michael F. Hutchinson ◽  
Kerrie A. Wilson

ABSTRACT Presence-only data used to develop species distribution models are often biased towards areas that are frequently surveyed. Furthermore, the size of the calibration area relative to the area covered by the species occurrences has been shown to affect model accuracy. However, existing assessments of the effect of data inadequacy and calibration size on model accuracy have predominantly been conducted using empirical studies. These studies can give ambiguous results, since the data used to train and test the model can both be biased. These limitations were addressed by applying simulated data to assess how inadequate data coverage and the size of the calibration area affect the accuracy of species distribution models generated by MaxEnt and BIOCLIM. The validity of four presence-only performance measures, Contrast Validation Index (CVI), Boyce index, AUC and AUCratio, was also assessed. CVI, AUC and AUCratio ranked the accuracy of univariate models correctly according to the true importance of their defining environmental variable, a desirable property of an accuracy measure. In contrast, the Boyce index failed to rank the accuracy of univariate models correctly, and a high percentage of irrelevant variables produced models with a high Boyce index. Inadequate data coverage and increased calibration area reduced model accuracy by reducing the correct identification of the dominant environmental determinant. BIOCLIM outperformed MaxEnt models in predicting the true distribution of simulated species with a symmetric dominant response. However, MaxEnt outperformed BIOCLIM in predicting the true distribution of simulated species with skewed and linear dominant responses. Despite this, the standard performance measures consistently overestimated the performance of MaxEnt models and showed them as always having higher model accuracy than the BIOCLIM models. It has been acknowledged that research should be directed towards testing and improving species distribution modelling tools, particularly how to handle the inevitable bias and scarcity of species occurrence data. Simulated data, as demonstrated here, provide a powerful approach to comprehensively test the performance of modelling tools and to disentangle the effects of data properties and modelling options on model accuracy. This may be impossible to achieve using real-world data.


2003 ◽  
Vol 89 (5) ◽  
pp. 2810-2822 ◽  
Author(s):  
Edmund T. Rolls ◽  
Leonardo Franco ◽  
Nicholas C. Aggelopoulos ◽  
Steven Reece

To analyze the extent to which populations of neurons encode information in the numbers of spikes each neuron emits or in the relative time of firing of the different neurons that might reflect synchronization, we developed and analyzed the performance of an information-theoretic approach. The formula quantifies the corrections to the instantaneous information rate that result from correlations in spike emission between pairs of neurons. We showed how these cross-cell terms can be separated from the correlations that occur between the spikes emitted by each neuron, the auto-cell terms in the information rate expansion. We also described a method to test whether the estimate of the amount of information contributed by stimulus-dependent synchronization is significant. With simulated data, we show that the approach can separate information arising from the number of spikes emitted by each neuron from the redundancy that can arise if neurons have common inputs and from the synergy that can arise if cells have stimulus-dependent synchronization. The usefulness of the approach is also demonstrated by showing how it helps to interpret the encoding shown by neurons in the primate inferior temporal visual cortex. When applied to a sample dataset of simultaneously recorded inferior temporal cortex (IT) neurons, the algorithm showed that most of the information is available in the number of spikes emitted by each cell; that there is typically just a small degree (approximately 12%) of redundancy between simultaneously recorded IT neurons; and that there is very little gain of information arising from stimulus-dependent synchronization effects in these neurons.


2014 ◽  
Vol 38 (1) ◽  
pp. 117-128 ◽  
Author(s):  
Jennifer A. Miller

Species distribution models (SDMs) have become a dominant paradigm for quantifying species-environment relationships, and both the models and their outcomes have seen widespread use in conservation studies, particularly in the context of climate change research. With the growing interest in SDMs, extensive comparative studies have been undertaken. However, few generalizations and recommendations have resulted from these empirical studies, largely due to the confounding effects of differences in and interactions among the statistical methods, species traits, data characteristics, and accuracy metrics considered. This progress report addresses ‘virtual species distribution models’: the use of spatially explicit simulated data to represent a ‘true’ species distribution in order to evaluate aspects of model conceptualization and implementation. Simulating a ‘true’ species distribution, or a virtual species distribution, and systematically testing how these aspects affect SDMs, can provide an important baseline and generate new insights into how these issues affect model outcomes.
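
The idea of a virtual species can be illustrated with a minimal sketch. It is not taken from the progress report: the Gaussian (symmetric) response curve, the single linear environmental gradient, the grid size, and the names `gaussian_suitability` and `simulate_virtual_species` are all assumptions chosen for illustration. The point is only that the "true" suitability surface is known exactly, so any model fitted to the sampled presences can be scored against it.

```python
import math
import random


def gaussian_suitability(env_value, optimum, tolerance):
    """Assumed symmetric response: suitability peaks at 1.0 when env_value equals optimum."""
    return math.exp(-((env_value - optimum) ** 2) / (2.0 * tolerance ** 2))


def simulate_virtual_species(env_grid, optimum, tolerance, seed=0):
    """Return (suitability, presence) grids for a hypothetical virtual species.

    Presence in each cell is a Bernoulli draw with probability equal to the
    cell's true suitability, so the generating process is fully known.
    """
    rng = random.Random(seed)
    suitability = [
        [gaussian_suitability(value, optimum, tolerance) for value in row]
        for row in env_grid
    ]
    presence = [
        [1 if rng.random() < s else 0 for s in row]
        for row in suitability
    ]
    return suitability, presence


# Hypothetical 20 x 20 landscape with one environmental gradient from 0 to 30
# running west to east (illustrative values, not from any cited study).
size = 20
env_grid = [[30.0 * col / (size - 1) for col in range(size)] for _ in range(size)]

suitability, presence = simulate_virtual_species(
    env_grid, optimum=15.0, tolerance=4.0, seed=42
)
```

Because the seed fixes the random draws, the same "true" distribution can be regenerated while sampling bias, sample size, or calibration extent are varied one at a time.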


2014 ◽  
Vol 60 (2) ◽  
pp. 170-179 ◽  
Author(s):  
Gentile Francesco Ficetola ◽  
Anna Bonardi ◽  
Paola Mairota ◽  
Vincenzo Leronni ◽  
Emilio Padoa-Schioppa

Abstract Crop damage by wildlife is a frequent form of human-wildlife conflict. Identifying areas where the risk of crop damage is highest is pivotal to setting up preventive measures and reducing conflict. Species distribution models are routinely used to predict species distribution in response to environmental changes. The aim of this paper was to assess whether species distribution models can identify the areas most at risk of crop damage, helping to set up management strategies aimed at mitigating human-wildlife conflicts. We obtained data on wild boar Sus scrofa damage to crops in the Alta Murgia National Park, Southern Italy, and related them to landscape features, to identify areas where the risk of wild boar damage is highest. We used MaxEnt to build species distribution models. We identified the spatial scale at which landscape most strongly affects the distribution of damage, and optimized the regularization parameter of the models, through an information-theoretic approach based on AIC. Wild boar damage increased quickly in the period 2007-2011; cereals and legumes were the crops most affected. Large areas of the park have a high risk of wild boar damage. The risk of damage was related to low cover of urban areas or olive groves, intermediate values of forest cover, and high values of shrubland cover within a 2-km radius. Temporally independent validation data demonstrated that models can successfully predict future damage. Species distribution models can accurately identify the areas most at risk of wildlife damage, as models calibrated on data collected during only a subset of years correctly predicted damage in the subsequent year.


2003 ◽  
Vol 23 (4) ◽  
pp. 490-498 ◽  
Author(s):  
Federico E. Turkheimer ◽  
Rainer Hinz ◽  
Vincent J. Cunningham

This article deals with the problem of model selection for the mathematical description of tracer kinetics in nuclear medicine. It stems from the consideration of some specific data sets where different models have similar performances. In these situations, it is shown that considerate averaging of a parameter's estimates over the entire model set is better than obtaining the estimates from one model only. Furthermore, it is also shown that the procedure of averaging over a small number of “good” models reduces the “generalization error,” the error introduced when the model selected over a particular data set is applied to different conditions, such as subject populations with altered physiologic parameters, modified acquisition protocols, and different signal-to-noise ratios. The method of averaging over the entire model set uses Akaike coefficients as measures of an individual model's likelihood. To facilitate the understanding of these statistical tools, the authors provide an introduction to model selection criteria and a short technical treatment of Akaike's information–theoretic approach. The new method is illustrated and epitomized by a case example on the modeling of [11C]flumazenil kinetics in the brain, containing both real and simulated data.
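
The averaging described above can be sketched with the standard Akaike-weight formula, in which each model's weight is proportional to exp(-0.5 × ΔAIC), where ΔAIC is its AIC minus the smallest AIC in the set. This is a generic illustration and not the authors' implementation: the function names `akaike_weights` and `model_averaged_estimate` and all numeric values are hypothetical.

```python
import math


def akaike_weights(aic_values):
    """Return Akaike weights: exp(-0.5 * delta_i) normalised to sum to 1."""
    best = min(aic_values)
    relative_likelihoods = [math.exp(-0.5 * (aic - best)) for aic in aic_values]
    total = sum(relative_likelihoods)
    return [value / total for value in relative_likelihoods]


def model_averaged_estimate(estimates, aic_values):
    """Average one parameter's estimates across models, weighted by Akaike weights."""
    weights = akaike_weights(aic_values)
    return sum(w * e for w, e in zip(weights, estimates))


# Hypothetical AIC scores and estimates of the same parameter from three
# candidate kinetic models (made-up numbers for illustration only).
aic = [100.0, 102.0, 110.0]
estimates = [1.0, 1.2, 2.0]

weights = akaike_weights(aic)
averaged = model_averaged_estimate(estimates, aic)
```

Subtracting the smallest AIC before exponentiating leaves the weights unchanged but avoids numerical underflow when the raw AIC values are large.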

