Peer Review #1 of "Why choose Random Forest to predict rare species distribution with few samples in large undersampled areas? Three Asian crane species models provide supporting evidence (v0.1)"

PeerJ ◽  
2017 ◽  
Vol 5 ◽  
pp. e2849 ◽  
Author(s):  
Chunrong Mi ◽  
Falk Huettmann ◽  
Yumin Guo ◽  
Xuesong Han ◽  
Lijia Wen

Species distribution models (SDMs) have become an essential tool in ecology, biogeography, evolution and, more recently, conservation biology. How to generalize species distributions across large undersampled areas, especially from few samples, is a fundamental issue for SDMs. To explore this issue, we used the best available presence records for the Hooded Crane (Grus monacha, n = 33), White-naped Crane (Grus vipio, n = 40), and Black-necked Crane (Grus nigricollis, n = 75) in China as three case studies, employing four powerful and commonly used machine learning algorithms to map the breeding distributions of the three species: TreeNet (Stochastic Gradient Boosting, Boosted Regression Tree Model), Random Forest, CART (Classification and Regression Tree) and Maxent (Maximum Entropy Models). In addition, we developed an ensemble forecast by averaging the predicted probabilities of the four models. Commonly used model performance metrics (area under the ROC curve (AUC) and true skill statistic (TSS)) were employed to evaluate model accuracy. The latest satellite tracking data and compiled literature data were used as two independent testing datasets to confront model predictions. We found that Random Forest demonstrated the best performance for most assessment methods, provided a better model fit to the testing data, and achieved better species range maps for each crane species in undersampled areas. Random Forest has been generally available for more than 20 years and is known to perform extremely well in ecological predictions. However, although its use is increasing, its potential is still widely underused in conservation, (spatial) ecological applications and for inference. Our results show that it informs ecological and biogeographical theories and is suitable for conservation applications, specifically when the study area is undersampled. This method helps to save model-selection time and effort, and allows robust and rapid assessments and decisions for efficient conservation.
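
The abstract names three computations: an ensemble forecast obtained by averaging the predicted probabilities of several models, and the AUC and TSS accuracy metrics. The following is a minimal Python sketch of those three quantities, not the authors' implementation. The function names, the unweighted mean, and the 0.5 classification threshold used for TSS are assumptions made for illustration; the abstract does not state how the threshold was chosen.

```python
from typing import List, Sequence


def ensemble_mean(predictions: Sequence[Sequence[float]]) -> List[float]:
    """Unweighted mean of per-site probabilities across models.

    predictions[m][i] is the probability model m assigns to site i.
    """
    n_models = len(predictions)
    return [sum(site) / n_models for site in zip(*predictions)]


def tss(y_true: Sequence[int], y_prob: Sequence[float],
        threshold: float = 0.5) -> float:
    """True skill statistic: sensitivity + specificity - 1.

    The threshold is a hypothetical default, not taken from the paper.
    """
    tp = fn = tn = fp = 0
    for y, p in zip(y_true, y_prob):
        predicted_present = p >= threshold
        if y == 1:
            if predicted_present:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_present:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one presence and one absence")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1.0


def auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise ranking.

    Equals the probability that a random presence scores higher than a
    random absence, counting ties as one half.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one presence and one absence")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

In use, each inner list passed to `ensemble_mean` would hold one model's predictions for the same ordered set of sites, and the averaged result would then be scored with `tss` and `auc` against independent test labels. The pairwise AUC is quadratic in the number of sites, which is acceptable for samples of the size reported here but not for large rasters.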


2016 ◽  
Author(s):  
Chunrong Mi ◽  
Falk Huettmann ◽  
Yumin Guo ◽  
Xuesong Han ◽  
Lijia Wen

Species distribution models (SDMs) have become an essential tool in ecology, biogeography, evolution and, more recently, conservation biology. How to generalize species distributions across large undersampled areas, especially from few samples, is a fundamental issue for SDMs. To explore this issue, we used the best available presence records for the Hooded Crane (Grus monacha, n=33), White-naped Crane (Grus vipio, n=40), and Black-necked Crane (Grus nigricollis, n=75) in China as three case studies, employing four powerful and commonly used machine learning algorithms to map the breeding distributions of the three species: TreeNet (Stochastic Gradient Boosting, Boosted Regression Tree Model), Random Forest, CART (Classification and Regression Tree) and Maxent (Maximum Entropy Models). In addition, we developed an ensemble forecast by averaging the predicted probabilities of the four models. Commonly used model performance metrics (area under the ROC curve (AUC) and true skill statistic (TSS)) were employed to evaluate model accuracy. The latest satellite tracking data and compiled literature data were used as two independent testing datasets to confront model predictions. We found that Random Forest demonstrated the best performance for most assessment methods, provided a better model fit to the testing data, and achieved better species range maps for each crane species in undersampled areas. Random Forest has been generally available for more than 20 years and is by now known to perform extremely well in ecological predictions. However, although its use is increasing, its potential is still widely underused in conservation, (spatial) ecological applications and for inference. Our results show that it informs ecological and biogeographical theories and is suitable for conservation applications, specifically when the study area is undersampled. This method helps to save model-selection time and effort, and it allows robust and rapid assessments and decisions for efficient conservation.


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Chongliang Zhang ◽  
Yong Chen ◽  
Binduo Xu ◽  
Ying Xue ◽  
Yiping Ren

2021 ◽  
Author(s):  
Gabriel Dansereau ◽  
Pierre Legendre ◽  
Timothée Poisot

Aim: Local contributions to beta diversity (LCBD) can be used to identify sites with high ecological uniqueness and exceptional species composition within a region of interest. Yet, these indices are typically used at local or regional scales with relatively few sites, as they require information on complete community compositions, which is difficult to acquire at larger scales. Here, we investigate how LCBD indices can be used to predict ecological uniqueness over broad spatial extents using species distribution modelling and citizen science data. Location: North America. Time period: 2000s. Major taxa studied: Parulidae. Methods: We used Bayesian additive regression trees (BARTs) to predict warbler species distributions in North America based on observations recorded in the eBird database. We then calculated LCBD indices for observed and predicted data and examined the site-wise difference using direct comparison, a spatial autocorrelation test, and generalized linear regression. We also investigated the relationship between LCBD values and species richness in different regions and at various spatial extents, and the effect of the proportion of rare species on that relationship. Results: Our results showed that the relationship between richness and LCBD values varies according to the region and the spatial extent at which it is applied. It is also affected by the proportion of rare species in the community. Species distribution models provided estimates that were highly correlated with observed data, although spatially autocorrelated. Main conclusions: Sites identified as unique over broad spatial extents may vary according to the regional richness, total extent size, and the proportion of rare species. Species distribution modelling can be used to predict ecological uniqueness over broad spatial extents, which could help identify beta diversity hotspots and important targets for conservation purposes in unsampled locations.
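
For readers unfamiliar with LCBD, the index can be computed from a site-by-species matrix as each site's share of the total sum of squared deviations from the species (column) means. The sketch below illustrates that standard definition under stated assumptions and is not the authors' code: the Hellinger transformation applied before the calculation is a common choice but is assumed here, since the abstract does not say which transformation was used, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def hellinger(Y: Sequence[Sequence[float]]) -> List[List[float]]:
    """Hellinger transform: square root of each row's relative abundances.

    Rows that sum to zero (empty sites) are mapped to all zeros.
    """
    out = []
    for row in Y:
        total = sum(row)
        out.append([math.sqrt(v / total) if total > 0 else 0.0 for v in row])
    return out


def lcbd(Y: Sequence[Sequence[float]]) -> Tuple[List[float], float]:
    """Return (LCBD value per site, total beta diversity).

    Y is a sites x species matrix, assumed already transformed.
    LCBD_i = SS_i / SS_total, where SS_i is the sum over species of the
    squared deviation from the species mean at site i.
    Total beta diversity = SS_total / (n - 1).
    """
    n = len(Y)
    p = len(Y[0])
    if n < 2:
        raise ValueError("need at least two sites")
    means = [sum(row[j] for row in Y) / n for j in range(p)]
    ss_sites = [sum((row[j] - means[j]) ** 2 for j in range(p)) for row in Y]
    ss_total = sum(ss_sites)
    if ss_total == 0:
        raise ValueError("all sites are identical; LCBD is undefined")
    return [s / ss_total for s in ss_sites], ss_total / (n - 1)
```

Calling `lcbd(hellinger(Y))` on a presence-absence or abundance matrix returns values that sum to 1 across sites, so a site with a larger value contributes more to beta diversity and is, in the abstract's terms, more ecologically unique. The same function can be applied to observed and to model-predicted matrices to compare the two sets of values site by site.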

