Spatial validation reveals poor predictive performance of large-scale ecological mapping models

2020 ◽  
Vol 11 (1) ◽  
Author(s):  
Pierre Ploton ◽  
Frédéric Mortier ◽  
Maxime Réjou-Méchain ◽  
Nicolas Barbier ◽  
Nicolas Picard ◽  
...  

Mapping aboveground forest biomass is central for assessing the global carbon balance. However, current large-scale maps show strong disparities, despite good validation statistics of their underlying models. Here, we attribute this contradiction to a flaw in the validation methods, which ignore spatial autocorrelation (SAC) in data, leading to overoptimistic assessment of model predictive power. To illustrate this issue, we reproduce the approach of large-scale mapping studies using a massive forest inventory dataset of 11.8 million trees in central Africa to train and validate a random forest model based on multispectral and environmental variables. A standard nonspatial validation method suggests that the model predicts more than half of the forest biomass variation, while spatial validation methods accounting for SAC reveal quasi-null predictive power. This study underscores how a common practice in big data mapping studies shows an apparent high predictive power, even when predictors have poor relationships with the ecological variable of interest, thus possibly leading to erroneous maps and interpretations.
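The difference between nonspatial and spatial validation can be illustrated with a minimal sketch. It assumes that inventory plots have planar coordinates and uses square grid blocks as the spatial unit. The function names, the block size, and the blocking scheme are illustrative assumptions, not the validation procedure used in the study.

```python
import random
from collections import defaultdict


def random_folds(n_samples, k, seed=0):
    """Nonspatial k-fold: shuffle sample indices and deal them into k folds.

    Neighbouring plots can end up in different folds, so spatially
    autocorrelated samples may sit on both sides of the train/test split.
    """
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def spatial_block_folds(coords, block_size, k):
    """Spatial k-fold: group samples into grid blocks, keep each block whole.

    coords is a list of (x, y) tuples in the same units as block_size.
    All samples of a block go to the same fold, so nearby plots are never
    split between training and test data.
    """
    blocks = defaultdict(list)
    for i, (x, y) in enumerate(coords):
        blocks[(int(x // block_size), int(y // block_size))].append(i)

    folds = [[] for _ in range(k)]
    # Assign the largest blocks first, each to the currently smallest fold.
    for key in sorted(blocks, key=lambda b: (-len(blocks[b]), b)):
        min(folds, key=len).extend(blocks[key])
    return folds


# Hypothetical plot coordinates forming three clusters.
coords = [(0.1, 0.1), (0.2, 0.3), (5.1, 5.2), (5.3, 5.4), (10.5, 0.2), (10.6, 0.1)]
folds = spatial_block_folds(coords, block_size=5.0, k=3)
```

In this sketch each cluster of plots lands in a single fold, so a model evaluated on one fold has not seen its immediate neighbours during training. A block size chosen relative to the range of spatial autocorrelation would be the natural tuning choice, but that choice is left open here.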

2019 ◽  
Vol 11 (23) ◽  
pp. 2744 ◽  
Author(s):  
Yuzhen Zhang ◽  
Shunlin Liang ◽  
Lu Yang

Forest biomass quantification is essential to the global carbon cycle and climate studies. Many studies have estimated forest biomass from a variety of data sources and have consequently generated regional and global maps. However, these forest biomass maps are neither well known nor well evaluated. In this paper, we review an extensive list of currently available forest biomass maps. For each map, we briefly introduce the data sources, the algorithms used, and the associated uncertainties. Large-scale biomass datasets were compared across Europe, the conterminous United States, Southeast Asia, tropical Africa and South America. Results showed that these forest biomass datasets were almost entirely inconsistent, particularly in woody savannas and savannas across these regions. The uncertainties in biomass maps may come from a variety of sources, including the allometric equations chosen to calculate field data, the choice and quality of remotely sensed data, and the algorithms or extrapolation techniques used to map forest biomass, but these uncertainties have not been fully quantified. We suggest that future efforts to generate more accurate large-scale forest biomass maps should concentrate on the compilation of field biomass data, novel approaches to forest biomass mapping, and comprehensive assessment of the accuracy of the generated biomass maps.


2014 ◽  
Vol 44 (6) ◽  
pp. 648-656 ◽  
Author(s):  
Sergio de-Miguel ◽  
Lauri Mehtätalo ◽  
Ali Durkaya

Large-scale prediction of forest biomass is of interest for forest science, ecology, and issues related to climate change. Previous research has attempted to provide allometric models suitable for large-scale biomass prediction using different methods. We present a new approach for meta-analysis of existing biomass equations using mixed-effects modelling on simulated data. The resulting generalized meta-models can be calibrated for local conditions. This meta-analytical approach allows for directly benefiting from previous research to minimize data collection and properly take into account the unknown differences between different locations within large areas. The approach is demonstrated by developing pan-Mediterranean mixed-effects meta-models for Pinus brutia Ten. The fixed part of the meta-models enables sound aboveground biomass predictions throughout practically the full native range of the species. Significant improvement in the predictive performance can be further gained by using small local datasets for model calibration. The calibration procedure for location-specific biomass prediction is based on best linear unbiased predictor of random effects. The predictive performance of the meta-models under different sampling strategies is validated in an independent dataset. The results show that mixed-effects meta-models may enable accurate and robust large-scale biomass predictions. Calibration for specific locations based on minimal data collection effort performs better than fitting location-specific equations based on much larger samples. The advantages of mixed-effects meta-models are of interest not only for further biomass-related research and applications, but also for other modelling disciplines within forest science.
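The idea of calibrating a mixed-effects model with a small local sample can be sketched for the simplest case. The sketch assumes a model with a single random intercept per location and known variance components, for which the best linear unbiased predictor is the mean local residual shrunk toward zero. The function name and parameters are hypothetical, and the meta-models in the study may include more random effects than this.

```python
def blup_random_intercept(residuals, var_b, var_e):
    """Predict a location's random intercept from local residuals.

    residuals: observed minus fixed-part predictions for the local sample.
    var_b: between-location variance of the random intercept (assumed known).
    var_e: residual (within-location) variance (assumed known).
    """
    n = len(residuals)
    if n == 0:
        # Without local data the best prediction is the population mean, 0.
        return 0.0
    mean_resid = sum(residuals) / n
    # Shrinkage factor approaches 1 as the local sample grows.
    shrinkage = n * var_b / (n * var_b + var_e)
    return shrinkage * mean_resid


# Two hypothetical local observations, each 1.0 above the fixed-part prediction.
b_hat = blup_random_intercept([1.0, 1.0], var_b=1.0, var_e=2.0)
```

A calibrated prediction for the location would then be the fixed-part prediction plus `b_hat`. The shrinkage term shows why even a few local trees can help: the local offset is trusted in proportion to the sample size and to the between-location variance.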


PeerJ ◽  
2019 ◽  
Vol 7 ◽  
pp. e6857 ◽  
Author(s):  
Pierre Mahé ◽  
Meriem El Azami ◽  
Philippine Barlas ◽  
Maud Tournoud

Recent years have seen growing interest in predicting antibiotic resistance from whole-genome sequencing data, with promising results obtained for Staphylococcus aureus and Mycobacterium tuberculosis. In this work, we gathered 6,574 sequencing read datasets of public M. tuberculosis genomes with associated antibiotic resistance profiles for both first- and second-line antibiotics. We performed a systematic evaluation of TBProfiler and Mykrobe, two widely recognized software tools for predicting resistance in M. tuberculosis. The size of the dataset allowed us to obtain confident estimates of their overall predictive performance, to assess precisely the individual predictive power of the markers they rely on, and to study how these tools behave across the major M. tuberculosis lineages. While this study confirmed the overall good performance of these tools, it revealed that a substantial fraction of the catalog of mutations they embed is of limited predictive power. It also revealed that these tools offer different sensitivity/specificity trade-offs, mainly because of the different sets of mutations they embed but also because of their underlying genotyping pipelines. More importantly, it showed that their level of predictive performance varies greatly across lineages for some antibiotics, suggesting that confidence in their predictions should depend on the lineage inferred and on the predictive performance of the marker(s) actually detected. Finally, we evaluated the relevance of machine learning approaches operating on the set of markers detected by these tools and show that they present an attractive alternative strategy, reaching better performance for several drugs while significantly reducing the number of candidate mutations to consider.


2021 ◽  
Vol 13 (11) ◽  
pp. 2074
Author(s):  
Ryan R. Reisinger ◽  
Ari S. Friedlaender ◽  
Alexandre N. Zerbini ◽  
Daniel M. Palacios ◽  
Virginia Andrews-Goff ◽  
...  

Machine learning algorithms are often used to model and predict animal habitat selection—the relationships between animal occurrences and habitat characteristics. For broadly distributed species, habitat selection often varies among populations and regions; thus, it would seem preferable to fit region- or population-specific models of habitat selection for more accurate inference and prediction, rather than fitting large-scale models using pooled data. However, where the aim is to make range-wide predictions, including areas for which there are no existing data or models of habitat selection, how can regional models best be combined? We propose that ensemble approaches commonly used to combine different algorithms for a single region can be reframed, treating regional habitat selection models as the candidate models. By doing so, we can incorporate regional variation when fitting predictive models of animal habitat selection across large ranges. We test this approach using satellite telemetry data from 168 humpback whales across five geographic regions in the Southern Ocean. Using random forests, we fitted a large-scale model relating humpback whale locations, versus background locations, to 10 environmental covariates, and made a circumpolar prediction of humpback whale habitat selection. We also fitted five regional models, the predictions of which we used as input features for four ensemble approaches: an unweighted ensemble, an ensemble weighted by environmental similarity in each cell, stacked generalization, and a hybrid approach wherein the environmental covariates and regional predictions were used as input features in a new model. We tested the predictive performance of these approaches on an independent validation dataset of humpback whale sightings and whaling catches. These multiregional ensemble approaches resulted in models with higher predictive performance than the circumpolar naive model. 
These approaches can be used to incorporate regional variation in animal habitat selection when fitting range-wide predictive models using machine learning algorithms. This can yield more accurate predictions across regions or populations of animals that may show variation in habitat selection.
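Two of the ensemble schemes named above, the unweighted ensemble and the similarity-weighted ensemble, can be sketched for a single grid cell. The sketch assumes that each regional model returns a habitat selection score for the cell and that an environmental similarity weight per region is already available. The function names and the example values are hypothetical, and how the similarity weights were computed in the study is not reproduced here.

```python
def unweighted_ensemble(preds):
    """Average the regional model predictions for one cell."""
    return sum(preds) / len(preds)


def weighted_ensemble(preds, weights):
    """Average regional predictions, weighting each by a nonnegative weight.

    weights would express how similar the cell's environment is to each
    region's training conditions (assumed to be computed elsewhere).
    Falls back to the unweighted mean if all weights are zero.
    """
    total = sum(weights)
    if total == 0:
        return unweighted_ensemble(preds)
    return sum(p * w for p, w in zip(preds, weights)) / total


# Hypothetical predictions from three regional models for one cell.
cell_preds = [0.2, 0.4, 0.6]
plain = unweighted_ensemble(cell_preds)
# The cell resembles the third region most, so that model dominates.
weighted = weighted_ensemble(cell_preds, [0.0, 1.0, 3.0])
```

Stacked generalization and the hybrid approach would replace these fixed rules with a second model trained on the regional predictions, which is beyond this sketch.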


Insects ◽  
2021 ◽  
Vol 12 (3) ◽  
pp. 269 ◽  
Author(s):  
Andrew Kalyebi ◽  
Sarina Macfadyen ◽  
Andrew Hulthen ◽  
Patrick Ocitti ◽  
Frances Jacomb ◽  
...  

Cassava (Manihot esculenta Crantz), an important commercial and food security crop in East and Central Africa, continues to be adversely affected by the whitefly Bemisia tabaci. In Uganda, changes in smallholder farming landscapes due to crop rotations can impact pest populations but how these changes affect pest outbreak risk is unknown. We investigated how seasonal changes in land-use have affected B. tabaci population dynamics and its parasitoids. We used a large-scale field experiment to standardize the focal field in terms of cassava age and cultivar, then measured how Bemisia populations responded to surrounding land-use change. Bemisia tabaci Sub-Saharan Africa 1 (SSA1) was identified using molecular diagnostics as the most prevalent species and the same species was also found on surrounding soybean, groundnut, and sesame crops. We found that an increase in the area of cassava in the 3–7-month age range in the landscape resulted in an increase in the abundance of the B. tabaci SSA1 on cassava. There was a negative relationship between the extent of non-crop vegetation in the landscape and parasitism of nymphs suggesting that these parasitoids do not rely on resources in the non-crop patches. The highest abundance of B. tabaci SSA1 nymphs in cassava fields occurred at times when landscapes had large areas of weeds, low to moderate areas of maize, and low areas of banana. Our results can guide the development of land-use strategies that smallholder farmers can employ to manage these pests.


2014 ◽  
Vol 15 (4) ◽  
pp. 820-848
Author(s):  
Pierre-Yves Donzé

Whereas the globalization of medicine since the middle of the 19th century has primarily been approached as the sociopolitical and cultural outcome of imperialism, this article argues that Western big business also played a major role through the worldwide export of standardized medical technologies. It focuses on the expansion of Siemens on the X-ray equipment market in non-Western countries during the first half of the twentieth century. This German multinational enterprise experienced slight growth from the mid-1920s onwards but relied mainly on two markets (Argentina and Brazil). It specialized in providing large-scale equipment to a few urban hospitals and engaged during the 1930s in large-scale hospital development together with local authorities and international organizations in various countries (China, Peru, and Central Africa). However, Siemens had great difficulty in expanding its business to include private doctors and inland outlets, where it faced intense competition from other Western X-ray producers. This paper emphasizes that this shortcoming stemmed from a direct application of the European strategy (high-quality, expensive equipment for hospitals) to non-Western markets, where health systems differed.


2013 ◽  
Vol 680 ◽  
pp. 534-539
Author(s):  
Wei Feng Ma

With the rapid expansion of campuses and the growing number of geographically dispersed campuses, how to adopt new theories, methods, and technologies to optimize equipment assignment and manage the related information is a new research challenge. Addressing it is key to ensuring that national funds are used reasonably and to supporting the healthy development of education. After analyzing related domestic and foreign research, the paper proposes using the spatial data representation and analysis capabilities of a Geographic Information System (GIS) to achieve large-scale, inter-campus optimized equipment assignment and information management. It discusses the mathematical model and the system architecture. Moreover, the paper describes the key implementation technologies in detail, such as spatial data mapping with MapInfo Professional 9 and the development of WebGIS functions with MapXtreme. The results show that the solution is feasible and effective.


GCB Bioenergy ◽  
2012 ◽  
Vol 4 (6) ◽  
pp. 611-616 ◽  
Author(s):  
Ernst-Detlef Schulze ◽  
Christian Körner ◽  
Beverly E. Law ◽  
Helmut Haberl ◽  
Sebastiaan Luyssaert

2020 ◽  
Author(s):  
Ramon Viñas ◽  
Tiago Azevedo ◽  
Eric R. Gamazon ◽  
Pietro Liò

A question of fundamental biological significance is to what extent the expression of a subset of genes can be used to recover the full transcriptome, with important implications for biological discovery and clinical application. To address this challenge, we present GAIN-GTEx, a method for gene expression imputation based on Generative Adversarial Imputation Networks. In order to increase the applicability of our approach, we leverage data from GTEx v8, a reference resource that has generated a comprehensive collection of transcriptomes from a diverse set of human tissues. We compare our model to several standard and state-of-the-art imputation methods and show that GAIN-GTEx is significantly superior in terms of predictive performance and runtime. Furthermore, our results indicate strong generalisation on RNA-Seq data from 3 cancer types across varying levels of missingness. Our work can facilitate a cost-effective integration of large-scale RNA biorepositories into genomic studies of disease, with high applicability across diverse tissue types.


2021 ◽  
Author(s):  
Hyeyoung Koh ◽  
Hannah Beth Blum

This study presents a machine learning-based approach for sensitivity analysis to examine how parameters affect a given structural response while accounting for uncertainty. Reliability-based sensitivity analysis involves repeated evaluations of the performance function incorporating uncertainties to estimate the influence of a model parameter, which can lead to prohibitive computational costs. This challenge is exacerbated for large-scale engineering problems, which often carry a large number of uncertain parameters. The proposed approach is based on feature selection algorithms that rank feature importance and remove redundant predictors during model development, which improves model generality and training performance by focusing only on the significant features. The approach allows sensitivity analysis of structural systems to be performed by providing feature rankings with reduced computational effort. The proposed approach is demonstrated with two designs of a two-bay, two-story planar steel frame with different failure modes: inelastic instability of a single member and progressive yielding. The feature variables in the data are uncertainties including material yield strength, Young’s modulus, frame sway imperfection, and residual stress. The Monte Carlo sampling method is used to generate random realizations of the frames from published distributions of the feature parameters, and the response variable is the frame ultimate strength obtained from finite element analyses. Decision trees are trained to identify important features. Feature rankings are derived by four feature selection techniques: impurity-based, permutation, SHAP, and Spearman's correlation. Predictive performance of the model including the important features is discussed using the Matthews correlation coefficient, an evaluation metric suited to imbalanced datasets.
Finally, the results are compared with those from reliability-based sensitivity analysis on the same example frames to show the validity of the feature selection approach. As the proposed machine learning-based approach produces the same results as the reliability-based sensitivity analysis with improved computational efficiency and accuracy, it could be extended to other structural systems.
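The Matthews correlation coefficient mentioned above can be computed directly from the four cells of a binary confusion matrix. The sketch below is a plain-Python illustration with a hypothetical function name. It assumes a binary classification setting and returns 0 when the coefficient is undefined, a common convention that the study may or may not follow.

```python
import math


def mcc_from_counts(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Returns a value in [-1, 1]: 1 for perfect prediction, 0 for a
    prediction no better than chance, -1 for total disagreement.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Undefined when any row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical imbalanced case: 3 positives among 100 samples.
tp, tn, fp, fn = 1, 95, 2, 2
accuracy = (tp + tn) / (tp + tn + fp + fn)
mcc = mcc_from_counts(tp, tn, fp, fn)
```

In this example the accuracy is high because the majority class dominates, while the coefficient stays much lower because only one of the three positives is found. That gap is the reason the metric is preferred for imbalanced data.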

