Accounting for Training Data Error in Machine Learning Applied to Earth Observations

2020 ◽  
Vol 12 (6) ◽  
pp. 1034 ◽  
Author(s):  
Arthur Elmes ◽  
Hamed Alemohammad ◽  
Ryan Avery ◽  
Kelly Caylor ◽  
J. Eastman ◽  
...  

Remote sensing, or Earth Observation (EO), is increasingly used to understand Earth system dynamics and create continuous and categorical maps of biophysical properties and land cover, especially based on recent advances in machine learning (ML). ML models typically require large, spatially explicit training datasets to make accurate predictions. Training data (TD) are typically generated by digitizing polygons on high spatial-resolution imagery, by collecting in situ data, or by using pre-existing datasets. TD are often assumed to accurately represent the truth, but in practice almost always contain error, stemming from (1) sample design and (2) sample collection errors. The latter is particularly relevant for image-interpreted TD, an increasingly common approach given its practicality and the growing training sample size requirements of modern ML algorithms. TD errors can cause substantial errors in the maps created using ML algorithms, which may affect map use and interpretation. Despite these potential errors and their real-world consequences for map-based decisions, TD error is often not accounted for or reported in EO research. Here we review current practices for collecting and handling TD. We identify the sources of TD error, illustrate their impacts using several case studies representing different EO applications (infrastructure mapping, global surface flux estimates, and agricultural monitoring), and provide guidelines for minimizing and accounting for TD errors. To harmonize terminology, we distinguish TD from three other classes of data that should be used to create and assess ML models: training reference data, used to assess the quality of TD during data generation; validation data, used to iteratively improve models; and map reference data, used only for final accuracy assessment. We focus primarily on TD, but our advice is generally applicable to all four classes, and we ground our review in the established best practices of the map accuracy assessment literature. EO researchers should start by determining the tolerable levels of map error and appropriate error metrics. Next, TD error should be minimized during sample design by choosing a representative spatio-temporal collection strategy, by using spatially and temporally relevant imagery and ancillary data sources during TD creation, and by selecting a set of legend definitions supported by the data. Furthermore, TD error can be minimized during the collection of individual samples by using consensus-based collection strategies, by directly comparing interpreted training observations against expert-generated training reference data to derive TD error metrics, and by providing image interpreters with thorough application-specific training. We strongly advise that TD error be incorporated in model outputs, either directly in bias and variance estimates or, at a minimum, by documenting the sources and implications of error. TD should be fully documented and made available via an open TD repository, allowing others to replicate and assess its use. To guide researchers in this process, we propose three tiers of TD error accounting standards. Finally, we advise researchers to clearly communicate the magnitude and impacts of TD error on map outputs, with specific consideration given to the likely map audience.
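
As a concrete illustration of deriving TD error metrics by comparing image-interpreted labels against expert-generated training reference data, the following sketch computes per-class user's and producer's accuracies plus overall agreement. The label arrays and legend are hypothetical stand-ins, not data from the study.

import numpy as np
from sklearn.metrics import cohen_kappa_score, confusion_matrix

classes = ["forest", "cropland", "urban"]  # hypothetical legend
interpreted = np.array(["forest", "cropland", "urban", "forest", "cropland"])  # image-interpreted TD
reference = np.array(["forest", "cropland", "forest", "forest", "urban"])      # expert reference labels

cm = confusion_matrix(reference, interpreted, labels=classes)  # rows: reference, cols: interpreted
producers_acc = np.diag(cm) / cm.sum(axis=1)  # complement of omission error
users_acc = np.diag(cm) / cm.sum(axis=0)      # complement of commission error
overall = np.diag(cm).sum() / cm.sum()
kappa = cohen_kappa_score(reference, interpreted, labels=classes)

for c, ua, pa in zip(classes, users_acc, producers_acc):
    print(f"{c}: user's accuracy {ua:.2f}, producer's accuracy {pa:.2f}")
print(f"overall TD agreement {overall:.2f}, kappa {kappa:.2f}")

Metrics of this kind can be reported alongside the map or propagated into bias and variance estimates, as the authors recommend.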

Author(s):  
A. Schlichting ◽  
C. Brenner

LiDAR sensors are well proven for accurate vehicle localization. Instead of detecting and matching features in the LiDAR data, we want to use the entire information provided by the scanners. Since dynamic objects, such as cars, pedestrians, or even construction sites, could lead to wrong localization results, we use a change detection algorithm to detect these objects in the reference data. If an object occurs at the same position in a certain number of measurements, we mark it, and every point it contains, as static. In the next step, we merge the data of the single measurement epochs into one reference dataset, using only static points. We also use a classification algorithm to detect trees.

For the online localization of the vehicle, we use simulated data of a vertically aligned automotive LiDAR sensor. Since we again want to use only static objects, we use a random forest classifier to detect dynamic scan points online. Because the automotive data are derived from the LiDAR Mobile Mapping System, we are able to use the labelled objects from the reference data generation step to create the training data and thus to detect dynamic objects online. Localization can then be performed by a point-to-image correlation method using only static objects. We achieved a localization standard deviation of about 5 cm (position) and 0.06° (heading), and were able to successfully localize the vehicle in about 93% of cases along a 13 km trajectory in Hannover, Germany.
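
The dynamic-point filtering step could look roughly like the sketch below, assuming hypothetical per-point features (e.g., height above ground, intensity, local planarity) and static/dynamic labels carried over from the reference-data generation stage; it is not the authors' implementation.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 3))     # stand-in per-point features from labelled reference objects
y_train = rng.integers(0, 2, size=1000)  # 0 = static, 1 = dynamic

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

X_scan = rng.normal(size=(200, 3))       # one simulated automotive scan
static_mask = clf.predict(X_scan) == 0   # keep only points classified as static
static_points = X_scan[static_mask]      # input to the point-to-image correlation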


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Mei Lu ◽  
Kuan-Han Hank Wu ◽  
Sheri Trudeau ◽  
Margaret Jiang ◽  
Joe Zhao ◽  
...  

Abstract Tumor mutational burden (TMB) is associated with clinical response to immunotherapy, but application has been limited to a subset of cancer patients. We hypothesized that advanced machine-learning and proper modeling could identify mutations that classify patients most likely to derive clinical benefits. Training data: Two sets of public whole-exome sequencing (WES) data for metastatic melanoma. Validation data: One set of public non-small cell lung cancer (NSCLC) data. Least Absolute Shrinkage and Selection Operator (LASSO) machine-learning and proper modeling were used to identify a set of mutations (biomarker) with maximum predictive accuracy (measured by AUROC). Kaplan–Meier and log-rank methods were used to test prediction of overall survival. The initial model considered 2139 mutations. After pruning, 161 mutations (11%) were retained. An optimal threshold of 0.41 divided patients into high-weight (HW) or low-weight (LW) TMB groups. Classification for HW-TMB was 100% (AUROC = 1.0) on melanoma learning/testing data; HW-TMB was a prognostic marker for longer overall survival. In validation data, HW-TMB was associated with survival (p = 0.0057) and predicted 6-month clinical benefit (AUROC = 0.83) in NSCLC. In conclusion, we developed and validated a 161-mutation genomic signature with “outstanding” 100% accuracy to classify melanoma patients by likelihood of response to immunotherapy. This biomarker can be adapted for clinical practice to improve cancer treatment and care.
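
The feature-pruning idea can be sketched as an L1-penalized (LASSO) logistic regression over a binary patient-by-mutation matrix, scored by AUROC. The data, penalty strength, and split below are synthetic stand-ins, not the study's pipeline.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(120, 2139))  # 2139 candidate mutations, 1 = mutated
y = rng.integers(0, 2, size=120)          # 1 = clinical benefit (stand-in labels)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
lasso.fit(X_tr, y_tr)

retained = np.flatnonzero(lasso.coef_[0])  # mutations surviving the L1 pruning
scores = lasso.predict_proba(X_te)[:, 1]
print(len(retained), "mutations retained, AUROC =", roc_auc_score(y_te, scores))

Thresholding the predicted score (the paper reports an optimal cutoff of 0.41) then splits patients into the HW- and LW-TMB groups.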


2020 ◽  
Author(s):  
Patrick Eriksson ◽  
Simon Pfreundschuh ◽  
Teo Norrestad ◽  
Christian Kummerow

A novel method for the estimation of surface precipitation using passive observations from the GPM constellation is proposed. The method, which makes use of quantile regression neural networks (QRNNs), is shown to provide a more accurate representation of retrieval uncertainties and high processing speed, and to simplify the integration of ancillary data into the retrieval. With that, it overcomes limitations of traditionally used methods, such as Monte Carlo integration as well as standard uses of machine learning.

The bulk of precipitation estimates provided by the Global Precipitation Measurement mission (GPM) is based on passive microwave observations. These data are produced by the GPROF algorithm, which applies a Bayesian approach denoted Monte Carlo integration (MCI). In this work, we investigate the potential of using QRNNs as an alternative to MCI by assessing the performance of both methods using identical input databases.

The methods agree well regarding point estimates, but QRNN provides better estimates of the retrieval uncertainty while reducing processing times by an order of magnitude. As QRNN gives more precise uncertainty estimates than MCI, it provides an improved basis for further processing of the data, such as identification of extreme precipitation and areal integration.

Results so far indicate that a single network can handle all data from a sensor, in contrast to MCI, where observations over oceans and different land types have to be treated separately. Moreover, the flexibility of the machine-learning approach opens up opportunities for further improvements of the retrieval: ancillary information can be easily incorporated, and QRNN can be applied to multiple footprints to make better use of spatial information. The effects of these improvements are investigated on independent validation data from ground-based precipitation radars.

QRNN is here shown to be a highly interesting alternative for GPROF, but being a general approach, it should be of equally high interest for other precipitation and cloud retrievals.
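
The core of a QRNN is an ordinary regression network trained with the pinball (quantile) loss, one output per quantile. A minimal sketch follows; the network size, the 13 input channels (assumed here to mimic a GMI-like sensor), and the synthetic data are illustrative only.

import torch
import torch.nn as nn

quantiles = torch.tensor([0.05, 0.25, 0.5, 0.75, 0.95])

def pinball_loss(pred, target, q):
    # pred: (N, Q) predicted quantiles, target: (N, 1), q: (Q,) quantile levels
    err = target - pred
    return torch.mean(torch.maximum(q * err, (q - 1) * err))

net = nn.Sequential(nn.Linear(13, 64), nn.ReLU(), nn.Linear(64, len(quantiles)))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

X = torch.randn(512, 13)        # stand-in brightness temperatures (13 channels assumed)
y = torch.rand(512, 1) * 10     # synthetic surface rain rate

for _ in range(200):
    opt.zero_grad()
    loss = pinball_loss(net(X), y, quantiles)
    loss.backward()
    opt.step()

# net(X_new) now yields per-pixel quantiles of the retrieved rain rate, i.e., an
# uncertainty estimate rather than a single point value.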


2017 ◽  
Vol 54 (2) ◽  
pp. 193-214 ◽  
Author(s):  
Michael Colaresi ◽  
Zuhaib Mahmood

Increasingly, scholars interested in understanding conflict processes have turned to evaluating out-of-sample forecasts to judge and compare the usefulness of their models. Research in this vein has made significant progress in identifying and avoiding the problem of overfitting sample data. Yet there has been less research providing strategies and tools to practically improve the out-of-sample performance of existing models and connect forecasting improvement to the goal of theory development in conflict studies. In this article, we fill this void by building on lessons from machine learning research. We highlight a set of iterative tasks, which David Blei terms ‘Box’s loop’, that can be summarized as build, compute, critique, and think. While the initial steps of Box’s loop will be familiar to researchers, the underutilized process of model criticism allows researchers to iteratively learn more useful representations of the data generation process from the discrepancies between the trained model and held-out data. To benefit from iterative model criticism, we advise researchers not only to split their available data into separate training and test sets, but also to sample from their training data to allow for iterative model development, as is common in machine learning applications. Since practical tools for model criticism in particular are underdeveloped, we also provide software for new visualizations that build upon already existing tools. We use models of civil war onset to provide an illustration of how our machine learning-inspired research design can simultaneously improve out-of-sample forecasting performance and identify useful theoretical contributions. We believe these research strategies can complement existing designs to accelerate innovations across conflict processes.
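
The recommended design, hold out a final test set and iterate build-compute-critique on resamples of the training data only, might be sketched as follows; the covariates, onset labels, and random forest learner are placeholders rather than the article's models.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 8))        # country-year covariates (stand-in)
y = rng.integers(0, 2, size=500)     # civil-war onset indicator (stand-in)

X_dev, X_test, y_dev, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# Box's loop: build a model, compute fits, critique on held-out folds, rethink.
model = RandomForestClassifier(random_state=1)
cv_scores = cross_val_score(model, X_dev, y_dev, cv=5, scoring="roc_auc")
print("iterative criticism scores:", cv_scores.round(2))

# Only after the loop converges is the untouched test set used, once.
model.fit(X_dev, y_dev)
print("final out-of-sample accuracy:", round(model.score(X_test, y_test), 2))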


2021 ◽  
Author(s):  
Vahid Gholami ◽  
Hossein Sahour

Abstract Groundwater drawdown is typically measured using pumping tests and field experiments; however, these traditional methods are time-consuming and costly when applied to extensive areas. In this research, a methodology is introduced based on artificial neural networks (ANNs) and field measurements in an alluvial aquifer in the north of Iran. First, the annual drawdown, the output of the ANN models, was measured in 250 piezometric wells, and the data were divided into three categories: training data, cross-validation data, and test data. Then, the factors affecting groundwater drawdown, including groundwater depth, annual precipitation, annual evaporation, the transmissivity of the aquifer formation, elevation, distance from the sea, distance from water sources (recharge), population density, and groundwater extraction within the influence radius (1000 m) of each well, were identified and used as the inputs of the ANN models. Several ANN methods were evaluated, and the predictions were compared with the observations. Results show that the modular neural network (MNN) showed the highest performance in modeling groundwater drawdown (training R-sqr = 0.96, test R-sqr = 0.81). The optimum network was fitted to available input data to map the annual drawdown across the entire aquifer. The accuracy assessment of the final map yielded favorable results (R-sqr = 0.8). The adopted methodology can be applied for the prediction of groundwater drawdown in the study site and similar settings elsewhere.
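
A minimal sketch of this workflow, using a generic multilayer perceptron rather than the paper's modular neural network: the nine predictor columns and drawdown values below are synthetic stand-ins.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(250, 9))   # 9 inputs: depth, precipitation, evaporation, ... (stand-ins)
y = X @ rng.normal(size=9) + rng.normal(scale=0.3, size=250)  # synthetic annual drawdown

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
ann = make_pipeline(StandardScaler(),
                    MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000,
                                 random_state=0))
ann.fit(X_tr, y_tr)
print("test R^2:", round(ann.score(X_te, y_te), 2))  # cf. the paper's test R-sqr = 0.81

The fitted model can then be applied cell-by-cell over the aquifer's input rasters to map the annual drawdown.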


2021 ◽  
Vol 9 ◽  
Author(s):  
Shuai Chen ◽  
Zelang Miao ◽  
Lixin Wu ◽  
Anshu Zhang ◽  
Qirong Li ◽  
...  

Machine learning with extensively labeled training samples (e.g., positive and negative data) has received much attention for earthquake-induced landslide susceptibility mapping (LSM). However, the extensive amount of labeled training data required by machine learning, particularly the precise negative data (i.e., non-landslide areas), cannot be easily and efficiently collected. To address this issue, this study presents a one-class-classifier-based negative data generation method for rapid earthquake-induced LSM. First, an incomplete landslide inventory (i.e., positive data) was produced with the aid of change detection using before-and-after satellite images and a Geographic Information System (GIS). Second, a one-class classifier was used to compute the probability of landslide occurrence based on the incomplete landslide inventory, followed by negative data generation from the low-susceptibility areas. Third, the positive data and the generated negative data (i.e., non-landslide) were combined to train a traditional binary classifier to produce the final LSM. Experimental results suggest that the proposed method achieves results comparable to methods using a complete landslide inventory, and it displays good correspondence with recent landslide events, making it a suitable method for rapid earthquake-induced LSM. The findings in this study would be useful in regional disaster planning and risk reduction.
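
The negative-data generation idea can be sketched as follows: fit a one-class model on the landslide (positive) samples, score unlabeled cells, take the lowest-scoring cells as negatives, and train a binary classifier on the combined set. All features are synthetic stand-ins for terrain and conditioning factors, and the one-class SVM is one possible choice of one-class classifier, not necessarily the authors'.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
pos = rng.normal(loc=1.0, size=(300, 6))        # incomplete landslide inventory (positive data)
unlabeled = rng.normal(loc=0.0, size=(5000, 6)) # all other map cells

oc = OneClassSVM(gamma="scale", nu=0.1).fit(pos)
scores = oc.decision_function(unlabeled)        # low score = low landslide susceptibility
neg = unlabeled[np.argsort(scores)[:300]]       # generate negatives from the low end

X = np.vstack([pos, neg])
y = np.r_[np.ones(len(pos)), np.zeros(len(neg))]
lsm_model = RandomForestClassifier(random_state=0).fit(X, y)
# lsm_model.predict_proba over all map cells then yields the susceptibility map.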


2020 ◽  
Vol 12 (2) ◽  
pp. 257 ◽  
Author(s):  
Julien Radoux ◽  
François Waldner ◽  
Patrick Bogaert

Reference data collected to validate land-cover maps are generally considered free of errors. In practice, however, they contain errors despite best efforts to minimize them. These errors propagate during accuracy assessment and distort the validation results. For photo-interpreted reference data, the two most widely studied sources of error are systematic incorrect labeling and vigilance drops. How estimation errors, i.e., errors intrinsic to the response design, affect the accuracy of reference data is far less understood. In this paper, we analyzed the impact of estimation errors for two types of classification systems (binary and multiclass) as well as for two common response designs (point-based and partition-based) with a range of sub-sample sizes. Our quantitative results indicate that labeling errors due to proportion estimations should not be neglected. They further confirm that the accuracy of response designs depends on the class proportions within the sampling units, with complex landscapes being more prone to errors. As a result, response designs where the number of sub-samples is predefined and fixed are inefficient. To guarantee high accuracy standards of validation data with minimum data collection effort, we propose a new method to adapt the number of sub-samples for each sample during the validation process. In practice, sub-samples are incrementally selected and labeled until the estimated class proportions reach the desired level of confidence. As a result, less effort is spent on labeling univocal cases and the spared effort can be allocated to more ambiguous cases. This increases the reliability of reference data and of subsequent accuracy assessment. Across our study site, we demonstrated that such an approach could reduce the labeling effort by 50% to 75%, with greater gains in homogeneous landscapes. We contend that adopting this optimization approach will not only increase the efficiency of reference data collection, but will also help deliver more reliable accuracy estimates to the user community.
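
The adaptive response design can be sketched as a stopping rule: keep labeling sub-samples within a sampling unit until the estimated class proportion is resolved at the desired confidence. The 95% Wald interval below is an assumed simplification; the paper's exact stopping criterion may differ.

import numpy as np

def label_until_confident(true_proportion, max_subsamples=50, half_width=0.15,
                          z=1.96, rng=np.random.default_rng(0)):
    # Incrementally "photo-interpret" sub-samples until the binomial confidence
    # interval on the majority-class proportion is narrower than half_width.
    labels = []
    for n in range(1, max_subsamples + 1):
        labels.append(rng.random() < true_proportion)  # label one more sub-sample
        p = np.mean(labels)
        if n >= 5 and z * np.sqrt(p * (1 - p) / n) <= half_width:
            break
    return np.mean(labels), n

# A univocal unit (nearly pure class) stops early; a mixed unit needs more labels.
print(label_until_confident(0.98))  # few sub-samples needed
print(label_until_confident(0.55))  # many sub-samples needed

This is how effort is spared on univocal cases and reallocated to ambiguous ones.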


Author(s):  
Adrian Richter ◽  
Julia Truthmann ◽  
Jean-François Chenot ◽  
Carsten Oliver Schmidt

(1) Background: Predicting chronic low back pain (LBP) is of clinical and economic interest, as LBP leads to disabilities and health service utilization. This study aims to build a competitive and interpretable prediction model; (2) Methods: We used clinical and claims data of 3837 participants of a population-based cohort study to predict future LBP consultations (ICD-10: M40.XX-M54.XX). Best subset selection (BSS) was applied in repeated random samples of training data (75% of the data); scoring rules were used to identify the best subset of predictors. The prediction accuracy of BSS was compared to random forest and support vector machines (SVM) in the validation data (25% of the data); (3) Results: The best subset comprised 16 out of 32 predictors. Previous occurrence of LBP increased the odds for future LBP consultations (odds ratio (OR) 6.91 [5.05; 9.45]), while concomitant diseases reduced the odds (1 vs. 0, OR: 0.74 [0.57; 0.98]; >1 vs. 0: 0.37 [0.21; 0.67]). The area under the curve (AUC) of BSS was acceptable (0.78 [0.74; 0.82]) and comparable with SVM (0.78 [0.74; 0.82]) and random forest (0.79 [0.75; 0.83]); (4) Conclusions: In terms of prediction accuracy, BSS proved competitive with established machine-learning approaches. Nonetheless, considerable misclassification is inherent and further refinements are required to improve predictions.
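
Best subset selection with a scoring rule can be sketched as an exhaustive search over predictor subsets, each scored on a logistic model, here by AIC. The small synthetic design below is illustrative; the study's 32 predictors and repeated-resampling scheme are not reproduced.

import itertools
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))                        # 5 candidate predictors (stand-ins)
logit_p = 1 / (1 + np.exp(-(1.2 * X[:, 0] - 0.8 * X[:, 2])))
y = (rng.random(300) < logit_p).astype(float)        # future-LBP-consultation stand-in

best = (np.inf, None)
for k in range(1, 6):
    for subset in itertools.combinations(range(5), k):
        model = sm.Logit(y, sm.add_constant(X[:, list(subset)])).fit(disp=0)
        best = min(best, (model.aic, subset))
print("best subset by AIC:", best[1])

With 32 predictors the full search is infeasible, which is why efficient BSS algorithms and repeated training samples are used in practice.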


2021 ◽  
Author(s):  
Dong Wang ◽  
JinBo Li ◽  
Yali Sun ◽  
Xianfei Ding ◽  
Xiaojuan Zhang ◽  
...  

Abstract Background: Although numerous studies are conducted every year on how to reduce the fatality rate associated with sepsis, it is still a major challenge faced by patients, clinicians, and medical systems worldwide. Early identification and prediction of patients at risk of sepsis and its adverse outcomes are critical. We aimed to develop an artificial intelligence algorithm that can predict sepsis early. Methods: This was a secondary analysis of an observational cohort study from the Intensive Care Unit of the First Affiliated Hospital of Zhengzhou University. A total of 4449 infected patients were randomly assigned to the development and validation data sets at a ratio of 4:1. After extracting electronic medical record data, a set of 55 features (variables) was calculated and passed to the random forest algorithm to predict the onset of sepsis. Results: The pre-procedure clinical variables were used to build a prediction model from the training data set using the random forest machine learning method; 5-fold cross-validation was used to evaluate the prediction accuracy of the model. Finally, we tested the model using the validation data set. The area under the receiver operating characteristic (ROC) curve (AUC) obtained by the model was 0.91, the sensitivity was 87%, and the specificity was 89%. Conclusions: The newly established model can accurately predict the onset of sepsis in ICU patients in clinical settings as early as possible. Prospective studies are necessary to determine the clinical utility of the proposed sepsis prediction model.
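
The modeling steps described, a 4:1 development/validation split, a random forest over 55 features, 5-fold cross-validation on the development set, and a final ROC evaluation, can be sketched as follows; the data are synthetic stand-ins, not patient records.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(4449, 55))        # 55 clinical features (stand-ins)
y = rng.integers(0, 2, size=4449)      # sepsis onset indicator (stand-in)

X_dev, X_val, y_dev, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

rf = RandomForestClassifier(n_estimators=200, random_state=0)
print("5-fold CV AUC:", cross_val_score(rf, X_dev, y_dev, cv=5,
                                        scoring="roc_auc").mean().round(2))

rf.fit(X_dev, y_dev)
val_auc = roc_auc_score(y_val, rf.predict_proba(X_val)[:, 1])
print("validation AUC:", round(val_auc, 2))  # cf. the paper's AUC = 0.91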

