Validation of ground truth fire debris classification by supervised machine learning

2021 ◽  
Vol 26 ◽  
pp. 100358
Author(s):  
Michael E. Sigman ◽  
Mary R. Williams ◽  
Nicholas Thurn ◽  
Taylor Wood


SPE Journal ◽
2020 ◽  
Vol 25 (05) ◽  
pp. 2778-2800 ◽  
Author(s):  
Harpreet Singh ◽  
Yongkoo Seol ◽  
Evgeniy M. Myshakin

Summary The application of specialized machine learning (ML) in petroleum engineering and geoscience is gaining increasing attention as a way to develop rapid and efficient substitutes for existing methods. Existing ML-based studies that use well logs have two inherent limitations. The first is that they start with one predefined combination of well logs, which by default assumes that the chosen combination will give the best predictive outcome, although the variation in accuracy obtained with different combinations of well logs can be substantial. The second is that most studies apply unsupervised learning (UL) to classification problems, where it underperforms nearly all supervised learning (SL) algorithms by a substantial margin. In this context, this study investigates a variety of UL and SL algorithms applied to multiple well-log combinations (WLCs) to automate the traditional workflow of well-log processing and classification, including an optimization step to achieve the best output. The workflow begins by processing the measured well logs, which includes developing different combinations of measured well logs and their physics-motivated augmentations, followed by removal of potential outliers from the input WLCs. Reservoir lithology with four different rock types is investigated using eight UL and seven SL algorithms in two case studies. The results from the two case studies are used to identify the optimal set of well logs and the ML algorithm whose predicted reservoir lithology best matches the ground truth. The workflow is demonstrated on two wells from two different reservoirs on the Alaska North Slope to distinguish four rock types along the well (brine-dominated sand, hydrate-dominated sand, shale, and others/mixed compositions). The results show that the automated workflow can recover the ground-truth lithology with up to 80% accuracy with UL and up to 90% accuracy with SL, using six routine well logs [vp, vs, ρb, ϕneut, Rt, gamma ray (GR)], a significant improvement over the less-than-70% accuracy reported in the current state of the art.
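The core of this workflow, scoring supervised and unsupervised classifiers over every candidate well-log combination against a ground-truth lithology column, can be illustrated with a minimal sketch. The synthetic data, the scikit-learn estimators, and the cluster-to-class mapping below are assumptions made for illustration, not the authors' implementation.

from itertools import combinations

import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 600
log_names = ["vp", "vs", "rho_b", "phi_neut", "Rt", "GR"]
X = rng.normal(size=(n, 6))                               # synthetic stand-ins for the six routine logs
y = (X[:, 0] + 0.5 * X[:, 2] > 0).astype(int) + 2 * (X[:, 5] > 0)  # four synthetic rock types

def cluster_accuracy(cluster_labels, y_true):
    """Map each cluster to its majority ground-truth class, then score plain accuracy."""
    mapped = np.empty_like(y_true)
    for c in np.unique(cluster_labels):
        members = y_true[cluster_labels == c]
        mapped[cluster_labels == c] = np.bincount(members).argmax()
    return float((mapped == y_true).mean())

results = {}
for k in range(3, 7):                                     # well-log combinations of size 3 to 6
    for cols in combinations(range(6), k):
        Xc = X[:, cols]
        sl = cross_val_score(RandomForestClassifier(n_estimators=50), Xc, y, cv=5).mean()
        ul = cluster_accuracy(KMeans(n_clusters=4, n_init=10).fit_predict(Xc), y)
        results[tuple(log_names[i] for i in cols)] = (round(sl, 3), round(ul, 3))

best = max(results, key=lambda c: results[c][0])
print("best WLC by supervised accuracy:", best, results[best])

Because an unsupervised clustering carries no class labels of its own, the sketch maps each cluster to the majority ground-truth class among its members before scoring, which is one common way to compare UL output against a labeled lithology column.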


PLoS ONE ◽  
2020 ◽  
Vol 15 (12) ◽  
pp. e0241696
Author(s):  
Xubo Leng ◽  
Margot Wohl ◽  
Kenichi Ishii ◽  
Pavan Nayak ◽  
Kenta Asahina

Automated quantification of behavior is increasingly prevalent in neuroscience research. Human judgments can influence machine-learning-based behavior classification at multiple steps in the process, for both supervised and unsupervised approaches. Such steps include the design of the algorithm for machine learning, the methods used for animal tracking, the choice of training images, and the benchmarking of classification outcomes. However, how these design choices contribute to the interpretation of automated behavioral classifications has not been extensively characterized. Here, we quantify the effects of experimenter choices on the outputs of automated classifiers of Drosophila social behaviors. Drosophila behaviors contain a considerable degree of variability, which was reflected in the confidence levels associated with both human and computer classifications. We found that a diversity of sex combinations and tracking features was important for robust performance of the automated classifiers. In particular, features concerning the relative position of flies contained useful information for training a machine-learning algorithm. These observations shed light on the importance of human influence on tracking algorithms, the selection of training images, and the quality of annotated sample images used to benchmark the performance of a classifier (the ‘ground truth’). Evaluation of these factors is necessary for researchers to accurately interpret behavioral data quantified by a machine-learning algorithm and to further improve automated classifications.
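One way to make the effect of such design choices concrete is a simple feature-ablation comparison: train the same classifier with and without the relative-position features and compare cross-validated agreement with the human annotations. The sketch below uses synthetic data and assumed feature names (speed, wing angle, inter-fly distance, facing angle); it illustrates the idea only and is not the authors' pipeline.

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n = 2000
# assumed per-frame features: speed, wing_angle, inter_fly_distance, facing_angle
X = rng.normal(size=(n, 4))
# synthetic "human annotation" that depends mostly on the relative-position features
y = (X[:, 2] + 0.5 * X[:, 3] + rng.normal(scale=0.5, size=n) > 0).astype(int)

full = cross_val_score(GradientBoostingClassifier(), X, y, cv=5).mean()
no_rel = cross_val_score(GradientBoostingClassifier(), X[:, :2], y, cv=5).mean()
print(f"all features: {full:.2f}   without relative-position features: {no_rel:.2f}")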


2021 ◽  
pp. 1-32
Author(s):  
R. Stuart Geiger ◽  
Dominique Cope ◽  
Jamie Ip ◽  
Marsha Lotosh ◽  
Aayush Shah ◽  
...  

Abstract Supervised machine learning, in which models are automatically derived from labeled training data, is only as good as the quality of that data. This study builds on prior work that investigated to what extent ‘best practices’ around labeling training data were followed in applied ML publications within a single domain (social media platforms). In this paper, we expand that analysis by studying publications that apply supervised ML across a far broader spectrum of disciplines, focusing on human-labeled data. We report to what extent a random sample of ML application papers across disciplines gives specific details about whether best practices were followed, while acknowledging that a greater range of application fields necessarily produces a greater diversity of labeling and annotation methods. Because much of machine learning research and education focuses only on what is done once a “ground truth” or “gold standard” of training data is available, it is especially relevant to discuss the equally important question of whether such data is reliable in the first place. This determination becomes increasingly complex when applied to a variety of specialized fields, as labeling can range from a task requiring little-to-no background knowledge to one that must be performed by someone with career expertise. Peer Review https://publons.com/publon/10.1162/qss_a_00144


2021 ◽  
Vol 7 ◽  
pp. e623
Author(s):  
Davide Chicco ◽  
Matthijs J. Warrens ◽  
Giuseppe Jurman

Regression analysis makes up a large part of supervised machine learning and consists of predicting a continuous dependent target from a set of other predictor variables. The difference between binary classification and regression is in the target range: in binary classification the target can take only two values (usually encoded as 0 and 1), while in regression the target can take any value in a continuous range. Although regression analysis has been employed in a huge number of machine learning studies, no consensus has been reached on a single, unified, standard metric to assess the results of the regression itself. Many studies employ the mean square error (MSE) and its rooted variant (RMSE), or the mean absolute error (MAE) and its percentage variant (MAPE). Although useful, these metrics share a common drawback: since their values can range between zero and +infinity, a single value does not say much about the performance of the regression with respect to the distribution of the ground truth elements. In this study, we focus on two metrics that generate a high score only if the majority of the elements of a ground truth group have been correctly predicted: the coefficient of determination (also known as R-squared or R2) and the symmetric mean absolute percentage error (SMAPE). After showing their mathematical properties, we report a comparison between R2 and SMAPE in several use cases and in two real medical scenarios. Our results demonstrate that the coefficient of determination (R-squared) is more informative and truthful than SMAPE, and does not have the interpretability limitations of MSE, RMSE, MAE and MAPE. We therefore suggest the usage of R-squared as the standard metric to evaluate regression analyses in any scientific domain.
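As a concrete reference, a minimal sketch of the two metrics follows, using one common definition of SMAPE (several variants exist in the literature); the numbers are illustrative only.

import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 minus residual sum of squares over total sum of squares."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def smape(y_true, y_pred):
    """Symmetric mean absolute percentage error, expressed in percent."""
    denom = (np.abs(y_true) + np.abs(y_pred)) / 2.0
    return 100.0 * np.mean(np.abs(y_pred - y_true) / denom)

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.8, 5.4, 2.9, 6.1])
print(f"R2 = {r_squared(y_true, y_pred):.3f}, SMAPE = {smape(y_true, y_pred):.1f}%")

Unlike MSE, RMSE, MAE and MAPE, R2 is bounded above by 1 and can become negative for arbitrarily poor fits, which is part of what makes a single R2 value interpretable relative to the spread of the ground truth.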


2020 ◽  
Vol 154 (Supplement_1) ◽  
pp. S19-S19
Author(s):  
Bradley Drumheller ◽  
Mohamed Amgad ◽  
Ahmed Aljudi ◽  
Elliott Burdette ◽  
Leila Kutob ◽  
...  

Abstract Newer data suggest that double expression of MYC and BCL2 proteins (DE) evaluated by quantitative immunohistochemistry (qIHC) may be a powerful marker of worse prognosis in diffuse large B cell lymphoma (DLBCL). Testing for DE status, defined as >40% MYC+ and >50% BCL2+ tumor cells, is recommended in the WHO 2016 classification, and clinical trials are using DE scoring to assign therapy arms. However, other data suggest that significant variability in manual DE scoring diminishes the predictive value. Error sources include high interobserver variability (IOV) associated with field choice, discrimination of tumor immunoreactivity from adjacent non-neoplastic cells, cell-to-cell variability in staining intensity, crush artifacts and necrosis. Thus, there is a need for standardized, reproducible approaches for DE scoring by qIHC. To address this need, we have begun developing a novel machine-learning approach to analyze IHC digital pathology whole-slide images, focusing initially on MYC IHC. Digital whole-slide images (400x) of 22 DLBCL cases were uploaded to a web-based annotation platform. Using all cases, one annotator created 138 regions of interest (ROIs), each containing approximately 200 nucleated cells and representing a variety of tissue types. Eight pathologists were assigned the same 10 ROIs in which to annotate all nuclei, from which ground-truth seed nucleus labels (location, classification) were created for a validation set. Nuclei were classified as “tumor-positive”, “tumor-negative”, “non-tumor-positive”, “non-tumor-negative”, or “unknown”. This generated a set of 15,792 annotations with 1974 +/- 272 (Avg+/-STD) labels/annotator. Agglomerative hierarchical clustering afforded the creation of 2299 ground-truth seed locations. A maximum diameter of 3 mm/cluster was set by visual inspection of annotations. Of these seed locations, 1041 (45%) were detected by 8/8 annotators and, on average, 6/8 agreed on class. The 302 +/- 72 (Avg+/-STD) “tumor-positive” labels per annotator generated 382 seed locations, 178 (47%) of which were detected by 8/8 annotators, with an average of 7.5/8 agreeing on class. The 286 +/- 168 (Avg+/-STD) “tumor-negative” labels per annotator generated 336 seed locations, 195 (58%) of which were detected by 8/8 annotators, with an average of 5/8 agreeing on class. Among all classes, the “tumor-positive” label displayed the best overall label agreement, whereas the “tumor-negative” label yielded a similar localization rate but lower class agreement. These promising early findings provide a novel basis for quantifying IOV and utilizing multi-observer agreement to create a ground-truth validation set for a supervised machine learning approach to qIHC. Future efforts will make use of these data to optimize the validation set by rationally determining the number of additional annotations required, optimizing the number of annotators per ROI required, devising an adaptive approach to nuclear clustering based on nuclear density, and utilizing the additional 31,422 annotations in hand from all annotators as a robust algorithm training set.
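The consolidation step described above, collapsing the annotators' point labels into ground-truth seed nuclei by agglomerative hierarchical clustering and then measuring detection and class agreement, can be sketched as follows. The synthetic coordinates, the distance threshold, and the acceptance rule are assumptions for illustration, not the study's actual parameters.

from collections import Counter

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(2)
n_seeds, n_annot = 50, 8
centers = rng.uniform(0, 1000, size=(n_seeds, 2))         # true nucleus centres in one ROI
true_class = rng.integers(0, 4, size=n_seeds)

rows = []                                                  # (x, y, annotator_id, class_id) per annotation
for a in range(n_annot):
    for s in range(n_seeds):
        if rng.random() < 0.8:                             # an annotator may miss a nucleus
            xy = centers[s] + rng.normal(scale=1.0, size=2)
            cls = true_class[s] if rng.random() < 0.85 else rng.integers(0, 4)
            rows.append([xy[0], xy[1], a, cls])
rows = np.array(rows, dtype=float)

# complete linkage with a distance criterion approximates a maximum cluster diameter
Z = linkage(rows[:, :2], method="complete")
labels = fcluster(Z, t=5.0, criterion="distance")

accepted = 0
for c in np.unique(labels):
    members = rows[labels == c]
    n_detect = len(set(members[:, 2].astype(int)))         # distinct annotators who found this seed
    _, n_agree = Counter(members[:, 3].astype(int)).most_common(1)[0]
    if n_detect == n_annot and n_agree >= 6:               # seen by all annotators, majority class agreement
        accepted += 1
print(f"{accepted} high-confidence seeds out of {labels.max()} candidate clusters")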


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Hemaxi Narotamo ◽  
Maria Sofia Fernandes ◽  
Ana Margarida Moreira ◽  
Soraia Melo ◽  
Raquel Seruca ◽  
...  

Abstract The cell nucleus is a tightly regulated organelle and its architectural structure is dynamically orchestrated to maintain normal cell function. Indeed, fluctuations in nuclear size and shape are known to occur during the cell cycle, and alterations in nuclear morphology are also hallmarks of many diseases including cancer. Regrettably, automated reliable tools for cell cycle staging at the single cell level using in situ images are still limited. It is therefore urgent to establish accurate strategies combining bioimaging with high-content image analysis for a bona fide classification. In this study we developed a supervised machine learning method for interphase cell cycle staging of individual adherent cells using in situ fluorescence images of nuclei stained with DAPI. A Support Vector Machine (SVM) classifier operated over normalized nuclear features computed from more than 3500 DAPI-stained nuclei. Molecular ground truth labels were obtained by automatic image processing using fluorescent ubiquitination-based cell cycle indicator (Fucci) technology. An average F1-score of 87.7% was achieved with this framework. Furthermore, the method was validated on distinct cell types, reaching recall values higher than 89%. Our method is a robust approach to identify cells in G1 or S/G2 at the individual level, with implications in research and clinical applications.
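A minimal sketch of the classification step, an SVM trained on normalized nuclear features with Fucci-derived labels as ground truth, is shown below. The feature names, the synthetic data and the scikit-learn pipeline are assumptions for illustration rather than the published implementation.

import numpy as np
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(3)
n = 3500
# assumed nuclear features: area, total DAPI, mean DAPI, eccentricity, solidity
X = rng.normal(size=(n, 5))
# synthetic Fucci-derived ground truth: 0 = G1, 1 = S/G2
y = (X[:, 1] + 0.3 * X[:, 0] + rng.normal(scale=0.7, size=n) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0, stratify=y)
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
clf.fit(X_tr, y_tr)
print(f"F1 (G1 vs S/G2): {f1_score(y_te, clf.predict(X_te)):.3f}")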


2021 ◽  
Author(s):  
Jason Daniel Marshall ◽  
Francis J. Yammarino ◽  
Srikanth Parameswaran ◽  
Minyoung Cheong

Increased computing power and greater access to online data have led to rapid growth in the use of computer-aided text analysis (CATA) and machine learning methods. Using “big data”, researchers have advanced not only new streams of research but also new research methodologies. Noting this trend, and simultaneously recognizing the value of traditional research methods, we lay out a methodology that bridges old and new approaches to operationalize old constructs in new ways. With a combination of web scraping, CATA, and supervised machine learning using ground truth data, we train a model to predict CIP (Charismatic-Ideological-Pragmatic) categorical leadership styles from running text. To illustrate this method, we apply the model to classify U.S. state governors’ COVID-19 press briefings according to their CIP leadership style. In addition, we demonstrate the content and convergent validity of the method.
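The final modelling step, ground-truth-labelled text vectorised and fed to a supervised classifier that assigns a CIP style to unseen passages, can be sketched as follows. The toy sentences, the TF-IDF representation and the logistic regression classifier are assumptions for illustration; the paper's actual feature set and model may differ.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# toy examples of ground-truth-labelled passages, one per CIP style
train_texts = [
    "we will build a bold new future together",
    "our cause is rooted in long-held principles",
    "here is the plan, step by step, with the data",
]
train_labels = ["charismatic", "ideological", "pragmatic"]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                      LogisticRegression(max_iter=1000))
model.fit(train_texts, train_labels)

# classify an unseen passage, e.g. a line from a press briefing
print(model.predict(["today we announce a detailed phased reopening plan"]))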


Author(s):  
Michael Smith ◽  
Stefan Cronjaeger ◽  
Navid Ershad ◽  
Randy Nickle ◽  
Matthias Peussner

Effective integrity management of a corroded pipeline requires a significant quantity of data. Common data sources include in-line inspection (ILI), process monitoring, and external surveys. The key challenge for an integrity engineer is to leverage the data to understand the level of corrosion activity along the pipeline route, and to make optimal decisions on future repair, mitigation and monitoring. This practice of gaining business insights from historical datasets is often referred to as ‘data analytics’. In this paper, a single application of data analytics is investigated: improving the estimation of corrosion growth rates (CGRs) from ILI data. When two or more sets of ILI data are available for the same pipeline, a process known as ‘box matching’ is typically used to estimate CGRs. Corresponding feature ‘boxes’ are linked between the two ILIs and a population of CGRs is generated based on changes in reported depth. While this is a well-established technique, there are uncertainties related to ILI sizing, detection limitations, and data censoring. Great care is required if these uncertain CGRs are used to predict future pipeline integrity. A superior technique is ‘signal matching’, which involves the direct alignment, normalization and comparison of magnetic flux leakage (MFL) signals. This delivers CGRs with a higher accuracy than box matching. However, signal matching is not always feasible (e.g., when conducting a cross-vendor or cross-technology comparison). When box matching is the only option for a pipeline, there is great value in understanding how the box matching CGRs can be improved so that they more closely resemble those from signal matching. This limits the extent to which uncertainties are propagated into any subsequent analyses, such as repair plan generation or remaining life assessment. Given their relative accuracy, signal matching CGRs can be utilized as a ‘ground truth’ against which box matching results can be validated. This is analogous to the ILI verification process, where in-field measurements (e.g., with a laser scan) are used to validate feature depths reported by an ILI. By extension, a model to estimate CGRs following a box matching analysis can be trained with CGRs from a signal matching analysis, using supervised machine learning. The outcome is an enhanced output from box matching, which more closely resembles the true state of corrosion growth in a pipeline. Through testing on real pipeline data, it is shown that this new technique has the potential to improve pipeline integrity management decisions and support economical, safe and compliant operation.
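The training idea, treating signal-matching CGRs as ground truth and learning a correction for box-matching CGRs together with auxiliary ILI features, can be sketched as below. The synthetic data, the feature choices and the gradient-boosted regressor are assumptions for illustration, not the paper's actual model or dataset.

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
n = 5000
box_cgr = rng.gamma(2.0, 0.05, size=n)                    # mm/yr from box matching (noisy)
depth_pct = rng.uniform(5, 60, size=n)                    # reported depth, % wall thickness
length_mm = rng.uniform(5, 100, size=n)                   # reported feature length
# synthetic stand-in for the signal-matching 'ground truth' growth rate
signal_cgr = 0.7 * box_cgr + 0.001 * depth_pct + rng.normal(scale=0.01, size=n)

X = np.column_stack([box_cgr, depth_pct, length_mm])
X_tr, X_te, y_tr, y_te = train_test_split(X, signal_cgr, test_size=0.3, random_state=0)

model = GradientBoostingRegressor().fit(X_tr, y_tr)
print(f"MAE, raw box matching : {mean_absolute_error(y_te, X_te[:, 0]):.4f}")
print(f"MAE, calibrated       : {mean_absolute_error(y_te, model.predict(X_te)):.4f}")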


2021 ◽  
Vol 11 (3-4) ◽  
pp. 1-38
Author(s):  
Rita Sevastjanova ◽  
Wolfgang Jentner ◽  
Fabian Sperrle ◽  
Rebecca Kehlbeck ◽  
Jürgen Bernard ◽  
...  

Linguistic insight in the form of high-level relationships and rules in text builds the basis of our understanding of language. However, the data-driven generation of such structures often lacks labeled resources that can be used as training data for supervised machine learning. The creation of such ground-truth data is a time-consuming process that often requires domain expertise to resolve text ambiguities and characterize linguistic phenomena. Furthermore, the creation and refinement of machine learning models are often challenging for linguists, as the models are often complex, opaque, and difficult to understand. To tackle these challenges, we present a visual analytics technique for interactive data labeling that applies concepts from gamification and explainable Artificial Intelligence (XAI) to support complex classification tasks. The visual-interactive labeling interface promotes the creation of effective training data. Visual explanations of learned rules unveil the decisions of the machine learning model and support iterative and interactive optimization. The gamification-inspired design guides the user through the labeling process and provides feedback on the model performance. As an instance of the proposed technique, we present QuestionComb, a workspace tailored to the task of question classification (i.e., information-seeking vs. non-information-seeking questions). Our evaluation studies confirm that gamification concepts are beneficial to engage users through continuous feedback, offering an effective visual analytics technique when combined with active learning and XAI.
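A bare-bones sketch of the active-learning loop that an interface like this can sit on top of, where the model proposes its most uncertain unlabeled questions for the user to label next, is given below. The toy question pool, the TF-IDF plus logistic regression model, and the automatic stand-in for the human labeler are all assumptions for illustration.

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

pool = ["how do I reset my password", "great talk, thanks a lot",
        "where can I find the slides", "that was really fun",
        "what time does the session start", "congrats on the release"]
labeled_idx = [0, 1]                          # seed labels provided by the user
labels = {0: "info-seeking", 1: "non-info-seeking"}

vec = TfidfVectorizer().fit(pool)
X = vec.transform(pool)

for _ in range(2):                            # two interactive labeling rounds
    clf = LogisticRegression(max_iter=1000).fit(X[labeled_idx],
                                                [labels[i] for i in labeled_idx])
    uncertainty = 1.0 - clf.predict_proba(X).max(axis=1)
    uncertainty[labeled_idx] = -1.0           # never re-query already labeled items
    query = int(np.argmax(uncertainty))
    print("please label:", pool[query])
    # stand-in for the human response; a real interface would collect this label
    is_question = any(w in pool[query] for w in ("how", "where", "what"))
    labels[query] = "info-seeking" if is_question else "non-info-seeking"
    labeled_idx.append(query)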


2020 ◽  
Vol 14 (2) ◽  
pp. 140-159
Author(s):  
Anthony-Paul Cooper ◽  
Emmanuel Awuni Kolog ◽  
Erkki Sutinen

This article builds on previous research around the exploration of the content of church-related tweets. It does so by exploring whether the qualitative thematic coding of such tweets can, in part, be automated by the use of machine learning. It compares three supervised machine learning algorithms to understand how useful each algorithm is at a classification task, based on a dataset of human-coded church-related tweets. The study finds that one such algorithm, Naïve Bayes, performs better than the other algorithms considered, returning Precision, Recall and F-measure values that each exceed an acceptable threshold of 70%. This has far-reaching consequences at a time when the high volume of social media data, in this case Twitter data, means that the resource-intensity of manual coding approaches can act as a barrier to understanding how the online community interacts with, and talks about, church. The findings presented in this article offer a way forward for scholars of digital theology to better understand the content of online church discourse.
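A minimal sketch of this kind of comparison, a Naïve Bayes text classifier scored against human-coded labels with precision, recall and F-measure, follows. The toy tweets, the labels and the scikit-learn pipeline are assumptions for illustration, not the study's dataset or code.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import precision_recall_fscore_support
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# toy human-coded tweets: 1 = church-related theme, 0 = other
train_tweets = ["loved the sermon this morning",
                "church coffee morning at ten",
                "volunteers needed for sunday service",
                "new worship album out now",
                "traffic is terrible again today",
                "cannot wait for the football tonight"]
train_labels = [1, 1, 1, 1, 0, 0]
test_tweets = ["great sermon and worship today", "the football was brilliant tonight"]
test_labels = [1, 0]

model = make_pipeline(CountVectorizer(), MultinomialNB()).fit(train_tweets, train_labels)
p, r, f, _ = precision_recall_fscore_support(test_labels, model.predict(test_tweets),
                                             average="binary", zero_division=0)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}  (acceptable threshold: 0.70)")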

