Reporting of prognostic clinical prediction models based on machine learning methods in oncology needs to be improved

Author(s):  
Paula Dhiman ◽  
Jie Ma ◽  
Constanza Andaur Navarro ◽  
Benjamin Speich ◽  
Garrett Bullock ◽  
...  
2021 ◽  
Vol 10 (4) ◽  
pp. 199
Author(s):  
Francisco M. Bellas Aláez ◽  
Jesus M. Torres Palenzuela ◽  
Evangelos Spyrakos ◽  
Luis González Vilas

This work presents new prediction models based on recent developments in machine learning methods, such as Random Forest (RF) and AdaBoost, and compares them with more classical approaches, i.e., support vector machines (SVMs) and neural networks (NNs). The models predict Pseudo-nitzschia spp. blooms in the Galician Rias Baixas. This work builds on a previous study by the authors (doi.org/10.1016/j.pocean.2014.03.003) but uses an extended database (from 2002 to 2012) and new algorithms. Our results show that RF and AdaBoost provide better prediction results compared to SVMs and NNs, as they show improved performance metrics and a better balance between sensitivity and specificity. Classical machine learning approaches show higher sensitivities, but at a cost of lower specificity and higher percentages of false alarms (lower precision). These results seem to indicate a greater adaptation of new algorithms (RF and AdaBoost) to unbalanced datasets. Our models could be operationally implemented to establish a short-term prediction system.
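
The trade-off described above between sensitivity, specificity, and precision can be illustrated with a minimal sketch. The function name `binary_metrics` and all counts below are hypothetical and chosen only to mimic an unbalanced dataset; they are not results from the study.

```python
def binary_metrics(tp, fp, tn, fn):
    """Return sensitivity, specificity and precision from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # share of actual events that were predicted
        "specificity": tn / (tn + fp),  # share of non-events correctly left unflagged
        "precision": tp / (tp + fp),    # share of alarms that were real events
    }


# Hypothetical counts: 20 bloom events and 180 non-events (unbalanced classes).
high_sensitivity_model = binary_metrics(tp=18, fp=40, tn=140, fn=2)
balanced_model = binary_metrics(tp=15, fp=10, tn=170, fn=5)

for name, metrics in (("high sensitivity", high_sensitivity_model),
                      ("balanced", balanced_model)):
    print(name, {key: round(value, 3) for key, value in metrics.items()})
```

In this invented example the first model catches more events but raises many more false alarms, which is the pattern the abstract attributes to the classical approaches.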


2020 ◽  
Vol 4 (Supplement_1) ◽  
pp. 268-269
Author(s):  
Jaime Speiser ◽  
Kathryn Callahan ◽  
Jason Fanning ◽  
Thomas Gill ◽  
Anne Newman ◽  
...  

Abstract Advances in computational algorithms and the availability of large datasets with clinically relevant characteristics provide an opportunity to develop machine learning prediction models to aid in diagnosis, prognosis, and treatment of older adults. Some studies have employed machine learning methods for prediction modeling, but skepticism of these methods remains due to lack of reproducibility and difficulty understanding the complex algorithms behind models. We aim to provide an overview of two common machine learning methods: decision tree and random forest. We focus on these methods because they provide a high degree of interpretability. We discuss the underlying algorithms of decision tree and random forest methods and present a tutorial for developing prediction models for serious fall injury using data from the Lifestyle Interventions and Independence for Elders (LIFE) study. Decision tree is a machine learning method that produces a model resembling a flow chart. Random forest consists of a collection of many decision trees whose results are aggregated. In the tutorial example, we discuss evaluation metrics and interpretation for these models. Illustrated in data from the LIFE study, prediction models for serious fall injury were moderate at best (area under the receiver operating curve of 0.54 for decision tree and 0.66 for random forest). Machine learning methods may offer improved performance compared to traditional models for modeling outcomes in aging, but their use should be justified and output should be carefully described. Models should be assessed by clinical experts to ensure compatibility with clinical practice.
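
The area under the receiver operating characteristic curve reported above can be read as the probability that a randomly chosen case with the outcome receives a higher predicted risk than a randomly chosen case without it. The sketch below computes it that way; the function name `roc_auc` and the toy labels and scores are assumptions for illustration, not data from the LIFE study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Toy example: 1 = outcome occurred, 0 = it did not; scores are predicted risks.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(toy_labels, toy_scores))
```

A value of 0.5 corresponds to ranking no better than chance, which is why an AUC of 0.54 is described as moderate at best.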


Minerals ◽  
2021 ◽  
Vol 11 (6) ◽  
pp. 601
Author(s):  
Nelson K. Dumakor-Dupey ◽  
Sampurna Arya ◽  
Ankit Jha

Rock fragmentation in the mining and construction industries is widely achieved using the drilling and blasting technique. The technique remains the most effective and efficient means of breaking down rock mass into smaller pieces. However, apart from its intended purpose of rock breakage, throw, and heave, blasting operations generate adverse impacts, such as ground vibration, airblast, flyrock, fumes, and noise, that have significant operational and environmental implications for mining activities. Consequently, blast impact studies are conducted to determine an optimum blast design that can maximize the desirable impacts and minimize the undesirable ones. To achieve this objective, several empirical models for blast impact estimation have been developed. However, despite being the industry benchmark, empirical model results are based on a limited number of the factors affecting the outcomes of a blast. As a result, modern-day researchers are employing machine learning (ML) techniques for blast impact prediction. The ML approach can incorporate several factors affecting the outcomes of a blast, and therefore, it is preferred over empirical and other statistical methods. This paper reviews the various blast impacts and their prediction models with a focus on empirical and machine learning methods. The details of the prediction methods for various blast impacts, including their applications, advantages, and limitations, are discussed. The literature reveals that the machine learning methods are better predictors compared to the empirical models. However, we observed that presently these ML models are mainly applied in academic research.


2021 ◽  
pp. postgradmedj-2020-139352
Author(s):  
Simon Allan ◽  
Raphael Olaiya ◽  
Rasan Burhan

Cardiovascular disease (CVD) is one of the leading causes of death across the world. CVD can lead to angina, heart attacks, heart failure, strokes and, eventually, death, among many other serious conditions. Early intervention in those at a higher risk of developing CVD, typically with statin treatment, leads to better health outcomes. For this reason, clinical prediction models (CPMs) have been developed to identify those at a high risk of developing CVD so that treatment can begin at an earlier stage. Currently, CPMs are built around statistical analysis of factors linked to developing CVD, such as body mass index and family history. The emerging field of machine learning (ML) in healthcare, using computer algorithms that learn from a dataset without explicit programming, has the potential to outperform the CPMs available today. ML has already shown exciting progress in the detection of skin malignancies, bone fractures and many other medical conditions. In this review, we will analyse and explain the CPMs currently in use with comparisons to their developing ML counterparts. We have found that although the newest non-ML CPMs are effective, ML-based approaches consistently outperform them. However, improvements to the literature need to be made before ML is implemented in place of current CPMs.


2021 ◽  
Vol 36 (Supplement_1) ◽  
Author(s):  
Enrico Favaro ◽  
Roberta Lazzarin ◽  
Daniela Cremasco ◽  
Erika Pierobon ◽  
Marta Guizzo ◽  
...  

Abstract Background and Aims The modern development of the black box approach in clinical nephrology is inconceivable without a logical theory of renal function and a comprehension of the anatomical architecture of the kidney, in health and disease: this is the undisputed contribution offered by Malpighi, Oliver and Trueta starting from the seventeenth century. The machine learning model for the prediction of acute kidney injury, progression of renal failure and tubulointerstitial nephritis is a good example of how different kinds of knowledge about the kidney are an indispensable tool for the interpretation of the model itself. Method Historical data were collected from literature, textbooks, encyclopedias, scientific periodicals and laboratory experimental data concerning these three authors. Results The Italian Marcello Malpighi (1628-1694), born in Crevalcore near Bologna, was Professor of anatomy at Bologna, Pisa and Messina. The historic description of the pulmonary capillaries was made in his second epistle to Borelli, published in 1661 and entitled De pulmonibus, by means of the frog as “the microscope of nature” (Fig. 1). It is the first description of capillaries in any circulation. William Harvey, in De motu cordis in 1628 (the year of publication being the same as the year of birth of the Italian anatomist!), could not see the capillary vessels. This triumphant discovery would later serve for the recognition of the characteristic renal rete mirabile in the corpuscle of Malpighi, lying within the capsule of Bowman. Jean Redman Oliver (1889-1976), a pathologist born and raised in Northern California, was able to bridge the gap between the nephron and collecting system through meticulous dissections, hand drawn illustrations and experiments which underpin our current understanding of renal anatomy and physiology.
In the skillful lecture “When is the kidney not a kidney?” (1949) Oliver summarizes his far-sighted vision on renal physiology and disease in the following sentence: the Kidney in health, if you will, but the Nephrons in disease. This is because the “nephron”, like the “kidney”, is an abstraction that must be qualified in terms of its various parts, its cellular components and the molecular mechanisms involved in each discrete activity (Fig. 2). The Catalan surgeon Josep Trueta I Raspall (1897-1977) was born in the Poblenou neighborhood of Barcelona. His pioneering and visionary contribution on the changes in renal circulation in the pathogenesis of acute kidney injury was pivotal for the history of renal physiology. “The kidney has two potential circulatory circulations. Blood may pass either almost exclusively through one or other of two pathways, or to a varying degree through both”. (Studies of the Renal Circulation, published in 1947). This diversion of blood from the cortex to the less resistant medullary circulation is now known by the eponym Trueta shunt. Conclusion The black box approach to kidney diseases should be considered by practitioners as a further tool to help inform model updates in many clinical settings. The number of machine learning clinical prediction models being published is rising, as new fields of application are being explored in medicine (Fig. 3). A challenge in clinical nephrology is to explore the “kidney machine” during each therapeutic and diagnostic procedure. The intriguing relationship between the set of nephrological syndromes and kidney diseases cannot disregard the precious notions on the specific organization of the kidney microcirculation, the fruit of the many scientific contributions of Malpighi, Oliver and Trueta (Fig. 3).


2020 ◽  
Vol 18 (1) ◽  
Author(s):  
Kerry E. Poppenberg ◽  
Vincent M. Tutino ◽  
Lu Li ◽  
Muhammad Waqas ◽  
Armond June ◽  
...  

Abstract Background Intracranial aneurysms (IAs) are dangerous because of their potential to rupture. We previously found significant RNA expression differences in circulating neutrophils between patients with and without unruptured IAs and trained machine learning models to predict presence of IA using 40 neutrophil transcriptomes. Here, we aim to develop a predictive model for unruptured IA using neutrophil transcriptomes from a larger population and more robust machine learning methods. Methods Neutrophil RNA extracted from the blood of 134 patients (55 with IA, 79 IA-free controls) was subjected to next-generation RNA sequencing. In a randomly-selected training cohort (n = 94), the Least Absolute Shrinkage and Selection Operator (LASSO) selected transcripts, from which we constructed prediction models via 4 well-established supervised machine-learning algorithms (K-Nearest Neighbors, Random Forest, and Support Vector Machines with Gaussian and cubic kernels). We tested the models in the remaining samples (n = 40) and assessed model performance by receiver-operating-characteristic (ROC) curves. Real-time quantitative polymerase chain reaction (RT-qPCR) of 9 IA-associated genes was used to verify gene expression in a subset of 49 neutrophil RNA samples. We also examined the potential influence of demographics and comorbidities on model prediction. Results Feature selection using LASSO in the training cohort identified 37 IA-associated transcripts. Models trained using these transcripts had a maximum accuracy of 90% in the testing cohort. The testing performance across all methods had an average area under ROC curve (AUC) = 0.97, an improvement over our previous models. The Random Forest model performed best across both training and testing cohorts. RT-qPCR confirmed expression differences in 7 of 9 genes tested. Gene ontology and IPA network analyses performed on the 37 model genes reflected dysregulated inflammation, cell signaling, and apoptosis processes. 
In our data, demographics and comorbidities did not affect model performance. Conclusions We improved upon our previous IA prediction models based on circulating neutrophil transcriptomes by increasing sample size and by implementing LASSO and more robust machine learning methods. Future studies are needed to validate these models in larger cohorts and further investigate effect of covariates.
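
The random division of the 134 samples into a training cohort (n = 94) and a testing cohort (n = 40) described above can be sketched as follows. This is a minimal illustration under assumptions: the function name `split_train_test`, the seed, and the placeholder sample identifiers are hypothetical, and the abstract does not state how the authors' randomization was actually implemented.

```python
import random


def split_train_test(sample_ids, n_train, seed=0):
    """Shuffle a copy of the sample identifiers and split it into two cohorts."""
    shuffled = list(sample_ids)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    return shuffled[:n_train], shuffled[n_train:]


# Placeholder identifiers standing in for the 134 neutrophil RNA samples.
sample_ids = [f"sample_{i:03d}" for i in range(134)]
train_ids, test_ids = split_train_test(sample_ids, n_train=94)
print(len(train_ids), len(test_ids))
```

Feature selection (here, LASSO) would then be run on the training cohort only, so that the testing cohort stays untouched until model evaluation.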


Endocrine ◽  
2021 ◽  
Author(s):  
Olivier Zanier ◽  
Matteo Zoli ◽  
Victor E. Staartjes ◽  
Federica Guaraldi ◽  
Sofia Asioli ◽  
...  

Abstract Purpose Biochemical remission (BR), gross total resection (GTR), and intraoperative cerebrospinal fluid (CSF) leaks are important metrics in transsphenoidal surgery for acromegaly, and prediction of their likelihood using machine learning would be clinically advantageous. We aim to develop and externally validate clinical prediction models for outcomes after transsphenoidal surgery for acromegaly. Methods Using data from two registries, we develop and externally validate machine learning models for GTR, BR, and CSF leaks after endoscopic transsphenoidal surgery in acromegalic patients. For the model development a registry from Bologna, Italy was used. External validation was then performed using data from Zurich, Switzerland. Gender, age, prior surgery, as well as Hardy and Knosp classification were used as input features. Discrimination and calibration metrics were assessed. Results The derivation cohort consisted of 307 patients (43.3% male; mean [SD] age, 47.2 [12.7] years). GTR was achieved in 226 (73.6%) and BR in 245 (79.8%) patients. In the external validation cohort with 46 patients, 31 (75.6%) achieved GTR and 31 (77.5%) achieved BR. Area under the curve (AUC) at external validation was 0.75 (95% confidence interval: 0.59–0.88) for GTR, 0.63 (0.40–0.82) for BR, as well as 0.77 (0.62–0.91) for intraoperative CSF leaks. While prior surgery was the most important variable for prediction of GTR, age, and Hardy grading contributed most to the predictions of BR and CSF leaks, respectively. Conclusions Gross total resection, biochemical remission, and CSF leaks remain hard to predict, but machine learning offers potential in helping to tailor surgical therapy. We demonstrate the feasibility of developing and externally validating clinical prediction models for these outcomes after surgery for acromegaly and lay the groundwork for development of a multicenter model with more robust generalization.


2021 ◽  
Author(s):  
Cathy C. Westhues ◽  
Henner Simianer ◽  
Timothy M. Beissinger

We introduce the R-package learnMET, developed as a flexible framework to enable a collection of analyses on multi-environment trial (MET) breeding data with machine learning-based models. learnMET allows the combination of genomic information with environmental data such as climate and/or soil characteristics. Notably, the package offers the possibility of incorporating weather data from field weather stations, or can retrieve global meteorological datasets from a NASA database. Daily weather data can be aggregated in daily windows based on naive (for instance, daily windows with a fixed number of days) or phenological approaches. Different machine learning methods for genomic prediction are implemented, including gradient boosted trees, random forests, stacked ensemble models, and multi-layer perceptrons. These prediction models can be evaluated via a collection of cross-validation schemes that mimic typical scenarios encountered by plant breeders working with MET experimental data in a user-friendly way. The package is fully open source and accessible on GitHub.
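
The naive aggregation of daily weather data into windows with a fixed number of days can be sketched as below. This is an illustration in Python rather than R, and it does not use or reproduce the learnMET API; the function name `aggregate_fixed_windows`, the choice of the mean as the summary statistic, and the handling of a shorter final window are all assumptions.

```python
def aggregate_fixed_windows(daily_values, window_days):
    """Average daily values over consecutive windows of `window_days` days.

    A shorter trailing window is kept and averaged over the days it contains.
    """
    if window_days < 1:
        raise ValueError("window_days must be at least 1")
    means = []
    for start in range(0, len(daily_values), window_days):
        window = daily_values[start:start + window_days]
        means.append(sum(window) / len(window))
    return means


# Hypothetical daily mean temperatures for one environment, in 3-day windows.
daily_temperature = [1, 2, 3, 4, 5, 6, 7]
print(aggregate_fixed_windows(daily_temperature, window_days=3))
```

A phenological approach would instead place the window boundaries at crop development stages, so window lengths would differ between environments.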


2020 ◽  
Vol 12 (8) ◽  
pp. 3269
Author(s):  
Shinyoung Kwag ◽  
Daegi Hahm ◽  
Minkyu Kim ◽  
Seunghyun Eem

The objective of this study is to propose a model that can predict the seismic performance of slopes relatively accurately and efficiently by using machine learning methods. Probabilistic seismic fragility analyses of slopes had been carried out in other studies, and a closed-form equation for slope seismic performance was proposed through a multiple linear regression analysis. However, the traditional statistical linear regression analysis was limited in that it could not accurately represent such nonlinear slope seismic performance. To overcome this limit, in this study, we used three machine learning methods (i.e., support vector machine (SVM), artificial neural network (ANN), Gaussian process regression (GPR)) to generate prediction models of the slope seismic performance. The models obtained through the machine learning methods generally showed better performance compared to the models of the traditional statistical methods. The results of the SVM showed no significant performance difference compared with the results of the nonlinear regression analysis method, but the results based on the ANN and GPR showed a remarkable improvement in the prediction performance over the other models. Furthermore, this study confirmed that the GPR-based model predicted seismic performance values more accurately than the ANN-based model.


Author(s):  
Jaime Lynn Speiser ◽  
Kathryn E Callahan ◽  
Denise K Houston ◽  
Jason Fanning ◽  
Thomas M Gill ◽  
...  

Abstract Background Advances in computational algorithms and the availability of large datasets with clinically relevant characteristics provide an opportunity to develop machine learning prediction models to aid in diagnosis, prognosis, and treatment of older adults. Some studies have employed machine learning methods for prediction modeling, but skepticism of these methods remains due to lack of reproducibility and difficulty in understanding the complex algorithms that underlie models. We aim to provide an overview of two common machine learning methods: decision tree and random forest. We focus on these methods because they provide a high degree of interpretability. Method We discuss the underlying algorithms of decision tree and random forest methods and present a tutorial for developing prediction models for serious fall injury using data from the Lifestyle Interventions and Independence for Elders (LIFE) study. Results Decision tree is a machine learning method that produces a model resembling a flow chart. Random forest consists of a collection of many decision trees whose results are aggregated. In the tutorial example, we discuss evaluation metrics and interpretation for these models. Illustrated using data from the LIFE study, prediction models for serious fall injury were moderate at best (area under the receiver operating curve of 0.54 for decision tree and 0.66 for random forest). Conclusions Machine learning methods offer an alternative to traditional approaches for modeling outcomes in aging, but their use should be justified and output should be carefully described. Models should be assessed by clinical experts to ensure compatibility with clinical practice.
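
The idea that a random forest aggregates the results of many decision trees can be sketched with a simple majority vote over per-tree class predictions. Majority voting is a common aggregation rule for classification forests, but the abstract does not say which rule the tutorial uses, so treat this as an assumption; the function name `majority_vote` and the toy predictions are hypothetical.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Combine per-tree class predictions into one prediction per sample.

    `tree_predictions` is a list with one entry per tree; each entry is a list
    holding that tree's predicted class for every sample, in the same order.
    """
    n_samples = len(tree_predictions[0])
    aggregated = []
    for i in range(n_samples):
        votes = Counter(tree[i] for tree in tree_predictions)
        aggregated.append(votes.most_common(1)[0][0])  # most frequent class wins
    return aggregated


# Toy example: three trees, three participants, 1 = serious fall injury predicted.
toy_tree_predictions = [
    [1, 0, 1],
    [1, 1, 0],
    [0, 0, 1],
]
print(majority_vote(toy_tree_predictions))
```

Using an odd number of trees, as in this toy example, avoids tied votes in the two-class case.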

