Reporting of prognostic clinical prediction models based on machine learning methods in oncology needs to be improved

Machine Learning Methods Applied to the Prediction of Pseudo-nitzschia spp. Blooms in the Galician Rias Baixas (NW Spain)

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi10040199 ◽

2021 ◽

Vol 10 (4) ◽

pp. 199

Author(s):

Francisco M. Bellas Aláez ◽

Jesus M. Torres Palenzuela ◽

Evangelos Spyrakos ◽

Luis González Vilas

Keyword(s):

Machine Learning ◽

Performance Metrics ◽

Prediction Models ◽

Support Vector ◽

False Alarms ◽

Learning Approaches ◽

Learning Methods ◽

Machine Learning Methods ◽

Rías Baixas ◽

New Algorithms

This work presents new prediction models based on recent developments in machine learning methods, such as Random Forest (RF) and AdaBoost, and compares them with more classical approaches, i.e., support vector machines (SVMs) and neural networks (NNs). The models predict Pseudo-nitzschia spp. blooms in the Galician Rias Baixas. This work builds on a previous study by the authors (doi.org/10.1016/j.pocean.2014.03.003) but uses an extended database (from 2002 to 2012) and new algorithms. Our results show that RF and AdaBoost provide better prediction results than SVMs and NNs, as they show improved performance metrics and a better balance between sensitivity and specificity. Classical machine learning approaches show higher sensitivities, but at the cost of lower specificity and higher percentages of false alarms (lower precision). These results suggest that the newer algorithms (RF and AdaBoost) adapt better to unbalanced datasets. Our models could be operationally implemented to establish a short-term prediction system.
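
The trade-off described above (high sensitivity bought with low specificity and many false alarms) can be made concrete from a confusion matrix. The sketch below is illustrative only: the function name `binary_metrics` and the toy labels are invented, not data or code from the study, and "false-alarm share" is taken here as the fraction of positive predictions that were wrong (1 - precision).

```python
def binary_metrics(y_true, y_pred):
    """Return sensitivity, specificity, precision and false-alarm share for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # blooms correctly predicted
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-bloom cases correctly predicted
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed blooms

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
        # share of positive predictions that were false alarms (1 - precision)
        "false_alarm_share": ratio(fp, tp + fp),
    }


# Invented toy data: 2 bloom events among 10 observations (an unbalanced set).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
sensitive_model = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]  # catches every bloom, 3 false alarms
balanced_model = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]   # misses one bloom, no false alarms

print(binary_metrics(y_true, sensitive_model))
print(binary_metrics(y_true, balanced_model))
```

On unbalanced data the first model looks perfect on sensitivity alone (1.0), yet 60% of its bloom warnings are false alarms, which is why sensitivity and specificity are reported together.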

Machine Learning in Aging: An Example of Developing Prediction Models for Serious Fall Injury in Older Adults

Innovation in Aging ◽

10.1093/geroni/igaa057.859 ◽

2020 ◽

Vol 4 (Supplement_1) ◽

pp. 268-269

Author(s):

Jaime Speiser ◽

Kathryn Callahan ◽

Jason Fanning ◽

Thomas Gill ◽

Anne Newman ◽

...

Keyword(s):

Machine Learning ◽

Older Adults ◽

Random Forest ◽

Decision Tree ◽

Prediction Models ◽

Receiver Operating Curve ◽

Learning Methods ◽

Life Study ◽

Fall Injury ◽

Machine Learning Methods

Abstract Advances in computational algorithms and the availability of large datasets with clinically relevant characteristics provide an opportunity to develop machine learning prediction models to aid in diagnosis, prognosis, and treatment of older adults. Some studies have employed machine learning methods for prediction modeling, but skepticism of these methods remains due to lack of reproducibility and difficulty understanding the complex algorithms behind models. We aim to provide an overview of two common machine learning methods: decision tree and random forest. We focus on these methods because they provide a high degree of interpretability. We discuss the underlying algorithms of decision tree and random forest methods and present a tutorial for developing prediction models for serious fall injury using data from the Lifestyle Interventions and Independence for Elders (LIFE) study. Decision tree is a machine learning method that produces a model resembling a flow chart. Random forest consists of a collection of many decision trees whose results are aggregated. In the tutorial example, we discuss evaluation metrics and interpretation for these models. Illustrated using data from the LIFE study, prediction models for serious fall injury were moderate at best (area under the receiver operating characteristic curve of 0.54 for decision tree and 0.66 for random forest). Machine learning methods may offer improved performance compared to traditional models for modeling outcomes in aging, but their use should be justified and output should be carefully described. Models should be assessed by clinical experts to ensure compatibility with clinical practice.
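
The area under the ROC curve quoted above has a simple probabilistic reading: it is the chance that a randomly chosen case with the outcome receives a higher predicted risk than a randomly chosen case without it. An AUC of 0.5 means the scores rank cases no better than chance, which is why 0.54 is "moderate at best". The sketch below computes AUC directly from that definition; the function name and the scores are invented for illustration, not taken from the LIFE tutorial.

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a randomly chosen positive case is scored higher
    than a randomly chosen negative case (ties count one half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Invented example: 1 = serious fall injury, scores = predicted risk.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
print(roc_auc([0, 1, 0, 1], [0.5, 0.5, 0.5, 0.5]))   # 0.5: uninformative scores
```

The pairwise loop is quadratic in sample size; it is written this way for clarity, and rank-based formulas give the same value faster on large cohorts.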

Advances in Blast-Induced Impact Prediction—A Review of Machine Learning Applications

Minerals ◽

10.3390/min11060601 ◽

2021 ◽

Vol 11 (6) ◽

pp. 601

Author(s):

Nelson K. Dumakor-Dupey ◽

Sampurna Arya ◽

Ankit Jha

Keyword(s):

Machine Learning ◽

Prediction Models ◽

Academic Research ◽

Empirical Models ◽

Rock Breakage ◽

Environmental Implications ◽

Learning Methods ◽

Factors Affecting ◽

Impact Prediction ◽

Machine Learning Methods

Rock fragmentation in the mining and construction industries is widely achieved using the drilling and blasting technique. The technique remains the most effective and efficient means of breaking down rock mass into smaller pieces. However, apart from its intended purpose of rock breakage, throw, and heave, blasting generates adverse impacts, such as ground vibration, airblast, flyrock, fumes, and noise, that have significant operational and environmental implications for mining activities. Consequently, blast impact studies are conducted to determine an optimum blast design that maximizes the desirable impacts and minimizes the undesirable ones. To achieve this objective, several empirical models for estimating blast impacts have been developed. However, despite being the industry benchmark, empirical models are based on a limited number of the factors affecting the outcomes of a blast. As a result, researchers are increasingly employing machine learning (ML) techniques for blast impact prediction. The ML approach can incorporate several factors affecting the outcomes of a blast, and it is therefore preferred over empirical and other statistical methods. This paper reviews the various blast impacts and their prediction models, with a focus on empirical and machine learning methods. The details of the prediction methods for various blast impacts, including their applications, advantages, and limitations, are discussed. The literature reveals that machine learning methods are better predictors than the empirical models. However, we observed that at present these ML models are applied mainly in academic research.
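
To see why empirical models rely on "a limited number of factors", consider the widely used square-root scaled-distance law for ground vibration, PPV = K (D / sqrt(Q))^(-beta), which uses only the distance D and the charge per delay Q. The review does not say which empirical models it covers, so this is an assumed example; K and beta are site constants, and the values below (1140 and 1.6, with PPV in mm/s, D in m and Q in kg) are hypothetical. The sketch fits K and beta by least squares in log-log space.

```python
import math


def scaled_distance(distance_m, charge_kg):
    """Square-root scaled distance D / sqrt(Q)."""
    return distance_m / math.sqrt(charge_kg)


def predict_ppv(k, beta, distance_m, charge_kg):
    """Peak particle velocity from the scaled-distance law PPV = K * SD**(-beta)."""
    return k * scaled_distance(distance_m, charge_kg) ** (-beta)


def fit_ppv_model(distances, charges, ppvs):
    """Fit K and beta by ordinary least squares on log(PPV) = log(K) - beta * log(SD)."""
    xs = [math.log(scaled_distance(d, q)) for d, q in zip(distances, charges)]
    ys = [math.log(v) for v in ppvs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return math.exp(intercept), -slope  # (K, beta)


# Hypothetical blast records generated from K = 1140, beta = 1.6 (no noise).
distances = [100, 200, 300, 400]
charges = [50, 50, 100, 100]
ppvs = [predict_ppv(1140, 1.6, d, q) for d, q in zip(distances, charges)]
k_hat, beta_hat = fit_ppv_model(distances, charges, ppvs)
print(round(k_hat, 3), round(beta_hat, 3))
```

Geology, burden, spacing, stemming and delay timing do not appear anywhere in this formula; ML models can take such variables as additional inputs.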

Reviewing the use and quality of machine learning in developing clinical prediction models for cardiovascular disease

Postgraduate Medical Journal ◽

10.1136/postgradmedj-2020-139352 ◽

2021 ◽

pp. postgradmedj-2020-139352

Author(s):

Simon Allan ◽

Raphael Olaiya ◽

Rasan Burhan

Keyword(s):

Machine Learning ◽

Cardiovascular Disease ◽

Prediction Models ◽

Bone Fractures ◽

Statin Treatment ◽

Computer Algorithms ◽

Clinical Prediction ◽

Clinical Prediction Models ◽

Heart Attacks

Cardiovascular disease (CVD) is one of the leading causes of death across the world. CVD can lead to angina, heart attacks, heart failure and strokes, among many other serious conditions, and eventually to death. Early intervention in those at higher risk of developing CVD, typically with statin treatment, leads to better health outcomes. For this reason, clinical prediction models (CPMs) have been developed to identify those at high risk of developing CVD so that treatment can begin at an earlier stage. Currently, CPMs are built around statistical analysis of factors linked to developing CVD, such as body mass index and family history. The emerging field of machine learning (ML) in healthcare, which uses computer algorithms that learn from a dataset without explicit programming, has the potential to outperform the CPMs available today. ML has already shown exciting progress in the detection of skin malignancies, bone fractures and many other medical conditions. In this review, we analyse and explain the CPMs currently in use and compare them with their developing ML counterparts. We found that although the newest non-ML CPMs are effective, ML-based approaches consistently outperform them. However, the literature needs to improve before ML should be implemented in place of current CPMs.
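
A conventional statistical CPM of the kind described above often takes the form of a regression equation that combines risk factors into a predicted probability. The sketch below shows a logistic risk score with an illustrative cut-off for flagging high-risk patients. Every coefficient and the 10% threshold are HYPOTHETICAL values chosen only to make the example run; they are not taken from any published CPM and must not be used clinically.

```python
import math

# HYPOTHETICAL coefficients for illustration only; not from any published CVD model.
HYPOTHETICAL_COEFFS = {
    "intercept": -7.0,
    "age_years": 0.06,
    "bmi": 0.05,
    "family_history": 0.5,
}

HIGH_RISK_THRESHOLD = 0.10  # hypothetical cut-off for flagging "high risk"


def risk_probability(age_years, bmi, family_history, coeffs=HYPOTHETICAL_COEFFS):
    """Logistic risk model: probability = 1 / (1 + exp(-linear_predictor))."""
    linear_predictor = (
        coeffs["intercept"]
        + coeffs["age_years"] * age_years
        + coeffs["bmi"] * bmi
        + coeffs["family_history"] * (1 if family_history else 0)
    )
    return 1.0 / (1.0 + math.exp(-linear_predictor))


p = risk_probability(age_years=60, bmi=30, family_history=True)
print(round(p, 3), p >= HIGH_RISK_THRESHOLD)
```

Each factor enters additively on the log-odds scale, so interactions and nonlinear effects must be specified by hand; this is the kind of limitation ML approaches aim to relax.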

MO1031FROM MARCELLO MALPIGHI THROUGH JEAN REDMAN OLIVER AND JOSEP TRUETA: A CONTINUING CONTRIBUTION TO THE “BLACK BOX” CLINICAL PREDICTION MODELS IN NEPHROLOGY

Nephrology Dialysis Transplantation ◽

10.1093/ndt/gfab105.003 ◽

2021 ◽

Vol 36 (Supplement_1) ◽

Author(s):

Enrico Favaro ◽

Roberta Lazzarin ◽

Daniela Cremasco ◽

Erika Pierobon ◽

Marta Guizzo ◽

...

Keyword(s):

Machine Learning ◽

Acute Kidney Injury ◽

Prediction Models ◽

Kidney Diseases ◽

Kidney Injury ◽

Black Box ◽

Renal Physiology ◽

Clinical Prediction ◽

Renal Circulation ◽

Clinical Prediction Models

Abstract Background and Aims The modern development of the black box approach in clinical nephrology is inconceivable without a logical theory of renal function and an understanding of the anatomical architecture of the kidney in health and disease: this is the undisputed contribution offered by Malpighi, Oliver and Trueta, starting in the seventeenth century. The machine learning model for the prediction of acute kidney injury, progression of renal failure and tubulointerstitial nephritis is a good example of how different kinds of knowledge about the kidney are indispensable tools for interpreting the model itself. Method Historical data were collected from the literature, textbooks, encyclopedias, scientific periodicals and laboratory experimental data concerning these three authors. Results The Italian Marcello Malpighi (1628-1694), born in Crevalcore near Bologna, was Professor of Anatomy at Bologna, Pisa and Messina. His historic description of the pulmonary capillaries appeared in his second epistle to Borelli, published in 1661 under the title De pulmonibus, and was obtained using the frog as “the microscope of nature” (Fig. 1). It is the first description of capillaries in any circulation. William Harvey, in De motu cordis in 1628 (the same year in which the Italian anatomist was born!), could not see the capillary vessels. This triumphant discovery would later serve the recognition of the characteristic renal rete mirabile in the corpuscle of Malpighi, lying within the capsule of Bowman. Jean Redman Oliver (1889-1976), a pathologist born and raised in Northern California, was able to bridge the gap between the nephron and the collecting system through meticulous dissections, hand-drawn illustrations and experiments that underpin our current understanding of renal anatomy and physiology. 
In his skillful lecture “When is the kidney not a kidney?” (1949), Oliver summarizes his far-sighted vision of renal physiology and disease in the following sentence: the Kidney in health, if you will, but the Nephrons in disease. This is because the “nephron”, like the “kidney”, is an abstraction that must be qualified in terms of its various parts, its cellular components and the molecular mechanisms involved in each discrete activity (Fig. 2). The Catalan surgeon Josep Trueta i Raspall (1897-1977) was born in the Poblenou neighborhood of Barcelona. His pioneering and visionary work on changes in the renal circulation in the pathogenesis of acute kidney injury was pivotal in the history of renal physiology. “The kidney has two potential circulatory circulations. Blood may pass either almost exclusively through one or other of two pathways, or to a varying degree through both” (Studies of the Renal Circulation, published in 1947). This diversion of blood from the cortex to the less resistant medullary circulation is now known by the eponym Trueta shunt. Conclusion The black box approach to kidney diseases should be considered by practitioners as a further tool to help inform model updates in many clinical settings. The number of machine learning clinical prediction models being published is rising as new fields of application are explored in medicine (Fig. 3). A challenge in clinical nephrology is to explore the “kidney machine” during each diagnostic and therapeutic procedure. The intriguing relationship between nephrological syndromes and kidney diseases can never disregard the valuable insights into the specific organization of the kidney microcirculation, the fruit of the many scientific contributions of Malpighi, Oliver and Trueta (Fig. 3).

Classification models using circulating neutrophil transcripts can detect unruptured intracranial aneurysm

Journal of Translational Medicine ◽

10.1186/s12967-020-02550-2 ◽

2020 ◽

Vol 18 (1) ◽

Author(s):

Kerry E. Poppenberg ◽

Vincent M. Tutino ◽

Lu Li ◽

Muhammad Waqas ◽

Armond June ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Prediction Models ◽

Model Performance ◽

Supervised Machine Learning ◽

Support Vector ◽

Learning Methods ◽

Training Cohort ◽

Network Analyses ◽

Machine Learning Methods

Abstract Background Intracranial aneurysms (IAs) are dangerous because of their potential to rupture. We previously found significant RNA expression differences in circulating neutrophils between patients with and without unruptured IAs and trained machine learning models to predict presence of IA using 40 neutrophil transcriptomes. Here, we aim to develop a predictive model for unruptured IA using neutrophil transcriptomes from a larger population and more robust machine learning methods. Methods Neutrophil RNA extracted from the blood of 134 patients (55 with IA, 79 IA-free controls) was subjected to next-generation RNA sequencing. In a randomly-selected training cohort (n = 94), the Least Absolute Shrinkage and Selection Operator (LASSO) selected transcripts, from which we constructed prediction models via 4 well-established supervised machine-learning algorithms (K-Nearest Neighbors, Random Forest, and Support Vector Machines with Gaussian and cubic kernels). We tested the models in the remaining samples (n = 40) and assessed model performance by receiver-operating-characteristic (ROC) curves. Real-time quantitative polymerase chain reaction (RT-qPCR) of 9 IA-associated genes was used to verify gene expression in a subset of 49 neutrophil RNA samples. We also examined the potential influence of demographics and comorbidities on model prediction. Results Feature selection using LASSO in the training cohort identified 37 IA-associated transcripts. Models trained using these transcripts had a maximum accuracy of 90% in the testing cohort. The testing performance across all methods had an average area under ROC curve (AUC) = 0.97, an improvement over our previous models. The Random Forest model performed best across both training and testing cohorts. RT-qPCR confirmed expression differences in 7 of 9 genes tested. Gene ontology and IPA network analyses performed on the 37 model genes reflected dysregulated inflammation, cell signaling, and apoptosis processes. 
In our data, demographics and comorbidities did not affect model performance. Conclusions We improved upon our previous IA prediction models based on circulating neutrophil transcriptomes by increasing the sample size and by implementing LASSO and more robust machine learning methods. Future studies are needed to validate these models in larger cohorts and to further investigate the effect of covariates.
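
LASSO selects features because its L1 penalty drives many coefficients exactly to zero. A minimal sketch of that mechanism is coordinate descent with soft-thresholding, minimizing (1/2n)·||y - Xb||² + lam·||b||₁. This is an assumption-laden illustration, not the study's pipeline: it uses squared-error loss on 0/1 labels instead of the logistic LASSO one would normally fit for a binary outcome, the penalty `lam = 0.1` is arbitrary, and the synthetic "transcripts" are random numbers in which only columns 0 and 3 carry signal.

```python
import numpy as np


def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values inside [-lam, lam] become exactly 0."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Squared-error LASSO by cyclic coordinate descent (features are centred, not scaled)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    Xc = X - X.mean(axis=0)          # centring removes the need for an intercept
    yc = y - y.mean()
    col_sq = (Xc ** 2).sum(axis=0) / n
    beta = np.zeros(p)
    residual = yc.copy()             # residual = yc - Xc @ beta, kept up to date
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue             # constant feature: never selected
            residual += Xc[:, j] * beta[j]        # partial residual without feature j
            rho = Xc[:, j] @ residual / n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
            residual -= Xc[:, j] * beta[j]
    return beta


def selected_features(beta, tol=1e-8):
    """Indices of features whose coefficient survived the penalty."""
    return [j for j, b in enumerate(beta) if abs(b) > tol]


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))                     # synthetic stand-in for expression data
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + 0.1 * rng.normal(size=100)
beta = lasso_coordinate_descent(X, y, lam=0.1)
print(selected_features(beta), np.round(beta[[0, 3]], 2))
```

In practice the penalty would be chosen by cross-validation, and the surviving features would then be passed to downstream classifiers such as those named in the abstract.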

Machine learning-based clinical outcome prediction in surgery for acromegaly

Endocrine ◽

10.1007/s12020-021-02890-z ◽

2021 ◽

Author(s):

Olivier Zanier ◽

Matteo Zoli ◽

Victor E. Staartjes ◽

Federica Guaraldi ◽

Sofia Asioli ◽

...

Keyword(s):

Machine Learning ◽

Transsphenoidal Surgery ◽

Prediction Models ◽

External Validation ◽

Clinical Prediction ◽

Clinical Prediction Models ◽

Total Resection ◽

Prior Surgery ◽

Using Data ◽

Csf Leaks

Abstract Purpose Biochemical remission (BR), gross total resection (GTR), and intraoperative cerebrospinal fluid (CSF) leaks are important metrics in transsphenoidal surgery for acromegaly, and predicting their likelihood using machine learning would be clinically advantageous. We aim to develop and externally validate clinical prediction models for outcomes after transsphenoidal surgery for acromegaly. Methods Using data from two registries, we develop and externally validate machine learning models for GTR, BR, and CSF leaks after endoscopic transsphenoidal surgery in acromegalic patients. For model development, a registry from Bologna, Italy was used. External validation was then performed using data from Zurich, Switzerland. Gender, age, prior surgery, and Hardy and Knosp classifications were used as input features. Discrimination and calibration metrics were assessed. Results The derivation cohort consisted of 307 patients (43.3% male; mean [SD] age, 47.2 [12.7] years). GTR was achieved in 226 (73.6%) and BR in 245 (79.8%) patients. In the external validation cohort with 46 patients, 31 (75.6%) achieved GTR and 31 (77.5%) achieved BR. Area under the curve (AUC) at external validation was 0.75 (95% confidence interval: 0.59–0.88) for GTR, 0.63 (0.40–0.82) for BR, and 0.77 (0.62–0.91) for intraoperative CSF leaks. While prior surgery was the most important variable for prediction of GTR, age and Hardy grading contributed most to the predictions of BR and CSF leaks, respectively. Conclusions Gross total resection, biochemical remission, and CSF leaks remain hard to predict, but machine learning offers potential in helping to tailor surgical therapy. We demonstrate the feasibility of developing and externally validating clinical prediction models for these outcomes after surgery for acromegaly and lay the groundwork for development of a multicenter model with more robust generalization.
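
Discrimination (AUC) says whether a model ranks patients correctly; calibration says whether its predicted probabilities match observed frequencies, which matters most when a model is moved to a new centre as in external validation. The sketch below computes three simple calibration summaries. The function names and data are invented for illustration, and "calibration-in-the-large" is shown in its simplest form (observed event rate minus mean predicted risk) rather than as the intercept of a logistic recalibration model.

```python
def brier_score(y_true, probs):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(y_true)


def calibration_in_the_large(y_true, probs):
    """Observed event rate minus mean predicted probability (0 means right on average)."""
    return sum(y_true) / len(y_true) - sum(probs) / len(probs)


def observed_expected_ratio(y_true, probs):
    """Observed events divided by expected events (1 means right on average)."""
    return sum(y_true) / sum(probs)


# Invented validation data: 1 = gross total resection achieved.
y_val = [1, 0, 1, 1]
p_val = [0.9, 0.2, 0.6, 0.7]
print(brier_score(y_val, p_val),
      calibration_in_the_large(y_val, p_val),
      observed_expected_ratio(y_val, p_val))
```

Here the model under-predicts on average (observed rate 0.75 versus mean prediction 0.60), a pattern that can appear at an external site even when the AUC is acceptable.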

learnMET: an R package to apply machine learning methods for genomic prediction using multi-environment trial data

10.1101/2021.12.13.472185 ◽

2021 ◽

Author(s):

Cathy C. Westhues ◽

Henner Simianer ◽

Timothy M. Beissinger

Keyword(s):

Machine Learning ◽

Genomic Prediction ◽

Prediction Models ◽

R Package ◽

Fixed Number ◽

Environmental Data ◽

Weather Data ◽

Learning Methods ◽

Machine Learning Methods ◽

Daily Weather Data

We introduce the R package learnMET, developed as a flexible framework to enable a collection of analyses on multi-environment trial (MET) breeding data with machine learning-based models. learnMET allows the combination of genomic information with environmental data such as climate and/or soil characteristics. Notably, the package offers the possibility of incorporating weather data from field weather stations, or can retrieve global meteorological datasets from a NASA database. Daily weather data can be aggregated over time windows defined either naively (for instance, windows spanning a fixed number of days) or by phenological approaches. Different machine learning methods for genomic prediction are implemented, including gradient boosted trees, random forests, stacked ensemble models, and multi-layer perceptrons. These prediction models can be evaluated in a user-friendly way via a collection of cross-validation schemes that mimic typical scenarios encountered by plant breeders working with MET experimental data. The package is fully open source and accessible on GitHub.
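
The "naive" aggregation option mentioned above can be sketched in a few lines: split a daily series into consecutive windows of a fixed number of days and summarize each window (a mean for temperature, a sum for rainfall). This is a hypothetical Python illustration of the idea, not learnMET's R interface; the function name, the choice to drop an incomplete final window, and the data are all assumptions. A phenological approach would instead cut the series at growth-stage dates.

```python
from statistics import mean


def aggregate_fixed_windows(daily_values, window_days, how=mean):
    """Aggregate a daily series into consecutive windows of `window_days` days.

    Trailing days that do not fill a complete window are dropped
    (a design choice for this sketch, not necessarily learnMET's behaviour).
    """
    if window_days <= 0:
        raise ValueError("window_days must be positive")
    n_windows = len(daily_values) // window_days
    return [
        how(daily_values[i * window_days:(i + 1) * window_days])
        for i in range(n_windows)
    ]


# Invented daily series counted from the sowing date.
temperature = [10, 12, 14, 16, 18, 20, 22]
rain_mm = [0, 5, 1, 0, 2, 3]
print(aggregate_fixed_windows(temperature, 3))        # [12, 18]  (day 7 dropped)
print(aggregate_fixed_windows(rain_mm, 2, how=sum))   # [5, 1, 5]
```

Each window summary then becomes one environmental covariate that a model can combine with genomic markers.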

Development of a Probabilistic Seismic Performance Assessment Model of Slope Using Machine Learning Methods

Sustainability ◽

10.3390/su12083269 ◽

2020 ◽

Vol 12 (8) ◽

pp. 3269

Author(s):

Shinyoung Kwag ◽

Daegi Hahm ◽

Minkyu Kim ◽

Seunghyun Eem

Keyword(s):

Machine Learning ◽

Regression Analysis ◽

Linear Regression ◽

Seismic Performance ◽

Prediction Models ◽

Linear Regression Analysis ◽

Assessment Model ◽

Support Vector ◽

Learning Methods ◽

Machine Learning Methods

The objective of this study is to propose a model that can predict the seismic performance of slopes relatively accurately and efficiently using machine learning methods. Probabilistic seismic fragility analyses of the slope had been carried out in other studies, and a closed-form equation for slope seismic performance was proposed through a multiple linear regression analysis. However, the traditional statistical linear regression analysis was limited in that it could not accurately represent such nonlinear slope seismic performance. To overcome this limitation, in this study we used three machine learning methods (i.e., support vector machine (SVM), artificial neural network (ANN), and Gaussian process regression (GPR)) to generate prediction models of the slope seismic performance. The models obtained through the machine learning methods generally performed better than the models based on traditional statistical methods. The SVM results showed no significant performance difference compared with the results of the nonlinear regression analysis method, but the ANN- and GPR-based results showed a remarkable improvement in prediction performance over the other models. Furthermore, this study confirmed that the GPR-based model predicted seismic performance values relatively more accurately than the ANN-based model.
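
Gaussian process regression, the best-performing method above, predicts both a mean and an uncertainty that grows away from the training data, which suits probabilistic assessment. The sketch below computes the GP posterior for a zero-mean prior. It assumes a squared-exponential (RBF) kernel with fixed, hand-picked hyperparameters; the study's kernel and tuning are not stated here, and in practice hyperparameters are usually fitted by maximizing the marginal likelihood. The 1-D input stands in for a hypothetical ground-motion intensity measure, and sin(x) is an invented target.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential (RBF) covariance between rows of a (n, d) and b (m, d)."""
    sq_dist = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return variance * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale ** 2)


def gp_predict(x_train, y_train, x_test, length_scale=1.0, variance=1.0, noise=1e-6):
    """Posterior mean and variance of a zero-mean GP with fixed hyperparameters."""
    x_train = np.asarray(x_train, dtype=float)
    x_test = np.asarray(x_test, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    k = rbf_kernel(x_train, x_train, length_scale, variance) + noise * np.eye(len(x_train))
    chol = np.linalg.cholesky(k)                        # k = chol @ chol.T
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    k_star = rbf_kernel(x_train, x_test, length_scale, variance)
    mean = k_star.T @ alpha
    v = np.linalg.solve(chol, k_star)
    var = variance - np.sum(v ** 2, axis=0)             # prior variance minus explained part
    return mean, np.maximum(var, 0.0)


# Invented training data: 8 inputs on [0, 5] with target sin(x).
x = np.linspace(0.0, 5.0, 8).reshape(-1, 1)
y = np.sin(x).ravel()
mean, var = gp_predict(x, y, np.array([[2.5], [20.0]]))
print(np.round(mean, 3), np.round(var, 3))
```

Between training points the mean tracks the target closely with small variance; far outside the data (x = 20) the prediction falls back to the prior mean of 0 with variance near 1, flagging that the model is extrapolating.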

Machine Learning in Aging: An Example of Developing Prediction Models for Serious Fall Injury in Older Adults

The Journals of Gerontology Series A ◽

10.1093/gerona/glaa138 ◽

2020 ◽

Author(s):

Jaime Lynn Speiser ◽

Kathryn E Callahan ◽

Denise K Houston ◽

Jason Fanning ◽

Thomas M Gill ◽

...

Keyword(s):

Machine Learning ◽

Older Adults ◽

Random Forest ◽

Decision Tree ◽

Prediction Models ◽

Learning Methods ◽

Life Study ◽

Fall Injury ◽

Machine Learning Methods ◽

Using Data

Abstract Background Advances in computational algorithms and the availability of large datasets with clinically relevant characteristics provide an opportunity to develop machine learning prediction models to aid in diagnosis, prognosis, and treatment of older adults. Some studies have employed machine learning methods for prediction modeling, but skepticism of these methods remains due to lack of reproducibility and difficulty in understanding the complex algorithms that underlie models. We aim to provide an overview of two common machine learning methods: decision tree and random forest. We focus on these methods because they provide a high degree of interpretability. Method We discuss the underlying algorithms of decision tree and random forest methods and present a tutorial for developing prediction models for serious fall injury using data from the Lifestyle Interventions and Independence for Elders (LIFE) study. Results Decision tree is a machine learning method that produces a model resembling a flow chart. Random forest consists of a collection of many decision trees whose results are aggregated. In the tutorial example, we discuss evaluation metrics and interpretation for these models. Illustrated using data from the LIFE study, prediction models for serious fall injury were moderate at best (area under the receiver operating characteristic curve of 0.54 for decision tree and 0.66 for random forest). Conclusions Machine learning methods offer an alternative to traditional approaches for modeling outcomes in aging, but their use should be justified and output should be carefully described. Models should be assessed by clinical experts to ensure compatibility with clinical practice.
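
The phrase "a collection of many decision trees whose results are aggregated" can be illustrated with a deliberately simplified random-forest-style ensemble. Each tree below is a decision stump (a one-question flow chart), grown on a bootstrap resample using a randomly chosen subset of features, and the forest averages the trees' leaf probabilities. Real random forests grow much deeper trees and resample features at every split; with a single split per tree, per-tree and per-split feature sampling coincide. All names and data here are hypothetical, not the LIFE tutorial's code.

```python
import random


def fit_stump(X, y, feature_ids):
    """One-split decision tree chosen by weighted Gini impurity over the given features.
    Returns (feature, threshold, p_left, p_right), where p_* is the class-1 share per leaf."""
    best = None
    for f in feature_ids:
        for thr in sorted(set(row[f] for row in X)):
            left = [label for row, label in zip(X, y) if row[f] <= thr]
            right = [label for row, label in zip(X, y) if row[f] > thr]
            if not left or not right:
                continue  # a split must send cases both ways
            p_l, p_r = sum(left) / len(left), sum(right) / len(right)
            gini = (len(left) * 2 * p_l * (1 - p_l) + len(right) * 2 * p_r * (1 - p_r)) / len(y)
            if best is None or gini < best[0]:
                best = (gini, f, thr, p_l, p_r)
    if best is None:  # no usable split: a single leaf with the overall class-1 share
        p = sum(y) / len(y)
        return (feature_ids[0], float("inf"), p, p)
    return best[1:]


def stump_proba(stump, row):
    feature, thr, p_left, p_right = stump
    return p_left if row[feature] <= thr else p_right


def fit_forest(X, y, n_trees=100, max_features=1, seed=0):
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample (with replacement)
        feats = rng.sample(range(d), max_features)    # random feature subset for this tree
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def forest_proba(forest, row):
    """Aggregate the trees by averaging their leaf probabilities."""
    return sum(stump_proba(s, row) for s in forest) / len(forest)


# Invented data: feature 0 separates the classes, feature 1 is noise.
X = [[i, (i * 7) % 5] for i in range(20)]
y = [1 if i >= 10 else 0 for i in range(20)]
forest = fit_forest(X, y)
print(round(forest_proba(forest, [18, 0]), 2), round(forest_proba(forest, [2, 0]), 2))
```

Because roughly half of the stumps only see the noise feature, the averaged probabilities are less extreme than any single informative stump; that averaging is what makes forests more stable than one tree.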
