Clinical-learning versus machine-learning for transdiagnostic prediction of psychosis onset in individuals at-risk

Abstract Predicting the onset of psychosis in individuals at-risk is based on robust prognostic model building methods including a priori clinical knowledge (also termed clinical-learning) to preselect predictors or machine-learning methods to select predictors automatically. To date, there is no empirical research comparing the prognostic accuracy of these two methods for the prediction of psychosis onset. In a first experiment, no improved performance was observed when machine-learning methods (LASSO and RIDGE) were applied—using the same predictors—to an individualised, transdiagnostic, clinically based, risk calculator previously developed on the basis of clinical-learning (predictors: age, gender, age by gender, ethnicity, ICD-10 diagnostic spectrum), and externally validated twice. In a second experiment, two refined versions of the published model which expanded the granularity of the ICD-10 diagnosis were introduced: ICD-10 diagnostic categories and ICD-10 diagnostic subdivisions. Although these refined versions showed an increase in apparent performance, their external performance was similar to the original model. In a third experiment, the three refined models were analysed under machine-learning and clinical-learning with a variable event per variable ratio (EPV). The best performing model under low EPVs was obtained through machine-learning approaches. The development of prognostic models on the basis of a priori clinical knowledge, large samples and adequate events per variable is a robust clinical prediction method to forecast psychosis onset in patients at-risk, and is comparable to machine-learning methods, which are more difficult to interpret and implement. Machine-learning methods should be preferred for high dimensional data when no a priori knowledge is available.

Download Full-text

Novel Machine Learning Methods for ERP Analysis: A Validation From Research on Infants at Risk for Autism

Developmental Neuropsychology ◽

10.1080/87565641.2011.650808 ◽

2012 ◽

Vol 37 (3) ◽

pp. 274-298 ◽

Cited By ~ 32

Author(s):

Daniel Stahl ◽

Andrew Pickles ◽

Mayada Elsabbagh ◽

Mark H. Johnson ◽

The BASIS Team

Keyword(s):

Machine Learning ◽

At Risk ◽

Learning Methods ◽

Machine Learning Methods

Download Full-text

Prognozowanie zapotrzebowania na gaz metodami sztucznej inteligencji

Nafta-Gaz ◽

10.18668/ng.2019.02.07 ◽

2019 ◽

Vol 75 (2) ◽

pp. 111-117

Author(s):

Andrzej Paliński ◽

Keyword(s):

Machine Learning ◽

Linear Regression ◽

Decision Tree ◽

Computational Intelligence ◽

Prognostic Models ◽

Learning Methods ◽

Explanatory Variables ◽

Machine Learning Methods ◽

Gas Consumption ◽

Classical Linear Regression

The paper presents contemporary trends in artificial intelligence and machine learning methods, which include, among others, artificial neural networks, decision trees, fuzzy logic systems and others. Computational intelligence methods are part of the field of research on artificial intelligence. Selected methods of computational intelligence were used to build medium-term monthly forecasts of natural gas demand for Poland. The accuracy of forecasts obtained using the artificial neural network and the decision tree with classical linear regression was compared based on historical data from a ten-year period. The explanatory variables were: gas consumption in other EU countries, average monthly temperature, industrial production, wages in the economy and the price of natural gas. Forecasting was carried out in five stages differing in the selection of the learning and testing sample, the use of data preprocessing and the elimination of some variables. For raw data and a random training set, the highest accuracy was achieved by linear regression. For the preprocessed data and the random learning set, the decision tree was the most accurate. The forecast obtained on the basis of the first eight years and tested on the last two was most accurately created by regression, but only slightly better than with the decision tree or neural network, regardless of data normalization and elimination of collinear variables. Machine learning methods showed good accuracy of monthly gas consumption forecasts, but nevertheless slightly gave way to classical linear regression, due to too narrow set of explanatory variables. Machine learning methods will be able to show higher effectiveness as the number of data increases and the set of potential explanatory variables is expanded. In the sea of data, machine learning methods are able to create prognostic models more effectively, without the analyst’s laborious involvement in data preparation and multi-stage analysis. They will also allow for the frequent updating of the form of prognostic models even after each addition of new data into the database.

Download Full-text

Prediction of Maternal Hemorrhage: Using Machine Learning to Identify Patients at Risk (Preprint)

10.2196/preprints.34108 ◽

2021 ◽

Author(s):

Jill M Westcott ◽

Francine Hughes ◽

Wenke Liu ◽

Mark Grivainis ◽

Iffath Hoskins ◽

...

Keyword(s):

Machine Learning ◽

At Risk ◽

Postpartum Hemorrhage ◽

Mode Of Delivery ◽

Receiver Operating Curve ◽

Stage Model ◽

Learning Methods ◽

Second Stage ◽

Machine Learning Methods ◽

Patients At Risk

BACKGROUND Postpartum hemorrhage remains one of the largest causes of maternal morbidity and mortality in the United States. OBJECTIVE To utilize machine learning techniques to identify patients at risk for postpartum hemorrhage at obstetric delivery. METHODS Women aged 18 to 55 delivering at a major academic center from July 2013 to October 2018 were included for analysis (n = 30,867). A total of 497 variables were collected from the electronic medical record including demographic information, obstetric, medical, surgical, and family history, vital signs, laboratory results, labor medication exposures, and delivery outcomes. Postpartum hemorrhage was defined as a blood loss of ≥ 1000 mL at the time of delivery, regardless of delivery method, with 2179 positive cases observed (7.06%). Supervised learning with regression-, tree-, and kernel-based machine learning methods was used to create classification models based upon training (n = 21,606) and validation (n = 4,630) cohorts. Models were tuned using feature selection algorithms and domain knowledge. An independent test cohort (n = 4,631) determined final performance by assessing for accuracy, area under the receiver operating curve (AUC), and sensitivity for proper classification of postpartum hemorrhage. Separate models were created using all collected data versus limited to data available prior to the second stage of labor/at the time of decision to proceed with cesarean delivery. Additional models examined patients by mode of delivery. RESULTS Gradient boosted decision trees achieved the best discrimination in the overall model. The model including all data mildly outperformed the second stage model (AUC 0.979, 95% CI 0.971-0.986 vs. AUC 0.955, 95% CI 0.939-0.970). Optimal model accuracy was 98.1% with a sensitivity of 0.763 for positive prediction of postpartum hemorrhage. The second stage model achieved an accuracy of 98.0% with a sensitivity of 0.737. Other selected algorithms returned models that performed with decreased discrimination. Models stratified by mode of delivery achieved good to excellent discrimination, but lacked sensitivity necessary for clinical applicability. CONCLUSIONS Machine learning methods can be used to identify women at risk for postpartum hemorrhage who may benefit from individualized preventative measures. Models limited to data available prior to delivery perform nearly as well as those with more complete datasets, supporting their potential utility in the clinical setting. Further work is necessary to create successful models based upon mode of delivery. An unbiased approach to hemorrhage risk prediction may be superior to human risk assessment and represents an area for future research.

Download Full-text

Using Machine Learning to Advance Early Warning Systems: Promise and Pitfalls

Teachers College Record ◽

10.1177/016146812012201403 ◽

2020 ◽

Vol 122 (14) ◽

pp. 1-30

Author(s):

James Soland ◽

Benjamin Domingue ◽

David Lang

Keyword(s):

Machine Learning ◽

High School ◽

At Risk ◽

Early Warning ◽

Early Warning Systems ◽

Machine Learning Techniques ◽

Dropping Out ◽

Learning Methods ◽

Machine Learning Methods ◽

Learning Techniques

Background/Context Early warning indicators (EWI) are often used by states and districts to identify students who are not on track to finish high school, and provide supports/interventions to increase the odds the student will graduate. While EWI are diverse in terms of the academic behaviors they capture, research suggests that indicators like course failures, chronic absenteeism, and suspensions can help identify students in need of additional supports. In parallel with the expansion of administrative data that have made early versions of EWI possible, new machine learning methods have been developed. These methods are data-driven and often designed to sift through thousands of variables with the purpose of identifying the best predictors of a given outcome. While applications of machine learning techniques to identify students at-risk of high school dropout have obvious appeal, few studies consider the benefits and limitations of applying those models in an EWI context, especially as they relate to questions of fairness and equity. Focus of Study In this study, we will provide applied examples of how machine learning can be used to support EWI selection. The purpose is to articulate the broad risks and benefits of using machine learning methods to identify students who may be at risk of dropping out. We focus on dropping out given its salience in the EWI literature, but also anticipate generating insights that will be germane to EWI used for a variety of outcomes. Research Design We explore these issues by using several hypothetical examples of how ML techniques might be used to identify EWI. For example, we show results from decision tree algorithms used to identify predictors of dropout that use simulated data. Conclusions/Recommendations Generally, we argue that machine learning techniques have several potential benefits in the EWI context. For example, some related methods can help create clear decision rules for which students are a dropout risk, and their predictive accuracy can be higher than for more traditional, regression-based models. At the same time, these methods often require additional statistical and data management expertise to be used appropriately. Further, the black-box nature of machine learning algorithms could invite their users to interpret results through the lens of preexisting biases about students and educational settings.

Download Full-text

Detection of quasi-harmonic signals with a priori unknown parameters in strong additive noise by machine learning methods

Journal of Physics Conference Series ◽

10.1088/1742-6596/1368/5/052014 ◽

2019 ◽

Vol 1368 ◽

pp. 052014

Author(s):

A A Nevzorov ◽

A A Orlov ◽

D A Stankevich

Keyword(s):

Machine Learning ◽

Additive Noise ◽

A Priori ◽

Unknown Parameters ◽

Learning Methods ◽

Machine Learning Methods ◽

Harmonic Signals

Download Full-text

Application of machine learning methods in big data analytics at management of contracts in the construction industry

MATEC Web of Conferences ◽

10.1051/matecconf/201817001106 ◽

2018 ◽

Vol 170 ◽

pp. 01106 ◽

Cited By ~ 9

Author(s):

Marina Valpeters ◽

Ivan Kireev ◽

Nikolay Ivanov

Keyword(s):

Machine Learning ◽

Big Data ◽

Construction Industry ◽

Big Data Analytics ◽

Machine Learning Algorithms ◽

Prognostic Models ◽

Problem Analysis ◽

Learning Methods ◽

Machine Learning Methods

The number of experts who realize the importance of big data continues to increase in various fields of the economy. Experts begin to use big data more frequently for the solution of their specific objectives. One of the probable big data tasks in the construction industry is the determination of the probability of contract execution at a stage of its establishment. The contract holder cannot guarantee execution of the contract. Therefore it leads to a lot of risks for the customer. This article is devoted to the applicability of machine learning methods to the task of determination of the probability of a successful contract execution. Authors try to reveal the factors influencing the possibility of contract default and then try to define the following corrective actions for a customer. In the problem analysis, authors used the linear and non-linear algorithms, feature extraction, feature transformation and feature selection. The results of investigation include the prognostic models with a predictive force based on the machine learning algorithms such as logistic regression, decision tree, randomize forest. Authors have validated models on available historical data. The developed models have the potential for practical use in the construction organizations while making new contracts.

Download Full-text

CLASSIFICATION MODEL BUILDING USING MACHINE LEARNING METHODS

CHERKASY UNIVERSITY BULLETIN: APPLIED MATHEMATICS. INFORMATICS ◽

10.31651/2076-5886-2019-1-42-53 ◽

2020 ◽

pp. 42-53

Author(s):

Oleksandr PISKUN ◽

Keyword(s):

Machine Learning ◽

Model Building ◽

Classification Model ◽

Learning Methods ◽

Machine Learning Methods

Download Full-text

Replicability of Machine Learning Models in the Social Sciences

Zeitschrift für Psychologie ◽

10.1027/2151-2604/a000344 ◽

2018 ◽

Vol 226 (4) ◽

pp. 259-273 ◽

Cited By ~ 2

Author(s):

Ranjith Vijayakumar ◽

Mike W.-L. Cheung

Keyword(s):

Social Sciences ◽

Machine Learning ◽

Variable Selection ◽

Model Building ◽

Predictive Accuracy ◽

Explanatory Power ◽

Multivariate Adaptive Regression Splines ◽

Support Vector ◽

Learning Methods ◽

Machine Learning Methods

Abstract. Machine learning tools are increasingly used in social sciences and policy fields due to their increase in predictive accuracy. However, little research has been done on how well the models of machine learning methods replicate across samples. We compare machine learning methods with regression on the replicability of variable selection, along with predictive accuracy, using an empirical dataset as well as simulated data with additive, interaction, and non-linear squared terms added as predictors. Methods analyzed include support vector machines (SVM), random forests (RF), multivariate adaptive regression splines (MARS), and the regularized regression variants, least absolute shrinkage and selection operator (LASSO), and elastic net. In simulations with additive and linear interactions, machine learning methods performed similarly to regression in replicating predictors; they also performed mostly equal or below regression on measures of predictive accuracy. In simulations with square terms, machine learning methods SVM, RF, and MARS improved predictive accuracy and replicated predictors better than regression. Thus, in simulated datasets, the gap between machine learning methods and regression on predictive measures foreshadowed the gap in variable selection. In replications on the empirical dataset, however, improved prediction by machine learning methods was not accompanied by a visible improvement in replicability in variable selection. This disparity is explained by the overall explanatory power of the models. When predictors have small effects and noise predominates, improved global measures of prediction in a sample by machine learning methods may not lead to the robust selection of predictors; thus, in the presence of weak predictors and noise, regression remains a useful tool for model building and replication.

Download Full-text