Application of Machine Learning Techniques to Identify Data Reliability and Factors Affecting Outcome After Stroke Using Electronic Administrative Records

Aim: To use available electronic administrative records to identify data reliability, predict discharge destination, and identify risk factors associated with specific outcomes following hospital admission with stroke, compared to stroke-specific clinical factors, using machine learning techniques.

Method: The study included 2,531 patients with at least one admission with a confirmed diagnosis of stroke, collected from a regional hospital in Australia between 2009 and 2013. Using machine learning (penalized regression with Lasso) techniques, patients having their index admission between June 2009 and July 2012 were used to derive predictive models, and patients having their index admission between July 2012 and June 2013 were used for validation. Three stroke types [intracerebral hemorrhage (ICH), ischemic stroke, transient ischemic attack (TIA)] and five comparison outcome settings were considered. Our electronic administrative record-based predictive model was compared with a predictive model composed of “baseline” clinical features more specific to stroke, such as age, gender, smoking habits, co-morbidities (high cholesterol, hypertension, atrial fibrillation, and ischemic heart disease), types of imaging done (CT scan, MRI, etc.), and occurrence of in-hospital pneumonia. Risk factors associated with likelihood of negative outcomes were identified.

Results: The data were highly reliable at predicting discharge to rehabilitation and all other outcomes vs. death for ICH (AUC 0.85 and 0.825, respectively), all discharge outcomes except home vs. rehabilitation for ischemic stroke, and discharge home vs. others and home vs. rehabilitation for TIA (AUC 0.948 and 0.873, respectively). Electronic health record data appeared to provide improved prediction of outcomes over stroke-specific clinical factors from the machine learning models. Common risk factors associated with a negative impact on expected outcomes appeared clinically intuitive, and included older age groups, prior ventilatory support, urinary incontinence, need for imaging, and need for allied health input.

Conclusion: Electronic administrative records from this cohort produced reliable outcome prediction and identified clinically appropriate factors negatively impacting most outcome variables following hospital admission with stroke. This presents a means of future identification of modifiable factors associated with patient discharge destination. This may potentially aid in patient selection for certain interventions and aid in better patient and clinician education regarding expected discharge outcomes.
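The AUC values quoted in this abstract summarize how well a model ranks patients with an outcome above patients without it. As a rough illustration only, and not the authors' code, the following sketch computes AUC by pairwise comparison; the function name `roc_auc` and the sample labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    labels: iterable of 0/1 outcomes (1 = outcome of interest)
    scores: predicted risk, higher meaning more likely to be 1
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical validation-set outcomes and model scores (not study data).
y_true = [1, 0, 1, 1, 0, 0]
y_score = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1]
print(roc_auc(y_true, y_score))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which figures such as 0.85 or 0.948 are read.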


Schoolchildren’s Depression and Anxiety Prediction Using Machine Learning Algorithms (Preprint)

10.2196/preprints.32736 ◽

2021 ◽

Author(s):

Radwan Qasrawi ◽

Stephanny Vicuna Polo ◽

Diala Abu Al-Halawah ◽

Sameh Hallaq ◽

Ziad Abdeen

Keyword(s):

Mental Health ◽

Machine Learning ◽

Risk Factors ◽

Cognitive Development ◽

Random Forest ◽

Machine Learning Techniques ◽

Support Vector ◽

Depression And Anxiety ◽

Factors Associated ◽

Learning Techniques

BACKGROUND: Depression and anxiety symptoms in early childhood have a major effect on children's mental health growth and cognitive development. The effect of mental health problems on cognitive development has gained researchers' attention over the last two decades.

OBJECTIVE: In this paper, we seek to use machine learning techniques to predict the risk factors associated with schoolchildren's depression and anxiety.

METHODS: The study data consisted of 5685 students in grades 5-9, aged 10-17 years, studying at public and refugee schools in the West Bank. The data were collected using the health behaviors school children questionnaire in the 2012-2013 academic year and analyzed using machine learning to predict the risk factors associated with student mental health symptoms. Five machine learning techniques (Random Forest, Neural Network, Decision Tree, Support Vector Machine, and Naïve Bayes) were used for the prediction.

RESULTS: The Random Forest model had the highest accuracy for depression and anxiety (72.6% and 68.5%, respectively) and thus the best performance in classifying and predicting students' depression and anxiety. School violence and bullying, home violence, academic performance, and family income were the most important factors affecting the depression and anxiety scales.

CONCLUSIONS: Overall, machine learning proved to be an efficient tool for identifying and predicting the factors that influence student depression and anxiety. The deployment of machine learning within school information systems might facilitate the development of health prevention and intervention programs that will enhance students’ mental health and cognitive development.


Identifying the underlying factors associated with antidepressant drug discontinuation: Content analysis of patients’ drug reviews (Preprint)

10.2196/preprints.23572 ◽

2020 ◽

Author(s):

Mohammad Alarifi ◽

Somaieh Goudarzvand ◽

Abdulrahman Jabour ◽

Doreen Foy ◽

Maryam Zolnoori

Keyword(s):

Machine Learning ◽

Antidepressant Drug ◽

Prediction Method ◽

Analytical Framework ◽

Structured Data ◽

Withdrawal Symptoms ◽

Machine Learning Techniques ◽

Drug Discontinuation ◽

Factors Associated ◽

Learning Techniques

BACKGROUND: The rate of antidepressant prescriptions is increasing globally. A large portion of patients stop their medications, which can lead to adverse effects including relapse and anxiety.

OBJECTIVE: The aim of this study was to develop a drug-continuity prediction model and identify the factors associated with drug continuity using online patient forums.

METHODS: We retrieved 982 antidepressant drug reviews from the online patient forum AskaPatient.com. We followed the Analytical Framework Method to extract structured data from unstructured data. Using the structured data, we examined the factors associated with antidepressant discontinuity and developed a predictive model using multiple machine learning techniques.

RESULTS: We tested multiple machine learning techniques, whose accuracy ranged from 65% to 82%. The Random Forest algorithm gave the best predictive performance, with 82% accuracy, 78% precision, 88.03% recall, and an 84.2% F1-score. The factors most strongly associated with drug discontinuity were withdrawal symptoms, effectiveness-ineffectiveness, perceived-distress-adverse drug reaction, rating, and perceived-distress related to withdrawal symptoms.

CONCLUSIONS: Although the nature of the data available in online forums differs from data collected through surveys, we found that online patient forums can be a valuable source of data for drug-continuity prediction and for understanding patients' experience. The factors identified through our techniques were consistent with the findings of prior studies that used surveys.
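The abstract reports accuracy, precision, recall, and F1-score for its classifiers. For readers unfamiliar with how these relate, here is a minimal sketch that derives all four from confusion-matrix counts. The function name and the counts are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Derive common classification metrics from confusion-matrix counts.

    tp/fp: true/false positives, fn/tn: false/true negatives.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Precision: of the cases predicted positive, how many really were.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of the truly positive cases, how many were found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts for a "discontinued vs. continued" classifier.
metrics = classification_metrics(tp=40, fp=10, fn=10, tn=40)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because F1 is a harmonic mean, it always lies between precision and recall and is pulled toward the lower of the two.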


FRI0585 HIGH-THROUGHPUT METHODOLOGY FOR EMR-BASED IDENTIFICATION OF CLINICAL SUB-PHENOTYPES IN COMPLEX PATIENT POPULATIONS

Annals of the Rheumatic Diseases ◽

10.1136/annrheumdis-2020-eular.3489 ◽

2020 ◽

Vol 79 (Suppl 1) ◽

pp. 897.2-897

Author(s):

M. Maurits ◽

T. Huizinga ◽

M. Reinders ◽

S. Raychaudhuri ◽

E. Karlson ◽

...

Keyword(s):

Machine Learning ◽

Risk Factors ◽

Dimensionality Reduction ◽

High Throughput ◽

Brain Cancer ◽

Machine Learning Techniques ◽

Summary Statistics ◽

Medical Problems ◽

Learning Techniques ◽

Icd Codes

Background: Heterogeneity in disease populations complicates discovery of risk factors. To identify risk factors for subpopulations of diseases, we need analytical methods that can deal with unidentified disease subgroups.

Objectives: Inspired by successful approaches from the Big Data field, we developed a high-throughput approach to identify subpopulations within patients with heterogeneous, complex diseases using the wealth of information available in Electronic Medical Records (EMRs).

Methods: We extracted longitudinal healthcare-interaction records coded by 1,853 PheCodes [1] for the 64,819 patients in Boston’s Partners Biobank. Through dimensionality reduction using t-SNE [2] we created a 2D embedding of 32,424 of these patients (set A). We then identified distinct clusters post-t-SNE using DBSCAN [3] and visualized the relative importance of individual PheCodes within them using specialized spectrographs. We replicated this procedure in the remaining 32,395 records (set B).

Results: Summary statistics of both sets were comparable (Table 1).

Table 1. Summary statistics of the total Partners Biobank dataset and the 2 partitions.

| | Set A | Set B | Total |
|---|---|---|---|
| Entries | 12,200,311 | 12,177,131 | 24,377,442 |
| Patients | 32,424 | 32,395 | 64,819 |
| Patient-years | 369,546.33 | 368,597.92 | 738,144.2 |
| Unique ICD codes | 25,056 | 24,953 | 26,305 |
| Unique PheCodes | 1,851 | 1,853 | 1,853 |

We found 284 clusters in set A and 295 in set B, of which 63.4% from set A could be mapped to a cluster in set B with a median (range) correlation of 0.24 (0.03 – 0.58). Clusters represented similar yet distinct clinical phenotypes; e.g. patients diagnosed with “other headache syndrome” were separated into four distinct clusters characterized by migraines, neurofibromatosis, epilepsy or brain cancer, all resulting in patients presenting with headaches (Fig. 1 & 2). Though EMR databases tend to be noisy, our method was also able to differentiate misclassification from true cases; SLE patients with RA codes clustered separately from true RA cases.

Figure 1. Two-dimensional representation of Set A generated using dimensionality reduction (t-SNE) and clustering (DBSCAN).

Figure 2. Phenotype Spectrographs (PheSpecs) of four clusters characterized by “Other headache syndromes”, driven by codes relating to migraine, epilepsy, neurofibromatosis or brain cancer.

Conclusion: We have shown that EMR data can be used to identify and visualize latent structure in patient categorizations, using an approach based on dimension reduction and clustering machine learning techniques. Our method can identify misclassified patients as well as separate patients with similar problems into subsets with different associated medical problems. Our approach adds a new and powerful tool to aid in the discovery of novel risk factors in complex, heterogeneous diseases.

References:
[1] Denny, J.C. et al. Bioinformatics (2010).
[2] van der Maaten et al. Journal of Machine Learning Research (2008).
[3] Ester, M. et al. Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (1996).

Disclosure of Interests: Marc Maurits: None declared, Thomas Huizinga Grant/research support from: Ablynx, Bristol-Myers Squibb, Roche, Sanofi, Consultant of: Ablynx, Bristol-Myers Squibb, Roche, Sanofi, Marcel Reinders: None declared, Soumya Raychaudhuri: None declared, Elizabeth Karlson: None declared, Erik van den Akker: None declared, Rachel Knevel: None declared
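The clustering step described in this abstract applies DBSCAN to a 2D embedding. The sketch below is a minimal pure-Python DBSCAN meant only to illustrate the idea of density-based clustering on 2D points. It is not the authors' pipeline: the t-SNE step is omitted, and the sample coordinates and the `eps` and `min_pts` values are assumptions chosen for the example.

```python
import math

NOISE = -1


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; NOISE (-1) marks outliers."""
    labels = [None] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels


# Hypothetical 2D embedding coordinates: two dense groups and one outlier.
embedding = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
cluster_labels = dbscan(embedding, eps=1.5, min_pts=3)
print(cluster_labels)
```

Unlike k-means, this approach does not need the number of clusters in advance and leaves isolated points unassigned, which suits noisy data where the number of subgroups is unknown.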


Prediction of Clinical Risk Factors of Diabetes Using Multiple Machine Learning Techniques Resolving Class Imbalance

2020 23rd International Conference on Computer and Information Technology (ICCIT) ◽

10.1109/iccit51783.2020.9392694 ◽

2020 ◽

Author(s):

Kazi Amit Hasan ◽

Md. Al Mehedi Hasan

Keyword(s):

Machine Learning ◽

Risk Factors ◽

Class Imbalance ◽

Clinical Risk Factors ◽

Machine Learning Techniques ◽

Clinical Risk ◽

Learning Techniques


Survival prediction models since liver transplantation - comparisons between Cox models and machine learning techniques

BMC Medical Research Methodology ◽

10.1186/s12874-020-01153-1 ◽

2020 ◽

Vol 20 (1) ◽

Author(s):

Georgios Kantidakis ◽

Hein Putter ◽

Carlo Lancia ◽

Jacob de Boer ◽

Andries E. Braat ◽

...

Keyword(s):

Machine Learning ◽

Risk Factors ◽

Neural Networks ◽

Liver Transplantation ◽

Prediction Models ◽

Machine Learning Techniques ◽

Brier Score ◽

Cox Models ◽

Learning Techniques ◽

Random Survival Forest

Background: Predicting survival of recipients after liver transplantation is regarded as one of the most important challenges in contemporary medicine. Hence, improving on current prediction models is of great interest. There is currently a strong discussion in the medical field about machine learning (ML) and whether it has greater potential than traditional regression models when dealing with complex data. Criticism of ML relates to unsuitable performance measures and lack of interpretability, which is important for clinicians.

Methods: In this paper, ML techniques such as random forests and neural networks are applied to a large dataset of 62,294 patients from the United States, with 97 predictors selected on clinical/statistical grounds from more than 600, to predict survival from transplantation. Of particular interest is also the identification of potential risk factors. A comparison is performed between 3 different Cox models (with all variables, backward selection and LASSO) and 3 machine learning techniques: a random survival forest and 2 partial logistic artificial neural networks (PLANNs). For PLANNs, novel extensions to their original specification are tested. Emphasis is placed on the advantages and pitfalls of each method and on the interpretability of the ML techniques.

Results: Well-established predictive measures from the survival field are employed (C-index, Brier score and Integrated Brier Score) and the strongest prognostic factors are identified for each model. The clinical endpoint is overall graft survival, defined as the time between transplantation and the date of graft failure or death. The random survival forest shows slightly better predictive performance than Cox models based on the C-index. Neural networks show better performance than both Cox models and the random survival forest based on the Integrated Brier Score at 10 years.

Conclusion: In this work, it is shown that machine learning techniques can be a useful tool for both prediction and interpretation in the survival context. Of the ML techniques examined here, PLANN with 1 hidden layer predicts survival probabilities the most accurately, being as calibrated as the Cox model with all variables.

Trial registration: Retrospective data were provided by the Scientific Registry of Transplant Recipients under Data Use Agreement number 9477 for analysis of risk factors after liver transplantation.
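The abstract evaluates models with the C-index and the Brier score. The following sketch shows simplified versions of both so the two measures can be told apart: the C-index assesses ranking (discrimination), the Brier score assesses probability accuracy. This is an illustration under stated assumptions, not the paper's code. In particular, the Brier score here ignores censoring, whereas survival analyses typically use a censoring-weighted variant, and all sample data are hypothetical.

```python
def concordance_index(times, events, risks):
    """Harrell's C: share of comparable pairs ordered correctly by risk.

    times:  observed follow-up time per subject
    events: 1 if the event (e.g. graft failure or death) was observed, 0 if censored
    risks:  model output where a higher value means shorter expected survival
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when i had an observed event before j's time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def brier_score(outcomes, probabilities):
    """Mean squared difference between predicted probability and 0/1 outcome.

    Simplification: assumes every outcome is fully observed (no censoring).
    """
    return sum((p - y) ** 2 for y, p in zip(outcomes, probabilities)) / len(outcomes)


# Hypothetical follow-up data: times in years, third subject is censored.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.3, 0.5, 0.1]
c_index = concordance_index(times, events, risks)

# Hypothetical event indicators at a fixed horizon and predicted probabilities.
brier = brier_score([1, 0, 1, 0], [0.8, 0.2, 0.6, 0.4])
print(c_index, brier)
```

A higher C-index is better (0.5 is chance level), while a lower Brier score is better, so a model can lead on one measure and trail on the other, as the results in this abstract illustrate.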


Classification of Neurodegenerative Disorders Based on Major Risk Factors Employing Machine Learning Techniques

International Journal of Engineering and Technology ◽

10.7763/ijet.2010.v2.146 ◽

2010 ◽

Vol 2 (4) ◽

pp. 350-355 ◽

Cited By ~ 5

Author(s):

Sandhya Joshi ◽

P. Deepa Shenoy ◽

Vibhudendra Simha G.G. ◽

Venugopal K. R ◽

L.M. Patnaik

Keyword(s):

Machine Learning ◽

Risk Factors ◽

Neurodegenerative Disorders ◽

Machine Learning Techniques ◽

Learning Techniques


USE OF MACHINE LEARNING TECHNIQUES TO IDENTIFY RISK FACTORS FOR CARDIAC IMPLANTABLE ELECTRONIC DEVICE (CIED) INFECTION: LESSONS FROM THE WRAP-IT TRIAL

Journal of the American College of Cardiology ◽

10.1016/s0735-1097(20)30920-7 ◽

2020 ◽

Vol 75 (11) ◽

pp. 293

Author(s):

Khaldoun G. Tarakji ◽

Andrew D. Krahn ◽

Jeanne Poole ◽

Suneet Mittal ◽

Charles Kennergren ◽

...

Keyword(s):

Machine Learning ◽

Risk Factors ◽

Electronic Device ◽

Machine Learning Techniques ◽

Cardiac Implantable Electronic Device ◽

Learning Techniques


Cervical Cancer: Machine Learning Techniques for Detection, Risk Factors and Prevention Measures

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.c4316.099320 ◽

2020 ◽

Vol 9 (3) ◽

pp. 158-163

Keyword(s):

Machine Learning ◽

Risk Factors ◽

Cervical Cancer ◽

Preventive Measures ◽

Hpv Infection ◽

Machine Learning Techniques ◽

Future Research ◽

Prevention Measures ◽

Detection Techniques ◽

Learning Techniques

Cervical cancer is considered the fourth most common female malignancy worldwide and represents a major global health challenge. As a result, many proposals and studies have been put forward in recent years. This study aims to analyze the data presented in current research on cervical cancer and to contribute to future research through a literature review based on 3 research questions: Q1: What are the risk factors that cause cervical cancer? Q2: What preventive measures are currently established for cervical cancer? Q3: What are the techniques to detect cervical cancer? Findings show that detection techniques are complementary since they are categorized under machine learning. Therefore, we recommend that further study of these techniques be promoted, as they are helpful in the detection process. In addition, risk factors such as HPV infection, the most relevant factor in the development of cervical cancer, can be considered to broaden the scope of detection. Finally, we suggest conducting further research on preventive measures for cervical cancer.


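Aprendizado de Máquina Aplicado à Predição de Doenças Cardiometabólicas com Utilização de Indicadores Metabólicos e Comportamentais de Risco à Saúde [Machine Learning Applied to the Prediction of Cardiometabolic Diseases Using Metabolic and Behavioral Health Risk Indicators]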

10.14210/cotb.v12.p301-308 ◽

2021 ◽

Author(s):

Alan Lopes de Sousa Freitas ◽

Ana Silvia Degasperi Ieker ◽

Josiane Melchiori Pinheiro ◽

Wilson Rinaldi ◽

Heloise Manica Paris Teixeira

Keyword(s):

Machine Learning ◽

Risk Factors ◽

Logistic Regression ◽

Decision Tree ◽

Causes Of Death ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Cardiometabolic Diseases ◽

Learning Techniques ◽

Good Classification

Cardiometabolic diseases developed throughout a worker’s life, such as hypertension, diabetes, dyslipidemia and obesity, are among the main causes of death and are associated with modifiable and controllable risk factors. The general objective of this study was to apply supervised machine learning techniques and compare their performance in predicting the risk of developing cardiometabolic disease among public employees working at a school hospital in southern Brazil. We sought to map the characteristics of individuals who are more likely to develop cardiometabolic diseases. The machine learning models evaluated were Naive Bayes, Decision Tree, Random Forest, KNN, Logistic Regression and SVM. The results obtained in the experiments showed that some supervised machine learning models produce a good classification, depending on the attributes and hyperparameters used.


Identifying factors associated with roadside work zone collisions using machine learning techniques

Accident Analysis & Prevention ◽

10.1016/j.aap.2021.106203 ◽

2021 ◽

Vol 158 ◽

pp. 106203

Author(s):

Amir A. Nasrollahzadeh ◽

Ardalan R. Sofi ◽

Bahram Ravani

Keyword(s):

Machine Learning ◽

Work Zone ◽

Machine Learning Techniques ◽

Factors Associated ◽

Learning Techniques
