Predictive Modeling of Psychiatric Illness using Electronic Health Records and a Novel Machine Learning Approach with Artificial Intelligence

2020 ◽  
Author(s):  
Matthew David Nemesure ◽  
Michael Heinz ◽  
Raphael Huang ◽  
Nicholas C. Jacobson

Background: Generalized anxiety disorder (GAD) and major depressive disorder (MDD) are highly prevalent and impairing problems, but frequently go undetected, leading to substantial treatment delays. Electronic medical records (EMRs) collect many biometric markers and patient characteristics that could foster the detection of GAD and MDD in primary care settings. Methods: We approach the problem of predicting MDD and GAD using a novel machine learning pipeline. The pipeline constitutes an ensemble of algorithmically distinct machine learning methods, including deep learning. A sample of 4,184 undergraduate students completed the study, undergoing a general health screening and completing a psychiatric assessment for MDD and GAD. Using 59 biomedical and demographic features from the general health survey and an additional set of engineered features, we trained the model to predict GAD and MDD. Results: We assessed the model's performance on a held-out test set and found an AUC of 0.72 and 0.66 for GAD and MDD, respectively. Additionally, we used advanced techniques (Shapley values) to illuminate which features had the greatest impact on prediction for each disease. The top predictive features for MDD were “difficulty memorizing lessons”, “financial difficulties” and “alcohol consumption”. The top predictive features for GAD were the necessity for a control examination, being overweight/obese, and irregular meal consumption. Conclusions: Our results indicate a successful application of machine learning methods in detection of GAD and MDD based on EMR-like data. By identifying biomarkers of GAD and MDD, these results may be used in future research to aid in the early detection of MDD and GAD.

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Matthew D. Nemesure ◽  
Michael V. Heinz ◽  
Raphael Huang ◽  
Nicholas C. Jacobson

Abstract: Generalized anxiety disorder (GAD) and major depressive disorder (MDD) are highly prevalent and impairing problems, but frequently go undetected, leading to substantial treatment delays. Electronic health records (EHRs) collect many biometric markers and patient characteristics that could foster the detection of GAD and MDD in primary care settings. We approached the problem of predicting MDD and GAD using a novel machine learning pipeline to re-analyze data from an observational study. The pipeline constitutes an ensemble of algorithmically distinct machine learning methods, including deep learning. A sample of 4,184 undergraduate students completed the study, undergoing a general health screening and completing a psychiatric assessment for MDD and GAD. After explicitly excluding all psychiatric information, 59 biomedical and demographic features from the general health survey in addition to a set of engineered features were used for model training. We assessed the model's performance on a held-out test set and found an AUC of 0.73 (sensitivity: 0.66, specificity: 0.7) and 0.67 (sensitivity: 0.55, specificity: 0.7) for GAD and MDD, respectively. Additionally, we used advanced techniques (SHAP values) to illuminate which features had the greatest impact on prediction for each disease. The top predictive features for MDD were being satisfied with living conditions and having public health insurance. The top predictive features for GAD were vaccinations being up to date and marijuana use. Our results indicate moderate predictive performance for the application of machine learning methods in detection of GAD and MDD based on EHR data. By identifying important predictors of GAD and MDD, these results may be used in future research to aid in the early detection of MDD and GAD.
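
The sensitivity and specificity figures quoted in this abstract are derived from a confusion matrix at some fixed decision threshold. As a minimal sketch of that arithmetic only (the function name and the toy labels below are hypothetical and are not taken from the study or its pipeline), the two quantities can be computed as follows:

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    # Sensitivity: share of actual positives that were detected.
    # Specificity: share of actual negatives that were correctly cleared.
    return tp / (tp + fn), tn / (tn + fp)


# Toy labels for illustration only (hypothetical, not the study's data).
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Unlike AUC, both values depend on the threshold used to turn predicted probabilities into 0/1 predictions, which is why they are usually reported alongside it.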


Author(s):  
Paul van Gent ◽  
Timo Melman ◽  
Haneen Farah ◽  
Nicole van Nes ◽  
Bart van Arem

The present study aims to add to the literature on driver workload prediction using machine learning methods. The main aim is to develop workload prediction on a multi-level basis, rather than a binary high/low distinction as often found in literature. The presented approach relies on measures that can be obtained unobtrusively in the driving environment with off-the-shelf sensors, and on machine learning methods that can be implemented in low-power embedded systems. Two simulator studies were performed, one inducing workload using realistic driving conditions, and one inducing workload with a relatively demanding lane-keeping task. Individual and group-based machine learning models were trained on both datasets and evaluated. For the group-based models the generalizing capability, that is the performance when predicting data from previously unseen individuals, was also assessed. Results show that multi-level workload prediction on the individual and group level works well, achieving high correct rates and accuracy scores. Generalizing between individuals proved difficult using realistic driving conditions but worked well in the highly demanding lane-keeping task. Reasons for this discrepancy are discussed as well as future research directions.


2021 ◽  
Vol 21 ◽  
Author(s):  
Han Yu ◽  
Zi-Ang Shen ◽  
Yuan-Ke Zhou ◽  
Pu-Feng Du

Long non-coding RNAs (lncRNAs) are a type of RNA with little or no protein-coding ability. Their length is more than 200 nucleotides. A large number of studies have indicated that lncRNAs play a significant role in various biological processes, including chromatin organization, epigenetic programming, transcriptional regulation, post-transcriptional processing, and circadian mechanisms at the cellular level. Since lncRNAs perform many of their functions through interactions with proteins, identifying lncRNA-protein interactions is crucial to understanding the molecular functions of lncRNAs. However, because experimental methods are costly and time-consuming, a variety of computational methods have emerged. Recently, many effective and novel machine learning methods have been developed. In general, these methods fall into two categories: semi-supervised learning methods and supervised learning methods. The latter category can be further classified into deep learning-based methods, ensemble learning-based methods, and hybrid methods. In this paper, we focused on supervised learning methods. We summarized the state-of-the-art methods in predicting lncRNA-protein interactions. Furthermore, the performance and the characteristics of different methods have also been compared in this work. Considering the limits of the existing models, we analyzed the problems and discussed future research potentials.


2007 ◽  
Vol 33 (3) ◽  
pp. 397-427 ◽  
Author(s):  
Raquel Fernández ◽  
Jonathan Ginzburg ◽  
Shalom Lappin

In this article we use well-known machine learning methods to tackle a novel task, namely the classification of non-sentential utterances (NSUs) in dialogue. We introduce a fine-grained taxonomy of NSU classes based on corpus work, and then report on the results of several machine learning experiments. First, we present a pilot study focused on one of the NSU classes in the taxonomy—bare wh-phrases or “sluices”—and explore the task of disambiguating between the different readings that sluices can convey. We then extend the approach to classify the full range of NSU classes, obtaining results of around an 87% weighted F-score. Thus our experiments show that, for the taxonomy adopted, the task of identifying the right NSU class can be successfully learned, and hence provide a very encouraging basis for the more general enterprise of fully processing NSUs.


2008 ◽  
Vol 17 (2) ◽  
pp. 121-142 ◽  
Author(s):  
Guido Heumer ◽  
Heni Ben Amor ◽  
Bernhard Jung

This paper presents a comparison of various machine learning methods applied to the problem of recognizing grasp types involved in object manipulations performed with a data glove. Conventional wisdom holds that data gloves need calibration in order to obtain accurate results. However, calibration is a time-consuming process, inherently user-specific, and its results are often not perfect. In contrast, the present study aims at evaluating recognition methods that do not require prior calibration of the data glove. Instead, raw sensor readings are used as input features that are directly mapped to different categories of hand shapes. An experiment was carried out in which test persons wearing a data glove had to grasp physical objects of different shapes corresponding to the various grasp types of the Schlesinger taxonomy. The collected data was comprehensively analyzed using numerous classification techniques provided in an open-source machine learning toolbox. Evaluated machine learning methods are composed of (a) 38 classifiers including different types of function learners, decision trees, rule-based learners, Bayes nets, and lazy learners; (b) data preprocessing using principal component analysis (PCA) with varying degrees of dimensionality reduction; and (c) five meta-learning algorithms under various configurations where selection of suitable base classifier combinations was informed by the results of the foregoing classifier evaluation. Classification performance was analyzed in six different settings, representing various application scenarios with differing generalization demands. The results of this work are twofold: (1) We show that a reasonably good to highly reliable recognition of grasp types can be achieved—depending on whether or not the glove user is among those training the classifier—even with uncalibrated data gloves. (2) We identify the best performing classification methods for the recognition of various grasp types. 
To conclude, cumbersome calibration processes before productive usage of data gloves can be spared in many situations.


2020 ◽  
Author(s):  
Toni Lange ◽  
Guido Schwarzer ◽  
Thomas Datzmann ◽  
Harald Binder

Abstract. Background: Updating systematic reviews is often a time-consuming process involving a lot of human effort and is therefore not carried out as often as it should be. Our aim was therefore to explore the potential of machine learning methods to reduce the human workload, and to particularly also gauge the performance of deep learning methods as compared to more established machine learning methods. Methods: We used three available reviews of diagnostic test studies as data basis. In order to identify relevant publications we used typical text pre-processing methods. The reference standard for the evaluation was the human-consensus based binary classification (inclusion, exclusion). For the evaluation of models various scenarios were generated using a grid of combinations of data preprocessing steps. Furthermore, we evaluated each machine learning approach with an approach-specific predefined grid of tuning parameters using the Brier score metric. Results: The best performance was obtained with an ensemble method for two of the reviews, and by a deep learning approach for the other review. Yet, the final performance of approaches is seen to strongly depend on data preparation. Overall, machine learning methods provided reasonable classification. Conclusion: It seems possible to reduce the human workload in updating systematic reviews by using machine learning methods. Yet, as the influence of data preprocessing on the final performance seems to be at least as important as choosing the specific machine learning approach, users should not blindly expect good performance just by using approaches from a popular class, such as deep learning.
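
The Brier score named in this abstract as the tuning metric is the mean squared difference between predicted probabilities and the observed binary outcomes, so lower is better. The following is a minimal sketch of that standard definition, assuming labels coded 1 for inclusion and 0 for exclusion; the function name and the example values are hypothetical and do not reproduce the authors' code or data:

```python
def brier_score(probabilities, labels):
    """Mean squared error between predicted probabilities and 0/1 labels."""
    if len(probabilities) != len(labels):
        raise ValueError("probabilities and labels must have the same length")
    if not labels:
        raise ValueError("at least one prediction is required")
    squared_errors = [(p - y) ** 2 for p, y in zip(probabilities, labels)]
    return sum(squared_errors) / len(labels)


# Hypothetical predicted inclusion probabilities for four publications.
predicted = [0.9, 0.2, 0.6, 0.1]
included = [1, 0, 1, 0]
print(brier_score(predicted, included))
```

A perfectly confident and correct classifier scores 0, while always predicting 0.5 scores 0.25 regardless of the labels, which gives a rough baseline when comparing tuning configurations.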


Author(s):  
Andrius Daranda ◽  
Gintautas Dzemyda

Machine learning is compelling in solving various applied problems. Nevertheless, machine learning methods lack contextual reasoning capabilities and cannot readily utilize additional information about circumstances, environments, backgrounds, etc. Such information provides essential knowledge about possible reasons for particular actions. This knowledge cannot be processed directly by machine learning methods. This paper presents a context-aware machine learning approach for contextual reasoning analysis of actor behavior and context-based prediction for threat assessment. Moreover, the proposed approach uses context-aware prediction to tackle the interaction between actors. The idea of the technique lies in the cooperative use of two classification methods: the first predicts an actor’s behavior, and the second flags predicted actions (behaviors) that are non-typical or unusual. Integrating the two methods allows an actor to make a self-aware threat assessment based on relations between different actors, where multidimensional numerical data define the connections. This approach predicts the possible further situation and makes its threat assessment without waiting for future actions. The suggested approach is based on the Decision Tree and Support Vector Method algorithms. Due to the complexity of context, marine traffic data was chosen to demonstrate the capability of the proposed approach. This technique could support an end-to-end approach for safe vessel navigation in maritime traffic with considerable ship congestion.


2021 ◽  
Author(s):  
Richard Frankel ◽  
Jared Jennings ◽  
Joshua Lee

We compare the ability of dictionary-based and machine-learning methods to capture disclosure sentiment at 10-K filing and conference-call dates. Like Loughran and McDonald [Loughran T, McDonald B (2011) When is a liability not a liability? Textual analysis, dictionaries, and 10-Ks. J. Finance 66(1):35–65.], we use returns to assess sentiment. We find that measures based on machine learning offer a significant improvement in explanatory power over dictionary-based measures. Specifically, machine-learning measures explain returns at 10-K filing dates, whereas measures based on the Loughran and McDonald dictionary only explain returns at 10-K filing dates during the time period of their study. Moreover, at conference-call dates, machine-learning methods offer an improvement over the Loughran and McDonald dictionary method of a greater magnitude than the improvement of the Loughran and McDonald dictionary over the Harvard Psychosociological Dictionary. We further find that the random-forest-regression-tree method better captures disclosure sentiment than alternative algorithms, simplifying the application of the machine-learning approach. Overall, our results suggest that machine-learning methods offer an easily implementable, more powerful, and reliable measure of disclosure sentiment than dictionary-based methods. This paper was accepted by Brian Bushee, management science.


2021 ◽  
Vol 23 ◽  
Author(s):  
Xiong Li ◽  
Yangping Qiu ◽  
Juan Zhou ◽  
Ziruo Xie

Background: Recent developments in neuroimaging and genetic testing technologies have made it possible to measure pathological features associated with Alzheimer's disease (AD) in vivo. Mining potential molecular markers of AD from high-dimensional, multi-modal neuroimaging and omics data will provide a new basis for early diagnosis and intervention in AD. In order to discover the real pathogenic mutation and even understand the pathogenic mechanism of AD, many machine learning methods have been designed and successfully applied to the analysis and processing of large-scale AD biomedical data. Objective: To introduce and summarize the applications and challenges of machine learning methods in Alzheimer's disease multi-source data analysis. Methods: The literature selected in the review was obtained from Google Scholar, PubMed, and Web of Science. The keywords for literature retrieval include Alzheimer's disease, bioinformatics, image genetics, genome-wide association research, molecular interaction network, multi-omics data integration, and so on. Conclusion: This study comprehensively introduces machine learning-based processing techniques for AD neuroimaging data and then shows the progress of computational analysis methods in omics data, such as the genome, proteome, and so on. Subsequently, machine learning methods for AD imaging analysis are also summarized. Finally, we elaborate on the current emerging technology of multi-modal neuroimaging and multi-omics data joint analysis, and present some outstanding issues and future research directions.

