Treatment effect prediction with adversarial deep learning using electronic health records

2020 ◽  
Vol 20 (S4) ◽  
Author(s):  
Jiebin Chu ◽  
Wei Dong ◽  
Jinliang Wang ◽  
Kunlun He ◽  
Zhengxing Huang

Abstract Background Treatment effect prediction (TEP) plays an important role in disease management by ensuring that the expected clinical outcomes are obtained after performing specialized and sophisticated treatments on patients given their personalized clinical status. In recent years, the wide adoption of electronic health records (EHRs) has provided a comprehensive data source for intelligent clinical applications, including the TEP investigated in this study. Method We examined the problem of using a large volume of heterogeneous EHR data to predict treatment effects and developed an adversarial deep treatment effect prediction model to address it. Our model employed two auto-encoders to learn representative and discriminative features of both patient characteristics and treatments from EHR data. The discriminative power of the learned features was further enhanced by decoding the correlational information between the patient characteristics and subsequent treatments by means of a generative adversarial learning strategy. Thereafter, a logistic regression layer was appended on top of the resulting feature representation layer for TEP. Result The proposed model was evaluated on two real clinical datasets collected from the cardiology department of a Chinese hospital. In particular, on the acute coronary syndrome (ACS) dataset, the proposed adversarial deep treatment effect prediction (ADTEP) model (0.662) exhibited 1.4, 2.2, and 6.3% performance gains in terms of the area under the ROC curve (AUC) over deep treatment effect prediction (DTEP) (0.653), logistic regression (LR) (0.648), and support vector machine (SVM) (0.621), respectively. In the heart failure (HF) case study, the proposed ADTEP also outperformed all benchmarks. The experimental results demonstrated that our proposed model achieved competitive performance compared to state-of-the-art models in tackling the TEP problem. Conclusion In this work, we propose a novel model to address the TEP problem by utilizing a large volume of observational data from EHRs. With the adversarial learning strategy, our proposed model can further exploit the correlational information between patient statuses and treatments to extract more robust and discriminative representations of patient samples from their EHR data. Such representations ultimately benefit the model on TEP. The experimental results of two case studies demonstrate the superiority of our proposed method compared to state-of-the-art methods.
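
A minimal sketch (in PyTorch) of the architecture described above: two auto-encoders learn latent features for patient status and treatment, an adversarial discriminator scores whether a (patient, treatment) latent pair is genuinely correlated, and a logistic-regression layer on the concatenated features predicts the treatment effect. All layer sizes, names, and the discriminator wiring are illustrative assumptions, not the authors' exact ADTEP configuration.

```python
# Hedged sketch of a dual-auto-encoder model with an adversarial pair
# discriminator and a logistic-regression outcome head. Dimensions are
# placeholders; the paper's actual ADTEP architecture may differ.
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, in_dim: int, latent_dim: int):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                                     nn.Linear(128, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                                     nn.Linear(128, in_dim))

    def forward(self, x):
        z = self.encoder(x)
        return z, self.decoder(z)

class ADTEPSketch(nn.Module):
    """Two auto-encoders (patient status, treatment), an adversarial
    discriminator trained on real vs. shuffled (patient, treatment) latent
    pairs, and a logistic-regression output layer for treatment effect."""
    def __init__(self, patient_dim=64, treatment_dim=32, latent_dim=16):
        super().__init__()
        self.patient_ae = AutoEncoder(patient_dim, latent_dim)
        self.treatment_ae = AutoEncoder(treatment_dim, latent_dim)
        self.discriminator = nn.Sequential(
            nn.Linear(2 * latent_dim, 32), nn.ReLU(), nn.Linear(32, 1))
        self.outcome_head = nn.Linear(2 * latent_dim, 1)  # LR layer on top

    def forward(self, patient_x, treatment_x):
        zp, patient_recon = self.patient_ae(patient_x)
        zt, treatment_recon = self.treatment_ae(treatment_x)
        pair = torch.cat([zp, zt], dim=-1)
        outcome_logit = self.outcome_head(pair)
        pair_score = self.discriminator(pair)  # adversarial correlation score
        return outcome_logit, pair_score, patient_recon, treatment_recon
```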

2006 ◽  
Vol 45 (03) ◽  
pp. 240-245 ◽  
Author(s):  
A. Shabo

Summary Objectives: This paper pursues the challenge of sustaining lifetime electronic health records (EHRs) based on a comprehensive socio-economic-medico-legal model. The notion of a lifetime EHR extends the emerging concept of a longitudinal and cross-institutional EHR and would provide invaluable information for increasing patient safety and quality of care. Methods: The challenge is how to compile and sustain a coherent EHR across the lifetime of an individual. Several existing and hypothetical models are described, analyzed, and compared in an attempt to suggest a preferred approach. Results: The vision is that lifetime EHRs should be sustained by new players in the healthcare arena, who will function as independent health record banks (IHRBs). Multiple competing IHRBs would be established and regulated following preemptive legislation. They should be owned neither by healthcare providers nor by health insurers/payers or government agencies. The new legislation should also stipulate that the records located in these banks be considered the medico-legal copies of an individual's records, and that healthcare providers no longer serve as the legal record keepers. Conclusions: The proposed model is not centered on any of the current players in the field; instead, it is focused on the objective service of sustaining individual EHRs, much as financial banks maintain and manage financial assets. This revolutionary structure provides two main benefits: 1) healthcare organizations will be able to cut the costs of long-term record keeping, and 2) healthcare providers will be able to provide better care based on the availability of a lifelong EHR for their new patients.


2020 ◽  
Vol 59 (14) ◽  
pp. 1274-1281 ◽  
Author(s):  
Christine B. San Giovanni ◽  
Myla Ebeling ◽  
Robert A. Davis ◽  
C. Shaun Wagner ◽  
William T. Basco

Objective. This study tested the sensitivity of obesity diagnosis in electronic health records (EHRs) using body mass index (BMI) classification and identified variables associated with obesity diagnosis. Methods. Eligible children aged 2 to 18 years had a calculable BMI in 2017 and at least 1 visit in both 2016 and 2017. The sensitivity of clinical obesity diagnosis relative to children's BMI percentile was calculated. Logistic regression was performed to determine variables associated with obesity diagnosis. Results. Analyses included 31 059 children with BMI at or above the 95th percentile. The sensitivity of clinical obesity diagnosis was 35.81%. Clinical obesity diagnosis was more likely if the child had a well visit, had Medicaid insurance, was female, Hispanic, or Black, had a chronic disease diagnosis, or saw a provider in a practice in an urban area or with an academic affiliation. Conclusion. The sensitivity of clinical obesity diagnosis in EHRs is low. Clinical obesity diagnosis is associated with nonmodifiable child-specific factors but also with modifiable practice-specific factors.
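
As a small illustration of the headline metric: because every child in the analytic sample has BMI at or above the 95th percentile, sensitivity reduces to the fraction of those children carrying a clinical obesity diagnosis, i.e. TP / (TP + FN). The sketch below uses hypothetical column names and toy counts chosen to mirror the reported 35.81%.

```python
# Sensitivity of clinical obesity diagnosis among children with BMI >= 95th
# percentile. Column names ('bmi_ge_95th', 'obesity_diagnosed') are
# hypothetical placeholders, not the study's actual variable names.
import pandas as pd

def obesity_diagnosis_sensitivity(df: pd.DataFrame) -> float:
    obese = df[df["bmi_ge_95th"]]              # denominator: TP + FN
    return obese["obesity_diagnosed"].mean()   # TP / (TP + FN)

# Toy data scaled to echo the reported figure (3581 of 10 000 diagnosed):
toy = pd.DataFrame({
    "bmi_ge_95th": [True] * 10_000,
    "obesity_diagnosed": [True] * 3_581 + [False] * 6_419,
})
print(f"sensitivity = {obesity_diagnosis_sensitivity(toy):.2%}")  # 35.81%
```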


10.2196/21798 ◽  
2020 ◽  
Vol 8 (10) ◽  
pp. e21798 ◽  
Author(s):  
Feng Xie ◽  
Bibhas Chakraborty ◽  
Marcus Eng Hock Ong ◽  
Benjamin Alan Goldstein ◽  
Nan Liu

Background Risk scores can be useful in clinical risk stratification and accurate allocation of medical resources, helping health providers improve patient care. Point-based scores are more understandable and explainable than other, more complex models and are now widely used in clinical decision making. However, developing a risk scoring model is nontrivial and has not yet been systematically presented, with few studies investigating methods of clinical score generation using electronic health records. Objective This study aims to propose AutoScore, a machine learning–based automatic clinical score generator consisting of 6 modules for developing interpretable point-based scores. Future users can employ the AutoScore framework to create clinical scores effortlessly in various clinical applications. Methods We proposed the AutoScore framework comprising 6 modules: variable ranking, variable transformation, score derivation, model selection, score fine-tuning, and model evaluation. To demonstrate the performance of AutoScore, we used data from the Beth Israel Deaconess Medical Center to build a scoring model for mortality prediction and then compared it with other baseline models using receiver operating characteristic analysis. A software package in R 3.5.3 (R Foundation) was also developed to demonstrate the implementation of AutoScore. Results Implemented on a data set with 44,918 individual intensive care admission episodes, the AutoScore-created scoring models performed comparably to other standard methods (ie, logistic regression, stepwise regression, least absolute shrinkage and selection operator, and random forest) in terms of predictive accuracy and model calibration but required fewer predictors and offered high interpretability and accessibility. The nine-variable, AutoScore-created, point-based scoring model achieved an area under the curve (AUC) of 0.780 (95% CI 0.764-0.798), whereas the logistic regression model with 24 variables had an AUC of 0.778 (95% CI 0.760-0.795). Moreover, the AutoScore framework also drives the clinical research continuum and automation with its integration of all necessary modules. Conclusions We developed an easy-to-use, machine learning–based automatic clinical score generator, AutoScore; systematically presented its structure; and demonstrated its superiority (predictive performance and interpretability) over other conventional methods using a benchmark database. AutoScore may emerge as a useful scoring tool in various medical applications.
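
To make the score-derivation idea concrete, here is a hedged sketch of the generic point-based recipe that frameworks like AutoScore build on: fit a logistic regression on one-hot category indicators, then rescale the coefficients into small integer points. This illustrates the general technique under simplifying assumptions; the authors' R package implements the full six-module pipeline.

```python
# Generic point-score derivation: logistic-regression coefficients divided by
# the smallest nonzero coefficient and rounded to integers. A sketch of the
# common recipe, not the AutoScore package's exact implementation.
import numpy as np
from sklearn.linear_model import LogisticRegression

def derive_points(X_binary: np.ndarray, y: np.ndarray) -> np.ndarray:
    """X_binary: one-hot indicators of variable categories."""
    model = LogisticRegression(max_iter=1000).fit(X_binary, y)
    coefs = model.coef_.ravel()
    smallest = np.min(np.abs(coefs[coefs != 0]))   # reference unit of risk
    return np.round(coefs / smallest).astype(int)  # integer points per category

# Simulated example: four binary predictors with known effect sizes.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(500, 4))
logits = X @ np.array([0.4, 0.8, 1.2, 0.2]) - 1.0
y = rng.random(500) < 1 / (1 + np.exp(-logits))
print(derive_points(X, y))  # e.g. [2 4 6 1]-style point assignments
```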


Information ◽  
2021 ◽  
Vol 12 (9) ◽  
pp. 374 ◽  
Author(s):  
Babacar Gaye ◽  
Dezheng Zhang ◽  
Aziguli Wulamu

With the extensive availability of social media platforms, Twitter has become a significant tool for acquiring people's views, opinions, attitudes, and emotions toward certain entities. Within this frame of reference, sentiment analysis of tweets has become one of the most fascinating research areas in the field of natural language processing. A variety of techniques have been devised for sentiment analysis, but there is still room for improvement where the accuracy and efficacy of the system are concerned. This study proposes a novel approach that exploits the advantages of a lexical dictionary, machine learning, and deep learning classifiers. We classified the tweets based on the sentiments extracted by TextBlob, using a stacked ensemble of three long short-term memory (LSTM) networks as base classifiers and logistic regression (LR) as a meta-classifier. The proposed model proved to be effective and time-saving, since it does not require manual feature extraction: the LSTMs extract features without any human intervention. We compared our proposed approach with conventional machine learning models such as logistic regression, AdaBoost, and random forest, and also included state-of-the-art deep learning models in the comparison. Experiments were conducted on the sentiment140 dataset and evaluated in terms of accuracy, precision, recall, and F1 score. Empirical results showed that our proposed approach achieved state-of-the-art performance, with an accuracy score of 99%.
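
A minimal sketch of the stacking scheme just described: three LSTM base classifiers output sentiment probabilities that a logistic-regression meta-classifier combines. Vocabulary size, sequence length, layer widths, and training epochs are illustrative assumptions rather than the paper's tuned configuration.

```python
# Stacked ensemble: three LSTMs as base classifiers, LR as meta-classifier.
import numpy as np
from sklearn.linear_model import LogisticRegression
from tensorflow.keras import layers, models

def make_lstm(vocab_size=20_000, units=64):
    model = models.Sequential([
        layers.Embedding(vocab_size, 64),
        layers.LSTM(units),
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model

def fit_stacked_ensemble(X_train, y_train, X_holdout, y_holdout):
    """Train three LSTMs of different widths, then fit LR on their
    held-out predicted probabilities (the meta-features)."""
    bases = [make_lstm(units=u) for u in (32, 64, 128)]
    for m in bases:
        m.fit(X_train, y_train, epochs=2, batch_size=64, verbose=0)
    meta_X = np.hstack([m.predict(X_holdout, verbose=0) for m in bases])
    meta = LogisticRegression().fit(meta_X, y_holdout)
    return bases, meta

def predict(bases, meta, X):
    meta_X = np.hstack([m.predict(X, verbose=0) for m in bases])
    return meta.predict_proba(meta_X)[:, 1]
```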


2018 ◽  
pp. 1-9 ◽  
Author(s):  
Sami-Ramzi Leyh-Bannurah ◽  
Zhe Tian ◽  
Pierre I. Karakiewicz ◽  
Ulrich Wolffgang ◽  
Guido Sauter ◽  
...  

Purpose Entering all information from narrative documentation into databases for clinical research is time consuming, costly, and nearly impossible. Even high-volume databases do not cover all patient characteristics, and the results drawn from them may be limited. A viable new automated solution is machine learning based on deep neural networks applied to natural language processing (NLP), extracting detailed information from narratively written electronic health records (EHRs) such as pathologic radical prostatectomy (RP) reports. Methods Within an RP pathologic database, 3,679 RP EHRs were randomly split into 70% training and 30% test data sets. Training EHRs were automatically annotated, providing a semiautomatically annotated corpus of narratively written pathologic reports with initially context-free gold standard encodings. Primary and secondary Gleason pattern, corresponding percentages, tumor stage, nodal stage, total volume, tumor volume and diameter, and surgical margin were the variables of interest. Next, state-of-the-art NLP techniques were used to train an industry-standard language model for pathologic EHRs by transfer learning. Finally, the accuracy of the named entity extractors was compared with the gold standard encodings. Results Agreement rates (95% confidence interval) were as follows: primary and secondary Gleason pattern, each 91.3% (89.4 to 93.0); corresponding Gleason percentages, 70.5% (67.6 to 73.3) and 80.9% (78.4 to 83.3); tumor stage, 99.3% (98.6 to 99.7); nodal stage, 98.7% (97.8 to 99.3); total volume, 98.3% (97.3 to 99.0); tumor volume, 93.3% (91.6 to 94.8); maximum diameter, 96.3% (94.9 to 97.3); and surgical margin, 98.7% (97.8 to 99.3). Cumulative agreement was 91.3%. Conclusion Our proposed NLP pipeline offers new abilities for precise and efficient data management from narrative documentation for clinical research. The scalable approach potentially allows the NLP pipeline to be generalized to other genitourinary EHRs, tumor entities, and other medical disciplines.
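
A small, self-contained sketch of the evaluation step: the agreement rate between extracted values and gold-standard encodings, with a 95% confidence interval for the binomial proportion. The abstract does not say which interval method the authors used; the Wilson score interval below is one common choice, and the toy labels are fabricated for illustration.

```python
# Agreement rate of a named-entity extractor vs. gold-standard encodings,
# with a 95% Wilson score confidence interval (an assumed interval method).
import math

def agreement_with_ci(extracted, gold, z=1.96):
    n = len(gold)
    p = sum(e == g for e, g in zip(extracted, gold)) / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return p, (center - half, center + half)

# Toy Gleason-pattern labels; corrupt every 20th extraction -> 95% agreement.
gold = ["4", "3", "5", "4", "3"] * 200
extracted = gold.copy()
for i in range(0, len(extracted), 20):
    extracted[i] = "X"

p, (lo, hi) = agreement_with_ci(extracted, gold)
print(f"agreement {p:.1%} (95% CI {lo:.1%} to {hi:.1%})")
```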


2020 ◽  
Vol 16 (2) ◽  
pp. 1-18 ◽  
Author(s):  
Ali Odeh Aljaafreh

This study empirically examines the satisfaction of Ministry of Health employees in Jordan with the electronic health record (EHR) system named HAKEEM. The proposed model assimilates factors from the enriched end-user computing satisfaction (EUCS) model along with self-efficacy as a new predictor. The participants were 463 respondents distributed across public hospitals throughout Jordan. The data were collected by means of a self-administered survey and analyzed using the structural equation modeling (SEM) technique. The findings revealed that EUCS is significantly and positively affected by information quality, system quality, and self-efficacy. The study also seeks to provide empirical results and applicable recommendations for the Ministry of Health and the HAKEEM provider in order to enhance and maximize the benefit of such an EHR.


2019 ◽  
Vol 26 (4) ◽  
pp. 364-379 ◽  
Author(s):  
Theresa A Koleck ◽  
Caitlin Dreisbach ◽  
Philip E Bourne ◽  
Suzanne Bakken

Abstract Objective Natural language processing (NLP) of symptoms from electronic health records (EHRs) could contribute to the advancement of symptom science. We aim to synthesize the literature on the use of NLP to process or analyze symptom information documented in EHR free-text narratives. Materials and Methods Our search of 1964 records from PubMed and EMBASE was narrowed to 27 eligible articles. Data related to the purpose, free-text corpus, patients, symptoms, NLP methodology, evaluation metrics, and quality indicators were extracted for each study. Results Symptom-related information was presented as a primary outcome in 14 studies. EHR narratives represented various inpatient and outpatient clinical specialties, with general, cardiology, and mental health occurring most frequently. Studies encompassed a wide variety of symptoms, including shortness of breath, pain, nausea, dizziness, disturbed sleep, constipation, and depressed mood. NLP approaches included previously developed NLP tools, classification methods, and manually curated rule-based processing. Only one-third (n = 9) of studies reported patient demographic characteristics. Discussion NLP is used to extract information from EHR free-text narratives written by a variety of healthcare providers on an expansive range of symptoms across diverse clinical specialties. The current focus of this field is on the development of methods to extract symptom information and the use of symptom information for disease classification tasks rather than the examination of symptoms themselves. Conclusion Future NLP studies should concentrate on the investigation of symptoms and symptom documentation in EHR free-text narratives. Efforts should be undertaken to examine patient characteristics and make symptom-related NLP algorithms or pipelines and vocabularies openly available.


2020 ◽  
Vol 36 (Supplement_1) ◽  
pp. i436-i444 ◽  
Author(s):  
Mengshi Zhou ◽  
Chunlei Zheng ◽  
Rong Xu

Abstract Motivation Predicting drug–target interactions (DTIs) using human phenotypic data has the potential to eliminate the translational gap between animal experiments and clinical outcomes in humans. One challenge in human phenome-driven DTI prediction is integrating and modeling diverse drug and disease phenotypic relationships. Leveraging large amounts of clinically observed phenotypes of drugs and diseases and electronic health records (EHRs) of 72 million patients, we developed a novel integrated computational drug discovery approach by seamlessly combining DTI prediction and clinical corroboration. Results We developed a network-based DTI prediction system (TargetPredict) by modeling 855 904 phenotypic and genetic relationships among 1430 drugs, 4251 side effects, 1059 diseases, and 17 860 genes. We systematically evaluated TargetPredict in de novo cross-validation and compared it to a state-of-the-art phenome-driven DTI prediction approach. We applied TargetPredict to identify novel repositioned candidate drugs for Alzheimer's disease (AD), a disease affecting over 5.8 million people in the United States. We evaluated the clinical efficacy of top repositioned drug candidates using EHRs of over 72 million patients. The area under the receiver operating characteristic (ROC) curve was 0.97 in de novo cross-validation when evaluated using 910 drugs. TargetPredict outperformed a state-of-the-art phenome-driven DTI prediction system as measured by precision–recall curves [mean average precision (MAP): 0.28 versus 0.23, P-value < 0.0001]. The EHR-based case–control studies showed that prescriptions of top-ranked repositioned drugs are significantly associated with lower odds of AD diagnosis. For example, the prescription of liraglutide, a type 2 diabetes drug, is significantly associated with a decreased risk of AD diagnosis [adjusted odds ratio (AOR): 0.76; 95% confidence interval (CI) (0.70, 0.82), P-value < 0.0001]. In summary, our integrated approach, which seamlessly combines computational DTI prediction with large-scale EHR-based clinical corroboration, has high potential for rapidly identifying novel drug targets and drug candidates for complex diseases. Availability and implementation nlp.case.edu/public/data/TargetPredict.
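
A minimal sketch of the clinical-corroboration step: estimating an adjusted odds ratio (AOR) for AD diagnosis given drug exposure via logistic regression with covariate adjustment. The covariates, variable names, and simulated data below are hypothetical stand-ins; the study's actual case–control design over 72 million EHRs is far richer.

```python
# Adjusted odds ratio for diagnosis given exposure, via logistic regression.
# Simulated data with a true AOR of ~exp(-0.27) ~ 0.76, echoing the reported
# liraglutide result; age is the only (hypothetical) confounder adjusted for.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 5000
exposure = rng.integers(0, 2, n)    # e.g., drug prescription (0/1)
age = rng.normal(70, 8, n)          # stand-in covariate
logit = -1.0 - 0.27 * exposure + 0.03 * (age - 70)
ad = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X = sm.add_constant(np.column_stack([exposure, age]))
fit = sm.Logit(ad, X).fit(disp=0)
aor = np.exp(fit.params[1])         # exponentiated exposure coefficient
ci = np.exp(fit.conf_int()[1])      # its 95% confidence interval
print(f"AOR {aor:.2f} (95% CI {ci[0]:.2f}, {ci[1]:.2f})")
```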

