Development of deep learning models for predicting in-hospital mortality using an administrative claims database (Preprint)

Background: Current guidelines suggest that low-risk pulmonary embolism (PE) patients may be managed as outpatients or with an abbreviated hospital stay. There is a need for a claims-based prediction rule that payers and hospitals can use to efficiently risk-stratify PE patients. The authors recently derived such a rule and found it to have high sensitivity and moderate specificity for predicting in-hospital mortality. Objective: To validate the In-hospital Mortality for PulmonAry embolism using Claims daTa (IMPACT) prediction rule, originally developed in a commercial claims database, in an all-payer administrative database restricted to inpatient claims. Methods: This study used data from the 2012 Healthcare Cost and Utilization Project Nationwide Inpatient Sample (NIS). Adult PE admissions were identified by an appropriate International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) code. The code had to appear either in the primary position, or in a secondary position when accompanied by a primary code for a PE complication. The IMPACT rule consists of age plus 11 weighted comorbidities: myocardial infarction, chronic lung disease, stroke, prior major bleeding, atrial fibrillation, cognitive impairment, heart failure, renal failure, liver disease, coagulopathy and cancer. These comorbidities were identified from the up to 25 ICD-9-CM diagnosis codes and 25 procedure codes reported for each discharge in the NIS. The rule was used to estimate each patient's risk of in-hospital mortality, and low risk was defined as an estimated in-hospital mortality ≤1.5%. We assessed the validity of the rule by calculating prognostic test characteristics with 95% confidence intervals (CIs). To estimate the potential cost savings from early discharge, we calculated the difference in total hospital costs between low-risk patients with and without an abbreviated hospital stay (defined as ≤1, ≤2 or ≤3 days).
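The Methods name prognostic test characteristics with 95% CIs but do not say which interval method was used. Below is a minimal Python sketch that assumes Wilson score intervals. It treats "higher risk" as a positive test and "in-hospital death" as the event. The 2x2 cell counts are approximate values back-calculated from the totals reported in the Results (34,108 admissions, 3.4% mortality, 11,025 low-risk admissions with 0.8% mortality). They are not the study's published cell counts.

```python
import math
from statistics import NormalDist


def wilson_ci(successes, n, level=0.95):
    """Wilson score interval for a binomial proportion (assumed CI method)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def prognostic_characteristics(tp, fp, fn, tn):
    """Test characteristics of a rule that labels 'higher risk' as positive.

    tp: died, classified higher-risk   fn: died, classified low-risk
    tn: survived, classified low-risk  fp: survived, classified higher-risk
    Returns {name: (estimate, ci_low, ci_high)}.
    """
    counts = {
        "sensitivity": (tp, tp + fn),
        "specificity": (tn, tn + fp),
        "ppv": (tp, tp + fp),
        "npv": (tn, tn + fn),
    }
    return {name: (k / n, *wilson_ci(k, n)) for name, (k, n) in counts.items()}


# Approximate counts back-calculated from the reported percentages (hypothetical)
stats = prognostic_characteristics(tp=1072, fp=22011, fn=88, tn=10937)
for name, (est, lo, hi) in stats.items():
    print(f"{name}: {100 * est:.1f}% (95% CI {100 * lo:.1f}-{100 * hi:.1f})")
```

With these reconstructed counts, the point estimates round to the reported sensitivity, specificity, NPV and PPV. This is a consistency check only; the study's own CI method may differ.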
Results: A total of 34,108 admissions for PE were included (46.7% male; mean ± standard deviation age, 61.9 ± 17.2 years), with an in-hospital PE case-fatality rate of 3.4%. The IMPACT prediction rule classified 11,025 (32.3%) admissions as low-risk. For in-hospital mortality, the rule had:
- a sensitivity of 92.4% (95% CI 90.7-93.8);
- a specificity of 33.2% (95% CI 32.7-33.7);
- a negative predictive value of 99.2% (95% CI 99.0-99.4);
- a positive predictive value of 4.6% (95% CI 4.4-4.9);
- a C-statistic of 0.74 (95% CI 0.73-0.76).

Compared with patients classified as higher-risk, low-risk patients had significantly lower in-hospital mortality (0.8% vs. 4.6%; odds reduction 83%, 95% CI 79-87). They also had a shorter length of stay (LOS; -1.2 days, p<0.001) and lower total treatment costs (-$3,074, p<0.001). Of low-risk patients, 13.1%, 31.1% and 47.7% were discharged within 1, 2 and 3 days of admission, respectively. Low-risk patients discharged within 1 day accrued $5,465 (95% CI $5,018-$5,911) less in treatment costs than those staying longer. Discharge within 2 or 3 days was also associated with lower hospital treatment costs than longer stays [$5,820 (95% CI $5,506-$6,133) and $6,314 (95% CI $6,031-$6,597) less, respectively]. Conclusion: The previously derived claims-based in-hospital mortality prediction rule was valid when applied to this all-payer, inpatient-only administrative claims database. The rule classified patients' mortality risk with high sensitivity and a high negative predictive value. It may therefore be valuable to those wishing to benchmark rates of PE treated at home or after an abbreviated hospital admission.

Disclosures: Coleman: Janssen Scientific Affairs, LLC: Consultancy, Research Funding. Crivera: Janssen Scientific Affairs, LLC: Employment, Equity Ownership. Schein: Janssen Scientific Affairs, LLC: Employment. Peacock: Singulex: Consultancy; Prevencio: Consultancy; The Medicines Company: Consultancy, Research Funding; Roche: Consultancy, Research Funding; Portola: Consultancy, Research Funding; Janssen Pharmaceuticals: Consultancy, Research Funding; Cardiorentis: Research Funding; Banyan: Research Funding; Alere: Research Funding; Abbott: Research Funding; Comprehensive Research Associates, LLC: Equity Ownership; Emergencies in Medicine, LLC: Equity Ownership.
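The 83% odds reduction in the IMPACT results is one minus the odds ratio of death for low-risk versus higher-risk patients. The short Python check below uses only the rounded mortality percentages. It therefore reproduces the point estimate approximately, but not the confidence interval, which requires the underlying counts.

```python
def odds(p):
    """Convert a probability to odds."""
    return p / (1 - p)


def odds_reduction(p_low, p_high):
    """1 - odds ratio of the low-risk group relative to the higher-risk group."""
    return 1 - odds(p_low) / odds(p_high)


# Reported in-hospital mortality: 0.8% (low-risk) vs 4.6% (higher-risk)
reduction = odds_reduction(0.008, 0.046)
print(f"odds reduction ~ {100 * reduction:.0f}%")
```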

Validation of the multivariable In-hospital Mortality for PulmonAry embolism using Claims daTa (IMPACT) prediction rule within an all-payer inpatient administrative claims database

BMJ Open ◽

10.1136/bmjopen-2015-009251 ◽

2015 ◽

Vol 5 (10) ◽

pp. e009251 ◽

Cited By ~ 12

Author(s):

Craig I Coleman ◽

Christine G Kohn ◽

Concetta Crivera ◽

Jeffrey R Schein ◽

W Frank Peacock

Keyword(s):

Pulmonary Embolism ◽

Hospital Mortality ◽

Claims Data ◽

Prediction Rule ◽

Administrative Claims ◽

Claims Database ◽

Impact Prediction

Development of Deep Learning Models for Predicting in-Hospital Mortality using an Administrative Claims Database

10.21203/rs.3.rs-176518/v1 ◽

2021 ◽

Author(s):

Hiroki Matsui ◽

Hayato Yamana ◽

Kiyohide Fushimi ◽

Hideo Yasunaga

Keyword(s):

Deep Learning ◽

Hospital Mortality ◽

Prediction Models ◽

Calibration Plot ◽

Administrative Claims ◽

Operating Characteristics ◽

Discrimination Ability ◽

Fully Connected ◽

Main Model ◽

Disease Specific

Background: To develop and validate deep learning–based prediction models for in-hospital mortality of acute-care patients. Methods: The main model was developed using only administrative claims data (age, sex, diagnoses, and procedures on the day of admission). We also constructed disease-specific models for acute myocardial infarction, heart failure, stroke, and pneumonia using common severity indices for these diseases. Using Japanese Diagnosis Procedure Combination data from July 2010 to March 2017, we identified 46,665,933 inpatients and divided them into derivation and validation cohorts in a 95:5 ratio. The main model was a 9-layer deep neural network with four hidden dense layers of 1,000 nodes each, fully connected to adjacent layers. We evaluated discrimination by the area under the receiver operating characteristic curve (AUC) and calibration by calibration plots. Results: Among the eligible patients, 2,005,035 (4.3%) died. Discrimination and calibration of the models were satisfactory. The AUC of the main model in the validation cohort was 0.954 (95% confidence interval 0.9537–0.9547). The main model had higher discrimination ability than the disease-specific models. Conclusions: Our deep learning-based model using diagnoses and procedures produced valid predictions of in-hospital mortality.
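The abstract measures discrimination with the AUC, also called the C-statistic. The AUC is the probability that a randomly chosen patient who died received a higher predicted risk than a randomly chosen survivor. The pairwise (Mann-Whitney) form below computes that definition directly, using hypothetical toy risks. It is quadratic in the number of patients and intended for illustration only. At the scale of 46 million inpatients, a rank-based O(n log n) method or a library routine would be needed.

```python
from itertools import product


def c_statistic(scores, labels):
    """AUC (C-statistic) via the Mann-Whitney pairwise definition.

    For every (death, survivor) pair, count 1 if the death had the higher
    predicted risk and 0.5 for a tie; divide by the number of pairs.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one death and one survivor")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical predicted risks; 1 = in-hospital death)
risk = [0.9, 0.8, 0.4, 0.3, 0.2, 0.1]
died = [1, 0, 1, 0, 0, 0]
print(c_statistic(risk, died))
```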

Development of deep learning models for predicting in-hospital mortality using an administrative claims database (Preprint)

10.2196/preprints.27936 ◽

2021 ◽

Author(s):

Hiroki Matsui ◽

Hayato Yamana ◽

Kiyohide Fushimi ◽

Hideo Yasunaga

Keyword(s):

Deep Learning ◽

Hospital Mortality ◽

Prediction Models ◽

Administrative Databases ◽

Calibration Plot ◽

Administrative Claims ◽

Operating Characteristics ◽

Discrimination Ability ◽

Main Model ◽

Disease Specific

BACKGROUND Administrative claims databases have been widely used in studies because they have large sample sizes and are easily available. However, administrative databases lack information on disease severity, so a risk adjustment method needs to be developed. OBJECTIVE To develop and validate deep learning–based prediction models for in-hospital mortality of acute-care patients. METHODS The main model was developed using only administrative claims data (age, sex, diagnoses, and procedures on the day of admission). We also constructed disease-specific models for acute myocardial infarction, heart failure, stroke, and pneumonia using common severity indices for these diseases. Using Japanese Diagnosis Procedure Combination data from July 2010 to March 2017, we identified 46,665,933 inpatients and divided them into derivation and validation cohorts in a 95:5 ratio. The main model was a 9-layer deep neural network with four hidden dense layers of 1,000 nodes each, fully connected to adjacent layers. We evaluated discrimination by the area under the receiver operating characteristic curve (AUC) and calibration by calibration plots. RESULTS Among the eligible patients, 2,005,035 (4.3%) died. Discrimination and calibration of the models were satisfactory. The AUC of the main model in the validation cohort was 0.954 (95% confidence interval 0.9537–0.9547). The main model had higher discrimination ability than the disease-specific models. CONCLUSIONS Our deep learning-based model using diagnoses and procedures produced valid predictions of in-hospital mortality.
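The abstract assesses calibration with calibration plots but does not say how patients were grouped. One common approach, assumed in the sketch below, works in three steps:
1. Sort patients by predicted risk.
2. Split them into equal-size groups (for example, deciles).
3. Plot each group's mean predicted risk against its observed death rate.

The data are hypothetical toy values.

```python
def calibration_table(predicted, observed, n_groups=10):
    """Equal-size risk groups for a calibration plot (assumed grouping scheme).

    Returns (mean predicted risk, observed event rate, group size) per group.
    A well-calibrated model gives points near the 45-degree line.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    pairs = sorted(zip(predicted, observed))  # order patients by predicted risk
    size = len(pairs)
    table = []
    for g in range(n_groups):
        group = pairs[g * size // n_groups:(g + 1) * size // n_groups]
        if not group:
            continue  # more groups than patients
        mean_pred = sum(p for p, _ in group) / len(group)
        obs_rate = sum(y for _, y in group) / len(group)
        table.append((mean_pred, obs_rate, len(group)))
    return table


# Toy data (hypothetical): predicted death risk and observed outcome (1 = died)
pred = [0.1, 0.1, 0.2, 0.2, 0.8, 0.8, 0.9, 0.9]
died = [0, 0, 0, 1, 1, 1, 1, 0]
for mean_pred, obs_rate, n in calibration_table(pred, died, n_groups=2):
    print(f"predicted {mean_pred:.2f}  observed {obs_rate:.2f}  (n={n})")
```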

Levenshtein Augmentation Improves Performance of SMILES Based Deep-Learning Synthesis Prediction

10.26434/chemrxiv.12562121 ◽

2020 ◽

Author(s):

Dean Sumner ◽

Jiazhen He ◽

Amol Thakkar ◽

Ola Engkvist ◽

Esben Jannik Bjerrum

Keyword(s):

Neural Networks ◽

Pattern Recognition ◽

Deep Learning ◽

Recurrent Neural Networks ◽

Data Augmentation ◽

State Of The Art ◽

Sequence Similarity ◽

Learning Models ◽

Underlying Network

SMILES randomization, a form of data augmentation, has previously been shown to increase the performance of deep learning models compared with non-augmented baselines. Here, we propose a novel data augmentation method, which we call "Levenshtein augmentation", that considers local SMILES sub-sequence similarity between reactants and their respective products when creating training pairs. We tested Levenshtein augmentation with two state-of-the-art models: a transformer and a sequence-to-sequence recurrent neural network with attention. Levenshtein augmentation increased performance over both non-augmented data and conventional SMILES-randomization-augmented data when used to train the baseline models. Furthermore, Levenshtein augmentation appears to produce what we define as attentional gain, an enhancement in the underlying network's ability to recognize molecular motifs.
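The abstract does not specify how the "local sub-sequence similarity" between reactant and product SMILES is scored. The sketch below makes two assumptions:
- it uses plain Levenshtein (edit) distance over whole strings, so the paper's local variant may differ;
- the candidate reactant spellings are hypothetical and supplied by the caller (generating random SMILES would need a cheminformatics toolkit).

The core idea is to prefer the reactant spelling that is closest to the product spelling.

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


def closest_variant(candidates, target):
    """Return the candidate SMILES string with the smallest edit distance to target."""
    return min(candidates, key=lambda s: levenshtein(s, target))


# Hypothetical: two equivalent SMILES spellings of ethanol paired with acetaldehyde
variants = ["OCC", "CCO"]
print(closest_variant(variants, "CC=O"))
```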

Improving the Accuracy of Protein-Ligand Binding Affinity Prediction by Deep Learning Models: Benchmark and Model

10.26434/chemrxiv.9866912 ◽

2019 ◽

Author(s):

Mohammad Rezaei ◽

Yanjun Li ◽

Xiaolin Li ◽

Chenglong Li

Keyword(s):

Deep Learning ◽

Drug Design ◽

Binding Affinity ◽

Benchmark Dataset ◽

Rational Drug Design ◽

Learning Models ◽

Structure Based Drug Design ◽

Binding Affinity Prediction ◽

Affinity Prediction ◽

Rational Drug

Introduction: The ability to discriminate among ligands binding to the same protein target in terms of their relative binding affinity lies at the heart of structure-based drug design. Any improvement in the accuracy and reliability of binding affinity prediction methods reduces the discrepancy between experimental and computational results. Objectives: The primary objectives were threefold:
- to identify the features most relevant to binding affinity prediction;
- to minimize manual feature engineering;
- to improve the reliability of binding affinity prediction by tuning the hyperparameters of efficient deep learning models.

Methods: The binding site of each target protein was represented as a grid box around its bound ligand. Both binary and distance-dependent occupancy schemes were examined to model how an atom affects its neighboring voxels in this grid. The input was represented by a combination of features, including ANOLEA, ligand elements, and Arpeggio atom types. An efficient convolutional neural network (CNN) architecture, DeepAtom, was developed, trained and tested on the PDBbind v2016 dataset. Additionally, an extended benchmark dataset was compiled to train and evaluate the models. Results: The best DeepAtom model showed improved accuracy in binding affinity prediction on the PDBbind core subset (Pearson's R=0.83), outperforming recent state-of-the-art models in this field. In addition, when trained on our proposed benchmark dataset, DeepAtom yielded a higher correlation than the baseline, which supports the value of our model. Conclusions: These promising results for predicted binding affinities are expected to pave the way for embedding deep learning models in virtual screening and rational drug design.
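The Methods contrast binary and distance-dependent voxel occupancy without giving formulas. The pure-Python sketch below rests on several assumptions:
- atoms are given as (x, y, z, radius) tuples;
- the grid is a cubic box centred on the origin (the ligand);
- the distance-dependent case uses a smooth decay, 1 - exp(-(r/d)^12), which is not necessarily DeepAtom's function.

A real pipeline would build one channel per feature type (e.g. per Arpeggio atom type) and use vectorized array code.

```python
import math


def occupancy_grid(atoms, box_size=8.0, resolution=1.0, mode="binary"):
    """Voxelize atoms into an n x n x n grid centred on the origin.

    mode="binary": voxel is 1.0 if its centre lies within any atom's radius.
    mode="distance": voxel is the max over atoms of 1 - exp(-(r / d) ** 12),
    an assumed smooth distance-dependent occupancy.
    """
    if mode not in ("binary", "distance"):
        raise ValueError(f"unknown mode: {mode}")
    n = int(round(box_size / resolution))
    half = box_size / 2
    grid = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                # Cartesian coordinates of this voxel's centre
                centre = tuple(-half + (idx + 0.5) * resolution for idx in (i, j, k))
                best = 0.0
                for x, y, z, r in atoms:
                    d = math.dist(centre, (x, y, z))
                    if mode == "binary":
                        value = 1.0 if d <= r else 0.0
                    elif d == 0.0:
                        value = 1.0  # atom centre coincides with voxel centre
                    else:
                        value = 1.0 - math.exp(-((r / d) ** 12))
                    best = max(best, value)
                grid[i][j][k] = best
    return grid


# Hypothetical single atom of radius 1.0 placed at a voxel centre
atom = [(0.5, 0.5, 0.5, 1.0)]
binary = occupancy_grid(atom, mode="binary")
smooth = occupancy_grid(atom, mode="distance")
print(binary[4][4][4], smooth[4][4][4], smooth[0][0][0])
```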

Data science in economics: comprehensive review of advanced machine learning and deep learning methods

10.31232/osf.io/4pxq2 ◽

2020 ◽

Author(s):

Saeed Nosratabadi ◽

Amir Mosavi ◽

Puhong Duan ◽

Pedram Ghamisi ◽

Ferdinand Filip ◽

...

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Data Science ◽

State Of The Art ◽

Science Methods ◽

Learning Models ◽

Diverse Range ◽

Hybrid Machine ◽

Economics Research

This paper provides a state-of-the-art investigation of advances in data science in emerging economic applications. The analysis covered novel data science methods in four classes: deep learning models, hybrid deep learning models, hybrid machine learning, and ensemble models. Application domains include a wide and diverse range of economics research, from the stock market, marketing, and e-commerce to corporate banking and cryptocurrency. The PRISMA method, a systematic literature review methodology, was used to ensure the quality of the survey. The findings reveal that research trends follow the advancement of hybrid models, which outperform other learning algorithms on the accuracy metric. These trends are further expected to converge toward increasingly sophisticated hybrid deep learning models.

A Study on the Auxiliary Diagnosis of Thyroid Disease Images Based on Multiple Dimensional Deep Learning Algorithms

Current Medical Imaging Formerly Current Medical Imaging Reviews ◽

10.2174/1573405615666190115155223 ◽

2020 ◽

Vol 16 (3) ◽

pp. 199-205

Author(s):

Yuejun Liu ◽

Yifei Xu ◽

Xiangzheng Meng ◽

Xuguang Wang ◽

Tianxu Bai

Keyword(s):

Deep Learning ◽

Learning Algorithms ◽

Region Of Interest ◽

Classification Performance ◽

Thyroid Diseases ◽

Great Success ◽

Learning Models ◽

Good Classification Performance ◽

Spect Images

Background: Medical imaging plays an important role in the diagnosis of thyroid diseases. In machine learning, multidimensional deep learning algorithms are widely used in image classification and recognition and have achieved great success. Objective: A method based on multidimensional deep learning was employed for the auxiliary diagnosis of thyroid diseases from SPECT images, and the performances of different deep learning models were evaluated and compared. Methods: Thyroid SPECT images of three types (hyperthyroidism, normal, and hypothyroidism) were collected. In pre-processing, the thyroid region of interest was segmented and the number of data samples was expanded. Four deep learning models (a standard CNN, Inception, VGG16, and an RNN) were evaluated. Results: The deep learning methods showed good classification performance, with accuracy of 92.9%-96.2% and AUC of 97.8%-99.6%. VGG16 performed best, with accuracy of 96.2% and AUC of 99.6%; in particular, VGG16 trained with a changing learning rate worked best. Conclusion: All four deep learning models (standard CNN, Inception, VGG16, and RNN) are effective for classifying thyroid diseases from SPECT images. The accuracy of this deep learning-based assisted diagnostic method is higher than that of other methods reported in the literature.

Deep Learning in Disease Diagnosis: Models and Datasets

Current Bioinformatics ◽

10.2174/1574893615999201002124021 ◽

2020 ◽

Vol 15 ◽

Author(s):

Deeksha Saxena ◽

Mohammed Haris Siddiqui ◽

Rajnish Kumar

Keyword(s):

Biological Sciences ◽

Machine Learning ◽

Deep Learning ◽

Disease Diagnosis ◽

Learning Models ◽

Data Types ◽

Related Data ◽

Abstract Level ◽

Experimental Validations ◽

Selection Of

Background: Deep learning (DL) is an artificial neural network-driven framework with multiple levels of representation. In DL, non-linear modules are combined so that representations progress from lower to increasingly abstract levels. DL is used widely in almost every field, and it has brought major breakthroughs in the biological sciences, where it is used in disease diagnosis and clinical trials. DL can be combined with other machine learning methods, although each is also used on its own. DL may be preferable to conventional machine learning because it does not require a separate feature-extraction step and works well with larger datasets. DL is currently one of the most discussed fields among scientists and researchers for diagnosing and solving various biological problems. However, deep learning models need further improvement and experimental validation to be more productive. Objective: To review the available DL models and datasets used in disease diagnosis. Methods: Available DL models and their applications in disease diagnosis were reviewed, discussed, and tabulated. Types of datasets and some popular disease-related data sources for DL were highlighted. Results: We analyzed frequently used DL methods and data types, and discussed some recent deep learning models used to solve different biological problems. Conclusion: The review presents useful insights into DL methods, data types, and the selection of DL models for disease diagnosis.