Development of machine learning model for diagnostic disease prediction based on laboratory tests

AbstractThe use of deep learning and machine learning (ML) in medical science is increasing, particularly in the visual, audio, and language data fields. We aimed to build a new optimized ensemble model by blending a DNN (deep neural network) model with two ML models for disease prediction using laboratory test results. 86 attributes (laboratory tests) were selected from datasets based on value counts, clinical importance-related features, and missing values. We collected sample datasets on 5145 cases, including 326,686 laboratory test results. We investigated a total of 39 specific diseases based on the International Classification of Diseases, 10th revision (ICD-10) codes. These datasets were used to construct light gradient boosting machine (LightGBM) and extreme gradient boosting (XGBoost) ML models and a DNN model using TensorFlow. The optimized ensemble model achieved an F1-score of 81% and prediction accuracy of 92% for the five most common diseases. The deep learning and ML models showed differences in predictive power and disease classification patterns. We used a confusion matrix and analyzed feature importance using the SHAP value method. Our new ML model achieved high efficiency of disease prediction through classification of diseases. This study will be useful in the prediction and diagnosis of diseases.

Download Full-text

Classification of hazelnut cultivars: comparison of DL4J and ensemble learning algorithms

Notulae Botanicae Horti Agrobotanici Cluj-Napoca ◽

10.15835/nbha48412041 ◽

2020 ◽

Vol 48 (4) ◽

pp. 2316-2327

Author(s):

Caner KOC ◽

Dilara GERDAN ◽

Maksut B. EMİNOĞLU ◽

Uğur YEGÜL ◽

Bulent KOC ◽

...

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Random Forest ◽

Ensemble Learning ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Performance Criteria ◽

Gradient Boosting ◽

Data Set

Classification of hazelnuts is one of the values adding processes that increase the marketability and profitability of its production. While traditional classification methods are used commonly, machine learning and deep learning can be implemented to enhance the hazelnut classification processes. This paper presents the results of a comparative study of machine learning frameworks to classify hazelnut (Corylus avellana L.) cultivars (‘Sivri’, ‘Kara’, ‘Tombul’) using DL4J and ensemble learning algorithms. For each cultivar, 50 samples were used for evaluations. Maximum length, width, compression strength, and weight of hazelnuts were measured using a caliper and a force transducer. Gradient boosting machine (Boosting), random forest (Bagging), and DL4J feedforward (Deep Learning) algorithms were applied in traditional machine learning algorithms. The data set was partitioned into a 10-fold-cross validation method. The classifier performance criteria of accuracy (%), error percentage (%), F-Measure, Cohen’s Kappa, recall, precision, true positive (TP), false positive (FP), true negative (TN), false negative (FN) values are provided in the results section. The results showed classification accuracies of 94% for Gradient Boosting, 100% for Random Forest, and 94% for DL4J Feedforward algorithms.

Download Full-text

An Ensemble of Random Forest Gradient Boosting Machine and Deep Learning Methods for Stock Price Prediction

Journal of Information Technology Research ◽

10.4018/jitr.2022010102 ◽

2022 ◽

Vol 15 (1) ◽

pp. 1-19

Author(s):

Ravinder Kumar ◽

Lokesh Kumar Shrivastav

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Random Forest ◽

Stock Market ◽

Data Analytics ◽

Gradient Boosting ◽

Ensemble Model ◽

Market Data ◽

Stock Price Prediction ◽

Gradient Boosting Machine

Stochastic time series analysis of high-frequency stock market data is a very challenging task for the analysts due to the lack availability of efficient tool and techniques for big data analytics. This has opened the door of opportunities for the developer and researcher to develop intelligent and machine learning based tools and techniques for data analytics. This paper proposed an ensemble for stock market data prediction using three most prominent machine learning based techniques. The stock market dataset with raw data size of 39364 KB with all attributes and processed data size of 11826 KB having 872435 instances. The proposed work implements an ensemble model comprises of Deep Learning, Gradient Boosting Machine (GBM) and distributed Random Forest techniques of data analytics. The performance results of the ensemble model are compared with each of the individual methods i.e. deep learning, Gradient Boosting Machine (GBM) and Random Forest. The ensemble model performs better and achieves the highest accuracy of 0.99 and lowest error (RMSE) of 0.1.

Download Full-text

An Ensemble Model for Short-Term Wind Power Forecasting using Deep Learning and Gradient Boosting Algorithms

2020 21st National Power Systems Conference (NPSC) ◽

10.1109/npsc49263.2020.9331902 ◽

2020 ◽

Author(s):

Devesh Kumar ◽

Rishabh Abhinav ◽

Naran Pindoriya

Keyword(s):

Deep Learning ◽

Wind Power ◽

Gradient Boosting ◽

Ensemble Model ◽

Short Term ◽

Wind Power Forecasting ◽

Boosting Algorithms ◽

Power Forecasting

Download Full-text

Diagnosing breast cancer tumors using stacked ensemble model

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-219176 ◽

2021 ◽

pp. 1-9

Author(s):

Ahmet Haşim Yurttakal ◽

Hasan Erbay ◽

Türkan İkizceli ◽

Seyhan Karaçavuş ◽

Cenker Biçer

Keyword(s):

Breast Cancer ◽

Deep Learning ◽

Medical Imaging ◽

Early Stage ◽

False Negative ◽

Gradient Boosting ◽

Physical Sign ◽

Ensemble Model ◽

Learning Methods ◽

Dce Mri

Breast cancer is the most common cancer that progresses from cells in the breast tissue among women. Early-stage detection could reduce death rates significantly, and the detection-stage determines the treatment process. Mammography is utilized to discover breast cancer at an early stage prior to any physical sign. However, mammography might return false-negative, in which case, if it is suspected that lesions might have cancer of chance greater than two percent, a biopsy is recommended. About 30 percent of biopsies result in malignancy that means the rate of unnecessary biopsies is high. So to reduce unnecessary biopsies, recently, due to its excellent capability in soft tissue imaging, Dynamic Contrast-Enhanced Magnetic Resonance Imaging (DCE-MRI) has been utilized to detect breast cancer. Nowadays, DCE-MRI is a highly recommended method not only to identify breast cancer but also to monitor its development, and to interpret tumorous regions. However, in addition to being a time-consuming process, the accuracy depends on radiologists’ experience. Radiomic data, on the other hand, are used in medical imaging and have the potential to extract disease characteristics that can not be seen by the naked eye. Radiomics are hard-coded features and provide crucial information about the disease where it is imaged. Conversely, deep learning methods like convolutional neural networks(CNNs) learn features automatically from the dataset. Especially in medical imaging, CNNs’ performance is better than compared to hard-coded features-based methods. However, combining the power of these two types of features increases accuracy significantly, which is especially critical in medicine. Herein, a stacked ensemble of gradient boosting and deep learning models were developed to classify breast tumors using DCE-MRI images. The model makes use of radiomics acquired from pixel information in breast DCE-MRI images. Prior to train the model, radiomics had been applied to the factor analysis to refine the feature set and eliminate unuseful features. The performance metrics, as well as the comparisons to some well-known machine learning methods, state the ensemble model outperforms its counterparts. The ensembled model’s accuracy is 94.87% and its AUC value is 0.9728. The recall and precision are 1.0 and 0.9130, respectively, whereas F1-score is 0.9545.

Download Full-text

Automatic Prediction and Classification of Diseases in Melons using Stacked RNN based Deep Learning Model

2020 International Conference on System, Computation, Automation and Networking (ICSCAN) ◽

10.1109/icscan49426.2020.9262414 ◽

2020 ◽

Author(s):

D. Jayakumar ◽

A. Elakkiya ◽

R. Rajmohan ◽

M.O. Ramkumar

Keyword(s):

Deep Learning ◽

Learning Model ◽

Classification Of Diseases ◽

Deep Learning Model

Download Full-text

SURG-02. SURVIVAL PREDICTION AFTER NEUROSURGICAL RESECTION OF BRAIN METASTASES: A MACHINE LEARNING APPROACH

Neuro-Oncology ◽

10.1093/neuonc/noaa215.849 ◽

2020 ◽

Vol 22 (Supplement_2) ◽

pp. ii203-ii203

Author(s):

Alexander Hulsbergen ◽

Yu Tung Lo ◽

Vasileios Kavouridis ◽

John Phillips ◽

Timothy Smith ◽

...

Keyword(s):

Machine Learning ◽

Brain Metastases ◽

External Validation ◽

Superior Performance ◽

Prognostic Models ◽

Receiver Operating Curve ◽

Gradient Boosting ◽

Survival Prediction ◽

Ensemble Model ◽

Adaptive Boosting

Abstract INTRODUCTION Survival prediction in brain metastases (BMs) remains challenging. Current prognostic models have been created and validated almost completely with data from patients receiving radiotherapy only, leaving uncertainty about surgical patients. Therefore, the aim of this study was to build and validate a model predicting 6-month survival after BM resection using different machine learning (ML) algorithms. METHODS An institutional database of 1062 patients who underwent resection for BM was split into a 80:20 training and testing set. Seven different ML algorithms were trained and assessed for performance. Moreover, an ensemble model was created incorporating random forest, adaptive boosting, gradient boosting, and logistic regression algorithms. Five-fold cross validation was used for hyperparameter tuning. Model performance was assessed using area under the receiver-operating curve (AUC) and calibration and was compared against the diagnosis-specific graded prognostic assessment (ds-GPA); the most established prognostic model in BMs. RESULTS The ensemble model showed superior performance with an AUC of 0.81 in the hold-out test set, a calibration slope of 1.14, and a calibration intercept of -0.08, outperforming the ds-GPA (AUC 0.68). Patients were stratified into high-, medium- and low-risk groups for death at 6 months; these strata strongly predicted both 6-months and longitudinal overall survival (p < 0.001). CONCLUSIONS We developed and internally validated an ensemble ML model that accurately predicts 6-month survival after neurosurgical resection for BM, outperforms the most established model in the literature, and allows for meaningful risk stratification. Future efforts should focus on external validation of our model.

Download Full-text

Predicting abnormal laboratory blood test results in the intensive care unit using novel features based on information theory and historical conditional probability (Preprint)

10.2196/preprints.35250 ◽

2021 ◽

Author(s):

Camilo E. Valderrama ◽

Daniel J. Niven ◽

Henry T. Stelfox ◽

Joon Lee

Keyword(s):

Machine Learning ◽

Intensive Care ◽

Conditional Probability ◽

Blood Test ◽

Laboratory Tests ◽

Conditional Entropy ◽

Admission Diagnosis ◽

Test Results ◽

Abnormal Laboratory ◽

Blood Tests

BACKGROUND Redundancy in laboratory blood tests is common in intensive care units (ICU), affecting patients' health and increasing healthcare expenses. Medical communities have made recommendations to order laboratory tests more judiciously. Wise selection can rely on modern data-driven approaches that have been shown to help identify redundant laboratory blood tests in ICUs. However, most of these works have been developed for highly selected clinical conditions such as gastrointestinal bleeding. Moreover, features based on conditional entropy and conditional probability distribution have not been used to inform the need for performing a new test. OBJECTIVE We aimed to address the limitations of previous works by adapting conditional entropy and conditional probability to extract features to predict abnormal laboratory blood test results. METHODS We used an ICU dataset collected across Alberta, Canada which included 55,689 ICU admissions from 48,672 patients with different diagnoses. We investigated conditional entropy and conditional probability-based features by comparing the performances of two machine learning approaches to predict normal and abnormal results for 18 blood laboratory tests. Approach 1 used patients' vitals, age, sex, admission diagnosis, and other laboratory blood test results as features. Approach 2 used the same features plus the new conditional entropy and conditional probability-based features. RESULTS Across the 18 blood laboratory tests, both Approach 1 and Approach 2 achieved a median F1-score, AUC, precision-recall AUC, and Gmean above 80%. We found that the inclusion of the new features statistically significantly improved the capacity to predict abnormal laboratory blood test results in between ten and fifteen laboratory blood tests depending on the machine learning model. CONCLUSIONS Our novel approach with promising prediction results can help reduce over-testing in ICUs, as well as risks for patients and healthcare systems. CLINICALTRIAL N/A

Download Full-text

Predicting hospitalization following psychiatric crisis care using machine learning

BMC Medical Informatics and Decision Making ◽

10.1186/s12911-020-01361-1 ◽

2020 ◽

Vol 20 (1) ◽

Author(s):

Matthijs Blankers ◽

Louk F. M. van der Post ◽

Jack J. M. Dekker

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Prediction Models ◽

Learning Algorithms ◽

Nearest Neighbors ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Ensemble Model ◽

K Nearest Neighbors ◽

Crisis Care

Abstract Background Accurate prediction models for whether patients on the verge of a psychiatric criseis need hospitalization are lacking and machine learning methods may help improve the accuracy of psychiatric hospitalization prediction models. In this paper we evaluate the accuracy of ten machine learning algorithms, including the generalized linear model (GLM/logistic regression) to predict psychiatric hospitalization in the first 12 months after a psychiatric crisis care contact. We also evaluate an ensemble model to optimize the accuracy and we explore individual predictors of hospitalization. Methods Data from 2084 patients included in the longitudinal Amsterdam Study of Acute Psychiatry with at least one reported psychiatric crisis care contact were included. Target variable for the prediction models was whether the patient was hospitalized in the 12 months following inclusion. The predictive power of 39 variables related to patients’ socio-demographics, clinical characteristics and previous mental health care contacts was evaluated. The accuracy and area under the receiver operating characteristic curve (AUC) of the machine learning algorithms were compared and we also estimated the relative importance of each predictor variable. The best and least performing algorithms were compared with GLM/logistic regression using net reclassification improvement analysis and the five best performing algorithms were combined in an ensemble model using stacking. Results All models performed above chance level. We found Gradient Boosting to be the best performing algorithm (AUC = 0.774) and K-Nearest Neighbors to be the least performing (AUC = 0.702). The performance of GLM/logistic regression (AUC = 0.76) was slightly above average among the tested algorithms. In a Net Reclassification Improvement analysis Gradient Boosting outperformed GLM/logistic regression by 2.9% and K-Nearest Neighbors by 11.3%. GLM/logistic regression outperformed K-Nearest Neighbors by 8.7%. Nine of the top-10 most important predictor variables were related to previous mental health care use. Conclusions Gradient Boosting led to the highest predictive accuracy and AUC while GLM/logistic regression performed average among the tested algorithms. Although statistically significant, the magnitude of the differences between the machine learning algorithms was in most cases modest. The results show that a predictive accuracy similar to the best performing model can be achieved when combining multiple algorithms in an ensemble model.

Download Full-text

Utilisation of laboratory test results for patient management by clinicians at two large referral hospitals in Zambia

10.1101/2020.09.23.20200071 ◽

2020 ◽

Author(s):

Sabe Mwape ◽

Victor Daka ◽

Scott Matafwali ◽

Kapambwe Mwape ◽

Jay Sikalima ◽

...

Keyword(s):

Teaching Hospital ◽

Laboratory Tests ◽

Patient Management ◽

Medical Laboratory ◽

Test Results ◽

Cross Sectional ◽

Childrens Hospital ◽

Laboratory Test Results ◽

Laboratory Results ◽

Referral Hospitals

Background Medical laboratory diagnosis is a critical component of patient management in the healthcare setup. Despite the availability of laboratory tests, clinicians may not utilise them to make clinical decisions. We investigated utilsation of laboratory tests for patient management among clinicians at Ndola Teaching Hospital (NTH) and Arthur Davison Childrens Hospital (ADCH), two large referral hospitals in the Copperbelt Province, Ndola, Zambia. Method We conducted a descriptive cross-sectional study among clinicians. The study deployed self-administered questionnaires to evaluate clinician utilisation, querying and confidence in laboratory results. Additional data on demographics and possible laboratory improvements were also obtained. Data were entered in Microsoft excel and exported to SPSS version 16 for statistical analysis. Results Of the 80 clinicians interviewed, 96.2% (77) reported using laboratory tests and their results in patient management. 77.5% (62) of the clinicians indicated they always used laboratory results to influence their patient management decisions. Of the selected laboratory tests, clinicians were more confident in using haemoglobin test results (91.2%). There was no statistically significant association between the clinicians gender or qualification and use of test results in patient management. Conclusion Our findings show that despite the majority querying laboratory results, most of the clinicians use laboratory results for patient management. There is need for interactions between the laboratory and clinical area to assure clinician confidence in laboratory results. Key words: utilisation, clinicians, laboratory tests, Ndola Teaching Hospital, Arthur Davison Childrens Hospital

Download Full-text

Non-specific diagnoses are frequent in patients hospitalized after calling 112 and their mortality is high – a register-based Danish cohort study

10.21203/rs.3.rs-23364/v1 ◽

2020 ◽

Author(s):

Frederikke Vestergaard Nielsen ◽

Mette Rønn Nielsen ◽

Ida Lund Lorenzen ◽

Jesper Amstrup ◽

Torben Anders Kløjgaard ◽

...

Keyword(s):

Cohort Study ◽

International Classification Of Diseases ◽

Rank Test ◽

Test Results ◽

Historical Cohort ◽

Specific Diagnosis ◽

Number Of Patients ◽

Classification Of Diseases ◽

Icd 10

Abstract Background The number of patients calling for an ambulance increases. A considerable number of patients receive a non-specific diagnosis at discharge from the hospital, and this could imply less serious acute conditions, but the mortality has only scarcely been studied. The aim of this study was to examine the most frequent sub-diagnoses among patients with hospital non-specific diagnoses after calling 112 and their subsequent mortality. Methods A historical cohort study of patients brought to the hospital by ambulance after calling 112 in 2007-2014 and diagnosed with a non-specific diagnosis, chapter R or Z, in the International Classification of Diseases, 10 th edition (ICD-10). 1-day and 30-day mortality was analyzed by survival analyses and compared by the log-rank test. Results We included 74,847 ambulance runs in 53,937 unique individuals. The most frequent diagnoses were ‘unspecified disease’ (Z039), constituting 47.0 % (n 35,279). In children 0-9 years old, ‘febrile convulsions’ was the most frequent non-specific diagnosis used in 54.3 % (n 1,602). Overall, 1- and 30-day mortality was 2.2 % (n 1,205) and 6.0 % (n 3,258). The highest mortality was in the diagnostic group ‘suspected cardiovascular disease’ (Z035) and ‘unspecified disease’ (Z039) with 1-day mortality 2.6 % (n 43) and 2.4 % (n 589), and 30 day mortality of 6.32 % (n 104) and 8.1 % (n 1,975). Conclusion Among patients calling an ambulance and discharged with non-specific diagnoses the 1- and 30-day mortality, despite modest mortality percentages lead to a high number of deaths.

Download Full-text