Performance assessment of the metastatic spinal tumor frailty index using machine learning algorithms: limitations and future directions

OBJECTIVE Frailty is recognized as an important consideration in patients with cancer who are undergoing therapies, including spine surgery. The definition of frailty in the context of spinal metastases is unclear, and few have studied such markers and their association with postoperative outcomes and survival. Using national databases, the metastatic spinal tumor frailty index (MSTFI) was developed as a tool to predict outcomes in this specific patient population and has not been tested with external data. The purpose of this study was to test the performance of the MSTFI with institutional data and determine whether machine learning methods could better identify measures of frailty as predictors of outcomes. METHODS Electronic health record data from 479 adult patients admitted to the Massachusetts General Hospital for metastatic spinal tumor surgery from 2010 to 2019 formed a validation cohort for the MSTFI to predict major complications, in-hospital mortality, and length of stay (LOS). The 9 parameters of the MSTFI were modeled in 3 machine learning algorithms (lasso regularization logistic regression, random forest, and gradient-boosted decision tree) to assess clinical outcome prediction and determine variable importance. Prediction performance of the models was measured by computing areas under the receiver operating characteristic curve (AUROCs), calibration, and confusion matrix metrics (positive predictive value, sensitivity, and specificity) and was subjected to internal bootstrap validation. RESULTS Of 479 patients (median age 64 years [IQR 55–71 years]; 58.7% male), 28.4% had complications after spine surgery. The in-hospital mortality rate was 1.9%, and the mean LOS was 7.8 days. The MSTFI demonstrated poor discrimination for predicting complications (AUROC 0.56, 95% CI 0.50–0.62) and in-hospital mortality (AUROC 0.69, 95% CI 0.54–0.85) in the validation cohort. 
For postoperative complications, machine learning approaches showed a greater advantage over the logistic regression model used to develop the MSTFI (AUROC 0.62, 95% CI 0.56–0.68 for random forest vs AUROC 0.56, 95% CI 0.50–0.62 for logistic regression). The random forest model had the highest positive predictive value (0.53, 95% CI 0.43–0.64) and the highest negative predictive value (0.77, 95% CI 0.72–0.81), with chronic lung disease, coagulopathy, anemia, and malnutrition identified as the most important predictors of postoperative complications. CONCLUSIONS This study highlights the challenges of defining and quantifying frailty in the metastatic spine tumor population. Further study is required to improve the determination of surgical frailty in this specific cohort.
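The discrimination metric reported above (AUROC with a 95% CI) can be illustrated with a minimal sketch. It assumes a percentile bootstrap of the AUROC itself, which may differ from the study's internal bootstrap validation procedure. The function names and the toy data are hypothetical and are not taken from the study.

```python
import random

def auroc(labels, scores):
    """AUROC via the pairwise (Mann-Whitney) identity; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUROC."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample lacked one class; draw again
        stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Toy data (hypothetical, not from the study): 1 = complication, 0 = none.
labels = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9, 0.2, 0.6]
point_estimate = auroc(labels, scores)
ci_low, ci_high = bootstrap_ci(labels, scores)
```

The pairwise form is quadratic in the sample size, which is acceptable for a cohort of a few hundred patients; a rank-based implementation would be preferable for larger data.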

Machine Learning-based in-hospital Mortality Prediction Models for Patients With Acute Coronary Syndrome

10.21203/rs.3.rs-134944/v1 ◽

2020 ◽

Author(s):

Jun Ke ◽

Yiwei Chen ◽

Xiaoping Wang ◽

Zhiyong Wu ◽

Qiongyao Zhang ◽

...

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Random Forest ◽

Hospital Mortality ◽

Operating Characteristic ◽

Prediction Models ◽

Characteristic Curve ◽

Multivariate Logistic Regression Analysis ◽

Hdl Cholesterol ◽

Coronary Syndrome

Abstract Background: The purpose of this study is to identify the risk factors of in-hospital mortality in patients with acute coronary syndrome (ACS) and to evaluate the performance of traditional regression and machine learning prediction models. Methods: The data of ACS patients who presented to the emergency department of Fujian Provincial Hospital with chest pain from January 1, 2017 to March 31, 2020 were retrospectively collected. The study used univariate and multivariate logistic regression analysis to identify risk factors for in-hospital mortality of ACS patients. Traditional regression and machine learning algorithms were used to develop predictive models, and sensitivity, specificity, and the receiver operating characteristic curve were used to evaluate the performance of each model. Results: A total of 7810 ACS patients were included in the study, and the in-hospital mortality rate was 1.75%. Multivariate logistic regression analysis found that age and levels of D-dimer, cardiac troponin I, N-terminal pro-B-type natriuretic peptide (NT-proBNP), lactate dehydrogenase (LDH), high-density lipoprotein (HDL) cholesterol, and calcium channel blockers were independent predictors of in-hospital mortality. The areas under the receiver operating characteristic curve of the models developed by logistic regression, gradient boosting decision tree (GBDT), random forest, and support vector machine (SVM) for predicting the risk of in-hospital mortality were 0.963, 0.960, 0.963, and 0.959, respectively. Feature importance evaluation found that NT-proBNP, LDH, and HDL cholesterol were the top three variables contributing to the prediction performance of the GBDT and random forest models. Conclusions: The predictive models developed using logistic regression, GBDT, random forest, and SVM algorithms can be used to predict the risk of in-hospital death of ACS patients. 
Based on our findings, we recommend that clinicians focus on monitoring the changes of NT-proBNP, LDH, and HDL cholesterol, as this may improve the clinical outcomes of ACS patients.

Machine learning algorithm to predict delirium from emergency department data

10.1101/2021.02.19.21251956 ◽

2021 ◽

Author(s):

Sangil Lee ◽

Brianna Mueller ◽

W. Nick Street ◽

Ryan M. Carnahan

Keyword(s):

Machine Learning ◽

Emergency Department ◽

Logistic Regression ◽

Random Forest ◽

Barthel Index ◽

Risk Model ◽

Learning Algorithm ◽

Machine Learning Algorithms ◽

Sensitivity Threshold ◽

Mortality And Morbidity

Abstract Introduction: Delirium is a cerebral dysfunction seen commonly in the acute care setting. Delirium is associated with increased mortality and morbidity and is frequently missed in the emergency department (ED) by clinical gestalt alone. Identifying those at risk of delirium may help prioritize screening and interventions. Objective: Our objective was to identify clinically valuable predictive models for prevalent delirium within the first 24 hours of hospitalization based on the available data by assessing the performance of logistic regression and a variety of machine learning models. Methods: This was a retrospective cohort study to develop and validate a predictive risk model to detect delirium using patient data obtained around an ED encounter. Data from electronic health records for patients hospitalized from the ED between January 1, 2014, and December 31, 2019, were extracted. Eligible patients were aged 65 or older, admitted to an inpatient unit from the emergency department, and had at least one DOSS assessment or CAM-ICU recorded while hospitalized. The outcome measure of this study was delirium within one day of hospitalization determined by a positive DOSS or CAM assessment. We developed the model with and without the Barthel index for activity of daily living, since this was measured after hospital admission. Results: The area under the ROC curves for delirium ranged from 0.69 to 0.77 without the Barthel index. Random forest and gradient-boosted machine showed the highest AUC of 0.77. At the 90% sensitivity threshold, gradient-boosted machine, random forest, and logistic regression achieved a specificity of 35%. 
After the Barthel index was included, random forest, gradient-boosted machine, and logistic regression models demonstrated the best predictive ability with respective AUCs of 0.85 to 0.86. Conclusion: This study demonstrated the use of machine learning algorithms to identify the combination of variables that are predictive of delirium within 24 hours of hospitalization from the ED.
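Reporting specificity at a fixed sensitivity, as in the results above, amounts to choosing the highest score cutoff that still flags the target share of true cases. The sketch below is one plausible way to do this, not the authors' code; the function name and the toy scores are hypothetical.

```python
def specificity_at_sensitivity(labels, scores, target_sensitivity=0.90):
    """Return (cutoff, sensitivity, specificity) for the highest cutoff whose
    sensitivity reaches the target. A case is flagged when score >= cutoff."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both positive and negative cases")
    # Walk cutoffs from strict to lenient; sensitivity can only increase.
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= cutoff)
        sensitivity = tp / n_pos
        if sensitivity >= target_sensitivity:
            tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < cutoff)
            return cutoff, sensitivity, tn / n_neg
    raise RuntimeError("unreachable: the lowest cutoff flags every case")

# Toy data (hypothetical): 1 = delirium, 0 = no delirium.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
cutoff, sens, spec = specificity_at_sensitivity(labels, scores)
```

Fixing sensitivity first reflects a screening use case: missed cases are treated as the costlier error, and specificity is whatever remains at that operating point.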

Mortality Prediction in Cerebral Hemorrhage Patients Using Machine Learning Algorithms in Intensive Care Units

Frontiers in Neurology ◽

10.3389/fneur.2020.610531 ◽

2021 ◽

Vol 11 ◽

Author(s):

Ximing Nie ◽

Yuan Cai ◽

Jingyi Liu ◽

Xiran Liu ◽

Jiahui Zhao ◽

...

Keyword(s):

Machine Learning ◽

Intensive Care ◽

Random Forest ◽

Hospital Mortality ◽

Intensive Care Units ◽

Cerebral Hemorrhage ◽

Characteristic Curve ◽

Learning Algorithms ◽

Mortality Prediction ◽

Machine Learning Algorithms

Objectives: This study aims to investigate whether machine learning algorithms could provide an optimal early mortality prediction method compared with other scoring systems for patients with cerebral hemorrhage in intensive care units in clinical practice. Methods: All cerebral hemorrhage patients in the Medical Information Mart for Intensive Care III (MIMIC-III) database who were admitted to intensive care units between 2008 and 2012 and monitored with the MetaVision system were enrolled in this study. The calibration, discrimination, and risk classification of predicted hospital mortality based on machine learning algorithms were assessed. The primary outcome was hospital mortality. Model performance was assessed with accuracy and receiver operating characteristic curve analysis. Results: Of 760 cerebral hemorrhage patients enrolled from the MIMIC database [mean age, 68.2 years (SD, ±15.5)], 383 (50.4%) patients died in hospital, and 377 (49.6%) patients survived. The area under the receiver operating characteristic curve (AUC) of six machine learning algorithms was 0.600 (nearest neighbors), 0.617 (decision tree), 0.655 (neural net), 0.671 (AdaBoost), 0.819 (random forest), and 0.725 (gcForest). The AUC was 0.423 for the Acute Physiology and Chronic Health Evaluation II score. The random forest had the highest specificity and accuracy, as well as the greatest AUC, showing the best ability to predict in-hospital mortality. Conclusions: Compared with the conventional scoring system and the other five machine learning algorithms in this study, the random forest algorithm performed better in predicting in-hospital mortality for cerebral hemorrhage patients in intensive care units, and thus further research should be conducted on the random forest algorithm.

The Contribution of CD148, CD180 and CD200 Combination in the Diagnosis of Chronic B-Cell Lymphoproliferative Disorders

Blood ◽

10.1182/blood-2021-150810 ◽

2021 ◽

Vol 138 (Supplement 1) ◽

pp. 3520-3520

Author(s):

Laurent Miguet ◽

Caroline Mayeur-Rousse ◽

Alice Eischen ◽

Anne-Cecile Galoisy ◽

Delphine C. M. Rolland ◽

...

Keyword(s):

Machine Learning ◽

Bone Marrow ◽

Flow Cytometry ◽

Logistic Regression ◽

Random Forest ◽

Expression Patterns ◽

Machine Learning Algorithms ◽

Relative Importance ◽

Predictive Values ◽

Molecular Features

Abstract Introduction: B-cell immunophenotype can be swiftly assessed by flow cytometry on blood samples or bone marrow aspirate specimens. It provides crucial information that is later refined with histologic, genetic and molecular features to establish an accurate diagnosis of chronic B-cell lymphoproliferative disorders (B-CLPD). Besides the Matutes score, we identified additional useful markers, i.e. CD148 and CD180, to classify mantle cell lymphoma (MCL) and marginal zone lymphoma (MZL), respectively. Furthermore, CD200 is known to be highly expressed in chronic lymphocytic leukemia (CLL) while absent in MCL. Hypothesis: The determination of CD148, CD180 and CD200 expression on B-cells by flow cytometry on blood samples and/or bone marrow aspirates could be a potent tool to accurately identify B-CLPD. We postulated the existence of the following specific expression patterns in B-CLPD: CD148 dim/CD180 dim/CD200 bright for CLL, CD148 dim/CD180 dim/CD200 dim for lymphoplasmacytic lymphoma (LPL), CD148 bright/CD180 dim/CD200 neg/dim for MCL and CD148 dim/CD180 bright/CD200 dim for MZL. Methods: In a prospective study we investigated the expression of CD148/CD180/CD200 on B-cells from 673 patients at the time of B-CLPD diagnosis in our hospital from 2014 to 2020. We analyzed 440 blood and 233 bone marrow aspirate specimens using a BD FACSCanto II flow cytometry instrument. Based solely on CD148/CD180/CD200 specific expression patterns we postulated a diagnosis of CLL, LPL, MCL or MZL. These postulated diagnoses were later compared with the final diagnoses when all histologic, genetic and molecular features were finalized. Sensitivity, specificity, positive and negative predictive values of the expression profiles were determined. 
In addition, to investigate the relative importance of these three CD markers we then normalized their mean fluorescence intensities (MFI) and applied several supervised machine learning algorithms including Logistic Regression, Random Forest and Light Gradient Boosting Machine (LightGBM). Results: Out of the 673 clinical samples the CD148/CD180/CD200 expression patterns classified 212 specimens as CLL/SLL (30.8%), 160 as LPL (23.8%), 76 as MCL (11.28%) and 169 as MZL (25%). These diagnosis hypotheses were retrospectively compared to the final diagnoses based on all histologic, genetic and molecular features. The diagnosis hypotheses of CLL, LPL, MCL and MZL were consistent with the final diagnosis in 583 out of the 617 corresponding cases (94%) with high positive and negative predictive values. The characteristics of the diagnosis accuracy are detailed in the table below. HCL and FL were not further investigated as their immunophenotype usually does not overlap with those of other B-CLPD. Seventeen out of 617 patients (17/617, 5.3%) did not display a clear CD148/CD180/CD200 pattern: 9 LPL, 4 CLL and 4 MZL. In sixteen patients (16/617, 5.0%) the diagnosis hypothesis based on this strategy was not confirmed after completion of the exploration including karyotype, MYD88 L265P mutational status, CCND1 overexpression and pathology explorations. We next investigated the relative importance of these 3 markers. We focused on MFI values of CD148, CD180 and CD200 and three categorical "positive or negative" markers (CD5, CD23, FMC7) that were assembled into a composite marker. After Box-Cox normalization of CD148, CD180 and CD200 MFIs, a set of supervised machine learning algorithms including Logistic Regression, Random Forest and Light Gradient Boosting Machine (LightGBM) was applied to the cohort of CLL, LPL, MCL and MZL. 
We established that the highest diagnosis weights were obtained for CD200 in CLL, CD200 and CD148 in MCL (negatively and positively, respectively), and CD180 in MZL. In LPL, CD148, CD180 and CD200 had the highest weights using LightGBM and Random Forest algorithms, while Logistic Regression determined that CD5 and CD23 had the highest (negative) weights. In conclusion, the determination of CD148/CD180/CD200 surface expression patterns by flow cytometry, along with morphology, made it possible to establish an accurate diagnosis hypothesis in CLL, MCL, LPL and MZL with high positive and negative predictive values. Machine learning algorithms made it possible to measure the relative importance of these markers, which could be of great help in case of discordant expression of the main diagnosis markers. Figure 1. Disclosures: No relevant conflicts of interest to declare.
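The normalization step applied to the MFI values before model fitting, the Box-Cox power transform, can be sketched as follows. The abstract does not report the transform parameter, so `lam` is a hypothetical input here, and the sample intensities are invented for illustration.

```python
import math

def box_cox(values, lam):
    """Box-Cox power transform for strictly positive values.

    y = (x**lam - 1) / lam  when lam != 0
    y = ln(x)               when lam == 0
    """
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive inputs")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]

# Hypothetical, right-skewed fluorescence intensities.
mfi = [120.0, 450.0, 980.0, 15000.0]
log_scaled = box_cox(mfi, 0)     # lam = 0 reduces to the natural log
sqrt_like = box_cox(mfi, 0.5)    # lam = 0.5 is a shifted, scaled square root
```

In practice the parameter is usually estimated from the data by maximum likelihood rather than fixed by hand; the fixed values above only show how the transform compresses large intensities.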

Machine learning in the diagnosis of Myocardial Infarction with Non-Obstructive Coronary Arteries

European Heart Journal ◽

10.1093/eurheartj/ehab724.3067 ◽

2021 ◽

Vol 42 (Supplement_1) ◽

Author(s):

M J Espinosa Pascual ◽

P Vaquero Martinez ◽

V Vaquero Martinez ◽

J Lopez Pais ◽

B Izquierdo Coronel ◽

...

Keyword(s):

Machine Learning ◽

Myocardial Infarction ◽

Support Vector Machine ◽

Logistic Regression ◽

Random Forest ◽

Obstructive Coronary Artery Disease ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Classification Model ◽

Support Vector

Abstract Introduction Out of all patients admitted with Myocardial Infarction, 10 to 15% have Myocardial Infarction with Non-Obstructive Coronary Arteries (MINOCA). Classification algorithms based on deep learning substantially exceed traditional diagnostic algorithms. Therefore, numerous machine learning models have been proposed as useful tools for the detection of various pathologies, but to date no study has proposed a diagnostic algorithm for MINOCA. Purpose The aim of this study was to estimate the diagnostic accuracy of several automated learning algorithms (Support-Vector Machine [SVM], Random Forest [RF] and Logistic Regression [LR]) to discriminate people suffering from MINOCA from those with Myocardial Infarction with Obstructive Coronary Artery Disease (MICAD) at the time of admission and before performing a coronary angiography, whether invasive or not. Methods A Diagnostic Test Evaluation study was carried out applying the proposed algorithms to a database constituted by 553 consecutive patients admitted to our Hospital with Myocardial Infarction. According to the definitions of the 2016 ESC Position Paper on MINOCA, patients were classified into two groups: MICAD and MINOCA. Out of the total 553 patients, 214 were discarded due to the lack of complete data. The set of machine learning algorithms was trained on 244 patients (training sample: 75%) and tested on 80 patients (test sample: 25%). A total of 64 variables were available for each patient, including demographic, clinical and laboratory features before the angiographic procedure. Finally, the diagnostic precision of each architecture was measured. 
Results The most accurate classification model was the Random Forest algorithm (Specificity [Sp] 0.88, Sensitivity [Se] 0.57, Negative Predictive Value [NPV] 0.93, Area Under the Curve [AUC] 0.85 [CI 0.83–0.88]) followed by the standard Logistic Regression (Sp 0.76, Se 0.57, NPV 0.92, AUC 0.74) and Support-Vector Machine (Sp 0.84, Se 0.38, NPV 0.90, AUC 0.78) (see graph). The variables that contributed the most to discriminating MINOCA from MICAD were the traditional cardiovascular risk factors, biomarkers of myocardial injury, hemoglobin and gender. Results were similar when the 19 patients with Takotsubo syndrome were excluded from the analysis. Conclusion A prediction system for diagnosing MINOCA before performing coronary angiographies was developed using machine learning algorithms. Results show higher accuracy in diagnosing MINOCA than conventional statistical methods. This study supports the potential of machine learning algorithms in clinical cardiology. However, further studies are required in order to validate our results. Funding Acknowledgement Type of funding sources: None. Figure: ROC curves of different algorithms.

Landslide Detection and Susceptibility Modeling on Cameron Highlands (Malaysia): A Comparison between Random Forest, Logistic Regression and Logistic Model Tree Algorithms

Forests ◽

10.3390/f11080830 ◽

2020 ◽

Vol 11 (8) ◽

pp. 830 ◽

Cited By ~ 2

Author(s):

Viet-Ha Nhu ◽

Ayub Mohammadi ◽

Himan Shahabi ◽

Baharin Bin Ahmad ◽

Nadhir Al-Ansari ◽

...

Keyword(s):

Machine Learning ◽

Remote Sensing ◽

Logistic Regression ◽

Random Forest ◽

Landslide Susceptibility ◽

Predictive Value ◽

Logistic Model ◽

Model Tree ◽

Cameron Highlands ◽

Logistic Model Tree

We used remote sensing techniques and machine learning to detect and map landslides and landslide susceptibility in the Cameron Highlands, Malaysia. We located 152 landslides using a combination of interferometric synthetic aperture radar (InSAR), Google Earth (GE), and field surveys. Of the total slide locations, 80% (122 landslides) were used for training the selected algorithms, and the remaining 20% (30 landslides) were used for validation. We employed 17 conditioning factors, including slope angle, aspect, elevation, curvature, profile curvature, stream power index (SPI), topographic wetness index (TWI), lithology, soil type, land cover, normalized difference vegetation index (NDVI), distance to river, distance to fault, distance to road, river density, fault density, and road density, which were produced from satellite imagery, geological maps, soil maps, and a digital elevation model (DEM). We used these factors to produce landslide susceptibility maps using logistic regression (LR), logistic model tree (LMT), and random forest (RF) models. To assess the prediction accuracy of the models we employed the following statistical measures: negative predictive value (NPV), sensitivity, positive predictive value (PPV), specificity, root-mean-squared error (RMSE), accuracy, and area under the receiver operating characteristic (ROC) curve (AUC). Our results indicated that the AUC was 92%, 90%, and 88% for the LMT, LR, and RF algorithms, respectively. To assess model performance, we also applied the non-parametric Friedman and Wilcoxon statistical tests, which revealed no practical differences among the models used in the study area. While landslide mapping in tropical environments such as the Cameron Highlands remains difficult, remote sensing (RS) together with machine learning techniques, such as the LMT model, shows promise for landslide susceptibility mapping in the study area.

ClickbaitTR: Dataset for clickbait detection from Turkish news sites and social media with a comparative analysis via machine learning algorithms

Journal of Information Science ◽

10.1177/01655515211007746 ◽

2021 ◽

pp. 016555152110077

Author(s):

Şura Genç ◽

Elif Surer

Keyword(s):

Machine Learning ◽

Social Media ◽

Logistic Regression ◽

Random Forest ◽

Short Term Memory ◽

Ensemble Classifier ◽

Machine Learning Algorithms ◽

Short Term ◽

Term Memory ◽

Long Short Term Memory

Clickbait is a strategy that aims to attract people’s attention and direct them to specific content. Clickbait titles, created by the information that is not included in the main content or using intriguing expressions with various text-related features, have become very popular, especially in social media. This study expands the Turkish clickbait dataset that we had constructed for clickbait detection in our proof-of-concept study, written in Turkish. We achieve a 48,060 sample size by adding 8859 tweets and release a publicly available dataset – ClickbaitTR – with its open-source data analysis library. We apply machine learning algorithms such as Artificial Neural Network (ANN), Logistic Regression, Random Forest, Long Short-Term Memory Network (LSTM), Bidirectional Long Short-Term Memory (BiLSTM) and Ensemble Classifier on 48,060 news headlines extracted from Twitter. The results show that the Logistic Regression algorithm has 85% accuracy; the Random Forest algorithm has a performance of 86% accuracy; the LSTM has 93% accuracy; the ANN has 93% accuracy; the Ensemble Classifier has 93% accuracy; and finally, the BiLSTM has 97% accuracy. A thorough discussion is provided for the psychological aspects of clickbait strategy focusing on curiosity and interest arousal. In addition to a successful clickbait detection performance and the detailed analysis of clickbait sentences in terms of language and psychological aspects, this study also contributes to clickbait detection studies with the largest clickbait dataset in Turkish.

Prediction on Domestic Violence in Bangladesh during the COVID-19 Outbreak Using Machine Learning Methods

Applied System Innovation ◽

10.3390/asi4040077 ◽

2021 ◽

Vol 4 (4) ◽

pp. 77

Author(s):

Md. Murad Hossain ◽

Md. Asadullah ◽

Abidur Rahaman ◽

Md. Sipon Miah ◽

M. Zahid Hasan ◽

...

Keyword(s):

Machine Learning ◽

Domestic Violence ◽

Logistic Regression ◽

Random Forest ◽

Family Violence ◽

Violence Against Women ◽

Machine Learning Algorithms ◽

Data Set ◽

Domestic Violence Against Women ◽

Women And Children

The COVID-19 outbreak resulted in preventative measures and restrictions for Bangladesh during the summer of 2020—these unstable and stressful times led to multiple social problems (e.g., domestic violence and divorce). Globally, researchers, policymakers, governments, and civil societies have been concerned about the increase in domestic violence against women and children during the ongoing COVID-19 pandemic. In Bangladesh, domestic violence against women and children has increased during the COVID-19 pandemic. In this article, we investigated family violence among 511 families during the COVID-19 outbreak. Participants were given questionnaires to answer, for a period of over ten days; we predicted family violence using a machine learning-based model. To predict domestic violence from our data set, we applied random forest, logistic regression, and Naive Bayes machine learning algorithms to our model. We employed an oversampling strategy named the Synthetic Minority Oversampling Technique (SMOTE) and the chi-squared statistical test to, respectively, solve the imbalance problem and discover the feature importance of our data set. The performances of the machine learning algorithms were evaluated based on accuracy, precision, recall, and F-score criteria. Finally, the receiver operating characteristic (ROC) and confusion matrices were developed and analyzed for three algorithms. On average, our model, with the random forest, logistic regression, and Naive Bayes algorithms, predicted family violence with 77%, 69%, and 62% accuracy for our data set. The findings of this study indicate that domestic violence has increased and is highly related to two features: family income level during the COVID-19 pandemic and education level of the family members.
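The oversampling idea behind SMOTE, named in the abstract above, is to create synthetic minority-class rows by interpolating between a minority sample and one of its nearest minority neighbours. The sketch below illustrates that core step only. It is not the authors' pipeline, and the function name, the parameter `k`, and the toy points are hypothetical.

```python
import random

def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by interpolating between a
    sampled point and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by squared Euclidean distance.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Toy minority class (hypothetical two-feature rows).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_rows = smote(minority, n_new=5)
```

Oversampling of this kind is normally applied to the training split only, so that synthetic rows never leak into the data used to report accuracy.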

Development and Validation of a Predictive Model for Coronary Artery Disease Using Machine Learning

Frontiers in Cardiovascular Medicine ◽

10.3389/fcvm.2021.614204 ◽

2021 ◽

Vol 8 ◽

Author(s):

Chen Wang ◽

Yue Zhao ◽

Bingyu Jin ◽

Xuedong Gan ◽

Bin Liang ◽

...

Keyword(s):

Machine Learning ◽

Coronary Artery Disease ◽

Coronary Artery ◽

Random Forest ◽

Positive Predictive Value ◽

Negative Predictive Value ◽

Predictive Value ◽

Validation Cohort ◽

Random Forest Algorithm ◽

Artery Disease

Early identification of coronary artery disease (CAD) can prevent the progress of CAD and effectually lower the mortality rate, so we intended to construct and validate a machine learning model to predict the risk of CAD based on conventional risk factors and lab test data. There were 3,112 CAD patients and 3,182 controls enrolled from three centers in China. We compared the baseline and clinical characteristics between two groups. Then, Random Forest algorithm was used to construct a model to predict CAD and the model was assessed by receiver operating characteristic (ROC) curve. In the development cohort, the Random Forest model showed a good AUC 0.948 (95%CI: 0.941–0.954) to identify CAD patients from controls, with a sensitivity of 90%, a specificity of 85.4%, a positive predictive value of 0.863 and a negative predictive value of 0.894. Validation of the model also yielded a favorable discriminatory ability with the AUC, sensitivity, specificity, positive predictive value, and negative predictive value of 0.944 (95%CI: 0.934–0.955), 89.5%, 85.8%, 0.868, and 0.886 in the validation cohort 1, respectively, and 0.940 (95%CI: 0.922–0.960), 79.5%, 94.3%, 0.932, and 0.823 in the validation cohort 2, respectively. An easy-to-use tool that combined 15 indexes to assess the CAD risk was constructed and validated using Random Forest algorithm, which showed favorable predictive capability (http://45.32.120.149:3000/randomforest). Our model is extremely valuable for clinical practice, which will be helpful for the management and primary prevention of CAD patients.
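Positive and negative predictive values depend on disease prevalence as well as on sensitivity and specificity, which is worth keeping in mind when reading the figures above. The sketch below shows the relationship via Bayes' rule. The 50% prevalence in the usage lines is an assumption chosen because the cohort is roughly balanced between cases and controls; it is not a value stated in the abstract.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from sensitivity, specificity and prevalence."""
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1]")
    # Expected share of the population in each confusion-matrix cell.
    tp = sensitivity * prevalence
    fn = (1.0 - sensitivity) * prevalence
    tn = specificity * (1.0 - prevalence)
    fp = (1.0 - specificity) * (1.0 - prevalence)
    if tp + fp == 0 or tn + fn == 0:
        raise ValueError("predictive value undefined for these inputs")
    return tp / (tp + fp), tn / (tn + fn)

# Development-cohort sensitivity and specificity from the abstract,
# with an ASSUMED prevalence of 0.5 (balanced cases and controls).
ppv_balanced, npv_balanced = predictive_values(0.90, 0.854, 0.5)

# Same test characteristics at a hypothetical 5% screening prevalence.
ppv_screen, npv_screen = predictive_values(0.90, 0.854, 0.05)
```

At lower prevalence the PPV falls and the NPV rises, so predictive values obtained in a case-control sample should not be carried over unchanged to a screening population.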

Statistical Analysis for Selective Identifications of VOCs by Using Surface Functionalized MoS2 Based Sensor Array

Chemistry Proceedings ◽

10.3390/csac2021-10451 ◽

2021 ◽

Vol 5 (1) ◽

pp. 35

Author(s):

Uttam Narendra Thakur ◽

Radha Bhardwaj ◽

Arnab Hazra

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Random Forest ◽

Decision Tree ◽

Sensor Array ◽

Multinomial Logistic Regression ◽

Learning Algorithms ◽

Disease Diagnosis ◽

Machine Learning Algorithms ◽

Human Breath

Disease diagnosis through breath analysis has attracted significant attention in recent years due to its noninvasive nature, rapid testing ability, and applicability for patients of all ages. More than 1000 volatile organic compounds (VOCs) exist in human breath, but only selected VOCs are associated with specific diseases. Selective identification of those disease marker VOCs using an array of multiple sensors is highly desirable in the current scenario. The use of efficient sensors and suitable classification algorithms is essential for the selective and reliable detection of those disease markers in complex breath. In the current study, we fabricated a noble metal (Au, Pd and Pt) nanoparticle-functionalized MoS2 (Chalcogenides, Sigma Aldrich, St. Louis, MO, USA)-based sensor array for the selective identification of different VOCs. Four sensors, i.e., pure MoS2, Au/MoS2, Pd/MoS2, and Pt/MoS2, were tested under exposure to different VOCs, such as acetone, benzene, ethanol, xylene, 2-propenol, methanol and toluene, at 50 °C. Initially, principal component analysis (PCA) and linear discriminant analysis (LDA) were used to discriminate those seven VOCs. Compared with PCA, LDA discriminated well between the seven VOCs. Four machine learning algorithms, namely k-nearest neighbors (kNN), decision tree, random forest, and multinomial logistic regression, were used to further identify those VOCs. The classification accuracy of those seven VOCs using kNN, decision tree, random forest, and multinomial logistic regression was 97.14%, 92.43%, 84.1%, and 98.97%, respectively. These results indicate that multinomial logistic regression performed best among the four machine learning algorithms in discriminating the multiple VOCs that generally exist in human breath.
