Severity Assessment of COVID-19 based on Clinical and Imaging Data

Objectives: This study aims to develop a machine learning approach for automated severity assessment of COVID-19 patients based on clinical and imaging data. Materials and Methods: Clinical data (including demographics, signs, symptoms, comorbidities, and blood test results) and chest CT scans of 346 patients from two hospitals in Hubei province, China, were used to develop machine learning models for automated severity assessment of diagnosed COVID-19 cases. We compared the predictive power of clinical and imaging data by testing multiple machine learning models, and further explored the use of four oversampling methods to address the imbalanced class distribution. Features with the highest predictive power were identified using the SHAP framework. Results: Targeting differentiation between mild and severe cases, logistic regression models achieved the best performance on clinical features (AUC: 0.848, sensitivity: 0.455, specificity: 0.906), imaging features (AUC: 0.926, sensitivity: 0.818, specificity: 0.901), and the combined features (AUC: 0.950, sensitivity: 0.764, specificity: 0.919). The SMOTE oversampling method further improved the performance of the combined features to an AUC of 0.960 (sensitivity: 0.845, specificity: 0.929). Discussion: Imaging features had the strongest impact on the model output, while a combination of clinical and imaging features yielded the best performance overall. The identified predictive features were consistent with findings from previous studies. Oversampling yielded mixed results, although it achieved the best performance in our study. Conclusions: This study indicates that clinical and imaging features can be used for automated severity assessment of COVID-19 patients and have the potential to assist with triaging COVID-19 patients and prioritizing care for patients at higher risk of severe disease. [Manuscript last updated on 31 July, 2020]
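The abstract above reports AUC, sensitivity, and specificity. As a reading aid, the following is a minimal sketch of how these three metrics are commonly computed for a binary mild/severe classifier. The function names, the toy labels, and the 0.5 decision threshold are assumptions made for illustration and are not taken from the study.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels where 1 = severe."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not from the study: 1 = severe, 0 = mild.
y_true = [1, 1, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

sens, spec = sensitivity_specificity(y_true, y_pred)
area = auc(y_true, scores)
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds. This is one way a model can show a high AUC alongside a low sensitivity, as in the clinical-features result above.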


Development and Validation of a Machine Learning Approach for Automated Severity Assessment of COVID-19 Based on Clinical and Imaging Data: Retrospective Study

JMIR Medical Informatics ◽

10.2196/24572 ◽

2021 ◽

Vol 9 (2) ◽

pp. e24572

Author(s):

Juan Carlos Quiroz ◽

You-Zhen Feng ◽

Zhong-Yuan Cheng ◽

Dana Rezazadegan ◽

Ping-Kang Chen ◽

...

Keyword(s):

Machine Learning ◽

Predictive Power ◽

Care Delivery ◽

Learning Approach ◽

Imaging Features ◽

Severity Assessment ◽

Imaging Data ◽

Learning Models ◽

Machine Learning Approach ◽

Machine Learning Models

Background: COVID-19 has overwhelmed health systems worldwide. It is important to identify severe cases as early as possible so that resources can be mobilized and treatment can be escalated. Objective: This study aims to develop a machine learning approach for automated severity assessment of COVID-19 based on clinical and imaging data. Methods: Clinical data (including demographics, signs, symptoms, comorbidities, and blood test results) and chest computed tomography scans of 346 patients from 2 hospitals in Hubei Province, China, were used to develop machine learning models for automated severity assessment in diagnosed COVID-19 cases. We compared the predictive power of the clinical and imaging data from multiple machine learning models and further explored the use of four oversampling methods to address the imbalanced classification issue. Features with the highest predictive power were identified using the Shapley Additive Explanations framework. Results: Imaging features had the strongest impact on the model output, while a combination of clinical and imaging features yielded the best performance overall. The identified predictive features were consistent with those reported previously. Although oversampling yielded mixed results, it achieved the best model performance in our study. Logistic regression models differentiating between mild and severe cases achieved the best performance for clinical features (area under the curve [AUC] 0.848; sensitivity 0.455; specificity 0.906), imaging features (AUC 0.926; sensitivity 0.818; specificity 0.901), and a combination of clinical and imaging features (AUC 0.950; sensitivity 0.764; specificity 0.919). The synthetic minority oversampling method further improved the performance of the model using combined features (AUC 0.960; sensitivity 0.845; specificity 0.929).
Conclusions: Clinical and imaging features can be used for automated severity assessment of COVID-19 and can potentially help triage patients with COVID-19 and prioritize care delivery to those at a higher risk of severe disease.
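The synthetic minority oversampling idea mentioned above can be illustrated with a small sketch. This is a simplified, standard-library-only illustration of the interpolation step, not the authors' implementation or a full SMOTE library. The function name `smote_sketch`, the parameter defaults, and the sample points are all hypothetical.

```python
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


# Hypothetical 2-feature minority ("severe") samples, not study data.
severe = [(1.0, 2.0), (1.5, 2.5), (2.0, 2.0), (1.2, 3.0)]
new_points = smote_sketch(severe, n_new=6)
```

Each synthetic point lies on a line segment between two real minority samples, so it stays inside the region those samples already occupy. Oversampling of this kind is normally applied to the training split only, so that evaluation data remain untouched.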


Development and Validation of a Machine Learning Approach for Automated Severity Assessment of COVID-19 Based on Clinical and Imaging Data: Retrospective Study (Preprint)

10.2196/preprints.24572 ◽

2020 ◽

Author(s):

Juan Carlos Quiroz ◽

You-Zhen Feng ◽

Zhong-Yuan Cheng ◽

Dana Rezazadegan ◽

Ping-Kang Chen ◽

...

Keyword(s):

Machine Learning ◽

Predictive Power ◽

Care Delivery ◽

Learning Approach ◽

Imaging Features ◽

Severity Assessment ◽

Imaging Data ◽

Learning Models ◽

Machine Learning Approach ◽

Machine Learning Models

BACKGROUND COVID-19 has overwhelmed health systems worldwide. It is important to identify severe cases as early as possible, such that resources can be mobilized and treatment can be escalated. OBJECTIVE This study aims to develop a machine learning approach for automated severity assessment of COVID-19 based on clinical and imaging data. METHODS Clinical data—including demographics, signs, symptoms, comorbidities, and blood test results—and chest computed tomography scans of 346 patients from 2 hospitals in the Hubei Province, China, were used to develop machine learning models for automated severity assessment in diagnosed COVID-19 cases. We compared the predictive power of the clinical and imaging data from multiple machine learning models and further explored the use of four oversampling methods to address the imbalanced classification issue. Features with the highest predictive power were identified using the Shapley Additive Explanations framework. RESULTS Imaging features had the strongest impact on the model output, while a combination of clinical and imaging features yielded the best performance overall. The identified predictive features were consistent with those reported previously. Although oversampling yielded mixed results, it achieved the best model performance in our study. Logistic regression models differentiating between mild and severe cases achieved the best performance for clinical features (area under the curve [AUC] 0.848; sensitivity 0.455; specificity 0.906), imaging features (AUC 0.926; sensitivity 0.818; specificity 0.901), and a combination of clinical and imaging features (AUC 0.950; sensitivity 0.764; specificity 0.919). The synthetic minority oversampling method further improved the performance of the model using combined features (AUC 0.960; sensitivity 0.845; specificity 0.929). 
CONCLUSIONS Clinical and imaging features can be used for automated severity assessment of COVID-19 and can potentially help triage patients with COVID-19 and prioritize care delivery to those at a higher risk of severe disease.
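The Shapley Additive Explanations framework named above builds on Shapley values, which attribute a model output to individual features. The sketch below computes exact Shapley values for a tiny toy function by averaging marginal contributions over all feature orderings. The feature names, weights, and `toy_output` function are hypothetical and chosen only to make the arithmetic easy to follow. Practical SHAP tooling uses approximations rather than this brute-force enumeration.

```python
from itertools import permutations


def shapley_values(value_fn, features):
    """Exact Shapley values: average marginal contribution of each feature
    over all orderings. Feasible only for a handful of features."""
    totals = {f: 0.0 for f in features}
    orderings = list(permutations(features))
    for order in orderings:
        present = set()
        for f in order:
            before = value_fn(present)
            present.add(f)
            totals[f] += value_fn(present) - before
    return {f: t / len(orderings) for f, t in totals.items()}


# Hypothetical "model output" with one interaction term; names and weights
# are illustrative only and do not come from the study.
def toy_output(present):
    score = 0.0
    if "ct_lesion_volume" in present:
        score += 0.5
    if "age" in present:
        score += 0.2
    if "ct_lesion_volume" in present and "lymphocyte_count" in present:
        score += 0.2
    return score


phi = shapley_values(toy_output, ["ct_lesion_volume", "age", "lymphocyte_count"])
```

The attributions sum to the difference between the output with all features present and the output with none, and the interaction term is split evenly between the two features involved.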


What Predicts Corruption?

10.31235/osf.io/fq2xb ◽

2020 ◽

Author(s):

Emanuele Colonnelli ◽

Jorge Gallego ◽

Mounu Prem

Keyword(s):

Machine Learning ◽

Human Capital ◽

Cost Effectiveness ◽

Public Sector ◽

Financial Development ◽

Predictive Power ◽

Public Spending ◽

Learning Models ◽

Micro Data ◽

Machine Learning Models

The ability to predict corruption is crucial to policy. Using rich micro-data from Brazil, we show that multiple machine learning models display high levels of performance in predicting municipality-level corruption in public spending. We then quantify which individual municipality features and groups of similar characteristics have the highest predictive power. We find that measures of private sector activity, financial development, and human capital are the strongest predictors of corruption, while public sector and political features play a secondary role. Our findings have implications for the design and cost-effectiveness of various anti-corruption policies.


Machine Learning Models Have Better Performance than Traditional Logistic Regression Models in Predicting the Risk of Diabetes

SSRN Electronic Journal ◽

10.2139/ssrn.3854672 ◽

2021 ◽

Author(s):

Yaqian Mao ◽

Shuyao Pan ◽

Zheng Zhu ◽

Wei Lin ◽

Junping Wen ◽

...

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Regression Models ◽

Learning Models ◽

Logistic Regression Models ◽

Machine Learning Models


Telugu News Data Classification Using Machine Learning Approach

10.4018/978-1-7998-7685-4.ch014 ◽

2022 ◽

pp. 181-194

Author(s):

Bala Krishna Priya G. ◽

Jabeen Sultana ◽

Usha Rani M.

Keyword(s):

Machine Learning ◽

Social Media ◽

Research Work ◽

Learning Approach ◽

Fake News ◽

Learning Models ◽

Machine Learning Classifiers ◽

Proposed Model ◽

Machine Learning Approach ◽

Machine Learning Models

Mining Telugu news data and categorizing it by public sentiment is important because a great deal of fake news has emerged with the rise of social media. This research work identifies whether news text is positive, negative, or neutral and then classifies the data by area, such as business, editorial, entertainment, nation, and sports. It proposes an efficient model that adopts machine learning classifiers to perform classification on Telugu news data. The results obtained by various machine learning models are compared, and the proposed model is observed to outperform the others in accuracy, precision, recall, and F1-score.


Analysis of Machine Learning Techniques Applied to Sensory Detection of Vehicles in Intelligent Crosswalks

Sensors ◽

10.3390/s20216019 ◽

2020 ◽

Vol 20 (21) ◽

pp. 6019

Author(s):

José Manuel Lozano Domínguez ◽

Faroq Al-Tam ◽

Tomás de J. Mateo Sanguino ◽

Noélia Correia

Keyword(s):

Machine Learning ◽

Smart Cities ◽

Machine Learning Techniques ◽

Support Vector ◽

Learning Models ◽

Fuzzy Classifier ◽

Logistic Regression Models ◽

The Road ◽

Learning Agent ◽

Machine Learning Models

Improving road safety through artificial intelligence-based systems is now crucial to turning smart cities into a reality. Under this highly relevant and extensive heading, an approach is proposed to improve vehicle detection in smart crosswalks using machine learning models. Contrary to classic fuzzy classifiers, machine learning models do not require the readjustment of labels that depend on the location of the system and the road conditions. Several machine learning models were trained and tested using real traffic data taken from urban scenarios in both Portugal and Spain. These include random forest, time-series forecasting, multi-layer perceptron, support vector machine, and logistic regression models. A deep reinforcement learning agent, based on a state-of-the-art double-deep recurrent Q-network, is also designed and compared with the machine learning models just mentioned. Results show that the machine learning models can efficiently replace the classic fuzzy classifier.


Machine Learning Diagnostic Modeling for Classifying Fibromyalgia Using B-mode Ultrasound Images

Ultrasonic Imaging ◽

10.1177/0161734620908789 ◽

2020 ◽

Vol 42 (3) ◽

pp. 135-147 ◽

Cited By ~ 1

Author(s):

Michael Behr ◽

Saba Saiel ◽

Valerie Evans ◽

Dinesh Kumbhare

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Regression Models ◽

Trapezius Muscle ◽

Image Texture ◽

Learning Models ◽

Test Set ◽

Logistic Regression Models ◽

Performance Accuracy ◽

Machine Learning Models

Fibromyalgia (FM) diagnosis remains a challenge for clinicians due to a lack of objective diagnostic tools. One proposed solution is the use of quantitative ultrasound (US) techniques, such as image texture analysis, which has demonstrated discriminatory capabilities with other chronic pain conditions. From this, we propose the use of image texture variables to construct and compare two machine learning models (support vector machine [SVM] and logistic regression) for differentiating between the trapezius muscle in healthy and FM patients. US videos of the right and left trapezius muscle were acquired from healthy participants (n = 51) and those with FM (n = 57). The videos were converted into 64,800 skeletal muscle regions of interest (ROIs) using MATLAB. The ROIs were filtered by an algorithm using the complex wavelet structural similarity index (CW-SSIM), which removed ROIs that were similar. Thirty-one texture variables were extracted from the ROIs, which were then used in nested cross-validation to construct SVM and elastic net regularized logistic regression models. The generalized performance accuracy of both models was estimated and confirmed with a final validation on a holdout test set. The predicted generalized performance accuracy of the SVM and logistic regression models was computed to be 83.9 ± 2.6% and 65.8 ± 1.7%, respectively. The models achieved accuracies of 84.1% and 66.0%, respectively, on the final holdout test set, validating the performance estimates. Although both machine learning models differentiate between healthy trapezius muscle and that of patients with FM, only the SVM model demonstrated clinically relevant performance levels.
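The nested cross-validation mentioned above can be sketched as two levels of index splitting: an outer loop that estimates generalized performance and an inner loop, drawn only from the outer training data, that is used for model selection. The sketch below is an illustration under assumed settings. The fold counts, the interleaved fold assignment, and the function names are hypothetical and are not described in the abstract.

```python
def k_fold_indices(indices, k):
    """Split a list of indices into k interleaved folds of near-equal size."""
    return [indices[i::k] for i in range(k)]


def nested_cv_splits(n_samples, outer_k=5, inner_k=3):
    """Yield (outer_train, outer_test, inner_splits) index lists.

    inner_splits is a list of (inner_train, inner_val) pairs drawn only from
    outer_train, so the outer test fold never influences model selection.
    """
    all_idx = list(range(n_samples))
    outer_folds = k_fold_indices(all_idx, outer_k)
    for i, outer_test in enumerate(outer_folds):
        outer_train = [j for f_i, fold in enumerate(outer_folds)
                       if f_i != i for j in fold]
        inner_folds = k_fold_indices(outer_train, inner_k)
        inner_splits = []
        for m, inner_val in enumerate(inner_folds):
            inner_train = [j for f_m, fold in enumerate(inner_folds)
                           if f_m != m for j in fold]
            inner_splits.append((inner_train, inner_val))
        yield outer_train, outer_test, inner_splits


# Hypothetical sample count, chosen only to keep the example small.
splits = list(nested_cv_splits(n_samples=20))
```

Hyperparameters would be tuned on the inner splits and the chosen configuration scored once on each outer test fold. The spread of those outer scores gives an estimate of the kind reported above as a mean with a plus-or-minus range.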


SCADA System Testbed for Cybersecurity Research Using Machine Learning Approach

Future Internet ◽

10.3390/fi10080076 ◽

2018 ◽

Vol 10 (8) ◽

pp. 76 ◽

Cited By ~ 12

Author(s):

Marcio Teixeira ◽

Tara Salman ◽

Maede Zolanvari ◽

Raj Jain ◽

Nader Meskin ◽

...

Keyword(s):

Machine Learning ◽

Supervisory Control ◽

Network Traffic ◽

Learning Algorithms ◽

Cyber Attacks ◽

Machine Learning Algorithms ◽

Learning Models ◽

Scada System ◽

Machine Learning Approach ◽

Machine Learning Models

This paper presents the development of a Supervisory Control and Data Acquisition (SCADA) system testbed used for cybersecurity research. The testbed consists of a water storage tank’s control system, which is a stage in the process of water treatment and distribution. Sophisticated cyber-attacks were conducted against the testbed. During the attacks, the network traffic was captured, and features were extracted from the traffic to build a dataset for training and testing different machine learning algorithms. Five traditional machine learning algorithms were trained to detect the attacks: Random Forest, Decision Tree, Logistic Regression, Naïve Bayes and KNN. Then, the trained machine learning models were built and deployed in the network, where new tests were made using online network traffic. The performance obtained during the training and testing of the machine learning models was compared to the performance obtained during the online deployment of these models in the network. The results show the efficiency of the machine learning models in detecting the attacks in real time. The testbed provides a good understanding of the effects and consequences of attacks on real SCADA environments.


Predicting which genes will respond to perturbations of a TF: TF-independent properties of genes are major determinants of their responsiveness

10.1101/2020.12.15.422864 ◽

2020 ◽

Author(s):

Yiming Kang ◽

Michael Brent

Keyword(s):

Gene Expression ◽

Machine Learning ◽

Predictive Power ◽

Yeast Cells ◽

Expression Level ◽

Learning Models ◽

Histone Marks ◽

Expression Variation ◽

Location Data ◽

Machine Learning Models

Background: The ability to predict which genes will respond to perturbation of a TF's activity serves as a benchmark for our systems-level understanding of transcriptional regulatory networks. In previous work, machine learning models have been trained to predict static gene expression levels in a given sample by using data from the same or similar conditions, including data on TF binding locations, histone marks, or DNA sequence. We report on a different challenge -- training machine learning models that can predict which genes will respond to perturbation of a TF without using any data from the perturbed cells. Results: Existing TF location data (ChIP-Seq) from human K562 cells have no detectable utility for predicting which genes will respond to perturbation of the TF, but data obtained by newer methods in yeast cells are useful. TF-independent features of genes, including their preperturbation expression level and expression variation, are very useful for predicting responses to TF perturbations. This shows that some genes are poised to respond to TF perturbations and others are resistant, shedding significant light on why it has been so difficult to predict responses from binding locations. Certain histone marks (HMs), including H3K4me1 and H3K4me3, have some predictive power, especially when downstream of the transcription start site. In human, the predictive power of HMs is much less than that of gene expression level and variation. Code is available at https://github.com/yiming-kang/TFPertRespExplainer. Conclusions: Sequence-based or epigenetic properties of genes strongly influence their tendency to respond to direct TF perturbations, partially explaining the oft-noted difficulty of predicting responsiveness from TF binding location data. These molecular features are largely reflected in and summarized by the gene's expression level and expression variation.


An open-source framework for fast-yet-accurate calculation of quantum mechanical features

10.26434/chemrxiv-2021-8gthw ◽

2021 ◽

Author(s):

Eike Caldeweyher ◽

Christoph Bauer ◽

Ali Soltani Tehrani

Keyword(s):

Machine Learning ◽

Open Source ◽

Predictive Power ◽

Medium Size ◽

Quantum Mechanical ◽

Learning Models ◽

Molecular Fingerprints ◽

Open Source Framework ◽

Molecular Polarizabilities ◽

Machine Learning Models

We present the open-source framework kallisto that enables the efficient and robust calculation of quantum mechanical features for atoms and molecules. For a benchmark set of 49 experimental molecular polarizabilities, the predictive power of the presented method competes against second-order perturbation theory in a converged atomic-orbital basis set at a fraction of its computational costs. Robustness tests within a diverse validation set of more than 80,000 molecules show that the calculation of isotropic molecular polarizabilities has a low failure rate of only 0.3%. We furthermore present a generally applicable van der Waals radius model that is rooted in atomic static polarizabilities. Efficiency tests show that such radii can even be calculated for small- to medium-size proteins where the largest system (SARS-CoV-2 spike protein) has 42,539 atoms. Following the work of Domingo-Alemenara et al. [Domingo-Alemenara et al., Nat. Comm., 2019, 10, 5811], we present computational predictions for retention times for different chromatographic methods and describe how physicochemical features improve the predictive power of machine-learning models that otherwise only rely on two-dimensional features like molecular fingerprints. Additionally, we developed an internal benchmark set of experimental super-critical fluid chromatography retention times. For those methods, improvements of up to 17% are obtained when combining molecular fingerprints with physicochemical descriptors. Shapley additive explanation values furthermore show that the physical nature of the applied features can be retained within the final machine-learning models. We generally recommend the kallisto framework as a robust, low-cost, and physically motivated featurizer for upcoming state-of-the-art machine-learning studies.
