Explainable machine learning models to understand determinants of COVID-19 mortality in the United States

Mapping Intimacies ◽

10.1101/2020.05.23.20110189 ◽

2020 ◽

Author(s):

Piyush Mathur ◽

Tavpritesh Sethi ◽

Anya Mathur ◽

Kamal Maheshwari ◽

Jacek B Cywinski ◽

...

Keyword(s):

United States ◽

Machine Learning ◽

Random Forest ◽

Population Density ◽

The United States ◽

Median Income ◽

Relative Importance ◽

Learning Models ◽

Categorical Models ◽

Machine Learning Models

AbstractBackgroundCOVID-19 is now one of the leading causes of mortality amongst adults in the United States for the year 2020. Multiple epidemiological models have been built, often based on limited data, to understand the spread and impact of the pandemic. However, many geographic and local factors may have played an important role in higher morbidity and mortality in certain populations.ObjectiveThe goal of this study was to develop machine learning models to understand the relative association of socioeconomic, demographic, travel, and health care characteristics of different states across the United States and COVID-19 mortality.MethodsUsing multiple public data sets, 24 variables linked to COVID-19 disease were chosen to build the models. Two independent machine learning models using CatBoost regression and random forest were developed. SHAP feature importance and a Boruta algorithm were used to elucidate the relative importance of features on COVID-19 mortality in the United States.ResultsFeature importances from both the categorical models, i.e., CatBoost and random forest consistently showed that a high population density, number of nursing homes, number of nursing home beds and foreign travel were strongest predictors of COVID-19 mortality. Percentage of African American amongst the population was also found to be of high importance in prediction of COVID-19 mortality whereas racial majority (primarily, Caucasian) was not. Both models fitted the data well with a training R2 of 0.99 and 0.88 respectively. The effect of median age,median income, climate and disease mitigation measures on COVID-19 related mortality remained unclear.ConclusionsCOVID-19 policy making will need to take population density, pre-existing medical care and state travel policies into account. Our models identified and quantified the relative importance of each of these for mortality predictions using machine learning.

Download Full-text

A Brief Analysis of Key Machine Learning Methods for Predicting Medicare Payments Related to Physical Therapy Practices in the United States

Information ◽

10.3390/info12020057 ◽

2021 ◽

Vol 12 (2) ◽

pp. 57

Author(s):

Shrirang A. Kulkarni ◽

Jodh S. Pannu ◽

Andriy V. Koval ◽

Gabriel J. Merrin ◽

Varadraj P. Gurupur ◽

...

Keyword(s):

United States ◽

Machine Learning ◽

Physical Therapy ◽

Random Forest ◽

Generalized Additive Model ◽

Additive Model ◽

The United States ◽

Random Forest Regression ◽

Key Variables ◽

Machine Learning Models

Background and objectives: Machine learning approaches using random forest have been effectively used to provide decision support in health and medical informatics. This is especially true when predicting variables associated with Medicare reimbursements. However, more work is needed to analyze and predict data associated with reimbursements through Medicare and Medicaid services for physical therapy practices in the United States. The key objective of this study is to analyze different machine learning models to predict key variables associated with Medicare standardized payments for physical therapy practices in the United States. Materials and Methods: This study employs five methods, namely, multiple linear regression, decision tree regression, random forest regression, K-nearest neighbors, and linear generalized additive model, (GAM) to predict key variables associated with Medicare payments for physical therapy practices in the United States. Results: The study described in this article adds to the body of knowledge on the effective use of random forest regression and linear generalized additive model in predicting Medicare Standardized payment. It turns out that random forest regression may have any edge over other methods employed for this purpose. Conclusions: The study provides a useful insight into comparing the performance of the aforementioned methods, while identifying a few intricate details associated with predicting Medicare costs while also ascertaining that linear generalized additive model and random forest regression as the most suitable machine learning models for predicting key variables associated with standardized Medicare payments.

Download Full-text

Analysis of Covid-19 in the United States using Machine Learning

Machine Learning and Applications An International Journal ◽

10.5121/mlaij.2021.8102 ◽

2020 ◽

Vol 8 (1) ◽

pp. 15-21

Author(s):

James G. Koomson

Keyword(s):

United States ◽

Machine Learning ◽

The United States ◽

Global Level ◽

Learning Models ◽

The World ◽

Day By Day ◽

Machine Learning Models

The unprecedented outbreak of COVID-19 also known as the coronavirus has caused a pandemic like none ever seen before this century. Its impact has been massive on a global level. The deadly virus has commanded nations around the world to increase their efforts to fight against the spread of the virus after the stress it has put on resources. With the number of new cases increasing day by day around the world, the objective of this paper is to contribute towards the analysis of the virus by leveraging machine learning models to understand its behavior and predict future patterns in the United States (US) based on data obtained from the COVID-19 Tracking Project.

Download Full-text

Forecasting American COVID-19 Cases and Deaths through Machine Learning

10.1101/2020.08.13.20174631 ◽

2020 ◽

Author(s):

Anaiy Somalwar

Keyword(s):

United States ◽

Machine Learning ◽

Curve Fitting ◽

The United States ◽

Learning Model ◽

Learning Models ◽

Gaussian Curve ◽

Machine Learning Model ◽

Gradient Based ◽

Machine Learning Models

COVID-19 has become a great national security problem for the United States and many other countries, where public policy and healthcare decisions are based on the several models for the prediction of the future deaths and cases of COVID-19. While the most commonly used models for COVID-19 include epidemiological models and Gaussian curve-fitting models, recent literature has indicated that these models could be improved by incorporating machine learning. However, within this research on potential machine learning models for COVID-19 forecasting, there has been a large emphasis on providing an array of different types of machine learning models rather than optimizing a single one. In this research, we suggest and optimize a linear machine learning model with a gradient-based optimizer for the prediction of future COVID-19 cases and deaths in the United States. We also suggest that a hybrid of a machine learning model for shorter range predictions and a Gaussian curve-fitting model or an epidemiological model for longer range predictions could greatly increase the accuracy of COVID-19 forecasting.

Download Full-text

Prediction for the Risk of Multiple Chronic Conditions among Working Population in the United States with Machine Learning Models

IEEE Open Journal of Engineering in Medicine and Biology ◽

10.1109/ojemb.2021.3117872 ◽

2021 ◽

pp. 1-1

Author(s):

Jingmei Yang ◽

Xinglong Ju ◽

Feng Liu ◽

Onur Asan ◽

Timothy S Church ◽

...

Keyword(s):

United States ◽

Machine Learning ◽

Chronic Conditions ◽

The United States ◽

Multiple Chronic Conditions ◽

Learning Models ◽

Working Population ◽

Machine Learning Models

Download Full-text

Forecasting American COVID-19 Cases and Deaths through Machine Learning (Preprint)

10.2196/preprints.23605 ◽

2020 ◽

Author(s):

Anaiy Somalwar

Keyword(s):

United States ◽

Machine Learning ◽

Curve Fitting ◽

The United States ◽

Learning Model ◽

Learning Models ◽

Gaussian Curve ◽

Machine Learning Model ◽

Gradient Based ◽

Machine Learning Models

UNSTRUCTURED COVID-19 has become a great national security problem for the United States and many other countries, where public policy and healthcare decisions are based on the several models for the prediction of the future deaths and cases of COVID-19. While the most commonly used models for COVID-19 include epidemiological models and Gaussian curve-fitting models, recent literature has indicated that these models could be improved by incorporating machine learning. However, within this research on potential machine learning models for COVID-19 forecasting, there has been a large emphasis on providing an array of different types of machine learning models rather than optimizing a single one. In this research, we suggest and optimize a linear machine learning model with a gradient-based optimizer for the prediction of future COVID-19 cases and deaths in the United States. We also suggest that a hybrid of a machine learning model for shorter range predictions and a Gaussian curve-fitting model or an epidemiological model for longer range predictions could greatly increase the accuracy of COVID-19 forecasting. INTERNATIONAL REGISTERED REPORT RR2-https://doi.org/10.1101/2020.08.13.20174631

Download Full-text

Document Preprocessing with TF-IDF to Improve the Polarity Classification Performance of Unstructured Sentiment Analysis

Kinetik Game Technology Information System Computer Network Computing Electronics and Control ◽

10.22219/kinetik.v5i3.1066 ◽

2020 ◽

pp. 235-242

Author(s):

Farrikh Alzami ◽

Erika Devi Udayanti ◽

Dwi Puji Prabowo ◽

Rama Aria Megantara

Keyword(s):

Machine Learning ◽

Feature Extraction ◽

Random Forest ◽

Sentiment Analysis ◽

Classification Performance ◽

Document Preparation ◽

Learning Models ◽

Polarity Classification ◽

Negative Sentiment ◽

Machine Learning Models

Sentiment analysis in terms of polarity classification is very important in everyday life, with the existence of polarity, many people can find out whether the respected document has positive or negative sentiment so that it can help in choosing and making decisions. Sentiment analysis usually done manually. Therefore, an automatic sentiment analysis classification process is needed. However, it is rare to find studies that discuss extraction features and which learning models are suitable for unstructured sentiment analysis types with the Amazon food review case. This research explores some extraction features such as Word Bags, TF-IDF, Word2Vector, as well as a combination of TF-IDF and Word2Vector with several machine learning models such as Random Forest, SVM, KNN and Naïve Bayes to find out a combination of feature extraction and learning models that can help add variety to the analysis of polarity sentiments. By assisting with document preparation such as html tags and punctuation and special characters, using snowball stemming, TF-IDF results obtained with SVM are suitable for obtaining a polarity classification in unstructured sentiment analysis for the case of Amazon food review with a performance result of 87,3 percent.

Download Full-text

Random forest and long short-term memory based machine learning models for classification of ion mobility spectrometry spectra

Chemical, Biological, Radiological, Nuclear, and Explosives (CBRNE) Sensing XXII ◽

10.1117/12.2585829 ◽

2021 ◽

Author(s):

Patrick C. Riley ◽

Samir V. Deshpande ◽

Brian S. Ince ◽

Brian C. Hauck ◽

Kyle P. O'Donnell ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Ion Mobility ◽

Short Term Memory ◽

Learning Models ◽

Short Term ◽

Term Memory ◽

Long Short Term Memory ◽

Machine Learning Models

Download Full-text

Comparison of Resampling Algorithms to Address Class Imbalance when Developing Machine Learning Models to Predict Foodborne Pathogen Presence in Agricultural Water

Frontiers in Environmental Science ◽

10.3389/fenvs.2021.701288 ◽

2021 ◽

Vol 9 ◽

Author(s):

Daniel Lowell Weller ◽

Tanzy M. T. Love ◽

Martin Wiedmann

Keyword(s):

Machine Learning ◽

Random Forest ◽

Predictive Models ◽

Training Data ◽

Agricultural Water ◽

Learning Models ◽

Safety Hazards ◽

E Coli ◽

Resampling Method ◽

Machine Learning Models

Recent studies have shown that predictive models can supplement or provide alternatives to E. coli-testing for assessing the potential presence of food safety hazards in water used for produce production. However, these studies used balanced training data and focused on enteric pathogens. As such, research is needed to determine 1) if predictive models can be used to assess Listeria contamination of agricultural water, and 2) how resampling (to deal with imbalanced data) affects performance of these models. To address these knowledge gaps, this study developed models that predict nonpathogenic Listeria spp. (excluding L. monocytogenes) and L. monocytogenes presence in agricultural water using various combinations of learner (e.g., random forest, regression), feature type, and resampling method (none, oversampling, SMOTE). Four feature types were used in model training: microbial, physicochemical, spatial, and weather. “Full models” were trained using all four feature types, while “nested models” used between one and three types. In total, 45 full (15 learners*3 resampling approaches) and 108 nested (5 learners*9 feature sets*3 resampling approaches) models were trained per outcome. Model performance was compared against baseline models where E. coli concentration was the sole predictor. Overall, the machine learning models outperformed the baseline E. coli models, with random forests outperforming models built using other learners (e.g., rule-based learners). Resampling produced more accurate models than not resampling, with SMOTE models outperforming, on average, oversampling models. Regardless of resampling method, spatial and physicochemical water quality features drove accurate predictions for the nonpathogenic Listeria spp. and L. monocytogenes models, respectively. Overall, these findings 1) illustrate the need for alternatives to existing E. coli-based monitoring programs for assessing agricultural water for the presence of potential food safety hazards, and 2) suggest that predictive models may be one such alternative. Moreover, these findings provide a conceptual framework for how such models can be developed in the future with the ultimate aim of developing models that can be integrated into on-farm risk management programs. For example, future studies should consider using random forest learners, SMOTE resampling, and spatial features to develop models to predict the presence of foodborne pathogens, such as L. monocytogenes, in agricultural water when the training data is imbalanced.

Download Full-text

Application of Random Forest Machine Learning Models to Forecast Combustion Profile Parameters of a Natural Gas Spark Ignition Engine

10.1115/1.0004390v ◽

2021 ◽

Author(s):

Jinlong Liu ◽

Christopher Ulishney ◽

Cosmin Dumitrescu

Keyword(s):

Machine Learning ◽

Random Forest ◽

Natural Gas ◽

Spark Ignition ◽

Spark Ignition Engine ◽

Learning Models ◽

Machine Learning Models

Download Full-text

A comparison of regularized logistic regression and random forest machine learning models for daytime diagnosis of obstructive sleep apnea

Medical & Biological Engineering & Computing ◽

10.1007/s11517-020-02206-9 ◽

2020 ◽

Vol 58 (10) ◽

pp. 2517-2529

Author(s):

Farahnaz Hajipour ◽

Mohammad Jafari Jozani ◽

Zahra Moussavi

Keyword(s):

Machine Learning ◽

Obstructive Sleep Apnea ◽

Logistic Regression ◽

Sleep Apnea ◽

Random Forest ◽

Learning Models ◽

Obstructive Sleep ◽

Machine Learning Models

Download Full-text