Machine learning based prognostic model for predicting infection susceptibility of COVID-19 using health care data

Abstract: From a public health perspective, accurate estimates of individuals' infection severity are extremely valuable for informed decision making and a targeted response to an emerging pandemic. This paper presents a machine learning based prognostic model that gives individuals early warning of COVID-19 infection using a health care data set. A prognostic model using a Random Forest classifier and support vector regression is developed for predicting susceptibility to COVID-19 infection, and it is applied to an open health care data set containing 27 fields. Typical fields include basic personal details such as age, gender, number of children in the household, and marital status, along with medical data such as coma score, pulmonary score, blood glucose level, and HDL cholesterol. A preprocessing step handles the numerical values, categorical (non-numerical) values, and missing data in the data set. Principal component analysis is applied for dimensionality reduction. From the classification results, the Random Forest classifier provides higher accuracy than support vector regression on the given health data set. The proposed machine learning approach can help individuals take additional precautions against COVID-19 infection. Based on its results, clinicians and government officials can focus on highly susceptible people to limit the spread of the pandemic.

Methods: In the present work, Random Forest classifier and support vector regression techniques are applied to a medical health care data set containing 27 variables to predict an individual's susceptibility score for COVID-19 infection, and the prediction accuracy of the two is compared. A preprocessing step handles the missing data in the data set. Principal Component Analysis is carried out on the data set for dimensionality reduction of the feature vectors.

Results: From the classification results, the Random Forest classifier provides an accuracy of 90%, sensitivity of 94%, and specificity of 81% for the given medical data set.

Conclusion: The proposed machine learning approach can help individuals take additional precautions against COVID-19 infection, and clinicians and government officials can focus on highly susceptible people to limit the spread of the pandemic.
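The abstract above does not say how the missing and categorical values were handled, so the following is only a minimal sketch of one common choice: mean imputation for numeric gaps and one-hot encoding for categorical fields. The function names and the toy columns are hypothetical and are not taken from the paper or its data set.

```python
from statistics import mean


def impute_numeric(values):
    """Replace None entries with the mean of the observed values (assumed strategy)."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def one_hot(values):
    """Encode a categorical column as one 0/1 indicator list per row.

    Indicator positions follow the alphabetically sorted category names.
    """
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]


# Hypothetical toy columns, not taken from the paper's data set.
ages = [34, None, 58, 46]
marital_status = ["single", "married", "married", "single"]

ages_filled = impute_numeric(ages)
status_encoded = one_hot(marital_status)
```

Other imputation rules (median, most frequent value) or encodings (ordinal codes) would slot into the same two functions; which one the authors used is not stated.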


Machine learning based prognostic model and mobile application software platform for predicting infection susceptibility of COVID-19 using health care data

10.1101/2020.10.09.20165431 ◽

2020 ◽

Author(s):

R Srivatsan ◽

Prithviraj N Indi ◽

Swapnil Agrahari ◽

Siddharth Menon ◽

S Denis Ashok

Keyword(s):

Machine Learning ◽

Health Care ◽

Random Forest ◽

Support Vector Regression ◽

Prognostic Model ◽

Random Forest Classifier ◽

Support Vector ◽

Data Set ◽

Health Care Data ◽

Infection Susceptibility

Abstract: From a public health perspective, accurate estimates of individuals' infection severity are extremely valuable for informed decision making and a targeted response to an emerging pandemic. This paper presents a machine learning based prognostic model that gives individuals early warning of COVID-19 infection using a health care data set. A prognostic model using a Random Forest classifier and support vector regression is developed for predicting the Infection Susceptibility Probability (ISP) score of COVID-19, and it is applied to an open health care data set containing 27 fields. Typical fields include basic personal details such as age, gender, number of children in the household, and marital status, along with medical data such as coma score, pulmonary score, blood glucose level, and HDL cholesterol. A preprocessing step handles the numerical values, categorical (non-numerical) values, and missing data in the data set. The correlation between the variables in the health care data is analyzed using the correlation coefficient, and a color-coded heat map is used to identify the factors that influence the ISP score. Based on the accuracy, precision, sensitivity, and F-score, the Random Forest classifier provides better classification performance than support vector regression on the given health care data set. An Android based mobile application software platform is developed using the proposed prognostic approach, enabling healthy individuals to predict their COVID-19 infection susceptibility score and take precautionary measures. Based on the results of the proposed method, clinicians and government officials can focus on highly susceptible people to limit the spread of the pandemic.

Methods: In the present work, Random Forest classifier and support vector regression techniques are applied to a medical health care data set containing 27 variables to predict an individual's susceptibility score for COVID-19 infection, and the prediction accuracy of the two is compared. A preprocessing step handles the missing data in the data set. Correlation analysis using a heat map is carried out on the health care data to analyze the factors influencing the ISP score. A confusion matrix is calculated to assess classification performance based on the numbers of true positives, true negatives, false positives, and false negatives. These values are then used to calculate the accuracy, precision, sensitivity, and F-score.

Results: From the classification results, the Random Forest classifier provides a classification accuracy of 99.7%, precision of 99.8%, sensitivity of 98.8%, and F-score of 99.29% for the given medical data set.

Conclusion: The proposed machine learning approach can help individuals take additional precautions against COVID-19 infection, and clinicians and government officials can focus on highly susceptible people to limit the spread of the pandemic.
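The metrics that the abstract above derives from the confusion matrix can be sketched as follows, using the standard textbook definitions. The label vectors are hypothetical toy data (1 = susceptible, 0 = not susceptible), not the paper's results, and the function names are illustrative only.

```python
def confusion_counts(y_true, y_pred):
    """Count true positives, true negatives, false positives, false negatives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(tp, tn, fp, fn):
    """Standard definitions; assumes at least one predicted and one actual positive."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)  # also called recall
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "f_score": f_score,
    }


# Hypothetical toy labels, not taken from the paper.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

counts = confusion_counts(y_true, y_pred)
metrics = classification_metrics(*counts)
```

For these toy labels the counts are 3 true positives, 5 true negatives, 1 false positive, and 1 false negative, so accuracy is 0.8 and precision, sensitivity, and F-score are all 0.75.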


Machine learning based prognostic model and mobile application software platform for predicting infection susceptibility of COVID-19 using health care data

10.21203/rs.3.rs-46681/v2 ◽

2020 ◽

Author(s):

R Srivat ◽

Prithviraj N Indi ◽

Swapnil Agrahari ◽

Siddharth Menon ◽

S. Denis Ashok

Keyword(s):

Machine Learning ◽

Health Care ◽

Mobile Application ◽

Prognostic Model ◽

Random Forest Classifier ◽

Support Vector ◽

Application Software ◽

Data Set ◽

Health Care Data ◽

Infection Susceptibility


Estimation of Soil Cohesion Using Machine Learning Method: A Random Forest Approach

Advances in Civil Engineering ◽

10.1155/2021/8873993 ◽

2021 ◽

Vol 2021 ◽

pp. 1-14

Author(s):

Hai-Bang Ly ◽

Thuy-Anh Nguyen ◽

Binh Thai Pham

Keyword(s):

Machine Learning ◽

Random Forest ◽

Soil Properties ◽

Clay Content ◽

Absolute Error ◽

Experimental Methods ◽

Liquid Limit ◽

Support Vector ◽

Data Set ◽

Soil Cohesion

Soil cohesion (C) is one of the critical soil properties and is closely related to basic soil properties such as particle size distribution, pore size, and shear strength. Hence, it is mainly determined by experimental methods, which are often time-consuming and costly. Developing an alternative approach based on machine learning (ML) techniques is therefore highly recommended. In this study, three machine learning models, namely support vector machine (SVM), Gaussian process regression (GPR), and random forest (RF), were built on a data set of 145 soil samples collected from the Da Nang-Quang Ngai expressway project, Vietnam. The database includes six input parameters: clay content, moisture content, liquid limit, plastic limit, specific gravity, and void ratio. Model performance was assessed by three statistical criteria: the correlation coefficient (R), mean absolute error (MAE), and root mean square error (RMSE). The results demonstrated that the proposed RF model could predict soil cohesion with high accuracy (R = 0.891) and low error (RMSE = 3.323 and MAE = 2.511), and that its predictive capability is better than that of SVM and GPR. The RF model can therefore be used as a cost-effective approach for predicting the soil cohesion values used in the design and inspection of constructions.
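The three criteria named in the abstract above (R, MAE, RMSE) can be computed as in the sketch below, which assumes R is the Pearson correlation between measured and predicted values. The numbers are hypothetical toy values, not the study's soil data.

```python
from math import sqrt


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between measured and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / sqrt(var_t * var_p)


# Hypothetical measured vs. predicted cohesion values (toy data).
measured = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]

scores = {
    "R": pearson_r(measured, predicted),
    "MAE": mae(measured, predicted),
    "RMSE": rmse(measured, predicted),
}
```

MAE and RMSE are in the units of the target, so they are only comparable across models evaluated on the same data; R is unitless.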


A novel method for total chlorine detection using machine learning with electrode arrays

RSC Advances ◽

10.1039/c9ra06609h ◽

2019 ◽

Vol 9 (59) ◽

pp. 34196-34206

Author(s):

Zhe Li ◽

Shunhao Huang ◽

Juan Chen

Keyword(s):

Machine Learning ◽

Principal Component Analysis ◽

Cyclic Voltammetry ◽

Support Vector Regression ◽

Principal Component ◽

Measurement Model ◽

Support Vector ◽

Electrode Arrays ◽

Soft Measurement ◽

Novel Method

A soft measurement model of total chlorine is established using cyclic voltammetry curves, principal component analysis, and support vector regression.


Utilização de técnicas de Machine Learning e de Deep Learning para a predição de casos de internações causadas por dengue em municípios da Paraíba

10.5753/ercemapi.2021.17914 ◽

2021 ◽

Author(s):

Ewerthon Dyego de Araújo Batista ◽

Wellington Candeia de Araújo ◽

Romeryto Vieira Lira ◽

Laryssa Izabel de Araújo Batista

Keyword(s):

Neural Network ◽

Machine Learning ◽

Deep Learning ◽

Random Forest ◽

Convolutional Neural Network ◽

Support Vector Regression ◽

Multilayer Perceptron ◽

Support Vector

Dengue is a public health problem in Brazil, and cases of the disease have risen again in Paraíba. The Paraíba epidemiological bulletin, released in August 2021, reports a 53% increase in cases compared with the previous year. Machine Learning (ML) and Deep Learning techniques are being used as tools for predicting the disease and supporting efforts to combat it. Using Random Forest (RF), Support Vector Regression (SVR), Multilayer Perceptron (MLP), Long Short-Term Memory (LSTM), and Convolutional Neural Network (CNN) techniques, this article presents a system capable of forecasting hospitalizations caused by dengue for the cities of Bayeux, Cabedelo, João Pessoa, and Santa Rita. The system produced forecasts for Bayeux with an error rate of 0.5290, while the error was 0.92742 in Cabedelo, 9.55288 in João Pessoa, and 0.74551 in Santa Rita.


Waste Management System Fraud Detection Using Machine Learning Algorithms to Minimize Penalties Avoidance and Redemption Abuse

Recycling ◽

10.3390/recycling6040065 ◽

2021 ◽

Vol 6 (4) ◽

pp. 65

Author(s):

Ali Hewiagh ◽

Kannan Ramakrishnan ◽

Timothy Tzen Vun Yap ◽

Ching Seong Tan

Keyword(s):

Machine Learning ◽

Random Forest ◽

Waste Management ◽

Management System ◽

Learning Algorithms ◽

Fraud Detection ◽

Machine Learning Algorithms ◽

Support Vector ◽

Data Set ◽

Waste Management System

Online fraud has pernicious impacts on different system domains, including waste management systems. Fraudsters illegally obtain rewards for recycling activities or avoid the penalties that apply to those required to recycle their own waste. Although some approaches have been introduced to prevent such fraudulent activities, fraudsters continuously seek new ways to commit illegal actions. Machine learning has shown significant results in identifying new online fraud patterns in system domains such as e-commerce, insurance, and banking. The purpose of this paper, therefore, is to analyze a waste management system and develop a machine learning model to detect fraud in the system. The intended system allows consumers, individuals, and organizations to track, monitor, and update their performance in their recycling activities. The data set, provided by a waste management organization, is used for the analysis and the model training. It contains transactions of users’ recycling activities and behaviors. Three machine learning algorithms (random forest, support vector machine, and multi-layer perceptron) are used in the experiments, and the best detection model is selected based on performance. Results show that each of these algorithms can be used for fraud detection in waste management with high accuracy. The random forest algorithm produces the best model, with an accuracy of 96.33%, F1-score of 95.20%, and ROC of 98.92%.


A Machine Learning Approach to Determine Maturity Stages of Tomatoes

Oriental journal of computer science and technology ◽

10.13005/ojcst/10.03.19 ◽

2017 ◽

Vol 10 (3) ◽

pp. 683-690 ◽

Cited By ~ 2

Author(s):

Kamalpreet Kaur ◽

O.P. Guptata

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Random Forest ◽

Fruits And Vegetables ◽

Support Vector ◽

Large Set ◽

Accuracy Score ◽

Data Set ◽

Maturity Stages ◽

Total Data

Maturity checking has become mandatory for the food industries as well as for farmers, to ensure that fruits and vegetables are ripe and not diseased. However, manual inspection leads to human error, and unripe fruits and vegetables may decrease production [3]. This study therefore proposes a tomato classification system that determines the maturity stages of tomatoes through machine learning, training several algorithms: Decision Tree, Logistic Regression, Gradient Boosting, Random Forest, Support Vector Machine, K-NN, and XGBoost. The system consists of image collection, feature extraction, and training the classifiers on 80% of the total data. The remaining 20% of the data is used for testing. The results show that classifier performance depends on the size and kind of features extracted from the data set. The results are reported as a learning curve, a confusion matrix, and an accuracy score. Of the seven classifiers, Random Forest performs best, with 92.49% accuracy, which the authors attribute to its high capability of handling large data sets. Support Vector Machine shows the lowest accuracy, which they attribute to its inability to train on a large data set.
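The 80%/20% train/test partition described in the abstract above can be sketched as a shuffled split. The abstract does not say whether the split was random or stratified, so the seeded shuffle here is an assumption, and the sample identifiers are hypothetical.

```python
import random


def train_test_split(samples, train_fraction=0.8, seed=0):
    """Shuffle a copy of the samples and cut it into train and test parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical sample identifiers standing in for extracted image features.
samples = list(range(10))
train, test = train_test_split(samples)
```

Keeping the test portion untouched during training is what makes the reported accuracy an estimate of performance on unseen images.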


Amino Acid Composition and Charge Based Prediction of Antisepsis Peptides by Random Forest Machine Learning Algorithm

10.1101/2021.09.26.461860 ◽

2021 ◽

Author(s):

Aayushi Rathore ◽

Anu Saini ◽

Navjot Kaur ◽

Aparna Singh ◽

Ojasvi Dutta ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Learning Algorithm ◽

Learning Algorithms ◽

Multiple Organ ◽

The Body ◽

Machine Learning Algorithms ◽

Support Vector ◽

Data Set ◽

Organ Systems

Abstract: Sepsis is a severe infectious disease with high mortality. It occurs when chemicals released into the bloodstream to fight an infection trigger inflammation throughout the body, which can cause a cascade of changes that damage multiple organ systems, leading them to fail and sometimes resulting in death. Antiseptics are used to reduce the possibility of sepsis or infection, a process known as antisepsis. Antiseptic peptides (ASPs) show properties similar to anti-gram-negative peptides, anti-gram-positive peptides, and many more. Machine learning algorithms are useful in screening and identifying therapeutic peptides, and thus provide initial filters or build confidence before time-consuming and laborious experimental approaches are used. In this study, several machine learning algorithms, namely Support Vector Machine (SVM), Random Forest (RF), K-Nearest Neighbour (KNN), and Logistic Regression (LR), were evaluated for the prediction of ASPs. The characteristic physicochemical features of ASPs were also explored for use in machine learning. Both manual and automatic feature selection were employed to achieve the best performance of the algorithms. Five-fold cross validation and validation on an independent data set showed RF to be the best model for predicting ASPs. The RF model showed an accuracy of 97% and a Matthews correlation coefficient (MCC) of 0.93, which indicate a robust model. To the authors' knowledge, this is the first attempt to build a machine learning classifier for the prediction of ASPs.
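The Matthews correlation coefficient reported in the abstract above is computed from the four confusion-matrix counts. The sketch below uses the standard formula; the counts are hypothetical toy values, not the study's, and returning 0.0 for a zero denominator is a common convention assumed here.

```python
from math import sqrt


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # assumed convention when any row or column total is zero
    return (tp * tn - fp * fn) / denominator


# Hypothetical confusion-matrix counts (toy data).
mcc = matthews_corrcoef(tp=45, tn=48, fp=2, fn=5)
```

MCC ranges from -1 to 1 and uses all four counts, so it stays informative when the positive and negative classes are unbalanced, which plain accuracy does not.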


Malaria Prediction Model Using Machine Learning Algorithms

Turkish Journal of Computer and Mathematics Education (TURCOMAT) ◽

10.17762/turcomat.v12i10.5655 ◽

2021 ◽

Vol 12 (10) ◽

pp. 7488-7496

Author(s):

Yusuf Aliyu Adamu, Et. al.

Keyword(s):

Machine Learning ◽

Random Forest ◽

Prediction Model ◽

Public Awareness ◽

Health Sector ◽

Weather Condition ◽

Machine Learning Algorithms ◽

Support Vector ◽

African Countries ◽

Data Set

Measures have been taken to protect individuals from the burden of vector-borne disease, but it remains a greater cause of death than any other disease in Africa. Many human lives are lost, particularly those of children below five years, regardless of the efforts made. The effect of malaria is most challenging in developing countries. In 2019, 51% of malaria fatalities happened in Africa, and this increased by 20% in 2020 due to the COVID-19 pandemic. The majority of African countries lack a sound health care system and proper environmental settlement, and face economic hardship, limited funding in the health sector, and an absence of good policies to ensure the safety of individuals. Information on the effect of malaria has to be made available to people through public awareness programs, so that they become acquainted with the disease and certain measures can be maintained. The prediction model can help policymakers know more about the expected time of malaria occurrence based on the existing features, so that people receive information about the disease on time and the government can make health equipment and medication available through its policy. In this research, weather conditions, non-climatic features, and malaria cases are considered in designing the prediction model. The performance of six machine learning classifiers (Support Vector Machine, K-Nearest Neighbour, Random Forest, Decision Tree, Logistic Regression, and Naïve Bayes) is also evaluated, and Random Forest is found to be the best, with accuracy of 97.72%, AUC of 98%, and precision of 100% on the data set used in the analysis.


Reliable Identification of Oolong Tea Species: Nondestructive Testing Classification Based on Fluorescence Hyperspectral Technology and Machine Learning

Agriculture ◽

10.3390/agriculture11111106 ◽

2021 ◽

Vol 11 (11) ◽

pp. 1106

Author(s):

Yan Hu ◽

Lijia Xu ◽

Peng Huang ◽

Xiong Luo ◽

Peng Wang ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Principal Component ◽

Classification Model ◽

Recursive Feature Elimination ◽

Support Vector ◽

K Nearest Neighbor ◽

Oolong Tea ◽

The Impact ◽

T Distribution

A rapid and nondestructive tea classification method is of great significance in today’s research. This study uses fluorescence hyperspectral technology and machine learning to distinguish Oolong tea by analyzing the spectral features of tea in the wavelength range from 475 to 1100 nm. The spectral data are preprocessed by multiplicative scatter correction (MSC) and standard normal variate (SNV) transformation, which can effectively reduce the impact of baseline drift and tilt. Principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE) are then adopted for feature dimensionality reduction and visual display. Random Forest-Recursive Feature Elimination (RF-RFE) is used for feature selection. Decision Tree (DT), Random Forest Classification (RFC), K-Nearest Neighbor (KNN), and Support Vector Machine (SVM) are used to establish the classification model. The results show that MSC-RF-RFE-SVM is the best model for the classification of Oolong tea, with accuracies of 100% on the training set and 98.73% on the test set. It can be concluded that fluorescence hyperspectral technology and machine learning are feasible for classifying Oolong tea.
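Of the preprocessing steps named in the abstract above, the SNV transformation is simple enough to sketch: each spectrum is centered on its own mean and scaled by its own standard deviation. The use of the sample standard deviation and the toy spectra below are assumptions for illustration, not details from the study.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard normal variate: center and scale one spectrum by its own statistics."""
    m = mean(spectrum)
    s = stdev(spectrum)  # sample standard deviation (assumed convention)
    return [(x - m) / s for x in spectrum]


# Hypothetical toy spectra: the second is the first with a gain of 2 and an offset of 10.
raw = [1.0, 2.0, 3.0, 4.0, 5.0]
shifted = [12.0, 14.0, 16.0, 18.0, 20.0]

raw_snv = snv(raw)
shifted_snv = snv(shifted)
```

Because the two toy spectra differ only by a constant offset and a multiplicative gain, they map to the same SNV output, which is the sense in which the transformation reduces baseline and scatter effects.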
