Design and Deployment of E-Health System in Perspective of Developing Countries: Machine Learning Based Approach (Preprint)

2020 ◽  
Author(s):  
Mohamad Afendee Mohamed ◽  
Md Saniat Rahman Zishan ◽  
Rabiul Ahasan ◽  
Chowdhury Akram Hossain ◽  
Siti Maryam Sharun

BACKGROUND We are living in a world where data science and machine learning are tightening their grasp on many sectors of modern life, and the medical sector is no exception. In developing countries, healthcare is one of the domains that needs immediate attention. Due to the lack of manpower and technical resources, a large number of people in these regions do not receive proper medical care. Designing an E-health system with the help of machine learning and web technologies would be a great aid in such circumstances. OBJECTIVE The proposed E-health system will assist medical professionals in determining diseases. Moreover, the system will also help patients check whether they have been diagnosed correctly. Based on the diagnosis results, they can receive medical specialist recommendations and medicine suggestions from the system. Automating the disease identification and suggestion models with the help of machine learning will be cost-efficient and time-saving compared to traditional methods. The main objective of this E-health system is to provide health care with the help of sustainable and realistic machine learning technologies. METHODS In this research, for the disease identification part, machine learning techniques have been applied to identify three diseases: Dengue, Diabetes, and Thyroid. Decision Tree, Gaussian Naive Bayes, Random Forest, Logistic Regression, k-Nearest Neighbors, Multilayer Perceptron, and Support Vector Machine classifiers have been used for all three diseases. The E-health system, comprising a disease identification model, a medical specialist recommendation model, and a medicine suggestion model, has been deployed on the web. The medical specialist recommendation and medicine suggestion models base their results on the findings of the disease identification model. Any user can insert their disease-specific data to use these three features of the E-health system.
RESULTS For the disease identification model, Multilayer Perceptron for Dengue, Logistic Regression for Diabetes, and Random Forest for Thyroid performed best, with accuracies of 88.3%, 82.5%, and 98.5% respectively. These classifiers also showed good precision, recall, and F1 scores. CONCLUSIONS The E-health system has performed well with real-time data. Enriching the dataset further will make the disease identification model more robust and thorough. Moreover, usability and acceptance tests can help identify different real-time scenarios for the E-health system.
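As a hedged illustration of the comparison described in METHODS, the sketch below trains the same seven classifier families on synthetic tabular data and selects the best by held-out accuracy. The dataset, split, and hyperparameters are stand-ins, not the study's disease data or pipeline:

```python
# Illustrative sketch: compare the seven classifier families from the abstract
# on synthetic data and pick the best by test-set accuracy.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Synthetic stand-in for disease-specific tabular features.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

classifiers = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "gaussian_nb": GaussianNB(),
    "random_forest": RandomForestClassifier(random_state=0),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "knn": KNeighborsClassifier(),
    "mlp": MLPClassifier(max_iter=1000, random_state=0),
    "svm": SVC(),
}

# Fit each classifier and record its held-out accuracy.
scores = {name: clf.fit(X_tr, y_tr).score(X_te, y_te)
          for name, clf in classifiers.items()}
best = max(scores, key=scores.get)
```

In the study this selection was presumably carried out per disease, which is how different winners emerged for Dengue, Diabetes, and Thyroid.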

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Elisabeth Sartoretti ◽  
Thomas Sartoretti ◽  
Michael Wyss ◽  
Carolin Reischauer ◽  
Luuk van Smoorenburg ◽  
...  

Abstract We sought to evaluate the utility of radiomics for Amide Proton Transfer weighted (APTw) imaging by assessing its value in differentiating brain metastases from high- and low-grade glial brain tumors. We retrospectively identified 48 treatment-naïve patients (10 WHO grade 2, 1 WHO grade 3, 10 WHO grade 4 primary glial brain tumors and 27 metastases) with either primary glial brain tumors or metastases who had undergone APTw MR imaging. After image analysis with radiomics feature extraction and post-processing, machine learning algorithms (multilayer perceptron machine learning algorithm; random forest classifier) with stratified tenfold cross validation were trained on features and were used to differentiate the brain neoplasms. The multilayer perceptron achieved an AUC of 0.836 (receiver operating characteristic curve) in differentiating primary glial brain tumors from metastases. The random forest classifier achieved an AUC of 0.868 in differentiating WHO grade 4 from WHO grade 2/3 primary glial brain tumors. For the differentiation of WHO grade 4 tumors from grade 2/3 tumors and metastases an average AUC of 0.797 was achieved. Our results indicate that the use of radiomics for APTw imaging is feasible and the differentiation of primary glial brain tumors from metastases is achievable with a high degree of accuracy.
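The stratified tenfold cross-validation scheme the abstract mentions can be sketched as follows. The synthetic two-class data, class balance, and AUC scoring are illustrative assumptions, not the radiomics feature set:

```python
# Illustrative sketch: stratified 10-fold cross-validation of a random forest,
# scored by ROC-AUC, on synthetic stand-in features.
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for extracted radiomics features (two classes).
X, y = make_classification(n_samples=200, weights=[0.56, 0.44], random_state=1)

# Stratification preserves the class ratio in every fold, which matters
# for small, imbalanced cohorts like the 48-patient study sample.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)
auc = cross_val_score(RandomForestClassifier(random_state=1), X, y,
                      cv=cv, scoring="roc_auc")
mean_auc = auc.mean()
```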


2020 ◽  
Author(s):  
Jun Ke ◽  
Yiwei Chen ◽  
Xiaoping Wang ◽  
Zhiyong Wu ◽  
Qiongyao Zhang ◽  
...  

Abstract Background: The purpose of this study was to identify the risk factors for in-hospital mortality in patients with acute coronary syndrome (ACS) and to evaluate the performance of traditional regression and machine learning prediction models. Methods: The data of ACS patients who entered the emergency department of Fujian Provincial Hospital from January 1, 2017 to March 31, 2020 for chest pain were retrospectively collected. The study used univariate and multivariate logistic regression analysis to identify risk factors for in-hospital mortality of ACS patients. Traditional regression and machine learning algorithms were used to develop predictive models, and sensitivity, specificity, and the receiver operating characteristic curve were used to evaluate the performance of each model. Results: A total of 7810 ACS patients were included in the study, and the in-hospital mortality rate was 1.75%. Multivariate logistic regression analysis found that age, use of calcium channel blockers, and levels of D-dimer, cardiac troponin I, N-terminal pro-B-type natriuretic peptide (NT-proBNP), lactate dehydrogenase (LDH), and high-density lipoprotein (HDL) cholesterol were independent predictors of in-hospital mortality. The areas under the receiver operating characteristic curve of the models developed by logistic regression, gradient boosting decision tree (GBDT), random forest, and support vector machine (SVM) for predicting the risk of in-hospital mortality were 0.963, 0.960, 0.963, and 0.959, respectively. Feature importance evaluation found that NT-proBNP, LDH, and HDL cholesterol were the three variables contributing most to the prediction performance of the GBDT and random forest models. Conclusions: The predictive models developed using the logistic regression, GBDT, random forest, and SVM algorithms can be used to predict the risk of in-hospital death of ACS patients.
Based on our findings, we recommend that clinicians focus on monitoring changes in NT-proBNP, LDH, and HDL cholesterol, as this may improve the clinical outcomes of ACS patients.
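The feature-importance ranking reported for the GBDT model can be sketched as below. The synthetic data, the class imbalance, and the variable names are illustrative stand-ins, not the hospital dataset:

```python
# Illustrative sketch: rank predictors by GBDT feature importance, as the
# abstract describes for NT-proBNP, LDH, and HDL cholesterol.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Hypothetical predictor names; the data are synthetic stand-ins.
names = ["age", "NT-proBNP", "LDH", "HDL", "troponin_I", "D-dimer"]
X, y = make_classification(n_samples=500, n_features=6, n_informative=4,
                           weights=[0.98, 0.02], random_state=7)

gbdt = GradientBoostingClassifier(random_state=7).fit(X, y)

# feature_importances_ is normalized to sum to 1; sort descending.
ranked = sorted(zip(names, gbdt.feature_importances_),
                key=lambda t: t[1], reverse=True)
```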


2021 ◽  
Author(s):  
Chris J. Kennedy ◽  
Dustin G. Mark ◽  
Jie Huang ◽  
Mark J. van der Laan ◽  
Alan E. Hubbard ◽  
...  

Background: Chest pain is the second leading reason for emergency department (ED) visits and is commonly identified as a leading driver of low-value health care. Accurate identification of patients at low risk of major adverse cardiac events (MACE) is important to improve resource allocation and reduce over-treatment. Objectives: We sought to assess machine learning (ML) methods and electronic health record (EHR) covariate collection for MACE prediction. We aimed to maximize the pool of low-risk patients that are accurately predicted to have less than 0.5% MACE risk and may be eligible for reduced testing. Population Studied: 116,764 adult patients presenting with chest pain in the ED and evaluated for potential acute coronary syndrome (ACS). 60-day MACE rate was 1.9%. Methods: We evaluated ML algorithms (lasso, splines, random forest, extreme gradient boosting, Bayesian additive regression trees) and SuperLearner stacked ensembling. We tuned ML hyperparameters through nested ensembling, and imputed missing values with generalized low-rank models (GLRM). We benchmarked performance to key biomarkers, validated clinical risk scores, decision trees, and logistic regression. We explained the models through variable importance ranking and accumulated local effect visualization. Results: The best discrimination (area under the precision-recall [PR-AUC] and receiver operating characteristic [ROC-AUC] curves) was provided by SuperLearner ensembling (0.148, 0.867), followed by random forest (0.146, 0.862). Logistic regression (0.120, 0.842) and decision trees (0.094, 0.805) exhibited worse discrimination, as did risk scores [HEART (0.064, 0.765), EDACS (0.046, 0.733)] and biomarkers [serum troponin level (0.064, 0.708), electrocardiography (0.047, 0.686)]. The ensemble's risk estimates were miscalibrated by 0.2 percentage points. The ensemble accurately identified 50% of patients to be below a 0.5% 60-day MACE risk threshold. 
The most important predictors were age, peak troponin, HEART score, EDACS score, and electrocardiogram. GLRM imputation achieved 90% reduction in root mean-squared error compared to median-mode imputation. Conclusion: Use of ML algorithms, combined with broad predictor sets, improved MACE risk prediction compared to simpler alternatives, while providing calibrated predictions and interpretability. Standard risk scores may neglect important health information available in other characteristics and combined in nuanced ways via ML.
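SuperLearner-style stacking can be approximated in scikit-learn with `StackingClassifier`, which fits base learners under internal cross-validation and combines their out-of-fold predictions with a meta-learner. This is an analogue of, not the authors', pipeline; the base learners and the synthetic imbalanced data are assumptions:

```python
# Illustrative sketch: stacked ensembling (SuperLearner-style) with
# cross-validated base-learner predictions and a logistic meta-learner.
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for an imbalanced outcome (~2% MACE-like event rate
# is approximated loosely here with a 10% minority class).
X, y = make_classification(n_samples=400, weights=[0.9, 0.1], random_state=2)

stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=2)),
                ("tree", DecisionTreeClassifier(random_state=2))],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,  # out-of-fold base predictions feed the meta-learner
)
stack.fit(X, y)
proba = stack.predict_proba(X)[:, 1]  # calibratable risk estimates
```

Patients whose predicted probability falls below a threshold (0.5% in the study) would then be flagged as candidates for reduced testing.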


2021 ◽  
Vol 8 ◽  
Author(s):  
Robert A. Reed ◽  
Andrei S. Morgan ◽  
Jennifer Zeitlin ◽  
Pierre-Henri Jarreau ◽  
Héloïse Torchin ◽  
...  

Introduction: Preterm babies are a vulnerable population that experiences significant short- and long-term morbidity. Rehospitalisations constitute an important, potentially modifiable adverse event in this population. Improving the ability of clinicians to identify those patients at the greatest risk of rehospitalisation has the potential to improve outcomes and reduce costs. Machine-learning algorithms can provide potentially advantageous methods of prediction compared to conventional approaches like logistic regression. Objective: To compare two machine-learning methods (least absolute shrinkage and selection operator (LASSO) and random forest) to expert-opinion-driven logistic regression modelling for predicting unplanned rehospitalisation within 30 days in a large French cohort of preterm babies. Design, Setting and Participants: This study used data derived exclusively from the population-based prospective cohort study of French preterm babies, EPIPAGE 2. Only those babies discharged home alive and whose parents completed the 1-year survey were eligible for inclusion in our study. All predictive models used a binary outcome denoting a baby's status for an unplanned rehospitalisation within 30 days of discharge. Predictors included those quantifying clinical, treatment, maternal and socio-demographic factors. The predictive abilities of models constructed using the LASSO and random forest algorithms were compared with a traditional logistic regression model. The logistic regression model comprised 10 predictors selected by expert clinicians, while the LASSO and random forest models included 75 predictors. Performance measures were derived using 10-fold cross-validation.
Performance was quantified using the area under the receiver operating characteristic curve (AUROC), sensitivity, specificity, Tjur's coefficient of determination and calibration measures. Results: The rate of 30-day unplanned rehospitalisation in the eligible population used to construct the models was 9.1% (95% CI 8.2–10.1) (350/3,841). The random forest model demonstrated both an improved AUROC (0.65; 95% CI 0.59–0.70; p = 0.03) and specificity vs. logistic regression (AUROC 0.57; 95% CI 0.51–0.62, p = 0.04). The LASSO performed similarly (AUROC 0.59; 95% CI 0.53–0.65; p = 0.68) to logistic regression. Conclusions: Compared to an expert-specified logistic regression model, random forest offered improved prediction of 30-day unplanned rehospitalisation in preterm babies. However, all models offered relatively low levels of predictive ability, regardless of modelling method.
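For a binary outcome, LASSO-style selection corresponds to L1-penalized logistic regression, which shrinks most coefficients to exactly zero and thereby selects predictors automatically, in contrast to the expert-chosen 10-predictor model. The sketch below (synthetic data, 75 candidate predictors as in the study, an arbitrary penalty strength `C`) illustrates this; it is not the authors' model:

```python
# Illustrative sketch: L1-penalized ("LASSO") logistic regression performing
# automatic predictor selection among 75 candidate variables.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in: 75 candidate predictors, few truly informative.
X, y = make_classification(n_samples=300, n_features=75, n_informative=8,
                           random_state=3)

# penalty="l1" with the liblinear solver gives sparse coefficients;
# C controls penalty strength (smaller C = stronger shrinkage).
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
lasso.fit(X, y)

# Predictors with non-zero coefficients are the "selected" ones.
n_selected = int(np.sum(lasso.coef_ != 0))
```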


2021 ◽  
Author(s):  
Chen Bai ◽  
Yu-Peng Chen ◽  
Adam Wolach ◽  
Lisa Anthony ◽  
Mamoun Mardini

BACKGROUND Frequent spontaneous facial self-touches, predominantly during outbreaks, have the theoretical potential to be a mechanism of contracting and transmitting diseases. Despite the recent advent of vaccines, behavioral approaches remain an integral part of reducing the spread of COVID-19 and other respiratory illnesses. Real-time biofeedback of face touching can potentially mitigate the spread of respiratory diseases. The gap addressed in this study is the lack of an on-demand platform that utilizes motion data from smartwatches to accurately detect face touching. OBJECTIVE The aim of this study was to utilize the functionality and spread of smartwatches to develop a smartwatch application that identifies motion signatures mapped accurately to face touching. METHODS Participants (n=10, 50% women, aged 20-83) performed 10 physical activities classified into two categories, face touching (FT) and non-face touching (NFT), in a standardized laboratory setting. We developed a smartwatch application on the Samsung Galaxy Watch to collect raw accelerometer data from participants. Data features were then extracted from consecutive non-overlapping windows varying from 2-16 seconds. We examined the performance of state-of-the-art machine learning methods on face-touching movement recognition (FT vs NFT) and individual activity recognition (IAR): logistic regression, support vector machine, decision trees and random forest. RESULTS Machine learning models were accurate in recognizing face-touching categories; logistic regression achieved the best performance across all metrics (Accuracy: 0.93 +/- 0.08, Recall: 0.89 +/- 0.16, Precision: 0.93 +/- 0.08, F1-score: 0.90 +/- 0.11, AUC: 0.95 +/- 0.07) at a window size of 5 seconds.
IAR models resulted in lower performance; the random forest classifier achieved the best performance across all metrics (Accuracy: 0.70 +/- 0.14, Recall: 0.70 +/- 0.14, Precision: 0.70 +/- 0.16, F1-score: 0.67 +/- 0.15) at a window size of 9 seconds. CONCLUSIONS Wearable devices, powered by machine learning, are effective in detecting facial touches. This is highly significant during respiratory infection outbreaks, as it has great potential to deter people from touching their faces and thereby mitigate transmission of COVID-19 and future respiratory diseases.
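The windowing step (consecutive non-overlapping windows with per-window features) can be sketched as follows. The 30 Hz sampling rate, 5-second window, and mean/standard-deviation features are illustrative assumptions, not the study's exact feature set:

```python
# Illustrative sketch: split a tri-axial accelerometer trace into
# non-overlapping windows and compute simple per-axis features.
import numpy as np

def window_features(acc, fs=30, window_s=5):
    """Return one feature row per non-overlapping window of `acc`,
    a (n_samples, 3) array: per-axis mean and std (6 features)."""
    win = fs * window_s
    n = acc.shape[0] // win  # drop any trailing partial window
    feats = []
    for i in range(n):
        w = acc[i * win:(i + 1) * win]
        feats.append(np.concatenate([w.mean(axis=0), w.std(axis=0)]))
    return np.array(feats)

rng = np.random.default_rng(0)
acc = rng.normal(size=(900, 3))   # 30 s of synthetic 30 Hz tri-axial data
F = window_features(acc)          # 6 windows x 6 features
```

The resulting feature matrix would then be fed to the classifiers (logistic regression, SVM, decision tree, random forest) for FT vs NFT recognition.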


Water ◽  
2020 ◽  
Vol 12 (10) ◽  
pp. 2927
Author(s):  
Jiyeong Hong ◽  
Seoro Lee ◽  
Joo Hyun Bae ◽  
Jimin Lee ◽  
Woon Ji Park ◽  
...  

Predicting dam inflow is necessary for effective water management. This study created machine learning algorithms to predict the amount of inflow into the Soyang River Dam in South Korea, using 40 years of weather and dam inflow data. A total of six algorithms were used, as follows: decision tree (DT), multilayer perceptron (MLP), random forest (RF), gradient boosting (GB), recurrent neural network–long short-term memory (RNN–LSTM), and convolutional neural network–LSTM (CNN–LSTM). Among these models, the multilayer perceptron showed the best results in predicting dam inflow, with a Nash–Sutcliffe efficiency (NSE) of 0.812, root mean squared error (RMSE) of 77.218 m3/s, mean absolute error (MAE) of 29.034 m3/s, correlation coefficient (R) of 0.924, and coefficient of determination (R2) of 0.817. However, when the dam inflow was below 100 m3/s, the ensemble models (random forest and gradient boosting) performed better than the MLP. Therefore, two combined machine learning (CombML) models (RF_MLP and GB_MLP) were developed for predicting dam inflow, using the ensemble methods (RF and GB) at precipitation below 16 mm and the MLP at precipitation above 16 mm; 16 mm is the average daily precipitation on days with inflow of 100 m3/s or more. Verification results were NSE 0.857, RMSE 68.417 m3/s, MAE 18.063 m3/s, R 0.927, and R2 0.859 for RF_MLP, and NSE 0.829, RMSE 73.918 m3/s, MAE 18.093 m3/s, R 0.912, and R2 0.831 for GB_MLP, indicating that the combined models predict dam inflow most accurately. The CombML results show that, by combining several machine learning algorithms and accounting for flow characteristics such as flow regime, it is possible to predict inflow more accurately.
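The CombML routing rule (ensemble model below the 16 mm precipitation threshold, MLP above it) can be sketched as below. The synthetic weather data and regressor settings are stand-ins for the Soyang River dataset, not the study's models:

```python
# Illustrative sketch: a combined model that routes each sample to a
# random forest (low precipitation) or an MLP (high precipitation).
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(4)
X = rng.normal(size=(300, 5))           # synthetic weather predictors
X[:, 0] = rng.uniform(0, 40, 300)       # column 0 ~ daily precipitation (mm)
y = 2 * X[:, 0] + rng.normal(size=300)  # synthetic inflow-like target

rf = RandomForestRegressor(random_state=4).fit(X, y)
mlp = MLPRegressor(max_iter=2000, random_state=4).fit(X, y)

def comb_predict(X, threshold_mm=16.0):
    """RF_MLP-style routing: RF below the precipitation threshold,
    MLP at or above it."""
    low = X[:, 0] < threshold_mm
    out = np.empty(len(X))
    out[low] = rf.predict(X[low])
    out[~low] = mlp.predict(X[~low])
    return out

pred = comb_predict(X)
```

A GB_MLP variant would simply swap the random forest for a gradient boosting regressor.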


mBio ◽  
2020 ◽  
Vol 11 (3) ◽  
Author(s):  
Begüm D. Topçuoğlu ◽  
Nicholas A. Lesniak ◽  
Mack T. Ruffin ◽  
Jenna Wiens ◽  
Patrick D. Schloss

ABSTRACT Machine learning (ML) modeling of the human microbiome has the potential to identify microbial biomarkers and aid in the diagnosis of many diseases, such as inflammatory bowel disease, diabetes, and colorectal cancer. Progress has been made toward developing ML models that predict health outcomes using bacterial abundances, but inconsistent adoption of training and evaluation methods calls the validity of these models into question. Furthermore, many researchers appear to favor increased model complexity over interpretability. To overcome these challenges, we trained seven models that used fecal 16S rRNA sequence data to predict the presence of colonic screen relevant neoplasias (SRNs) (n = 490 patients, 261 controls and 229 cases). We developed a reusable open-source pipeline to train, validate, and interpret ML models. To show the effect of model selection, we assessed the predictive performance, interpretability, and training time of L2-regularized logistic regression, L1- and L2-regularized support vector machines (SVM) with linear and radial basis function kernels, a decision tree, random forest, and gradient boosted trees (XGBoost). The random forest model performed best at detecting SRNs, with an area under the receiver operating characteristic curve (AUROC) of 0.695 (interquartile range [IQR], 0.651 to 0.739), but was slow to train (83.2 h) and not inherently interpretable. Despite its simplicity, L2-regularized logistic regression followed random forest in predictive performance with an AUROC of 0.680 (IQR, 0.625 to 0.735), trained faster (12 min), and was inherently interpretable. Our analysis highlights the importance of choosing an ML approach based on the goal of the study, as the choice will inform expectations of performance and interpretability. IMPORTANCE Diagnosing diseases using machine learning (ML) is rapidly being adopted in microbiome studies.
However, the estimated performance associated with these models is likely overoptimistic. Moreover, there is a trend toward using black-box models without discussing the difficulty of interpreting such models when trying to identify microbial biomarkers of disease. This work represents a step toward developing more reproducible ML practices in applying ML to microbiome research. We implement a rigorous pipeline and emphasize the importance of selecting ML models that reflect the goal of the study. These concepts are not particular to the study of human health but can also be applied to environmental microbiology studies.
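An inherently interpretable baseline like the L2-regularized logistic regression described above can be sketched as follows. The synthetic abundance-like features and cross-validation settings are assumptions, not the 16S dataset or the authors' published pipeline:

```python
# Illustrative sketch: cross-validated AUROC of L2-regularized logistic
# regression, plus coefficient inspection as a stand-in for biomarker ranking.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for 16S abundance features (490 samples as in the study).
X, y = make_classification(n_samples=490, n_features=50, n_informative=6,
                           random_state=5)

clf = LogisticRegression(penalty="l2", C=1.0, max_iter=2000)
auroc = cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean()

# The fitted coefficients are directly inspectable: features with the
# largest absolute weights are the model's candidate "biomarkers".
clf.fit(X, y)
top = np.argsort(np.abs(clf.coef_[0]))[::-1][:5]
```

This direct coefficient readout is what makes the linear model "inherently interpretable" relative to a random forest, whose importances require post hoc explanation methods.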


Author(s):  
M. Carr ◽  
V. Ravi ◽  
G. Sridharan Reddy ◽  
D. Veranna

This paper profiles mobile banking users using machine learning techniques, viz. Decision Tree, Logistic Regression, Multilayer Perceptron, and SVM, to test a research model with fourteen independent variables and one dependent variable (adoption). A survey was conducted and the results were analysed using these techniques. Using Decision Trees, the profile of the mobile banking adopter was identified. Comparing the different machine learning techniques, it was found that Decision Trees outperformed Logistic Regression, Multilayer Perceptron, and SVM. Of all the techniques, the Decision Tree is recommended for profiling studies because, apart from obtaining highly accurate results, it also yields ‘if–then’ classification rules. The classification rules provided here can be used to target potential customers for mobile banking adoption by offering them appropriate incentives.
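The ‘if–then’ rules that make decision trees attractive for profiling can be extracted directly, for example with scikit-learn's `export_text`. The synthetic survey data and the four illustrative predictor names below are assumptions, not the paper's fourteen-variable model:

```python
# Illustrative sketch: fit a shallow decision tree and print its
# human-readable if-then classification rules.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in for survey responses; hypothetical predictor names.
X, y = make_classification(n_samples=200, n_features=4, random_state=6)
feature_names = ["age", "income", "trust", "perceived_usefulness"]

tree = DecisionTreeClassifier(max_depth=3, random_state=6).fit(X, y)

# export_text renders the tree as nested if-then rules over the predictors.
rules = export_text(tree, feature_names=feature_names)
print(rules)
```

Each leaf of the printed tree is one classification rule, which is the kind of output a profiling study can hand to marketers directly.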


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Chong Hyun Suh ◽  
Kyung Hwa Lee ◽  
Young Jun Choi ◽  
Sae Rom Chung ◽  
Jung Hwan Baek ◽  
...  

Abstract We investigated the ability of machine-learning classifiers trained on radiomics from pre-treatment multiparametric magnetic resonance imaging (MRI) to accurately predict human papillomavirus (HPV) status in patients with oropharyngeal squamous cell carcinoma (OPSCC). This retrospective study collected data of 60 patients (48 HPV-positive and 12 HPV-negative) with newly diagnosed, histopathologically proven OPSCC who underwent head and neck MRI consisting of axial T1WI, T2WI, CE-T1WI, and apparent diffusion coefficient (ADC) maps from diffusion-weighted imaging (DWI). The median age was 59 years (range, 35 to 85 years), and 83.3% of patients were male. The imaging data were randomised into a training set (32 HPV-positive and 8 HPV-negative OPSCC) and a test set (16 HPV-positive and 4 HPV-negative OPSCC) in each fold. A total of 1618 quantitative features were extracted from manually delineated regions of interest of the primary tumour and one definite lymph node in each sequence. After feature selection using the least absolute shrinkage and selection operator (LASSO), three different machine-learning classifiers (logistic regression, random forest, and XGBoost) were trained and compared across various combinations of the four sequences. The highest diagnostic accuracies were achieved when using all sequences, and the difference was significant only when the combination did not include the ADC map. Using all sequences, logistic regression and the random forest classifier yielded higher accuracy than the XGBoost classifier, with mean area under the curve (AUC) values of 0.77, 0.76, and 0.71, respectively. A machine-learning classifier built on a non-invasive, quantitative radiomics signature could guide the classification of HPV status.

