Development of glaucoma predictive model and risk factors assessment based on supervised models

2021 ◽  
Vol 14 (1) ◽  
Author(s):  
Mahyar Sharifi ◽  
Toktam Khatibi ◽  
Mohammad Hassan Emamian ◽  
Somayeh Sadat ◽  
Hassan Hashemi ◽  
...  

Abstract Objectives To develop and propose a machine learning model for predicting glaucoma and identifying its risk factors. Method The data analysis pipeline designed for this study follows the Cross-Industry Standard Process for Data Mining (CRISP-DM) methodology. The main steps of the pipeline are data sampling, preprocessing, classification, and evaluation and validation. Data sampling to provide the training dataset was performed with balanced sampling based on over-sampling and under-sampling methods. Data preprocessing consisted of missing-value imputation and normalization. For the classification step, several machine learning models were designed for predicting glaucoma, including Decision Trees (DTs), K-Nearest Neighbors (K-NN), Support Vector Machines (SVM), Random Forests (RFs), Extra Trees (ETs), and Bagging ensemble methods. Moreover, in the classification step, a novel stacking ensemble model built from the superior classifiers is designed and proposed. Results The data came from the Shahroud Eye Cohort Study and included demographic and ophthalmologic data for 5190 participants aged 40-64 living in Shahroud, northeast Iran. The dataset comprised 67 demographic, ophthalmologic, optometric, perimetry, and biometry features for 4561 people, including 4474 non-glaucoma participants and 87 glaucoma patients. Experimental results show that DTs and RFs trained on the under-sampled training dataset outperform the compared single classifiers and bagging ensemble methods for predicting glaucoma, with average accuracies of 87.61 and 88.87, sensitivities of 73.80 and 72.35, specificities of 87.88 and 89.10, and areas under the curve (AUC) of 91.04 and 94.53, respectively. The proposed stacking ensemble has an average accuracy of 83.56, a sensitivity of 82.21, a specificity of 81.32, and an AUC of 88.54. Conclusions In this study, a machine learning model is proposed and developed to predict glaucoma among persons aged 40-64. The top predictors for discriminating glaucoma patients from non-glaucoma participants were the number of visual field defects on perimetry, vertical cup-to-disk ratio, white-to-white diameter, systolic blood pressure, pupil barycenter on the Y coordinate, age, and axial length.
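A minimal sketch of the classification step described above, assuming scikit-learn and imbalanced-learn: balanced under-sampling followed by a stacking ensemble of the two superior base classifiers (DT and RF). The parameter values and the logistic-regression meta-learner are illustrative assumptions, not details from the study.

```python
# Illustrative sketch (not the authors' code): under-sampling plus a stacking
# ensemble of the two superior classifiers (DT, RF) reported in the abstract.
from imblearn.under_sampling import RandomUnderSampler
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

def train_glaucoma_stack(X, y):
    """X: 67 demographic/ophthalmic features; y: 1 = glaucoma, 0 = non-glaucoma."""
    # Balance the highly skewed classes (87 glaucoma vs. 4474 non-glaucoma).
    X_bal, y_bal = RandomUnderSampler(random_state=0).fit_resample(X, y)

    # Stack the two best base learners; the logistic-regression meta-learner
    # is an assumption for illustration.
    stack = StackingClassifier(
        estimators=[
            ("dt", DecisionTreeClassifier(max_depth=5, random_state=0)),
            ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
        ],
        final_estimator=LogisticRegression(max_iter=1000),
        cv=5,
    )
    scores = cross_validate(stack, X_bal, y_bal, cv=5,
                            scoring=["accuracy", "recall", "roc_auc"])
    return stack.fit(X_bal, y_bal), scores
```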

Author(s):  
Dhilsath Fathima.M ◽  
S. Justin Samuel ◽  
R. Hari Haran

Aim: This work develops an improved and robust machine learning model for predicting Myocardial Infarction (MI), which could have substantial clinical impact. Objectives: This paper explains how to build a machine learning based computer-aided analysis system for early and accurate prediction of Myocardial Infarction (MI), using the Framingham Heart Study dataset for validation and evaluation. This computer-aided analysis model will support medical professionals in predicting myocardial infarction proficiently. Methods: The proposed model uses mean imputation to remove missing values from the dataset, then applies principal component analysis (PCA) to extract the optimal features and enhance classifier performance. After PCA, the reduced features are partitioned into a training dataset and a testing dataset; 70% of the data are given as input to four well-known classifiers (support vector machine, k-nearest neighbor, logistic regression, and decision tree) to train them, and the remaining 30% are used to evaluate the output of the machine learning model using performance metrics such as the confusion matrix, classifier accuracy, precision, sensitivity, F1-score, and the AUC-ROC curve. Results: The outputs of the classifiers were evaluated using these performance measures. We observed that logistic regression provides higher accuracy than the k-NN, SVM, and decision tree classifiers, and that PCA performs well as a feature extraction method for enhancing model performance. From these analyses, we conclude that logistic regression has the best mean accuracy and standard deviation of accuracy compared with the other three algorithms. Analysis of the AUC-ROC curves of the classifiers (Figures 4 and 5) shows that logistic regression exhibits a good AUC-ROC score, around 70%, compared to the k-NN and decision tree algorithms. Conclusion: From the result analysis, we infer that the proposed machine learning model can act as an optimal decision-making system to predict acute myocardial infarction at an earlier stage than existing machine learning based prediction models, and that it is capable of predicting the presence of acute myocardial infarction from heart disease risk factors, helping to decide when to start lifestyle modification and medical treatment to prevent heart disease.
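The pipeline described in the Methods maps naturally onto a scikit-learn workflow. The sketch below is an assumed reconstruction (the scaling step, the number of principal components, and the metric handling are not specified in the abstract):

```python
# Minimal sketch of the described pipeline: mean imputation -> PCA ->
# 70/30 split -> four classifiers compared on held-out data.
from sklearn.impute import SimpleImputer
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report, roc_auc_score

def compare_mi_classifiers(X, y, n_components=10):
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.30, stratify=y, random_state=42)
    models = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "svm": SVC(probability=True),
        "knn": KNeighborsClassifier(n_neighbors=5),
        "decision_tree": DecisionTreeClassifier(random_state=42),
    }
    results = {}
    for name, clf in models.items():
        pipe = Pipeline([
            ("impute", SimpleImputer(strategy="mean")),  # mean imputation
            ("scale", StandardScaler()),                 # assumed scaling before PCA
            ("pca", PCA(n_components=n_components)),     # feature extraction
            ("clf", clf),
        ])
        pipe.fit(X_train, y_train)
        proba = pipe.predict_proba(X_test)[:, 1]
        results[name] = {
            "report": classification_report(y_test, pipe.predict(X_test)),
            "auc": roc_auc_score(y_test, proba),
        }
    return results
```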


Electronics ◽  
2019 ◽  
Vol 8 (6) ◽  
pp. 607 ◽  
Author(s):  
Ihab Ahmed Najm ◽  
Alaa Khalaf Hamoud ◽  
Jaime Lloret ◽  
Ignacio Bosch

The 5G network is a next-generation wireless form of communication and the latest mobile technology. In practice, 5G utilizes the Internet of Things (IoT) to work in high-traffic networks with multiple nodes/sensors that attempt to transmit their packets to a destination simultaneously, which is a characteristic of IoT applications. Because of this, 5G offers vast bandwidth, low delay, and extremely high data transfer speed. Thus, 5G presents opportunities and motivations for utilizing next-generation protocols, especially the Stream Control Transmission Protocol (SCTP). However, the congestion control mechanisms of conventional SCTP negatively influence overall performance, and existing mechanisms contribute to reduced 5G and IoT performance. Thus, a new machine learning model based on a decision tree (DT) algorithm is proposed in this study to predict the optimal enhancement of congestion control in the wireless sensors of 5G IoT networks. The model was implemented on a training dataset to determine the optimal parametric setting in a 5G environment. The dataset was used to train the machine learning model and enable the prediction of optimal alternatives that can enhance the performance of the congestion control approach. The DT approach can also be used for other functions, especially prediction and classification, and DT algorithms produce graphs that any user can follow to understand the prediction approach. The C4.5 DT provided promising results, with more than 92% precision and recall.
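As a rough illustration of the approach, the sketch below trains an entropy-based decision tree (scikit-learn's CART, standing in for C4.5, which scikit-learn does not implement) on hypothetical simulation records of SCTP parameter settings; the feature names are assumptions, not the paper's dataset schema.

```python
# Hedged sketch: an entropy-criterion decision tree as a C4.5 stand-in,
# predicting whether a parameter setting improves congestion-control performance.
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier, export_text

def fit_congestion_dt(runs: pd.DataFrame):
    """runs: one row per simulation, with a binary 'improved' label (assumed)."""
    features = ["initial_cwnd", "rto_min_ms", "sack_delay_ms",
                "node_count", "packet_loss_rate"]        # assumed columns
    X, y = runs[features], runs["improved"]
    tree = DecisionTreeClassifier(criterion="entropy", max_depth=6, random_state=1)
    print("precision:", cross_val_score(tree, X, y, cv=10, scoring="precision").mean())
    print("recall:   ", cross_val_score(tree, X, y, cv=10, scoring="recall").mean())
    tree.fit(X, y)
    # Like C4.5, the fitted tree can be rendered as human-readable rules.
    print(export_text(tree, feature_names=features))
    return tree
```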


2020 ◽  
Author(s):  
Jihane Elyahyioui ◽  
Valentijn Pauwels ◽  
Edoardo Daly ◽  
Francois Petitjean ◽  
Mahesh Prakash

Flooding is one of the most common and costly natural hazards at the global scale. Flood models are important in supporting flood management, but flood modelling is a computationally expensive process due to the high nonlinearity of the equations involved and the complexity of the surface topography. New modelling approaches based on deep learning algorithms have recently emerged for multiple applications.

This study aims to investigate the capacity of machine learning to achieve spatio-temporal flood modelling. The combination of spatial and temporal input data to obtain dynamic results of water levels and flows from a machine learning model on multiple domains, for applications in flood risk assessments, has not been achieved yet. Here, we develop increasingly complex architectures aimed at interpreting the raw input data of precipitation and terrain to generate essential spatio-temporal variables (water level and velocity fields) and derived products (flood maps), training them on hydrodynamic simulations.

An extensive training dataset is generated by solving the 2D shallow water equations on simplified topographies using Lisflood-FP.

As a first task, the machine learning model is trained to reproduce the maximum water depth, using the precipitation time series and the topographic grid as inputs. The models combine the spatial and temporal information through a combination of 1D and 2D convolutional layers, pooling, merging, and upscaling. Multiple variations of this generic architecture are trained to determine the best one(s). Overall, the trained models return good results on performance indices (mean squared error, mean absolute error, and classification accuracy) but fail to predict the maximum water depths with sufficient precision for practical applications.

A major limitation of this approach is the availability of training examples. As a second task, models will be trained to bring the state of the system (spatially distributed water depth and velocity) from one time step to the next, based on the same inputs as before, generating the full solution equivalent to that of a hydrodynamic solver. The training database becomes much larger, as each pair of consecutive time steps constitutes one training example.

Assuming that a reliable model can be built and trained, such a methodology could be applied to build models that are faster and less computationally demanding than hydrodynamic models. Indeed, for the synthetic cases shown here, the simulation times of the machine learning models (under a second) are far shorter than those of the hydrodynamic model (a few minutes at least). These data-driven models could be used for interpolation and forecasting. The potential for extrapolation beyond the range of training datasets (different topography and high-intensity precipitation events) will also be investigated.
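A hedged Keras sketch of the generic architecture described above (1D convolutions over the precipitation series, 2D convolutions over the topographic grid, merging, and upscaling to a maximum-water-depth map); the grid size, layer widths, and training settings are assumptions, not the study's configuration:

```python
# Illustrative architecture sketch: temporal branch (Conv1D over rainfall),
# spatial branch (Conv2D over the DEM), merge, then upscale to a depth map.
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_flood_model(timesteps=72, grid=64):
    rain_in = layers.Input(shape=(timesteps, 1), name="precipitation")
    dem_in = layers.Input(shape=(grid, grid, 1), name="topography")

    # Temporal branch: summarise the hyetograph into a fixed-length embedding.
    r = layers.Conv1D(16, 3, activation="relu", padding="same")(rain_in)
    r = layers.GlobalAveragePooling1D()(r)
    r = layers.Dense(32, activation="relu")(r)
    r = layers.Reshape((1, 1, 32))(r)
    r = layers.UpSampling2D(size=(grid // 4, grid // 4))(r)  # broadcast over the grid

    # Spatial branch: encode the terrain at a coarser resolution.
    d = layers.Conv2D(16, 3, activation="relu", padding="same")(dem_in)
    d = layers.MaxPooling2D(2)(d)
    d = layers.Conv2D(32, 3, activation="relu", padding="same")(d)
    d = layers.MaxPooling2D(2)(d)                            # (grid/4, grid/4, 32)

    # Merge and upscale back to the full grid to predict maximum water depth.
    x = layers.Concatenate()([d, r])
    x = layers.Conv2D(32, 3, activation="relu", padding="same")(x)
    x = layers.UpSampling2D(2)(x)
    x = layers.Conv2D(16, 3, activation="relu", padding="same")(x)
    x = layers.UpSampling2D(2)(x)
    depth_out = layers.Conv2D(1, 1, activation="relu", name="max_depth")(x)

    model = Model([rain_in, dem_in], depth_out)
    model.compile(optimizer="adam", loss="mse", metrics=["mae"])
    return model
```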


Author(s):  
Yuhong Huang ◽  
Wenben Chen ◽  
Xiaoling Zhang ◽  
Shaofu He ◽  
Nan Shao ◽  
...  

Aim: After neoadjuvant chemotherapy (NACT), the tumor shrinkage pattern is a more reasonable outcome for deciding on possible breast-conserving surgery (BCS) than pathological complete response (pCR). The aim of this article was to establish a machine learning model, combining radiomics features from multiparametric MRI (mpMRI) and clinicopathologic characteristics, for early prediction of tumor shrinkage pattern prior to NACT in breast cancer. Materials and Methods: This study included 199 patients with breast cancer who successfully completed NACT and underwent subsequent breast surgery. For each patient, 4,198 radiomics features were extracted from the segmented 3D regions of interest (ROI) in mpMRI sequences, including T1-weighted dynamic contrast-enhanced imaging (T1-DCE), fat-suppressed T2-weighted imaging (T2WI), and the apparent diffusion coefficient (ADC) map. Feature selection and supervised machine learning algorithms were used to identify the predictors correlated with tumor shrinkage pattern as follows: (1) reducing the feature dimension using ANOVA and the least absolute shrinkage and selection operator (LASSO) with 10-fold cross-validation, (2) splitting the dataset into a training dataset and a testing dataset, and constructing prediction models using 12 classification algorithms, and (3) assessing model performance through area under the curve (AUC), accuracy, sensitivity, and specificity. We also compared the most discriminative model across different molecular subtypes of breast cancer. Results: The Multilayer Perceptron (MLP) neural network achieved higher AUC and accuracy than the other classifiers. The radiomics model achieved a mean AUC of 0.975 (accuracy = 0.912) on the training dataset and 0.900 (accuracy = 0.828) on the testing dataset with 30-round 6-fold cross-validation. When incorporating clinicopathologic characteristics, the mean AUC was 0.985 (accuracy = 0.930) on the training dataset and 0.939 (accuracy = 0.870) on the testing dataset. The model further achieved good AUC on the testing dataset with 30-round 5-fold cross-validation in three molecular subtypes of breast cancer, as follows: (1) HR+/HER2–: 0.901 (accuracy = 0.816), (2) HER2+: 0.940 (accuracy = 0.865), and (3) TN: 0.837 (accuracy = 0.811). Conclusions: Our machine learning model combining radiomics features and clinical characteristics could provide a potential tool to predict tumor shrinkage patterns prior to NACT. The prediction model will be valuable in guiding NACT and surgical treatment in breast cancer.
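The feature-selection and classification steps could look roughly like the following scikit-learn sketch (the intermediate ANOVA dimension, the MLP architecture, and a binary 0/1 shrinkage-pattern label are assumptions, not details from the study):

```python
# Hedged sketch: ANOVA screening, LASSO with 10-fold CV, then an MLP classifier
# evaluated with repeated 6-fold cross-validation.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LassoCV
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

def select_and_classify(X, y):
    """X: 4,198 radiomics features per patient; y: binary shrinkage-pattern label."""
    # Step 1: ANOVA screening to an assumed intermediate dimension.
    X_anova = SelectKBest(f_classif, k=200).fit_transform(X, y)

    # Step 2: LASSO with 10-fold CV; keep features with nonzero coefficients.
    lasso = LassoCV(cv=10, random_state=0).fit(
        StandardScaler().fit_transform(X_anova), y)
    keep = np.flatnonzero(lasso.coef_)
    X_sel = X_anova[:, keep]

    # Step 3: MLP classifier (the best of the 12 algorithms reported),
    # scored with 30-round 6-fold cross-validation.
    clf = Pipeline([("scale", StandardScaler()),
                    ("mlp", MLPClassifier(hidden_layer_sizes=(64, 32),
                                          max_iter=2000, random_state=0))])
    cv = RepeatedStratifiedKFold(n_splits=6, n_repeats=30, random_state=0)
    auc = cross_val_score(clf, X_sel, y, cv=cv, scoring="roc_auc")
    return keep, auc.mean()
```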


Buildings ◽  
2021 ◽  
Vol 11 (4) ◽  
pp. 172
Author(s):  
Debalina Banerjee Chattapadhyay ◽  
Jagadeesh Putta ◽  
Rama Mohan Rao P

Risk identification and management are the two most important parts of construction project management. Better risk management can help in determining future consequences, but identifying possible risk factors has a direct and indirect impact on the risk management process. In this paper, a risk prediction system based on a cross analytical-machine learning model was developed for construction megaprojects. A total of 63 risk factors pertaining to the cost, time, quality, and scope of megaprojects were considered, and primary data were collected from industry experts on a five-point Likert scale. The obtained sample was further processed statistically to generate a significantly large set of features for K-means clustering based on high-risk factor and allied sub-risk component identification. Descriptive analysis, followed by the synthetic minority over-sampling technique (SMOTE) and the Wilcoxon rank-sum test, was performed to retain the most significant features pertaining to cost, time, quality, and scope. Finally, unlike classical K-means clustering, a genetic-algorithm-based K-means clustering algorithm (GA–K-means) was applied with dual objective functions to segment high-risk factors and allied sub-risk components. The proposed model identified the high-risk factors and sub-risk factors that can cumulatively impact overall performance. Identifying these high-risk factors and corresponding sub-risk components can therefore help stakeholders achieve project success.
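A simplified sketch of the statistical workflow, with plain K-means standing in for the paper's GA–K-means variant; the column names, significance threshold, and number of clusters are assumptions:

```python
# Simplified sketch: SMOTE oversampling, Wilcoxon rank-sum screening,
# then clustering of the retained risk factors.
import pandas as pd
from scipy.stats import ranksums
from imblearn.over_sampling import SMOTE
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

def cluster_risk_factors(df: pd.DataFrame, label_col: str = "high_risk", k: int = 4):
    X, y = df.drop(columns=[label_col]), df[label_col]
    X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)

    # Retain features whose distributions differ significantly between classes.
    significant = [c for c in X_res.columns
                   if ranksums(X_res.loc[y_res == 1, c],
                               X_res.loc[y_res == 0, c]).pvalue < 0.05]

    # Cluster the Likert-scale risk factors into k groups: standardize each
    # factor, then transpose so each risk factor (not each response) is a
    # clustering object. Plain K-means stands in for GA-K-means here.
    factor_matrix = StandardScaler().fit_transform(X_res[significant]).T
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(factor_matrix)
    return dict(zip(significant, labels))
```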


2020 ◽  
Author(s):  
Carlo M. Bertoncelli ◽  
Paola Altamura ◽  
Domenico Bertoncelli ◽  
Virginie Rampal ◽  
Edgar Ramos Vieira ◽  
...  

Abstract Neuromuscular hip dysplasia (NHD) is a common and severe problem in patients with cerebral palsy (CP). Previous studies have so far identified only spasticity (SP) and high Gross Motor Function Classification System levels as factors associated with NHD. The aim of this study was to develop a machine learning model to identify additional risk factors for NHD. This was a cross-sectional multicenter descriptive study of 102 teenagers with CP (60 males, 42 females; 60 inpatients, 42 outpatients; mean age 16.5 ± 1.2 years, range 12–18 years). Data on etiology, diagnosis, SP, epilepsy (E), clinical history, and functional assessments were collected between 2007 and 2017. Hip dysplasia was defined as a femoral head lateral migration percentage > 33% on the pelvic radiogram. A logistic regression prediction model named PredictMed was developed to identify risk factors for NHD. Twenty-eight (27%) teenagers with CP had NHD, of whom 18 (67%) had dislocated hips. The logistic regression model identified poor walking abilities (p < 0.001; odds ratio [OR] infinity; 95% confidence interval [CI] infinity), scoliosis (p = 0.01; OR 3.22; 95% CI 1.30–7.92), trunk muscles' tone disorder (p = 0.002; OR 4.81; 95% CI 1.75–13.25), SP (p = 0.006; OR 6.6; 95% CI 1.46–30.23), poor motor function (p = 0.02; OR 5.5; 95% CI 1.2–25.2), and E (p = 0.03; OR 2.6; standard error 0.44) as risk factors for NHD. The accuracy of the model was 77%. PredictMed identified trunk muscles' tone disorder, severe scoliosis, E, and SP as risk factors for NHD in teenagers with CP.
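A minimal statsmodels sketch of a PredictMed-style logistic regression risk model, reporting odds ratios and confidence intervals per candidate risk factor; the variable names are assumed, and this is not the PredictMed source code:

```python
# Illustrative logistic-regression risk model with odds ratios and 95% CIs.
import numpy as np
import pandas as pd
import statsmodels.api as sm

def fit_nhd_model(df: pd.DataFrame):
    predictors = ["poor_walking", "scoliosis", "trunk_tone_disorder",
                  "spasticity", "poor_motor_function", "epilepsy"]  # assumed columns
    X = sm.add_constant(df[predictors])
    model = sm.Logit(df["nhd"], X).fit()      # nhd: lateral migration % > 33%

    # Odds ratios with 95% confidence intervals, as in the abstract's reporting.
    summary = pd.DataFrame({
        "odds_ratio": np.exp(model.params),
        "ci_low": np.exp(model.conf_int()[0]),
        "ci_high": np.exp(model.conf_int()[1]),
        "p_value": model.pvalues,
    })
    accuracy = (model.predict(X).round() == df["nhd"]).mean()
    return summary, accuracy
```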


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Sarah Quiñones ◽  
Aditya Goyal ◽  
Zia U. Ahmed

Abstract Type 2 diabetes mellitus (T2D) prevalence in the United States varies substantially across spatial and temporal scales, attributable to variations in socioeconomic and lifestyle risk factors. Understanding how these risk factors contribute to variations in T2D would greatly benefit intervention and treatment approaches to reduce or prevent T2D. Geographically-weighted random forest (GW-RF), a tree-based non-parametric machine learning model, may help explore and visualize the relationships between T2D and risk factors at the county level. GW-RF outputs are compared to global (RF and OLS) and local (GW-OLS) models for the years 2013–2017, using low education, poverty, obesity, physical inactivity, access to exercise, and food environment as inputs. Our results indicate that the non-parametric GW-RF model, given these six major risk factors as inputs, shows high potential for explaining the spatial heterogeneity of, and predicting, T2D prevalence compared with traditional local and global models. Some of these predictions, however, are marginal. These findings of spatial heterogeneity using GW-RF demonstrate the need to consider local factors in prevention approaches. Spatial analysis of T2D and associated risk factor prevalence offers useful information for targeting geographic areas for prevention and disease interventions.
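Conceptually, a geographically weighted random forest fits a local forest in each county's spatial neighbourhood so that predictions and feature importances can vary over space. The sketch below illustrates that idea with scikit-learn building blocks rather than any specific GW-RF implementation; the coordinates, neighbourhood size, and column names are assumptions:

```python
# Conceptual sketch of a geographically weighted random forest: one global RF
# benchmark plus a local RF per county fitted on its spatial neighbours.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import NearestNeighbors

FEATURES = ["low_education", "poverty", "obesity", "physical_inactivity",
            "exercise_access", "food_environment"]       # assumed column names

def gw_random_forest(counties, n_neighbors=50):
    """counties: DataFrame with 'lon', 'lat', the FEATURES columns, and a
    't2d_prevalence' column (all names are assumptions for illustration)."""
    coords = counties[["lon", "lat"]].to_numpy()
    X = counties[FEATURES].to_numpy()
    y = counties["t2d_prevalence"].to_numpy()

    # Global benchmark: a single forest for all counties.
    global_rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

    # Local models: one forest per county, so predictions and variable
    # importances can vary over space.
    nn = NearestNeighbors(n_neighbors=n_neighbors).fit(coords)
    local_pred = np.empty(len(y))
    local_importance = np.empty((len(y), len(FEATURES)))
    for i, idx in enumerate(nn.kneighbors(coords, return_distance=False)):
        neighbours = idx[idx != i]                       # leave the county itself out
        rf = RandomForestRegressor(n_estimators=100, random_state=0)
        rf.fit(X[neighbours], y[neighbours])
        local_pred[i] = rf.predict(X[i:i + 1])[0]
        local_importance[i] = rf.feature_importances_
    return global_rf, local_pred, local_importance
```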


Author(s):  
Lydia T Tam ◽  
Kristen W Yeom ◽  
Jason N Wright ◽  
Alok Jaju ◽  
Alireza Radmanesh ◽  
...  

Abstract Background Diffuse intrinsic pontine gliomas (DIPGs) are lethal pediatric brain tumors. Presently, MRI is the mainstay of disease diagnosis and surveillance. We identify clinically significant computational features from MRI and create a prognostic machine learning model. Methods We isolated tumor volumes of T1 post-contrast (T1) and T2-weighted (T2) MRIs from 177 treatment-naïve DIPG patients from an international cohort for model training and testing. The Quantitative Image Feature Pipeline and PyRadiomics were used for feature extraction. Ten-fold cross-validation of LASSO Cox regression selected optimal features to predict overall survival (OS) in the training dataset; the resulting model was then tested in the independent testing dataset. We analyzed model performance using clinical variables only (age at diagnosis and sex), radiomics only, and radiomics plus clinical variables. Results All selected features were intensity- and texture-based features on the wavelet-filtered images (three T1 grey-level co-occurrence matrix (GLCM) texture features, one T2 GLCM texture feature, and the T2 first-order mean). This multivariable Cox model demonstrated a concordance of 0.68 [95% CI: 0.61-0.74] in the training dataset, significantly outperforming the clinical-only model (C=0.57 [95% CI: 0.49-0.64]). Adding clinical features to radiomics slightly improved performance (C=0.70 [95% CI: 0.64-0.77]). The combined radiomics and clinical model was validated in the independent testing dataset (C=0.59 [95% CI: 0.51-0.67], Noether's test p=0.02). Conclusion In this international study, we demonstrate the use of radiomic signatures to create a machine learning model for DIPG prognostication. Standardized, quantitative approaches that objectively measure DIPG changes, including computational MRI evaluation, could offer new ways of assessing tumor phenotype and serve a future role in optimizing clinical trial eligibility and tumor surveillance.
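A hedged sketch of the survival-modelling step using lifelines (an L1-penalised Cox model evaluated by concordance index); the penalty strength and column names are assumptions, and this is not the study's code:

```python
# Illustrative LASSO-type Cox regression on radiomics features with
# 10-fold cross-validated concordance index.
import pandas as pd
from lifelines import CoxPHFitter
from lifelines.utils import k_fold_cross_validation

def fit_radiomics_cox(df: pd.DataFrame):
    """df: one row per patient with candidate radiomics/clinical features plus
    'os_months' and 'event' columns (column names are assumed)."""
    cph = CoxPHFitter(penalizer=0.1, l1_ratio=1.0)       # LASSO-type penalty
    c_indices = k_fold_cross_validation(
        cph, df, duration_col="os_months", event_col="event",
        k=10, scoring_method="concordance_index")

    cph.fit(df, duration_col="os_months", event_col="event")
    selected = cph.params_[cph.params_.abs() > 1e-6]     # features kept by the penalty
    return cph, selected, sum(c_indices) / len(c_indices)
```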


2020 ◽  
Author(s):  
Carmelo Velardo ◽  
David Clifton ◽  
Steven Hamblin ◽  
Rabia Khan ◽  
Lionel Tarassenko ◽  
...  

BACKGROUND Successful management of gestational diabetes mellitus (GDM) reduces the risk of morbidity in women and newborns. A woman's blood glucose (BG) readings and risk factors are used by clinical staff to make decisions regarding the initiation of pharmacological treatment in women with GDM. Mobile health (mHealth) solutions allow real-time follow-up of women with GDM and enable timely treatment and management. Machine learning offers the opportunity to quickly analyse large quantities of data to automatically flag women at risk of requiring pharmacological treatment. OBJECTIVE We sought to assess whether data collected through an mHealth system can be analysed to automatically evaluate the switch from diet-based management of GDM to pharmacological treatment. METHODS We collected data from 3,029 patients to design a machine learning model that can identify when a woman with GDM needs to switch to medications (insulin or metformin) by analysing her BG readings and other risk factors. RESULTS Through the analysis of 411,785 BG readings, we designed a machine learning model that can predict the timing of initiation of pharmacological treatment. After one hundred experimental repetitions, we obtained an average AUC of 0.80 and an algorithm that allows the flexibility of setting the operating point, rather than relying on the static heuristic method currently used in clinical practice. CONCLUSIONS Using real-time data collected via an mHealth system may further improve the timeliness of intervention and potentially improve patient care. Further real-time clinical testing will enable validation of our algorithm using real-world data.
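As an illustration of the kind of model described (not the study's algorithm), the sketch below trains a gradient-boosting classifier on summary features of a woman's BG readings and risk factors, then selects the operating point from the ROC curve rather than using a fixed heuristic threshold; the feature names are assumptions:

```python
# Illustrative classifier with a tunable operating point chosen for a target
# sensitivity on held-out data.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, roc_curve

def train_gdm_escalation_model(X, y, target_sensitivity=0.90):
    """X: per-patient features (e.g. mean fasting BG, share of readings above
    target, BMI, parity -- assumed); y: 1 if pharmacological treatment started."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                              stratify=y, random_state=0)
    clf = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
    proba = clf.predict_proba(X_te)[:, 1]
    auc = roc_auc_score(y_te, proba)

    # Pick the operating point: the first threshold reaching the target sensitivity.
    fpr, tpr, thresholds = roc_curve(y_te, proba)
    threshold = thresholds[np.argmax(tpr >= target_sensitivity)]
    return clf, auc, threshold
```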

