scholarly journals Predicting the 10-year risk of cataract surgery using machine learning techniques on questionnaire data: findings from the 45 and Up Study

2021 ◽  
pp. bjophthalmol-2020-318609
Author(s):  
Wei Wang ◽  
Xiaotong Han ◽  
Jiaqing Zhang ◽  
Xianwen Shang ◽  
Jason Ha ◽  
...  

Background/aimsTo investigate the feasibility and accuracy of using machine learning (ML) techniques on self-reported questionnaire data to predict the 10-year risk of cataract surgery, and to identify meaningful predictors of cataract surgery in middle-aged and older Australians.MethodsBaseline information regarding demographic, socioeconomic, medical history and family history, lifestyle, dietary and self-rated health status were collected as risk factors. Cataract surgery events were confirmed by the Medicare Benefits Schedule Claims dataset. Three ML algorithms (random forests [RF], gradient boosting machine and deep learning) and one traditional regression algorithm (logistic model) were compared on the accuracy of their predictions for the risk of cataract surgery. The performance was assessed using 10-fold cross-validation. The main outcome measures were areas under the receiver operating characteristic curves (AUCs).ResultsIn total, 207 573 participants, aged 45 years and above without a history of cataract surgery at baseline, were recruited from the 45 and Up Study. The performance of gradient boosting machine (AUC 0.790, 95% CI 0.785 to 0.795), RF (AUC 0.785, 95% CI 0.780 to 0.790) and deep learning (AUC 0.781, 95% CI 0.775 to 61 0.786) were robust and outperformed the traditional logistic regression method (AUC 0.767, 95% CI 0.762 to 0.773, all p<0.05). Age, self-rated eye vision and health insurance were consistently identified as important predictors in all models.ConclusionsThe study demonstrated that ML modelling was able to reasonably accurately predict the 10-year risk of cataract surgery based on questionnaire data alone and was marginally superior to the conventional logistic model.

2015 ◽  
Vol 9s3 ◽  
pp. BBI.S29469 ◽  
Author(s):  
Lucas J. Adams ◽  
Ghalib Bello ◽  
Gerard G. Dumancas

The problem of selecting important variables for predictive modeling of a specific outcome of interest using questionnaire data has rarely been addressed in clinical settings. In this study, we implemented a genetic algorithm (GA) technique to select optimal variables from questionnaire data for predicting a five-year mortality. We examined 123 questions (variables) answered by 5,444 individuals in the National Health and Nutrition Examination Survey. The GA iterations selected the top 24 variables, including questions related to stroke, emphysema, and general health problems requiring the use of special equipment, for use in predictive modeling by various parametric and nonparametric machine learning techniques. Using these top 24 variables, gradient boosting yielded the nominally highest performance (area under curve [AUC] = 0.7654), although there were other techniques with lower but not significantly different AUC. This study shows how GA in conjunction with various machine learning techniques could be used to examine questionnaire data to predict a binary outcome.


2021 ◽  
Author(s):  
Viviane Costa Silva ◽  
Mateus Silva Rocha ◽  
Glaucia Amorim Faria ◽  
Silvio Fernando Alves Xavier Junior ◽  
Tiago Almeida de Oliveira ◽  
...  

Abstract The Agriculture sector has created and collected large amounts of data. It can be gathered, stored, and analyzed to assist in decision making generating competitive value, and the use of Machine Learning techniques has been very effective for this market. In this work, a Machine Learning study was carried out using supervised classification models based on boosting to predict disease in a crop, thus identifying the model with the best areas under curve metrics. Light Gradient Boosting Machine, CatBoost Classifier, Extreme Gradient, Gradient Boosting Classifier, Adaboost models were used to qualify the crop as healthy or sick. One can see that the LightGBM algorithm provided a better fit to the data with an area under the curve of 0.76 under the use of BORUTA variable selection.


2021 ◽  
Vol 11 (24) ◽  
pp. 12083
Author(s):  
Rasa Zalakeviciute ◽  
Yves Rybarczyk ◽  
Katiuska Alexandrino ◽  
Santiago Bonilla-Bedoya ◽  
Danilo Mejia ◽  
...  

Political and economic protests build-up due to the financial uncertainty and inequality spreading throughout the world. In 2019, Latin America took the main stage in a wave of protests. While the social side of protests is widely explored, the focus of this study is the evolution of gaseous urban air pollutants during and after one of these events. Changes in concentrations of NO2, CO, O3 and SO2 during and after the strike, were studied in Quito, Ecuador using two approaches: (i) inter-period observational analysis; and (ii) machine learning (ML) gradient boosting machine (GBM) developed business-as-usual (BAU) comparison to the observations. During the strike, both methods showed a large reduction in the concentrations of NO2 (31.5–32.36%) and CO (15.55–19.85%) and a slight reduction for O3 and SO2. The GBM approach showed an exclusive potential, especially for a lengthier period of predictions, to estimate strike impact on air quality even after the strike was over. This advocates for the use of machine learning techniques to estimate an extended effect of changes in human activities on urban gaseous pollution.


Author(s):  
Vikash Chandra Sharma ◽  
David Frankenfield ◽  
Anupam Gupta ◽  
Rama Krishna Singh

More than two-third of emerging infectious diseases in recent decades are zoonotic in origin. Timely prediction of these diseases which migrate from animals to humans and preventive measures to stop the loss in terms of morbidity and mortality is the requirement of healthcare industry. Avian Influenza is one of the zoonotic diseases that have created havoc in recent past especially in Asian subcontinent. In past, attempts have been made to predict influenza using traditional time-series techniques (AR, MA, ARMA, ARIMA etc.) as well as machine learning techniques to capture the cyclicity and seasonality of these virus strains. In current research an effort has been made to utilize the Empirical Mode Decomposition (EMD) to extract the Intrinsic Mode function (IMF) and then apply state of art Machine Learning (ML) techniques to predict the series. Several machine learning techniques like Random Forest (RF) along with Gradient Boosting Machine (GBM) and Support Vector Regression (SVR)have been applied on the decomposed series. Exogenous models showed variables like temperature, humidity and precipitation have been incorporated to improve upon the forecast. An ensemble approach of ML models showed significant improvement over the traditional models in terms of long term forecast accuracy.


2019 ◽  
Vol 8 (3) ◽  
pp. 1268-1271

On the 15th of April, 1912 the titanic witnessed a disaster resulting in the sinking of her passengers on the maiden voyage near North Atlantic. Even though it is a very long time since this maritime disaster took place, the idea behind what impacts each individual survival is still a great research attracting researcher’s attention. The approach taken in this paper is to utilize the publically available data set from website called Kaggle. Kaggle is a popular data science webpage that put together information of people in the titanic into a data set for the data mining competition: “Titanic: Machine Learning from Disaster”. The research and comparisons in this paper uses a few machine learning techniques and algorithms to analyse the data for classification and prediction of survivors. The prediction and efficiency of these algorithms depend greatly on data analysis and model. The techniques used to do so are Random Forest, Support Vector Machine, Gradient Boosting Machine.


2022 ◽  
Vol 15 (1) ◽  
pp. 1-19
Author(s):  
Ravinder Kumar ◽  
Lokesh Kumar Shrivastav

Stochastic time series analysis of high-frequency stock market data is a very challenging task for the analysts due to the lack availability of efficient tool and techniques for big data analytics. This has opened the door of opportunities for the developer and researcher to develop intelligent and machine learning based tools and techniques for data analytics. This paper proposed an ensemble for stock market data prediction using three most prominent machine learning based techniques. The stock market dataset with raw data size of 39364 KB with all attributes and processed data size of 11826 KB having 872435 instances. The proposed work implements an ensemble model comprises of Deep Learning, Gradient Boosting Machine (GBM) and distributed Random Forest techniques of data analytics. The performance results of the ensemble model are compared with each of the individual methods i.e. deep learning, Gradient Boosting Machine (GBM) and Random Forest. The ensemble model performs better and achieves the highest accuracy of 0.99 and lowest error (RMSE) of 0.1.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Moojung Kim ◽  
Young Jae Kim ◽  
Sung Jin Park ◽  
Kwang Gi Kim ◽  
Pyung Chun Oh ◽  
...  

Abstract Background Annual influenza vaccination is an important public health measure to prevent influenza infections and is strongly recommended for cardiovascular disease (CVD) patients, especially in the current coronavirus disease 2019 (COVID-19) pandemic. The aim of this study is to develop a machine learning model to identify Korean adult CVD patients with low adherence to influenza vaccination Methods Adults with CVD (n = 815) from a nationally representative dataset of the Fifth Korea National Health and Nutrition Examination Survey (KNHANES V) were analyzed. Among these adults, 500 (61.4%) had answered "yes" to whether they had received seasonal influenza vaccinations in the past 12 months. The classification process was performed using the logistic regression (LR), random forest (RF), support vector machine (SVM), and extreme gradient boosting (XGB) machine learning techniques. Because the Ministry of Health and Welfare in Korea offers free influenza immunization for the elderly, separate models were developed for the < 65 and ≥ 65 age groups. Results The accuracy of machine learning models using 16 variables as predictors of low influenza vaccination adherence was compared; for the ≥ 65 age group, XGB (84.7%) and RF (84.7%) have the best accuracies, followed by LR (82.7%) and SVM (77.6%). For the < 65 age group, SVM has the best accuracy (68.4%), followed by RF (64.9%), LR (63.2%), and XGB (61.4%). Conclusions The machine leaning models show comparable performance in classifying adult CVD patients with low adherence to influenza vaccination.


Vibration ◽  
2021 ◽  
Vol 4 (2) ◽  
pp. 341-356
Author(s):  
Jessada Sresakoolchai ◽  
Sakdirat Kaewunruen

Various techniques have been developed to detect railway defects. One of the popular techniques is machine learning. This unprecedented study applies deep learning, which is a branch of machine learning techniques, to detect and evaluate the severity of rail combined defects. The combined defects in the study are settlement and dipped joint. Features used to detect and evaluate the severity of combined defects are axle box accelerations simulated using a verified rolling stock dynamic behavior simulation called D-Track. A total of 1650 simulations are run to generate numerical data. Deep learning techniques used in the study are deep neural network (DNN), convolutional neural network (CNN), and recurrent neural network (RNN). Simulated data are used in two ways: simplified data and raw data. Simplified data are used to develop the DNN model, while raw data are used to develop the CNN and RNN model. For simplified data, features are extracted from raw data, which are the weight of rolling stock, the speed of rolling stock, and three peak and bottom accelerations from two wheels of rolling stock. In total, there are 14 features used as simplified data for developing the DNN model. For raw data, time-domain accelerations are used directly to develop the CNN and RNN models without processing and data extraction. Hyperparameter tuning is performed to ensure that the performance of each model is optimized. Grid search is used for performing hyperparameter tuning. To detect the combined defects, the study proposes two approaches. The first approach uses one model to detect settlement and dipped joint, and the second approach uses two models to detect settlement and dipped joint separately. The results show that the CNN models of both approaches provide the same accuracy of 99%, so one model is good enough to detect settlement and dipped joint. To evaluate the severity of the combined defects, the study applies classification and regression concepts. Classification is used to evaluate the severity by categorizing defects into light, medium, and severe classes, and regression is used to estimate the size of defects. From the study, the CNN model is suitable for evaluating dipped joint severity with an accuracy of 84% and mean absolute error (MAE) of 1.25 mm, and the RNN model is suitable for evaluating settlement severity with an accuracy of 99% and mean absolute error (MAE) of 1.58 mm.


Sign in / Sign up

Export Citation Format

Share Document