Accurate Prediction of Stroke for Hypertensive Patients Based on Medical Big Data and Machine Learning Algorithms: A Retrospective Study (Preprint)

2021 ◽  
Author(s):  
Yujie Yang ◽  
Jing Zheng ◽  
Zhenzhen Du ◽  
Ye Li ◽  
Yunpeng Cai

BACKGROUND Stroke risk assessment is an important means of primary prevention, but the applicability of existing stroke risk assessment scales in the Chinese population is still controversial. Prospective studies are a common method of medical research, but they are time-consuming and labor-intensive. Medical big data has been demonstrated to promote the discovery of disease risk factors and prognosis, and has attracted broad research interest. OBJECTIVE We aimed to establish a high-precision stroke risk prediction model for hypertensive patients using historical electronic medical records and machine learning algorithms. METHODS Based on the Shen Health Information Big Data Platform, a total of 57,671 patients were screened from 250,788 registered hypertensive patients, of whom 9,421 had stroke onset after three years of follow-up. In addition to baseline features and historical symptoms, we constructed several trend characteristics from multi-temporal medical records. Stratified sampling by gender ratio and age was implemented to balance positive and negative cases, and 19,953 samples were then randomly divided into training and test sets at a ratio of 7:3. Four machine learning methods were adopted for modeling, and prediction performance was compared with several traditional risk scales. We also analyzed the non-linear effects of continuous features on stroke onset. RESULTS The tree-based ensemble method XGBoost achieved the best performance, with an area under the receiver operating characteristic curve (AUC) of 0.9220, surpassing the other three traditional machine learning methods. Compared with two traditional risk scales, the Framingham stroke risk profile and the Chinese Multi-Provincial Cohort Study, the proposed model achieved higher performance on an independent validation set, with the AUC increasing by 0.17. Further analysis of non-linear effects revealed the importance of multi-temporal trend characteristics for stroke risk prediction, which is beneficial to the standardized management of hypertensive patients. CONCLUSIONS A high-precision three-year stroke risk prediction model for hypertensive patients was established, and its performance was verified against traditional risk scales. Multi-temporal trend characteristics play an important role in stroke onset, and the model could be deployed to electronic health record systems to assist in more pervasive, preemptive screening of stroke risk, enabling more efficient early disease prevention and intervention.
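As a point of reference for the workflow described above (stratified 7:3 split, tree-based ensemble, AUC on a held-out set), the following minimal sketch shows how such a model could be fitted and evaluated. It is not the authors' pipeline: the feature matrix `X` (baseline, symptom, and trend features), the label vector `y` (3-year stroke onset), and all hyperparameters are illustrative assumptions.

```python
# Minimal sketch (not the authors' pipeline): fit a gradient-boosted tree
# ensemble on tabular hypertensive-patient features and report AUC on a
# held-out 30% test split, mirroring the 7:3 division in the abstract.
# `X` and `y` are assumed to be preloaded numpy arrays / pandas frames.
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from xgboost import XGBClassifier

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

model = XGBClassifier(
    n_estimators=300,    # number of boosted trees (illustrative)
    max_depth=4,         # shallow trees reduce overfitting on clinical data
    learning_rate=0.05,
    eval_metric="auc",
)
model.fit(X_train, y_train)

test_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"Held-out AUC: {test_auc:.4f}")
```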

2020 ◽  
Author(s):  
Uzair Bhatti

BACKGROUND In the era of health informatics, the exponential growth of information generated by health information systems and healthcare organizations demands expert and intelligent recommendation systems. Such systems have become valuable tools because they reduce problems such as information overload when selecting and suggesting doctors, hospitals, medicines, and diagnoses according to patients' interests. OBJECTIVE Hybrid filtering is one of the most popular recommendation approaches, but its major limitations are selectivity and data integrity issues. Most existing recommendation systems and risk prediction algorithms focus on a single domain, whereas cross-domain hybrid filtering can alleviate the selectivity and data integrity problems to a greater extent. METHODS We propose a novel algorithm for recommendation and a predictive model that uses the k-nearest neighbors (KNN) algorithm together with machine learning and artificial intelligence (AI) techniques. We identify the factors that directly impact diseases and propose an approach for predicting the correct diagnosis of different diseases. We constructed a series of models with good reliability for predicting different surgery complications and identified several novel clinical associations. We propose a novel algorithm, pr-KNN, that uses KNN for the prediction and recommendation of diseases. RESULTS We compared the performance of our algorithm with other machine learning algorithms and found that it performed better, with predictive accuracy improving by 3.61%. CONCLUSIONS Directly integrating these predictive tools into EHRs may enable personalized medicine and decision-making at the point of care, both for patient counseling and as a teaching tool. CLINICALTRIAL Dataset for the patient trials attached.
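The abstract does not give the details of pr-KNN, so the sketch below only illustrates the plain KNN disease-classification baseline it builds on. The data names `records` (patient feature matrix) and `labels` (diagnoses) are assumptions, not the authors' dataset.

```python
# Minimal sketch of a plain KNN disease-classification baseline (not the
# authors' pr-KNN algorithm, whose details are not given in the abstract).
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Standardize features first: KNN is distance-based and scale-sensitive.
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=15))

# 5-fold cross-validated accuracy as a simple point of comparison against
# other classifiers; `records` and `labels` are assumed to be preloaded.
scores = cross_val_score(knn, records, labels, cv=5, scoring="accuracy")
print(f"Mean CV accuracy: {scores.mean():.4f}")
```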


Author(s):  
Xabier Rodríguez-Martínez ◽  
Enrique Pascual-San-José ◽  
Mariano Campoy-Quiles

This review article presents the state of the art in high-throughput computational and experimental screening routines with application in organic solar cells, including materials discovery, device optimization, and machine-learning algorithms.


2017 ◽  
Vol 47 (10) ◽  
pp. 2625-2626 ◽  
Author(s):  
Fuchun Sun ◽  
Guang-Bin Huang ◽  
Q. M. Jonathan Wu ◽  
Shiji Song ◽  
Donald C. Wunsch II

2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Yao Huimin

With the development of cloud computing and distributed cluster technology, the concept of big data has been expanded and extended in terms of capacity and value, and machine learning technology has also received unprecedented attention in recent years. Traditional machine learning algorithms cannot be parallelized effectively, so a parallelized support vector machine based on the Spark big data platform is proposed. Firstly, the big data platform is designed with a Lambda architecture, which is divided into three layers: the Batch Layer, the Serving Layer, and the Speed Layer. Secondly, to improve the training efficiency of support vector machines on large-scale data, a cross-validation merging algorithm is proposed: when merging two support vector machines, "special points" other than the support vectors are also considered, namely the points at which the non-support vectors of one subset violate the training results of the other subset. Then, a parallelized support vector machine based on cross-validation is proposed, and the parallelization of the support vector machine is realized on the Spark platform. Finally, experiments on different datasets verify the effectiveness and stability of the proposed method. The experimental results show that the proposed parallelized support vector machine performs well in terms of speed-up ratio, training time, and prediction accuracy.
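To make the merging step concrete, the sketch below shows a single-machine version of the idea: two sub-SVMs are trained on disjoint partitions, and the merged model is retrained on each partition's support vectors plus the other partition's margin-violating "special points". This is an assumption-laden illustration, not the paper's Spark implementation; the helper names, the linear kernel, and the label convention (y in {-1, +1}) are all illustrative.

```python
# Minimal single-machine sketch of the sub-SVM merging idea (not the paper's
# Spark implementation). Labels are assumed to be in {-1, +1}.
import numpy as np
from sklearn.svm import SVC

def violating_points(model, X, y):
    """Samples whose functional margin under `model` is < 1 (margin violators)."""
    margins = y * model.decision_function(X)
    return X[margins < 1], y[margins < 1]

def merge_and_retrain(X1, y1, X2, y2, C=1.0):
    # Train one sub-SVM per data partition.
    svm1 = SVC(kernel="linear", C=C).fit(X1, y1)
    svm2 = SVC(kernel="linear", C=C).fit(X2, y2)

    # Support vectors of each sub-SVM.
    sv1_X, sv1_y = X1[svm1.support_], y1[svm1.support_]
    sv2_X, sv2_y = X2[svm2.support_], y2[svm2.support_]

    # "Special points": samples of one partition that violate the other model.
    vp1_X, vp1_y = violating_points(svm2, X1, y1)
    vp2_X, vp2_y = violating_points(svm1, X2, y2)

    # Retrain on the merged set (occasional duplicates are harmless here).
    X_merge = np.vstack([sv1_X, sv2_X, vp1_X, vp2_X])
    y_merge = np.concatenate([sv1_y, sv2_y, vp1_y, vp2_y])
    return SVC(kernel="linear", C=C).fit(X_merge, y_merge)
```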


2019 ◽  
Author(s):  
Lei Zhang ◽  
Xianwen Shang ◽  
Subhashaan Sreedharan ◽  
Xixi Yan ◽  
Jianbin Liu ◽  
...  

BACKGROUND Previous conventional models for the prediction of diabetes could be updated by incorporating the increasing amount of health data available and new risk prediction methodology. OBJECTIVE We aimed to develop a substantially improved diabetes risk prediction model using sophisticated machine-learning algorithms based on a large retrospective population cohort of over 230,000 people who were enrolled in the study during 2006-2017. METHODS We collected demographic, medical, behavioral, and incidence data for type 2 diabetes mellitus (T2DM) in 236,684 diabetes-free participants recruited from the 45 and Up Study. We predicted and compared the risk of diabetes onset in these participants at 3, 5, 7, and 10 years based on three machine-learning approaches and the conventional regression model. RESULTS Overall, 6.05% (14,313/236,684) of the participants developed T2DM during an average 8.8-year follow-up period. The 10-year diabetes incidence in men was 8.30% (8.08%-8.49%), which was significantly higher (odds ratio 1.37, 95% CI 1.32-1.41) than that in women at 6.20% (6.00%-6.40%). The incidence of T2DM was doubled in individuals with obesity (men: 17.78% [17.05%-18.43%]; women: 14.59% [13.99%-15.17%]) compared with that in nonobese individuals. The gradient boosting machine model showed the best performance among the four models (area under the curve of 79% for 3-year prediction and 75% for 10-year prediction). All machine-learning models identified BMI as the most significant factor contributing to diabetes onset, which explained 12%-50% of the variance in the prediction of diabetes. The model predicted that if BMI in obese and overweight participants could be hypothetically reduced to a healthy range, the 10-year probability of diabetes onset would be significantly reduced from 8.3% to 2.8% (P<.001). CONCLUSIONS A one-time self-reported survey can accurately predict the risk of diabetes using a machine-learning approach. Achieving a healthy BMI can significantly reduce the risk of developing T2DM.
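A minimal sketch of the kind of comparison the abstract describes (a gradient boosting machine versus the conventional regression model for one prediction horizon, scored by AUC) is shown below. It is not the study's actual pipeline: `X` (self-reported survey features), `y_10yr` (10-year T2DM onset labels), and the hyperparameters are illustrative assumptions.

```python
# Minimal sketch (not the study's pipeline): compare a gradient boosting
# machine with logistic regression on one prediction horizon using AUC.
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

X_train, X_test, y_train, y_test = train_test_split(
    X, y_10yr, test_size=0.3, stratify=y_10yr, random_state=0
)

models = {
    "gbm": GradientBoostingClassifier(n_estimators=200, max_depth=3),
    "logistic": LogisticRegression(max_iter=1000),
}
for name, clf in models.items():
    clf.fit(X_train, y_train)
    auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
    print(f"{name}: AUC = {auc:.3f}")
```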


2021 ◽  
Vol 9 ◽  
Author(s):  
Huanhuan Zhao ◽  
Xiaoyu Zhang ◽  
Yang Xu ◽  
Lisheng Gao ◽  
Zuchang Ma ◽  
...  

Hypertension is a widespread chronic disease. Risk prediction of hypertension is an intervention that contributes to its early prevention and management. Implementing such an intervention requires an effective and easy-to-implement hypertension risk prediction model. This study evaluated and compared the performance of four machine learning algorithms in predicting the risk of hypertension based on easy-to-collect risk factors. A dataset of 29,700 samples collected through physical examinations was used for model training and testing. Firstly, we identified easy-to-collect risk factors of hypertension through univariate logistic regression analysis. Then, based on the selected features, 10-fold cross-validation was used to optimize four models, random forest (RF), CatBoost, a multilayer perceptron (MLP) neural network, and logistic regression (LR), and find the best hyperparameters on the training set. Finally, the performance of the models was evaluated by AUC, accuracy, sensitivity, and specificity on the test set. The experimental results showed that the RF model outperformed the other three models, achieving an AUC of 0.92, an accuracy of 0.82, a sensitivity of 0.83, and a specificity of 0.81. In addition, body mass index (BMI), age, family history, and waist circumference (WC) were the four primary risk factors of hypertension. These findings show that it is feasible to use machine learning algorithms, especially RF, to predict hypertension risk without clinical or genetic data. The technique can provide a non-invasive and economical means for the prevention and management of hypertension in a large population.
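The tuning-then-testing procedure described above can be illustrated for the best-performing model (RF) as follows. This is a hedged sketch, not the study's exact setup: `X` and `y` (hypertension labels), the parameter grid, and the 80/20 split are assumptions made for illustration.

```python
# Minimal sketch (not the study's exact setup): 10-fold cross-validated
# hyperparameter search for a random forest on easy-to-collect risk factors,
# followed by evaluation on a held-out test set.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import roc_auc_score

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=7
)

grid = GridSearchCV(
    RandomForestClassifier(random_state=7),
    param_grid={"n_estimators": [200, 500], "max_depth": [5, 10, None]},
    cv=10,               # 10-fold cross-validation, as in the abstract
    scoring="roc_auc",
)
grid.fit(X_train, y_train)

best_rf = grid.best_estimator_
print("Test AUC:", roc_auc_score(y_test, best_rf.predict_proba(X_test)[:, 1]))
```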


2021 ◽  
Author(s):  
Kate Bentley ◽  
Kelly Zuromski ◽  
Rebecca Fortgang ◽  
Emily Madsen ◽  
Daniel Kessler ◽  
...  

Background: Interest in developing machine learning algorithms that use electronic health record data to predict patients' risk of suicidal behavior has recently proliferated. Whether and how such models might be implemented and useful in clinical practice, however, remains unknown. In order to ultimately make automated suicide risk prediction algorithms useful in practice, and thus better prevent patient suicides, it is critical to partner with key stakeholders (including the frontline providers who will be using such tools) at each stage of the implementation process. Objective: The aim of this focus group study was to inform ongoing and future efforts to deploy suicide risk prediction models in clinical practice. The specific goals were to better understand hospital providers' current practices for assessing and managing suicide risk; determine providers' perspectives on using automated suicide risk prediction algorithms; and identify barriers, facilitators, recommendations, and factors to consider for initiatives in this area. Methods: We conducted 10 two-hour focus groups with a total of 40 providers from psychiatry, internal medicine and primary care, emergency medicine, and obstetrics and gynecology departments within an urban academic medical center. Audio recordings of open-ended group discussions were transcribed and coded for relevant and recurrent themes by two independent study staff members. All coded text was reviewed and discrepancies were resolved in consensus meetings with doctoral-level staff. Results: Though most providers reported using standardized suicide risk assessment tools in their clinical practices, existing tools were commonly described as unhelpful, and providers indicated dissatisfaction with current suicide risk assessment methods. Overall, providers' general attitudes toward the practical use of automated suicide risk prediction models and corresponding clinical decision support tools were positive. Providers were especially interested in the potential to identify high-risk patients who might be missed by traditional screening methods. Some expressed skepticism about the potential usefulness of these models in routine care; specific barriers included concerns about liability, alert fatigue, and increased demand on the healthcare system. Key facilitators included presenting specific patient-level features contributing to risk scores, emphasizing changes in risk over time, and developing systematic clinical workflows and provider trainings. Participants also recommended considering risk-prediction windows, timing of alerts, who will have access to model predictions, and variability across treatment settings. Conclusions: Providers were dissatisfied with current suicide risk assessment methods and open to the use of a machine learning-based risk prediction system to inform clinical decision-making. They also raised multiple concerns about potential barriers to the usefulness of this approach and suggested several possible facilitators. Future efforts in this area will benefit from incorporating systematic qualitative feedback from providers, patients, administrators, and payers on the use of new methods in routine care, especially given the complex, sensitive, and unfortunately still stigmatized nature of suicide risk.

