Evaluation of risk analysis process in medical big data using Machine Learning

BACKGROUND Stroke risk assessment is an important means of primary prevention, but the applicability of existing stroke risk assessment scales to the Chinese population remains controversial. Prospective studies are a common method of medical research, but they are time-consuming and labor-intensive. Medical big data has been shown to support the discovery of disease risk factors and prognosis, and it attracts broad research interest.

OBJECTIVE We aimed to establish a high-precision stroke risk prediction model for hypertensive patients using existing historical electronic medical records and machine learning algorithms.

METHODS Based on the Shen Health Information Big Data Platform, a total of 57,671 patients were screened from 250,788 registered hypertensive patients, of whom 9,421 had stroke onset after three years of follow-up. In addition to baseline features and historical symptoms, we constructed several trend characteristics from multi-temporal medical records. Stratified sampling by gender ratio and age group was used to balance positive and negative cases, and the resulting 19,953 samples were then randomly divided into a training set and a test set at a ratio of 7:3. Four machine learning methods were adopted for modeling, and their performance was compared with that of several traditional risk scales. We also analyzed the non-linear effects of continuous features on stroke onset.

RESULTS The integrated tree-based XGBoost model achieved the best performance, with an area under the receiver operating characteristic curve (AUC) of 0.9220, surpassing the other three traditional machine learning methods. Compared with two traditional risk scales, the Framingham stroke risk profile and the Chinese Multi-provincial Cohort Study, our proposed model achieved higher performance on an independent validation set, with an AUC increase of 0.17. Further analysis of non-linear effects revealed the importance of multi-temporal trend characteristics for stroke risk prediction, which is beneficial to the standardized management of hypertensive patients.

CONCLUSIONS We established a high-precision three-year stroke risk prediction model for hypertensive patients and verified that it outperforms traditional risk scales. Multi-temporal trend characteristics play an important role in stroke onset, and the model could be deployed to electronic health record systems to assist in more pervasive, preemptive screening of stroke risk, enabling more efficient early disease prevention and intervention.
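The abstract reports a 7:3 training/test split and compares models by AUC. The following is a minimal Python sketch of those two steps only, not the authors' pipeline. The function names `stratified_split` and `roc_auc` are hypothetical, and splitting each outcome class separately is an assumption made here to keep the case ratio equal in both sets (the abstract only says the split was random). AUC is computed with the rank-sum formulation, which equals the probability that a randomly chosen positive case scores higher than a randomly chosen negative case.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Return (train_idx, test_idx), splitting each class at train_fraction.

    Assumption: stratifying on the outcome label; the source only states
    that samples were randomly divided 7:3.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(len(indices) * train_fraction)
        train_idx.extend(indices[:cut])
        test_idx.extend(indices[cut:])
    return sorted(train_idx), sorted(test_idx)


def roc_auc(labels, scores):
    """AUC via the rank-sum statistic; tied scores share their average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the run of tied scores starting at position i.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative case")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


if __name__ == "__main__":
    # Toy data for illustration only; not taken from the study.
    toy_labels = [1] * 30 + [0] * 70
    train, test = stratified_split(toy_labels)
    print(len(train), len(test))
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

In practice the scores passed to `roc_auc` would be a fitted model's predicted probabilities on the held-out test indices, so that the reported AUC reflects data the model did not see during training.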

Advanced Interpretable Machine Learning Methods for Clinical NGS Big Data of Complex Hereditary Diseases

10.3389/978-2-88966-274-6, 2020

Keyword(s): Machine Learning, Big Data, Hereditary Diseases, Learning Methods, Machine Learning Methods, Interpretable Machine Learning

Bipolar Disorder and Oxidative Stress Injury Mechanism - Clinical Big Data Analysis Based on Machine Learning

Case Medical Research, 10.31525/ct1-nct03949218, 2019

Keyword(s): Oxidative Stress, Machine Learning, Bipolar Disorder, Big Data, Data Analysis, Big Data Analysis, Injury Mechanism, Stress Injury, Oxidative Stress Injury, And Oxidative Stress

The Cross-Sectional Pricing of Corporate Bonds Using Big Data and Machine Learning

SSRN Electronic Journal, 10.2139/ssrn.3686164, 2020, Cited By ~ 2

Author(s): Turan G. Bali, Amit Goyal, Dashan Huang, Fuwei Jiang, Quan Wen

Keyword(s): Machine Learning, Big Data, Corporate Bonds, Cross Sectional, The Cross

Data Driven Smart Proxy for CFD Application of Big Data Analytics & Machine Learning in Computational Fluid Dynamics, Report Two: Model Building at the Cell Level

10.2172/1431303, 2018, Cited By ~ 1

Author(s): A. Ansari, S. Mohaghegh, M. Shahnam, J. F. Dietiker, T. Li

Keyword(s): Machine Learning, Fluid Dynamics, Computational Fluid Dynamics, Big Data, Data Analytics, Model Building, Big Data Analytics, Data Driven, Cell Level

Identifying Cancer Targets Based on Machine Learning Methods via Chou’s 5-steps Rule and General Pseudo Components

Current Topics in Medicinal Chemistry, 10.2174/1568026619666191016155543, 2019, Vol 19 (25), pp. 2301-2317, Cited By ~ 2

Author(s): Ruirui Liang, Jiayang Xie, Chi Zhang, Mengying Zhang, Hai Huang, ...

Keyword(s): Machine Learning, Growth Rate, Big Data, Human Genome Project, Genome Project, Support Vector, Successful Implementation, Learning Methods, Machine Learning Methods, Vector Machines

In recent years, the successful implementation of the Human Genome Project has made people realize that, because of the complexity and varied forms of cancer, genetic, environmental and lifestyle factors should be studied together. The increasing availability and growth rate of ‘big data’ derived from various omics open a new window for the study and therapy of cancer. In this paper, we introduce the application of machine learning methods to cancer big data, including artificial neural networks, support vector machines, ensemble learning and naïve Bayes classifiers.

Medical Big Data Analytics and Smart Internet of Things-enabled Mobile-based Health Monitoring Systems

American Journal of Medical Research, 10.22381/ajmr6220194, 2019, Vol 6 (2), pp. 31, Cited By ~ 1

Keyword(s): Big Data, Internet Of Things, Health Monitoring, Data Analytics, Big Data Analytics, Monitoring Systems, Health Monitoring Systems, Medical Big Data
