Chronic Diseases Prediction over Bigdata by using Machine Learning

With the growth of big data in the biomedical and healthcare communities, accurate analysis of medical data benefits early disease detection, patient care, and community services. However, analysis accuracy is reduced when the medical data are incomplete. Moreover, different regions exhibit unique characteristics of certain regional diseases, which may weaken the prediction of disease outbreaks. In this paper, we streamline machine learning algorithms for effective prediction of chronic disease outbreaks in disease-frequent communities. We evaluate the modified prediction models on real-life hospital data collected in central China from 2013 to 2015. To overcome the difficulty of incomplete data, we use a latent factor model to reconstruct the missing data. We experiment on a regional chronic disease, cerebral infarction. To the best of our knowledge, no existing work in the area of medical big data analytics has focused on both data types. Compared with several typical prediction algorithms, our proposed algorithm reaches a prediction accuracy of 94.8% and converges faster than the CNN-based unimodal disease risk prediction (CNN-UDRP) algorithm.
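
The abstract does not spell out its latent factor formulation, so the following is only a minimal sketch of the general idea, assuming a plain matrix factorization fitted by stochastic gradient descent on the observed cells. The function name `complete_table`, the toy table, and all hyperparameters are hypothetical choices for illustration, not the authors' method.

```python
import random


def complete_table(rows, k=1, lr=0.01, reg=0.02, epochs=2000, seed=0):
    """Fill None cells of a patient-by-feature table via R ~ P @ Q^T.

    Only observed cells contribute to the fit; k is the number of latent factors.
    """
    rng = random.Random(seed)
    n, m = len(rows), len(rows[0])
    # Small positive random initial factors (an arbitrary choice for this sketch).
    P = [[rng.uniform(0.1, 0.5) for _ in range(k)] for _ in range(n)]
    Q = [[rng.uniform(0.1, 0.5) for _ in range(k)] for _ in range(m)]
    observed = [(i, j, rows[i][j]) for i in range(n) for j in range(m)
                if rows[i][j] is not None]
    for _ in range(epochs):
        for i, j, r in observed:
            err = r - sum(P[i][f] * Q[j][f] for f in range(k))
            for f in range(k):
                p, q = P[i][f], Q[j][f]
                # Gradient step on squared error with L2 regularization.
                P[i][f] += lr * (err * q - reg * p)
                Q[j][f] += lr * (err * p - reg * q)
    # Keep observed values; replace missing ones with the model's prediction.
    return [[rows[i][j] if rows[i][j] is not None
             else sum(P[i][f] * Q[j][f] for f in range(k))
             for j in range(m)] for i in range(n)]


# Toy rank-1 table with one missing cell (hypothetical data).
table = [[1.0, 2.0, 3.0],
         [2.0, 4.0, 6.0],
         [3.0, 6.0, None]]
completed = complete_table(table)
print(completed[2][2])
```

On this toy table the other cells imply a value of 9 for the missing entry, and the estimate should land close to that, pulled slightly lower by the regularization term.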

Determination of Significant Features for Building an Efficient Heart Disease Prediction System

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b3393.078219 ◽

2019 ◽

Vol 8 (2) ◽

pp. 4499-4504

Keyword(s):

Machine Learning ◽

Heart Disease ◽

Prediction Model ◽

Prediction Models ◽

Heart Diseases ◽

Medical Diagnostics ◽

Medical Data ◽

Machine Learning Algorithms ◽

Prediction System ◽

Early Stages

Heart diseases are responsible for the greatest number of deaths worldwide. These diseases are usually not detected in their early stages because most people cannot afford the cost of medical diagnostics. Research has shown that machine learning methods have a great capability to extract valuable information from medical data. This information is used to build prediction models that provide cost-effective technological aid for a medical practitioner to detect heart disease in its early stages. However, the presence of irrelevant and redundant features in medical data degrades the performance of the prediction system. This research aimed to improve the accuracy of existing methods by removing such features. In this study, a brute-force feature selection algorithm was used to determine the relevant, significant features. After rigorous experiments with 7528 possible combinations of features and 5 machine learning algorithms, 8 important features were identified. A prediction model was developed using these significant features. The accuracy of this model was experimentally calculated to be 86.4%, which is higher than the results of existing studies. The prediction model proposed in this study should help in predicting heart disease efficiently.
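
Brute-force feature selection, as described in the abstract, amounts to scoring every candidate subset of features and keeping the best one. The sketch below illustrates that loop under stated assumptions: the scoring function (leave-one-out accuracy of a 1-nearest-neighbour classifier), the names `loo_1nn_accuracy` and `brute_force_select`, and the toy data are hypothetical stand-ins, since the abstract does not name the five algorithms or the dataset.

```python
from itertools import combinations


def loo_1nn_accuracy(X, y, cols):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on columns `cols`."""
    correct = 0
    for i in range(len(X)):
        best_j, best_d = None, float("inf")
        for j in range(len(X)):
            if i == j:
                continue
            d = sum((X[i][c] - X[j][c]) ** 2 for c in cols)
            if d < best_d:
                best_d, best_j = d, j
        correct += int(y[best_j] == y[i])
    return correct / len(X)


def brute_force_select(X, y, score=loo_1nn_accuracy):
    """Score every non-empty feature subset and return the best (columns, score)."""
    n_features = len(X[0])
    best_cols, best_score = None, -1.0
    for size in range(1, n_features + 1):
        for cols in combinations(range(n_features), size):
            s = score(X, y, cols)
            if s > best_score:  # strict '>' keeps the smaller, earlier subset on ties
                best_cols, best_score = cols, s
    return best_cols, best_score


# Toy data (hypothetical): column 0 separates the classes, columns 1 and 2 are noise.
X = [[0.0, 5.0, 1.0], [0.1, 1.0, 9.0], [0.2, 8.0, 4.0],
     [1.0, 2.0, 8.0], [1.1, 7.0, 2.0], [1.2, 4.0, 5.0]]
y = [0, 0, 0, 1, 1, 1]
best = brute_force_select(X, y)
print(best)
```

The number of subsets grows as 2^n - 1 for n features, so exhaustive search of this kind is only practical for small feature sets.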

Machine Learning and Its Application in Monitoring Diabetes Mellitus

Advances in Data Mining and Database Management - Handbook of Research on Engineering, Business, and Healthcare Applications of Data Science and Analytics ◽

10.4018/978-1-7998-3053-5.ch012 ◽

2021 ◽

pp. 228-288

Author(s):

Vandana Kalra ◽

Indu Kashyap ◽

Harmeet Kaur

Keyword(s):

Machine Learning ◽

Chronic Disease ◽

Data Science ◽

Prediction Models ◽

Simulation Models ◽

Machine Learning Algorithms ◽

Common Chronic Disease ◽

Future Success ◽

Automatic Pattern Recognition ◽

Knowledge Exploration

Data science is a fast-growing area that deals with data from its origin to knowledge exploration. It comprises two main subdomains: data analytics for preparing data, and machine learning for probing this data for hidden patterns. Machine learning (ML) provides powerful algorithms for automatic pattern recognition and for producing prediction models for structured and unstructured data. The available historical data contains patterns with high predictive value that can be used for the future success of an industry. These algorithms also help to obtain accurate prediction, classification, and simulation models by eliminating insignificant and faulty patterns. Machine learning provides a major advancement in the healthcare industry by assisting doctors in diagnosing chronic diseases correctly. Diabetes is one of the most common chronic diseases; it occurs when the pancreas cells are damaged and do not secrete the amount of insulin required by the human body. Machine learning algorithms can help in the early diagnosis of this chronic disease by studying its predictor parameter values.

Improved Ant Colony on Feature Selection and Weighted Ensemble to Neural Network Based Multimodal Disease Risk Prediction (WENN-MDRP) Classifier for Disease Prediction Over Big Data

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i3.27.17654 ◽

2018 ◽

Vol 7 (3.27) ◽

pp. 56 ◽

Cited By ~ 2

Author(s):

Gakwaya Nkundimana Joel ◽

S Manju Priya

Keyword(s):

Neural Network ◽

Feature Selection ◽

Big Data ◽

Risk Prediction ◽

Disease Risk ◽

Community Services ◽

Medical Data ◽

Ant Colony ◽

Unstructured Data ◽

Features Selection

As big data grows in the biomedical and healthcare communities, precise analysis of medical data aids early disease identification, patient care, and community services. However, the accuracy of the analysis decreases if the medical data quality is imperfect. As a result, the choice of features from the dataset becomes an extremely significant task. Feature selection has shown its efficiency in numerous applications by constructing simpler and more comprehensible models, improving learning performance, and preparing clean and clear data. The proposed method analyzes the difficulties of feature selection for big data analytics. An Improved Ant Colony Optimization based Feature Selection (IACO) algorithm is presented to resolve this issue. Missing values in the incomplete data were first reconstructed with the help of a latent factor model. Even so, it was not easy to choose the best features from the structured and unstructured data. A novel technique called the Weighted Ensemble Based Neural Network for Multimodal Disease Risk Prediction (WENN-MDRP) algorithm is implemented to provide the best feature selection among structured as well as unstructured data. The proposed method provides improved prediction accuracy when compared with conventional techniques. The presented classifiers are implemented in the MATLAB environment. The outcomes are evaluated in terms of recall, precision, accuracy, F-measure, and error rate.

Development of Prediction Models Using Machine Learning Algorithms for Girls with Suspected Central Precocious Puberty: Retrospective Study (Preprint)

10.2196/preprints.11728 ◽

2018 ◽

Author(s):

Liyan Pan ◽

Guangjian Liu ◽

Xiaojian Mao ◽

Huixian Li ◽

Jiexin Zhang ◽

...

Keyword(s):

Machine Learning ◽

Retrospective Study ◽

Random Forest ◽

Precocious Puberty ◽

Prediction Models ◽

Central Precocious Puberty ◽

Machine Learning Algorithms ◽

Stimulation Test ◽

Gnrh Analogue ◽

Prediction Probability

BACKGROUND: Central precocious puberty (CPP) in girls seriously affects their physical and mental development in childhood. The method of diagnosis, the gonadotropin-releasing hormone (GnRH) stimulation test or the GnRH analogue (GnRHa) stimulation test, is expensive and makes patients uncomfortable due to the need for repeated blood sampling. OBJECTIVE: We aimed to combine multiple CPP-related features and construct machine learning models to predict response to the GnRHa stimulation test. METHODS: In this retrospective study, we analyzed clinical and laboratory data of 1757 girls who underwent a GnRHa test in order to develop XGBoost and random forest classifiers for prediction of response to the GnRHa test. The local interpretable model-agnostic explanations (LIME) algorithm was used with the black-box classifiers to increase their interpretability. We measured the sensitivity, specificity, and area under the receiver operating characteristic curve (AUC) of the models. RESULTS: Both the XGBoost and random forest models achieved good performance in distinguishing between positive and negative responses, with the AUC ranging from 0.88 to 0.90, sensitivity ranging from 77.91% to 77.94%, and specificity ranging from 84.32% to 87.66%. Basal serum luteinizing hormone, follicle-stimulating hormone, and insulin-like growth factor-I levels were found to be the three most important factors. In the interpretable models of LIME, the abovementioned variables made high contributions to the prediction probability. CONCLUSIONS: The prediction models we developed can help diagnose CPP and may be used as a prescreening tool before the GnRHa stimulation test.
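
For readers unfamiliar with the AUC figure quoted above, it can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The sketch below computes it directly from that definition; the function name `roc_auc` and the example labels and scores are hypothetical and unrelated to the study's data.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores for four patients (1 = positive response).
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
print(auc)
```

Here three of the four positive-negative pairs are ordered correctly, giving an AUC of 0.75. The pairwise loop is quadratic in the number of examples; rank-based formulas give the same value more efficiently on large datasets.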

Machine Learning Algorithms for Short-Term Load Forecast in Residential Buildings Using Smart Meters, Sensors and Big Data Solutions

IEEE Access ◽

10.1109/access.2019.2958383 ◽

2019 ◽

Vol 7 ◽

pp. 177874-177889 ◽

Cited By ~ 10

Author(s):

Simona-Vasilica Oprea ◽

Adela Bara

Keyword(s):

Machine Learning ◽

Big Data ◽

Residential Buildings ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Short Term ◽

Smart Meters ◽

Load Forecast

Accelerating organic solar cell material's discovery: high-throughput screening and big data

Energy & Environmental Science ◽

10.1039/d1ee00559f ◽

2021 ◽

Author(s):

Xabier Rodríguez-Martínez ◽

Enrique Pascual-San-José ◽

Mariano Campoy-Quiles

Keyword(s):

Machine Learning ◽

Big Data ◽

High Throughput ◽

Organic Solar Cells ◽

High Throughput Screening ◽

Organic Solar Cell ◽

State Of The Art ◽

Review Article ◽

Machine Learning Algorithms ◽

Device Optimization

This review article presents the state-of-the-art in high-throughput computational and experimental screening routines with application in organic solar cells, including materials discovery, device optimization and machine-learning algorithms.

Development of Machine Learning Models for Prediction of Smoking Cessation Outcome

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph18052584 ◽

2021 ◽

Vol 18 (5) ◽

pp. 2584

Author(s):

Cheng-Chien Lai ◽

Wei-Hsin Huang ◽

Betty Chia-Chen Chang ◽

Lee-Ching Hwang

Keyword(s):

Machine Learning ◽

Smoking Cessation ◽

Success Rate ◽

Prediction Models ◽

Smoking Status ◽

Medical Center ◽

Machine Learning Algorithms ◽

Classification And Regression Tree ◽

Support Vector ◽

Smoking Cessation Outcome

Predictors of success in smoking cessation have been studied, but a prediction model capable of providing a success rate for each patient attempting to quit smoking is still lacking. The aim of this study is to develop prediction models using machine learning algorithms to predict the outcome of smoking cessation. Data were acquired from patients who underwent a smoking cessation program at one medical center in Northern Taiwan. A total of 4875 enrollments fulfilled our inclusion criteria. Models with artificial neural network (ANN), support vector machine (SVM), random forest (RF), logistic regression (LoR), k-nearest neighbor (KNN), classification and regression tree (CART), and naïve Bayes (NB) were trained to predict the final smoking status of the patients in a six-month period. Sensitivity, specificity, accuracy, and area under the receiver operating characteristic (ROC) curve (AUC or ROC value) were used to determine the performance of the models. We adopted the ANN model, which reached a slightly better performance, with a sensitivity of 0.704, a specificity of 0.567, an accuracy of 0.640, and an ROC value of 0.660 (95% confidence interval (CI): 0.617–0.702) for prediction of smoking cessation outcome. A predictive model for smoking cessation was constructed. The model could aid in providing the predicted success rate for all smokers. It also has the potential to support personalized and precision medicine in the treatment of smoking cessation.
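
The sensitivity, specificity, and accuracy reported above all derive from the four cells of a binary confusion matrix. As a minimal sketch of those definitions, the snippet below computes them from true and predicted labels; the function name `binary_metrics` and the five example labels are hypothetical and are not the study's data.

```python
def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity and accuracy from binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return {
        "sensitivity": tp / (tp + fn),   # share of actual positives that were found
        "specificity": tn / (tn + fp),   # share of actual negatives that were found
        "accuracy": (tp + tn) / len(y_true),
    }


# Hypothetical outcomes for five patients (1 = quit successfully).
metrics = binary_metrics([1, 1, 1, 0, 0], [1, 1, 0, 0, 1])
print(metrics)
```

Accuracy is a prevalence-weighted average of sensitivity and specificity, which is why it always falls between the two.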

57 Precision neoantigen discovery using novel algorithms and expanded HLA-ligandome datasets

Journal for ImmunoTherapy of Cancer ◽

10.1136/jitc-2020-sitc2020.0057 ◽

2020 ◽

Vol 8 (Suppl 3) ◽

pp. A62-A62

Author(s):

Dattatreya Mellacheruvu ◽

Rachel Pyke ◽

Charles Abbott ◽

Nick Phillips ◽

Sejal Desai ◽

...

Keyword(s):

Machine Learning ◽

Cell Lines ◽

Antigen Processing ◽

Large Scale ◽

Prediction Models ◽

K562 Cells ◽

Machine Learning Algorithms ◽

Training Data ◽

High Quality ◽

Tissue Samples

Background: Accurately identified neoantigens can be effective therapeutic agents in both adjuvant and neoadjuvant settings. A key challenge for neoantigen discovery has been the availability of accurate prediction models for MHC peptide presentation. We have shown previously that our proprietary model based on (i) large-scale, in-house mono-allelic data, (ii) custom features that model antigen processing, and (iii) advanced machine learning algorithms has strong performance. We have extended our work by systematically integrating large quantities of high-quality, publicly available data, implementing new modelling algorithms, and rigorously testing our models. These extensions lead to substantial improvements in performance and generalizability. Our algorithm, named Systematic HLA Epitope Ranking Pan Algorithm (SHERPA™), is integrated into the ImmunoID NeXT Platform®, our immuno-genomics and transcriptomics platform specifically designed to enable the development of immunotherapies. Methods: In-house immunopeptidomic data was generated using stably transfected HLA-null K562 cell lines that express a single HLA allele of interest, followed by immunoprecipitation using W6/32 antibody and LC-MS/MS. Public immunopeptidomics data was downloaded from repositories such as MassIVE and processed uniformly using in-house pipelines to generate peptide lists filtered at 1% false discovery rate. Other metrics (features) were either extracted from source data or generated internally by re-processing samples utilizing the ImmunoID NeXT Platform. Results: We have generated large-scale and high-quality immunopeptidomics data by using approximately 60 mono-allelic cell lines that unambiguously assign peptides to their presenting alleles to create our primary models. Briefly, our primary ‘binding’ algorithm models MHC-peptide binding using peptide and binding pockets, while our primary ‘presentation’ model uses additional features to model antigen processing and presentation. Both primary models have significantly higher precision across all recall values in multiple test data sets, including mono-allelic cell lines and multi-allelic tissue samples. To further improve the performance of our model, we expanded the diversity of our training set using high-quality, publicly available mono-allelic immunopeptidomics data. Furthermore, multi-allelic data was integrated by resolving peptide-to-allele mappings using our primary models. We then trained a new model using the expanded training data and a new composite machine learning architecture. The resulting secondary model further improves performance and generalizability across several tissue samples. Conclusions: Improving technologies for neoantigen discovery is critical for many therapeutic applications, including personalized neoantigen vaccines and neoantigen-based biomarkers for immunotherapies. Our new and improved algorithm (SHERPA) has significantly higher performance compared to a state-of-the-art public algorithm and furthers this objective.

Efficient and Rapid Machine Learning Algorithms for Big Data and Dynamic Varying Systems

IEEE Transactions on Systems Man and Cybernetics Systems ◽

10.1109/tsmc.2017.2741558 ◽

2017 ◽

Vol 47 (10) ◽

pp. 2625-2626 ◽

Cited By ~ 16

Author(s):

Fuchun Sun ◽

Guang-Bin Huang ◽

Q. M. Jonathan Wu ◽

Shiji Song ◽

Donald C. Wunsch II

Keyword(s):

Machine Learning ◽

Big Data ◽

Learning Algorithms ◽

Machine Learning Algorithms

Development of Heavy Rain Damage Prediction Model Using Machine Learning Based on Big Data

Advances in Meteorology ◽

10.1155/2018/5024930 ◽

2018 ◽

Vol 2018 ◽

pp. 1-11 ◽

Cited By ~ 12

Author(s):

Changhyun Choi ◽

Jeonghwan Kim ◽

Jongsung Kim ◽

Donghyun Kim ◽

Younghye Bae ◽

...

Keyword(s):

Machine Learning ◽

Big Data ◽

Prediction Model ◽

Prediction Models ◽

Meteorological Data ◽

Heavy Rain ◽

Machine Learning Techniques ◽

Damage Prediction ◽

Explanatory Variables ◽

The Republic

Prediction models of heavy rain damage using machine learning based on big data were developed for the Seoul Capital Area in the Republic of Korea. We used data on the occurrence of heavy rain damage from 1994 to 2015 as dependent variables and weather big data as explanatory variables. The model was developed by applying machine learning techniques such as decision trees, bagging, random forests, and boosting. As a result of evaluating the prediction performance of each model, the AUC value of the boosting model using meteorological data from the past 1 to 4 days was the highest at 95.87% and was selected as the final model. By using the prediction model developed in this study to predict the occurrence of heavy rain damage for each administrative region, we can greatly reduce the damage through proactive disaster management.
