Machine Learning and Its Application in Monitoring Diabetes Mellitus

Author(s):  
Vandana Kalra ◽  
Indu Kashyap ◽  
Harmeet Kaur

Data science is a fast-growing area that deals with data from its origin to knowledge discovery. It comprises two main subdomains: data analytics, which prepares data, and machine learning, which probes the data for hidden patterns. Machine learning (ML) provides powerful algorithms for automatic pattern recognition and for building prediction models from structured and unstructured data. Historical data contain patterns with high predictive value that can inform an industry's future success. These algorithms also help produce accurate prediction, classification, and simulation models by eliminating insignificant and faulty patterns. Machine learning has brought major advances to the healthcare industry by helping doctors diagnose chronic diseases correctly. Diabetes is one of the most common chronic diseases; it occurs when pancreatic cells are damaged and do not secrete the amount of insulin the human body requires. Machine learning algorithms can support early diagnosis of this chronic disease by analyzing the values of its predictor parameters.

Author(s):  
Shreekanth Jogar ◽  
Pavankumar Naik ◽  
Veeramma Vyapari ◽  
Madevi Vaddar ◽  
Kavita Dambal ◽  
...  

With the growth of big data in biomedical and healthcare communities, accurate analysis of medical data benefits early disease detection, patient care, and community services. However, analysis accuracy is reduced when medical data are incomplete. Moreover, different regions exhibit unique characteristics of certain regional diseases, which may weaken the prediction of disease outbreaks. In this paper, we streamline machine learning algorithms for effective prediction of chronic disease outbreaks in disease-frequent communities. We evaluate the modified prediction models on real-life hospital data collected in central China from 2013 to 2015. To overcome the difficulty of incomplete data, we use a latent factor model to reconstruct the missing data. We experiment on a regional chronic disease, cerebral infarction. To the best of our knowledge, no existing work in medical big data analytics has focused on both data types. Compared with several typical prediction algorithms, our proposed algorithm reaches a prediction accuracy of 94.8%, and it converges faster than the CNN-based unimodal disease risk prediction (CNN-UDRP) algorithm.
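The abstract does not specify its latent factor model. A common form is low-rank matrix factorization: each patient (row) and each variable (column) gets a small latent vector, and a missing cell is estimated from the dot product of the two. The sketch below fits such a model by stochastic gradient descent on the observed cells only; the function name `latent_factor_impute`, the rank, learning rate, regularization strength, and epoch count are all illustrative assumptions, not the authors' settings.

```python
import random


def latent_factor_impute(matrix, rank=2, lr=0.01, reg=0.01, epochs=2000, seed=0):
    """Fill None entries of `matrix` with a low-rank latent factor model.

    Hypothetical helper (not the paper's code). Row i gets a latent vector P[i]
    and column j a vector Q[j]; an observed value matrix[i][j] is approximated
    by the dot product P[i] . Q[j]. Factors are fitted by SGD on observed cells.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(matrix), len(matrix[0])
    P = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(n_rows)]
    Q = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(n_cols)]
    observed = [(i, j, v) for i, row in enumerate(matrix)
                for j, v in enumerate(row) if v is not None]

    for _ in range(epochs):
        rng.shuffle(observed)
        for i, j, v in observed:
            err = v - sum(P[i][k] * Q[j][k] for k in range(rank))
            for k in range(rank):
                p, q = P[i][k], Q[j][k]
                # Gradient step on squared error with L2 regularization.
                P[i][k] += lr * (err * q - reg * p)
                Q[j][k] += lr * (err * p - reg * q)

    # Keep observed values; replace only the missing ones with predictions.
    return [[v if v is not None else sum(P[i][k] * Q[j][k] for k in range(rank))
             for j, v in enumerate(row)]
            for i, row in enumerate(matrix)]
```

In practice, clinical variables on very different scales would usually be standardized before factorization so that no single variable dominates the squared-error loss.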


Machine Learning (ML) provides the ability to automatically recognize patterns and build prediction models from structured and unstructured data, even without explicit programming instructions. The impact of Artificial Intelligence (AI) now extends across many fields, from the life sciences to management. Integrating ML has helped reduce or eliminate errors in prediction, classification, and simulation models. This paper presents the objectives of ML and explores various ML techniques and algorithms, along with their applications in different fields.


2018 ◽  
Author(s):  
Liyan Pan ◽  
Guangjian Liu ◽  
Xiaojian Mao ◽  
Huixian Li ◽  
Jiexin Zhang ◽  
...  

BACKGROUND Central precocious puberty (CPP) in girls seriously affects their physical and mental development in childhood. The diagnostic method, the gonadotropin-releasing hormone (GnRH)-stimulation test or the GnRH analogue (GnRHa)-stimulation test, is expensive and makes patients uncomfortable because it requires repeated blood sampling. OBJECTIVE We aimed to combine multiple CPP-related features and construct machine learning models to predict the response to the GnRHa-stimulation test. METHODS In this retrospective study, we analyzed clinical and laboratory data of 1757 girls who underwent a GnRHa test in order to develop XGBoost and random forest classifiers for predicting the response to the GnRHa test. The local interpretable model-agnostic explanations (LIME) algorithm was used with the black-box classifiers to increase their interpretability. We measured the sensitivity, specificity, and area under the receiver operating characteristic curve (AUC) of the models. RESULTS Both the XGBoost and random forest models achieved good performance in distinguishing between positive and negative responses, with the AUC ranging from 0.88 to 0.90, sensitivity ranging from 77.91% to 77.94%, and specificity ranging from 84.32% to 87.66%. Basal serum luteinizing hormone, follicle-stimulating hormone, and insulin-like growth factor-I levels were found to be the three most important factors. In the interpretable LIME models, the abovementioned variables made high contributions to the prediction probability. CONCLUSIONS The prediction models we developed can help diagnose CPP and may be used as a prescreening tool before the GnRHa-stimulation test.
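Sensitivity and specificity depend on the probability threshold at which a classifier's output is called positive. A minimal sketch, assuming 0/1 labels (1 = positive GnRHa response) and model probabilities such as those an XGBoost or random forest classifier would output; the function name and the 0.5 default threshold are illustrative assumptions:

```python
def sensitivity_specificity(y_true, scores, threshold=0.5):
    """Return (sensitivity, specificity) when scores >= threshold are called positive.

    Hypothetical helper; assumes labels are 0/1 and both classes are present.
    """
    y_pred = [1 if s >= threshold else 0 for s in scores]
    pairs = list(zip(y_true, y_pred))
    tp = pairs.count((1, 1))  # positives correctly flagged
    fn = pairs.count((1, 0))  # positives missed
    tn = pairs.count((0, 0))  # negatives correctly cleared
    fp = pairs.count((0, 1))  # negatives wrongly flagged
    return tp / (tp + fn), tn / (tn + fp)
```

Lowering the threshold cannot decrease sensitivity and cannot increase specificity; the AUC summarizes this trade-off across all thresholds.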


Author(s):  
Cheng-Chien Lai ◽  
Wei-Hsin Huang ◽  
Betty Chia-Chen Chang ◽  
Lee-Ching Hwang

Predictors of success in smoking cessation have been studied, but a prediction model capable of providing a success rate for each patient attempting to quit smoking is still lacking. The aim of this study is to develop prediction models using machine learning algorithms to predict the outcome of smoking cessation. Data were acquired from patients who underwent a smoking cessation program at a medical center in Northern Taiwan. A total of 4875 enrollments fulfilled our inclusion criteria. Models with artificial neural network (ANN), support vector machine (SVM), random forest (RF), logistic regression (LoR), k-nearest neighbor (KNN), classification and regression tree (CART), and naïve Bayes (NB) were trained to predict the patients' final smoking status over a six-month period. Sensitivity, specificity, accuracy, and area under the receiver operating characteristic (ROC) curve (AUC, or ROC value) were used to determine the performance of the models. We adopted the ANN model, which performed slightly better, with a sensitivity of 0.704, a specificity of 0.567, an accuracy of 0.640, and an ROC value of 0.660 (95% confidence interval (CI): 0.617–0.702) for predicting smoking cessation outcome. A predictive model for smoking cessation was constructed. The model could help provide a predicted success rate for every smoker. It also has the potential to support personalized and precision medicine in smoking cessation treatment.
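The reported ROC value comes with a 95% confidence interval, but the abstract does not say how the interval was computed. One common option is the percentile bootstrap: resample patients with replacement, recompute the AUC each time, and take the 2.5th and 97.5th percentiles. The sketch below assumes 0/1 labels and scores where higher means more likely to belong to the positive class; the function names are hypothetical, and DeLong's analytic method is a frequently used alternative.

```python
import random


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative (ties = 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(y_true, scores, n_boot=1000, alpha=0.05, seed=0):
    """Return (auc, lower, upper) using a percentile bootstrap over patients (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(y_true)
    aucs = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yt = [y_true[i] for i in idx]
        if 0 < sum(yt) < n:  # AUC is undefined unless both classes are resampled
            aucs.append(roc_auc(yt, [scores[i] for i in idx]))
    aucs.sort()
    k = int(round(alpha / 2 * n_boot))  # number of draws trimmed from each tail
    return roc_auc(y_true, scores), aucs[k], aucs[n_boot - 1 - k]
```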


2020 ◽  
Vol 8 (Suppl 3) ◽  
pp. A62-A62
Author(s):  
Dattatreya Mellacheruvu ◽  
Rachel Pyke ◽  
Charles Abbott ◽  
Nick Phillips ◽  
Sejal Desai ◽  
...  

Background: Accurately identified neoantigens can be effective therapeutic agents in both adjuvant and neoadjuvant settings. A key challenge for neoantigen discovery has been the availability of accurate prediction models for MHC peptide presentation. We have shown previously that our proprietary model, based on (i) large-scale, in-house mono-allelic data, (ii) custom features that model antigen processing, and (iii) advanced machine learning algorithms, has strong performance. We have extended this work by systematically integrating large quantities of high-quality, publicly available data, implementing new modelling algorithms, and rigorously testing our models. These extensions lead to substantial improvements in performance and generalizability. Our algorithm, named Systematic HLA Epitope Ranking Pan Algorithm (SHERPA™), is integrated into the ImmunoID NeXT Platform®, our immuno-genomics and transcriptomics platform specifically designed to enable the development of immunotherapies. Methods: In-house immunopeptidomic data were generated using stably transfected HLA-null K562 cell lines that express a single HLA allele of interest, followed by immunoprecipitation using the W6/32 antibody and LC-MS/MS. Public immunopeptidomics data were downloaded from repositories such as MassIVE and processed uniformly using in-house pipelines to generate peptide lists filtered at a 1% false discovery rate. Other metrics (features) were either extracted from source data or generated internally by re-processing samples with the ImmunoID NeXT Platform. Results: We have generated large-scale, high-quality immunopeptidomics data using approximately 60 mono-allelic cell lines that unambiguously assign peptides to their presenting alleles, and used these data to create our primary models. Briefly, our primary ‘binding’ algorithm models MHC-peptide binding using the peptide and binding pockets, while our primary ‘presentation’ model uses additional features to model antigen processing and presentation.
Both primary models have significantly higher precision across all recall values in multiple test data sets, including mono-allelic cell lines and multi-allelic tissue samples. To further improve the performance of our model, we expanded the diversity of our training set using high-quality, publicly available mono-allelic immunopeptidomics data. Furthermore, multi-allelic data were integrated by resolving peptide-to-allele mappings using our primary models. We then trained a new model using the expanded training data and a new composite machine learning architecture. The resulting secondary model further improves performance and generalizability across several tissue samples. Conclusions: Improving technologies for neoantigen discovery is critical for many therapeutic applications, including personalized neoantigen vaccines and neoantigen-based biomarkers for immunotherapies. Our new and improved algorithm (SHERPA) performs significantly better than a state-of-the-art public algorithm and furthers this objective.
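The abstract states that public data were filtered at a 1% false discovery rate but does not describe the procedure. A widely used approach in mass-spectrometry proteomics is target-decoy estimation: matches against reversed or shuffled "decoy" sequences approximate how many target matches above a score threshold are false. The sketch assumes hypothetical (peptide, score, is_decoy) tuples where a higher score is better, estimates FDR as decoys/targets above each threshold, converts it to q-values, and keeps targets with q ≤ 1%. It is an illustration of the general technique, not the authors' in-house pipeline.

```python
def filter_at_fdr(psms, fdr=0.01):
    """Return target peptides whose target-decoy q-value is <= fdr.

    psms: hypothetical list of (peptide, score, is_decoy); higher score = better match.
    """
    ranked = sorted(psms, key=lambda m: m[1], reverse=True)
    targets = decoys = 0
    fdr_at = []  # estimated FDR if the threshold were set at each rank
    for _, _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdr_at.append(decoys / max(targets, 1))

    # q-value: the smallest estimated FDR at this or any more permissive threshold.
    q = [0.0] * len(ranked)
    running_min = float("inf")
    for i in range(len(ranked) - 1, -1, -1):
        running_min = min(running_min, fdr_at[i])
        q[i] = running_min

    return [pep for (pep, _, is_decoy), qv in zip(ranked, q)
            if not is_decoy and qv <= fdr]
```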


2020 ◽  
Vol 20 (1) ◽  
Author(s):  
Matthijs Blankers ◽  
Louk F. M. van der Post ◽  
Jack J. M. Dekker

Abstract Background Accurate prediction models for whether patients on the verge of a psychiatric crisis need hospitalization are lacking, and machine learning methods may help improve the accuracy of psychiatric hospitalization prediction models. In this paper we evaluate the accuracy of ten machine learning algorithms, including the generalized linear model (GLM/logistic regression), in predicting psychiatric hospitalization in the first 12 months after a psychiatric crisis care contact. We also evaluate an ensemble model to optimize accuracy, and we explore individual predictors of hospitalization. Methods Data from 2084 patients in the longitudinal Amsterdam Study of Acute Psychiatry who had at least one reported psychiatric crisis care contact were included. The target variable for the prediction models was whether the patient was hospitalized in the 12 months following inclusion. The predictive power of 39 variables related to patients’ socio-demographics, clinical characteristics, and previous mental health care contacts was evaluated. The accuracy and area under the receiver operating characteristic curve (AUC) of the machine learning algorithms were compared, and we also estimated the relative importance of each predictor variable. The best and least performing algorithms were compared with GLM/logistic regression using net reclassification improvement analysis, and the five best performing algorithms were combined in an ensemble model using stacking. Results All models performed above chance level. We found Gradient Boosting to be the best performing algorithm (AUC = 0.774) and K-Nearest Neighbors to be the least performing (AUC = 0.702). The performance of GLM/logistic regression (AUC = 0.76) was slightly above average among the tested algorithms. In a Net Reclassification Improvement analysis, Gradient Boosting outperformed GLM/logistic regression by 2.9% and K-Nearest Neighbors by 11.3%. GLM/logistic regression outperformed K-Nearest Neighbors by 8.7%.
Nine of the top-10 most important predictor variables were related to previous mental health care use. Conclusions Gradient Boosting led to the highest predictive accuracy and AUC, while GLM/logistic regression performed about average among the tested algorithms. Although statistically significant, the differences between the machine learning algorithms were in most cases modest in magnitude. The results show that a predictive accuracy similar to that of the best performing model can be achieved by combining multiple algorithms in an ensemble model.
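The abstract reports net reclassification improvement (NRI) without saying whether risk categories were used. As an illustration only, the sketch below computes the category-free (continuous) NRI, which rewards a new model for moving predicted risk in the right direction relative to an old one: up for patients who were hospitalized, down for those who were not. In symbols, NRI = [P(up | event) − P(down | event)] + [P(down | non-event) − P(up | non-event)]. The function name `continuous_nri` and its inputs are hypothetical; the study may instead have used a categorical NRI with risk thresholds.

```python
def continuous_nri(y_true, p_old, p_new):
    """Category-free NRI of p_new versus p_old (hypothetical helper; labels are 0/1)."""
    events = [(o, n) for y, o, n in zip(y_true, p_old, p_new) if y == 1]
    nonevents = [(o, n) for y, o, n in zip(y_true, p_old, p_new) if y == 0]
    # Among events (hospitalized), upward risk moves are correct.
    up_e = sum(n > o for o, n in events) / len(events)
    down_e = sum(n < o for o, n in events) / len(events)
    # Among non-events, downward risk moves are correct.
    up_ne = sum(n > o for o, n in nonevents) / len(nonevents)
    down_ne = sum(n < o for o, n in nonevents) / len(nonevents)
    return (up_e - down_e) + (down_ne - up_ne)
```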


2019 ◽  
Vol 8 (2) ◽  
pp. 4499-4504

Heart diseases are responsible for the greatest number of deaths all over the world. These diseases are usually not detected in early stages because the majority of people cannot afford the cost of medical diagnostics. Research has shown that machine learning methods have great capability to extract valuable information from medical data. This information is used to build prediction models that give medical practitioners a cost-effective technological aid for detecting heart disease in early stages. However, irrelevant and redundant features in medical data degrade the performance of the prediction system. This research aimed to improve the accuracy of existing methods by removing such features. In this study, a brute-force feature selection algorithm was used to determine the relevant, significant features. After rigorous experiments with 7528 possible combinations of features and 5 machine learning algorithms, 8 important features were identified. A prediction model was developed using these significant features. The accuracy of this model was experimentally calculated to be 86.4%, which is higher than the results of existing studies. The prediction model proposed in this study can help predict heart disease efficiently.
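A brute-force (exhaustive) feature selection scores every candidate feature subset and keeps the best one. The sketch below is a generic version under the assumption that `score_fn` is a caller-supplied function, for example the cross-validated accuracy of one classifier trained on the given columns; the function name and signature are hypothetical and not taken from the paper.

```python
from itertools import combinations


def brute_force_select(features, score_fn, min_size=1):
    """Score every subset of `features` with at least `min_size` members.

    Returns (best_score, best_subset). score_fn(subset) is supplied by the caller,
    e.g. a model's cross-validated accuracy on those columns (an assumption).
    """
    best_score, best_subset = float("-inf"), ()
    for k in range(min_size, len(features) + 1):
        for subset in combinations(features, k):
            s = score_fn(subset)
            if s > best_score:  # keep the first subset reaching the top score
                best_score, best_subset = s, subset
    return best_score, best_subset
```

Because n features have 2^n − 1 non-empty subsets, the cost doubles with every added feature, and repeating the search for each of several algorithms multiplies the number of evaluations accordingly; exhaustive search is therefore practical only for small feature sets.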


2021 ◽  
Vol 0 (0) ◽  
Author(s):  
Fathima Aliyar Vellameeran ◽  
Thomas Brindha

Abstract Objectives To provide a clear literature review of state-of-the-art heart disease prediction models. Methods The review covers 61 research papers and presents the key findings. First, the analysis addresses the contributions of each work and examines its simulation environment. It also identifies the types of machine learning algorithms deployed in each contribution. In addition, the datasets used by existing heart disease prediction models are examined. Results The performance measures computed across the papers, such as prediction accuracy, prediction error, specificity, sensitivity, and F-measure, are analyzed. The best reported performance is also identified to assess the effectiveness of the contributions. Conclusions The research challenges and gaps are outlined with respect to developing intelligent methods for the unresolved challenges in heart disease prediction using data mining techniques.


2021 ◽  
Author(s):  
Kate Bentley ◽  
Kelly Zuromski ◽  
Rebecca Fortgang ◽  
Emily Madsen ◽  
Daniel Kessler ◽  
...  

Background: Interest in developing machine learning algorithms that use electronic health record data to predict patients’ risk of suicidal behavior has recently proliferated. Whether and how such models might be implemented and useful in clinical practice, however, remains unknown. In order to ultimately make automated suicide risk prediction algorithms useful in practice, and thus better prevent patient suicides, it is critical to partner with key stakeholders (including the frontline providers who will be using such tools) at each stage of the implementation process. Objective: The aim of this focus group study was to inform ongoing and future efforts to deploy suicide risk prediction models in clinical practice. The specific goals were to better understand hospital providers’ current practices for assessing and managing suicide risk; determine providers’ perspectives on using automated suicide risk prediction algorithms; and identify barriers, facilitators, recommendations, and factors to consider for initiatives in this area. Methods: We conducted 10 two-hour focus groups with a total of 40 providers from psychiatry, internal medicine and primary care, emergency medicine, and obstetrics and gynecology departments within an urban academic medical center. Audio recordings of open-ended group discussions were transcribed and coded for relevant and recurrent themes by two independent study staff members. All coded text was reviewed, and discrepancies were resolved in consensus meetings with doctoral-level staff. Results: Though most providers reported using standardized suicide risk assessment tools in their clinical practices, existing tools were commonly described as unhelpful, and providers indicated dissatisfaction with current suicide risk assessment methods. Overall, providers’ general attitudes toward the practical use of automated suicide risk prediction models and corresponding clinical decision support tools were positive.
Providers were especially interested in the potential to identify high-risk patients who might be missed by traditional screening methods. Some expressed skepticism about the potential usefulness of these models in routine care; specific barriers included concerns about liability, alert fatigue, and increased demand on the healthcare system. Key facilitators included presenting specific patient-level features contributing to risk scores, emphasizing changes in risk over time, and developing systematic clinical workflows and provider trainings. Participants also recommended considering risk-prediction windows, timing of alerts, who will have access to model predictions, and variability across treatment settings. Conclusions: Providers were dissatisfied with current suicide risk assessment methods and open to the use of a machine learning-based risk prediction system to inform clinical decision-making. They also raised multiple concerns about potential barriers to the usefulness of this approach and suggested several possible facilitators. Future efforts in this area will benefit from incorporating systematic qualitative feedback from providers, patients, administrators, and payers on the use of new methods in routine care, especially given the complex, sensitive, and unfortunately still stigmatized nature of suicide risk.


Author(s):  
Prof. Dr. R. Sandhiya

In recent times, the diagnosis of heart disease has become a very critical task in the medical field. In the modern age, one person dies every minute due to heart disease. Data science has an important role in processing large amounts of data in the health sciences. Since the diagnosis of heart disease is a complex task, the assessment process should be automated to avoid the associated risks and to alert the patient in advance. This paper uses the heart disease dataset available in the UCI Machine Learning Repository. The proposed work assesses a patient's risk of heart disease by applying various data mining methods such as Naive Bayes, Decision Tree, KNN, Linear SVM, RBF SVM, Gaussian Process, Neural Network, AdaBoost, QDA, and Random Forest. This paper provides a comparative study by analyzing the performance of these machine learning algorithms. Test results show that the KNN algorithm achieved the highest accuracy, 97%, among the implemented ML algorithms.
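KNN classifies a new patient by majority vote among the k most similar training records. A minimal from-scratch sketch, assuming numeric feature vectors, Euclidean distance, and k = 3; the paper's actual k, distance metric, and preprocessing are not stated, and the function name is hypothetical:

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Predict the label of x by majority vote among its k nearest training points."""
    # Sort training records by Euclidean distance to the query point.
    neighbors = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```

Because Euclidean distance is sensitive to units, features measured on different scales are usually standardized before KNN is applied; otherwise the features with the largest numeric ranges dominate the vote.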

