A Comparison on Supervised and Semi-Supervised Machine Learning Classifiers for Gestational Diabetes Prediction

Author(s):  
Lokesh Kola

Abstract: Diabetes is one of the deadliest chronic diseases in the world. According to the World Health Organization (WHO), around 422 million people currently suffer from diabetes, particularly in low- and middle-income countries, and the number of deaths due to diabetes is close to 1.6 million. Recent research has shown that diabetes occurs in people aged 18 and over, and that its prevalence rose from 4.7% to 8.5% between 1980 and 2014. Early diagnosis is necessary so that the disease does not progress to advanced stages, which are difficult to cure. Significant research has been performed on diabetes prediction. As time passes, the challenges of building a system that detects diabetes systematically keep increasing. Interest in using Machine Learning to analyse medical data and diagnose disease is growing day by day. Previous research has focused on identifying diabetes without specifying its type. In this paper, we predict gestational diabetes (Type-3) by comparing various supervised and semi-supervised machine learning algorithms on two datasets, binned and non-binned, and compare their performance based on evaluation metrics.

Keywords: Gestational diabetes, Machine Learning, Supervised Learning, Semi-Supervised Learning, Diabetes Prediction

2020 ◽  
Vol 17 (6) ◽  
pp. 2519-2522
Author(s):  
Kalpna Guleria ◽  
Avinash Sharma ◽  
Umesh Kumar Lilhore ◽  
Devendra Prasad

Approximately 2.1 million women are affected by breast cancer every year, and it has become one of the major causes of cancer-related deaths among women. The World Health Organization's (WHO) 2018 report reveals that around 15% of deaths among women are due to breast cancer. Lack of awareness is one of the major reasons that breast cancer is detected at a later stage. Another major reason is limited access to health resources, which makes the problem worse. Early or timely detection of breast cancer is of utmost importance for increasing the survival rate of patients. The WHO cancer awareness guidelines recommend that women aged 40–49 years or 70–75 years undergo mammographic screening, which allows timely detection of the problem if it exists. This article uses the Breast Cancer dataset from the UCI machine learning repository to predict and diagnose the class of breast cancer, benign or malignant, using supervised learning. The supervised machine learning algorithms K-Nearest Neighbor (K-NN), Naive Bayes, logistic regression, and decision tree have been used for breast cancer prediction. These classification algorithms are evaluated on several performance measures: accuracy, sensitivity, specificity, and F-measure.
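The four performance measures named above all derive from the counts in a binary confusion matrix. The following is a minimal sketch, assuming malignant is treated as the positive class; the function name `binary_metrics` and the counts are hypothetical and are not taken from the article.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute common binary classification measures from confusion-matrix counts.

    tp/fp/tn/fn are true positives, false positives, true negatives,
    and false negatives. All denominators are assumed to be non-zero.
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # also called recall or true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)
    # F-measure (F1): harmonic mean of precision and sensitivity
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_measure": f_measure,
    }


# Hypothetical counts, for illustration only.
metrics = binary_metrics(tp=40, fp=5, tn=50, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Reporting sensitivity and specificity alongside accuracy matters in screening settings, because accuracy alone can look high when one class dominates the data.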


The World Health Organization's (WHO) 2018 report on diabetes states that the number of diabetic cases has increased from 108 million to 422 million since 1980. The fact sheet shows a major increase in diabetes prevalence among adults (aged 18 years and over), from 4.7% to 8.5%. Major health hazards caused by diabetes include kidney failure, heart disease, blindness, stroke, and lower limb amputation. This article applies supervised machine learning algorithms to the Pima Indian Diabetic dataset to explore patterns of risk using predictive models. The predictive models are built with the supervised machine learning algorithms Naïve Bayes, Decision Tree, Random Forest, Gradient Boosted Tree, and Tree Ensemble. The analytical patterns of these predictive models are then presented in terms of several performance parameters: accuracy, precision, recall, and F-measure.


2022 ◽  
pp. 383-393
Author(s):  
Lokesh M. Giripunje ◽  
Tejas Prashant Sonar ◽  
Rohit Shivaji Mali ◽  
Jayant C. Modhave ◽  
Mahesh B. Gaikwad

The risk posed by heart disease is increasing throughout the world. According to the World Health Organization report, the number of deaths from heart disease is increasing drastically compared with other diseases. Multiple factors are responsible for heart-related issues. Many approaches have been suggested for the prediction of heart disease, but none has been satisfactory in clinical terms. The available heart disease therapies and operations are costly, and follow-up care after treatment is also costly. This chapter provides a comprehensive survey of existing machine learning algorithms and compares them in terms of accuracy. The authors found that the random forest classifier is the most accurate model and therefore use random forest for further processing. The machine learning model was deployed as a web application with the help of Flask, HTML, GitHub, and Heroku servers. The webpages take input attributes from users and give output regarding the patient's heart condition, together with the accuracy of the prediction of coronary heart disease in the next 10 years.


2021 ◽  
Author(s):  
Naser Zaeri

The coronavirus disease 2019 (COVID-19) outbreak has been designated a worldwide pandemic by the World Health Organization (WHO) and has prompted an international call for a global health emergency. In this regard, recent advances in artificial intelligence and machine learning give researchers and scientists an opportunity to step into this battlefield and convert the related data into meaningful knowledge through computational models, for the tasks of containing the virus, diagnosing it, and providing treatment. In this study, we present recent developments and practical implementations of artificial intelligence modeling and machine learning algorithms proposed by researchers and practitioners during the pandemic, which show serious potential as solutions for diagnosis and decision making using computerized tomography (CT) scan imaging. We review modern algorithms in CT scan imaging modeling that may be used for the detection, quantification, and tracking of Coronavirus, and study how they can differentiate Coronavirus patients from those who do not have the disease.


2018 ◽  
Author(s):  
Roberto Acuña

BACKGROUND According to the World Health Organization (WHO), close to 800,000 people worldwide die by suicide each year. Many more attempt it. Consequently, the WHO recognizes suicide as a global public health priority that affects not only rich countries but poor and middle-income countries as well. OBJECTIVE The aim of this study is to evaluate several supervised classifiers for detecting messages with suicidal ideation, in order to determine whether these systems can be used in automatic suicide prevention systems. METHODS We used machine learning techniques to make a systematic analysis of 28 supervised classifier algorithms with default parameters. The Life Corpus, used in this research, is a bilingual corpus (English and Spanish) oriented to suicide. The corpus was constructed by two annotation experts, who retrieved texts from several social networks. The corpus quality was measured using mutual annotation agreement. RESULTS The experiments determined that the classifier with the best performance was KStar, with the corpus version POS-SYNSETS-NUM; the cycle with two classes (Urgent and No Risk) achieved the best results, with a PRC Area of 0.81036 and an F-measure of 0.7148. CONCLUSIONS The present research fulfilled the objective of discovering which characteristics are the most suitable for the automatic classification of messages with suicidal ideation, using the Life Corpus. The results of this evaluation demonstrate that the Life Corpus and machine learning techniques could be suitable for detecting suicide ideation messages.


2020 ◽  
Vol 1 (2) ◽  
pp. 1-4
Author(s):  
Priyam Guha ◽  
Abhishek Mukherjee ◽  
Abhishek Verma

This research paper deals with using supervised machine learning algorithms to detect the authenticity of bank notes. In this research we achieved very high accuracy (on the order of 99%) by applying some data preprocessing techniques and then running the processed data through supervised learning algorithms such as SVM, Decision Trees, Logistic Regression, and KNN. We then analyze the misclassified points. We examine the confusion matrix to find out which algorithms produced more false positives and which produced more false negatives.


2020 ◽  
Author(s):  
Andre Lamurias ◽  
Sofia Jesus ◽  
Vanessa Neveu ◽  
Reza M Salek ◽  
Francisco M Couto

In 2016, the International Agency for Research on Cancer, part of the World Health Organization, released the Exposome-Explorer, the first database dedicated to biomarkers of exposure for environmental risk factors for diseases. The database contents resulted from a manual literature search that yielded over 8500 citations, but only a small fraction of these publications were used in the final database. Manually curating a database is time-consuming and requires domain expertise to gather relevant data scattered throughout millions of articles. This work proposes a supervised machine learning approach to assist the previous manual literature retrieval process. The manually retrieved corpus of scientific publications used in the Exposome-Explorer was used as training and testing sets for the machine learning models (classifiers). Several parameters and algorithms were evaluated to predict an article’s relevance based on different datasets made of titles, abstracts and metadata. The top performance classifier was built with the Logistic Regression algorithm using the title and abstract set, achieving an F2-score of 70.1%. Furthermore, from 705 articles classified as relevant, we extracted 545 biomarkers, including 460 new candidate entries to the Exposome-Explorer database. Our methodology reduced the number of articles to be manually screened by the database curators by nearly 90%, while only misclassifying 22.1% of the relevant articles. We expect that this methodology can also be applied to similar biomarkers datasets or be adapted to assist the manual curation process of similar chemical or disease databases.
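The F2-score mentioned above is the F-beta measure with beta = 2, which weights recall more heavily than precision. That weighting suits literature screening, where missing a relevant article is usually costlier than reviewing an irrelevant one. The following is a minimal sketch of the standard formula; the function name `f_beta` and the precision and recall values are hypothetical and are not results from the paper.

```python
def f_beta(precision, recall, beta=2.0):
    """F-beta score: weighted harmonic mean of precision and recall.

    beta > 1 favors recall, beta < 1 favors precision, beta = 1 gives F1.
    """
    if precision == 0 and recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical classifier with low precision but perfect recall.
precision, recall = 0.5, 1.0
print(f"F1 = {f_beta(precision, recall, beta=1.0):.3f}")
print(f"F2 = {f_beta(precision, recall, beta=2.0):.3f}")
```

With these illustrative inputs the F2 value is higher than the F1 value, because the recall-heavy weighting rewards the classifier for not missing positives.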


2021 ◽  
Vol 309 ◽  
pp. 01218
Author(s):  
P. Lakshmi Sruthi ◽  
K. Butchi Raju

COVID-19 is a global epidemic that has spread to over 170 nations. In practically all of the affected countries, the number of infections and deaths has been rising rapidly. Forecasting approaches can support the development of more effective strategies and more informed decisions. These approaches examine historical data in order to make more accurate predictions about what will happen in the future. Such forecasts could aid in preparing for potential risks and consequences. Forecasting techniques are crucial for producing accurate findings. This study classifies forecasting strategies based on big data analytics of data acquired from national databases or the World Health Organization, as well as on machine learning or data science techniques. This study shows the ability to predict the number of cases affected by COVID-19 as a potential risk to humanity.


2021 ◽  
Author(s):  
Meng Ji ◽  
Pierrette Bouillon

BACKGROUND Linguistic accessibility has an important impact on the reception and utilization of translated health resources among multicultural and multilingual populations. The linguistic understandability of health translation has been under-studied. OBJECTIVE Our study aimed to develop novel machine learning models for the study of the linguistic accessibility of health translations, comparing Chinese translations of the World Health Organization health materials with original Chinese health resources developed by the Chinese health authorities. METHODS Using natural language processing tools for the assessment of the readability of Chinese materials, we explored and compared the readability of Chinese health translations from the World Health Organization with original Chinese materials from the China Centre for Disease Control and Prevention. RESULTS Pairwise adjusted t tests showed that three new machine learning models achieved statistically significant improvement over the baseline logistic regression in terms of AUC: C5.0 decision tree (p=0.000, 95% CI: -0.249, -0.152), random forest (p=0.000, 95% CI: 0.139, 0.239) and XGBoost Tree (p=0.000, 95% CI: 0.099, 0.193). There was, however, no significant difference between C5.0 decision tree and random forest (p=0.513). Extreme gradient boost tree was the best model, having achieved statistically significant improvement over the C5.0 model (p=0.003) and the Random Forest model (p=0.006) at the adjusted Bonferroni p value of 0.008. CONCLUSIONS The development of machine learning algorithms significantly improved the accuracy and reliability of current approaches to the evaluation of the linguistic accessibility of Chinese health information, especially Chinese health translations in relation to original health resources. Although the new algorithms developed were based on Chinese health resources, they can be adapted for other languages to advance current research in accessible health translation, communication, and promotion.
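The adjusted Bonferroni threshold of 0.008 reported above is consistent with dividing a significance level of 0.05 by the six pairwise comparisons possible among four models. The abstract does not state how the threshold was derived, so both the 0.05 level and the count of six comparisons are assumptions in this sketch; the p values are the ones quoted in the abstract.

```python
from itertools import combinations

# The four models compared in the abstract.
models = ["logistic regression", "C5.0 decision tree", "random forest", "XGBoost"]

# Every unordered pair of models is one pairwise comparison.
pairs = list(combinations(models, 2))

alpha = 0.05  # assumed family-wise significance level (not stated in the abstract)
adjusted_alpha = alpha / len(pairs)  # Bonferroni: divide alpha by the number of tests

print(f"{len(pairs)} comparisons, adjusted threshold = {adjusted_alpha:.4f}")

# p values quoted in the abstract for the comparisons among the three new models.
reported = {
    "XGBoost vs C5.0": 0.003,
    "XGBoost vs random forest": 0.006,
    "C5.0 vs random forest": 0.513,
}
significant = {name: p < adjusted_alpha for name, p in reported.items()}
for name, flag in significant.items():
    print(f"{name}: p={reported[name]} significant={flag}")
```

Under these assumptions, the two XGBoost comparisons fall below the adjusted threshold and the C5.0 versus random forest comparison does not, which matches the conclusions stated in the abstract.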


2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Tahia Tazin ◽  
Md Nur Alam ◽  
Nahian Nakiba Dola ◽  
Mohammad Sajibul Bari ◽  
Sami Bourouis ◽  
...  

Stroke is a medical disorder in which the blood vessels in the brain are ruptured, causing damage to the brain. When the supply of blood and other nutrients to the brain is interrupted, symptoms might develop. According to the World Health Organization (WHO), stroke is the greatest cause of death and disability globally. Early recognition of the various warning signs of a stroke can help reduce its severity. Different machine learning (ML) models have been developed to predict the likelihood of a stroke occurring in the brain. This research uses a range of physiological parameters and machine learning algorithms, namely Logistic Regression (LR), Decision Tree (DT) Classification, Random Forest (RF) Classification, and Voting Classifier, to train four different models for reliable prediction. Random Forest was the best performing algorithm for this task, with an accuracy of approximately 96 percent. The dataset used in the development of the method was the open-access Stroke Prediction dataset. The accuracy of the models used in this investigation is significantly higher than that of previous studies, indicating that the models used in this investigation are more reliable. Numerous model comparisons have established their robustness, and the scheme can be deduced from the study analysis.

