An Efficient Sequential Clustering Based Classification Model for Diabetes Diagnosis and Prediction

Over the last decade, the development of diverse models and excessive data creation have led to an enormous production of datasets and sources. The healthcare field is rich in information, which needs to be analyzed to identify the patterns present in the data. The massive amount of commonly available healthcare data characterizes a rich data field. Extracting medical patterns is difficult because healthcare data are massive, real, and complicated. Various machine learning (ML) algorithms have been developed to predict the existence of diabetes. Because diabetes datasets are so large, clustering techniques can be applied to group the data before classifying it. A new automated clustering-based classification model is applied for the identification of diabetes. To cluster the healthcare data, a sequential clustering (SC) model is applied. Then, a logistic regression (LR) model is applied for the effective categorization of the clustered data. The experiments were conducted on a benchmark dataset. The simulation outcomes demonstrate that the SC-LR method outperforms prevailing methods for predicting diabetes.
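
The abstract does not say which sequential clustering scheme is used, so the following is only a minimal sketch of one common variant (a single-pass, threshold-based scheme in which each point joins the nearest existing cluster or starts a new one). The function name `sequential_clusters`, the `threshold` parameter, and the toy points are assumptions for illustration, not the authors' method or data. In an SC-LR style pipeline, the resulting cluster labels would then be handed to a logistic regression step.

```python
import math

def sequential_clusters(points, threshold):
    """One-pass sequential clustering (hypothetical sketch, not the paper's code).

    Each point is assigned to the nearest existing centroid if it lies within
    `threshold`; otherwise it starts a new cluster.
    """
    centroids = []  # running mean of each cluster
    counts = []     # number of points per cluster
    labels = []     # cluster index assigned to each point
    for p in points:
        nearest = None
        if centroids:
            dists = [math.dist(p, c) for c in centroids]
            nearest = min(range(len(dists)), key=dists.__getitem__)
        if nearest is None or dists[nearest] > threshold:
            # Too far from every cluster: open a new one seeded at this point.
            centroids.append(list(p))
            counts.append(1)
            labels.append(len(centroids) - 1)
        else:
            # Join the nearest cluster and update its running mean.
            counts[nearest] += 1
            n = counts[nearest]
            centroids[nearest] = [c + (x - c) / n
                                  for c, x in zip(centroids[nearest], p)]
            labels.append(nearest)
    return labels, centroids

# Toy two-feature records (made up for illustration only).
toy = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 0.1)]
labels, centroids = sequential_clusters(toy, threshold=1.0)
print(labels)
```

Because the scheme makes a single pass, the result depends on the order of the records and on the chosen threshold; both would need tuning on real data.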

2021 ◽  
Vol 42 (Supplement_1) ◽  
Author(s):  
M J Espinosa Pascual ◽  
P Vaquero Martinez ◽  
V Vaquero Martinez ◽  
J Lopez Pais ◽  
B Izquierdo Coronel ◽  
...  

Abstract Introduction Out of all patients admitted with Myocardial Infarction, 10 to 15% have Myocardial Infarction with Non-Obstructive Coronary Arteries (MINOCA). Classification algorithms based on deep learning substantially outperform traditional diagnostic algorithms. Numerous machine learning models have therefore been proposed as useful tools for the detection of various pathologies, but to date no study has proposed a diagnostic algorithm for MINOCA. Purpose The aim of this study was to estimate the diagnostic accuracy of several automated learning algorithms (Support-Vector Machine [SVM], Random Forest [RF] and Logistic Regression [LR]) in discriminating between people suffering from MINOCA and those with Myocardial Infarction with Obstructive Coronary Artery Disease (MICAD) at the time of admission and before performing a coronary angiography, whether invasive or not. Methods A Diagnostic Test Evaluation study was carried out by applying the proposed algorithms to a database of 553 consecutive patients admitted to our Hospital with Myocardial Infarction. According to the definitions of the 2016 ESC Position Paper on MINOCA, patients were classified into two groups: MICAD and MINOCA. Out of the total 553 patients, 214 were discarded due to the lack of complete data. The set of machine learning algorithms was trained on 244 patients (training sample: 75%) and tested on 80 patients (test sample: 25%). A total of 64 variables were available for each patient, including demographic, clinical and laboratory features recorded before the angiographic procedure. Finally, the diagnostic precision of each architecture was assessed.
Results The most accurate classification model was the Random Forest algorithm (Specificity [Sp] 0.88, Sensitivity [Se] 0.57, Negative Predictive Value [NPV] 0.93, Area Under the Curve [AUC] 0.85 [CI 0.83–0.88]), followed by the standard Logistic Regression (Sp 0.76, Se 0.57, NPV 0.92, AUC 0.74) and Support-Vector Machine (Sp 0.84, Se 0.38, NPV 0.90, AUC 0.78) (see graph). The variables that contributed most to discriminating MINOCA from MICAD were the traditional cardiovascular risk factors, biomarkers of myocardial injury, hemoglobin and gender. Results were similar when the 19 patients with Takotsubo syndrome were excluded from the analysis. Conclusion A prediction system for diagnosing MINOCA before performing coronary angiographies was developed using machine learning algorithms. Results show higher accuracy in diagnosing MINOCA than conventional statistical methods. This study supports the potential of machine learning algorithms in clinical cardiology. However, further studies are required in order to validate our results. Funding Acknowledgement Type of funding sources: None. Figure: ROC curves of different algorithms
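
For readers less familiar with the reported metrics, the sketch below shows how sensitivity, specificity, and negative predictive value follow from a binary confusion matrix. The function name `diagnostic_metrics` is hypothetical, and the counts in the usage lines are invented for illustration; they are not the study's confusion matrix, which the abstract does not report.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Standard binary diagnostic metrics from confusion-matrix counts.

    tp/fn: positive cases that were detected / missed.
    tn/fp: negative cases correctly cleared / wrongly flagged.
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of true positives detected
        "specificity": tn / (tn + fp),  # share of true negatives cleared
        "npv": tn / (tn + fn),          # share of negative calls that are right
    }

# Invented counts, NOT taken from the study.
metrics = diagnostic_metrics(tp=40, fn=10, tn=45, fp=5)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Note that NPV, unlike sensitivity and specificity, depends on how common the condition is in the tested sample, so it does not transfer directly between populations with different prevalence.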


2021 ◽  
pp. 089198872199355
Author(s):  
Anastasia Bougea ◽  
Efthymia Efthymiopoulou ◽  
Ioanna Spanou ◽  
Panagiotis Zikos

Objective: Our aim was to develop a machine learning algorithm, based only on predictors that can be collected non-invasively in the clinic, for the accurate diagnosis of these disorders. Methods: This is an ongoing prospective cohort study (ClinicalTrials.gov identifier NCT04448340) of 78 PDD and 62 DLB subjects whose diagnostic follow-up is available for at least 3 years after the baseline assessment. We used predictors such as clinico-demographic characteristics and 6 neuropsychological tests (mini mental, PD Cognitive Rating Scale, Brief Visuospatial Memory test, Symbol digit written, Wechsler adult intelligence scale, trail making A and B). We investigated logistic regression, K-Nearest Neighbors (K-NN), Support Vector Machine (SVM), Naïve Bayes classifier, and an Ensemble Model for their ability to successfully predict PDD or DLB diagnosis. Results: The K-NN classification model had an accuracy of 91.2% across all cases based on the 15 best clinical and cognitive scores, achieving 96.42% sensitivity and 81% specificity in discriminating between DLB and PDD. The binomial logistic regression classification model achieved an accuracy of 87.5% based on the 15 best features, showing 93.93% sensitivity and 87% specificity. The SVM classification model had an accuracy of 84.6% across all cases based on the 15 best features, achieving 90.62% sensitivity and 78.58% specificity. A model based on Naïve Bayes classification had 82.05% accuracy, 93.10% sensitivity and 74.41% specificity. Finally, an Ensemble model, synthesized from the individual ones, achieved 89.74% accuracy, 93.75% sensitivity and 85.73% specificity. Conclusion: Machine learning methods predicted PDD or DLB diagnosis with high accuracy, sensitivity and specificity, based on non-invasive clinical and neuropsychological tests that are easy to administer in the clinic.
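
The abstract does not state how the ensemble was "synthesized" from the individual models. One common choice is hard majority voting, sketched below purely as an assumption; the function name `majority_vote` and the example predictions are hypothetical and do not come from the study.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-model label lists by hard majority voting.

    `predictions` holds one list of labels per model, all the same length.
    On a tie, the label that appears first among the votes wins.
    """
    combined = []
    for votes in zip(*predictions):  # all models' votes for one subject
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined

# Made-up predictions from three hypothetical models for three subjects.
model_outputs = [
    ["PDD", "DLB", "DLB"],
    ["PDD", "PDD", "DLB"],
    ["DLB", "PDD", "DLB"],
]
print(majority_vote(model_outputs))
```

With an odd number of models and two classes, ties cannot occur, which is one reason an odd-sized pool is a convenient design for binary diagnosis.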


2021 ◽  
Vol 6 (1) ◽  
pp. 67-79
Author(s):  
Olalekan Awujoola ◽  
Philip O Odion ◽  
Martins E Irhebhude ◽  
Halima Aminu

Several higher institutions of learning face the difficulty of turning out graduates of whom more than 90% can competently meet the requirements of the industry. The industry, in turn, is confronted with the difficulty of sourcing skilled tertiary institution graduates that match its needs. The failure or success of any organization depends mostly on how its workforce is recruited and retained. Therefore, the selection of a satisfactory candidate for a job position is one of the major problems of management decision-making. This work therefore proposes a modern and accurate machine learning classification model that can be deployed to make predictions and assessments of job applicants' attributes from their academic performance datasets in order to meet the selection criteria of the industry. Both supervised and unsupervised machine learning classifiers were considered in this work. Naïve Bayes, Logistic Regression, support vector machine (SVM), Random Forest and Decision Tree all performed well, but Logistic Regression outperformed the others with 93% accuracy.


2021 ◽  
Vol 11 (18) ◽  
pp. 8596
Author(s):  
Swetha Chittam ◽  
Balakrishna Gokaraju ◽  
Zhigang Xu ◽  
Jagannathan Sankar ◽  
Kaushik Roy

The material science community has a strong need for a big data repository of material compositions and the derived analytics of metal strength. Currently, many researchers maintain their own Excel sheets, prepared manually by their team by tabulating the experimental data collected from scientific journals, and analyze the data by performing manual calculations using formulas to determine the strength of the material. In this study, we propose a big data store for material science data and its processing-parameter information to address the laborious process of data tabulation from scientific articles, data mining techniques to retrieve the information from databases to perform big data analytics, and a machine learning prediction model to determine material strength insights. Three models are proposed, based on the Logistic Regression, Support Vector Machine (SVM) and Random Forest algorithms. These models are trained and tested using a 10-fold cross-validation approach. The Random Forest classification model performed better on the independent dataset, with 87% accuracy, in comparison to Logistic Regression and SVM with 72% and 78%, respectively.
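
As a minimal sketch of what a 10-fold cross-validation split looks like, the generator below partitions sample indices into `k` folds and yields each fold once as the test set. The name `k_fold_indices` is hypothetical, the sample count is arbitrary, and no shuffling or stratification is done here, both of which a real study would likely add.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) for each of k folds.

    Indices are dealt out round-robin, so fold sizes differ by at most one.
    """
    indices = list(range(n_samples))
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test

# Each sample is used for testing exactly once across the 10 folds.
for train, test in k_fold_indices(25, k=10):
    print(len(train), len(test))
```

Averaging a model's score over the folds gives a less split-dependent estimate than a single train/test partition, at the cost of training the model `k` times.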


2021 ◽  
Vol 10 (1) ◽  
pp. 65
Author(s):  
Boshra Farajollahi ◽  
Maysam Mehmannavaz ◽  
Hafez Mehrjoo ◽  
Fateme Moghbeli ◽  
Mohammad Javad Sayadi

Introduction: Diabetes is a disease associated with high levels of glucose in the blood. Diabetes causes many kinds of complications, which also lead to a high rate of repeated admission of patients with diabetes. The aim of this study is to diagnose diabetes with machine learning techniques. Material and Methods: The datasets used in the article contain several medical predictor variables and one target variable, Outcome. Predictor variables include the number of pregnancies the patient has had, their BMI, insulin level, and age. The main objective of the machine learning models is to classify diabetes. Results: Six classifiers were adapted and their performance compared based on accuracy, F1-score, recall, precision and AUC. AdaBoost had the highest accuracy, at 83%. Conclusion: This paper compares the performance of different classifier models for classifying the diagnosis. The models considered for comparison are logistic regression, decision tree, support vector machine (SVM), XGBoost, random forest and AdaBoost. In the comparison, AdaBoost, logistic regression, SVM and random forest usually scored high, with little difference between them.
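
Among the metrics listed, AUC is the least self-explanatory: it equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it directly from that definition; the function name `roc_auc` and the toy labels and scores are illustrative assumptions, not data from the paper.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    `labels` are 0/1 outcomes; `scores` are the classifier's scores.
    Tied pairs count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy example: three of the four positive/negative pairs are ordered correctly.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

This pairwise form is quadratic in the number of cases, which is fine for illustration; rank-based formulas give the same value more efficiently on large datasets.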


Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Bo Sun

Music classification is conducive to online music retrieval, but current music classification models find it difficult to accurately identify various types of music, which makes their classification performance poor. In order to improve the accuracy of music classification, a music classification model based on multifeature fusion and machine learning algorithms is proposed. First, we obtain the music signal, then extract various features from it, and use machine learning algorithms to describe the relationship between the type of music signal and the features. A shallow logistic regression music classifier and a deep belief network model are established, respectively. Experiments were designed for these two models to verify their applicability to music classification. Comparing the experimental results shows that the classification accuracy of the deep belief network model is higher than that of the logistic regression model, but the number of iterations needed for its accuracy to converge is also higher. Compared with other current music classification models, this model reduces the time needed to construct the music classifier, speeds up music classification, and can identify various types of music with high precision. The accuracy of music classification is clearly improved, which verifies the superiority of this music classification model.


2019 ◽  
Author(s):  
Oskar Flygare ◽  
Jesper Enander ◽  
Erik Andersson ◽  
Brjánn Ljótsson ◽  
Volen Z Ivanov ◽  
...  

**Background:** Previous attempts to identify predictors of treatment outcomes in body dysmorphic disorder (BDD) have yielded inconsistent findings. One way to increase precision and clinical utility could be to use machine learning methods, which can incorporate multiple non-linear associations in prediction models. **Methods:** This study used a random forests machine learning approach to test if it is possible to reliably predict remission from BDD in a sample of 88 individuals that had received internet-delivered cognitive behavioral therapy for BDD. The random forest models were compared to traditional logistic regression analyses. **Results:** Random forests correctly identified 78% of participants as remitters or non-remitters at post-treatment. The accuracy of prediction was lower in subsequent follow-ups (68%, 66% and 61% correctly classified at 3-, 12- and 24-month follow-ups, respectively). Depressive symptoms, treatment credibility, working alliance, and initial severity of BDD were among the most important predictors at the beginning of treatment. By contrast, the logistic regression models did not identify consistent and strong predictors of remission from BDD. **Conclusions:** The results provide initial support for the clinical utility of machine learning approaches in the prediction of outcomes of patients with BDD. **Trial registration:** ClinicalTrials.gov ID: NCT02010619.


Author(s):  
Dhilsath Fathima.M ◽  
S. Justin Samuel ◽  
R. Hari Haran

Aim: Developing an improved and robust machine learning model for predicting Myocardial Infarction (MI) could have substantial clinical impact. Objectives: This paper explains how to build a machine learning based computer-aided analysis system for early and accurate prediction of Myocardial Infarction (MI), using the Framingham Heart Study dataset for validation and evaluation. The proposed computer-aided analysis model will support medical professionals in predicting myocardial infarction proficiently. Methods: The proposed model uses mean imputation to fill in the missing values in the dataset, then applies principal component analysis (PCA) to extract the optimal features and enhance the performance of the classifiers. After PCA, the reduced features are partitioned into a training dataset and a testing dataset. The 70% training portion is given as input to four popular classifiers (support vector machine, k-nearest neighbor, logistic regression and decision tree), and the 30% test portion is used to evaluate the output of the machine learning model using the following performance metrics: confusion matrix, classifier accuracy, precision, sensitivity, F1-score, and AUC-ROC curve. Results: The outputs of the classifiers were evaluated using these performance measures. We observed that logistic regression provides higher accuracy than the K-NN, SVM and decision tree classifiers, and that PCA performs well as a feature extraction method for enhancing the performance of the proposed model. From these analyses, we conclude that logistic regression has a good mean accuracy and standard deviation of accuracy compared with the other three algorithms. The AUC-ROC curves of the proposed classifiers in Figures 4 and 5 show that logistic regression exhibits a good AUC-ROC score, around 70%, compared to the k-NN and decision tree algorithms.
Conclusion: From the result analysis, we infer that the proposed machine learning model can act as an optimal decision-making system that predicts acute myocardial infarction at an earlier stage than existing machine learning based prediction models. It is capable of predicting the presence of acute myocardial infarction in humans from heart disease risk factors, in order to decide when to start lifestyle modification and medical treatment to prevent heart disease.
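
The first and last preprocessing steps described in the Methods (mean imputation and the 70/30 partition) can be sketched in a few lines; the PCA step in between is omitted here for brevity. The function names, the use of `None` to mark a missing value, the fixed random seed, and the toy rows are all assumptions for illustration and are not taken from the paper.

```python
import random

def mean_impute(rows):
    """Replace each missing value (None) with the mean of its column."""
    means = []
    for column in zip(*rows):
        present = [v for v in column if v is not None]
        means.append(sum(present) / len(present))
    return [[m if v is None else v for v, m in zip(row, means)]
            for row in rows]

def train_test_split(rows, train_fraction=0.7, seed=0):
    """Shuffle row indices reproducibly, then cut into train and test parts."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    cut = round(train_fraction * len(rows))
    return ([rows[i] for i in indices[:cut]],
            [rows[i] for i in indices[cut:]])

# Toy records with gaps (made up; not Framingham data).
raw = [[1.0, None], [3.0, 4.0], [None, 8.0]]
print(mean_impute(raw))  # [[1.0, 6.0], [3.0, 4.0], [2.0, 8.0]]
```

In practice the column means would normally be computed on the training portion only and then reused for the test portion, so that no information from the test set leaks into preprocessing.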


Energies ◽  
2021 ◽  
Vol 14 (7) ◽  
pp. 1809
Author(s):  
Mohammed El Amine Senoussaoui ◽  
Mostefa Brahami ◽  
Issouf Fofana

Machine learning is widely used as a panacea in many engineering applications, including the condition assessment of power transformers. Most statistics attribute the main cause of transformer failure to insulation degradation. Thus, a new, simple, and effective machine-learning approach was proposed to monitor the condition of transformer oils based on some aging indicators. The proposed approach was used to compare the performance of two machine-learning classifiers: J48 decision tree and random forest. The service-aged transformer oils were classified into four groups: the oils that can be maintained in service, the oils that should be reconditioned or filtered, the oils that should be reclaimed, and the oils that must be discarded. Of the two algorithms, random forest exhibited better performance and high accuracy with only a small amount of data. Good performance was achieved through not only the application of the proposed algorithm but also the approach to data preprocessing. Before being fed to the classification model, the available data were transformed using the simple k-means method. Subsequently, the obtained data were filtered through correlation-based feature selection (CFsSubset). The resulting features were transformed again by principal component analysis and passed through the CFsSubset filter once more. The transformation and filtration of the data improved the classification performance of the adopted algorithms, especially random forest. Another advantage of the proposed method is the decrease in the number of datasets required for the condition assessment of transformer oils, which is valuable for transformer condition monitoring.
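
Correlation-based feature selection scores a candidate feature subset by rewarding correlation with the class and penalizing redundancy among the features. The sketch below implements the usual merit heuristic, merit = k·r_cf / sqrt(k + k(k−1)·r_ff), under the assumption that the abstract's "CFsSubset" refers to this standard formulation; the function name `cfs_merit` and the example values are illustrative only.

```python
import math

def cfs_merit(k, mean_feature_class_corr, mean_feature_feature_corr):
    """Merit of a k-feature subset under the standard CFS heuristic.

    mean_feature_class_corr: average correlation between features and class.
    mean_feature_feature_corr: average correlation among the features.
    """
    numerator = k * mean_feature_class_corr
    denominator = math.sqrt(k + k * (k - 1) * mean_feature_feature_corr)
    return numerator / denominator

# Four fully redundant features are worth no more than one of them...
print(cfs_merit(4, 0.5, 1.0))
# ...while four uncorrelated features of the same quality score higher.
print(cfs_merit(4, 0.5, 0.0))
```

The comparison illustrates why the filter tends to shrink the feature set: adding a feature that is highly correlated with those already chosen raises the denominator almost as fast as the numerator.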

