Data Mining Technique for Diabetes Diagnosis using Classification Algorithms

2019 ◽  
Vol 8 (4) ◽  
pp. 9044-9049

Diabetes mellitus is one of the deadliest chronic diseases and is characterized by an abnormally high level of sugar (glucose) in the blood. Classification techniques help in diagnosing the disease from its symptoms at an early stage. This paper focuses on predicting the likelihood of diabetes in patients with high classification accuracy. The classification algorithms Naïve Bayes, Logistic Regression, and Decision Tree can be used to detect diabetes at an early stage. The performance of each algorithm is evaluated using measures such as Recall, Precision, and F-Measure. Experiments are conducted in which the time complexity of each algorithm is measured. Accuracy is also measured over correctly classified and misclassified instances, and the Logistic Regression algorithm is observed to perform much better than the other classifiers. The results are verified systematically using Receiver Operating Characteristic curves.
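The Recall, Precision, and F-Measure named in the abstract above can be computed from predicted and true labels roughly as follows. This is a minimal sketch in plain Python, not the paper's implementation; the function name `precision_recall_f1` and the sample labels are hypothetical and chosen only for illustration.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for a single positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical labels (1 = diabetic, 0 = non-diabetic), not data from the paper.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```

With these made-up labels there are three true positives, one false positive, and one false negative, so all three measures coincide; on real data they usually differ, which is why the abstract reports them separately.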

2010 ◽  
Vol 16 (5) ◽  
pp. 910-920 ◽  
Author(s):  
MICHAEL M. EHRENSPERGER ◽  
MANFRED BERRES ◽  
KIRSTEN I. TAYLOR ◽  
ANDREAS U. MONSCH

Abstract: The goal of the present study was to evaluate the diagnostic discriminability of three different global scores for the German version of the Consortium to Establish a Registry on Alzheimer’s Disease-Neuropsychological Assessment Battery (CERAD-NAB). The CERAD-NAB was administered to 1100 healthy control participants [NC; Mini-Mental State Examination (MMSE) mean = 28.9] and 352 patients with very mild Alzheimer’s disease (AD; MMSE mean = 26.1) at baseline and subsets of participants at follow-up an average of 2.4 (NC) and 1.2 (AD) years later. We calculated the following global scores: Chandler et al.’s (2005) score (summed raw scores), logistic regression on principal components analysis scores (PCA-LR), and logistic regression on demographically corrected CERAD-NAB variables (LR). Correct classification rates (CCR) were compared with areas under the receiver operating characteristics curves (AUC). The CCR of the LR score (AUC = .976) exceeded that of the PCA-LR, while the PCA-LR (AUC = .968) and Chandler (AUC = .968) scores performed comparably. Retest data improved the CCR of the PCA-LR and Chandler (trend) scores. Thus, for the German CERAD-NAB, Chandler et al.’s total score provided an effective global measure of cognitive functioning, whereby the inclusion of retest data tended to improve correct classification of individual cases. (JINS, 2010, 16, 910–920.)
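The two quantities compared in the abstract above, the correct classification rate (CCR) at a cutoff and the area under the ROC curve (AUC), can be sketched as below. This is an illustrative plain-Python sketch, not the study's procedure. It assumes higher scores indicate the patient group. The function names, the 0.5 cutoff, and the scores are hypothetical.

```python
def roc_auc(scores_pos, scores_neg):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def correct_classification_rate(scores_pos, scores_neg, threshold):
    """Fraction of cases on the correct side of a single cutoff."""
    correct = sum(1 for s in scores_pos if s >= threshold)
    correct += sum(1 for s in scores_neg if s < threshold)
    return correct / (len(scores_pos) + len(scores_neg))


# Hypothetical model scores, not values from the study.
patients = [0.9, 0.8, 0.4]
controls = [0.7, 0.3, 0.2]
auc = roc_auc(patients, controls)
ccr = correct_classification_rate(patients, controls, threshold=0.5)
print(auc, ccr)
```

The AUC summarizes ranking quality over all cutoffs, while the CCR depends on the single cutoff chosen, so two scores with the same AUC can still differ in CCR.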


2021 ◽  
Vol 4 (1) ◽  
pp. 14
Author(s):  
Husna Afanyn Khoirunissa ◽  
Amanda Rizky Widyaningrum ◽  
Annisa Priliya Ayu Maharani

A bank is a business entity that deals with money: it accepts deposits from customers, provides funds for withdrawals, collects checks on customers' orders, extends credit, and invests excess deposits until they are required for repayment. The purpose of this research is to determine the influence of age, gender, country, customer credit score, number of bank products used by the customer, and the customer's active-member status on the decision to keep or close a bank account. The data cover 10,000 respondents from France, Spain, and Germany. The method is data mining, with early-stage preprocessing to clean outliers and missing values from the data and feature selection to choose important attributes. Classification is then performed using three methods: Random Forest, Logistic Regression, and Multilayer Perceptron. The results show that the Multilayer Perceptron model with 10-fold cross-validation is the best model, with 85.5373% accuracy.

Keywords: bank customer, random forest, logistic regression, multilayer perceptron
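The 10-fold cross-validation mentioned in the abstract above partitions the data so that each record is used for testing exactly once. A minimal plain-Python sketch of the index bookkeeping follows. It assumes contiguous, unshuffled folds, whereas real tools typically shuffle or stratify first, and the function name `k_fold_indices` is hypothetical.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs for k contiguous folds.

    The first n_samples % k folds get one extra sample so that
    every sample lands in exactly one test fold.
    """
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


# 10,000 records, matching the dataset size quoted in the abstract.
folds = list(k_fold_indices(10_000, k=10))
print(len(folds), len(folds[0][0]), len(folds[0][1]))
```

A model would be trained on each `train_idx` and scored on the matching `test_idx`, with the reported accuracy taken as the average over the ten folds.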


Author(s):  
O.A. Andreev ◽  
A.T. Trofimov

The paper addresses the issue of ensuring the required probability of correct classification of marine objects in low-frequency passive sonar systems. The solution is sought through the application of methods for the synthesis of neural network classification algorithms using poly-Gaussian probabilistic models (Gaussian mixture models, GMM). It is shown that the use of GMM makes it possible to solve a number of problems specific to the issue; classification algorithms synthesized using these methods can be implemented in the form of neural networks, which in turn can be described in C++/VHDL to create endpoint computing devices or software systems. The results of modeling the synthesized classification algorithms on experimental data are presented; it is demonstrated that such algorithms make it possible to increase the probability of correct classification of marine objects and to satisfy typical requirements for classification systems in low-frequency passive sonar systems.
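To make the role of a Gaussian mixture model in classification concrete, the sketch below applies a plain Bayes decision rule with one-dimensional GMM class likelihoods. It is not the neural network synthesis described in the abstract above, only an illustration of the underlying probabilistic model. The class names, mixture parameters, and priors are hypothetical.

```python
import math


def gmm_pdf(x, components):
    """Density of a 1-D Gaussian mixture; components are (weight, mean, std) tuples."""
    return sum(
        w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
        for w, m, s in components
    )


def classify(x, class_models, priors):
    """Pick the class maximizing prior * mixture likelihood (Bayes decision rule)."""
    return max(class_models, key=lambda c: priors[c] * gmm_pdf(x, class_models[c]))


# Hypothetical class-conditional mixtures over a single scalar feature.
class_models = {
    "A": [(0.5, 0.0, 1.0), (0.5, 4.0, 1.0)],  # two-component mixture
    "B": [(1.0, 10.0, 1.0)],                  # single Gaussian
}
priors = {"A": 0.5, "B": 0.5}
print(classify(0.2, class_models, priors), classify(9.5, class_models, priors))
```

The mixture for class "A" has two modes, which a single Gaussian could not represent; that flexibility is the usual motivation for poly-Gaussian models.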


Open Medicine ◽  
2021 ◽  
Vol 16 (1) ◽  
pp. 459-463
Author(s):  
Arash Hooshmand

Abstract: A new logistic regression-based method to distinguish between cancerous and noncancerous RNA genomic data is developed and tested with 100% precision on 595 healthy and cancerous prostate samples. A logistic regression system is developed and trained using whole-exome sequencing data at a high level, i.e., normalized quantification of RNAs obtained from 495 prostate cancer samples from The Cancer Genome Atlas and 100 healthy samples from the Genotype-Tissue Expression project. We show that both the sensitivity and the specificity of the method in the classification of cancerous and noncancerous cells are 100%.


Author(s):  
B. Yang ◽  
X. Yu

Networks play the role of a high-level language, as seen in artificial intelligence and statistics, because they are used to build complex models from simple components. In recent years, Bayesian Networks, one kind of probabilistic network, have become a powerful data mining technique for handling uncertainty in complex domains. In this paper, we apply Bayesian Networks Augmented Naive Bayes (BAN) to texture classification of high-resolution satellite images and propose a new method to construct the network topology in terms of training accuracy based on the training samples. In the experiment, we use GeoEye-1 satellite images. Experimental results demonstrate that BAN outperforms NBC in overall classification accuracy. Although the method is time-consuming, it is expected to be attractive and effective in the future.


Data mining helps to solve many problems in the area of medical diagnosis using real-world data. However, much of the data is unreliable, as it lacks desirable features and contains many gaps and errors. A complete set of data is a prerequisite for precise grouping and classification of a dataset. Preprocessing is a data mining technique that transforms the unrefined dataset into reliable and useful data. It resolves these issues and prepares raw data for the next level of processing. Discretization is a necessary step in the data preprocessing task. It reduces a large range of numeric values to a small set of well-organized values, and it offers remarkable improvements in classification speed and accuracy. This paper investigates the impact of preprocessing on the classification process. This work implements three techniques, Naive Bayes, Logistic Regression, and SVM, to classify a diabetes dataset. The experimental system is validated using discretization techniques and various classification algorithms.
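Discretization, as described in the abstract above, maps a numeric attribute onto a small number of bins. The abstract does not say which scheme was used, so the plain-Python sketch below assumes the simplest one, equal-width binning. The function name `equal_width_bins` and the sample values are hypothetical.

```python
def equal_width_bins(values, n_bins):
    """Map each numeric value to a bin index in 0..n_bins-1 using equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: everything falls into the first bin.
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Hypothetical glucose-like readings, not taken from any dataset.
readings = [0, 50, 99, 100, 150, 200]
bins = equal_width_bins(readings, n_bins=4)
print(bins)
```

Each bin here spans 50 units, so 99 and 50 share a bin while 100 starts the next one; a classifier then sees four categories instead of a continuous range.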


2014 ◽  
Vol 14 (2) ◽  
pp. 5419-5431 ◽  
Author(s):  
Maha Fouad ◽  
Dr.Mahmoud M. Abd ellatif ◽  
Prof.Mohamed Hagag ◽  
Dr.Ahmed Akl

Predicting the outcome of a graft transplant with a high level of accuracy is a challenging task in medicine, and data mining has a great role to play in meeting that challenge. The goal of this study is to compare the performance and features of two data mining techniques, Decision Tree and Rule Based Classifiers, with Logistic Regression, a standard statistical data mining method, in predicting the outcome of kidney transplants over a 5-year horizon. The dataset was compiled from the Urology and Nephrology Center (UNC), Mansoura, Egypt. Classifiers were developed using the Weka machine learning software workbench by applying Rule Based Classifiers (RIPPER, DTNB), Decision Tree Classifiers (BF, J48), and Logistic Regression. The experimental results show that Decision Tree and Rule Based classifiers provide improved accuracy and interpretable models compared with the other classifier.


Two different novel methods for classification of aircraft categories of Inverse Synthetic Aperture Radar (ISAR) images are presented. The first method forms numerical equivalents to shape, size, and other aircraft features as critical criteria to constitute the algorithm for their correct classification. The second method compares each ISAR image to unions of images of the different aircraft categories. ISAR images are constructed based on the Doppler shifts of various parts, caused by the rotation of the aircraft and the radar reflection pulse shape, which includes the size or duration of the radar pulse. The proposed classification algorithms were tested on seven aircraft categories. All seven different aircraft models are flying a holding pattern. The aim of both algorithms is to quickly match and determine the similarity of the captured aircraft to the seven different categories where the aircraft is in any position of a prescribed holding pattern. Experimental results clearly indicate that in most parts of the holding pattern the category of the aircraft can be successfully identified with both proposed methods. The union method shows more successful identification results and is superior to the results we obtained in the first proposed method.


Information ◽  
2021 ◽  
Vol 12 (12) ◽  
pp. 490
Author(s):  
Cristián Castillo-Olea ◽  
Roberto Conte-Galván ◽  
Clemente Zuñiga ◽  
Alexandra Siono ◽  
Angelica Huerta ◽  
...  

Background: The current pandemic caused by SARS-CoV-2 is an acute illness of global concern. COVID-19 is an infectious disease caused by a recently discovered coronavirus. Most people who get sick from COVID-19 experience either mild, moderate, or severe symptoms. In order to help make quick decisions regarding treatment and isolation needs, it is useful to determine which significant variables indicate infection cases in the population served by the Tijuana General Hospital (Hospital General de Tijuana). An Artificial Intelligence (Machine Learning) mathematical model was developed in order to identify early-stage significant variables in COVID-19 patients. Methods: The individual characteristics of the study subjects included age, gender, age group, symptoms, comorbidities, diagnosis, and outcomes. A mathematical model that uses supervised learning algorithms, allowing the identification of the significant variables that predict the diagnosis of COVID-19 with high precision, was developed. Results: Automatic algorithms were used to analyze the data. For Systolic Arterial Hypertension (SAH), the Logistic Regression algorithm obtained 91.0% area under the ROC curve (AUC), 80% accuracy (CA), 80% F1, 80% recall, and 80.1% precision for the selected variables, while for Diabetes Mellitus (DM) it obtained 91.2% AUC, 89.2% accuracy, 88.8% F1, 89.7% precision, and 89.2% recall for the selected variables. The neural network algorithm showed better results for patients with Obesity, obtaining 83.4% AUC, 91.4% accuracy, 89.9% F1, 90.6% precision, and 91.4% recall. Conclusions: Statistical analyses revealed that the significant predictive symptoms in patients with SAH, DM, and Obesity were more substantial in fatigue and myalgias/arthralgias. In contrast, the third dominant symptom in people with SAH and DM was odynophagia.

