Data Mining Techniques for Early Diagnosis of Diabetes: A Comparative Study

2021 ◽  
Vol 11 (5) ◽  
pp. 2218
Author(s):  
Luís Chaves ◽  
Gonçalo Marques

Diabetes is a life-long condition that is well known in the 21st century. Once known as a disease of the West, diabetes has been on the rise, fed by a nutrition shift, rapid urbanization, and increasingly sedentary lifestyles. In late 2019, a new public health concern (COVID-19) emerged, posing a particular hazard to people living with diabetes. Medical institutes have been collecting data for years. Using data mining processes, we expect to predict pathological complications, which will hopefully prevent the loss of lives and improve quality of life. This work proposes a comparative study of data mining techniques for early diagnosis of diabetes. We use a publicly accessible data set containing 520 instances, each with 17 attributes. Naive Bayes, Neural Network, AdaBoost, k-Nearest Neighbors, Random Forest, and Support Vector Machine methods have been tested. The results suggest that Neural Networks should be used for diabetes prediction. The proposed model presents an AUC of 98.3% and 98.1% accuracy, an F1-Score, Precision, and Sensitivity of 98.4%, and a Specificity of 97.5%.
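The reported accuracy, precision, sensitivity, specificity, and F1-Score all derive from the four cells of a binary confusion matrix. The sketch below shows those standard definitions only; the function name and the example counts are hypothetical and are not taken from the study, and AUC is left out because it needs ranked scores rather than counts.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    # Guard each ratio against an empty denominator.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # also called recall
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    if precision + sensitivity:
        f1 = 2 * precision * sensitivity / (precision + sensitivity)
    else:
        f1 = 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical counts, for illustration only (not the paper's results).
metrics = classification_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and specificity are reported separately because, in a diagnostic setting, a missed case (false negative) and a false alarm (false positive) carry different costs.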

Author(s):  
Anchal Dahiya ◽  
Pooja Mittal

The hard times of the pandemic taught us that a smart system that helps with automatic vehicle parking could be a great help to society. This idea motivated the current work. IoT techniques are now a buzzword in almost every application domain, and they can also be used to predict free parking space in advance. The biggest challenge with IoT techniques, however, is that they generate large volumes of data, which makes analysis difficult. Fusing IoT techniques with well-performing data mining techniques makes more efficient predictions possible. The main objective of our paper is therefore, first, to select the most appropriate data mining technique based on performance evaluation and, then, to predict available parking space in advance by fusing that technique with IoT techniques. Because of their busy schedules, drivers need information about free parking spaces in advance on their smartphones. With this information, drivers can park their vehicles in the exact location without wasting time, and can maintain social distancing in crowded areas. Data mining techniques can play an important role in predicting available parking space by extracting only the relevant and important information from the given data set. For this purpose, a comparative analysis of five data mining techniques (Support Vector Machine, K-Nearest Neighbor, Decision Tree, Random Forest, and Ensemble learning) is carried out on the PKLot data set using Python. Anaconda (Spyder) is used as a supporting tool to compute the results.
The main outcome of the paper is to identify the technique that gives the best results for predicting available space; fusing data mining techniques with IoT technologies improves the results. The evaluation parameters used to find the best technique are precision, recall, accuracy, and F1-Score. The results are computed using k-fold cross-validation. In the empirical results on the PKLot data set, the decision tree outperformed all the other techniques selected for analysis.
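The k-fold cross-validation mentioned above can be sketched as follows. This is a minimal illustration of how the folds are formed and how scores are averaged, not the authors' code: the function names are hypothetical, the folds are contiguous and unshuffled for simplicity, and the `evaluate` callback stands in for training and scoring any of the compared classifiers.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous, unshuffled folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def cross_validate(n_samples, k, evaluate):
    """Average the score returned by evaluate(train, test) over all k folds."""
    scores = [evaluate(train, test) for train, test in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)


# Placeholder scorer (an assumption for illustration): returns the test-fold share.
folds = list(k_fold_indices(10, 3))
mean_share = cross_validate(10, 3, lambda train, test: len(test) / 10)
print([len(test) for _, test in folds], mean_share)
```

Every sample lands in exactly one test fold, so each technique is scored on data it was not trained on before the per-fold precision, recall, accuracy, and F1-Score are averaged.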


Author(s):  
Usha Gupta ◽  
Kamlesh Sharma

Data mining plays a vital role in converting medical data such as text, images, and graphs into meaningful new data, which helps in making better decisions. This chapter gives an overview of current research that uses data mining techniques for the detection, analysis, and prediction of various diseases. The focus of this study is to identify the well-performing data mining algorithms used on medical and clinical databases. Multiple algorithms have been identified: text-based mining, association rule-based mining, pattern-based mining, keyword-based mining, machine learning, neural networks, support vector machines, the apriori algorithm, k-means clustering, and natural language processing. Analysis of the algorithms shows that no single algorithm or model is best suited to diagnosing or predicting all diseases. An algorithm that works very well in one scenario may not work well on another data set. There are many examples in clinical and medical research where a combination of different algorithms gives good results.


2019 ◽  
Vol 15 (2) ◽  
pp. 275-280
Author(s):  
Agus Setiyono ◽  
Hilman F Pardede

It is now common for a cellphone to receive spam messages. The large number of received messages makes it difficult for humans to classify them as spam or not spam. One way to overcome this problem is to use data mining for automatic classification. In this paper, we investigate various data mining techniques, namely Support Vector Machine, Multinomial Naïve Bayes, and Decision Tree, for automatic spam detection. Our experimental results show that the Support Vector Machine is the best of the three evaluated algorithms. Support Vector Machine achieves 98.33% accuracy, while Multinomial Naïve Bayes achieves 98.13% and Decision Tree 97.10%.
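Of the three classifiers compared, Multinomial Naïve Bayes is the simplest to show in full. The sketch below is a minimal bag-of-words version with Laplace smoothing; the class name, the whitespace tokenizer, and the four training messages are assumptions made for illustration and have nothing to do with the paper's data or implementation.

```python
import math
from collections import Counter, defaultdict


class MultinomialNaiveBayes:
    """Minimal multinomial Naive Bayes over whitespace tokens, with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing strength

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc):
        # Words never seen in training are ignored.
        tokens = [w for w in doc.lower().split() if w in self.vocab]
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            for w in tokens:
                score += math.log((counts[w] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy messages, invented for illustration only.
train_docs = [
    "win a free prize now",
    "free cash offer click now",
    "are we meeting for lunch today",
    "see you at the meeting tomorrow",
]
train_labels = ["spam", "spam", "ham", "ham"]
model = MultinomialNaiveBayes().fit(train_docs, train_labels)
print(model.predict("free prize offer"), model.predict("lunch meeting tomorrow"))
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied together.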


Author(s):  
Khalid AA Abakar ◽  
Chongwen Yu

This work demonstrates the possibility of using data mining techniques such as artificial neural network (ANN) and support vector machine (SVM) based models to predict the quality parameters of spun yarn. Three kernel functions were used for the SVM: Polynomial, Radial Basis Function (RBF), and the Pearson VII Function-based Universal Kernel (PUK). An ANN model was also used to predict yarn properties. The SVM model based on the Pearson VII kernel function (PUK) was found to perform the same as the SVM based on the RBF kernel in predicting spinning yarn quality. The comparison with the ANN model showed that the two SVM models give better prediction performance than the ANN model.
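The three kernels named above can be written as plain functions of two feature vectors. The sketch below uses the commonly stated form of each kernel; the parameter names and default values (`degree`, `coef0`, `gamma`, `sigma`, `omega`) are assumptions, since the abstract does not give the settings used in the study.

```python
import math


def squared_distance(x, y):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def polynomial_kernel(x, y, degree=2, coef0=1.0):
    """Polynomial kernel: (x . y + coef0) ** degree."""
    return (sum(a * b for a, b in zip(x, y)) + coef0) ** degree


def rbf_kernel(x, y, gamma=0.5):
    """Radial basis function kernel: exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * squared_distance(x, y))


def puk_kernel(x, y, sigma=1.0, omega=1.0):
    """Pearson VII universal kernel; sigma and omega control the peak's width and tailing."""
    dist = math.sqrt(squared_distance(x, y))
    scaled = 2.0 * dist * math.sqrt(2.0 ** (1.0 / omega) - 1.0) / sigma
    return 1.0 / (1.0 + scaled ** 2) ** omega


# Illustrative vectors only; not yarn data.
a, b = (0.0, 0.0), (1.0, 0.0)
print(polynomial_kernel((1.0, 2.0), (3.0, 4.0)), rbf_kernel(a, b), puk_kernel(a, b))
```

Both RBF and PUK return 1 for identical inputs and decay with distance, which is consistent with the two SVM variants showing similar prediction performance.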


2018 ◽  
Vol 2018 ◽  
pp. 1-10 ◽  
Author(s):  
Johannes Masino ◽  
Jakob Thumm ◽  
Guillaume Levasseur ◽  
Michael Frey ◽  
Frank Gauterin ◽  
...  

This work aims to classify road conditions with data mining methods, using simple acceleration sensors and gyroscopes installed in vehicles. Two classifiers are developed with a support vector machine (SVM): one to distinguish between types of road surface, such as asphalt and concrete, and one to detect obstacles, such as potholes or railway crossings. Frequency-based features are extracted from the sensor signals and evaluated automatically with MANOVA. The selected features and their relevance for predicting the classes are discussed. The best features are used to design the classifiers. Finally, the methods developed and applied in this work are implemented in a Matlab toolbox with a graphical user interface. The toolbox visualizes the classification results on maps, enabling manual verification of the results. Cross-validation yields an average accuracy of 81.0% for classifying obstacles and 96.1% for classifying road material. The results are discussed on a comprehensive exemplary data set.
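One common kind of frequency-based feature is the signal energy in a few frequency bands. The abstract does not say which features were actually used, so the band-energy choice, the function names, the band limits, and the synthetic test signal below are all assumptions. The sketch uses Python and a naive discrete Fourier transform to stay self-contained, rather than reproducing the Matlab toolbox or the MANOVA step.

```python
import cmath
import math


def dft_magnitudes(signal):
    """Magnitudes of the non-negative-frequency DFT bins (naive O(n^2) transform)."""
    n = len(signal)
    return [
        abs(sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def band_energies(signal, sample_rate, bands):
    """Sum squared DFT magnitudes over each [low, high) frequency band, in Hz."""
    n = len(signal)
    magnitudes = dft_magnitudes(signal)
    energies = []
    for low, high in bands:
        energy = sum(
            m ** 2
            for k, m in enumerate(magnitudes)
            if low <= k * sample_rate / n < high
        )
        energies.append(energy)
    return energies


# Synthetic 5 Hz sine sampled at 64 Hz for one second (illustration only).
sample_rate = 64
signal = [math.sin(2 * math.pi * 5 * t / sample_rate) for t in range(64)]
energies = band_energies(signal, sample_rate, bands=[(0, 4), (4, 8), (8, 33)])
print(energies)
```

A feature vector of this kind, computed per signal window, is what a classifier such as an SVM would take as input.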


2020 ◽  
Author(s):  
Daniela De Souza Gomes ◽  
Marcos Henrique Fonseca Ribeiro ◽  
Giovanni Ventorim Comarela ◽  
Gabriel Philippe Pereira

High failure rates are a worrying and relevant problem in Brazilian universities. From a data set of student transcripts, we performed a case study for both general and Computer Science contexts, in which data mining techniques were used to find patterns concerning failures. The knowledge acquired can be used for better educational administration and to build intelligent systems that support students’ decision making.


With improvements in information processing and memory capacity, vast amounts of data are collected for various data analysis purposes. Data mining techniques are used to extract useful knowledge. In the process of extracting data with data mining techniques, the data can become publicly exposed, which leads to breaches of private data. Privacy-preserving data mining is used to protect sensitive information from unwanted or unsanctioned disclosure. In this paper, we analyze the problem of discovering similarity checks for functional dependencies from a given data set such that applying the (l, d) inference algorithm with generalization can anonymize the microdata without loss of utility [8]. This work presents a functional-dependency-based perturbation approach that hides sensitive information from the user by applying the (l, d) inference model to the dependency attributes based on Information Gain. The approach works on both categorical and numerical attributes. The perturbed data set does not affect the original data set; it maintains the same or very comparable patterns as the original. Hence the utility of the application is always high when compared to other data mining techniques. The accuracy of the original and perturbed data sets is compared and analyzed using data mining tools and classification algorithms.
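Information Gain, used above to rank the dependency attributes, measures how much knowing an attribute reduces the entropy of a target attribute. The sketch below shows only that standard calculation; it does not reproduce the (l, d) inference model, and the function names and the four toy records are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus its weighted entropy after splitting on attribute."""
    base = entropy([row[target] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


# Toy records, invented for illustration only.
rows = [
    {"age_group": "young", "region": "north", "diagnosis": "no"},
    {"age_group": "young", "region": "south", "diagnosis": "no"},
    {"age_group": "old", "region": "north", "diagnosis": "yes"},
    {"age_group": "old", "region": "south", "diagnosis": "yes"},
]
gain_age = information_gain(rows, "age_group", "diagnosis")
gain_region = information_gain(rows, "region", "diagnosis")
print(gain_age, gain_region)
```

In this toy data, `age_group` fully determines `diagnosis` and `region` tells nothing about it, so an approach that perturbs the most revealing attributes would target `age_group` first.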


2012 ◽  
Vol 38 (2) ◽  
pp. 375-397 ◽  
Author(s):  
Karel Dejaeger ◽  
Wouter Verbeke ◽  
David Martens ◽  
Bart Baesens
