Data Mining Techniques for Early Diagnosis of Diabetes: A Comparative Study

Diabetes is a lifelong condition that is well known in the 21st century. Once regarded as a disease of the West, its rise has been fueled by a nutrition shift, rapid urbanization, and increasingly sedentary lifestyles. In late 2019, a new public health concern, COVID-19, emerged, posing a particular hazard to people living with diabetes. Medical institutes have been collecting data for years. By applying data mining to such data, we aim to predict pathological complications, which could help prevent loss of life and improve quality of life. This work presents a comparative study of data mining techniques for the early diagnosis of diabetes. We use a publicly accessible data set containing 520 instances, each with 17 attributes. Naive Bayes, Neural Network, AdaBoost, k-Nearest Neighbors, Random Forest, and Support Vector Machine methods were tested. The results suggest that Neural Networks should be used for diabetes prediction. The proposed model achieves an AUC of 98.3%, an accuracy of 98.1%, an F1-Score, Precision, and Sensitivity of 98.4%, and a Specificity of 97.5%.
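
For readers unfamiliar with the evaluation measures named in this abstract, the following is a minimal sketch of how accuracy, precision, sensitivity, specificity, and F1-Score follow from the counts of a binary confusion matrix. The function name and the counts are hypothetical illustrations and are not taken from the study; AUC is omitted because it requires ranked prediction scores rather than counts.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute common binary-classification metrics from confusion-matrix counts.

    tp/fp/tn/fn are true positives, false positives, true negatives,
    and false negatives. The name of this helper is hypothetical.
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)  # also called recall
    specificity = tn / (tn + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical counts chosen only for illustration (not the study's results).
metrics = classification_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that sensitivity and specificity are computed over different rows of the confusion matrix, which is why a model can score well on one and poorly on the other when the classes are imbalanced.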

Evaluation of Data Mining Techniques and Its Fusion with IoT Enabled Smart Technologies for Effective Prediction of Available Parking Space

International journal of electrical and computer engineering systems ◽

10.32985/ijeces.12.4.2 ◽

2021 ◽

Vol 12 (4) ◽

pp. 187-197

Author(s):

Anchal Dahiya ◽

Pooja Mittal

Keyword(s):

Data Mining ◽

Decision Tree ◽

Support Vector ◽

Learning Approaches ◽

Parking Space ◽

Data Mining Technique ◽

Data Set ◽

Data Mining Techniques ◽

Hard Times ◽

Smart Technologies

After experiencing the hard times of the pandemic, we learned that a smart system that helps park vehicles automatically could be of great help to society. This idea motivated the present work. IoT techniques are now the buzzword in almost every application domain, and they can also be used to predict free parking space efficiently in advance. The biggest challenge with IoT techniques, however, is that they generate large volumes of data, which makes analysis difficult. We realized that if IoT techniques are fused with high-performing data mining techniques, more efficient predictions can be made. The main objective of our paper is therefore first to select the most appropriate data mining technique, based on performance evaluation, and then to predict available parking space in advance by fusing that technique with IoT techniques. Because of their busy schedules, drivers need information about free parking spaces in advance on their smartphones. With this information, drivers can easily park their vehicles in the exact location without wasting time, and can maintain social distancing in crowded areas. Data mining techniques can play an important role in predicting available parking space by extracting only the relevant and important information from a given dataset. For this purpose, a comparative analysis of five data mining techniques (Support Vector Machine, K-Nearest Neighbors, Decision Tree, Random Forest, and Ensemble learning approaches) is carried out on the PKLot dataset using the Python language, with Anaconda (Spyder) as the supporting tool for computing the results. The main outcome of the paper is to identify the technique that gives the best results for predicting available space; the results improve when data mining techniques are fused with IoT technologies. The evaluation parameters used to find the best technique are precision, recall, accuracy, and F1-Score. The k-fold cross-validation method is used for the numerical calculation of the results. In the empirical results computed on the PKLot dataset, the decision tree outperformed all the other techniques selected for analysis.
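
The k-fold cross-validation procedure mentioned in this abstract can be sketched as follows. This is a minimal, stdlib-only illustration under stated assumptions: the helper names, the toy labels, and the majority-class "classifier" are hypothetical and stand in for the techniques the paper compares; folds are contiguous and unshuffled for simplicity, whereas practical setups usually shuffle or stratify.

```python
from collections import Counter


def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous, unshuffled folds."""
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def cross_validated_accuracy(labels, k):
    """Mean accuracy of a majority-class baseline across k folds.

    The baseline ignores features entirely; it is a hypothetical stand-in
    for a real model such as a decision tree.
    """
    scores = []
    for train_idx, test_idx in k_fold_indices(len(labels), k):
        # "Training": pick the most common label in the training folds.
        majority = Counter(labels[i] for i in train_idx).most_common(1)[0][0]
        # "Testing": score that constant prediction on the held-out fold.
        correct = sum(1 for i in test_idx if labels[i] == majority)
        scores.append(correct / len(test_idx))
    return sum(scores) / k


# Hypothetical labels: 1 = space occupied, 0 = space free.
labels = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]
print(cross_validated_accuracy(labels, k=3))
```

Each sample is held out exactly once, so the averaged score reflects performance on data the model did not see during training.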

Review of Data Mining Techniques Used in Healthcare

Advances in Medical Technologies and Clinical Practice - Diagnostic Applications of Health Intelligence and Surveillance Systems ◽

10.4018/978-1-7998-6527-8.ch001 ◽

2021 ◽

pp. 1-26

Author(s):

Usha Gupta ◽

Kamlesh Sharma

Keyword(s):

Data Mining ◽

Vital Role ◽

Mining Machine ◽

Support Vector ◽

Data Set ◽

Data Mining Techniques ◽

Network Support ◽

Data Mining Algorithms ◽

Clinical Databases ◽

Mining Algorithms

Data mining plays a vital role in converting medical data such as text, images, and graphs into meaningful new data, which supports better decision-making. This chapter gives an overview of current research that uses data mining techniques for the detection, analysis, and prediction of various diseases. The focus of the study is to identify well-performing data mining algorithms used on medical and clinical databases. Multiple algorithms have been identified: text-based mining, association rule-based mining, pattern-based mining, keyword-based mining, machine learning, neural networks, support vector machines, the Apriori algorithm, k-means clustering, and natural language processing. Analysis of these algorithms shows that no single algorithm or model is the most suitable for diagnosing or predicting diseases. Some algorithms work very well in certain scenarios but not on other data sets. There are many examples in clinical and medical research where a combination of different algorithms gives good results.

KLASIFIKASI SMS SPAM MENGGUNAKAN SUPPORT VECTOR MACHINE [SMS Spam Classification Using Support Vector Machine]

Jurnal Pilar Nusa Mandiri ◽

10.33480/pilar.v15i2.693 ◽

2019 ◽

Vol 15 (2) ◽

pp. 275-280

Author(s):

Agus Setiyono ◽

Hilman F Pardede

Keyword(s):

Data Mining ◽

Support Vector Machine ◽

Decision Tree ◽

Naive Bayes ◽

Naïve Bayes ◽

Support Vector ◽

Spam Detection ◽

Support Vector Machine Algorithm ◽

Data Mining Techniques ◽

To Receive

It is now common for a cellphone to receive spam messages. The large number of received messages makes it difficult for humans to classify them as spam or not spam. One way to overcome this problem is to use data mining for automatic classification. In this paper, we investigate several data mining techniques, namely Support Vector Machine, Multinomial Naïve Bayes, and Decision Tree, for automatic spam detection. Our experimental results show that Support Vector Machine is the best of the three evaluated algorithms. Support Vector Machine achieves 98.33% accuracy, while Multinomial Naïve Bayes achieves 98.13% and Decision Tree 97.10%.

The Spinning Quality Control Management Based on Decision Making by Data Mining Techniques

International Journal of Emerging Research in Management and Technology ◽

10.23956/ijermt.v7i1.25 ◽

2018 ◽

Vol 7 (1) ◽

pp. 72

Author(s):

Khalid AA Abakar ◽

Chongwen Yu

Keyword(s):

Data Mining ◽

Kernel Functions ◽

Support Vector ◽

Ann Model ◽

Data Mining Techniques ◽

Yarn Quality ◽

Yarn Properties ◽

Svm Model ◽

Rbf Kernel

This work demonstrates the possibility of using data mining techniques such as artificial neural network (ANN) and support vector machine (SVM) based models to predict the quality parameters of spun yarn. Three kernel functions were used for the SVM: Polynomial, Radial Basis Function (RBF), and the Pearson VII Function-based Universal Kernel (PUK). An ANN model was also used to predict yarn properties. The SVM model based on the Pearson VII kernel function (PUK) was found to have the same performance in predicting spinning yarn quality as the SVM based on the RBF kernel. The comparison with the ANN model showed that these two SVM models give better prediction performance than the ANN model.
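
The three kernel functions named in this abstract can be written out as a minimal sketch. The formulas below are the commonly stated textbook forms; in particular, the PUK expression follows the form usually given for the Pearson VII universal kernel, with width parameter `sigma` and shape parameter `omega`. The function names and all default parameter values are assumptions for illustration, since the abstract does not report the settings used.

```python
import math


def squared_distance(x, y):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def polynomial_kernel(x, y, degree=2, coef0=1.0):
    """(x . y + coef0) ** degree; degree and coef0 are hypothetical defaults."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + coef0) ** degree


def rbf_kernel(x, y, gamma=0.5):
    """exp(-gamma * ||x - y||^2); gamma is a hypothetical default."""
    return math.exp(-gamma * squared_distance(x, y))


def puk_kernel(x, y, sigma=1.0, omega=1.0):
    """Pearson VII universal kernel in its commonly stated form.

    K(x, y) = 1 / (1 + (2 * ||x - y|| * sqrt(2**(1/omega) - 1) / sigma)**2) ** omega
    sigma and omega are hypothetical defaults.
    """
    dist = math.sqrt(squared_distance(x, y))
    scaled = 2.0 * dist * math.sqrt(2.0 ** (1.0 / omega) - 1.0) / sigma
    return 1.0 / (1.0 + scaled ** 2) ** omega


# Hypothetical two-feature vectors, for illustration only.
a, b = [1.0, 2.0], [3.0, 4.0]
print(polynomial_kernel(a, b), rbf_kernel(a, b), puk_kernel(a, b))
```

Both the RBF and PUK kernels depend only on the distance between the two inputs and equal 1 for identical vectors, which is one informal way to see why the two can behave similarly.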

A Comparative Study on Heart Disease Prediction Using Data Mining Techniques and Feature Selection

2021 2nd International Conference on Robotics, Electrical and Signal Processing Techniques (ICREST) ◽

10.1109/icrest51555.2021.9331158 ◽

2021 ◽

Author(s):

Farzana Tasnim ◽

Sultana Umme Habiba

Keyword(s):

Data Mining ◽

Feature Selection ◽

Heart Disease ◽

Comparative Study ◽

Disease Prediction ◽

Data Mining Techniques ◽

Using Data

A comparative study of data mining techniques in predicting consumers credit card risk in banks

AFRICAN JOURNAL OF BUSINESS MANAGEMENT ◽

10.5897/ajbm11.476 ◽

2013 ◽

Vol 5 (20) ◽

pp. 8307-8312 ◽

Cited By ~ 2

Author(s):

Kock Sheng Ling ◽

Ying Wah Teh

Keyword(s):

Data Mining ◽

Comparative Study ◽

Credit Card ◽

Data Mining Techniques

Characterization of Road Condition with Data Mining Based on Measured Kinematic Vehicle Parameters

Journal of Advanced Transportation ◽

10.1155/2018/8647607 ◽

2018 ◽

Vol 2018 ◽

pp. 1-10 ◽

Cited By ~ 1

Author(s):

Johannes Masino ◽

Jakob Thumm ◽

Guillaume Levasseur ◽

Michael Frey ◽

Frank Gauterin ◽

...

Keyword(s):

Data Mining ◽

Support Vector ◽

Matlab Toolbox ◽

Data Set ◽

The Road ◽

Acceleration Sensors ◽

Road Surfaces ◽

Road Condition ◽

Sensor Signals

This work aims to classify road condition with data mining methods using simple acceleration sensors and gyroscopes installed in vehicles. Two classifiers are developed with a support vector machine (SVM) to distinguish between different types of road surfaces, such as asphalt and concrete, and obstacles, such as potholes or railway crossings. Frequency-based features are extracted from the sensor signals and evaluated automatically with MANOVA. The selected features and their relevance for predicting the classes are discussed. The best features are used to design the classifiers. Finally, the methods developed and applied in this work are implemented in a Matlab toolbox with a graphical user interface. The toolbox visualizes the classification results on maps, thus enabling manual verification of the results. Cross-validation yields an average accuracy of 81.0% for classifying obstacles and 96.1% for classifying road material. The results are discussed on a comprehensive exemplary data set.

Failure Analysis in University and Computer Science Contexts With Data Mining

10.5753/wei.2020.11132 ◽

2020 ◽

Author(s):

Daniela De Souza Gomes ◽

Marcos Henrique Fonseca Ribeiro ◽

Giovanni Ventorim Comarela ◽

Gabriel Philippe Pereira

Keyword(s):

Data Mining ◽

Decision Making ◽

Failure Analysis ◽

Computer Science ◽

Educational Administration ◽

Intelligent Systems ◽

Data Set ◽

Data Mining Techniques ◽

Study Case ◽

Support Students

High failure rates are a worrying and relevant problem in Brazilian universities. Using a data set of student transcripts, we performed a case study for both general and Computer Science contexts, in which data mining techniques were used to find patterns concerning failures. The knowledge acquired can be used for better educational administration and to build intelligent systems that support students' decision making.

Privacy Preservation using (L, D) Inference Model Based on Dependency Identification Information Gain

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.f1196.0986s319 ◽

2019 ◽

Vol 8 (6S3) ◽

pp. 1170-1173

Keyword(s):

Data Mining ◽

Information Gain ◽

Original Data ◽

Perturbation Approach ◽

Sensitive Information ◽

Functional Dependencies ◽

Inference Model ◽

Data Set ◽

Data Mining Techniques ◽

Original Dataset

With improvements in information processing and memory capacity, vast amounts of data are collected for various data analysis purposes. Data mining techniques are used to extract useful knowledge from these data. In the process of extraction, however, the data can become publicly exposed, which leads to breaches of private data. Privacy-preserving data mining is used to protect sensitive information from unwanted or unsanctioned disclosure. In this paper, we analyze the problem of discovering similarity checks for functional dependencies in a given dataset, such that applying the (l, d) inference algorithm with generalization can anonymize the microdata without loss of utility. [8] This work presents a functional dependency based perturbation approach that hides sensitive information from the user by applying the (l, d) inference model to the dependency attributes, selected on the basis of Information Gain. The approach works on both categorical and numerical attributes. The perturbed data set does not affect the original dataset; it maintains the same or very comparable patterns as the original data set. Hence the utility of the application remains high compared to other data mining techniques. The accuracy of the original and perturbed datasets is compared and analyzed using a data mining classification tool.
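
As background for the Information Gain criterion mentioned in this abstract, the following is a minimal sketch of the standard entropy-based definition: the reduction in label entropy obtained by splitting the data on one attribute. How the authors combine this score with the (l, d) inference model is not described in the abstract, so this shows only the generic measure; the function names and the toy records are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(attribute_values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on the attribute."""
    total = len(labels)
    groups = {}
    for value, label in zip(attribute_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Hypothetical toy records: one attribute determines the label, the other does not.
labels = ["yes", "yes", "no", "no"]
informative = ["a", "a", "b", "b"]
uninformative = ["a", "b", "a", "b"]
print(information_gain(informative, labels))
print(information_gain(uninformative, labels))
```

An attribute with high information gain reveals a lot about the target, which is the usual reason such a score is used to rank attributes by how strongly they depend on sensitive values.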
