Improved Classification Techniques to Predict the Co-disease in Diabetic Mellitus Patients using Discretization and Apriori Algorithm

Data mining has become indispensable in the medical industry because of its many applications in predicting diseases at an early stage. Data mining methods make it easy to extract useful patterns and quick to recognize task-based outcomes. Classification models are particularly useful for building accurate classes from medical data sets for future analysis. In addition, association rules are a promising technique for finding hidden patterns in a medical data set and have been applied successfully to market basket, census, and financial data. The Apriori algorithm, considered a classic, mines frequent item sets from a database containing a large number of transactions and also derives the relevant association rules. Association rules capture the relationships among the items present in a data set. When the data set contains continuous attributes, existing algorithms may not work; discretization can therefore be applied before association rule mining in order to find the relations between the various patterns in the data set. In this paper, Discretized Apriori is used to predict the co-disease in patients with diabetes mellitus, and the extracted rules are analyzed. In the discretization step, numerical data is discretized and fed to the Apriori algorithm to obtain better association rules for predicting the diseases.
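The two-step pipeline described in this abstract (discretize the numeric attributes, then mine frequent item sets with Apriori) can be illustrated with a minimal sketch. The patient records, cut points, and labels below are hypothetical and carry no clinical meaning, and the function names are illustrative rather than taken from the paper.

```python
from itertools import combinations

def discretize(value, cut_points, labels):
    """Map a numeric value to the label of the interval it falls in."""
    for cut, label in zip(cut_points, labels):
        if value < cut:
            return label
    return labels[-1]

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    n = len(transactions)
    items = {item for t in transactions for item in t}
    current = [frozenset([i]) for i in items]
    frequent = {}
    k = 1
    while current:
        # Count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = [c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent

# Hypothetical records: (glucose, BMI, recorded co-disease).
records = [
    (185, 34.0, "hypertension"),
    (210, 31.5, "hypertension"),
    (130, 24.0, "none"),
    (195, 36.2, "hypertension"),
    (120, 22.5, "none"),
    (170, 29.0, "neuropathy"),
]

# Illustrative cut points only; real thresholds would come from the study.
GLUCOSE_LABELS = ["glucose=normal", "glucose=elevated", "glucose=high"]
BMI_LABELS = ["bmi=normal", "bmi=overweight", "bmi=obese"]

transactions = [
    frozenset({
        discretize(glucose, [140, 180], GLUCOSE_LABELS),
        discretize(bmi, [25, 30], BMI_LABELS),
        "codisease=" + disease,
    })
    for glucose, bmi, disease in records
]

frequent = apriori(transactions, min_support=0.5)
for itemset, sup in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

Without the discretization step, each distinct glucose or BMI reading would be its own item and almost nothing would reach the support threshold. Binning lets records with similar readings share an item.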

Finding Persistent Strong Rules

Knowledge Discovery Practices and Emerging Applications of Data Mining - Advances in Data Mining and Database Management ◽

10.4018/978-1-60960-067-9.ch005 ◽

2010 ◽

pp. 85-107

Author(s):

Anthony Scime ◽

Karthik Rajasethupathy ◽

Kulathur S. Rajasethupathy ◽

Gregg R. Murray

Keyword(s):

Data Mining ◽

Association Rules ◽

Strong Association ◽

National Election ◽

Data Sets ◽

Rule Discovery ◽

Discovery Process ◽

Data Set ◽

Rule Sets ◽

Election Studies

Data mining is a collection of algorithms for finding interesting and unknown patterns or rules in data. However, different algorithms can result in different rules from the same data. The process presented here exploits these differences to find particularly robust, consistent, and noteworthy rules among much larger potential rule sets. More specifically, this research focuses on using association rules and classification mining to select the persistently strong association rules. Persistently strong association rules are association rules that are verifiable by classification mining the same data set. The process for finding persistent strong rules was executed against two data sets obtained from the American National Election Studies. Analysis of the first data set resulted in one persistent strong rule and one persistent rule, while analysis of the second data set resulted in 11 persistent strong rules and 10 persistent rules. The persistent strong rule discovery process suggests these rules are the most robust, consistent, and noteworthy among the much larger potential rule sets.

Finding Persistent Strong Rules

Data Mining ◽

10.4018/978-1-4666-2455-9.ch002 ◽

2013 ◽

pp. 28-49

Author(s):

Anthony Scime ◽

Karthik Rajasethupathy ◽

Kulathur S. Rajasethupathy ◽

Gregg R. Murray

Keyword(s):

Data Mining ◽

Association Rules ◽

Strong Association ◽

National Election ◽

Data Sets ◽

Rule Discovery ◽

Discovery Process ◽

Data Set ◽

Rule Sets ◽

Election Studies

Data mining is a collection of algorithms for finding interesting and unknown patterns or rules in data. However, different algorithms can result in different rules from the same data. The process presented here exploits these differences to find particularly robust, consistent, and noteworthy rules among much larger potential rule sets. More specifically, this research focuses on using association rules and classification mining to select the persistently strong association rules. Persistently strong association rules are association rules that are verifiable by classification mining the same data set. The process for finding persistent strong rules was executed against two data sets obtained from the American National Election Studies. Analysis of the first data set resulted in one persistent strong rule and one persistent rule, while analysis of the second data set resulted in 11 persistent strong rules and 10 persistent rules. The persistent strong rule discovery process suggests these rules are the most robust, consistent, and noteworthy among the much larger potential rule sets.

Diagnosis of Various Thyroid Ailments using Data Mining Classification Techniques

International Journal of Scientific Research in Computer Science Engineering and Information Technology ◽

10.32628/cseit195119 ◽

2019 ◽

pp. 131-136

Author(s):

Umar Sidiq ◽

Syed Mutahar Aaqib ◽

Rafi Ahmad Khan

Keyword(s):

Data Mining ◽

Decision Tree ◽

Research Work ◽

Support Vector ◽

Data Sets ◽

Data Mining Technique ◽

K Nearest Neighbors ◽

Data Set ◽

Classification Techniques ◽

Using Data

Classification is one of the most important supervised learning techniques in data mining and is used to assign data to predefined classes. It is used mainly in the healthcare sector for decision making, diagnosis systems, and better treatment of patients. In this work, the data set was taken from a recognized lab in Kashmir. The research was carried out with ANACONDA3-5.2.0, an open source platform, in a Windows 10 environment. An experimental study was carried out using classification techniques such as k nearest neighbors, support vector machine, decision tree, and Naïve Bayes. The decision tree obtained the highest accuracy, 98.89%, among the classification techniques tested.

Study and Analysis of Medical Data Mining Techniques in Healthcare for Heart Disease Using Hybrid Approach

Intelligent Systems and Computer Technology - Advances in Parallel Computing ◽

10.3233/apc200144 ◽

2020 ◽

Author(s):

Khodke harish Eknath ◽

Yadav S K ◽

Kyatanavar D N

Keyword(s):

Data Mining ◽

Heart Disease ◽

Research Work ◽

Hybrid Approach ◽

Medical Data ◽

Data Sets ◽

Medical Data Mining ◽

Data Mining Techniques ◽

Inductive Type ◽

The Right

Data mining techniques are used extensively for the detection and prediction of heart disease, which is a primary cause of death. The proposed work is inductive in nature and requires deep analysis of the data to ensure correct predictions on the data sets provided. A sample data set of heart disease patients will be collected from a repository. The work follows a defined sequence of steps and procedures and can be carried out step by step to arrive at accurate results.

Penerapan Data Mining Menggunakan Algoritma Apriori untuk Menentukan Pola Penyebab Gelandangan dan Pengemis

Jurnal Teknologi Informasi dan Ilmu Komputer ◽

10.25126/jtiik.2020721376 ◽

2020 ◽

Vol 7 (2) ◽

pp. 229

Author(s):

Wirta Agustin ◽

Yulya Muharmi

Keyword(s):

Data Mining ◽

Association Rule ◽

Urban Areas ◽

Pattern Mining ◽

Frequent Pattern Mining ◽

Frequent Itemset ◽

Frequent Pattern ◽

Data Sets ◽

Apriori Algorithm ◽

Data Set

Homeless people and beggars are one of the problems in urban areas because they can disrupt public order, security, stability, and urban development. Current efforts still focus on managing existing homeless people and beggars rather than on prevention. One possible approach is to determine the age pattern of homeless people and beggars. The Apriori algorithm is an association rule method in data mining that determines frequent itemsets, which helps in finding patterns in data (frequent pattern mining). Manual calculation with the Apriori algorithm produced a combination pattern of 3 rules with a minimum support of 30% and a highest confidence of 100%. These patterns serve as a reference for the responsible department in taking preventive action against the rising numbers of homeless people and beggars. The Apriori algorithm was tested using RapidMiner, a data mining software package whose functions include text analysis and extracting patterns from data sets, combining them with statistical methods, artificial intelligence, and databases to obtain high quality information from the processed data. The test results show a comparison of the age patterns of people at risk of becoming homeless or beggars. When the RapidMiner results are compared with the manual Apriori calculation against the test criteria, the age patterns (rules) and confidence values (c) from the manual calculation are not close to the values obtained with RapidMiner, so the accuracy of the test is low, at 37.5%.
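The support and confidence figures quoted in this abstract follow from two simple ratios. The sketch below shows how they could be computed by hand. The ten transactions are invented for illustration and are not the study's data, and the item names are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Share of transactions containing the antecedent that also contain the consequent."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # the antecedent never occurs, so the rule has no evidence
    return support(transactions, set(antecedent) | set(consequent)) / base

# Hypothetical transactions: an age group and a recorded cause per person.
transactions = (
    [{"age=elderly", "cause=no_family"}] * 3
    + [{"age=adult", "cause=unemployment"}] * 4
    + [{"age=adult", "cause=poverty"}] * 2
    + [{"age=child", "cause=poverty"}]
)

rule_support = support(transactions, {"age=elderly", "cause=no_family"})
rule_confidence = confidence(transactions, {"age=elderly"}, {"cause=no_family"})
print(rule_support, rule_confidence)
```

In this toy data the rule "age=elderly implies cause=no_family" sits exactly at a 30% minimum support and has 100% confidence, because every elderly record carries the same cause.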

Review and comparison of Apriori algorithm implementations on Hadoop-MapReduce and Spark

The Knowledge Engineering Review ◽

10.1017/s0269888918000127 ◽

2018 ◽

Vol 33 ◽

Cited By ~ 4

Author(s):

Eduardo P. S. Castro ◽

Thiago D. Maia ◽

Marluce R. Pereira ◽

Ahmed A. A. Esmin ◽

Denilson A. Pereira

Keyword(s):

Association Rules ◽

Data Sets ◽

Apriori Algorithm ◽

Mapreduce Framework ◽

Data Set ◽

Hadoop Mapreduce ◽

Detailed Assessment ◽

Mining Association Rules

Several Apriori algorithm implementations for mining association rules have been proposed in the literature using the Hadoop-MapReduce framework and, more recently, Spark. However, none of these works has made a detailed assessment of performance, for example by comparing an implementation with others across data sets with various characteristics. In this work, we present a review of the main algorithms proposed for Hadoop-MapReduce and compare their implementations in a single environment under several different situations. Moreover, the implementations of these algorithms were adapted to Spark and compared under the same circumstances. Based on the results of the experiments, we present a framework for recommending the Apriori implementation most appropriate for solving a given problem, according to the data set characteristics and the minimum required support. The results show that the Spark implementations outperform the Hadoop-MapReduce implementations in runtime in most experiments. However, no single implementation is the best in all the evaluated situations.

Extrication of Apriori Algorithm using Association Rules on Medical Data sets

International Journal of Scientific Research in Science Engineering and Technology ◽

10.32628/ijsrset19627 ◽

2019 ◽

pp. 107-112

Author(s):

Anusha Viswanadapalli ◽

Praveen Kumar Nelapati

Keyword(s):

Data Mining ◽

Research Study ◽

Medical Data ◽

Frequent Pattern ◽

Data Sets ◽

Apriori Algorithm ◽

Compact Structure ◽

Frequent Item ◽

Frequent Pattern Tree ◽

Frequent Item Sets

During frequent item set mining, when the minimum support is small, generating candidate sets is a time-consuming and frequently repeated operation in the mining algorithm. The APRIORI growth algorithm does not need to produce candidate sets: the database that provides the frequent item sets is compressed into a frequent pattern tree (or APRIORI tree), and the frequent item sets are mined from that tree. These algorithms are considered efficient because of their compact structure and because they generate fewer candidate item sets than Apriori and Apriori-like algorithms. This paper therefore aims to present the basic concepts of some of the algorithms (APRIORI-Growth, COFI-Tree, CT-PRO) based on the APRIORI-Tree-like structure for mining frequent item sets, along with their capabilities and comparisons. Implementing data mining on medical data to generate rules and patterns using the Frequent Pattern (APRIORI)-Growth algorithm is the major concern of this research study. The paper shows how data mining can be applied to medical data.

Research on Improved Apriori Algorithm Based on Data Mining in Electronic Cases

International Journal of Healthcare Information Systems and Informatics ◽

10.4018/ijhisi.2019070102 ◽

2019 ◽

Vol 14 (3) ◽

pp. 16-28

Author(s):

Xiaoli Wang ◽

Kui Su ◽

Lirong Su

Keyword(s):

Data Mining ◽

Association Rules ◽

Medical Data ◽

Apriori Algorithm ◽

Prediction System ◽

Basic Information ◽

Related Information ◽

Resultant Data ◽

Health Related ◽

Lifestyle Related Diseases

This article improves on the commonly used Apriori algorithm and proposes a new Apriori algorithm based on event ID. Association rules are obtained from massive medical data with the new algorithm and are then used in a prediction system for lifestyle-related diseases. The aim is for the prediction system to provide better service for individuals, for families, and for society as a whole. After a person's basic information is entered, the system automatically outputs health-related information for the user, and it also offers valuable advice based on the resultant data, helping people realize self-determined health engagement.

A Support Based Initialization Algorithm for Categorical Data Clustering

Journal of Information Technology Research ◽

10.4018/jitr.2018040104 ◽

2018 ◽

Vol 11 (2) ◽

pp. 53-67

Author(s):

Ajay Kumar ◽

Shishir Kumar

Keyword(s):

Categorical Data ◽

Selection Process ◽

Numerical Data ◽

Real Data ◽

Data Sets ◽

Data Set ◽

Data Object ◽

Data Points ◽

Wu Method ◽

Selection Algorithms

Several initial center selection algorithms have been proposed in the literature for numerical data, but because the values of categorical data are unordered, these methods are not applicable to a categorical data set. This article investigates the initial center selection process for categorical data and then presents a new support-based initial center selection algorithm. The proposed algorithm measures the weight of the unique data points of each attribute with the help of support and then integrates these weights along the rows to obtain the support of every row. The data object with the largest support is chosen as the first initial center, and the other centers are then found as those at the greatest distance from the initially selected center. The quality of the proposed algorithm is compared with the random initial center selection method, Cao's method, Wu's method, and the method introduced by Khan and Ahmad. Experimental analysis on real data sets shows the effectiveness of the proposed algorithm.
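The selection procedure outlined in this abstract can be sketched roughly as follows. The abstract does not give the exact formulas, so several choices here are assumptions: the weight of a value is its relative frequency within its attribute, "integrating along the rows" is a plain sum, distance is the Hamming distance, and further centers are picked farthest-first from the centers already chosen. The data and function names are hypothetical.

```python
from collections import Counter

def row_supports(rows):
    """Sum, per row, the relative frequency of each of its attribute values."""
    n = len(rows)
    column_counts = [Counter(col) for col in zip(*rows)]
    return [
        sum(column_counts[j][value] / n for j, value in enumerate(row))
        for row in rows
    ]

def hamming(a, b):
    """Number of attributes on which two categorical objects differ."""
    return sum(x != y for x, y in zip(a, b))

def initial_centers(rows, k):
    """Pick k initial centers: highest-support row first, then farthest-first."""
    supports = row_supports(rows)
    chosen = [max(range(len(rows)), key=supports.__getitem__)]
    while len(chosen) < k:
        remaining = [i for i in range(len(rows)) if i not in chosen]
        # Assumption: take the row whose nearest chosen center is farthest away.
        nxt = max(remaining,
                  key=lambda i: min(hamming(rows[i], rows[c]) for c in chosen))
        chosen.append(nxt)
    return [rows[i] for i in chosen]

# Hypothetical categorical data set: (color, size, shape).
rows = [
    ("red", "small", "round"),
    ("red", "small", "oval"),
    ("red", "large", "round"),
    ("blue", "large", "square"),
    ("blue", "small", "round"),
    ("green", "small", "square"),
]

centers = initial_centers(rows, 2)
print(centers)
```

In this toy data the first row holds the most common value in every attribute, so it becomes the first center. The second center is the row that differs from it on all three attributes.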
