Research on Data Mining Algorithm Based on Rough Set

Based on an in-depth study of existing rough set and data mining techniques, this paper presents an improved algorithm that addresses the shortcomings of existing rough-set-based data mining algorithms. The algorithm takes the attribute core as the starting point of the reduction computation. It uses a filtered discernibility matrix as the basis for selecting candidate attributes. It uses the information entropy of the condition and decision attributes as heuristic information to find a minimal reduct of the decision information system. The improved algorithm overcomes the defects of the heuristic algorithm based on the discernibility matrix and shrinks the attribute search space, which speeds up reduction.
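The reduction strategy described above can be illustrated with a small sketch. It makes several assumptions that are illustrative, not the paper's exact procedure:

- The decision table is stored as a list of Python dicts.
- The heuristic is the Shannon conditional entropy H(D | B).
- "Filtering" the discernibility matrix means keeping only the entries that the current attribute set does not yet cover.
- The helper names `discernibility_matrix`, `conditional_entropy` and `reduct` are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import log2


def discernibility_matrix(rows, cond, dec):
    """For every pair of objects with different decisions, record the set of
    condition attributes on which the two objects differ."""
    entries = []
    for x, y in combinations(rows, 2):
        if x[dec] != y[dec]:
            diff = frozenset(a for a in cond if x[a] != y[a])
            if diff:  # an empty set marks an inconsistent pair; skip it
                entries.append(diff)
    return entries


def conditional_entropy(rows, attrs, dec):
    """H(D | attrs): entropy of the decision inside each equivalence class
    induced by attrs, weighted by the class size."""
    n = len(rows)
    blocks = defaultdict(list)
    for r in rows:
        blocks[tuple(r[a] for a in attrs)].append(r[dec])
    h = 0.0
    for decisions in blocks.values():
        weight = len(decisions) / n
        for count in Counter(decisions).values():
            p = count / len(decisions)
            h -= weight * p * log2(p)
    return h


def reduct(rows, cond, dec, eps=1e-12):
    matrix = discernibility_matrix(rows, cond, dec)
    # Core: attributes that appear alone in some matrix entry.
    red = {next(iter(e)) for e in matrix if len(e) == 1}
    target = conditional_entropy(rows, cond, dec)
    while conditional_entropy(rows, sorted(red), dec) > target + eps:
        # Filtered matrix: entries not yet discerned by red supply the candidates.
        candidates = {a for e in matrix if not (e & red) for a in e}
        if not candidates:
            break
        # Heuristic: add the candidate that lowers H(D | red) the most
        # (ties broken by attribute name for determinism).
        best = min(candidates,
                   key=lambda a: (conditional_entropy(rows, sorted(red | {a}), dec), a))
        red.add(best)
    return red
```

Because the greedy loop only adds attributes, the result preserves H(D | C) but may still contain a redundant attribute. A backward pruning pass is a common refinement.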

Comparing Performance of Data Mining Algorithms in Prediction Heart Diseases

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v5i6.pp1569-1576 ◽

2015 ◽

Vol 5 (6) ◽

pp. 1569 ◽

Cited By ~ 13

Author(s):

Moloud Abdar ◽

Sharareh R. Niakan Kalhori ◽

Tole Sutikno ◽

Imam Much Ibnu Subroto ◽

Goli Arji

Keyword(s):

Neural Network ◽

Data Mining ◽

Decision Tree ◽

Heart Diseases ◽

Support Vector ◽

Data Mining Algorithm ◽

Network Support ◽

Data Mining Algorithms ◽

Mining Algorithms ◽

Analysis Models

Heart diseases are among the nation’s leading causes of mortality and morbidity. Data mining techniques can predict the likelihood that a patient will develop a heart disease. The purpose of this study is to compare different data mining algorithms for predicting heart diseases. After feature analysis, models were developed and validated using five algorithms: decision tree (C5.0), neural network, support vector machine (SVM), logistic regression and k-nearest neighbor (KNN). The C5.0 decision tree built the most accurate model (93.02%), followed by KNN (88.37%), SVM (86.05%) and the neural network (80.23%). The results produced by the decision tree are easy to interpret and apply, and its rules can be readily understood by clinical practitioners.
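To illustrate one of the five compared algorithms, the sketch below implements a plain k-nearest neighbor classifier (Euclidean distance, majority vote). It also includes an accuracy function, assumed here to be the fraction of correctly classified cases. The toy points, `k=3` and the function names are hypothetical. The study's dataset, features and validation scheme are not reproduced.

```python
from collections import Counter
from math import dist


def knn_predict(train_X, train_y, x, k=3):
    """Predict the label of x by majority vote among its k nearest training points."""
    # Rank training points by Euclidean distance to x and keep the k closest.
    nearest = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], x))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical two-class toy data (0 = low risk, 1 = high risk).
train_X = [(1, 1), (1, 2), (2, 1), (6, 6), (6, 7), (7, 6)]
train_y = [0, 0, 0, 1, 1, 1]
```

Calling `knn_predict(train_X, train_y, (1.5, 1.5))` returns `0`. Scoring a held-out set with `accuracy` gives a figure comparable in form to the percentages reported above.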

Migrating From Data Mining to Big Data Mining

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i3.4.14667 ◽

2018 ◽

Vol 7 (3.4) ◽

pp. 13

Author(s):

Gourav Bathla ◽

Himanshu Aggarwal ◽

Rinkle Rani

Keyword(s):

Data Mining ◽

Big Data ◽

Response Time ◽

Large Scale ◽

Naive Bayes ◽

Naïve Bayes ◽

Data Mining Algorithm ◽

Big Data Mining ◽

Data Mining Algorithms ◽

Mining Algorithms

Data mining is one of the most researched fields in computer science, and many studies have extracted and analysed important information from raw data. Traditional data mining algorithms for classification, clustering and statistical analysis process small-scale data with high efficiency and accuracy.

Social networking interactions, business transactions and other communications, however, generate Big data. This is data at a scale beyond the competence of traditional data mining techniques. Traditional algorithms are generally not capable of storing and processing data at this scale, and those that can do so have very high response times. Big data contains hidden information that, if analysed intelligently, can be highly beneficial to business organizations.

In this paper, we analyse the advancement from traditional data mining algorithms to Big data mining algorithms. Traditional data mining algorithms can be incorporated into Big data mining algorithms in a straightforward way. Several studies have compared traditional data mining with Big data mining, but very few have analysed the most important algorithms within a single work. That is the core motive of our paper, and it lets readers easily observe the differences between these algorithms, with their pros and cons.

Mathematical concepts underlie data mining algorithms. Real examples include the mean and Euclidean distance calculations in K-means, vectors and the margin in SVM, and Bayes' theorem and conditional probability in Naïve Bayes. Classification and clustering are the most important applications of data mining. In this paper, the K-means, SVM and Naïve Bayes algorithms are analysed in detail to observe accuracy and response time from both conceptual and empirical perspectives. Big data technologies such as Hadoop and MapReduce are used to implement the Big data mining algorithms. Performance evaluation metrics such as speedup, scaleup and response time are used to compare traditional mining with Big data mining.
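The abstract names the mean and the Euclidean distance as the mathematics behind K-means. A minimal single-machine sketch of Lloyd's iteration makes that concrete. Each point is assigned to its nearest centroid by Euclidean distance, and each centroid then moves to the mean of its assigned points. The initial centroids and the iteration cap are illustrative assumptions. MapReduce-style implementations commonly express the assignment step as the map phase and the mean update as the reduce phase. This sketch does not use Hadoop.

```python
from math import dist


def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm: assign points to the nearest centroid (Euclidean
    distance), then move each centroid to the mean of its assigned points."""
    centroids = [tuple(c) for c in centroids]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            clusters[nearest].append(p)
        new = []
        for c, members in zip(centroids, clusters):
            if members:
                # Coordinate-wise mean of the cluster's points.
                new.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new.append(c)  # keep an empty cluster's centroid unchanged
        if new == centroids:  # converged: assignments no longer move the centroids
            break
        centroids = new
    return centroids, clusters
```

For example, `kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], [(0, 0), (10, 10)])` converges to the centroids `(0.0, 0.5)` and `(10.0, 10.5)`.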

Comprehensibility of Data Mining Algorithms

Encyclopedia of Data Warehousing and Mining ◽

10.4018/978-1-59140-557-3.ch037 ◽

2011 ◽

pp. 190-195 ◽

Cited By ~ 7

Author(s):

Zhi-Hua Zhou

Keyword(s):

Data Mining ◽

Decision Making ◽

Decision Makers ◽

Data Mining Algorithm ◽

Human Beings ◽

Common People ◽

Data Mining Algorithms ◽

Mining Algorithm ◽

Aid Decision ◽

Mining Algorithms

Data mining attempts to identify valid, novel, potentially useful and ultimately understandable patterns from huge volumes of data. The mined patterns must be understandable because the purpose of data mining is to aid decision-making. If decision-makers cannot understand what a mined pattern means, they cannot make good use of it. Since most decision-makers are not data mining experts, the patterns should ideally be presented in a form that non-experts can understand. The comprehensibility of data mining algorithms, that is, the ability of an algorithm to produce patterns that humans can understand, is therefore an important factor.

TRADE-OFF BETWEEN COMPUTATION TIME AND NUMBER OF RULES FOR FUZZY MINING FROM QUANTITATIVE DATA

International Journal of Uncertainty Fuzziness and Knowledge-Based Systems ◽

10.1142/s0218488501001071 ◽

2001 ◽

Vol 09 (05) ◽

pp. 587-604 ◽

Cited By ~ 122

Author(s):

TZUNG-PEI HONG ◽

CHAN-SHENG KUO ◽

SHENG-CHAI CHI

Keyword(s):

Data Mining ◽

Computation Time ◽

Data Mining Algorithm ◽

Trade Off ◽

Fuzzy Association Rules ◽

Data Mining Algorithms ◽

Mining Algorithm ◽

Linguistic Term ◽

Complete Set ◽

Mining Algorithms

Data mining is the process of extracting desirable knowledge or interesting patterns from existing databases for specific purposes. Most conventional data-mining algorithms identify the relationships among transactions using binary values. Transactions with quantitative values, however, are common in real-world applications. We previously proposed a fuzzy mining algorithm in which each attribute uses only the linguistic term with the maximum cardinality in the mining process. The number of items was thus the same as the number of original attributes, which reduced the processing time. However, the fuzzy association rules derived in this way are not complete. This paper therefore modifies that algorithm and proposes a new fuzzy data-mining algorithm for extracting interesting knowledge from transactions stored as quantitative values. The proposed algorithm derives a more complete set of rules, but needs more computation time than the previously proposed method. A trade-off therefore exists between computation time and the completeness of rules, and the choice of an appropriate learning method depends on the requirements of the application domain.
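The earlier method summarized in the abstract keeps only each attribute's linguistic term with the maximum cardinality, and it can be sketched as follows. The triangular membership functions, their breakpoints and the term names Low/Middle/High are hypothetical. The cardinality of a term is taken as the sum of its membership values over all transactions, which is the usual scalar cardinality of a fuzzy set.

```python
def triangular(x, a, b, c):
    """Triangular membership function: 1 at the peak b, 0 outside (a, c)."""
    if x == b:
        return 1.0
    if x <= a or x >= c:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical linguistic terms for one quantitative attribute (e.g. item quantity).
TERMS = {"Low": (0, 0, 6), "Middle": (0, 6, 12), "High": (6, 12, 12)}


def max_cardinality_term(values, terms=TERMS):
    """Scalar cardinality of each fuzzy term = sum of memberships over all
    transactions; keep only the term with the largest cardinality."""
    cardinality = {name: sum(triangular(v, *abc) for v in values)
                   for name, abc in terms.items()}
    return max(cardinality, key=cardinality.get), cardinality
```

Keeping one term per attribute keeps the item count equal to the number of attributes, which is where the time saving comes from. The paper's new algorithm gives up part of that saving in exchange for a more complete rule set.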

Measurement of the appropriateness in career selection of the high school students by using data mining algorithms: A case study

Proceedings of the 2017 Federated Conference on Computer Science and Information Systems ◽

10.15439/2017f283 ◽

2017 ◽

Cited By ~ 1

Author(s):

Ahmet Firat Yelkuvan ◽

Hidayet Takci ◽

Kali Gurkahraman

Keyword(s):

High School ◽

Data Mining ◽

High School Students ◽

School Students ◽

Career Selection ◽

Data Mining Algorithms ◽

Using Data ◽

Mining Algorithms ◽

Selection Of

Data Mining Algorithm and its Application with Massive Text Database

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.742.395 ◽

2015 ◽

Vol 742 ◽

pp. 395-398

Author(s):

Chun Ping Wang

Keyword(s):

Data Mining ◽

Semantic Analysis ◽

Data Mining Algorithm ◽

User Characteristics ◽

Text Data ◽

Data Mining Algorithms ◽

Mining Methods ◽

Language Characteristics ◽

Network Language ◽

Mining Algorithms

Mining methods for large text data avoid lexical and syntactic semantic analysis. Instead, they rely on statistical analysis and processing of the text, ignoring as far as possible the semantic differences between literally similar words, which suits the characteristics of network language. Our results show that the information extracted by data mining algorithms can portray the vocabulary characteristics of specific users. It can also support recommendations based on those vocabulary characteristics.
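The statistical, non-semantic approach described above can be sketched with plain word counts. Each user is represented by a word-frequency profile, and candidate items are ranked by cosine similarity to that profile. The tokenizer, the cosine measure and the names `vocabulary_profile`, `cosine` and `recommend` are assumptions for illustration. The abstract does not specify which statistics the paper uses.

```python
import re
from collections import Counter
from math import sqrt


def vocabulary_profile(texts):
    """Word-frequency vector built purely from token counts (no semantic analysis)."""
    return Counter(w for t in texts for w in re.findall(r"\w+", t.lower()))


def cosine(p, q):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(p[w] * q[w] for w in p.keys() & q.keys())
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


def recommend(user_profile, items):
    """Return item names ranked by vocabulary similarity to the user profile."""
    scored = {name: cosine(user_profile, vocabulary_profile([text]))
              for name, text in items.items()}
    return sorted(scored, key=scored.get, reverse=True)
```

For a user who writes "football match tonight" and "great football goal", an item described as "football goal highlights" ranks above one described as "pasta recipe tonight", because it shares more frequent words with the user's profile.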

Design and Implementation of Data Mining Based on Distributed Computing

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.644-650.1702 ◽

2014 ◽

Vol 644-650 ◽

pp. 1702-1705 ◽

Cited By ~ 2

Author(s):

Jin Hai Zhang

Keyword(s):

Data Mining ◽

Distributed Computing ◽

Data Storage ◽

Leading Edge ◽

Data Mining Algorithm ◽

Dynamic Features ◽

Computing Platform ◽

Data Mining Algorithms ◽

Rich Data ◽

Mining Algorithms

Because Internet data is massive, diverse, heterogeneous and dynamic, analysing it with traditional databases no longer meets the requirements for data storage and processing efficiency. This paper uses leading-edge distributed computing technology to address the shortcomings of traditional data mining on massive data, improving a data mining algorithm on the Hadoop distributed computing platform. The work can serve as a reference for implementing other data mining algorithms on Hadoop. Applying a rich set of data mining algorithms in this way can uncover more value in the data.
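The abstract does not detail the improved algorithm, but the Hadoop programming model it relies on can be illustrated in-process. The sketch below counts item supports over transactions with explicit map, shuffle and reduce steps. This is the kind of first pass a frequent-itemset miner performs. It does not use the Hadoop API, and the function names and the support-counting task are illustrative assumptions.

```python
from collections import defaultdict
from itertools import chain


def map_phase(transaction):
    """Emit (item, 1) for every distinct item in one transaction."""
    return [(item, 1) for item in set(transaction)]


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum the partial counts for one key."""
    return key, sum(values)


def item_support(transactions, min_count):
    """Support count of each item, keeping only items with at least min_count."""
    mapped = chain.from_iterable(map_phase(t) for t in transactions)
    counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
    return {item: c for item, c in counts.items() if c >= min_count}
```

On Hadoop, mappers would run in parallel over input splits and the framework would perform the shuffle across machines. The per-record map and reduce logic stays the same.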

Diabetic analytics: proposed conceptual data mining approaches in type 2 diabetes dataset

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v14.i1.pp88-95 ◽

2019 ◽

Vol 14 (1) ◽

pp. 88 ◽

Cited By ~ 1

Author(s):

Sinan Adnan Diwan Alalwan

Keyword(s):

Data Mining ◽

Type 2 Diabetes ◽

Random Forest ◽

Data Mining Algorithm ◽

Root Cause ◽

Data Mining Algorithms ◽

Conceptual Data ◽

Mining Algorithms ◽

Self Organizing

Diabetes is a fast-spreading illness that worries millions of people around the globe. The number of people affected by type 2 diabetes is rising rapidly, and there are no effective diagnostic systems for controlling the disease. According to global health statistics, the rate of type 2 diabetes is higher in Western countries, and the cost of treatment is increasing. There are no effective methods to eradicate diabetes, which motivates an investigative study of the disease.

In existing reviews, researchers use data analysis approaches to link the causes of diabetes to patients' diet, lifestyle, inheritance details, age, medical history and similar factors in order to identify the root cause of the problem. Using multiple key factors and historical datasets, some data mining tools have been developed to generate new rules on the root cause of the disease and to discover new knowledge from past data, but their accuracy was not promising.

The main objective of this paper is to carry out a detailed literature review, design a conceptual data mining method at an initial stage, and implement it to improve accuracy compared with other classifiers. In this research, two data mining algorithms were proposed at the conceptual level, Self-Organizing Map (SOM) and Random Forest, and applied to adult population datasets. The dataset used in this research is from the UCI Machine Learning Repository: Diabetes Dataset. The paper discusses data mining algorithms and evaluates the implementation results. In the performance evaluation, self-organizing maps diagnosed diabetes with better accuracy than Random Forest and other data mining algorithms such as naïve Bayes, decision tree, SVM and MLP. In the future, once the system is implemented, it can be integrated with a diabetic detector device for faster diagnosis of type 2 diabetes.
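The self-organizing map that performed best in this study can be sketched in a few lines. This minimal version uses a 1-D grid of nodes and picks the best-matching unit by Euclidean distance. It then pulls nearby nodes toward each input with a Gaussian neighbourhood whose radius and learning rate shrink linearly. The map size, schedules and seed are hypothetical, since the abstract does not give the paper's configuration.

```python
import random
from math import dist, exp


def best_matching_unit(weights, x):
    """Index of the node whose weight vector is closest to x."""
    return min(range(len(weights)), key=lambda j: dist(weights[j], x))


def train_som(data, n_nodes, epochs=100, lr0=0.5, seed=0):
    """Train a 1-D self-organizing map; returns one weight vector per node."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_nodes)]
    radius0 = n_nodes / 2
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                    # learning rate decays linearly
        radius = max(radius0 * (1 - frac), 0.5)  # neighbourhood shrinks over time
        for x in data:
            bmu = best_matching_unit(weights, x)
            for j in range(n_nodes):
                # Gaussian neighbourhood measured along the 1-D grid of nodes.
                h = exp(-((j - bmu) ** 2) / (2 * radius ** 2))
                weights[j] = [w + lr * h * (xi - w) for w, xi in zip(weights[j], x)]
    return weights
```

To use a trained map as a classifier, one common option is to label each node with the majority class of the training samples that map to it. That labelling step is omitted here.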

Anemia Selection in Pregnant Women by using Random prediction (Rp) Classification Algorithm

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b3016.078219 ◽

2019 ◽

Vol 8 (2) ◽

pp. 2623-2630 ◽

Cited By ~ 1

Keyword(s):

Data Mining ◽

Feature Selection ◽

Classification Algorithm ◽

Computational Time ◽

The Novel ◽

Data Mining Algorithms ◽

Novel Method ◽

Mining Algorithms ◽

Median Vector ◽

Selection Of

Anemia is a global hematological disorder that occurs in pregnancy. Data mining techniques make it possible to select features that carry previously unknown logical knowledge from large datasets. The paper evaluates the anemia classes Non-anemic, Mild, and Severe or moderate in a real-time, high-dimensional dataset.

In previous works, anemia was classified using a selection of approaches based on Artificial Neural Networks (ANN), Gausnominal classification and VectNeighbour classification. These studies achieve proper feature selection with good classification accuracy, but they take a long time to perform the feature selection. To reduce this computational time, the current paper presents an improved Median Vector Feature Selection (IMVFS) algorithm and a new RandomPrediction (RP) classification algorithm. Together they predict the anemia disease classes (Mild, Not anemic, and Severe and moderate).

The results show that the novel method is effective compared with our previous ANN, Gausnominal and VectNeighbour classification algorithms. The experimental results show that the proposed RandomPrediction (RP) classification with IMVFS feature selection clearly outperforms our previous methods.

Multi-Relational Data Mining A Comprehensive Survey

Improving Knowledge Discovery through the Integration of Data Mining Techniques - Advances in Data Mining and Database Management ◽

10.4018/978-1-4666-8513-0.ch003 ◽

2015 ◽

pp. 32-53

Author(s):

Ali H. Gazala ◽

Waseem Ahmad

Keyword(s):

Data Mining ◽

Search Space ◽

Data Representation ◽

Research Area ◽

Relational Data ◽

Future Research ◽

Relational Data Mining ◽

Data Mining Algorithms ◽

Table Data ◽

Mining Algorithms

Multi-Relational Data Mining (MRDM) is a growing research area that focuses on discovering hidden patterns and useful knowledge in relational databases. The vast majority of data mining algorithms and techniques look for patterns in a flat, single-table data representation. MRDM instead looks for patterns that involve multiple tables (relations) from a relational database. This sub-domain has received increased research attention during the last two decades because of its wide range of possible applications, and as a result many successful multi-relational data mining algorithms and techniques have been presented. This chapter presents a comprehensive review of multi-relational data mining. It discusses the different approaches researchers have followed to explore the relational search space and highlights some of the most significant challenges facing researchers in this sub-domain. The chapter also describes a number of MRDM systems developed during the last few years and discusses future research directions in this sub-domain.
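To show why single-table algorithms struggle with relational data, the sketch below flattens a one-to-many relation into one row per parent record. It does this by aggregating the child table into an order count, a total and a maximum. This technique is usually called propositionalization, and it is one approach found in the MRDM literature. The tables, column names and chosen aggregates are hypothetical, and the chapter's own taxonomy is not reproduced here.

```python
from collections import defaultdict


def propositionalize(customers, orders):
    """Flatten a one-to-many relation into one row per customer by
    aggregating the related orders (count, total, maximum)."""
    by_customer = defaultdict(list)
    for order in orders:
        by_customer[order["customer_id"]].append(order["amount"])
    flat = []
    for c in customers:
        amounts = by_customer.get(c["id"], [])
        flat.append({
            **c,
            "n_orders": len(amounts),
            "total_amount": sum(amounts),
            "max_amount": max(amounts, default=0),
        })
    return flat
```

Aggregation discards the individual child rows, so a pattern such as "two orders placed on the same day" becomes invisible to the single-table learner. Avoiding this loss is a main motivation for methods that search the relational space directly.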