Research on Data Mining Algorithm Based on Rough Set

2012 ◽  
Vol 433-440 ◽  
pp. 3340-3346 ◽  
Author(s):  
Yong Bin Yang

Building on an in-depth study of existing rough set and data mining technologies, this paper presents an improved algorithm that addresses the shortcomings of existing rough-set-based data mining algorithms. The algorithm takes the attribute core as the starting point of the reduction calculation, uses a filtered discernibility matrix as the basis for selecting candidate attributes, and uses the information entropy of the condition and decision attributes as heuristic information to find the minimal reduct of the decision information system. The improved algorithm overcomes the defects of the heuristic algorithm based on the discernibility matrix and shrinks the attribute search space, which improves the reduction speed.
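
In standard rough set terminology, the starting point described above is the attribute core, and the matrix it is read from is the discernibility matrix. The following is a minimal sketch of those two ingredients only, not the paper's algorithm: the decision table, the attribute names and the helper functions are all hypothetical, and the entropy heuristic for ranking candidate attributes is omitted.

```python
from itertools import combinations

# Hypothetical toy decision table: (condition attribute values, decision).
ATTRIBUTES = ["a", "b", "c"]
ROWS = [
    ({"a": 0, "b": 0, "c": 1}, "no"),
    ({"a": 0, "b": 1, "c": 1}, "yes"),
    ({"a": 1, "b": 0, "c": 0}, "yes"),
    ({"a": 0, "b": 0, "c": 0}, "no"),
]


def discernibility_entries(rows, attributes):
    """For each pair of objects with different decisions, collect the set of
    condition attributes on which the two objects differ."""
    entries = []
    for (x, dx), (y, dy) in combinations(rows, 2):
        if dx != dy:
            diff = frozenset(a for a in attributes if x[a] != y[a])
            if diff:  # an empty set would signal an inconsistent table
                entries.append(diff)
    return entries


def attribute_core(entries):
    """The core is the union of all singleton entries of the matrix."""
    core = set()
    for entry in entries:
        if len(entry) == 1:
            core |= entry
    return core


def uncovered_entries(entries, selected):
    """Filter the matrix: keep only entries not yet hit by a selected attribute.
    Candidate attributes for the next step come from these entries."""
    return [entry for entry in entries if not (entry & selected)]


entries = discernibility_entries(ROWS, ATTRIBUTES)
core = attribute_core(entries)
remaining = uncovered_entries(entries, core)

print(sorted(core))  # ['a', 'b']
print(remaining)     # [] : for this toy table the core is already a reduct
```

When `remaining` is not empty, a heuristic search would add one candidate attribute at a time, re-filtering the entries after each choice, until no entry is left uncovered.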

Author(s):  
Moloud Abdar ◽  
Sharareh R. Niakan Kalhori ◽  
Tole Sutikno ◽  
Imam Much Ibnu Subroto ◽  
Goli Arji

Heart diseases are among the nation’s leading causes of mortality and morbidity. Data mining techniques can predict the likelihood of patients developing heart disease. The purpose of this study is to compare different data mining algorithms for the prediction of heart diseases. This work applied and compared data mining techniques to predict the risk of heart diseases. After feature analysis, models based on five algorithms, namely decision tree (C5.0), neural network, support vector machine (SVM), logistic regression and k-nearest neighbor (KNN), were developed and validated. The C5.0 decision tree built the model with the greatest accuracy, 93.02%; KNN, SVM and the neural network reached 88.37%, 86.05% and 80.23% respectively. The results of the decision tree are easy to interpret and apply; its rules can be readily understood by different clinical practitioners.
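
As a rough illustration of how an accuracy figure of this kind is obtained, the sketch below classifies held-out records with a plain k-nearest-neighbor vote and reports the fraction predicted correctly. It does not reproduce the study: the two numeric features, the records, the labels and the function names are invented for illustration.

```python
import math
from collections import Counter

# Hypothetical records: (feature vector, label). Not data from the study.
TRAIN = [
    ((1, 1), "healthy"), ((1, 2), "healthy"), ((2, 1), "healthy"),
    ((8, 8), "disease"), ((8, 9), "disease"), ((9, 8), "disease"),
]
TEST = [
    ((2, 2), "healthy"), ((9, 9), "disease"),
    ((7, 8), "disease"), ((6, 6), "healthy"),
]


def knn_predict(train, query, k=3):
    """Majority vote among the k training records closest to the query
    (Euclidean distance)."""
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k=3):
    """Fraction of held-out records whose predicted label matches the true one."""
    correct = sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(test)


print(accuracy(TRAIN, TEST))  # 0.75 : the record at (6, 6) is misclassified
```

Comparing several algorithms then amounts to computing the same held-out accuracy for each model on the same split.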


2018 ◽  
Vol 7 (3.4) ◽  
pp. 13
Author(s):  
Gourav Bathla ◽  
Himanshu Aggarwal ◽  
Rinkle Rani

Data mining is one of the most researched fields in computer science. Many studies have been carried out to extract and analyse important information from raw data. Traditional data mining algorithms such as classification, clustering and statistical analysis can process small-scale data with great efficiency and accuracy. Social networking interactions, business transactions and other communications produce Big data, that is, data at a scale beyond the capacity of traditional data mining techniques. Traditional data mining algorithms are observed to be incapable of storing and processing data at this scale, and where some algorithms are capable, their response time is very high. Big data contains hidden information that, if analysed intelligently, can be highly beneficial for business organizations. In this paper, we analyse the advancement from traditional data mining algorithms to Big data mining algorithms. Applications of traditional data mining algorithms can be incorporated straightforwardly into Big data mining algorithms. Several studies have compared traditional data mining with Big data mining, but very few have analysed the most important algorithms within one research work, which is the core motive of our paper. Readers can easily observe the differences between these algorithms, along with their pros and cons. Mathematical concepts are applied in data mining algorithms: means and Euclidean distance calculation in K-means, vectors and margins in SVM, and Bayes' theorem and conditional probability in the Naïve Bayes algorithm are real examples. Classification and clustering are the most important applications of data mining. In this paper, K-means, SVM and Naïve Bayes are analysed in detail to observe accuracy and response time from both conceptual and empirical perspectives. Big data technologies such as Hadoop and MapReduce are used to implement Big data mining algorithms. Performance evaluation metrics such as speedup, scaleup and response time are used to compare traditional mining with Big data mining.
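
The role of means and Euclidean distance in K-means can be made concrete with a small single-machine sketch. The points, the initial centroids and the function names below are hypothetical; this is the textbook iteration, not the distributed implementation the paper evaluates.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


def assign(points, centroids):
    """Assignment step: index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda k: euclidean(p, centroids[k]))
        for p in points
    ]


def update(points, labels, centroids):
    """Update step: move each centroid to the mean of its assigned points."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == k]
        if not members:
            new_centroids.append(old)  # leave an empty cluster where it is
            continue
        new_centroids.append(
            tuple(sum(m[d] for m in members) / len(members) for d in range(len(old)))
        )
    return new_centroids


def kmeans(points, centroids, max_iter=100):
    """Alternate the two steps until the centroids stop moving."""
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new_centroids = update(points, labels, centroids)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


# Hypothetical 2-D points and starting centroids.
POINTS = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
centroids, labels = kmeans(POINTS, [(0.0, 0.0), (10.0, 10.0)])
print(centroids)  # [(1.0, 1.5), (8.5, 8.0)]
print(labels)     # [0, 0, 1, 1]
```

The assignment step is independent for every point, which is the property that lets it be spread across many workers in a MapReduce-style job.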


Author(s):  
Zhi-Hua Zhou

Data mining attempts to identify valid, novel, potentially useful, and ultimately understandable patterns from huge volumes of data. The mined patterns must be ultimately understandable because the purpose of data mining is to aid decision-making. If decision-makers cannot understand what a mined pattern means, the pattern cannot be used well. Since most decision-makers are not data mining experts, the patterns should ideally be in a style comprehensible to ordinary people. The comprehensibility of data mining algorithms, that is, the ability of a data mining algorithm to produce patterns understandable to human beings, is therefore an important factor.


Author(s):  
TZUNG-PEI HONG ◽  
CHAN-SHENG KUO ◽  
SHENG-CHAI CHI

Data mining is the process of extracting desirable knowledge or interesting patterns from existing databases for specific purposes. Most conventional data-mining algorithms identify the relationships among transactions using binary values. Transactions with quantitative values are, however, commonly seen in real-world applications. We previously proposed a fuzzy mining algorithm in which each attribute used only the linguistic term with the maximum cardinality in the mining process. The number of items was thus the same as the number of original attributes, which reduced the processing time. The fuzzy association rules derived in this way are, however, not complete. This paper therefore modifies that algorithm and proposes a new fuzzy data-mining algorithm for extracting interesting knowledge from transactions stored as quantitative values. The proposed algorithm can derive a more complete set of rules, but requires more computation time than the earlier method. A trade-off thus exists between computation time and the completeness of the rules. Choosing an appropriate learning method therefore depends on the requirements of the application domain.
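
The step of keeping only the linguistic term with the maximum cardinality can be sketched as follows. The membership functions (low, middle, high with breakpoints at 1, 6 and 11), the item names and the purchase quantities are assumptions made up for this example; the abstract does not specify them.

```python
def low(x):
    """Full membership up to 1, falling linearly to 0 at 6 (assumed shape)."""
    return 1.0 if x <= 1 else max(0.0, (6 - x) / 5)


def middle(x):
    """Triangle peaking at 6 and reaching 0 at 1 and 11 (assumed shape)."""
    return max(0.0, 1 - abs(x - 6) / 5)


def high(x):
    """Rising linearly from 0 at 6 to full membership at 11 (assumed shape)."""
    return 1.0 if x >= 11 else max(0.0, (x - 6) / 5)


TERMS = {"low": low, "middle": middle, "high": high}

# Hypothetical purchased quantities per item, one value per transaction.
QUANTITIES = {"milk": [7, 9, 10, 4], "bread": [1, 2, 3, 2]}


def term_cardinalities(values):
    """Scalar cardinality of each term: the sum of membership degrees
    over all transactions."""
    return {name: sum(mu(v) for v in values) for name, mu in TERMS.items()}


def dominant_term(values):
    """Return the term with the largest cardinality and that cardinality."""
    counts = term_cardinalities(values)
    best = max(counts, key=counts.get)
    return best, counts[best]


for item, values in QUANTITIES.items():
    term, count = dominant_term(values)
    print(item, term, round(count, 2))
```

With these made-up numbers the representative terms are `middle` for milk and `low` for bread, so each item contributes one fuzzy item to the mining step. Retaining more than one term per item would enlarge the item set, which is consistent with the trade-off between rule completeness and computation time described above.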


2015 ◽  
Vol 742 ◽  
pp. 395-398
Author(s):  
Chun Ping Wang

A feature of mining methods for large text data is that they avoid lexical and syntactic semantic analysis and instead process large volumes of text by statistical analysis, thereby largely ignoring literal differences between semantically similar expressions and adapting to the characteristics of network language. The results of our paper show that data mining algorithms can extract information that portrays the vocabulary characteristics of a specific user and can make recommendations based on those characteristics.


2014 ◽  
Vol 644-650 ◽  
pp. 1702-1705 ◽  
Author(s):  
Jin Hai Zhang

Because internet data is massive, diverse, heterogeneous, and dynamic, the storage and processing efficiency of traditional databases can no longer meet the requirements for analyzing it. This paper uses leading-edge distributed computing technology to address the inability of traditional data mining to handle massive data, improving a data mining algorithm on the Hadoop distributed computing platform. The approach can serve as a reference for implementing other data mining algorithms on Hadoop, and applying a richer set of data mining algorithms can uncover more value in the data.
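
The abstract does not say which algorithm was ported to Hadoop. As a generic illustration of the programming model that Hadoop's MapReduce engine provides, the sketch below simulates the map, shuffle and reduce phases in memory to count how many transactions contain each item, a typical first pass in frequent-pattern mining. It does not use the Hadoop API, and the transactions and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical transactions; on a cluster these would be split across nodes.
TRANSACTIONS = [
    ["milk", "bread"],
    ["milk", "eggs"],
    ["bread", "milk", "milk"],
]


def map_phase(transaction):
    """Emit (item, 1) once per distinct item in a transaction."""
    for item in set(transaction):
        yield item, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum the counts collected for one item."""
    return key, sum(values)


def item_support(transactions):
    """Run the three phases in sequence and return item -> transaction count."""
    pairs = (pair for t in transactions for pair in map_phase(t))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


support = item_support(TRANSACTIONS)
print(sorted(support.items()))  # [('bread', 2), ('eggs', 1), ('milk', 3)]
```

Because each call to `map_phase` touches a single transaction and each call to `reduce_phase` touches a single key, both phases can run in parallel on separate machines.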


Author(s):  
Sinan Adnan Diwan Alalwan

Diabetes is a fast-spreading illness that worries millions of people around the globe. The number of people affected by type 2 diabetes is rapidly increasing, and there are no effective diagnostic systems to control the disease. According to global health statistics, the proportion of the population affected by type 2 diabetes is higher in western countries, and the cost of treatment is increasing. There are no effective methods to eradicate diabetes, which motivates an investigative study of the disease. In existing reviews, researchers use data analysis approaches to link the causes of diabetes with patients' diet, lifestyle, inheritance details, age, medical history, etc., in order to identify the root cause of the problem. Using multiple key factors and historical datasets, some data mining tools were developed to generate new rules on the root cause of the disease and discover new knowledge from past data, but their accuracy was not promising. The main objective of this paper is to carry out a detailed literature review, design a conceptual data mining method at the initial stage, and implement it to improve result accuracy compared to other classifiers. In this research, two data mining algorithms were proposed at the conceptual level, Self-Organizing Map (SOM) and Random Forest, and applied to adult population datasets. The dataset used for this research is the Diabetes Dataset from the UCI Machine Learning Repository. In this paper, data mining algorithms are discussed and implementation results are evaluated. Based on the performance evaluation, self-organizing maps diagnosed diabetes with better accuracy than Random Forest and other data mining algorithms such as naïve Bayes, decision tree, SVM and MLP. In future, once the system is implemented, it can be integrated with a diabetes detector device for faster diagnosis of type 2 diabetes.


2019 ◽  
Vol 8 (2) ◽  
pp. 2623-2630 ◽  

Anemia is a global hematological disorder that occurs in pregnancy. Data mining techniques make it possible to select features that capture previously unknown knowledge from large datasets. The paper evaluates the anemia feature classes Non-anemic, Mild, and Severe or Moderate on a real-time, large-dimensional dataset. In previous work, anemia was classified with a selection of approaches based on Artificial Neural Networks (ANN), Gausnominal classification and VectNeighbour classification. These previous studies achieved proper feature selection and classification accuracy but took a long time to perform the feature selection. To reduce the computational time of feature selection, the current paper presents an improved Median vector feature selection (IMVFS) algorithm and a new RandomPrediction (RP) classification algorithm to predict the anemia classes (Mild, Not anemic, and Severe and moderate) based on data mining algorithms. The results show that the novel method is effective compared with our previous ANN, Gausnominal and VectNeighbour classification algorithms. The experimental results show that the proposed RandomPrediction (RP) classification with IMVFS feature selection clearly outperforms our previous methods.


Author(s):  
Ali H. Gazala ◽  
Waseem Ahmad

Multi-Relational Data Mining (MRDM) is a growing research area that focuses on discovering hidden patterns and useful knowledge from relational databases. While the vast majority of data mining algorithms and techniques look for patterns in a flat single-table data representation, the sub-domain of MRDM looks for patterns that involve multiple tables (relations) from a relational database. This sub-domain has received increased research attention during the last two decades due to the wide range of possible applications. As a result of that growing attention, many successful multi-relational data mining algorithms and techniques have been presented. This chapter presents a comprehensive review of multi-relational data mining. It discusses the different approaches researchers have followed to explore the relational search space, while highlighting some of the most significant challenges facing researchers working in this sub-domain. The chapter also describes a number of MRDM systems that have been developed during the last few years and discusses some future research directions in this sub-domain.

