Finding Persistent Strong Rules

Author(s): Anthony Scime, Karthik Rajasethupathy, Kulathur S. Rajasethupathy, Gregg R. Murray

Data mining is a collection of algorithms for finding interesting and previously unknown patterns or rules in data. However, different algorithms can produce different rules from the same data. The process presented here exploits these differences to find particularly robust, consistent, and noteworthy rules within much larger potential rule sets. More specifically, this research uses association rule mining and classification mining to select persistent strong association rules: association rules that can be verified by classification mining of the same data set. The process for finding persistent strong rules was executed against two data sets obtained from the American National Election Studies. Analysis of the first data set resulted in one persistent strong rule and one persistent rule, while analysis of the second data set resulted in 11 persistent strong rules and 10 persistent rules. The persistent strong rule discovery process suggests that these rules are the most robust, consistent, and noteworthy among the much larger potential rule sets.

Data Mining, 2013, pp. 28-49
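The abstract defines a persistent strong rule as an association rule that classification mining of the same data confirms, but it does not say how the two rule sets are matched. The Python sketch below assumes one hypothetical criterion: a classification rule confirms an association rule when it predicts the same consequent from a subset of the association rule's conditions, and a confirmed rule is kept only if its confidence meets an assumed threshold (`min_conf`). The names `Rule`, `is_confirmed`, and `persistent_strong_rules`, and the example attributes, are illustrative and not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    antecedent: frozenset  # conditions as (attribute, value) pairs
    consequent: tuple      # a single (attribute, value) pair
    confidence: float      # confidence (association) or accuracy (classification)


def is_confirmed(assoc: Rule, class_rules: list) -> bool:
    # Hypothetical matching criterion: same consequent, and every condition of
    # the classification rule also appears in the association rule's antecedent.
    return any(
        c.consequent == assoc.consequent and c.antecedent <= assoc.antecedent
        for c in class_rules
    )


def persistent_strong_rules(assoc_rules: list, class_rules: list, min_conf: float = 0.8) -> list:
    # Keep association rules confirmed by classification mining whose
    # confidence also meets the (assumed) strength threshold.
    return [
        r for r in assoc_rules
        if is_confirmed(r, class_rules) and r.confidence >= min_conf
    ]


# Hypothetical rules in the style of survey data.
assoc_rules = [
    Rule(frozenset({("interest", "high"), ("voted_before", "yes")}), ("votes", "yes"), 0.91),
    Rule(frozenset({("age", "18-29")}), ("votes", "no"), 0.62),
]
tree_rules = [Rule(frozenset({("voted_before", "yes")}), ("votes", "yes"), 0.88)]

result = persistent_strong_rules(assoc_rules, tree_rules)
```

Only the first association rule survives here, because no classification rule predicts `("votes", "no")`. A looser or stricter matching criterion would change which rules count as persistent.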

The demand for data mining is now unavoidable in the medical industry because of its many applications, including predicting diseases at an early stage. Data mining methods make it easy to extract useful patterns and to recognize task-based outcomes quickly. Classification models are especially useful for building classes from medical data sets for accurate future analysis. In addition, association rules are a promising technique for finding hidden patterns in medical data sets and have been applied successfully to market basket data, census data, and financial data. The Apriori algorithm, considered a classic algorithm, is useful for mining frequent itemsets in a database containing a large number of transactions and for deriving the relevant association rules. Association rules capture the relationships among items in a data set; however, when the data set contains continuous attributes, existing algorithms may not work, so discretization can be applied before mining association rules to find the relationships among patterns in the data set. In this research, Discretized Apriori is used to predict associated diseases (by-diseases) in people diagnosed with diabetic syndrome, and the extracted rules are analyzed. In the discretization step, numerical data is discretized and fed to the Apriori algorithm to obtain better association rules for predicting the diseases.
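The pipeline described above (discretize numeric attributes, then run Apriori and derive rules) can be sketched in Python as follows. The equal-width binning, the bin labels, the support and confidence thresholds, and the patient records are all assumptions for illustration; the abstract does not specify the discretization scheme the paper uses.

```python
from itertools import combinations


def discretize(value, low, high, labels=("low", "medium", "high")):
    # Equal-width binning over [low, high]; values outside the range are clamped.
    width = (high - low) / len(labels)
    index = int((value - low) / width)
    return labels[max(0, min(index, len(labels) - 1))]


def apriori(transactions, min_support):
    """Return {itemset: support} for every frequent itemset."""
    n = len(transactions)
    support = {}
    level = {frozenset([item]) for t in transactions for item in t}
    k = 1
    while level:
        counts = {c: sum(1 for t in transactions if c <= t) for c in level}
        frequent = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        support.update(frequent)
        k += 1
        # Join: union pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a, b in combinations(frequent, 2) if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        level = {c for c in candidates
                 if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
    return support


def association_rules(support, min_conf):
    found = []
    for itemset, sup in support.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                conf = sup / support[lhs]  # every subset of a frequent set is frequent
                if conf >= min_conf:
                    found.append((lhs, itemset - lhs, conf))
    return found


# Hypothetical patient records (glucose, BMI, hypertension) turned into items.
records = [(180, 34, "yes"), (175, 33, "yes"), (95, 22, "no"), (170, 38, "yes")]
transactions = [
    {f"glucose={discretize(g, 70, 200)}", f"bmi={discretize(b, 15, 45)}", f"hypertension={h}"}
    for g, b, h in records
]
rules = association_rules(apriori(transactions, min_support=0.5), min_conf=0.9)
```

Without the discretization step, every distinct glucose or BMI reading would be its own item and almost nothing would reach the support threshold, which is the problem the abstract describes for continuous attributes.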



Author(s): Wei Wang

At present, storage technology cannot save data completely. Therefore, in such a big data environment, data mining technology needs to be optimized for intelligent data. First, for massive intelligent data, association rules are used to describe the potential relationships between data items in the database. Itemsets are measured by support and confidence, and the itemsets that meet the minimum support are found; strong association rules are then obtained according to the minimum confidence given by the user. Second, to speed up the scanning of data items, an optimized association rule mining technique based on hashing and transaction compression is proposed. A hash function is used to count the itemsets in the candidate set, and itemsets whose count is below the minimum support are pruned; a transaction compression technique then deletes the items and transactions that are unrelated to the candidate itemsets, improving the efficiency of association rule processing. Experiments show that the optimized data mining technique can significantly improve the efficiency of obtaining valuable intelligent data.
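The hash-based pruning and transaction compression described above resemble the classic DHP (Direct Hashing and Pruning) approach to candidate 2-itemsets. The abstract gives neither the hash function nor the exact trimming rule, so the Python sketch below assumes a CRC32-based bucket function and a simple trimming rule; `bucket`, `n_buckets`, and `frequent_pairs` are illustrative names, not the paper's.

```python
import zlib
from collections import Counter
from itertools import combinations


def bucket(pair, n_buckets):
    # Hypothetical hash function: CRC32 of the sorted pair, modulo the table size.
    return zlib.crc32("|".join(sorted(pair)).encode()) % n_buckets


def frequent_pairs(transactions, min_count, n_buckets=7):
    item_counts = Counter(item for t in transactions for item in t)
    bucket_counts = Counter()
    # Pass 1: hash every 2-subset of every transaction into a bucket.
    for t in transactions:
        for pair in combinations(sorted(t), 2):
            bucket_counts[bucket(pair, n_buckets)] += 1
    frequent_items = {i for i, c in item_counts.items() if c >= min_count}
    # A pair stays a candidate only if both items are frequent and its bucket
    # count (an upper bound on the pair's true count) reaches min_count.
    candidates = {
        pair for pair in combinations(sorted(frequent_items), 2)
        if bucket_counts[bucket(pair, n_buckets)] >= min_count
    }
    # Transaction compression: drop items outside every candidate, then drop
    # transactions left with fewer than two items (they cannot hold a pair).
    useful = {i for pair in candidates for i in pair}
    trimmed = [t & useful for t in transactions]
    trimmed = [t for t in trimmed if len(t) >= 2]
    # Pass 2: exact counts of surviving candidates over the compressed database.
    pair_counts = Counter(
        pair for t in trimmed for pair in combinations(sorted(t), 2) if pair in candidates
    )
    frequent = {p: c for p, c in pair_counts.items() if c >= min_count}
    return frequent, trimmed


db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "d"}, {"a", "b", "d"}]
frequent, trimmed = frequent_pairs(db, min_count=3)
```

Because a bucket count can only overestimate a pair's count, the pruning never discards a truly frequent pair; collisions merely let some infrequent pairs through to the exact second pass.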


Data Mining, 2011, pp. 191-208
Author(s): Rafael S. Parpinelli, Heitor S. Lopes, Alex A. Freitas

This work proposes a rule discovery algorithm called Ant-Miner (Ant Colony-Based Data Miner), whose goal is to extract classification rules from data. The algorithm is based on recent research on the behavior of real ant colonies as well as on some data mining concepts. We compare the performance of Ant-Miner with that of the well-known C4.5 algorithm on six public-domain data sets. The results provide evidence that (a) Ant-Miner is competitive with C4.5 with respect to predictive accuracy, and (b) the rule sets discovered by Ant-Miner are simpler (smaller) than those discovered by C4.5.
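The ant-colony idea behind Ant-Miner can be pictured as pheromone trails over rule terms: terms used by good rules are reinforced, and terms are chosen with probability proportional to pheromone times a heuristic value. The Python sketch below illustrates only that feedback loop, using sensitivity × specificity as the rule quality (the measure described for the original Ant-Miner); the function names, the toy terms, and the flat heuristic values are assumptions, and rule construction, pruning, and the covering loop are omitted.

```python
import random


def rule_quality(tp, fp, fn, tn):
    # Sensitivity x specificity of a rule on the training cases.
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (fp + tn) if fp + tn else 0.0
    return sensitivity * specificity


def update_pheromone(tau, rule_terms, quality):
    # Reinforce the terms used by the rule in proportion to its quality,
    # then normalize so that unused terms effectively evaporate.
    new_tau = dict(tau)
    for term in rule_terms:
        new_tau[term] += new_tau[term] * quality
    total = sum(new_tau.values())
    return {term: value / total for term, value in new_tau.items()}


def choose_term(tau, eta, rng):
    # Pick a term with probability proportional to pheromone x heuristic value.
    terms = list(tau)
    weights = [tau[t] * eta[t] for t in terms]
    return rng.choices(terms, weights=weights, k=1)[0]


terms = [("outlook", "sunny"), ("outlook", "rain"), ("humidity", "high"), ("humidity", "normal")]
tau = {t: 1 / len(terms) for t in terms}  # uniform initial pheromone
eta = {t: 1.0 for t in terms}             # flat heuristic, for illustration only
q = rule_quality(tp=8, fp=2, fn=2, tn=8)
tau = update_pheromone(tau, [("outlook", "sunny"), ("humidity", "high")], q)
next_term = choose_term(tau, eta, random.Random(0))
```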


2013, Vol 284-287, pp. 3070-3073
Author(s): Duen Kai Chen

In this study, we report an intelligent system for voting behavior analysis based on data mining technology. Previous literature shows an increasing number of studies applying information technology to facilitate voting behavior analysis. In this study, we built a likely-voter identification model using data mining technology: the classification algorithm constructs a decision tree model to distinguish voters from nonvoters. The model is evaluated by its accuracy and by the number of attributes it uses to correctly identify likely voters. Our goal is to use only a small number of survey questions while maintaining the accuracy rates of other similar models. The model was built and tested on Taiwan's Election and Democratization Study (TEDS) data sets. According to the experimental results, the proposed model can improve the likely-voter identification rate, and this finding is consistent with previous studies based on the American National Election Studies.
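The abstract says only that the classification algorithm builds a decision tree. A common way such trees choose the most informative attribute, here a survey question, is information gain, as in ID3-style learners (C4.5 uses a normalized variant, the gain ratio). The Python sketch below assumes that criterion and uses made-up survey answers, so it illustrates the idea rather than the study's actual model.

```python
from collections import Counter
from math import log2


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attribute):
    # Entropy of the labels minus the weighted entropy after splitting on attribute.
    n = len(rows)
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [lab for r, lab in zip(rows, labels) if r[attribute] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


def best_question(rows, labels, attributes):
    # The survey question that best separates voters from nonvoters.
    return max(attributes, key=lambda a: information_gain(rows, labels, a))


# Hypothetical survey answers and turnout labels.
rows = [
    {"voted_last": "yes", "interest": "high"},
    {"voted_last": "yes", "interest": "low"},
    {"voted_last": "no", "interest": "high"},
    {"voted_last": "no", "interest": "low"},
]
labels = ["voter", "voter", "nonvoter", "nonvoter"]
best = best_question(rows, labels, ["voted_last", "interest"])
```

Ranking questions this way is one route to the study's goal of using few survey questions: questions with negligible gain are candidates for removal.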


2008, pp. 2105-2120
Author(s): Kesaraporn Techapichetvanich, Amitava Datta

Both visualization and data mining have become important tools for discovering hidden relationships in large data sets and for extracting useful knowledge and information from large databases. Even though many algorithms for mining association rules have been researched extensively in the past decade, they do not incorporate users into the association-rule mining process. Most of these algorithms generate a large number of association rules, some of which are not practically interesting. This chapter presents a new technique that integrates visualization into the association-rule mining process. Users can apply their knowledge and take part in finding interesting association rules through interactive visualization, receiving visual feedback as the algorithm generates association rules. In addition, users gain insight into and a deeper understanding of their data sets, as well as control over mining meaningful association rules.


Author(s): Jung Hwan Oh, Jeong Kyu Lee, Sae Hwang

Data mining, defined as the process of extracting previously unknown knowledge and detecting interesting patterns in a massive set of data, has been an active research area. As a result, several commercial products and research prototypes are now available. However, most of these studies have focused on corporate data, typically stored in an alphanumeric database, and relatively little work has addressed the mining of multimedia data (Zaïane, Han, & Zhu, 2000). Digital multimedia differs from previous forms of combined media in that the bits representing text, images, audio, and video can be treated as data by computer programs (Simoff, Djeraba, & Zaïane, 2002). One facet of these diverse data, in terms of their underlying models and formats, is that they are synchronized and integrated and hence can be treated as integrated data records. A collection of such integrated data records constitutes a multimedia data set. The challenge of extracting meaningful patterns from such data sets has led to research and development in multimedia data mining, a challenging field because of the unstructured nature of multimedia data. Such ubiquitous data is required in many applications, such as financial, medical, advertising, and Command, Control, Communications and Intelligence (C3I) applications (Thuraisingham, Clifton, Maurer, & Ceruti, 2001). Multimedia databases are widespread, and multimedia data sets are extremely large. Tools exist for managing and searching within such collections, but tools for extracting the hidden and useful knowledge embedded in multimedia data are becoming critical for many decision-making applications.

