Optimization of Intelligent Data Mining Technology in Big Data Environment

Author(s):  
Wei Wang ◽  

At present, storage technology cannot fully preserve all data. Therefore, in such a big data environment, data mining technology needs to be optimized for intelligent data. First, faced with massive intelligent data, the potential relationships between data items in the database are described by association rules. Data items are measured by support and confidence, and the item sets that satisfy the minimum support are found. Strong association rules are then obtained according to the user-specified minimum confidence. Second, to improve the speed of scanning data items, an optimized association data mining technique based on hashing and transaction compression is proposed. A hash function is used to count the item sets in the candidate set, and item sets whose count is below the minimum support are pruned; transaction compression is then used to delete items and transactions that are unrelated to the candidate item sets, which improves the efficiency of association rule processing. Experiments show that the optimized data mining technique significantly improves the efficiency of obtaining valuable intelligent data.
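The abstract does not specify the hash function or the exact compression rules. The Python sketch below illustrates the general idea in the style of the classic DHP (Direct Hashing and Pruning) approach, restricted to 2-itemsets for brevity. The function name `dhp_frequent_pairs`, the bucket count, and the trimming rule are assumptions for illustration, not the author's method. Because a bucket's count is an upper bound on the support of every pair hashed into it, pruning on bucket counts never discards a truly frequent pair; collisions only let some infrequent pairs through to the second pass.

```python
from collections import Counter
from itertools import combinations


def dhp_frequent_pairs(transactions, min_count, num_buckets=7):
    """DHP-style sketch (hypothetical helper): frequent 1- and 2-itemsets
    with hash-based pruning and transaction compression.
    Items must be mutually sortable (e.g. all strings)."""
    item_counts = Counter()
    buckets = [0] * num_buckets

    # Pass 1: count single items and hash every 2-itemset into a bucket.
    for t in transactions:
        items = sorted(set(t))
        item_counts.update(items)
        for pair in combinations(items, 2):
            buckets[hash(pair) % num_buckets] += 1

    frequent_items = {i for i, c in item_counts.items() if c >= min_count}

    # Candidate pairs: both items frequent and the pair's bucket count reaches
    # the threshold (a bucket count is an upper bound on each pair's support).
    candidates = {
        pair
        for pair in combinations(sorted(frequent_items), 2)
        if buckets[hash(pair) % num_buckets] >= min_count
    }

    # Transaction compression: drop infrequent items, then drop transactions
    # left with fewer than two items, since they cannot contain any pair.
    compressed = []
    for t in transactions:
        kept = sorted(set(t) & frequent_items)
        if len(kept) >= 2:
            compressed.append(kept)

    # Pass 2: count only the surviving candidates on the compressed data.
    pair_counts = Counter()
    for t in compressed:
        for pair in combinations(t, 2):
            if pair in candidates:
                pair_counts[pair] += 1

    frequent_pairs = {p: c for p, c in pair_counts.items() if c >= min_count}
    return frequent_items, frequent_pairs, len(compressed)
```

The function returns the frequent items, the frequent pairs with their counts, and the number of transactions that survived compression, so the effect of trimming on the second scan can be inspected directly.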

Data Mining ◽  
2013 ◽  
pp. 28-49
Author(s):  
Anthony Scime ◽  
Karthik Rajasethupathy ◽  
Kulathur S. Rajasethupathy ◽  
Gregg R. Murray

Data mining is a collection of algorithms for finding interesting and unknown patterns or rules in data. However, different algorithms can result in different rules from the same data. The process presented here exploits these differences to find particularly robust, consistent, and noteworthy rules among much larger potential rule sets. More specifically, this research focuses on using association rules and classification mining to select the persistently strong association rules. Persistently strong association rules are association rules that are verifiable by classification mining the same data set. The process for finding persistent strong rules was executed against two data sets obtained from the American National Election Studies. Analysis of the first data set resulted in one persistent strong rule and one persistent rule, while analysis of the second data set resulted in 11 persistent strong rules and 10 persistent rules. The persistent strong rule discovery process suggests these rules are the most robust, consistent, and noteworthy among the much larger potential rule sets.


2014 ◽  
Vol 543-547 ◽  
pp. 2040-2044
Author(s):  
Yan Bo Wang

With the rapid development of network and database technology, the volume of data to be processed has increased massively, and how to carry out effective data mining has become a serious problem. The maturing of granular computing algorithms provides new ideas and methods for data mining research. Association rule mining based on granular computing can reduce the number of scans of the data set and improve the efficiency of the algorithm. This paper introduces the data sources, classification, techniques, system structure, operating process, and applications of data mining technology in other fields. Based on granular-computing association rules, data mining technology can provide a quantitative basis for enterprises in screening and assessment, giving the organizations it serves a stronger competitive advantage and a sharper focus on their own problems.
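The abstract gives no implementation details. One common way to realize granule-based support counting is to record, in a single scan, the set of transactions that contain each item (its granule) and then obtain the support of any itemset by intersecting granules instead of rescanning the data set. The Python sketch below assumes granules are stored as integer bitmasks; `build_granules`, `support_count`, and `confidence` are hypothetical names, not taken from the paper.

```python
def build_granules(transactions):
    """Single scan: map each item to an integer bitmask whose bit `tid`
    is set when transaction `tid` contains the item."""
    granules = {}
    for tid, t in enumerate(transactions):
        for item in set(t):
            granules[item] = granules.get(item, 0) | (1 << tid)
    return granules


def support_count(itemset, granules):
    """Support count of a non-empty itemset: the number of set bits in the
    bitwise AND of its items' granules (no rescan of the data set)."""
    items = list(itemset)
    if not items:
        raise ValueError("itemset must be non-empty")
    mask = granules.get(items[0], 0)
    for item in items[1:]:
        mask &= granules.get(item, 0)
    return bin(mask).count("1")


def confidence(antecedent, consequent, granules):
    """Confidence of antecedent -> consequent; the antecedent must occur
    in at least one transaction, otherwise the division fails."""
    both = set(antecedent) | set(consequent)
    return support_count(both, granules) / support_count(antecedent, granules)
```

Only `build_granules` touches the transactions; every later support or confidence query is answered from the bitmasks, which is the scan-reduction effect the abstract attributes to granular computing.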


2021 ◽  
Vol 256 ◽  
pp. 02040
Author(s):  
Chunlei Zhou ◽  
Xinwei Dong ◽  
Liang Ji ◽  
Bijun Zhang ◽  
Zhongping Xu ◽  
...  

Traditional data mining algorithms focus too heavily on a single dimension of the data, either time or space, and ignore the association between time and space. This leads to heavy computation and low processing efficiency, and makes the final mining results difficult to guarantee. To address these problems, a hierarchical mining algorithm based on association rules is proposed for high-dimensional spatio-temporal big data. Building on traditional association rules, the algorithm first establishes association rules for spatio-temporal data and then removes redundancy from the data to be mined. After the locally linear embedding (LLE) algorithm is used to reduce the dimensionality of the data, a hierarchical mining strategy searches for frequent predicates to form a spatio-temporal transaction database, thereby realizing high-dimensional spatio-temporal big data mining. The simulation results verify that the algorithm has high complexity and can effectively reduce the processing volume, improving processing efficiency by at least 56.26% compared with other algorithms.


2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Sha Duan ◽  
Ziwei Wang

In the digital information age, data mining technology is increasingly used in libraries because of its practical value. In the context of big data, how to mine big data efficiently, extract features, and provide users with high-quality personalized services is one of the important issues facing big data applications in university libraries today. Brain computing is the computer simulation of the human brain's comprehensive information processing; it can analyze many kinds of information together and offers useful guidance for handling library service behavior. This paper briefly introduces the concepts and algorithms of data mining, studies in depth the classical association rule algorithm, Apriori, and analyzes the necessity and feasibility of applying data mining technology to university library management. The design idea and functional goals of the intelligent book recommendation system for colleges are based on the decision tree method and association rule analysis. Through research on applying data mining to personalized services in the university library, combined with practical work, this paper proposes association rule mining for the university library system. It further elaborates on the system architecture, data processing, mining algorithms, and application of the mining results. The experimental results are of some significance for university libraries seeking to explore personalized services, provide book recommendation services, and make decisions that optimize the library's collection layout.
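Apriori, the classical algorithm named in the abstract, grows frequent itemsets level by level with a join step and a prune step, and strong rules are then filtered by confidence. The compact Python sketch below assumes in-memory transactions and expresses support as an absolute count (which keeps confidence comparisons exact); the function names are illustrative, and this is not the recommendation system's implementation.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {frozenset(itemset): count} for every itemset contained in at
    least `min_count` transactions (support expressed as an absolute count)."""
    tsets = [frozenset(t) for t in transactions]

    def count(itemset):
        return sum(1 for t in tsets if itemset <= t)

    # Level 1: frequent single items.
    singles = {frozenset([i]) for t in tsets for i in t}
    counts = {s: count(s) for s in singles}
    current = {s for s, c in counts.items() if c >= min_count}
    frequent = {s: counts[s] for s in current}

    k = 2
    while current:
        # Join step: union pairs of frequent (k-1)-itemsets that yield k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        counts = {c: count(c) for c in candidates}
        current = {c for c, n in counts.items() if n >= min_count}
        frequent.update({c: counts[c] for c in current})
        k += 1
    return frequent


def strong_rules(frequent, min_confidence):
    """Rules A -> B with confidence = count(A | B) / count(A) >= min_confidence.
    Every antecedent is frequent (downward closure), so its count is known."""
    rules = []
    for itemset, n in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                a = frozenset(antecedent)
                conf = n / frequent[a]
                if conf >= min_confidence:
                    rules.append((a, itemset - a, conf))
    return rules
```

In a library setting, each transaction could be the set of books one reader borrowed, and a rule such as {A} -> {B} would suggest recommending B to readers of A; that mapping is an illustrative assumption.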


2021 ◽  
Author(s):  
Bin Wu ◽  
Yimin Mao ◽  
Deborah Simon Mwakapesa ◽  
Yaser Ahangari Nanehkaran ◽  
Qianhu Deng ◽  
...  

AR (association rule) mining is considered one of the main models of data mining. As datasets grow, conventional association rule algorithms become unsuitable for big data mining, which has prompted many researchers to pursue algorithmic innovation. This study designs an optimized parallel association rule mining algorithm based on MapReduce, named PMRARIM-IEG, to address the excessive space occupied by the CanTree (Canonical order Tree), the inability to set the support threshold dynamically, and the time-consuming data transmission in the Map and Reduce phases. First, a strategy called SIM-IE (similar items merging based on information entropy) is adopted to effectively reduce the space occupied by the CanTree. Then, DST-GA (dynamic support threshold obtaining using genetic algorithm) is proposed to obtain a relatively optimal dynamic support threshold in the big data environment. Finally, during MapReduce parallel processing, an LZO (Lempel-Ziv-Oberhumer) data compression strategy is used to compress the output of the Map stage, which speeds up data transmission. PMRARIM-IEG was compared with other algorithms on five datasets: Wikipedia, LiveJournal, com-amazon, kosarak, and webdocs. The experimental results demonstrate that PMRARIM-IEG not only reduces space and time complexity but also achieves a good speed-up ratio in a big data environment.
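The Map-stage compression step can be illustrated with a toy, single-process simulation of MapReduce item counting. This is a sketch under stated assumptions, not PMRARIM-IEG: it omits the CanTree, SIM-IE, and DST-GA components, runs the map and reduce functions in one process, assumes items are strings so they survive a JSON round trip, and uses Python's `zlib` as a stand-in for LZO because LZO is not in the standard library. The function names are hypothetical.

```python
import json
import zlib
from collections import Counter


def map_phase(partition):
    """Map task: count items in one data partition, then compress the
    serialized counts before they are shipped to the reducer.
    zlib stands in for LZO, which is not in the Python standard library."""
    local_counts = Counter(item for t in partition for item in set(t))
    payload = json.dumps(local_counts, sort_keys=True).encode("utf-8")
    return zlib.compress(payload)


def reduce_phase(compressed_outputs, min_count):
    """Reduce task: decompress each mapper's output, merge the counts,
    and keep items whose global count reaches `min_count`."""
    total = Counter()
    for blob in compressed_outputs:
        total.update(json.loads(zlib.decompress(blob).decode("utf-8")))
    return {item: c for item, c in total.items() if c >= min_count}
```

A usage pattern would be `blobs = [map_phase(p) for p in partitions]` followed by `reduce_phase(blobs, min_count)`. On tiny payloads compression may not shrink the data; the benefit the abstract reports concerns large Map outputs.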

