Analysis study on R-Eclat algorithm in infrequent itemsets mining

There is rising interest in developing techniques for data mining. One of the important subfields of data mining is itemset mining, which consists of discovering interesting and useful patterns in transaction databases. In a big data environment, mining infrequent itemsets becomes more complicated because the dataset is huge. Infrequent itemset mining may provide valuable information in the knowledge mining process. The basic algorithms currently in wide use for infrequent itemset mining are derived from Apriori and FP-Growth. Eclat-based approaches to infrequent itemset mining have not yet been extensively explored. This paper addresses the discovery of infrequent itemsets in a transactional database based on the Eclat algorithm. To address this issue, the minimum support measure is defined as a weighted frequency of occurrence of an itemset in the analysed data. Preliminary experimental results illustrate that the Eclat-based algorithm is more efficient in mining dense data than sparse data.
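As a rough illustration of the Eclat idea mentioned in this abstract, the sketch below stores the database in vertical format (item to set of transaction ids) and obtains the support of an itemset by intersecting those sets. It is not the R-Eclat algorithm from the paper: the function names, the plain (unweighted) support count, the `max_support` cut-off, and the `max_size` limit are all assumptions made for the example.

```python
from itertools import combinations


def vertical_layout(transactions):
    """Map each item to the set of transaction ids (tidset) that contain it."""
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)
    return tidsets


def infrequent_itemsets(transactions, max_support, max_size=2):
    """Return itemsets that occur at least once but fewer than max_support times.

    Support is computed Eclat-style, by intersecting tidsets, rather than
    by rescanning the transactions. Candidates are enumerated exhaustively,
    which is only reasonable for toy data.
    """
    tidsets = vertical_layout(transactions)
    result = {}
    for size in range(1, max_size + 1):
        for combo in combinations(sorted(tidsets), size):
            tids = set.intersection(*(tidsets[item] for item in combo))
            if 0 < len(tids) < max_support:
                result[combo] = len(tids)
    return result


# Hypothetical toy database of four transactions.
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"a", "d"},
]
rare = infrequent_itemsets(transactions, max_support=2)
```

With this toy data, `rare` holds the itemsets whose tidset has exactly one transaction; itemsets that never occur are excluded because their intersection is empty.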

Overview of Tourism Data Mining in Big Data Environment

Proceedings of the 2016 7th International Conference on Education, Management, Computer and Medicine (EMCM 2016), 2017. DOI: 10.2991/emcm-16.2017.208

Author(s): Wenjie Xiao, Changguo Xiang

Keyword(s): Data Mining, Big Data, Data Environment

The Importance of Data Mining Technology to University Research in the Big Data Environment

Modern Industrial IoT, Big Data and Supply Chain - Smart Innovation, Systems and Technologies, 2021, pp. 249-254. DOI: 10.1007/978-981-33-6141-6_26

Author(s): Deng Meiling, Lei Guiping

Keyword(s): Data Mining, Big Data, University Research, Mining Technology, Data Environment

Mining of top-k high utility itemsets with negative utility

Journal of Intelligent & Fuzzy Systems, 2020, pp. 1-16. DOI: 10.3233/jifs-201357

Author(s): Rui Sun, Meng Han, Chunyan Zhang, Mingyao Shen, Shiyu Du

Keyword(s): Data Mining, Search Space, Experimental Results, Effective Algorithm, Memory Usage, Utility Value, Itemset Mining, High Utility, High Utility Itemsets

High utility itemset mining (HUIM) with negative utility is an emerging data mining task. However, setting the minimum utility threshold is always a challenge when mining high utility itemsets (HUIs) with negative items. Top-k HUIM methods are common, but they can only mine itemsets with positive items, and they miss itemsets when negative items are present. To solve this problem, we propose an effective algorithm called THN (Top-k High Utility Itemset Mining with Negative Utility). THN uses a strategy that automatically raises the minimum utility threshold. To avoid multiple scans of the database, it uses transaction merging and dataset projection. It prunes the search space with a redefined sub-tree utility value and a redefined local utility value. Experimental results on real datasets show that THN is efficient in terms of runtime and memory usage and has excellent scalability. Moreover, experiments show that THN performs particularly well on dense datasets.
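The idea of raising the minimum utility threshold automatically can be sketched with a size-k min-heap: once k itemsets have been collected, the smallest utility among them becomes the threshold that any further candidate must beat. The code below is only an illustration under stated assumptions and is not THN itself: it enumerates candidates exhaustively, omits transaction merging, dataset projection and the pruning bounds, and the transaction format (item mapped to its utility in that transaction, possibly negative) and all names are hypothetical.

```python
import heapq
from itertools import combinations


def itemset_utility(itemset, transactions):
    """Sum the utilities of the itemset over every transaction containing it."""
    total = 0
    for tx in transactions:
        if all(item in tx for item in itemset):
            total += sum(tx[item] for item in itemset)
    return total


def top_k_itemsets(transactions, k):
    """Return the k itemsets with the highest utility, best first."""
    items = sorted({item for tx in transactions for item in tx})
    heap = []                     # min-heap of (utility, itemset)
    min_util = float("-inf")      # threshold, raised as the heap fills
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            # Skip itemsets that occur in no transaction.
            if not any(all(i in tx for i in itemset) for tx in transactions):
                continue
            utility = itemset_utility(itemset, transactions)
            if utility <= min_util:
                continue          # cannot enter the current top-k
            heapq.heappush(heap, (utility, itemset))
            if len(heap) > k:
                heapq.heappop(heap)
            if len(heap) == k:
                min_util = heap[0][0]   # raise the threshold
    return sorted(heap, reverse=True)


# Hypothetical transactions; item "b" carries negative utility.
transactions = [
    {"a": 5, "b": -2, "c": 3},
    {"a": 4, "c": 1},
    {"b": -1, "c": 6},
]
top3 = top_k_itemsets(transactions, k=3)
```

Because the threshold starts at negative infinity, the user never has to choose a minimum utility by hand; it tightens on its own as better itemsets are found.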

Image Monitoring and Management of Hot Tourism Destination Based on Data Mining Technology in Big Data Environment

Microprocessors and Microsystems, 2021, Vol 80, pp. 103515. DOI: 10.1016/j.micpro.2020.103515

Author(s): Jian Zhang, Liyuan Dong

Keyword(s): Data Mining, Big Data, Tourism Destination, Mining Technology, Image Monitoring, Data Environment

Target Customer Selection Method Based on Data Mining in Big Data Environment

2017 International Conference on Smart Grid and Electrical Automation (ICSGEA), 2017. DOI: 10.1109/icsgea.2017.86

Author(s): Jicheng Li, Xinyue Huang

Keyword(s): Data Mining, Big Data, Selection Method, Data Environment

Research of network data mining based on reliability source under big data environment

Neural Computing and Applications, 2016, Vol 28 (S1), pp. 327-335. DOI: 10.1007/s00521-016-2349-x. Cited By ~ 2

Author(s): Jinhai Li, Youshi He, Yunlei Ma

Keyword(s): Data Mining, Big Data, Network Data, Data Environment

Information Visualization from the Perspective of Big Data Analysis and Fusion

Scientific Programming, 2021, Vol 2021, pp. 1-12. DOI: 10.1155/2021/8934632

Author(s): Xiang Lin

Keyword(s): Big Data, Data Analysis, Information Visualization, Big Data Analysis, Experimental Results, Data Sources, Visualization Technique, Data Volume, Data Environment, Information Association

In the big data environment, the visualization technique has been increasingly adopted to mine the data on library and information (L&I), with the diversification of data sources and the growth of data volume. The previous research into the information association of L&I visualization network rarely tries to construct such a network or explore the information association of the network. To overcome these defects, this paper explores the visualization of L&I from the perspective of big data analysis and fusion. Firstly, the authors analyzed the topology of the L&I visualization network and calculated the metrics for the construction of L&I visualization topology map. Next, the importance of meta-paths of the L&I visualization network was calculated. Finally, a complex big data L&I visualization network was established, and the associations between information nodes were analyzed in detail. Experimental results verify the effectiveness of the proposed algorithm.

DEVELOPING A PARALLEL CLASSIFIER FOR MINING IN BIG DATA SETS

IIUM Engineering Journal, 2021, Vol 22 (2), pp. 119-134. DOI: 10.31436/iiumej.v22i2.1541

Author(s): Ahad Shamseen, Morteza Mohammadi Zanjireh, Mahdi Bahaghighat, Qin Xin

Keyword(s): Data Mining, Big Data, Decision Tree, Main Memory, Experimental Results, Primary Data, Data Sets, Decision Tree Classifier, Vast Amount, Tree Classifier

Data mining is the extraction of information and its roles from a vast amount of data. It is one of the most important topics today, as massive amounts of data are generated and stored each day. This data holds useful information in different fields and attracts the attention of programmers and engineers. One of the primary data mining classification algorithms is the decision tree. Decision tree techniques have several advantages but also drawbacks. One of the main drawbacks is that the data must reside in main memory. SPRINT is a decision tree classifier that proposed a fix for this problem. In this paper, we develop a new parallel decision tree classifier that builds on SPRINT. Our experimental results show considerable improvements in runtime and memory requirements compared to the SPRINT classifier. The proposed classifier can be implemented in serial and parallel environments and can deal with big data.
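For context on what a SPRINT-style tree builder computes, the sketch below evaluates candidate split points for one numeric attribute in a single pass over its sorted (value, label) list, scoring each split with the Gini index. This reflects general knowledge of SPRINT (pre-sorted attribute lists and Gini-based splits), not the parallel classifier proposed in the paper; the function names and the toy age/risk data are assumptions, and the list here is held in memory purely for brevity.

```python
from collections import Counter


def gini(counts, total):
    """Gini index of a class-count histogram covering `total` records."""
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def best_split(values, labels):
    """Return (weighted_gini, threshold) of the best binary split.

    The attribute list is sorted once; class histograms for the records
    below and above the cursor are then updated incrementally.
    """
    records = sorted(zip(values, labels))
    n = len(records)
    below = Counter()
    above = Counter(labels)
    best = (float("inf"), None)
    for i in range(n - 1):
        value, label = records[i]
        below[label] += 1
        above[label] -= 1
        next_value = records[i + 1][0]
        if value == next_value:
            continue  # no valid threshold between equal values
        n_below = i + 1
        n_above = n - n_below
        score = (n_below * gini(below, n_below)
                 + n_above * gini(above, n_above)) / n
        if score < best[0]:
            best = (score, (value + next_value) / 2)
    return best


# Hypothetical attribute (age) and class label (risk).
ages = [23, 17, 43, 68, 32, 20]
risk = ["high", "high", "high", "low", "low", "high"]
score, threshold = best_split(ages, risk)
```

Because only the two running histograms are needed at the cursor, the same scan could in principle read the sorted list from disk, which is the property that lets this family of classifiers avoid keeping the whole dataset in main memory.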

Research on Execution of Civil Servants and Professional Ethics based on Data Mining Technique and Joint Modeling Analysis of Multiple Factors under Big Data Environment

International Journal of u- and e- Service Science and Technology, 2016, Vol 9 (5), pp. 257-270. DOI: 10.14257/ijunesst.2016.9.5.23

Author(s): Yang Du, Hongwei Wang

Keyword(s): Data Mining, Big Data, Professional Ethics, Joint Modeling, Civil Servants, Data Mining Technique, Multiple Factors, Mining Technique, Modeling Analysis, Data Environment

RECURSIVE JOIN PROCESSING IN BIG DATA ENVIRONMENT

Journal of Computer Science and Cybernetics, 2021, Vol 37 (2), pp. 107-122. DOI: 10.15625/1813-9663/37/2/15889

Author(s): Anh-Cang Phan, Thanh-Ngoan Trieu, Thuong-Cang Phan

Keyword(s): Big Data, Large Scale, Large Datasets, Experimental Results, Hierarchical Data, Efficient Approach, Intermediate Data, Incremental Computation, Data Environment, Over Time

In the era of information explosion, big data is receiving increased attention because of its important implications for the growth, profitability, and survival of modern organizations. However, it also poses many challenges in the way data is processed and queried over time. The join is one of the most common operations in data queries. In particular, a recursive join is a join type used to query hierarchical data, but it is considerably more complex and costly. Evaluating a recursive join in MapReduce takes several iterations, each consisting of two tasks: a join task and an incremental computation task. These tasks are expensive and reduce query performance on large datasets because they generate a large amount of intermediate data that is transmitted over the network. In this study, we propose a simple but efficient approach for big recursive joins in the Spark environment that halves the number of required iterations. This improvement significantly reduces the number of required tasks as well as the amount of intermediate data generated and transferred over the network. Our experimental results show that the improved recursive join is more efficient and faster than a traditional one on large-scale datasets.
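The two tasks per iteration described in this abstract can be illustrated with a plain-Python, single-machine sketch of the traditional (semi-naive) evaluation of a recursive join, here computing the transitive closure of an edge relation. It does not implement the paper's Spark approach or its halving of the iteration count; the function name and the toy edge set are assumptions made for the example.

```python
def recursive_join(edges):
    """Compute the transitive closure of `edges` by semi-naive iteration.

    Each iteration has two steps: a join of the newest tuples (delta)
    with the base relation, and an incremental step that keeps only the
    tuples not already known. Returns the closure and the number of
    iterations executed.
    """
    closure = set(edges)
    delta = set(edges)

    # Index the base relation by source node for the join step.
    by_source = {}
    for src, dst in edges:
        by_source.setdefault(src, set()).add(dst)

    iterations = 0
    while delta:
        iterations += 1
        # Join task: extend each new path (a, b) with every edge (b, c).
        joined = {(a, c) for (a, b) in delta for c in by_source.get(b, ())}
        # Incremental computation task: drop tuples seen before.
        delta = joined - closure
        closure |= delta
    return closure, iterations


# Hypothetical hierarchy: a chain 1 -> 2 -> 3 -> 4.
edges = {(1, 2), (2, 3), (3, 4)}
closure, iterations = recursive_join(edges)
```

In this baseline, each iteration extends the newest paths by a single edge, so the iteration count grows with the depth of the hierarchy. That count is the cost the abstract targets when it proposes halving the number of iterations.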
