The MapReduce Model on Cascading Platform for Frequent Itemset Mining

Nur Rokhman; Amelia Nursanti

doi:10.22146/ijccs.34102

IJCCS (Indonesian Journal of Computing and Cybernetics Systems)

2018

Vol 12 (2)

pp. 149

Keyword(s): Large Scale, Frequent Itemset, Frequent Itemset Mining, Programming Models, Distributed Programming, Itemset Mining, Large Scale Data, MapReduce Model, Large Scale Data Processing, Scale Data

The implementation of parallel algorithms has recently become a very active research topic, because parallelism is well suited to large-scale data processing. MapReduce is one of the parallel and distributed programming models, but implementing parallel programs still faces many difficulties. Cascading offers a simpler scheme for building applications on Hadoop, which implements the MapReduce model. Frequent itemsets are sets of items that appear together most often in a dataset. Frequent Itemset Mining (FIM) requires complex computation and becomes a complicated problem when applied to large-scale data. This paper discusses the implementation of the MapReduce model on Cascading for FIM. The experiment uses the Amazon product co-purchasing network metadata dataset. The experiment shows that the simple mechanism of Cascading can be used to solve the FIM problem. It achieves a time complexity of O(n), more efficient than the nonparallel approach, which has complexity O(n²/m).
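The abstract does not spell out the Cascading flow itself, so the following is only a minimal pure-Python sketch of the map, shuffle, and reduce steps that MapReduce-based FIM generally relies on for support counting. It assumes each transaction is a list of product IDs (as in co-purchasing data) and that the candidates are all k-item subsets of each transaction. The function names are hypothetical and none of this is Cascading's API. In an actual Cascading assembly, the grouping and summing steps would typically be expressed with `GroupBy` and `Every` pipes running on Hadoop.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Iterator, List, Tuple

Itemset = Tuple[str, ...]


def map_transaction(transaction: Iterable[str], k: int) -> Iterator[Tuple[Itemset, int]]:
    """Map phase: emit (itemset, 1) for every k-item subset of one transaction."""
    items = sorted(set(transaction))  # dedupe and sort so each itemset has one canonical key
    for itemset in combinations(items, k):
        yield itemset, 1


def shuffle(pairs: Iterable[Tuple[Itemset, int]]) -> Dict[Itemset, List[int]]:
    """Shuffle phase: group emitted values by key, as the framework does between map and reduce."""
    groups: Dict[Itemset, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key: Itemset, values: List[int]) -> Tuple[Itemset, int]:
    """Reduce phase: sum the partial counts for one itemset."""
    return key, sum(values)


def frequent_itemsets(transactions: List[List[str]], k: int, min_support: int) -> Dict[Itemset, int]:
    """Run map -> shuffle -> reduce, then keep itemsets meeting the absolute support threshold."""
    emitted = (pair for t in transactions for pair in map_transaction(t, k))
    grouped = shuffle(emitted)
    counts = dict(reduce_counts(key, vals) for key, vals in grouped.items())
    return {s: c for s, c in counts.items() if c >= min_support}


if __name__ == "__main__":
    baskets = [
        ["book", "dvd", "cd"],
        ["book", "dvd"],
        ["book", "cd"],
        ["dvd", "cd", "book"],
    ]
    print(frequent_itemsets(baskets, k=2, min_support=3))
    # {('book', 'cd'): 3, ('book', 'dvd'): 3}
```

Enumerating every k-subset is the simplest possible candidate generation. Apriori-style pruning, which only extends itemsets that were already frequent at size k-1, would reduce the number of keys the map phase emits.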

Efficient Large Scale Frequent Itemset Mining with Hybrid Partitioning Approach

International Journal of Scientific Research in Computer Science Engineering and Information Technology

10.32628/cseit1952206

2019

pp. 845-852

Author(s): Priyanka R.; Mohammed Ibrahim M.; Ranjith Kumar M.

Keyword(s): Large Scale, Customer Segmentation, Frequent Itemset, Frequent Itemset Mining, Frequent Patterns, Itemset Mining, Large Scale Data, Player Tracking, Frequent Items, Scale Data

In today’s world, voluminous data are generated from various sources in various forms. Mining or analyzing this large-scale data efficiently enough to make it useful is difficult with existing approaches. Frequent itemset mining is one such analysis technique, used in fields such as finance and health care, where the main focus is finding frequent patterns and grouping them meaningfully in order to gain useful insights from the data. Major applications include customer segmentation in marketing, shopping cart analysis, relationship management, web usage mining, and player tracking. Several parallel algorithms, such as Dist-Eclat and BigFIM, are available for large-scale frequent itemset mining. In the Dist-Eclat algorithm, datasets are partitioned using a Round Robin technique, which uses a hybrid partitioning approach that can improve the overall efficiency of the system. The system works as follows: first, the collected data are distributed by MapReduce. Then the local frequent k-itemsets are computed using an FP-Tree and sent to the map phase. Next, the mining results are combined at the central node. Finally, the global frequent itemsets are gathered by MapReduce. The proposed system is expected to improve efficiency by applying the hybrid partitioning approach to the datasets based on the identification of frequent items.
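The pipeline described in this abstract (distribute the data, mine local frequent k-itemsets per partition, merge at a central node, then gather the global frequent itemsets) resembles the classic two-pass partition scheme known from the SON algorithm. The sketch below illustrates that scheme under stated assumptions and is not the authors' system. It uses plain round-robin partitioning only, because the abstract does not define the hybrid rule, and it substitutes direct subset counting for the FP-Tree step. The function names and the fractional `min_frac` support threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import ceil
from typing import Dict, List, Set, Tuple

Itemset = Tuple[str, ...]
Transaction = List[str]


def round_robin_partition(transactions: List[Transaction], n_parts: int) -> List[List[Transaction]]:
    """Deal transactions to partitions in turn: t0 -> p0, t1 -> p1, ..., wrapping around."""
    parts: List[List[Transaction]] = [[] for _ in range(n_parts)]
    for i, t in enumerate(transactions):
        parts[i % n_parts].append(t)
    return parts


def count_k_itemsets(transactions: List[Transaction], k: int) -> Counter:
    """Count every k-item subset (a stand-in for the FP-Tree mining step)."""
    counts: Counter = Counter()
    for t in transactions:
        counts.update(combinations(sorted(set(t)), k))
    return counts


def local_candidates(part: List[Transaction], k: int, min_frac: float) -> Set[Itemset]:
    """Pass 1, on each node: itemsets frequent within this partition alone."""
    threshold = ceil(min_frac * len(part))
    return {s for s, c in count_k_itemsets(part, k).items() if c >= threshold}


def global_frequent(transactions: List[Transaction], k: int, min_frac: float,
                    n_parts: int) -> Dict[Itemset, int]:
    parts = round_robin_partition(transactions, n_parts)
    # Central node merges local results. With a proportional local threshold, any
    # globally frequent itemset is locally frequent in at least one partition.
    candidates: Set[Itemset] = set()
    for p in parts:
        candidates |= local_candidates(p, k, min_frac)
    # Pass 2: recount only the candidates across all partitions.
    totals: Counter = Counter()
    for p in parts:
        for s, c in count_k_itemsets(p, k).items():
            if s in candidates:
                totals[s] += c
    threshold = ceil(min_frac * len(transactions))
    return {s: c for s, c in totals.items() if c >= threshold}


if __name__ == "__main__":
    baskets = [["a", "b", "c"], ["a", "b"], ["a", "c"],
               ["b", "c"], ["a", "b", "c"], ["a", "b"]]
    print(global_frequent(baskets, k=2, min_frac=0.5, n_parts=2))
    # {('a', 'b'): 4, ('a', 'c'): 3, ('b', 'c'): 3}
```

The second pass keeps the result exact. Local mining only proposes candidates, and the merged global counts decide which itemsets are frequent, so the answer does not depend on how the transactions were partitioned.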

Frequent Itemset Mining in Large Datasets a Survey

International Journal of Information Retrieval Research

10.4018/ijirr.2017100103

2017

Vol 7 (4)

pp. 37-49

Author(s): Amrit Pal; Manish Kumar

Keyword(s): Large Scale, Parallel Implementation, Complete Information, Frequent Itemset, Frequent Itemset Mining, Data Parallel, Itemset Mining, Large Scale Data, Day By Day, Scale Data

Frequent itemset mining is a well-known area of data mining. Most of the available frequent itemset mining techniques require complete information about the data, from which association rules can be generated. The amount of data is increasing day by day and taking the form of big data, which requires changes to the algorithms that work on such large-scale data. Parallel implementation of the mining techniques can provide solutions to this problem. This paper surveys frequent itemset mining techniques that can be used in a parallel environment. Programming models such as MapReduce provide an efficient architecture for working with big data, and the paper also discusses the issues and feasibility of implementing these techniques in such an environment.

GMiner: A fast GPU-based frequent itemset mining method for large-scale data

Information Sciences

10.1016/j.ins.2018.01.046

2018

Vol 439-440

pp. 19-38

Cited By ~ 11

Author(s): Kang-Wook Chon; Sang-Hyun Hwang; Min-Soo Kim

Keyword(s): Large Scale, Frequent Itemset, Frequent Itemset Mining, Mining Method, Itemset Mining, Large Scale Data, Scale Data

Teaching large scale data processing

Proceedings of the 1st ACM Summit on Computing Education in China (SCE '08)

10.1145/1517632.1517635

2008

Author(s): Kang Chen; Yubing Yin; Weimin Zheng

Keyword(s): Data Processing, Large Scale, Large Scale Data, Large Scale Data Processing, Scale Data

Advanced monitoring techniques for a large‐scale data‐processing network

Campus-Wide Information Systems

10.1108/10650740810921448

2008

Vol 25 (5)

pp. 287-300

Cited By ~ 1

Author(s): B. Martin; A. Al‐Shabibi; S.M. Batraneanu; Ciobotaru; G.L. Darlea; ...

Keyword(s): Data Processing, Large Scale, Monitoring Techniques, Large Scale Data, Large Scale Data Processing, Processing Network, Scale Data

Large scale data processing in real world: From analytics to predictions

2014 14th International Conference on Advances in ICT for Emerging Regions (ICTer)

10.1109/icter.2014.7083870

2014

Author(s): Srinath Perera

Keyword(s): Data Processing, Real World, Large Scale, Large Scale Data, Large Scale Data Processing, Scale Data

BestPeer++: A Peer-to-Peer Based Large-Scale Data Processing Platform

IEEE Transactions on Knowledge and Data Engineering

10.1109/tkde.2012.236

2014

Vol 26 (6)

pp. 1316-1331

Cited By ~ 6

Author(s): Gang Chen; Tianlei Hu; Dawei Jiang; Peng Lu; Kian-Lee Tan; ...

Keyword(s): Data Processing, Large Scale, Peer To Peer, Large Scale Data, Large Scale Data Processing, Processing Platform, Scale Data

Medimate: Ailment Diffusion Control System with Real Time Large Scale Data Processing

International Journal of Engineering & Technology

10.14419/ijet.v7i3.31.18233

2018

Vol 7 (2.31)

pp. 240

Author(s): S Sujeetha; Veneesa Ja; K Vinitha; R Suvedha

Keyword(s): Control System, Data Processing, Real Time, Large Scale, Diffusion Control, QR Code, Healthcare Applications, Large Scale Data, Large Scale Data Processing, Scale Data

In the existing scenario, a patient has to go to the hospital to take the necessary tests, consult a doctor, and buy the prescribed medicines, or else use dedicated healthcare applications. Hence time is wasted at hospitals and in medical shops, and healthcare applications do not offer face-to-face interaction with the doctor. These shortcomings can be addressed by Medimate: an ailment diffusion control system with real-time large-scale data processing. The purpose of Medimate is to establish a teleconference medical system that can be used in remote areas, and it is configured for better diagnosis and medical treatment of rural people. The system is equipped with a heart beat sensor, a temperature sensor, an ultrasonic sensor, and a load cell to monitor the patient’s health parameters. Voice instructions are updated for easier access. An application enabling video and voice communication with the doctor through a camera and headphones is installed at both ends. The doctor examines the patient and prescribes the medicines, and the medical dispenser delivers medicine to the patient as per the prescription. Medimate generates a QR code for each prescription, and that QR code can be reused for recurring medical conditions in the future. Medical details are updated on the server periodically.

Towards Heterogeneous Network Alignment: Design and Implementation of a Large-Scale Data Processing Framework

Lecture Notes in Computer Science - Euro-Par 2018: Parallel Processing Workshops

10.1007/978-3-030-10549-5_54

2018

pp. 692-703

Cited By ~ 1

Author(s): Marianna Milano; Pierangelo Veltri; Mario Cannataro; Pietro H. Guzzi

Keyword(s): Data Processing, Heterogeneous Network, Large Scale, Network Alignment, Design And Implementation, Large Scale Data, Large Scale Data Processing, Scale Data, Processing Framework

Integration of large-scale data processing systems and traditional parallel database technology

Proceedings of the VLDB Endowment

10.14778/3352063.3352145

2019

Vol 12 (12)

pp. 2290-2299

Author(s): Azza Abouzied; Daniel J. Abadi; Kamil Bajda-Pawlikowski; Avi Silberschatz

Keyword(s): Data Processing, Large Scale, Parallel Database, Large Scale Data, Database Technology, Large Scale Data Processing, Scale Data
