mapreduce model
Recently Published Documents


TOTAL DOCUMENTS

117
(FIVE YEARS 32)

H-INDEX

6
(FIVE YEARS 2)

2022 ◽  
Vol 16 (3) ◽  
pp. 1-26
Author(s):  
Jerry Chun-Wei Lin ◽  
Youcef Djenouri ◽  
Gautam Srivastava ◽  
Yuanfa Li ◽  
Philip S. Yu

High-utility sequential pattern mining (HUSPM) has been a hot research topic in recent decades because it combines sequential and utility properties to reveal more information and knowledge than traditional frequent itemset mining or sequential pattern mining. Several HUSPM approaches have been presented, but most of them rely on main memory to speed up mining. This assumption is unrealistic in large-scale environments: in real industrial settings, the collected data is so large that it cannot fit into the main memory of a single machine. In this article, we first develop a parallel and distributed three-stage MapReduce model for mining high-utility sequential patterns from large-scale databases. Two properties are then developed to ensure the correctness and completeness of the discovered patterns in the developed framework. In addition, two data structures, called sidset and utility-linked list, are used in the developed framework to accelerate the computation for mining the required patterns. The results show that the designed model performs well on large-scale datasets in terms of runtime, memory, efficiency with respect to the number of distributed nodes, and scalability compared to the serial HUSP-Span approach.


Author(s):  
Uttama Garg

The amount of data in today’s world is increasing exponentially, and analyzing Big Data effectively is a very complex task. The MapReduce programming model, created by Google in 2004, revolutionized the big-data computing market. Today the model is widely used for scientific and research analysis as well as for commercial purposes. However, MapReduce is a fairly low-level programming model and has many limitations. Active research is under way to build models that overcome these limitations. In this paper we study some popular data analytic models that address some of the limitations of MapReduce, namely ASTERIX and Pregel (Giraph). We discuss these models briefly and, through the discussion, highlight how they overcome MapReduce’s limitations.
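To make the "low-level" character of the model concrete, here is a minimal single-process sketch of the three phases (map, shuffle, reduce) applied to word counting. It is not taken from the paper; the function names `map_phase`, `shuffle_phase`, `reduce_phase`, and `word_count` are hypothetical, and a real framework would run the map and reduce calls on separate nodes.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group emitted values by key, as a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values seen for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # Every computation has to be phrased as key/value pairs flowing
    # through these three steps, which is why the model feels low-level.
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big models"]))
```

Even this trivial task needs explicit key/value plumbing, which is the kind of overhead that higher-level systems aim to hide.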


2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Jerry Chun-Wei Lin ◽  
Youcef Djenouri ◽  
Gautam Srivastava ◽  
Philippe Fournier-Viger

In recent years, high-utility itemset mining (HUIM) has been investigated extensively and studied in many applications, especially market-basket analysis and related applications. Since current market-basket scenarios also involve IoT equipment, i.e., sensors or smart devices, to collect information, it is necessary to consider the mining of high-utility itemsets (HUIs) in large-scale databases, especially in IoT situations. In this work, a GA-based MapReduce model known as GMR-Miner is presented for mining closed patterns with high utility in large-scale databases. The k-means model is first adopted to group transactions by their correlation based on the frequency factor. A genetic algorithm (GA) is then used in the developed MapReduce framework to explore potential candidates in a limited time. The developed 3-tier MapReduce model can also be easily deployed in Spark to handle any large-scale database for knowledge discovery of closed patterns with high utility. We created extensive experimental environments to evaluate the developed GMR-Miner against the well-known, state-of-the-art CLS-Miner. Our in-depth results show that GMR-Miner outperforms CLS-Miner on many criteria, i.e., memory usage, scalability, and runtime.
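As background for the abstract above, the following sketch shows the standard definition of itemset utility (quantity times unit profit, summed over the transactions that contain the whole itemset), computed in a map/reduce style over partitions. It is not GMR-Miner: there is no k-means grouping, genetic search, or closedness check, and the data and names (`TRANSACTIONS`, `UNIT_PROFIT`, `map_utility`, `itemset_utility`) are hypothetical.

```python
# Hypothetical toy data: each transaction maps an item to its purchased quantity.
TRANSACTIONS = [
    {"a": 2, "b": 1},
    {"a": 1, "c": 3},
    {"a": 1, "b": 2, "c": 1},
    {"b": 4},
]
# Hypothetical unit profit per item.
UNIT_PROFIT = {"a": 5, "b": 2, "c": 1}


def map_utility(partition, itemset):
    """Map: utility contributed by one partition of the database."""
    total = 0
    for transaction in partition:
        # Only transactions containing every item of the itemset count.
        if all(item in transaction for item in itemset):
            total += sum(transaction[item] * UNIT_PROFIT[item] for item in itemset)
    return total


def itemset_utility(transactions, itemset, n_partitions=2):
    """Split the database, map over the partitions, then reduce by summation."""
    partitions = [transactions[i::n_partitions] for i in range(n_partitions)]
    return sum(map_utility(partition, itemset) for partition in partitions)


if __name__ == "__main__":
    print(itemset_utility(TRANSACTIONS, ("a", "b")))
```

Because utility is a plain sum over transactions, the result does not depend on how the database is partitioned, which is what makes this step easy to distribute.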


2021 ◽  
Vol 11 (3) ◽  
pp. 190-198
Author(s):  
Sharafadeen Muhammad ◽  
Ibrahim Kabiru Dahiru ◽  
Ahmad Abubakar ◽  
Muhammad Sanusi Ibrahim

The emergence of large amounts of data requires efficient processing and storage facilities. Cloud computing provides an effective solution; the MapReduce programming paradigm, as implemented in Hadoop, can handle such data, but it brings conflicting challenges in terms of the Service Level Agreement (SLA) between major stakeholders. This paper develops a MapReduce model through system identification in order to keep the service time within a defined threshold, and thereby meet the SLA, in the presence of uncertainties in the system. A second-order nonlinear model was obtained, which represents the real system well and could be used to develop control laws for the real system.


Electronics ◽  
2021 ◽  
Vol 10 (5) ◽  
pp. 532
Author(s):  
Lan Huang ◽  
Teng Gao ◽  
Dalin Li ◽  
Zihao Wang ◽  
Kangping Wang

FPGAs have recently played an increasingly important role in heterogeneous computing, but Register Transfer Level design flows are not only inefficient but also require designers to be familiar with the circuit architecture. High-level synthesis (HLS) allows developers to design FPGA circuits more efficiently with a more familiar programming language, a higher level of abstraction, and automatic adaptation of timing constraints. When using HLS tools, such as Xilinx Vivado HLS, specific design patterns and techniques are required in order to create high-performance circuits. Moreover, designing efficient concurrency and data flow structures requires a deep understanding of the hardware, imposing more learning costs on programmers. In this paper, we propose a set of functional pattern libraries based on the MapReduce model, implemented with C++ templates, which can quickly implement high-performance parallel pipelined computing models on FPGA from a few simple parameters. This pattern library allows flexible adaptation of parallel and flow structures in algorithms, which greatly improves coding efficiency. The contributions of this paper are as follows. (1) Four standard functional operators suitable for hardware parallel computing are defined. (2) Functional concurrent programming patterns are described based on C++ templates and Xilinx HLS. (3) The efficiency of this programming paradigm is verified with two algorithms of different complexity.


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
C. Lakshmi ◽  
K. UshaRani

Purpose: To present the resilient distributed processing technique (RDPT), in which the mapper and reducer are simplified with Spark contexts and support distributed parallel query processing. Design/methodology/approach: The proposed work is implemented in Pig Latin with Spark contexts to develop query processing in a distributed environment. Findings: Query processing in Hadoop relies on distributed processing with the MapReduce model. MapReduce distributes the work across different nodes through the implementation of complex mappers and reducers. Its results are valid only up to a certain data size. Originality/value: Pig supports the required parallel processing framework with the following constructs during query processing: FOREACH, FLATTEN, and COGROUP.


2021 ◽  
Vol 348 ◽  
pp. 01003
Author(s):  
Abdullayev Vugar Hacimahmud ◽  
Ragimova Nazila Ali ◽  
Khalilov Matlab Etibar

The volume of information in the 21st century is growing at a rapid pace, and big data technologies are used to process it. This article discusses the use of big data technologies to monitor social processes. Big data has its own characteristics and principles, which are described here, and we also discuss big data applications in several areas. Particular attention is paid to the interaction between big data and sociology; to this end, digital sociology and the computational social sciences are considered. Social processes are one of the main objects of study in sociology. The article presents the types of social processes and their monitoring and, as an example, implements the monitoring of social processes at a university. The following technologies are used to implement this monitoring: 1010data products (1010edge, 1010connect, 1010reveal, 1010equities), Apache Software Foundation products (Apache Hive, Apache Chukwa, Apache Hadoop, Apache Pig), the MapReduce framework, the R language, the Pandas library, NoSQL, etc. Among these, the article examines the use of the MapReduce model for monitoring social processes at the university.


IEEE Access ◽  
2021 ◽  
pp. 1-1
Author(s):  
Asmaa G. Seliem ◽  
Hesham F. A. Hamed ◽  
Wael Abouelwafa

Author(s):  
A. N. M. Bazlur Rashid ◽  
Tonmoy Choudhury

Real-world large-scale optimisation problems often result in local optima due to their large search space and complex objective function. Hence, traditional evolutionary algorithms (EAs) are not suitable for these problems. A distributed EA, such as a cooperative co-evolutionary algorithm (CCEA), can solve these problems efficiently. It decomposes a large-scale problem into smaller sub-problems and evolves them independently. Further, the population diversity of the CCEA helps avoid local optima. In addition, MapReduce, an open-source platform, provides a ready-to-use distributed, scalable, and fault-tolerant infrastructure for parallelising the developed algorithm using the map and reduce features. The CCEA can be distributed and executed in parallel using the MapReduce model to solve large-scale optimisation problems in less computing time. The effectiveness of the CCEA together with MapReduce has been demonstrated in the literature for large-scale optimisation. This article presents cooperative co-evolution, the MapReduce model, and associated techniques suitable for large-scale optimisation problems.
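The decomposition idea can be sketched in a few lines, under strong simplifying assumptions that are not from the article: the objective is the separable sphere function, variables are split into fixed-size groups, and each sub-problem is improved by plain random-mutation search as a stand-in for a full evolutionary algorithm. All names (`sphere`, `evolve_subproblem`, `merge`, `cooperative_coevolution`) are hypothetical, and the "map" and "reduce" steps run sequentially in one process.

```python
import random


def sphere(x):
    """Toy separable objective to minimise (an assumption for this sketch)."""
    return sum(v * v for v in x)


def evolve_subproblem(indices, context, rng, n_candidates=20, sigma=0.5):
    """Map step: improve only the variables in `indices`, keeping the rest of `context` fixed."""
    best = list(context)
    best_fit = sphere(best)
    for _ in range(n_candidates):
        trial = list(best)
        for i in indices:
            trial[i] += rng.gauss(0.0, sigma)
        fit = sphere(trial)
        if fit < best_fit:  # keep a mutation only if it improves the objective
            best, best_fit = trial, fit
    return indices, [best[i] for i in indices]


def merge(context, results):
    """Reduce step: write each sub-solution back into the shared context vector."""
    merged = list(context)
    for indices, values in results:
        for i, value in zip(indices, values):
            merged[i] = value
    return merged


def cooperative_coevolution(dim=8, group_size=2, rounds=30, seed=0):
    rng = random.Random(seed)
    context = [rng.uniform(-5, 5) for _ in range(dim)]
    groups = [list(range(s, min(s + group_size, dim))) for s in range(0, dim, group_size)]
    for _ in range(rounds):
        results = [evolve_subproblem(g, context, rng) for g in groups]  # map
        context = merge(context, results)  # reduce
    return context


if __name__ == "__main__":
    print(sphere(cooperative_coevolution()))
```

Merging independently improved groups is guaranteed not to worsen the objective here only because the sphere function is separable; for non-separable problems the sub-solutions interact, and the merge step needs more care.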

