Data Analytic Models That Redress the Limitations of MapReduce

The amount of data in today’s world is increasing exponentially. Effectively analyzing Big Data is a very complex task. The MapReduce programming model created by Google in 2004 revolutionized the big-data comput-ing market. Nowadays the model is being used by many for scientific and research analysis as well as for commercial purposes. The MapReduce model however is quite a low-level progamming model and has many limitations. Active research is being undertaken to make models that overcome/remove these limitations. In this paper we have studied some popular data analytic models that redress some of the limitations of MapReduce; namely ASTERIX and Pregel (Giraph) We discuss these models briefly and through the discussion highlight how these models are able to overcome MapReduce’s limitations.

Download Full-text

A Survey on Job Scheduling in Big Data

Cybernetics and Information Technologies ◽

10.1515/cait-2016-0033 ◽

2016 ◽

Vol 16 (3) ◽

pp. 35-51 ◽

Cited By ~ 1

Author(s):

M. Senthilkumar ◽

P. Ilango

Keyword(s):

Big Data ◽

Job Scheduling ◽

Research Area ◽

Data Locality ◽

Distributed Data ◽

Distributed Data Processing ◽

Big Data Applications ◽

Mapreduce Model ◽

Active Research ◽

And Behavior

Abstract Big Data Applications with Scheduling becomes an active research area in last three years. The Hadoop framework becomes very popular and most used frameworks in a distributed data processing. Hadoop is also open source software that allows the user to effectively utilize the hardware. Various scheduling algorithms of the MapReduce model using Hadoop vary with design and behavior, and are used for handling many issues like data locality, awareness with resource, energy and time. This paper gives the outline of job scheduling, classification of the scheduler, and comparison of different existing algorithms with advantages, drawbacks, limitations. In this paper, we discussed various tools and frameworks used for monitoring and the ways to improve the performance in MapReduce. This paper helps the beginners and researchers in understanding the scheduling mechanisms used in Big Data.

Download Full-text

Comprehend the Performance of MapReduce Programming model for K-Means algorithm on Hadoop Cluster

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.i8129.078919 ◽

2019 ◽

Vol 8 (9) ◽

pp. 1600-1603

Keyword(s):

Big Data ◽

Programming Model ◽

Primary Memory ◽

Model Based ◽

Considerable Research ◽

Memory Size ◽

Input Size ◽

Mapreduce Model ◽

Hadoop Cluster ◽

The Impact

MapReduce is a programming model used for processing Big Data. There are had been considerable research in improvement of performance of MapReduce model. This paper examines performance of MapReduce model based on K Means algorithm inside the Hadoop cluster. Different input size had been taken on various configurations to discover the impact of CPU cores and primary memory size. Results of this evaluation had been shown that the number of cores had maximum impact of the performance of MapReduce model.

Download Full-text

Handling Data Skew in MapReduce Cluster by Using Partition Tuning

Journal of Healthcare Engineering ◽

10.1155/2017/1425102 ◽

2017 ◽

Vol 2017 ◽

pp. 1-12 ◽

Cited By ~ 9

Author(s):

Yufei Gao ◽

Yanjie Zhou ◽

Bing Zhou ◽

Lei Shi ◽

Jiacai Zhang

Keyword(s):

Big Data ◽

Data Analytics ◽

Programming Model ◽

Big Data Analytics ◽

Rule Mining ◽

Data Skew ◽

Healthcare Data ◽

Tuning Method ◽

Mapreduce Model ◽

The One

The healthcare industry has generated large amounts of data, and analyzing these has emerged as an important problem in recent years. The MapReduce programming model has been successfully used for big data analytics. However, data skew invariably occurs in big data analytics and seriously affects efficiency. To overcome the data skew problem in MapReduce, we have in the past proposed a data processing algorithm called Partition Tuning-based Skew Handling (PTSH). In comparison with the one-stage partitioning strategy used in the traditional MapReduce model, PTSH uses a two-stage strategy and the partition tuning method to disperse key-value pairs in virtual partitions and recombines each partition in case of data skew. The robustness and efficiency of the proposed algorithm were tested on a wide variety of simulated datasets and real healthcare datasets. The results showed that PTSH algorithm can handle data skew in MapReduce efficiently and improve the performance of MapReduce jobs in comparison with the native Hadoop, Closer, and locality-aware and fairness-aware key partitioning (LEEN). We also found that the time needed for rule extraction can be reduced significantly by adopting the PTSH algorithm, since it is more suitable for association rule mining (ARM) on healthcare data.

Download Full-text

A Review on Big Data Analytics in Business

International Journal of Scientific Research in Science Engineering and Technology ◽

10.32628/ijsrset21841130 ◽

2018 ◽

pp. 210-214

Author(s):

Manbir Sandhu ◽

Purnima, Anuradha Saini

Keyword(s):

Big Data ◽

Big Data Analytics ◽

Smart Devices ◽

Data Streaming ◽

Huge Amount ◽

Business Units ◽

Enormous Amount ◽

Tools And Techniques ◽

Heath Care ◽

Data Analytic

Big data is a fast-growing technology that has the scope to mine huge amount of data to be used in various analytic applications. With large amount of data streaming in from a myriad of sources: social media, online transactions and ubiquity of smart devices, Big Data is practically garnering attention across all stakeholders from academics, banking, government, heath care, manufacturing and retail. Big Data refers to an enormous amount of data generated from disparate sources along with data analytic techniques to examine this voluminous data for predictive trends and patterns, to exploit new growth opportunities, to gain insight, to make informed decisions and optimize processes. Data-driven decision making is the essence of business establishments. The explosive growth of data is steering the business units to tap the potential of Big Data to achieve fueling growth and to achieve a cutting edge over their competitors. The overwhelming generation of data brings with it, its share of concerns. This paper discusses the concept of Big Data, its characteristics, the tools and techniques deployed by organizations to harness the power of Big Data and the daunting issues that hinder the adoption of Business Intelligence in Big Data strategies in organizations.

Download Full-text

Scalable and Flexible Big Data Analytic Framework (SFBAF) For Big Data Processing and Knowledge Extraction

International Conference on Engineering Technologies and Big Data Analytics (ETBDA’2016) Jan. 21-22, 2016 Bangkok (Thailand) ◽

10.15242/iie.e0116024 ◽

2016 ◽

Cited By ~ 1

Keyword(s):

Big Data ◽

Data Processing ◽

Knowledge Extraction ◽

Analytic Framework ◽

Big Data Processing ◽

Data Analytic

Download Full-text

High-Level Parallel Ant Colony Optimization with Algorithmic Skeletons

International Journal of Parallel Programming ◽

10.1007/s10766-021-00714-1 ◽

2021 ◽

Author(s):

Breno A. de Melo Menezes ◽

Nina Herrmann ◽

Herbert Kuchen ◽

Fernando Buarque de Lima Neto

Keyword(s):

Ant Colony Optimization ◽

High Performance ◽

Optimization Problems ◽

Programming Model ◽

Parallel Implementation ◽

Ant Colony ◽

Algorithmic Skeletons ◽

Low Level ◽

Programming Patterns ◽

High Level

AbstractParallel implementations of swarm intelligence algorithms such as the ant colony optimization (ACO) have been widely used to shorten the execution time when solving complex optimization problems. When aiming for a GPU environment, developing efficient parallel versions of such algorithms using CUDA can be a difficult and error-prone task even for experienced programmers. To overcome this issue, the parallel programming model of Algorithmic Skeletons simplifies parallel programs by abstracting from low-level features. This is realized by defining common programming patterns (e.g. map, fold and zip) that later on will be converted to efficient parallel code. In this paper, we show how algorithmic skeletons formulated in the domain specific language Musket can cope with the development of a parallel implementation of ACO and how that compares to a low-level implementation. Our experimental results show that Musket suits the development of ACO. Besides making it easier for the programmer to deal with the parallelization aspects, Musket generates high performance code with similar execution times when compared to low-level implementations.

Download Full-text

Contingency Analysis of Power System using Big Data Analytic Techniques

2020 5th International Conference on Computing, Communication and Security (ICCCS) ◽

10.1109/icccs49678.2020.9276796 ◽

2020 ◽

Author(s):

Ravi V Angadi ◽

Suresh Babu Daram ◽

P. S Venkataramu

Keyword(s):

Big Data ◽

Power System ◽

Contingency Analysis ◽

Data Analytic

Download Full-text

Big Data Analytic for Intrusion Detection System

2018 International Conference on Electronics, Control, Optimization and Computer Science (ICECOCS) ◽

10.1109/icecocs.2018.8610578 ◽

2018 ◽

Author(s):

Merieme Britel

Keyword(s):

Big Data ◽

Intrusion Detection ◽

Intrusion Detection System ◽

Detection System ◽

Data Analytic

Download Full-text

PARALLEL COMPUTING OF NUMERICAL SCHEMES AND BIG DATA ANALYTIC FOR SOLVING REAL LIFE APPLICATIONS

Jurnal Teknologi ◽

10.11113/jt.v78.9552 ◽

2016 ◽

Vol 78 (8-2) ◽

Cited By ~ 2

Author(s):

Norma Alias ◽

Nadia Nofri Yeni Suhari ◽

Hafizah Farhah Saipan Saipol ◽

Abdullah Aysh Dahawi ◽

Masyitah Mohd Saidi ◽

...

Keyword(s):

Big Data ◽

Parallel Computing ◽

Parallel Algorithm ◽

Sparse Matrices ◽

Real Life ◽

Poor Performance ◽

Equation System ◽

Numerical Schemes ◽

Linear Equation System ◽

Data Analytic

This paper proposed the several real life applications for big data analytic using parallel computing software. Some parallel computing software under consideration are Parallel Virtual Machine, MATLAB Distributed Computing Server and Compute Unified Device Architecture to simulate the big data problems. The parallel computing is able to overcome the poor performance at the runtime, speedup and efficiency of programming in sequential computing. The mathematical models for the big data analytic are based on partial differential equations and obtained the large sparse matrices from discretization and development of the linear equation system. Iterative numerical schemes are used to solve the problems. Thus, the process of computational problems are summarized in parallel algorithm. Therefore, the parallel algorithm development is based on domain decomposition of problems and the architecture of difference parallel computing software. The parallel performance evaluations for distributed and shared memory architecture are investigated in terms of speedup, efficiency, effectiveness and temporal performance.

Download Full-text

Big data network security index correlation measure based on the fusion of modified two order cone programming model

International Journal of Internet Protocol Technology ◽

10.1504/ijipt.2021.113899 ◽

2021 ◽

Vol 14 (1) ◽

pp. 16

Author(s):

Xiaohai Lan

Keyword(s):

Big Data ◽

Network Security ◽

Programming Model ◽

Correlation Measure ◽

Data Network ◽

Cone Programming ◽

Security Index

Download Full-text