Budget Constraint Scheduler for Big Data Using Hadoop MapReduce

D. C. Vinutha; G. T. Raju

doi:10.1007/s42979-021-00638-0

Computational storage: an efficient and scalable platform for big data and HPC applications

Journal of Big Data; doi:10.1186/s40537-019-0265-5; 2019; Vol 6 (1); Cited By ~ 2

Author(s): Mahdi Torabzadehkashi; Siavash Rezaei; Ali HeydariGorji; Hosein Bobarshad; Vladimir Alves; ...

Keyword(s): Big Data; High Performance; Distributed Processing; Data Access; Distributed Applications; Process Data; Storage Devices; Hadoop MapReduce; Big Data Applications; Application Processor

Abstract: In the era of big data applications, the demand for more sophisticated data centers and high-performance data processing mechanisms is increasing drastically. Data are originally stored in storage systems. To process data, application servers need to fetch them from storage devices, which imposes the cost of moving data to the system. This cost is directly related to the distance between the processing engines and the data. This is the key motivation for the emergence of distributed processing platforms such as Hadoop, which move processing closer to the data. Computational storage devices (CSDs) push the “move process to data” paradigm to its ultimate boundaries by deploying embedded processing engines inside storage devices to process data. In this paper, we introduce Catalina, an efficient and flexible computational storage platform that provides a seamless environment to process data in-place. Catalina is the first CSD equipped with a dedicated application processor running a full-fledged operating system that provides filesystem-level data access for the applications. Thus, a vast spectrum of applications can be ported to run on Catalina CSDs. Due to these unique features, to the best of our knowledge, Catalina CSD is the only in-storage processing platform that can be seamlessly deployed in clusters to run distributed applications such as Hadoop MapReduce and HPC applications in-place without any modifications to the underlying distributed processing framework. As a proof of concept, we build a fully functional Catalina prototype and a CSD-equipped platform using 16 Catalina CSDs to run Intel HiBench Hadoop and HPC benchmarks and investigate the benefits of deploying Catalina CSDs in distributed processing environments. The experimental results show up to a 2.2× improvement in performance and a 4.3× reduction in energy consumption for running Hadoop MapReduce benchmarks. Additionally, thanks to the Neon SIMD engines, the performance and energy efficiency of DFT algorithms are improved by up to 5.4× and 8.9×, respectively.

Benchmarking Apache Spark and Hadoop MapReduce on Big Data Classification

doi:10.1145/3481646.3481649; 2021

Author(s): Taha Tekdogan; Ali Cakmak

Keyword(s): Big Data; Data Classification; Apache Spark; Hadoop MapReduce; Big Data Classification

SAIR: significance-aware approach to improve QoR of big data processing in case of budget constraint

The Journal of Supercomputing; doi:10.1007/s11227-019-02797-7; 2019; Vol 75 (9); pp. 5760-5781; Cited By ~ 1

Author(s): Hossein Ahmadvand; Maziar Goudarzi

Keyword(s): Big Data; Data Processing; Budget Constraint; Big Data Processing

Applying compression algorithms on hadoop cluster implementing through apache tez and hadoop mapreduce

International Journal of Engineering & Technology; doi:10.14419/ijet.v7i2.26.12539; 2018; Vol 7 (2.26); pp. 80

Author(s): Dr E. Laxmi Lydia; M Srinivasa Rao

Keyword(s): Big Data; Execution Time; Big Data Analytics; Research Area; Word Count; Hadoop MapReduce; Interactive Query; Hadoop Distributed File System; Hadoop Cluster; Compressed Data

Big Data is currently one of the most prominent subjects in cloud research; its main characteristics are volume, velocity and variety. These characteristics are difficult to manage with traditional software and the methodologies available for it. Data arising from the various big data domains is handled through Hadoop, an open-source framework developed to provide such solutions. Big data analytics is handled by the Hadoop MapReduce framework, the key engine of a Hadoop cluster, which is used extensively today and is a batch processing system. Apache developed an engine named "Tez", which supports interactive queries and does not write temporary data to the Hadoop Distributed File System (HDFS). The paper focuses on a performance comparison of MapReduce and Tez; the performance of the two engines is examined with compression applied to the input files and the map output files. To compare the two engines, we used the Bzip compression algorithm for the input files and Snappy for the map output files. The Word Count and TeraSort benchmarks are used in our experiments. For the Word Count benchmark, the results show that the Tez engine has a better execution time than the Hadoop MapReduce engine for both compressed and non-compressed data, reducing execution time by nearly 39% relative to the Hadoop MapReduce engine. For the TeraSort benchmark, in contrast, the Tez engine has a higher execution time than the Hadoop MapReduce engine.
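
For readers unfamiliar with the Word Count benchmark named in this abstract, the following is a minimal single-process sketch of its map, shuffle and reduce phases. It does not use Hadoop or Tez, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`, `word_count_bz2`) are illustrative assumptions, not any framework's API. Python's standard `bz2` module stands in for the Bzip input compression the abstract mentions; the codecs and splitting behaviour of a real cluster are not modelled.

```python
import bz2
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every whitespace-separated token."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, as a framework would between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the grouped counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in sequence on in-memory lines."""
    return reduce_phase(shuffle(map_phase(lines)))


def word_count_bz2(compressed: bytes) -> Dict[str, int]:
    """Decompress bzip2-compressed UTF-8 text, then count its words."""
    text = bz2.decompress(compressed).decode("utf-8")
    return word_count(text.splitlines())
```

Calling `word_count_bz2(bz2.compress(text.encode("utf-8")))` gives the same counts as `word_count(text.splitlines())`; in this sketch compression only changes how the input is read, not the result.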

Feasible study of K-mean and K-medoids for analysis of Hadoop mapreduce framework for big data

Communication and Computing Systems; doi:10.1201/9781315364094-187; 2016

Author(s): Subhash Chandra; Deepak Motwani

Keyword(s): Big Data; MapReduce Framework; Hadoop MapReduce; Feasible Study

Efficient big data processing in Hadoop MapReduce

Proceedings of the VLDB Endowment; doi:10.14778/2367502.2367562; 2012; Vol 5 (12); pp. 2014-2015; Cited By ~ 113

Author(s): Jens Dittrich; Jorge-Arnulfo Quiané-Ruiz

Keyword(s): Big Data; Data Processing; Big Data Processing; Hadoop MapReduce

Clustering on Big Data Using Hadoop MapReduce

2015 International Conference on Computational Intelligence and Communication Networks (CICN); doi:10.1109/cicn.2015.161; 2015; Cited By ~ 4

Author(s): Nadeem Akthar; Mohd Vasim Ahamad; Shahbaz Khan

Keyword(s): Big Data; Hadoop MapReduce

Optimizing the Performance of Big Data Workflows in Multi-cloud Environments Under Budget Constraint

2016 IEEE International Conference on Services Computing (SCC); doi:10.1109/scc.2016.25; 2016; Cited By ~ 10

Author(s): Chase Q. Wu; Huiyan Cao

Keyword(s): Big Data; Budget Constraint; Cloud Environments; Multi Cloud

High Performance Computation of Big Data: Performance Optimization Approach towards a Parallel Frequent Item Set Mining Algorithm for Transaction Data based on Hadoop MapReduce Framework

International Journal of Intelligent Systems and Applications; doi:10.5815/ijisa.2017.01.08; 2017; Vol 9 (1); pp. 75-84; Cited By ~ 1

Author(s): Guru Prasad M S; Nagesh H R; Swathi Prabhu

Keyword(s): Big Data; Performance Optimization; High Performance; Optimization Approach; MapReduce Framework; Transaction Data; Hadoop MapReduce; Frequent Item; Mining Algorithm; High Performance Computation

Similarity Measurement Technique for Measuring the Performance of Page Rank Algorithm Based on Hadoop

International Journal of Recent Technology and Engineering - 2; doi:10.35940/ijrte.e6843.018520; 2020; Vol 8 (5); pp. 4712-4717

Keyword(s): Big Data; Search Engine; Measurement Technique; Similarity Measurement; Web Pages; Ranking Algorithm; Web Data; MapReduce Framework; Page Rank; Hadoop MapReduce

In this century, big data manipulation is a challenging task in the field of web mining because the volume of web data is increasing massively day by day. Retrieving efficient, relevant and meaningful information from a massive amount of web data using a search engine is nearly impossible. Different search engines use different ranking algorithms to retrieve relevant information easily. A new page ranking algorithm based on synonymous word count, named the Similarity Measurement Technique (SMT), is presented using the Hadoop MapReduce framework. The Hadoop MapReduce framework is used to partition big data and provides a scalable, economical and easier way to process these data. It stores intermediate results for running iterative jobs on the local disk. In this algorithm, SMT takes a query from the user, parses it using Hadoop and calculates the rank of web pages. For the experiments, a wiki data file was used; the page rank algorithm (PR), the improvised page rank algorithm (IPR) and the proposed SMT method were applied to calculate the page rank of all web pages, and the methods were compared. The proposed method provides better scoring accuracy than the other approaches and reduces the theme drift problem.
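
As background for the baseline page rank algorithm (PR) that this abstract compares against, here is a minimal in-memory sketch of the classic PageRank power iteration. It is not the SMT or IPR method from the paper and does not use Hadoop; the function name `pagerank`, the damping factor of 0.85 and the fixed iteration count are assumptions made for illustration.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Classic PageRank by power iteration over an adjacency-list graph.

    `links` maps each page to the pages it links to. Pages that appear
    only as link targets are included with no out-links.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    if n == 0:
        return {}

    # Start from a uniform distribution over all pages.
    ranks = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Rank held by dangling pages (no out-links) is spread evenly.
        dangling = sum(ranks[p] for p in pages if not links.get(p))
        base = (1.0 - damping) / n + damping * dangling / n
        new_ranks = {page: base for page in pages}
        # Each page passes an equal share of its rank to every out-link.
        for page, targets in links.items():
            if targets:
                share = damping * ranks[page] / len(targets)
                for target in targets:
                    new_ranks[target] += share
        ranks = new_ranks
    return ranks
```

In a MapReduce setting, the inner loop would correspond roughly to a map step that emits each page's share to its targets and a reduce step that sums the shares per page, with one job per iteration.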
