Hadoop System
Recently Published Documents


TOTAL DOCUMENTS: 36 (five years: 11)

H-INDEX: 4 (five years: 1)

Author(s):  
Badr-Eddine Boudriki Semlali ◽  
Chaker El Amrani

Currently, remote sensing is widely used in environmental monitoring applications, most notably air quality mapping and climate change supervision. However, satellite sensors produce massive volumes of data in near-real-time, stored in multiple formats and delivered with high velocity and variety, which makes processing satellite big data challenging. This study therefore aims to demonstrate that satellite data qualify as big data and proposes a new big data architecture for satellite data processing. The developed software enables efficient ingestion and preprocessing of remote sensing big data. The experimental results show that 86 percent of the unnecessary daily files are discarded, and that data cleansing removes 20 percent of the erroneous and inaccurate plots. The final output is integrated into the Hadoop system, particularly HDFS, HBase, and Hive, for further computation and processing.
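The abstract describes an ingest-and-filter pipeline that discards unusable daily files before loading the rest into HDFS. A minimal Python sketch of that kind of filter follows; it assumes the third-party "hdfs" WebHDFS client, and the NameNode address, file pattern, and size threshold are invented for illustration (the paper's actual pipeline is not shown here):

import os
from hdfs import InsecureClient  # third-party "hdfs" WebHDFS client

# Hypothetical NameNode address and user; the paper does not specify these.
client = InsecureClient("http://namenode:9870", user="ingest")

def ingest_daily_files(local_dir, hdfs_dir, min_bytes=1024):
    """Discard unusable daily files, then upload the remainder to HDFS."""
    for name in os.listdir(local_dir):
        path = os.path.join(local_dir, name)
        # Illustrative cleansing step: skip non-data or undersized files,
        # analogous to the large share of daily files the paper discards.
        if not name.endswith(".nc") or os.path.getsize(path) < min_bytes:
            continue
        client.upload(os.path.join(hdfs_dir, name), path)

ingest_daily_files("/data/incoming", "/satellite/raw")

The HBase and Hive integration steps the abstract mentions would follow after this upload stage.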


Symmetry ◽  
2021 ◽  
Vol 13 (2) ◽  
pp. 195 ◽  
Author(s):  
Vladimir Belov ◽  
Andrey Tatarintsev ◽  
Evgeny Nikulchev

One of the most important tasks of any platform for big data processing is storing the data received. Different systems have different requirements for big data storage formats, which raises the problem of choosing the optimal format for the task at hand. This paper describes the five most popular formats for storing big data, presents an experimental evaluation of these formats, and proposes a methodology for choosing among them. The following storage formats are considered: Avro, CSV, JSON, ORC, and Parquet. At the first stage, a comparative analysis of the main characteristics of the studied formats was carried out; at the second stage, an experimental evaluation of these formats was prepared and conducted. For the experiment, a test stand with big data processing tools was deployed. The aim of the experiment was to measure characteristics of the storage formats, such as storage volume and processing speed for different operations, using the Apache Spark framework. In addition, an algorithm for choosing the optimal format from the presented alternatives was developed using tropical optimization methods. The result of the study is a technique for obtaining a vector of ratings of data storage formats for the Apache Hadoop system, based on an experimental assessment using Apache Spark.
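The abstract does not reproduce the benchmark itself; a sketch of this kind of measurement in PySpark might look as follows. The data is synthetic and the paths are placeholders; Avro is left out of the loop because it requires the external spark-avro package, while the other four formats are built in:

import time
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("format-benchmark").getOrCreate()
df = spark.range(10_000_000).withColumnRenamed("id", "value")  # synthetic data

for fmt in ["csv", "json", "orc", "parquet"]:
    path = f"/tmp/bench.{fmt}"
    t0 = time.time()
    df.write.mode("overwrite").format(fmt).save(path)
    write_s = time.time() - t0
    t0 = time.time()
    spark.read.format(fmt).load(path).count()  # force a full read
    read_s = time.time() - t0
    print(f"{fmt}: write {write_s:.1f}s, read {read_s:.1f}s")

Comparing the on-disk size of each output directory alongside these timings gives the volume-versus-speed trade-off the paper's rating vector is built on.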


Author(s):  
Abou_el_ela Abdou Hussein

Day by day, advances in web technologies have led to tremendous growth in the volume of data generated daily. This mountain of huge, widespread data sets leads to the phenomenon called big data: a collection of massive, heterogeneous, unstructured, and complex data sets. The big data life cycle can be represented as collecting (capture), storing, distributing, manipulating, interpreting, analyzing, investigating, and visualizing the data. Traditional techniques such as the Relational Database Management System (RDBMS) cannot handle big data because of their inherent limitations, so advances in computing architecture are required to handle both the data storage requirements and the heavy processing needed to analyze huge volumes and varieties of data economically. Among the many technologies for manipulating big data, one of the most prominent and well-known is Hadoop, an open source distributed data processing framework for overcoming the problem of handling big data. Apache Hadoop was based on the Google File System and the MapReduce programming paradigm. In this paper we survey big data characteristics, starting from the first three V's, which have been extended over time through research to more than fifty-six V's, and we compare researchers' definitions to reach the best representation and a precise clarification of all the big data V's. We highlight the challenges that face big data processing and how to overcome them using Hadoop, and its use in processing big data sets as a solution to various problems in a distributed cloud-based environment. The paper mainly focuses on the different components of Hadoop, such as Hive, Pig, and HBase, and also gives a thorough description of Hadoop's pros and cons, along with improvements that address Hadoop's problems, including a proposed cost-efficient scheduler algorithm for heterogeneous Hadoop systems.
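As an illustration of the MapReduce programming paradigm the abstract refers to, here is the classic word-count example written in Python for Hadoop Streaming (a standard textbook sketch, not code from the paper):

#!/usr/bin/env python3
# mapper.py -- emits one tab-separated (word, 1) pair per word.
import sys

for line in sys.stdin:
    for word in line.split():
        print(f"{word}\t1")

#!/usr/bin/env python3
# reducer.py -- sums counts; Hadoop Streaming delivers keys already sorted.
import sys

current, total = None, 0
for line in sys.stdin:
    word, count = line.rstrip("\n").split("\t")
    if word != current:
        if current is not None:
            print(f"{current}\t{total}")
        current, total = word, 0
    total += int(count)
if current is not None:
    print(f"{current}\t{total}")

A typical invocation (paths are placeholders):

hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
    -files mapper.py,reducer.py \
    -mapper mapper.py -reducer reducer.py \
    -input /in/words -output /out/counts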


Author(s):  
Manisha K. Gupta ◽  
Md. Nadeem Akhtar Hasid ◽  
Sourav Dhar ◽  
H. S. Mruthyunjaya
Keyword(s):  
Big Data ◽  

Information ◽  
2019 ◽  
Vol 10 (7) ◽  
pp. 222 ◽  
Author(s):  
Sungchul Lee ◽  
Ju-Yeon Jo ◽  
Yoohwan Kim

Background: Hadoop has become the base framework for big data systems, built on the simple principle that moving computation is cheaper than moving data. Hadoop increases data locality in the Hadoop Distributed File System (HDFS) to improve system performance: network traffic among the nodes is reduced by increasing the fraction of data-local tasks on each machine. Previous research increased data locality in one of the MapReduce stages to improve Hadoop performance; however, there has been no mathematical performance model for data locality in Hadoop. Methods: This study built a Hadoop performance analysis model with data locality that covers the entire MapReduce process. The paper explains the data locality concept in the map stage and the shuffle stage, and shows how to apply the performance analysis model to increase the performance of a Hadoop system through deep data locality. Results: The research validated deep data locality as a way to increase Hadoop performance through three tests: a simulation-based test, a cloud test, and a physical test. In these tests, the authors improved the Hadoop system by over 34% using deep data locality. Conclusions: Deep data locality improved Hadoop performance by reducing data movement in HDFS.
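The paper's actual performance model is not reproduced in the abstract. The following toy Python model only illustrates the underlying idea that a higher data-local task fraction reduces map-stage time; all parameters are invented and do not come from the paper:

# Toy cost model (not the authors' model): map-stage time as a function
# of the data-local task fraction. Remote reads pay a network penalty.
def map_stage_time(blocks, local_fraction,
                   t_local=1.0,           # seconds to process a local block
                   network_penalty=0.6):  # extra relative cost of a remote read
    local = blocks * local_fraction
    remote = blocks * (1.0 - local_fraction)
    return local * t_local + remote * t_local * (1.0 + network_penalty)

for f in (0.5, 0.8, 1.0):
    print(f"locality {f:.0%}: {map_stage_time(1000, f):.0f}s")

In this toy setting, raising locality from 50% to 100% shortens the map stage by the full network penalty on the formerly remote half of the blocks, which is the effect deep data locality exploits across the map and shuffle stages.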

