Hadoop Framework
Recently Published Documents


TOTAL DOCUMENTS: 173 (FIVE YEARS: 49)

H-INDEX: 7 (FIVE YEARS: 3)

2022 ◽  
Vol 6 (1) ◽  
pp. 5
Author(s):  
Giuseppe Di Modica ◽  
Orazio Tomarchio

In the past twenty years, we have witnessed an unprecedented production of data worldwide that has generated a growing demand for computing resources and has stimulated the design of computing paradigms and software tools to obtain insights from such Big Data efficiently and quickly. State-of-the-art parallel computing techniques such as MapReduce guarantee high performance in scenarios where the computing nodes involved are equally sized, clustered via broadband network links, and co-located with the data. Unfortunately, these techniques have proven ineffective in geographically distributed scenarios, i.e., computing contexts where nodes and data are spread across multiple distant data centers. In the literature, researchers have proposed variants of the MapReduce paradigm that are aware of the constraints imposed by those scenarios (such as the imbalance of the nodes' computing power and of the interconnecting links) and enforce smart task scheduling strategies. We have designed a hierarchical computing framework in which a context-aware scheduler orchestrates computing tasks that leverage the potential of the vanilla Hadoop framework within each data center taking part in the computation. In this work, after presenting the features of the developed framework, we advocate fragmenting the data in a smart way so that the scheduler produces a fairer distribution of the workload among the computing tasks. To prove the concept, we implemented a software prototype of the framework and ran several experiments on a small-scale testbed. The test results are discussed in the last part of the paper.
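To make the fragmentation idea concrete, the following is a minimal sketch, not the authors' code, of how input data could be split into per-site fragments whose sizes are proportional to each data center's computing capacity, so that the top-level scheduler can hand every site a comparable amount of work. The Site record, its capacityWeight field, and the offset/length representation are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Minimal sketch (assumed, not the authors' code) of "smart" fragmentation:
 * an input of totalBytes is split into per-site fragments whose sizes are
 * proportional to each data center's relative computing capacity.
 */
public class WeightedFragmenter {

    /** Hypothetical description of a participating data center. */
    public record Site(String name, double capacityWeight) {}

    /** Returns {offset, length} pairs, one fragment per site. */
    public static List<long[]> fragment(long totalBytes, List<Site> sites) {
        double totalWeight = sites.stream().mapToDouble(Site::capacityWeight).sum();
        List<long[]> fragments = new ArrayList<>();
        long offset = 0;
        for (int i = 0; i < sites.size(); i++) {
            long length = (i == sites.size() - 1)
                    ? totalBytes - offset   // last site takes the remainder, avoiding rounding drift
                    : Math.round(totalBytes * sites.get(i).capacityWeight() / totalWeight);
            fragments.add(new long[]{offset, length});
            offset += length;
        }
        return fragments;
    }
}
```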


Author(s):  
Pinjari Vali Basha

With the rapid transformation of technology, a huge amount of data (both structured and unstructured) is generated every day. With the aid of 5G technology and the IoT, the volume of data generated and processed every day is very large, amounting to approximately 2.5 quintillion bytes. This Big Data is stored and processed with the help of the Hadoop framework, which has two main components for storing and retrieving data: the Hadoop Distributed File System (HDFS) and the MapReduce algorithm. The native Hadoop framework has some limitations in MapReduce: if the same job is submitted again, all of its steps must be carried out from scratch before the results are available, which wastes time and resources. Improving the capabilities of the NameNode by maintaining a Common Job Block Table (CJBT) at the NameNode improves performance, at the cost of maintaining the table. The Common Job Block Table contains the metadata of files belonging to jobs that are repeated; it avoids recomputation, reduces the number of computations, saves resources, and speeds up processing. Since the size of the Common Job Block Table keeps increasing, its size must be bounded by an algorithm that keeps track of the jobs; the optimal Common Job Block Table is derived by employing such an algorithm at the NameNode.
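As an illustration of the data structure just described, here is a hedged sketch, assuming details the abstract does not give, of a Common Job Block Table kept at the NameNode: a bounded, access-ordered map from a job signature to the metadata of the output blocks that job produced, so that a repeated job can reuse them instead of being recomputed. The JobEntry fields and the LRU bound are assumptions, not the paper's implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hedged sketch (not the paper's implementation) of a Common Job Block Table:
 * it maps the signature of a previously executed job to the metadata of the
 * output blocks that job produced, so an identical job can reuse them. A fixed
 * capacity with least-recently-used eviction bounds the growth of the table.
 */
public class CommonJobBlockTable {

    /** Hypothetical record of what a finished job left behind. */
    public record JobEntry(String jobSignature, String[] outputBlockIds, long completedAtMillis) {}

    private final Map<String, JobEntry> table;

    public CommonJobBlockTable(final int maxEntries) {
        // an access-ordered LinkedHashMap gives simple LRU eviction
        this.table = new LinkedHashMap<String, JobEntry>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, JobEntry> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Returns cached metadata if the same job signature was seen before, else null. */
    public synchronized JobEntry lookup(String jobSignature) {
        return table.get(jobSignature);
    }

    /** Records a completed job so an identical submission can skip recomputation. */
    public synchronized void record(JobEntry entry) {
        table.put(entry.jobSignature(), entry);
    }
}
```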


Displays ◽  
2021 ◽  
Vol 70 ◽  
pp. 102061
Author(s):  
Amartya Hatua ◽  
Badri Narayan Subudhi ◽  
Veerakumar T. ◽  
Ashish Ghosh

Author(s):  
K. T. Ilayarajaa ◽  
E. Logashanmugam

Diabetic Retinopathy (DR) arises from diabetic comorbidities; patients suffer complete vision loss if it is untreated or diagnosed at a later stage. In this article, we propose a novel approach for early detection and prediction using trained datasets of multiple features. The approach builds on multi-stage attribute extraction over a series of inter-collateral diabetic parameters. The proposed technique is designed and developed on multi-value, multi-dimensional datasets, such as the comorbidity history recorded for a patient during diabetes. It uses these collateral attributes to evaluate retinopathy status and validates the extracted DR through threshold value comparisons. The results are computed using the Hadoop framework for recursive pattern and feature evaluation. The trial is run on UCL digital library datasets, with an estimated performance of 98.7% for extraction and 92.34% for True-Positive (TP) prediction.
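The abstract does not detail how the threshold comparison is run on Hadoop; the sketch below is one hedged interpretation, a Hadoop mapper that averages the extracted feature scores of each patient record and emits the record when the average exceeds a hypothetical DR_THRESHOLD. The input layout, the field positions, and the cut-off value are all assumptions.

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

/**
 * Hedged sketch (not the authors' code) of a threshold-comparison step on
 * Hadoop. Each input line is assumed to hold a patient id followed by a
 * comma-separated list of extracted feature scores; records whose averaged
 * score exceeds DR_THRESHOLD are emitted as candidate DR cases.
 */
public class DrThresholdMapper extends Mapper<LongWritable, Text, Text, Text> {

    private static final double DR_THRESHOLD = 0.75;   // assumed cut-off, for illustration only

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split(",");
        if (fields.length < 2) {
            return;                                     // skip malformed records
        }
        String patientId = fields[0];
        double sum = 0.0;
        for (int i = 1; i < fields.length; i++) {
            sum += Double.parseDouble(fields[i].trim());
        }
        double score = sum / (fields.length - 1);       // average of the extracted feature scores
        if (score > DR_THRESHOLD) {
            context.write(new Text(patientId), new Text("DR_POSITIVE\t" + score));
        }
    }
}
```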


2021 ◽  
Vol 553 ◽  
pp. 31-48
Author(s):  
Jimmy Ming-Tai Wu ◽  
Gautam Srivastava ◽  
Min Wei ◽  
Unil Yun ◽  
Jerry Chun-Wei Lin

Author(s):  
Ashwini T ◽  
Sahana LM ◽  
Mahalakshmi E ◽  
Shweta S Padti

Analysis of consistent, structured data has seen huge success in past decades, whereas the analysis of unstructured data in multimedia formats remains a challenging task. YouTube is one of the most used and popular social media tools. The main aim of this paper is to analyze the data generated from YouTube so that it can be mined and utilized. The data is collected through the YouTube API (Application Programming Interface) and stored in the Hadoop Distributed File System (HDFS). The dataset can then be analyzed using MapReduce to identify the video categories in which the largest number of videos are uploaded. The paper also demonstrates the Hadoop framework and the many components it offers for processing and handling big data. In the existing method, big data is analyzed and processed in multiple stages using MapReduce; because of the huge space consumption of each job, implementing iterative MapReduce jobs is expensive. To overcome these drawbacks, a Hive-based method, the state-of-the-art approach, is used to analyze the big data: Hive works on the YouTube information extracted via a generated API key and uses SQL queries.
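A minimal sketch of the Hive approach described above, under the assumption that the metadata fetched through the YouTube API has already been loaded into a Hive table named youtube_videos with a category column (both names are hypothetical): a single HiveQL query, submitted here through the standard Hive JDBC driver, answers the "most uploaded categories" question without hand-written iterative MapReduce jobs.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

/**
 * Hedged sketch: counts uploaded videos per category with one HiveQL query,
 * assuming the fetched YouTube metadata already sits in a Hive table called
 * youtube_videos with a category column (table and column names are invented
 * for illustration). Host, port, and credentials are placeholders.
 */
public class CategoryCountHive {

    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");  // Hive JDBC driver must be on the classpath
        try (Connection conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "SELECT category, COUNT(*) AS video_count "
                   + "FROM youtube_videos GROUP BY category "
                   + "ORDER BY video_count DESC")) {
            while (rs.next()) {
                // category name followed by how many videos were uploaded in it
                System.out.println(rs.getString("category") + "\t" + rs.getLong("video_count"));
            }
        }
    }
}
```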


Author(s):  
Orazio Tomarchio ◽  
Giuseppe Di Modica ◽  
Marco Cavallo ◽  
Carmelo Polito

Advances in communication technologies, along with the birth of new communication paradigms leveraging the power of social networks, have fostered the production of huge amounts of data. Old-fashioned computing paradigms are unfit to handle the dimensions of the data produced daily by the countless, worldwide distributed sources of information. So far, MapReduce has been able to keep the promise of speeding up computation over Big Data within a cluster. This article focuses on scenarios of worldwide distributed Big Data. After highlighting the poor performance of the Hadoop framework when deployed in such scenarios, it proposes a Hierarchical Hadoop Framework (H2F) to cope with the issues arising when Big Data are scattered over geographically distant data centers. The article highlights the novelty introduced by H2F with respect to other hierarchical approaches. Tests run on a software prototype are also reported to show the performance gains that H2F achieves over plain Hadoop in geographical scenarios.
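The following is a rough sketch, not the H2F code base, of the hierarchical idea: a top-level orchestrator launches one vanilla Hadoop job per data center in parallel and then merges the partial outputs in a final, global step. The SiteCluster interface and the string-based job description are placeholders invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/**
 * Rough sketch (not the H2F code base) of hierarchical orchestration: each
 * data center runs a local, vanilla Hadoop job; the orchestrator waits for
 * all of them and collects the partial outputs for a final global step.
 */
public class HierarchicalOrchestrator {

    /** Placeholder handle to one data center's local Hadoop cluster. */
    public interface SiteCluster {
        /** Runs the job locally and returns the location of its partial output. */
        String runLocalJob(String jobDescription) throws Exception;
    }

    public static List<String> run(List<SiteCluster> sites, String jobDescription) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(sites.size());
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (SiteCluster site : sites) {
                Callable<String> localJob = () -> site.runLocalJob(jobDescription);
                pending.add(pool.submit(localJob));   // each site runs its own vanilla Hadoop job
            }
            List<String> partialOutputs = new ArrayList<>();
            for (Future<String> f : pending) {
                partialOutputs.add(f.get());          // wait for every site, then merge globally
            }
            return partialOutputs;
        } finally {
            pool.shutdown();
        }
    }
}
```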


Author(s):  
Akram Elomari ◽  
Larbi Hassouni ◽  
Abderrahim MAIZATE
