Hadoop Framework
Recently Published Documents


TOTAL DOCUMENTS: 173 (FIVE YEARS: 49)

H-INDEX: 7 (FIVE YEARS: 3)

2022 ◽  
Vol 6 (1) ◽  
pp. 5
Author(s):  
Giuseppe Di Modica ◽  
Orazio Tomarchio

In the past twenty years, we have witnessed an unprecedented production of data worldwide that has generated a growing demand for computing resources and has stimulated the design of computing paradigms and software tools to obtain insights from such Big Data efficiently and quickly. State-of-the-art parallel computing techniques such as MapReduce guarantee high performance in scenarios where the computing nodes involved are equally sized, clustered via broadband network links, and co-located with the data. Unfortunately, these techniques have proven ineffective in geographically distributed scenarios, i.e., computing contexts where nodes and data are spread across multiple distant data centers. In the literature, researchers have proposed variants of the MapReduce paradigm that are aware of the constraints imposed by those scenarios (such as the imbalance of the nodes' computing power and of the interconnecting links) and enforce smart task scheduling strategies. We have designed a hierarchical computing framework in which a context-aware scheduler orchestrates computing tasks that leverage the potential of the vanilla Hadoop framework within each data center taking part in the computation. In this work, after presenting the features of the developed framework, we advocate fragmenting the data in a smart way so that the scheduler produces a fairer distribution of the workload among the computing tasks. To prove the concept, we implemented a software prototype of the framework and ran several experiments on a small-scale testbed. The test results are discussed in the last part of the paper.
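To make the fragmentation idea concrete, the following is a minimal sketch, not the authors' code, of how input data could be split into per-site fragments whose sizes are proportional to each data center's computing capacity, so that the top-level scheduler can hand every site a comparable amount of work. The Site record, its capacityWeight field, and the offset/length representation are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Minimal sketch (assumed, not the authors' code) of "smart" fragmentation:
 * an input of totalBytes is split into per-site fragments whose sizes are
 * proportional to each data center's relative computing capacity.
 */
public class WeightedFragmenter {

    /** Hypothetical description of a participating data center. */
    public record Site(String name, double capacityWeight) {}

    /** Returns {offset, length} pairs, one fragment per site. */
    public static List<long[]> fragment(long totalBytes, List<Site> sites) {
        double totalWeight = sites.stream().mapToDouble(Site::capacityWeight).sum();
        List<long[]> fragments = new ArrayList<>();
        long offset = 0;
        for (int i = 0; i < sites.size(); i++) {
            long length = (i == sites.size() - 1)
                    ? totalBytes - offset   // last site takes the remainder, avoiding rounding drift
                    : Math.round(totalBytes * sites.get(i).capacityWeight() / totalWeight);
            fragments.add(new long[]{offset, length});
            offset += length;
        }
        return fragments;
    }
}
```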


Author(s):  
Pinjari Vali Basha

With the rapid transformation of technology, a huge amount of data (both structured and unstructured) is generated every day. With the aid of 5G technology and the IoT, the volume of data generated and processed every day is very large, amounting to approximately 2.5 quintillion bytes. This Big Data is stored and processed with the help of the Hadoop framework, which has two main components for storing and retrieving data: the Hadoop Distributed File System (HDFS) and the MapReduce algorithm. The native Hadoop framework has some limitations in MapReduce: if the same job is submitted again, all of its steps must be carried out from scratch before the results are available, which wastes time and resources. Improving the capabilities of the NameNode by maintaining a Common Job Block Table (CJBT) at the NameNode improves performance, at the cost of maintaining the table. The Common Job Block Table contains the metadata of files belonging to jobs that are repeated; it avoids recomputation, reduces the number of computations, saves resources, and speeds up processing. Since the size of the Common Job Block Table keeps increasing, its size must be bounded by an algorithm that keeps track of the jobs; the optimal Common Job Block Table is derived by employing such an algorithm at the NameNode.
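As an illustration of the data structure just described, here is a hedged sketch, assuming details the abstract does not give, of a Common Job Block Table kept at the NameNode: a bounded, access-ordered map from a job signature to the metadata of the output blocks that job produced, so that a repeated job can reuse them instead of being recomputed. The JobEntry fields and the LRU bound are assumptions, not the paper's implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hedged sketch (not the paper's implementation) of a Common Job Block Table:
 * it maps the signature of a previously executed job to the metadata of the
 * output blocks that job produced, so an identical job can reuse them. A fixed
 * capacity with least-recently-used eviction bounds the growth of the table.
 */
public class CommonJobBlockTable {

    /** Hypothetical record of what a finished job left behind. */
    public record JobEntry(String jobSignature, String[] outputBlockIds, long completedAtMillis) {}

    private final Map<String, JobEntry> table;

    public CommonJobBlockTable(final int maxEntries) {
        // an access-ordered LinkedHashMap gives simple LRU eviction
        this.table = new LinkedHashMap<String, JobEntry>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, JobEntry> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Returns cached metadata if the same job signature was seen before, else null. */
    public synchronized JobEntry lookup(String jobSignature) {
        return table.get(jobSignature);
    }

    /** Records a completed job so an identical submission can skip recomputation. */
    public synchronized void record(JobEntry entry) {
        table.put(entry.jobSignature(), entry);
    }
}
```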


Displays ◽  
2021 ◽  
Vol 70 ◽  
pp. 102061
Author(s):  
Amartya Hatua ◽  
Badri Narayan Subudhi ◽  
Veerakumar T. ◽  
Ashish Ghosh

Author(s):  
K. T. Ilayarajaa ◽  
E. Logashanmugam

Diabetic Retinopathy (DR) arises from diabetic comorbidities; patients suffer complete vision loss if it is untreated or diagnosed at a later stage. In this article, we propose a novel approach for early detection and prediction using trained datasets of multiple features. The approach builds on multi-stage attribute extraction over a series of inter-collateral diabetic parameters. The proposed technique is designed and developed on multi-value, multi-dimensional datasets, such as the comorbidity history recorded for a patient during diabetes. It uses these collateral attributes to evaluate retinopathy status and validates the extracted DR through threshold value comparisons. The results are computed using the Hadoop framework for recursive pattern and feature evaluation. The trial is run on UCL digital library datasets, with an estimated performance of 98.7% for extraction and 92.34% for True-Positive (TP) prediction.
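The abstract does not detail how the threshold comparison is run on Hadoop; the sketch below is one hedged interpretation, a Hadoop mapper that averages the extracted feature scores of each patient record and emits the record when the average exceeds a hypothetical DR_THRESHOLD. The input layout, the field positions, and the cut-off value are all assumptions.

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

/**
 * Hedged sketch (not the authors' code) of a threshold-comparison step on
 * Hadoop. Each input line is assumed to hold a patient id followed by a
 * comma-separated list of extracted feature scores; records whose averaged
 * score exceeds DR_THRESHOLD are emitted as candidate DR cases.
 */
public class DrThresholdMapper extends Mapper<LongWritable, Text, Text, Text> {

    private static final double DR_THRESHOLD = 0.75;   // assumed cut-off, for illustration only

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split(",");
        if (fields.length < 2) {
            return;                                     // skip malformed records
        }
        String patientId = fields[0];
        double sum = 0.0;
        for (int i = 1; i < fields.length; i++) {
            sum += Double.parseDouble(fields[i].trim());
        }
        double score = sum / (fields.length - 1);       // average of the extracted feature scores
        if (score > DR_THRESHOLD) {
            context.write(new Text(patientId), new Text("DR_POSITIVE\t" + score));
        }
    }
}
```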


2021 ◽  
Vol 553 ◽  
pp. 31-48
Author(s):  
Jimmy Ming-Tai Wu ◽  
Gautam Srivastava ◽  
Min Wei ◽  
Unil Yun ◽  
Jerry Chun-Wei Lin

Author(s):  
Ashwini T ◽  
Sahana LM ◽  
Mahalakshmi E ◽  
Shweta S Padti

Analysis of consistent, structured data has seen huge success in past decades, whereas the analysis of unstructured data in multimedia formats remains a challenging task. YouTube is one of the most used and popular social media tools. The main aim of this paper is to analyze the data generated from YouTube so that it can be mined and utilized. The data is collected through the YouTube API (Application Programming Interface) and stored in the Hadoop Distributed File System (HDFS). The dataset can then be analyzed using MapReduce to identify the video categories in which the largest number of videos are uploaded. The paper also demonstrates the Hadoop framework and the many components it offers for processing and handling big data. In the existing method, big data is analyzed and processed in multiple stages using MapReduce; because of the huge space consumption of each job, implementing iterative MapReduce jobs is expensive. To overcome these drawbacks, a Hive-based method, the state-of-the-art approach, is used to analyze the big data: Hive works on the YouTube information extracted via a generated API key and uses SQL queries.
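A minimal sketch of the Hive approach described above, under the assumption that the metadata fetched through the YouTube API has already been loaded into a Hive table named youtube_videos with a category column (both names are hypothetical): a single HiveQL query, submitted here through the standard Hive JDBC driver, answers the "most uploaded categories" question without hand-written iterative MapReduce jobs.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

/**
 * Hedged sketch: counts uploaded videos per category with one HiveQL query,
 * assuming the fetched YouTube metadata already sits in a Hive table called
 * youtube_videos with a category column (table and column names are invented
 * for illustration). Host, port, and credentials are placeholders.
 */
public class CategoryCountHive {

    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");  // Hive JDBC driver must be on the classpath
        try (Connection conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "SELECT category, COUNT(*) AS video_count "
                   + "FROM youtube_videos GROUP BY category "
                   + "ORDER BY video_count DESC")) {
            while (rs.next()) {
                // category name followed by how many videos were uploaded in it
                System.out.println(rs.getString("category") + "\t" + rs.getLong("video_count"));
            }
        }
    }
}
```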


Author(s):  
Orazio Tomarchio ◽  
Giuseppe Di Modica ◽  
Marco Cavallo ◽  
Carmelo Polito

Advances in communication technologies, along with the birth of new communication paradigms leveraging the power of social networks, have fostered the production of huge amounts of data. Old-fashioned computing paradigms are unfit to handle the dimensions of the data produced daily by the countless, worldwide distributed sources of information. So far, MapReduce has been able to keep the promise of speeding up computation over Big Data within a cluster. This article focuses on scenarios of worldwide distributed Big Data. After highlighting the poor performance of the Hadoop framework when deployed in such scenarios, it proposes a Hierarchical Hadoop Framework (H2F) to cope with the issues arising when Big Data are scattered over geographically distant data centers. The article highlights the novelty introduced by H2F with respect to other hierarchical approaches. Tests run on a software prototype are also reported to show the performance gains that H2F achieves over plain Hadoop in geographical scenarios.
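The following is a rough sketch, not the H2F code base, of the hierarchical idea: a top-level orchestrator launches one vanilla Hadoop job per data center in parallel and then merges the partial outputs in a final, global step. The SiteCluster interface and the string-based job description are placeholders invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/**
 * Rough sketch (not the H2F code base) of hierarchical orchestration: each
 * data center runs a local, vanilla Hadoop job; the orchestrator waits for
 * all of them and collects the partial outputs for a final global step.
 */
public class HierarchicalOrchestrator {

    /** Placeholder handle to one data center's local Hadoop cluster. */
    public interface SiteCluster {
        /** Runs the job locally and returns the location of its partial output. */
        String runLocalJob(String jobDescription) throws Exception;
    }

    public static List<String> run(List<SiteCluster> sites, String jobDescription) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(sites.size());
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (SiteCluster site : sites) {
                Callable<String> localJob = () -> site.runLocalJob(jobDescription);
                pending.add(pool.submit(localJob));   // each site runs its own vanilla Hadoop job
            }
            List<String> partialOutputs = new ArrayList<>();
            for (Future<String> f : pending) {
                partialOutputs.add(f.get());          // wait for every site, then merge globally
            }
            return partialOutputs;
        } finally {
            pool.shutdown();
        }
    }
}
```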


Author(s):  
Akram Elomari ◽  
Larbi Hassouni ◽  
Abderrahim MAIZATE
