Optimal Common Job Block Table (CJBT) to Improve the Performance in the Hadoop Framework

Author(s):  
Pinjari Vali Basha

With the rapid transformation of technology, a huge amount of data (structured and unstructured) is generated every day. With the aid of 5G technology and the IoT, the volume of data generated and processed daily is very large: approximately 2.5 quintillion bytes. This data (Big Data) is stored and processed with the help of the Hadoop framework, which has two phases for storing and retrieving data in the network: the Hadoop Distributed File System (HDFS) and the MapReduce algorithm. In the native Hadoop framework the MapReduce algorithm has some limitations: if the same job is submitted again, all the steps of the native workflow must be repeated before results are available, which wastes time and resources. Improving the capabilities of the Name node by maintaining a Common Job Block Table (CJBT) at the Name node improves performance, at the cost of maintaining the table. The CJBT holds the metadata of files used by repeated jobs, which avoids recomputation, reduces the number of computations, saves resources and speeds up processing. Because the size of the CJBT keeps growing, its size must be limited by an algorithm that keeps track of the jobs; the optimal Common Job Block Table is derived by employing an optimal algorithm at the Name node.
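As a minimal illustration of the idea, the sketch below models a Common Job Block Table as a bounded, least-recently-used map kept at the Name node, keyed by a job signature (job name plus input path) and storing the block metadata and output location of an earlier run. The class and field names are illustrative assumptions rather than the paper's actual design; the LRU eviction merely stands in for the size-limiting algorithm mentioned above.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Illustrative sketch of a Common Job Block Table (CJBT) kept at the Name node.
 * A repeated job is recognised by its signature (job name + input path), and the
 * table returns the block metadata recorded for the earlier run so the repeat
 * can skip re-computation. Names and fields are assumptions for illustration.
 */
public class CommonJobBlockTable {

    /** Metadata remembered for a previously executed job. */
    public static class JobBlockEntry {
        final List<String> blockIds;   // HDFS block ids touched by the job
        final String resultPath;       // where the earlier output was written
        JobBlockEntry(List<String> blockIds, String resultPath) {
            this.blockIds = blockIds;
            this.resultPath = resultPath;
        }
    }

    private final Map<String, JobBlockEntry> table;

    public CommonJobBlockTable(int maxEntries) {
        // Access-ordered map: the least-recently-used entry is evicted first,
        // which keeps the table bounded as the abstract requires.
        this.table = new LinkedHashMap<String, JobBlockEntry>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, JobBlockEntry> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Signature that identifies a "common" (repeated) job. */
    public static String signature(String jobName, String inputPath) {
        return jobName + "::" + inputPath;
    }

    /** Returns cached metadata if this job was seen before, else null. */
    public synchronized JobBlockEntry lookup(String jobName, String inputPath) {
        return table.get(signature(jobName, inputPath));
    }

    /** Records metadata after a job finishes so a repeat can reuse it. */
    public synchronized void record(String jobName, String inputPath, JobBlockEntry entry) {
        table.put(signature(jobName, inputPath), entry);
    }
}
```

On a repeated submission, a lookup hit lets the Name node hand back the recorded block list and result path instead of redoing the block resolution and the job itself.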

Author(s):  
Sachin Arun Thanekar ◽  
K. Subrahmanyam ◽  
A. B. Bagwan

Nowadays we are all surrounded by Big Data. The term 'Big Data' itself indicates huge volume, high velocity, variety and veracity, i.e. the uncertainty of data, which gives rise to new difficulties and challenges. Big Data may be structured, semi-structured or unstructured. Existing databases and systems face many difficulties in processing, analyzing, storing and managing such data. The Big Data challenges include protection, curation, capture, analysis, searching, visualization, storage, transfer and sharing. MapReduce is a framework with which we can write applications that process huge amounts of data in parallel, on large clusters of commodity hardware, in a reliable manner. Many efforts have been made by different researchers to make it simple, easy, effective and efficient. In this survey paper we emphasize the working of MapReduce and its challenges, opportunities and recent trends so that researchers can consider further improvements.
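To make the working of MapReduce concrete, the following is the standard word-count example written against the org.apache.hadoop.mapreduce API: mappers tokenize their input splits in parallel and emit (word, 1) pairs, a combiner pre-aggregates locally, and reducers sum the counts for each word. Input and output paths are taken from the command line.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/** Canonical word-count job: mappers emit (word, 1) pairs in parallel,
 *  reducers sum the counts for each word. */
public class WordCount {

    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);           // one record per word occurrence
            }
        }
    }

    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();                    // combine counts from all mappers
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);   // local aggregation cuts shuffle traffic
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```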


Author(s):  
Reema Abdulraziq ◽  
Muneer Bani Yassein ◽  
Shadi Aljawarneh

Big data refers to the huge amount of data that is being used in commercial, industrial and economic environments. There are three types of big data: structured, unstructured and semi-structured data. In discussions on big data, the three major aspects considered as its main dimensions are the volume, velocity and variety of the data. This data is collected, analysed and checked for use by the end users. Cloud computing and the Internet of Things (IoT) are used to enable this huge amount of collected data to be stored and connected to the Internet. These technologies reduce time and cost, and in addition they can accommodate the collected data regardless of its size. This chapter focuses on how big data, with the emergence of cloud computing and the IoT, can be used via several applications and technologies.


Author(s):  
Ashwini T ◽  
Sahana LM ◽  
Mahalakshmi E ◽  
Shweta S Padti

Analysis of consistent and structured data has seen huge success in past decades, whereas the analysis of unstructured data in multimedia formats remains a challenging task. YouTube is one of the most used and popular social media tools. The main aim of this paper is to analyze the data generated from YouTube, which can be mined and utilized: the data is collected through the YouTube API (Application Programming Interface), stored in the Hadoop Distributed File System (HDFS), and analyzed using MapReduce to identify the video categories in which the largest number of videos are uploaded. The paper also demonstrates the Hadoop framework and the components it provides to process and handle big data. In the existing method, big data is analyzed and processed in multiple stages using MapReduce; because of the huge space consumption of each job, implementing iterative MapReduce jobs is expensive. To overcome these drawbacks, a Hive-based method, the state-of-the-art approach, is used to analyze the big data: Hive extracts the YouTube information obtained with a generated API key and queries it with SQL-like statements.
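A sketch of the Hive-based analysis is shown below, assuming the records fetched through the YouTube Data API have been loaded into a Hive table named youtube_videos with video_id and category columns; the table layout, host and port are illustrative assumptions, and the Hive JDBC driver must be on the classpath. The query counts uploads per category and returns the categories with the most videos.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

/**
 * Sketch of the Hive-based analysis: run a HiveQL aggregation over a table of
 * YouTube records stored in HDFS. Table and column names are assumptions.
 */
public class TopCategories {
    public static void main(String[] args) throws Exception {
        // Older Hive JDBC drivers may need explicit loading.
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        // HiveServer2 JDBC endpoint; host and port depend on the cluster set-up.
        String url = "jdbc:hive2://localhost:10000/default";
        try (Connection con = DriverManager.getConnection(url, "", "");
             Statement stmt = con.createStatement()) {

            // Count uploads per category and list the categories with the most videos.
            String q = "SELECT category, COUNT(*) AS uploads "
                     + "FROM youtube_videos "
                     + "GROUP BY category "
                     + "ORDER BY uploads DESC "
                     + "LIMIT 10";
            try (ResultSet rs = stmt.executeQuery(q)) {
                while (rs.next()) {
                    System.out.println(rs.getString("category") + "\t" + rs.getLong("uploads"));
                }
            }
        }
    }
}
```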


The Hadoop Distributed File System (HDFS) and MapReduce (MR) are the key aspects of the Hadoop framework. Big data scenarios such as Facebook data processing, or Twitter analytics such as storing and processing tweets, depend on the Hadoop framework for storage and processing, on top of which further analytics can be done. The concern is the space and time consumed in processing such huge amounts of data: the framework occupies large amounts of storage, and at the same time the processing time is high and needs to be reduced to obtain the fastest response from the framework. This matters because all the other ecosystem tools also depend on HDFS and MR for data storage and processing, so an alternative architecture is needed that improves the use of space and makes effective use of resources in order to reduce the time requirements of the framework. The outcome of the work is faster data processing and lower space utilization of the framework when running MR along with other ecosystem tools such as Hive, Flume, Sqoop and Pig Latin. The work proposes an alternative to HDFS and MR, which we name the Unified Space Allocation and Data Processing with Metadata based Distributed File System (USAMDFS).


The Hadoop framework provides a way of storing and processing huge amounts of data. Social media companies such as Facebook, Twitter and Amazon use Hadoop ecosystem tools to store data in the Hadoop Distributed File System and to process it with MapReduce (MR). The current work describes the use of Sqoop for importing data into and exporting data out of HDFS, and covers the various import/export commands supported by the Sqoop tool in the Hadoop ecosystem. The importance of the work is to highlight the common errors encountered while installing and working with Sqoop. Many developers and researchers use Sqoop to perform the import/export process and to handle source data in relational form. In the current work, the connectivity between MySQL and Sqoop is presented, along with the usage of various commands and their results. For each command, the possible errors encountered and the corresponding solutions are given, together with the common configuration settings to follow so that Sqoop can be used without errors.
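As a hedged sketch of the import/export flow, the snippet below drives the standard sqoop import and sqoop export command lines from Java via ProcessBuilder so the exact flags are visible. The MySQL host, database, credentials, table names and HDFS directories are illustrative assumptions, and sqoop must be on the PATH with the MySQL JDBC connector installed in its lib directory.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

/**
 * Sketch of a Sqoop import/export round trip driven from Java.
 * Connection details, table names and directories are illustrative only.
 */
public class SqoopTransfer {

    static int run(List<String> command) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command)
                .inheritIO()                 // stream Sqoop's MapReduce progress to the console
                .start();
        return p.waitFor();
    }

    public static void main(String[] args) throws Exception {
        // Import a MySQL table into HDFS as text files.
        run(Arrays.asList("sqoop", "import",
                "--connect", "jdbc:mysql://localhost:3306/retail_db",
                "--username", "retail_user", "--password", "retail_pass",
                "--table", "orders",
                "--target-dir", "/user/hadoop/orders",
                "--num-mappers", "2"));

        // Export processed results from HDFS back into a MySQL table.
        run(Arrays.asList("sqoop", "export",
                "--connect", "jdbc:mysql://localhost:3306/retail_db",
                "--username", "retail_user", "--password", "retail_pass",
                "--table", "order_totals",
                "--export-dir", "/user/hadoop/order_totals"));
    }
}
```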


2020 ◽  
Vol 13 (4) ◽  
pp. 790-797
Author(s):  
Gurjit Singh Bhathal ◽  
Amardeep Singh Dhiman

Background: In the current scenario of the internet, large amounts of data are generated and processed. The Hadoop framework is widely used to store and process big data in a highly distributed manner, yet it is argued that Hadoop is not mature enough to deal with current cyberattacks on the data.
Objective: The main objective of the proposed work is to provide a complete security approach, comprising authorisation and authentication for the user and the Hadoop cluster nodes, and to secure the data at rest as well as in transit.
Methods: The proposed algorithm uses the Kerberos network authentication protocol for authorisation and authentication and to validate the users and the cluster nodes. Ciphertext-Policy Attribute-Based Encryption (CP-ABE) is used for data at rest and data in transit: users encrypt files with their own set of attributes and store them on the Hadoop Distributed File System, and only intended users with matching parameters can decrypt those files.
Results: The proposed algorithm was implemented with data sets of different sizes, processed with and without encryption. The results show little difference in processing time: performance was affected in the range of 0.8% to 3.1%, which also includes the impact of other factors such as system configuration, the number of parallel jobs running and the virtual environment.
Conclusion: The solutions available for handling the big data security problems faced in the Hadoop framework are inefficient or incomplete. A complete security framework is proposed for the Hadoop environment, and the solution is experimentally shown to have little effect on the performance of the system for datasets of different sizes.
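The sketch below illustrates only the Kerberos authentication half of such a scheme: a client process logs in from a keytab before accessing HDFS on a Kerberized cluster. The principal, keytab path and file path are illustrative assumptions, and the CP-ABE encryption of file contents (applied to the data before writing and after reading) is not shown, since it depends on a separate attribute-based encryption library.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

/**
 * Kerberos login sketch for a Kerberized Hadoop cluster. The principal and
 * keytab path are placeholders; CP-ABE handling of file contents is omitted.
 */
public class KerberosHdfsClient {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Matches the core-site.xml settings of a Kerberized cluster.
        conf.set("hadoop.security.authentication", "kerberos");
        conf.set("hadoop.security.authorization", "true");

        UserGroupInformation.setConfiguration(conf);
        // Authenticate this process as the given principal using its keytab.
        UserGroupInformation.loginUserFromKeytab(
                "analyst@EXAMPLE.COM", "/etc/security/keytabs/analyst.keytab");

        // Only after a successful Kerberos login can the client reach HDFS.
        try (FileSystem fs = FileSystem.get(conf)) {
            System.out.println("Exists: "
                    + fs.exists(new Path("/user/analyst/encrypted/data.enc")));
        }
    }
}
```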


1993 ◽  
Vol 8 (5) ◽  
pp. 957-961 ◽  
Author(s):  
J.C. Abele ◽  
R.L. Bristol ◽  
T.C. Nguyen ◽  
M.W. Ohmer ◽  
L.S. Wood

A model proposed by Tinkham [1] to explain the resistance versus temperature broadening found in high-Tc superconductors in applied magnetic fields is extended to "foot and knee"-structured data taken on polycrystalline YBa2Cu3O6+δ. The proposed extension involves a series combination of two types of superconductors. For this series combination to result, a critical ratio of the two types of superconductors must be met, a result common to both percolation theory and randomized cellular automata theory. This critical ratio is investigated via statistical computer models of a polycrystalline superconductor having two phases of crystallites, one with substantially lower Jc than the other.
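As a minimal illustration of the percolation argument (not the statistical model used in the paper), the sketch below assigns crystallites on a square grid to the low-Jc phase with probability p and estimates how often that phase forms a connected path across the sample; the spanning probability rises sharply near a critical ratio, roughly p ≈ 0.59 for two-dimensional site percolation. Grid size and trial counts are arbitrary choices.

```java
import java.util.ArrayDeque;
import java.util.Random;

/**
 * Toy site-percolation estimate: crystallites are assigned to the low-Jc phase
 * with probability p, and we check whether that phase spans top to bottom.
 */
public class PercolationSketch {

    static boolean spans(boolean[][] lowJc) {
        int n = lowJc.length;
        boolean[][] seen = new boolean[n][n];
        ArrayDeque<int[]> queue = new ArrayDeque<>();
        for (int c = 0; c < n; c++) {                // seed the search from the top row
            if (lowJc[0][c]) { seen[0][c] = true; queue.add(new int[]{0, c}); }
        }
        int[][] steps = {{1, 0}, {-1, 0}, {0, 1}, {0, -1}};
        while (!queue.isEmpty()) {
            int[] cell = queue.poll();
            if (cell[0] == n - 1) return true;       // reached the bottom row
            for (int[] s : steps) {
                int r = cell[0] + s[0], c = cell[1] + s[1];
                if (r >= 0 && r < n && c >= 0 && c < n && lowJc[r][c] && !seen[r][c]) {
                    seen[r][c] = true;
                    queue.add(new int[]{r, c});
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int n = 100, trials = 200;
        Random rnd = new Random(42);
        for (double p = 0.40; p <= 0.75; p += 0.05) {
            int spanning = 0;
            for (int t = 0; t < trials; t++) {
                boolean[][] grid = new boolean[n][n];
                for (int r = 0; r < n; r++) {
                    for (int c = 0; c < n; c++) {
                        grid[r][c] = rnd.nextDouble() < p;   // low-Jc crystallite
                    }
                }
                if (spans(grid)) spanning++;
            }
            System.out.printf("p = %.2f  spanning fraction = %.2f%n",
                    p, (double) spanning / trials);
        }
    }
}
```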

