Optimal Common Job Block Table (CJBT) to Improve the Performance in the Hadoop Framework

Author(s):  
Pinjari Vali Basha

With the rapid transformation of technology, a huge amount of data (structured and unstructured) is generated every day. With the aid of 5G technology and the IoT, the volume of data generated and processed daily is very large: approximately 2.5 quintillion bytes. This data (Big Data) is stored and processed with the help of the Hadoop framework, which has two phases for storing and retrieving data in the network: the Hadoop Distributed File System (HDFS) and the MapReduce algorithm. In the native Hadoop framework the MapReduce algorithm has some limitations: if the same job is submitted again, all the steps of the native workflow must be repeated before results are available, which wastes time and resources. Improving the capabilities of the Name node by maintaining a Common Job Block Table (CJBT) at the Name node improves performance, at the cost of maintaining the table. The CJBT holds the metadata of files used by repeated jobs, which avoids recomputation, reduces the number of computations, saves resources and speeds up processing. Because the size of the CJBT keeps growing, its size must be limited by an algorithm that keeps track of the jobs; the optimal Common Job Block Table is derived by employing an optimal algorithm at the Name node.
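As a minimal illustration of the idea, the sketch below models a Common Job Block Table as a bounded, least-recently-used map kept at the Name node, keyed by a job signature (job name plus input path) and storing the block metadata and output location of an earlier run. The class and field names are illustrative assumptions rather than the paper's actual design; the LRU eviction merely stands in for the size-limiting algorithm mentioned above.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Illustrative sketch of a Common Job Block Table (CJBT) kept at the Name node.
 * A repeated job is recognised by its signature (job name + input path), and the
 * table returns the block metadata recorded for the earlier run so the repeat
 * can skip re-computation. Names and fields are assumptions for illustration.
 */
public class CommonJobBlockTable {

    /** Metadata remembered for a previously executed job. */
    public static class JobBlockEntry {
        final List<String> blockIds;   // HDFS block ids touched by the job
        final String resultPath;       // where the earlier output was written
        JobBlockEntry(List<String> blockIds, String resultPath) {
            this.blockIds = blockIds;
            this.resultPath = resultPath;
        }
    }

    private final Map<String, JobBlockEntry> table;

    public CommonJobBlockTable(int maxEntries) {
        // Access-ordered map: the least-recently-used entry is evicted first,
        // which keeps the table bounded as the abstract requires.
        this.table = new LinkedHashMap<String, JobBlockEntry>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, JobBlockEntry> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Signature that identifies a "common" (repeated) job. */
    public static String signature(String jobName, String inputPath) {
        return jobName + "::" + inputPath;
    }

    /** Returns cached metadata if this job was seen before, else null. */
    public synchronized JobBlockEntry lookup(String jobName, String inputPath) {
        return table.get(signature(jobName, inputPath));
    }

    /** Records metadata after a job finishes so a repeat can reuse it. */
    public synchronized void record(String jobName, String inputPath, JobBlockEntry entry) {
        table.put(signature(jobName, inputPath), entry);
    }
}
```

On a repeated submission, a lookup hit lets the Name node hand back the recorded block list and result path instead of redoing the block resolution and the job itself.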

Author(s):  
Sachin Arun Thanekar ◽  
K. Subrahmanyam ◽  
A. B. Bagwan

Nowadays we are all surrounded by Big Data. The term 'Big Data' itself indicates huge volume, high velocity, variety and veracity, i.e. the uncertainty of data, which gives rise to new difficulties and challenges. Big Data may be structured, semi-structured or unstructured. Existing databases and systems face many difficulties in processing, analyzing, storing and managing such data. The Big Data challenges include protection, curation, capture, analysis, searching, visualization, storage, transfer and sharing. MapReduce is a framework with which we can write applications that process huge amounts of data in parallel, on large clusters of commodity hardware, in a reliable manner. Many efforts have been made by different researchers to make it simple, easy, effective and efficient. In this survey paper we emphasize the working of MapReduce and its challenges, opportunities and recent trends so that researchers can consider further improvements.
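To make the working of MapReduce concrete, the following is the standard word-count example written against the org.apache.hadoop.mapreduce API: mappers tokenize their input splits in parallel and emit (word, 1) pairs, a combiner pre-aggregates locally, and reducers sum the counts for each word. Input and output paths are taken from the command line.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/** Canonical word-count job: mappers emit (word, 1) pairs in parallel,
 *  reducers sum the counts for each word. */
public class WordCount {

    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);           // one record per word occurrence
            }
        }
    }

    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();                    // combine counts from all mappers
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);   // local aggregation cuts shuffle traffic
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```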


Author(s):  
Reema Abdulraziq ◽  
Muneer Bani Yassein ◽  
Shadi Aljawarneh

Big data refers to the huge amount of data that is being used in commercial, industrial and economic environments. There are three types of big data: structured, unstructured and semi-structured data. In discussions on big data, the three major aspects considered as its main dimensions are the volume, velocity and variety of the data. This data is collected, analysed and checked for use by the end users. Cloud computing and the Internet of Things (IoT) are used to enable this huge amount of collected data to be stored and connected to the Internet. These technologies reduce time and cost, and in addition they can accommodate the collected data regardless of its size. This chapter focuses on how big data, with the emergence of cloud computing and the IoT, can be used via several applications and technologies.


Author(s):  
Ashwini T ◽  
Sahana LM ◽  
Mahalakshmi E ◽  
Shweta S Padti

Analysis of consistent and structured data has seen huge success in past decades, whereas the analysis of unstructured data in multimedia formats remains a challenging task. YouTube is one of the most used and popular social media tools. The main aim of this paper is to analyze the data generated from YouTube, which can be mined and utilized: the data is collected through the YouTube API (Application Programming Interface), stored in the Hadoop Distributed File System (HDFS), and analyzed using MapReduce to identify the video categories in which the largest number of videos are uploaded. The paper also demonstrates the Hadoop framework and the components it provides to process and handle big data. In the existing method, big data is analyzed and processed in multiple stages using MapReduce; because of the huge space consumption of each job, implementing iterative MapReduce jobs is expensive. To overcome these drawbacks, a Hive-based method, the state-of-the-art approach, is used to analyze the big data: Hive extracts the YouTube information obtained with a generated API key and queries it with SQL-like statements.
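A sketch of the Hive-based analysis is shown below, assuming the records fetched through the YouTube Data API have been loaded into a Hive table named youtube_videos with video_id and category columns; the table layout, host and port are illustrative assumptions, and the Hive JDBC driver must be on the classpath. The query counts uploads per category and returns the categories with the most videos.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

/**
 * Sketch of the Hive-based analysis: run a HiveQL aggregation over a table of
 * YouTube records stored in HDFS. Table and column names are assumptions.
 */
public class TopCategories {
    public static void main(String[] args) throws Exception {
        // Older Hive JDBC drivers may need explicit loading.
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        // HiveServer2 JDBC endpoint; host and port depend on the cluster set-up.
        String url = "jdbc:hive2://localhost:10000/default";
        try (Connection con = DriverManager.getConnection(url, "", "");
             Statement stmt = con.createStatement()) {

            // Count uploads per category and list the categories with the most videos.
            String q = "SELECT category, COUNT(*) AS uploads "
                     + "FROM youtube_videos "
                     + "GROUP BY category "
                     + "ORDER BY uploads DESC "
                     + "LIMIT 10";
            try (ResultSet rs = stmt.executeQuery(q)) {
                while (rs.next()) {
                    System.out.println(rs.getString("category") + "\t" + rs.getLong("uploads"));
                }
            }
        }
    }
}
```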


The Hadoop Distributed File System (HDFS) and MapReduce (MR) are the key aspects of the Hadoop framework. Big data scenarios such as Facebook data processing, or Twitter analytics such as storing and processing tweets, depend on the Hadoop framework for storage and processing, on top of which further analytics can be done. The concern is the space and time consumed in processing such huge amounts of data: the framework occupies large amounts of storage, and at the same time the processing time is high and needs to be reduced to obtain the fastest response from the framework. This matters because all the other ecosystem tools also depend on HDFS and MR for data storage and processing, so an alternative architecture is needed that improves the use of space and makes effective use of resources in order to reduce the time requirements of the framework. The outcome of the work is faster data processing and lower space utilization of the framework when running MR along with other ecosystem tools such as Hive, Flume, Sqoop and Pig Latin. The work proposes an alternative to HDFS and MR, which we name the Unified Space Allocation and Data Processing with Metadata based Distributed File System (USAMDFS).


The Hadoop framework provides a way of storing and processing huge amounts of data. Social media companies such as Facebook, Twitter and Amazon use Hadoop ecosystem tools to store data in the Hadoop Distributed File System and to process it with MapReduce (MR). The current work describes the use of Sqoop for importing data into and exporting data out of HDFS, and covers the various import/export commands supported by the Sqoop tool in the Hadoop ecosystem. The importance of the work is to highlight the common errors encountered while installing and working with Sqoop. Many developers and researchers use Sqoop to perform the import/export process and to handle source data in relational form. In the current work, the connectivity between MySQL and Sqoop is presented, along with the usage of various commands and their results. For each command, the possible errors encountered and the corresponding solutions are given, together with the common configuration settings to follow so that Sqoop can be used without errors.
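As a hedged sketch of the import/export flow, the snippet below drives the standard sqoop import and sqoop export command lines from Java via ProcessBuilder so the exact flags are visible. The MySQL host, database, credentials, table names and HDFS directories are illustrative assumptions, and sqoop must be on the PATH with the MySQL JDBC connector installed in its lib directory.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

/**
 * Sketch of a Sqoop import/export round trip driven from Java.
 * Connection details, table names and directories are illustrative only.
 */
public class SqoopTransfer {

    static int run(List<String> command) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command)
                .inheritIO()                 // stream Sqoop's MapReduce progress to the console
                .start();
        return p.waitFor();
    }

    public static void main(String[] args) throws Exception {
        // Import a MySQL table into HDFS as text files.
        run(Arrays.asList("sqoop", "import",
                "--connect", "jdbc:mysql://localhost:3306/retail_db",
                "--username", "retail_user", "--password", "retail_pass",
                "--table", "orders",
                "--target-dir", "/user/hadoop/orders",
                "--num-mappers", "2"));

        // Export processed results from HDFS back into a MySQL table.
        run(Arrays.asList("sqoop", "export",
                "--connect", "jdbc:mysql://localhost:3306/retail_db",
                "--username", "retail_user", "--password", "retail_pass",
                "--table", "order_totals",
                "--export-dir", "/user/hadoop/order_totals"));
    }
}
```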


2020 ◽  
Vol 13 (4) ◽  
pp. 790-797
Author(s):  
Gurjit Singh Bhathal ◽  
Amardeep Singh Dhiman

Background: In the current scenario of the internet, large amounts of data are generated and processed. The Hadoop framework is widely used to store and process big data in a highly distributed manner, yet it is argued that Hadoop is not mature enough to deal with current cyberattacks on the data.
Objective: The main objective of the proposed work is to provide a complete security approach, comprising authorisation and authentication for the user and the Hadoop cluster nodes, and to secure the data at rest as well as in transit.
Methods: The proposed algorithm uses the Kerberos network authentication protocol for authorisation and authentication and to validate the users and the cluster nodes. Ciphertext-Policy Attribute-Based Encryption (CP-ABE) is used for data at rest and data in transit: users encrypt files with their own set of attributes and store them on the Hadoop Distributed File System, and only intended users with matching parameters can decrypt those files.
Results: The proposed algorithm was implemented with data sets of different sizes, processed with and without encryption. The results show little difference in processing time: performance was affected in the range of 0.8% to 3.1%, which also includes the impact of other factors such as system configuration, the number of parallel jobs running and the virtual environment.
Conclusion: The solutions available for handling the big data security problems faced in the Hadoop framework are inefficient or incomplete. A complete security framework is proposed for the Hadoop environment, and the solution is experimentally shown to have little effect on the performance of the system for datasets of different sizes.
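The sketch below illustrates only the Kerberos authentication half of such a scheme: a client process logs in from a keytab before accessing HDFS on a Kerberized cluster. The principal, keytab path and file path are illustrative assumptions, and the CP-ABE encryption of file contents (applied to the data before writing and after reading) is not shown, since it depends on a separate attribute-based encryption library.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

/**
 * Kerberos login sketch for a Kerberized Hadoop cluster. The principal and
 * keytab path are placeholders; CP-ABE handling of file contents is omitted.
 */
public class KerberosHdfsClient {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Matches the core-site.xml settings of a Kerberized cluster.
        conf.set("hadoop.security.authentication", "kerberos");
        conf.set("hadoop.security.authorization", "true");

        UserGroupInformation.setConfiguration(conf);
        // Authenticate this process as the given principal using its keytab.
        UserGroupInformation.loginUserFromKeytab(
                "analyst@EXAMPLE.COM", "/etc/security/keytabs/analyst.keytab");

        // Only after a successful Kerberos login can the client reach HDFS.
        try (FileSystem fs = FileSystem.get(conf)) {
            System.out.println("Exists: "
                    + fs.exists(new Path("/user/analyst/encrypted/data.enc")));
        }
    }
}
```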


1993 ◽  
Vol 8 (5) ◽  
pp. 957-961 ◽  
Author(s):  
J.C. Abele ◽  
R.L. Bristol ◽  
T.C. Nguyen ◽  
M.W. Ohmer ◽  
L.S. Wood

A model proposed by Tinkham [1] to explain the resistance versus temperature broadening found in high-Tc superconductors in applied magnetic fields is extended to "foot and knee"-structured data taken on polycrystalline YBa2Cu3O6+δ. The proposed extension involves a series combination of two types of superconductors. For this series combination to result, a critical ratio of the two types of superconductors must be met, a result common to both percolation theory and randomized cellular automata theory. This critical ratio is investigated via statistical computer models of a polycrystalline superconductor having two phases of crystallites, one with substantially lower Jc than the other.
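As a minimal illustration of the percolation argument (not the statistical model used in the paper), the sketch below assigns crystallites on a square grid to the low-Jc phase with probability p and estimates how often that phase forms a connected path across the sample; the spanning probability rises sharply near a critical ratio, roughly p ≈ 0.59 for two-dimensional site percolation. Grid size and trial counts are arbitrary choices.

```java
import java.util.ArrayDeque;
import java.util.Random;

/**
 * Toy site-percolation estimate: crystallites are assigned to the low-Jc phase
 * with probability p, and we check whether that phase spans top to bottom.
 */
public class PercolationSketch {

    static boolean spans(boolean[][] lowJc) {
        int n = lowJc.length;
        boolean[][] seen = new boolean[n][n];
        ArrayDeque<int[]> queue = new ArrayDeque<>();
        for (int c = 0; c < n; c++) {                // seed the search from the top row
            if (lowJc[0][c]) { seen[0][c] = true; queue.add(new int[]{0, c}); }
        }
        int[][] steps = {{1, 0}, {-1, 0}, {0, 1}, {0, -1}};
        while (!queue.isEmpty()) {
            int[] cell = queue.poll();
            if (cell[0] == n - 1) return true;       // reached the bottom row
            for (int[] s : steps) {
                int r = cell[0] + s[0], c = cell[1] + s[1];
                if (r >= 0 && r < n && c >= 0 && c < n && lowJc[r][c] && !seen[r][c]) {
                    seen[r][c] = true;
                    queue.add(new int[]{r, c});
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int n = 100, trials = 200;
        Random rnd = new Random(42);
        for (double p = 0.40; p <= 0.75; p += 0.05) {
            int spanning = 0;
            for (int t = 0; t < trials; t++) {
                boolean[][] grid = new boolean[n][n];
                for (int r = 0; r < n; r++) {
                    for (int c = 0; c < n; c++) {
                        grid[r][c] = rnd.nextDouble() < p;   // low-Jc crystallite
                    }
                }
                if (spans(grid)) spanning++;
            }
            System.out.printf("p = %.2f  spanning fraction = %.2f%n",
                    p, (double) spanning / trials);
        }
    }
}
```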

