Modeling Apache Hive based applications in Big Data architectures

Performance Analysis of ECG Big Data using Apache Hive and Apache Pig

2019 8th International Conference on Information and Communication Technologies (ICICT) ◽

10.1109/icict47744.2019.9001287 ◽

2019 ◽

Cited By ~ 1

Author(s):

Mudassar Ahmad ◽

Safina Kanwal ◽

Maryam Cheema ◽

Muhammad Asif Habib

Keyword(s):

Big Data ◽

Performance Analysis ◽

Apache Pig ◽

Apache Hive

Efficient processing of complex XSD using Hive and Spark

PeerJ Computer Science ◽

10.7717/peerj-cs.652 ◽

2021 ◽

Vol 7 ◽

pp. e652

Author(s):

Diana Martinez-Mosquera ◽

Rosa Navarrete ◽

Sergio Luján-Mora

Keyword(s):

Big Data ◽

Performance Management ◽

Mobile Networks ◽

Real Life ◽

Real Data ◽

Xml Schema ◽

Apache Spark ◽

Data Sets ◽

Apache Hive

eXtensible Markup Language (XML) files are widely used in industry because of their flexibility in representing many kinds of data. Applications such as financial records, social networks, and mobile networks use complex XML schemas with nested types, contents, and/or extension bases on existing complex elements, or large real-world files. A great number of these files are generated each day, which has driven the development of Big Data tools for parsing and reporting on them, such as Apache Hive and Apache Spark. For these reasons, multiple studies have proposed new techniques and evaluated the processing of XML files with Big Data systems. However, such works usually consider only the simplest XML schemas, even though real data sets are composed of complex schemas. Therefore, to shed light on complex XML schema processing for real-life applications with Big Data tools, we present an approach that combines three techniques for parsing XML files: cataloging, deserialization, and positional explode. For cataloging, the elements of the XML schema are mapped into root, arrays, structures, values, and attributes. Based on these elements, deserialization and positional explode are straightforward to implement. To demonstrate the validity of our proposal, we develop a case study, implementing a test environment that illustrates the methods using real data sets from the performance management systems of two mobile network vendors. Our main results confirm the validity of the proposed method for different versions of Apache Hive and Apache Spark, report the query execution times for Apache Hive internal and external tables and Apache Spark data frames, and compare the query performance of Apache Hive with that of Apache Spark. A further contribution is the case study itself, which proposes a novel solution for data analysis in the performance management systems of mobile networks.
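
The deserialization and positional explode steps named in this abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation, and it uses neither Hive nor Spark: the sample XML, its element and attribute names (`measData`, `measInfo`, `measValue`, `vendor`, `id`), and the helper functions `deserialize`, `posexplode`, and `flatten` are all hypothetical. The `posexplode` helper only mimics the idea of emitting one row per array element together with its position.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; all names are invented for illustration.
SAMPLE_XML = """
<measData vendor="A">
  <measInfo id="cell-1">
    <measValue>10</measValue>
    <measValue>20</measValue>
  </measInfo>
  <measInfo id="cell-2">
    <measValue>30</measValue>
  </measInfo>
</measData>
"""


def deserialize(element):
    """Convert an XML element into nested dicts and lists.

    Attributes go under "attributes", text under "value", and child
    elements are grouped into arrays keyed by their tag name.
    """
    node = {"attributes": dict(element.attrib)}
    text = (element.text or "").strip()
    if text:
        node["value"] = text
    for child in element:
        node.setdefault(child.tag, []).append(deserialize(child))
    return node


def posexplode(items):
    """Yield (position, item) pairs, one per array element."""
    for pos, item in enumerate(items):
        yield pos, item


def flatten(xml_text):
    """Flatten the nested arrays into rows that carry their positions."""
    root = deserialize(ET.fromstring(xml_text))
    vendor = root["attributes"]["vendor"]
    rows = []
    for info_pos, info in posexplode(root.get("measInfo", [])):
        for val_pos, val in posexplode(info.get("measValue", [])):
            rows.append(
                (vendor, info["attributes"]["id"], info_pos, val_pos, val["value"])
            )
    return rows


rows = flatten(SAMPLE_XML)
for row in rows:
    print(row)
```

Each output row keeps the position of the element at both nesting levels, so the original ordering of the nested arrays can be recovered after flattening.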

An Overview of Apache Pig and Apache Hive

International Journal of Scientific Research in Computer Science Engineering and Information Technology ◽

10.32628/cseit195250 ◽

2019 ◽

pp. 432-436 ◽

Cited By ~ 1

Author(s):

Saiyam Arora ◽

Abinesh Verma ◽

Richa Vasuja

Keyword(s):

Big Data ◽

Distributed Storage ◽

Data Sets ◽

Great Work ◽

Apache Hadoop ◽

The Social ◽

Tremendous Amount ◽

Hadoop Ecosystem ◽

Apache Pig ◽

Apache Hive

As technology has advanced, data has been growing at an alarming rate. The most prominent driver of this growth is social media, which generates the tremendous amount of data known as Big Data. Big Data is a term for data sets that are extremely large and too complicated to store and process with traditional database applications. Hadoop addresses this problem through its two major components: HDFS (distributed storage) and MapReduce (parallel processing). Apache Pig and Apache Hive are essential parts of the Hadoop ecosystem. This paper gives an overview of both Apache Pig and Apache Hive, including their architecture. Hadoop undoubtedly does great work in storing and processing huge volumes of data, but additional frameworks now increase its efficiency; these are generally seen as layers on top of Hadoop or as parts of the Apache Hadoop project. For that reason, this paper covers the two most important of these layers, Apache Pig and Apache Hive.

Performance Comparison Between Apache Hive and Oracle SQL for Big Data Analytics

Advances in Intelligent Systems and Computing - Proceedings of the Eighth International Conference on Soft Computing and Pattern Recognition (SoCPaR 2016) ◽

10.1007/978-3-319-60618-7_14 ◽

2017 ◽

pp. 130-141

Author(s):

Rotsnarani Sethy ◽

Santosh Kumar Dash ◽

Mrutyunjaya Panda

Keyword(s):

Big Data ◽

Data Analytics ◽

Big Data Analytics ◽

Performance Comparison ◽

Apache Hive

Fair: A Hadoop-based Hybrid Model for Faculty Information Retrieval System

10.20944/preprints201706.0115.v1 ◽

2017 ◽

Author(s):

Harishchandra Dubey

Keyword(s):

Big Data ◽

Distributed Computing ◽

Computing Environment ◽

The Past ◽

Execution Engine ◽

Proposed Model ◽

Data Problem ◽

The Right ◽

Apache Hive ◽

Centralized System

In an era of ever-expanding data and knowledge, we lack a centralized system that maps all faculty members to their research works. This problem has not been addressed in the past, and it is challenging for students to connect with the right faculty in their domain. Because there are so many colleges and faculty members, this falls into the category of a big data problem. In this paper, we present a model that works in a distributed computing environment to tackle big data. The proposed model uses Apache Spark as the execution engine and Apache Hive as the database. The results are visualized with Tableau, which is connected to Apache Hive to achieve distributed computing.

Big Data Analytics Using Apache Hive to Analyze Health Data

10.4018/978-1-6684-3662-2.ch046 ◽

2022 ◽

pp. 979-992

Author(s):

Pavani Konagala

Keyword(s):

Big Data ◽

Stock Exchange ◽

Big Data Analytics ◽

Large Data ◽

Massive Data ◽

Data Sets ◽

Related Data ◽

Health Related ◽

Relational Database Management ◽

Apache Hive

A large volume of data is stored electronically, and it is very difficult to measure the total volume of that data. This data comes from various sources, such as stock exchanges, which may generate terabytes of data every day; Facebook, which may take about one petabyte of storage; and internet archives, which may store up to two petabytes of data. It is therefore very difficult to manage this data using relational database management systems. With massive data, reading from and writing to the drive takes more time, so the storage and analysis of this data has become a big problem. Big data offers a solution to these problems: it specifies methods to store and analyze large data sets. This chapter presents a brief study of big data techniques for analyzing these types of data. It includes a broad study of Hadoop characteristics, Hadoop architecture, the advantages of big data, and the big data ecosystem. Further, this chapter includes a comprehensive study of using Apache Hive to analyze health-related data and deaths data from the U.S. government.

Big Data Analytics Using Apache Hive to Analyze Health Data

Nature-Inspired Algorithms for Big Data Frameworks - Advances in Computational Intelligence and Robotics ◽

10.4018/978-1-5225-5852-1.ch015 ◽

2019 ◽

pp. 358-372

Author(s):

Pavani Konagala

Keyword(s):

Big Data ◽

Stock Exchange ◽

Big Data Analytics ◽

Large Data ◽

Massive Data ◽

Data Sets ◽

Related Data ◽

Health Related ◽

Relational Database Management ◽

Apache Hive

A large volume of data is stored electronically, and it is very difficult to measure the total volume of that data. This data comes from various sources, such as stock exchanges, which may generate terabytes of data every day; Facebook, which may take about one petabyte of storage; and internet archives, which may store up to two petabytes of data. It is therefore very difficult to manage this data using relational database management systems. With massive data, reading from and writing to the drive takes more time, so the storage and analysis of this data has become a big problem. Big data offers a solution to these problems: it specifies methods to store and analyze large data sets. This chapter presents a brief study of big data techniques for analyzing these types of data. It includes a broad study of Hadoop characteristics, Hadoop architecture, the advantages of big data, and the big data ecosystem. Further, this chapter includes a comprehensive study of using Apache Hive to analyze health-related data and deaths data from the U.S. government.
