Apache Hive: Latest Research Papers

Big Data Analytics Using Apache Hive to Analyze Health Data

DOI: 10.4018/978-1-6684-3662-2.ch046

2022, pp. 979-992

Author(s): Pavani Konagala

Keyword(s): Big Data, Stock Exchange, Big Data Analytics, Large Data, Massive Data, Data Sets, Related Data, Health Related, Relational Database Management, Apache Hive

A large volume of data is stored electronically, and its total volume is very difficult to measure. This data comes from various sources: a stock exchange may generate terabytes of data every day, Facebook may take about one petabyte of storage, and internet archives may store up to two petabytes of data. Data on this scale is very difficult to manage with relational database management systems, and reading it from and writing it to disk takes more time, so storing and analyzing it has become a major problem. Big data provides solutions to these problems by specifying methods to store and analyze large data sets. This chapter presents a brief study of big data techniques for analyzing such data, including Hadoop's characteristics and architecture, the advantages of big data, and the big data ecosystem. It also includes a comprehensive study of using Apache Hive to analyze health-related data and death data from the U.S. government.
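Hive lets analysts query delimited files with SQL-like HiveQL by projecting a table schema onto the data when it is read (schema-on-read). The Python sketch below imitates that pattern for a simple health-data aggregation. It is not the chapter's code: the table name `deaths`, its columns (`year`, `state`, `cause`, `deaths`), the comma delimiter, and the helper functions are hypothetical assumptions chosen for illustration.

```python
import csv
import io
from typing import Callable, Dict, Iterator, List, Optional, Tuple

# Hypothetical schema for a comma-delimited table, roughly the HiveQL
#   CREATE TABLE deaths (year INT, state STRING, cause STRING, deaths INT)
#   ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
SCHEMA: List[Tuple[str, Callable[[str], object]]] = [
    ("year", int),
    ("state", str),
    ("cause", str),
    ("deaths", int),
]

Row = Dict[str, Optional[object]]


def read_with_schema(text: str) -> Iterator[Row]:
    """Schema-on-read: parse each delimited line and convert fields to the
    declared types only when the data is read. Missing trailing fields and
    values that fail conversion become None, mirroring Hive's NULLs."""
    for fields in csv.reader(io.StringIO(text)):
        row: Row = {}
        for i, (name, convert) in enumerate(SCHEMA):
            if i >= len(fields):
                row[name] = None  # missing trailing field -> NULL
                continue
            try:
                row[name] = convert(fields[i])
            except ValueError:
                row[name] = None  # value does not fit the column type -> NULL
        yield row


def total_deaths_by_cause(text: str) -> Dict[Optional[object], Optional[int]]:
    """Rough equivalent of: SELECT cause, SUM(deaths) FROM deaths GROUP BY cause;
    SUM skips NULLs, and a group whose values are all NULL sums to NULL."""
    totals: Dict[Optional[object], Optional[int]] = {}
    for row in read_with_schema(text):
        cause, deaths = row["cause"], row["deaths"]
        totals.setdefault(cause, None)
        if deaths is not None:
            totals[cause] = (totals[cause] or 0) + deaths
    return totals
```

Calling `total_deaths_by_cause` on the text of a comma-delimited file returns a dictionary that maps each cause to its total, the same shape of result a `GROUP BY` query would produce.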

Efficient processing of complex XSD using Hive and Spark

PeerJ Computer Science

DOI: 10.7717/peerj-cs.652

2021, Vol 7, pp. e652

Author(s): Diana Martinez-Mosquera, Rosa Navarrete, Sergio Luján-Mora

Keyword(s): Big Data, Performance Management, Mobile Networks, Real Life, Real Data, XML Schema, Apache Spark, Data Sets, Apache Hive

eXtensible Markup Language (XML) files are widely used in industry because of their flexibility in representing many kinds of data. Multiple applications, such as financial records, social networks, and mobile networks, use complex XML schemas with nested types, contents, and/or extension bases on existing complex elements, or large real-world files. A great number of these files are generated each day, which has influenced the development of Big Data tools for parsing and reporting on them, such as Apache Hive and Apache Spark. For these reasons, multiple studies have proposed new techniques and evaluated the processing of XML files with Big Data systems. However, such works usually consider only the simplest XML schemas, even though real data sets are composed of complex schemas. Therefore, to shed light on complex XML schema processing for real-life applications with Big Data tools, we present an approach that combines three main methods for parsing XML files: cataloging, deserialization, and positional explode. For cataloging, the elements of the XML schema are mapped into root, arrays, structures, values, and attributes. Based on these elements, deserialization and positional explode are straightforward to implement. To demonstrate the validity of our proposal, we develop a case study by implementing a test environment that illustrates the methods using real data sets provided by the performance management systems of two mobile network vendors. Our main results confirm the validity of the proposed method for different versions of Apache Hive and Apache Spark, report the query execution times for Apache Hive internal and external tables and Apache Spark data frames, and compare the query performance of Apache Hive with that of Apache Spark. A further contribution is a case study that proposes a novel solution for data analysis in the performance management systems of mobile networks.
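The three parsing methods named in this abstract can be illustrated in plain Python. The sketch below catalogs the elements of a small XML document into root, arrays, structures, values, and attributes, then uses a positional explode (0-based, like Hive's `posexplode`) and a simple flattening step, standing in for deserialization, to turn nested measurements into table-like rows. It is not the authors' implementation, which targets Hive and Spark; the document, its element names (`measCollec`, `measInfo`, `measType`, `measValue`, `r`), and the function names are hypothetical, and cataloging is inferred from one instance document rather than from the XSD itself.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Dict, List, Set, Tuple

# Hypothetical, simplified performance-management file (not a vendor format).
SAMPLE = """<measCollec>
  <measInfo id="cell-load">
    <measType p="1">attempts</measType>
    <measType p="2">failures</measType>
    <measValue obj="cell=A"><r p="1">120</r><r p="2">3</r></measValue>
    <measValue obj="cell=B"><r p="1">80</r><r p="2">1</r></measValue>
  </measInfo>
</measCollec>"""


def catalog(root: ET.Element) -> Dict[str, Set[str]]:
    """Cataloging: classify element paths as root, arrays, structures,
    values, or attributes (inferred from one instance document)."""
    cat: Dict[str, Set[str]] = {"root": {root.tag}, "arrays": set(),
                                "structures": set(), "values": set(),
                                "attributes": set()}

    def walk(elem: ET.Element, path: str) -> None:
        for name in elem.attrib:
            cat["attributes"].add(f"{path}/@{name}")
        counts = Counter(child.tag for child in elem)
        for child in elem:
            child_path = f"{path}/{child.tag}"
            if counts[child.tag] > 1:
                cat["arrays"].add(child_path)  # repeated sibling -> array
            if len(child):
                cat["structures"].add(child_path)  # has children -> structure
            else:
                cat["values"].add(child_path)  # leaf element -> value
            walk(child, child_path)

    walk(root, root.tag)
    return cat


def pos_explode(parent: ET.Element, tag: str) -> List[Tuple[int, ET.Element]]:
    """Positional explode: one (position, item) row per array element,
    with 0-based positions like Hive's posexplode()."""
    return list(enumerate(parent.findall(tag)))


def flatten(root: ET.Element) -> List[Tuple[str, str, int, str, int]]:
    """Flatten nested measurements into rows of
    (measInfo id, measured object, position, counter name, value)."""
    rows = []
    for info in root.findall("measInfo"):
        names = {t.get("p"): t.text for t in info.findall("measType")}
        for _, value in pos_explode(info, "measValue"):
            for pos, r in pos_explode(value, "r"):
                rows.append((info.get("id"), value.get("obj"), pos,
                             names[r.get("p")], int(r.text)))
    return rows
```

Once the catalog identifies which paths are arrays, each array can be exploded with its position so that values from parallel arrays (here, counter names and counter values) can be matched up row by row.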

Retailing Analysis Using Hadoop and Apache Hive

International Journal of Simulation Systems Science & Technology

DOI: 10.5013/ijssst.a.20.01.08

2020

Author(s): Hiba A. Abu-Alsaad

Keyword(s): Apache Hive

Performance Analysis of ECG Big Data using Apache Hive and Apache Pig

2019 8th International Conference on Information and Communication Technologies (ICICT)

DOI: 10.1109/icict47744.2019.9001287

2019

Cited by ~ 1

Author(s): Mudassar Ahmad, Safina Kanwal, Maryam Cheema, Muhammad Asif Habib

Keyword(s): Big Data, Performance Analysis, Apache Pig, Apache Hive

Apache Hive Performance Improvement Techniques for Relational Data

2019 International Artificial Intelligence and Data Processing Symposium (IDAP)

DOI: 10.1109/idap.2019.8875898

2019

Cited by ~ 1

Author(s): Melih Gunay, M. Numan Ince, Alper Cetinkaya

Keyword(s): Performance Improvement, Relational Data, Apache Hive

Apache Hive

Proceedings of the 2019 International Conference on Management of Data - SIGMOD '19

DOI: 10.1145/3299869.3314045

2019

Cited by ~ 2

Author(s): Jesús Camacho-Rodríguez, Ashutosh Chauhan, Alan Gates, Eugene Koifman, Owen O'Malley, ...

Keyword(s): Apache Hive

An Overview of Apache Pig and Apache Hive

International Journal of Scientific Research in Computer Science Engineering and Information Technology

DOI: 10.32628/cseit195250

2019, pp. 432-436

Cited by ~ 1

Author(s): Saiyam Arora, Abinesh Verma, Richa Vasuja

Keyword(s): Big Data, Distributed Storage, Data Sets, Great Work, Apache Hadoop, The Social, Tremendous Amount, Hadoop Ecosystem, Apache Pig, Apache Hive

As technology has advanced, data has been growing at an alarming rate. The most prominent driver of this growth is social media, which produces the tremendous amount of data called Big Data. Big Data is a term for data sets that are extremely large and complicated to store and process with traditional database applications. One saviour for dealing with Big Data is Hadoop, whose two major components are HDFS (distributed storage) and MapReduce (parallel processing). Apache Pig and Apache Hive are essential parts of the Hadoop ecosystem, and this paper gives an overview of both, including their architectures. Hadoop undoubtedly does great work storing and processing huge volumes of data, but there are now further frameworks that increase the efficiency of the Hadoop framework; these are generally seen as layers of Hadoop or as part of the Apache Hadoop project. This paper therefore covers the two most important of these layers: Apache Pig and Apache Hive.
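HDFS stores files as blocks, and MapReduce runs a map task over each input split, groups the intermediate key/value pairs by key (the shuffle), and hands each group to a reducer. The single-process Python sketch below mimics that flow with the classic word-count example. It is a hypothetical illustration that runs sequentially and does not use Hadoop's Java API; the function names are assumptions. Classic Hive and Pig translate their queries and scripts into jobs of this shape.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, List, Tuple


def map_phase(line: str) -> List[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle and sort: group every emitted value by its key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values that share one key."""
    return key, sum(values)


def word_count(splits: List[List[str]]) -> Dict[str, int]:
    """Run map over every line of every split (each split stands in for an
    HDFS block handled by its own map task; here they run sequentially),
    then shuffle, then reduce each key group."""
    mapped = chain.from_iterable(map_phase(line) for split in splits for line in split)
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
```

In a real cluster the map tasks run in parallel on the nodes that hold each block, which is what lets the same simple functions scale to very large inputs.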

Big Data Analytics Using Apache Hive to Analyze Health Data

Nature-Inspired Algorithms for Big Data Frameworks - Advances in Computational Intelligence and Robotics

DOI: 10.4018/978-1-5225-5852-1.ch015

2019, pp. 358-372

Author(s): Pavani Konagala

Keyword(s): Big Data, Stock Exchange, Big Data Analytics, Large Data, Massive Data, Data Sets, Related Data, Health Related, Relational Database Management, Apache Hive

A large volume of data is stored electronically, and its total volume is very difficult to measure. This data comes from various sources: a stock exchange may generate terabytes of data every day, Facebook may take about one petabyte of storage, and internet archives may store up to two petabytes of data. Data on this scale is very difficult to manage with relational database management systems, and reading it from and writing it to disk takes more time, so storing and analyzing it has become a major problem. Big data provides solutions to these problems by specifying methods to store and analyze large data sets. This chapter presents a brief study of big data techniques for analyzing such data, including Hadoop's characteristics and architecture, the advantages of big data, and the big data ecosystem. It also includes a comprehensive study of using Apache Hive to analyze health-related data and death data from the U.S. government.

Analyzing Performance of Apache Pig and Apache Hive with Hadoop

Engineering Vibration, Communication and Information Processing - Lecture Notes in Electrical Engineering

DOI: 10.1007/978-981-13-1642-5_4

2018, pp. 41-51

Cited by ~ 3

Author(s): Krati Bansal, Priyanka Chawla, Pratik Kurle

Keyword(s): Apache Pig, Apache Hive

An Approach To Twitter Sentiment Analysis Over Hadoop

International Journal of Engineering & Technology

DOI: 10.14419/ijet.v7i4.5.20110

2018, Vol 7 (4.5), pp. 374

Author(s): Yazala Ritika Siril Paul, Dilipkumar A. Borikar

Keyword(s): Sentiment Analysis, Opinion Mining, Emotional State, Streaming Data, Data Streaming, Text Data, Twitter Data, The People, Data Platform, Apache Hive

Sentiment analysis is the process of identifying people's attitudes and emotional states from the language they use on social websites or other sources. The main aim is to identify a set of potential features in a review and to extract the opinion expressions about those features by making full use of their associations. Posting on Twitter has become routine for people around the world, who post thousands of reactions and opinions on every topic, every second of every day. Twitter is like one big, constantly updated psychological database that can be used to analyze people's sentiments. Hadoop is one of the best options available for sentiment analysis of Twitter data, and it also handles distributed big data, streaming data, text data, and more. This paper provides an efficient mechanism for performing sentiment analysis (opinion mining) on Twitter data over the Hortonworks Data Platform, which provides Hadoop on Windows, with the assistance of Apache Flume, Apache HDFS, and Apache Hive.
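The feature-and-opinion idea in this abstract can be illustrated with a deliberately simple technique: find mentions of candidate features in a tweet and associate each one with nearby opinion words from a lexicon. The Python sketch below is not the paper's mechanism, which runs over the Hortonworks Data Platform with Flume, HDFS, and Hive; the toy lexicon, the negation list, the 3-token window, and the function names are all hypothetical assumptions.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical toy lexicon; a real system would use a curated resource.
OPINION_LEXICON: Dict[str, int] = {
    "good": 1, "great": 1, "love": 1, "fast": 1,
    "bad": -1, "slow": -1, "hate": -1, "terrible": -1,
}
NEGATIONS: Set[str] = {"not", "never", "no"}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def feature_opinions(text: str, features: Set[str],
                     window: int = 3) -> List[Tuple[str, str, int]]:
    """Associate each feature mention with lexicon opinion words found within
    `window` tokens on either side, flipping polarity when the opinion word
    is directly preceded by a negation."""
    tokens = tokenize(text)
    found: List[Tuple[str, str, int]] = []
    for i, tok in enumerate(tokens):
        if tok not in features:
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            polarity = OPINION_LEXICON.get(tokens[j])
            if polarity is None:
                continue
            if j > 0 and tokens[j - 1] in NEGATIONS:
                polarity = -polarity
            found.append((tok, tokens[j], polarity))
    return found


def tweet_sentiment(text: str, features: Set[str]) -> str:
    """Sum the feature-level polarities into one label for the tweet."""
    score = sum(p for _, _, p in feature_opinions(text, features))
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"
```

The fixed window is the simplest possible notion of "association" between a feature and an opinion word; it is easy to run in parallel over many tweets, but it can attach an opinion to the wrong feature in long sentences.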
