Big Data Analytics with Apache Hadoop MapReduce Framework



Big Data Analytics Tools and Platform in Big Data Landscape

10.4018/978-1-6684-3662-2.ch029 ◽

2022 ◽

pp. 622-631

Author(s):

Mohd Imran ◽

Mohd Vasim Ahamad ◽

Misbahul Haque ◽

Mohd Shoaib

Keyword(s):

Big Data ◽

Comparative Analysis ◽

Comparative Study ◽

Data Analytics ◽

Big Data Analytics ◽

Big Data Analysis ◽

Apache Hadoop ◽

Pros And Cons ◽

The Comparative Study ◽

Analytical Tools

Big data analytics refers to mining and analyzing voluminous data sets using various tools and platforms. Popular tools include Apache Hadoop, Apache Spark, HBase, Storm, GridGain, HPCC, Cassandra, Pig, Hive, and NoSQL databases. Which tool is used depends on the parameters chosen for the analysis, so a comparative analysis of these tools is needed to choose the best and simplest approach for better throughput and efficient mining. This chapter contributes a comparative study of big data analytics tools based on their functionality, pros, and cons, characteristics that can be used to determine the best and most efficient among them. The comparative study is intended to help people use such tools more efficiently.


The importance of big data technology

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i4.5.21139 ◽

2018 ◽

Vol 7 (4.5) ◽

pp. 485

Author(s):

Samson Fadiya ◽

Arif Sari

Keyword(s):

Big Data ◽

Data Analytics ◽

Big Data Analytics ◽

Management Systems ◽

Apache Hadoop ◽

Industry Standard ◽

Data Framework ◽

Efficiency And Effectiveness ◽

Relational Database Management ◽

Big Data Technology

The adoption of Web 2.0 technologies, the Internet of Things, etc. by individuals and organizations has led to an explosion of data. As it stands, existing Relational Database Management Systems (RDBMSs) are incapable of handling this deluge of data. The term Big Data was coined to represent these vast, fast, and complex datasets that regular RDBMSs could not handle. Special tools or frameworks were developed to process, manage, and store this big data. These tools can function in distributed, industry-standard environments, thereby maintaining efficiency and effectiveness at a business level. Apache Hadoop is an example of such a framework. This report discusses big data, its origins, the opportunities and challenges it presents, big data analytics, and the application of big data using existing big data tools or frameworks. It also discusses Apache Hadoop as a big data framework and provides a basic overview of this technology from technological and business perspectives.


Leverage RAF to find domain experts on research social network services: A big data analytics methodology with MapReduce framework

International Journal of Production Economics ◽

10.1016/j.ijpe.2014.12.038 ◽

2015 ◽

Vol 165 ◽

pp. 185-193 ◽

Cited By ~ 22

Author(s):

Jianshan Sun ◽

Wei Xu ◽

Jian Ma ◽

Jiasen Sun

Keyword(s):

Big Data ◽

Social Network ◽

Data Analytics ◽

Big Data Analytics ◽

Network Services ◽

Mapreduce Framework ◽

Domain Experts ◽

Social Network Services


Shared disk big data analytics with Apache Hadoop

2012 19th International Conference on High Performance Computing ◽

10.1109/hipc.2012.6507520 ◽

2012 ◽

Cited By ~ 9

Author(s):

Anirban Mukherjee ◽

Joydip Datta ◽

Raghavendra Jorapur ◽

Ravi Singhvi ◽

Saurav Haloi ◽

...

Keyword(s):

Big Data ◽

Data Analytics ◽

Big Data Analytics ◽

Apache Hadoop


Exploring Big Data Analytic Approaches to Cancer Blog Text Analysis

International Journal of Healthcare Information Systems and Informatics ◽

10.4018/ijhisi.2019100101 ◽

2019 ◽

Vol 14 (4) ◽

pp. 1-20 ◽

Cited By ~ 2

Author(s):

Viju Raghupathi ◽

Yilu Zhou ◽

Wullianallur Raghupathi

Keyword(s):

Big Data ◽

Data Analytics ◽

Word Association ◽

Big Data Analytics ◽

Text Analytics ◽

Hadoop Mapreduce ◽

Cancer Management ◽

Exploratory Approach ◽

Clustering And Classification ◽

Data Analytic

In this article, the authors explore the potential of a big data analytics approach to unstructured text analytics of cancer blogs. The application is developed using the Cloudera platform's Hadoop MapReduce framework. It uses several text analytics algorithms, including word count, word association, clustering, and classification, to identify and analyze the patterns and keywords in cancer blog postings. This article establishes an exploratory approach to involving big data analytics methods in developing text analytics applications for the analysis of cancer blogs. Additional insights are extracted through various means, including the development of categories or keywords contained in the blogs, the development of a taxonomy, and the examination of relationships among the categories. The application has the potential for generalizability and implementation with health content in other blogs and social media. It can provide insight and decision support for cancer management and facilitate efficient and relevant searches for information related to cancer.
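
The word count step named in the abstract can be illustrated with a minimal single-process sketch of the map, shuffle, and reduce phases. This is not the authors' Cloudera application: the function names (`map_words`, `shuffle`, `reduce_counts`, `word_count`) and the sample posts are hypothetical, and a real Hadoop job would distribute these phases across a cluster rather than run them in one Python process.

```python
from collections import defaultdict
import re

def map_words(post):
    """Map phase: emit a (word, 1) pair for every word in one post."""
    for word in re.findall(r"[a-z']+", post.lower()):
        yield word, 1

def shuffle(pairs):
    """Shuffle phase: group emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_counts(groups):
    """Reduce phase: sum the values collected for each word."""
    return {word: sum(values) for word, values in groups.items()}

def word_count(posts):
    """Chain the three phases over a list of posts."""
    pairs = (pair for post in posts for pair in map_words(post))
    return reduce_counts(shuffle(pairs))

# Hypothetical sample input, not data from the article.
sample_posts = [
    "Treatment options and support",
    "Support groups help with treatment",
]
counts = word_count(sample_posts)
```

Keeping the phases as separate functions mirrors how the framework separates the mapper from the reducer; only the shuffle step would be handled by Hadoop itself.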


Scalability Study of Hadoop MapReduce and Hive in Big Data Analytics

International Journal Of Engineering And Computer Science ◽

10.18535/ijecs/v5i11.11 ◽

2016 ◽

Cited By ~ 2

Author(s):

Khadija Jabeen ◽

Keyword(s):

Big Data ◽

Data Analytics ◽

Big Data Analytics ◽

Hadoop Mapreduce


Simulations of Hadoop/MapReduce-Based Platform to Support its Usability of Big Data Analytics in Healthcare

Athens Journal of Technology & Engineering ◽

10.30958/ajte.5-3-1 ◽

2018 ◽

Vol 5 (3) ◽

pp. 197-222

Author(s):

Dillon Chrimes ◽

Hamid Zamani ◽

Belaid Moa ◽

Alex Kuo

Keyword(s):

Big Data ◽

Data Analytics ◽

Big Data Analytics ◽

Hadoop Mapreduce


Statistical Visualization of Big Data Through Hadoop Streaming in RStudio

10.4018/978-1-6684-3662-2.ch035 ◽

2022 ◽

pp. 758-787

Author(s):

Chitresh Verma ◽

Rajiv Pandey

Keyword(s):

Big Data ◽

Data Visualization ◽

Data Analytics ◽

Big Data Analytics ◽

Data Streaming ◽

Data Set ◽

Graphical Modeling ◽

Hadoop Mapreduce ◽

R Packages ◽

Case Based

Data visualization enables visual representation of a data set so that the data can be interpreted meaningfully from a human perspective. Statistical visualization calls for various tools, algorithms, and techniques that can support and render graphical modeling. This chapter explores the features of R and RStudio in detail. The combination of Hadoop and R for big data analytics and data visualization is demonstrated through code snippets. The integration of R and Hadoop is explained in detail with the help of a utility called the Hadoop streaming jar. The various R packages and their integration with Hadoop operations in the R environment are explained through examples. The process of data streaming is shown using different readers of the Hadoop streaming package. A case-based statistical project is considered in which the data set is visualized after dual execution using Hadoop MapReduce and an R script.
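
The line-based contract that Hadoop streaming relies on can be sketched as follows. The chapter itself works in R; Python is used here purely for illustration, and the functions (`mapper`, `reducer`, `run_locally`), the `category,value` input format, and the sample data are all assumptions rather than anything taken from the chapter. The sketch assumes the default streaming convention of tab-separated key/value lines, with the reducer receiving records already sorted by key.

```python
from itertools import groupby

def mapper(lines):
    """Turn 'category,value' CSV lines into tab-separated key/value records."""
    for line in lines:
        category, value = line.strip().split(",")
        yield f"{category}\t{value}"

def reducer(sorted_records):
    """Average the values of each key; input must already be sorted by key."""
    parsed = (record.split("\t") for record in sorted_records)
    for key, group in groupby(parsed, key=lambda kv: kv[0]):
        values = [float(value) for _, value in group]
        yield f"{key}\t{sum(values) / len(values)}"

def run_locally(lines):
    """Simulate map -> sort -> reduce in a single process."""
    return list(reducer(sorted(mapper(lines))))

# Hypothetical sample input, not data from the chapter.
sample = ["a,1", "b,4", "a,3"]
result = run_locally(sample)
```

On a real cluster the mapper and reducer would be separate scripts that read from standard input and write to standard output, with the framework performing the sort between them; `run_locally` only stands in for that step.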
