Identifying Requirements for Big Data Analytics and Mapping to Hadoop Tools

2019 ◽  
Vol 8 (3) ◽  
pp. 4384-4392

Big data is being generated in a wide variety of formats at an exponential rate. Big data analytics deals with processing and analyzing voluminous data to provide useful insight for guided decision making. Traditional data storage and management tools are not well equipped to handle big data and its applications. Apache Hadoop is a popular open-source platform that supports storage and processing of extremely large datasets. The Hadoop ecosystem provides a variety of tools for big data analytics, but the tool must be chosen to suit the specific analytics requirement. Each tool has its own advantages and drawbacks relative to the others. Some have overlapping business use cases but differ in critical functional areas, so the trade-offs between usability and suitability must be considered when selecting a tool from the Hadoop ecosystem. This paper identifies the requirements for Big Data Analytics (BDA) and maps them to the tools of the Hadoop framework that are best suited to them. To this end, we categorize Hadoop tools according to their functionality and usage. Different Hadoop tools are discussed from the users’ perspective along with their pros and cons. For each identified category, a comparison of Hadoop tools based on important parameters is also presented. The tools have been studied and analyzed for their suitability for the different requirements of big data analytics. The resulting mapping of big data analytics requirements to Hadoop tools is intended for use by data analysts and predictive modelers.

Author(s):  
Mohd Imran ◽  
Mohd Vasim Ahamad ◽  
Misbahul Haque ◽  
Mohd Shoaib

The term big data analytics refers to mining and analyzing voluminous amounts of data using various tools and platforms. Popular tools include Apache Hadoop, Apache Spark, HBase, Storm, GridGain, HPCC, Cassandra, Pig, Hive, and NoSQL. Which tool is used depends on the parameters considered in the big data analysis. A comparative analysis of these analytical tools is therefore needed to choose the best and simplest way of performing the analysis, so as to obtain better throughput and more efficient mining. This chapter contributes a comparative study of big data analytics tools based on aspects such as their functionality, pros, and cons, using characteristics that can help determine the best and most efficient among them. The comparative study should help people use such tools more efficiently.


2022 ◽  
pp. 622-631
Author(s):  
Mohd Imran ◽  
Mohd Vasim Ahamad ◽  
Misbahul Haque ◽  
Mohd Shoaib



Author(s):  
Chien-Lung Chan ◽  
Chi-Chang Chang

Unlike most daily decisions, medical decision making often has substantial consequences and trade-offs. Big data analytics techniques such as statistical analysis, data mining, machine learning and deep learning can now be applied to construct innovative decision models. In complex decision making, it can be difficult to comprehend and compare the benefits and risks of all available options. For these reasons, this Special Issue focuses on the use of big data analytics and forms of public health decision making based on the decision model, spanning from theory to practice. Each of the 64 submissions was blind peer reviewed by at least two referees, and 23 papers were finally selected for this Special Issue.


2019 ◽  
Vol 59 (6) ◽  
pp. 415-429 ◽  
Author(s):  
JUAN-PEDRO CABRERA-SÁNCHEZ ◽  
ÁNGEL F VILLAREJO-RAMOS

With the total quantity of data doubling every two years and the price of computing and data storage low, Big Data analytics (BDA) adoption is desirable for companies as a tool for gaining competitive advantage. Given the availability of free software, why have some companies failed to adopt these techniques? To answer this question, we extend the unified theory of acceptance and use of technology (UTAUT) model, adapted for the BDA context, by adding two variables: resistance to use and perceived risk. We used the level of implementation of these techniques to divide companies into users and non-users of BDA. The structural models were evaluated by partial least squares (PLS). The results show that the importance of good infrastructure exceeds the difficulties companies face in implementing it. While companies planning to use Big Data expect strong results, current users are more skeptical about its performance.


Author(s):  
Balasree K ◽  
Dharmarajan K

Given the rapid development of Big Data technology in recent years, this paper discusses the role that Machine Learning (ML) methods and algorithms play in Big Data processing and Big Data analytics. The two fields are evolving together and complement each other. Because such datasets are very high in velocity and variety, the rapid growth of big data calls for solutions that can handle it, gain knowledge from the datasets, and extract value. Big data analytics involves an appropriate data storage and computational framework, enhanced by scalable machine learning algorithms, that lets the analytics reveal hidden data and previously unknown correlations in massive amounts of data. This analytic information helps organizations and companies gain deeper knowledge, develop, and gain an advantage over the competition. Such analytics can also support accurate predictions from the data. The paper presents a detailed review of state-of-the-art developments and an overview of the advantages and challenges of machine learning algorithms for big data analytics.


Web Services ◽  
2019 ◽  
pp. 89-104
Author(s):  
Priya P. Panigrahi ◽  
Tiratha Raj Singh

In this digital and computing world, the rates of data generation and collection are growing very rapidly. With improved data storage and fast computation, along with the real-time distribution of data through the internet, everyday data ingestion is mounting exponentially. With continuous advances in data storage and the accessibility of smart devices, the impact of big data will continue to grow. This chapter provides the fundamental concepts of big data, its benefits, probable pitfalls, big data analytics, and its impact on bioinformatics. With the deluge of biological data generated by next-generation sequencing projects, there is a need to handle this data through big data techniques. The chapter also discusses the tools for analytics, the development of a novel data life cycle for big data, and the problems and challenges connected with big data, with special relevance to bioinformatics.




2019 ◽  
Vol 19 (3) ◽  
pp. 16-24 ◽  
Author(s):  
Ivan P. Popchev ◽  
Daniela A. Orozova

The issues related to the analysis and management of Big Data, including the security, stability, and quality of the data, represent a new research and engineering challenge. This paper considers techniques for Big Data storage, search, analysis, and management in the virtual e-Learning space, and the problems these techniques face. A numerical example of exploratory analysis of data about students from Burgas Free University is presented, using the Orange data mining tool. The analysis is a basis for a system for identifying students at risk.

