Analyzing and scripting Indian election strategies using big data via Apache Hadoop framework

Apache Hadoop has been a popular parallel processing tool in the era of big data. Practitioners have rewritten many conventional analysis algorithms to tailor them to Hadoop, but the inefficient I/O of Hadoop-based programs has been repeatedly reported in the literature. In this article, we address the I/O inefficiency of Hadoop-based massive data analysis by introducing an efficient modification of Hadoop. We first incorporate a columnar data layout into the conventional Hadoop framework without modifying the Hadoop internals. We also give Hadoop an indexing capability that saves a large amount of I/O when processing not only selection predicates but also the star-join queries often used in analysis tasks.
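To make the source of the I/O savings concrete, here is a minimal Python sketch, assuming an in-memory toy table with hypothetical columns `id`, `region`, and `amount`. It is not the authors' Hadoop modification: it only illustrates how a columnar layout lets a selection predicate read just the columns it references, and how a sorted index can locate matching rows by binary search instead of a full scan. The helper names (`to_columnar`, `select_ids_where_amount_gt`, `build_sorted_index`, `index_range_lookup`) are illustrative.

```python
from bisect import bisect_left, bisect_right

# Hypothetical toy table; column names and values are illustrative only.
ROWS = [
    (1, "north", 120.0),
    (2, "south", 75.5),
    (3, "north", 310.0),
    (4, "east", 42.0),
]
COLUMNS = ("id", "region", "amount")


def to_columnar(rows, columns):
    """Pivot row tuples into one list per column (a columnar layout)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(columns)}


def select_ids_where_amount_gt(table, threshold):
    """Evaluate `amount > threshold`, reading only the `id` and `amount` columns."""
    amounts = table["amount"]
    ids = table["id"]
    return [ids[i] for i, a in enumerate(amounts) if a > threshold]


def build_sorted_index(column):
    """A simple secondary index: (value, row position) pairs sorted by value."""
    return sorted((value, pos) for pos, value in enumerate(column))


def index_range_lookup(index, low, high):
    """Return row positions whose value lies in [low, high] via binary search."""
    keys = [value for value, _ in index]  # recomputed per call for simplicity
    start = bisect_left(keys, low)
    end = bisect_right(keys, high)
    return sorted(pos for _, pos in index[start:end])
```

For example, `select_ids_where_amount_gt(to_columnar(ROWS, COLUMNS), 100.0)` returns `[1, 3]` without touching the `region` column, and a range lookup of `[50.0, 200.0]` on an index over `amount` returns row positions `[0, 1]`. In an on-disk columnar format the same principle means unreferenced columns are never read; a production index would also store its keys separately rather than rebuilding them per lookup.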

Big Data Security Challenges and Solution of Distributed Computing in Hadoop Environment: A Security Framework

Recent Advances in Computer Science and Communications, 2020, Vol 13 (4), pp. 790-797. DOI: 10.2174/2213275912666190822095422

Author(s): Gurjit Singh Bhathal, Amardeep Singh Dhiman

Keyword(s): Big Data, Data Security, Data Sets, Security Framework, Hadoop Distributed File System, Current Scenario, Hadoop Cluster, Ciphertext Policy, In Transit, Hadoop Framework

Background: In the current internet landscape, large amounts of data are generated and processed. The Hadoop framework is widely used to store and process big data in a highly distributed manner, but it has been argued that it is not mature enough to withstand current cyberattacks on data. Objective: The main objective of the proposed work is to provide a complete security approach, comprising authorisation and authentication for users and Hadoop cluster nodes, that secures data both at rest and in transit. Methods: The proposed algorithm uses the Kerberos network authentication protocol for authorisation and authentication, validating both users and cluster nodes. Ciphertext-Policy Attribute-Based Encryption (CP-ABE) is used for data at rest and in transit: a user encrypts a file with their own set of attributes and stores it on the Hadoop Distributed File System, and only intended users with matching parameters can decrypt it. Results: The proposed algorithm was implemented with data sets of different sizes, and the data was processed with and without encryption. The results show little difference in processing time: performance was affected by 0.8% to 3.1%, a range that also includes the impact of other factors such as system configuration, the number of parallel jobs running, and the virtual environment. Conclusion: Existing solutions to the big data security problems faced by the Hadoop framework are inefficient or incomplete. A complete security framework is proposed for the Hadoop environment, and experiments show that it has little effect on system performance for datasets of different sizes.
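The access rule at the heart of CP-ABE can be illustrated without any cryptography. In CP-ABE, a ciphertext is associated with an access policy over attributes and each user's key with a set of attributes; decryption succeeds only when the attribute set satisfies the policy. The Python sketch below, assuming policies built from single attributes and k-of-n threshold gates (AND and OR are the n-of-n and 1-of-n cases), shows only that boolean satisfaction check. It performs no encryption, is not the authors' implementation, and the attribute strings are hypothetical; in real CP-ABE the check is enforced mathematically by the ciphertext rather than by code like this.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple, Union


@dataclass(frozen=True)
class Attr:
    """Leaf of the access policy: a single required attribute (hypothetical names)."""
    name: str


@dataclass(frozen=True)
class Threshold:
    """k-of-n gate: satisfied when at least k of its children are satisfied."""
    k: int
    children: Tuple["Policy", ...]


Policy = Union[Attr, Threshold]


def all_of(*children: Policy) -> Threshold:
    """AND gate, expressed as an n-of-n threshold."""
    return Threshold(len(children), tuple(children))


def any_of(*children: Policy) -> Threshold:
    """OR gate, expressed as a 1-of-n threshold."""
    return Threshold(1, tuple(children))


def satisfies(policy: Policy, attributes: FrozenSet[str]) -> bool:
    """Return True if the user's attribute set satisfies the access policy."""
    if isinstance(policy, Attr):
        return policy.name in attributes
    met = sum(1 for child in policy.children if satisfies(child, attributes))
    return met >= policy.k
```

For instance, with `policy = all_of(Attr("dept:analytics"), any_of(Attr("role:admin"), Attr("role:auditor")))`, a user holding `{"dept:analytics", "role:auditor"}` satisfies the policy, while a user holding `{"dept:finance", "role:admin"}` does not.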

Early detection of diabetic retinopathy from big data in Hadoop framework

Displays, 2021, Vol 70, pp. 102061. DOI: 10.1016/j.displa.2021.102061

Author(s): Amartya Hatua, Badri Narayan Subudhi, Veerakumar T., Ashish Ghosh

Keyword(s): Big Data, Diabetic Retinopathy, Early Detection, Hadoop Framework

Analysing Big Data in VANET via HADOOP Framework

Journal of Computer Science & Systems Biology, 2018, Vol 11 (04). DOI: 10.4172/jcsb.1000281

Author(s): Rahul Kumar Chawda, Ghanshyam Thakur

Keyword(s): Big Data, Hadoop Framework

Sistema de Computação Paralela e Distribuída Utilizando Raspberry Pi e Apache Hadoop [Parallel and Distributed Computing System Using Raspberry Pi and Apache Hadoop]

2018. DOI: 10.5753/epoca.2018.13460

Author(s): Hemerson Pontes, Gilvandro De Medeiros, Joanderson Borges, Helton Maia

Keyword(s): Big Data, Raspberry Pi, Apache Hadoop

In the context of Big Data, the large flow and complexity of the data generated demand a high computational cost for processing and information-extraction tasks, making it a challenge to complete such executions in time for technical or business decision-making. In computer clusters, however, data packages can be managed and distributed among different processing units, which makes it possible and viable to work with a large volume of data by processing it in a parallel and distributed manner. The present work therefore sets out to build the infrastructure of a cluster and to study its operation, using the Apache Hadoop tool for distributed data processing.
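The cluster model described above, in which data is partitioned across processing units and processed in parallel, corresponds to the MapReduce model that Apache Hadoop uses. As a minimal single-machine sketch, assuming a thread pool stands in for the cluster nodes, the word count below splits the input into chunks, maps each chunk in parallel, groups the intermediate pairs by key (the shuffle), and reduces each group. It does not use Hadoop's Java API, and the function names are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Tuple


def split_into_chunks(lines: List[str], n_chunks: int) -> List[List[str]]:
    """Distribute input lines round-robin across n_chunks 'processing units'."""
    chunks: List[List[str]] = [[] for _ in range(n_chunks)]
    for i, line in enumerate(lines):
        chunks[i % n_chunks].append(line)
    return chunks


def map_phase(chunk: List[str]) -> List[Tuple[str, int]]:
    """Emit a (word, 1) pair for every lower-cased word in the chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(mapped: Iterable[List[Tuple[str, int]]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: List[str], n_workers: int = 4) -> Dict[str, int]:
    """Run map, shuffle, and reduce, with map and reduce tasks on a thread pool."""
    chunks = split_into_chunks(lines, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        mapped = list(pool.map(map_phase, chunks))
        grouped = shuffle(mapped)
        reduced = pool.map(lambda kv: reduce_phase(*kv), grouped.items())
        return dict(reduced)
```

Calling `word_count(["big data on hadoop", "Hadoop big cluster"], n_workers=2)` returns `{"big": 2, "data": 1, "on": 1, "hadoop": 2, "cluster": 1}`. Threads keep the example self-contained, but because of CPython's global interpreter lock they do not give true CPU parallelism; that is what separate processes or, in the setting above, separate cluster nodes provide.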

Big Data Analytics Tools and Platform in Big Data Landscape

Advances in Systems Analysis, Software Engineering, and High Performance Computing - Handbook of Research on Pattern Engineering System Development for Big Data Analytics, 2018, pp. 80-89. DOI: 10.4018/978-1-5225-3870-7.ch006

Author(s): Mohd Imran, Mohd Vasim Ahamad, Misbahul Haque, Mohd Shoaib

Keyword(s): Big Data, Comparative Analysis, Comparative Study, Data Analytics, Big Data Analytics, Big Data Analysis, Apache Hadoop, Pros And Cons, The Comparative Study, Analytical Tools

The term big data analytics refers to mining and analyzing voluminous amounts of data using various tools and platforms. Popular tools include Apache Hadoop, Apache Spark, HBase, Storm, GridGain, HPCC, Cassandra, Pig, Hive, and NoSQL databases. Which tool is used depends on the parameters chosen for the big data analysis, so a comparative analysis of these tools is needed to choose the best and simplest approach and to achieve higher throughput and more efficient mining. This chapter presents a comparative study of big data analytics tools across aspects such as functionality, pros, and cons, based on characteristics that can be used to identify the best and most efficient among them. The comparative study helps practitioners use these tools more efficiently.

Big Data Analytics Tools and Platform in Big Data Landscape

2022, pp. 622-631. DOI: 10.4018/978-1-6684-3662-2.ch029

Author(s): Mohd Imran, Mohd Vasim Ahamad, Misbahul Haque, Mohd Shoaib

Keyword(s): Big Data, Comparative Analysis, Comparative Study, Data Analytics, Big Data Analytics, Big Data Analysis, Apache Hadoop, Pros And Cons, The Comparative Study, Analytical Tools

Emerging Trends of Big Data in Cloud Computing

Applications of Big Data in Large- and Small-Scale Systems - Advances in Data Mining and Database Management, 2021, pp. 38-55. DOI: 10.4018/978-1-7998-6673-2.ch003

Author(s): Poonam Nandal, Deepa Bura, Meeta Singh

Keyword(s): Cloud Computing, Big Data, Data Processing, Open Source, Software Framework, Effective Solution, Apache Hadoop, Large Sets, Exponential Increase, Emerging Trends

In today's world, where data is accumulating at an ever-increasing rate, processing big data has become a necessity rather than an option. This requires tools for both processing and analyzing the data in order to obtain meaningful results from it. Many tools on the market can be used to process big data, but the main focus of this chapter is Apache Hadoop, an open-source software framework that can be deployed efficiently to process, store, and analyze large data sets and to produce meaningful insights from them. If the exponential growth of data is a processing challenge, Hadoop can be considered one of the effective solutions for processing, managing, analyzing, and storing this big data. Hadoop versions and components are also illustrated in a later section of the chapter. This chapter mainly focuses on the techniques, methodologies, and components adopted by the Apache Hadoop software framework for big data processing.

Leveraging Big Data Analytics Utilizing Hadoop Framework in Sports Science

Smart Computational Strategies: Theoretical and Practical Aspects, 2019, pp. 259-272. DOI: 10.1007/978-981-13-6295-8_22

Author(s): Gagandeep Jagdev, Sarabjeet Kaur

Keyword(s): Big Data, Data Analytics, Big Data Analytics, Sports Science, Hadoop Framework
