Analyzing and scripting Indian election strategies using big data via Apache Hadoop framework

Author(s):  
Gagandeep Jagdev ◽  
Amandeep Kaur
2018 ◽  
Vol 2018 ◽  
pp. 1-9
Author(s):  
Kyong-Ha Lee ◽  
Woo Lam Kang ◽  
Young-Kyoon Suh

Apache Hadoop has been a popular parallel processing tool in the era of big data. While practitioners have rewritten many conventional analysis algorithms to customize them for Hadoop, the inefficient I/O of Hadoop-based programs has been repeatedly reported in the literature. In this article, we address the problem of I/O inefficiency in Hadoop-based massive data analysis by introducing our efficient modification of Hadoop. We first incorporate a columnar data layout into the conventional Hadoop framework without modifying the Hadoop internals. We also provide Hadoop with an indexing capability that saves a large amount of I/O when processing not only selection predicates but also the star-join queries that are common in many analysis tasks.
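Why a columnar layout saves I/O on selection predicates can be illustrated with a toy, in-memory sketch in Python. It is not the authors' Hadoop modification, uses no Hadoop API, and does not model their indexing or star-join support; the three-field schema, the field names, and the threshold are hypothetical. It counts how many bytes a predicate on a single column must read under a row layout versus a columnar layout.

```python
import struct

# Hypothetical toy schema (not from the paper): order_id int64, amount float64, region 8-byte string.
ROW_FMT = "<qd8s"                      # little-endian, standard sizes, no padding
ROW_SIZE = struct.calcsize(ROW_FMT)    # 8 + 8 + 8 = 24 bytes per record

def to_row_layout(records):
    # Row layout: all fields of a record are stored together, record after record.
    return b"".join(struct.pack(ROW_FMT, *r) for r in records)

def to_column_layout(records):
    # Columnar layout: each field is stored in its own contiguous byte array.
    return {
        "order_id": b"".join(struct.pack("<q", r[0]) for r in records),
        "amount": b"".join(struct.pack("<d", r[1]) for r in records),
        "region": b"".join(struct.pack("8s", r[2]) for r in records),
    }

def count_row(buf, threshold):
    # A row scan reads every record in full, even though only `amount` is tested.
    matches, bytes_read = 0, 0
    for (_, amount, _) in struct.iter_unpack(ROW_FMT, buf):
        bytes_read += ROW_SIZE
        matches += amount > threshold
    return matches, bytes_read

def count_columnar(cols, threshold):
    # A columnar scan reads only the bytes of the `amount` column.
    col = cols["amount"]
    matches = sum(a > threshold for (a,) in struct.iter_unpack("<d", col))
    return matches, len(col)

records = [(i, float(i * 10), b"north") for i in range(1, 21)]
print(count_row(to_row_layout(records), 100.0))          # (10, 480)
print(count_columnar(to_column_layout(records), 100.0))  # (10, 160)
```

Both scans find the same 10 matching records, but the columnar scan touches one third of the bytes because only one of the three equally sized fields is read; the saving grows with the number of columns a query does not need.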


2020 ◽  
Vol 13 (4) ◽  
pp. 790-797
Author(s):  
Gurjit Singh Bhathal ◽  
Amardeep Singh Dhiman

Background: In the current internet landscape, large amounts of data are generated and processed. The Hadoop framework is widely used to store and process big data in a highly distributed manner. It is argued that the Hadoop framework is not yet mature enough to deal with current cyberattacks on data. Objective: The main objective of the proposed work is to provide a complete security approach, comprising authorisation and authentication for users and Hadoop cluster nodes, that secures data both at rest and in transit. Methods: The proposed algorithm uses the Kerberos network authentication protocol for authorisation and authentication, validating both users and cluster nodes. Ciphertext-Policy Attribute-Based Encryption (CP-ABE) is used for data at rest and data in transit. A user encrypts a file with their own set of attributes and stores it on the Hadoop Distributed File System (HDFS); only intended users with matching attributes can decrypt that file. Results: The proposed algorithm was implemented with datasets of different sizes, and the data was processed both with and without encryption. The results show little difference in processing time: performance was affected by 0.8% to 3.1%, a range that also includes the impact of other factors such as system configuration, the number of parallel jobs running, and the virtual environment. Conclusion: The available solutions for the big data security problems faced by the Hadoop framework are inefficient or incomplete. A complete security framework is proposed for the Hadoop environment, and it is experimentally shown to have little effect on system performance for datasets of different sizes.
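In CP-ABE generally, a ciphertext is bound to an access policy, commonly expressed as a tree of threshold gates over attributes, and decryption succeeds only when the attributes in a user's key satisfy that policy. The Python sketch below models only this policy-satisfaction check, not the pairing-based cryptography that actually enforces it, and is not the paper's implementation; the attribute names, the policy, and the helpers `AND`, `OR`, and `satisfies` are hypothetical.

```python
from dataclasses import dataclass
from typing import Union

# Models ONLY the access-policy logic of CP-ABE; no encryption or decryption happens here.

@dataclass(frozen=True)
class Attr:
    """Leaf of the access tree: a single required attribute."""
    name: str

@dataclass(frozen=True)
class Gate:
    """Threshold gate: satisfied when at least k of its children are satisfied."""
    k: int
    children: tuple

Policy = Union[Attr, Gate]

def AND(*children: Policy) -> Gate:
    return Gate(len(children), children)   # n-of-n threshold gate

def OR(*children: Policy) -> Gate:
    return Gate(1, children)               # 1-of-n threshold gate

def satisfies(policy: Policy, attributes: set) -> bool:
    """True if the user's attribute set satisfies the ciphertext's access policy."""
    if isinstance(policy, Attr):
        return policy.name in attributes
    return sum(satisfies(c, attributes) for c in policy.children) >= policy.k

# Hypothetical policy attached to an HDFS file at encryption time.
policy = AND(Attr("dept:finance"), OR(Attr("role:analyst"), Attr("role:auditor")))

print(satisfies(policy, {"dept:finance", "role:auditor"}))  # True
print(satisfies(policy, {"role:analyst"}))                  # False
```

Expressing AND and OR as threshold gates keeps the evaluator to a single rule, and the same structure extends to "k of n" policies without new code.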


Displays ◽  
2021 ◽  
Vol 70 ◽  
pp. 102061
Author(s):  
Amartya Hatua ◽  
Badri Narayan Subudhi ◽  
Veerakumar T. ◽  
Ashish Ghosh

2018 ◽  
Vol 11 (04) ◽  
Author(s):  
Rahul Kumar Chawda ◽  
Ghanshyam Thakur
Keyword(s):  
Big Data ◽  

2018 ◽  
Author(s):  
Hemerson Pontes ◽  
Gilvandro De Medeiros ◽  
Joanderson Borges ◽  
Helton Maia
Keyword(s):  
Big Data ◽  

In the context of Big Data, the large flow and complexity of the generated data demand a high computational cost for processing and information-extraction tasks, making it a challenge to complete such executions in time for technical or business decision-making. In computational clusters, however, data packets can be managed and distributed among different processing units, which makes it possible and feasible to work with a large volume of data by processing it in a parallel and distributed manner. Therefore, the present work sets out to build the infrastructure of a cluster and to study its operation, using the Apache Hadoop tool for distributed data processing.
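The split, process-in-parallel, and merge pattern described above can be sketched in Python on a single machine. This is an illustration under stated assumptions rather than the authors' cluster setup: threads stand in for cluster nodes, fixed-size list slices stand in for HDFS blocks, and the event log and the function names `split_into_blocks`, `process_block`, and `run_distributed` are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def split_into_blocks(records, block_size):
    # Stand-in for HDFS blocks: cut the input into fixed-size chunks.
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]

def process_block(block):
    # Work one processing unit does independently on its own chunk: a local tally.
    return Counter(block)

def run_distributed(records, block_size=4, workers=3):
    blocks = split_into_blocks(records, block_size)
    # Threads on one machine imitate separate nodes; a real cluster
    # would process each block on a different machine.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(process_block, blocks))
    total = Counter()
    for partial in partials:      # merge step: combine the per-block results
        total.update(partial)
    return total

events = ["login", "click", "click", "logout", "login",
          "click", "purchase", "click", "login"]
print(run_distributed(events))
```

Because of CPython's global interpreter lock, these threads show how the work is divided rather than delivering a real speed-up for CPU-bound tallies; Hadoop obtains actual parallelism by running tasks on separate machines close to the blocks they read.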


Author(s):  
Mohd Imran ◽  
Mohd Vasim Ahamad ◽  
Misbahul Haque ◽  
Mohd Shoaib

The term big data analytics refers to mining and analyzing voluminous amounts of data using various tools and platforms. Popular tools include Apache Hadoop, Apache Spark, HBase, Storm, GridGain, HPCC, Cassandra, Pig, Hive, and NoSQL databases. Which tool is used depends on the parameters of the analysis, so a comparative analysis of these tools is needed to choose the best and simplest approach, one that yields higher throughput and more efficient mining. This chapter contributes a comparative study of big data analytics tools based on aspects such as their functionality, pros, and cons, characteristics that can be used to determine the best and most efficient among them. The comparative study helps practitioners use such tools more efficiently.


2022 ◽  
pp. 622-631
Author(s):  
Mohd Imran ◽  
Mohd Vasim Ahamad ◽  
Misbahul Haque ◽  
Mohd Shoaib


Author(s):  
Poonam Nandal ◽  
Deepa Bura ◽  
Meeta Singh

In today's world, where data is accumulating at an ever-increasing rate, processing this big data is a necessity rather than an option. This requires tools for processing and analyzing the data so that meaningful results can be obtained from it. Many tools for processing big data are available on the market, but the main focus of this chapter is Apache Hadoop, an open-source software framework that can be deployed efficiently for processing, storing, and analyzing large sets of data and for producing meaningful insights from them. If the exponential growth of data is a processing challenge, Hadoop can be considered one of the effective solutions for processing, managing, analyzing, and storing this big data. Hadoop versions and components are also described in a later section of the chapter. This chapter focuses mainly on the techniques, components, and methodologies adopted by the Apache Hadoop software framework for big data processing.
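Hadoop Streaming lets any executable that reads standard input and writes standard output act as a mapper or reducer, with tab-separated key/value lines by default. The following is a minimal word-count sketch in Python under that convention; it is an illustrative example, not taken from the chapter. The file name `wordcount.py` and its `map`/`reduce` argument are hypothetical, and `simulate_job` only imitates, on one machine, the sort the framework performs between the map and reduce phases.

```python
import sys
from itertools import groupby

def mapper(lines):
    # Map phase: emit one "word<TAB>1" line per word (tab is the default Streaming separator).
    for line in lines:
        for word in line.strip().lower().split():
            yield f"{word}\t1"

def reducer(sorted_lines):
    # Reduce phase: the framework delivers lines sorted by key, so equal words are adjacent.
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"

def simulate_job(lines):
    # Single-machine stand-in for map -> shuffle/sort -> reduce.
    return list(reducer(sorted(mapper(lines))))

if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical CLI: "wordcount.py map" or "wordcount.py reduce", stdin -> stdout.
    step = mapper if sys.argv[1] == "map" else reducer
    for out in step(sys.stdin):
        print(out)
```

On a cluster, the same script would be supplied to the Hadoop Streaming JAR through its `-mapper` and `-reducer` options, with HDFS paths given via `-input` and `-output`; the JAR's location and the way the script is shipped to the nodes depend on the Hadoop version and installation.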

