Hap: Protecting the Apache Hadoop Clusters with Hadoop Authentication Process Using Kerberos

Author(s):
V Valliyappan
Parminder Singh

The corporate world is intensively thinking about how to process vast datasets efficiently and securely. Apache Hadoop, an open-source framework, serves this requirement. Most current Hadoop deployments use Kerberos authentication to add security to Hadoop, which suffers from numerous security and performance issues: reliance on authentication credentials, a single point of failure that is also a single point of vulnerability, the insider threat, and the time-synchronization problem. This paper provides a comprehensive review of authentication issues in Kerberos-enabled Hadoop clusters and proposes an authentication framework for Hadoop based on an enhanced one-time password (OTP) that addresses the identified problems. Simulation results in Riverbed Modeler show that the proposed model performs as well as the traditional Kerberos authentication mechanism. A comparative analysis with existing mechanisms is also presented to strengthen the claims of the proposed method.
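As context for the Kerberos baseline the paper evaluates against, here is a minimal sketch of how a Java client typically authenticates to a Kerberized Hadoop cluster through Hadoop's UserGroupInformation API; the principal and keytab path are illustrative placeholders, not values from the paper:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class KerberizedHdfsClient {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Switch Hadoop from "simple" (trust-the-client) auth to Kerberos.
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);

        // Log in from a keytab; principal and path are hypothetical.
        UserGroupInformation.loginUserFromKeytab(
                "hdfsuser@EXAMPLE.COM", "/etc/security/keytabs/hdfsuser.keytab");

        // Subsequent HDFS calls run under the Kerberos-authenticated identity.
        FileSystem fs = FileSystem.get(conf);
        System.out.println(fs.exists(new Path("/tmp")));
    }
}
```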


2019, Vol 8 (2), pp. 1252-1256

Big data is highly practical for real-time application systems, and one of the most widespread real-time applications worldwide operates on unstructured documents. Large numbers of documents are managed and maintained through Hadoop, a popular, leading big-data platform that stores all information as blocks in the Hadoop Distributed File System (HDFS). Irrespective of data size, big data has opened a path to storing and analyzing data, but doing so is time-consuming; to overcome this, Hadoop provides cluster processing for computations over large volumes of unstructured data. Three cluster architectures are considered: standalone, single-node cluster, and multi-node cluster. In this paper, the cluster architectures through which Hadoop boosts processing speed over large datasets are studied and analyzed using text documents from the newsgroup20 dataset. The paper also identifies the challenges of text mining and its applications using Apache Hadoop.
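To make the cluster-processing step concrete, here is a minimal sketch of the canonical MapReduce word-count job one might run over the newsgroup20 text documents on any of the three cluster architectures; the class names and input/output paths are assumptions, not from the paper:

```java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class NewsgroupWordCount {
    // Emits (token, 1) for every whitespace-separated token in a document line.
    public static class TokenMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();
        public void map(Object key, Text value, Context ctx)
                throws IOException, InterruptedException {
            StringTokenizer it = new StringTokenizer(value.toString());
            while (it.hasMoreTokens()) { word.set(it.nextToken()); ctx.write(word, ONE); }
        }
    }
    // Sums the counts for each token across the cluster.
    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterable<IntWritable> vals, Context ctx)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : vals) sum += v.get();
            ctx.write(key, new IntWritable(sum));
        }
    }
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "newsgroup20 word count");
        job.setJarByClass(NewsgroupWordCount.class);
        job.setMapperClass(TokenMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS dir of newsgroup20 docs
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output dir
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```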


Author(s):  
Igor Fabio Steinmacher
Igor S Wiese
Andre Luis Schwerz
Rafael Liberato Roberto
João Eduardo Ferreira
...

Developers in distributed open-source software projects use issue-tracking tools to coordinate their work. These tools store important information, keeping records of key decisions and bug fixes. Deciding which issues are the most suitable to contribute to can be difficult, since the large amount of data increases the pressure on developers. This paper shows the importance of the content of the discussions that take place in an open-source project's issue tracker for building a classifier that predicts a contributor's participation in the solution of an issue. To design this prediction model, we used two machine-learning algorithms: Naïve Bayes and J48. We used data from the Apache Hadoop Commons project to evaluate the algorithms. Applying the machine-learning algorithms to the ten most active developers in the project, we obtained an average recall of 66.82% for Naïve Bayes and 53.02% using J48. We obtained 64.31% precision and 90.27% accuracy using J48. We also conducted an exploratory study with five developers who participated in the solution of a smaller volume of issues, obtaining 77.41% precision, 48% recall, and 98.84% accuracy using the J48 algorithm. The results indicate that the content of comments on issues in open-source projects is a relevant factor on which to base issue recommendations for the developers collaborating on the project.
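The paper does not publish its code, but Naïve Bayes and J48 are the classifiers shipped with Weka, so a plausible sketch of the evaluation loop looks like the following; the ARFF file name, class-attribute position, and 10-fold cross-validation setup are assumptions:

```java
import java.util.Random;
import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class IssueParticipationClassifier {
    public static void main(String[] args) throws Exception {
        // Hypothetical ARFF of issue-comment features, labeled by whether
        // the developer participated in the issue's solution.
        Instances data = DataSource.read("hadoop-commons-issues.arff");
        data.setClassIndex(data.numAttributes() - 1);

        for (Classifier c : new Classifier[] { new NaiveBayes(), new J48() }) {
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(c, data, 10, new Random(1)); // assumed 10-fold CV
            System.out.printf("%s: precision=%.4f recall=%.4f accuracy=%.2f%%%n",
                    c.getClass().getSimpleName(),
                    eval.precision(0), eval.recall(0), eval.pctCorrect());
        }
    }
}
```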


2021, Vol 12 (2), pp. 107-112
Author(s):
I. E. Kharlampenkov
A. U. Oshchepkov

The article presents methods for caching and displaying data from spectral satellite images using libraries of distributed computing systems from the Apache Hadoop ecosystem, together with GeoServer extensions. The authors give a brief overview of existing tools that present remote sensing data using distributed information technologies. A distinctive feature is the way remote sensing data is converted inside Apache Parquet files for further display. This approach allows interaction with the distributed file system via the Kite SDK libraries and makes it possible to plug in additional data processors based on Apache Hadoop technology as external services. A comparative analysis with existing tools such as GeoMesa and GeoWave is performed. The following steps are described: extracting data from Apache Parquet via the Kite SDK, converting the data to a GDAL Dataset, iterating over the received data, and saving it to the file system in BIL format, which is used here for the GeoServer cache. The extension was implemented and published under the Apache License on GitHub. The article concludes with instructions for installing and using the extension.
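To illustrate the first step the article describes (extracting records from Apache Parquet via the Kite SDK, before the GDAL conversion), here is a minimal sketch; the dataset URI and record field name are hypothetical placeholders:

```java
import org.apache.avro.generic.GenericRecord;
import org.kitesdk.data.Dataset;
import org.kitesdk.data.DatasetReader;
import org.kitesdk.data.Datasets;

public class ParquetTileReader {
    public static void main(String[] args) {
        // Hypothetical URI; "dataset:" URIs are how the Kite SDK
        // addresses HDFS-backed datasets.
        Dataset<GenericRecord> tiles =
                Datasets.load("dataset:hdfs:/data/satellite/tiles", GenericRecord.class);

        try (DatasetReader<GenericRecord> reader = tiles.newReader()) {
            for (GenericRecord record : reader) {
                // Each record would then be converted to a GDAL Dataset and
                // saved in BIL format for the GeoServer cache (not shown here).
                System.out.println(record.get("tile_id")); // assumed field name
            }
        }
    }
}
```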


Author(s):  
Dr. C. K. Gomathy

Abstract: Apache Sqoop is mainly used to transfer large volumes of data efficiently between Apache Hadoop and relational databases. It helps offload certain tasks, such as ETL (extract, transform, load) processing, from an enterprise data warehouse to Hadoop for efficient execution at much lower cost. Here we first import a table residing in a MySQL database with the help of Sqoop, a command-line interface application. Since new rows may later be added or existing rows updated, the import query would otherwise have to be executed again; with our project there is no need to re-execute queries manually, because we use a Sqoop job, which stores the complete import command. After the import, we retrieve the data from Hive using Java JDBC and convert it to JSON format, an organized and easily accessible representation, using the GSON library.
Keywords: Sqoop, JSON, Gson, Maven, JDBC
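A minimal sketch of the retrieval step described above: reading rows from Hive over JDBC and serializing them to JSON with Gson. The HiveServer2 connection URL, credentials, and table name are hypothetical placeholders:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.Statement;
import com.google.gson.GsonBuilder;
import com.google.gson.JsonArray;
import com.google.gson.JsonObject;

public class HiveToJson {
    public static void main(String[] args) throws Exception {
        // HiveServer2 JDBC URL and table are illustrative placeholders.
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:hive2://localhost:10000/default", "hive", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT * FROM employees")) {

            ResultSetMetaData meta = rs.getMetaData();
            JsonArray rows = new JsonArray();
            while (rs.next()) {
                // Map each column label to its string value for this row.
                JsonObject row = new JsonObject();
                for (int i = 1; i <= meta.getColumnCount(); i++) {
                    row.addProperty(meta.getColumnLabel(i), rs.getString(i));
                }
                rows.add(row);
            }
            // Pretty-print the table contents as a JSON array.
            System.out.println(new GsonBuilder().setPrettyPrinting()
                    .create().toJson(rows));
        }
    }
}
```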

