Hap: Protecting the Apache Hadoop Clusters with Hadoop Authentication Process Using Kerberos

Author(s):
V Valliyappan
Parminder Singh

The corporate world is intensively thinking about how to process vast datasets efficiently and securely. Apache Hadoop, an open-source framework, serves this requirement. Most current Hadoop deployments use Kerberos authentication to add security to Hadoop, which suffers from numerous security and performance issues: reliance on authentication credentials, a single point of failure that is also a single point of vulnerability, the insider threat, and the time-synchronization problem. This paper provides a comprehensive review of authentication issues in Kerberos-enabled Hadoop clusters and proposes an authentication framework for Hadoop based on an enhanced one-time password (OTP) that addresses the identified problems. Simulation results in Riverbed Modeler show that the proposed model performs as well as the traditional Kerberos authentication mechanism. A comparative analysis with existing mechanisms is also presented to strengthen the claims of the proposed method.
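As context for the Kerberos baseline the paper evaluates against, here is a minimal sketch of how a Java client typically authenticates to a Kerberized Hadoop cluster through Hadoop's UserGroupInformation API; the principal and keytab path are illustrative placeholders, not values from the paper:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class KerberizedHdfsClient {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Switch Hadoop from "simple" (trust-the-client) auth to Kerberos.
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);

        // Log in from a keytab; principal and path are hypothetical.
        UserGroupInformation.loginUserFromKeytab(
                "hdfsuser@EXAMPLE.COM", "/etc/security/keytabs/hdfsuser.keytab");

        // Subsequent HDFS calls run under the Kerberos-authenticated identity.
        FileSystem fs = FileSystem.get(conf);
        System.out.println(fs.exists(new Path("/tmp")));
    }
}
```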


2019, Vol 8 (2), pp. 1252-1256

Big data is highly practical for real-time application systems, and one of the most widespread real-time applications worldwide operates on unstructured documents. Large numbers of documents are managed and maintained through Hadoop, a popular, leading big-data platform that stores all information as blocks in the Hadoop Distributed File System (HDFS). Irrespective of data size, big data has opened a path to storing and analyzing data, but doing so is time-consuming; to overcome this, Hadoop provides cluster processing for computations over large volumes of unstructured data. Three cluster architectures are considered: standalone, single-node cluster, and multi-node cluster. In this paper, the cluster architectures through which Hadoop boosts processing speed over large datasets are studied and analyzed using text documents from the newsgroup20 dataset. The paper also identifies the challenges of text mining and its applications using Apache Hadoop.
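To make the cluster-processing step concrete, here is a minimal sketch of the canonical MapReduce word-count job one might run over the newsgroup20 text documents on any of the three cluster architectures; the class names and input/output paths are assumptions, not from the paper:

```java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class NewsgroupWordCount {
    // Emits (token, 1) for every whitespace-separated token in a document line.
    public static class TokenMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();
        public void map(Object key, Text value, Context ctx)
                throws IOException, InterruptedException {
            StringTokenizer it = new StringTokenizer(value.toString());
            while (it.hasMoreTokens()) { word.set(it.nextToken()); ctx.write(word, ONE); }
        }
    }
    // Sums the counts for each token across the cluster.
    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterable<IntWritable> vals, Context ctx)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : vals) sum += v.get();
            ctx.write(key, new IntWritable(sum));
        }
    }
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "newsgroup20 word count");
        job.setJarByClass(NewsgroupWordCount.class);
        job.setMapperClass(TokenMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS dir of newsgroup20 docs
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output dir
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```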


Author(s):  
Igor Fabio Steinmacher
Igor S Wiese
Andre Luis Schwerz
Rafael Liberato Roberto
João Eduardo Ferreira
...

Developers in distributed open-source software projects use issue-tracking tools to coordinate their work. These tools store important information, keeping records of key decisions and bug fixes. Deciding which issues are the most suitable to contribute to can be difficult, since the large amount of data increases the pressure on developers. This paper shows the importance of the content of the discussions that take place in an open-source project's issue tracker for building a classifier that predicts a contributor's participation in the solution of an issue. To design this prediction model, we used two machine-learning algorithms: Naïve Bayes and J48. We used data from the Apache Hadoop Commons project to evaluate the algorithms. Applying the machine-learning algorithms to the ten most active developers in the project, we obtained an average recall of 66.82% for Naïve Bayes and 53.02% using J48. We obtained 64.31% precision and 90.27% accuracy using J48. We also conducted an exploratory study with five developers who participated in the solution of a smaller volume of issues, obtaining 77.41% precision, 48% recall, and 98.84% accuracy using the J48 algorithm. The results indicate that the content of comments on issues in open-source projects is a relevant factor on which to base issue recommendations for the developers collaborating on the project.
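The paper does not publish its code, but Naïve Bayes and J48 are the classifiers shipped with Weka, so a plausible sketch of the evaluation loop looks like the following; the ARFF file name, class-attribute position, and 10-fold cross-validation setup are assumptions:

```java
import java.util.Random;
import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class IssueParticipationClassifier {
    public static void main(String[] args) throws Exception {
        // Hypothetical ARFF of issue-comment features, labeled by whether
        // the developer participated in the issue's solution.
        Instances data = DataSource.read("hadoop-commons-issues.arff");
        data.setClassIndex(data.numAttributes() - 1);

        for (Classifier c : new Classifier[] { new NaiveBayes(), new J48() }) {
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(c, data, 10, new Random(1)); // assumed 10-fold CV
            System.out.printf("%s: precision=%.4f recall=%.4f accuracy=%.2f%%%n",
                    c.getClass().getSimpleName(),
                    eval.precision(0), eval.recall(0), eval.pctCorrect());
        }
    }
}
```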


2021, Vol 12 (2), pp. 107-112
Author(s):
I. E. Kharlampenkov
A. U. Oshchepkov

The article presents methods for caching and displaying data from spectral satellite images using libraries of distributed computing systems from the Apache Hadoop ecosystem, together with GeoServer extensions. The authors give a brief overview of existing tools that present remote sensing data using distributed information technologies. A distinctive feature is the way remote sensing data is converted inside Apache Parquet files for further display. This approach allows interaction with the distributed file system via the Kite SDK libraries and makes it possible to plug in additional data processors based on Apache Hadoop technology as external services. A comparative analysis with existing tools such as GeoMesa and GeoWave is performed. The following steps are described: extracting data from Apache Parquet via the Kite SDK, converting the data to a GDAL Dataset, iterating over the received data, and saving it to the file system in BIL format, which is used here for the GeoServer cache. The extension was implemented and published under the Apache License on GitHub. The article concludes with instructions for installing and using the extension.
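To illustrate the first step the article describes (extracting records from Apache Parquet via the Kite SDK, before the GDAL conversion), here is a minimal sketch; the dataset URI and record field name are hypothetical placeholders:

```java
import org.apache.avro.generic.GenericRecord;
import org.kitesdk.data.Dataset;
import org.kitesdk.data.DatasetReader;
import org.kitesdk.data.Datasets;

public class ParquetTileReader {
    public static void main(String[] args) {
        // Hypothetical URI; "dataset:" URIs are how the Kite SDK
        // addresses HDFS-backed datasets.
        Dataset<GenericRecord> tiles =
                Datasets.load("dataset:hdfs:/data/satellite/tiles", GenericRecord.class);

        try (DatasetReader<GenericRecord> reader = tiles.newReader()) {
            for (GenericRecord record : reader) {
                // Each record would then be converted to a GDAL Dataset and
                // saved in BIL format for the GeoServer cache (not shown here).
                System.out.println(record.get("tile_id")); // assumed field name
            }
        }
    }
}
```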


Author(s):  
Dr. C. K. Gomathy

Abstract: Apache Sqoop is mainly used to transfer large volumes of data efficiently between Apache Hadoop and relational databases. It helps offload certain tasks, such as ETL (extract, transform, load) processing, from an enterprise data warehouse to Hadoop for efficient execution at much lower cost. Here we first import a table residing in a MySQL database with the help of Sqoop, a command-line interface application. Since new rows may later be added or existing rows updated, the import query would otherwise have to be executed again; with our project there is no need to re-execute queries manually, because we use a Sqoop job, which stores the complete import command. After the import, we retrieve the data from Hive using Java JDBC and convert it to JSON format, an organized and easily accessible representation, using the GSON library.
Keywords: Sqoop, JSON, Gson, Maven, JDBC
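A minimal sketch of the retrieval step described above: reading rows from Hive over JDBC and serializing them to JSON with Gson. The HiveServer2 connection URL, credentials, and table name are hypothetical placeholders:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.Statement;
import com.google.gson.GsonBuilder;
import com.google.gson.JsonArray;
import com.google.gson.JsonObject;

public class HiveToJson {
    public static void main(String[] args) throws Exception {
        // HiveServer2 JDBC URL and table are illustrative placeholders.
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:hive2://localhost:10000/default", "hive", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT * FROM employees")) {

            ResultSetMetaData meta = rs.getMetaData();
            JsonArray rows = new JsonArray();
            while (rs.next()) {
                // Map each column label to its string value for this row.
                JsonObject row = new JsonObject();
                for (int i = 1; i <= meta.getColumnCount(); i++) {
                    row.addProperty(meta.getColumnLabel(i), rs.getString(i));
                }
                rows.add(row);
            }
            // Pretty-print the table contents as a JSON array.
            System.out.println(new GsonBuilder().setPrettyPrinting()
                    .create().toJson(rows));
        }
    }
}
```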

