Hadoop Distributed File System
Recently Published Documents


TOTAL DOCUMENTS

211
(FIVE YEARS 96)

H-INDEX

12
(FIVE YEARS 2)

2022 ◽  
Vol 16 (1) ◽  
pp. 0-0

Anomaly detection is a crucial step in building a secure and trustworthy system, and analyzing and detecting failures and anomalies manually is daunting. In this paper, we propose an approach that leverages the pattern-matching capabilities of a Convolutional Neural Network (CNN) for anomaly detection in system logs. Features are extracted from log files using a windowing technique, and from these features a one-dimensional image (1×n) is generated whose pixel values correspond to the log features. A 1D convolution operation followed by max pooling is applied to these images, and a multi-layer feed-forward neural network then acts as a classifier, learning from the representation created by the convolution layers to label logs as normal or abnormal. The model thus learns the variation in log patterns between normal and abnormal behavior. The proposed approach achieves improved accuracy compared to existing approaches for anomaly detection in Hadoop Distributed File System (HDFS) logs.
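A minimal sketch of the kind of 1D-CNN log classifier the abstract describes, assuming feature vectors of a fixed length extracted by a sliding window over parsed logs; the layer sizes and feature length are illustrative assumptions, not the authors' exact architecture.

```python
# Illustrative 1D-CNN log-anomaly classifier (sizes are assumptions, not the paper's).
from tensorflow.keras import layers, models

N_FEATURES = 50          # length of the windowed feature vector per log sequence (assumed)

def build_model(n_features: int = N_FEATURES) -> models.Model:
    model = models.Sequential([
        layers.Input(shape=(n_features, 1)),        # 1 x n "image" of log features
        layers.Conv1D(32, kernel_size=3, activation="relu"),
        layers.MaxPooling1D(pool_size=2),           # max pooling after the convolution
        layers.Flatten(),
        layers.Dense(64, activation="relu"),        # feed-forward classifier head
        layers.Dense(1, activation="sigmoid"),      # normal (0) vs. abnormal (1)
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

# X: (num_windows, N_FEATURES, 1) feature "images", y: 0/1 labels
# model = build_model(); model.fit(X, y, epochs=10, validation_split=0.2)
```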


Author(s):  
Pinjari Vali Basha

With the rapid transformation of technology, a huge amount of data (structured and unstructured) is generated every day. With the aid of 5G technology and IoT, the data generated and processed daily is very large, approximately 2.5 quintillion bytes. This big data is stored and processed with the help of the Hadoop framework, which has two core components for storing and retrieving data: the Hadoop Distributed File System (HDFS) and the MapReduce algorithm. The native Hadoop framework has limitations in MapReduce: if the same job is submitted again, all of its steps must be repeated and we must wait for the results, wasting time and resources. Improving the capabilities of the NameNode by maintaining a Common Job Block Table (CJBT) there improves performance, at the cost of maintaining the table. The CJBT contains the metadata of files that are processed repeatedly, which avoids recomputation, reduces the number of computations, saves resources, and speeds up processing. Since the size of the CJBT keeps increasing, its size should be bounded by an algorithm that keeps track of jobs; the optimal CJBT is derived by employing an optimal algorithm at the NameNode.
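The abstract describes the CJBT only at a high level; below is a hypothetical sketch of the idea as a NameNode-side cache that maps a job's signature (input files plus job logic) to previously computed results, with a bounded size. The class name, signature scheme, and LRU eviction policy are all assumptions for illustration.

```python
# Hypothetical Common Job Block Table (CJBT) sketch: cache repeated-job results
# so a resubmitted job can skip recomputation. Not the paper's implementation.
import hashlib
from collections import OrderedDict

class CommonJobBlockTable:
    def __init__(self, max_entries: int = 1024):
        self._table = OrderedDict()              # insertion-ordered for LRU eviction
        self._max = max_entries

    @staticmethod
    def job_signature(input_paths, job_class: str) -> str:
        key = job_class + "|" + "|".join(sorted(input_paths))
        return hashlib.sha256(key.encode()).hexdigest()

    def lookup(self, signature: str):
        """Return the cached output path for a repeated job, or None."""
        if signature in self._table:
            self._table.move_to_end(signature)   # refresh LRU position
            return self._table[signature]
        return None

    def record(self, signature: str, output_path: str):
        """Remember where a finished job stored its results; evict the oldest if full."""
        self._table[signature] = output_path
        self._table.move_to_end(signature)
        if len(self._table) > self._max:
            self._table.popitem(last=False)      # bound the table size
```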


2021 ◽  
Vol 24 ◽  
pp. 26-32
Author(s):  
Fredrick Ishengoma

Vaccine requirements are becoming mandatory in several countries as public health experts and governments grow more concerned about the COVID-19 pandemic and its variants. Meanwhile, as vaccine requirements grow, so does the counterfeiting of vaccination documents: fake vaccination certificates are increasingly being sold online and on the dark web. Due to the nature of the COVID-19 pandemic, there is a need for robust authentication mechanisms that support touch-less technologies such as Near Field Communication (NFC). Thus, in this paper, a blockchain- and NFC-based COVID-19 Digital Immunity Certificate (DIC) system is proposed. The vaccination data are first encrypted with the Advanced Encryption Standard (AES) algorithm on the Hadoop Distributed File System (HDFS) and then uploaded to the blockchain. The proposed system, based on the amalgamation of NFC and blockchain technologies, can mitigate the issue of fake vaccination certificates. Furthermore, emerging issues in employing the proposed system are discussed along with future directions.
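A minimal sketch of the AES step described, assuming a vaccination record is encrypted before being written to HDFS and anchored on the blockchain. It uses AES-GCM from the `cryptography` package; the record fields, key management, and the storage step are assumptions, not details from the paper.

```python
# Sketch: encrypt a vaccination record with AES-GCM before storage (illustrative only).
import json, os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encrypt_record(record: dict, key: bytes) -> bytes:
    """Serialize and encrypt a vaccination record; returns nonce || ciphertext."""
    aesgcm = AESGCM(key)
    nonce = os.urandom(12)                       # 96-bit nonce recommended for GCM
    plaintext = json.dumps(record).encode()
    return nonce + aesgcm.encrypt(nonce, plaintext, None)

key = AESGCM.generate_key(bit_length=256)        # assumed to be managed by the issuer
blob = encrypt_record(
    {"holder_id": "…", "vaccine": "…", "dose": 2, "date": "2021-11-30"},  # hypothetical fields
    key,
)
# `blob` would then be written to HDFS and its hash recorded on the blockchain.
```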


Author(s):  
Shubh Goyal

Abstract: In the Hadoop environment, data may be loaded into and searched from local data nodes. Because the dataset may be vast, loading and finding data with a query is often difficult. We suggest a method for handling data in local nodes so that it does not overlap with data already acquired by the script. The main purpose of the query is to store information in a distributed environment and retrieve it quickly. We define a script to eliminate duplicate data redundancy when searching and loading data dynamically, with the Hadoop file system available in the distributed environment. Keywords: HDFS; Hadoop Distributed File System; replica; local; distributed; capacity; SQL; redundancy
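The abstract does not give the script itself; the following is an illustrative sketch of one common way to avoid loading overlapping records, by hashing each record and skipping hashes already seen before the merged file is pushed to HDFS. This is an assumption about the general technique, not the authors' script.

```python
# Illustrative hash-based de-duplication before loading records into HDFS.
import hashlib

def dedup_lines(paths, seen=None):
    """Yield each distinct line across the input files exactly once."""
    seen = set() if seen is None else seen
    for path in paths:
        with open(path, encoding="utf-8") as fh:
            for line in fh:
                digest = hashlib.md5(line.encode()).hexdigest()
                if digest not in seen:           # skip data already loaded
                    seen.add(digest)
                    yield line

# with open("merged_for_hdfs.txt", "w", encoding="utf-8") as out:
#     out.writelines(dedup_lines(["part1.txt", "part2.txt"]))   # hypothetical inputs
```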


Author(s):  
Dr. K. B. V. Brahma Rao ◽  
◽  
Dr. R Krishnam Raju Indukuri ◽  
Dr. Suresh Varma Penumatsa ◽  
Dr. M. V. Rama Sundari ◽  
...  

The objective of comparing various dimensionality reduction techniques is to reduce feature sets so that attributes can be grouped effectively with less computational processing time and memory utilization. Such reduction algorithms can decrease the dimensionality of a dataset consisting of a huge number of interrelated variables while retaining as much of the variation present in the dataset as possible. In this paper we apply the Standard Deviation, Variance, Principal Component Analysis, Linear Discriminant Analysis, Factor Analysis, Positive Region, Information Entropy and Independent Component Analysis reduction algorithms, using the Hadoop Distributed File System, to massive patient datasets in order to achieve lossless data reduction and acquire the required knowledge. The experimental results demonstrate that the ICA technique can efficiently operate on massive datasets, eliminating irrelevant data without loss of accuracy and reducing both storage space and computation time compared to the other techniques.
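A short sketch of the ICA reduction step using scikit-learn's FastICA, with a PCA baseline for comparison; the stand-in data matrix, its dimensions, and the number of components are placeholders, not the paper's patient dataset or settings.

```python
# Sketch: reduce a high-dimensional feature matrix with ICA (and PCA for comparison).
import numpy as np
from sklearn.decomposition import FastICA, PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 60))            # stand-in for a patient-feature matrix

ica = FastICA(n_components=10, random_state=0)
X_ica = ica.fit_transform(X)                 # 60 interrelated features -> 10 components

pca = PCA(n_components=10, random_state=0)
X_pca = pca.fit_transform(X)                 # PCA baseline on the same data

print(X_ica.shape, X_pca.shape)              # (10000, 10) (10000, 10)
```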


Author(s):  
Viraaji Mothukuri ◽  
Sai S. Cheerla ◽  
Reza M. Parizi ◽  
Qi Zhang ◽  
Kim-Kwang Raymond Choo

Author(s):  
Anisha P Rodrigues ◽  
Roshan Fernandes ◽  
P. Vijaya ◽  
Satish Chander

Hadoop Distributed File System (HDFS) is developed to efficiently store and handle vast quantities of files in a distributed environment over a cluster of computers. The Hadoop cluster is formed from commodity hardware, which is inexpensive and easily available. Storing a large number of small files in HDFS consumes more memory and degrades performance, because small files place a heavy load on the NameNode. Thus, the efficiency of indexing and accessing small files on HDFS is improved by several techniques, such as archive files, New Hadoop Archive (New HAR), CombineFileInputFormat (CFIF), and sequence file generation. The archive file combines the small files into single blocks; the New HAR file combines the smaller files into a single large file; the CFIF module merges multiple files into a single split using the NameNode; and the sequence file combines all the small files into a single sequence. Indexing and accessing of small files in HDFS are evaluated using performance metrics such as processing time and memory usage. The experiments show that the sequence file generation approach is efficient compared to the other approaches, with a file access time of 1.5 s, memory usage of 20 KB in the multi-node setup, and a processing time of 0.1 s.
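A conceptual sketch of the packing idea behind HAR/sequence files: many small files are appended into one large container and located afterwards through an index, so the NameNode only tracks the container. This is a generic illustration in plain Python, not the actual Hadoop SequenceFile or HAR format.

```python
# Conceptual small-files packing: one container file plus an offset index.
import os

def pack(small_files, container_path):
    """Append each small file into one container; return {name: (offset, length)}."""
    index = {}
    with open(container_path, "wb") as out:
        for path in small_files:
            with open(path, "rb") as f:
                data = f.read()
            index[os.path.basename(path)] = (out.tell(), len(data))
            out.write(data)
    return index

def read_one(container_path, index, name):
    """Random access to one packed file via the index, without scanning the rest."""
    offset, length = index[name]
    with open(container_path, "rb") as f:
        f.seek(offset)
        return f.read(length)
```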


Author(s):  
Karwan Jameel Merceedi ◽  
Nareen Abdulla Sabry

In recent years, data and internet usage have grown rapidly, giving rise to big data. To address these problems, many software frameworks are used to increase the performance of distributed systems and to provide ample data storage. One of the most beneficial frameworks for utilizing data in distributed systems is Hadoop, which clusters machines and coordinates the work between them. Hadoop consists of two major components: the Hadoop Distributed File System (HDFS) and MapReduce (MR). With Hadoop we can, for example, process a large file in a distributed fashion and count how many times each word occurs in it. HDFS is designed to effectively store and stream colossal data sets to high-bandwidth user applications, and its differences from other file systems are significant: it targets low-cost hardware and is exceptionally fault-tolerant. In a vast cluster, thousands of computers host both directly attached storage and user programs, and the system scales with demand while remaining cost-effective at every size by distributing storage and computation across numerous servers. Building on these characteristics of HDFS, many researchers have worked to enhance the performance and efficiency of the file system, making it one of the most active areas in cloud systems. This paper reviews the essential investigations in this trend for researchers wishing to work on such systems. The basic ideas and features of the investigated experiments were taken into account to provide a robust comparison that simplifies selection for future researchers in this subject. The paper explains what Hadoop is, its architecture, how it works, and its performance in distributed systems, and it assesses each reviewed work and compares them with one another.
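As a concrete illustration of the word-count example the survey mentions, here is a minimal Hadoop Streaming mapper and reducer in Python (two separate scripts shown in one listing); the input/output paths and the streaming jar location are placeholders.

```python
# --- mapper.py ---------------------------------------------------------------
# Emits "<word>\t1" for every word on stdin; Hadoop Streaming feeds file splits here.
import sys

for line in sys.stdin:
    for word in line.split():
        print(f"{word}\t1")

# --- reducer.py --------------------------------------------------------------
# Sums the counts per word; Hadoop Streaming delivers lines sorted by key.
import sys

current, total = None, 0
for line in sys.stdin:
    word, count = line.rstrip("\n").split("\t", 1)
    if word != current:
        if current is not None:
            print(f"{current}\t{total}")
        current, total = word, 0
    total += int(count)
if current is not None:
    print(f"{current}\t{total}")
```

A typical invocation (paths are placeholders) would pass both scripts to the hadoop-streaming jar with `-mapper mapper.py -reducer reducer.py -input /data/in -output /data/out`.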


2021 ◽  
Vol 17 (2) ◽  
pp. 1-29
Author(s):  
Xiaolu Li ◽  
Zuoru Yang ◽  
Jinhong Li ◽  
Runhui Li ◽  
Patrick P. C. Lee ◽  
...  

We propose repair pipelining, a technique that speeds up the repair performance in general erasure-coded storage. By carefully scheduling the repair of failed data in small-size units across storage nodes in a pipelined manner, repair pipelining reduces the single-block repair time to approximately the same as the normal read time for a single block in homogeneous environments. We further design different extensions of repair pipelining algorithms for heterogeneous environments and multi-block repair operations. We implement a repair pipelining prototype, called ECPipe, and integrate it as a middleware system into two versions of Hadoop Distributed File System (HDFS) (namely, HDFS-RAID and HDFS-3) as well as Quantcast File System. Experiments on a local testbed and Amazon EC2 show that repair pipelining significantly improves the performance of degraded reads and full-node recovery over existing repair techniques.
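The following toy model illustrates why pipelining helps: conventional (star) repair pulls k full blocks through the requestor's single ingress link, whereas pipelining the repair in small slices along a chain of helpers makes the repair time approach one block's worth of transfer. The unit-bandwidth cost model and slice count are simplifying assumptions, not measurements from the ECPipe prototype.

```python
# Toy cost model: repair time in "slice transfers" over unit-bandwidth links
# for a single failed block in a k-of-n erasure code split into `slices` pieces.
def conventional_repair_time(k: int, slices: int) -> int:
    # The requestor downloads k full blocks through one ingress link.
    return k * slices

def pipelined_repair_time(k: int, slices: int) -> int:
    # Helpers form a chain and forward partially combined slices; after the
    # pipeline fills (k - 1 steps), one repaired slice arrives per step.
    return slices + (k - 1)

for k in (4, 6, 12):
    s = 32                                   # block split into 32 slices (assumed)
    print(f"k={k}: star={conventional_repair_time(k, s)}  pipelined={pipelined_repair_time(k, s)}")
```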


TEM Journal ◽  
2021 ◽  
pp. 806-814
Author(s):  
Yordan Kalmukov ◽  
Milko Marinov ◽  
Tsvetelina Mladenova ◽  
Irena Valova

In the age of big data, the amount of data that people generate and use on a daily basis has far exceeded the storage and processing capabilities of a single computer system. That motivates the use of distributed big data storage and processing systems such as Hadoop, which provides a reliable, horizontally scalable, fault-tolerant and efficient service based on the Hadoop Distributed File System (HDFS) and MapReduce. The purpose of this research is to experimentally determine whether (and to what extent) the network communication speed, the file replication factor, the files' sizes and their number, and the location of the HDFS client influence the performance of the HDFS read/write operations.
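A sketch of the kind of measurement such an experiment involves, using the `hdfs` (WebHDFS) Python client to time writes and reads while varying the replication factor. The NameNode URL, user, test path, and file size are placeholders, and a real study would also vary file count, sizes, and client location as the abstract describes.

```python
# Sketch: time HDFS write/read for several replication factors via WebHDFS.
import time
from hdfs import InsecureClient

client = InsecureClient("http://namenode:9870", user="hdfs")   # assumed endpoint
payload = b"x" * (64 * 1024 * 1024)                            # 64 MB test payload

for replication in (1, 2, 3):
    path = f"/bench/test_r{replication}.bin"                   # hypothetical path
    t0 = time.time()
    client.write(path, data=payload, overwrite=True, replication=replication)
    write_s = time.time() - t0

    t0 = time.time()
    with client.read(path) as reader:
        reader.read()
    read_s = time.time() - t0
    print(f"replication={replication} write={write_s:.2f}s read={read_s:.2f}s")
```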

