A Novel Approach of Fair Scheduling to Enhance Performance of Hadoop Distributed File System

Author(s): Rubayet Hussain ◽ Mostafijur Rahman ◽ Khawja Imran Masud ◽ Sheikh Md Roky ◽ Md. Nasim Akhtar ◽ ...
2020
Author(s): Bo Zhang ◽ Hongyu Zhang ◽ Pablo Moscato

Complex software-intensive systems, especially distributed systems, generate logs for troubleshooting. Logs are text messages recording system events, which can help engineers determine the system's runtime status. This paper proposes a novel approach named ADR (Anomaly Detection by workflow Relations) that employs the matrix nullspace to mine numerical relations from log data. The mined relations can be used for both offline and online anomaly detection and facilitate fault diagnosis. We have evaluated ADR on log data collected from two distributed systems: HDFS (Hadoop Distributed File System) and BGL (the IBM Blue Gene/L supercomputer system). ADR successfully mined 87 and 669 numerical relations from these logs, respectively, and used them to detect anomalies with high precision and recall. For online anomaly detection, ADR employs PSO (Particle Swarm Optimization) to find the optimal sliding-window size and achieve fast anomaly detection. The experimental results confirm that ADR is effective for both offline and online anomaly detection.
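The abstract does not spell out how nullspace mining works, but the core idea can be illustrated with a minimal sketch: represent each log session as a vector of event-type counts, stack normal sessions into a matrix, and treat the nullspace of that matrix as the set of linear relations every normal session satisfies (e.g. "count(open) equals count(close)"). The toy matrix X, the event types, and the tolerance tol below are illustrative assumptions, not the paper's actual data or implementation.

```python
import numpy as np
from scipy.linalg import null_space

# Hypothetical event-count matrix: one row per normal session,
# one column per log event type (e.g. open, close, write, ack).
X = np.array([
    [1, 1, 2, 2],
    [1, 1, 3, 3],
    [2, 2, 1, 1],
    [1, 1, 5, 5],
], dtype=float)

# Relations are vectors r with X @ r == 0, i.e. the nullspace of X.
# Here it encodes count(open) == count(close) and count(write) == count(ack).
relations = null_space(X)  # columns form an orthonormal basis

def is_anomalous(session_counts, relations, tol=1e-6):
    """Flag a session when it violates any mined relation."""
    residual = session_counts @ relations
    return bool(np.any(np.abs(residual) > tol))

normal = np.array([3.0, 3.0, 4.0, 4.0])
broken = np.array([3.0, 2.0, 4.0, 4.0])   # one 'close' event missing
print(is_anomalous(normal, relations))    # False
print(is_anomalous(broken, relations))    # True
```

In the online setting, the same residual check would run over a sliding window of recent events; the paper tunes the window size with PSO, which is omitted from this sketch.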


2010 ◽ Vol 30 (8) ◽ pp. 2060-2065
Author(s): Ning CAO ◽ Zhong-hai WU ◽ Hong-zhi LIU ◽ Qi-xun ZHANG

2020 ◽ Vol 1444 ◽ pp. 012012
Author(s): Meisuchi Naisuty ◽ Achmad Nizar Hidayanto ◽ Nabila Clydea Harahap ◽ Ahmad Rosyiq ◽ Agus Suhanto ◽ ...

2016 ◽ pp. 1220-1243
Author(s): Ilias K. Savvas ◽ Georgia N. Sofianidou ◽ M-Tahar Kechadi

Big data refers to data sets whose size is beyond the capabilities of most current hardware and software technologies. The Apache Hadoop software library is a framework for the distributed processing of large data sets: HDFS is a distributed file system that provides high-throughput access to application data, and MapReduce is a software framework for distributed computation over large data sets. Huge collections of raw data require fast and accurate mining processes in order to extract useful knowledge. One of the most popular data-mining techniques is the K-means clustering algorithm. In this study, the authors develop a distributed version of the K-means algorithm using the MapReduce framework on the Hadoop Distributed File System. Theoretical and experimental results demonstrate the technique's efficiency; thus, HDFS and MapReduce can be applied to big data with very promising results.
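To make the decomposition concrete, here is a minimal single-process sketch of one MapReduce-style K-means pass: the mapper assigns the points of one HDFS-like block to their nearest centroid and emits per-cluster partial sums, and the reducer merges those partial sums into new centroids. The data blocks, initial centroids, and fixed iteration count are illustrative assumptions; in a real Hadoop job the mapper and reducer would run as distributed tasks over actual HDFS blocks, not in-memory arrays.

```python
import numpy as np
from collections import defaultdict

# Toy 2-D data split into "blocks", standing in for HDFS blocks.
rng = np.random.default_rng(0)
blocks = [rng.random((100, 2)) for _ in range(4)]
centroids = np.array([[0.2, 0.2], [0.8, 0.8]])  # initial guesses

def mapper(block, centroids):
    """Map step: assign each point in one block to its nearest
    centroid and emit (cluster_id, (partial_sum, count)) pairs."""
    dists = np.linalg.norm(block[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    for k in range(len(centroids)):
        points = block[labels == k]
        if len(points):
            yield k, (points.sum(axis=0), len(points))

def reducer(pairs):
    """Reduce step: merge per-block partial sums into new centroids."""
    sums = defaultdict(float)   # 0.0 + ndarray broadcasts to a vector
    counts = defaultdict(int)
    for k, (s, n) in pairs:
        sums[k] = sums[k] + s
        counts[k] += n
    return np.array([sums[k] / counts[k] for k in sorted(counts)])

for _ in range(10):  # a real job would iterate until convergence
    pairs = [kv for b in blocks for kv in mapper(b, centroids)]
    centroids = reducer(pairs)
print(centroids)
```

Because each mapper only emits one (sum, count) pair per cluster rather than the raw points, the shuffle traffic per iteration stays proportional to the number of clusters times the number of blocks, which is what makes this decomposition scale to large data sets.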

