Using Lustre and Slurm to process Hadoop workloads and extending to the WLCG

The Queen Mary University of London Grid site has investigated the use of its Lustre file system to support Hadoop workflows. Lustre is an open-source, POSIX-compatible, clustered file system that is widely used in high-performance computing clusters and is often paired with the Slurm batch system. Hadoop is an open-source software framework for distributed storage and processing of data, normally run on dedicated hardware using the HDFS file system and the YARN batch system. Hadoop is an important modern tool for data analytics, used by a wide range of organisations including CERN. Using our existing Lustre file system and Slurm batch system removes the need for dedicated hardware, so only a single platform has to be maintained for data storage and processing. The motivation for and benefits of using Hadoop with Lustre and Slurm are presented. The installation, benchmarks, limitations and future plans are discussed. We also investigate using the standard WLCG Grid middleware CREAM-CE service to provide a Grid-enabled Hadoop service.


On the benefits of a workflow-aware file system in high-performance computing systems

Eighth International Conference on High-Performance Computing in Asia-Pacific Region (HPCASIA'05) ◽

10.1109/hpcasia.2005.58 ◽

2005 ◽

Author(s):

Yang Wang ◽

P. Lu

Keyword(s):

High Performance Computing ◽

High Performance ◽

File System ◽

Computing Systems ◽

Performance Computing


A Review: Map Reduce Framework for Cloud Computing

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i4.6.20224 ◽

2018 ◽

Vol 7 (4.6) ◽

pp. 13

Author(s):

Mekala Sandhya ◽

Ashish Ladda ◽

Dr. Uma N Dulhare

Keyword(s):

Data Mining ◽

Cloud Computing ◽

Distributed Computing ◽

Data Storage ◽

High Performance ◽

Large Scale ◽

Distributed Storage ◽

Large Data ◽

Mass Data ◽

Internet Information

In the current Internet era, information and data are growing continuously. Across the various Internet services and applications, the amount of information is increasing rapidly: hundreds of billions, even trillions, of web indexes exist. Such large data brings people a mass of information and, at the same time, makes it more difficult to discover useful knowledge in these huge amounts of data. Cloud computing can provide the infrastructure for large data. It has two significant characteristics of distributed computing: scalability and high availability. Scalability means that it can seamlessly extend to large-scale clusters. Availability means that cloud computing can tolerate node errors, so node failures do not prevent a program from running correctly. Cloud computing combined with data mining performs significant data processing on high-performance machines. Mass data storage and distributed computing provide a new method for mass data mining and offer an effective solution for distributed storage and efficient computing in data mining.


Openlava: An open source scheduler for high performance computing

2016 International Conference on Research Advances in Integrated Navigation Systems (RAINS) ◽

10.1109/rains.2016.7764375 ◽

2016 ◽

Author(s):

Pranav Joshi ◽

Muda Rajesh Babu

Keyword(s):

Open Source ◽

High Performance Computing ◽

High Performance ◽

Performance Computing


Modernization and optimization of a legacy open-source CFD code for high-performance computing architectures

International Journal of Computational Fluid Dynamics ◽

10.1080/10618562.2017.1285398 ◽

2017 ◽

Vol 31 (2) ◽

pp. 122-133 ◽

Cited By ~ 4

Author(s):

Aytekin Gel ◽

Jonathan Hu ◽

ElMoustapha Ould-Ahmed-Vall ◽

Alexander A. Kalinkin

Keyword(s):

Open Source ◽

High Performance Computing ◽

High Performance ◽

Performance Computing


The effect of subject measurement error on joint kinematics in the conventional gait model: Insights from the open-source pyCGM tool using high performance computing methods

PLoS ONE ◽

10.1371/journal.pone.0189984 ◽

2018 ◽

Vol 13 (1) ◽

pp. e0189984 ◽

Cited By ~ 3

Author(s):

Mathew Schwartz ◽

Philippe C. Dixon

Keyword(s):

Measurement Error ◽

Open Source ◽

High Performance Computing ◽

High Performance ◽

Computing Methods ◽

Joint Kinematics ◽

Performance Computing


File System Write-Optimization on High-Performance Computing Support Platform

Proceedings of the 2017 7th International Conference on Manufacturing Science and Engineering (ICMSE 2017) ◽

10.2991/icmse-17.2017.29 ◽

2017 ◽

Author(s):

Jiye Wang ◽

Nan Zeng ◽

Jun Yu

Keyword(s):

High Performance Computing ◽

High Performance ◽

File System ◽

Computing Support ◽

Performance Computing


Inspiring the Next Generation of HPC Engineers with Reconfigurable, Multi-Tenant Resources for Teaching and Research

Sustainability ◽

10.3390/su132111782 ◽

2021 ◽

Vol 13 (21) ◽

pp. 11782

Author(s):

Taha Al-Jody ◽

Hamza Aagela ◽

Violeta Holmes

Keyword(s):

Open Source ◽

High Performance Computing ◽

Systems Engineering ◽

High Performance ◽

Bare Metal ◽

Next Generation ◽

Teaching And Research ◽

Exascale Computing ◽

Secure Access ◽

Performance Computing

There is a tradition at our university for teaching and research in High Performance Computing (HPC) systems engineering. With exascale computing on the horizon and a shortage of HPC talent, there is a need for new specialists to secure the future of research computing. Whilst many institutions provide research computing training for users within their particular domain, few offer HPC engineering and infrastructure-related courses, making it difficult for students to acquire these skills. This paper outlines how and why we are training students in HPC systems engineering, including the technologies used in delivering this goal. We demonstrate the potential for a multi-tenant HPC system for education and research, using novel container and cloud-based architecture. This work is supported by our previously published work that uses the latest open-source technologies to create sustainable, fast and flexible turn-key HPC environments with secure access via an HPC portal. The proposed multi-tenant HPC resources can be deployed on a “bare metal” infrastructure or in the cloud. An evaluation of our activities over the last five years is given in terms of recruitment metrics, skills audit feedback from students, and research outputs enabled by the multi-tenant usage of the resource.


Modeling of Distributed File System in Big Data Storage by Event-B

MATEC Web of Conferences ◽

10.1051/matecconf/201821004042 ◽

2018 ◽

Vol 210 ◽

pp. 04042

Author(s):

Ammar Alhaj Ali ◽

Pavel Varacha ◽

Said Krayem ◽

Roman Jasek ◽

Petr Zacek ◽

...

Keyword(s):

Big Data ◽

Data Storage ◽

High Performance ◽

File System ◽

Formal Method ◽

File Systems ◽

Distributed File System ◽

Distributed File Systems ◽

Data Systems ◽

Big Data Systems

Nowadays, a wide set of systems and applications, especially in high-performance computing, depend on distributed environments to process and analyse huge amounts of data. As the amount of data increases enormously, providing and developing efficient, scalable and reliable storage solutions has become one of the major issues for scientific computing. The storage solution used by big data systems is the Distributed File System (DFS), which is used to build a hierarchical and unified view of multiple file servers and shares on the network. In this paper we present the Hadoop Distributed File System (HDFS) as the DFS in big data systems, and we present Event-B as a formal method that can be used to model it. Event-B is a mature formal method that has been widely used in industry projects in a number of domains, such as automotive, transportation, space, business information and medical devices. We also propose using Rodin as the modeling tool for Event-B: it integrates modeling and proving, and because the Rodin platform is open source, it supports a large number of plug-in tools.


JMS: An Open Source Workflow Management System and Web-Based Cluster Front-End for High Performance Computing

PLoS ONE ◽

10.1371/journal.pone.0134273 ◽

2015 ◽

Vol 10 (8) ◽

pp. e0134273 ◽

Cited By ~ 8

Author(s):

David K. Brown ◽

David L. Penkler ◽

Thommas M. Musyoka ◽

Özlem Tastan Bishop

Keyword(s):

Open Source ◽

High Performance Computing ◽

Management System ◽

High Performance ◽

Workflow Management ◽

Workflow Management System ◽

Web Based ◽

Front End ◽

Performance Computing


Research and Design on Intranet Security File System

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.433-440.4704 ◽

2012 ◽

Vol 433-440 ◽

pp. 4704-4709

Author(s):

Yan Shen Chen ◽

De Zhi Han

Keyword(s):

Data Storage ◽

Data Security ◽

High Performance ◽

File System ◽

Storage System ◽

Data Encryption ◽

Security Issue ◽

Storage And Retrieval ◽

System A ◽

Secure File System

To solve the data security issue in intranet massive storage systems, a Multi-Protocol Secure File System (MPSFS for short) is designed. Firstly, the MPSFS supports access by users with different protocols and provides a unified access interface, so it can achieve high performance in data storage and retrieval. Secondly, with the help of other technologies such as identity authentication, access control and data encryption, the MPSFS can effectively ensure data security in the intranet storage system. Experiments show that the MPSFS provides good security and scalability for intranet massive storage systems and has little effect on network I/O performance.
