Using Lustre and Slurm to process Hadoop workloads and extending to the WLCG

2019 · Vol 214 · pp. 04049
Author(s): Daniel Traynor, Terry Froy

The Queen Mary University of London Grid site has investigated the use of its Lustre file system to support Hadoop workflows. Lustre is an open-source, POSIX-compatible, clustered file system commonly used in high-performance computing clusters, often paired with the Slurm batch system. Hadoop is an open-source software framework for the distributed storage and processing of data, normally run on dedicated hardware using the HDFS file system and the YARN batch system. Hadoop is an important modern tool for data analytics, used by a wide range of organisations, including CERN. By using our existing Lustre file system and Slurm batch system, the need for dedicated hardware is removed and only a single platform has to be maintained for both data storage and processing. The motivation for and benefits of using Hadoop with Lustre and Slurm are presented. The installation, benchmarks, limitations and future plans are discussed. We also investigate using the standard WLCG Grid middleware CREAM-CE service to provide a Grid-enabled Hadoop service.
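A minimal sketch of how such a setup might be driven, assuming a Lustre mount at /lustre/hadoop and a Hadoop installation available on the worker nodes; the paths and Slurm resource requests are hypothetical, and only fs.defaultFS and mapreduce.framework.name are standard Hadoop property names:

```python
#!/usr/bin/env python3
"""Hypothetical sketch: generate a Hadoop-over-Lustre configuration and a
Slurm batch script.  Paths and resource sizes are assumptions, not the site's
actual values."""

import pathlib
import subprocess

LUSTRE_ROOT = "/lustre/hadoop"        # assumed Lustre mount point
CONF_DIR = pathlib.Path("conf")
CONF_DIR.mkdir(exist_ok=True)

# Point Hadoop at the POSIX file system (Lustre) instead of HDFS.
core_site = f"""<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>file://{LUSTRE_ROOT}</value>
  </property>
  <property>
    <name>mapreduce.framework.name</name>
    <value>local</value>
    <!-- no YARN: Slurm supplies the compute resources -->
  </property>
</configuration>
"""
(CONF_DIR / "core-site.xml").write_text(core_site)

# A minimal Slurm batch script that runs a MapReduce example over Lustre.
batch = f"""#!/bin/bash
#SBATCH --job-name=hadoop-on-lustre
#SBATCH --nodes=1
#SBATCH --cpus-per-task=16
export HADOOP_CONF_DIR=$PWD/conf
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \\
    wordcount {LUSTRE_ROOT}/input {LUSTRE_ROOT}/output
"""
pathlib.Path("hadoop_job.sh").write_text(batch)

# Submit through the existing Slurm batch system (requires sbatch on PATH).
subprocess.run(["sbatch", "hadoop_job.sh"], check=True)
```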

2018 · Vol 7 (4.6) · pp. 13
Author(s): Mekala Sandhya, Ashish Ladda, Dr. Uma N Dulhare

In the current generation of the Internet, information and data grow continuously across a wide variety of Internet services and applications. The amount of information is increasing rapidly; hundreds of billions, even trillions, of web indexes exist. Such large volumes of data bring people a wealth of information, but at the same time make it more difficult to discover useful knowledge within them. Cloud computing can provide the infrastructure for big data. It has two significant characteristics of distributed computing: scalability and high availability. Scalability means the system can seamlessly extend to large-scale clusters; availability means that cloud computing can tolerate node errors, so node failures do not prevent a program from running correctly. Combining cloud computing with data mining enables significant data processing on high-performance machines. Mass data storage together with distributed computing provides a new method for mass data mining and becomes an effective solution for distributed storage and efficient computation in data mining.
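As a toy, single-machine illustration (not taken from the paper) of the partition/merge pattern that such distributed data mining relies on, the following sketch splits a corpus across workers, counts terms in parallel, and merges the partial results; a failed partition could simply be re-mapped on another worker:

```python
"""Toy illustration of the map/reduce pattern behind distributed data mining:
partition the input, map each partition in parallel, merge the partial
results.  The corpus and worker count are invented for demonstration."""

from collections import Counter
from multiprocessing import Pool

def map_partition(lines):
    """Count terms in one partition of the data set."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts

def reduce_counts(partials):
    """Merge the partial counts from every partition."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

if __name__ == "__main__":
    corpus = ["big data needs distributed storage",
              "distributed computing scales data mining",
              "node failures must not stop the job"]
    partitions = [corpus[i::2] for i in range(2)]   # split across 2 workers
    with Pool(2) as pool:
        partials = pool.map(map_partition, partitions)
    print(reduce_counts(partials).most_common(3))
```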


2021 · Vol 13 (21) · pp. 11782
Author(s): Taha Al-Jody, Hamza Aagela, Violeta Holmes

There is a tradition at our university of teaching and research in High Performance Computing (HPC) systems engineering. With exascale computing on the horizon and a shortage of HPC talent, there is a need for new specialists to secure the future of research computing. Whilst many institutions provide research computing training for users within their particular domain, few offer HPC engineering and infrastructure-related courses, making it difficult for students to acquire these skills. This paper outlines how and why we are training students in HPC systems engineering, including the technologies used to deliver this goal. We demonstrate the potential of a multi-tenant HPC system for education and research, using a novel container- and cloud-based architecture. This work is supported by our previously published work, which uses the latest open-source technologies to create sustainable, fast and flexible turn-key HPC environments with secure access via an HPC portal. The proposed multi-tenant HPC resources can be deployed on a “bare metal” infrastructure or in the cloud. An evaluation of our activities over the last five years is given in terms of recruitment metrics, skills-audit feedback from students, and research outputs enabled by the multi-tenant usage of the resource.
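A hypothetical sketch of how per-tenant partitions might be expressed on the Slurm side of such a multi-tenant resource; the tenant names, node ranges and Unix groups are invented, and only the PartitionName/Nodes/AllowGroups/State keywords are standard slurm.conf syntax:

```python
"""Hypothetical sketch of a multi-tenant Slurm layout: one partition per
tenant (teaching, research, ...), each restricted to a Unix group.  All
tenant details are invented for illustration."""

TENANTS = [
    {"name": "teaching", "nodes": "node[01-04]", "group": "hpc-students"},
    {"name": "research", "nodes": "node[05-12]", "group": "hpc-researchers"},
]

def partition_stanza(tenant: dict) -> str:
    """Render one slurm.conf partition line for a tenant."""
    return (f"PartitionName={tenant['name']} "
            f"Nodes={tenant['nodes']} "
            f"AllowGroups={tenant['group']} "
            f"State=UP")

if __name__ == "__main__":
    for tenant in TENANTS:
        print(partition_stanza(tenant))
```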


2018 · Vol 210 · pp. 04042
Author(s): Ammar Alhaj Ali, Pavel Varacha, Said Krayem, Roman Jasek, Petr Zacek, ...

Nowadays, a wide range of systems and applications, especially in high performance computing, depend on distributed environments to process and analyse huge amounts of data. As the amount of data increases enormously, providing and developing efficient, scalable and reliable storage solutions has become one of the major issues for scientific computing. The storage solution used by big data systems is the Distributed File System (DFS), which is used to build a hierarchical and unified view of multiple file servers and shares on the network. In this paper we present the Hadoop Distributed File System (HDFS) as the DFS for big data systems, and Event-B as a formal method that can be used for modelling. Event-B is a mature formal method that has been widely used in industry projects across a number of domains, such as automotive, transportation, space, business information and medical devices. We also propose using Rodin as the modelling tool for Event-B: the Rodin platform integrates modelling and proving, and, being open source, supports a large number of plug-in tools.
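As an informal illustration (in Python, not Event-B notation), the following sketch shows the kind of state, events and invariant that an Event-B/Rodin model of HDFS block replication might formalise; the replication factor and node names are assumptions:

```python
"""Illustrative sketch of the state, events and invariant an Event-B model of
HDFS block replication could capture: replicas live only on live datanodes
and no block is over-replicated."""

REPLICATION = 3

state = {
    "datanodes": {"dn1", "dn2", "dn3", "dn4"},
    "replicas": {"blk_1": {"dn1", "dn2", "dn3"}},   # block -> nodes holding it
}

def invariant(s) -> bool:
    """Every replica lives on a live datanode; no block exceeds REPLICATION."""
    return all(nodes <= s["datanodes"] and len(nodes) <= REPLICATION
               for nodes in s["replicas"].values())

def node_failure(s, node):
    """Event: a datanode fails; its replicas disappear with it."""
    s["datanodes"].discard(node)
    for nodes in s["replicas"].values():
        nodes.discard(node)

def re_replicate(s, block):
    """Event: copy an under-replicated block onto a spare datanode."""
    spare = next(iter(s["datanodes"] - s["replicas"][block]), None)
    if spare and len(s["replicas"][block]) < REPLICATION:
        s["replicas"][block].add(spare)

if __name__ == "__main__":
    node_failure(state, "dn2")
    re_replicate(state, "blk_1")
    assert invariant(state)        # the invariant is preserved by both events
    print(state["replicas"])
```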


2012 · Vol 433-440 · pp. 4704-4709
Author(s): Yan Shen Chen, De Zhi Han

To solve the data security issue in intranet massive storage systems, a Multi-Protocol Secure File System (MPSFS for short) is designed. Firstly, the MPSFS supports access by users with different protocols and provides a unified access interface, so it can achieve high performance in data storage and retrieval; secondly, with the help of technologies such as identity authentication, access control and data encryption, the MPSFS can effectively ensure data security in the intranet storage system. Experiments show that the MPSFS provides good security and scalability for intranet massive storage systems, with little effect on network I/O performance.
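A hypothetical sketch of the unified-access-interface idea: several protocol front-ends share one back-end that authenticates callers and encrypts data at rest; all class and method names are invented, and the XOR "encryption" is only a placeholder for a real cipher:

```python
"""Hypothetical sketch of a multi-protocol secure storage interface:
protocol front-ends (NFS, CIFS, ...) share one authenticated, encrypting
back-end.  Names and the toy cipher are invented for illustration."""

import hashlib

class SecureStore:
    """Single back-end: authenticate the caller, then 'encrypt' at rest."""
    def __init__(self, users):
        self._users = users          # user -> password hash
        self._blobs = {}

    def _auth(self, user, password):
        digest = hashlib.sha256(password.encode()).hexdigest()
        if self._users.get(user) != digest:
            raise PermissionError("authentication failed")

    def write(self, user, password, path, data: bytes):
        self._auth(user, password)
        self._blobs[path] = bytes(b ^ 0x5A for b in data)   # placeholder cipher

    def read(self, user, password, path) -> bytes:
        self._auth(user, password)
        return bytes(b ^ 0x5A for b in self._blobs[path])

class NFSFrontend:
    """One of several protocol front-ends sharing the same store."""
    def __init__(self, store):
        self.store = store

    def put(self, creds, path, data):
        self.store.write(*creds, path, data)

    def get(self, creds, path):
        return self.store.read(*creds, path)

if __name__ == "__main__":
    users = {"alice": hashlib.sha256(b"s3cret").hexdigest()}
    nfs = NFSFrontend(SecureStore(users))
    nfs.put(("alice", "s3cret"), "/data/report.txt", b"classified")
    print(nfs.get(("alice", "s3cret"), "/data/report.txt"))
```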

