Grid Middleware: Recently Published Documents

Total documents: 235 (last five years: 8)
H-index: 13 (last five years: 1)

2021 · Vol 5 (1) · Author(s): T. Boccali, D. Cameron, N. Cardo, D. Conciatore, A. Di Girolamo, ...

The prompt reconstruction of the data recorded by the Large Hadron Collider (LHC) detectors has always been addressed by dedicated resources at the CERN Tier-0. Such workloads come in spikes due to the nature of the accelerator's operation, and on special high-load occasions the experiments have commissioned methods to distribute (spill over) a fraction of the load to sites outside CERN. The present work demonstrates a new way of supporting the Tier-0 environment by elastically provisioning resources for such spilled-over workflows on the Piz Daint supercomputer at CSCS. This is implemented using containers, tuning the existing batch scheduler and reinforcing the scratch file system, while still using standard Grid middleware. ATLAS, CMS and CSCS have jointly run selected prompt data reconstruction on up to several thousand cores on Piz Daint in a shared environment, thereby probing the viability of the CSCS high performance computing site as an on-demand extension of the CERN Tier-0, which could play a role in addressing the future LHC computing challenges of the high-luminosity LHC.
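
The following is a minimal, purely illustrative sketch of the threshold-based logic that elastic spill-over implies: when the Tier-0 backlog crosses a limit, a containerized reconstruction job is submitted to the HPC batch system. All names, numbers and the batch script are hypothetical and not taken from the ATLAS/CMS/CSCS setup described in the paper.

```python
# Illustrative sketch only: threshold-based spill-over of Tier-0 reconstruction
# onto an HPC batch system. All names and numbers are hypothetical.

def spillover_command(backlog: int,
                      threshold: int = 10_000,
                      partition: str = "normal",
                      script: str = "run_reco_container.sh"):
    """Return the batch-submission command to run, or None if the Tier-0
    backlog does not warrant spilling over."""
    if backlog <= threshold:
        return None
    # Scale the node request roughly with the backlog (toy heuristic).
    nodes = min(backlog // 1_000, 100)
    return ["sbatch", f"--partition={partition}", f"--nodes={nodes}", script]

if __name__ == "__main__":
    # Pretend the Tier-0 currently has 42,000 reconstruction jobs queued.
    cmd = spillover_command(backlog=42_000)
    print(cmd or "no spill-over needed")
```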


2020 · Vol 245 · pp. 07016 · Author(s): Tomoe Kishimoto, Junichi Tanaka, Tetsuro Mashimo, Ryu Sawada, Koji Terashi, ...

A Grid computing site is composed of various services, including Grid middleware such as the Computing Element and Storage Element. Text logs produced by these services provide useful information for understanding their status. However, monitoring and analyzing the service logs every day is a time-consuming task for site administrators. Therefore, a support framework has been developed to ease the site administrators' work. The framework detects anomalous logs using Machine Learning techniques and alerts site administrators. The framework has been examined using real service logs at the Tokyo Tier2 site, one of the Worldwide LHC Computing Grid sites. In this paper, the anomaly detection method used in the framework and its performance at the Tokyo Tier2 site are reported.
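
As a rough illustration of ML-based log anomaly detection in the spirit of such a framework (the actual method used at the Tokyo Tier2 site may differ), the following sketch vectorizes log lines with TF-IDF and flags outliers with an Isolation Forest; the log lines are invented and scikit-learn is assumed to be available.

```python
# Minimal sketch of ML-based log anomaly detection; not the paper's method.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import IsolationForest

# Toy service logs; real input would be e.g. Computing Element logs.
train_logs = [
    "INFO job 123 submitted successfully",
    "INFO job 124 submitted successfully",
    "INFO transfer of file A completed",
    "INFO job 125 finished with status OK",
]
new_logs = [
    "INFO job 126 submitted successfully",
    "ERROR authentication failure: cannot contact VOMS server",
]

vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_logs)

# Isolation Forest flags log lines that look unlike the training distribution.
detector = IsolationForest(contamination=0.1, random_state=0)
detector.fit(X_train.toarray())

predictions = detector.predict(vectorizer.transform(new_logs).toarray())
for line, label in zip(new_logs, predictions):
    status = "ANOMALY" if label == -1 else "ok"
    print(f"[{status}] {line}")
```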


2020 · Vol 245 · pp. 07005 · Author(s): Jeffrey Dost, Marco Mascheroni, Brian Bockelman, Lincoln Bryant, Timothy Cartwright, ...

The Open Science Grid (OSG) provides a common service for resource providers and scientific institutions, and supports sciences such as High Energy Physics, Structural Biology, and other community sciences. As scientific frontiers expand, so does the need for resources to analyze new data. For example, High Energy Physics experiments such as those at the LHC foresee exponential growth in the amount of data collected, which comes with a corresponding growth in the need for computing resources. Giving resource providers an easy way to share their resources is paramount to ensuring the growth of the resources available to scientists. In this context, the OSG Hosted CE initiative offers site administrators a way to reduce the effort needed to install and maintain a Compute Element (CE), and represents a solution for sites that do not have the effort and expertise to run their own Grid middleware. An HTCondor Compute Element is installed on a remote VM at UChicago for each site that joins the Hosted CE initiative. The hardware/software stack is maintained by OSG Operations staff in a homogeneous and automated way, reducing the overall operational effort needed to maintain the CEs: a single organization does it in a uniform way, instead of each resource provider doing it in their own way. Currently, more than 20 institutions have joined the Hosted CE initiative. This contribution discusses the technical details behind a Hosted CE installation, highlighting key strengths and common pitfalls, and outlining future plans to further reduce the operational effort.
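
The "one organization configures all CEs uniformly" model can be pictured with the toy sketch below, which renders the same per-site configuration template for every hosted site. This is not the actual OSG tooling; the template keys, site names and values are invented.

```python
# Hypothetical illustration of uniform, templated per-site CE configuration;
# not the real OSG Operations automation.
from string import Template

CE_TEMPLATE = Template(
    "# Settings for $site (illustrative only)\n"
    "remote_batch_host = $batch_host\n"
    "remote_batch_system = $batch_system\n"
    "allowed_vos = $vos\n"
)

sites = [
    {"site": "SiteA", "batch_host": "slurm.sitea.edu", "batch_system": "slurm", "vos": "osg, cms"},
    {"site": "SiteB", "batch_host": "pbs.siteb.edu", "batch_system": "pbs", "vos": "osg"},
]

# One loop, one template: every hosted CE gets the same shape of configuration.
for s in sites:
    print(CE_TEMPLATE.substitute(s))
```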


Grid is an infrastructure that involves the integrated and collaborative use of computers, networks, databases and scientific instruments owned and managed by many organizations. Grid applications often involve large amounts of data and/or computing resources that require secure resource sharing across organizations. This makes grid operation and deployment a complex undertaking. Grid middleware provides users with seamless computing capability and uniform access to resources in the heterogeneous grid environment. A number of toolkits and systems have been developed, most of which are the result of academic research projects worldwide. This chapter focuses on four of these middleware systems: UNICORE, Globus, Legion and Gridbus. It also presents our implementation of a resource broker for UNICORE, which did not originally support this functionality. A comparison of these systems, based on their architecture, implementation model and several other features, is included.
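
At its core, a resource broker matches a job's requirements against the advertised properties of available resources. The sketch below is a toy version of that matching step, not the UNICORE broker described in the chapter; the resource attributes and selection rule are simplified assumptions.

```python
# Toy illustration of resource brokering: pick a resource that satisfies the
# job's requirements. Not the chapter's UNICORE broker implementation.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Resource:
    name: str
    free_cpus: int
    memory_gb: int
    os: str

@dataclass
class Job:
    cpus: int
    memory_gb: int
    os: str

def broker(job: Job, resources: list) -> Optional[Resource]:
    """Return the first resource that satisfies the job's requirements."""
    for r in resources:
        if r.free_cpus >= job.cpus and r.memory_gb >= job.memory_gb and r.os == job.os:
            return r
    return None

if __name__ == "__main__":
    pool = [Resource("clusterA", 16, 64, "linux"), Resource("clusterB", 4, 8, "linux")]
    print(broker(Job(cpus=8, memory_gb=32, os="linux"), pool))  # -> clusterA
```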


2019 · Vol 214 · pp. 03037 · Author(s): M. Martinez Pedreira, C. Grigoras, V. Yurchenko

The ALICE experiment will undergo extensive hardware and software upgrades for LHC Run 3. This translates into a significant increase in the CPU and storage resources required for data processing, and at the same time the data access rates will grow linearly with the amount of resources. JAliEn (Java ALICE Environment) is the new Grid middleware designed to scale out horizontally to fulfil the computing needs of the upgrade, and at the same time to modernize all parts of the distributed system software. This paper presents the architecture of the JAliEn framework, the technologies used and performance measurements. It also describes the next-generation solution that will replace our main database backend, the AliEn File Catalogue. The catalogue is an integral part of the system: it contains the metadata of all files written to the distributed Grid storage and also provides powerful search and data manipulation tools. As for JAliEn, the focus has been put on horizontal scalability, with the aim of handling near-exascale data volumes and an order of magnitude more workload than the currently used Grid middleware. Lastly, this contribution presents how JAliEn manages the increased complexity of the tasks associated with the new ALICE data processing and analysis framework (ALFA) and multi-core environments.
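
One common way to make a file catalogue scale horizontally is to partition it across several database back-ends. The sketch below illustrates that general idea by hashing each logical file name (LFN) to a shard; it is an assumption-laden toy, not the actual AliEn/JAliEn catalogue design, and the shard count and LFN paths are invented.

```python
# Illustrative sketch of horizontal catalogue scaling via LFN sharding.
# Not the actual AliEn/JAliEn schema.
import hashlib

N_SHARDS = 8  # hypothetical number of catalogue back-end databases

def shard_for(lfn: str) -> int:
    """Deterministically map a logical file name to a shard index."""
    digest = hashlib.sha1(lfn.encode()).hexdigest()
    return int(digest, 16) % N_SHARDS

if __name__ == "__main__":
    for lfn in ("/alice/data/2022/run12345/file_001.root",
                "/alice/sim/2022/prod42/file_777.root"):
        print(lfn, "->", f"shard {shard_for(lfn)}")
```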


2019 · Vol 214 · pp. 09002 · Author(s): Andrea Ceccanti, Enrico Vianello, Marco Caberletti, Francesco Giacomini

X.509 certificates and VOMS have proved to be a secure and reliable solution for authentication and authorization on the Grid, but they have also shown usability issues and required the development of ad-hoc services and libraries to support VO-based authorization schemes in Grid middleware and experiment computing frameworks. The need to move beyond X.509 certificates is recognized as an important objective in the HEP R&D roadmap for software and computing, both to overcome the usability issues of the current AAI and embrace recent advancements in web technologies widely adopted in industry, and to enable the secure composition of computing and storage resources provisioned across heterogeneous providers in order to meet the computing needs of HL-LHC. A flexible and usable AAI based on modern web technologies is a key enabler of such secure composition and has been a major research topic of the recently concluded INDIGO-DataCloud project. In this contribution, we present an integrated solution, based on the INDIGO-DataCloud Identity and Access Management service, that demonstrates how a next-generation, token-based, VO-aware AAI can be built in support of HEP computing use cases, while maintaining compatibility with the existing VOMS-based AAI used by the Grid.
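
To make the token-based pattern concrete, here is a minimal sketch of a standard OAuth2 client-credentials flow: obtain an access token from an identity provider (such as an INDIGO IAM instance) and present it as a Bearer token to a storage service. The endpoint URLs, client credentials and scope are placeholders, not real services or the paper's exact integration.

```python
# Minimal OAuth2 client-credentials sketch; URLs and credentials are placeholders.
import requests

TOKEN_ENDPOINT = "https://iam.example.org/token"          # placeholder
STORAGE_URL = "https://storage.example.org/vo/data/file"  # placeholder

def get_access_token(client_id: str, client_secret: str, scope: str) -> str:
    """Request an access token using the client-credentials grant."""
    resp = requests.post(
        TOKEN_ENDPOINT,
        data={"grant_type": "client_credentials", "scope": scope},
        auth=(client_id, client_secret),
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["access_token"]

def read_file(token: str) -> bytes:
    """Access a protected storage resource with a Bearer token."""
    resp = requests.get(
        STORAGE_URL,
        headers={"Authorization": f"Bearer {token}"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.content
```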


2019 · Vol 214 · pp. 04049 · Author(s): Daniel Traynor, Terry Froy

The Queen Mary University of London Grid site has investigated the use of its Lustre file system to support Hadoop workflows. Lustre is an open-source, POSIX-compatible, clustered file system often used in high performance computing clusters and frequently paired with the Slurm batch system. Hadoop is an open-source software framework for distributed storage and processing of data, normally run on dedicated hardware using the HDFS file system and the YARN batch system. Hadoop is an important modern tool for data analytics used by a wide range of organisations, including CERN. By using our existing Lustre file system and Slurm batch system, the need for dedicated hardware is removed and only a single platform has to be maintained for both data storage and processing. The motivation and benefits of using Hadoop with Lustre and Slurm are presented. The installation, benchmarks, limitations and future plans are discussed. We also investigate using the standard WLCG Grid middleware CREAM-CE service to provide a Grid-enabled Hadoop service.
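
A common way to run Hadoop over a POSIX file system such as Lustre is to point the default file system at a file:// URI instead of HDFS. The sketch below generates a core-site.xml doing exactly that; the fs.defaultFS property name is standard Hadoop, but the mount point is hypothetical and the actual QMUL configuration may well differ.

```python
# Sketch: generate a Hadoop core-site.xml whose default file system is a
# POSIX path on Lustre (file:// URI) instead of HDFS. Mount point is made up.
import xml.etree.ElementTree as ET

LUSTRE_MOUNT = "file:///mnt/lustre/hadoop"  # hypothetical Lustre mount point

def core_site_xml(default_fs: str) -> str:
    """Build a minimal core-site.xml setting fs.defaultFS."""
    conf = ET.Element("configuration")
    prop = ET.SubElement(conf, "property")
    ET.SubElement(prop, "name").text = "fs.defaultFS"
    ET.SubElement(prop, "value").text = default_fs
    return ET.tostring(conf, encoding="unicode")

if __name__ == "__main__":
    print(core_site_xml(LUSTRE_MOUNT))
```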


2019 · Vol 214 · pp. 05020 · Author(s): Javier Cervantes Villanueva, Gerardo Ganis, Dmitri Konstantinov, Grigorii Latyshev, Pere Mato Vila, ...

Building, testing and deploying coherent large software stacks is very challenging, in particular when they consist of the diverse set of packages required by the LHC experiments, the CERN Beams Department and data analysis services such as SWAN. These software stacks include several packages (Grid middleware, Monte Carlo generators, Machine Learning tools, Python modules), all available for a large number of compilers, operating systems and hardware architectures. To address this challenge, we developed an infrastructure around a tool called lcgcmake. Dedicated modules are responsible for building the packages and controlling the dependencies in a reliable and scalable way. The distribution relies on a robust and automatic system responsible for building and testing the packages, installing them on CernVM-FS and packaging the binaries in RPMs and tarballs. This system is orchestrated through Jenkins on build machines provided by the CERN OpenStack facility. The results are published through user-friendly web pages. In this paper we present an overview of these infrastructure tools and policies. We also discuss the role of this effort within the HEP Software Foundation (HSF). Finally, we discuss the evolution of the infrastructure towards container (Docker) technologies and the future directions and challenges of the project.
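
Controlling inter-package dependencies in such a stack essentially means computing a valid build order before any build starts. The toy sketch below shows that idea with a topological sort; the package names and dependency graph are invented and this is not how lcgcmake itself is implemented.

```python
# Toy illustration of dependency-controlled building: derive a build order by
# topological sort, then build each package in turn. Not lcgcmake's internals.
from graphlib import TopologicalSorter  # Python 3.9+

# package -> set of packages it depends on (hypothetical examples)
dependencies = {
    "ROOT": {"Python", "zlib"},
    "Geant4": {"zlib"},
    "experiment-sw": {"ROOT", "Geant4"},
    "Python": set(),
    "zlib": set(),
}

def build(package: str) -> None:
    print(f"building {package} ...")  # stand-in for the real build step

for pkg in TopologicalSorter(dependencies).static_order():
    build(pkg)
```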

