Interactive analysis notebooks on DESY batch resources

2021 ◽  
Vol 5 (1) ◽  
Author(s):  
J. Reppin ◽  
C. Beyer ◽  
T. Hartmann ◽  
F. Schluenzen ◽  
M. Flemming ◽  
...  

Batch scheduling systems are usually designed to maximise fair resource utilisation and efficiency, but they are less well suited to demanding interactive processing, which requires fast access to resources; low start-up latency is only of secondary importance to high-throughput and high-performance computing schedulers. The computing clusters at DESY are intended as batch systems for end users to run massive analysis and simulation jobs with fast turnaround, in particular when processing is expected to feed back into the operation of instruments in near real time. The continuously increasing popularity of Jupyter Notebooks for interactive and online processing made an integration of this technology into the DESY batch systems indispensable. We present here our approach to using the HTCondor and SLURM backends to integrate Jupyter Notebook servers, and the techniques involved in providing fast access. The chosen approach offers a smooth user experience and allows users to customise resource allocation tailored to their computational requirements. In addition, we outline the differences between the HPC and HTC implementations and give an overview of our experience running Jupyter Notebook services.
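The abstract does not spell out the spawning mechanism, but a common way to launch per-user notebook servers through a SLURM backend is JupyterHub with the batchspawner package; the snippet below is a minimal, hypothetical jupyterhub_config.py in that spirit. The partition name, resource requests, and timeout are placeholders, not DESY's actual configuration.

```python
# jupyterhub_config.py -- minimal sketch of spawning notebook servers as SLURM jobs.
# Assumes JupyterHub with the batchspawner package installed; the partition name
# and all resource values are placeholders, not DESY production settings.

c = get_config()  # noqa: F821  (injected by JupyterHub when the config is loaded)

c.JupyterHub.spawner_class = 'batchspawner.SlurmSpawner'

# Per-user resource requests, translated by batchspawner into the generated
# sbatch script for each notebook session.
c.SlurmSpawner.req_partition = 'jhub'      # hypothetical interactive partition
c.SlurmSpawner.req_nprocs = '4'
c.SlurmSpawner.req_memory = '8G'
c.SlurmSpawner.req_runtime = '8:00:00'

# A short queue wait is what makes the session feel interactive; reserving a
# dedicated partition for notebook jobs is one way to keep start-up latency low.
c.SlurmSpawner.start_timeout = 300
```

batchspawner also ships a Condor-oriented spawner, so an analogous configuration can target an HTCondor pool; which backend applies depends on whether the session runs on the HPC (SLURM) or HTC (HTCondor) cluster.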

Author(s):  
Kamer Kaya ◽  
Ayat Hatem ◽  
Hatice Gülçin Özer ◽  
Kun Huang ◽  
Ümit V. Çatalyürek

2020 ◽  
Vol 16 (8) ◽  
pp. 155014772093275 ◽  
Author(s):  
Muhammad Shuaib Qureshi ◽  
Muhammad Bilal Qureshi ◽  
Muhammad Fayaz ◽  
Wali Khan Mashwani ◽  
Samir Brahim Belhaouari ◽  
...  

An efficient resource allocation scheme plays a vital role in scheduling applications on high-performance computing resources in order to achieve the desired level of service. The major part of the existing literature on resource allocation covers real-time services, with timing constraints as the primary parameter. Resource allocation schemes for real-time services have been designed with various architectures (static, dynamic, centralized, or distributed) and quality-of-service criteria (cost efficiency, completion time minimization, energy efficiency, and memory optimization). In this analysis, numerous resource allocation schemes for real-time services in various high-performance computing (distributed and non-distributed) domains have been studied and compared on the basis of common parameters such as application type, operational environment, optimization goal, architecture, system size, resource type, optimality, simulation tool, comparison technique, and input data. The basic aim of this study is to provide a consolidated platform for researchers working on scheduling and allocating high-performance computing resources to real-time services. This work comprehensively discusses, integrates, analyzes, and categorizes all resource allocation schemes for real-time services into five high-performance computing classes: grid, cloud, edge, fog, and multicore computing systems. The workflow representations of the studied schemes help readers understand the basic working and architecture of these mechanisms and identify further research gaps.
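The survey compares existing schemes rather than proposing a new one; purely to make the "timing constraints as the primary parameter" idea concrete, the sketch below performs a greedy earliest-deadline-first allocation of jobs to resources. The class names, job data, and admission test are illustrative assumptions, not a scheme taken from the reviewed literature.

```python
# Illustrative earliest-deadline-first (EDF) allocation with a simple admission
# test: a job is rejected if no resource can finish it before its deadline.
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Job:
    name: str
    runtime: float    # estimated execution time (seconds)
    deadline: float   # absolute deadline (seconds from now)

@dataclass
class Resource:
    name: str
    available_at: float = 0.0  # time at which the resource becomes free

def edf_allocate(jobs: List[Job], resources: List[Resource]) -> List[Tuple[str, Optional[str]]]:
    """Assign jobs in deadline order to the resource that finishes them soonest."""
    schedule: List[Tuple[str, Optional[str]]] = []
    for job in sorted(jobs, key=lambda j: j.deadline):
        best: Optional[Resource] = None
        for res in resources:
            finish = res.available_at + job.runtime
            if finish <= job.deadline and (best is None or finish < best.available_at + job.runtime):
                best = res
        if best is None:
            schedule.append((job.name, None))  # deadline cannot be met: rejected
        else:
            best.available_at += job.runtime
            schedule.append((job.name, best.name))
    return schedule

if __name__ == "__main__":
    jobs = [Job("render", 5, 12), Job("sensor-fusion", 2, 4), Job("billing", 8, 30)]
    nodes = [Resource("node-a"), Resource("node-b")]
    print(edf_allocate(jobs, nodes))
```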


Author(s):  
Claudia Cava ◽  
Francesca Gallivanone ◽  
Christian Salvatore ◽  
Pasquale Anthony Della Rosa ◽  
Isabella Castiglioni

Bioinformatics traditionally deals with computational approaches to the analysis of big data from high-throughput technologies such as genomics, proteomics, and sequencing. Bioinformatics analysis allows the extraction of new information from big data that may help to better assess biological details at the molecular and cellular level. The large scale and high dimensionality of bioinformatics data have led to an increasing need for high-performance computing and storage. In this chapter, the authors demonstrate the advantages of cloud computing in bioinformatics research for high-throughput technologies.


2018 ◽  
Vol 106 (4) ◽  
Author(s):  
Jean-Paul Courneya ◽  
Alexa Mayo

Despite having an ideal setup in their labs for wet work, researchers often lack the computational infrastructure to analyze the volume of data that results from "-omics" experiments. In this innovative project, the library supports analysis of high-throughput data from global molecular profiling experiments by offering a high-performance computer with open-source software, along with expert bioinformationist support. The audience for this new service is faculty, staff, and students for whom using the university's large-scale CORE computational resources is not warranted because those resources exceed the needs of smaller projects. In the library's approach, users are empowered to analyze high-throughput data that they otherwise would not be able to process on their own computers. To develop the project, the library's bioinformationist identified the ideal computing hardware and a suite of open-source bioinformatics software to provide analysis options for experimental data such as scientific images, sequence reads, and flow cytometry files. To close the loop between learning and practice, the bioinformationist developed self-guided learning materials and offers workshops and consultations on topics such as the National Center for Biotechnology Information's BLAST, Bioinformatics on the Cloud, and ImageJ. Researchers then apply the data analysis techniques they learned in the classroom in an ideal computing environment.
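As an illustration of the kind of analysis such a workstation and the BLAST workshops support, the snippet below runs a remote nucleotide BLAST search with Biopython against NCBI's public servers; the query sequence and the number of reported hits are invented for the example and are not drawn from the article.

```python
# Minimal, illustrative remote BLAST query with Biopython. The toy sequence is
# invented; a real analysis would read sequences from FASTA files instead.
from Bio.Blast import NCBIWWW, NCBIXML

query = "AGCTTAGCTAGCTACGGAGCTTACGATCGATCGATCGATCGTACG"  # toy nucleotide sequence

# Submit the query to NCBI's servers against the 'nt' nucleotide database.
result_handle = NCBIWWW.qblast("blastn", "nt", query)

# Parse the XML result and print the strongest alignments.
record = NCBIXML.read(result_handle)
for alignment in record.alignments[:5]:
    best_hsp = alignment.hsps[0]
    print(f"{alignment.title[:60]}  e-value={best_hsp.expect:.2e}")
```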


2020 ◽  
Vol 245 ◽  
pp. 09011
Author(s):  
Michael Hildreth ◽  
Kenyi Paolo Hurtado Anampa ◽  
Cody Kankel ◽  
Scott Hampton ◽  
Paul Brenner ◽  
...  

The NSF-funded Scalable CyberInfrastructure for Artificial Intelligence and Likelihood Free Inference (SCAILFIN) project aims to develop and deploy artificial intelligence (AI) and likelihood-free inference (LFI) techniques and software using scalable cyberinfrastructure (CI) built on top of existing CI elements. Specifically, the project has extended the CERN-based REANA framework, a cloud-based data analysis platform deployed on top of Kubernetes clusters that was originally designed to enable analysis reusability and reproducibility. REANA is capable of orchestrating extremely complicated multi-step workflows, and it uses Kubernetes clusters both for scheduling and distributing container-based workloads across a cluster of available machines and for instantiating and monitoring the concrete workloads themselves. This work describes the challenges and development efforts involved in extending REANA, and the components that were developed to enable large-scale deployment on High Performance Computing (HPC) resources. Using the Virtual Clusters for Community Computation (VC3) infrastructure as a starting point, we adapted REANA to work with a number of different workload managers, covering both high-performance and high-throughput systems, while simultaneously removing REANA's dependence on Kubernetes support at the worker level.
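For readers unfamiliar with REANA, the sketch below shows, in simplified form, how an analysis can be declared as a serial workflow and submitted with the reana-client command-line tool. The file names, container image, and workflow name are placeholders, PyYAML and reana-client are assumed to be installed with server credentials set in the environment, and the HPC-backend extensions described in the paper are not reflected here.

```python
# Sketch: declare a minimal REANA serial workflow and launch it via reana-client.
# All file names, the analysis command, and the workflow name are placeholders.
import subprocess
import yaml  # PyYAML

spec = {
    "inputs": {"files": ["code/fit.py", "data/events.csv"]},
    "workflow": {
        "type": "serial",
        "specification": {
            "steps": [
                {
                    "environment": "python:3.9-slim",  # container image for the step
                    "commands": [
                        "python code/fit.py --input data/events.csv --output results/fit.png"
                    ],
                }
            ]
        },
    },
    "outputs": {"files": ["results/fit.png"]},
}

with open("reana.yaml", "w") as f:
    yaml.safe_dump(spec, f, sort_keys=False)

# Create the workflow, upload the declared inputs, and start it.
for cmd in (
    ["reana-client", "create", "-f", "reana.yaml", "-n", "demo-analysis"],
    ["reana-client", "upload", "-w", "demo-analysis"],
    ["reana-client", "start", "-w", "demo-analysis"],
):
    subprocess.run(cmd, check=True)
```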

