Interactive analysis notebooks on DESY batch resources

2021 ◽  
Vol 5 (1) ◽  
Author(s):  
J. Reppin ◽  
C. Beyer ◽  
T. Hartmann ◽  
F. Schluenzen ◽  
M. Flemming ◽  
...  

Batch scheduling systems are usually designed to maximise fair resource utilisation and efficiency, but they are less well suited to demanding interactive processing, which requires fast access to resources; low start-up latency is only of secondary importance to high-throughput and high-performance computing schedulers. The computing clusters at DESY are intended as batch systems for end users to run massive analysis and simulation jobs with fast turnaround, in particular when processing is expected to feed back into the operation of instruments in near real time. The continuously increasing popularity of Jupyter Notebooks for interactive and online processing made an integration of this technology into the DESY batch systems indispensable. We present here our approach to using the HTCondor and SLURM backends to integrate Jupyter Notebook servers, and the techniques involved in providing fast access. The chosen approach offers a smooth user experience and allows users to customise resource allocation tailored to their computational requirements. In addition, we outline the differences between the HPC and HTC implementations and give an overview of our experience running Jupyter Notebook services.
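The abstract does not spell out the spawning mechanism, but a common way to launch per-user notebook servers through a SLURM backend is JupyterHub with the batchspawner package; the snippet below is a minimal, hypothetical jupyterhub_config.py in that spirit. The partition name, resource requests, and timeout are placeholders, not DESY's actual configuration.

```python
# jupyterhub_config.py -- minimal sketch of spawning notebook servers as SLURM jobs.
# Assumes JupyterHub with the batchspawner package installed; the partition name
# and all resource values are placeholders, not DESY production settings.

c = get_config()  # noqa: F821  (injected by JupyterHub when the config is loaded)

c.JupyterHub.spawner_class = 'batchspawner.SlurmSpawner'

# Per-user resource requests, translated by batchspawner into the generated
# sbatch script for each notebook session.
c.SlurmSpawner.req_partition = 'jhub'      # hypothetical interactive partition
c.SlurmSpawner.req_nprocs = '4'
c.SlurmSpawner.req_memory = '8G'
c.SlurmSpawner.req_runtime = '8:00:00'

# A short queue wait is what makes the session feel interactive; reserving a
# dedicated partition for notebook jobs is one way to keep start-up latency low.
c.SlurmSpawner.start_timeout = 300
```

batchspawner also ships a Condor-oriented spawner, so an analogous configuration can target an HTCondor pool; which backend applies depends on whether the session runs on the HPC (SLURM) or HTC (HTCondor) cluster.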

Author(s):  
Kamer Kaya ◽  
Ayat Hatem ◽  
Hatice Gülçin Özer ◽  
Kun Huang ◽  
Ümit V. Çatalyürek

2020 ◽  
Vol 16 (8) ◽  
pp. 155014772093275 ◽  
Author(s):  
Muhammad Shuaib Qureshi ◽  
Muhammad Bilal Qureshi ◽  
Muhammad Fayaz ◽  
Wali Khan Mashwani ◽  
Samir Brahim Belhaouari ◽  
...  

An efficient resource allocation scheme plays a vital role in scheduling applications on high-performance computing resources in order to achieve the desired level of service. The major part of the existing literature on resource allocation covers real-time services, with timing constraints as the primary parameter. Resource allocation schemes for real-time services have been designed with various architectures (static, dynamic, centralized, or distributed) and quality-of-service criteria (cost efficiency, completion time minimization, energy efficiency, and memory optimization). In this analysis, numerous resource allocation schemes for real-time services in various high-performance computing (distributed and non-distributed) domains have been studied and compared on the basis of common parameters such as application type, operational environment, optimization goal, architecture, system size, resource type, optimality, simulation tool, comparison technique, and input data. The basic aim of this study is to provide a consolidated platform for researchers working on scheduling and allocating high-performance computing resources to real-time services. This work comprehensively discusses, integrates, analyzes, and categorizes all resource allocation schemes for real-time services into five high-performance computing classes: grid, cloud, edge, fog, and multicore computing systems. The workflow representations of the studied schemes help readers understand the basic working and architecture of these mechanisms and identify further research gaps.
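The survey compares existing schemes rather than proposing a new one; purely to make the "timing constraints as the primary parameter" idea concrete, the sketch below performs a greedy earliest-deadline-first allocation of jobs to resources. The class names, job data, and admission test are illustrative assumptions, not a scheme taken from the reviewed literature.

```python
# Illustrative earliest-deadline-first (EDF) allocation with a simple admission
# test: a job is rejected if no resource can finish it before its deadline.
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Job:
    name: str
    runtime: float    # estimated execution time (seconds)
    deadline: float   # absolute deadline (seconds from now)

@dataclass
class Resource:
    name: str
    available_at: float = 0.0  # time at which the resource becomes free

def edf_allocate(jobs: List[Job], resources: List[Resource]) -> List[Tuple[str, Optional[str]]]:
    """Assign jobs in deadline order to the resource that finishes them soonest."""
    schedule: List[Tuple[str, Optional[str]]] = []
    for job in sorted(jobs, key=lambda j: j.deadline):
        best: Optional[Resource] = None
        for res in resources:
            finish = res.available_at + job.runtime
            if finish <= job.deadline and (best is None or finish < best.available_at + job.runtime):
                best = res
        if best is None:
            schedule.append((job.name, None))  # deadline cannot be met: rejected
        else:
            best.available_at += job.runtime
            schedule.append((job.name, best.name))
    return schedule

if __name__ == "__main__":
    jobs = [Job("render", 5, 12), Job("sensor-fusion", 2, 4), Job("billing", 8, 30)]
    nodes = [Resource("node-a"), Resource("node-b")]
    print(edf_allocate(jobs, nodes))
```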


Author(s):  
Claudia Cava ◽  
Francesca Gallivanone ◽  
Christian Salvatore ◽  
Pasquale Anthony Della Rosa ◽  
Isabella Castiglioni

Bioinformatics traditionally deals with computational approaches to the analysis of big data from high-throughput technologies such as genomics, proteomics, and sequencing. Bioinformatics analysis allows the extraction of new information from big data that may help to better assess biological details at the molecular and cellular level. The large scale and high dimensionality of bioinformatics data have led to an increasing need for high-performance computing and storage. In this chapter, the authors demonstrate the advantages of cloud computing in bioinformatics research for high-throughput technologies.


2018 ◽  
Vol 106 (4) ◽  
Author(s):  
Jean-Paul Courneya ◽  
Alexa Mayo

Despite having an ideal setup in their labs for wet work, researchers often lack the computational infrastructure to analyze the volume of data that results from "-omics" experiments. In this innovative project, the library supports analysis of high-throughput data from global molecular profiling experiments by offering a high-performance computer with open-source software, along with expert bioinformationist support. The audience for this new service is faculty, staff, and students for whom using the university's large-scale CORE computational resources is not warranted because those resources exceed the needs of smaller projects. In the library's approach, users are empowered to analyze high-throughput data that they otherwise would not be able to process on their own computers. To develop the project, the library's bioinformationist identified the ideal computing hardware and a suite of open-source bioinformatics software to provide analysis options for experimental data such as scientific images, sequence reads, and flow cytometry files. To close the loop between learning and practice, the bioinformationist developed self-guided learning materials and offers workshops and consultations on topics such as the National Center for Biotechnology Information's BLAST, Bioinformatics on the Cloud, and ImageJ. Researchers then apply the data analysis techniques they learned in the classroom in an ideal computing environment.
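As an illustration of the kind of analysis such a workstation and the BLAST workshops support, the snippet below runs a remote nucleotide BLAST search with Biopython against NCBI's public servers; the query sequence and the number of reported hits are invented for the example and are not drawn from the article.

```python
# Minimal, illustrative remote BLAST query with Biopython. The toy sequence is
# invented; a real analysis would read sequences from FASTA files instead.
from Bio.Blast import NCBIWWW, NCBIXML

query = "AGCTTAGCTAGCTACGGAGCTTACGATCGATCGATCGATCGTACG"  # toy nucleotide sequence

# Submit the query to NCBI's servers against the 'nt' nucleotide database.
result_handle = NCBIWWW.qblast("blastn", "nt", query)

# Parse the XML result and print the strongest alignments.
record = NCBIXML.read(result_handle)
for alignment in record.alignments[:5]:
    best_hsp = alignment.hsps[0]
    print(f"{alignment.title[:60]}  e-value={best_hsp.expect:.2e}")
```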


2020 ◽  
Vol 245 ◽  
pp. 09011
Author(s):  
Michael Hildreth ◽  
Kenyi Paolo Hurtado Anampa ◽  
Cody Kankel ◽  
Scott Hampton ◽  
Paul Brenner ◽  
...  

The NSF-funded Scalable CyberInfrastructure for Artificial Intelligence and Likelihood Free Inference (SCAILFIN) project aims to develop and deploy artificial intelligence (AI) and likelihood-free inference (LFI) techniques and software using scalable cyberinfrastructure (CI) built on top of existing CI elements. Specifically, the project has extended the CERN-based REANA framework, a cloud-based data analysis platform deployed on top of Kubernetes clusters that was originally designed to enable analysis reusability and reproducibility. REANA is capable of orchestrating extremely complicated multi-step workflows, and it uses Kubernetes clusters both for scheduling and distributing container-based workloads across a cluster of available machines and for instantiating and monitoring the concrete workloads themselves. This work describes the challenges and development efforts involved in extending REANA, and the components that were developed to enable large-scale deployment on High Performance Computing (HPC) resources. Using the Virtual Clusters for Community Computation (VC3) infrastructure as a starting point, we adapted REANA to work with a number of different workload managers, covering both high-performance and high-throughput systems, while simultaneously removing REANA's dependence on Kubernetes support at the worker level.
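For readers unfamiliar with REANA, the sketch below shows, in simplified form, how an analysis can be declared as a serial workflow and submitted with the reana-client command-line tool. The file names, container image, and workflow name are placeholders, PyYAML and reana-client are assumed to be installed with server credentials set in the environment, and the HPC-backend extensions described in the paper are not reflected here.

```python
# Sketch: declare a minimal REANA serial workflow and launch it via reana-client.
# All file names, the analysis command, and the workflow name are placeholders.
import subprocess
import yaml  # PyYAML

spec = {
    "inputs": {"files": ["code/fit.py", "data/events.csv"]},
    "workflow": {
        "type": "serial",
        "specification": {
            "steps": [
                {
                    "environment": "python:3.9-slim",  # container image for the step
                    "commands": [
                        "python code/fit.py --input data/events.csv --output results/fit.png"
                    ],
                }
            ]
        },
    },
    "outputs": {"files": ["results/fit.png"]},
}

with open("reana.yaml", "w") as f:
    yaml.safe_dump(spec, f, sort_keys=False)

# Create the workflow, upload the declared inputs, and start it.
for cmd in (
    ["reana-client", "create", "-f", "reana.yaml", "-n", "demo-analysis"],
    ["reana-client", "upload", "-w", "demo-analysis"],
    ["reana-client", "start", "-w", "demo-analysis"],
):
    subprocess.run(cmd, check=True)
```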

