Data-Aware Distributed Computing

Author(s):  
Esma Yildirim, Mehmet Balman, Tevfik Kosar

With the continuous increase in the data requirements of scientific and commercial applications, access to remote and distributed data has become a major bottleneck for end-to-end application performance. Traditional distributed computing systems closely couple data access and computation, generally treating data access as a side effect of computation. The limitations of traditional distributed computing systems, and of CPU-oriented scheduling and workflow management tools, in managing complex data handling have motivated a newly emerging paradigm: data-aware distributed computing. In this chapter, the authors elaborate on how the most crucial distributed computing components, such as scheduling, workflow management, and end-to-end throughput optimization, can become “data-aware.” In this new paradigm, data placement activities are represented as full-featured jobs in the end-to-end workflow; they are queued, managed, scheduled, and optimized by a specialized data-aware scheduler. The authors present a set of tools for mitigating the data bottleneck in distributed computing systems, built around three main components: a data-aware scheduler that provides planning, scheduling, resource reservation, job execution, and error recovery for data movement tasks; integration of these capabilities into other layers of distributed computing, such as workflow planning; and further optimization of data movement tasks via dynamic tuning of the underlying protocol's transfer parameters.
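As a rough, hypothetical sketch of this idea (all class and function names below are illustrative, not those of the authors' actual scheduler), the following Python snippet shows a data transfer treated as a full-featured job that is queued, executed, and recovered from errors by a dedicated scheduler:

```python
import queue
import time

class DataPlacementJob:
    """A data transfer represented as a first-class, schedulable job."""
    def __init__(self, src, dst, max_retries=3):
        self.src, self.dst = src, dst
        self.max_retries = max_retries
        self.attempts = 0

    def execute(self):
        # Stand-in for the real transfer (e.g. GridFTP, HTTP); a real
        # implementation would raise OSError on failure so the scheduler
        # can apply error recovery.
        print(f"transferring {self.src} -> {self.dst}")

class DataAwareScheduler:
    """Queues, runs, and retries data placement jobs like compute jobs."""
    def __init__(self):
        self.pending = queue.Queue()

    def submit(self, job):
        self.pending.put(job)

    def run(self):
        while not self.pending.empty():
            job = self.pending.get()
            try:
                job.execute()
            except OSError:
                job.attempts += 1
                if job.attempts < job.max_retries:
                    time.sleep(2 ** job.attempts)  # back off before retrying
                    self.pending.put(job)          # requeue the failed job

sched = DataAwareScheduler()
sched.submit(DataPlacementJob("gsiftp://siteA/run01.dat", "file:///scratch/run01.dat"))
sched.run()
```

A real data-aware scheduler would add the planning, resource reservation, and protocol-tuning capabilities described above (for example, choosing the number of parallel TCP streams per transfer); the sketch only captures the core shift of making data placement a schedulable job rather than a side effect.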

2019, Vol. 214, pp. 03047
Author(s):  
Fernando Barreiro, Doug Benjamin, Taylor Childers, Kaushik De, Johannes Elmsheuser, ...

Since 2010 the Production and Distributed Analysis system (PanDA) for the ATLAS experiment at the Large Hadron Collider has seen big changes to accommodate new types of distributed computing resources: clouds, HPCs, volunteer computers, and other external resources. While PanDA was originally designed for fairly homogeneous resources available through the Worldwide LHC Computing Grid, the new resources are heterogeneous, operate at diverse scales, and expose diverse interfaces. Up to a fifth of the resources available to ATLAS are of these new types and require special techniques for integration into PanDA. In this talk, we present the nature and scale of these resources and give an overview of the challenges faced, spanning infrastructure, software distribution, workload requirements, scaling requirements, workflow management, data management, network provisioning, and the associated software and computing facilities. We describe the strategies for integrating these heterogeneous resources into ATLAS and the new software components being developed in PanDA to use them efficiently. Plans for software and computing evolution to meet the needs of LHC operations and upgrades in the long term are also discussed.
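As a loose illustration only (the names and logic below are hypothetical and do not reflect PanDA's actual interfaces), a broker for such heterogeneous resources might route workloads by resource type and available capacity:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Resource:
    name: str
    kind: str         # "grid", "cloud", "hpc", or "volunteer"
    free_slots: int   # currently available job slots

def pick_resource(resources: list, cores_needed: int) -> Optional[Resource]:
    """Choose the first resource with enough free slots, preferring
    HPC, then grid, cloud, and volunteer resources, largest pool first."""
    order = {"hpc": 0, "grid": 1, "cloud": 2, "volunteer": 3}
    candidates = [r for r in resources if r.free_slots >= cores_needed]
    candidates.sort(key=lambda r: (order[r.kind], -r.free_slots))
    return candidates[0] if candidates else None

pool = [
    Resource("site_grid", "grid", 500),
    Resource("site_hpc", "hpc", 4096),
    Resource("site_cloud", "cloud", 128),
]
print(pick_resource(pool, 1024).name)  # -> site_hpc: the only pool fitting 1024 cores
```

A production brokerage system weighs far more factors (data locality, queue depths, walltime limits, per-site software availability); the sketch is meant only to convey that heterogeneous resource types call for type-aware dispatch.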


2019, Vol. 214, pp. 09009
Author(s):  
Philippe Charpentier

Although the LHC experiments have been designed and prepared since 1984, the challenge of LHC computing was only tackled seriously much later, at the end of the ‘90s. This was the time at which the Grid paradigm was emerging, and LHC computing had great hopes that most of its challenges would be solved by this new paradigm. The path to functional and efficient distributed computing systems turned out to be much more complex than anticipated. Most obstacles were nevertheless overcome, thanks to the introduction of new paradigms and a large investment of manpower from the experiments and from the supporting IT units (for middleware development and infrastructure setup). This contribution briefly outlines some of the biggest hopes and disillusions of these past 20 years and gives a brief outlook on coming trends.

