Distributed Computing Software and Data Access Patterns in OSG Midscale Collaborations

In this paper we showcase how the Open Science Grid (OSG) supports midscale collaborations: the range of computing and storage scale in which multi-institutional researchers collaborate to execute their science workflows on the grid without dedicated technical support teams of their own. Collaboration Services enables such collaborations to take advantage of the distributed resources of the Open Science Grid by facilitating access to submission hosts, deploying their applications, and supporting their data management requirements. Distributed computing software adopted from large-scale collaborations, such as CVMFS, Rucio, and xCache, lowers the barrier for intermediate-scale research to integrate with existing infrastructure.

Optimizing Workflow Data Footprint

Scientific Programming, Vol 15 (4), pp. 249-268, 2007. DOI: 10.1155/2007/701609. Cited by ~19.

Author(s): Gurmeet Singh, Karan Vahi, Arun Ramakrishnan, Gaurang Mehta, Ewa Deelman, ...

Keyword(s): Data Storage, Large Scale, Open Science, Scale Production, Distributed Resources, Data Intensive, Large Scale Production, Data Files, The Cost, Open Science Grid

In this paper we examine the issue of optimizing disk usage and scheduling large-scale scientific workflows onto distributed resources where the workflows are data-intensive, requiring large amounts of data storage, and the resources have limited storage. Our approach is two-fold: we minimize the amount of space a workflow requires during execution by removing data files at runtime when they are no longer needed, and we demonstrate that workflows may have to be restructured to reduce their overall data footprint. We show the results of our data management and workflow restructuring solutions using a Laser Interferometer Gravitational-Wave Observatory (LIGO) application and an astronomy application, Montage, running on a large-scale production grid, the Open Science Grid. We show that while dynamic data cleanup alone reduces the data footprint of Montage by 48%, LIGO Scientific Collaboration workflows require additional restructuring to achieve a 56% reduction in data space usage. We also examine the cost of the workflow restructuring in terms of the application's runtime.
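
The runtime cleanup idea can be illustrated with a small sketch. It assumes that jobs run one at a time in a fixed topological order, that file sizes are known in advance, and that a file may be deleted as soon as its last consuming job finishes unless it is a final output to be kept. Under those assumptions the function below compares peak disk usage with and without cleanup. The `Job` type, the `peak_footprint` function, and the three-job example workflow with its file sizes are hypothetical illustrations, not the paper's implementation or data; the paper schedules onto distributed resources, which this sequential model ignores.

```python
from dataclasses import dataclass


@dataclass
class Job:
    # Hypothetical job description: names of files read and written.
    name: str
    inputs: list[str]
    outputs: list[str]


def peak_footprint(jobs, sizes, keep=frozenset(), cleanup=True):
    """Peak storage when `jobs` run sequentially in the given (topological) order.

    Assumption: a file is staged in just before its first reader runs, and
    with cleanup enabled it is deleted right after its last reader finishes,
    unless it is listed in `keep` (e.g. final outputs to be staged out).
    """
    # Index of the last job that reads each file.
    last_use = {}
    for i, job in enumerate(jobs):
        for f in job.inputs:
            last_use[f] = i

    on_disk = set()
    current = peak = 0
    for i, job in enumerate(jobs):
        # Stage in inputs not already present, then write outputs;
        # both coexist on disk while the job runs.
        for f in job.inputs + job.outputs:
            if f not in on_disk:
                on_disk.add(f)
                current += sizes[f]
        peak = max(peak, current)
        if cleanup:
            # Drop every file no later job will read (files never read get -1).
            for f in list(on_disk):
                if f not in keep and last_use.get(f, -1) <= i:
                    on_disk.remove(f)
                    current -= sizes[f]
    return peak


# Hypothetical three-stage chain; sizes are in arbitrary units.
jobs = [
    Job("A", ["raw"], ["mid1"]),
    Job("B", ["mid1"], ["mid2"]),
    Job("C", ["mid2"], ["result"]),
]
sizes = {"raw": 100, "mid1": 50, "mid2": 50, "result": 10}

print(peak_footprint(jobs, sizes, keep={"result"}, cleanup=False))
print(peak_footprint(jobs, sizes, keep={"result"}, cleanup=True))
```

For this hypothetical chain the peak drops from 210 to 150 units, because `raw`, `mid1`, and `mid2` are removed once their only consumers finish. Workflow restructuring, the paper's second technique, is not modeled here.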

Monitoring data access patterns in large-scale rendering

ACM SIGGRAPH 2014 Talks (SIGGRAPH '14), 2014. DOI: 10.1145/2614106.2614111.

Author(s): Mark Hills, Jim Vanns

Keyword(s): Large Scale, Data Access, Monitoring Data, Data Access Patterns, Access Patterns

Profiling Dynamic Data Access Patterns with Controlled Overhead and Quality

Proceedings of the 20th International Middleware Conference Industrial Track, 2019. DOI: 10.1145/3366626.3368125.

Author(s): SeongJae Park, Yunjae Lee, Heon Y. Yeom

Keyword(s): Data Access, Dynamic Data, Data Access Patterns, Access Patterns

Manycore Performance-Portability: Kokkos Multidimensional Array Library

Scientific Programming, Vol 20 (2), pp. 89-114, 2012. DOI: 10.1155/2012/917630. Cited by ~13.

Author(s): H. Carter Edwards, Daniel Sunderland, Vicki Porter, Chris Amsler, Sam Mish

Keyword(s): Programming Model, Engineering Application, Data Access, Memory Space, Performance Requirements, Application Programming, Multidimensional Array, And Performance, Data Access Patterns, Access Patterns

Large, complex scientific and engineering application codes have a significant investment in computational kernels that implement their mathematical models. Porting these kernels to the collection of modern manycore accelerator devices is a major challenge because these devices have diverse programming models, application programming interfaces (APIs), and performance requirements. The Kokkos Array programming model provides a library-based approach to implementing computational kernels that are performance-portable to CPU-multicore and GPGPU accelerator devices. This programming model is based on three fundamental concepts: (1) manycore compute devices, each with its own memory space, (2) data-parallel kernels, and (3) multidimensional arrays. Kernel execution performance, especially on NVIDIA® devices, is extremely dependent on data access patterns. The optimal data access pattern can differ between manycore devices, potentially leading to different implementations of computational kernels specialized for different devices. The Kokkos Array programming model supports performance-portable kernels by (1) separating data access patterns from computational kernels through a multidimensional array API and (2) introducing device-specific data access mappings when a kernel is compiled. An implementation of Kokkos Array is available through Trilinos [Trilinos website, http://trilinos.sandia.gov/, August 2011].
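
The separation of data access pattern from kernel can be sketched in a few lines. Below, a hypothetical `View2D` class picks its index-to-offset mapping at construction time, borrowing the names of the row-major ("right") and column-major ("left") layouts that modern Kokkos calls `LayoutRight` and `LayoutLeft`. The kernel `scale_rows` touches the array only through `view[i, j]`, so it runs unchanged with either mapping. This is a plain-Python illustration of the concept, assuming a 2-D array and sequential loops; it is not the Kokkos Array API, which is C++ and selects mappings when the kernel is compiled.

```python
class View2D:
    """Hypothetical 2-D array view; the index-to-offset mapping is chosen once."""

    def __init__(self, n0, n1, layout="right"):
        if layout not in ("left", "right"):
            raise ValueError("layout must be 'left' or 'right'")
        self.shape = (n0, n1)
        self.layout = layout
        self.data = [0.0] * (n0 * n1)  # flat backing storage

    def offset(self, i, j):
        n0, n1 = self.shape
        if self.layout == "right":
            return i * n1 + j  # row-major: the last index is stride-1
        return i + j * n0      # column-major: the first index is stride-1

    def __getitem__(self, idx):
        i, j = idx
        return self.data[self.offset(i, j)]

    def __setitem__(self, idx, value):
        i, j = idx
        self.data[self.offset(i, j)] = value


def fill(view):
    # Store a recognizable value 10*i + j at logical position (i, j).
    n0, n1 = view.shape
    for i in range(n0):
        for j in range(n1):
            view[i, j] = 10 * i + j


def scale_rows(view, factors):
    # Kernel written purely in terms of logical indices. Each (i, j) update is
    # independent, so a data-parallel runtime could run them concurrently;
    # here they run sequentially.
    n0, n1 = view.shape
    for i in range(n0):
        for j in range(n1):
            view[i, j] = view[i, j] * factors[i]


a = View2D(2, 3, "right")
b = View2D(2, 3, "left")
for v in (a, b):
    fill(v)
    scale_rows(v, [1, 2])

print(a.data)  # storage order differs between the two views...
print(b.data)
print(a[1, 2], b[1, 2])  # ...but the logical values agree
```

Both views hold the same logical values while their storage order differs. On a GPU, where consecutive threads typically take consecutive values of the parallel index `i`, the left mapping places those threads' accesses in adjacent memory; on a CPU thread that loops over `j` innermost, the right mapping gives unit-stride access. Choosing the mapping per device, rather than rewriting the kernel, is the kind of separation the abstract describes.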

Big Data Access Patterns

Big Data Application Architecture Q & A, pp. 57-68, 2013. DOI: 10.1007/978-1-4302-6293-0_5.

Author(s): Nitin Sawant, Himanshu Shah

Keyword(s): Big Data, Data Access, Data Access Patterns, Access Patterns

Molecular Structure Determination on the Grid

Grid and Cloud Computing, pp. 862-880, 2012. DOI: 10.4018/978-1-4666-0879-5.ch406.

Author(s): Russ Miller, Charles Weeks

Keyword(s): Molecular Structure, New York, Data Storage, New York State, Open Science, Data Sets, Distributed Resources, Data Repositories, Computational Resources, Open Science Grid

Grids represent an emerging technology that allows geographically- and organizationally-distributed resources (e.g., computer systems, data repositories, sensors, imaging systems, and so forth) to be linked in a fashion that is transparent to the user. The New York State Grid (NYS Grid) is an integrated computational and data grid that provides access to a wide variety of resources to users from around the world. NYS Grid can be accessed via a Web portal, where the users have access to their data sets and applications, but do not need to be made aware of the details of the data storage or computational devices that are specifically employed in solving their problems. Grid-enabled versions of the SnB and BnP programs, which implement the Shake-and-Bake method of molecular structure (SnB) and substructure (BnP) determination, respectively, have been deployed on NYS Grid. Further, through the Grid Portal, SnB has been run simultaneously on all computational resources on NYS Grid as well as on more than 1100 of the over 3000 processors available through the Open Science Grid.
