Stork data scheduler: mitigating the data bottleneck in e-Science

Author(s):  
Tevfik Kosar ◽  
Mehmet Balman ◽  
Esma Yildirim ◽  
Sivakumar Kulasekaran ◽  
Brandon Ross

In this paper, we present the Stork data scheduler as a solution for mitigating the data bottleneck in e-Science and data-intensive scientific discovery. Stork focuses on planning, scheduling, monitoring and management of data placement tasks, and on application-level end-to-end optimization of networked input/output for petascale distributed e-Science applications. Unlike existing approaches, Stork treats data resources and the tasks related to data access and movement as first-class entities, just like computational resources and compute tasks, rather than as side-effects of computation. Stork provides unique features such as aggregation of data transfer jobs based on their source and destination addresses, and an application-level throughput estimation and optimization service. We describe how these two features are implemented in Stork and their effects on end-to-end data transfer performance.
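The job-aggregation idea described above can be sketched as grouping pending transfer jobs that share a source and destination host, so a single connection setup can serve many files. This is an illustrative simplification, not Stork's actual implementation; the job format and field names are hypothetical.

```python
from collections import defaultdict

def aggregate_transfers(jobs):
    """Group pending transfer jobs by (source host, destination host).

    'jobs' is a list of dicts with 'src' and 'dst' URLs (hypothetical
    format). Grouping mirrors, in spirit, Stork's aggregation of jobs
    that share endpoints, amortizing connection setup over many files.
    """
    groups = defaultdict(list)
    for job in jobs:
        src_host = job["src"].split("/")[2]  # host part of scheme://host/path
        dst_host = job["dst"].split("/")[2]
        groups[(src_host, dst_host)].append(job)
    return dict(groups)

jobs = [
    {"src": "gsiftp://a.example.org/f1", "dst": "gsiftp://b.example.org/f1"},
    {"src": "gsiftp://a.example.org/f2", "dst": "gsiftp://b.example.org/f2"},
    {"src": "gsiftp://c.example.org/f3", "dst": "gsiftp://b.example.org/f3"},
]
batches = aggregate_transfers(jobs)
# two batches: (a, b) with two jobs, (c, b) with one job
```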

2014 ◽  
Vol 22 (2) ◽  
pp. 173-185 ◽  
Author(s):  
Eli Dart ◽  
Lauren Rotman ◽  
Brian Tierney ◽  
Mary Hester ◽  
Jason Zurawski

The ever-increasing scale of scientific data has become a significant challenge for researchers who rely on networks to interact with remote computing systems and transfer results to collaborators worldwide. Despite the availability of high-capacity connections, scientists struggle with inadequate cyberinfrastructure that cripples data transfer performance and impedes scientific progress. The Science DMZ paradigm comprises a proven set of network design patterns that collectively address these problems for scientists. We explain the Science DMZ model, including network architecture, system configuration, cybersecurity, and performance tools, which together create an optimized network environment for science. We describe use cases from universities, supercomputing centers and research laboratories, highlighting the effectiveness of the Science DMZ model in diverse operational settings. In all, the Science DMZ model is a solid platform that supports any science workflow and flexibly accommodates emerging network technologies. As a result, the Science DMZ vastly improves collaboration, accelerating scientific discovery.
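One concrete piece of the performance-tuning work the Science DMZ model calls for is sizing TCP buffers to the path's bandwidth-delay product (BDP), so a single flow can fill a long, fast link. A minimal sketch of that standard calculation (not taken from the paper itself):

```python
def bdp_bytes(bandwidth_bps, rtt_seconds):
    """Bandwidth-delay product: the number of bytes that must be in
    flight (and hence buffered) to keep a path fully utilized."""
    return int(bandwidth_bps * rtt_seconds / 8)

# A 10 Gb/s path with 50 ms round-trip time needs roughly 62.5 MB
# of TCP buffer; default OS buffers are often orders of magnitude smaller.
buf = bdp_bytes(10e9, 0.050)  # 62,500,000 bytes
```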


2011 ◽  
Vol 135-136 ◽  
pp. 43-49
Author(s):  
Han Ning Wang ◽  
Wei Xiang Xu ◽  
Chao Long Jia

The application of high-speed railway data, an important component of China's transportation science data sharing, embodies the typical characteristics of data-intensive computing. A reasonable and effective data placement strategy is needed to deploy and execute data-intensive applications in the cloud computing environment. This paper analyzes and compares current data placement approaches, and proposes a hierarchical data placement strategy that combines a semi-definite programming algorithm with a dynamic interval mapping algorithm. The semi-definite programming algorithm is suited to placing files with multiple replications, ensuring that different replications of a file are placed on different storage devices, while the dynamic interval mapping algorithm provides better self-adaptability for the data storage system. Both theoretical analysis and experiments demonstrate that the hierarchical data placement strategy guarantees self-adaptability, data reliability, and high-speed data access for large-scale networks.
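The replica-placement constraint described above, that different replications of a file land on different storage devices, can be illustrated with a greedy stand-in for the paper's semi-definite programming step. The device names and load model here are hypothetical, chosen only to show the distinct-device invariant:

```python
def place_replicas(file_id, replicas, devices):
    """Place 'replicas' copies of a file on distinct storage devices.

    'devices' maps device name -> current load (hypothetical model).
    Greedily picks the least-loaded devices, never reusing one for the
    same file; a simplification of the paper's SDP-based placement.
    """
    if replicas > len(devices):
        raise ValueError("more replicas than devices")
    chosen = sorted(devices, key=devices.get)[:replicas]
    for d in chosen:
        devices[d] += 1  # account for the new copy
    return chosen

devices = {"dev0": 3, "dev1": 1, "dev2": 2, "dev3": 0}
placement = place_replicas("train_schedule.csv", 2, devices)
# two distinct, least-loaded devices: ['dev3', 'dev1']
```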


2013 ◽  
Vol 2013 ◽  
pp. 1-16 ◽  
Author(s):  
Shadi A. Issa ◽  
Romeo Kienzler ◽  
Mohamed El-Kalioby ◽  
Peter J. Tonellato ◽  
Dennis Wall ◽  
...  

Cloud computing provides a promising solution to the genomics data deluge problem resulting from the advent of next-generation sequencing (NGS) technology. Based on the concepts of “resources-on-demand” and “pay-as-you-go”, scientists with no or limited infrastructure can have access to scalable and cost-effective computational resources. However, the large size of NGS data causes a significant data transfer latency from the client’s site to the cloud, which presents a bottleneck for using cloud computing services. In this paper, we provide a streaming-based scheme to overcome this problem, where the NGS data is processed while being transferred to the cloud. Our scheme targets the wide class of NGS data analysis tasks, where the NGS sequences can be processed independently from one another. We also provide the elastream package that supports the use of this scheme with individual analysis programs or with workflow systems. Experiments presented in this paper show that our solution mitigates the effect of data transfer latency and saves both time and cost of computation.
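The core of the streaming scheme, processing reads as they arrive rather than after the full upload completes, can be sketched in a few lines. This is a toy illustration of the idea, not the elastream implementation; the chunk source and analysis function are placeholders.

```python
def stream_process(chunks, analyze):
    """Process NGS reads chunk-by-chunk as they arrive, overlapping
    computation with transfer instead of waiting for the full upload.

    Assumes reads are independent, as the paper's scheme requires.
    'chunks' is any iterator of read batches (e.g. fed by a socket).
    """
    results = []
    for chunk in chunks:
        for read in chunk:  # each read is analyzed independently
            results.append(analyze(read))
    return results

# toy example: compute GC content per read while "receiving" two chunks
gc = lambda read: sum(base in "GC" for base in read) / len(read)
out = stream_process([["ACGT", "GGCC"], ["ATAT"]], gc)
# [0.5, 1.0, 0.0]
```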


2014 ◽  
Vol 1 (1) ◽  
pp. 9-34
Author(s):  
Bobby Suryajaya

SKK Migas plans to apply end-to-end security based on Web Services Security (WS-Security) to Sistem Operasi Terpadu (SOT). However, there are no prototype or simulation results to support the plan that has already been communicated to many parties. This paper proposes an experiment that performs PRODML data transfer using WS-Security, altering the WSDL to include encryption and a digital signature. The experiment utilizes SoapUI and successfully loads the PRODML WSDL, altered with a WS-Policy based on X.509, to transfer a SOAP message.


Author(s):  
Kurmachalam Ajay Kumar ◽  
Saritha Vemuri ◽  
Ralla Suresh

High-speed bulk data transfer is an important part of many data-intensive scientific applications. TCP performs poorly when transferring large amounts of data over long-distance, high-speed dedicated network links, because end-system hardware is often incapable of saturating the bandwidth the network supports, leading to buffer overflow and packet loss. To overcome this, a Performance Adaptive UDP (PA-UDP) protocol is needed that dynamically maximizes performance across different systems. A mathematical model and accompanying algorithms are used for effective buffer and CPU management. PA-UDP outperforms other protocols in memory handling, packet-loss handling, and CPU utilization. With this protocol, bulk data transfer proceeds at high speed over dedicated network links.
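The buffer-aware rate control at the heart of such a protocol can be sketched as follows. This is a deliberately simplified, hypothetical policy (full rate while the receive buffer is mostly empty, linear back-off past half full), not PA-UDP's actual mathematical model:

```python
def next_send_rate(buffer_used, buffer_size, link_rate_bps):
    """Throttle the UDP send rate as the receiver's buffer fills.

    A simplified stand-in for PA-UDP's buffer-aware rate control:
    send at the full link rate while occupancy is at or below 50%,
    then back off linearly to zero as the buffer approaches full.
    """
    occupancy = buffer_used / buffer_size
    if occupancy <= 0.5:
        return link_rate_bps
    return link_rate_bps * max(0.0, 2.0 * (1.0 - occupancy))

# half-full buffer: full 1 Gb/s rate; 75% full: half rate
r1 = next_send_rate(50, 100, 1e9)   # 1e9
r2 = next_send_rate(75, 100, 1e9)   # 5e8
```

The point of tying the send rate to receiver-buffer occupancy is that a UDP sender, unlike TCP, gets no built-in congestion feedback, so the receiver must report its state to prevent the overflow and packet loss described above.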


2020 ◽  
Author(s):  
Felix Bachofer ◽  
Thomas Esch ◽  
Jakub Balhar ◽  
Martin Boettcher ◽  
Enguerran Boissier ◽  
...  

Urbanization is among the most relevant global trends affecting climate, the environment, and the health and socio-economic development of a majority of the global population. As such, it poses a major challenge for the current urban population and the well-being of the next generation. To understand how to take advantage of opportunities and properly mitigate the negative impacts of this change, we need precise and up-to-date information about urban areas. The Urban Thematic Exploitation Platform (UrbanTEP) is a collaborative system focused on processing earth observation (EO) data and delivering multi-source information on trans-sectoral urban challenges.

UrbanTEP is developed to provide end-to-end, ready-to-use solutions for a broad spectrum of users (service providers, experts and non-experts) to extract the unique information and indicators required for urban management and sustainability. Key components of the system are an open, web-based portal connected to distributed high-level computing infrastructures, providing key functionalities for

i) high-performance data access and processing,
ii) modular and generic state-of-the-art pre-processing, analysis, and visualization,
iii) customized development and sharing of algorithms, products and services, and
iv) networking and communication.

The service and product portfolio provides access to the archives of the Copernicus and Landsat missions, Datacube technology, DIAS processing environments, and premium products such as the World Settlement Footprint (WSF). External service providers and researchers can make use of on-demand processing of new data products and can develop and deploy new processors. The onboarding of service providers, developers and researchers is supported by the Network of Resources programme of the European Space Agency (ESA) and the OCRE initiative of the European Commission.

To provide end-to-end solutions, the VISAT tool on UrbanTEP allows users to analyze and visualize project-related geospatial content and to develop storylines that communicate research output to customers and stakeholders effectively. Multiple visualizations (scopes) are already predefined. One available scope illustrates the exploitation of the WSF-Evolution dataset by analyzing settlement and population development for South-East Asian countries from 1985 to 2015 in the context of the Sustainable Development Goal (SDG) indicator 11.3.1. Other open scopes focus on topics such as urban green space, functional urban areas, land use, and urban heat island modelling.


2014 ◽  
Vol 513 (4) ◽  
pp. 042044 ◽  
Author(s):  
L A T Bauerdick ◽  
K Bloom ◽  
B Bockelman ◽  
D C Bradley ◽  
S Dasu ◽  
...  
