Efficient Big Data Transfer Using Bandwidth Reservation Service in High-Performance Networks

Author(s): Liudong Zuo, Michelle Mengxia Zhu
2019, Vol. E102.D (8), pp. 1478-1488
Author(s): Eun-Sung Jung, Si Liu, Rajkumar Kettimuthu, Sungwook Chung

Author(s): Daqing Yun, Chase Q. Wu

High-performance networks featuring advance bandwidth reservation have been developed and deployed to support big data transfer in extreme-scale scientific applications. The performance of such big data transfer largely depends on the transport protocol being used. For a given protocol in a given network environment, different parameter settings may lead to very different performance, and the default settings often do not yield the best results. It is, however, impractical to exhaustively search the large parameter space of a transport protocol for a suitable set of parameter values. This chapter proposes a stochastic approximation-based transport profiler, namely FastProf, to quickly determine the optimal operational zone of a protocol over dedicated connections. The proposed method is evaluated using both emulations based on real-life measurements and experiments over physical connections. The results show that FastProf significantly reduces the profiling overhead while achieving transport performance comparable to that of the exhaustive search-based approach.
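To make the idea concrete, the sketch below shows how a stochastic approximation loop can profile two transport parameters (block size and number of parallel streams) using only two throughput probes per iteration. It uses SPSA-style gradient estimates over a normalized parameter space and a synthetic, noisy throughput() function; the parameter choice, ranges, and objective are illustrative assumptions, not the actual FastProf algorithm or its measurement procedure.

    # Illustrative sketch only: SPSA-style stochastic approximation for tuning
    # two transport parameters. The throughput() function is a synthetic,
    # hypothetical stand-in for a real timed transfer measurement.
    import random

    # Physical ranges for the two parameters being profiled (assumed values).
    BLOCK_KB = (64.0, 16384.0)   # block size in KB
    STREAMS  = (1.0, 64.0)       # number of parallel streams

    def to_physical(theta):
        """Map normalized [0,1] parameters to physical units."""
        b = BLOCK_KB[0] + theta[0] * (BLOCK_KB[1] - BLOCK_KB[0])
        s = STREAMS[0]  + theta[1] * (STREAMS[1]  - STREAMS[0])
        return b, s

    def throughput(theta):
        """Hypothetical noisy throughput (Gbps) with a single interior optimum."""
        b, s = to_physical(theta)
        return (10.0
                - 5.0 * ((b - 4096.0) / 16384.0) ** 2
                - 5.0 * ((s - 8.0) / 64.0) ** 2
                + random.gauss(0.0, 0.2))      # measurement noise

    def clamp(x):
        return min(max(x, 0.0), 1.0)

    def spsa_profile(iterations=40, a=0.2, c=0.1):
        theta = [0.5, 0.5]                             # start mid-range
        for k in range(1, iterations + 1):
            ak, ck = a / k ** 0.602, c / k ** 0.101    # standard SPSA gain decay
            delta = [random.choice([-1.0, 1.0]) for _ in theta]
            plus  = [clamp(t + ck * d) for t, d in zip(theta, delta)]
            minus = [clamp(t - ck * d) for t, d in zip(theta, delta)]
            diff  = throughput(plus) - throughput(minus)    # two probes per step
            theta = [clamp(t + ak * diff / (2.0 * ck * d))  # gradient ascent
                     for t, d in zip(theta, delta)]
        return to_physical(theta)

    if __name__ == "__main__":
        block_kb, streams = spsa_profile()
        print(f"estimated operational zone: ~{block_kb:.0f} KB blocks, ~{streams:.0f} streams")

In a real profiler, throughput() would be an actual timed transfer over the dedicated connection; those transfers are exactly the expensive measurements whose number a stochastic approximation approach tries to minimize relative to an exhaustive sweep.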


2019, Vol. 6 (1)
Author(s): Mahdi Torabzadehkashi, Siavash Rezaei, Ali HeydariGorji, Hosein Bobarshad, Vladimir Alves, ...

Abstract: In the era of big data applications, the demand for more sophisticated data centers and high-performance data processing mechanisms is increasing drastically. Data originally reside in storage systems; to process them, application servers must fetch them from storage devices, which imposes a data-movement cost that grows with the distance between the processing engines and the data. This is the key motivation for distributed processing platforms such as Hadoop, which move processing closer to the data. Computational storage devices (CSDs) push the "move processing to data" paradigm to its ultimate boundary by deploying embedded processing engines inside storage devices. In this paper, we introduce Catalina, an efficient and flexible computational storage platform that provides a seamless environment for processing data in place. Catalina is the first CSD equipped with a dedicated application processor running a full-fledged operating system that gives applications filesystem-level data access, so a broad spectrum of applications can be ported to run on Catalina CSDs. To the best of our knowledge, these features make Catalina the only in-storage processing platform that can be seamlessly deployed in clusters to run distributed applications such as Hadoop MapReduce and HPC applications in place, without any modification to the underlying distributed processing framework. As a proof of concept, we build a fully functional Catalina prototype and a platform equipped with 16 Catalina CSDs, and run Intel HiBench Hadoop and HPC benchmarks to investigate the benefits of deploying Catalina CSDs in distributed processing environments. The experimental results show up to 2.2× higher performance and 4.3× lower energy consumption for the Hadoop MapReduce benchmarks. Additionally, thanks to the NEON SIMD engines, the performance and energy efficiency of DFT algorithms improve by up to 5.4× and 8.9×, respectively.
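As a rough illustration of the kind of unmodified distributed workload the abstract refers to, the sketch below is a minimal word-count job written for Hadoop Streaming. The script name and the map/reduce phase selection are hypothetical, and HiBench's MapReduce benchmarks are Java jobs rather than this Python example; the point is only that such a job is submitted to the cluster exactly as usual, whether the worker nodes are conventional servers or Catalina CSDs.

    #!/usr/bin/env python3
    # wordcount_streaming.py -- a minimal Hadoop Streaming word-count sketch.
    # Run as the mapper:  wordcount_streaming.py map
    # Run as the reducer: wordcount_streaming.py reduce
    import sys
    from itertools import groupby

    def map_phase():
        # Emit one "word<TAB>1" record per token read from stdin.
        for line in sys.stdin:
            for word in line.split():
                print(f"{word}\t1")

    def reduce_phase():
        # Hadoop delivers mapper output sorted by key, so consecutive records
        # with the same word can simply be summed.
        records = (line.rstrip("\n").split("\t", 1) for line in sys.stdin)
        for word, group in groupby(records, key=lambda kv: kv[0]):
            print(f"{word}\t{sum(int(count) for _, count in group)}")

    if __name__ == "__main__":
        map_phase() if sys.argv[1:] == ["map"] else reduce_phase()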

