Efficient data transfer protocols for big data

Author(s):  
Brian Tierney ◽  
Ezra Kissel ◽  
Martin Swany ◽  
Eric Pouyoul
2020 ◽  
Vol 22 (2) ◽  
pp. 130-144
Author(s):  
Aiqin Hou ◽  
Chase Qishi Wu ◽  
Liudong Zuo ◽  
Xiaoyang Zhang ◽  
Tao Wang ◽  
...  

2018 ◽  
Vol 8 (11) ◽  
pp. 2216
Author(s):  
Jiahui Jin ◽  
Qi An ◽  
Wei Zhou ◽  
Jiakai Tang ◽  
Runqun Xiong

Network bandwidth is a scarce resource in big data environments, so data locality is a fundamental problem for data-parallel frameworks such as Hadoop and Spark. This problem is exacerbated in multicore server-based clusters, where multiple tasks running on the same server compete for the server’s network bandwidth. Existing approaches address this problem by scheduling computational tasks near the input data, taking into account the server’s free time, data placements, and data transfer costs. However, such approaches usually assign identical values to data transfer costs, even though a multicore server’s data transfer cost increases with the number of data-remote tasks. As a result, they fail to minimize data-processing time effectively. As a solution, we propose DynDL (Dynamic Data Locality), a novel data-locality-aware task-scheduling model that handles dynamic data transfer costs for multicore servers. DynDL offers greater flexibility than existing approaches by using a set of non-decreasing functions to evaluate dynamic data transfer costs. We also propose online and offline algorithms (based on DynDL) that minimize data-processing time and adaptively adjust data locality. Although DynDL is NP-complete (nondeterministic polynomial-time complete), we prove that the offline algorithm runs in quadratic time and generates optimal results for specific uses of DynDL. Using a series of simulations and real-world executions, we show that our algorithms perform 30% better in data-processing time than algorithms that do not consider dynamic data transfer costs. Moreover, they can adaptively adjust data localities based on the server’s free time, data placement, and network bandwidth, and can schedule tens of thousands of tasks within sub-second to second timescales.
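The idea of a transfer cost that grows with the number of data-remote tasks on a server can be illustrated with a small sketch. This is not the DynDL online or offline algorithm from the paper, whose details the abstract does not give; it is a simple greedy heuristic written under stated assumptions. The names `linear_cost` and `greedy_schedule`, the linear cost shape, and the fixed per-task processing time are all hypothetical. As a further simplification, remote tasks already placed on a server are not re-costed when another remote task joins them.

```python
from typing import Callable, Dict, Set, Tuple


def linear_cost(base: float, slope: float) -> Callable[[int], float]:
    """Build a non-decreasing transfer-cost function (hypothetical shape).

    The returned function maps the number of data-remote tasks on a
    server to the transfer cost paid by the newest remote task.
    """
    def cost(n_remote: int) -> float:
        return base + slope * max(n_remote - 1, 0)
    return cost


def greedy_schedule(
    tasks: Dict[str, Set[str]],
    free_time: Dict[str, float],
    transfer_cost: Callable[[int], float],
    proc_time: float = 1.0,
) -> Tuple[Dict[str, str], Dict[str, float]]:
    """Assign each task to the server where it would finish earliest.

    tasks maps a task name to the set of servers holding its input data.
    free_time maps a server name to the time at which it becomes idle.
    A task placed on a server without its data pays a transfer cost that
    depends on how many remote tasks that server would then be running.
    """
    free = dict(free_time)               # do not mutate the caller's dict
    remote_count = {s: 0 for s in free}  # data-remote tasks per server
    assignment: Dict[str, str] = {}

    for task, local_servers in tasks.items():
        best_server, best_finish = None, float("inf")
        for server in sorted(free):      # sorted for deterministic ties
            penalty = 0.0
            if server not in local_servers:
                penalty = transfer_cost(remote_count[server] + 1)
            finish = free[server] + proc_time + penalty
            if finish < best_finish:
                best_server, best_finish = server, finish
        assignment[task] = best_server
        free[best_server] = best_finish
        if best_server not in local_servers:
            remote_count[best_server] += 1

    return assignment, free
```

With two idle servers `A` and `B`, three tasks whose data all sits on `A`, and `linear_cost(0.5, 1.0)`, the sketch sends one task to `B` and then keeps the third on `A`, because a second remote task on `B` would pay the higher cost.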


2012 ◽  
Vol E95.D (12) ◽  
pp. 2852-2859
Author(s):  
Yutaka KAWAI ◽  
Adil HASAN ◽  
Go IWAI ◽  
Takashi SASAKI ◽  
Yoshiyuki WATASE

Author(s):  
Ewa Niewiadomska-Szynkiewicz ◽  
Michał P. Karpowicz

Progress in the life sciences, physical sciences and technology depends on efficient data mining and modern computing technologies. The rapid growth of data-intensive domains requires continuous development of new solutions for network infrastructure, servers and storage in order to address Big Data-related problems. Development of software frameworks, including smart calculation, communication management, data decomposition and allocation algorithms, is clearly one of the major technological challenges we face. Reduction in energy consumption is another challenge arising in connection with the development of efficient HPC infrastructures. This paper addresses the vital problem of energy-efficient high-performance distributed and parallel computing. An overview of recent technologies for Big Data processing is presented, with a focus on the most popular middleware and software platforms. Various energy-saving approaches are presented and discussed as well.


Author(s):  
Suriya Murugan ◽  
Sumithra M. G.

Cognitive radio has emerged as a promising candidate solution to improve spectrum utilization in next-generation wireless networks. Spectrum sensing is one of the main challenges encountered by cognitive radio, and the application of big data is a powerful way to solve various problems. As spectrum resources grow increasingly scarce, big-data-based prediction for cognitive radio is an inevitable trend. The signal data from various sources is analyzed using the big data cognitive radio framework, and efficient data analytics can be performed using different types of machine learning techniques. This chapter analyses the process of spectrum sensing in cognitive radio, the challenges of processing spectrum data, and the need for dynamic machine learning algorithms in the decision-making process.


2012 ◽  
pp. 502-516
Author(s):  
Muzhou Xiong ◽  
Hai Jin

In this chapter, two algorithms are presented for supporting efficient data transfer in the Grid environment. From a node’s perspective, multiple data transfer channels can be formed by selecting other nodes as relays. One algorithm requires the sender to be aware of global connection information, while the other does not. Experimental results indicate that both algorithms can transfer data efficiently under various circumstances.
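The relay idea can be sketched for the case where the sender does know the global connection information. The abstract does not describe the chapter's algorithms, so this is only an illustration under assumptions: the bandwidth table `bw`, the functions `pick_relays` and `split_bytes`, and the ranking of relays by bottleneck bandwidth are all hypothetical. The sketch also ignores the fact that relay paths may share the sender's uplink.

```python
from typing import Dict, List, Tuple

Bandwidth = Dict[str, Dict[str, float]]  # bw[a][b] = link bandwidth a -> b


def pick_relays(bw: Bandwidth, src: str, dst: str, k: int) -> List[Tuple[str, float]]:
    """Return up to k relay nodes ranked by bottleneck bandwidth.

    The bottleneck of the two-hop path src -> relay -> dst is the
    slower of its two links.
    """
    candidates = []
    for node in bw:
        if node in (src, dst):
            continue
        up = bw.get(src, {}).get(node, 0.0)
        down = bw.get(node, {}).get(dst, 0.0)
        bottleneck = min(up, down)
        if bottleneck > 0:
            candidates.append((node, bottleneck))
    # Highest bottleneck first; node name breaks ties deterministically.
    candidates.sort(key=lambda c: (-c[1], c[0]))
    return candidates[:k]


def split_bytes(total: int, channels: List[Tuple[str, float]]) -> Dict[str, int]:
    """Split total bytes across channels in proportion to their bandwidth."""
    total_bw = sum(b for _, b in channels)
    shares = {name: int(total * b / total_bw) for name, b in channels}
    # Rounding down can leave a few bytes over; give them to the fastest channel.
    fastest = max(channels, key=lambda c: c[1])[0]
    shares[fastest] += total - sum(shares.values())
    return shares
```

A sender would combine the direct link with the chosen relays, for example `split_bytes(size, [("direct", bw[src][dst])] + pick_relays(bw, src, dst, 2))`, so that faster channels carry proportionally more of the data.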

