Rewiring 2 Links Is Enough: Accelerating Failure Recovery in Production Data Center Networks

Author(s):  
Guo Chen ◽  
Youjian Zhao ◽  
Dan Pei ◽  
Dan Li
2017 ◽  
Vol 25 (4) ◽  
pp. 1940-1953 ◽  
Author(s):  
Guo Chen ◽  
Youjian Zhao ◽  
Hailiang Xu ◽  
Dan Pei ◽  
Dan Li

Electronics ◽  
2020 ◽  
Vol 9 (8) ◽  
pp. 1187
Author(s):  
Yunhe Cui ◽  
Qing Qian ◽  
Guowei Shen ◽  
Chun Guo ◽  
Saifei Li

As a repository that holds computing, storage, network, and other facilities, the Software Defined Data Center (SDDC) provides computing and storage resources to users. For an SDDC, it is important to provide continuous service. Hence, to achieve high reliability in Software Defined Data Center Networks (SDDCNs), a network failure recovery method for software defined data center networks (REVERT) is proposed. REVERT classifies the network failures that occur in SDDCNs into three types: switch failures, failures of links among switches, and failures of links between switches and servers. In particular, besides recovering from switch failures and failures of links between switches, REVERT can also recover from failures of links between switches and servers. To achieve that, REVERT includes a failure preprocessing method that classifies network failures, a data structure for storing and finding the affected flows, a server cluster agent for communicating with the server clustering algorithm, and a routing path calculation method. REVERT has been implemented and evaluated on the RYU controller and Mininet using three routing algorithms. Compared with the link usage before failure recovery, when there are more than 200 flows in the network, the mean link usage increases only slightly, by about 1.83 percent. More importantly, the evaluation results demonstrate that, in addition to recovering from switch failures and intra-topology link failures, REVERT can successfully recover from failures of links between servers and edge switches.
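
The three failure types and the affected-flow lookup described in this abstract can be illustrated with a minimal Python sketch. This is not REVERT's actual implementation: the names `FailureType`, `classify_failure`, and `FlowIndex` are hypothetical, and the sketch assumes a failed element is reported either as a switch name or as a pair of node names.

```python
from collections import defaultdict
from enum import Enum


class FailureType(Enum):
    """The three failure classes named in the abstract."""
    SWITCH = "switch"
    SWITCH_LINK = "link between switches"
    SERVER_LINK = "link between a switch and a server"


def classify_failure(element, switches):
    """Classify a failed element (hypothetical helper).

    `element` is a switch name (str) or a (u, v) tuple describing a link.
    `switches` is the set of all switch names in the topology.
    """
    if isinstance(element, str):
        return FailureType.SWITCH
    u, v = element
    if u in switches and v in switches:
        return FailureType.SWITCH_LINK
    return FailureType.SERVER_LINK


class FlowIndex:
    """Assumed data structure: maps each node and link to the flows crossing it."""

    def __init__(self):
        self._by_node = defaultdict(set)
        self._by_link = defaultdict(set)

    def add_flow(self, flow_id, path):
        """Register a flow together with the ordered list of nodes on its path."""
        for node in path:
            self._by_node[node].add(flow_id)
        for u, v in zip(path, path[1:]):
            # frozenset makes the link key independent of direction
            self._by_link[frozenset((u, v))].add(flow_id)

    def affected(self, element):
        """Return the ids of flows that traverse the failed node or link."""
        if isinstance(element, str):
            return set(self._by_node.get(element, ()))
        return set(self._by_link.get(frozenset(element), ()))


switches = {"s1", "s2", "s3"}
index = FlowIndex()
index.add_flow("f1", ["h1", "s1", "s2", "h2"])
index.add_flow("f2", ["h1", "s1", "s3", "h3"])

failed = ("s1", "s2")
print(classify_failure(failed, switches), index.affected(failed))
```

Indexing flows by node and by link means the set of flows to reroute can be found with a single dictionary lookup once the failure has been classified; how REVERT itself stores flows is not specified in the abstract.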


2016 ◽  
Vol E99.B (11) ◽  
pp. 2361-2372 ◽  
Author(s):  
Chang RUAN ◽  
Jianxin WANG ◽  
Jiawei HUANG ◽  
Wanchun JIANG

Author(s):  
Jiawei Huang ◽  
Shiqi Wang ◽  
Shuping Li ◽  
Shaojun Zou ◽  
Jinbin Hu ◽  
...  

Abstract: Modern data center networks typically adopt multi-rooted tree topologies such as leaf-spine and fat-tree to provide high bisection bandwidth. Load balancing is critical to achieving low latency and high throughput. Although per-packet schemes such as Random Packet Spraying (RPS) can achieve high network utilization and near-optimal tail latency in symmetric topologies, they are prone to causing significant packet reordering and degrading network performance. Some coding-based schemes have been proposed to alleviate the problem of packet reordering and loss. Unfortunately, these schemes ignore the traffic characteristics of data center networks and cannot achieve good network performance. In this paper, we propose a Heterogeneous Traffic-aware Partition Coding scheme named HTPC to eliminate the impact of packet reordering and improve the performance of short and long flows. HTPC smoothly adjusts the number of redundant packets based on multi-path congestion information and traffic characteristics so that the tailing probability of short flows and the timeout probability of long flows can be reduced. Through a series of large-scale NS2 simulations, we demonstrate that HTPC reduces average flow completion time by up to 60% compared with state-of-the-art mechanisms.

