An Approach for Modeling and Ranking Node-Level Stragglers in Cloud Datacenters

Author(s):  
Xue Ouyang ◽  
Peter Garraghan ◽  
Changjian Wang ◽  
Paul Townend ◽  
Jie Xu
2021 ◽  
pp. 088541222199941
Author(s):  
Bokyong Shin

Although social capital is a relational concept, existing studies have focused less on measuring social relations. This article fills the gap by reviewing recent studies that used network measures grouped into three types according to the measurement level. The first group defined social capital as an individual asset and used node-level measures to explain personal benefits. The second group defined social capital as a collective asset and used graph-level measures to describe collective properties. The third group used subgraph-level measures to explain the development of social capital. This article offers a link between the concepts and measures of social capital.


2021 ◽  
Author(s):  
Víctor Costumero ◽  
Patricia Rosell Negre ◽  
Juan Carlos Bustamante ◽  
Paola Fuentes‐Claramonte ◽  
Jesús Adrián‐Ventura ◽  
...  

1995 ◽  
Vol 169 (4) ◽  
pp. 382-385 ◽  
Author(s):  
Yosuke Adachi ◽  
Tatsuo Oshiro ◽  
Toshiro Okuyama ◽  
Tatsuro Kamakura ◽  
Masaki Mori ◽  
...  

2013 ◽  
Vol 23 (04) ◽  
pp. 1340011 ◽  
Author(s):  
FAISAL SHAHZAD ◽  
MARKUS WITTMANN ◽  
MORITZ KREUTZER ◽  
THOMAS ZEISER ◽  
GEORG HAGER ◽  
...  

The road to exascale computing poses many challenges for the High Performance Computing (HPC) community. Each step on the exascale path is mainly the result of a higher level of parallelism of the basic building blocks (i.e., CPUs, memory units, networking components, etc.). The reliability of each of these basic components does not increase at the same rate as the rate of hardware parallelism. This results in a reduction of the mean time to failure (MTTF) of the whole system. A fault tolerance environment is thus indispensable to run large applications on such clusters. Checkpoint/Restart (C/R) is the classic and most popular method to minimize failure damage. Its ease of implementation makes it useful, but typically it introduces significant overhead to the application. Several efforts have been made to reduce the C/R overhead. In this paper we compare various C/R techniques for their overheads by implementing them on two different categories of applications. These approaches are based on parallel-file-system (PFS)-level checkpoints (synchronous/asynchronous) and node-level checkpoints. We utilize the Scalable Checkpoint/Restart (SCR) library for the comparison of node-level checkpoints. For asynchronous PFS-level checkpoints, we use the Damaris library, the SCR asynchronous feature, and application-based checkpointing via dedicated threads. Our baseline for overhead comparison is the naïve application-based synchronous PFS-level checkpointing method. A 3D lattice-Boltzmann (LBM) flow solver and a Lanczos eigenvalue solver are used as prototypical applications in which all the techniques considered here may be applied.
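As a rough illustration of the "application-based checkpointing via dedicated threads" idea mentioned in this abstract, the sketch below hands a copy of the application state to a background writer thread so that the compute loop does not block on file I/O. It is a minimal single-process stand-in written for this listing, not the paper's implementation, and it does not use the SCR or Damaris APIs. All names (`write_checkpoint`, `read_checkpoint`, `AsyncCheckpointer`, `run`) and the toy state dictionary are hypothetical.

```python
import copy
import os
import pickle
import queue
import threading


def write_checkpoint(state, path):
    """Write state to path: dump to a temporary file, then rename it into place."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)  # a reader never sees a half-written checkpoint


def read_checkpoint(path):
    """Load the most recently completed checkpoint."""
    with open(path, "rb") as f:
        return pickle.load(f)


class AsyncCheckpointer:
    """Hypothetical helper: a dedicated thread performs the checkpoint writes."""

    def __init__(self, path):
        self.path = path
        self._queue = queue.Queue(maxsize=1)  # at most one snapshot waiting
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            snapshot = self._queue.get()
            if snapshot is None:  # sentinel: stop the writer
                break
            write_checkpoint(snapshot, self.path)

    def submit(self, state):
        # Copy the state so the compute loop can keep mutating the original.
        self._queue.put(copy.deepcopy(state))

    def close(self):
        self._queue.put(None)
        self._thread.join()  # all queued snapshots are written before returning


def run(steps, path, every=10):
    """Toy compute loop that restarts from the last checkpoint if one exists."""
    state = read_checkpoint(path) if os.path.exists(path) else {"step": 0, "value": 0}
    ckpt = AsyncCheckpointer(path)
    try:
        while state["step"] < steps:
            state["value"] += state["step"]  # stand-in for real computation
            state["step"] += 1
            if state["step"] % every == 0:
                ckpt.submit(state)
    finally:
        ckpt.close()
    return state
```

In this sketch the compute loop pays only for the in-memory copy unless the previous write is still in flight, in which case `submit` blocks. A synchronous variant would call `write_checkpoint` directly inside the loop, which corresponds to the baseline the abstract describes as naïve.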

