Static Fault-Tolerant Strategy for High Performance Computing Platform

2014 ◽  
Vol 989-994 ◽  
pp. 1810-1813
Author(s):  
Yu Sun ◽  
Jun Liu

Ensuring computational correctness for parallel applications and improving the utilization of dynamic computing resources are important research issues in distributed computing systems. Building on a previous high-performance distributed computing system, a fault-tolerant task scheduler was developed that combines the "breathe" (heartbeat) mechanism, a fault-discovery mechanism, and a subtask-rescheduling mechanism. Experiments show that the fault-tolerant task scheduler performs well and ensures computational correctness even when some computing resources fail.
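The abstract does not describe how the three mechanisms are implemented. As an illustration only, the following minimal Python sketch shows one common way such mechanisms fit together: workers report heartbeats, a timeout marks silent workers as failed, and their subtasks are reassigned to the least-loaded surviving worker. The names `Scheduler`, `Worker`, `timeout`, and `detect_and_reschedule` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    name: str
    last_heartbeat: float = 0.0
    subtasks: list = field(default_factory=list)


class Scheduler:
    """Toy scheduler (hypothetical, not the paper's design): heartbeat
    tracking, timeout-based fault discovery, and subtask rescheduling."""

    def __init__(self, timeout):
        self.timeout = timeout  # assumed heartbeat timeout, in seconds
        self.workers = {}

    def register(self, name, now):
        self.workers[name] = Worker(name, last_heartbeat=now)

    def heartbeat(self, name, now):
        # A live worker calls this periodically to show it is still running.
        self.workers[name].last_heartbeat = now

    def assign(self, subtask):
        # Give the subtask to the least-loaded worker (ties broken by name).
        target = min(self.workers.values(),
                     key=lambda w: (len(w.subtasks), w.name))
        target.subtasks.append(subtask)
        return target.name

    def detect_and_reschedule(self, now):
        # Fault discovery: a worker silent for longer than `timeout` is failed.
        failed = [w for w in self.workers.values()
                  if now - w.last_heartbeat > self.timeout]
        for w in failed:
            del self.workers[w.name]
        # Rescheduling: move each orphaned subtask to a surviving worker.
        moved = {}
        for w in failed:
            for subtask in w.subtasks:
                if not self.workers:
                    raise RuntimeError("no surviving workers")
                moved[subtask] = self.assign(subtask)
        return moved


def demo():
    s = Scheduler(timeout=5.0)
    for name in ("a", "b", "c"):
        s.register(name, now=0.0)
    for t in range(6):
        s.assign(f"task{t}")
    # Workers a and b keep reporting; c goes silent.
    s.heartbeat("a", 8.0)
    s.heartbeat("b", 8.0)
    return s, s.detect_and_reschedule(now=10.0)
```

In `demo()`, worker `c` stops sending heartbeats, so its two subtasks are redistributed across `a` and `b`. A real scheduler would also need to handle results that arrive late from a worker wrongly presumed dead, which this sketch ignores.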

2012 ◽  
Vol 2012 ◽  
pp. 1-10 ◽  
Author(s):  
Ra Inta ◽  
David J. Bowman ◽  
Susan M. Scott

The nature of modern astronomy means that a number of interesting problems are computationally bound, and this situation is gradually worsening. Scientists, who increasingly compete for valuable resources on conventional high-performance computing (HPC) facilities (often with a limited customizable user environment), are looking to hardware acceleration solutions. We describe here a heterogeneous CPU/GPGPU/FPGA desktop computing system (the “Chimera”), built with commercial off-the-shelf components. We show that this platform may be a viable alternative solution to many common computationally bound problems found in astronomy, though not without significant challenges. The most significant bottleneck in pipelines involving real data is most likely to be the interconnect (in this case the PCI Express bus residing on the CPU motherboard). Finally, we speculate on the merits of our Chimera system across the entire landscape of parallel computing, through the analysis of representative problems from UC Berkeley’s “Thirteen Dwarves.”


2019 ◽  
Vol 16 (2) ◽  
pp. 768-772 ◽  
Author(s):  
R. Jothikumar ◽  
Kumar Subramaniam ◽  
Siva G. Shanmugam ◽  
S. Susi

Traditional voting systems have increasingly been replaced by electronic voting systems in most places. Electronic voting is efficient, but not efficient enough in terms of cost and capacity. High Performance Computing (HPC) in cloud computing is a relatively new concept that has been replacing traditional systems. HPC has been widely used in recent years because of its efficiency, reliability, speed, and cost, whereas traditional supercomputing systems involve high costs. The ability to integrate HPC with the cloud has enabled enormous growth in parallel processing and computing. Such a system may strengthen the case for electronic voting in higher-traffic scenarios with lower cost requirements. This paper proposes implementing a complete e-voting system integrated with both HPC and cloud computing.


2019 ◽  
Author(s):  
Weiming Hu ◽  
Guido Cervone ◽  
Vivek Balasubramanian ◽  
Matteo Turilli ◽  
Shantenu Jha

2013 ◽  
Vol 23 (04) ◽  
pp. 1340011 ◽  
Author(s):  
FAISAL SHAHZAD ◽  
MARKUS WITTMANN ◽  
MORITZ KREUTZER ◽  
THOMAS ZEISER ◽  
GEORG HAGER ◽  
...  

The road to exascale computing poses many challenges for the High Performance Computing (HPC) community. Each step on the exascale path is mainly the result of a higher level of parallelism of the basic building blocks (i.e., CPUs, memory units, networking components, etc.). The reliability of each of these basic components does not increase at the same rate as the rate of hardware parallelism. This results in a reduction of the mean time to failure (MTTF) of the whole system. A fault tolerance environment is thus indispensable to run large applications on such clusters. Checkpoint/Restart (C/R) is the classic and most popular method to minimize failure damage. Its ease of implementation makes it useful, but typically it introduces significant overhead to the application. Several efforts have been made to reduce the C/R overhead. In this paper we compare various C/R techniques for their overheads by implementing them on two different categories of applications. These approaches are based on parallel-file-system (PFS)-level checkpoints (synchronous/asynchronous) and node-level checkpoints. We utilize the Scalable Checkpoint/Restart (SCR) library for the comparison of node-level checkpoints. For asynchronous PFS-level checkpoints, we use the Damaris library, the SCR asynchronous feature, and application-based checkpointing via dedicated threads. Our baseline for overhead comparison is the naïve application-based synchronous PFS-level checkpointing method. A 3D lattice-Boltzmann (LBM) flow solver and a Lanczos eigenvalue solver are used as prototypical applications in which all the techniques considered here may be applied.
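As an illustration of the baseline the abstract names (application-based synchronous checkpointing), the following minimal Python sketch periodically writes the application state to a file and resumes from the last checkpoint after a failure. It is a single-process toy under stated assumptions, not the paper's implementation: it does not use SCR, Damaris, or a parallel file system, and the function names, the `interval` parameter, and the `fail_at` failure injection are hypothetical.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, state):
    """Write state to a temporary file, then atomically rename it into place,
    so a crash during the write never leaves a truncated checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)


def load_checkpoint(path):
    """Return the last saved state, or None if no checkpoint exists."""
    if not os.path.exists(path):
        return None
    with open(path, "rb") as f:
        return pickle.load(f)


def run(path, total_steps, interval, fail_at=None):
    """Sum the integers 0..total_steps-1, checkpointing synchronously every
    `interval` steps. `fail_at` simulates a crash just before that step."""
    state = load_checkpoint(path) or {"step": 0, "acc": 0}
    while state["step"] < total_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated node failure")
        state["acc"] += state["step"]
        state["step"] += 1
        if state["step"] % interval == 0:
            # The computation blocks here until the checkpoint is on disk;
            # this blocking time is the overhead the compared techniques
            # try to reduce.
            save_checkpoint(path, state)
    return state["acc"]
```

Calling `run(path, 10, 3, fail_at=7)` raises after the checkpoint at step 6 has been written. A second call, `run(path, 10, 3)`, resumes from that checkpoint and recomputes only the steps after it. An asynchronous variant would hand the write to a dedicated thread or process so that computation can continue while the checkpoint is flushed.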


2017 ◽  
Vol 33 (2) ◽  
pp. 119-130
Author(s):  
Vinh Van Le ◽  
Hoai Van Tran ◽  
Hieu Ngoc Duong ◽  
Giang Xuan Bui ◽  
Lang Van Tran

Metagenomics is a powerful approach to studying environmental samples that does not require the isolation and cultivation of individual organisms. One of the essential tasks in a metagenomic project is to identify the origin of reads, referred to as taxonomic assignment. Because each metagenomic project has to analyze large-scale datasets, the metagenomic assignment task is computationally intensive. This study proposes a parallel algorithm for the taxonomic assignment problem, called SeMetaPL, which aims to address this computational challenge. The proposed algorithm is evaluated with both simulated and real datasets on a high performance computing system. Experimental results demonstrate that the algorithm achieves good performance and uses the system's resources efficiently. The software implementing the algorithm and all test datasets can be downloaded at http://it.hcmute.edu.vn/bioinfo/metapro/SeMetaPL.html.

