Fault Tolerance in Cluster Computing System

Simulating the interactions between particles and matter in studies for developing X-ray detectors generally requires very long calculation times (up to several days or weeks). These times often seriously limit the success of the simulations and the accuracy of the simulated models. One of the tools used by the scientific community to perform these simulations is Geant4 (Geometry And Tracking) [2, 3]. Building on the experience gained in designing the AVES cluster computing system (Federici et al. [1]), the IAPS (Istituto di Astrofisica e Planetologia Spaziali, INAF) laboratories developed a cluster computing system dedicated to Geant4. The cluster is easy to use and easily expandable, and thanks to the design criteria adopted it achieves an excellent compromise between performance and cost. The management software developed for the cluster splits a single simulation instance across the available cores, allowing software written for serial computation to reach a computing speed similar to that obtainable from natively parallel software. Simulations carried out on the cluster showed a reduction in execution time by a factor of 20 to 60 compared with a single mid-range PC.
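The idea of splitting one serial simulation across the available cores can be sketched as follows. This is a minimal illustration, not the cluster's actual management software: `split_events`, `simulate_batch`, `run_split`, the absorption probability of 0.3, and the seed scheme are all hypothetical stand-ins, and the "simulation" is a toy Monte Carlo count rather than a Geant4 run.

```python
from concurrent.futures import ProcessPoolExecutor
import os
import random


def split_events(total_events, n_workers):
    """Divide total_events into n_workers batch sizes that differ by at most one."""
    base, extra = divmod(total_events, n_workers)
    return [base + (1 if i < extra else 0) for i in range(n_workers)]


def simulate_batch(args):
    """Stand-in for one serial simulation run (hypothetical; not Geant4).

    Counts how many of n_events random 'particles' are absorbed, using a
    separate seed per batch so that batches do not repeat each other.
    """
    n_events, seed = args
    rng = random.Random(seed)
    return sum(1 for _ in range(n_events) if rng.random() < 0.3)


def run_split(total_events, n_workers=None, base_seed=1234):
    """Run the batches in separate processes and merge the partial counts."""
    n_workers = n_workers or os.cpu_count() or 1
    batches = split_events(total_events, n_workers)
    jobs = [(n, base_seed + i) for i, n in enumerate(batches)]
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        return sum(pool.map(simulate_batch, jobs))


if __name__ == "__main__":
    # Each worker runs the unchanged serial routine on its own share of events.
    print(run_split(100_000))
```

The serial routine itself is left untouched; only the event count and the seed change per worker, which is why this kind of splitting suits event-based Monte Carlo codes whose events are statistically independent.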


A Comparative Analysis of Performance of Shared Memory Cluster Computing Interconnection Systems

Journal of Computer Networks and Communications, 2014, Vol 2014, pp. 1-9, DOI: 10.1155/2014/128438

Author(s): Minakshi Tripathy, C. R. Tripathy

Keyword(s): Comparative Analysis, Fault Tolerance, Load Balancing, Shared Memory, Cluster Computing, Distributed Shared Memory, System Size, Cluster Architecture, Analysis Of Performance, Made In

In the recent past, many types of shared memory cluster computing interconnection systems have been proposed. Each of these systems has its own advantages and limitations. As the system size of cluster interconnection systems increases, a comparative analysis of their performance measures becomes necessary. Cluster architecture, load balancing, and fault tolerance are some of the important aspects that need to be addressed. The comparison is needed in order to choose the best system for a particular application. In this paper, a detailed comparative study of four important and different classes of shared memory cluster architectures is presented. The systems studied are shared memory clusters, hierarchical shared memory clusters, distributed shared memory clusters, and virtual distributed shared memory clusters. These clusters are analyzed and compared on the basis of their architecture, load balancing, and fault tolerance. The results of the comparison are reported.


HAMR: A dataflow-based real-time in-memory cluster computing engine

The International Journal of High Performance Computing Applications, 2016, Vol 31 (5), pp. 361-374, DOI: 10.1177/1094342016672080, Cited By ~ 3

Author(s): Yao Wu, Long Zheng, Brian Heilig, Guang R Gao

Keyword(s): Big Data, Memory Management, High Performance, Cluster Computing, Programming Model, Distributed Processing, Large Data, Computing System, Fine Grain, Execution Model

As the attention given to big data grows, cluster computing systems for distributed processing of large data sets have become a mainstream and critical requirement in high performance distributed system research. One of the most successful systems is Hadoop, which uses MapReduce as its programming/execution model and uses disks as the intermediate storage medium to process huge volumes of data. Spark, as an in-memory computing engine, can solve iterative and interactive problems more efficiently. However, there is currently a consensus that these systems are not the final solutions to big data, owing to their MapReduce-like programming model, their synchronous execution model, their restriction to batch processing, and so on. A new solution, and in particular a fundamental evolution, is needed to bring big data solutions into a new era. In this paper, we introduce a new cluster computing system called HAMR, which supports both batch and streaming processing. To achieve better performance, HAMR integrates high performance computing approaches, i.e. dataflow fundamentals, into a big data solution. More specifically, HAMR is designed entirely around in-memory computing to reduce unnecessary disk access overhead; task scheduling and memory management are fine-grained in order to expose more parallelism; and asynchronous execution improves the efficiency of computation resource usage and also improves workload balance across the whole cluster. The experimental results show that HAMR can outperform Hadoop MapReduce and Spark by up to 19x and 7x respectively in the same cluster environment. Furthermore, HAMR can handle data sizes that scale well beyond the capabilities of Spark.
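The dataflow principle mentioned in the abstract, in which a task fires as soon as its inputs are available rather than waiting for a global stage barrier, can be illustrated with a small sketch. This is not HAMR's API or implementation: `run_dataflow`, the node names, and the graph are hypothetical, and the scheduler below is single-threaded for clarity.

```python
from collections import deque


def run_dataflow(nodes, sources):
    """Fire each node as soon as all of its inputs have values.

    nodes:   name -> (function, [input names])   (hypothetical graph format)
    sources: name -> initial value
    Returns the computed values and the order in which nodes fired.
    """
    values = dict(sources)
    # Inputs each node is still waiting for.
    pending = {name: set(inputs) - set(values)
               for name, (_, inputs) in nodes.items()}
    # Reverse edges: which nodes consume a given value.
    consumers = {}
    for name, (_, inputs) in nodes.items():
        for dep in set(inputs):
            consumers.setdefault(dep, []).append(name)

    ready = deque(name for name, missing in pending.items() if not missing)
    order = []
    while ready:
        name = ready.popleft()
        func, inputs = nodes[name]
        values[name] = func(*(values[i] for i in inputs))
        order.append(name)
        # A finished node may unblock its consumers immediately.
        for consumer in consumers.get(name, []):
            pending[consumer].discard(name)
            if not pending[consumer]:
                ready.append(consumer)
    return values, order


if __name__ == "__main__":
    graph = {
        "sum": (lambda a, b: a + b, ["a", "b"]),
        "double": (lambda a: 2 * a, ["a"]),
        "total": (lambda s, d: s + d, ["sum", "double"]),
    }
    values, order = run_dataflow(graph, {"a": 2, "b": 3})
    print(values["total"], order)
```

In a real engine the ready queue would feed a pool of workers, so independent nodes such as `sum` and `double` could run concurrently; the sketch only shows the firing rule.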
