Epidemic failure detection and consensus for extreme parallelism

Author(s): Amogh Katti, Giuseppe Di Fatta, Thomas Naughton, Christian Engelmann

Future extreme-scale high-performance computing systems will be required to operate under frequent component failures. The MPI Forum’s User Level Failure Mitigation proposal has introduced an operation, MPI_Comm_shrink, to synchronize the surviving processes on the list of failed processes, so that applications can continue to execute even in the presence of failures by adopting algorithm-based fault-tolerance techniques. The MPI_Comm_shrink operation requires a failure detection and consensus algorithm. This paper presents three novel failure detection and consensus algorithms based on gossiping. Stochastic pinging is used to quickly detect failures during the execution of the algorithm; detected failures are then disseminated to all fault-free processes in the system, and agreement on the failure list is reached using three different consensus techniques. The proposed algorithms were implemented and tested using the Extreme-scale Simulator. The results show that stochastic pinging detects all failures in the system. In all three algorithms, the number of gossip cycles needed to reach global consensus scales logarithmically with the system size. The second algorithm also shows better scalability in terms of memory and network bandwidth usage and a perfect synchronization in achieving global consensus. The third algorithm is a three-phase distributed failure detection and consensus algorithm that provides consistency guarantees even in very large and extreme-scale systems while remaining memory- and bandwidth-efficient.
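The gossip scheme described above lends itself to a compact simulation. Below is a minimal sketch, assuming a synchronous cycle model and an oracle for which processes have actually failed; the function name, parameters, and data structures are illustrative and not taken from the paper.

```python
import random

def gossip_failure_detection(n=64, failed=frozenset({3, 17}), seed=0, max_cycles=100):
    """Toy simulation of gossip-based failure detection and dissemination.

    Each cycle, every alive process pings one random peer (stochastic
    pinging) and merges its local suspect list with one random gossip
    partner.  Returns the number of cycles until every alive process
    knows the full failed set, i.e. global consensus on the failures.
    """
    rng = random.Random(seed)
    alive = [p for p in range(n) if p not in failed]
    suspects = {p: set() for p in alive}        # each process's local view

    for cycle in range(1, max_cycles + 1):
        for p in alive:
            # Stochastic pinging: an unanswered ping marks the target as failed.
            target = rng.randrange(n)
            if target in failed:
                suspects[p].add(target)
            # Gossip exchange: both partners end up with the merged suspect list.
            partner = rng.choice(alive)
            merged = suspects[p] | suspects[partner]
            suspects[p] = suspects[partner] = merged
        if all(suspects[p] == set(failed) for p in alive):
            return cycle
    return None

print(gossip_failure_detection())   # e.g. consensus after a handful of cycles
```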

2017, Vol. 26 (03), pp. 1750002
Author(s): Fouad Hanna, Lionel Droz-Bartholet, Jean-Christophe Lapayre

The consensus problem has become a key issue in the field of collaborative telemedicine systems because of the need to guarantee the consistency of shared data. In this paper, we focus on the performance of consensus algorithms. First, we studied the best-known algorithms in the literature. Experiments with these algorithms allowed us to propose a new algorithm that enhances the performance of consensus in different situations. In 2014, we presented our initial ideas for enhancing the performance of consensus algorithms, but the proposed solution gave only moderate results. The goal of this paper is to present a new, enhanced consensus algorithm, named Fouad, Lionel and J.-Christophe (FLC). The new algorithm is built on the architecture of the Mostefaoui-Raynal (MR) consensus algorithm and integrates new features and some known techniques in order to improve the performance of consensus when process crashes are present in the system. Results from our experiments on the Neko simulation platform show that the FLC algorithm gives the best performance with a multicast network model in two scenarios: a first scenario with no process crashes or wrong suspicions, and a second in which multiple simultaneous process crashes take place in the system.
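As a rough orientation only, the sketch below shows one round of a generic rotating-coordinator consensus scheme of the kind MR-style algorithms build on; it is a simplified illustration under assumed data structures, not the FLC or MR algorithm itself.

```python
def rotating_coordinator_round(estimates, suspected, round_no):
    """One schematic round of coordinator-based consensus.

    estimates : dict mapping process id -> current estimate
    suspected : set of process ids currently suspected by the failure detector
    Returns (decided_value_or_None, updated_estimates).
    """
    procs = sorted(estimates)
    n = len(procs)
    coord = procs[round_no % n]                 # rotating coordinator
    if coord in suspected:
        return None, estimates                  # give up on this round, try the next
    proposal = estimates[coord]
    new_estimates = dict(estimates)
    acks = 0
    for p in procs:
        if p not in suspected:                  # non-suspected processes adopt and ack
            new_estimates[p] = proposal
            acks += 1
    decided = proposal if acks > n // 2 else None   # decide on a majority of acks
    return decided, new_estimates

# Example: process 0 is suspected, so the round-1 coordinator (process 1) decides.
print(rotating_coordinator_round({0: "a", 1: "b", 2: "c"}, suspected={0}, round_no=1))
```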


Author(s): M. M. S. Khan, M. S. Arifin, M. H. Rahaman, I. K. Amin, M. R. T. Hossain, ...
Keyword(s):

Author(s): Simon McIntosh–Smith, Rob Hunt, James Price, Alex Warwick Vesztrocy

High-performance computing systems continue to increase in size in the quest for ever higher performance. The resulting increase in electronic component count, coupled with the decrease in feature sizes of the silicon manufacturing processes used to build these components, may make future exascale systems more susceptible to soft errors caused by cosmic radiation than current high-performance computing systems. Through the use of techniques such as hardware-based error-correcting codes and checkpoint-restart, many of these faults can be mitigated, at the cost of increased hardware overhead, run-time, and energy consumption that can be as much as 10–20%. Some predictions expect these overheads to continue to grow over time. For extreme-scale systems, these overheads will represent megawatts of power consumption and millions of dollars in additional hardware costs, which could potentially be avoided with more sophisticated fault-tolerance techniques. In this paper we present new software-based fault-tolerance techniques that can be applied to one of the most important classes of software in high-performance computing: iterative sparse matrix solvers. Our new techniques enable us to exploit knowledge of the structure of sparse matrices in such a way as to improve the performance, energy efficiency, and fault tolerance of the overall solution.
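One widely known software technique in this space is protecting the sparse matrix-vector product with checksums; the sketch below illustrates that general idea and is not necessarily the authors' method. The checksum vector, tolerance, and error handling are assumptions.

```python
import numpy as np
from scipy.sparse import random as sparse_random

def checked_spmv(A_csr, colsum, x, tol=1e-8):
    """Sparse matrix-vector product with a column-sum checksum test.

    colsum is precomputed once as 1^T A; for a fault-free product,
    sum(y) must equal colsum @ x, so a mismatch flags a soft error in
    A's stored values or in the computation of y.
    """
    y = A_csr @ x
    if abs(y.sum() - colsum @ x) > tol * (np.abs(y).sum() + 1.0):
        raise RuntimeError("soft error detected in SpMV: recompute or roll back")
    return y

# Build a random sparse matrix once and keep its column-sum checksum alongside it.
A = sparse_random(1000, 1000, density=0.01, format="csr", random_state=0)
colsum = np.asarray(A.sum(axis=0)).ravel()      # 1^T A, computed once
x = np.ones(1000)
y = checked_spmv(A, colsum, x)
```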


2019, Vol. 214, pp. 08009
Author(s): Matthias J. Schnepf, R. Florian von Cube, Max Fischer, Manuel Giffels, Christoph Heidecker, ...

Demand for computing resources in high energy physics (HEP) shows highly dynamic behavior, while the resources provided by the Worldwide LHC Computing Grid (WLCG) remain static. It has become evident that opportunistic resources such as High Performance Computing (HPC) centers and commercial clouds are well suited to cover peak loads. However, utilizing these resources introduces new levels of complexity: resources need to be managed highly dynamically, and HEP applications require a very specific software environment that is usually not provided at opportunistic resources. Furthermore, limited network bandwidth can cause I/O-intensive workflows to run inefficiently. The key component for dynamically running HEP applications on opportunistic resources is the use of modern container and virtualization technologies. Based on these technologies, the Karlsruhe Institute of Technology (KIT) has developed ROCED, a resource manager that dynamically integrates and manages a variety of opportunistic resources. In combination with ROCED, the HTCondor batch system acts as a powerful single entry point to all available computing resources, leading to a seamless and transparent integration of opportunistic resources into HEP computing. KIT is currently improving resource management and job scheduling by focusing on the I/O requirements of individual workflows, the available network bandwidth, and scalability. For these reasons, we are developing a new resource manager called TARDIS. In this paper, we give an overview of the technologies used, the dynamic management and integration of resources, and the status of the I/O-based resource and job scheduling.
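The scale-up decision such a resource manager has to make can be sketched as follows; the class and function names are hypothetical and do not reflect the ROCED, TARDIS, or HTCondor code bases.

```python
from dataclasses import dataclass

@dataclass
class SiteState:
    """Hypothetical view of one opportunistic site (HPC centre or cloud)."""
    name: str
    booted: int          # slots already integrated into the batch system
    booting: int         # slots requested but not yet available
    max_slots: int

def plan_scaling(idle_jobs: int, idle_slots: int, sites: list[SiteState]) -> dict[str, int]:
    """Decide how many additional slots to request per site.

    A very simplified demand model: request enough slots to cover the idle
    jobs that the currently idle and already-booting slots cannot absorb.
    """
    demand = max(0, idle_jobs - idle_slots - sum(s.booting for s in sites))
    requests: dict[str, int] = {}
    for site in sites:
        if demand <= 0:
            break
        headroom = site.max_slots - site.booted - site.booting
        grant = min(headroom, demand)
        if grant > 0:
            requests[site.name] = grant
            demand -= grant
    return requests

sites = [SiteState("hpc-centre", booted=200, booting=20, max_slots=500),
         SiteState("cloud", booted=0, booting=0, max_slots=1000)]
print(plan_scaling(idle_jobs=400, idle_slots=50, sites=sites))
# {'hpc-centre': 280, 'cloud': 50}
```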


2020, Vol. 65 (2), pp. 66
Author(s): M. Petrescu, R. Petrescu

The implementation of a fault-tolerant system requires some type of consensus algorithm for correct operation. From Paxos to View-stamped Replication and Raft, multiple algorithms have been developed to handle this problem. This paper presents and compares the Raft algorithm and Apache Kafka, a distributed messaging system which, although operating at a higher level, implements many concepts present in Raft (strong leadership, an append-only log, log compaction, etc.). This shows that mechanisms conceived to handle one class of problems (consensus algorithms) are very useful for handling a larger category of problems in the context of distributed systems.
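The two shared ideas the comparison rests on, an append-only log and key-based log compaction, can be sketched as follows; this toy class is illustrative and not Kafka's or Raft's actual implementation.

```python
class AppendOnlyLog:
    """Toy append-only log with Kafka-style key-based compaction."""

    def __init__(self):
        self.entries = []                 # list of (offset, key, value)
        self.next_offset = 0

    def append(self, key, value):
        """Entries are only ever added at the tail, never modified in place."""
        self.entries.append((self.next_offset, key, value))
        self.next_offset += 1
        return self.next_offset - 1

    def compact(self):
        """Keep only the latest value per key, preserving original offsets."""
        latest = {}
        for offset, key, value in self.entries:
            latest[key] = (offset, key, value)
        self.entries = sorted(latest.values())

log = AppendOnlyLog()
log.append("x", 1); log.append("y", 2); log.append("x", 3)
log.compact()
print(log.entries)        # [(1, 'y', 2), (2, 'x', 3)]
```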


Author(s): J. Lamterkati, L. Ouboubker, M. Khafallah, A. El afia

The study made in this paper concerns the use of voltage-oriented control (VOC) of a three-phase pulse width modulation (PWM) rectifier with constant switching frequency. This control method is called voltage-oriented control with space vector modulation (VOC-SVM). The proposed control scheme is founded on the transformation between the stationary (α-β) and the synchronously rotating (d-q) coordinate systems. It is based on two cascaded control loops, in which a fast inner loop controls the grid current and an outer loop controls the DC-link voltage, so that the DC-bus voltage is maintained at the desired level and unity power factor operation is ensured. In this way, both the steady-state performance and the robustness against load disturbances of the PWM rectifier are improved. The proposed scheme has been implemented and simulated in the MATLAB/Simulink environment. The control system of the VOC-SVM strategy has been built on a dSPACE system with a DS1104 controller board. The results obtained show the validity of the model and its control method. Compared with the conventional SPWM method, VOC-SVM ensures high performance and fast transient response.
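The (α-β) and (d-q) descriptions rest on the standard Clarke and Park transforms, sketched below under the amplitude-invariant scaling convention; the grid angle theta is assumed to come from a PLL and is not part of the paper's listing.

```python
import numpy as np

def clarke(i_a, i_b, i_c):
    """Amplitude-invariant Clarke transform: abc -> stationary alpha-beta frame."""
    i_alpha = (2.0 / 3.0) * (i_a - 0.5 * i_b - 0.5 * i_c)
    i_beta = (1.0 / np.sqrt(3.0)) * (i_b - i_c)
    return i_alpha, i_beta

def park(i_alpha, i_beta, theta):
    """Park transform: alpha-beta -> synchronously rotating d-q frame.

    theta is the grid-voltage angle; aligning the d axis with the grid
    voltage makes i_d the active and i_q the reactive current, so unity
    power factor corresponds to regulating i_q to zero.
    """
    i_d = np.cos(theta) * i_alpha + np.sin(theta) * i_beta
    i_q = -np.sin(theta) * i_alpha + np.cos(theta) * i_beta
    return i_d, i_q

# Balanced three-phase currents at angle theta map to a constant d-q pair.
theta = 0.7
i_a = np.cos(theta); i_b = np.cos(theta - 2*np.pi/3); i_c = np.cos(theta + 2*np.pi/3)
print(park(*clarke(i_a, i_b, i_c), theta))   # ~ (1.0, 0.0)
```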

