Fault-Tolerant Scheduling of Fine-Grained Tasks in Grid Environments

Reconfigurable computing for DSP remains an active research area as the need to integrate it with more conventional DSP technologies becomes apparent. Traditionally, most work in reconfigurable computing has targeted fine-grained FPGA devices. Over the years, the focus has shifted from bit-level granularity to coarse-grained composition. The FIR filter remains an important building block in many DSP systems. It computes each output by multiplying input samples by a set of coefficients and summing the products. Here, multipliers and adders are modeled using a divide-and-conquer approach. To build a reconfigurable FIR filter, filters with different numbers of taps are designed as separate reconfigurable modules. The system is also made fault tolerant: a fault detection mechanism detects faults based on the nature of the operands. The reconfigurable modules are structurally modeled in Verilog HDL and simulated and synthesized using Xilinx ISE 14.2. The paper also compares the device utilization of the reconfigurable modules by implementing the design on various Virtex FPGA devices.
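The multiply-and-add computation that the abstract attributes to the FIR filter can be sketched in software. The following is a minimal Python illustration, not the paper's Verilog design: the function name `fir_filter` and the zero-padding of samples before the start of the input are assumptions made for this example.

```python
def fir_filter(samples, coeffs):
    """Direct-form FIR: y[n] = sum over k of coeffs[k] * samples[n - k].

    Samples before the start of the input are treated as zero
    (an assumption of this sketch).
    """
    output = []
    for n in range(len(samples)):
        acc = 0
        for k, c in enumerate(coeffs):
            if n - k >= 0:  # skip taps that reach before the first sample
                acc += c * samples[n - k]
        output.append(acc)
    return output


# A 2-tap filter with unit coefficients sums each sample with its predecessor.
print(fir_filter([1, 2, 3, 4], [1, 1]))  # [1, 3, 5, 7]
```

The number of taps is simply `len(coeffs)`, so a 4-tap or 8-tap filter differs only in the coefficient list passed in, which loosely mirrors the idea of swapping in modules with different tap counts.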

Fault-tolerant resource allocation for query processing in grid environments

International Journal of Web and Grid Services, 2015, Vol 11 (2), pp. 143. DOI: 10.1504/ijwgs.2015.068895

Cited By ~ 2

Author(s): Deniz Çokuslu, Abdelkader Hameurlain, Kayhan Erciyes

Keyword(s): Resource Allocation, Query Processing, Fault Tolerant, Grid Environments

Empowering end hosts with fine grained and multiple connectivity circuits in optical grid environments

2007 First International Symposium on Advanced Networks and Telecommunication Systems, 2007. DOI: 10.1109/ants.2007.4620210

Author(s): Weiqiang Sun, Guowu Xie, Yaohui Jin, Wei Guo, Weisheng Hu

Keyword(s): Fine Grained, Grid Environments, Optical Grid

A fine-grained link-level fault-tolerant mechanism for networks-on-chip

2010 IEEE International Conference on Computer Design, 2010. DOI: 10.1109/iccd.2010.5647663

Cited By ~ 9

Author(s): Arseniy Vitkovskiy, Vassos Soteriou, Chrysostomos Nicopoulos

Keyword(s): Fault Tolerant, Networks On Chip, Fine Grained, On Chip

Unified fault-tolerance framework for hybrid task-parallel message-passing applications

The International Journal of High Performance Computing Applications, 2016, Vol 32 (5), pp. 641-657. DOI: 10.1177/1094342016669416

Cited By ~ 5

Author(s): Omer Subasi, Tatiana Martsinkevich, Ferad Zyulkyarov, Osman Unsal, Jesus Labarta, ...

Keyword(s): Fault Tolerance, Performance Improvement, Message Passing, Message Passing Interface, Fault Tolerant, Performance Score, Fine Grained, Transient Errors, Task Parallel, Complete Failure

We present a unified fault-tolerance framework for task-parallel message-passing applications to mitigate transient errors. First, we propose a fault-tolerant message-logging protocol that only requires the restart of the task that experienced the error and transparently handles any message passing interface calls inside the task. In our experiments we demonstrate that our fault-tolerant solution has a reasonable overhead, with a maximum observed overhead of 4.5%. We also show that fine-grained parallelization is important for hiding the overheads related to the protocol as well as the recovery of tasks. Secondly, we develop a mathematical model to unify task-level checkpointing and our protocol with system-wide checkpointing in order to provide complete failure coverage. We provide closed formulas for the optimal checkpointing interval and the performance score of the unified scheme. Experimental results show that the performance improvement can be as high as 98% with the unified scheme.
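The abstract mentions closed formulas for the optimal checkpointing interval but does not state them. As background only, the sketch below uses Young's classical first-order approximation, which is a different and much simpler model than the paper's unified scheme: the interval is the square root of twice the checkpoint cost times the mean time between failures. The function name and parameter names are assumptions made for this example.

```python
import math


def young_checkpoint_interval(checkpoint_cost, mtbf):
    """Young's first-order approximation of the optimal checkpoint interval.

    checkpoint_cost: time needed to write one checkpoint
    mtbf: mean time between failures, in the same time unit
    This is NOT the formula derived in the paper above; it is a
    well-known baseline used here purely for illustration.
    """
    if checkpoint_cost <= 0 or mtbf <= 0:
        raise ValueError("checkpoint_cost and mtbf must be positive")
    return math.sqrt(2.0 * checkpoint_cost * mtbf)


# With a checkpoint cost of 2 time units and an MTBF of 100 time units:
print(young_checkpoint_interval(2, 100))  # 20.0
```

The approximation captures the basic trade-off: cheaper checkpoints or more frequent failures both push toward shorter intervals.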

Fine-Grained Fault-Tolerant Adaptive Routing for Networks-on-Chip

Algorithms and Architectures for Parallel Processing - Lecture Notes in Computer Science, 2015, pp. 492-505. DOI: 10.1007/978-3-319-27140-8_34

Cited By ~ 1

Author(s): Junxiu Liu, Jim Harkin, Liam Maguire, Yuhua Li, Lei Wan, ...

Keyword(s): Fault Tolerant, Adaptive Routing, Networks On Chip, Fine Grained, On Chip

Very fine-grained fault-tolerant routing algorithm of NoC based on buffer reuse

2013 IEEE 4th International Conference on Software Engineering and Service Science, 2013. DOI: 10.1109/icsess.2013.6615416

Cited By ~ 3

Author(s): Shijian Zhang, Guodong Han, Fan Zhang

Keyword(s): Fault Tolerant, Routing Algorithm, Fine Grained

Failure Detection Protocols in the Application Layer

Application-Layer Fault-Tolerance Protocols, 2009, pp. 250-274. DOI: 10.4018/978-1-60566-182-7.ch008

Author(s): Vincenzo De Florio

Keyword(s): Fault Tolerant, Failure Detection, Asynchronous Systems, Application Layer, Distributed Consensus, Fine Grained, Impossibility Results, Fundamental Building Block, Unreliable Failure Detectors

Failure detection is a fundamental building block for developing fault-tolerant distributed systems. Accurate failure detection in asynchronous systems (Chapter II) is notoriously difficult, because it is impossible to tell whether a process has actually failed or is just slow. For this reason, several impossibility results have been derived; see, for instance, the well-known paper (Fischer, Lynch, & Paterson, 1985). As a consequence of these pessimistic results, many researchers have worked on reformulating the concept of system model in an alternative, fine-grained way. Their goal was to tackle problems such as distributed consensus with minimal requirements on the system environment. This led to the theory of unreliable failure detectors for reliable systems, pioneered by Chandra and Toueg (Chandra & Toueg, 1996). This chapter introduces these concepts and the formulation of failure detection protocols in the application layer. In particular, a linguistic framework is proposed for expressing those protocols. As a case study, the chapter describes the failure detection algorithm used in the EFTOS DIR net and in the TIRAN Backbone, that is, the fault-tolerance managers introduced in Chapter III and Chapter VI, respectively.
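The difficulty of telling a failed process from a slow one can be illustrated with a small timeout-based detector. The following Python sketch is a generic heartbeat scheme written for this listing, not the EFTOS or TIRAN algorithm and not the chapter's linguistic framework. The class name `HeartbeatDetector`, its methods, and the timeout-doubling policy are all assumptions of the example.

```python
class HeartbeatDetector:
    """Timeout-based failure detector that may suspect processes wrongly.

    A process is suspected when no heartbeat arrived within `timeout`.
    If a suspected process later sends a heartbeat, the suspicion was
    wrong (the process was only slow), so the suspicion is withdrawn
    and the timeout is doubled to make the same mistake less likely.
    """

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}   # process id -> time of last heartbeat
        self.suspects = set()

    def heartbeat(self, pid, now):
        if pid in self.suspects:
            self.suspects.discard(pid)  # withdraw the wrong suspicion
            self.timeout *= 2           # be more patient from now on
        self.last_seen[pid] = now

    def check(self, now):
        """Return the set of processes currently suspected to have failed."""
        for pid, seen in self.last_seen.items():
            if now - seen > self.timeout:
                self.suspects.add(pid)
        return set(self.suspects)


detector = HeartbeatDetector(timeout=1.0)
detector.heartbeat("p1", now=0.0)
print(detector.check(now=2.0))   # {'p1'}: silent for longer than the timeout
detector.heartbeat("p1", now=2.5)
print(detector.check(now=3.0))   # set(): the suspicion was withdrawn
```

Timestamps are passed in explicitly rather than read from a clock, which keeps the sketch deterministic and easy to test.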

Self-Adapting Event Configuration in Ubiquitous Wireless Sensor Networks

Technological Innovations in Adaptive and Dependable Systems, 2012, pp. 109-126. DOI: 10.4018/978-1-4666-0255-7.ch007

Author(s): Steffen Ortmann, Michael Maaser, Peter Langendoerfer

Keyword(s): Energy Efficiency, Wireless Sensor Networks, Sensor Networks, Event Detection, Environmental Changes, Fault Tolerant, Low Cost, Wireless Sensor, Fine Grained, Event Definition

Wireless Sensor Networks are a key enabler for low-cost ubiquitous applications in homeland security, health care, and environmental monitoring. A necessary prerequisite is reliable and efficient event detection in spite of sudden failures and environmental changes. Because the sensors need to be low cost, they have only scarce resources, which leads to a certain level of failures of sensor nodes or of the sensing devices attached to them. Available fault-tolerant solutions are mainly customized approaches that have revealed several shortcomings, particularly in adaptability and energy efficiency. The authors present a complete event detection concept that includes all necessary steps, from formal event definition to autonomous device configuration. It features an event definition language that allows complex events to be defined and enhances reliability through tailor-made voting schemes and application constraints. Based on that, the paper introduces a novel approach for self-adapting on-node and in-network processing, called Event Decision Tree (EDT). EDT autonomously adapts to available resources and environmental conditions, even when this requires (re-)organizing collaboration between neighboring nodes for evaluation. The authors' approach achieves fine-grained, event-related fault tolerance with a configurable adaptation rate while enhancing maintainability and energy efficiency.
