failure handling Latest Research Papers

RocksDB: Evolution of Development Priorities in a Key-value Store Serving Large-scale Applications

ACM Transactions on Storage ◽ 10.1145/3483840 ◽ 2021 ◽ Vol 17 (4) ◽ pp. 1-32

Author(s): Siying Dong ◽ Andrew Kryczka ◽ Yanqin Jin ◽ Michael Stumm

Keyword(s): Large Scale ◽ Future Research ◽ Solid State Drives ◽ Open Problems ◽ Failure Handling ◽ Data Formats ◽ Data Corruption ◽ Write Amplification ◽ Protection Mechanisms ◽ Integrity Protection

This article is an eight-year retrospective on development priorities for RocksDB, a key-value store developed at Facebook that targets large-scale distributed systems and that is optimized for Solid State Drives (SSDs). We describe how the priorities evolved over time as a result of hardware trends and extensive experiences running RocksDB at scale in production at a number of organizations: from optimizing write amplification, to space amplification, to CPU utilization. We describe lessons from running large-scale applications, including that resource allocation needs to be managed across different RocksDB instances, that data formats need to remain backward- and forward-compatible to allow incremental software rollouts, and that appropriate support for database replication and backups are needed. Lessons from failure handling taught us that data corruption errors needed to be detected earlier and that data integrity protection mechanisms are needed at every layer of the system. We describe improvements to the key-value interface. We describe a number of efforts that in retrospect proved to be misguided. Finally, we describe a number of open problems that could benefit from future research.
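The abstract's point about detecting corruption early and protecting integrity at every layer can be illustrated with a per-record checksum. The following is a minimal sketch only: the record layout and the names `encode_record` and `decode_record` are hypothetical and are not RocksDB's on-disk format or API.

```python
import struct
import zlib
from typing import Tuple


def encode_record(key: bytes, value: bytes) -> bytes:
    """Serialize a key-value pair with a CRC32 prefix (hypothetical layout)."""
    # Payload: key length, value length, then the raw key and value bytes.
    payload = struct.pack("<II", len(key), len(value)) + key + value
    # The checksum covers the lengths as well as the data.
    return struct.pack("<I", zlib.crc32(payload)) + payload


def decode_record(blob: bytes) -> Tuple[bytes, bytes]:
    """Verify the checksum before trusting any field of the record."""
    if len(blob) < 12:
        raise ValueError("record too short")
    (stored_crc,) = struct.unpack_from("<I", blob, 0)
    payload = blob[4:]
    if zlib.crc32(payload) != stored_crc:
        raise ValueError("checksum mismatch: record is corrupt")
    key_len, value_len = struct.unpack_from("<II", payload, 0)
    if 8 + key_len + value_len != len(payload):
        raise ValueError("length fields do not match record size")
    return payload[8:8 + key_len], payload[8 + key_len:]
```

Verifying at read time, and again whenever a record crosses a layer boundary (for example before it is copied or sent over the network), surfaces corruption close to where it was introduced instead of much later.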

A multiparty session typing discipline for fault-tolerant event-driven distributed programming

Proceedings of the ACM on Programming Languages ◽ 10.1145/3485501 ◽ 2021 ◽ Vol 5 (OOPSLA) ◽ pp. 1-30

Author(s): Malte Viering ◽ Raymond Hu ◽ Patrick Eugster ◽ Lukasz Ziarek

Keyword(s): Fault Tolerant ◽ Failure Detection ◽ Third Party ◽ Distributed Programming ◽ Failure Handling ◽ Session Types ◽ Dynamic Replacement ◽ Event Driven ◽ Industrial Strength ◽ Novel Model

This paper presents a formulation of multiparty session types (MPSTs) for practical fault-tolerant distributed programming. We tackle the challenges faced by session types in the context of distributed systems involving asynchronous and concurrent partial failures – such as supporting dynamic replacement of failed parties and retrying failed protocol segments in an ongoing multiparty session – in the presence of unreliable failure detection. Key to our approach is that we develop a novel model of event-driven concurrency for multiparty sessions. Inspired by real-world practices, it enables us to unify the session-typed handling of regular I/O events with failure handling and the combination of features needed to express practical fault-tolerant protocols. Moreover, the characteristics of our model allow us to prove a global progress property for well-typed processes engaged in multiple concurrent sessions, which does not hold in traditional MPST systems. To demonstrate its practicality, we implement our framework as a toolchain and runtime for Scala, and use it to specify and implement a session-typed version of the cluster management system of the industrial-strength Apache Spark data analytics framework. Our session-typed cluster manager composes with other vanilla Spark components to give a functioning Spark runtime; e.g., it can execute existing third-party Spark applications without code modification. A performance evaluation using the TPC-H benchmark shows our prototype implementation incurs an average overhead below 10%.

Simulation Of Jellyfish Topology Link Failure Handling Using Floyd Warshall and Johnson Algorithm in Software Defined Network Architecture

10.1109/icoict52021.2021.9527523 ◽ 2021

Author(s): Muhammad Arief Nugroho ◽ Andrian Rakhmatsyah

Keyword(s): Network Architecture ◽ Link Failure ◽ Software Defined Network ◽ Failure Handling ◽ Johnson Algorithm

Expect the Unexpected: Leveraging the Human-Robot Ecosystem to Handle Unexpected Robot Failures

Frontiers in Robotics and AI ◽ 10.3389/frobt.2021.656385 ◽ 2021 ◽ Vol 8

Author(s): Shanee Honig ◽ Tal Oron-Gilad

Keyword(s): Human Robot Interaction ◽ Robot Interaction ◽ Unexpected Events ◽ Failure Handling ◽ Holistic Approaches

Unexpected robot failures are inevitable. We propose to leverage socio-technical relations within the human-robot ecosystem to support adaptable strategies for handling unexpected failures. The Theory of Graceful Extensibility is used to understand how characteristics of the ecosystem can influence its ability to respond to unexpected events. By expanding our perspective from Human-Robot Interaction to the Human-Robot Ecosystem, adaptable failure-handling strategies are identified, alongside technical, social and organizational arrangements that are needed to support them. We argue that robotics and HRI communities should pursue more holistic approaches to failure-handling, recognizing the need to embrace the unexpected and consider socio-technical relations within the human robot ecosystem when designing failure-handling strategies.

Failure Handling for Time-Sensitive Networks using SDN and Source Routing

2021 IEEE 7th International Conference on Network Softwarization (NetSoft) ◽ 10.1109/netsoft51509.2021.9492666 ◽ 2021

Author(s): Gagan Nandha Kumar ◽ Kostas Katsalis ◽ Panagiotis Papadimitriou ◽ Paul Pop ◽ Georg Carle

Keyword(s): Source Routing ◽ Failure Handling

Can Applications Recover from fsync Failures?

ACM Transactions on Storage ◽ 10.1145/3450338 ◽ 2021 ◽ Vol 17 (2) ◽ pp. 1-30

Author(s): Anthony Rebello ◽ Yuvraj Patel ◽ Ramnatthan Alagappan ◽ Andrea C. Arpaci-Dusseau ◽ Remzi H. Arpaci-Dusseau

Keyword(s): File Systems ◽ Data Loss ◽ Data Intensive ◽ Failure Handling ◽ Data Intensive Applications ◽ Failure Reporting

We analyze how file systems and modern data-intensive applications react to fsync failures. First, we characterize how three Linux file systems (ext4, XFS, Btrfs) behave in the presence of failures. We find commonalities across file systems (pages are always marked clean, certain block writes always lead to unavailability) as well as differences (page content and failure reporting is varied). Next, we study how five widely used applications (PostgreSQL, LMDB, LevelDB, SQLite, Redis) handle fsync failures. Our findings show that although applications use many failure-handling strategies, none are sufficient: fsync failures can cause catastrophic outcomes such as data loss and corruption. Our findings have strong implications for the design of file systems and applications that intend to provide strong durability guarantees.
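The finding that pages are marked clean after a failed fsync suggests one defensive pattern: never retry fsync on the same file descriptor, because the retry may report success even though the data never reached disk. The sketch below illustrates that idea under stated assumptions. It is not taken from the paper, `durable_write` is a hypothetical helper, and it assumes a POSIX system where a directory can be opened and fsynced.

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> None:
    """Atomically replace `path` with `data`; any fsync error fails the attempt."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp_file:
            tmp_file.write(data)
            tmp_file.flush()
            # A single fsync attempt. On failure we do NOT call fsync again:
            # the kernel may have marked the dirty pages clean, so a second
            # call could succeed without the data being durable.
            os.fsync(tmp_file.fileno())
        # Only a fully synced temporary file is allowed to replace the target.
        os.replace(tmp_path, path)
    except OSError:
        # Discard the suspect file; the caller still holds `data` in memory
        # and can retry from scratch with a brand-new file.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # Persist the rename by syncing the containing directory (POSIX only).
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

A caller that catches the `OSError` should rewrite from its own in-memory copy rather than trust the page cache. Whether that is sufficient on a given file system is exactly the kind of question the paper examines.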

A critical review based on Fault Tolerance in Software Defined Networks

Turkish Journal of Computer and Mathematics Education (TURCOMAT) ◽ 10.17762/turcomat.v12i2.849 ◽ 2021 ◽ Vol 12 (2) ◽ pp. 456-461

Author(s): Ms. Shailly

Keyword(s): Fault Tolerant ◽ Decoupling Control ◽ Critical Survey ◽ Software Defined Networks ◽ Control Plane ◽ Tabular Data ◽ Plane Failure ◽ Failure Handling ◽ Data Plane ◽ And Control

Software-Defined Networking (SDN) is an emerging architecture that decouples the control plane from the data plane to enable dynamic network management. SDN is being deployed in production networks, which creates the need for secure and fault-tolerant SDN. In this paper, we discuss the kinds of failures that occur in SDN and present a critical survey of recently proposed mechanisms for handling them. First, we discuss mechanisms for handling data plane failures, summarized in tabular form. We then discuss mechanisms for handling misconfiguration of switch flow tables, as well as control plane failure handling mechanisms. We also summarize the issues with the data plane and control plane mechanisms discussed. We conclude that more efficient and secure mechanisms are needed for SDN.

Failure Handling of Robotic Pick and Place Tasks With Multimodal Cues Under Partial Object Occlusion

Frontiers in Neurorobotics ◽ 10.3389/fnbot.2021.570507 ◽ 2021 ◽ Vol 15

Author(s): Fan Zhu ◽ Liangliang Wang ◽ Yilin Wen ◽ Lei Yang ◽ Jia Pan ◽ ...

Keyword(s): Visual Cues ◽ Failure Detection ◽ Robotic Manipulator ◽ Open Loop ◽ System Control ◽ Grasp Planning ◽ Loop Control ◽ Failure Handling ◽ Pick And Place ◽ Place Task

The success of a robotic pick and place task depends on the success of the entire procedure: from the grasp planning phase, to the grasp establishment phase, then the lifting and moving phase, and finally the releasing and placing phase. Being able to detect and recover from grasping failures throughout the entire process is therefore a critical requirement for both the robotic manipulator and the gripper, especially given the almost inevitable occlusion of the object by the gripper itself during the task. With the rapid rise of soft grippers, which rely heavily on their under-actuated body and compliant, open-loop control, less information is available from the gripper for effective overall system control. To improve the effectiveness of robotic grasping, this work proposes a hybrid policy that combines visual cues with proprioception from our gripper for effective failure detection and recovery in grasping, using a self-developed proprioceptive soft robotic gripper that is capable of contact sensing. We address failure handling of robotic pick and place tasks and propose (1) more accurate pose estimation of a known object by considering an edge-based cost in addition to the image-based cost; (2) robust object tracking techniques that work even when the object is partially occluded and achieve mean overlap precision of up to 80%; (3) detection of contact and contact loss between the object and the gripper by analyzing the internal pressure signals of our gripper; (4) robust failure handling that combines visual cues under partial occlusion with proprioceptive cues from our soft gripper to effectively detect and recover from different accidental grasping failures. The proposed system was experimentally validated with the proprioceptive soft robotic gripper mounted on a collaborative robotic manipulator and a consumer-grade RGB camera, showing that combining visual cues and proprioception from our soft robotic gripper was effective in improving detection of and recovery from the major grasping failures at different stages, for compliant and robust grasping.

IT Infrastructure Anomaly Detection and Failure Handling: A Systematic Literature Review Focusing on Datasets, Log Preprocessing, Machine & Deep Learning Approaches and Automated Tool

IEEE Access ◽ 10.1109/access.2021.3128283 ◽ 2021 ◽ pp. 1-1

Author(s): Deepali Arun Bhanage ◽ Ambika Vishal Pawar ◽ Ketan Kotecha

Keyword(s): Deep Learning ◽ Literature Review ◽ Anomaly Detection ◽ Systematic Literature Review ◽ Learning Approaches ◽ IT Infrastructure ◽ Failure Handling ◽ Automated Tool

Service failure handling and resilience amongst airlines in Nigeria

Cogent Business & Management ◽ 10.1080/23311975.2021.1892924 ◽ 2021 ◽ Vol 8 (1) ◽ pp. 1892924

Author(s): Hart O. Awa ◽ Chigbo A. Nwobu ◽ Sunny R. Igwe

Keyword(s): Service Failure ◽ Failure Handling
