failure handling
Recently Published Documents


TOTAL DOCUMENTS: 82 (FIVE YEARS 20)

H-INDEX: 8 (FIVE YEARS 1)

2021 ◽  
Vol 17 (4) ◽  
pp. 1-32
Author(s):  
Siying Dong ◽  
Andrew Kryczka ◽  
Yanqin Jin ◽  
Michael Stumm

This article is an eight-year retrospective on development priorities for RocksDB, a key-value store developed at Facebook that targets large-scale distributed systems and that is optimized for Solid State Drives (SSDs). We describe how the priorities evolved over time as a result of hardware trends and extensive experiences running RocksDB at scale in production at a number of organizations: from optimizing write amplification, to space amplification, to CPU utilization. We describe lessons from running large-scale applications, including that resource allocation needs to be managed across different RocksDB instances, that data formats need to remain backward- and forward-compatible to allow incremental software rollouts, and that appropriate support for database replication and backups is needed. Lessons from failure handling taught us that data corruption errors needed to be detected earlier and that data integrity protection mechanisms are needed at every layer of the system. We describe improvements to the key-value interface. We describe a number of efforts that in retrospect proved to be misguided. Finally, we describe a number of open problems that could benefit from future research.
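The lesson about detecting corruption early can be illustrated with a per-record checksum that is verified before any field is trusted. The sketch below is a minimal illustration only: the record layout and the names `encode_record` and `decode_record` are assumptions made for this example and do not reflect RocksDB's actual on-disk format or API.

```python
import struct
import zlib
from typing import Tuple


def encode_record(key: bytes, value: bytes) -> bytes:
    """Frame a key-value pair as: crc32 | key length | value length | key | value.

    Hypothetical layout for illustration; not RocksDB's format.
    """
    body = struct.pack("<II", len(key), len(value)) + key + value
    return struct.pack("<I", zlib.crc32(body)) + body


def decode_record(record: bytes) -> Tuple[bytes, bytes]:
    """Verify the checksum first, then parse; raise ValueError on any mismatch."""
    if len(record) < 12:
        raise ValueError("record too short")
    (stored_crc,) = struct.unpack_from("<I", record, 0)
    body = record[4:]
    if zlib.crc32(body) != stored_crc:
        raise ValueError("checksum mismatch: record is corrupt")
    key_len, value_len = struct.unpack_from("<II", body, 0)
    if len(body) != 8 + key_len + value_len:
        raise ValueError("length fields do not match record size")
    return body[8:8 + key_len], body[8 + key_len:]
```

Calling `decode_record(encode_record(b"k", b"v"))` returns the original pair, while a record with a flipped bit raises `ValueError` instead of silently returning damaged data. Protecting "every layer" would mean repeating this kind of check wherever data is copied or transformed, not only at the storage boundary.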


2021 ◽  
Vol 5 (OOPSLA) ◽  
pp. 1-30
Author(s):  
Malte Viering ◽  
Raymond Hu ◽  
Patrick Eugster ◽  
Lukasz Ziarek

This paper presents a formulation of multiparty session types (MPSTs) for practical fault-tolerant distributed programming. We tackle the challenges faced by session types in the context of distributed systems involving asynchronous and concurrent partial failures – such as supporting dynamic replacement of failed parties and retrying failed protocol segments in an ongoing multiparty session – in the presence of unreliable failure detection. Key to our approach is that we develop a novel model of event-driven concurrency for multiparty sessions. Inspired by real-world practices, it enables us to unify the session-typed handling of regular I/O events with failure handling and the combination of features needed to express practical fault-tolerant protocols. Moreover, the characteristics of our model allow us to prove a global progress property for well-typed processes engaged in multiple concurrent sessions, which does not hold in traditional MPST systems. To demonstrate its practicality, we implement our framework as a toolchain and runtime for Scala, and use it to specify and implement a session-typed version of the cluster management system of the industrial-strength Apache Spark data analytics framework. Our session-typed cluster manager composes with other vanilla Spark components to give a functioning Spark runtime; e.g., it can execute existing third-party Spark applications without code modification. A performance evaluation using the TPC-H benchmark shows our prototype implementation incurs an average overhead below 10%.


2021 ◽  
Vol 8 ◽  
Author(s):  
Shanee Honig ◽  
Tal Oron-Gilad

Unexpected robot failures are inevitable. We propose to leverage socio-technical relations within the human-robot ecosystem to support adaptable strategies for handling unexpected failures. The Theory of Graceful Extensibility is used to understand how characteristics of the ecosystem can influence its ability to respond to unexpected events. By expanding our perspective from Human-Robot Interaction to the Human-Robot Ecosystem, adaptable failure-handling strategies are identified, alongside technical, social and organizational arrangements that are needed to support them. We argue that robotics and HRI communities should pursue more holistic approaches to failure-handling, recognizing the need to embrace the unexpected and consider socio-technical relations within the human-robot ecosystem when designing failure-handling strategies.


Author(s):  
Gagan Nandha Kumar ◽  
Kostas Katsalis ◽  
Panagiotis Papadimitriou ◽  
Paul Pop ◽  
Georg Carle

2021 ◽  
Vol 17 (2) ◽  
pp. 1-30
Author(s):  
Anthony Rebello ◽  
Yuvraj Patel ◽  
Ramnatthan Alagappan ◽  
Andrea C. Arpaci-Dusseau ◽  
Remzi H. Arpaci-Dusseau

We analyze how file systems and modern data-intensive applications react to fsync failures. First, we characterize how three Linux file systems (ext4, XFS, Btrfs) behave in the presence of failures. We find commonalities across file systems (pages are always marked clean, certain block writes always lead to unavailability) as well as differences (page content and failure reporting vary). Next, we study how five widely used applications (PostgreSQL, LMDB, LevelDB, SQLite, Redis) handle fsync failures. Our findings show that although applications use many failure-handling strategies, none are sufficient: fsync failures can cause catastrophic outcomes such as data loss and corruption. Our findings have strong implications for the design of file systems and applications that intend to provide strong durability guarantees.
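Because the abstract reports that pages are marked clean after a failed fsync, a retried fsync may have nothing left to flush, so a success on retry does not by itself show that the data reached disk. The sketch below shows one conservative application-side pattern under that assumption: treat the first fsync failure as fatal for the file rather than retrying. The names `write_durably` and `DurabilityError`, and the injectable `fsync` parameter, are hypothetical and exist only for this example; this is not a strategy taken from the paper.

```python
import os


class DurabilityError(Exception):
    """Raised when the data could not be confirmed durable."""


def write_durably(path: str, data: bytes, fsync=os.fsync) -> None:
    """Write data to path and fsync it; never retry a failed fsync.

    The fsync parameter is a hypothetical injection point used for testing.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        try:
            fsync(fd)
        except OSError as exc:
            # Assumption: dirty pages may already be marked clean, so a
            # second fsync could report success without persisting anything.
            raise DurabilityError(f"fsync failed for {path}") from exc
    finally:
        os.close(fd)
```

A caller that catches `DurabilityError` still holds the original `data` in memory and can decide how to recover, for example by writing it to a different file or stopping the process. Which recovery is appropriate depends on the application and file system.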


Author(s):  
Ms. Shailly

SDN (Software-Defined Networking) is an emerging architecture that decouples the control plane from the data plane to enable dynamic network management. SDN is being deployed in production networks, which creates a need for secure and fault-tolerant SDN. In this paper, we discuss the kinds of failures that occur in SDN and critically survey recently proposed mechanisms for handling them. We first present, in tabular form, mechanisms for handling data plane failures. We then discuss mechanisms for handling misconfiguration of switch flow tables and mechanisms for handling control plane failures. We also summarize the issues with the data plane and control plane mechanisms discussed. We conclude that more efficient and secure failure-handling mechanisms are needed for SDN.


2021 ◽  
Vol 15 ◽  
Author(s):  
Fan Zhu ◽  
Liangliang Wang ◽  
Yilin Wen ◽  
Lei Yang ◽  
Jia Pan ◽  
...  

The success of a robotic pick and place task depends on the success of the entire procedure: from the grasp planning phase, to the grasp establishment phase, then the lifting and moving phase, and finally the releasing and placing phase. Being able to detect and recover from grasping failures throughout the entire process is therefore a critical requirement for both the robotic manipulator and the gripper, especially when considering the almost inevitable object occlusion by the gripper itself during the robotic pick and place task. With the rapid rise of soft grippers, which rely heavily on their under-actuated body and compliant, open-loop control, less information is available from the gripper for effective overall system control. To improve the effectiveness of robotic grasping, this work proposes a hybrid policy that combines visual cues and proprioception of our gripper for effective failure detection and recovery in grasping, using a self-developed proprioceptive soft robotic gripper that is capable of contact sensing. We solved failure handling of robotic pick and place tasks and proposed (1) more accurate pose estimation of a known object by considering the edge-based cost besides the image-based cost; (2) robust object tracking techniques that work even when the object is partially occluded in the system and achieve mean overlap precision up to 80%; (3) contact and contact loss detection between the object and the gripper by analyzing internal pressure signals of our gripper; (4) robust failure handling with the combination of visual cues under partial occlusion and proprioceptive cues from our soft gripper to effectively detect and recover from different accidental grasping failures. The proposed system was experimentally validated with the proprioceptive soft robotic gripper mounted on a collaborative robotic manipulator and a consumer-grade RGB camera, showing that combining visual cues and proprioception from our soft actuator robotic gripper was effective in improving the detection of and recovery from the major grasping failures at different stages, enabling compliant and robust grasping.
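The abstract does not say how the internal pressure signals are analyzed. As a rough illustration of the general idea only, the sketch below flags contact when the pressure deviates from a no-contact baseline by more than one threshold, and flags contact loss when it returns to within a second, smaller threshold. The function name, the hysteresis scheme, and all parameter values are assumptions made for this example, not the authors' method.

```python
from typing import Iterable, List, Tuple


def detect_contact_events(
    pressures: Iterable[float],
    baseline: float,
    on_threshold: float,
    off_threshold: float,
) -> List[Tuple[int, str]]:
    """Return (sample index, event) pairs using two-threshold hysteresis.

    Hypothetical scheme: 'contact' fires when |p - baseline| >= on_threshold,
    'contact_lost' fires when it later drops to <= off_threshold.
    """
    if off_threshold > on_threshold:
        raise ValueError("off_threshold must not exceed on_threshold")
    events: List[Tuple[int, str]] = []
    in_contact = False
    for index, pressure in enumerate(pressures):
        deviation = abs(pressure - baseline)
        if not in_contact and deviation >= on_threshold:
            in_contact = True
            events.append((index, "contact"))
        elif in_contact and deviation <= off_threshold:
            in_contact = False
            events.append((index, "contact_lost"))
    return events
```

Using two thresholds instead of one keeps small fluctuations near a single cutoff from producing a rapid series of contact and contact-loss events. In a combined policy such as the one the abstract describes, an unexpected `contact_lost` event during the lifting phase could be cross-checked against the visual tracker before a recovery action is triggered.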


2021 ◽  
Vol 8 (1) ◽  
pp. 1892924
Author(s):  
Hart O. Awa ◽  
Chigbo A. Nwobu ◽  
Sunny R. Igwe
