transient errors
Recently Published Documents


TOTAL DOCUMENTS

71
(FIVE YEARS 7)

H-INDEX

13
(FIVE YEARS 1)

Author(s):  
Bahman Arasteh ◽  
Reza Solhi

Software plays remarkable roles in many critical applications. On the other hand, owing to shrinking transistor sizes and reduced supply voltages, radiation-induced transient errors (soft errors) have become an important source of computer-system failures. As the rate of transient hardware faults increases, researchers have investigated software techniques to control these faults. Performance overhead is the main drawback of software-implemented methods, such as recovery blocks, that rely on redundancy. Enhancing software reliability against soft errors by exploiting inherently error-masking (invulnerable) programming structures is the main goal of this study. During the programming phase, at the source-code level, programmers can select different storage classes such as automatic, global, static, and register for the data in their programs without paying attention to their inherent reliability. In this study, the inherent effects of these storage classes on program reliability are investigated. An extensive series of profiling and fault-injection experiments was performed on a set of benchmark programs implemented with different storage classes. The results show that programs implemented with the automatic storage class have inherently higher reliability than programs using the static or register storage classes, without performance overhead. This finding enables programmers to develop highly reliable programs without redundancy or performance overhead.


Author(s):  
Lianhua Yu ◽  
Ming Diao ◽  
Xiaobo Chen

It is necessary to study fault-tolerant techniques for nanotechnology, since nanometer-scale devices are very sensitive to system and environmental influences. In this paper, we present a novel fault-tolerant technique for nanocomputers, namely XOR multiplexing based on redundancy-modified NAND gates. The error distributions and fault tolerance of the proposed architecture are analyzed and compared with von Neumann's multiplexing. Experimental results show that, compared with the conventional multiplexing technique based on NAND gates, the new system has a much higher fault-tolerance capability. According to the evaluation, by using multiple redundant components, the proposed architecture can tolerate device error rates of up to 10⁻¹. In systems built from unreliable nanometer-scale devices, this architecture is potentially effective against the increasing rate of transient errors.
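The baseline that the paper compares against, von Neumann's NAND multiplexing, can be sketched with a small Monte Carlo simulation: each logical NAND is replaced by a bundle of N faulty gates whose inputs are paired by a random permutation, followed by restorative stages, and the logical value is read out by majority vote. The bundle width, gate error probability, and stage count below are assumed for illustration, not taken from the paper.

```c
#include <stdio.h>
#include <stdlib.h>

#define N 100          /* bundle width: redundant copies per logical signal */
#define EPS 0.05       /* assumed per-gate output-flip probability          */
#define TRIALS 10000

/* One faulty NAND gate: the correct output is flipped with probability EPS. */
static int faulty_nand(int a, int b)
{
    int out = !(a && b);
    if ((double)rand() / RAND_MAX < EPS)
        out = !out;
    return out;
}

/* One multiplexed NAND stage: wires of the two input bundles are paired
 * by a random permutation and each pair drives its own faulty gate. */
static void nand_stage(const int *x, const int *y, int *z)
{
    int perm[N];
    for (int i = 0; i < N; i++) perm[i] = i;
    for (int i = N - 1; i > 0; i--) {              /* Fisher-Yates shuffle */
        int j = rand() % (i + 1);
        int t = perm[i]; perm[i] = perm[j]; perm[j] = t;
    }
    for (int i = 0; i < N; i++)
        z[i] = faulty_nand(x[i], y[perm[i]]);
}

/* Fraction of trials in which the majority vote over the bundle is wrong,
 * after an executive stage and two restorative stages (a NAND of a bundle
 * with itself acts as an inverter, pushing wires back toward consensus). */
double bundle_error_rate(void)
{
    int wrong = 0;
    for (int t = 0; t < TRIALS; t++) {
        int x[N], y[N], z[N], r1[N], r2[N];
        for (int i = 0; i < N; i++) { x[i] = 1; y[i] = 1; }
        nand_stage(x, y, z);      /* executive: NAND(1,1) = 0, logically */
        nand_stage(z, z, r1);     /* restorative: invert back to 1       */
        nand_stage(r1, r1, r2);   /* restorative: invert back to 0       */
        int ones = 0;
        for (int i = 0; i < N; i++) ones += r2[i];
        if (ones * 2 > N) wrong++;   /* majority reads 1, expected 0     */
    }
    return (double)wrong / TRIALS;
}
```

With these parameters the majority-voted bundle fails far less often than a single gate does, which is the effect both the conventional scheme and the proposed XOR-multiplexing variant exploit; the paper's contribution is achieving usable reliability at much higher device error rates.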


2020 ◽  
Vol 245 ◽  
pp. 04009
Author(s):  
Georgios Bitzes ◽  
Fabio Luchetti ◽  
Andrea Manzi ◽  
Mihai Patrascoiu ◽  
Andreas Joachim Peters ◽  
...  

EOS [1] is the main storage system at CERN, providing hundreds of petabytes of capacity to both the physics experiments and regular users of the CERN infrastructure. Since its first deployment in 2010, EOS has evolved and adapted to the challenges posed by ever-increasing requirements for storage capacity, a user-friendly POSIX-like interactive experience, and new paradigms such as collaborative applications along with sync-and-share capabilities. Overcoming these challenges at various levels of the software stack meant devising a new architecture for the namespace subsystem, completely redesigning the EOS FUSE module, and adapting the remaining components, such as draining, the LRU engine, and the file-system consistency check, to ensure stable and predictable performance. In this paper we detail the issues that triggered these changes, along with the software design choices we made. In the last part of the paper, we focus on the areas that need immediate improvement to ensure a seamless experience for the end user and increased overall availability of the service. Some of these changes have far-reaching effects and are aimed at simplifying the deployment model and, more importantly, the operational load when dealing with transient and non-transient errors in a system managing thousands of disks.


IEEE Access ◽  
2019 ◽  
Vol 7 ◽  
pp. 140182-140189 ◽  
Author(s):  
Corrado De Sio ◽  
Sarah Azimi ◽  
Luca Sterpone ◽  
Boyang Du

2019 ◽  
Vol 214 ◽  
pp. 03020
Author(s):  
Michal Svatos ◽  
Alessandro De Salvo ◽  
Alastair Dewhurst ◽  
Emmanouil Vamvakopoulos ◽  
Julio Lozano Bahilo ◽  
...  

The ATLAS Distributed Computing system uses the Frontier system to access the Conditions, Trigger, and Geometry database data stored in the Oracle offline database at CERN by means of the HTTP protocol. All ATLAS computing sites use Squid web proxies to cache the data, greatly reducing the load on the Frontier servers and the databases. One feature of the Frontier client is that, in the event of a failure, it retries with different services. While this allows transient errors and scheduled maintenance to be handled transparently, it opens the system up to cascading failures if the load is high enough. Throughout LHC Run 2 there has been an ever-increasing demand on the Frontier service, and there have been multiple incidents where parts of the service failed under high load. A significant improvement in the monitoring of the Frontier service was therefore required. The monitoring was needed both to identify problematic tasks, which could then be killed or throttled, and to identify failing site services, since the consequences of a cascading failure are much more severe. This presentation describes the implementation and features of the monitoring system.

