HDFS Pipeline Reformation to Minimize the Data Loss

Author(s):  
B. Purnachandra Rao ◽  
N. Nagamalleswara Rao
Keyword(s):  
Author(s):  
Rajesh Medikonduri

Abstract This paper discusses the physics, definitions, and nanoprobing flow of a flash bit memory. In addition, a case study showing the effectiveness of nanoprobing in detecting the Single Bit Fail Data Gain and Data Loss in Flash Memory is also discussed. The paper also includes cases where no passive voltage contrast was observed at the SEM and no leakage was observed at AFM, yet the units failing SBF DG, SBF DL and depletion, were detected by nanoprobing of the single bit. The major finding of this paper is a way to resolve data gain, data loss, and depletion failures of flash memory by nanoprobing procedure, despite no PVC seen at the SEM and no leakage seen at the AFM.


Author(s):  
Vincenzo Zeno-Zencovich
Keyword(s):  

2021 ◽  
Vol 3 (1) ◽  
Author(s):  
Gundula Povysil ◽  
Monika Heinzl ◽  
Renato Salazar ◽  
Nicholas Stoler ◽  
Anton Nekrutenko ◽  
...  

Abstract Duplex sequencing is currently the most reliable method to identify ultra-low frequency DNA variants by grouping sequence reads derived from the same DNA molecule into families with information on the forward and reverse strand. However, only a small proportion of reads are assembled into duplex consensus sequences (DCS), and reads with potentially valuable information are discarded at different steps of the bioinformatics pipeline, especially reads without a family. We developed a bioinformatics toolset that analyses the tag and family composition with the purpose to understand data loss and implement modifications to maximize the data output for the variant calling. Specifically, our tools show that tags contain polymerase chain reaction and sequencing errors that contribute to data loss and lower DCS yields. Our tools also identified chimeras, which likely reflect barcode collisions. Finally, we also developed a tool that re-examines variant calls from raw reads and provides different summary data that categorizes the confidence level of a variant call by a tier-based system. With this tool, we can include reads without a family and check the reliability of the call, that increases substantially the sequencing depth for variant calling, a particular important advantage for low-input samples or low-coverage regions.


Author(s):  
Xuehu Yan ◽  
Lintao Liu ◽  
Longlong Li ◽  
Yuliang Lu

A secret image is split into   shares in the generation phase of secret image sharing (SIS) for a  threshold. In the recovery phase, the secret image is recovered when any   or more shares are collected, and each collected share is generally assumed to be lossless in conventional SIS during storage and transmission. However, noise will arise during real-world storage and transmission; thus, shares will experience data loss, which will also lead to data loss in the secret image being recovered. Secret image recovery in the case of lossy shares is an important issue that must be addressed in practice, which is the overall subject of this article. An SIS scheme that can recover the secret image from lossy shares is proposed in this article. First, robust SIS and its definition are introduced. Next, a robust SIS scheme for a  threshold without pixel expansion is proposed based on the Chinese remainder theorem (CRT) and error-correcting codes (ECC). By screening the random numbers, the share generation phase of the proposed robust SIS is designed to implement the error correction capability without increasing the share size. Particularly in the case of collecting noisy shares, our recovery method is to some degree robust to some noise types, such as least significant bit (LSB) noise, JPEG compression, and salt-and-pepper noise. A theoretical proof is presented, and experimental results are examined to evaluate the effectiveness of our proposed method.


2021 ◽  
Author(s):  
Marieke E. Ijsselsteijn ◽  
Antonios Somarakis ◽  
Boudewijn P. F. Lelieveldt ◽  
Thomas Höllt ◽  
Noel F. C. C. de Miranda

2020 ◽  
Vol 109 ◽  
pp. 158-171
Author(s):  
Wendi Feng ◽  
Chuanchang Liu ◽  
Zehua Guo ◽  
Thar Baker ◽  
Gang Wang ◽  
...  
Keyword(s):  

2014 ◽  
Vol 918 ◽  
pp. 295-300
Author(s):  
Peng Fei You ◽  
Yu Xing Peng ◽  
Zhen Huang ◽  
Chang Jian Wang

In distributed storage systems, erasure codes represent an attractive data redundancy solution which can provide the same reliability as replication requiring much less storage space. Multiple data losses happens usually and the lost data should be regenerated to maintain data redundancy in distributed storage systems. Regeneration for multiple data losses is expected to be finished as soon as possible, because the regeneration time can influence the data reliability and availability of distributed storage systems. However, multiple data losses is usually regenerated by regenerating single data loss one by one, which brings high entire regeneration time and severely reduces the data reliability and availability of distributed storage systems. In this paper, we propose a tree-structured parallel regeneration scheme based on regenerating codes (TPRORC) for multiple data losses in distributed storage systems. In our scheme, multiple regeneration trees based on regenerating code are constructed. Firstly, these trees are created independently, each of which dose not share any edges from the others and is responsible for one data loss; secondly, every regeneration tree based on regenerating codes owns the least network traffic and bandwidth optimized-paths for regenerating its data loss. Thus it can perform parallel regeneration for multiple data losses by using multiple optimized topology trees, in which network bandwidth is utilized efficiently and entire regeneration is overlapped. Our simulation results show that the tree-structured parallel regeneration scheme reduces the regeneration time significantly, compared to other regular regeneration schemes.


Sign in / Sign up

Export Citation Format

Share Document