scholarly journals Overcoming High Nanopore Basecaller Error Rates for DNA Storage via Basecaller-Decoder Integration and Convolutional Codes

Author(s):  
Shubham Chandak ◽  
Joachim Neu ◽  
Kedar Tatwawadi ◽  
Jay Mardia ◽  
Billy Lau ◽  
...  
2019 ◽  
Author(s):  
Shubham Chandak ◽  
Joachim Neu ◽  
Kedar Tatwawadi ◽  
Jay Mardia ◽  
Billy Lau ◽  
...  

ABSTRACTAs magnetization and semiconductor based storage technologies approach their limits, bio-molecules, such as DNA, have been identified as promising media for future storage systems, due to their high storage density (petabytes/gram) and long-term durability (thousands of years). Furthermore, nanopore DNA sequencing enables high-throughput sequencing using devices as small as a USB thumb drive and thus is ideally suited for DNA storage applications. Due to the high insertion/deletion error rates associated with basecalled nanopore reads, current approaches rely heavily on consensus among multiple reads and thus incur very high reading costs. We propose a novel approach which overcomes the high error rates in basecalled sequences by integrating a Viterbi error correction decoder with the basecaller, enabling the decoder to exploit the soft information available in the deep learning based basecaller pipeline. Using convolutional codes for error correction, we experimentally observed 3x lower reading costs than the state-of-the-art techniques at comparable writing costs.The code, data and Supplementary Material is available at https://github.com/shubhamchandak94/nanopore_dna_storage.


2021 ◽  
Author(s):  
Billy T Lau ◽  
Shubham Chandak ◽  
Sharmili Roy ◽  
Kedar Tatawadi ◽  
Mary Wootters ◽  
...  

The storage of data in DNA typically involves encoding and synthesizing data into short oligonucleotides, followed by reading with a sequencing instrument. Major challenges include the molecular consumption of synthesized DNA, issues with basecalling errors, and limitations with scaling up read access operations for individual data elements. Addressing these challenges, we describe a DNA storage system called MDRAM (Magnetic DNA-based Random Access Memory) that enables repetitive and efficient readouts of targeted files with nanopore-based sequencing. Through conjugation of synthesized DNA to magnetic beads, we enabled repeated readouts of data while preserving the original DNA analyte and maintaining data readout quality. MDRAM also utilizes an efficient convolutional coding scheme that leverages soft information in raw nanopore sequencing signals to achieve information reading costs comparable to Illumina sequencing despite substantially higher error rates. Finally, we demonstrate a proof-of-concept DNA-based proto-filesystem that enables an exponentially-scalable data address space using only small numbers of targeting primers for assembly and readout.


2019 ◽  
Vol 28 (4) ◽  
pp. 1411-1431 ◽  
Author(s):  
Lauren Bislick ◽  
William D. Hula

Purpose This retrospective analysis examined group differences in error rate across 4 contextual variables (clusters vs. singletons, syllable position, number of syllables, and articulatory phonetic features) in adults with apraxia of speech (AOS) and adults with aphasia only. Group differences in the distribution of error type across contextual variables were also examined. Method Ten individuals with acquired AOS and aphasia and 11 individuals with aphasia participated in this study. In the context of a 2-group experimental design, the influence of 4 contextual variables on error rate and error type distribution was examined via repetition of 29 multisyllabic words. Error rates were analyzed using Bayesian methods, whereas distribution of error type was examined via descriptive statistics. Results There were 4 findings of robust differences between the 2 groups. These differences were found for syllable position, number of syllables, manner of articulation, and voicing. Group differences were less robust for clusters versus singletons and place of articulation. Results of error type distribution show a high proportion of distortion and substitution errors in speakers with AOS and a high proportion of substitution and omission errors in speakers with aphasia. Conclusion Findings add to the continued effort to improve the understanding and assessment of AOS and aphasia. Several contextual variables more consistently influenced breakdown in participants with AOS compared to participants with aphasia and should be considered during the diagnostic process. Supplemental Material https://doi.org/10.23641/asha.9701690


2020 ◽  
Vol 36 (2) ◽  
pp. 296-302 ◽  
Author(s):  
Luke J. Hearne ◽  
Damian P. Birney ◽  
Luca Cocchi ◽  
Jason B. Mattingley

Abstract. The Latin Square Task (LST) is a relational reasoning paradigm developed by Birney, Halford, and Andrews (2006) . Previous work has shown that the LST elicits typical reasoning complexity effects, such that increases in complexity are associated with decrements in task accuracy and increases in response times. Here we modified the LST for use in functional brain imaging experiments, in which presentation durations must be strictly controlled, and assessed its validity and reliability. Modifications included presenting the components within each trial serially, such that the reasoning and response periods were separated. In addition, the inspection time for each LST problem was constrained to five seconds. We replicated previous findings of higher error rates and slower response times with increasing relational complexity and observed relatively large effect sizes (η2p > 0.70, r > .50). Moreover, measures of internal consistency and test-retest reliability confirmed the stability of the LST within and across separate testing sessions. Interestingly, we found that limiting the inspection time for individual problems in the LST had little effect on accuracy relative to the unconstrained times used in previous work, a finding that is important for future brain imaging experiments aimed at investigating the neural correlates of relational reasoning.


Author(s):  
Manuel Perea ◽  
Victoria Panadero

The vast majority of neural and computational models of visual-word recognition assume that lexical access is achieved via the activation of abstract letter identities. Thus, a word’s overall shape should play no role in this process. In the present lexical decision experiment, we compared word-like pseudowords like viotín (same shape as its base word: violín) vs. viocín (different shape) in mature (college-aged skilled readers), immature (normally reading children), and immature/impaired (young readers with developmental dyslexia) word-recognition systems. Results revealed similar response times (and error rates) to consistent-shape and inconsistent-shape pseudowords for both adult skilled readers and normally reading children – this is consistent with current models of visual-word recognition. In contrast, young readers with developmental dyslexia made significantly more errors to viotín-like pseudowords than to viocín-like pseudowords. Thus, unlike normally reading children, young readers with developmental dyslexia are sensitive to a word’s visual cues, presumably because of poor letter representations.


Sign in / Sign up

Export Citation Format

Share Document