Overcoming High Nanopore Basecaller Error Rates for DNA Storage via Basecaller-Decoder Integration and Convolutional Codes

ABSTRACT As magnetization- and semiconductor-based storage technologies approach their limits, biomolecules such as DNA have been identified as promising media for future storage systems, owing to their high storage density (petabytes/gram) and long-term durability (thousands of years). Furthermore, nanopore DNA sequencing enables high-throughput sequencing using devices as small as a USB thumb drive and is thus ideally suited for DNA storage applications. Because of the high insertion/deletion error rates associated with basecalled nanopore reads, current approaches rely heavily on consensus among multiple reads and thus incur very high reading costs. We propose a novel approach that overcomes the high error rates in basecalled sequences by integrating a Viterbi error correction decoder with the basecaller, enabling the decoder to exploit the soft information available in the deep-learning-based basecaller pipeline. Using convolutional codes for error correction, we experimentally observed 3x lower reading costs than state-of-the-art techniques at comparable writing costs. The code, data and Supplementary Material are available at https://github.com/shubhamchandak94/nanopore_dna_storage.
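For readers unfamiliar with the building blocks named in the abstract, the following is a minimal textbook sketch of a convolutional encoder and a Viterbi decoder. It is not the authors' method: it assumes a rate-1/2 code with generators (7, 5) in octal, hard-decision decoding, and substitution errors only, whereas the paper works with insertion/deletion errors and soft information from the basecaller. All function and variable names here are illustrative.

```python
# Illustrative assumptions: rate-1/2 code, generators (7, 5) octal,
# constraint length 3, hard-decision (Hamming distance) branch metrics.
G = (0b111, 0b101)
K = 3
N_STATES = 1 << (K - 1)


def parity(x):
    """Return the XOR of all bits of x."""
    return bin(x).count("1") % 2


def step(state, bit):
    """Advance the shift register by one input bit; return (next_state, outputs)."""
    reg = (bit << (K - 1)) | state
    out = tuple(parity(reg & g) for g in G)
    return reg >> 1, out


def encode(bits):
    """Encode bits, appending K-1 zeros so the trellis ends in state 0."""
    state, out = 0, []
    for b in list(bits) + [0] * (K - 1):
        state, sym = step(state, b)
        out.extend(sym)
    return out


def viterbi_decode(received):
    """Return the message whose codeword is closest in Hamming distance."""
    n_out = len(G)
    n_steps = len(received) // n_out
    inf = float("inf")
    metric = [0] + [inf] * (N_STATES - 1)  # encoder starts in state 0
    history = []
    for t in range(n_steps):
        sym = received[t * n_out:(t + 1) * n_out]
        new_metric = [inf] * N_STATES
        back = [None] * N_STATES
        for s in range(N_STATES):
            if metric[s] == inf:
                continue
            for b in (0, 1):
                ns, out = step(s, b)
                d = metric[s] + sum(x != y for x, y in zip(out, sym))
                if d < new_metric[ns]:
                    new_metric[ns] = d
                    back[ns] = (s, b)  # remember the surviving branch
        metric = new_metric
        history.append(back)
    # Trace back from state 0, where the terminated encoder must end.
    state, bits = 0, []
    for back in reversed(history):
        state, b = back[state]
        bits.append(b)
    bits.reverse()
    return bits[:n_steps - (K - 1)]  # drop the termination bits


if __name__ == "__main__":
    message = [1, 0, 1, 1, 0, 0, 1, 0]
    codeword = encode(message)
    noisy = list(codeword)
    noisy[3] ^= 1  # inject two substitution errors
    noisy[12] ^= 1
    print(viterbi_decode(noisy) == message)
```

A soft-decision variant would replace the Hamming-distance branch metric with a likelihood derived from the channel; the abstract describes obtaining such soft information from the basecaller rather than from already basecalled reads.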

Frame error rates for convolutional codes on fading channels and the concept of effective E_b/N_0

Proceedings of GLOBECOM 95 ◽

10.1109/ctmc.1995.502926 ◽

1995 ◽

Cited By ~ 1

Author(s):

Nanda ◽

Rege

Keyword(s):

Fading Channels ◽

Convolutional Codes ◽

Error Rates

Frame error rates for convolutional codes on fading channels and the concept of effective E_b/N_0

IEEE Transactions on Vehicular Technology ◽

10.1109/25.728513 ◽

1998 ◽

Vol 47 (4) ◽

pp. 1245-1250 ◽

Cited By ~ 50

Author(s):

S. Nanda ◽

K.M. Rege

Keyword(s):

Fading Channels ◽

Convolutional Codes ◽

Error Rates

Packet Error Rates of Terminated and Tailbiting Convolutional Codes

The International Series in Engineering and Computer Science - Advanced Signal Processing for Communication Systems ◽

10.1007/0-306-47791-2_12 ◽

2005 ◽

pp. 151-166 ◽

Cited By ~ 2

Author(s):

Johan Lassing ◽

Tony Ottosson ◽

Erik Ström

Keyword(s):

Convolutional Codes ◽

Error Rates

New Short Constraint Length, Rate 1/N Convolutional Codes Which Minimize the Required SNR for Given Desired Bit Error Rates

IEEE Transactions on Communications ◽

10.1109/tcom.1985.1096259 ◽

1985 ◽

Vol 33 (2) ◽

pp. 171-177 ◽

Cited By ~ 10

Author(s):

Pil Lee

Keyword(s):

Convolutional Codes ◽

Error Rates ◽

Bit Error Rates ◽

Constraint Length

Magnetic DNA random access memory with nanopore readouts and exponentially-scaled combinatorial addressing

10.1101/2021.09.15.460571 ◽

2021 ◽

Author(s):

Billy T Lau ◽

Shubham Chandak ◽

Sharmili Roy ◽

Kedar Tatawadi ◽

Mary Wootters ◽

...

Keyword(s):

Magnetic Beads ◽

Storage System ◽

Random Access ◽

Random Access Memory ◽

Error Rates ◽

Access Memory ◽

Convolutional Coding ◽

Dna Storage ◽

Data Elements ◽

Sequencing Instrument

The storage of data in DNA typically involves encoding and synthesizing data into short oligonucleotides, followed by reading with a sequencing instrument. Major challenges include the molecular consumption of synthesized DNA, issues with basecalling errors, and limitations with scaling up read access operations for individual data elements. Addressing these challenges, we describe a DNA storage system called MDRAM (Magnetic DNA-based Random Access Memory) that enables repetitive and efficient readouts of targeted files with nanopore-based sequencing. Through conjugation of synthesized DNA to magnetic beads, we enabled repeated readouts of data while preserving the original DNA analyte and maintaining data readout quality. MDRAM also utilizes an efficient convolutional coding scheme that leverages soft information in raw nanopore sequencing signals to achieve information reading costs comparable to Illumina sequencing despite substantially higher error rates. Finally, we demonstrate a proof-of-concept DNA-based proto-filesystem that enables an exponentially-scalable data address space using only small numbers of targeting primers for assembly and readout.

Perceptual Characteristics of Consonant Production in Apraxia of Speech and Aphasia

American Journal of Speech-Language Pathology ◽

10.1044/2019_ajslp-18-0169 ◽

2019 ◽

Vol 28 (4) ◽

pp. 1411-1431 ◽

Cited By ~ 3

Author(s):

Lauren Bislick ◽

William D. Hula

Keyword(s):

Error Rate ◽

Error Rates ◽

Group Differences ◽

Diagnostic Process ◽

Apraxia Of Speech ◽

Contextual Variables ◽

Error Type ◽

Type Distribution ◽

Phonetic Features ◽

Syllable Position

Purpose: This retrospective analysis examined group differences in error rate across 4 contextual variables (clusters vs. singletons, syllable position, number of syllables, and articulatory phonetic features) in adults with apraxia of speech (AOS) and adults with aphasia only. Group differences in the distribution of error type across contextual variables were also examined. Method: Ten individuals with acquired AOS and aphasia and 11 individuals with aphasia participated in this study. In the context of a 2-group experimental design, the influence of 4 contextual variables on error rate and error type distribution was examined via repetition of 29 multisyllabic words. Error rates were analyzed using Bayesian methods, whereas distribution of error type was examined via descriptive statistics. Results: There were 4 findings of robust differences between the 2 groups. These differences were found for syllable position, number of syllables, manner of articulation, and voicing. Group differences were less robust for clusters versus singletons and place of articulation. Results of error type distribution show a high proportion of distortion and substitution errors in speakers with AOS and a high proportion of substitution and omission errors in speakers with aphasia. Conclusion: Findings add to the continued effort to improve the understanding and assessment of AOS and aphasia. Several contextual variables more consistently influenced breakdown in participants with AOS compared to participants with aphasia and should be considered during the diagnostic process. Supplemental Material: https://doi.org/10.23641/asha.9701690

The Latin Square Task as a Measure of Relational Reasoning

European Journal of Psychological Assessment ◽

10.1027/1015-5759/a000520 ◽

2020 ◽

Vol 36 (2) ◽

pp. 296-302 ◽

Cited By ~ 1

Author(s):

Luke J. Hearne ◽

Damian P. Birney ◽

Luca Cocchi ◽

Jason B. Mattingley

Keyword(s):

Brain Imaging ◽

Response Times ◽

Latin Square ◽

Error Rates ◽

Functional Brain Imaging ◽

Inspection Time ◽

Relational Reasoning ◽

Validity And Reliability ◽

The Stability ◽

Test Retest Reliability

Abstract. The Latin Square Task (LST) is a relational reasoning paradigm developed by Birney, Halford, and Andrews (2006). Previous work has shown that the LST elicits typical reasoning complexity effects, such that increases in complexity are associated with decrements in task accuracy and increases in response times. Here we modified the LST for use in functional brain imaging experiments, in which presentation durations must be strictly controlled, and assessed its validity and reliability. Modifications included presenting the components within each trial serially, such that the reasoning and response periods were separated. In addition, the inspection time for each LST problem was constrained to five seconds. We replicated previous findings of higher error rates and slower response times with increasing relational complexity and observed relatively large effect sizes (ηp² > 0.70, r > .50). Moreover, measures of internal consistency and test-retest reliability confirmed the stability of the LST within and across separate testing sessions. Interestingly, we found that limiting the inspection time for individual problems in the LST had little effect on accuracy relative to the unconstrained times used in previous work, a finding that is important for future brain imaging experiments aimed at investigating the neural correlates of relational reasoning.

Does Viotin Activate Violin More Than Viocin?

Experimental Psychology (formerly Zeitschrift für Experimentelle Psychologie) ◽

10.1027/1618-3169/a000223 ◽

2014 ◽

Vol 61 (1) ◽

pp. 23-29 ◽

Cited By ~ 15

Author(s):

Manuel Perea ◽

Victoria Panadero

Keyword(s):

Word Recognition ◽

Visual Word Recognition ◽

Developmental Dyslexia ◽

Computational Models ◽

Error Rates ◽

Visual Word ◽

Base Word ◽

Similar Response ◽

Skilled Readers ◽

Young Readers

The vast majority of neural and computational models of visual-word recognition assume that lexical access is achieved via the activation of abstract letter identities. Thus, a word’s overall shape should play no role in this process. In the present lexical decision experiment, we compared word-like pseudowords like viotín (same shape as its base word: violín) vs. viocín (different shape) in mature (college-aged skilled readers), immature (normally reading children), and immature/impaired (young readers with developmental dyslexia) word-recognition systems. Results revealed similar response times (and error rates) to consistent-shape and inconsistent-shape pseudowords for both adult skilled readers and normally reading children – this is consistent with current models of visual-word recognition. In contrast, young readers with developmental dyslexia made significantly more errors to viotín-like pseudowords than to viocín-like pseudowords. Thus, unlike normally reading children, young readers with developmental dyslexia are sensitive to a word’s visual cues, presumably because of poor letter representations.

On error exponents for woven convolutional codes with outer warp and unequal error protection

European Transactions on Telecommunications ◽

10.1002/ett.4460140406 ◽

2003 ◽

Vol 14 (4) ◽

pp. 343-349

Author(s):

Viktor V. Zyablov ◽

Ralph Jordan

Keyword(s):

Convolutional Codes ◽

Unequal Error Protection ◽

Error Protection ◽

Error Exponents
