NOREC4DNA: using near-optimal rateless erasure codes for DNA storage

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Peter Michael Schwarz ◽  
Bernd Freisleben

Abstract. Background: DNA is a promising storage medium for high-density long-term digital data storage. Since DNA synthesis and sequencing are still relatively expensive tasks, the coding methods used to store digital data in DNA should correct errors and avoid unstable or error-prone DNA sequences. Near-optimal rateless erasure codes, also called fountain codes, are particularly interesting codes to realize high-capacity and low-error DNA storage systems, as shown by Erlich and Zielinski in their approach based on the Luby transform (LT) code. Since LT is the most basic fountain code, there is a large untapped potential for improvement in using near-optimal erasure codes for DNA storage. Results: We present NOREC4DNA, a software framework to use, test, compare, and improve near-optimal rateless erasure codes (NORECs) for DNA storage systems. These codes can effectively be used to store digital information in DNA and cope with the restrictions of the DNA medium. Additionally, they can adapt to possible variable lengths of DNA strands and have nearly zero overhead. We describe the design and implementation of NOREC4DNA. Furthermore, we present experimental results demonstrating that NOREC4DNA can flexibly be used to evaluate the use of NORECs in DNA storage systems. In particular, we show that NORECs that apparently have not yet been used for DNA storage, such as Raptor and Online codes, can achieve significant improvements over LT codes that were used in previous work. NOREC4DNA is available on https://github.com/umr-ds/NOREC4DNA. Conclusion: NOREC4DNA is a flexible and extensible software framework for using, evaluating, and comparing NORECs for DNA storage systems.
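To make the idea of a rateless (fountain) code concrete, the following is a minimal sketch of an LT-style encoder and peeling decoder. It is not the NOREC4DNA implementation and does not use its API: the function names (`lt_encode`, `lt_decode`, `roundtrip`), the use of the ideal soliton degree distribution, and the seed-based packet identification are all assumptions chosen for illustration.

```python
import random


def ideal_soliton(k):
    """Ideal soliton weights: P(1) = 1/k, P(d) = 1/(d*(d-1)) for d = 2..k."""
    return [1.0 / k] + [1.0 / (d * (d - 1)) for d in range(2, k + 1)]


def packet_indices(k, seed):
    """Chunk indices combined in the packet with this seed (same for encoder and decoder)."""
    rng = random.Random(seed)
    degree = rng.choices(range(1, k + 1), weights=ideal_soliton(k))[0]
    return set(rng.sample(range(k), degree))


def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def lt_encode(chunks, seed):
    """One packet: the XOR of a pseudo-random subset of equally sized chunks."""
    payload = bytes(len(chunks[0]))
    for i in packet_indices(len(chunks), seed):
        payload = xor_bytes(payload, chunks[i])
    return seed, payload


def lt_decode(packets, k):
    """Peeling decoder: returns the k chunks, or None if more packets are needed."""
    pending = [[packet_indices(k, seed), payload] for seed, payload in packets]
    decoded = {}
    progress = True
    while progress and len(decoded) < k:
        progress = False
        for entry in pending:
            indices, payload = entry
            # Strip chunks that are already known from this packet.
            for i in [i for i in indices if i in decoded]:
                payload = xor_bytes(payload, decoded[i])
                indices.discard(i)
            entry[1] = payload
            # A packet reduced to a single chunk reveals that chunk.
            if len(indices) == 1:
                decoded[indices.pop()] = payload
                progress = True
    if len(decoded) < k:
        return None
    return [decoded[i] for i in range(k)]


def roundtrip(chunks, max_packets=2000):
    """Keep emitting packets until decoding succeeds (the 'rateless' property)."""
    packets, recovered = [], None
    for seed in range(max_packets):
        packets.append(lt_encode(chunks, seed))
        recovered = lt_decode(packets, len(chunks))
        if recovered is not None:
            break
    return recovered, len(packets)
```

In a DNA storage setting, each packet would additionally be mapped to a strand and checked against sequence constraints; packets that violate them can simply be discarded and replaced by new ones, which is what makes rateless codes attractive here.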

2019 ◽  
Vol 15 (01) ◽  
pp. 1-8
Author(s):  
Ashish C Patel ◽  
C G Joshi

Current data storage technologies can no longer keep pace with the exponentially growing amounts of data produced by the extensive use of social networking photos and media, etc. The "digital world", which held 4.4 zettabytes in 2013, was predicted to reach 44 zettabytes by 2020. For the past 30 years, scientists and researchers have been trying to develop a robust way of storing data on a medium that is dense and long-lasting, and have found DNA to be the most promising storage medium. Unlike existing storage devices, DNA requires no maintenance beyond being kept in a cool, dark place. DNA is small and dense: just 1 gram of dry DNA can store about 455 exabytes of data. DNA stores information using four bases, viz., A, T, G, and C, while CDs, hard disks, and other devices store information as 0s and 1s on spiral tracks. In a DNA-based storage system, after the digital file is converted into binary code, encoding and decoding are the important steps. Once the digital file is encoded, the next step is to synthesize arbitrary single-stranded DNA sequences, which can be stored in a deep freeze until use. When the information needs to be recovered, this can be done using DNA sequencing. Next-generation sequencing (NGS) is capable of producing sequences with very high throughput at a much lower cost (less than 0.1 USD per MB of data) than the first sequencing technologies. Post-sequencing processing includes alignment of all reads using multiple sequence alignment (MSA) algorithms to obtain different consensus sequences. Each consensus sequence is then decoded by reversing the encoding process. Most prior DNA data storage efforts sequenced and decoded the entire amount of stored digital information with no random access, but it has now become possible to extract selected files (e.g., retrieving only a required image from a collection) from a DNA pool using PCR-based random access. 
Various scientists have successfully stored up to 110 zettabytes of data in one gram of DNA. In the future, with efficient encoding, error correction, and cheaper DNA synthesis and sequencing, DNA-based storage will become a practical solution for storing exponentially growing digital data.
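The encode, consensus, and decode steps described in this abstract can be sketched as follows. This is an illustrative sketch only: the 2-bits-per-base mapping (00→A, 01→C, 10→G, 11→T), the function names, and the column-wise majority vote over reads that are assumed to be already aligned and of equal length are assumptions, not the scheme used by the authors.

```python
from collections import Counter

# Assumed mapping: two bits per nucleotide (00->A, 01->C, 10->G, 11->T).
BASES = "ACGT"


def bytes_to_dna(data: bytes) -> str:
    """Encode each byte as four bases, most significant bit pair first."""
    return "".join(
        BASES[(byte >> shift) & 0b11] for byte in data for shift in (6, 4, 2, 0)
    )


def dna_to_bytes(strand: str) -> bytes:
    """Reverse of bytes_to_dna; the strand length must be a multiple of 4."""
    if len(strand) % 4:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


def consensus(reads):
    """Column-wise majority vote over pre-aligned, equal-length reads."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))
```

For example, `bytes_to_dna(b"A")` yields `"CAAC"` under this mapping, and taking the consensus of several noisy reads before calling `dna_to_bytes` recovers the original byte as long as each position is correct in a majority of reads. Real schemes add constraints (for example on homopolymers and GC content) and error correction on top of such a mapping.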


Author(s):  
Yanmin Gao ◽  
Xin Chen ◽  
Jianye Hao ◽  
Chengwei Zhang ◽  
Hongyan Qiao ◽  
...  

Abstract. In DNA data storage, the massive sequence complexity creates challenges for repeatable and efficient information readout. Here, our study clearly demonstrated that canonical polymerase chain reaction (PCR) created significant DNA amplification biases, which greatly hinder fast and stable data retrieval from hundred-thousand synthetic DNA sequences encoding over 2.85 megabytes (MB) of digital data. To mitigate the amplification bias, we adapted an isothermal DNA amplification for low-bias amplification of DNA pools with massive sequence complexity, and named the new method isothermal DNA reading (iDR). By using iDR, we were able to robustly and repeatedly retrieve the data stored in DNA strands attached to magnetic beads (MB) with significantly fewer sequencing reads than the PCR method required. Therefore, we believe that the low-bias iDR method provides an ideal platform for robust DNA data storage and for fast and reliable data readout.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Kyle J. Tomek ◽  
Kevin Volkel ◽  
Elaine W. Indermaur ◽  
James M. Tuck ◽  
Albert J. Keung

Abstract. DNA holds significant promise as a data storage medium due to its density, longevity, and resource and energy conservation. These advantages arise from the inherent biomolecular structure of DNA which differentiates it from conventional storage media. The unique molecular architecture of DNA storage also prompts important discussions on how data should be organized, accessed, and manipulated and what practical functionalities may be possible. Here we leverage thermodynamic tuning of biomolecular interactions to implement useful data access and organizational features. Specific sets of environmental conditions including distinct DNA concentrations and temperatures were screened for their ability to switchably access either all DNA strands encoding full image files from a GB-sized background database or subsets of those strands encoding low resolution, File Preview, versions. We demonstrate File Preview with four JPEG images and provide an argument for the substantial and practical economic benefit of this generalizable strategy to organize data.


Author(s):  
Samiha Abdelrahman Mohammed Marwan ◽  
Ahmed Shawish ◽  
Khaled Nagaty

There are continuous threats to network technologies due to their rapidly changing nature, which raises the demand for safe data transmission. As a result, it is crucial to come up with new techniques for securing data and accommodating the growing quantities of information. From nature to science, the idea that genes themselves are made of information stimulated research in molecular deoxyribonucleic acid (DNA). DNA is capable of storing huge amounts of data, which makes it promising for steganography. DNA steganography is the art of using DNA as an information carrier that achieves high data storage capacity as well as a high level of security. Currently, DNA steganography techniques utilize the properties of only one DNA strand, since the other strand is completely dependent on the first one. This paper presents a DNA-based steganography technique that hides data in both DNA strands while respecting the dependency between the two strands. In the proposed technique, a key of the same length as the reference DNA sequence is generated after using the second DNA strand. The sender sends both the encrypted DNA message and its reference DNA sequence together in a microdot. If the recipient receives this microdot uncontaminated, the sender can safely send the generated key afterwards. The proposed technique doubles the amount of data stored and also guarantees a secure transmission process, for even if the attacker suspects the first-sent DNA sequence, they will never receive the key, and hence full data extraction is nearly impossible. The conducted experimental study confirms the effectiveness of the proposed technique.
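The dependency between the two strands mentioned above follows from Watson-Crick base pairing (A pairs with T, C pairs with G): given one strand, the other is fully determined. The short sketch below illustrates only that dependency; it is not the steganography technique proposed in the paper, and the function name is a hypothetical one chosen for illustration.

```python
# Watson-Crick pairing: A<->T, C<->G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand: str) -> str:
    """Return the opposite strand, read in its own 5'-to-3' direction."""
    return strand.translate(COMPLEMENT)[::-1]
```

For instance, `reverse_complement("ATGC")` gives `"GCAT"`, and applying the function twice returns the original strand, which is why the second strand carries no independent information unless extra data, such as the key described above, is introduced.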


2018 ◽  
Author(s):  
Henry H. Lee ◽  
Reza Kalhor ◽  
Naveen Goela ◽  
Jean Bolot ◽  
George M. Church

Abstract. DNA is an emerging storage medium for digital data but its adoption is hampered by limitations of phosphoramidite chemistry, which was developed for single-base accuracy required for biological functionality. Here, we establish a de novo enzymatic DNA synthesis strategy designed from the bottom up for information storage. We harness a template-independent DNA polymerase for controlled synthesis of sequences with user-defined information content. We demonstrate retrieval of 144 bits, including addressing, from perfectly synthesized DNA strands using batch-processed Illumina and real-time Oxford Nanopore sequencing. We then develop a codec for data retrieval from populations of diverse but imperfectly synthesized DNA strands, each with a ~30% error tolerance. With this codec, we experimentally validate a kilobyte-scale design which stores 1 bit per nucleotide. Simulations of the codec support reliable and robust storage of information for large-scale systems. This work paves the way for alternative synthesis and sequencing strategies to advance information storage in DNA.


2019 ◽  
Author(s):  
Lee Organick ◽  
Yuan-Jyue Chen ◽  
Siena Dumas Ang ◽  
Randolph Lopez ◽  
Karin Strauss ◽  
...  

Abstract. Synthetic DNA has been gaining momentum as a potential storage medium for archival data storage [1–9]. Digital information is translated into sequences of nucleotides and the resulting synthetic DNA strands are then stored for later individual file retrieval via PCR [7–9] (Fig. 1a). Using a previously presented encoding scheme [9] and new experiments, we demonstrate reliable file recovery when as few as 10 copies per sequence are stored, on average. This results in density of about 17 exabytes/g, nearly two orders of magnitude greater than prior work has shown [6]. Further, no prior work has experimentally demonstrated access to specific files in a pool more complex than approximately 10^6 unique DNA sequences [9], leaving the issue of accurate file retrieval at high data density and complexity unexamined. Here, we demonstrate successful PCR random access using three files of varying sizes in a complex pool of over 10^10 unique sequences, with no evidence that we have begun to approach complexity limits. We further investigate the role of file size on successful data recovery, the effect of increasing sequencing coverage to aid file recovery, and whether DNA strands drop out of solution in a systematic manner. These findings substantiate the robustness of PCR as a random access mechanism in complex settings, and that the number of copies needed for data retrieval does not compromise density significantly.


2020 ◽  
Author(s):  
Min Hao ◽  
Hongyan Qiao ◽  
Yanmin Gao ◽  
Zhaoguan Wang ◽  
Xin Qiao ◽  
...  

Abstract. DNA has emerged as a novel material for mass data storage, a serious problem facing human society. Taking advantage of current synthesis capacity, massive oligo pools have demonstrated high potential for data storage in a test tube. Herein, a mixed culture of bacterial cells carrying a mass oligo pool assembled in a high-copy plasmid is presented as a stable material for large-scale data storage. Living-cell data storage was fabricated by a multistep process of assembly, transformation, and mixed culture. The underlying principle was explored by deep bioinformatic analysis. Although homology assembly showed sequence-context-dependent bias, the massive digital information oligos in the mixed culture remained constant over multiple successive passages. Pushing the limits, over ten thousand distinct oligos, totaling 2304 Kbps and encoding 445 KB of digital data including texts and images, were stored in bacterial cells, the largest archival data storage in living cells reported so far. The mixed culture of living-cell data storage opens up a new approach to simply bridge in vitro and in vivo storage systems, combining the advantages of storage capability and economical information propagation.


Entropy ◽  
2021 ◽  
Vol 23 (12) ◽  
pp. 1592
Author(s):  
Thi-Huong Khuat ◽  
Sunghwan Kim

Due to the properties of DNA data storage, the errors that occur in DNA strands make error correction an important and challenging task. In this paper, a new code design of quaternary code suitable for DNA storage is proposed to correct at most two consecutive deletion or insertion errors. The decoding algorithms of the proposed codes are also presented when one and two deletion or insertion errors occur, and it is proved that the proposed code can correct at most two consecutive errors. Moreover, the lower and upper bounds on the cardinality of the proposed quaternary codes are also evaluated, then the redundancy of the proposed code is provided as roughly 2 log_4 8n.

