Quantifying Molecular Bias in DNA Data Storage

2019
Author(s): Yuan-Jyue Chen, Christopher N. Takahashi, Lee Organick, Kendall Stewart, Siena Dumas Ang, ...

DNA has recently emerged as an attractive medium for future digital data storage because of its extremely high information density and potential longevity. Recent work has shown promising results in developing proof-of-principle prototype systems. However, very uneven (biased) sequencing coverage distributions have been reported, which indicates inefficiencies in the storage process and points to optimization opportunities. These deviations from the average coverage in oligonucleotide copy distribution result in sequence drop-out and make error-free data retrieval from DNA more challenging. The uneven copy distribution was believed to stem from the underlying molecular processes, but the interplay between these molecular processes and the copy number distribution has been poorly understood until now. In this paper, we use millions of unique sequences from a DNA-based digital data archival system to study the oligonucleotide copy unevenness problem and show that two important sources of bias are the synthesis process and the Polymerase Chain Reaction (PCR) process. By mapping the sequencing coverage of a large complex oligonucleotide pool back to its spatial distribution on the synthesis chip, we find that significant bias comes from array-based oligonucleotide synthesis. We also find that PCR stochasticity is another main driver of oligonucleotide copy variation. Based on these findings, we develop a statistical model for each molecular process as well as the overall process and compare the predicted bias with our experimental data. We further use our model to explore the trade-offs between synthesis bias, storage physical density and sequencing redundancy, providing insights for engineering efficient, robust DNA data storage systems.
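The abstract names synthesis and PCR as the two main sources of copy-number bias. The Python sketch below is not the authors' statistical model. It is a minimal simulation under assumed distributions: log-normal synthesis yield, PCR as a Galton-Watson branching process with a fixed per-cycle efficiency, and sequencing as sampling in proportion to copy number. It shows how such bias turns into sequence drop-out. All function names and parameter values are hypothetical.

```python
import math
import random
from collections import Counter


def synthesize(num_oligos, mean_copies, sigma, rng):
    """Initial copy number per oligo after array synthesis.

    Assumption: per-spot synthesis yield is log-normal with mean
    `mean_copies`; sigma=0.0 means perfectly even synthesis.
    """
    mu = math.log(mean_copies) - sigma ** 2 / 2  # keeps the mean at mean_copies
    return [round(rng.lognormvariate(mu, sigma)) for _ in range(num_oligos)]


def pcr(copies, cycles, efficiency, rng):
    """Amplify one oligo species as a Galton-Watson branching process:
    in every cycle each molecule is duplicated with probability `efficiency`."""
    for _ in range(cycles):
        copies += sum(rng.random() < efficiency for _ in range(copies))
    return copies


def sequence(pool, reads, rng):
    """Draw `reads` reads with probability proportional to copy number."""
    if sum(pool) == 0:
        return [0] * len(pool)
    hits = Counter(rng.choices(range(len(pool)), weights=pool, k=reads))
    return [hits[i] for i in range(len(pool))]


def dropout_rate(coverage):
    """Fraction of oligos that received no reads at all."""
    return sum(c == 0 for c in coverage) / len(coverage)


rng = random.Random(0)
initial = synthesize(num_oligos=200, mean_copies=10, sigma=0.5, rng=rng)
pool = [pcr(c, cycles=8, efficiency=0.9, rng=rng) for c in initial]
coverage = sequence(pool, reads=2000, rng=rng)
print(f"drop-out rate: {dropout_rate(coverage):.3f}")
```

Raising `sigma` or lowering `efficiency` widens the spread of copy numbers; with a fixed read budget this tends to raise the drop-out rate, which is the kind of trade-off between synthesis bias and sequencing redundancy the abstract describes.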


2020
Author(s): Lee Organick, Bichlien H. Nguyen, Rachel McAmis, Weida D. Chen, A. Xavier Kohll, ...

Synthetic DNA has recently emerged as a viable alternative for long-term digital data storage. To ensure that information is safely recovered after storage, it is essential to appropriately preserve the physical DNA molecules encoding the data. While preservation of biological DNA has been studied previously, synthetic DNA differs in that it is typically much shorter, it has different sequence profiles with fewer repeats (or homopolymers), if any, and it has different contaminants. In this paper, we evaluate nine different methods for preserving data files encoded in synthetic DNA by accelerated aging of nearly 29,000 DNA sequences. In addition to comparing molecular counts, we also sequence and analyze the DNA after aging. Our findings show that errors and erasures are stochastic and that their distributions show no practical difference between preservation methods. Finally, we compare the physical density of these methods and discuss the trade-off between stability and density.
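The abstract does not say how accelerated-aging time is converted into equivalent storage time. One common way to interpret such experiments is the Arrhenius model, sketched below in Python. The activation energy, temperatures, and aging duration are assumed values for illustration, not figures from the paper.

```python
import math

GAS_CONSTANT = 8.314  # J/(mol*K)


def acceleration_factor(ea_j_per_mol, t_storage_c, t_aging_c):
    """Arrhenius acceleration factor between an elevated aging temperature
    and a storage temperature (both given in degrees Celsius)."""
    t_storage = t_storage_c + 273.15
    t_aging = t_aging_c + 273.15
    return math.exp(ea_j_per_mol / GAS_CONSTANT * (1 / t_storage - 1 / t_aging))


# Hypothetical numbers, for illustration only.
EA = 155_000  # J/mol, an assumed activation energy for DNA decay
factor = acceleration_factor(EA, t_storage_c=10, t_aging_c=70)
days_aged = 7
print(f"{days_aged} days at 70 C ~ {days_aged * factor / 365:.0f} years at 10 C")
```

The result is very sensitive to the assumed activation energy, so any real conversion should use a value measured for the specific preservation method.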



2020
Author(s): Yeongjae Choi, Hyung Jong Bae, Amos C. Lee, Hansol Choi, Daewon Lee, ...

DNA-based data storage has attracted attention because it offers higher physical data density and longer retention time than conventional digital data storage [1–7]. However, previous DNA-based data storage systems lacked indexing features, and the quality of the stored data was not preserved after a single access, obstructing industrial use. Here, we propose DNA micro-disks: quick response (QR)-coded, micro-sized disks that harbour data-encoded DNA molecules for efficient management of DNA-based data storage. We demonstrate two major features that previous DNA-based data storage studies could not achieve. The first is efficient access to data items by indexing the data-encoded DNA library. The second is write-once-read-many (WORM) memory, achieved by immobilizing DNA molecules on the disk and enriching them through in situ DNA production. Together, these features increase the reliability of DNA-based data storage by allowing the data-encoded DNA to be accessed multiple times without data loss.



2019
Author(s): Lee Organick, Yuan-Jyue Chen, Siena Dumas Ang, Randolph Lopez, Karin Strauss, ...

Synthetic DNA has been gaining momentum as a potential medium for archival data storage [1–9]. Digital information is translated into sequences of nucleotides, and the resulting synthetic DNA strands are stored for later retrieval of individual files via PCR [7–9] (Fig. 1a). Using a previously presented encoding scheme [9] and new experiments, we demonstrate reliable file recovery when as few as 10 copies per sequence are stored, on average. This corresponds to a density of about 17 exabytes/g, nearly two orders of magnitude greater than prior work has shown [6]. Further, no prior work has experimentally demonstrated access to specific files in a pool more complex than approximately 10^6 unique DNA sequences [9], leaving the issue of accurate file retrieval at high data density and complexity unexamined. Here, we demonstrate successful PCR random access of three files of varying sizes in a complex pool of over 10^10 unique sequences, with no evidence that we have begun to approach complexity limits. We further investigate the role of file size in successful data recovery, the effect of increasing sequencing coverage to aid file recovery, and whether DNA strands drop out of solution in a systematic manner. These findings substantiate the robustness of PCR as a random-access mechanism in complex settings and show that the number of copies needed for data retrieval does not significantly compromise density.
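A density figure such as 17 exabytes/g follows from how many bytes each oligo carries, how heavy one oligo is, and how many physical copies of each sequence are stored. The Python sketch below shows that arithmetic, plus the probability that a sequence is physically absent if copy numbers in an aliquot are Poisson-distributed (an assumption). The oligo length, payload per oligo, and the ~330 g/mol average nucleotide mass are illustrative assumptions, so the output is not meant to reproduce the paper's number.

```python
import math

AVOGADRO = 6.022e23        # molecules per mole
NT_MASS_G_PER_MOL = 330.0  # assumed average mass of one ssDNA nucleotide


def oligo_mass_g(length_nt):
    """Approximate mass in grams of a single oligo of `length_nt` nucleotides."""
    return length_nt * NT_MASS_G_PER_MOL / AVOGADRO


def density_bytes_per_g(payload_bytes, length_nt, mean_copies):
    """Bytes stored per gram when every unique sequence is present
    `mean_copies` times on average."""
    return payload_bytes / (mean_copies * oligo_mass_g(length_nt))


def poisson_absent(mean_copies):
    """P(a sequence has zero physical copies) if copies are Poisson-distributed."""
    return math.exp(-mean_copies)


# Hypothetical oligo design: 150 nt carrying 20 bytes of payload.
d = density_bytes_per_g(payload_bytes=20, length_nt=150, mean_copies=10)
print(f"density ~ {d / 1e18:.0f} EB/g; P(absent) ~ {poisson_absent(10):.1e}")
```

Density scales inversely with the mean copy number, which is why showing reliable recovery at about 10 copies per sequence matters for the density claim.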



2019
Author(s): Kaikai Chen, Jinbo Zhu, Filip Boskovic, Ulrich F. Keyser

DNA is emerging as a novel material for digital data storage. The two main challenges are efficient encoding and data security. Here, we develop an approach that allows data to be written and erased by relying solely on Watson-Crick base pairing of short oligonucleotides to single-stranded DNA overhangs located along a long double-stranded DNA hard drive (DNA-HD). Our enzyme-free system enables fast, synthesis-free data writing with predetermined building blocks. The use of DNA base pairing allows for secure encryption on DNA-HDs that requires a physical key and nanopore sensing for decoding. The system is suitable for miniaturized integration into an end-to-end DNA storage device. Our study opens a novel pathway for rewritable and secure data storage with DNA. One Sentence Summary: Storing digital information on molecules along DNA hard drives for rewritable and secure data storage.
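To make the write/erase/read cycle concrete, here is a toy Python model of a DNA-HD in which each overhang holds one bit: 1 if a short complementary oligo is hybridized there, 0 if the overhang is free. The class and method names are hypothetical; the chemistry of erasing and the nanopore readout are reduced to simple list operations, and the encryption scheme is not modeled.

```python
class DnaHardDrive:
    """Toy DNA hard drive: a long duplex carrying `n_sites` single-stranded
    overhangs. A site reads as 1 when a short complementary oligo is
    hybridized to its overhang and as 0 when the overhang is free."""

    def __init__(self, n_sites: int):
        self.bound = [False] * n_sites

    def write(self, bits: str) -> None:
        """Hybridize an oligo at every site whose bit is '1'."""
        if len(bits) != len(self.bound) or set(bits) - {"0", "1"}:
            raise ValueError("expected a 0/1 string with one bit per site")
        for i, bit in enumerate(bits):
            self.bound[i] = bit == "1"

    def erase(self, sites=None) -> None:
        """Release the oligos at the given sites (all sites if None).
        How the oligos are removed chemically is not modeled here."""
        for i in (range(len(self.bound)) if sites is None else sites):
            self.bound[i] = False

    def read(self) -> str:
        """Stand-in for nanopore readout: report each site as bound or free."""
        return "".join("1" if b else "0" for b in self.bound)


hd = DnaHardDrive(8)
hd.write("10110010")
hd.erase([0])
print(hd.read())  # 00110010
```

Because writing only places predetermined oligos at fixed sites, no new DNA has to be synthesized per write, which is the property the abstract calls synthesis-free.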



2021
Author(s): Cheuk Chi A. Ng, Wai Man Tam, Haidi Yin, Qian Wu, Pui-Kin So, ...

Since the beginning of civilization, data storage media have continuously evolved, from stone tablets, animal bones, and bamboo tablets to paper, with improvements in data density over time. Since the invention of electronics in the last century, the percentage of data stored in digital form has increased rapidly, recently approaching 100%. Moreover, the amount of data generated has been increasing exponentially, from several ZB in 2008 to an expected 74 ZB in 2021, correspondingly increasing the demand for data storage. Most digital data are stored on physical media such as hard drives. In addition, much of the data is rarely accessed and is archived on reels of magnetic tape. However, the physical thickness of the tapes and the size of magnetic domains limit the maximum data density, which is expected to plateau soon. Furthermore, data on old tapes must be copied onto new tapes regularly, as magnetic tapes normally last only ten to twenty years. This process is time-consuming and expensive. Hence, next-generation media that can store digital data at much higher density and with much greater durability are needed.

Here we report the use of peptide sequences for digital data storage, a method that has not been reported before. The data-bearing peptides are commercially synthesized, and the data retrieval process is described here. As examples, we stored (i) one dataset consisting of 848 bits of ASCII-formatted text in 40 peptides, and (ii) another dataset consisting of 13,752 bits of the "Silent Night" music in MIDI format, together with its title in ASCII format, in 511 peptides. These files are available in the Supplementary Files section.
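The abstract does not describe how bits are mapped to amino acids. As a hypothetical illustration only, the Python sketch below packs 4 bits into each residue using a 16-letter subset of the 20 standard amino acids (it leaves out, for example, isoleucine, which has the same mass as leucine and cannot be told apart from it by mass alone) and splits the result into fixed-length peptides. The alphabet, peptide length, and function names are assumptions, not the authors' scheme.

```python
# Hypothetical 16-residue alphabet (one residue = 4 bits). It leaves out
# C, I, M and Q from the 20 standard amino acids; I, for example, has the
# same mass as L.
ALPHABET = "ARNDEGHLKFPSTWYV"
INDEX = {aa: i for i, aa in enumerate(ALPHABET)}


def encode_peptides(data: bytes, residues_per_peptide: int = 10) -> list[str]:
    """Map each byte to two residues (high nibble, then low nibble) and
    cut the residue string into peptides of at most `residues_per_peptide`."""
    seq = "".join(ALPHABET[b >> 4] + ALPHABET[b & 0x0F] for b in data)
    return [seq[i:i + residues_per_peptide]
            for i in range(0, len(seq), residues_per_peptide)]


def decode_peptides(peptides: list[str]) -> bytes:
    """Invert encode_peptides, assuming the peptides are kept in order."""
    seq = "".join(peptides)
    return bytes((INDEX[seq[i]] << 4) | INDEX[seq[i + 1]]
                 for i in range(0, len(seq), 2))


print(encode_peptides(b"Hi"))  # ['EKHF']
```

Real peptide pools are unordered, so a practical scheme would also need an address field in each peptide; this sketch relies on list order instead.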



2018
Vol 6 (3), pp. 359-363
Author(s): A. Saxena, S. Sharma, S. Dangi, A. Sharma, ...


2019
Vol 15 (01), pp. 1-8
Author(s): Ashish C Patel, C G Joshi

Current data storage technologies can no longer keep pace with the exponentially growing amount of data produced through the extensive use of social networking, photos, media, etc. The "digital world", estimated at 4.4 zettabytes in 2013, was predicted to reach 44 zettabytes by 2020. For the past 30 years, scientists and researchers have been trying to develop a robust way of storing data on a medium that is dense and long-lasting, and they have found DNA to be the most promising storage medium. Unlike existing storage devices, DNA requires no maintenance beyond being stored in a cool, dark place. DNA is small and dense; just 1 gram of dry DNA can store about 455 exabytes of data. DNA stores information using four bases, viz., A, T, G, and C, whereas CDs, hard disks, and other devices store information as 0s and 1s on tracks. In DNA-based storage, after a digital file is binarized into binary code, encoding and decoding are the key steps. Once the digital file is encoded, the next step is to synthesize arbitrary single-stranded DNA sequences, which can be kept in deep freeze until use. When the information needs to be recovered, this can be done by DNA sequencing. Next-generation sequencing (NGS) can produce sequences with very high throughput at a much lower cost than the first sequencing technologies, less than about 0.1 USD per MB of data. Post-sequencing processing includes aligning all reads with multiple sequence alignment (MSA) algorithms to obtain consensus sequences. The consensus sequence is decoded by reversing the encoding process. Most prior DNA data storage efforts sequenced and decoded the entire amount of stored digital information, with no random access, but it is now possible to extract selected files (e.g., retrieving only a required image from a collection) from a DNA pool using PCR-based random access.
Various scientists have reported storing up to 110 zettabytes of data in one gram of DNA. In the future, with efficient encoding, error correction, and cheaper DNA synthesis and sequencing, DNA-based storage will become a practical solution for storing exponentially growing digital data.
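The binarize, encode, consensus, and decode steps described above can be sketched in a few lines of Python. This sketch assumes the simplest possible mapping of two bits per base (00→A, 01→C, 10→G, 11→T) and replaces multiple sequence alignment with a per-position majority vote over reads that are already aligned and of equal length. Real systems add error correction and avoid homopolymers, which this mapping does not; all function names are hypothetical.

```python
from collections import Counter

# Simplest possible mapping (an assumption): two bits per nucleotide.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def binarize(data: bytes) -> str:
    """Turn a digital file into a string of 0s and 1s, 8 bits per byte."""
    return "".join(f"{b:08b}" for b in data)


def bytes_to_dna(data: bytes) -> str:
    """Encode: read the bit string two bits at a time and emit one base each."""
    bits = binarize(data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def consensus(reads: list[str]) -> str:
    """Per-position majority vote over reads that are already aligned and of
    equal length; a simplified stand-in for MSA-based consensus calling."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*reads))


def dna_to_bytes(dna: str) -> bytes:
    """Decode: reverse the mapping, then regroup the bits into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in dna)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = bytes_to_dna(b"Hi")
reads = [strand, strand[:1] + "T" + strand[2:], strand]  # middle read has one error
print(strand, dna_to_bytes(consensus(reads)))  # CAGACGGC b'Hi'
```

The majority vote corrects the single substitution in the middle read, which is why sequencing each strand several times helps recover the original file.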



2011
Vol 20 (06), pp. 1019-1035
Author(s): SAMBHU NATH PRADHAN, M. TILAK KUMAR, SANTANU CHATTOPDHYAY

In this paper, we present a genetic-algorithm-based heuristic that realizes multi-output Boolean functions as three-level AND-OR-XOR networks while trading off area against power. All previous works dealt only with minimizing the number of product terms in the two sum-of-products expressions representing a Boolean function during AND-OR-XOR network synthesis. To the best of our knowledge, this is the first effort to incorporate total power, that is, dynamic and leakage power, along with area (in terms of the number of product terms) during three-level AND-OR-XOR network synthesis. Without changing delay performance, the synthesis process yields fewer product terms than those reported in the literature. It also enumerates the trade-offs present in the solution space for different weights associated with the area, dynamic power, and leakage power of the resulting circuit.
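Enumerating trade-offs "for different weights" amounts to scoring candidate networks with a weighted sum of area and power terms. The Python sketch below shows only that weighted-sum cost (the kind of objective a genetic algorithm could use as its fitness), not the genetic algorithm itself; the candidate networks and their numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    product_terms: int    # area proxy
    dynamic_power: float  # arbitrary units
    leakage_power: float  # arbitrary units


def cost(c, w_area, w_dyn, w_leak):
    """Weighted-sum objective over area, dynamic power and leakage power."""
    return w_area * c.product_terms + w_dyn * c.dynamic_power + w_leak * c.leakage_power


def best(candidates, weights):
    """Candidate with the lowest weighted cost for a (w_area, w_dyn, w_leak) tuple."""
    return min(candidates, key=lambda c: cost(c, *weights))


# Hypothetical candidate networks for the same multi-output function.
candidates = [
    Candidate("min-area", product_terms=12, dynamic_power=9.0, leakage_power=4.0),
    Candidate("balanced", product_terms=14, dynamic_power=6.0, leakage_power=3.0),
    Candidate("low-power", product_terms=18, dynamic_power=4.0, leakage_power=2.0),
]
for weights in [(1.0, 0.0, 0.0), (0.5, 0.5, 0.5), (0.0, 1.0, 1.0)]:
    print(weights, "->", best(candidates, weights).name)
# (1.0, 0.0, 0.0) -> min-area
# (0.5, 0.5, 0.5) -> balanced
# (0.0, 1.0, 1.0) -> low-power
```

Shifting weight from area to power moves the preferred solution from the smallest network to the lowest-power one, which is the kind of solution-space trade-off the abstract reports.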



Author(s): Primasatria Edastama, Ninda Lutfiani, Qurotul Aini, Suryari Purnama, Isabella Yaumil Annisa

As an innovation in computing, blockchain has many benefits and is widely applied, including in education. A blockchain is a digital data storage system that consists of many servers (multiserver). In blockchain technology, data created by one server can be replicated and verified by another server. With its decentralized design and strong cryptography, this technology can help colleges and universities build infrastructure for archiving transcripts and diplomas. One application of blockchain technology in education is iBC, the e-learning Blockchain Certificate; others include book copyright and e-Portfolios. iBC is a tool designed to create, verify, and issue blockchain certificates, supporting certificates that are globally verifiable and stored in a decentralized manner. Here we present use cases relevant to blockchain technology in educational environments, especially data processing in universities, and we also attempt to design an iBC based on blockchain technology that can support the transparency and accountability of colleges and universities in issuing diplomas and grades.
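The claim that data written by one server can be replicated and verified by another rests on hash-linking records. The Python sketch below is a minimal hash chain for certificate records, not the iBC design: each block stores the SHA-256 hash of the previous block, so altering an earlier certificate invalidates the link that follows it. The function names and record fields are hypothetical.

```python
import hashlib
import json


def block_hash(block):
    """SHA-256 over a canonical JSON encoding of the block."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append(chain, record):
    """Append a certificate record, linking it to the previous block's hash."""
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"index": len(chain), "prev_hash": prev, "record": record})


def verify(chain):
    """Recompute every link; editing any earlier block breaks the chain."""
    for i in range(1, len(chain)):
        if chain[i]["prev_hash"] != block_hash(chain[i - 1]):
            return False
    return True


ledger = []
append(ledger, {"student": "S-001", "degree": "BSc", "grade": "A"})
append(ledger, {"student": "S-002", "degree": "BSc", "grade": "B"})
print(verify(ledger))  # True
ledger[0]["record"]["grade"] = "A+"
print(verify(ledger))  # False
```

This check alone cannot detect a change to the newest block; in a replicated setting, servers would also compare the hash of their latest block with one another.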



1998
Author(s): Kai-Oliver Mueller, Cornelia Denz, Torsten Rauch, Thorsten Heimann, J. Trumpfheller, ...

