Deoxyribonucleic Acid as a Tool for Digital Information Storage: An Overview

2019 ◽  
Vol 15 (01) ◽  
pp. 1-8
Author(s):  
Ashish C Patel ◽  
C G Joshi

Current data storage technologies can no longer keep pace with the exponentially growing amounts of data generated by the extensive use of social networking, photos, and other media. The "digital world", which held 4.4 zettabytes in 2013, was predicted to reach 44 zettabytes by 2020. Over the past 30 years, scientists and researchers have been trying to develop a robust way of storing data on a medium that is dense and long-lasting, and have identified DNA as the most promising storage medium. Unlike existing storage devices, DNA requires no maintenance beyond storage in a cool, dark place. DNA is small and dense: just 1 gram of dry DNA can store about 455 exabytes of data. DNA stores information using four bases (A, T, G, and C), whereas CDs, hard disks, and other devices store information as 0s and 1s on tracks. In DNA-based storage, once a digital file has been converted into binary code, encoding and decoding are the key steps. Once the digital file is encoded, the next step is to synthesize arbitrary single-stranded DNA sequences, which can be stored in a deep freeze until use. When the information needs to be recovered, this is done by DNA sequencing. Next-generation sequencing (NGS) can produce sequences at very high throughput and at a much lower cost (less than 0.1 USD per MB of data) than the first sequencing technologies. Post-sequencing processing includes alignment of all reads using multiple sequence alignment (MSA) algorithms to obtain consensus sequences. Each consensus sequence is then decoded by reversing the encoding process. Most prior DNA data storage efforts sequenced and decoded the entire body of stored digital information with no random access, but it has now become possible to extract selected files (e.g., retrieving only a required image from a collection) from a DNA pool using PCR-based random access.
Various scientists have successfully stored up to 110 zettabytes of data in one gram of DNA. In the future, with efficient encoding, error correction, and cheaper DNA synthesis and sequencing, DNA-based storage will become a practical solution for storing exponentially growing digital data.
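The encode, sequence, and decode steps described in this abstract can be illustrated with a minimal Python sketch. The two-bits-per-base mapping and the function names below are assumptions chosen for illustration, not the scheme of any particular paper; practical schemes add error correction and avoid problematic sequences. The consensus step here is a simple per-position majority vote and assumes the reads are already aligned and of equal length, which stands in for a real MSA step.

```python
from collections import Counter

# Hypothetical mapping: 2 bits per base. Real schemes add error
# correction and sequence constraints that this sketch omits.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Binarize the data and map each pair of bits to one base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Reverse the encoding: bases back to bits, bits back to bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def consensus(reads: list[str]) -> str:
    """Majority vote per position over equal-length, pre-aligned reads."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))


strand = encode(b"Hi")                      # "CAGACGGC"
reads = [strand, strand, "CTGACGGC"]        # one read with a substitution error
recovered = decode(consensus(reads))        # b"Hi"
```

With three reads, a single substitution in one read is outvoted at that position, so the decoded bytes match the input.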

2005 ◽  
Vol 34 (4) ◽  
Author(s):  
Robert Breslawski

With the rapid changes in technology for information creation, capture, display, distribution, storage and preservation, questions abound about the current state of microfilm and its place in the modern information management industry. Clearly there is a place for microfilm in the modern preservation vision. When it comes to information having permanent value, micrographic media remains a stalwart companion of those not willing to risk their data to the perils of digital data storage only. Quoting Jim Harvey of Altek Systems, “Now the word on the street is that without migration, degradation occurs in as little as seven years depending on storage conditions. This is an anathema to archival collections of information … Some are getting ‘that old time religion’ and backing up digital information collections with a permanent micrographic copy.”


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Peter Michael Schwarz ◽  
Bernd Freisleben

Abstract. Background: DNA is a promising storage medium for high-density long-term digital data storage. Since DNA synthesis and sequencing are still relatively expensive tasks, the coding methods used to store digital data in DNA should correct errors and avoid unstable or error-prone DNA sequences. Near-optimal rateless erasure codes, also called fountain codes, are particularly interesting codes to realize high-capacity and low-error DNA storage systems, as shown by Erlich and Zielinski in their approach based on the Luby transform (LT) code. Since LT is the most basic fountain code, there is a large untapped potential for improvement in using near-optimal erasure codes for DNA storage. Results: We present NOREC4DNA, a software framework to use, test, compare, and improve near-optimal rateless erasure codes (NORECs) for DNA storage systems. These codes can effectively be used to store digital information in DNA and cope with the restrictions of the DNA medium. Additionally, they can adapt to possible variable lengths of DNA strands and have nearly zero overhead. We describe the design and implementation of NOREC4DNA. Furthermore, we present experimental results demonstrating that NOREC4DNA can flexibly be used to evaluate the use of NORECs in DNA storage systems. In particular, we show that NORECs that apparently have not yet been used for DNA storage, such as Raptor and Online codes, can achieve significant improvements over LT codes that were used in previous work. NOREC4DNA is available on https://github.com/umr-ds/NOREC4DNA. Conclusion: NOREC4DNA is a flexible and extensible software framework for using, evaluating, and comparing NORECs for DNA storage systems.
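To make the idea of a rateless erasure code concrete, here is a minimal sketch of an LT-style fountain code in Python. It does not use the NOREC4DNA API; all function names are hypothetical. For brevity it assumes the ideal soliton degree distribution, one byte per chunk, and non-empty input, whereas practical systems typically use a robust soliton distribution and also screen the resulting packets for DNA constraints.

```python
import random
from itertools import islice


def ideal_soliton(k: int) -> list[float]:
    """Ideal soliton weights for degrees 1..k (they sum to 1)."""
    return [1 / k] + [1 / (d * (d - 1)) for d in range(2, k + 1)]


def lt_encode(chunks: list[int], rng: random.Random):
    """Yield an endless stream of (chunk indices, XOR of those chunks)."""
    k = len(chunks)
    weights = ideal_soliton(k)
    while True:
        degree = rng.choices(range(1, k + 1), weights)[0]
        indices = frozenset(rng.sample(range(k), degree))
        value = 0
        for i in indices:
            value ^= chunks[i]
        yield indices, value


def lt_decode(packets, k: int):
    """Peeling decoder: consume packets until all k chunks are recovered."""
    known: dict[int, int] = {}
    pending: list[list] = []
    for indices, value in packets:
        pending.append([set(indices), value])
        progress = True
        while progress:
            progress = False
            for packet in pending:
                # Subtract chunks that are already recovered.
                for i in [i for i in packet[0] if i in known]:
                    packet[0].discard(i)
                    packet[1] ^= known[i]
                # A packet reduced to one unknown chunk reveals it.
                if len(packet[0]) == 1:
                    (i,) = packet[0]
                    known[i] = packet[1]
                    packet[0].clear()
                    progress = True
            pending = [p for p in pending if p[0]]
        if len(known) == k:
            return [known[i] for i in range(k)]
    return None  # stream ended before decoding completed


def roundtrip(data: bytes, seed: int = 0, max_packets: int = 5000) -> bytes:
    """Encode with a seeded generator, then decode from the packet stream."""
    chunks = list(data)  # one byte per chunk, for illustration only
    stream = islice(lt_encode(chunks, random.Random(seed)), max_packets)
    decoded = lt_decode(stream, len(chunks))
    if decoded is None:
        raise RuntimeError("not enough packets to decode")
    return bytes(decoded)
```

Because the encoder can emit an unbounded number of packets, losing any individual packet (for example, a strand that fails synthesis or sequencing) only means the decoder reads a few more; this is the "rateless" property the abstract refers to.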


2021 ◽  
Vol 23 (4) ◽  
pp. 796-815
Author(s):  
Yang Wang ◽  
Sun Sun Lim

People are today located in media ecosystems in which a variety of ICT devices and platforms coexist and complement each other to fulfil users’ heterogeneous requirements. These multi-media affordances promote a highly hyperlinked and nomadic habit of digital data management which blurs the long-standing boundaries between information storage, sharing and exchange. Specifically, during the pervasive sharing and browsing of fragmentary digital information (e.g. photos, videos, online diaries, news articles) across various platforms, life experiences and knowledge involved are meanwhile classified and stored for future retrieval and collective memory construction. For international migrants who straddle different geographical and cultural contexts, management of various digital materials is particularly complicated as they have to be familiar with and appropriately navigate technological infrastructures of both home and host countries. Drawing on ethnographic observations of 40 Chinese migrant mothers in Singapore, this article delves into their quotidian routines of acquiring, storing, sharing and exchanging digital information across a range of ICT devices and platforms, as well as cultural and emotional implications of these mediated behaviours for their everyday life experiences. A multi-layer and multi-sited repertoire of ‘life archiving’ was identified among these migrant mothers in which they leave footprints of everyday life through a tactical combination of interactive sharing, pervasive tagging and backup storage of diverse digital content.


2020 ◽  
Author(s):  
Filip Bošković ◽  
Alexander Ohmann ◽  
Ulrich F. Keyser ◽  
Kaikai Chen

Abstract: Three-dimensional (3D) DNA nanostructures built via DNA self-assembly have established recent applications in multiplexed biosensing and storing digital information. However, a key challenge is that 3D DNA structures are not easily copied, which is of vital importance for their large-scale production and for access to desired molecules by target-specific amplification. Here, we build 3D DNA structural barcodes and demonstrate the copying and random access of the barcodes from a library of molecules using a modified polymerase chain reaction (PCR). The 3D barcodes were assembled by annealing a single-stranded DNA scaffold with complementary short oligonucleotides containing 3D protrusions at defined locations. DNA nicks in these structures are ligated to facilitate barcode copying using PCR. To randomly access a target from a library of barcodes, we employ a non-complementary end in the DNA construct that serves as a barcode-specific primer template. Readout of the 3D DNA structural barcodes was performed with nanopore measurements. Our study provides a roadmap for convenient production of large quantities of self-assembled 3D DNA nanostructures. In addition, this strategy offers access to specific targets, a crucial capability for multiplexed single-molecule sensing and for DNA data storage.


2018 ◽  
Author(s):  
Henry H. Lee ◽  
Reza Kalhor ◽  
Naveen Goela ◽  
Jean Bolot ◽  
George M. Church

Abstract: DNA is an emerging storage medium for digital data but its adoption is hampered by limitations of phosphoramidite chemistry, which was developed for single-base accuracy required for biological functionality. Here, we establish a de novo enzymatic DNA synthesis strategy designed from the bottom up for information storage. We harness a template-independent DNA polymerase for controlled synthesis of sequences with user-defined information content. We demonstrate retrieval of 144 bits, including addressing, from perfectly synthesized DNA strands using batch-processed Illumina and real-time Oxford Nanopore sequencing. We then develop a codec for data retrieval from populations of diverse but imperfectly synthesized DNA strands, each with a ~30% error tolerance. With this codec, we experimentally validate a kilobyte-scale design which stores 1 bit per nucleotide. Simulations of the codec support reliable and robust storage of information for large-scale systems. This work paves the way for alternative synthesis and sequencing strategies to advance information storage in DNA.


2019 ◽  
Author(s):  
Lee Organick ◽  
Yuan-Jyue Chen ◽  
Siena Dumas Ang ◽  
Randolph Lopez ◽  
Karin Strauss ◽  
...  

ABSTRACT: Synthetic DNA has been gaining momentum as a potential storage medium for archival data storage [1–9]. Digital information is translated into sequences of nucleotides and the resulting synthetic DNA strands are then stored for later individual file retrieval via PCR [7–9] (Fig. 1a). Using a previously presented encoding scheme [9] and new experiments, we demonstrate reliable file recovery when as few as 10 copies per sequence are stored, on average. This results in density of about 17 exabytes/g, nearly two orders of magnitude greater than prior work has shown [6]. Further, no prior work has experimentally demonstrated access to specific files in a pool more complex than approximately 10^6 unique DNA sequences [9], leaving the issue of accurate file retrieval at high data density and complexity unexamined. Here, we demonstrate successful PCR random access using three files of varying sizes in a complex pool of over 10^10 unique sequences, with no evidence that we have begun to approach complexity limits. We further investigate the role of file size on successful data recovery, the effect of increasing sequencing coverage to aid file recovery, and whether DNA strands drop out of solution in a systematic manner. These findings substantiate the robustness of PCR as a random access mechanism in complex settings, and show that the number of copies needed for data retrieval does not compromise density significantly.
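The addressing idea behind PCR random access can be sketched in a few lines of Python. This is a toy model, not the authors' protocol: the primer sequences and the function name are hypothetical, both primer sites are written on the same strand for simplicity (a real reverse primer binds the complementary strand), and amplification is reduced to string matching.

```python
def pcr_random_access(pool: list[str], fwd: str, rev: str) -> list[str]:
    """Return the payloads of strands flanked by the given primer pair."""
    return [
        strand[len(fwd):len(strand) - len(rev)]
        for strand in pool
        if len(strand) >= len(fwd) + len(rev)
        and strand.startswith(fwd)
        and strand.endswith(rev)
    ]


# Hypothetical pool: two strands belong to one file, one to another.
pool = [
    "ACGT" + "CAGA" + "TTGC",
    "ACGT" + "CGGC" + "TTGC",
    "GGAT" + "AAAA" + "CCTA",
]
file_a = pcr_random_access(pool, fwd="ACGT", rev="TTGC")  # ["CAGA", "CGGC"]
```

Each file is tagged with its own primer pair, so only that file's strands are selected and the rest of the pool is left unread.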


2020 ◽  
Author(s):  
Min Hao ◽  
Hongyan Qiao ◽  
Yanmin Gao ◽  
Zhaoguan Wang ◽  
Xin Qiao ◽  
...  

Abstract: DNA has emerged as a novel material for mass data storage, a serious problem facing human society. Taking advantage of current synthesis capacity, massive oligo pools have demonstrated high potential for data storage in the test tube. Herein, a mixed culture of bacterial cells carrying a massive oligo pool assembled in a high-copy plasmid is presented as a stable material for large-scale data storage. Living-cell data storage was fabricated by a multi-step process of assembly, transformation, and mixed culture. The underlying principle was explored by deep bioinformatic analysis. Although homology assembly showed sequence-context-dependent bias, the massive digital information oligos in the mixed culture remained constant over multiple successive passages. Pushing the limit, over ten thousand distinct oligos, totalling 2304 Kbps and encoding 445 KB of digital data including texts and images, were stored in bacterial cells, the largest archival data storage in living cells reported so far. The mixed culture of living-cell data storage opens up a new approach that simply bridges in vitro and in vivo storage systems, combining the advantages of storage capability and economical information propagation.


2020 ◽  
Author(s):  
Zhi Ping ◽  
Haoling Zhang ◽  
Shihong Chen ◽  
Qianlong Zhuang ◽  
Sha Joe Zhu ◽  
...  

Abstract: Chamaeleo is currently the only collection library that focuses on adapting multiple well-established coding schemes for DNA storage. It provides a tool for researchers to study various coding schemes and apply them in practice. Chamaeleo adheres to the software design concept of high aggregation and low coupling, which will enhance performance efficiency. Here, we describe the working pipeline of Chamaeleo and demonstrate its advantages over the implementation of existing single coding schemes. The source code is available at https://github.com/ntpz870817/Chamaeleo; it can also be installed with pip using the command “pip install chamaeleo”. Alternatively, the wheel file can be downloaded at https://pypi.org/project/Chamaeleo/. Detailed documentation is available at https://chamaeleo.readthedocs.io/en/latest/.
Author Summary: DNA is now considered a promising candidate medium for future digital information storage to tackle the global issue of data explosion. Transcoding between binary digital data and quaternary DNA information is one of the most important steps in the whole process of DNA digital storage. Although several coding schemes have been reported, researchers are still investigating better strategies. Moreover, the scripts of these coding schemes use different programming languages, software architectures, and optimization contents. Therefore, we here introduce Chamaeleo, a library in which several classical coding schemes are collected, reconstructed, and optimized. One of the key features of this tool is that the functions are modularized, which makes more customized usage feasible. Meanwhile, developers can also readily incorporate their new algorithms into the framework. Based on the benchmark tests we conducted, Chamaeleo shows better flexibility and expandability than the original packages, and we hope that it will help further study and applications in DNA digital storage.


2019 ◽  
Author(s):  
Priscilla Ulguim

We live in the information age, and our lives are increasingly digitized. Our quotidian has been transformed over the last fifty years by the adoption of innovative networking and computing technology. The digital world presents opportunities for public archaeology to engage, inform and interact with people globally. Yet, as more personal data are published online, there are growing concerns over privacy, security, and the long-term implications of sharing digital information. These concerns extend beyond the living, to the dead, and are thus important considerations for archaeologists who share the stories of past people online. This analysis argues that the ‘born-digital’ records of humanity may be considered as public digital mortuary landscapes, representing death, memorialization and commemoration. The potential for the analysis of digital data from these spaces could result in a phenomenon approaching immortality, whereby artificial intelligence is applied to the data of the dead. This paper investigates the ethics of a digital public archaeology of the dead while considering the future of our digital lives as mnemonic spaces, and their implications for the living.
Ulguim, P. F. 2018. Digital Remains Made Public: Sharing the Dead Online and Our Future Digital Mortuary Landscape. AP: Online Journal in Public Archaeology 8(2):153. https://doi.org/10.23914/ap.v8i2.162


2019 ◽  
Author(s):  
Eamonn Kennedy ◽  
Christopher E. Arcadia ◽  
Joseph Geiser ◽  
Peter M. Weber ◽  
Christopher Rose ◽  
...  

Abstract: Biomolecular information systems offer numerous potential advantages over conventional semiconductor technologies. Downstream from DNA, the metabolome is an information-rich molecular system with diverse chemical dimensions which could be harnessed for information storage and processing. As a proof of principle of postgenomic data storage, here we demonstrate a workflow for representing abstract data in synthetic metabolomes. Our approach leverages robotic liquid handling for writing digital information into chemical mixtures, and mass spectrometry for extracting the data. We present several kilobyte-scale image datasets stored in synthetic metabolomes, which are decoded with accuracy exceeding 98-99% using multi-mass logistic regression. Cumulatively, >100,000 bits of digital image data were written into metabolomes. These early demonstrations provide insight into the benefits and limitations of postgenomic chemical information systems.

