Driving the scalability of DNA-based information storage systems

Abstract: The extreme density of DNA presents a compelling advantage over current storage media; however, to reach practical capacities, new approaches for organizing and accessing information are needed. Here we use chemical handles to selectively extract unique files from a complex database of DNA mimicking 5 TB of data, and we design and implement a nested file address system that increases the theoretical maximum capacity of DNA storage systems by five orders of magnitude. These advancements enable the development and future scaling of DNA-based data storage systems with reasonable modern capacities and file access capabilities.

Combinatorial constraint coding based on the EORS algorithm in DNA storage

PLoS ONE ◽

10.1371/journal.pone.0255376 ◽

2021 ◽

Vol 16 (7) ◽

pp. e0255376

Author(s):

Li Xiaoru ◽

Guo Ling

Keyword(s):

Data Storage ◽

Hamming Distance ◽

Random Search ◽

GC Content ◽

Information Storage ◽

Storage Media ◽

Specific Hybridization ◽

DNA Storage ◽

Electronic Storage ◽

Increasing Demand

The development of information technology has produced massive amounts of data, which has brought severe challenges to information storage. Traditional electronic storage media cannot keep up with the ever-increasing demand for data storage, but in its place DNA has emerged as a feasible storage medium with high density, large storage capacity and strong durability. In DNA data storage, many different approaches can be used to encode data into codewords. DNA coding is a key step in DNA storage and can directly affect storage performance and data integrity. However, since errors are prone to occur in DNA synthesis and sequencing, and non-specific hybridization is prone to occur in the solution, how to effectively encode DNA has become an urgent problem to be solved. In this article, we propose a DNA storage coding method based on the equilibrium optimization random search (EORS) algorithm, which meets the Hamming distance, GC content and no-runlength constraints and can reduce the error rate in storage. Simulation experiments have shown that the size of the DNA storage code set constructed by the EORS algorithm that meets the combination constraints has increased by an average of 11% compared with previous work. The increase in the code set means that shorter DNA chains can be used to store more data.
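
The three constraints named in the abstract can be illustrated with a short sketch. This is not the EORS algorithm: it is a plain greedy filter used only to show what a code set satisfying the combined constraints looks like. The GC rule used here (exactly floor(n/2) bases are G or C) is a common formulation and is an assumption, as are all function names.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def gc_balanced(seq: str) -> bool:
    """GC-content constraint (assumed form): exactly floor(n/2) bases are G or C."""
    return sum(base in "GC" for base in seq) == len(seq) // 2


def no_runlength(seq: str) -> bool:
    """No-runlength constraint: no two adjacent bases are identical."""
    return all(x != y for x, y in zip(seq, seq[1:]))


def greedy_code_set(n: int, d: int) -> list[str]:
    """Greedy baseline (NOT EORS): scan all length-n sequences in lexicographic
    order and keep each one that meets the GC and no-runlength constraints and
    lies at Hamming distance >= d from every codeword kept so far."""
    code: list[str] = []
    for candidate in map("".join, product("ACGT", repeat=n)):
        if not (gc_balanced(candidate) and no_runlength(candidate)):
            continue
        if all(hamming(candidate, word) >= d for word in code):
            code.append(candidate)
    return code


if __name__ == "__main__":
    code = greedy_code_set(n=4, d=2)
    print(len(code), code[:5])
```

A larger code set for the same n and d means more distinct codewords per strand length, which is the quantity the abstract reports as improved.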

Promiscuous molecules for smarter file operations in DNA-based data storage

Nature Communications ◽

10.1038/s41467-021-23669-w ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Kyle J. Tomek ◽

Kevin Volkel ◽

Elaine W. Indermaur ◽

James M. Tuck ◽

Albert J. Keung

Keyword(s):

Data Storage ◽

Data Access ◽

Molecular Architecture ◽

Biomolecular Structure ◽

JPEG Images ◽

Storage Media ◽

DNA Storage ◽

DNA Strands ◽

Background Database ◽

Organizational Features

Abstract: DNA holds significant promise as a data storage medium due to its density, longevity, and resource and energy conservation. These advantages arise from the inherent biomolecular structure of DNA which differentiates it from conventional storage media. The unique molecular architecture of DNA storage also prompts important discussions on how data should be organized, accessed, and manipulated and what practical functionalities may be possible. Here we leverage thermodynamic tuning of biomolecular interactions to implement useful data access and organizational features. Specific sets of environmental conditions including distinct DNA concentrations and temperatures were screened for their ability to switchably access either all DNA strands encoding full image files from a GB-sized background database or subsets of those strands encoding low resolution, File Preview, versions. We demonstrate File Preview with four JPEG images and provide an argument for the substantial and practical economic benefit of this generalizable strategy to organize data.

Dynamic DNA-based information storage

10.1101/836429 ◽

2019 ◽

Author(s):

Kevin N. Lin ◽

Albert J. Keung ◽

James M. Tuck

Keyword(s):

System Architecture ◽

Storage Systems ◽

Storage System ◽

Information Storage ◽

DNA Interactions ◽

Chain Reaction ◽

DNA Storage ◽

Polymerase Chain ◽

Key Innovations

Abstract: Technological leaps are often driven by key innovations that transform the underlying architectures of systems. Current DNA storage systems largely rely on polymerase chain reaction, which broadly informs how information is encoded, databases are organized, and files are accessed. Here we show that a hybrid ‘toehold’ DNA structure can unlock a fundamentally different, dynamic DNA-based information storage system architecture with broad advantages. This innovation increases theoretical storage densities and capacities by eliminating non-specific DNA-DNA interactions common in PCR and increasing the encodable sequence space. It also provides a physical handle with which to implement a range of in-storage file operations. Finally, it reads files non-destructively by harnessing the natural role of transcription in accessing information from DNA. This simple but powerful toehold structure lays the foundation for an information storage architecture with versatile capabilities.

Information-theoretic problems of DNA-based storage systems

Information and Control Systems ◽

10.31799/1684-8853-2021-3-39-52 ◽

2021 ◽

pp. 39-52

Author(s):

Stanislav Kruglik ◽

Gregory Kucherov ◽

Kamilla Nazirkhanova ◽

Mikhail Filitov

Keyword(s):

Data Storage ◽

Storage Systems ◽

Error Correcting Codes ◽

Information Storage ◽

Practical Implementation ◽

Information Theoretic ◽

Storage Devices ◽

Related Information ◽

Current State ◽

Efficient Storage

Introduction: Currently, we witness an explosive growth in the amount of information produced by humanity. This raises new fundamental problems of its efficient storage and processing. Commonly used magnetic, optical, and semiconductor information storage devices have several drawbacks related to small information density and limited durability. One of the promising novel approaches to solving these problems is DNA-based data storage. Purpose: An overview of modern DNA-based storage systems and related information-theoretic problems. Results: The current state of the art of DNA-based storage systems is reviewed. Types of errors occurring in them as well as corresponding error-correcting codes are analyzed. The disadvantages of these codes are shown, and possible pathways for improvement are mentioned. Proposed information-theoretic models of DNA-based storage systems are analyzed, and their limitations are highlighted. In conclusion, the main obstacles to practical implementation of DNA-based storage systems are formulated, which can potentially be overcome using the information-theoretic methods considered in this overview.

A Combinatorial PCR Method for Efficient, Selective Oligo Retrieval from Complex Oligo Pools

10.1101/2021.08.25.457714 ◽

2021 ◽

Author(s):

Claris Winston ◽

Lee Organick ◽

Luis Ceze ◽

Karin Strauss ◽

Yuan-Jyue Chen

Keyword(s):

Data Storage ◽

Large Scale ◽

High Specificity ◽

PCR Primers ◽

DNA Database ◽

DNA Storage ◽

Wet Lab ◽

PCR Method ◽

Polymerase Chain ◽

Address System

Abstract: With the rapidly decreasing cost of array-based oligo synthesis, large-scale oligo pools offer significant benefits for advanced applications, including gene synthesis, CRISPR-based gene editing, and DNA data storage. Selectively retrieving specific oligos from these complex pools traditionally uses Polymerase Chain Reaction (PCR), in which any selected oligos are exponentially amplified to quickly outnumber non-selected ones. In this case, the number of orthogonal PCR primers is limited due to interactions between them. This lack of specificity presents a serious challenge, particularly for DNA data storage, where the size of an oligo pool (i.e., a DNA database) is orders of magnitude larger than it is for other applications. Although a nested file address system was recently developed to increase the number of accessible files for DNA storage, it requires a more complicated lab protocol and more expensive reagents to achieve high specificity. Instead, we developed a new combinatorial PCR method that outperforms prior work without compromising the fidelity of retrieved material or complicating wet lab processes. Our method quadratically increases the number of accessible oligos while maintaining high specificity. In experiments, we accessed three arbitrarily chosen files from a DNA prototype database that contained 81 different files. Initially comprising only 1% of the original database, the selected files were enriched to over 99.9% using our combinatorial primer method. Our method thus provides a viable path for scaling up DNA data storage systems and has broader utility whenever scientists need access to a specific target oligo and can design their own primer regions.
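
The quadratic scaling and the reported enrichment can be made concrete with a small arithmetic sketch. It assumes that each file is addressed by a combination of two primers drawn from two separate sets. The abstract does not spell out the scheme, so the 9 x 9 split below is only one hypothetical way to reach 81 addresses, and the primer names are placeholders.

```python
from itertools import product


def pair_addresses(forward, reverse):
    """All (forward, reverse) primer combinations.

    With f forward and r reverse primers this yields f * r addresses,
    so n primers per set give n**2 addresses (quadratic growth).
    """
    return list(product(forward, reverse))


def odds_enrichment(before: float, after: float) -> float:
    """Fold change in the odds of drawing a target strand from the pool."""
    return (after / (1 - after)) / (before / (1 - before))


if __name__ == "__main__":
    # Hypothetical primer labels; 9 per set is an assumption, not from the paper.
    forward = [f"F{i}" for i in range(9)]
    reverse = [f"R{i}" for i in range(9)]
    print(len(pair_addresses(forward, reverse)))  # 9 * 9 addresses

    # Figures from the abstract: 1% of the pool before, 99.9% after.
    print(odds_enrichment(0.01, 0.999))
```

Expressed as odds, going from 1% to 99.9% of the pool is an enrichment on the order of 10^5.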

DNA stability: a central design consideration for DNA data storage systems

Nature Communications ◽

10.1038/s41467-021-21587-5 ◽

2021 ◽

Vol 12 (1) ◽

Cited By ~ 2

Author(s):

Karishma Matange ◽

James M. Tuck ◽

Albert J. Keung

Keyword(s):

Data Storage ◽

Molecular Mechanisms ◽

Storage Conditions ◽

Information Storage ◽

Processing Conditions ◽

Energy Materials ◽

Specific Design ◽

DNA Stability ◽

Design Considerations ◽

DNA Storage

Abstract: Data storage in DNA is a rapidly evolving technology that could be a transformative solution for the rising energy, materials, and space needs of modern information storage. Given that the information medium is DNA itself, its stability under different storage and processing conditions will fundamentally impact and constrain design considerations and data system capabilities. Here we analyze the storage conditions, molecular mechanisms, and stabilization strategies influencing DNA stability and pose specific design configurations and scenarios for future systems that best leverage the considerable advantages of DNA storage.

High-scale random access on DNA storage systems

NAR Genomics and Bioinformatics ◽

10.1093/nargab/lqab126 ◽

2022 ◽

Vol 4 (1) ◽

Author(s):

Alex El-Shaikh ◽

Marius Welzel ◽

Dominik Heider ◽

Bernhard Seeger

Keyword(s):

Storage Systems ◽

High Capacity ◽

Random Access ◽

General Purpose ◽

Information Storage ◽

Locality Sensitive Hashing ◽

Probe Design ◽

DNA Storage ◽

Data Objects ◽

DNA Pool

ABSTRACT Due to the rapid cost decline of synthesizing and sequencing deoxyribonucleic acid (DNA), high information density, and its durability of up to centuries, utilizing DNA as an information storage medium has received the attention of many scientists. State-of-the-art DNA storage systems exploit the high capacity of DNA and enable random access (predominantly random reads) by primers, which serve as unique identifiers for directly accessing data. However, primers come with a significant limitation regarding the maximum available number per DNA library. The number of different primers within a library is typically very small (e.g. ≈10). We propose a method to overcome this deficiency and present a general-purpose technique for addressing and directly accessing thousands to potentially millions of different data objects within the same DNA pool. Our approach utilizes a fountain code, sophisticated probe design, and microarray technologies. A key component is locality-sensitive hashing, making checks for dissimilarity among such a large number of probes and data objects feasible.
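
To show why locality-sensitive hashing makes large-scale dissimilarity checks feasible, here is a minimal sketch using MinHash with banding over k-mers. This is one common LSH scheme and is an assumption: the abstract does not say which LSH family the authors use, and all names and parameters below are illustrative. Only sequences that collide in at least one band are compared further, so most dissimilar pairs are never examined.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def kmers(seq: str, k: int = 4) -> set[str]:
    """Set of all length-k substrings of seq."""
    if len(seq) < k:
        raise ValueError("sequence shorter than k")
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def minhash_signature(seq: str, num_hashes: int = 16, k: int = 4) -> list[int]:
    """MinHash signature: for each seeded hash, the minimum value over all k-mers."""
    signature = []
    for seed in range(num_hashes):
        signature.append(min(
            int.from_bytes(hashlib.sha256(f"{seed}:{km}".encode()).digest()[:8], "big")
            for km in kmers(seq, k)
        ))
    return signature


def candidate_pairs(seqs: list[str], bands: int = 4, rows: int = 4) -> set[tuple[int, int]]:
    """Index pairs whose signatures agree on at least one whole band."""
    buckets = defaultdict(set)
    for idx, seq in enumerate(seqs):
        sig = minhash_signature(seq, num_hashes=bands * rows)
        for b in range(bands):
            buckets[(b, tuple(sig[b * rows:(b + 1) * rows]))].add(idx)
    pairs = set()
    for members in buckets.values():
        pairs.update(combinations(sorted(members), 2))
    return pairs


if __name__ == "__main__":
    probes = ["ACGTACGTACGTACGT", "ACGTACGTACGTACGT", "TTGGCCAATTGGCCAA"]
    print(candidate_pairs(probes))
```

In a design workflow, a new probe would be accepted only if it produces no candidate pair, or if every candidate it produces turns out to be sufficiently dissimilar on an exact check.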

NOREC4DNA: using near-optimal rateless erasure codes for DNA storage

BMC Bioinformatics ◽

10.1186/s12859-021-04318-x ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Peter Michael Schwarz ◽

Bernd Freisleben

Keyword(s):

Data Storage ◽

DNA Sequences ◽

Storage Systems ◽

High Capacity ◽

Digital Data ◽

Erasure Codes ◽

Software Framework ◽

Digital Information ◽

DNA Storage ◽

DNA Strands

Abstract. Background: DNA is a promising storage medium for high-density long-term digital data storage. Since DNA synthesis and sequencing are still relatively expensive tasks, the coding methods used to store digital data in DNA should correct errors and avoid unstable or error-prone DNA sequences. Near-optimal rateless erasure codes, also called fountain codes, are particularly interesting codes to realize high-capacity and low-error DNA storage systems, as shown by Erlich and Zielinski in their approach based on the Luby transform (LT) code. Since LT is the most basic fountain code, there is a large untapped potential for improvement in using near-optimal erasure codes for DNA storage. Results: We present NOREC4DNA, a software framework to use, test, compare, and improve near-optimal rateless erasure codes (NORECs) for DNA storage systems. These codes can effectively be used to store digital information in DNA and cope with the restrictions of the DNA medium. Additionally, they can adapt to possible variable lengths of DNA strands and have nearly zero overhead. We describe the design and implementation of NOREC4DNA. Furthermore, we present experimental results demonstrating that NOREC4DNA can flexibly be used to evaluate the use of NORECs in DNA storage systems. In particular, we show that NORECs that apparently have not yet been used for DNA storage, such as Raptor and Online codes, can achieve significant improvements over LT codes that were used in previous work. NOREC4DNA is available on https://github.com/umr-ds/NOREC4DNA. Conclusion: NOREC4DNA is a flexible and extensible software framework for using, evaluating, and comparing NORECs for DNA storage systems.
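
The idea behind an LT-style fountain code can be sketched in a few lines. This is a standalone illustration and does not use or reproduce the NOREC4DNA API. Each packet is the XOR of a random subset of data chunks, and a peeling decoder repeatedly resolves packets that have exactly one unknown chunk. The uniform degree choice is a simplifying assumption: real LT codes draw the degree from a (robust) soliton distribution.

```python
import random


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode_packet(chunks: list[bytes], rng: random.Random):
    """One LT-style packet: (chunk indices, XOR of those chunks).

    The degree is uniform here for brevity (an assumption); LT codes
    use a soliton-type degree distribution instead.
    """
    degree = rng.randint(1, len(chunks))
    indices = frozenset(rng.sample(range(len(chunks)), degree))
    payload = bytes(len(chunks[0]))  # all-zero start value
    for i in indices:
        payload = xor_bytes(payload, chunks[i])
    return indices, payload


def peel_decode(packets, num_chunks: int) -> dict[int, bytes]:
    """Peeling decoder: resolve packets with exactly one unknown chunk
    until nothing more can be recovered."""
    recovered: dict[int, bytes] = {}
    progress = True
    while progress and len(recovered) < num_chunks:
        progress = False
        for indices, data in packets:
            unknown = [i for i in indices if i not in recovered]
            if len(unknown) == 1:
                for i in indices:
                    if i != unknown[0]:
                        data = xor_bytes(data, recovered[i])
                recovered[unknown[0]] = data
                progress = True
    return recovered


if __name__ == "__main__":
    chunks = [b"AB", b"CD", b"EF"]
    rng = random.Random(0)
    packets = [encode_packet(chunks, rng) for _ in range(12)]
    print(peel_decode(packets, len(chunks)))  # may be partial if packets are unlucky
```

Because the encoder can emit an unbounded stream of packets, packets that translate into unstable or error-prone DNA sequences can simply be discarded and replaced, which is what makes rateless codes attractive for this medium.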

Creating a standard method of controlling digital microscopes: the microscope access protocol (MAP)

Proceedings, annual meeting, Electron Microscopy Society of America ◽

10.1017/s0424820100136453 ◽

1995 ◽

Vol 53 ◽

pp. 16-17

Author(s):

T. A. Dodson ◽

E. Völkl ◽

L. F. Allard ◽

T. A. Nolan

Keyword(s):

Data Storage ◽

Storage Systems ◽

Digital Systems ◽

Data Networks ◽

Digital Microscopy ◽

Command Language ◽

Access Protocol ◽

Analog To Digital ◽

Basic Physics ◽

Scanning Electron

The process of moving to a fully digital microscopy laboratory requires changes in instrumentation, computing hardware, computing software, data storage systems, and data networks, as well as in the operating procedures of each facility. Moving from analog to digital systems in the microscopy laboratory is similar to the instrumentation projects being undertaken in many scientific labs. A central problem of any of these projects is to create the best combination of hardware and software to effectively control the parameters of data collection and then to actually acquire data from the instrument. This problem is particularly acute for the microscopist who wishes to "digitize" the operation of a transmission or scanning electron microscope. Although the basic physics of each type of instrument and the type of data (images & spectra) generated by each are very similar, each manufacturer approaches automation differently. The communications interfaces vary as well as the command language used to control the instrument.

Activity of public control entities and development of distributed computing and distributed data storage systems

Journal of Law and Administration ◽

10.24833/2073-8420-2018-1-46-14-22 ◽

2018 ◽

pp. 14-22

Author(s):

D. V. Gribanov

Keyword(s):

Distributed Computing ◽

Data Storage ◽

Storage Systems ◽

Legal Regulation ◽

Distributed Data ◽

Distributed Data Storage ◽

Public Control ◽

Blockchain Technology ◽

Legal Method ◽

Digital Assets

Introduction. This article is devoted to the legal regulation of digital asset turnover and to the possibilities for using distributed computing and distributed data storage systems in the activities of public authorities and entities of public control. The author notes that some national and foreign scientists who study “blockchain” technology (distributed computing and distributed data storage systems) emphasize its usefulness in different activities. The data validation procedure for digital transactions and the legal regulation of the creation, issuance and turnover of digital assets need further attention. Materials and methods. The research is based on common scientific methods (analysis, analogy, comparison) and particular methods of cognition of legal phenomena and processes (a method of interpretation of legal rules, a technical legal method, a formal legal method and a formal logical one). Results of the study. The author’s analysis identified the following advantages of using “blockchain” technology in the sphere of public control: a particular validation system; data that have once been entered into the distributed data storage system cannot be erased or forged; absolute transparency of the succession of actions while exercising governing powers; and automatic repetition of recurring actions. The need for fivefold validation of the exercise of governing powers is substantiated. The author stresses that the fivefold validation shall ensure complex control over the exercise of powers by civil society, the entities of public control and the Russian Federation as a federal state holding sovereignty over its territory. The author has also conducted a brief analysis of judicial decisions concerning digital transactions. Discussion and conclusion. The use of the distributed data storage system makes it easier to exercise control because it reduces the risks of forgery, replacement or termination of data. The author suggests defining a digital transaction not only as actions with digital assets, but also as actions toward modifying and adding information about legal facts for the purpose of establishing it in distributed data storage systems. The author suggests using distributed data storage systems for independent validation of information about the activities of the bodies of state authority. In the author’s opinion, application of “blockchain” technology may result not only in increased efficiency of public control, but also in the creation of a new form of public control: automatic control. It is concluded that there is currently no legislative basis for regulating legal relations concerning distributed data storage.
