Multidimensional Data Organization and Random Access in Large-Scale DNA Storage Systems

2019 ◽  
Author(s):  
Xin Song ◽  
Shalin Shah ◽  
John Reif

Abstract With impressive density and coding capacity, DNA offers a promising solution for building long-lasting archival data storage systems. In recent implementations, data retrieval such as random access typically relies on a large library of non-interacting PCR primers. While several algorithms automate the primer design process, the capacity and scalability of DNA-based storage systems are still fundamentally limited by the availability of experimentally validated orthogonal primers. In this work, we combine nested and semi-nested PCR techniques to virtually enforce multidimensional data organization in large DNA storage systems. The strategy effectively pushes the limit of DNA storage capacity and reduces the number of primers needed for efficient random access over a very large address space. Specifically, our design requires k * n unique primers to index n^k data entries, where k specifies the number of dimensions and n indicates the number of data entries stored in each dimension. We strategically leverage forward/reverse primer pairs from the same or different address layers to virtually specify and maintain data retrievals in the form of rows, columns, tables, and blocks with respect to the original storage pool. This architecture enables various random-access patterns that can be tailored to preserve the underlying data structures and relations (e.g., files and folders) within the storage content. With just one or two rounds of PCR, specific data subsets or an individual datum from the large multidimensional storage can be selectively enriched for simple extraction by gel electrophoresis or readout via sequencing.
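The k * n versus n^k accounting in this abstract can be illustrated with a short sketch: each flat entry index is decomposed into one primer coordinate per dimension, so n primers per layer across k layers address n^k entries. The primer labels below are hypothetical placeholders, not sequences from the paper.

```python
# Sketch of the k-dimensional addressing scheme: with n primers per
# dimension and k dimensions, k * n unique primers index n**k entries.
# Primer names are illustrative, not actual oligo sequences.

def entry_address(index, n, k):
    """Map a flat entry index in [0, n**k) to one primer per dimension."""
    assert 0 <= index < n ** k
    address = []
    for dim in range(k):
        index, coord = divmod(index, n)
        address.append(f"dim{dim}_primer{coord}")
    return address

# Example: n = 10 primers per layer, k = 3 layers -> only 30 primers
# are needed to index 1000 distinct entries.
print(entry_address(0, 10, 3))    # ['dim0_primer0', 'dim1_primer0', 'dim2_primer0']
print(entry_address(999, 10, 3))  # ['dim0_primer9', 'dim1_primer9', 'dim2_primer9']
```

Selecting all entries that share one coordinate (e.g. every address containing `dim1_primer4`) corresponds to retrieving a whole row, column, or table from the pool with a single primer pair.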

Author(s):  
Jaeho Jeong ◽  
Seong-Joon Park ◽  
Jae-Won Kim ◽  
Jong-Seon No ◽  
Ha Hyeon Jeon ◽  
...  

Abstract Motivation In DNA storage systems, there are tradeoffs between writing and reading costs. Increasing the code rate of error-correcting codes may save writing cost, but it will require more sequence reads for data retrieval. There is potentially a way to improve the sequencing and decoding processes so that the reading cost induced by this tradeoff is reduced without increasing the writing cost. In past research, clustering, alignment, and decoding were treated as separate stages, but we believe that using information from all these processes together may improve decoding performance. Actual DNA synthesis and sequencing experiments should be performed, because simulations cannot be relied on to cover all error possibilities in practical circumstances. Results For DNA storage systems using fountain codes and Reed-Solomon (RS) codes, we introduce several techniques to improve decoding performance. We designed the decoding process around the cooperation of key components: Hamming-distance based clustering, discarding of abnormal sequence reads, RS error correction as well as detection, and quality score-based ordering of sequences. We synthesized 513.6 KB of data into DNA oligo pools and sequenced the data successfully with an Illumina MiSeq instrument. Compared to Erlich's research, the proposed decoding method additionally incorporates sequence reads with minor errors that had previously been discarded, and was thus able to make use of 10.6–11.9% more sequence reads from the same sequencing environment; this resulted in a 6.5–8.9% reduction in reading cost. Channel characteristics, including sequence coverage and read-length distributions, are provided as well. Availability The raw data files and the source code of our experiments are available at: https://github.com/jhjeong0702/dna-storage.
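The Hamming-distance based clustering named among the pipeline's components can be sketched minimally as a greedy pass that groups reads lying within a small edit radius of a cluster representative. The threshold and the toy reads below are illustrative assumptions, not the paper's actual parameters.

```python
# Minimal sketch of Hamming-distance based clustering of sequence
# reads, one component of the decoding pipeline described above.
# max_dist and the example reads are illustrative assumptions.

def hamming(a, b):
    """Hamming distance between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))

def cluster_reads(reads, max_dist=2):
    """Greedy clustering: assign each read to the first cluster whose
    representative is within max_dist; otherwise start a new cluster."""
    clusters = []  # list of (representative, members)
    for read in reads:
        for rep, members in clusters:
            if len(read) == len(rep) and hamming(read, rep) <= max_dist:
                members.append(read)
                break
        else:
            clusters.append((read, [read]))
    return clusters

reads = ["ACGTACGT", "ACGTACGA", "TTTTACGT", "ACGTACGT"]
clusters = cluster_reads(reads)
print(len(clusters))  # 2 clusters: reads within distance 2 are merged
```

Keeping reads with minor errors inside a cluster, rather than discarding them outright, is what lets the downstream RS stage exploit the extra 10.6–11.9% of reads the abstract mentions.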


2021 ◽  
Author(s):  
Claris Winston ◽  
Lee Organick ◽  
Luis Ceze ◽  
Karin Strauss ◽  
Yuan-Jyue Chen

ABSTRACT With the rapidly decreasing cost of array-based oligo synthesis, large-scale oligo pools offer significant benefits for advanced applications, including gene synthesis, CRISPR-based gene editing, and DNA data storage. Selectively retrieving specific oligos from these complex pools traditionally uses Polymerase Chain Reaction (PCR), in which any selected oligos are exponentially amplified to quickly outnumber non-selected ones. In this case, the number of orthogonal PCR primers is limited due to interactions between them. This lack of specificity presents a serious challenge, particularly for DNA data storage, where the size of an oligo pool (i.e., a DNA database) is orders of magnitude larger than it is for other applications. Although a nested file address system was recently developed to increase the number of accessible files for DNA storage, it requires a more complicated lab protocol and more expensive reagents to achieve high specificity. Instead, we developed a new combinatorial PCR method that outperforms prior work without compromising the fidelity of retrieved material or complicating wet lab processes. Our method quadratically increases the number of accessible oligos while maintaining high specificity. In experiments, we accessed three arbitrarily chosen files from a DNA prototype database that contained 81 different files. Initially comprising only 1% of the original database, the selected files were enriched to over 99.9% using our combinatorial primer method. Our method thus provides a viable path for scaling up DNA data storage systems and has broader utility whenever scientists need access to a specific target oligo and can design their own primer regions.
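The quadratic gain of combinatorial PCR can be made concrete with a sketch: pairing any of f forward primers with any of r reverse primers yields f * r distinct addresses, rather than one address per primer. The 9 x 9 layout below mirrors the 81-file scale of the experiment, but the primer labels are hypothetical.

```python
# Sketch of combinatorial PCR addressing: every (forward, reverse)
# primer pair addresses one file, so f * r files are reachable with
# only f + r primers. Labels are hypothetical placeholders.

forward = [f"F{i}" for i in range(9)]
reverse = [f"R{j}" for j in range(9)]

# Each distinct (forward, reverse) pair is one file address.
address_book = {(f, r): idx
                for idx, (f, r) in enumerate(
                    (f, r) for f in forward for r in reverse)}

print(len(address_book))  # 81 files addressed by only 18 primers
```

By contrast, a flat scheme with 18 orthogonal primers addresses only 18 files; the pairing is what scales the address space quadratically.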


2022 ◽  
Vol 4 (1) ◽  
Author(s):  
Alex El-Shaikh ◽  
Marius Welzel ◽  
Dominik Heider ◽  
Bernhard Seeger

ABSTRACT Due to the rapidly declining cost of synthesizing and sequencing deoxyribonucleic acid (DNA), its high information density, and its durability of up to centuries, DNA has received the attention of many scientists as an information storage medium. State-of-the-art DNA storage systems exploit the high capacity of DNA and enable random access (predominantly random reads) via primers, which serve as unique identifiers for directly accessing data. However, primers come with a significant limitation regarding the maximum available number per DNA library: the number of different primers within a library is typically very small (e.g., ≈10). We propose a method to overcome this deficiency and present a general-purpose technique for addressing and directly accessing thousands to potentially millions of different data objects within the same DNA pool. Our approach utilizes a fountain code, sophisticated probe design, and microarray technologies. A key component is locality-sensitive hashing, which makes checks for dissimilarity among such a large number of probes and data objects feasible.
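The role of locality-sensitive hashing here, avoiding all-pairs dissimilarity checks over thousands of probes, can be sketched with a MinHash over k-mer shingles: probes are bucketed by signature, and only probes sharing a bucket need an exact comparison. The k-mer size, seed count, and probe sequences below are illustrative assumptions, not the paper's actual design.

```python
# Minimal locality-sensitive hashing sketch for probe dissimilarity
# screening: probes are shingled into k-mers and bucketed by a
# MinHash signature, so only colliding probes need an exact check.
# k, seeds, and the example probes are illustrative assumptions.
import hashlib
from collections import defaultdict

def kmers(seq, k=4):
    """Set of overlapping k-mers (shingles) of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def h(seed, shingle):
    """Deterministic 64-bit hash of a shingle under a given seed."""
    digest = hashlib.md5(f"{seed}:{shingle}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

def minhash_bucket(seq, k=4, seeds=(1, 2, 3)):
    """MinHash signature; sequences with many shared k-mers are
    likely to collide, dissimilar ones almost never do."""
    shingles = kmers(seq, k)
    return tuple(min(h(s, sh) for sh in shingles) for s in seeds)

probes = ["ACGTACGTACGT", "ACGTACGTACGA", "GGGTTTCCCAAA"]
buckets = defaultdict(list)
for p in probes:
    buckets[minhash_bucket(p)].append(p)

# Probes that never share a bucket are treated as dissimilar;
# only colliding probes get an exact similarity check.
```

This turns a quadratic all-pairs scan into hashing plus a handful of exact checks, which is what makes dissimilarity screening at microarray scale feasible.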


HortScience ◽  
1998 ◽  
Vol 33 (3) ◽  
pp. 514e-514
Author(s):  
James M. Bradeen ◽  
Philipp W. Simon

The amplified fragment length polymorphism (AFLP) is a powerful marker, allowing rapid and simultaneous evaluation of multiple potentially polymorphic sites. Although well-adapted to linkage mapping and diversity assessment, AFLPs are primarily dominant in nature. Dominance, relatively high cost, and technological difficulty limit the use of AFLPs for marker-aided selection and other locus-specific applications. In carrot, the Y2 locus conditions carotene accumulation in the root xylem. We identified AFLP fragments linked to the dominant Y2 allele and pursued conversion of those fragments to codominant, PCR-based forms useful for locus-specific applications. The short length of AFLPs (≈60 to 500 bp) precludes development of longer, more specific primers as in SCAR development. Instead, using sequence information from cloned AFLP fragments for primer design, regions outside the original fragment were amplified by inverse PCR or ligation-mediated PCR, cloned, and sequenced. Sequence differences associated with Y2 vs. y2 allowed development of simple PCR assays differentiating those alleles. In one assay, PCR primers flanking an insertion associated with the recessive allele amplified differently sized products for the two Y2 alleles. This assay is rapid, technologically simple (requiring no radioactivity and little advanced training or equipment), reliable, inexpensive, and codominant. Our PCR assay has a variety of large-scale, locus-specific applications, including genotyping diverse carrot cultivars and wild and feral populations. Efforts are underway to improve the conversion technology and to test the techniques we have developed more extensively.


Energies ◽  
2021 ◽  
Vol 14 (11) ◽  
pp. 3296
Author(s):  
Carlos García-Santacruz ◽  
Luis Galván ◽  
Juan M. Carrasco ◽  
Eduardo Galván

Energy storage systems are expected to play a fundamental part in the integration of increasing renewable energy sources into the electric system. They are already used in power plants for different purposes, such as absorbing the effect of intermittent energy sources or providing ancillary services. For this reason, it is imperative to research management and sizing methods that make power plants with storage viable and profitable projects. In this paper, a management method is presented in which particle swarm optimisation is used to maximise profits. This method is compared to expert systems, showing that the former achieves better results while respecting similar rules. The paper further presents a sizing method that uses the previous one to make the power plant as profitable as possible. Finally, both methods are tested through simulations to show their potential.
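The profit-maximising search the abstract describes rests on standard particle swarm optimisation, which can be sketched in a few lines. The one-dimensional "profit" function below is a toy stand-in for a real revenue model, and all swarm parameters are generic defaults, not values from the paper.

```python
# Minimal particle swarm optimisation sketch of a profit-maximising
# search. The objective and all parameters are illustrative.
import random

def profit(x):
    return -(x - 3.0) ** 2 + 10.0  # toy objective, maximum at x = 3

def pso(objective, n_particles=20, iters=100, lo=-10.0, hi=10.0,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = random.Random(seed)
    xs = [rng.uniform(lo, hi) for _ in range(n_particles)]
    vs = [0.0] * n_particles
    pbest = xs[:]                   # each particle's best-known position
    gbest = max(xs, key=objective)  # swarm-wide best-known position
    for _ in range(iters):
        for i in range(n_particles):
            # velocity update: inertia + pull toward personal/global bests
            vs[i] = (w * vs[i]
                     + c1 * rng.random() * (pbest[i] - xs[i])
                     + c2 * rng.random() * (gbest - xs[i]))
            xs[i] += vs[i]
            if objective(xs[i]) > objective(pbest[i]):
                pbest[i] = xs[i]
            if objective(xs[i]) > objective(gbest):
                gbest = xs[i]
    return gbest

best = pso(profit)
print(round(best, 2))  # converges near the optimum at 3.0
```

In the paper's setting the decision variables would be a charge/discharge schedule and the objective a market-revenue model, but the swarm mechanics are the same.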


Author(s):  
Peisheng Guo ◽  
Gongzheng Yang ◽  
Chengxin Wang

Aqueous zinc-ion batteries (AZIBs) have been regarded as promising alternatives for large-scale energy storage systems due to their low cost, convenient manufacturing processes, and high safety. However, their development was...

