Multidimensional data organization and random access in large-scale DNA storage systems

DOI: 10.1101/743369; 2019

Author(s): Xin Song, Shalin Shah, John Reif

Keyword(s): Large Scale, Storage Systems, Random Access, Data Retrieval, PCR Primers, Multidimensional Data, Data Organization, DNA Storage, Reverse Primer, Access Patterns

Abstract: With impressive density and coding capacity, DNA offers a promising solution for building long-lasting data archival storage systems. In recent implementations, data retrieval such as random access typically relies on a large library of non-interacting PCR primers. While several algorithms automate the primer design process, the capacity and scalability of DNA-based storage systems are still fundamentally limited by the availability of experimentally validated orthogonal primers. In this work, we combine the nested and semi-nested PCR techniques to virtually enforce multidimensional data organization in large DNA storage systems. The strategy effectively pushes the limit of DNA storage capacity and reduces the number of primers needed for efficient random access from a very large address space. Specifically, our design requires k * n unique primers to index n^k data entries, where k specifies the number of dimensions and n indicates the number of data entries stored in each dimension. We strategically leverage forward/reverse primer pairs from the same or different address layers to virtually specify and maintain data retrievals in the form of rows, columns, tables, and blocks with respect to the original storage pool. This architecture enables various random-access patterns that could be tailored to preserve the underlying data structures and relations (e.g., files and folders) within the storage content. With just one or two rounds of PCR, specific data subsets or an individual datum from the large multidimensional storage can be selectively enriched for simple extraction by gel electrophoresis or readout via sequencing.
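The primer arithmetic in this abstract can be made concrete with a small sketch. It assumes the number of indexed entries is n to the power k (n choices in each of k dimensions) while the primer count grows only as k * n. The function names and the `D{d}P{j}` primer labels are hypothetical, chosen for illustration, and are not taken from the paper.

```python
from itertools import product


def primers_needed(n: int, k: int) -> int:
    """Unique primers required: n per dimension across k dimensions."""
    return k * n


def addressable_entries(n: int, k: int) -> int:
    """Entries indexable when each of the k dimensions takes n values."""
    return n ** k


def primer_labels(address: tuple) -> list:
    """Hypothetical labels: primer j of dimension d is written 'D{d}P{j}'."""
    return [f"D{d}P{j}" for d, j in enumerate(address)]


if __name__ == "__main__":
    n, k = 10, 3
    print(primers_needed(n, k))       # 30
    print(addressable_entries(n, k))  # 1000
    # Enumerating every k-dimensional address confirms the n**k count.
    addresses = list(product(range(n), repeat=k))
    print(len(addresses) == addressable_entries(n, k))
    print(primer_labels((0, 4, 2)))
```

With n = 10 and k = 3, thirty primers cover a thousand addresses, which illustrates the scaling the abstract describes.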

High-scale random access on DNA storage systems

NAR Genomics and Bioinformatics; DOI: 10.1093/nargab/lqab126; 2022; Vol 4 (1)

Author(s): Alex El-Shaikh, Marius Welzel, Dominik Heider, Bernhard Seeger

Keyword(s): Storage Systems, High Capacity, Random Access, General Purpose, Information Storage, Locality Sensitive Hashing, Probe Design, DNA Storage, Data Objects, DNA Pool

Abstract: Owing to the rapidly declining cost of synthesizing and sequencing deoxyribonucleic acid (DNA), its high information density, and its durability of up to centuries, DNA has attracted the attention of many scientists as an information storage medium. State-of-the-art DNA storage systems exploit the high capacity of DNA and enable random access (predominantly random reads) by primers, which serve as unique identifiers for directly accessing data. However, primers come with a significant limitation regarding the maximum available number per DNA library. The number of different primers within a library is typically very small (e.g. ≈10). We propose a method to overcome this deficiency and present a general-purpose technique for addressing and directly accessing thousands to potentially millions of different data objects within the same DNA pool. Our approach utilizes a fountain code, sophisticated probe design, and microarray technologies. A key component is locality-sensitive hashing, which makes checks for dissimilarity among such a large number of probes and data objects feasible.
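To illustrate why locality-sensitive hashing makes large-scale dissimilarity checks tractable, here is a minimal sketch using MinHash over k-mers with banding. The abstract does not say which LSH scheme the authors use, so the scheme, the function names, and the parameters (k-mer length 4, 8 bands of 2 rows) are all assumptions for illustration. Only sequences that share a bucket need a full pairwise comparison, so the all-pairs check is avoided.

```python
import hashlib
from collections import defaultdict


def kmers(seq: str, k: int = 4) -> set:
    """All overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def minhash_signature(seq: str, num_hashes: int = 16, k: int = 4) -> tuple:
    """One minimum per seeded hash function over the k-mer set."""
    sig = []
    for seed in range(num_hashes):
        sig.append(min(
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{km}".encode(), digest_size=8).digest(),
                "big",
            )
            for km in kmers(seq, k)
        ))
    return tuple(sig)


def candidate_pairs(seqs: list, bands: int = 8, rows: int = 2) -> set:
    """Index pairs whose signatures agree on at least one band."""
    buckets = defaultdict(list)
    for idx, s in enumerate(seqs):
        sig = minhash_signature(s, bands * rows)
        for b in range(bands):
            buckets[(b, sig[b * rows:(b + 1) * rows])].append(idx)
    pairs = set()
    for members in buckets.values():
        for i in range(len(members)):
            for j in range(i + 1, len(members)):
                pairs.add((members[i], members[j]))
    return pairs


if __name__ == "__main__":
    probes = ["ACGTACGTAC", "ACGTACGTAC", "TTTTGGGGCC"]
    # Identical probes share every band; probes with no common k-mers
    # should not share any.
    print(candidate_pairs(probes))
```

Pairs that never appear as candidates are treated as dissimilar without being compared directly, which is the property that lets the check scale to many probes.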

NVMCache: Wear-Aware Load Balancing NVM-based Caching for Large-Scale Storage Systems

2020 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom); DOI: 10.1109/ispa-bdcloud-socialcom-sustaincom51426.2020.00108; 2020

Author(s): Zhenhua Cai, Jiayun Lin, Fang Liu, Zhiguang Chen, Hongtao Li

Keyword(s): Load Balancing, Large Scale, Storage Systems

Sizing and Management of Energy Storage Systems in Large-Scale Power Plants Using Price Control and Artificial Intelligence

Energies; DOI: 10.3390/en14113296; 2021; Vol 14 (11); pp. 3296

Author(s): Carlos García-Santacruz, Luis Galván, Juan M. Carrasco, Eduardo Galván

Keyword(s): Energy Storage, Power Plants, Large Scale, Storage Systems, Renewable Energy Sources, Ancillary Services, Energy Sources, Energy Storage Systems, Electric System, Fundamental Part

Energy storage systems are expected to play a fundamental part in the integration of increasing renewable energy sources into the electric system. They are already used in power plants for different purposes, such as absorbing the effect of intermittent energy sources or providing ancillary services. For this reason, it is imperative to research managing and sizing methods that make power plants with storage viable and profitable projects. In this paper, a managing method is presented, where particle swarm optimisation is used to reach maximum profits. This method is compared to expert systems, proving that the former achieves better results, while respecting similar rules. The paper further presents a sizing method which uses the previous one to make the power plant as profitable as possible. Finally, both methods are tested through simulations to show their potential.

Electrochemically induced structural reconstruction in promoting Zn storage performances of CaMn3O6 cathode for superior long life aqueous Zn-ion battery

Journal of Materials Chemistry A; DOI: 10.1039/d1ta03708k; 2021

Author(s): Peisheng Guo, Gongzheng Yang, Chengxin Wang

Keyword(s): Energy Storage, Large Scale, Storage Systems, Low Cost, Zinc Ion, Manufacturing Processes, Energy Storage Systems, Long Life

Aqueous zinc-ion batteries (AZIBs) have been regarded as alternative and promising large-scale energy storage systems due to their low cost, convenient manufacturing processes, and high safety. However, their development was...

Evaluation of distributed recovery in large-scale storage systems

Proceedings. 13th IEEE International Symposium on High performance Distributed Computing, 2004.; DOI: 10.1109/hpdc.2004.1323523; 2004; Cited By ~ 38

Author(s): Qin Xin, E.L. Miller, T.J.E. Schwarz

Keyword(s): Large Scale, Storage Systems

Effect of lanthanum doping on the electrical properties of sol-gel derived ferroelectric lead–zirconate–titanate for ultra-large-scale integration dynamic random access memory applications

Journal of Vacuum Science & Technology B Microelectronics Processing and Phenomena; DOI: 10.1116/1.586933; 1993; Vol 11 (4); pp. 1302; Cited By ~ 34

Author(s): C. Sudhama

Keyword(s): Lead Zirconate Titanate, Large Scale, Sol Gel, Random Access, Lead Zirconate, Access Memory, Lanthanum Doping, Large Scale Integration, Memory Applications, Scale Integration

Cooperative Sequence Clustering and Decoding for DNA Storage System with Fountain Codes

Bioinformatics; DOI: 10.1093/bioinformatics/btab246; 2021

Author(s): Jaeho Jeong, Seong-Joon Park, Jae-Won Kim, Jong-Seon No, Ha Hyeon Jeon, ...

Keyword(s): Hamming Distance, Storage Systems, Storage System, Data Retrieval, Illumina MiSeq, Error Correcting Codes, Read Length, Sequence Coverage, Source Codes, DNA Storage

Abstract: Motivation: In DNA storage systems, there are tradeoffs between writing and reading costs. Increasing the code rate of error-correcting codes may save writing cost, but it will need more sequence reads for data retrieval. There is potentially a way to improve sequencing and decoding processes in such a way that the reading cost induced by this tradeoff is reduced without increasing the writing cost. In past research, clustering, alignment, and decoding processes were considered as separate stages, but we believe that using the information from all these processes together may improve decoding performance. Actual experiments of DNA synthesis and sequencing should be performed because simulations cannot be relied on to cover all error possibilities in practical circumstances. Results: For DNA storage systems using fountain code and Reed-Solomon (RS) code, we introduce several techniques to improve the decoding performance. We designed the decoding process focusing on the cooperation of key components: Hamming-distance based clustering, discarding of abnormal sequence reads, RS error correction as well as detection, and quality score-based ordering of sequences. We synthesized 513.6 KB of data into DNA oligo pools and sequenced this data successfully with an Illumina MiSeq instrument. Compared to Erlich’s research, the proposed decoding method additionally incorporates sequence reads with minor errors which had been discarded before, and thus was able to make use of 10.6–11.9% more sequence reads from the same sequencing environment. This resulted in a 6.5–8.9% reduction in the reading cost. Channel characteristics including sequence coverage and read-length distributions are provided as well. Availability: The raw data files and the source codes of our experiments are available at: https://github.com/jhjeong0702/dna-storage.
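As a rough illustration of Hamming-distance based clustering, the sketch below greedily assigns each equal-length read to the first cluster whose representative (its first member) lies within a distance threshold. This is a generic simplification under stated assumptions, not the authors' implementation, which is available at the repository linked in the abstract. The function names and the `max_dist` threshold are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def greedy_cluster(reads: list, max_dist: int) -> list:
    """Group reads around the first read seen in each cluster."""
    clusters = []
    for read in reads:
        for cluster in clusters:
            rep = cluster[0]  # representative: first member of the cluster
            if len(rep) == len(read) and hamming(rep, read) <= max_dist:
                cluster.append(read)
                break
        else:
            # No existing representative is close enough: start a new cluster.
            clusters.append([read])
    return clusters


if __name__ == "__main__":
    reads = ["ACGT", "ACGA", "TTTT", "TTTA"]
    print(greedy_cluster(reads, max_dist=1))
    # [['ACGT', 'ACGA'], ['TTTT', 'TTTA']]
```

Reads with a few substitution errors land in the same cluster as their error-free counterpart, which is how reads with minor errors can be kept for decoding rather than discarded.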
