Multidimensional data organization and random access in large-scale DNA storage systems

Author(s):  
Xin Song ◽  
Shalin Shah ◽  
John Reif
2019 ◽  

Abstract: With impressive density and coding capacity, DNA offers a promising solution for building long-lasting archival data storage systems. In recent implementations, data retrieval such as random access typically relies on a large library of non-interacting PCR primers. While several algorithms automate the primer design process, the capacity and scalability of DNA-based storage systems are still fundamentally limited by the availability of experimentally validated orthogonal primers. In this work, we combine the nested and semi-nested PCR techniques to virtually enforce multidimensional data organization in large DNA storage systems. The strategy effectively pushes the limit of DNA storage capacity and reduces the number of primers needed for efficient random access from a very large address space. Specifically, our design requires k * n unique primers to index n^k data entries, where k specifies the number of dimensions and n indicates the number of data entries stored in each dimension. We strategically leverage forward/reverse primer pairs from the same or different address layers to virtually specify and maintain data retrievals in the form of rows, columns, tables, and blocks with respect to the original storage pool. This architecture enables various random-access patterns that could be tailored to preserve the underlying data structures and relations (e.g., files and folders) within the storage content. With just one or two rounds of PCR, specific data subsets or individual data entries from the large multidimensional storage can be selectively enriched for simple extraction by gel electrophoresis or readout via sequencing.
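The primer arithmetic in this abstract (k * n primers indexing n^k entries) can be illustrated with a minimal sketch. It assumes a simplified layout in which each of the k dimensions has its own set of n primers and an entry is addressed by one primer per dimension. The function names and the `D<dim>_P<index>` primer labels are hypothetical and do not come from the paper, and the sketch models only the addressing, not the nested or semi-nested PCR chemistry.

```python
from itertools import product


def primer_count(k: int, n: int) -> int:
    """Unique primers needed: n primers for each of k dimensions."""
    return k * n


def address_space(k: int, n: int) -> int:
    """Entries addressable by choosing one primer per dimension."""
    return n ** k


def primers_for(address: tuple, n: int) -> list:
    """Map a k-dimensional address to one hypothetical primer label per dimension."""
    for coord in address:
        if not 0 <= coord < n:
            raise ValueError(f"coordinate {coord} outside 0..{n - 1}")
    return [f"D{dim}_P{coord}" for dim, coord in enumerate(address)]


def select(fixed: dict, k: int, n: int) -> list:
    """List all addresses matching the fixed coordinates.

    Fixing fewer than k dimensions models retrieving a row, column,
    table, or block rather than a single entry.
    """
    axes = [[fixed[d]] if d in fixed else range(n) for d in range(k)]
    return list(product(*axes))
```

For example, with k = 3 and n = 10, `primer_count(3, 10)` gives 30 primers for `address_space(3, 10)` = 1000 entries, and `select({0: 2}, 3, 10)` lists the 100 addresses that share the first coordinate.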


2022 ◽  
Vol 4 (1) ◽  
Author(s):  
Alex El-Shaikh ◽  
Marius Welzel ◽  
Dominik Heider ◽  
Bernhard Seeger

ABSTRACT Due to the rapid cost decline of synthesizing and sequencing deoxyribonucleic acid (DNA), high information density, and its durability of up to centuries, utilizing DNA as an information storage medium has received the attention of many scientists. State-of-the-art DNA storage systems exploit the high capacity of DNA and enable random access (predominantly random reads) by primers, which serve as unique identifiers for directly accessing data. However, primers come with a significant limitation regarding the maximum available number per DNA library. The number of different primers within a library is typically very small (e.g. ≈10). We propose a method to overcome this deficiency and present a general-purpose technique for addressing and directly accessing thousands to potentially millions of different data objects within the same DNA pool. Our approach utilizes a fountain code, sophisticated probe design, and microarray technologies. A key component is locality-sensitive hashing, making checks for dissimilarity among such a large number of probes and data objects feasible.
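The role of locality-sensitive hashing in this abstract can be sketched as follows. The abstract does not say which LSH scheme the authors use, so this example swaps in a common one, MinHash over k-mers with banding, purely as an illustration. All function names and parameters here are assumptions. The idea shown is that only sequences sharing a bucket need a full pairwise similarity check, which avoids comparing every probe against every other.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def kmers(seq: str, k: int = 4) -> set:
    """Set of all length-k substrings of seq (seq must have length >= k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def _hash(seed: int, kmer: str) -> int:
    """Deterministic 64-bit hash of a k-mer under a given seed."""
    digest = hashlib.sha256(f"{seed}:{kmer}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(seq: str, num_hashes: int = 8, k: int = 4) -> tuple:
    """MinHash signature: the minimum hash over the k-mer set, per seed."""
    ks = kmers(seq, k)
    return tuple(min(_hash(seed, km) for km in ks) for seed in range(num_hashes))


def candidate_pairs(seqs: list, bands: int = 4, rows: int = 2, k: int = 4) -> set:
    """Index pairs that share at least one signature band (likely similar)."""
    buckets = defaultdict(set)
    for idx, seq in enumerate(seqs):
        sig = minhash_signature(seq, bands * rows, k)
        for b in range(bands):
            buckets[(b, sig[b * rows:(b + 1) * rows])].add(idx)
    pairs = set()
    for members in buckets.values():
        pairs.update(combinations(sorted(members), 2))
    return pairs
```

Pairs returned by `candidate_pairs` are only candidates; a real pipeline would still verify each one with an exact similarity or hybridization check.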


Energies ◽  
2021 ◽  
Vol 14 (11) ◽  
pp. 3296
Author(s):  
Carlos García-Santacruz ◽  
Luis Galván ◽  
Juan M. Carrasco ◽  
Eduardo Galván

Energy storage systems are expected to play a fundamental part in the integration of increasing renewable energy sources into the electric system. They are already used in power plants for different purposes, such as absorbing the effect of intermittent energy sources or providing ancillary services. For this reason, it is imperative to research managing and sizing methods that make power plants with storage viable and profitable projects. In this paper, a managing method is presented, where particle swarm optimisation is used to reach maximum profits. This method is compared to expert systems, proving that the former achieves better results, while respecting similar rules. The paper further presents a sizing method which uses the previous one to make the power plant as profitable as possible. Finally, both methods are tested through simulations to show their potential.


Author(s):  
Peisheng Guo ◽  
Gongzheng Yang ◽  
Chengxin Wang

Aqueous zinc-ion batteries (AZIBs) have been regarded as alternative and promising large-scale energy storage systems due to their low cost, convenient manufacturing processes, and high safety. However, their development was...


Author(s):  
Jaeho Jeong ◽  
Seong-Joon Park ◽  
Jae-Won Kim ◽  
Jong-Seon No ◽  
Ha Hyeon Jeon ◽  
...  

Abstract. Motivation: In DNA storage systems, there are tradeoffs between writing and reading costs. Increasing the code rate of error-correcting codes may save writing cost, but it requires more sequence reads for data retrieval. There is potentially a way to improve the sequencing and decoding processes so that the reading cost induced by this tradeoff is reduced without increasing the writing cost. In previous research, clustering, alignment, and decoding were treated as separate stages, but we believe that using the information from all of these processes together may improve decoding performance. Actual DNA synthesis and sequencing experiments should be performed because simulations cannot be relied on to cover all error possibilities in practical circumstances. Results: For DNA storage systems using fountain code and Reed-Solomon (RS) code, we introduce several techniques to improve the decoding performance. We designed the decoding process to focus on the cooperation of key components: Hamming-distance-based clustering, discarding of abnormal sequence reads, RS error correction as well as detection, and quality-score-based ordering of sequences. We synthesized 513.6 KB of data into DNA oligo pools and sequenced this data successfully with an Illumina MiSeq instrument. Compared to Erlich’s research, the proposed decoding method additionally incorporates sequence reads with minor errors, which had previously been discarded, and was thus able to make use of 10.6–11.9% more sequence reads from the same sequencing environment. This resulted in a 6.5–8.9% reduction in the reading cost. Channel characteristics including sequence coverage and read-length distributions are provided as well. Availability: The raw data files and the source codes of our experiments are available at: https://github.com/jhjeong0702/dna-storage.
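The Hamming-distance-based clustering named in this abstract can be illustrated with a minimal sketch. It is a simplified greedy scheme written for illustration, not the authors' pipeline (their code is at the repository above). It assumes all reads have equal length, and the function names and the `max_dist` threshold are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def cluster_reads(reads: list, max_dist: int) -> list:
    """Greedy clustering: put each read in the first cluster whose
    representative (its first member) is within max_dist, or start a
    new cluster if none is close enough."""
    clusters = []
    for read in reads:
        for cluster in clusters:
            if hamming(read, cluster[0]) <= max_dist:
                cluster.append(read)
                break
        else:
            # No existing representative was close enough.
            clusters.append([read])
    return clusters
```

With `max_dist=1`, the reads `ACGTACGT`, `ACGTACGA`, and `ACGAACGT` fall into one cluster, so reads with a single substitution are kept with their likely origin rather than discarded.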


Energy ◽  
2017 ◽  
Vol 140 ◽  
pp. 656-672 ◽  
Author(s):  
Sahil Kapila ◽  
Abayomi Olufemi Oni ◽  
Amit Kumar

Author(s):  
Bingpeng Zhu ◽  
Gang Wang ◽  
Xiaoguang Liu ◽  
Dianming Hu ◽  
Sheng Lin ◽  
...  
