A brief review on DNA storage, compression, and digitalization

2021 ◽  
pp. 100391
Author(s):  
Yesenia Cevallos ◽  
Tadashi Nakano ◽  
Luis Tello-Oquendo ◽  
Ahmad Rushdi ◽  
Deysi Inca ◽  
...  
Author(s):  
Ben Cao ◽  
Xiaokang Zhang ◽  
Jieqiong Wu ◽  
Bin Wang ◽  
Qiang Zhang ◽  
...  

2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Kyle J. Tomek ◽  
Kevin Volkel ◽  
Elaine W. Indermaur ◽  
James M. Tuck ◽  
Albert J. Keung

Abstract DNA holds significant promise as a data storage medium due to its density, longevity, and resource and energy conservation. These advantages arise from the inherent biomolecular structure of DNA, which differentiates it from conventional storage media. The unique molecular architecture of DNA storage also prompts important discussions on how data should be organized, accessed, and manipulated and what practical functionalities may be possible. Here we leverage thermodynamic tuning of biomolecular interactions to implement useful data access and organizational features. Specific sets of environmental conditions including distinct DNA concentrations and temperatures were screened for their ability to switchably access either all DNA strands encoding full image files from a GB-sized background database or subsets of those strands encoding low-resolution, File Preview, versions. We demonstrate File Preview with four JPEG images and provide an argument for the substantial and practical economic benefit of this generalizable strategy to organize data.


Author(s):  
Jaeho Jeong ◽  
Seong-Joon Park ◽  
Jae-Won Kim ◽  
Jong-Seon No ◽  
Ha Hyeon Jeon ◽  
...  

Abstract Motivation: In DNA storage systems, there are tradeoffs between writing and reading costs. Increasing the code rate of error-correcting codes may save writing cost, but it requires more sequence reads for data retrieval. The sequencing and decoding processes can potentially be improved so that the reading cost induced by this tradeoff is reduced without increasing the writing cost. In previous research, clustering, alignment, and decoding were treated as separate stages, but we believe that using the information from all these processes together may improve decoding performance. Actual DNA synthesis and sequencing experiments should be performed because simulations cannot be relied on to cover all error possibilities in practical circumstances. Results: For DNA storage systems using a fountain code and a Reed-Solomon (RS) code, we introduce several techniques to improve the decoding performance. We designed the decoding process focusing on the cooperation of key components: Hamming-distance based clustering, discarding of abnormal sequence reads, RS error correction as well as detection, and quality score-based ordering of sequences. We synthesized 513.6 KB of data into DNA oligo pools and sequenced this data successfully with an Illumina MiSeq instrument. Compared to Erlich’s research, the proposed decoding method additionally incorporates sequence reads with minor errors, which had previously been discarded, and thus was able to make use of 10.6–11.9% more sequence reads from the same sequencing environment. This resulted in a 6.5–8.9% reduction in the reading cost. Channel characteristics including sequence coverage and read-length distributions are provided as well. Availability: The raw data files and the source code of our experiments are available at: https://github.com/jhjeong0702/dna-storage.
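
As an illustration of two of the components named in this abstract (Hamming-distance based clustering and discarding of abnormal reads), the following is a minimal Python sketch. It is not the authors' implementation, which is available at the repository above. The function names, the greedy first-member clustering rule, the length-based definition of an "abnormal" read, and the `max_dist` threshold are all assumptions made for illustration. RS decoding and quality score-based ordering are omitted.

```python
from collections import Counter
from typing import List


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def cluster_reads(reads: List[str], expected_len: int, max_dist: int) -> List[List[str]]:
    """Greedy clustering (illustrative assumption, not the paper's algorithm).

    A read joins the first cluster whose representative (its first member)
    is within max_dist; otherwise it starts a new cluster. Reads whose
    length differs from expected_len are treated as abnormal and dropped.
    """
    clusters: List[List[str]] = []
    for read in reads:
        if len(read) != expected_len:
            continue  # discard abnormal read (hypothetical criterion)
        for cluster in clusters:
            if hamming(cluster[0], read) <= max_dist:
                cluster.append(read)
                break
        else:
            clusters.append([read])
    return clusters


def consensus(cluster: List[str]) -> str:
    """Per-position majority vote over the reads of one cluster."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*cluster))


if __name__ == "__main__":
    reads = ["ACGTACGT", "ACGTACGA", "ACGTACGT",
             "TTTTGGGG", "TTTTGGGC", "ACG", "TTTTGGGG"]
    for group in cluster_reads(reads, expected_len=8, max_dist=2):
        print(len(group), consensus(group))
```

In this toy input the short read `ACG` is dropped and the remaining reads fall into two clusters of three reads each. A real pipeline would pass each cluster consensus on to error-correcting decoding rather than stop here.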


2019 ◽  
Vol 65 (6) ◽  
pp. 3671-3691 ◽  
Author(s):  
Maya Levy ◽  
Eitan Yaakobi

2021 ◽  
Author(s):  
Min Li ◽  
Junbiao Dai ◽  
Qingshan Jiang ◽  
Yang Wang

Abstract Current research on DNA storage usually focuses on improving storage density and reducing gene synthesis cost by developing effective encoding and decoding schemes, while neglecting the uncertainty inherent in ultra-long-term data storage and retention. Consequently, current DNA storage systems are often not self-contained, meaning that they have to resort to external tools to restore the stored gene data. This may result in a high risk of data loss, since the required tools might not be available given the high uncertainty of the far future. To address this issue, we propose in this paper a self-contained DNA storage system that makes its stored data self-explanatory without relying on any external tools. To this end, we design a specific DNA file format in which a separate storage scheme is developed to reduce data redundancy and an effective index is designed for random read operations on the stored data file. We verified through experimental data that the proposed self-contained and self-explanatory method not only removes the reliance on external tools for data restoration but also minimizes the data redundancy incurred when the amount of data to be stored reaches a certain scale.


Author(s):  
Reinhard Heckel ◽  
Ilan Shomorony ◽  
Kannan Ramchandran ◽  
David N. C. Tse

2020 ◽  
Vol 11 (1) ◽  
Author(s):  
Philipp L. Antkowiak ◽  
Jory Lietard ◽  
Mohammad Zalbagi Darestani ◽  
Mark M. Somoza ◽  
Wendelin J. Stark ◽  
...  

Abstract Due to its longevity and enormous information density, DNA is an attractive medium for archival storage. The current bottleneck of DNA data storage systems, in both cost and speed, is synthesis. The key idea for breaking this bottleneck pursued in this work is to move beyond the low-error and expensive synthesis employed almost exclusively in today’s systems, towards cheaper, potentially faster, but high-error synthesis technologies. Here, we demonstrate a DNA storage system that relies on massively parallel light-directed synthesis, which is considerably cheaper than conventional solid-phase synthesis. However, this technology has a high sequence error rate when optimized for speed. We demonstrate that even in this high-error regime, reliable storage of information is possible, by developing a pipeline of algorithms for encoding and reconstruction of the information. In our experiments, we store a file containing sheet music of Mozart, and show perfect data recovery from low synthesis fidelity DNA.


2019 ◽  
Vol 38 (1) ◽  
pp. 31-32
Author(s):  
Fahim Farzadfard
