New Levenshtein-Marker Code for DNA-based Data Storage Capable of Correcting Multiple Edit Errors

2021 ◽  
Author(s):  
Zihui Yan ◽  
Cong Liang

With the development of DNA synthesis and sequencing technologies, DNA has become a promising medium for long-term data storage. Three types of errors may occur in a DNA strand: insertions, deletions, and substitutions, which we collectively call edit errors. It is still challenging to design a code that can correct multiple edit errors on non-binary alphabets. In this paper, we propose a new coding scheme for correcting multiple edit errors on DNA strands by splitting the whole strand into consecutive blocks of appropriate length and correcting a single edit error in each block. Our method, called the DNA-LM code, can be considered a generalization of the Levenshtein code combined with the marker code. We provide a linear encoding and decoding algorithm for our DNA-LM code. Compared to other encoding methods for DNA strands of several hundred base pairs, our DNA-LM code achieved similar code rates and a much lower average nucleotide error rate in decoding.
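As a rough illustration of the single-error-per-block idea, the sketch below uses the classical binary Varshamov-Tenengolts (VT) checksum, on which Levenshtein's construction is based, to recover one deleted bit. It is not the DNA-LM code itself: the binary alphabet, the block length of 6, the function names, and the brute-force search (which is not linear-time) are all assumptions made for illustration.

```python
def vt_syndrome(bits):
    """VT checksum: sum of i * x_i over 1-indexed positions, modulo (n + 1)."""
    n = len(bits)
    return sum(i * b for i, b in enumerate(bits, start=1)) % (n + 1)


def decode_single_deletion(received, a=0):
    """Return every word with syndrome `a` obtainable by inserting one bit.

    Brute force for clarity: for a VT code the result holds exactly one
    word whenever `received` is a codeword with a single bit deleted.
    """
    candidates = set()
    for pos in range(len(received) + 1):
        for bit in (0, 1):
            candidate = received[:pos] + (bit,) + received[pos:]
            if vt_syndrome(candidate) == a:
                candidates.add(candidate)
    return candidates


# Hypothetical 6-bit block with syndrome 0: (1 + 2 + 4) mod 7 == 0.
codeword = (1, 1, 0, 1, 0, 0)
received = codeword[:2] + codeword[3:]  # the third bit is deleted
recovered = decode_single_deletion(received)
```

Here `recovered` contains only the original block. A scheme of the kind described in the abstract would apply a comparable single-error decoder to each block of a longer strand, with markers helping to locate the block boundaries.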


Author(s):  
Samiha Abdelrahman Mohammed Marwan ◽  
Ahmed Shawish ◽  
Khaled Nagaty

Network technologies face continuous threats due to their rapidly changing nature, which raises the demand for safe data transmission. As a result, new techniques for securing data and accommodating the growing quantities of information are crucial. From nature to science, the idea that genes themselves are made of information has stimulated research in molecular deoxyribonucleic acid (DNA). DNA is capable of storing huge amounts of data, which makes it promising for steganography. DNA steganography is the art of using DNA as an information carrier, achieving high data storage capacity as well as a high level of security. Currently, DNA steganography techniques utilize the properties of only one DNA strand, since the other strand is completely dependent on the first one. This paper presents a DNA-based steganography technique that hides data in both DNA strands while respecting the dependency between the two strands. In the proposed technique, a key of the same length as the reference DNA sequence is generated after using the second DNA strand. The sender sends both the encrypted DNA message and its reference DNA sequence together in a microdot. If the recipient receives this microdot uncontaminated, the sender can safely send the generated key afterwards. The proposed technique doubles the amount of data stored and guarantees a secure transmission process as well: even if an attacker suspects the first-sent DNA sequence, they will never receive the key, and hence full data extraction is nearly impossible. The conducted experimental study confirms the effectiveness of the proposed technique.


2021 ◽  
Author(s):  
Zihui Yan ◽  
Cong Liang

In recent years, DNA-based systems have become a promising medium for long-term data storage. There are two layers of errors in DNA-based storage systems. The first is the dropout of DNA strands, which has been characterized by the shuffling-sampling channel. The second is insertions, deletions, and substitutions of nucleotides in individual DNA molecules. In this paper, we describe a DNA noisy synchronization error channel to characterize the errors in individual DNA molecules. We derive non-trivial lower and upper capacity bounds of the DNA noisy synchronization error channel based on information theory. By cascading these two channels, we provide theoretical capacity limits of the DNA storage system. These results reaffirm that DNA is a reliable storage medium with high storage density potential.
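To make the second error layer concrete, here is a minimal simulation of a generic channel that applies independent insertions, deletions, and substitutions to each nucleotide. It is only a sketch: the abstract does not give the definition of the DNA noisy synchronization error channel, so the per-base independence, the function name `ids_channel`, and the default probabilities are all assumptions.

```python
import random

BASES = "ACGT"


def ids_channel(strand, p_ins=0.01, p_del=0.01, p_sub=0.01, rng=None):
    """Pass a strand through a toy insertion/deletion/substitution channel.

    Each input base independently suffers at most one event (assumed model):
    a random base inserted before it, deletion, or substitution by a
    different base.
    """
    rng = rng or random.Random()
    out = []
    for base in strand:
        r = rng.random()
        if r < p_ins:
            out.append(rng.choice(BASES))  # inserted base
            out.append(base)               # original base is kept
        elif r < p_ins + p_del:
            continue                       # base is dropped
        elif r < p_ins + p_del + p_sub:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


noisy = ids_channel("ACGTACGTACGT", rng=random.Random(0))
```

Because insertions and deletions change the strand length, the receiver loses track of which output position corresponds to which input position, which is why such channels are called synchronization error channels.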


2020 ◽  
Author(s):  
Song Mao ◽  
Zhihua Chang ◽  
Ya Ying Zheng ◽  
Alexander Shekhtman ◽  
Jia Sheng

A new family of hydrazone-modified cytidine phosphoramidite building blocks was synthesized and incorporated into DNA oligonucleotides to construct photoswitchable DNA strands. The E-Z isomerization triggered by irradiation with blue light at a wavelength of 450 nm was investigated and confirmed by ¹H NMR and HPLC in the contexts of both the nucleoside and the DNA oligonucleotide. The light-activated Z-form isomer of this hydrazone-cytidine, with a six-membered intramolecular hydrogen bond, was found to inhibit DNA synthesis in the primer extension model using Bst DNA polymerase. In addition, the hydrazone modification caused the misincorporation of dATP together with dGTP into the growing DNA strand with similar selectivity, highlighting the potential G to A mutation. This work provides a novel functional DNA building block and an additional molecular tool that have potential chemical biology and bio-medicinal applications to control DNA synthesis and DNA-enzyme interactions using cell-friendly blue light irradiation.


Author(s):  
Perrin C. White

Much of the knowledge presented in the following chapters has been gained using molecular genetic techniques to analyze the structure, synthesis, regulation, and effects of hormones. This chapter provides an overview of some of the relevant techniques and associated concepts. To allow the reader to understand older experiments, we have tried to include techniques that are now of mainly historical interest as well as current concepts. Deoxyribonucleic acid (DNA) and ribonucleic acid (RNA) consist of nucleotides. A nucleotide consists of a base, a sugar moiety (either deoxyribose or ribose), and a phosphate group. The sugars and phosphates alternate in the backbone of a nucleic acid strand. In general, there are four possible bases. In DNA, these are adenine (A), cytosine (C), guanine (G), and thymine (T). Adenine and guanine are purines, whereas cytosine and thymine are pyrimidines. The corresponding nucleotides are adenosine, cytidine, guanosine, and thymidine. In RNA, uracil (uridine) is substituted for thymine (thymidine). DNA is double stranded. Each strand has a direction because the deoxyribose molecules forming the backbone are asymmetrical, with the phosphate bonds linking each two sugar molecules going from the 3’ position of one to the 5’ position of the next. Thus, the 5’ position of a sugar molecule is free at one end (the 5’ end) of the strand, and the 3’ position is free at the other. The two strands of a DNA molecule run in opposite directions, so that the 5’ end of one strand is opposed to the 3’ end of the complementary strand. The DNA strands interact with each other through complementary (Watson-Crick) base pairing, in which A and T, or C and G, are paired through hydrogen bonds. Thus, the sequence of one DNA strand unambiguously determines the sequence of the complementary strand during DNA replication. The length of a DNA segment is typically given in bases or nucleotides (nt) or, if double stranded, base pairs (bp).
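The statement that one strand determines its partner can be shown with a short sketch. It assumes strands are written as 5’-to-3’ strings over the letters A, C, G, and T; the function name `reverse_complement` is chosen here for illustration.

```python
# Watson-Crick pairing: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand):
    """Return the complementary strand, also read 5' to 3'.

    The two strands run in opposite directions, so the complement is
    built by walking the input from its 3' end back to its 5' end.
    """
    return "".join(COMPLEMENT[base] for base in reversed(strand))


partner = reverse_complement("ATGC")  # "GCAT"
```

Applying the function twice returns the original strand, which reflects the fact that each strand fully determines the other.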


2020 ◽  
Vol 48 (20) ◽  
pp. 11695-11705
Author(s):  
Feng He ◽  
Kevin DuPrez ◽  
Eduardo Hilario ◽  
Zhenhang Chen ◽  
Li Fan

Nucleotide excision repair (NER) removes various DNA lesions caused by UV light and chemical carcinogens. The DNA helicase XPB plays a key role in DNA opening and coordinating damage incision by nucleases during NER, but the underlying mechanisms remain unclear. Here, we report crystal structures of XPB from Sulfurisphaera tokodaii (St) bound to the nuclease Bax1 and their complex with a bubble DNA having one arm unwound in the crystal. StXPB and Bax1 together spirally encircle 10 base pairs of duplex DNA at the double-/single-stranded (ds–ss) junction. Furthermore, StXPB has its ThM motif intruding between the two DNA strands and gripping the 3′-overhang while Bax1 interacts with the 5′-overhang. This ternary complex likely reflects the state of repair bubble extension by the XPB and nuclease machine. ATP binding and hydrolysis by StXPB could lead to a spiral translocation along dsDNA and DNA strand separation by the ThM motif, revealing an unconventional DNA unwinding mechanism. Interestingly, the DNA is kept away from the nuclease domain of Bax1, potentially preventing DNA incision by Bax1 during repair bubble extension.



