Nucleotide Epi-Chains and New Nucleotide Probability Rules in Long DNA Sequences

Author(s):  
Sergey V. Petoukhov

P. Jordan, one of the creators of quantum mechanics, claimed in his work on quantum biology that life's missing laws were the rules of chance and probability of the quantum world. The article presents the author's results of studying the probabilities of nucleotides on so-called epi-chains of long DNA sequences from various eukaryotic and prokaryotic genomes. DNA epi-chains are algorithmically constructed subsequences of DNA nucleotide sequences. An epi-chain of order n is a nucleotide subsequence in which the position numbers of adjacent nucleotides differ by n (n = 2, 3, 4, …). Correspondingly, each epi-chain of order n contains n times fewer nucleotides than the original DNA sequence. The presented results unexpectedly show that nucleotide probabilities on such DNA epi-chains of different orders are practically identical to the nucleotide probabilities in the original long DNA sequence. These data allow DNA to be considered as a regular, rich set of epi-chains, which, the author believes, can play a certain role in genetic and epigenetic phenomena. Appropriate rules of nucleotide probabilities on epi-chains of long DNA sequences are formulated for further testing on a wider set of biological genomes. These phenomenological data and their possible biological meaning are discussed.
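The epi-chain construction described in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the author's code: the function names are hypothetical, and it assumes that an order-n decomposition yields n epi-chains, one for each starting position. The demo uses a seeded random sequence, for which agreement of the frequencies is expected by construction, so it only illustrates the bookkeeping and does not reproduce the reported genomic result.

```python
import random
from collections import Counter


def epi_chains(seq: str, n: int) -> list[str]:
    """Split seq into n subsequences whose adjacent members lie n positions apart.

    Chain k (k = 0 .. n-1) takes positions k, k+n, k+2n, ...
    Assumption: one epi-chain per starting position.
    """
    if n < 1:
        raise ValueError("order n must be a positive integer")
    return [seq[k::n] for k in range(n)]


def nucleotide_probabilities(seq: str) -> dict[str, float]:
    """Relative frequency of A, C, G, T in seq (other symbols are ignored)."""
    counts = Counter(seq)
    total = sum(counts[base] for base in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {base: counts[base] / total for base in "ACGT"}


if __name__ == "__main__":
    # Toy data only: a seeded random sequence, not a real genome.
    rng = random.Random(0)
    toy = "".join(rng.choices("ACGT", weights=[3, 2, 2, 3], k=100_000))
    print("full sequence", nucleotide_probabilities(toy))
    for order in (2, 3, 4):
        for start, chain in enumerate(epi_chains(toy, order)):
            print("order", order, "start", start, nucleotide_probabilities(chain))
```

To try it on real data, replace the toy string with a genome sequence read from a FASTA file and compare the per-chain frequencies with those of the full sequence.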

Author(s):  
Sergey Petoukhov

The article presents the author's results of studying hidden rules of the structural organization of long DNA sequences in eukaryotic and prokaryotic genomes. The results concern some rules of percentages (or probabilities) of n-plets in genomes. To reveal such rules, the author uses a tensor family of matrix representations of interrelated DNA alphabets of 4 nucleotides, 16 doublets, 64 triplets, and 256 tetraplets. If the percentages of each of these n-plets in the tested genomic DNA texts are placed into the appropriate cells of the appropriate matrices, unexpected rules of invariance of the total sums of their percentages in certain tetra-groupings of n-plets are revealed. The author connects the obtained results on these genomic percentage rules with a supposition of P. Jordan, one of the creators of quantum mechanics and quantum biology, that life's missing laws are the rules of chance and probability of the quantum world. Algebraic features of the genomic matrices of percentages of n-plets are analyzed and discussed. The obtained results can be used for further development of quantum biology.
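One way to picture a tensor family of alphabet matrices is to take Kronecker (tensor) powers of a 2×2 matrix of nucleotides, which gives a 4×4 matrix of 16 doublets, an 8×8 matrix of 64 triplets, and a 16×16 matrix of 256 tetraplets. The sketch below does this symbolically with string concatenation. The base arrangement [[C, A], [T, G]] and all function names are assumptions made for illustration; the abstract does not specify the cell layout or the tetra-groupings.

```python
def kron_symbolic(a, b):
    """Kronecker product of two matrices of strings; cell entries are concatenated."""
    return [[x + y for x in row_a for y in row_b] for row_a in a for row_b in b]


# Assumed arrangement of the four nucleotides; not taken from the abstract.
BASE = [["C", "A"], ["T", "G"]]


def alphabet_matrix(n: int):
    """Return the 2^n x 2^n matrix whose cells hold all 4^n n-plets."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    matrix = BASE
    for _ in range(n - 1):
        matrix = kron_symbolic(matrix, BASE)
    return matrix


def fill_matrix(matrix, percentages):
    """Place each n-plet's percentage into its cell (0.0 if the n-plet is absent)."""
    return [[percentages.get(cell, 0.0) for cell in row] for row in matrix]


if __name__ == "__main__":
    for row in alphabet_matrix(2):
        print(" ".join(row))
```

With a dictionary of measured n-plet percentages, `fill_matrix(alphabet_matrix(n), percentages)` gives a numeric matrix whose rows, columns, or blocks can then be summed.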


Author(s):  
Sergey V. Petoukhov

The article is devoted to new results of the author, which add to his previously published ones, on hidden rules and symmetries in the structures of long single-stranded DNA sequences in eukaryotic and prokaryotic genomes. The author uses the existence of different alphabets of n-plets in DNA: the alphabet of 4 nucleotides, the alphabet of 16 doublets, the alphabet of 64 triplets, etc. Each such DNA alphabet of n-plets can serve for constructing a text as a chain of these n-plets. Using this possibility, the author represents any long DNA nucleotide sequence as a bunch of many so-called n-texts, each of which is written on the basis of one of these alphabets of n-plets. Each such n-text has its own individual percentages of the different n-plets in its genomic DNA. But it turns out that, in such a multi-alphabetical or multilayer representation of each of the many genomic DNAs analyzed by the author, universal rules of probabilities and symmetry exist in the interrelations of its different n-texts regarding their percentages of n-plets. In this study, the tensor product of matrices and vectors is used as an effective analytical tool borrowed from the arsenal of quantum mechanics. Some additions to the topic of algebra-holographic principles in genetics are also presented. Taking into account the described genomic rules of probability, the author also puts forward a concept of the important role of stochastic resonances in genetic informatics.
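The idea of reading one sequence as several n-texts can be sketched as follows. This is a hedged illustration with hypothetical function names. It assumes each n-text is the sequence cut into consecutive, non-overlapping n-plets starting at the first nucleotide, with any incomplete trailing n-plet dropped; the abstract does not state these details.

```python
from collections import Counter


def n_text(seq: str, n: int) -> list[str]:
    """Cut seq into consecutive non-overlapping n-plets (trailing remainder dropped)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    usable = len(seq) - len(seq) % n
    return [seq[i:i + n] for i in range(0, usable, n)]


def nplet_percentages(seq: str, n: int) -> dict[str, float]:
    """Percentage of each n-plet among all n-plets of the n-text."""
    words = n_text(seq, n)
    if not words:
        raise ValueError("sequence is shorter than n")
    counts = Counter(words)
    return {word: 100.0 * count / len(words) for word, count in counts.items()}


if __name__ == "__main__":
    toy = "ACGTACGGTTAC"  # toy string, not genomic data
    for n in (1, 2, 3):
        print(n, nplet_percentages(toy, n))
```

The percentages of the 1-text, 2-text, 3-text, and so on can then be compared across layers for a given genome.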


Author(s):  
Sergey Petoukhov

The article presents the author's results of studying hidden rules of the structural organization of long DNA sequences in eukaryotic and prokaryotic genomes. The results concern some rules of percentages (or probabilities) of n-plets in genomes. To reveal such rules, the author considers genomic DNA nucleotide sequences as multilayer sequences of n-plets and studies the percentage contents of n-plets in different layers. Unexpected rules of invariance of the total sums of percentages in certain tetra-groupings of n-plets in different layers of genomic DNA sequences are revealed. These discovered rules are candidates for the role of universal genomic rules. A tensor family of matrix representations of interrelated DNA alphabets of 4 nucleotides, 16 doublets, 64 triplets, and 256 tetraplets is used in the study. This matrix approach reveals algebraic properties of the mentioned genetic rules of probabilities, which are useful for developing algebraic and quantum biology. Some analogies between the discovered genetic phenomena and phenomena of Gestalt psychology are noted and discussed. The author connects the obtained results on the genomic percentage rules with a supposition of P. Jordan, one of the creators of quantum mechanics and quantum biology, that life's missing laws are the rules of chance and probability of the quantum world. Additional attention is paid to the algebraic features of the system of structured DNA alphabets and their relationship with the methods of algebraic holography, known in the processing of discrete signals. The concept of algebraic-holographic genetics is being developed for the understanding of inherited holographic properties of organisms.


2018 ◽  
Author(s):  
Kirill Kryukov ◽  
Mahoko Takahashi Ueda ◽  
So Nakagawa ◽  
Tadashi Imanishi

Summary: DNA sequence databases use compression such as gzip to reduce the required storage space and network transmission time. We describe Nucleotide Archival Format (NAF), a new file format for lossless reference-free compression of FASTA and FASTQ-formatted nucleotide sequences. The NAF compression ratio is comparable to that of the best DNA compressors, while decompression is dramatically faster. We compared our format with the DNA compressors DELIMINATE and MFCompress, and with the general-purpose compressors gzip, bzip2, xz, brotli, and zstd. Availability: The NAF compressor and decompressor, as well as the format specification, are available at https://github.com/KirillKryukov/naf. The format specification is in the public domain. The compressor and decompressor are open source under the zlib/libpng license, free for nearly any use. Contact: [email protected]
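As a rough illustration of the general-purpose baseline mentioned in this abstract, the following sketch compresses a toy FASTA record with the three compressors available in the Python standard library (gzip, bzip2, xz) and reports the compression ratios. It does not call NAF, DELIMINATE, MFCompress, brotli, or zstd, and the toy sequence is seeded random data, so the numbers say nothing about real genomes or about the paper's benchmark. The helper names are hypothetical.

```python
import bz2
import gzip
import lzma
import random


def make_fasta(length: int = 20000, seed: int = 0, width: int = 60) -> bytes:
    """Build a toy single-record FASTA file from seeded random nucleotides."""
    rng = random.Random(seed)
    bases = "".join(rng.choice("ACGT") for _ in range(length))
    lines = [bases[i:i + width] for i in range(0, length, width)]
    return (">toy_sequence\n" + "\n".join(lines) + "\n").encode("ascii")


# Standard-library compressors only; each entry is (compress, decompress).
COMPRESSORS = {
    "gzip": (gzip.compress, gzip.decompress),
    "bzip2": (bz2.compress, bz2.decompress),
    "xz": (lzma.compress, lzma.decompress),
}


def compression_ratios(data: bytes) -> dict[str, float]:
    """Return original_size / compressed_size per compressor, checking round trips."""
    ratios = {}
    for name, (compress, decompress) in COMPRESSORS.items():
        packed = compress(data)
        if decompress(packed) != data:
            raise RuntimeError(f"{name} round trip failed")
        ratios[name] = len(data) / len(packed)
    return ratios


if __name__ == "__main__":
    for name, ratio in compression_ratios(make_fasta()).items():
        print(f"{name}: {ratio:.2f}x")
```

A four-letter alphabet needs at most 2 bits per base, so a ratio near 4 on the sequence lines is the ceiling for uniformly random nucleotides; real sequences can compress further because they contain repeats and skewed composition.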


Author(s):  
Sergey Petoukhov

The article presents the author's results of studying hidden rules of the structural organization of long DNA sequences in eukaryotic and prokaryotic genomes. The results concern some rules of percentages (or probabilities) of n-plets in genomes. To reveal such rules, the author considers genomic DNA nucleotide sequences as multilayer sequences of n-plets and studies the percentage contents of n-plets in different layers. Unexpected rules of invariance of the total sums of percentages in certain tetra-groupings of n-plets in different layers of genomic DNA sequences are revealed. These discovered rules are candidates for the role of universal genomic rules. A tensor family of matrix representations of interrelated DNA alphabets of 4 nucleotides, 16 doublets, 64 triplets, and 256 tetraplets is used in the study. This matrix approach reveals algebraic properties of the mentioned genetic rules of probabilities, which are useful for developing algebraic and quantum biology. Some analogies between the discovered genetic phenomena and phenomena of Gestalt psychology are noted and discussed. The author connects the obtained results on the genomic percentage rules with a supposition of P. Jordan, one of the creators of quantum mechanics and quantum biology, that life's missing laws are the rules of chance and probability of the quantum world.


Author(s):  
Barbara Trask ◽  
Susan Allen ◽  
Anne Bergmann ◽  
Mari Christensen ◽  
Anne Fertitta ◽  
...  

Using fluorescence in situ hybridization (FISH), the positions of DNA sequences can be discretely marked with a fluorescent spot. The efficiency of marking DNA sequences of the size cloned in cosmids is 90-95%, and the fluorescent spots produced after FISH are ≈0.3 μm in diameter. Sites of two sequences can be distinguished using two-color FISH. Different reporter molecules, such as biotin or digoxigenin, are incorporated into DNA sequence probes by nick translation. These reporter molecules are labeled after hybridization with different fluorochromes, e.g., FITC and Texas Red. The development of dual band pass filters (Chromatechnology) allows these fluorochromes to be photographed simultaneously without registration shift.


2013 ◽  
Vol 41 (2) ◽  
pp. 548-553 ◽  
Author(s):  
Andrew A. Travers ◽  
Georgi Muskhelishvili

How much information is encoded in the DNA sequence of an organism? We argue that the informational, mechanical and topological properties of DNA are interdependent and act together to specify the primary characteristics of genetic organization and chromatin structures. Superhelicity generated in vivo, in part by the action of DNA translocases, can be transmitted to topologically sensitive regions encoded by less stable DNA sequences.


2021 ◽  
Vol 15 (1) ◽  
Author(s):  
Anastasios A. Tsonis ◽  
Geli Wang ◽  
Lvyi Zhang ◽  
Wenxu Lu ◽  
Aristotle Kayafas ◽  
...  

Background: Mathematical approaches have been used for decades to probe the structure of DNA sequences. This has led to the development of bioinformatics. In this exploratory work, a novel mathematical method is applied to probe the DNA structure of two related viral families: those of coronaviruses and those of influenza viruses. The coronaviruses are SARS-CoV-2, SARS-CoV-1, and MERS. The influenza viruses include H1N1-1918, H1N1-2009, H2N2-1957, and H3N2-1968. Methods: The mathematical method used is slow feature analysis (SFA), a rather new but promising method to delineate complex structure in DNA sequences. Results: The analysis indicates that the DNA sequences exhibit an elaborate and convoluted structure akin to complex networks. We define a measure of complexity and show that each DNA sequence exhibits a certain degree of complexity within itself, while at the same time there exist complex interrelationships between the sequences within a family and between the two families. From these relationships, we find evidence, especially for the coronavirus family, that increasing complexity in a sequence is associated with a higher transmission rate but with lower mortality. Conclusions: The complexity measure defined here may hold promise and could become a useful tool in the prediction of transmission and mortality rates in future new viral strains.


2021 ◽  
Vol 22 (6) ◽  
pp. 3079
Author(s):  
Xuechen Mu ◽  
Yueying Wang ◽  
Meiyu Duan ◽  
Shuai Liu ◽  
Fei Li ◽  
...  

Enhancers are short genomic regions that exert tissue-specific regulatory roles, usually on remote coding regions. Enhancers are observed in both prokaryotic and eukaryotic genomes, and their detection facilitates a better understanding of the transcriptional regulation mechanism. Accurate detection of enhancers and evaluation of their transcriptional regulation strength remain a major bioinformatics challenge. Most current studies utilize the statistical features of short fixed-length nucleotide sequences. This study introduces the location information of each k-mer (SeqPose) into the encoding strategy of a DNA sequence and employs the attention mechanism in a two-layer bi-directional long short-term memory (BD-LSTM) model (spEnhancer) for the enhancer detection problem. The first layer of the classifier discriminates between enhancers and non-enhancers, and the second layer evaluates the transcriptional regulation strength of the detected enhancer. The SeqPose-encoded features are selected by the Chi-squared test, and 45 positions are removed from further analysis. Existing studies may focus on selecting the statistical DNA sequence descriptors with large contributions to the prediction models; this study does not utilize these statistical descriptors. The word vector of the SeqPose-encoded features is then obtained using a word embedding layer. This study hypothesizes that different word vector features may contribute differently to the enhancer detection model, and assigns different weights to these word vectors through the attention mechanism in the BD-LSTM model. A previous study generously provided the training and independent test datasets, and the proposed spEnhancer is compared with three existing state-of-the-art studies using the same experimental procedure. Leave-one-out validation on the training dataset shows that spEnhancer achieves detection performance similar to that of the three existing studies, while on the independent test dataset spEnhancer achieves the best overall performance metric (MCC) for both binary classification problems. The experimental data show that the strategy of removing redundant positions (SeqPose) may help improve DNA sequence-based prediction models. spEnhancer may serve well as a complementary model to the existing studies, especially for novel query enhancers that are not included in the training dataset.
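The general idea of attaching location information to each k-mer before a word embedding layer can be sketched as below. This is only an illustration under stated assumptions: the abstract does not give the actual SeqPose encoding, so the token format `kmer@position`, the overlapping stride of 1, the default k = 3, and every function name here are hypothetical.

```python
def kmer_position_tokens(seq: str, k: int = 3) -> list[str]:
    """Turn seq into overlapping k-mers, each tagged with its start position."""
    if k < 1 or k > len(seq):
        raise ValueError("k must be between 1 and len(seq)")
    return [f"{seq[i:i + k]}@{i}" for i in range(len(seq) - k + 1)]


def build_vocabulary(token_lists) -> dict[str, int]:
    """Map each distinct token to an integer id; id 0 is reserved for padding/unknown."""
    vocab: dict[str, int] = {}
    for tokens in token_lists:
        for token in tokens:
            vocab.setdefault(token, len(vocab) + 1)
    return vocab


def encode(tokens, vocab) -> list[int]:
    """Convert tokens to ids suitable as input to an embedding layer."""
    return [vocab.get(token, 0) for token in tokens]


if __name__ == "__main__":
    sequences = ["ACGTAC", "ACGTTT"]  # toy sequences, not enhancer data
    token_lists = [kmer_position_tokens(s) for s in sequences]
    vocab = build_vocabulary(token_lists)
    for tokens in token_lists:
        print(tokens, encode(tokens, vocab))
```

Because the position is part of the token, the same k-mer at two different locations receives two different ids, which is what lets a downstream model weight positions differently.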


1999 ◽  
Vol 341 (1) ◽  
pp. 89-93 ◽  
Author(s):  
Gianluca TELL ◽  
Lucia PELLIZZARI ◽  
Gennaro ESPOSITO ◽  
Carlo PUCILLO ◽  
Paolo Emidio MACCHIA ◽  
...  

Pax proteins are transcriptional regulators that play important roles during embryogenesis. These proteins recognize specific DNA sequences via a conserved element: the paired domain (Prd domain). The low level of organized secondary structure, in the free state, is a general feature of Prd domains; however, these proteins undergo a dramatic gain in α-helical content upon interaction with DNA (‘induced fit’). Pax8 is expressed in the developing thyroid, kidney and several areas of the central nervous system. In humans, mutations of the Pax8 gene, which are mapped to the coding region of the Prd domain, give rise to congenital hypothyroidism. Here, we have investigated the molecular defects caused by a mutation in which leucine at position 62 is replaced by an arginine. Leu62 is conserved among Prd domains, and contributes towards the packing together of helices 1 and 3. The binding affinity of the Leu62Arg mutant for a specific DNA sequence (the C sequence of thyroglobulin promoter) is decreased 60-fold with respect to the wild-type Pax8 Prd domain. However, the affinities with which the wild-type and the mutant proteins bind to a non-specific DNA sequence are very similar. CD spectra demonstrate that, in the absence of DNA, both wild-type Pax8 and the Leu62Arg mutant possess a low α-helical content; however, in the Leu62Arg mutant, the gain in α-helical content upon interaction with DNA is greatly reduced with respect to the wild-type protein. Thus the molecular defect of the Leu62Arg mutant causes a reduced capability for induced fit upon DNA interaction.

