Multi-Fractal Analysis for Feature Extraction from DNA Sequences

Author(s):  
Witold Kinsner ◽  
Hong Zhang

This paper presents estimations of multi-scale (multi-fractal) measures for feature extraction from deoxyribonucleic acid (DNA) sequences, and demonstrates the intriguing possibility of identifying biological functionality using information contained within the DNA sequence. We have developed a technique that seeks patterns or correlations in the DNA sequence at a higher level than the local base-pair structure. The technique has three main steps: (i) transforming the DNA sequence symbols into a modified Lévy walk, (ii) transforming the Lévy walk into a signal spectrum, and (iii) breaking the spectrum into sub-spectra and treating each of these as an attractor from which the multi-fractal dimension spectrum is estimated. An optimal minimum window size and volume element size are found for estimation of the multi-fractal measures. Experimental results show that DNA is multi-fractal, and that the multi-fractality changes depending upon the location (coding or non-coding region) in the sequence.
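As a rough illustration of step (i) and of the entropy that underlies a multi-fractal dimension estimate, the following Python sketch may help. It rests on assumptions not stated in the abstract: the walk uses the common purine/pyrimidine rule (A, G step up; C, T step down) rather than the authors' modified Lévy walk, the spectral transform of step (ii) is skipped, and the names `dna_walk` and `renyi_entropy` are hypothetical. The Rényi entropy H_q is computed over boxes (volume elements) of a chosen size; a dimension D_q would be estimated from how H_q scales as the box size shrinks.

```python
import math
from collections import Counter

# Assumed step rule (purine +1, pyrimidine -1); the paper's modified
# Levy walk may use a different mapping.
STEP = {"A": 1, "G": 1, "C": -1, "T": -1}


def dna_walk(seq):
    """Return the cumulative walk position after each base."""
    position = 0
    walk = []
    for base in seq.upper():
        position += STEP[base]
        walk.append(position)
    return walk


def renyi_entropy(values, box_size, q):
    """Renyi entropy H_q of the values binned into boxes of width box_size."""
    counts = Counter(math.floor(v / box_size) for v in values)
    total = len(values)
    probs = [c / total for c in counts.values()]
    if q == 1:
        # The q -> 1 limit is the Shannon entropy.
        return -sum(p * math.log(p) for p in probs)
    return math.log(sum(p ** q for p in probs)) / (1 - q)


walk = dna_walk("AGCT")
print(walk)                        # walk positions after each base
print(renyi_entropy(walk, 1, 2))   # H_2 for unit-width boxes
```

Repeating the entropy calculation over several box sizes and several values of q, window by window, gives the kind of dimension spectrum the abstract refers to; the choice of window and box size is exactly what the authors report optimizing.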


1999 ◽  
Vol 341 (1) ◽  
pp. 89-93 ◽  
Author(s):  
Gianluca TELL ◽  
Lucia PELLIZZARI ◽  
Gennaro ESPOSITO ◽  
Carlo PUCILLO ◽  
Paolo Emidio MACCHIA ◽  
...  

Pax proteins are transcriptional regulators that play important roles during embryogenesis. These proteins recognize specific DNA sequences via a conserved element: the paired domain (Prd domain). The low level of organized secondary structure, in the free state, is a general feature of Prd domains; however, these proteins undergo a dramatic gain in α-helical content upon interaction with DNA (‘induced fit’). Pax8 is expressed in the developing thyroid, kidney and several areas of the central nervous system. In humans, mutations of the Pax8 gene, which are mapped to the coding region of the Prd domain, give rise to congenital hypothyroidism. Here, we have investigated the molecular defects caused by a mutation in which leucine at position 62 is replaced by arginine. Leu62 is conserved among Prd domains, and contributes towards the packing together of helices 1 and 3. The binding affinity of the Leu62Arg mutant for a specific DNA sequence (the C sequence of thyroglobulin promoter) is decreased 60-fold with respect to the wild-type Pax8 Prd domain. However, the affinities with which the wild-type and the mutant proteins bind to a non-specific DNA sequence are very similar. CD spectra demonstrate that, in the absence of DNA, both wild-type Pax8 and the Leu62Arg mutant possess a low α-helical content; however, in the Leu62Arg mutant, the gain in α-helical content upon interaction with DNA is greatly reduced with respect to the wild-type protein. Thus the molecular defect of the Leu62Arg mutant causes a reduced capability for induced fit upon DNA interaction.


Genetics ◽  
1996 ◽  
Vol 142 (2) ◽  
pp. 603-618 ◽  
Author(s):  
An-Ping Hsia ◽  
Patrick S Schnable

Previous research has demonstrated that the autonomous Cy transposon can activate the excision of Mu transposons. To determine the relationship between Cy and the more recently described autonomous Mu transposon, MuDR, a Cy transposon inserted at the mutable a1 allele, a1-m5216, was isolated and cloned. DNA sequence analyses established that this Cy insertion is identical to MuDR (Mu9, GenBank accession No.: m76978.gb_pl). Therefore, Cy will henceforth be termed MuDR:Cy. Defective derivatives of MuDR:Cy were isolated that had lost their capacity to activate their own excision or the excision of a Mu7 transposon. Most of these derivatives are nonautonomous transposons because they can excise, but only in the presence of unlinked MuDR:Cy transposons. Physical mapping and DNA sequence analyses have established that six of these defective derivatives carry internal deletions. It has been proposed previously that such deletions arise via interrupted gap repair. The DNA sequences of the break points associated with all four sequenced deletions are consistent with this model. The finding that three of the excision-defective derivatives carry deletions that disrupt the coding region of the mudrA (but not the mudrB) transcript supports the view that mudrA plays a role in the excision of Mu transposons.


Author(s):  
Hsuan T. Chang

This chapter introduces various visualization (i.e., graphical representation) schemes of symbolic DNA sequences, which are basically represented by character strings in conventional sequence databases. Several visualization schemes are reviewed and their characteristics are summarized for comparison. Moreover, further potential applications based on the visualized sequences are discussed. By understanding the visualization process, researchers will be able to analyze DNA sequences by designing signal processing algorithms for specific purposes such as sequence alignment, feature extraction, and sequence clustering.


2006 ◽  
Vol 369 (2) ◽  
pp. 688-698 ◽  
Author(s):  
Yuan-Yen Tai ◽  
Ping-Cheng Li ◽  
Hsen-Che Tseng

1983 ◽  
Vol 3 (3) ◽  
pp. 448-456 ◽  
Author(s):  
M A Schuler ◽  
P McOsker ◽  
E B Keller

DNA sequences have been determined for two actin genes which are closely linked in the genome of the sea urchin Strongylocentrotus purpuratus. The two genes have the same 5'-3' orientation; they were apparently formed originally by tandem gene duplication. The amino acids encoded by the two genes closely resemble those of cytoplasmic actins of mammals and slime molds and differ somewhat from those of mammalian muscle actin. Actin gene 1 had been tentatively identified earlier as the gene for an embryonic cytoplasmic actin by the homology of the 3' noncoding region with that of the cDNA of an embryonic actin mRNA from S. purpuratus. The DNA sequence of gene 1 shows presumptive signals for the initiation and termination of transcription which would govern the formation of a mature mRNA of 1.9 kilobases. Both actin genes 1 and 2 have introns in their coding regions at codons 121/122 and 204. These positions for actin introns have been reported so far only in the rat, not in lower organisms. The divergence of the sequences of these coding-region introns in the two actin genes is 66%, suggesting that the genes diverged about 90 million years ago. By contrast to the introns, the coding regions have been highly conserved; the amino acids of the two genes differ by only 1.3%, and the silent sites of the codons differ by only 12%.


2005 ◽  
Vol 03 (03) ◽  
pp. 677-696 ◽  
Author(s):  
YINHE CAO ◽  
WEN-WEN TUNG ◽  
J. B. GAO ◽  
YAN QI

With the completion of the human and a few model organisms' genomes, and with the genomes of many other organisms waiting to be sequenced, it has become increasingly important to develop faster computational tools which are capable of easily identifying the structures and extracting features from DNA sequences. One of the more important classes of structure in a DNA sequence is repeat-related. Such structures often have to be masked before protein coding regions along a DNA sequence are identified or redundant expressed sequence tags (ESTs) are sequenced. Here we report a novel recurrence time-based method for sequence analysis. The method can conveniently study all kinds of periodicity and exhaustively find all repeat-related features from a genomic DNA sequence. An efficient codon index is also derived from the recurrence time statistics, which has the salient features of being largely species-independent and working well on very short sequences. Efficient codon indices are key elements of successful gene finding algorithms, and are particularly useful for determining whether a suspected EST belongs to a coding or non-coding region. We illustrate the power of the method by studying the genomes of E. coli, the yeast S. cerevisiae, the nematode worm C. elegans, and the human, Homo sapiens. Our method requires approximately 6N bytes of memory and a computational time on the order of N log N to extract all the repeat-related and periodic or quasi-periodic features from a sequence of length N without any prior knowledge of the consensus sequence of those features, and hence enables us to carry out sequence analysis on the whole genomic scale on a PC.
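The idea of recurrence-time statistics can be sketched in a few lines of Python. This is an illustrative assumption rather than the authors' algorithm: here a recurrence time is taken to be the distance between consecutive occurrences of the same length-k word, and the function name `recurrence_times` and the parameter `k` are hypothetical. A strongly periodic sequence then shows up as a concentration of counts at its period.

```python
from collections import Counter


def recurrence_times(seq, k):
    """Histogram of distances between consecutive occurrences of each k-mer.

    Assumed definition for illustration; the paper's exact recurrence-time
    statistic and its memory layout may differ.
    """
    last_seen = {}      # k-mer -> index of its most recent occurrence
    times = Counter()   # recurrence time -> number of observations
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in last_seen:
            times[i - last_seen[word]] += 1
        last_seen[word] = i
    return times


# A sequence built from a repeated 3-base unit recurs only at distance 3.
hist = recurrence_times("ACGACGACG", 3)
print(dict(hist))
```

This sketch keeps one dictionary entry per distinct word, so its memory use grows with the number of distinct k-mers; it makes no attempt to reproduce the memory or running-time figures quoted in the abstract.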


Author(s):  
Barbara Trask ◽  
Susan Allen ◽  
Anne Bergmann ◽  
Mari Christensen ◽  
Anne Fertitta ◽  
...  

Using fluorescence in situ hybridization (FISH), the positions of DNA sequences can be discretely marked with a fluorescent spot. The efficiency of marking DNA sequences of the size cloned in cosmids is 90-95%, and the fluorescent spots produced after FISH are ≈0.3 μm in diameter. Sites of two sequences can be distinguished using two-color FISH. Different reporter molecules, such as biotin or digoxigenin, are incorporated into DNA sequence probes by nick translation. These reporter molecules are labeled after hybridization with different fluorochromes, e.g., FITC and Texas Red. The development of dual band pass filters (Chromatechnology) allows these fluorochromes to be photographed simultaneously without registration shift.

