Numerical Characterization of DNA Sequence Based on Dinucleotides

2012 ◽  
Vol 2012 ◽  
pp. 1-6 ◽  
Author(s):  
Xingqin Qi ◽  
Edgar Fuller ◽  
Qin Wu ◽  
Cun-Quan Zhang

Sequence comparison is a primary technique for the analysis of DNA sequences. To make quantitative comparisons, one devises mathematical descriptors that capture the essence of the base composition and distribution of the sequence. Alignment methods and graphical techniques (where each sequence is represented by a curve in high-dimensional Euclidean space) have long been in wide use. In this contribution we introduce a new nongraphical and nonalignment approach based on the frequencies of the dinucleotide XY in DNA sequences. The most important feature of this method is that it identifies not only adjacent XY pairs but also nonadjacent XY pairs in which X and Y are separated by some number of nucleotides. This methodology preserves information in the DNA sequence that is ignored by other methods. We test our method on the coding regions of exon-1 of β-globin for 11 species, and the utility of this new method is demonstrated.
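The idea of counting XY pairs at several separations can be illustrated with a minimal Python sketch. This is not the authors' published procedure: the function names, the `max_gap` parameter, and the choice to normalise counts into relative frequencies are assumptions made only for illustration.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"

def gapped_dinucleotide_freqs(seq, gap=0):
    """Relative frequency of each ordered pair XY in which Y lies
    gap + 1 positions after X (gap=0 means adjacent nucleotides).
    Hypothetical helper, not taken from the paper."""
    seq = seq.upper()
    step = gap + 1
    counts = Counter(
        seq[i] + seq[i + step]
        for i in range(len(seq) - step)
        if seq[i] in BASES and seq[i + step] in BASES
    )
    total = sum(counts.values())
    # Always return all 16 pairs so that vectors are comparable.
    return {
        x + y: (counts[x + y] / total if total else 0.0)
        for x, y in product(BASES, repeat=2)
    }

def descriptor(seq, max_gap=2):
    """Concatenate the 16 pair frequencies for gaps 0..max_gap
    into one numerical vector (an assumed layout)."""
    vec = []
    for gap in range(max_gap + 1):
        freqs = gapped_dinucleotide_freqs(seq, gap)
        vec.extend(freqs[pair] for pair in sorted(freqs))
    return vec
```

Two sequences could then be compared by any vector distance between their descriptors; which distance the paper uses is not stated in the abstract.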

Author(s):  
Natarajan Ramanathan ◽  
Jayalakshmi Ramamurthy ◽  
Ganapathy Natarajan

Background: Biological macromolecules, namely DNA, RNA, and protein, have their building blocks organized in a particular sequence, and the sequential arrangement encodes the evolutionary history of the organism (species). Hence, biological sequences have been used for studying evolutionary relationships among species. This is usually carried out by multiple sequence alignment (MSA) algorithms. Due to certain limitations of MSA, alignment-free sequence comparison methods were developed. The present review covers alignment-free sequence comparison methods carried out using numerical characterization of DNA sequences. Discussion: The graphical representation of DNA sequences by chaos game representation and other 2-dimensional and 3-dimensional methods is discussed. The evolution of numerical characterization from the various graphical representations and the application of the DNA invariants thus computed in phylogenetic analysis are presented. The extension of computing molecular descriptors in chemometrics to the calculation of a new set of DNA invariants and their use in alignment-free sequence comparison in an N-dimensional space and construction of phylogenetic trees is also reviewed. Conclusion: The phylogenetic trees constructed by the alignment-free sequence comparison methods using DNA invariants were found to be better than those constructed using alignment-based tools such as PHYLIP and ClustalW. One of the graphical representation methods is now extended to study viral sequences of infectious diseases for the identification of conserved regions to design peptide-based vaccines by combining numerical characterization and graphical representation.
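As a rough illustration of the chaos game representation mentioned in the review, the sketch below maps a DNA sequence to points in the unit square. The corner assignment shown is one common convention and is an assumption here, as are the names `CORNERS` and `cgr_points`; the review may use a different layout.

```python
# Assumed corner assignment for the four bases in the unit square.
CORNERS = {
    "A": (0.0, 0.0),
    "C": (0.0, 1.0),
    "G": (1.0, 1.0),
    "T": (1.0, 0.0),
}

def cgr_points(seq):
    """Return the chaos game points for seq.
    Start at the centre of the square; for each base, move halfway
    from the current point toward that base's corner."""
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # skip ambiguous symbols such as N
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points
```

Numerical invariants can then be derived from the point set, for example by counting points in a grid of cells; the specific invariants discussed in the review are not reproduced here.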


2016 ◽  
Vol 6 (3) ◽  
pp. 63 ◽  
Author(s):  
Chun Li ◽  
Wenchao Fei ◽  
Yan Zhao ◽  
Xiaoqing Yu

1983 ◽  
Vol 3 (3) ◽  
pp. 448-456 ◽  
Author(s):  
M A Schuler ◽  
P McOsker ◽  
E B Keller

DNA sequences have been determined for two actin genes which are closely linked in the genome of the sea urchin Strongylocentrotus purpuratus. The two genes have the same 5'-3' orientation; they were apparently formed originally by tandem gene duplication. The amino acids encoded by the two genes closely resemble those of cytoplasmic actins of mammals and slime molds and differ somewhat from those of mammalian muscle actin. Actin gene 1 had been tentatively identified earlier as the gene for an embryonic cytoplasmic actin by the homology of the 3' noncoding region with that of the cDNA of an embryonic actin mRNA from S. purpuratus. The DNA sequence of gene 1 shows presumptive signals for the initiation and termination of transcription which would govern the formation of a mature mRNA of 1.9 kilobases. Both actin genes 1 and 2 have introns in their coding regions at codons 121/122 and 204. These positions for actin introns have been reported so far only in the rat, not in lower organisms. The divergence of the sequences of these coding-region introns in the two actin genes is 66%, suggesting that the genes diverged about 90 million years ago. By contrast to the introns, the coding regions have been highly conserved; the amino acids of the two genes differ by only 1.3%, and the silent sites of the codons differ by only 12%.


2013 ◽  
Vol 4 (1) ◽  
pp. 172-175
Author(s):  
Archana Verma ◽  
Mr. R.K.Bharti ◽  
Prof. R.K. Singh

DNA sequence comparison remains one of the critical steps in the analysis of phylogenetic relationships between species. To obtain a quantitative comparison, we devise an algorithm that uses the tabular representation of DNA sequences. The tabular representation captures the essence of the base composition and distribution of the sequence. In this contribution, we take the tabular notation for DNA sequences and compare the resulting tables to find a similarity/dissimilarity measure for the sequences. We have developed algorithms for comparing DNA sequences. These programs help us to search for similar segments of sequences, calculate similarity scores, and identify repetitions based on local sequence similarity. There are two approaches: one finds exact similarity and the other finds a measure of similarity. The first approach is more sensitive; it can be used to search for DNA sequence similarities only when complete matches occur, so it can compare only exactly similar sequences. This approach fails if a single mismatch for any base character appears, so it is not a general solution. To find the mismatches along with the matches, we suggest another approach, which compiles an information matrix based on matches and mismatches. This approach is quite general for sequences that share a large common fragment with a small number of dissimilar base characters. This alternative approach includes an additional step in the calculation of the similarity score that denotes multiple regions of similarity between sequences. For both approaches, computer programs were prepared and tested on data sets. These programs can be used to evaluate the significance of similarity scores using a shuffling method that preserves local sequence composition. In addition, these programs have been generalized to allow comparison of DNA sequences based on a variety of alternative scoring matrices.
We have been developing tools for the analysis of protein The method is very simple and fast, and it can be used to analyze both short and long DNA sequences. The utility of this method is tested on the sequences of several species, and the results are consistent with those reported.
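The contrast between the two approaches can be sketched in Python. The abstract does not specify the tabular representation or the scoring, so everything below (the function names, the 0/1 match matrix, and the ungapped diagonal score) is an assumed stand-in rather than the authors' algorithm.

```python
def exact_similarity(a, b):
    """First approach: succeeds only on a complete match."""
    return a == b

def match_matrix(a, b):
    """Second approach (assumed form): a table with 1 where
    a[i] == b[j] and 0 for a mismatch."""
    return [[1 if x == y else 0 for y in b] for x in a]

def diagonal_score(a, b):
    """Fraction of positions that match along the main diagonal
    of the match matrix, i.e. an ungapped comparison."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    m = match_matrix(a, b)
    return sum(m[i][i] for i in range(n)) / n
```

With these definitions, `exact_similarity("ACGT", "ACGA")` is `False` because of the single mismatch, while `diagonal_score("ACGT", "ACGA")` still reports that three of the four positions agree.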


2013 ◽  
Vol 10 (3) ◽  
pp. 31-39 ◽  
Author(s):  
Carlos A. C. Bastos ◽  
Vera Afreixo ◽  
Sara P. Garcia ◽  
Armando J. Pinho

Summary: In this study we explore the potential of inter-STOP symbol distances for finding coding regions in DNA sequences. We use the distance between STOP symbols in the DNA sequence and a chi-square statistic to evaluate the nonhomogeneity of the three possible reading frames and the occurrence of one long distance in one of the frames. The results of this exploratory study suggest that inter-STOP symbol distances have a strong ability to discriminate coding regions in prokaryotes and simple eukaryotes.
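A minimal sketch of how inter-STOP distances might be collected per reading frame is given below. It assumes the standard stop codons TAA, TAG, and TGA on the forward strand and measures distance in codons; the function name and these choices are illustrative assumptions, and the chi-square evaluation from the study is not reproduced.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code, forward strand

def inter_stop_distances(seq, frame):
    """Distances (in codons) between consecutive STOP codons
    in the given reading frame (0, 1, or 2)."""
    seq = seq.upper()
    positions = [
        i
        for i in range(frame, len(seq) - 2, 3)
        if seq[i:i + 3] in STOP_CODONS
    ]
    return [(b - a) // 3 for a, b in zip(positions, positions[1:])]

if __name__ == "__main__":
    example = "TAAATGTAGCCCAAATGA"  # made-up sequence for illustration
    for frame in range(3):
        print(frame, inter_stop_distances(example, frame))
```

A frame containing one unusually long distance relative to the other two frames would, under the study's premise, be a candidate coding frame.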

