A New Approach for DNA Sequence Similarity Analysis based on Triplets of Nucleic Acid Bases

Author(s):  
Dan Wei ◽  
Qingshan Jiang ◽  
Sheng Li

Similarity analysis of DNA sequences is a fundamental research area in bioinformatics. The characteristic distribution of L-tuples (tuples of length L) reflects valuable information contained in a biological sequence and thus may be used in DNA sequence similarity analysis. However, similarity analysis based on the characteristic distribution of L-tuples is not effective for comparing highly conserved sequences. In this paper, a new similarity measurement approach based on Triplets of Nucleic Acid Bases (TNAB) is introduced for DNA sequence similarity analysis. The new approach characterizes both the content feature and the position feature of a DNA sequence using the frequency and position of occurrence of each TNAB in the sequence. The experimental results show that the approach based on TNAB is effective for analysing DNA sequence similarity.
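The idea of combining triplet frequency with triplet position can be illustrated with a minimal sketch. The abstract does not give the paper's actual feature definition or similarity measure, so everything specific here is an assumption made for illustration: overlapping triplets, a mean relative position per triplet, a Euclidean distance, and the hypothetical names `triplet_features` and `distance`.

```python
from collections import defaultdict
from itertools import product
from math import sqrt

# All 64 triplets over the DNA alphabet, in a fixed order.
TRIPLETS = ["".join(p) for p in product("ACGT", repeat=3)]


def triplet_features(seq):
    """Return a 128-element vector: for each triplet, its relative
    frequency and its mean relative position in the sequence.

    Assumption: triplets are read with a sliding window of step 1.
    Triplets containing symbols other than A, C, G, T are ignored.
    """
    seq = seq.upper()
    n = len(seq) - 2  # number of overlapping triplets
    if n <= 0:
        raise ValueError("sequence must contain at least three bases")

    positions = defaultdict(list)
    for i in range(n):
        positions[seq[i:i + 3]].append(i)

    features = []
    for t in TRIPLETS:
        pos = positions.get(t, [])
        freq = len(pos) / n
        # Mean 1-based start position scaled into (0, 1]; 0.0 if absent.
        mean_pos = (sum(pos) / len(pos) + 1) / n if pos else 0.0
        features.extend([freq, mean_pos])
    return features


def distance(a, b):
    """Euclidean distance between the feature vectors of two sequences
    (an illustrative choice, not the measure used in the paper)."""
    fa, fb = triplet_features(a), triplet_features(b)
    return sqrt(sum((x - y) ** 2 for x, y in zip(fa, fb)))
```

With this sketch, two sequences that contain the same triplets equally often but in different places still get a nonzero distance, which is the kind of distinction a frequency-only L-tuple distribution cannot make.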

Symmetry ◽  
2020 ◽  
Vol 12 (12) ◽  
pp. 2090
Author(s):  
Yue Lu ◽  
Long Zhao ◽  
Zhao Li ◽  
Xiangjun Dong

Similarity analysis of DNA sequences can clarify the homology between sequences and help predict their structure and the relationships between them. At the same time, frequent patterns in biological sequences not only explain the genetic characteristics of an organism but also serve as markers for certain events in those sequences. However, most existing biological sequence similarity analysis methods target the entire sequential pattern and thus ignore missing gene fragments that may induce disease. The similarity analysis of sequences containing a missing gene item has not been addressed, so some sequences with missing bases are ignored or not effectively analyzed. This paper therefore presents a new method for DNA sequence similarity analysis. Using this method, we first mined not only positive sequential patterns but also sequential patterns that were missing some of the base terms (collectively referred to as negative sequential patterns). Subsequently, we used these frequent patterns for similarity analysis on a two-dimensional plane. Several experiments were conducted to verify the effectiveness of this algorithm. The experimental results demonstrated that the algorithm can obtain various results through the selection of frequent sequential patterns and that accuracy and time efficiency were improved.


Bioinformatics, now a well-known field of study, originated in the context of biological sequence analysis. Recently, graphical representation has been adopted in research on DNA sequences. Research on biological sequences is mainly based on their function and structure. Bioinformatics finds a wide range of applications, specifically in molecular biology, which focuses on the analysis of molecules such as DNA, RNA, and proteins. In this review, we mainly deal with similarity analysis between sequences and the graphical representation of DNA sequences.


2011 ◽  
Vol 7 ◽  
pp. EBO.S7364 ◽  
Author(s):  
Xingqin Qi ◽  
Qin Wu ◽  
Yusen Zhang ◽  
Eddie Fuller ◽  
Cun-Quan Zhang

2008 ◽  
Vol 46 (3) ◽  
pp. 395-401 ◽  
Author(s):  
C. Meintanis ◽  
K.I. Chalkou ◽  
K. Ar. Kormas ◽  
D.S. Lymperopoulou ◽  
E.A. Katsifas ◽  
...  
