Numerical Characterization of DNA Sequences for Alignment-free Sequence Comparison – A Review

Author(s):  
Natarajan Ramanathan ◽  
Jayalakshmi Ramamurthy ◽  
Ganapathy Natarajan

Background: Biological macromolecules, namely DNA, RNA, and protein, have their building blocks organized in a particular sequence, and this sequential arrangement encodes the evolutionary history of the organism (species). Hence, biological sequences have been used for studying evolutionary relationships among species. This is usually carried out by multiple sequence alignment (MSA). Due to certain limitations of MSA, alignment-free sequence comparison methods were developed. The present review covers alignment-free sequence comparison methods based on numerical characterization of DNA sequences. Discussion: The graphical representation of DNA sequences by chaos game representation and other 2-dimensional and 3-dimensional methods is discussed. The evolution of numerical characterization from the various graphical representations and the application of the DNA invariants thus computed in phylogenetic analysis are presented. The extension of computing molecular descriptors in chemometrics to the calculation of a new set of DNA invariants, and their use in alignment-free sequence comparison in an N-dimensional space and in the construction of phylogenetic trees, is also reviewed. Conclusion: The phylogenetic trees constructed by the alignment-free sequence comparison methods using DNA invariants were found to be better than those constructed using alignment-based tools such as PHYLIP and ClustalW. One of the graphical representation methods is now extended to study viral sequences of infectious diseases for the identification of conserved regions to design peptide-based vaccines by combining numerical characterization and graphical representation.
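To make the idea of a chaos game representation concrete, the following is a minimal sketch, not the procedure of any specific paper covered by the review. It assumes a commonly used corner assignment (A, C, G, T at the corners of the unit square) and a start at the square's centre; the names `CORNERS` and `cgr_points` are hypothetical and chosen only for illustration.

```python
# Assumed corner assignment for the unit square; other conventions exist.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Return the chaos game representation points of a DNA sequence.

    Starting from the centre of the unit square, each nucleotide moves
    the current point halfway towards that nucleotide's corner.
    """
    x, y = 0.5, 0.5  # assumed starting point: centre of the square
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # skip ambiguous symbols such as N
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points
```

Under these assumptions, `cgr_points("AG")` yields one point per nucleotide. Numerical descriptors (for example, coordinate averages or distances between points) could then be computed from the resulting point list.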

2016 ◽  
Vol 6 (3) ◽  
pp. 63 ◽  
Author(s):  
Chun Li ◽  
Wenchao Fei ◽  
Yan Zhao ◽  
Xiaoqing Yu

2012 ◽  
Vol 2012 ◽  
pp. 1-6 ◽  
Author(s):  
Xingqin Qi ◽  
Edgar Fuller ◽  
Qin Wu ◽  
Cun-Quan Zhang

Sequence comparison is a primary technique for the analysis of DNA sequences. In order to make quantitative comparisons, one devises mathematical descriptors that capture the essence of the base composition and distribution of the sequence. Alignment methods and graphical techniques (where each sequence is represented by a curve in high-dimensional Euclidean space) have long been popular. In this contribution we introduce a new nongraphical and nonalignment approach based on the frequencies of the dinucleotide XY in DNA sequences. The most important feature of this method is that it identifies not only adjacent XY pairs but also nonadjacent XY pairs, where X and Y are separated by some number of nucleotides. This methodology preserves information in the DNA sequence that is ignored by other methods. We test our method on the coding regions of exon-1 of β-globin for 11 species, and the utility of this new method is demonstrated.
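The notion of counting both adjacent and nonadjacent XY pairs can be sketched as follows. This is an illustrative reading of the abstract, not the authors' exact descriptor: the function name `gapped_dinucleotide_freqs`, the `gap` parameter, and the normalization by the number of valid pairs are all assumptions.

```python
from itertools import product


def gapped_dinucleotide_freqs(seq, gap=0):
    """Relative frequencies of pairs XY where Y follows X after `gap` bases.

    gap=0 counts adjacent dinucleotides; gap=1 counts pairs separated by
    exactly one nucleotide, and so on. Pairs containing symbols other than
    A, C, G, T are ignored.
    """
    seq = seq.upper()
    counts = {x + y: 0 for x, y in product("ACGT", repeat=2)}
    step = gap + 1  # index distance between X and Y
    total = 0
    for i in range(len(seq) - step):
        pair = seq[i] + seq[i + step]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return {pair: 0.0 for pair in counts}
    return {pair: c / total for pair, c in counts.items()}
```

Concatenating the 16-element frequency vectors for several gap values gives one possible fixed-length descriptor per sequence, which can then be compared across sequences with any vector distance.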


2021 ◽  
Author(s):  
Dong Quan Ngoc Nguyen ◽  
Phuong Dong Tan Le ◽  
Lin Xing ◽  
Lizhen Lin

Abstract: Methods for analyzing similarities among DNA sequences play a fundamental role in computational biology, and have a variety of applications in public health and in the field of genetics. In this paper, a novel geometric and topological method for analyzing similarities among DNA sequences is developed, based on persistent homology from algebraic topology, in combination with chaos geometry in 4-dimensional space as a graphical representation of DNA sequences. Our topological framework for DNA similarity analysis is general, alignment-free, and can deal with DNA sequences of various lengths, while providing first-of-the-kind visualization features for visual inspection of DNA sequences directly, based on topological features of point clouds that represent DNA sequences. As an application, we test our methods on three datasets including genome sequences of different types of Hantavirus, Influenza A viruses, and Human Papillomavirus.


2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Andrzej Zielezinski ◽  
Hani Z. Girgis ◽  
Guillaume Bernard ◽  
Chris-Andre Leimeister ◽  
Kujin Tang ◽  
...  

2003 ◽  
Vol 368 (1-2) ◽  
pp. 1-6 ◽  
Author(s):  
Milan Randić ◽  
Marjan Vračko ◽  
Nella Lerš ◽  
Dejan Plavšić

2020 ◽  
Author(s):  
Yang Young Lu ◽  
Jiaxing Bai ◽  
Yiwen Wang ◽  
Ying Wang ◽  
Fengzhu Sun

Motivation: Rapid developments in sequencing technologies have boosted the generation of high volumes of sequence data. To archive and analyze those data, one primary step is sequence comparison. Alignment-free sequence comparison based on k-mer frequencies offers a computationally efficient solution, yet in practice, the k-mer frequency vectors for large k of practical interest lead to excessive memory and storage consumption. Results: We report CRAFT, a general genomic/metagenomic search engine to learn compact representations of sequences and perform fast comparison between DNA sequences. Specifically, given genome or high throughput sequencing (HTS) data as input, CRAFT maps the data into a much smaller embedding space and locates the best matching genome in the archived massive sequence repositories. With 10^2- to 10^4-fold reduction of storage space, CRAFT performs fast queries on gigabytes of data within seconds or minutes, achieving performance comparable to six state-of-the-art alignment-free measures. Availability: CRAFT offers a user-friendly graphical user interface with one-click installation on Windows and Linux operating systems, freely available at https://github.com/jiaxingbai/[email protected]; [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
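For context, the baseline that the abstract contrasts against, comparison by k-mer frequency vectors, can be sketched as below. This is not CRAFT's embedding method; it only illustrates the plain k-mer approach and why it grows expensive, since a dense vector has 4^k entries. The function names `kmer_freqs` and `euclidean_distance` and the choice of Euclidean distance are assumptions made for illustration.

```python
import math
from collections import Counter


def kmer_freqs(seq, k):
    """Sparse relative k-mer frequencies of a DNA sequence.

    Only k-mers made entirely of A, C, G, T are counted. A dense vector
    would have 4**k entries, so a dict of observed k-mers is used instead.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    counts = {kmer: c for kmer, c in counts.items() if set(kmer) <= set("ACGT")}
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: c / total for kmer, c in counts.items()}


def euclidean_distance(f1, f2):
    """Euclidean distance between two sparse frequency dicts."""
    keys = set(f1) | set(f2)
    return math.sqrt(sum((f1.get(key, 0.0) - f2.get(key, 0.0)) ** 2 for key in keys))
```

A query sequence could then be matched against a small collection by computing `euclidean_distance(kmer_freqs(query, k), kmer_freqs(ref, k))` for each reference and taking the minimum; for large k and large repositories this is exactly the memory and storage cost the abstract describes.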

