Numerical Characterization of DNA Sequences for Alignment-free Sequence Comparison – A Review

Dimensional Space ◽

Building Blocks ◽

Chaos Game Representation ◽

Alignment Free ◽

Comparison Methods ◽

Background: Biological macromolecules, namely DNA, RNA, and protein, have their building blocks organized in a particular sequence, and this sequential arrangement encodes the evolutionary history of the organism (species). Hence, biological sequences have been used for studying evolutionary relationships among species. This is usually carried out by multiple sequence alignment (MSA). Due to certain limitations of MSA, alignment-free sequence comparison methods were developed. The present review covers alignment-free sequence comparison methods carried out using numerical characterization of DNA sequences. Discussion: The graphical representation of DNA sequences by chaos game representation and other 2-dimensional and 3-dimensional methods is discussed. The evolution of numerical characterization from the various graphical representations and the application of the DNA invariants thus computed in phylogenetic analysis are presented. The extension of computing molecular descriptors in chemometrics to the calculation of a new set of DNA invariants, and their use in alignment-free sequence comparison in an N-dimensional space and in the construction of phylogenetic trees, is also reviewed. Conclusion: The phylogenetic trees constructed by the alignment-free sequence comparison methods using DNA invariants were found to be better than those constructed using alignment-based tools such as PHYLIP and ClustalW. One of the graphical representation methods has now been extended to study viral sequences of infectious diseases, identifying conserved regions for the design of peptide-based vaccines by combining numerical characterization and graphical representation.
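The chaos game representation mentioned in this abstract can be illustrated with a minimal Python sketch. It assumes the common construction in which each base pulls the current point halfway toward a corner of the unit square. The corner assignment, the function names `cgr_points` and `mean_point`, and the use of the centroid as a numerical descriptor are illustrative assumptions, not details taken from the review.

```python
# Corner assignment is an assumption; conventions differ between papers.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Return the chaos game representation points for a DNA string."""
    x, y = 0.5, 0.5  # start at the centre of the unit square
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # skip ambiguous symbols such as N
        cx, cy = CORNERS[base]
        # Move halfway from the current point toward the base's corner.
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


def mean_point(points):
    """Hypothetical numerical descriptor: the centroid of the CGR points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


if __name__ == "__main__":
    pts = cgr_points("ATGGTGCACC")
    print(len(pts), mean_point(pts))
```

A descriptor such as the centroid reduces each sequence to a few numbers, so two sequences of different lengths can be compared without aligning them.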

2010 IEEE Fifth International Conference on Bio-Inspired Computing: Theories and Applications (BIC-TA) ◽

A novel non-degenerate 2D graphical representation and numerical characterization of DNA sequences

10.1109/bicta.2010.5645339 ◽

2010 ◽

Author(s):

Shaohong Zhong ◽

Yachun Liu ◽

Renfa Li ◽

Lili Pan

Keyword(s):

DNA Sequences ◽

Novel Graphical Representation and Numerical Characterization of DNA Sequences

Applied Sciences ◽

10.3390/app6030063 ◽

2016 ◽

Vol 6 (3) ◽

pp. 63 ◽

Cited By ~ 5

Author(s):

Chun Li ◽

Wenchao Fei ◽

Yan Zhao ◽

Xiaoqing Yu

Keyword(s):

DNA Sequences ◽

Numerical characterization of DNA sequences in a 2-D graphical representation scheme of low degeneracy

Chemical Physics Letters ◽

10.1016/s0009-2614(02)02029-8 ◽

2003 ◽

Vol 369 (3-4) ◽

pp. 361-366 ◽

Cited By ~ 37

Author(s):

Xiaofeng Guo ◽

Ashesh Nandy

Keyword(s):

DNA Sequences ◽

Numerical Characterization ◽

Representation Scheme

New Approaches to Drug-DNA Interactions Based on Graphical Representation and Numerical Characterization of DNA Sequences

Current Computer-Aided Drug Design ◽

10.2174/1573409911006040283 ◽

2010 ◽

Vol 6 (4) ◽

pp. 283-289 ◽

Cited By ~ 5

Author(s):

Ashesh Nandy ◽

Subhash C. Basak

Keyword(s):

DNA Sequences ◽

DNA Interactions ◽

New Approaches ◽

Numerical Characterization of DNA Sequence Based on Dinucleotides

The Scientific World Journal ◽

10.1100/2012/104269 ◽

2012 ◽

Vol 2012 ◽

pp. 1-6 ◽

Cited By ~ 2

Author(s):

Xingqin Qi ◽

Edgar Fuller ◽

Qin Wu ◽

Cun-Quan Zhang

Keyword(s):

Euclidean Space ◽

DNA Sequence ◽

DNA Sequences ◽

Sequence Comparison ◽

Base Composition ◽

Coding Regions ◽

Numerical Characterization ◽

Long Time ◽

Exon 1

Sequence comparison is a primary technique for the analysis of DNA sequences. In order to make quantitative comparisons, one devises mathematical descriptors that capture the essence of the base composition and distribution of the sequence. Alignment methods and graphical techniques (where each sequence is represented by a curve in high-dimensional Euclidean space) have long been popular. In this contribution we introduce a new nongraphical and nonalignment approach based on the frequencies of the dinucleotide XY in DNA sequences. The most important feature of this method is that it identifies not only adjacent XY pairs but also nonadjacent XY ones, where X and Y are separated by some number of nucleotides. This methodology preserves information in the DNA sequence that is ignored by other methods. We test our method on the coding regions of exon-1 of β-globin for 11 species, and the utility of this new method is demonstrated.
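The idea of counting both adjacent and nonadjacent XY pairs can be sketched in Python as follows. This is a minimal illustration rather than the authors' exact formulation: the function names, the `gap` and `max_gap` parameters, the concatenation of per-gap frequencies into one vector, and the Euclidean distance are all assumptions made for the example.

```python
from itertools import product
from math import sqrt

BASES = "ACGT"


def gapped_dinucleotide_freqs(seq, gap=0):
    """Relative frequencies of pairs XY with `gap` nucleotides between X and Y.

    gap=0 gives the ordinary adjacent dinucleotide frequencies.
    """
    seq = seq.upper()
    counts = {x + y: 0 for x, y in product(BASES, repeat=2)}
    total = 0
    step = gap + 1
    for i in range(len(seq) - step):
        pair = seq[i] + seq[i + step]
        if pair in counts:  # ignore pairs containing ambiguous symbols
            counts[pair] += 1
            total += 1
    if total == 0:
        return {pair: 0.0 for pair in counts}
    return {pair: c / total for pair, c in counts.items()}


def descriptor(seq, max_gap=2):
    """Concatenate the 16 pair frequencies for gaps 0..max_gap (an assumption)."""
    vec = []
    for gap in range(max_gap + 1):
        freqs = gapped_dinucleotide_freqs(seq, gap)
        vec.extend(freqs[p] for p in sorted(freqs))
    return vec


def euclidean(u, v):
    """Euclidean distance between two equal-length descriptor vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


if __name__ == "__main__":
    a = descriptor("ATGGTGCACCTGACTCCTGAG")
    b = descriptor("ATGGTGCATCTGTCCAGTGAG")
    print(len(a), euclidean(a, b))
```

With `max_gap=2` each sequence maps to a 48-component vector, so sequences of different lengths can be compared directly.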

A topological characterization of DNA sequences based on chaos geometry and persistent homology

10.1101/2021.01.31.429071 ◽

2021 ◽

Author(s):

Dong Quan Ngoc Nguyen ◽

Phuong Dong Tan Le ◽

Lin Xing ◽

Lizhen Lin

Keyword(s):

DNA Sequences ◽

Algebraic Topology ◽

Influenza A ◽

Dimensional Space ◽

Persistent Homology ◽

Point Clouds ◽

Influenza A Viruses ◽

Topological Characterization ◽

Topological Features

Methods for analyzing similarities among DNA sequences play a fundamental role in computational biology, and have a variety of applications in public health and in the field of genetics. In this paper, a novel geometric and topological method for analyzing similarities among DNA sequences is developed, based on persistent homology from algebraic topology, in combination with chaos geometry in 4-dimensional space as a graphical representation of DNA sequences. Our topological framework for DNA similarity analysis is general, alignment-free, and can deal with DNA sequences of various lengths, while providing first-of-the-kind visualization features for visual inspection of DNA sequences directly, based on topological features of point clouds that represent DNA sequences. As an application, we test our methods on three datasets including genome sequences of different types of Hantavirus, Influenza A viruses, and Human Papillomavirus.

Benchmarking of alignment-free sequence comparison methods

Genome Biology ◽

10.1186/s13059-019-1755-7 ◽

2019 ◽

Vol 20 (1) ◽

Cited By ~ 39

Author(s):

Andrzej Zielezinski ◽

Hani Z. Girgis ◽

Guillaume Bernard ◽

Chris-Andre Leimeister ◽

Kujin Tang ◽

...

Keyword(s):

Sequence Comparison ◽

Alignment Free ◽

Comparison Methods

Novel 2-D graphical representation of DNA sequences and their numerical characterization

Chemical Physics Letters ◽

10.1016/s0009-2614(02)01784-0 ◽

2003 ◽

Vol 368 (1-2) ◽

pp. 1-6 ◽

Cited By ~ 210

Author(s):

Milan Randić ◽

Marjan Vračko ◽

Nella Lerš ◽

Dejan Plavšić

Keyword(s):

DNA Sequences ◽

CRAFT: Compact genome Representation towards large-scale Alignment-Free daTabase

10.1101/2020.07.10.196741 ◽

2020 ◽

Author(s):

Yang Young Lu ◽

Jiaxing Bai ◽

Yiwen Wang ◽

Ying Wang ◽

Fengzhu Sun

Keyword(s):

DNA Sequences ◽

Sequence Comparison ◽

Large Scale ◽

High Throughput Sequencing ◽

Sequence Data ◽

Practical Interest ◽

Supplementary Information ◽

Computationally Efficient ◽

Sequencing Technologies ◽

Alignment Free

Motivation: Rapid developments in sequencing technologies have boosted the generation of high volumes of sequence data. To archive and analyze those data, one primary step is sequence comparison. Alignment-free sequence comparison based on k-mer frequencies offers a computationally efficient solution, yet in practice, the k-mer frequency vectors for large k of practical interest lead to excessive memory and storage consumption. Results: We report CRAFT, a general genomic/metagenomic search engine to learn compact representations of sequences and perform fast comparison between DNA sequences. Specifically, given genome or high throughput sequencing (HTS) data as input, CRAFT maps the data into a much smaller embedding space and locates the best matching genome in the archived massive sequence repositories. With a 10²- to 10⁴-fold reduction of storage space, CRAFT performs fast queries on gigabytes of data within seconds or minutes, achieving performance comparable to six state-of-the-art alignment-free measures. Availability: CRAFT offers a user-friendly graphical user interface with one-click installation on Windows and Linux operating systems, freely available at https://github.com/jiaxingbai/[email protected]; [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
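The storage problem this abstract describes for k-mer frequency vectors can be made concrete with a short Python sketch. It shows plain k-mer counting and a cosine distance between sparse count tables. It is not CRAFT's embedding method, and the function names and the choice of cosine distance are assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq, k):
    """Count overlapping k-mers, skipping any that contain non-ACGT symbols."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def cosine_distance(a, b):
    """Cosine distance between two sparse k-mer count tables."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an empty profile as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def dense_vector_size(k):
    """Number of entries in a dense k-mer frequency vector over A, C, G, T."""
    return 4 ** k


if __name__ == "__main__":
    x = kmer_counts("ACGTACGTGACCTGA", 3)
    y = kmer_counts("ACGTTCGTGACATGA", 3)
    print(cosine_distance(x, y))
    for k in (6, 12, 16):
        print(k, dense_vector_size(k))
```

Because a dense vector has 4^k entries, each increase of k by one multiplies the storage per sequence by four, which is the growth that motivates compact representations.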

Numerical characterization of DNA sequences based on the k-step Markov chain transition probability

Journal of Computational Chemistry ◽

10.1002/jcc.20471 ◽

2006 ◽

Vol 27 (15) ◽

pp. 1830-1842 ◽

Cited By ~ 10

Author(s):

Qi Dai ◽

Xiao-Qing Liu ◽

Tian-Ming Wang

Keyword(s):

Markov Chain ◽

DNA Sequences ◽

Transition Probability ◽