TMATCH: A New Algorithm for Protein Alignments using amino-acid hydrophobicities

F1000Research

Abstract The identification of proteins of similar structure using sequence alignment is an important problem in bioinformatics. We describe TMATCH, a basic dynamic programming alignment algorithm which can rapidly identify proteins of similar structure from a database. TMATCH was developed to utilize an optimal hydrophobicity metric for alignments traceable to fundamental properties of amino acids. Whereas standard alignment algorithms use affine gap penalties, TMATCH adapts local alignment scoring to reinforce favorable diagonal paths (transitions) and punish unfavorable transitions, paired with fixed gap-opening penalties. The TMATCH algorithm is specifically designed to take advantage of the extra information available within the hydrophobicity scale to detect homologies, as opposed to the probabilities derived from raw percent identities.

Acceleration of Nucleotide Semi-Global Alignment with Adaptive Banded Dynamic Programming

10.1101/130633 ◽

2017 ◽

Cited By ~ 9

Author(s):

Hajime Suzuki ◽

Masahiro Kasahara

Keyword(s):

Dynamic Programming ◽

Single Molecule ◽

Computation Time ◽

Error Rates ◽

Nucleotide Sequences ◽

Sequencing Error ◽

Local Alignment ◽

Global Alignment ◽

Alignment Algorithm ◽

Short Read

Abstract Motivation Pairwise alignment of nucleotide sequences has previously been carried out using the seed-and-extend strategy, where we enumerate seeds (shared patterns) between sequences and then extend the seeds by Smith-Waterman-like semi-global dynamic programming to obtain full pairwise alignments. With the advent of massively parallel short read sequencers, algorithms and data structures for efficiently finding seeds have been extensively explored. However, recent advances in single-molecule sequencing technologies have enabled us to obtain millions of reads, each of which is orders of magnitude longer than those output by the short-read sequencers, demanding a faster algorithm for the extension step that accounts for most of the computation time required for pairwise local alignment. Our goal is to design a faster extension algorithm suitable for single-molecule sequencers with high sequencing error rates (e.g., 10-15%) and with more frequent insertions and deletions than substitutions. Results We propose an adaptive banded dynamic programming algorithm for calculating pairwise semi-global alignment of nucleotide sequences that allows a relatively high insertion or deletion rate while keeping band width relatively low (e.g., 32 or 64 cells) regardless of sequence lengths. Our new algorithm eliminates mutual dependences between elements in a vector, allowing an efficient Single-Instruction-Multiple-Data parallelization. We experimentally demonstrate that our algorithm runs approximately 5× faster than the extension alignment algorithm in NCBI BLAST+ while retaining similar sensitivity (recall). We also show that our extension algorithm is more sensitive than the extension alignment routine in DALIGNER, while the computation time is comparable. Availability The implementation of the algorithm and the benchmarking scripts are available at https://github.com/ocxtal/[email protected]
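
As a rough illustration of band-restricted dynamic programming, the sketch below computes an alignment score while filling only the cells that lie within a fixed distance of the main diagonal. It is a simplification under stated assumptions, not the method of the paper: the band is fixed rather than adaptive, the alignment is global rather than semi-global, gaps are scored linearly, and nothing is vectorized. The function name `banded_global_score` and the scoring parameters are hypothetical.

```python
def banded_global_score(a, b, band=8, match=1, mismatch=-1, gap=-1):
    """Global alignment score using only cells with |i - j| <= band.

    Hypothetical helper for illustration: fixed band, linear gap penalty.
    """
    n, m = len(a), len(b)
    if abs(n - m) > band:
        # The end cell (n, m) would lie outside the band.
        raise ValueError("band too narrow for these sequence lengths")

    neg_inf = float("-inf")
    # Rows are allocated in full for clarity; a real implementation
    # would store only the cells inside the band.
    prev = [neg_inf] * (m + 1)
    for j in range(min(m, band) + 1):
        prev[j] = j * gap  # first row: leading gaps in `a`

    for i in range(1, n + 1):
        cur = [neg_inf] * (m + 1)
        lo = max(0, i - band)
        hi = min(m, i + band)
        for j in range(lo, hi + 1):
            if j == 0:
                cur[j] = i * gap  # first column: leading gaps in `b`
                continue
            sub = match if a[i - 1] == b[j - 1] else mismatch
            diag = prev[j - 1] + sub   # align a[i-1] with b[j-1]
            up = prev[j] + gap         # gap in `b`
            left = cur[j - 1] + gap    # gap in `a`
            cur[j] = max(diag, up, left)
        prev = cur

    return prev[m]


if __name__ == "__main__":
    print(banded_global_score("ACGT", "AGT", band=1))
```

Cells outside the band keep the value negative infinity, so they never win the `max`. With a fixed band, the number of evaluated cells grows with the band width times the sequence length rather than with the product of both lengths, which is the property a narrow band of 32 or 64 cells relies on.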

A survey and evaluations of histogram-based statistics in alignment-free sequence comparison

Briefings in Bioinformatics ◽

10.1093/bib/bbx161 ◽

2017 ◽

Vol 20 (4) ◽

pp. 1222-1237 ◽

Cited By ~ 10

Author(s):

Brian B Luczak ◽

Benjamin T James ◽

Hani Z Girgis

Keyword(s):

Query Sequence ◽

Sequence Length ◽

Local Alignment ◽

Global Alignment ◽

Alignment Algorithm ◽

Earth Mover's Distance ◽

Alignment Free ◽

Length Difference ◽

Alignment Algorithms

Abstract Motivation Since the dawn of the bioinformatics field, sequence alignment scores have been the main method for comparing sequences. However, alignment algorithms are quadratic, requiring long execution time. As alternatives, scientists have developed tens of alignment-free statistics for measuring the similarity between two sequences. Results We surveyed tens of alignment-free k-mer statistics. Additionally, we evaluated 33 statistics and multiplicative combinations between the statistics and/or their squares. These statistics are calculated on two k-mer histograms representing two sequences. Our evaluations using global alignment scores revealed that the majority of the statistics are sensitive and capable of finding similar sequences to a query sequence. Therefore, any of these statistics can filter out dissimilar sequences quickly. Further, we observed that multiplicative combinations of the statistics are highly correlated with the identity score. Furthermore, combinations involving sequence length difference or Earth Mover’s distance, which takes the length difference into account, are always among the highest correlated paired statistics with identity scores. Similarly, paired statistics including length difference or Earth Mover’s distance are among the best performers in finding the K-closest sequences. Interestingly, similar performance can be obtained using histograms of shorter words, resulting in reducing the memory requirement and increasing the speed remarkably. Moreover, we found that simple single statistics are sufficient for processing next-generation sequencing reads and for applications relying on local alignment. Finally, we measured the time requirement of each statistic. The survey and the evaluations will help scientists with identifying efficient alternatives to the costly alignment algorithm, saving thousands of computational hours. Availability The source code of the benchmarking tool is available as Supplementary Materials.
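
To make the idea of a histogram-based statistic concrete, the sketch below builds k-mer histograms for two sequences and compares them with the Manhattan distance, then multiplies that distance by a term based on the length difference. This is only an assumed example of one such statistic: the function names are hypothetical, and the particular product used here (including the `1 +` offset) is an illustrative choice, not one of the 33 statistics or combinations evaluated in the survey.

```python
from collections import Counter


def kmer_histogram(seq, k):
    """Count every overlapping k-mer (word of length k) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def manhattan(h1, h2):
    """Sum of absolute count differences over all k-mers seen in either histogram."""
    # Counter returns 0 for k-mers that are missing from one histogram.
    return sum(abs(h1[w] - h2[w]) for w in set(h1) | set(h2))


def length_scaled_manhattan(a, b, k=3):
    """Illustrative multiplicative combination (an assumption, not from the survey):
    Manhattan distance times (1 + absolute length difference)."""
    dist = manhattan(kmer_histogram(a, k), kmer_histogram(b, k))
    return dist * (1 + abs(len(a) - len(b)))


if __name__ == "__main__":
    query = "ACGTACGT"
    for candidate in ["ACGTACGT", "ACGTACGA", "TTTTTTTT"]:
        print(candidate, length_scaled_manhattan(query, candidate, k=3))
```

Building a histogram takes time linear in the sequence length, so a statistic like this can filter out dissimilar sequences before any quadratic alignment is run.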

LePrimAlign: local entropy-based alignment of PPI networks to predict conserved modules

BMC Genomics ◽

10.1186/s12864-019-6271-3 ◽

2019 ◽

Vol 20 (S9) ◽

Cited By ~ 1

Author(s):

Sawal Maskey ◽

Young-Rae Cho

Keyword(s):

Computational Cost ◽

Network Alignment ◽

System Level ◽

Local Network ◽

Local Alignment ◽

Global Alignment ◽

Alignment Algorithm ◽

Interaction Patterns ◽

Ppi Networks ◽

Alignment Algorithms

Abstract Background Cross-species analysis of protein-protein interaction (PPI) networks provides an effective means of detecting conserved interaction patterns. Identifying such conserved substructures between PPI networks of different species increases our understanding of the principles underlying the evolution of cellular organizations and their functions at a system level. In recent years, network alignment techniques have been applied to genome-scale PPI networks to predict evolutionarily conserved modules. Although a wide variety of network alignment algorithms have been introduced, developing a scalable local network alignment algorithm with high accuracy is still challenging. Results We present a novel pairwise local network alignment algorithm, called LePrimAlign, to predict conserved modules between PPI networks of three different species. The proposed algorithm exploits the results of a pairwise global alignment algorithm with many-to-many node mapping. It also applies the concept of graph entropy to detect initial cluster pairs from two networks. Finally, the initial clusters are expanded to increase the local alignment score, which is formulated as a combination of intra-network and inter-network scores. The performance comparison with state-of-the-art approaches demonstrates that the proposed algorithm outperforms them in terms of accuracy of identified protein complexes and quality of alignments. Conclusion The proposed method produces local network alignment of higher accuracy in predicting conserved modules even with large biological networks at a reduced computational cost.

MATCHING ALGORITHM USING WAVELET THINNING FEATURES FOR OFFLINE SIGNATURE VERIFICATION

International Journal of Wavelets Multiresolution and Information Processing ◽

10.1142/s021969130700163x ◽

2007 ◽

Vol 05 (01) ◽

pp. 27-38 ◽

Cited By ~ 4

Author(s):

BIN FANG ◽

XINGE YOU ◽

WEN-SHENG CHEN ◽

YUAN YAN TANG

Keyword(s):

Similarity Measurement ◽

Local Alignment ◽

Alignment Algorithm ◽

Affine Model ◽

Alignment Algorithms ◽

Offline Signature Verification ◽

Training Samples ◽

Limited Training Samples ◽

Global And Local ◽

Discriminatory Information

Structure distortion evaluation allows us to directly measure the similarity between signature patterns without classification using feature vectors, which usually suffers from limited training samples. In this paper, we incorporate the merits of both global and local alignment algorithms to define structure distortion using signature skeletons identified by a robust wavelet thinning technique. A weak affine model is employed to globally register two signature skeletons, and the structure distortion between the two signature patterns is then determined by applying an elastic local alignment algorithm. Similarity is measured as the Euclidean distance between all corresponding feature points found. Experimental results showed that the proposed similarity measurement was able to provide sufficient discriminatory information, achieving an equal error rate of 18.6% with four training samples.

Fast DNA Sequence Alignment Algorithm Based on Quality Score Using Improved Dynamic Programming and Fuzzy Gap Cost Control

Current Bioinformatics ◽

10.2174/1574893609666140523000227 ◽

2014 ◽

Vol 9 (5) ◽

pp. 540-547

Author(s):

Kwang Kim ◽

Hyun Park ◽

Doo Song

Keyword(s):

Dynamic Programming ◽

Dna Sequence ◽

Sequence Alignment ◽

Cost Control ◽

Quality Score ◽

Alignment Algorithm ◽

Sequence Alignment Algorithm ◽

Dna Sequence Alignment ◽

Improved Dynamic Programming

A dynamic alignment algorithm for imperfect speech and transcript

Computer Science and Information Systems ◽

10.2298/csis1001075t ◽

2010 ◽

Vol 7 (1) ◽

pp. 75-84 ◽

Cited By ~ 4

Author(s):

Ye Tao ◽

Li Xueqing ◽

Wu Bian

Keyword(s):

Dynamic Programming ◽

Boundary Detection ◽

Multimedia Content ◽

Alignment Algorithm ◽

Optimal Alignment ◽

Multi Stage ◽

Sentence Level ◽

Sentence Boundary ◽

English Training ◽

Dynamic Alignment

This paper presents a novel alignment approach for imperfect speech and the corresponding transcription. The algorithm starts with multi-stage sentence boundary detection in audio, followed by a dynamic-programming-based search to find the optimal alignment and detect mismatches at the sentence level. Experiments show promising performance compared to the traditional forced alignment approach. The proposed algorithm has already been applied in preparing multimedia content for an online English training platform.

Exploring high reliable substructures in auto-reconstructions of a neuron

10.21203/rs.3.rs-615483/v1 ◽

2021 ◽

Author(s):

Yishan He ◽

Jiajin Huang ◽

Gaowei Wu ◽

Jian Yang

Keyword(s):

State Of The Art ◽

Recall Rate ◽

Local Alignment ◽

Alignment Algorithm ◽

Neuron Tracing ◽

Digital Reconstruction ◽

High Recall Rate ◽

Multiple Species ◽

Multiple Reference ◽

Precision Rate

Abstract The digital reconstruction of a neuron is the most direct and effective way to investigate its morphology. Many automatic neuron tracing methods have been proposed, but without manual checking it is difficult to know whether a reconstruction, or which substructure in a reconstruction, is accurate. For reconstructions of a neuron generated by multiple automatic tracing methods with different principles or models, the common substructures are highly reliable and are named individual motifs. In this work, we propose a Vaa3D-based method called Lamotif to explore individual motifs in automatic reconstructions of a neuron. Lamotif utilizes the local alignment algorithm in BlastNeuron to extract local alignment pairs between a specified objective reconstruction and multiple reference reconstructions, and combines these pairs to generate individual motifs on the objective reconstruction. The proposed Lamotif is evaluated on reconstructions of 163 neurons from multiple species, which are generated by four state-of-the-art tracing methods. Experimental results show that individual motifs lie almost entirely on the corresponding gold standard reconstructions and have a much higher precision rate than the objective reconstructions themselves. Furthermore, an objective reconstruction is mostly quite accurate if its individual motifs have a high recall rate. Individual motifs contain common geometric substructures in multiple reconstructions, and can be used to select accurate substructures from a reconstruction or accurate reconstructions from an automatic reconstruction dataset of different neurons.

MONACO: accurate biological network alignment through optimal neighborhood matching between focal nodes

Bioinformatics ◽

10.1093/bioinformatics/btaa962 ◽

2020 ◽

Author(s):

Hyun-Myung Woo ◽

Byung-Jun Yoon

Keyword(s):

Biological Network ◽

Protein Complexes ◽

Ground Truth ◽

Network Alignment ◽

Supplementary Information ◽

Alignment Algorithm ◽

Computationally Efficient ◽

Topological Similarity ◽

Alignment Algorithms ◽

Multiple Network

Abstract Motivation Alignment of protein–protein interaction networks can be used for the unsupervised prediction of functional modules, such as protein complexes and signaling pathways, that are conserved across different species. To date, various algorithms have been proposed for biological network alignment, many of which attempt to incorporate topological similarity between the networks into the alignment process with the goal of constructing accurate and biologically meaningful alignments. In particular, random walk models have been shown to be effective for quantifying the global topological relatedness between nodes that belong to different networks by diffusing node-level similarity along the interaction edges. However, these schemes are not ideal for capturing the local topological similarity between nodes. Results In this article, we propose MONACO, a novel and versatile network alignment algorithm that finds highly accurate pairwise and multiple network alignments through the iterative optimal matching of ‘local’ neighborhoods around focal nodes. Extensive performance assessment based on real networks as well as synthetic networks, for which the ground truth is known, demonstrates that MONACO clearly and consistently outperforms all other state-of-the-art network alignment algorithms that we have tested, in terms of accuracy, coherence and topological quality of the aligned network regions. Furthermore, despite the sharply enhanced alignment accuracy, MONACO remains computationally efficient and it scales well with increasing size and number of networks. Availability and implementation Matlab implementation is freely available at https://github.com/bjyoontamu/MONACO. Supplementary information Supplementary data are available at Bioinformatics online.

Global peak alignment for comprehensive two-dimensional gas chromatography mass spectrometry using point matching algorithms

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720016500323 ◽

2016 ◽

Vol 14 (06) ◽

pp. 1650032 ◽

Cited By ~ 3

Author(s):

Beichuan Deng ◽

Seongho Kim ◽

Hengguang Li ◽

Elisabeth Heath ◽

Xiang Zhang

Keyword(s):

Mass Spectrometry ◽

Gas Chromatography ◽

Matrix Effects ◽

Heterogeneous Data ◽

Local Alignment ◽

Alignment Algorithm ◽

Two Dimensional ◽

Point Matching ◽

Peak Alignment ◽

Global Comparison

Comprehensive two-dimensional gas chromatography coupled with mass spectrometry (GC×GC-MS) has been used to analyze multiple samples in a metabolomics study. However, due to some uncontrollable experimental conditions, such as differences in temperature or pressure, matrix effects on samples and stationary phase degradation, there is always a shift of retention times in the two GC columns between samples. To correct the retention time shifts in GC×GC-MS, peak alignment is a crucial data analysis step for recognizing the peaks generated by the same metabolite in different samples. Two approaches have been developed for GC×GC-MS data alignment: profile alignment and peak matching alignment. However, these existing alignment methods are all based on local alignment, with the result that a peak may not be correctly aligned in a dense chromatographic region where many peaks are present in a small area. False alignment will result in false discovery in the downstream statistical analysis. We therefore develop a global comparison-based peak alignment method using a point matching algorithm (PMA-PA) for both homogeneous and heterogeneous data. PMA-PA first extracts feature points (peaks) in the chromatography and then globally searches for the matching peaks in the consecutive chromatography by adopting the projection of rigid and nonrigid transformations. PMA-PA is further applied to two real experimental data sets, showing that it is a promising peak alignment algorithm for both homogeneous and heterogeneous data in terms of F1 score, although it uses only peak location information.

LOCAL SEQUENCE-STRUCTURE MOTIFS IN RNA

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720004000818 ◽

2004 ◽

Vol 02 (04) ◽

pp. 681-698 ◽

Cited By ~ 28

Author(s):

ROLF BACKOFEN ◽

SEBASTIAN WILL

Keyword(s):

Information Structure ◽

Structure Alignment ◽

General Definition ◽

Local Alignment ◽

Sequence Information ◽

Sequence Structure ◽

Worst Case ◽

Rna Molecules ◽

Alignment Algorithms ◽

Local Sequence

Ribonucleic acid (RNA) enjoys increasing interest in molecular biology; despite this interest, fundamental algorithms are lacking, e.g. for identifying local motifs. Like proteins, RNA molecules have a distinctive structure. Therefore, in addition to sequence information, structure plays an important part in assessing the similarity of RNAs. Furthermore, common sequence-structure features in two or several RNA molecules are often only spatially local, where possibly large parts of the molecules are dissimilar. Consequently, we address the problem of comparing RNA molecules by computing an optimal local alignment with respect to sequence and structure information. While local alignment is superior to global alignment for identifying local similarities, no general local sequence-structure alignment algorithms are currently known. We suggest a new general definition of locality for sequence-structure alignments that is biologically motivated and efficiently tractable. To show the former, we discuss locality of RNA and prove that the defined locality means connectivity by atomic and non-atomic bonds. To show the latter, we present an efficient algorithm for the newly defined pairwise local sequence-structure alignment (lssa) problem for RNA. For molecules of lengths n and m, the algorithm has a worst-case time complexity of O(n²·m²·max(n,m)) and a space complexity of only O(n·m). An implementation of our algorithm is available at . Its runtime is competitive with global sequence-structure alignment.
