TMATCH: A New Algorithm for Protein Alignments using amino-acid hydrophobicities

F1000Research

Author(s):  
David Cavanaugh ◽  
Krishnan Chittur

Abstract The identification of proteins of similar structure using sequence alignment is an important problem in bioinformatics. We describe TMATCH, a basic dynamic programming alignment algorithm which can rapidly identify proteins of similar structure from a database. TMATCH was developed to utilize an optimal hydrophobicity metric for alignments traceable to fundamental properties of amino acids. Whereas standard alignment algorithms use affine gap penalties, TMATCH adapts local alignment scoring to reinforce favorable diagonal paths (transitions) and punish unfavorable transitions, paired with fixed gap opening penalties. The TMATCH algorithm is specifically designed to take advantage of the extra information available within the hydrophobicity scale to detect homologies, as opposed to the probabilities derived from raw percent identities.
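The general idea of the abstract above can be illustrated with a minimal sketch. This is not the TMATCH implementation: the Kyte-Doolittle hydropathy scale stands in for the paper's own hydrophobicity metric, and the scoring rule (`2.0 - |h(a) - h(b)|`), the gap penalty, and the diagonal bonus are all hypothetical choices made only to show a local alignment with a fixed gap penalty and reinforcement of consecutive diagonal moves.

```python
# Kyte-Doolittle hydropathy values, used here as a stand-in scale (assumption).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def residue_score(a, b):
    """Hypothetical similarity: positive when hydropathies are close."""
    return 2.0 - abs(KD[a] - KD[b])


def local_align_score(s, t, gap=4.0, diag_bonus=0.5):
    """Best local alignment score with a fixed (non-affine) gap penalty.

    A diagonal move that extends an existing diagonal run with a
    favorable residue pair receives an extra `diag_bonus`.
    """
    rows, cols = len(s) + 1, len(t) + 1
    H = [[0.0] * cols for _ in range(rows)]   # score matrix
    run = [[0] * cols for _ in range(rows)]   # length of diagonal run ending here
    best = 0.0
    for i in range(1, rows):
        for j in range(1, cols):
            sub = residue_score(s[i - 1], t[j - 1])
            bonus = diag_bonus if (run[i - 1][j - 1] > 0 and sub > 0) else 0.0
            diag = H[i - 1][j - 1] + sub + bonus
            up = H[i - 1][j] - gap
            left = H[i][j - 1] - gap
            score = max(0.0, diag, up, left)
            H[i][j] = score
            # Track the run only when the diagonal move is the one chosen.
            run[i][j] = run[i - 1][j - 1] + 1 if (score > 0 and score == diag) else 0
            best = max(best, score)
    return best
```

With these assumed parameters, two hydrophobic stretches score well against each other while a hydrophobic stretch scores zero against a charged one, which is the kind of signal a hydrophobicity-based scheme is meant to capture even when residue identities differ.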

2017 ◽  
Author(s):  
Hajime Suzuki ◽  
Masahiro Kasahara

Abstract Motivation Pairwise alignment of nucleotide sequences has previously been carried out using the seed-and-extend strategy, where we enumerate seeds (shared patterns) between sequences and then extend the seeds by Smith-Waterman-like semi-global dynamic programming to obtain full pairwise alignments. With the advent of massively parallel short read sequencers, algorithms and data structures for efficiently finding seeds have been extensively explored. However, recent advances in single-molecule sequencing technologies have enabled us to obtain millions of reads, each of which is orders of magnitude longer than those output by the short-read sequencers, demanding a faster algorithm for the extension step that accounts for most of the computation time required for pairwise local alignment. Our goal is to design a faster extension algorithm suitable for single-molecule sequencers with high sequencing error rates (e.g., 10-15%) and with more frequent insertions and deletions than substitutions. Results We propose an adaptive banded dynamic programming algorithm for calculating pairwise semi-global alignment of nucleotide sequences that allows a relatively high insertion or deletion rate while keeping band width relatively low (e.g., 32 or 64 cells) regardless of sequence lengths. Our new algorithm eliminated mutual dependences between elements in a vector, allowing an efficient Single-Instruction-Multiple-Data parallelization. We experimentally demonstrate that our algorithm runs approximately 5× faster than the extension alignment algorithm in NCBI BLAST+ while retaining similar sensitivity (recall). We also show that our extension algorithm is more sensitive than the extension alignment routine in DALIGNER, while the computation time is comparable. Availability The implementation of the algorithm and the benchmarking scripts are available at https://github.com/ocxtal/[email protected]
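For readers unfamiliar with banded dynamic programming, the following is a minimal sketch of the simpler fixed-band variant, not the adaptive algorithm proposed above. It computes a unit-cost edit distance while evaluating only cells within `band` of the main diagonal; the function name and the unit costs are assumptions for illustration. The adaptive method differs in that the band is steered during the computation rather than pinned to the diagonal.

```python
def banded_edit_distance(a, b, band=32):
    """Edit distance restricted to cells with |i - j| <= band.

    Returns None when the end cell lies outside the band. Otherwise the
    result is an upper bound on the true edit distance, and is exact
    whenever an optimal alignment path stays inside the band.
    """
    n, m = len(a), len(b)
    if abs(n - m) > band:
        return None
    INF = float("inf")
    # Row 0: only columns inside the band are reachable.
    prev = [j if j <= band else INF for j in range(m + 1)]
    for i in range(1, n + 1):
        # Full-width row for clarity; a tuned version would store the band only.
        cur = [INF] * (m + 1)
        lo, hi = max(0, i - band), min(m, i + band)
        for j in range(lo, hi + 1):
            if j == 0:
                cur[j] = i  # delete the first i characters of a
                continue
            cur[j] = min(
                prev[j - 1] + (a[i - 1] != b[j - 1]),  # match / substitution
                prev[j] + 1,                           # deletion
                cur[j - 1] + 1,                        # insertion
            )
        prev = cur
    return prev[m]
```

A fixed band fails when indels accumulate and the optimal path drifts away from the diagonal, which is exactly the situation with long, indel-rich reads that motivates an adaptive band.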


2017 ◽  
Vol 20 (4) ◽  
pp. 1222-1237 ◽  
Author(s):  
Brian B Luczak ◽  
Benjamin T James ◽  
Hani Z Girgis

Abstract Motivation Since the dawn of the bioinformatics field, sequence alignment scores have been the main method for comparing sequences. However, alignment algorithms are quadratic, requiring long execution time. As alternatives, scientists have developed tens of alignment-free statistics for measuring the similarity between two sequences. Results We surveyed tens of alignment-free k-mer statistics. Additionally, we evaluated 33 statistics and multiplicative combinations between the statistics and/or their squares. These statistics are calculated on two k-mer histograms representing two sequences. Our evaluations using global alignment scores revealed that the majority of the statistics are sensitive and capable of finding similar sequences to a query sequence. Therefore, any of these statistics can filter out dissimilar sequences quickly. Further, we observed that multiplicative combinations of the statistics are highly correlated with the identity score. Furthermore, combinations involving sequence length difference or Earth Mover’s distance, which takes the length difference into account, are always among the highest correlated paired statistics with identity scores. Similarly, paired statistics including length difference or Earth Mover’s distance are among the best performers in finding the K-closest sequences. Interestingly, similar performance can be obtained using histograms of shorter words, resulting in reducing the memory requirement and increasing the speed remarkably. Moreover, we found that simple single statistics are sufficient for processing next-generation sequencing reads and for applications relying on local alignment. Finally, we measured the time requirement of each statistic. The survey and the evaluations will help scientists with identifying efficient alternatives to the costly alignment algorithm, saving thousands of computational hours. Availability The source code of the benchmarking tool is available as Supplementary Materials.
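As an illustration of what an alignment-free k-mer statistic looks like, the sketch below builds k-mer histograms for two sequences and compares them with two common measures. The function names are hypothetical, and the two measures are generic examples chosen for brevity; they are not claimed to be among the 33 statistics the survey evaluated.

```python
import math
from collections import Counter


def kmer_histogram(seq, k):
    """Count every overlapping word of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def manhattan_distance(h1, h2):
    """Sum of absolute count differences over all k-mers seen in either histogram."""
    return sum(abs(h1[w] - h2[w]) for w in set(h1) | set(h2))


def cosine_similarity(h1, h2):
    """Cosine of the angle between the two count vectors (0.0 if either is empty)."""
    dot = sum(count * h2[w] for w, count in h1.items())
    norm1 = math.sqrt(sum(c * c for c in h1.values()))
    norm2 = math.sqrt(sum(c * c for c in h2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0
    return dot / (norm1 * norm2)


if __name__ == "__main__":
    h1 = kmer_histogram("ACGTACGT", 2)
    h2 = kmer_histogram("ACGTTCGT", 2)
    print(manhattan_distance(h1, h2), cosine_similarity(h1, h2))
```

Both statistics run in time linear in the sequence lengths once the histograms are built, in contrast to the quadratic cost of alignment, and shorter words keep the histograms small.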


BMC Genomics ◽  
2019 ◽  
Vol 20 (S9) ◽  
Author(s):  
Sawal Maskey ◽  
Young-Rae Cho

Abstract Background Cross-species analysis of protein-protein interaction (PPI) networks provides an effective means of detecting conserved interaction patterns. Identifying such conserved substructures between PPI networks of different species increases our understanding of the principles deriving evolution of cellular organizations and their functions at a system level. In recent years, network alignment techniques have been applied to genome-scale PPI networks to predict evolutionarily conserved modules. Although a wide variety of network alignment algorithms have been introduced, developing a scalable local network alignment algorithm with high accuracy is still challenging. Results We present a novel pairwise local network alignment algorithm, called LePrimAlign, to predict conserved modules between PPI networks of three different species. The proposed algorithm exploits the results of a pairwise global alignment algorithm with many-to-many node mapping. It also applies the concept of graph entropy to detect initial cluster pairs from two networks. Finally, the initial clusters are expanded to increase the local alignment score, which is formulated as a combination of intra-network and inter-network scores. The performance comparison with state-of-the-art approaches demonstrates that the proposed algorithm outperforms them in terms of accuracy of identified protein complexes and quality of alignments. Conclusion The proposed method produces local network alignment of higher accuracy in predicting conserved modules even with large biological networks at a reduced computational cost.


Author(s):  
BIN FANG ◽  
XINGE YOU ◽  
WEN-SHENG CHEN ◽  
YUAN YAN TANG

Structure distortion evaluation allows us to directly measure the similarity between signature patterns without classification using feature vectors, which usually suffers from limited training samples. In this paper, we incorporate the merits of both global and local alignment algorithms to define structure distortion using signature skeletons identified by a robust wavelet thinning technique. A weak affine model is employed to globally register two signature skeletons, and the structure distortion between two signature patterns is determined by applying an elastic local alignment algorithm. Similarity is measured as the Euclidean distance over all corresponding feature points found. Experimental results showed that the proposed similarity measurement was able to provide sufficient discriminatory information, achieving an equal error rate of 18.6% with four training samples.


2010 ◽  
Vol 7 (1) ◽  
pp. 75-84 ◽  
Author(s):  
Ye Tao ◽  
Li Xueqing ◽  
Wu Bian

This paper presents a novel approach for aligning imperfect speech with its corresponding transcription. The algorithm starts with multi-stage sentence boundary detection in audio, followed by a dynamic programming based search to find the optimal alignment and detect mismatches at the sentence level. Experiments show promising performance compared to the traditional forced alignment approach. The proposed algorithm has already been applied in preparing multimedia content for an online English training platform.


2021 ◽  
Author(s):  
Yishan He ◽  
Jiajin Huang ◽  
Gaowei Wu ◽  
Jian Yang

Abstract The digital reconstruction of a neuron is the most direct and effective way to investigate its morphology. Many automatic neuron tracing methods have been proposed, but without manual checking it is difficult to know whether a reconstruction, or which substructure in a reconstruction, is accurate. For a neuron’s reconstructions generated by multiple automatic tracing methods with different principles or models, their common substructures are highly reliable and are named individual motifs. In this work, we propose a Vaa3D based method called Lamotif to explore individual motifs in automatic reconstructions of a neuron. Lamotif utilizes the local alignment algorithm in BlastNeuron to extract local alignment pairs between a specified objective reconstruction and multiple reference reconstructions, and combines these pairs to generate individual motifs on the objective reconstruction. The proposed Lamotif is evaluated on reconstructions of 163 neurons from multiple species, which are generated by four state-of-the-art tracing methods. Experimental results show that individual motifs lie almost entirely on the corresponding gold standard reconstructions and have a much higher precision rate than the objective reconstructions themselves. Furthermore, an objective reconstruction is mostly quite accurate if its individual motifs have a high recall rate. Individual motifs contain common geometric substructures in multiple reconstructions, and can be used to select accurate substructures from a reconstruction or accurate reconstructions from an automatic reconstruction dataset of different neurons.


Author(s):  
Hyun-Myung Woo ◽  
Byung-Jun Yoon

Abstract Motivation Alignment of protein–protein interaction networks can be used for the unsupervised prediction of functional modules, such as protein complexes and signaling pathways, that are conserved across different species. To date, various algorithms have been proposed for biological network alignment, many of which attempt to incorporate topological similarity between the networks into the alignment process with the goal of constructing accurate and biologically meaningful alignments. Especially, random walk models have been shown to be effective for quantifying the global topological relatedness between nodes that belong to different networks by diffusing node-level similarity along the interaction edges. However, these schemes are not ideal for capturing the local topological similarity between nodes. Results In this article, we propose MONACO, a novel and versatile network alignment algorithm that finds highly accurate pairwise and multiple network alignments through the iterative optimal matching of ‘local’ neighborhoods around focal nodes. Extensive performance assessment based on real networks as well as synthetic networks, for which the ground truth is known, demonstrates that MONACO clearly and consistently outperforms all other state-of-the-art network alignment algorithms that we have tested, in terms of accuracy, coherence and topological quality of the aligned network regions. Furthermore, despite the sharply enhanced alignment accuracy, MONACO remains computationally efficient and it scales well with increasing size and number of networks. Availability and implementation Matlab implementation is freely available at https://github.com/bjyoontamu/MONACO. Supplementary information Supplementary data are available at Bioinformatics online.


2016 ◽  
Vol 14 (06) ◽  
pp. 1650032 ◽  
Author(s):  
Beichuan Deng ◽  
Seongho Kim ◽  
Hengguang LI ◽  
Elisabeth Heath ◽  
Xiang Zhang

Comprehensive two-dimensional gas chromatography coupled with mass spectrometry (GC×GC-MS) has been used to analyze multiple samples in a metabolomics study. However, due to some uncontrollable experimental conditions, such as differences in temperature or pressure, matrix effects on samples and stationary phase degradation, there is always a shift of retention times in the two GC columns between samples. To correct the retention time shifts in GC×GC-MS, peak alignment is a crucial data analysis step that recognizes the peaks generated by the same metabolite in different samples. Two approaches have been developed for GC×GC-MS data alignment: profile alignment and peak matching alignment. However, these existing alignment methods are all based on a local alignment, so a peak may not be correctly aligned in a dense chromatographic region where many peaks are present in a small region. False alignment will result in false discovery in the downstream statistical analysis. We, therefore, develop a global comparison-based peak alignment method using point matching algorithm (PMA-PA) for both homogeneous and heterogeneous data. The developed algorithm PMA-PA first extracts feature points (peaks) in the chromatography and then searches globally for the matching peaks in the consecutive chromatography by adopting the projection of rigid and nonrigid transformation. PMA-PA is further applied to two real experimental data sets, showing that PMA-PA is a promising peak alignment algorithm for both homogeneous and heterogeneous data in terms of F1 score, although it uses only peak location information.


2004 ◽  
Vol 02 (04) ◽  
pp. 681-698 ◽  
Author(s):  
ROLF BACKOFEN ◽  
SEBASTIAN WILL

Ribonucleic acid (RNA) enjoys increasing interest in molecular biology; despite this interest, fundamental algorithms are lacking, e.g. for identifying local motifs. Like proteins, RNA molecules have a distinctive structure. Therefore, in addition to sequence information, structure plays an important part in assessing the similarity of RNAs. Furthermore, common sequence-structure features in two or several RNA molecules are often only spatially local, where possibly large parts of the molecules are dissimilar. Consequently, we address the problem of comparing RNA molecules by computing an optimal local alignment with respect to sequence and structure information. While local alignment is superior to global alignment for identifying local similarities, no general local sequence-structure alignment algorithms are currently known. We suggest a new general definition of locality for sequence-structure alignments that is biologically motivated and efficiently tractable. To show the former, we discuss locality of RNA and prove that the defined locality means connectivity by atomic and non-atomic bonds. To show the latter, we present an efficient algorithm for the newly defined pairwise local sequence-structure alignment (lssa) problem for RNA. For molecules of lengths n and m, the algorithm has worst-case time complexity of O(n²·m²·max(n,m)) and a space complexity of only O(n·m). An implementation of our algorithm is available at . Its runtime is competitive with global sequence-structure alignment.

