Sequence Alignment Algorithms in Hardware Implementation: A Systematic Mapping of the Literature

As a key algorithm in bioinformatics, sequence alignment is widely used in sequence similarity analysis and genome sequence database search. Existing research focuses mainly on specific steps of the algorithm or on specific problems, and lacks a high-level, abstract algorithm framework for the domain. Multiple sequence alignment algorithms are more complex, redundant, and difficult to understand; users cannot easily select an appropriate algorithm, and computing errors may occur. Building on our pairwise sequence alignment algorithm component library and the PAR software platform, we develop a few extension domain components for the multiple sequence alignment domain. With these components, specific multiple sequence alignment algorithms can be designed and the corresponding C++, Java, or Python programs generated efficiently, which improves both the development efficiency of complex algorithms and the accuracy of sequence alignment calculation. A star alignment algorithm is designed and generated to demonstrate the development process.
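The abstract names star alignment as its demonstration case. As a rough illustration only (this is not code generated by the PAR platform), the first step of a star alignment, choosing the center sequence with the highest total pairwise score, might be sketched in Python as follows. The function names `nw_score` and `pick_center` and the scoring values (match 1, mismatch -1, gap -2) are assumptions made for this sketch.

```python
from itertools import combinations


def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment score (Needleman-Wunsch, linear gap penalty).

    Keeps only one previous row, so memory is O(len(b)).
    The scoring values are illustrative assumptions.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * gap]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def pick_center(seqs):
    """Return (index of the center sequence, per-sequence score totals).

    The center is the sequence whose summed pairwise score against
    all other sequences is largest (first one wins on ties).
    """
    totals = [0] * len(seqs)
    for i, j in combinations(range(len(seqs)), 2):
        s = nw_score(seqs[i], seqs[j])
        totals[i] += s
        totals[j] += s
    center = max(range(len(seqs)), key=totals.__getitem__)
    return center, totals


if __name__ == "__main__":
    print(pick_center(["ACGT", "ACGA", "TCGT"]))
```

The remaining star-alignment step, aligning every other sequence to the center and merging the resulting gaps, is left out of this sketch.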

Species tree-aware simultaneous reconstruction of gene and domain evolution

10.1101/336453 ◽ 2018 ◽ Cited By ~ 1

Author(s): Sayyed Auwn Muhammad ◽ Bengt Sennblad ◽ Jens Lagergren

Keyword(s): Sequence Alignment ◽ Gene Tree ◽ Phylogenetic Reconstruction ◽ Gene Families ◽ Species Tree ◽ Biological Data ◽ Sequence Evolution ◽ Multiple Sequence ◽ Alignment Algorithms ◽ Tandem Duplications

Abstract: Most genes are composed of multiple domains, with a common evolutionary history, that typically perform a specific function in the resulting protein. As witnessed by many studies of key gene families, it is important to understand how domains have been duplicated, lost, transferred between genes, and rearranged. Analogously to the case of evolutionary events affecting entire genes, these domain events have large consequences for phylogenetic reconstruction and, in addition, they create considerable obstacles for gene sequence alignment algorithms, a prerequisite for phylogenetic reconstruction.

We introduce the DomainDLRS model, a hierarchical, generative probabilistic model containing three levels corresponding to species, genes, and domains, respectively. From a dated species tree, a gene tree is generated according to the DL model, which is a birth-death model generalized to occur in a dated tree. Then, from the dated gene tree, a pre-specified number of dated domain trees are generated using the DL model, and the molecular clock is relaxed, effectively converting edge times to edge lengths. Finally, for each domain tree and its lengths, domain sequences are generated for the leaves based on a selected model of sequence evolution.

For this model, we present an MCMC-based inference framework called DomainDLRS that takes a dated species tree together with a multiple sequence alignment for each domain family as input and outputs an estimated posterior distribution over reconciled gene and domain trees. By requiring aligned domains rather than genes, our framework evades the problem of aligning full-length genes that have been exposed to domain duplications, in particular non-tandem domain duplications. We show that DomainDLRS outperforms MrBayes on both synthetic and biological data. We analyse several zinc finger genes and show that most domain duplications have been tandem duplications, some involving two or more domains, but non-tandem duplications have also been common.
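The duplication-loss (DL) process described above is a birth-death process. As a loose illustration only, and not the DomainDLRS implementation, the number of lineages surviving along a single dated edge could be simulated in Python as below. The function name `simulate_edge`, its parameters, and the restriction to one edge (no tree structure, no relaxed clock, no sequence generation) are all assumptions made for this sketch.

```python
import random


def simulate_edge(duration, dup_rate, loss_rate, rng=None, start=1):
    """Simulate a linear birth-death process along one edge.

    Each lineage duplicates at rate `dup_rate` and is lost at rate
    `loss_rate`. Returns the number of lineages present after
    `duration` time units, starting from `start` lineages.
    """
    if rng is None:
        rng = random.Random()
    per_lineage = dup_rate + loss_rate
    n, t = start, 0.0
    while n > 0 and per_lineage > 0:
        # Waiting time to the next event over all n lineages.
        t += rng.expovariate(n * per_lineage)
        if t >= duration:
            break
        # The event is a duplication with probability dup / (dup + loss).
        if rng.random() < dup_rate / per_lineage:
            n += 1
        else:
            n -= 1
    return n


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs agree
    counts = [simulate_edge(1.0, 0.5, 0.3, rng) for _ in range(1000)]
    print(sum(counts) / len(counts))  # empirical mean lineage count
```

In the hierarchical model summarized in the abstract, a process of this kind is applied first inside the species tree to produce the gene tree and then inside the gene tree to produce the domain trees; the sketch covers only the event process on one edge.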

A Survey of Sequence Alignment Algorithms with Distributed System

The Journal of Korean Institute of Information Technology ◽ 10.14801/kiitr.2014.12.7.145 ◽ 2014 ◽ Vol 12 (7)

Author(s): Jun-Su Lee

Keyword(s): Distributed System ◽ Sequence Alignment ◽ Alignment Algorithms

A survey of sequence alignment algorithms for next-generation sequencing

Briefings in Bioinformatics ◽ 10.1093/bib/bbq015 ◽ 2010 ◽ Vol 11 (5) ◽ pp. 473-483 ◽ Cited By ~ 554

Author(s): H. Li ◽ N. Homer

Keyword(s): Next Generation Sequencing ◽ Sequence Alignment ◽ Next Generation ◽ Alignment Algorithms ◽ Generation Sequencing

Evaluating global and local sequence alignment methods for comparing patient medical records

BMC Medical Informatics and Decision Making ◽ 10.1186/s12911-019-0965-y ◽ 2019 ◽ Vol 19 (S6) ◽ Cited By ~ 4

Author(s): Ming Huang ◽ Nilay D. Shah ◽ Lixia Yao

Keyword(s): Sequence Alignment ◽ Medical Records ◽ Local Alignment ◽ Sequence Alignments ◽ Standard Data ◽ Alignment Algorithms ◽ Local Sequence ◽ Similar Disease ◽ Dynamic Time ◽ Similarity Scores

Abstract

Background: Sequence alignment is a way of arranging sequences (e.g., DNA, RNA, protein, natural language, financial data, or medical events) to identify regions of similarity and the relatedness between two or more sequences. For Electronic Health Records (EHR) data, sequence alignment helps to identify patients with similar disease trajectories for more relevant and precise prognosis, diagnosis, and treatment.

Methods: We tested two cutting-edge global sequence alignment methods, dynamic time warping (DTW) and the Needleman-Wunsch algorithm (NWA), together with their local modifications, DTW for Local alignment (DTWL) and the Smith-Waterman algorithm (SWA), for aligning patient medical records. We also used 4 sets of synthetic patient medical records generated from a large real-world EHR database as gold standard data to objectively evaluate these sequence alignment algorithms.

Results: For global sequence alignments, 47 out of 80 DTW alignments and 11 out of 80 NWA alignments had higher similarity scores than the reference alignments, while the remaining 33 DTW alignments and 69 NWA alignments had the same similarity scores as the reference alignments. Forty-six out of 80 DTW alignments had better similarity scores than the NWA alignments, with the remaining 34 cases having equal similarity scores from both algorithms. For local sequence alignments, 70 out of 80 DTWL alignments and 68 out of 80 SWA alignments had larger coverage and higher similarity scores than the reference alignments, while the remaining DTWL and SWA alignments had the same coverage and similarity scores as the reference alignments. Six out of 80 DTWL alignments showed larger coverage and higher similarity scores than the SWA alignments. Thirty DTWL alignments had equal coverage but better similarity scores than SWA. DTWL and SWA had equal coverage and similarity scores in the remaining 44 cases.

Conclusions: DTW, NWA, DTWL and SWA outperformed the reference alignments. DTW (or DTWL) seems to align better than NWA (or SWA) by inserting new daily events and identifying more similarities between patient medical records. The evaluation results could provide valuable information on the strengths and weaknesses of these sequence alignment methods for future development of sequence alignment methods and patient similarity-based studies.
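To make the two families of methods compared above more concrete, the following Python sketch shows a textbook Smith-Waterman local alignment score next to a textbook DTW distance over sequences of discrete events. It is not the authors' implementation: the function names, the scoring values (match 2, mismatch -1, gap -1), and the 0/1 event cost used for DTW are assumptions chosen only for illustration.

```python
def sw_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score (Smith-Waterman, linear gap penalty).

    Cells are clamped at 0, so an alignment may start and end anywhere.
    """
    best = 0
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


def dtw_distance(a, b, cost=lambda x, y: 0 if x == y else 1):
    """Classic dynamic time warping distance between two sequences.

    Repeated elements can be matched to a single element at no extra
    cost, which is what lets DTW absorb repeated daily events.
    """
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)
    for x in a:
        curr = [inf]
        for j, y in enumerate(b, start=1):
            curr.append(cost(x, y) + min(prev[j - 1], prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # Hypothetical event sequences, one event label per day.
    p1 = ["fever", "fever", "cough", "xray"]
    p2 = ["fever", "cough", "xray"]
    print(sw_score(p1, p2), dtw_distance(p1, p2))
```

Both functions accept any sequence type, so strings of single-letter codes and lists of event labels work equally well. Note that `sw_score` returns a similarity (higher is better) while `dtw_distance` returns a distance (lower is better), so the two values are not directly comparable.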

A novel sequence alignment algorithm based on deep learning of the protein folding code

Bioinformatics ◽ 10.1093/bioinformatics/btaa810 ◽ 2020 ◽ Cited By ~ 1

Author(s): Mu Gao ◽ Jeffrey Skolnick

Keyword(s): Protein Folding ◽ Deep Learning ◽ Sequence Alignment ◽ Protein Sequence ◽ Protein Structures ◽ Supplementary Information ◽ Alignment Algorithm ◽ Sequence Alignments ◽ Alignment Algorithms ◽ Structural Alignments

Abstract

Motivation: From evolutionary inference and function annotation to structural prediction, protein sequence comparison has provided crucial biological insights. While many sequence alignment algorithms have been developed, existing approaches often cannot detect hidden structural relationships in the ‘twilight zone’ of low sequence identity. To address this critical problem, we introduce a computational algorithm that performs protein Sequence Alignments from deep-Learning of Structural Alignments (SAdLSA, silent ‘d’). The key idea is to implicitly learn the protein folding code from many thousands of structural alignments using experimentally determined protein structures.

Results: To demonstrate that the folding code was learned, we first show that SAdLSA trained on pure α-helical proteins successfully recognizes pairs of structurally related pure β-sheet protein domains. Subsequent training and benchmarking on larger, highly challenging datasets show significant improvement over established approaches. For challenging cases, SAdLSA is ∼150% better than HHsearch for generating pairwise alignments and ∼50% better for identifying the proteins with the best alignments in a sequence library. The time complexity of SAdLSA is O(N) thanks to GPU acceleration.

Availability and implementation: Datasets and source codes of SAdLSA are available free of charge for academic users at http://sites.gatech.edu/cssb/sadlsa/.

Contact: [email protected] or [email protected]

Supplementary information: Supplementary data are available at Bioinformatics online.
