Pairwise Distance Matrix Computation for Multiple Sequence Alignment on the Cell Broadband Engine

Although high-quality multiple sequence alignment is an essential task in bioinformatics, it has become a major challenge due to the explosive growth in the amount of molecular data. The most time- and space-consuming phase is the distance matrix computation. This paper addresses this issue by proposing a vectorized parallel method that performs the huge number of similarity comparisons faster and in less space. Performance tests on real biological datasets using a Core i7 show superior results in terms of time and space.
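
To make the space concern concrete, the sketch below stores only the upper triangle of a symmetric distance matrix, so N sequences need N(N-1)/2 entries instead of N^2. It is an illustration under stated assumptions, not the method of the paper: the names `condensed_index`, `hamming_fraction` and `condensed_distance_matrix` are hypothetical, and the toy mismatch-fraction distance stands in for whatever similarity measure the authors use.

```python
from itertools import combinations


def condensed_index(i, j, n):
    """Map a pair (i, j), i != j, to its slot in a flat upper-triangle array."""
    if i == j:
        raise ValueError("the diagonal is not stored")
    if i > j:
        i, j = j, i
    # Rows 0..i-1 contribute (n-1) + (n-2) + ... entries before row i starts.
    return i * n - i * (i + 1) // 2 + (j - i - 1)


def hamming_fraction(a, b):
    """Toy distance (assumption): fraction of mismatched positions, equal lengths."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def condensed_distance_matrix(seqs, dist=hamming_fraction):
    """Compute all pairwise distances, storing each unordered pair exactly once."""
    n = len(seqs)
    flat = [0.0] * (n * (n - 1) // 2)
    for i, j in combinations(range(n), 2):
        flat[condensed_index(i, j, n)] = dist(seqs[i], seqs[j])
    return flat


if __name__ == "__main__":
    seqs = ["ACGT", "ACGA", "TCGA", "TTTT"]
    flat = condensed_distance_matrix(seqs)
    print(len(flat), flat[condensed_index(3, 0, len(seqs))])
```

Each pair is independent of the others, which is what makes this phase a natural candidate for vectorized or parallel execution.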

Randomized And Parallel Algorithms For Distance Matrix Calculations In Multiple Sequence Alignment

Journal of Clinical Monitoring and Computing ◽

10.1007/s10877-005-0680-3 ◽

2005 ◽

Vol 19 (4-5) ◽

pp. 351-359 ◽

Cited By ~ 1

Author(s):

Sanguthevar Rajasekaran ◽

Vishal Thapar ◽

Hardik Dave ◽

Chun-Hsi Huang

Keyword(s):

Parallel Algorithms ◽

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Distance Matrix ◽

Multiple Sequence

Sequence similarity search, Multiple Sequence Alignment, Model Selection, Distance Matrix and Phylogeny Reconstruction

Protocol Exchange ◽

10.1038/protex.2013.065 ◽

2013 ◽

Cited By ~ 12

Author(s):

Felix Bast

Keyword(s):

Model Selection ◽

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Similarity Search ◽

Sequence Similarity ◽

Distance Matrix ◽

Phylogeny Reconstruction ◽

Sequence Similarity Search ◽

Multiple Sequence ◽

Alignment Model

CUDA-Parttree: A Multiple Sequence Alignment Parallel Strategy in GPU

10.5753/wscad.2019.8662 ◽

2019 ◽

Author(s):

Caina Razzolini ◽

Alba Melo

Keyword(s):

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Execution Time ◽

Distance Matrix ◽

Data Conversion ◽

Multiple Sequence ◽

Alignment Tool ◽

Multiple Sequence Alignment Tool ◽

Matrix Calculation ◽

Parallel Strategy

In this paper, we propose and evaluate CUDA-Parttree, a parallel strategy that executes the first phase of the MAFFT Parttree Multiple Sequence Alignment tool (distance matrix calculation with 6mers) on a GPU. Compared to Parttree, CUDA-Parttree obtained a speedup of 6.10x on the distance matrix calculation for the Cyclodex gly tran set (50,280 sequences), reducing the execution time from 33.94s to 5.57s. Including data conversion and movement to and from the GPU, the speedup was 2.59x. With the sequence set Syn 100000 (100,000 sequences), a speedup of 4.46x was attained, reducing execution time from 209.54s to 47.00s.
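
For readers unfamiliar with 6mer-based distances, the following is a minimal CPU sketch of the general idea: two sequences are considered close when they share many length-6 substrings. It is not MAFFT's or CUDA-Parttree's actual formula, which the abstract does not give; the function names and the normalization by the smaller k-mer count are assumptions made for illustration.

```python
from collections import Counter


def kmer_counts(seq, k=6):
    """Count every overlapping substring of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(a, b, k=6):
    """Alignment-free distance in [0, 1] based on shared k-mers.

    Assumption: shared k-mers (counted with multiplicity) are normalized by
    the k-mer count of the sequence that has fewer k-mers.
    """
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    shared = sum((ca & cb).values())  # multiset intersection
    shortest = min(sum(ca.values()), sum(cb.values()))
    if shortest == 0:
        return 1.0  # a sequence shorter than k shares nothing by definition here
    return 1.0 - shared / shortest


if __name__ == "__main__":
    print(kmer_distance("ACGTACGTAC", "ACGTACGG"))
```

Because no alignment is performed, each pair costs time roughly linear in the sequence lengths, and all pairs can be evaluated independently, which is the property a GPU implementation can exploit.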

A multiple sequence alignment method with sequence vectorization

Engineering Computations ◽

10.1108/ec-01-2013-0026 ◽

2014 ◽

Vol 31 (2) ◽

pp. 283-296

Author(s):

Guoli Ji ◽

Yong Zeng ◽

Zijiang Yang ◽

Congting Ye ◽

Jingci Yao

Keyword(s):

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Time Complexity ◽

Large Scale ◽

Distance Matrix ◽

Traditional Methods ◽

Multiple Sequence ◽

Guide Tree ◽

Content Type ◽

Matrix Calculation

Purpose – The time complexity of most multiple sequence alignment algorithms is O(N^2) or O(N^3), where N is the number of sequences. In addition, with the development of biotechnology, the number of biological sequences is growing significantly, and traditional methods have difficulty handling large-scale sequence sets. The proposed LemK_MSA method aims to reduce the time complexity, especially for large-scale sequences, while keeping an accuracy level similar to that of traditional methods. Design/methodology/approach – LemK_MSA converts multiple sequence alignment into the corresponding 10D vector alignment using ten types of copy modes based on Lempel-Ziv. It then uses the k-means algorithm and the NJ algorithm to divide the sequences into several groups and calculate a guide tree for each group. A complete guide tree for multiple sequence alignment can be constructed by merging the guide trees of all groups. Moreover, for large-scale multiple sequence alignment, LemK_MSA proposes a GPU-based parallel approach to distance matrix calculation. Findings – Under this approach, the time efficiency of multiple sequence alignment can be improved. High-throughput mouse antibody sequences are used to validate the proposed method. Compared to ClustalW, MAFFT and Mbed, LemK_MSA is more than ten times as efficient while maintaining alignment accuracy. Originality/value – This paper proposes a novel method with sequence vectorization for multiple sequence alignment based on Lempel-Ziv. A GPU-based parallel method has been designed for large-scale distance matrix calculation. It provides a new approach for multiple sequence alignment research.
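
A rough way to see why grouping the sequences before building guide trees helps is to count pairwise comparisons. The sketch below compares one distance matrix over all N sequences with per-group matrices plus one comparison per pair of groups. The abstract does not say how group trees are merged, so the between-group term (one representative per group) is an assumption, as are the function names.

```python
def pair_count(n):
    """Number of unordered pairs among n items: n(n-1)/2."""
    return n * (n - 1) // 2


def grouped_pair_count(group_sizes):
    """Comparisons needed when distances are computed only inside each group,
    plus one comparison per pair of groups (assumed: one representative each)."""
    within = sum(pair_count(size) for size in group_sizes)
    between = pair_count(len(group_sizes))
    return within + between


if __name__ == "__main__":
    n, k = 10_000, 10
    full = pair_count(n)
    grouped = grouped_pair_count([n // k] * k)
    print(full, grouped, round(full / grouped, 2))
```

With equal-sized groups the saving is close to a factor of k; unbalanced clusters from k-means would reduce it.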

A Randomized Algorithm for Distance Matrix Calculations in Multiple Sequence Alignment

Knowledge Exploration in Life Science Informatics - Lecture Notes in Computer Science ◽

10.1007/978-3-540-30478-4_4 ◽

2004 ◽

pp. 33-45 ◽

Cited By ~ 2

Author(s):

Sanguthevar Rajasekaran ◽

Vishal Thapar ◽

Hardik Dave ◽

Chun-Hsi Huang

Keyword(s):

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Randomized Algorithm ◽

Distance Matrix ◽

Multiple Sequence

Multiple Sequence Alignment and Profile Analysis of Protein Family Using Hidden Markov Model

International Journal of Scientific Research ◽

10.15373/22778179/june2013/66 ◽

2012 ◽

Vol 2 (6) ◽

pp. 208-211

Author(s):

Navjot Kaur ◽

Rajbir Singh Cheema ◽

Harmandeep Singh

Keyword(s):

Markov Model ◽

Hidden Markov Model ◽

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Profile Analysis ◽

Hidden Markov ◽

Protein Family ◽

Multiple Sequence

Faculty Opinions recommendation of MAFFT online service: multiple sequence alignment, interactive sequence choice and visualization.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.731078852.793536612 ◽

2017 ◽

Author(s):

Feng Gao

Keyword(s):

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Online Service ◽

Multiple Sequence

Computational Analysis of Therapeutic Enzyme Uricase from Different Source Organisms

Current Proteomics ◽

10.2174/1570164616666190617165107 ◽

2020 ◽

Vol 17 (1) ◽

pp. 59-77

Author(s):

Anand Kumar Nelapati ◽

JagadeeshBabu PonnanEttiyappan

Keyword(s):

Uric Acid ◽

Amino Acid ◽

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Protein Sequences ◽

Amino Acid Sequences ◽

Amino Acid Residues ◽

Multiple Sequence ◽

Physiochemical Properties ◽

Pharmaceutical Industries

Background: Hyperuricemia and gout are conditions that result from the accumulation of uric acid in the blood and urine. Uric acid is the product of the purine metabolic pathway in humans. Uricase is a therapeutic enzyme that can enzymatically reduce the concentration of uric acid in serum and urine by converting it into the more soluble allantoin. Uricases are widely available from several sources, such as bacteria, fungi, yeast, plants and animals. Objective: The present study aims to elucidate the structure and physiochemical properties of uricase by in silico analysis. Methods: A total of sixty amino acid sequences of uricase from different sources were obtained from NCBI, and different analyses were performed, including Multiple Sequence Alignment (MSA), homology search, phylogenetic relation, motif search, domain architecture and physiochemical properties including pI, EC, Ai, and Ii. Results: Multiple sequence alignment of all the selected protein sequences exhibited distinct differences between bacterial, fungal, plant and animal sources based on the position-specific existence of conserved amino acid residues. The maximum homology of all the selected protein sequences is between 51-388. Within individual categories, homology is between 16-337 for bacterial uricase, 14-339 for fungal uricase, 12-317 for plant uricase, and 37-361 for animal uricase. The phylogenetic tree constructed from the amino acid sequences disclosed clusters indicating that uricase is from different sources. The physiochemical features revealed that uricase has between 300 and 338 amino acid residues, with a molecular weight of 33-39 kDa and a theoretical pI ranging from 4.95 to 8.88. The amino acid composition results showed that valine has a high average frequency of 8.79 percent compared to other amino acids in all analyzed species. Conclusion: In the field of bioinformatics, this work might be informative and a stepping-stone for other researchers to get an idea about the physicochemical features, evolutionary history and structural motifs of uricase, which can be widely used in the biotechnological and pharmaceutical industries. Therefore, the proposed in silico analysis can be considered for protein engineering work, as well as for gout therapy.

LegumeDB: Development of Legume Medicinal Plant Database and Comparative Molecular Evolutionary Analysis of matK Proteins of Legumes and Mangroves

Current Nutrition & Food Science ◽

10.2174/1573401314666180223143523 ◽

2019 ◽

Vol 15 (4) ◽

pp. 353-362

Author(s):

Sambhaji B. Thakar ◽

Maruti J. Dhanavade ◽

Kailas D. Sonawane

Keyword(s):

Phylogenetic Analysis ◽

Medicinal Plants ◽

Homology Modeling ◽

Sequence Alignment ◽

Vigna Unguiculata ◽

Multiple Sequence Alignment ◽

Legume Species ◽

Mangrove Species ◽

Multiple Sequence ◽

Thespesia Populnea

Background: Legume plants are known for their rich medicinal and nutritional values. A large amount of medicinal information on various legume plants is dispersed in the form of text. Objective: It is essential to design and construct a legume medicinal plants database that integrates the respective classes of legumes and includes knowledge of their medicinal applications along with their protein/enzyme sequences. Methods: The Legume Medicinal Plants Database (LegumeDB) was designed and developed using Microsoft SQL Server 2017. The DBMS was used as the back end, ASP.Net was used to lay out front-end operations, and VB.Net was used as the programming language. Multiple sequence alignment, phylogenetic analysis and homology modeling techniques were also used. Results: This database includes information on 50 legume medicinal species, which might help researchers explore this information. Further, maturase K (matK) protein sequences of legumes and mangroves were retrieved from NCBI for multiple sequence alignment and phylogenetic analysis to understand the evolutionary lineage between legumes and mangroves. Homology modeling was used to determine the three-dimensional structure of matK from the legume species Vigna unguiculata, using matK of the mangrove species Thespesia populnea as a template. The matK sequence analysis results indicate the conserved residues among legume and mangrove species. Conclusion: Phylogenetic analysis revealed closeness between the legume species Vigna unguiculata and the mangrove species Thespesia populnea, indicating their similarity and origin from a common ancestor. Thus, these studies might help in understanding the evolutionary relationship between legumes and mangroves. LegumeDB availability: http://legumedatabase.co.in

Fast vectorized distance matrix computation for multiple sequence alignment on multi-cores
