sequence alignment Latest Research Papers

Pairwise Sequence Alignment

10.1201/9781003226611-7 ◽

2022 ◽

pp. 383-405

Author(s):

Hamid D. Ismail

Keyword(s):

Sequence Alignment ◽

Pairwise Sequence Alignment

GFAP: ultra-fast and accurate gene functional annotation software for plants

10.1101/2022.01.05.475154 ◽

2022 ◽

Author(s):

Dong Xu ◽

Kangming Jin ◽

Heling Jiang ◽

Desheng Gong ◽

Jinbao Yang ◽

...

Keyword(s):

Plant Species ◽

Sequence Alignment ◽

Related Species ◽

Functional Annotation ◽

Gene Annotation ◽

Protein Domain ◽

Species Number ◽

Closely Related Species ◽

Video Tutorials ◽

Testing Data

Sequence alignment is the basis of gene functional annotation for unknown sequences. Selecting closely related species as the reference species should improve the accuracy of gene annotation for plants, compared with relying on only one or a few model plants. The limited number of species covered by previous software and websites is therefore a disadvantage for plant gene annotation. Here, we collected the protein sequences of 236 plant species with known genomic information from 63 families. These sequences were then annotated with the Pfam, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases to construct our databases. Furthermore, we developed the software Gene Annotation Software for Plants (GFAP) to perform gene annotation using our databases. GFAP, an open-source software running on Windows and macOS systems, is an efficient and network-independent tool. GFAP can search the protein domain, GO and KEGG information for 43000 genes within 4 minutes. In addition, GFAP can perform sequence alignment, statistical analysis and drawing. The website https://gitee.com/simon198912167815/gfap-database provides the software, databases, testing data and video tutorials for users. GFAP contains a large amount of plant-species information. We believe that it will become a powerful tool for phytologists in gene annotation using closely related species.

Pairwise sequence alignment

10.1016/b978-0-323-91128-3.00013-6 ◽

2022 ◽

pp. 37-45

Author(s):

Mohammad Yaseen Sofi ◽

Afshana Shafi ◽

Khalid Z. Masoodi

Keyword(s):

Sequence Alignment ◽

Pairwise Sequence Alignment

Multiple sequence alignment

10.1016/b978-0-323-91128-3.00011-2 ◽

2022 ◽

pp. 47-53

Author(s):

Mohammad Yaseen Sofi ◽

Afshana Shafi ◽

Khalid Z. Masoodi

Keyword(s):

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Multiple Sequence

Accelerating Smith-Waterman Algorithm for Faster Sequence Alignment using Graphical Processing Unit

Journal of Physics Conference Series ◽

10.1088/1742-6596/2161/1/012028 ◽

2022 ◽

Vol 2161 (1) ◽

pp. 012028

Author(s):

Karamjeet Kaur ◽

Sudeshna Chakraborty ◽

Manoj Kumar Gupta

Keyword(s):

Sequence Alignment ◽

Time Complexity ◽

Graphic Processing Unit ◽

Parallel Implementation ◽

Processing Unit ◽

Biological Sequences ◽

Sequential Approach ◽

Alignment Process ◽

Graphical Processing ◽

Sequential Implementation

In bioinformatics, sequence alignment is a very important task for comparing biological sequences and finding similarity between them. The Smith-Waterman algorithm is the most widely used algorithm for the alignment process, but it has quadratic time complexity. The algorithm uses a sequential approach, so as the number of biological sequences increases, aligning them takes too much time. In this paper, a parallel approach to the Smith-Waterman algorithm is proposed and implemented according to the architecture of the graphics processing unit (GPU) using CUDA. Features of the GPU are combined with the CPU in such a way that the alignment process is three times faster than the sequential implementation of the Smith-Waterman algorithm. This paper describes the parallel implementation of sequence alignment using a GPU, and this intra-task parallelization strategy reduces the execution time. The results show significant runtime savings on the GPU.
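The quadratic cost and the room for intra-task parallelism mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not describe the paper's CUDA kernel, so the following is only an assumption about one common intra-task scheme: filling the score matrix one anti-diagonal at a time, since cells on the same anti-diagonal do not depend on each other. The function names and the scoring values (match, mismatch, linear gap) are hypothetical choices for illustration, and the code runs sequentially in plain Python.

```python
def sw_score_rowmajor(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score, filling the matrix row by row (O(len(a) * len(b)))."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best


def sw_score_wavefront(a, b, match=2, mismatch=-1, gap=-1):
    """Same recurrence, visited one anti-diagonal (i + j = d) at a time.

    Every cell on diagonal d reads only diagonals d - 1 and d - 2, so the
    inner loop is the part a GPU could, in principle, run in parallel.
    Here it is an ordinary sequential loop (illustrative assumption only).
    """
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for d in range(2, n + m + 1):
        # Keep both i and j = d - i inside the matrix bounds.
        for i in range(max(1, d - m), min(n, d - 1) + 1):
            j = d - i
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best
```

Both functions compute the same score; only the visiting order differs, which is what makes the wavefront form a candidate for parallel execution.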

Multiple sequence alignment tools – software and resources

10.1016/b978-0-323-91128-3.00012-4 ◽

2022 ◽

pp. 55-73

Author(s):

Mohammad Yaseen Sofi ◽

Afshana Shafi ◽

Khalid Z. Masoodi

Keyword(s):

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Multiple Sequence

First Detection of NADC34-like PRRSV as a Main Epidemic Strain on a Large Farm in China

Pathogens ◽

10.3390/pathogens11010032 ◽

2021 ◽

Vol 11 (1) ◽

pp. 32

Author(s):

Chao Li ◽

Bangjun Gong ◽

Qi Sun ◽

Hu Xu ◽

Jing Zhao ◽

...

Keyword(s):

Sequence Alignment ◽

Large Scale ◽

Pseudorabies Virus ◽

The United States ◽

Fever Virus ◽

High Rate ◽

Respiratory Syndrome Virus ◽

Pig Farms ◽

Pig Farm ◽

The Impact

The newly emerged sublineage 1.5 (NADC34-like) porcine reproductive and respiratory syndrome virus (PRRSV) has posed a direct threat to the Chinese pig industry since 2018. However, the prevalence and impact of NADC34-like PRRSV on Chinese pig farms is unclear. In the present study, we continuously monitored pathogens—including PRRSV, African swine fever virus (ASFV), classical swine fever virus (CSFV), pseudorabies virus (PRV), and porcine circovirus 2 (PCV2)—on a fattening pig farm with strict biosecurity practices located in Heilongjiang Province, China, from 2020 to 2021. The results showed that multiple types of PRRSV coexisted on a single pig farm. NADC30-like and NADC34-like PRRSVs were the predominant strains on this pig farm. Importantly, NADC34-like PRRSV—detected during the period of peak mortality—was one of the predominant strains on this pig farm. Sequence alignment suggested that these strains shared the same 100 aa deletion in the NSP2 protein as IA/2014/NADC34 isolated from the United States (U.S.) in 2014. Phylogenetic analysis based on open reading frame 5 (ORF5) showed that the genetic diversity of NADC34-like PRRSV on this farm was relatively singular, but it had a relatively high rate of evolution. Restriction fragment length polymorphism (RFLP) pattern analysis showed that almost all ORF5 RFLPs were 1-7-4, with one 1-4-4. In addition, two complete genomes of NADC34-like PRRSVs were sequenced. Recombination analysis and sequence alignment demonstrated that both viruses, with 98.9% nucleotide similarity, were non-recombinant viruses. This study reports the prevalence and characteristics of NADC34-like PRRSVs on a large-scale breeding farm in northern China for the first time. These results will help to reveal the impact of NADC34-like PRRSVs on Chinese pig farms, and provide a reference for the detection and further prevention and control of NADC34-like PRRSVs.

Protein Complex Structure Prediction Powered by Multiple Sequence Alignment of Interologs from Multiple Taxonomic Ranks and AlphaFold2

10.1101/2021.12.21.473437 ◽

2021 ◽

Author(s):

Yunda Si ◽

Chengfei Yan

Keyword(s):

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Protein Complex ◽

Structure Prediction ◽

Complex Structure ◽

Complex Structures ◽

Success Rates ◽

Multiple Sequence ◽

Taxonomic Rank ◽

Protein Protein Interaction

AlphaFold2 is expected to be able to predict protein complex structures as long as a multiple sequence alignment (MSA) of the interologs of the target protein-protein interaction (PPI) can be provided. However, preparing the MSA of protein-protein interologs is a non-trivial task. In this study, a simplified phylogeny-based approach was applied to generate the MSA of interologs, which was then used as the input of AlphaFold2 for protein complex structure prediction. Having extensively benchmarked this protocol on a non-redundant PPI dataset, we show that complex structures of 79.5% of the bacterial PPIs and 49.8% of the eukaryotic PPIs can be successfully predicted. Considering that PPIs may not be conserved in species separated by long evolutionary distances, we further restricted the interologs in the MSA to different taxonomic ranks of the species of the target PPI in protein complex structure prediction. We found that the success rates can be increased to 87.9% for the bacterial PPIs and 56.3% for the eukaryotic PPIs if interologs in the MSA are restricted to a specific taxonomic rank of the species of each target PPI. Finally, we show that the optimal taxonomic ranks for protein complex structure prediction can be selected using the predicted TM-scores of the output models.

fastMSA: Accelerating Multiple Sequence Alignment with Dense Retrieval on Protein Language

10.1101/2021.12.20.473431 ◽

2021 ◽

Author(s):

Liang Hong ◽

Siqi Sun ◽

Liangzhen Zheng ◽

Qingxiong Tan ◽

Yu Li

Keyword(s):

Protein Structure ◽

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Structure Prediction ◽

Structure And Function ◽

Sequence Alignments ◽

Protein Structure And Function ◽

Multiple Sequence ◽

Multiple Sequence Alignments ◽

And Function

Evolutionarily related sequences provide information about protein structure and function. Multiple sequence alignment, which includes homolog searching in large databases and sequence alignment, is an effective way to extract this information and assist protein structure and function prediction, and its effectiveness has been demonstrated by AlphaFold. Despite the existing tools for multiple sequence alignment, searching for homologs across the entire UniProt is still time-consuming. Considering the success of AlphaFold, large-scale multiple sequence alignments against massive databases will foreseeably be a trend in the field, so accelerating this step is very desirable. Here, we propose a novel method, fastMSA, to improve the speed significantly. Our idea is orthogonal to all previous acceleration methods. Taking advantage of a protein language model based on BERT, we propose a novel dual encoder architecture that can embed protein sequences into a low-dimensional space and filter out unrelated sequences efficiently before running BLAST. Extensive experimental results suggest that we can recall most of the homologs with a 34-fold speed-up. Moreover, our method is compatible with downstream tasks, such as structure prediction using AlphaFold. Using multiple sequence alignments generated by our method, we see little performance compromise in protein structure prediction with much less running time. fastMSA will effectively assist protein sequence, structure, and function analysis based on homologs and multiple sequence alignment.
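The "embed, then filter before alignment" idea in this abstract can be sketched in a few lines. This is not the fastMSA model: the learned BERT-based dual encoder is replaced here by a toy, hypothetical `embed` function (normalized amino-acid composition), and `prefilter` is an illustrative name. The sketch only shows the retrieval step, that is, ranking database sequences by cosine similarity to the query and keeping the top k as candidates for a slower aligner.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def embed(seq):
    """Toy stand-in for a learned encoder: unit-length amino-acid composition vector."""
    counts = Counter(seq)
    vec = [counts.get(aa, 0) for aa in ALPHABET]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid division by zero
    return [v / norm for v in vec]


def prefilter(query, database, k=2):
    """Return the names of the k database sequences most similar to the query.

    `database` maps a sequence name to its sequence. Because all vectors are
    unit length, the dot product equals the cosine similarity.
    """
    q = embed(query)
    scored = [
        (sum(x * y for x, y in zip(q, embed(seq))), name)
        for name, seq in database.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]
```

In a real pipeline the database embeddings would be computed once and indexed, so that only the query needs to be embedded at search time; the sketch recomputes them on each call for brevity.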

The Design and Implementation of an Improved Lightweight BLASTP on CUDA GPU

Symmetry ◽

10.3390/sym13122385 ◽

2021 ◽

Vol 13 (12) ◽

pp. 2385

Author(s):

Xue Sun ◽

Chao-Chin Wu ◽

Yan-Fang Liu

Keyword(s):

Sequence Alignment ◽

Query Sequence ◽

Length Distribution ◽

Lookup Table ◽

Sequence Length ◽

Position Information ◽

Memory Space ◽

Table Entry ◽

The Usa ◽

Index Table

In the field of computational biology, sequence alignment is a very important methodology. BLAST, provided by the National Center for Biotechnology Information (NCBI) in the USA, is a very common tool for performing sequence alignment in bioinformatics. The BLAST server receives tens of thousands of queries every day on average. Among the procedures of BLAST, the hit detection process, whose core architecture is a lookup table, is the most time-consuming. In the latest work, a lightweight BLASTP on CUDA GPU with a hybrid query-index table was proposed for serving queries with a sequence length shorter than 512, which effectively improved the query efficiency. According to the reported protein sequence length distribution, about 90% of sequences have a length equal to or smaller than 1024. In this paper, we propose an improved lightweight BLASTP to speed up hit detection for longer query sequences. The maximum sequence length is increased from 512 to 1024. As a result, one more bit is required to encode each sequence position. To meet this requirement, an extended hybrid query-index table (EHQIT) is proposed to accommodate three sequence positions in a four-byte table entry, making only one memory access sufficient to retrieve all the position information as long as the number of hits is equal to or smaller than three. Moreover, if there are more than three hits for a possible word, all the position information is stored in contiguous table entries, which eliminates branch divergence and reduces the memory space needed for pointers to the overflow buffer. A square symmetric scoring matrix, Blosum62, is used to determine the relative score obtained by matching two characters in a sequence alignment. The experimental results show that for queries shorter than 512, our improved lightweight BLASTP outperforms the original lightweight BLASTP with speedups of 1.2 on average. When the number of hit overflows increases, the speedup can be as high as two. For queries shorter than 1024, our improved lightweight BLASTP can provide speedups ranging from 1.56 to 3.08 over CUDA-BLAST. In short, the improved lightweight BLASTP can replace the original one because it supports longer query sequences and provides better performance.
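The arithmetic behind "three sequence positions in a four-byte table entry" can be made concrete with a small sketch. A position below 1024 needs 10 bits, so three positions use 30 of the 32 bits. The abstract does not say how the remaining two bits are used; storing the hit count (0 to 3) there is an assumption made purely for illustration, as are the names `pack_entry` and `unpack_entry`. This is not the EHQIT layout from the paper.

```python
POS_BITS = 10                      # positions 0..1023 fit in 10 bits
POS_MASK = (1 << POS_BITS) - 1     # 0x3FF
MAX_INLINE = 3                     # three positions per 32-bit entry
COUNT_SHIFT = POS_BITS * MAX_INLINE  # the top 2 bits hold the hit count (assumed)


def pack_entry(positions):
    """Pack up to three query positions into one 32-bit integer."""
    if len(positions) > MAX_INLINE:
        raise ValueError("more than three hits would need overflow entries")
    entry = len(positions) << COUNT_SHIFT
    for slot, pos in enumerate(positions):
        if not 0 <= pos <= POS_MASK:
            raise ValueError(f"position {pos} does not fit in {POS_BITS} bits")
        entry |= pos << (slot * POS_BITS)
    return entry


def unpack_entry(entry):
    """Recover the list of positions from a packed 32-bit entry."""
    count = (entry >> COUNT_SHIFT) & 0b11
    return [(entry >> (slot * POS_BITS)) & POS_MASK for slot in range(count)]
```

With this layout a single 32-bit read yields every hit for a word as long as there are at most three, which matches the one-memory-access property the abstract describes.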
