sequence alignment
Recently Published Documents


TOTAL DOCUMENTS

2413
(FIVE YEARS 434)

H-INDEX

105
(FIVE YEARS 7)

2022 ◽  
Author(s):  
Dong Xu ◽  
Kangming Jin ◽  
Heling Jiang ◽  
Desheng Gong ◽  
Jinbao Yang ◽  
...  

Sequence alignment is the basis of gene functional annotation for unknown sequences. Selecting closely related species as the reference should be an effective way to improve the accuracy of gene annotation for plants, compared with relying on only one or a few model plants. The limited number of species covered by previous software and websites is therefore a disadvantage for plant gene annotation. Here, we collected the protein sequences of 236 plant species with known genomic information from 63 families. These sequences were then annotated with the Pfam, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases to construct our databases. Furthermore, we developed Gene Annotation Software for Plants (GFAP) to perform gene annotation using our databases. GFAP, an open-source software running on Windows and macOS systems, is an efficient and network-independent tool. GFAP can search the protein domain, GO and KEGG information for 43000 genes within 4 minutes. In addition, GFAP can perform sequence alignment, statistical analysis and drawing. The software, databases, testing data and video tutorials are available at https://gitee.com/simon198912167815/gfap-database. GFAP contains a large amount of plant-species information. We believe that it will become a powerful tool for phytologists in gene annotation using closely related species.


2022 ◽  
pp. 37-45
Author(s):  
Mohammad Yaseen Sofi ◽  
Afshana Shafi ◽  
Khalid Z. Masoodi

2022 ◽  
pp. 47-53
Author(s):  
Mohammad Yaseen Sofi ◽  
Afshana Shafi ◽  
Khalid Z. Masoodi

2022 ◽  
Vol 2161 (1) ◽  
pp. 012028
Author(s):  
Karamjeet Kaur ◽  
Sudeshna Chakraborty ◽  
Manoj Kumar Gupta

In bioinformatics, sequence alignment is an important task for comparing biological sequences and finding similarity between them. The Smith-Waterman algorithm is the most widely used for the alignment process, but it has quadratic time complexity. Because the algorithm is sequential, aligning a growing number of biological sequences takes too much time. In this paper, a parallel approach to the Smith-Waterman algorithm is proposed and implemented for the graphics processing unit (GPU) architecture using CUDA, combining the GPU with the CPU so that alignment is three times faster than the sequential implementation of the Smith-Waterman algorithm. The paper describes this parallel implementation of sequence alignment on the GPU, whose intra-task parallelization strategy reduces the execution time. The results show significant runtime savings on the GPU.
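For reference, the sequential baseline that such work accelerates can be sketched in a few lines. The version below is a minimal Smith-Waterman with a linear gap penalty; the scoring values (match=2, mismatch=-1, gap=-2) are illustrative assumptions, not parameters from the paper, and this is not the authors' CUDA implementation. The nested loop makes the quadratic cost visible. Each cell depends only on its left, upper and upper-left neighbours, so cells on the same anti-diagonal are independent of one another, which is the property parallel implementations commonly exploit.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment.

    Scoring values are illustrative assumptions; gaps use a linear penalty.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the matrix: O(len(a) * len(b)) time and space.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                          H[i - 1][j] + gap,      # gap in b
                          H[i][j - 1] + gap)      # gap in a
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


score, top, bottom = smith_waterman("TTACGTTT", "GGACGTGG")
```

With these scores, the two example strings share the local alignment ACGT / ACGT.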


2022 ◽  
pp. 55-73
Author(s):  
Mohammad Yaseen Sofi ◽  
Afshana Shafi ◽  
Khalid Z. Masoodi

Pathogens ◽  
2021 ◽  
Vol 11 (1) ◽  
pp. 32
Author(s):  
Chao Li ◽  
Bangjun Gong ◽  
Qi Sun ◽  
Hu Xu ◽  
Jing Zhao ◽  
...  

The newly emerged sublineage 1.5 (NADC34-like) porcine reproductive and respiratory syndrome virus (PRRSV) has posed a direct threat to the Chinese pig industry since 2018. However, the prevalence and impact of NADC34-like PRRSV on Chinese pig farms is unclear. In the present study, we continuously monitored pathogens—including PRRSV, African swine fever virus (ASFV), classical swine fever virus (CSFV), pseudorabies virus (PRV), and porcine circovirus 2 (PCV2)—on a fattening pig farm with strict biosecurity practices located in Heilongjiang Province, China, from 2020 to 2021. The results showed that multiple types of PRRSV coexisted on a single pig farm. NADC30-like and NADC34-like PRRSVs were the predominant strains on this pig farm. Importantly, NADC34-like PRRSV—detected during the period of peak mortality—was one of the predominant strains on this pig farm. Sequence alignment suggested that these strains shared the same 100 aa deletion in the NSP2 protein as IA/2014/NADC34 isolated from the United States (U.S.) in 2014. Phylogenetic analysis based on open reading frame 5 (ORF5) showed that the genetic diversity of NADC34-like PRRSV on this farm was relatively singular, but it had a relatively high rate of evolution. Restriction fragment length polymorphism (RFLP) pattern analysis showed that almost all ORF5 RFLPs were 1-7-4, with one 1-4-4. In addition, two complete genomes of NADC34-like PRRSVs were sequenced. Recombination analysis and sequence alignment demonstrated that both viruses, with 98.9% nucleotide similarity, were non-recombinant viruses. This study reports the prevalence and characteristics of NADC34-like PRRSVs on a large-scale breeding farm in northern China for the first time. These results will help to reveal the impact of NADC34-like PRRSVs on Chinese pig farms, and provide a reference for the detection and further prevention and control of NADC34-like PRRSVs.


2021 ◽  
Author(s):  
Yunda Si ◽  
Chengfei Yan

AlphaFold2 is expected to be able to predict protein complex structures as long as a multiple sequence alignment (MSA) of the interologs of the target protein-protein interaction (PPI) can be provided. However, preparing the MSA of protein-protein interologs is a non-trivial task. In this study, a simplified phylogeny-based approach was applied to generate the MSA of interologs, which was then used as the input of AlphaFold2 for protein complex structure prediction. Extensively benchmarking this protocol on a non-redundant PPI dataset, we show that complex structures of 79.5% of the bacterial PPIs and 49.8% of the eukaryotic PPIs can be successfully predicted. Considering that PPIs may not be conserved in species separated by long evolutionary distances, we further restricted the interologs in the MSA to different taxonomic ranks of the species of the target PPI in protein complex structure prediction. We found that the success rates can be increased to 87.9% for the bacterial PPIs and 56.3% for the eukaryotic PPIs if the interologs in the MSA are restricted to a specific taxonomic rank of the species of each target PPI. Finally, we show that the optimal taxonomic ranks for protein complex structure prediction can be selected using the predicted TM-scores of the output models.
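The taxonomic restriction step can be pictured with a small sketch. It assumes a hypothetical data layout in which each candidate interolog pair carries its species, a rank-to-taxon lineage dictionary, and the two already-aligned chains; the function name, field names and toy sequences are illustrative and do not come from the authors' pipeline. Rows whose species shares the target's taxon at the chosen rank are kept, and the two chains are joined into one row of the paired MSA.

```python
def restrict_interologs(rows, target_lineage, rank):
    """Keep interolog pairs from species sharing the target's taxon at `rank`.

    Hypothetical layout: each row is a dict with "species", "lineage"
    (rank -> taxon name), and aligned "chain_a" / "chain_b" strings.
    Returns (species, concatenated_row) tuples for the paired MSA.
    """
    target_taxon = target_lineage[rank]
    kept = [r for r in rows if r["lineage"].get(rank) == target_taxon]
    return [(r["species"], r["chain_a"] + r["chain_b"]) for r in kept]


# Toy example: the target PPI is assumed to come from a gammaproteobacterium.
target_lineage = {"phylum": "Proteobacteria", "class": "Gammaproteobacteria"}
rows = [
    {"species": "Salmonella enterica",
     "lineage": {"phylum": "Proteobacteria", "class": "Gammaproteobacteria"},
     "chain_a": "MKT-A", "chain_b": "GL-SV"},
    {"species": "Caulobacter vibrioides",
     "lineage": {"phylum": "Proteobacteria", "class": "Alphaproteobacteria"},
     "chain_a": "MRTQA", "chain_b": "GLTSV"},
    {"species": "Bacillus subtilis",
     "lineage": {"phylum": "Firmicutes", "class": "Bacilli"},
     "chain_a": "MKSQ-", "chain_b": "G-TAV"},
]

by_class = restrict_interologs(rows, target_lineage, "class")    # narrower
by_phylum = restrict_interologs(rows, target_lineage, "phylum")  # broader
```

A narrower rank keeps fewer but more closely related rows. The abstract indicates that the best rank differs per target and is chosen from the predicted TM-scores, which this sketch does not model.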


2021 ◽  
Author(s):  
Liang Hong ◽  
Siqi Sun ◽  
Liangzhen Zheng ◽  
Qingxiong Tan ◽  
Yu Li

Evolutionarily related sequences provide information about protein structure and function. Multiple sequence alignment, which includes homolog searching from large databases and sequence alignment, is an effective way to extract this information and assist protein structure and function prediction, as demonstrated by AlphaFold. Despite the existing tools for multiple sequence alignment, searching for homologs across the entire UniProt is still time-consuming. Given the success of AlphaFold, large-scale multiple sequence alignment against massive databases will foreseeably become a trend in the field, and it is very desirable to accelerate this step. Here, we propose a novel method, fastMSA, to improve the speed significantly. Our idea is orthogonal to all previous acceleration methods. Taking advantage of a protein language model based on BERT, we propose a novel dual encoder architecture that embeds protein sequences into a low-dimensional space and efficiently filters out unrelated sequences before running BLAST. Extensive experimental results suggest that we can recall most of the homologs with a 34-fold speed-up. Moreover, our method is compatible with downstream tasks, such as structure prediction using AlphaFold. Using multiple sequence alignments generated by our method, we see little compromise in protein structure prediction performance with much less running time. fastMSA will effectively assist protein sequence, structure, and function analysis based on homologs and multiple sequence alignment.
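The general idea of filtering in an embedding space before running an expensive search can be sketched as follows. This is not fastMSA: the `embed` function below is a toy stand-in (a normalized k-mer count vector) for a learned encoder, and all names and sequences are hypothetical. What the sketch shows is the shape of the pipeline, in which database embeddings are computed once, the query is embedded, and only the top-scoring candidates are passed on to the alignment stage.

```python
import math
from collections import Counter


def embed(seq, k=2):
    """Toy stand-in for a learned encoder: unit-length k-mer count vector."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    norm = math.sqrt(sum(v * v for v in counts.values())) or 1.0
    return {kmer: v / norm for kmer, v in counts.items()}


def cosine(u, v):
    """Dot product of two unit-length sparse vectors."""
    return sum(w * v.get(kmer, 0.0) for kmer, w in u.items())


def prefilter(query, index, top_k):
    """Return the names of the top_k database entries closest to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda name: cosine(q, index[name]), reverse=True)
    return ranked[:top_k]


# Hypothetical mini database; embeddings are precomputed once, offline.
database = {
    "near": "MKTAYIAKQR",
    "mid": "MKTAYGGGGG",
    "far": "GGGGGGGGGG",
}
index = {name: embed(seq) for name, seq in database.items()}

# Only these candidates would be handed to the alignment step.
candidates = prefilter("MKTAYIAKQR", index, top_k=2)
```

Because the filter runs on precomputed vectors, its cost per database entry is a single dot product, and the expensive aligner then sees only a small fraction of the database.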


Symmetry ◽  
2021 ◽  
Vol 13 (12) ◽  
pp. 2385
Author(s):  
Xue Sun ◽  
Chao-Chin Wu ◽  
Yan-Fang Liu

In the field of computational biology, sequence alignment is a very important methodology. BLAST, provided by the National Center for Biotechnology Information (NCBI) in the USA, is a very common tool for performing sequence alignment in bioinformatics. The BLAST server receives tens of thousands of queries every day on average. Among the procedures of BLAST, the hit detection process, whose core data structure is a lookup table, is the most time-consuming. In the latest work, a lightweight BLASTP on CUDA GPU with a hybrid query-index table was proposed for serving queries shorter than 512, which effectively improved query efficiency. According to the reported protein sequence length distribution, about 90% of sequences have a length of 1024 or less. In this paper, we propose an improved lightweight BLASTP to speed up hit detection for longer query sequences. The maximum supported query length is increased from 512 to 1024. As a result, one more bit is required to encode each sequence position. To meet this requirement, an extended hybrid query-index table (EHQIT) is proposed that accommodates three sequence positions in a four-byte table entry, so that a single memory access is sufficient to retrieve all the position information as long as the number of hits is three or fewer. Moreover, if there are more than three hits for a possible word, all the position information is stored in contiguous table entries, which eliminates branch divergence and reduces the memory space needed for pointers to the overflow buffer. A square symmetric scoring matrix, Blosum62, is used to determine the relative score of matching two characters in a sequence alignment. The experimental results show that for queries shorter than 512, our improved lightweight BLASTP outperforms the original lightweight BLASTP with an average speedup of 1.2. When the number of hit overflows increases, the speedup can be as high as two. For queries shorter than 1024, our improved lightweight BLASTP provides speedups ranging from 1.56 to 3.08 over CUDA-BLAST. In short, the improved lightweight BLASTP can replace the original one because it supports longer query sequences and provides better performance.
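The arithmetic behind fitting three positions into a four-byte entry can be illustrated with a short sketch. Positions below 1024 need 10 bits each, so three of them occupy 30 of the 32 bits. The layout used here, with the two remaining bits holding the hit count, is an assumption made for illustration; the abstract does not specify the actual EHQIT bit layout, and the overflow case for more than three hits is not modelled.

```python
POS_BITS = 10                    # 2**10 = 1024 distinct query positions
POS_MASK = (1 << POS_BITS) - 1   # 0x3FF
COUNT_SHIFT = 3 * POS_BITS       # assumed: top 2 bits hold the hit count


def pack_entry(positions):
    """Pack up to three positions (each 0..1023) into one 32-bit integer."""
    if len(positions) > 3:
        raise ValueError("more than three hits needs overflow storage")
    entry = len(positions) << COUNT_SHIFT
    for slot, pos in enumerate(positions):
        if not 0 <= pos <= POS_MASK:
            raise ValueError("position does not fit in 10 bits")
        entry |= pos << (slot * POS_BITS)
    return entry


def unpack_entry(entry):
    """Recover the list of positions from a packed 32-bit entry."""
    count = entry >> COUNT_SHIFT
    return [(entry >> (slot * POS_BITS)) & POS_MASK for slot in range(count)]


entry = pack_entry([5, 1023, 0])
positions = unpack_entry(entry)
```

Reading `entry` once yields the count and all three positions, which corresponds to the single memory access described in the abstract for words with at most three hits.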

