High speed BLASTN: an accelerated MegaBLAST search tool

Abstract Sequence alignment is a long-standing problem in bioinformatics. The Basic Local Alignment Search Tool (BLAST) is one of the most popular and fundamental alignment tools. The explosive growth of biological sequence data calls for faster sequence alignment tools such as BLAST. To this end, we develop high speed BLASTN (HS-BLASTN), a parallel and fast nucleotide database search tool that accelerates MegaBLAST, the default module of NCBI-BLASTN. HS-BLASTN builds a new lookup table using the FMD-index of the database and employs an accurate and effective seeding method to find short stretches of identities (called seeds) between the query and the database. HS-BLASTN produces the same alignment results as MegaBLAST while running much faster. Specifically, our experiments conducted on a 12-core server show that HS-BLASTN can be 22 times faster than MegaBLAST and exhibits better parallel performance. HS-BLASTN is written in C++ and the related source code is available at https://github.com/chenying2016/queries under the GPLv3 license.
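The seeding step described in this abstract can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: a plain dictionary of k-mers stands in for the FMD-index-based lookup table, and the function names and the word size `k` are hypothetical, not taken from the HS-BLASTN source code.

```python
from collections import defaultdict


def build_lookup_table(database, k):
    """Map every k-mer of the database sequence to its start positions.

    Assumption: a plain dict replaces the FMD-index used by HS-BLASTN.
    """
    table = defaultdict(list)
    for i in range(len(database) - k + 1):
        table[database[i:i + k]].append(i)
    return table


def find_seeds(query, table, k):
    """Return (query_pos, db_pos) pairs where a k-mer matches exactly."""
    seeds = []
    for q in range(len(query) - k + 1):
        # .get() avoids creating empty entries in the defaultdict
        for d in table.get(query[q:q + k], []):
            seeds.append((q, d))
    return seeds


# Toy data, chosen for illustration only
database = "ACGTACGTTAGC"
query = "CGTTAG"
table = build_lookup_table(database, k=4)
seeds = find_seeds(query, table, k=4)
```

Each seed is an exact match of length `k`; a BLAST-like tool would then try to extend such seeds into longer alignments, a step this sketch leaves out.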


A WORK DISTRIBUTION STRATEGY FOR GLOBAL SEQUENCE ALIGNMENT

International Journal of Computing ◽

10.47839/ijc.18.1.1276 ◽

2019 ◽

pp. 75-81

Author(s):

Kailash Kalare ◽

Jitendra Tembhurne

Keyword(s):

Sequence Alignment ◽

Execution Time ◽

High Speed ◽

Global Memory ◽

Global Alignment ◽

Biological Sequences ◽

Distribution Strategy ◽

Work Distribution ◽

Speed Up ◽

Time Required

Sequence alignment identifies similarities and dissimilarities between two given sequences. In this paper, we propose a work distribution strategy for the implementation of DNA global sequence alignment. The main objective of this work is to minimize the execution time required for DNA global alignment of large biological sequences. The proposed approach addresses both memory optimization and minimization of execution time. We considered biological sequences of different sizes that fit into the global memory of the system. The proposed strategy is implemented on a shared-memory architecture using OpenMP for large biological sequences. Parallelization using OpenMP directives is relatively easy and yields fast code. We experimented on a Dell Precision Tower 7910 with an Intel Xeon processor, 32 GB RAM and 28 CPU cores. The efficient use of global memory and cache memory optimization dominate the execution time results. The results demonstrate a significantly higher speedup using OpenMP compared with other implementations.


Massively Parallel Implementation of Sequence Alignment with Basic Local Alignment Search Tool Using Parallel Computing in Java Library

Journal of Computational Biology ◽

10.1089/cmb.2018.0079 ◽

2018 ◽

Vol 25 (8) ◽

pp. 871-881 ◽

Cited By ~ 5

Author(s):

Marek Nowicki ◽

Davit Bzhalava ◽

Piotr Bała

Keyword(s):

Parallel Computing ◽

Sequence Alignment ◽

Parallel Implementation ◽

Massively Parallel ◽

Local Alignment ◽

Search Tool ◽

Java Library


Study of Basic Local Alignment Search Tool (BLAST) and Multiple Sequence Alignment (Clustal-X) of Monoclonal mice/human antibodies

10.1101/2021.07.09.451785 ◽

2021 ◽

Author(s):

Ivan Vito Ferrari ◽

Paolo Patrizio

Keyword(s):

Amino Acids ◽

Monoclonal Antibodies ◽

Sequence Alignment ◽

Heavy Chain ◽

Light Chain ◽

Light Chains ◽

Local Alignment ◽

Multiple Sequence ◽

Heavy Chains ◽

Search Tool

In this work, we study the Basic Local Alignment Search Tool (BLAST) and multiple sequence alignment (Clustal-X) of different mouse monoclonal antibodies to better understand multiple sequence alignments. Our strategy was to compare the light chains of multiple monoclonal antibodies with each other, calculating their percentage identity and the amino acid region in which it occurs (see Figure 2 below). The same survey was then carried out for the heavy chains using the same methodology (see Figure 3 below). Finally, the alignment between the light chain of one antibody and the heavy chain of another antibody was studied to understand what happens if chains are exchanged between antibodies (see Figure 4 below). From the BLAST alignments, we report that the light chains (Ls) of the monoclonal antibodies compared have a sequence homology of about 60-80% and share an identical region within residues 100-210, except PDB ID 4ISV, which turns out to have a 40% lower homology than the other antibodies. The heavy chains (Hs) of the monoclonal antibodies, in contrast, tend to have lower sequence homology than the light chains, about 60-70%, and share an identical region within residues 150-210, with the exception of the PDB ID 3I9G-3W9D antibodies, which have a homology of 50% (see supporting part). In summary: about 70-80% identity between the light chains of two antibodies, 60-70% identity between the heavy chains of two antibodies, 30% identity between the two chains of one antibody, and 30% identity when the light chain of one antibody is compared with the heavy chain of another antibody.


BSSB: BLAST Server for Structural Biologists

Journal of Applied Crystallography ◽

10.1107/s0021889811008387 ◽

2011 ◽

Vol 44 (3) ◽

pp. 651-654 ◽

Cited By ~ 3

Author(s):

Muthukumarasamy Uthayakumar ◽

Govindhan Sowmiya ◽

Radhakrishnan Sabarinathan ◽

N. Udayaprakash ◽

M. Kirti Vaishnavi ◽

...

Keyword(s):

High Speed ◽

Three Dimensional ◽

Pairwise Alignment ◽

Query Protein ◽

Dimensional Structure ◽

Local Alignment ◽

Three Dimensional Structure ◽

Level Information ◽

Search Tool ◽

The Subject

The Basic Local Alignment Search Tool (BLAST) is one of the most widely used sequence alignment programs with which similarity searches, for both protein and nucleic acid sequences, can be performed against large databases at high speed. A large number of tools exist for processing BLAST output, but none of them provide three-dimensional structure visualization. This shortcoming has been addressed in the proposed tool BLAST Server for Structural Biologists (BSSB), which maps a BLAST output onto the three-dimensional structure of the subject protein. The three-dimensional structure of the subject protein is represented using a three-color coding scheme (identical: red; similar: yellow; and mismatch: white) based on the pairwise alignment obtained. Thus, the user will be able to visualize a possible three-dimensional structure for the query protein sequence. This information can be used to gain a deeper insight into the sequence–structure correlation. Furthermore, the additional structure-level information enables the user to make coherent and logical decisions regarding the type of input model structure or fragment that can be used for molecular replacement calculations. This tool is freely available to all users at http://bioserver1.physics.iisc.ernet.in/bssb/.
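The three-color coding scheme can be sketched in a few lines of Python. This is a hedged illustration, not BSSB's code: the groups of "similar" amino acids below are hypothetical (a real tool would more likely derive similarity from a substitution matrix), the function names are invented, and coloring a subject residue white when it faces a gap in the query is an assumption.

```python
# Hypothetical groups of "similar" amino acids, for illustration only.
SIMILAR_GROUPS = [set("ILVM"), set("FYW"), set("KRH"),
                  set("DE"), set("ST"), set("NQ")]


def is_similar(a, b):
    """True if both residues fall into the same (assumed) similarity group."""
    return any(a in group and b in group for group in SIMILAR_GROUPS)


def color_subject_residues(aligned_query, aligned_subject):
    """Return one color per subject residue, following the pairwise alignment.

    identical -> red, similar -> yellow, anything else -> white.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have equal length")
    colors = []
    for q, s in zip(aligned_query, aligned_subject):
        if s == "-":
            continue  # gap in the subject: there is no residue to color
        if q == s:
            colors.append("red")
        elif q != "-" and is_similar(q, s):
            colors.append("yellow")
        else:
            colors.append("white")  # mismatch, or query gap (assumption)
    return colors


# Toy gapped alignment (query on top, subject below)
colors = color_subject_residues("MKV-LD", "MRVALE")
```

The resulting list has one entry per subject residue, so it could be applied position by position to the residues of the subject structure in a molecular viewer.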


Genetic Diversity of Cryptosporidium Spp. in Njoro Sub County, Nakuru, Kenya

10.21203/rs.3.rs-621237/v1 ◽

2021 ◽

Author(s):

Walter Miding'a Essendi ◽

Charles Inyagwa Muleke ◽

Miheso Manfred ◽

Elick Onyango Otachi

Keyword(s):

Genetic Diversity ◽

Phylogenetic Analysis ◽

Sequence Alignment ◽

Evolutionary Genetics ◽

Domestic Animals ◽

Local Alignment ◽

Cryptosporidium Spp ◽

Potential Source ◽

Search Tool ◽

Great Genetic Diversity

Abstract Cryptosporidium spp. cause cryptosporidiosis in humans through zoonotic and anthroponotic transmission. Previous studies have illustrated the significance of domestic animals as reservoirs of this parasite. However, there is no information on the Cryptosporidium spp. and genotypes circulating in Njoro Sub County. A total of 2174 samples from humans, cattle, chickens, sheep and goats were assessed for the presence of Cryptosporidium spp. Thirty-three positive samples were successfully sequenced. The sequences obtained were compared with Cryptosporidium sequences in GenBank using the online BLAST (Basic Local Alignment Search Tool) program of NCBI (National Center for Biotechnology Information). Sequence alignment was done using the Clustal W program and phylogenetic analysis was executed in MEGA 6 (Molecular Evolutionary Genetics Analysis version 6.0). The Cryptosporidium spp. present in the watershed showed great genetic diversity, with nine Cryptosporidium spp., namely C. parvum, C. hominis, C. ubiquitum, C. meleagridis, C. andersoni, C. baileyi, C. muris, C. xiaoi and C. viatorum. Cattle were the biggest reservoirs of zoonotic Cryptosporidium spp. and hence a potential source of zoonosis in humans, while goats harboured the fewest species. This is the first study to report the presence of C. viatorum in Kenya.


The Needleman-Wunsch Algorithm in Determining the Level of Similarity of the DNA Sequences of the Timor Deer (Cervus timorensis) and the Red Deer (Cervus elaphus)

EIGEN MATHEMATICS JOURNAL ◽

10.29303/emj.v3i2.65 ◽

2020 ◽

Vol 3 (2) ◽

pp. 125

Author(s):

Hibban Kholiq ◽

Mamika Ujianita Romdhini ◽

Marliadi Susanto

Keyword(s):

Dna Sequence ◽

Sequence Alignment ◽

Cervus Elaphus ◽

Dna Sequences ◽

Sequence Data ◽

Local Alignment ◽

Base Pairs ◽

European Continent ◽

Dna Sequence Data ◽

Search Tool

Sequence alignment is a basic method in sequence analysis. It is used to determine the level of similarity between DNA sequences. The Needleman-Wunsch algorithm can be used to solve the sequence alignment problem. This research shows that the relation T(i, j) used in the Needleman-Wunsch algorithm is a function T: ℕ0 × ℕ0 → ℤ, and that T(i, j) is a recursive function. The DNA sequence data used are sequences from the Timor deer, the faunal emblem of the province of West Nusa Tenggara, and, as a comparison, the red deer, a deer typical of the European continent. The DNA sequence data were obtained from BLAST (Basic Local Alignment Search Tool). The optimal alignment obtained spans 666 aligned positions with 322 matches, 230 mismatches and 114 gaps, meaning that the two DNA sequences have a similarity of 48% (322/666).
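The recursive table T(i, j) and the similarity figure can be made concrete with a short Python sketch. The scoring values (match +1, mismatch -1, gap -1) and the function names are assumptions for illustration; the abstract does not state which scores the authors used. Similarity is computed, as in the abstract, as matches divided by the total number of aligned positions.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment; returns (score, matches, mismatches, gaps).

    Scoring values are assumed defaults, not taken from the paper.
    """
    n, m = len(a), len(b)
    # T[i][j] = best score for aligning a[:i] with b[:j]
    T = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        T[i][0] = i * gap
    for j in range(1, m + 1):
        T[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            T[i][j] = max(T[i - 1][j - 1] + s,   # align a[i-1] with b[j-1]
                          T[i - 1][j] + gap,     # gap in b
                          T[i][j - 1] + gap)     # gap in a

    # Trace back one optimal path and count the column types
    i, j = n, m
    matches = mismatches = gaps = 0
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if a[i - 1] == b[j - 1] else mismatch
            if T[i][j] == T[i - 1][j - 1] + s:
                if a[i - 1] == b[j - 1]:
                    matches += 1
                else:
                    mismatches += 1
                i, j = i - 1, j - 1
                continue
        gaps += 1
        if i > 0 and T[i][j] == T[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return T[n][m], matches, mismatches, gaps


def similarity(matches, mismatches, gaps):
    """Fraction of aligned positions that are matches."""
    return matches / (matches + mismatches + gaps)


score, matches, mismatches, gaps = needleman_wunsch("ACGT", "AGT")
reported = similarity(322, 230, 114)  # counts reported in the abstract
```

With the counts from the abstract, `similarity(322, 230, 114)` is 322/666, which rounds to the reported 48%.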


The Influence of Memory-Aware Computation on Distributed BLAST

Current Bioinformatics ◽

10.2174/1574893613666180601080811 ◽

2019 ◽

Vol 14 (2) ◽

pp. 157-163

Author(s):

Majid Hajibaba ◽

Mohsen Sharifi ◽

Saeid Gorgin

Keyword(s):

Search Time ◽

Genomic Research ◽

Local Alignment ◽

Negative Effects ◽

Sequencing Technologies ◽

Percent Improvement ◽

Fast Processing ◽

Search Tool ◽

Memory Awareness ◽

Generation Sequencing

Background: One of the pivotal challenges in today's genomic research is the fast processing of voluminous data such as those generated by high-throughput Next-Generation Sequencing technologies. On the other hand, BLAST (Basic Local Alignment Search Tool), a long-established and renowned tool in bioinformatics, has been shown to be incredibly slow in this regard. Objective: To improve the performance of BLAST in the processing of voluminous data, we have applied a novel memory-aware technique to BLAST for faster parallel processing of such data. Method: We have used a master-worker model for the processing of voluminous data alongside a memory-aware technique in which the master partitions the whole data into equal chunks, one chunk for each worker, and each worker then further splits and formats its allocated data chunk according to the size of its memory. Each worker searches every split one by one against a list of queries. Results: We have chosen a list of queries with different lengths to run insensitive searches in a huge database called UniProtKB/TrEMBL. Our experiments show a 20 percent improvement in performance when workers used our proposed memory-aware technique compared to when they were not memory aware. Comparatively, experiments show an even higher performance improvement, approximately 50 percent, when we applied our memory-aware technique to mpiBLAST. Conclusion: We have shown that memory awareness in formatting bulky databases, when running BLAST, can improve performance significantly, while preventing unexpected crashes in low-memory environments. Even though distributed computing attempts to mitigate search time by partitioning and distributing database portions, our memory-aware technique alleviates the negative effects of page faults on performance.
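The two-level partitioning in the Method part can be sketched as follows. This is a minimal single-process illustration under assumptions: the function names are hypothetical, record size is measured by sequence length rather than bytes, and the greedy splitting rule is one plausible choice, not necessarily the authors' scheme.

```python
def partition_equally(records, n_workers):
    """Master step: split records into n_workers contiguous, near-equal chunks."""
    base, extra = divmod(len(records), n_workers)
    chunks, start = [], 0
    for w in range(n_workers):
        size = base + (1 if w < extra else 0)  # spread the remainder
        chunks.append(records[start:start + size])
        start += size
    return chunks


def split_by_memory(chunk, memory_budget, size_of=len):
    """Worker step: greedily cut a chunk into pieces that fit the budget.

    A single record larger than the budget still gets a piece of its own.
    """
    pieces, current, used = [], [], 0
    for rec in chunk:
        s = size_of(rec)
        if current and used + s > memory_budget:
            pieces.append(current)
            current, used = [], 0
        current.append(rec)
        used += s
    if current:
        pieces.append(current)
    return pieces


# Toy database of 7 sequences shared by 2 workers with different budgets
records = ["AAAA", "CC", "GGGGGG", "T", "ACGT", "GG", "TTT"]
chunks = partition_equally(records, n_workers=2)
pieces_w0 = split_by_memory(chunks[0], memory_budget=8)
pieces_w1 = split_by_memory(chunks[1], memory_budget=4)
```

Each worker would then format and search its pieces one at a time, so that the piece currently being searched fits in that worker's memory.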


Predicting bacteriophage hosts based on sequences of annotated receptor-binding proteins

Scientific Reports ◽

10.1038/s41598-021-81063-4 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Dimitri Boeckaerts ◽

Michiel Stock ◽

Bjorn Criel ◽

Hans Gerstmans ◽

Bernard De Baets ◽

...

Keyword(s):

Machine Learning ◽

Predictive Model ◽

Receptor Binding ◽

Bacterial Infections ◽

Sequence Data ◽

Sequence Similarity ◽

Area Under The Curve ◽

Local Alignment ◽

Search Tool ◽

Different Levels

Abstract Nowadays, bacteriophages are increasingly considered as an alternative treatment for a variety of bacterial infections in cases where classical antibiotics have become ineffective. However, characterizing the host specificity of phages remains a labor- and time-intensive process. In order to alleviate this burden, we have developed a new machine-learning-based pipeline to predict bacteriophage hosts based on annotated receptor-binding protein (RBP) sequence data. We focus on predicting bacterial hosts from the ESKAPE group, Escherichia coli, Salmonella enterica and Clostridium difficile. We compare the performance of our predictive model with that of the widely used Basic Local Alignment Search Tool (BLAST). Our best-performing predictive model reaches Precision-Recall Area Under the Curve (PR-AUC) scores between 73.6 and 93.8% for different levels of sequence similarity in the collected data. Our model reaches a performance comparable to that of BLASTp when sequence similarity in the data is high and starts outperforming BLASTp when sequence similarity drops below 75%. Therefore, our machine learning methods can be especially useful in settings in which sequence similarity to other known sequences is low. Predicting the hosts of novel metagenomic RBP sequences could extend our toolbox to tune the host spectrum of phages or phage tail-like bacteriocins by swapping RBPs.


Kelch 13-propeller polymorphisms in Plasmodium falciparum from Jazan region, southwest Saudi Arabia

Malaria Journal ◽

10.1186/s12936-020-03467-3 ◽

2020 ◽

Vol 19 (1) ◽

Author(s):

Ommer Mohammed Dafalla ◽

Mohammed Alzahrani ◽

Ahmed Sahli ◽

Mohammed Abdulla Al Helal ◽

Mohammad Mohammad Alhazmi ◽

...

Keyword(s):

Plasmodium Falciparum ◽

Saudi Arabia ◽

Parasite Clearance ◽

Sub Saharan Africa ◽

Local Alignment ◽

Nucleotide Polymorphisms ◽

Gene Encoding ◽

Synonymous Mutations ◽

Search Tool ◽

Sub Saharan

Abstract Background: Artemisinin-based combination therapy (ACT) is recommended at the initial phase for treatment of Plasmodium falciparum, to reduce morbidity and mortality in all countries where malaria is endemic. Polymorphism in portions of the P. falciparum gene encoding kelch (K13)-propeller domains is associated with delayed parasite clearance after ACT. Of about 124 different non-synonymous mutations, 46 have been identified in Southeast Asia (SEA), 62 in sub-Saharan Africa (SSA) and 16 in both regions. This is the first study designed to analyse the prevalence of polymorphism in the P. falciparum k13-propeller domain in the Jazan region of southwest Saudi Arabia, where malaria is endemic. Methods: One hundred and forty P. falciparum samples were collected from the Jazan region of southwest Saudi Arabia at three different times: 20 samples in 2011, 40 samples in 2016 and 80 samples in 2020 after the implementation of ACT. Plasmodium falciparum kelch13 (k13) gene DNA was extracted, amplified, sequenced, and analysed using the Basic Local Alignment Search Tool (BLAST). Results: This study obtained 51 non-synonymous (NS) mutations in three time groups, divided as follows: 6 single nucleotide polymorphisms (SNPs) (11.8%) in samples collected in 2011 only, 3 (5.9%) in 2011 and 2016, 5 (9.8%) in 2011 and 2020, 5 (9.8%) in 2016 only, 8 (15.7%) in 2016 and 2020, 14 (27.5%) in 2020 and 10 (19.6%) in all the groups. BLAST revealed that the 2011 isolates were genetically closer to African isolates (53.3%) than to Asian ones (46.7%). Interestingly, this proportion changed completely in 2020, when the isolates were closer to Asian isolates (81.6%) than to African ones (18.4%). Conclusions: Despite the diversity of the identified mutations in the k13-propeller gene, these data did not report widespread artemisinin-resistant polymorphisms in the Jazan region where these samples were collected.
Such a process would be expected to increase frequencies of mutations associated with resistance to ACT.


Fish Identification based on Partial Fragments of The Mitochondrial COI Subunit I Gene

E3S Web of Conferences ◽

10.1051/e3sconf/202132201038 ◽

2021 ◽

Vol 322 ◽

pp. 01038

Author(s):

Tuah N. M. Wulandari

Keyword(s):

Phylogenetic Tree ◽

Coi Gene ◽

Local Alignment ◽

Mtdna Sequences ◽

Mitochondrial Coi ◽

Search Tool ◽

Very High ◽

I Gene

The mtDNA sequences revealed that several of the fish studied were Hampala macrolepidota and Barbonymus gonionotus. The objective of this research was to determine the pattern of the COI gene in mtDNA and establish a phylogenetic tree. Basic Local Alignment Search Tool-nucleotide (BLASTn) confirmed that Barbonymus gonionotus from Ranau Lake, South Sumatera, has a 100% match to the species from Memberamo River (Indonesia), India, Bangladesh, Thailand (Mae Khlong), Indo-Myanmar, and Malaysia_1. The lowest similarity (98.76%) is to the species from Thailand (Lower Ing). The BLAST analysis shows that the level of similarity was very high, reaching 98-100% in Barbonymus gonionotus. Hampala macrolepidota had a 100% match to the species from Indonesia (South Sumatera_1) and Vietnam, and 99.05%-99.84% similarity to those from Malaysia_1,2&3 and Indonesia (South Sumatera_2&3, Java and Bali_1,2&3).
