The Influence of Memory-Aware Computation on Distributed BLAST

Background: One of the pivotal challenges in genomic research today is the fast processing of voluminous data, such as that generated by high-throughput Next-Generation Sequencing technologies. However, BLAST (Basic Local Alignment Search Tool), a long-established and renowned tool in bioinformatics, has been shown to be very slow in this regard. Objective: To improve the performance of BLAST on voluminous data, we applied a novel memory-aware technique to BLAST for faster parallel processing. Method: We used a master-worker model together with a memory-aware technique in which the master partitions the whole data set into equal chunks, one chunk per worker, and each worker then further splits and formats its allocated chunk according to the size of its memory. Each worker searches its splits one at a time against a list of queries. Results: We chose a list of queries of different lengths to run insensitive searches against a huge database, UniProtKB/TrEMBL. Our experiments show a 20 percent improvement in performance when workers used the proposed memory-aware technique compared to when they were not memory-aware. The improvement was even higher, approximately 50 percent, when we applied the memory-aware technique to mpiBLAST. Conclusion: We have shown that memory-awareness when formatting a bulky database for BLAST can improve performance significantly while preventing unexpected crashes in low-memory environments. Although distributed computing already reduces search time by partitioning and distributing database portions, our memory-aware technique additionally alleviates the negative effects of page faults on performance.
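The two-level partitioning described in the Method can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the function names, the use of plain byte counts, and the assumption that a split fits in memory whenever its size does not exceed the worker's free memory are all hypothetical simplifications (the abstract does not say how formatted-database overhead is accounted for).

```python
from typing import List


def partition_equal(total_bytes: int, n_workers: int) -> List[int]:
    """Master step: divide the database into near-equal chunks, one per worker."""
    base, extra = divmod(total_bytes, n_workers)
    # The first `extra` workers get one additional byte so the sizes sum to the total.
    return [base + (1 if i < extra else 0) for i in range(n_workers)]


def split_for_memory(chunk_bytes: int, free_memory_bytes: int) -> List[int]:
    """Worker step: split a chunk into pieces that each fit in free memory.

    Assumption (hypothetical): a piece "fits" if its size is at most
    free_memory_bytes; real formatting overhead is ignored.
    """
    if free_memory_bytes <= 0:
        raise ValueError("free_memory_bytes must be positive")
    n_splits = -(-chunk_bytes // free_memory_bytes)  # ceiling division
    if n_splits == 0:
        return []
    base, extra = divmod(chunk_bytes, n_splits)
    return [base + (1 if i < extra else 0) for i in range(n_splits)]


def plan(total_bytes: int, worker_memory: List[int]) -> List[List[int]]:
    """Return, for each worker, the sizes of the splits it would search in turn."""
    chunks = partition_equal(total_bytes, len(worker_memory))
    return [split_for_memory(c, m) for c, m in zip(chunks, worker_memory)]


# Two workers share a 100-byte "database"; the first has less free memory,
# so it searches its chunk as two smaller splits.
layout = plan(100, [30, 60])
print(layout)
```

Each worker would then run its query list against one split at a time, so that the working set stays within memory and page faults are avoided.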

Detecting somatic mutations in genomic sequences by means of Kolmogorov–Arnold analysis

Royal Society Open Science ◽

10.1098/rsos.150143 ◽

2015 ◽

Vol 2 (8) ◽

pp. 150143 ◽

Cited By ~ 3

Author(s):

V. G. Gurzadyan ◽

H. Yan ◽

G. Vlahovic ◽

A. Kashin ◽

P. Killela ◽

...

Keyword(s):

Clinical Diagnostics ◽

Genomic Research ◽

Genomic Sequences ◽

Sequencing Data ◽

Sequencing Technologies ◽

Cancer Genome Sequencing ◽

Frequent Mutations ◽

Using Data ◽

First Time ◽

Generation Sequencing

The Kolmogorov–Arnold stochasticity parameter technique is applied for the first time to the study of cancer genome sequencing, to reveal mutations. Using data generated by next-generation sequencing technologies, we have analysed the exome sequences of brain tumour patients with matched tumour and normal blood. We show that mutations contained in sequencing data can be revealed using this technique, thus providing a new methodology for determining subsequences of given length containing mutations, i.e. its value differs from those of subsequences without mutations. A potential application for this technique involves simplifying the procedure of finding segments with mutations, speeding up genomic research and accelerating its implementation in clinical diagnostics. Moreover, the prediction of a mutation associated with a family of frequent mutations in numerous types of cancers based purely on the value of the Kolmogorov function indicates that this applied marker may recognize genomic sequences that are in extremely low abundance and can be used in revealing new types of mutations.

Seqs-Extractor: Automated sequences extraction to reduce tedious manual corrections of large datasets

10.7287/peerj.preprints.3364v1 ◽

2017 ◽

Author(s):

Patrick D C Pereira ◽

Cleyssian Dias ◽

Mauro A D Melo ◽

Nara G M Magalhães ◽

Cristovam G Diniz ◽

...

Keyword(s):

Large Datasets ◽

Local Alignment ◽

Large Numbers ◽

Analytical Work ◽

Search Tool ◽

Next Generation Sequencing Ngs ◽

Ngs Data ◽

Generation Sequencing ◽

Manual Correction ◽

Reducing Potential

The analysis of large numbers of sequences requires the reduction of ambiguities during the analytical work to ensure that the effort focuses only on confirmed sequences. Performing this work automatically may help to minimize potential errors associated with tedious manual correction, allowing more effective results. The basic local alignment search tool (BLAST) seems to be the most widely used sequence analysis program. It is free, but commercial parties have enhanced BLAST applications and charge a fee for their use. Some public-domain tools can search for microsatellites in next generation sequencing (NGS) data, such as the microsatellite identification tool (MISA), which has features for discovering microsatellites in large datasets. Here, we developed a basic shell script (BASH script), to be run under a Linux environment, that extracts from a sequence dataset only confirmed (BLASTed) sequences from both nucleotide (BLASTN) and protein (BLASTX) databases, and extracts sequences that contain microsatellites using the MISA tool, with a friendly interface and no fees charged. Seqs-Extractor is a helpful tool that may enhance the analysis of large datasets in BLAST+ and MISA by minimizing management time and reducing potential errors caused by manipulating data, at no charge. Seqs-Extractor is available at https://github.com/patrick-douglas/Seqs-Extractor/wiki .

Bringing Next-Generation Sequencing into the Classroom through a Comparison of Molecular Biology Techniques

The American Biology Teacher ◽

10.1525/abt.2014.76.6.7 ◽

2014 ◽

Vol 76 (6) ◽

pp. 396-401 ◽

Cited By ~ 4

Author(s):

Bethany Bowling ◽

Erin Zimmer ◽

Robert E. Pyatt

Keyword(s):

Molecular Biology ◽

Genomic Research ◽

Classroom Activity ◽

Next Generation ◽

Sequencing Technologies ◽

Nextgen Sequencing ◽

Interactive Classroom ◽

Biotechnology Companies ◽

High Degree ◽

Generation Sequencing

Although the development of next-generation (NextGen) sequencing technologies has revolutionized genomic research and medicine, the incorporation of these topics into the classroom is challenging, given an implied high degree of technical complexity. We developed an easy-to-implement, interactive classroom activity investigating the similarities and differences between current sequencing methodology and three NextGen technologies. The activity uses existing materials created by each of the biotechnology companies that outline their instrumentation and chemistries. Following this activity, students will understand the molecular biology behind these NextGen applications and the similarities to existing Sanger sequencing methods.

Local Alignment Tool Based on Hadoop Framework and GPU Architecture

BioMed Research International ◽

10.1155/2014/541490 ◽

2014 ◽

Vol 2014 ◽

pp. 1-7 ◽

Cited By ~ 7

Author(s):

Che-Lun Hung ◽

Guan-Jie Hua

Keyword(s):

High Availability ◽

Local Alignment ◽

Sequencing Technologies ◽

Computational Performance ◽

Huge Data ◽

Alignment Tool ◽

Gpu Architectures ◽

Gpu Architecture ◽

Generation Sequencing ◽

Hadoop Framework

With the rapid growth of next generation sequencing technologies, such as Slex, more and more data have been discovered and published. To analyze such huge data, computational performance is an important issue. Recently, many tools, such as SOAP, have been implemented on Hadoop and GPU parallel computing architectures. BLASTP is an important tool, implemented on GPU architectures, for biologists to compare protein sequences. When dealing with big biological data, it is hard to rely on a single GPU. Therefore, we implement a distributed BLASTP by combining Hadoop and multiple GPUs. The experimental results show that the proposed method improves on the performance of BLASTP on a single GPU, and that it also achieves high availability and fault tolerance.

Seqs-Extractor: Automated sequences extraction to reduce tedious manual corrections of large datasets

10.7287/peerj.preprints.3364 ◽

2017 ◽

Author(s):

Patrick D C Pereira ◽

Cleyssian Dias ◽

Mauro A D Melo ◽

Nara G M Magalhães ◽

Cristovam G Diniz ◽

...

Keyword(s):

Large Datasets ◽

Local Alignment ◽

Large Numbers ◽

Analytical Work ◽

Search Tool ◽

Next Generation Sequencing Ngs ◽

Ngs Data ◽

Generation Sequencing ◽

Manual Correction ◽

Reducing Potential

MPI-blastn and NCBI-TaxCollector: Improving metagenomic analysis with high performance classification and wide taxonomic attachment

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720014500139 ◽

2014 ◽

Vol 12 (03) ◽

pp. 1450013 ◽

Cited By ~ 6

Author(s):

R. Dias ◽

M. G. Xavier ◽

F. D. Rossi ◽

M. V. Neves ◽

T. A. P. Lange ◽

...

Keyword(s):

High Performance ◽

Parallel Implementation ◽

Metagenomic Analysis ◽

Local Alignment ◽

Metagenomic Sequencing ◽

Multiple Sequence ◽

Genetic Sequencing ◽

Sequencing Technologies ◽

Sequence Search ◽

Search Tool

Metagenomic sequencing technologies are advancing rapidly and the size of output data from high-throughput genetic sequencing has increased substantially over the years. This brings us to a scenario where advanced computational optimizations are requested to perform a metagenomic analysis. In this paper, we describe a new parallel implementation of nucleotide BLAST (MPI-blastn) and a new tool for taxonomic attachment of Basic Local Alignment Search Tool (BLAST) results that supports the NCBI taxonomy (NCBI-TaxCollector). MPI-blastn obtained a high performance when compared to the mpiBLAST and ScalaBLAST. In our best case, MPI-blastn was able to run 408 times faster in 384 cores. Our evaluations demonstrated that NCBI-TaxCollector is able to perform taxonomic attachments 125 times faster and needs 120 times less RAM than the previous TaxCollector. Through our optimizations, a multiple sequence search that currently takes 37 hours can be performed in less than 6 min and a post processing with NCBI taxonomic data attachment, which takes 48 hours, now is able to run in 23 min.

Next-Generation Sequencing: An Emerging Tool for Drug Designing

Current Pharmaceutical Design ◽

10.2174/1381612825666190911155508 ◽

2019 ◽

Vol 25 (31) ◽

pp. 3350-3357 ◽

Cited By ~ 1

Author(s):

Pooja Tripathi ◽

Jyotsna Singh ◽

Jonathan A. Lal ◽

Vijay Tripathi

Keyword(s):

Next Generation Sequencing ◽

High Throughput ◽

Large Scale ◽

Massively Parallel Sequencing ◽

Genomic Research ◽

Biological Research ◽

Next Generation ◽

Human Welfare ◽

Drug Designing ◽

Generation Sequencing

Background: With the outbreak of high throughput next-generation sequencing (NGS), the biological research of drug discovery has been directed towards the oncology and infectious disease therapeutic areas, with extensive use in biopharmaceutical development and vaccine production. Method: In this review, an effort was made to address the basic background of NGS technologies and the potential applications of NGS in drug designing. Our purpose is also to provide a brief introduction to various next-generation sequencing techniques. Discussions: The high-throughput methods execute Large-scale Unbiased Sequencing (LUS), which comprises Massively Parallel Sequencing (MPS) or NGS technologies. These are related terms that describe a DNA sequencing technology which has revolutionized genomic research. Using NGS, an entire human genome can be sequenced within a single day. Conclusion: Analysis of NGS data unravels important clues in the quest for the treatment of various life-threatening diseases and other scientific problems related to human welfare.

Clinical Implications of Polymicrobial Synergism Effects on Antimicrobial Susceptibility

Pathogens ◽

10.3390/pathogens10020144 ◽

2021 ◽

Vol 10 (2) ◽

pp. 144

Author(s):

William Little ◽

Caroline Black ◽

Allie Clinton Smith

Keyword(s):

Antimicrobial Susceptibility ◽

Chronic Wounds ◽

Clinical Laboratory ◽

Patient Treatment ◽

Clinical Implications ◽

Clinical Environment ◽

Tolerance Mechanisms ◽

Sequencing Technologies ◽

Generation Sequencing ◽

Polymicrobial Infections

With the development of next generation sequencing technologies in recent years, it has been demonstrated that many human infectious processes, including chronic wounds, cystic fibrosis, and otitis media, are associated with a polymicrobial burden. Research has also demonstrated that polymicrobial infections tend to be associated with treatment failure and worse patient prognoses. Despite the importance of the polymicrobial nature of many infection states, the current clinical standard for determining antimicrobial susceptibility in the clinical laboratory is exclusively performed on unimicrobial suspensions. There is a growing body of research demonstrating that microorganisms in a polymicrobial environment can synergize their activities associated with a variety of outcomes, including changes to their antimicrobial susceptibility through both resistance and tolerance mechanisms. This review highlights the current body of work describing polymicrobial synergism, both inter- and intra-kingdom, impacting antimicrobial susceptibility. Given the importance of polymicrobial synergism in the clinical environment, a new system of determining antimicrobial susceptibility from polymicrobial infections may significantly impact patient treatment and outcomes.

Predicting bacteriophage hosts based on sequences of annotated receptor-binding proteins

Scientific Reports ◽

10.1038/s41598-021-81063-4 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Dimitri Boeckaerts ◽

Michiel Stock ◽

Bjorn Criel ◽

Hans Gerstmans ◽

Bernard De Baets ◽

...

Keyword(s):

Machine Learning ◽

Predictive Model ◽

Receptor Binding ◽

Bacterial Infections ◽

Sequence Data ◽

Sequence Similarity ◽

Area Under The Curve ◽

Local Alignment ◽

Search Tool ◽

Different Levels

Nowadays, bacteriophages are increasingly considered as an alternative treatment for a variety of bacterial infections in cases where classical antibiotics have become ineffective. However, characterizing the host specificity of phages remains a labor- and time-intensive process. In order to alleviate this burden, we have developed a new machine-learning-based pipeline to predict bacteriophage hosts based on annotated receptor-binding protein (RBP) sequence data. We focus on predicting bacterial hosts from the ESKAPE group, Escherichia coli, Salmonella enterica and Clostridium difficile. We compare the performance of our predictive model with that of the widely used Basic Local Alignment Search Tool (BLAST). Our best-performing predictive model reaches Precision-Recall Area Under the Curve (PR-AUC) scores between 73.6 and 93.8% for different levels of sequence similarity in the collected data. Our model reaches a performance comparable to that of BLASTp when sequence similarity in the data is high and starts outperforming BLASTp when sequence similarity drops below 75%. Therefore, our machine learning methods can be especially useful in settings in which sequence similarity to other known sequences is low. Predicting the hosts of novel metagenomic RBP sequences could extend our toolbox to tune the host spectrum of phages or phage tail-like bacteriocins by swapping RBPs.

Kelch 13-propeller polymorphisms in Plasmodium falciparum from Jazan region, southwest Saudi Arabia

Malaria Journal ◽

10.1186/s12936-020-03467-3 ◽

2020 ◽

Vol 19 (1) ◽

Author(s):

Ommer Mohammed Dafalla ◽

Mohammed Alzahrani ◽

Ahmed Sahli ◽

Mohammed Abdulla Al Helal ◽

Mohammad Mohammad Alhazmi ◽

...

Keyword(s):

Plasmodium Falciparum ◽

Saudi Arabia ◽

Parasite Clearance ◽

Sub Saharan Africa ◽

Local Alignment ◽

Nucleotide Polymorphisms ◽

Gene Encoding ◽

Synonymous Mutations ◽

Search Tool ◽

Sub Saharan

Background: Artemisinin-based combination therapy (ACT) is recommended at the initial phase for treatment of Plasmodium falciparum, to reduce morbidity and mortality in all countries where malaria is endemic. Polymorphism in portions of the P. falciparum gene encoding kelch (K13)-propeller domains is associated with delayed parasite clearance after ACT. Of about 124 different non-synonymous mutations, 46 have been identified in Southeast Asia (SEA), 62 in sub-Saharan Africa (SSA) and 16 in both regions. This is the first study designed to analyse the prevalence of polymorphism in the P. falciparum k13-propeller domain in the Jazan region of southwest Saudi Arabia, where malaria is endemic. Methods: One hundred and forty P. falciparum samples were collected from the Jazan region of southwest Saudi Arabia at three different times: 20 samples in 2011, 40 samples in 2016 and 80 samples in 2020 after the implementation of ACT. Plasmodium falciparum kelch13 (k13) gene DNA was extracted, amplified, sequenced, and analysed using a basic local alignment search tool (BLAST). Results: This study obtained 51 non-synonymous (NS) mutations in three time groups, divided as follows: 6 single nucleotide polymorphisms (SNPs) (11.8%) in samples collected in 2011 only, 3 (5.9%) in 2011 and 2016, 5 (9.8%) in 2011 and 2020, 5 (9.8%) in 2016 only, 8 (15.7%) in 2016 and 2020, 14 (27.5%) in 2020 and 10 (19.6%) in all the groups. The BLAST revealed that the 2011 isolates were genetically closer to African isolates (53.3%) than Asian ones (46.7%). Interestingly, this proportion changed completely in 2020, to become closer to Asian isolates (81.6%) than to African ones (18.4%). Conclusions: Despite the diversity of the identified mutations in the k13-propeller gene, these data did not report widespread artemisinin-resistant polymorphisms in the Jazan region where these samples were collected. Such a process would be expected to increase frequencies of mutations associated with the resistance of ACT.
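As a quick consistency check on the Results figures, the seven group counts reported in the abstract can be tallied and converted to percentages. The sketch below only re-derives numbers already stated in the abstract; the group labels are paraphrased, and the variable names are illustrative.

```python
# Non-synonymous mutation counts per time group, as reported in the abstract.
counts = {
    "2011 only": 6,
    "2011 and 2016": 3,
    "2011 and 2020": 5,
    "2016 only": 5,
    "2016 and 2020": 8,
    "2020": 14,
    "all groups": 10,
}

total = sum(counts.values())

# Share of each group among all mutations, rounded to one decimal place.
percentages = {group: round(100 * n / total, 1) for group, n in counts.items()}

print(total)
for group, pct in percentages.items():
    print(f"{group}: {pct}%")
```

The counts sum to the 51 mutations stated in the abstract, and the rounded shares match the reported percentages.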
