Not just BLAST nt: WGS database joins the party

2019 ◽  
Author(s):  
Jose Manuel Martí ◽  
Carlos P. Garay

Since its introduction in 1990, and with over 50k citations, the NCBI BLAST family has been an essential tool of in silico molecular biology. The BLAST nt database, based on the traditional divisions of GenBank, has been the default and most comprehensive database for nucleotide BLAST searches and for taxonomic classification software in metagenomics. Here we argue that this is no longer the case. Currently, the NCBI WGS database contains one billion reads (almost five times more than GenBank), and with 4.4 trillion nucleotides, WGS has about 14 times more nucleotides than GenBank. This ratio is growing with time. We advocate a change in the database paradigm for taxonomic classification: systematically combining the nt and WGS databases in order to boost the sensitivity of taxonomic classifiers. We present a case in which adding WGS data yielded over five times more classified reads, with higher confidence scores. To facilitate the adoption of this approach, we provide the draftGenomes script.

Author summary: Culture-independent methods are revolutionizing biology. The NIH/NCBI Basic Local Alignment Search Tool (BLAST) is one of the most widely used methods in computational biology, and the BLAST nt database has become a de facto standard for taxonomic classifiers in metagenomics. We believe that it is time for a change in the database paradigm for such classification. We advocate systematically combining the BLAST nt database with genomes from the massive NCBI Whole-Genome Shotgun (WGS) database, and we make available draftGenomes, a script that eases the adoption of this approach. Current developments and technologies now make this feasible. Our recent results in several metagenomic projects indicate that this strategy boosts the sensitivity of taxonomic classification.
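
A minimal Python sketch of the combined-database idea, assuming BLAST hits from an nt search and a WGS search have already been parsed into (read, taxid, bitscore) records: pooling both hit sets and keeping the best-scoring hit per read means that a read matching only a WGS draft genome becomes classifiable. The `Hit` type, the `best_hits` function and the data are hypothetical; this is not the draftGenomes script, and real classifiers often assign reads by lowest common ancestor rather than by a single best hit.

```python
from typing import Dict, Iterable, NamedTuple


class Hit(NamedTuple):
    """One alignment hit for a query read (hypothetical parsed form)."""
    read_id: str
    taxid: int
    bitscore: float


def best_hits(*hit_sets: Iterable[Hit]) -> Dict[str, Hit]:
    """Keep the highest-bitscore hit per read across all databases searched."""
    best: Dict[str, Hit] = {}
    for hits in hit_sets:
        for hit in hits:
            current = best.get(hit.read_id)
            if current is None or hit.bitscore > current.bitscore:
                best[hit.read_id] = hit
    return best


# Hypothetical hits: read "r2" only matches a WGS draft genome.
nt_hits = [Hit("r1", 562, 180.0)]
wgs_hits = [Hit("r1", 562, 175.0), Hit("r2", 1280, 220.0)]

print(len(best_hits(nt_hits)))            # → 1 (reads classified with nt alone)
print(len(best_hits(nt_hits, wgs_hits)))  # → 2 (reads classified with nt + WGS)
```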

Leonardo ◽  
2005 ◽  
Vol 38 (4) ◽  
pp. 286-293 ◽  
Author(s):  
Ruth West ◽  
Jeff Burke ◽  
Cheryl Kerfeld ◽  
Eitan Mendelowitz ◽  
Thomas Holton ◽  
...  

Ecce Homology, a physically interactive new-media work, visualizes genetic data as calligraphic forms. A novel computer-vision user interface allows multiple participants, through their movement in the installation space, to select genes from the human genome for visualizing the Basic Local Alignment Search Tool (BLAST), a primary algorithm in comparative genomics. Ecce Homology was successfully installed in the UCLA Fowler Museum, 6 November 2003–4 January 2004.


Author(s):  
Agus Hartoko ◽  
Delianis Pringgenies ◽  
Amelia Cahya Anggelina ◽  
Takashi Matsuishi

Aims: To describe the morphology and molecular biology of a rare shark ray, Rhina ancylostoma, caught in the Java Sea, Indonesia. Study Design: Morphology, morphometry and DNA analysis of the fish specimen. Place and Duration of Study: Samples were collected from a fish auction hall in north Java between March and December 2019; the specimen is now stored in the Laboratory of Ichthyology, Department of Fisheries, Faculty of Fisheries and Marine Science, Diponegoro University. Methodology: DNA extraction, amplification and molecular identification of the fish sample. The DNA supernatant was transferred to an Eppendorf tube and stored at 4 °C for further processing. Part of the mitochondrial COI (cytochrome c oxidase subunit I) gene was amplified by polymerase chain reaction (PCR) and then matched against the NCBI GenBank database using the Basic Local Alignment Search Tool (BLAST). A phylogenetic tree was built using the Neighbor-Joining method, the Kimura 2-parameter model and 1,000 bootstrap replicates, with several GenBank sequences included for comparison. Results: Rhina ancylostoma is one of the demersal shark rays caught in the shallow tropical Java Sea. Total length (TL) of the catch ranged from 73 to 200 cm, with a dominant TL of 125 cm. Morphologically, the species is characterized by three rows of thorny spines on the head; morphometrically, by a head-width to TL ratio of 0.77. DNA analysis confirmed specimen AH2 as Rhina ancylostoma based on a homology match to GenBank reference accession KU721837.1, with a sequence length of 665 bp and 99.84% identity (specimen accession number LC505461). Conclusion: Rhina ancylostoma is one of the demersal shark rays caught in the Java Sea. TL of the catch ranged from 73 to 200 cm, with a dominant TL range of 101–125 cm. The species is morphologically characterized by its head-width ratio and three rows of thorny spines on the head. The nearest genetic distances were 0.02 to Rhincobatus horkelii and 0.017 to R. australiae; the longest was 0.243 to Potamotrygon motoro.
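
The genetic distances quoted above were computed under the Kimura 2-parameter (K2P) model. As an illustrative Python sketch (not the software used in the study), the K2P distance between two aligned COI sequences follows from the proportion of transitions, P (A↔G, C↔T), and of transversions, Q: d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q). The function name and toy sequences are hypothetical; in this sketch, sites with gaps or ambiguous bases are simply skipped, and the formula is undefined (a ValueError is raised) when sequences are too divergent for the logarithms.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura 2-parameter distance between two aligned, equal-length sequences.

    Columns containing gaps or ambiguous bases are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    valid = PURINES | PYRIMIDINES
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in valid or b not in valid:
            continue  # ignore gaps and ambiguity codes
        sites += 1
        if a == b:
            continue
        # A<->G or C<->T is a transition; any purine<->pyrimidine change is a transversion
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites    # proportion of transitions (P)
    q = transversions / sites  # proportion of transversions (Q)
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


# One A->G transition in ten sites.
print(round(k2p_distance("ACGTACGTAC", "ACGTGCGTAC"), 4))  # → 0.1116
```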


2018 ◽  
Vol 56 (6) ◽  
pp. 408-412 ◽  
Author(s):  
Landry Nfonsam ◽  
Shelley Ordorica ◽  
Mahdi Ghani ◽  
Ryan Potter ◽  
Audrey Schaffer ◽  
...  

Background: Advances in molecular technologies and in-silico variant prediction tools offer wide-ranging opportunities in diagnostic settings, yet they also have significant limitations. Objective: Here, we contextualise the limitations of next-generation sequencing (NGS), multiplex ligation-dependent probe amplification (MLPA) and in-silico prediction tools routinely used by diagnostic laboratories by reviewing specific experiences from our diagnostic laboratory. Methods: We investigated discordant annotations and/or incorrect variant ‘callings’ in exons of 56 genes constituting our cardiomyopathy and connective tissue disorder NGS panels. To identify regions of high homology, discordant variants were queried using the National Center for Biotechnology Information (NCBI) Basic Local Alignment Search Tool, and segmental duplications (SD) using the University of California Santa Cruz genome browser. Discrepant variant analyses by in-silico models were re-evaluated using updated file entries. Results: We observed a 5% error rate in MYH7 variant ‘calling’ using MLPA, which resulted from >90% homology of the MYH7 probe-binding site to MYH6. SDs were detected in TTN, PKP2 and MYLK; SDs in MYLK presented the highest risk (15.7%) of incorrect variant ‘calling’. The inaccurate ‘callings’ and discrepant in-silico predictions were resolved following detailed investigation into the source of error. Conclusion: Recognising the limitations described here may help avoid incorrect diagnoses and leverage the power of new molecular technologies in diagnostic settings.
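
The >90% homology figure is a percent-identity measure over aligned probe-binding sites. A minimal Python sketch, assuming the two sites are already aligned; the function and the 20-nt sequences are hypothetical, not the actual MYH7/MYH6 probe sites. This version excludes gap columns from the denominator, whereas BLAST reports identity over the full alignment length including gaps, so the two numbers can differ.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # gap columns are excluded from the denominator
        compared += 1
        matches += a == b
    if compared == 0:
        raise ValueError("no aligned columns to compare")
    return 100.0 * matches / compared


# Hypothetical aligned probe-binding sites differing at one position.
site_a = "ACGTTGCAAGCTTGACCTGA"
site_b = "ACGTTGCAAGCTTGACCTGG"
identity = percent_identity(site_a, site_b)
print(identity)       # → 95.0
print(identity > 90)  # → True
```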


2009 ◽  
Vol 8 (4) ◽  
pp. 326-337 ◽  
Author(s):  
Joann M. Lau ◽  
David L. Robinson

With rapid advances in biotechnology and molecular biology, instructors are challenged not only to provide undergraduate students with hands-on experiences in these disciplines but also to engage them in the “real-world” scientific process. Two common topics covered in biotechnology or molecular biology courses are gene cloning and bioinformatics, but providing students with a continuous laboratory-based research experience in these techniques is difficult. To meet these challenges, we have partnered with Bio-Rad Laboratories in the development of the “Cloning and Sequencing Explorer Series,” which combines wet-lab experiences (e.g., DNA extraction, polymerase chain reaction, ligation, transformation, and restriction digestion) with bioinformatics analysis (e.g., evaluation of DNA sequence quality, sequence editing, Basic Local Alignment Search Tool searches, contig construction, intron identification, and six-frame translation) to produce a sequence publishable in the National Center for Biotechnology Information GenBank. This 6- to 8-wk project-based exercise focuses on a pivotal gene of glycolysis (glyceraldehyde-3-phosphate dehydrogenase): students isolate, sequence, and characterize the gene from a plant species or cultivar not yet published in GenBank. Student achievement was evaluated using pre-, mid-, and final-test assessments, as well as a survey to assess student perceptions. Student confidence with basic laboratory techniques and knowledge of bioinformatics tools increased significantly upon completion of this hands-on exercise.
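
Six-frame translation, one of the bioinformatics steps listed above, can be sketched in a few lines of Python. This is an illustrative version using the standard genetic code (NCBI translation table 1), not the software used in the Bio-Rad series; the function names are hypothetical, and conventions for labelling reverse-strand frames vary between tools.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate codon by codon; a trailing partial codon is dropped, unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_translation(seq: str) -> dict:
    """Translations of the three forward (+1..+3) and three reverse (-1..-3) frames."""
    fwd = seq.upper()
    rev = reverse_complement(fwd)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(fwd[offset:])
        frames[f"-{offset + 1}"] = translate(rev[offset:])
    return frames


print(six_frame_translation("ATGGCCTAA")["+1"])  # → MA*
```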


2020 ◽  
Vol 11 (2) ◽  
pp. 70-76
Author(s):  
Yustinus Maladan ◽  
Hana Krismawati ◽  
Hotma Martogi Laurensia Hutapea ◽  
Antonius Oktavian

Background: Leprosy is a disease caused by Mycobacterium leprae. Drug resistance is one of the challenges in leprosy elimination, especially in Papua. Mutations in the folP1 gene, which encodes dihydropteroate synthase (DHPS), have been considered the exclusive basis for molecular detection of dapsone resistance in leprosy. The objective of this study was to detect mutations in the folP1 gene of Mycobacterium leprae from Papua, Indonesia, and to analyze the effect of these mutations on dapsone using in-silico methods. Methods: Mutations in the M. leprae folP1 gene were identified through Basic Local Alignment Search Tool (BLAST) searches against GenBank. The effects of the mutations were analyzed with the Have (y)Our Protein Explained (HOPE) server. Binding pockets were predicted using the Computed Atlas of Surface Topography of proteins (CASTp). The 3D structure of DHPS was homology-modeled with the Iterative Threading ASSEmbly Refinement (I-TASSER) server. Docking analysis was performed using AutoDock Vina, integrated in the Python Prescription (PyRx) application. Results: Sequencing revealed a variation in the M. leprae folP1 gene: a thymine (T) to cytosine (C) change at nucleotide 143. The mutated residue (V48A) lies in a domain that is essential for protein activity and is in contact with residues in another domain; this interaction may be important for correct protein function. The V48A mutation did not significantly affect the stability of M. leprae DHPS. Conclusion: Based on molecular docking analysis, this mutation does not affect the binding affinity of dapsone for M. leprae dihydropteroate synthase, indicating that the V48A mutant is likely to remain susceptible to dapsone. An in vivo test is therefore needed to confirm the effect of the V48A mutation. Keywords: Mycobacterium leprae, folP1 gene, dihydropteroate synthase, dapsone
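
How a T-to-C change at nucleotide 143 becomes V48A can be checked with a short Python sketch, assuming position 143 is counted 1-based from the first base of the folP1 start codon. The abstract does not give the wild-type codon, so the sketch tries all four valine codons (GTN): a T-to-C change at the second codon position turns each into an alanine codon (GCN). The function names are hypothetical, and only the eight relevant codons of the standard genetic code are included.

```python
# Valine and alanine codons from the standard genetic code (subset, for illustration only).
CODONS = {
    "GTT": "V", "GTC": "V", "GTA": "V", "GTG": "V",
    "GCT": "A", "GCC": "A", "GCA": "A", "GCG": "A",
}


def codon_position(nucleotide: int) -> tuple:
    """Map a 1-based coding-sequence position to (codon number, position within codon)."""
    if nucleotide < 1:
        raise ValueError("positions are 1-based")
    return (nucleotide - 1) // 3 + 1, (nucleotide - 1) % 3 + 1


def substitute(codon: str, pos_in_codon: int, new_base: str) -> str:
    """Replace the base at a 1-based position within a codon."""
    i = pos_in_codon - 1
    return codon[:i] + new_base + codon[i + 1:]


codon_no, pos = codon_position(143)
print(codon_no, pos)  # → 48 2
for wild_type in ("GTT", "GTC", "GTA", "GTG"):
    mutant = substitute(wild_type, pos, "C")
    print(wild_type, CODONS[wild_type], "->", mutant, CODONS[mutant])
```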


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Jie-Xia Liu ◽  
Qian Jiang ◽  
Jian-Ping Tao ◽  
Kai Feng ◽  
Tong Li ◽  
...  

Water dropwort (Liyang Baiqin, Oenanthe javanica (Bl.) DC.) is an aquatic perennial plant of the Apiaceae family, rich in protein, dietary fiber, vitamins, and minerals. It usually grows in wet soils and can even grow in water. Here, we report the first whole-genome sequencing of O. javanica, performed with HiSeq 2000 technology. The genome size was 1.28 Gb, including 42,270 genes, of which 93.92% could be functionally annotated. An online database of the whole-genome sequences of water dropwort, Water dropwortDB, was established to share the results and facilitate further research on O. javanica (database homepage: http://apiaceae.njau.edu.cn/waterdropwortdb). Water dropwortDB offers whole-genome and transcriptome sequences and a Basic Local Alignment Search Tool. Comparative analysis with other species showed that, among the species compared, O. javanica is most closely related to Daucus carota. Twenty-five gene families of O. javanica were found to be expanded, and genetic factors (such as genes and miRNAs) related to phenotypic and anatomical differentiation in O. javanica under different water conditions were further investigated. Two miRNA/target gene pairs (miR408 and Oja15472, miR171 and Oja47040) were markedly regulated by water stress. The reference genome of O. javanica provides important information for future work, making in-depth genetic breeding and gene editing possible. The study also provides a foundation for understanding the O. javanica response to water stress, including morphological, anatomical, and genetic differentiation.


2019 ◽  
Vol 14 (2) ◽  
pp. 157-163
Author(s):  
Majid Hajibaba ◽  
Mohsen Sharifi ◽  
Saeid Gorgin

Background: One of the pivotal challenges in genomic research today is the fast processing of voluminous data, such as those generated by high-throughput next-generation sequencing technologies. BLAST (Basic Local Alignment Search Tool), a long-established and renowned bioinformatics tool, has proven to be very slow in this regard. Objective: To improve the performance of BLAST on voluminous data, we applied a novel memory-aware technique to BLAST for faster parallel processing. Method: We used a master-worker model alongside a memory-aware technique in which the master partitions the whole dataset into equal chunks, one chunk per worker, and each worker then further splits and formats its allocated chunk according to the size of its memory. Each worker searches every split, one by one, against a list of queries. Results: We chose a list of queries of different lengths to run insensitive searches against a huge database, UniProtKB/TrEMBL. Our experiments show a 20 percent performance improvement when workers used the proposed memory-aware technique compared with when they were not memory-aware. Experiments show an even higher improvement, approximately 50 percent, when we applied the memory-aware technique to mpiBLAST. Conclusion: We have shown that memory-awareness in formatting bulky databases when running BLAST can improve performance significantly while preventing unexpected crashes in low-memory environments. Whereas distributed computing mitigates search time by partitioning and distributing database portions, our memory-aware technique additionally alleviates the negative effects of page faults on performance.
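
The master-worker scheme described above can be sketched in Python as two steps: the master splits the database records into near-equal chunks, one per worker, and each worker then packs its chunk into pieces that fit a memory budget. This is a simplified illustration under stated assumptions, not the authors' implementation: chunks are balanced by record count, record size is measured in characters as a stand-in for bytes, and the function names and budget are hypothetical. In the described method, each piece would then be formatted as a BLAST database and searched against the query list one piece at a time.

```python
from typing import List, Sequence


def partition_equal(records: Sequence[str], n_workers: int) -> List[List[str]]:
    """Master step: split records into n_workers contiguous chunks of near-equal count."""
    if n_workers < 1:
        raise ValueError("need at least one worker")
    base, extra = divmod(len(records), n_workers)
    chunks, start = [], 0
    for w in range(n_workers):
        size = base + (1 if w < extra else 0)  # spread the remainder over the first workers
        chunks.append(list(records[start:start + size]))
        start += size
    return chunks


def split_by_memory(chunk: Sequence[str], memory_budget: int) -> List[List[str]]:
    """Worker step: pack a chunk into pieces whose total size fits the memory budget.

    A single record larger than the budget is kept alone in its own piece.
    """
    pieces: List[List[str]] = []
    current: List[str] = []
    used = 0
    for rec in chunk:
        if current and used + len(rec) > memory_budget:
            pieces.append(current)
            current, used = [], 0
        current.append(rec)
        used += len(rec)
    if current:
        pieces.append(current)
    return pieces


# Hypothetical database of six sequences of varying length.
db = ["A" * 40, "C" * 30, "G" * 50, "T" * 20, "A" * 60, "C" * 10]
for worker, chunk in enumerate(partition_equal(db, n_workers=2)):
    pieces = split_by_memory(chunk, memory_budget=80)
    print(worker, [[len(r) for r in piece] for piece in pieces])
# → 0 [[40, 30], [50]]
# → 1 [[20, 60], [10]]
```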


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Dimitri Boeckaerts ◽  
Michiel Stock ◽  
Bjorn Criel ◽  
Hans Gerstmans ◽  
Bernard De Baets ◽  
...  

Bacteriophages are increasingly considered an alternative treatment for a variety of bacterial infections in cases where classical antibiotics have become ineffective. However, characterizing the host specificity of phages remains a labor- and time-intensive process. To alleviate this burden, we have developed a new machine-learning-based pipeline that predicts bacteriophage hosts from annotated receptor-binding protein (RBP) sequence data. We focus on predicting bacterial hosts from the ESKAPE group, Escherichia coli, Salmonella enterica and Clostridium difficile. We compare the performance of our predictive model with that of the widely used Basic Local Alignment Search Tool (BLAST). Our best-performing model reaches precision-recall area under the curve (PR-AUC) scores between 73.6% and 93.8% for different levels of sequence similarity in the collected data. Its performance is comparable to that of BLASTp when sequence similarity in the data is high, and it starts outperforming BLASTp when sequence similarity drops below 75%. Our machine learning methods can therefore be especially useful in settings where sequence similarity to known sequences is low. Predicting the hosts of novel metagenomic RBP sequences could extend our toolbox to tune the host spectrum of phages or phage tail-like bacteriocins by swapping RBPs.
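
PR-AUC summarizes a precision-recall curve, and average precision (AP) is one common way to estimate it: rank predictions by score and average the precision observed at the rank of each true positive. The abstract does not state which estimator the authors used, so this Python sketch is generic; it assumes binary labels and no tied scores, and the function name and toy data are hypothetical. Under those assumptions it matches the step-wise quantity computed by scikit-learn's `average_precision_score`.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Mean of precision@k over the ranks k of the positive examples.

    Assumes binary labels (1 = true host, 0 = not) and no tied scores.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have equal length")
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for k, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            total += hits / k  # precision at this cut-off
    return total / n_pos


print(round(average_precision([1, 0, 1], [0.9, 0.8, 0.7]), 4))  # → 0.8333
```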


F1000Research ◽  
2016 ◽  
Vol 5 ◽  
pp. 2003 ◽  
Author(s):  
Michael P. Heaton ◽  
Timothy P.L. Smith ◽  
Jacky K. Carnahan ◽  
Veronica Basnayake ◽  
Jiansheng Qiu ◽  
...  

The availability of whole genome sequence (WGS) data has made it possible to discover protein variants in silico. However, existing bovine WGS databases do not present data in a form conducive to protein variant analysis, and tend to underrepresent the breadth of genetic diversity in global beef cattle. Thus, our first aim was to use 96 beef sires, sharing minimal pedigree relationships, to create a searchable and publicly viewable set of mapped genomes relevant for 19 popular breeds of U.S. cattle. Our second aim was to identify protein variants encoded by the bovine endothelial PAS domain-containing protein 1 gene (EPAS1), a gene associated with pulmonary hypertension in Angus cattle. The identity and quality of genomic sequences were verified by comparing WGS genotypes to those derived from other methods. The average read depth, genotype scoring rate, and genotype accuracy exceeded 14×, 99%, and 99%, respectively. The 96 genomes were used to discover four amino acid variants encoded by EPAS1 (E270Q, P362L, A671G, and L701F) and to confirm two variants previously associated with disease (A606T and G610S). The six EPAS1 missense mutations were verified with matrix-assisted laser desorption/ionization time-of-flight mass spectrometry assays, and their frequencies were estimated in a separate collection of 1154 U.S. cattle representing 46 breeds. A rooted phylogenetic tree of eight polypeptide sequences provided a framework for evaluating the likely order of mutations and the potential impact of EPAS1 alleles on the adaptive response to chronic hypoxia in U.S. cattle. This public whole-genome resource facilitates in silico identification of protein variants in diverse types of U.S. beef cattle, and provides a means of translating WGS data into a practical biological and evolutionary context for generating and testing hypotheses.
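
The abstract does not say how the variant frequencies were estimated; a common approach is simple allele counting from diploid genotype counts, frequency = (heterozygotes + 2 × homozygous alternates) / (2 × animals genotyped). A minimal Python sketch, with a hypothetical function name and hypothetical counts that are not data from the 1154-animal collection:

```python
def alt_allele_frequency(hom_ref: int, het: int, hom_alt: int) -> float:
    """Alternate-allele frequency from diploid genotype counts: (het + 2*hom_alt) / (2N)."""
    n = hom_ref + het + hom_alt
    if n == 0:
        raise ValueError("no genotyped animals")
    return (het + 2 * hom_alt) / (2 * n)


# Hypothetical genotype counts for one EPAS1 missense variant (not data from the study).
print(alt_allele_frequency(hom_ref=10, het=5, hom_alt=1))  # → 0.21875
```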


2020 ◽  
Vol 19 (1) ◽  
Author(s):  
Ommer Mohammed Dafalla ◽  
Mohammed Alzahrani ◽  
Ahmed Sahli ◽  
Mohammed Abdulla Al Helal ◽  
Mohammad Mohammad Alhazmi ◽  
...  

Background: Artemisinin-based combination therapy (ACT) is recommended for the initial treatment of Plasmodium falciparum malaria, to reduce morbidity and mortality in all countries where malaria is endemic. Polymorphisms in the portion of the P. falciparum gene encoding the kelch (K13) propeller domain are associated with delayed parasite clearance after ACT. Of about 124 different non-synonymous mutations, 46 have been identified in Southeast Asia (SEA), 62 in sub-Saharan Africa (SSA) and 16 in both regions. This is the first study designed to analyse the prevalence of polymorphisms in the P. falciparum k13-propeller domain in the Jazan region of southwest Saudi Arabia, where malaria is endemic. Methods: One hundred and forty P. falciparum samples were collected from the Jazan region at three different times: 20 samples in 2011, 40 in 2016 and 80 in 2020, after the implementation of ACT. Plasmodium falciparum kelch13 (k13) gene DNA was extracted, amplified, sequenced, and analysed using the Basic Local Alignment Search Tool (BLAST). Results: This study identified 51 non-synonymous (NS) mutations across the three time groups, distributed as follows: 6 single nucleotide polymorphisms (SNPs) (11.8%) in samples collected in 2011 only, 3 (5.9%) in 2011 and 2016, 5 (9.8%) in 2011 and 2020, 5 (9.8%) in 2016 only, 8 (15.7%) in 2016 and 2020, 14 (27.5%) in 2020 and 10 (19.6%) in all groups. BLAST revealed that the 2011 isolates were genetically closer to African isolates (53.3%) than to Asian ones (46.7%). Interestingly, this pattern reversed in 2020, when isolates were closer to Asian isolates (81.6%) than to African ones (18.4%). Conclusions: Despite the diversity of the identified mutations in the k13-propeller gene, these data did not reveal widespread artemisinin resistance-associated polymorphisms in the Jazan region where the samples were collected. Such a process would be expected to increase the frequencies of mutations associated with ACT resistance.
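
Because the seven groups in the Results appear to be mutually exclusive, their counts should sum to the 51 non-synonymous mutations and reproduce the reported percentages. A quick Python check of that arithmetic, using only the counts stated above (the group labels are paraphrased from the text):

```python
# Counts of non-synonymous mutations by sampling-year group, as reported above.
counts = {
    "2011 only": 6,
    "2011 and 2016": 3,
    "2011 and 2020": 5,
    "2016 only": 5,
    "2016 and 2020": 8,
    "2020": 14,
    "all groups": 10,
}

total = sum(counts.values())
print(total)  # → 51
for group, n in counts.items():
    # One decimal place, matching the precision used in the abstract
    print(f"{group}: {n} ({100 * n / total:.1f}%)")
```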

