Progress in quickly finding orthologs as reciprocal best hits

2020 ◽  
Author(s):  
Julie E Hernández-Salmerón ◽  
Gabriel Moreno-Hagelsieb

Abstract Introduction Finding orthologs remains an important bottleneck in comparative genomics analyses. While the authors of software for the quick comparison of protein sequences evaluate the speed of their software and compare their results against the most usual software for the task, it is not common for them to evaluate their software for more particular uses, such as finding orthologs as reciprocal best hits (RBH). Here we compared RBH results, between prokaryotic genomes, obtained using software that runs faster than blastp, namely lastal, diamond, and MMseqs2. Results We found that lastal required the least time to produce results. However, it yielded fewer results than any other program when comparing evolutionarily distant genomes. The program producing the number of RBH most similar to blastp was MMseqs2. This program also resulted in the lowest error estimates among the programs tested. The results with diamond were very close to those obtained with MMseqs2, with diamond running faster. Our results suggest that the best of the programs tested was diamond, run with the “sensitive” option, which took 7% of the time blastp took to run, and produced results with lower error rates than blastp. Availability A program to obtain reciprocal best hits using the software we tested is maintained at https://github.com/Computational-conSequences/SequenceTools

BMC Genomics ◽  
2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Julie E. Hernández-Salmerón ◽  
Gabriel Moreno-Hagelsieb

Abstract Background Finding orthologs remains an important bottleneck in comparative genomics analyses. While the authors of software for the quick comparison of protein sequences evaluate the speed of their software and compare their results against the most usual software for the task, it is not common for them to evaluate their software for more particular uses, such as finding orthologs as reciprocal best hits (RBH). Here we compared RBH results obtained using software that runs faster than blastp, namely lastal, diamond, and MMseqs2. Results We found that lastal required the least time to produce results. However, it yielded fewer results than any other program when comparing the proteins encoded by evolutionarily distant genomes. The program producing the number of RBH most similar to blastp was diamond run with the “ultra-sensitive” option. However, this option was diamond’s slowest, with the “very-sensitive” option offering the best balance between speed and RBH results. The speed-up of the programs was much more evident when dealing with eukaryotic genomes, which code for more numerous proteins. For example, lastal took a median of approx. 1.5% of the blastp time to run with bacterial proteomes and 0.6% with eukaryotic ones, while diamond with the very-sensitive option took 7.4% and 5.2%, respectively. Though estimated error rates were very similar among the RBH obtained with all programs, RBH obtained with MMseqs2 had the lowest error rates among the programs tested. Conclusions The fast algorithms for pairwise protein comparison produced results very similar to blast in a fraction of the time, with diamond offering the best compromise in speed, sensitivity and quality, as long as a sensitivity option other than the default was chosen.
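The reciprocal best hit criterion named in this abstract can be illustrated with a minimal sketch. It assumes each search (blastp, lastal, diamond, or MMseqs2) has already been reduced to (query, subject, bit score) triples; the function names and the toy identifiers are hypothetical, and the authors' own tool may apply further filters (for example on coverage or E-value) that are not modeled here. Ties are resolved by keeping the first hit seen, which is also an assumption.

```python
from typing import Dict, Iterable, Set, Tuple

# One search hit: (query_id, subject_id, bit_score). Hypothetical layout.
Hit = Tuple[str, str, float]


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Map each query to its highest-scoring subject (first hit wins ties)."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Iterable[Hit],
                         b_vs_a: Iterable[Hit]) -> Set[Tuple[str, str]]:
    """Return pairs (a, b) where b is a's best hit and a is b's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return {(a, b) for a, b in best_ab.items() if best_ba.get(b) == a}


# Toy data: a3's best hit is b2, but b2's best hit is a2, so (a3, b2) is dropped.
a_vs_b = [("a1", "b1", 250.0), ("a1", "b2", 90.0),
          ("a2", "b2", 180.0), ("a3", "b2", 120.0)]
b_vs_a = [("b1", "a1", 248.0), ("b2", "a2", 181.0), ("b2", "a3", 119.0)]

rbh = reciprocal_best_hits(a_vs_b, b_vs_a)
print(sorted(rbh))
```

Because the criterion only needs the top hit in each direction, any program that ranks the true best match first yields the same pair, which is why faster but less sensitive programs can still recover most RBH between closely related genomes.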


2016 ◽  
Vol 54 (6) ◽  
pp. 1416-1417 ◽  
Author(s):  
Richard B. Thomson

The Gram stain is one of the most commonly performed tests in the clinical microbiology laboratory, yet it is poorly controlled and lacks standardization. It was once the best rapid test in microbiology, but it is no longer trusted by many clinicians. The publication by Samuel et al. (J. Clin. Microbiol. 54:1442–1447, 2016, http://dx.doi.org/10.1128/JCM.03066-15) is a start for those who want to evaluate and improve Gram stain performance. In an age of emerging rapid molecular results, is the Gram stain still relevant? How should clinical microbiologists respond to the call to reduce Gram stain error rates?


2012 ◽  
Vol 56 (7) ◽  
pp. 3481-3491 ◽  
Author(s):  
Michael Widmann ◽  
Jürgen Pleiss ◽  
Peter Oelschlaeger

ABSTRACT Metallo-β-lactamases (MBLs) are enzymes that hydrolyze β-lactam antibiotics, resulting in bacterial resistance to these drugs. These proteins have caused concerns due to their facile transference, broad substrate spectra, and the absence of clinically useful inhibitors. To facilitate the classification, nomenclature, and analysis of MBLs, an automated database system was developed, the Metallo-β-Lactamase Engineering Database (MBLED) (http://www.mbled.uni-stuttgart.de). It contains information on MBLs retrieved from the NCBI peptide database while strictly following the nomenclature by Jacoby and Bush (http://www.lahey.org/Studies/) and the generally accepted class B β-lactamase (BBL) standard numbering scheme for MBLs. The database comprises 597 MBL protein sequences and enables systematic analyses of these sequences. A systematic analysis employing the database resulted in the generation of mutation profiles of assigned IMP- and VIM-type MBLs, the identification of five MBL protein entries from the NCBI peptide database that were inconsistent with the Jacoby and Bush nomenclature, and the identification of 15 new IMP candidates and 9 new VIM candidates. Furthermore, the database was used to identify residues with high mutation frequencies and variability (mutation hot spots) that were unexpectedly distant from the active site located in the ββ sandwich: positions 208 and 266 in the IMP family and positions 215 and 258 in the VIM family. We expect that the MBLED will be a valuable tool for systematically cataloguing and analyzing the increasing number of MBLs being reported.


2017 ◽  
Author(s):  
Morgan N. Price ◽  
Adam P. Arkin

Abstract Large-scale genome sequencing has identified millions of protein-coding genes whose function is unknown. Many of these proteins are similar to characterized proteins from other organisms, but much of this information is missing from annotation databases and is hidden in the scientific literature. To make this information accessible, PaperBLAST uses EuropePMC to search the full text of scientific articles for references to genes. PaperBLAST also takes advantage of curated resources that link protein sequences to scientific articles (Swiss-Prot, GeneRIF, and EcoCyc). PaperBLAST’s database includes over 700,000 scientific articles that mention over 400,000 different proteins. Given a protein of interest, PaperBLAST quickly finds similar proteins that are discussed in the literature and presents snippets of text from relevant articles or from the curators. PaperBLAST is available at http://papers.genomics.lbl.gov/.


Author(s):  
Juan Guzman ◽  
Atena Sadat Sombolestani ◽  
Anja Poehlein ◽  
Rolf Daniel ◽  
Ilse Cleenwerck ◽  
...  

A novel bacterium designated G55GPT and pertaining to the family Acetobacteraceae was isolated from the gut of the Madagascar hissing cockroach Gromphadorhina portentosa. The Gram-negative cells were rod-shaped and non-motile. The complete 16S rRNA sequence of the strain G55GPT showed the highest pairwise similarity to Gluconacetobacter johannae CFN-Cf-55T (95.35 %), suggesting it represents a potential new genus of the family Acetobacteraceae. Phylogenetic analysis based on 16S rRNA gene and 106 orthologous housekeeping protein sequences revealed that G55GPT forms a monophyletic clade with the genus Commensalibacter, which thus far has also been isolated exclusively from insects. The G55GPT genome size was 2.70 Mbp, and the G+C content was 45.4 mol%, which is lower than most acetic acid bacteria (51–68 mol%) but comparable to Swingsia samuiensis AH83T (45.1 mol%) and higher than Commensalibacter intestini A911T (36.8 mol%). Overall genome relatedness indices based on gene and protein sequences strongly supported the assignment of G55GPT to a new genus within the family Acetobacteraceae. The percentage of conserved proteins, which is a useful metric for genus differentiation, was below 54 % when comparing G55GPT to type strains of acetic acid bacteria, thus strongly supporting our hypothesis that G55GPT is a member of a yet-undescribed genus. The fatty acid composition of G55GPT differed from that of closely related acetic acid bacteria, particularly given the presence of C19 : 1 ω9c/ω11c and the absence of C14 : 0 and C14 : 0 2-OH fatty acids. Strain G55GPT also differed in terms of metabolic features such as its ability to produce acid from d-mannitol, and its inability to produce acetic acid from ethanol or to oxidize glycerol to dihydroxyacetone.
Based on the results of combined genomic, phenotypic and phylogenetic characterizations, isolate G55GPT (=LMG 31394T=DSM 111244T) is considered to represent a new species in a new genus, for which we propose the name Entomobacter blattae gen. nov., sp. nov.


2020 ◽  
Vol 18 (1) ◽  
Author(s):  
Charlotte Hall ◽  
Dean Robertson ◽  
Margaret Rolfe ◽  
Sharene Pascoe ◽  
Megan E. Passey ◽  
...  

Abstract Background Resuscitation of patients with time-critical and life-threatening illness represents a cognitive challenge for emergency room (ER) clinicians. We designed a cognitive aid, the Emergency Protocols Handbook, to simplify clinical management and team processes. Resuscitation guidelines were reformatted into simple, single step-by-step pathways. This Australian randomised controlled trial tested the effectiveness of this cognitive aid in a simulated ER environment by observing team error rates when current resuscitation guidelines were followed, with and without the handbook. Methods Resuscitation teams were randomised to manage two scenarios with the handbook and two without in a high-fidelity simulation centre. Each scenario was video-recorded. The primary outcome measure was error rates (the number of errors made out of 15 key tasks per scenario). Key tasks varied by scenario. Each team completed four scenarios and was measured on 60 key tasks. Participants were surveyed regarding their perception of the usefulness of the handbook. Results Twenty-one groups performed 84 ER crisis simulations. The unadjusted error rate in the handbook group was 18.8% (121/645) versus 38.9% (239/615) in the non-handbook group. There was a statistically significant reduction of 54.0% (95% CI 49.9–57.9) in the estimated percentage error rate when the handbook was available across all scenarios: 17.9% (95% CI 14.4–22.0%) versus 38.9% (95% CI 34.2–43.9%). Almost all (97%) participants said they would want to use this cognitive aid during a real medical crisis situation. Conclusion This trial showed that by following the step-by-step, linear pathways in the handbook, clinicians more than halved their teams’ rate of error across four simulated medical crises. The handbook improves team performance and enables healthcare teams to reduce clinical error rates and thus reduce harm for patients. Trial registration ACTRN12616001456448 registered: www.anzctr.org.au.
Trial site: http://emergencyprotocols.org.au/
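The percentages reported in this abstract can be rechecked with a few lines of arithmetic. The sketch below is only a consistency check on the published figures: the helper names are hypothetical, and treating the 54.0% figure as a relative reduction between the two estimated rates (17.9% versus 38.9%) is an assumption, since the abstract does not state how it was derived.

```python
def error_rate(errors: int, tasks: int) -> float:
    """Error rate as a percentage of key tasks."""
    return 100.0 * errors / tasks


def relative_reduction(baseline: float, treated: float) -> float:
    """Percentage by which `treated` is lower than `baseline`."""
    return 100.0 * (baseline - treated) / baseline


# Unadjusted counts quoted in the abstract.
handbook = error_rate(121, 645)
no_handbook = error_rate(239, 615)

# Estimated (adjusted) rates quoted in the abstract.
reduction = relative_reduction(38.9, 17.9)

print(round(handbook, 1), round(no_handbook, 1), round(reduction, 1))
```

The two denominators also add up as expected: 645 + 615 = 1260 key tasks, which equals 84 simulations with 15 key tasks each.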


2007 ◽  
Vol 189 (23) ◽  
pp. 8693-8703 ◽  
Author(s):  
Jonathan Livny ◽  
Yoshiharu Yamaichi ◽  
Matthew K. Waldor

ABSTRACT Partitioning of low-copy-number plasmids to daughter cells often depends on ParA and ParB proteins acting on centromere-like parS sites. Similar chromosome-encoded par loci likely also contribute to chromosome segregation. Here, we used bioinformatic approaches to search for chromosomal parS sites in 400 prokaryotic genomes. Although the consensus sequence matrix used to search for parS sites was derived from two gram-positive species, putative parS sites were identified on the chromosomes of 69% of strains from all branches of bacteria. Strains that were not found to contain parS sites clustered among relatively few branches of the prokaryotic evolutionary tree. In the vast majority of cases, parS sites were identified in origin-proximal regions of chromosomes. The widespread conservation of parS sites across diverse bacteria suggests that par loci evolved very early in the evolution of bacterial chromosomes and that the absence of parS, parA, and/or parB in certain strains likely reflects the loss of one or more of these loci much later in evolution. Moreover, the highly conserved origin-proximal position of parS suggests par loci are primarily devoted to regulating processes that involve the origin region of bacterial chromosomes. In species containing multiple chromosomes, the parS sites found on secondary chromosomes diverge significantly from those found on their primary chromosomes, suggesting that chromosome segregation of multipartite genomes requires distinct replicon-specific par loci. Furthermore, parS sites on secondary chromosomes are not well conserved among different species, suggesting that the evolutionary histories of secondary chromosomes are more diverse than those of primary chromosomes.


2017 ◽  
Author(s):  
Robert M. Waterhouse ◽  
Mathieu Seppey ◽  
Felipe A. Simão ◽  
Mosè Manni ◽  
Panagiotis Ioannidis ◽  
...  

ABSTRACT Genomics promises comprehensive surveying of genomes and metagenomes, but rapidly changing technologies and expanding data volumes make evaluation of completeness a challenging task. Technical sequencing quality metrics can be complemented by quantifying completeness in terms of the expected gene content of Benchmarking Universal Single-Copy Orthologs (BUSCO, http://busco.ezlab.org). Now in its third release, BUSCO utilities extend beyond quality control to applications in comparative genomics, gene predictor training, metagenomics, and phylogenomics.


2016 ◽  
Author(s):  
Gregorio Iraola ◽  
Hugo Naya

Taxonomy of prokaryotes has remained a controversial discipline due to the extreme plasticity of microorganisms, causing inconsistencies between phenotypic and genotypic classifications. The genomics era has enhanced taxonomy but also opened new debates about the best practices for incorporating genomic data into polyphasic taxonomy protocols, which are fairly biased towards the identification of bacterial species. Here we use an extensive dataset of Archaea and Bacteria to prove that metabolic signatures coded in their genomes are informative traits that allow organisms to be classified accurately and coherently into higher taxonomic ranks, and functional features to be associated with the definition of taxa. Our results support the ecological coherence of higher taxonomic ranks and reconcile taxonomy with traditional chemotaxonomic traits inferred from genomes. KARL, a simple and free tool useful for assisting polyphasic taxonomy or for performing functional prospections, is also presented (https://github.com/giraola/KARL).


2021 ◽  
Author(s):  
Simon Lee ◽  
Loan T. Nguyen ◽  
Ben J. Hayes ◽  
Elizabeth M Ross

Motivation: Quality control (QC) tools are critical in DNA sequencing analysis because they increase the accuracy of sequence alignments and thus the reliability of results. Oxford Nanopore Technologies (ONT) QC is currently rudimentary, generally based on whole read average quality. This results in discarding reads that contain regions of high quality sequence. Here we propose Prowler, a multi-window approach inspired by algorithms used to QC short read data. Importantly, we retain the phase and read length information by optionally replacing trimmed sections with Ns. Results: Prowler was applied to mammalian and bacterial datasets, to assess effects on alignment and assembly respectively. Compared to Nanofilt, alignments of data QCed with Prowler had lower error rates and more mapped reads. Assemblies of Prowler QCed data had a lower error rate than Nanofilt QCed data; however, this came at some cost to assembly contiguity. Availability and implementation: Prowler is implemented in Python and is available at: https://github.com/ProwlerForNanopore/ProwlerTrimmer
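The idea of windowed quality control with N-masking described in this abstract can be illustrated with a short sketch. This is not Prowler's implementation: the function name, the non-overlapping window scheme, the window size, and the quality threshold are all assumptions chosen for illustration. It only shows how masking low-quality windows with Ns keeps the read length unchanged, in contrast to discarding a read on its whole-read average.

```python
from typing import List


def mask_low_quality_windows(seq: str, quals: List[int],
                             window: int = 4,
                             min_mean_q: float = 10.0) -> str:
    """Replace each non-overlapping window whose mean quality is below
    `min_mean_q` with Ns, so the read keeps its original length."""
    if len(seq) != len(quals):
        raise ValueError("sequence and quality list must be the same length")
    pieces = []
    for start in range(0, len(seq), window):
        bases = seq[start:start + window]
        window_quals = quals[start:start + window]
        mean_q = sum(window_quals) / len(window_quals)
        pieces.append(bases if mean_q >= min_mean_q else "N" * len(bases))
    return "".join(pieces)


# Toy read: good start, a poor middle stretch, good end.
read = "ACGTACGTACGT"
qualities = [20, 20, 20, 20, 3, 4, 5, 6, 15, 15, 15, 15]

masked = mask_low_quality_windows(read, qualities)
print(masked)
```

In this toy read only the middle window falls below the threshold, so the flanking high-quality bases survive and their positions relative to each other are preserved.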

