GenMap: Fast and Exact Computation of Genome Mappability

2019 ◽  
Author(s):  
Christopher Pockrandt ◽  
Mai Alzamel ◽  
Costas S. Iliopoulos ◽  
Knut Reinert

We present a fast and exact algorithm to compute the (k, e)-mappability. Its inverse, the (k, e)-frequency, counts the number of occurrences of each k-mer with up to e errors in a sequence. The algorithm we present is an order of magnitude faster than the algorithm in the widely used GEM suite while not relying on heuristics, and can even compute the mappability for short k-mers on highly repetitive plant genomes. We also show that mappability can be computed on multiple sequences to identify marker genes, illustrated by the example of E. coli strains. GenMap allows exporting the mappability information into different formats such as raw output, wig and bed files. The application and its C++ source code are available at https://github.com/cpockrandt/genmap.
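The sketch below makes the two definitions concrete with a brute-force computation of the (k, e)-frequency and its inverse, the (k, e)-mappability. It assumes that errors are mismatches only (Hamming distance) and counts occurrences on the forward strand only. It compares every pair of k-mers in quadratic time, so it is an illustration of the definitions and not GenMap's algorithm, which is index-based and far faster. The function names are illustrative.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def kmer_frequency(seq: str, k: int, e: int) -> list[int]:
    """(k, e)-frequency: for each k-mer start position, count how many k-mers
    of seq lie within e mismatches of it (every k-mer matches itself)."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    return [sum(1 for other in kmers if hamming(kmer, other) <= e) for kmer in kmers]


def mappability(seq: str, k: int, e: int) -> list[float]:
    """(k, e)-mappability: the inverse of the (k, e)-frequency at each position."""
    return [1.0 / f for f in kmer_frequency(seq, k, e)]


if __name__ == "__main__":
    print(kmer_frequency("AAAAC", 2, 0))  # [3, 3, 3, 1]
    print(mappability("AAAAC", 2, 1))     # [0.25, 0.25, 0.25, 0.25]
```

With e = 0 the unique 2-mer "AC" gets mappability 1. With e = 1 all four 2-mers become indistinguishable, and every position drops to 0.25.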

2020 ◽  
Author(s):  
Mikko Rautiainen ◽  
Tobias Marschall

Motivation: De Bruijn graphs can be constructed efficiently from short reads and have been used for many purposes. Traditionally, long-read sequencing technologies have had error rates too high for de Bruijn graph-based methods. Recently, HiFi reads have combined long read length with a low error rate, which enables de Bruijn graphs to be used with HiFi reads.
Results: We have implemented MBG, a tool for building sparse de Bruijn graphs from HiFi reads. MBG outperforms existing tools for building dense de Bruijn graphs. It can build a graph from 50x coverage whole human genome HiFi reads in four hours on a single core. MBG also assembles the bacterial E. coli genome into a single contig in 8 seconds.
Availability: Package manager: https://anaconda.org/bioconda/mbg; source code: https://github.com/maickrau/MBG
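The sketch below illustrates what "sparse" can mean here. It builds a de Bruijn-style graph from reads but keeps only a subset of k-mers: the window minimizers, meaning the lexicographically smallest k-mer in each window of w consecutive k-mers. Consecutive selected k-mers in a read are joined by an edge, weighted by the number of reads that support it. This is a generic illustration under simplifying assumptions (single strand, no canonical k-mers, no error handling, no unitig compaction). It is not MBG's actual selection scheme or parameters, and the function names are hypothetical.

```python
from collections import defaultdict


def window_minimizers(read: str, k: int, w: int) -> list[int]:
    """Start positions of the smallest k-mer in each window of w consecutive
    k-mers (leftmost on ties), deduplicated and in increasing order."""
    kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
    selected: list[int] = []
    for start in range(len(kmers) - w + 1):
        offset = min(range(w), key=lambda j: kmers[start + j])
        pos = start + offset
        if not selected or selected[-1] != pos:
            selected.append(pos)
    return selected


def build_sparse_graph(reads: list[str], k: int, w: int) -> dict[tuple[str, str], int]:
    """Edges between consecutive selected k-mers, counted over all reads."""
    edges: dict[tuple[str, str], int] = defaultdict(int)
    for read in reads:
        positions = window_minimizers(read, k, w)
        for p, q in zip(positions, positions[1:]):
            edges[(read[p:p + k], read[q:q + k])] += 1
    return dict(edges)


if __name__ == "__main__":
    print(build_sparse_graph(["TTTAC", "TTTAC"], k=2, w=3))  # {('TA', 'AC'): 2}
```

In the example, the read "TTTAC" has four 2-mers. Only "TA" and "AC" are selected, so the two repeated "TT" k-mers never become nodes.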


2021 ◽  
Vol 7 (7) ◽  
Author(s):  
Dongyun Jung ◽  
Soyoun Park ◽  
Janina Ruffini ◽  
Forest Dussault ◽  
Simon Dufour ◽  
...  

Escherichia coli is a major causative agent of environmental bovine mastitis, a disease that causes significant economic losses for the dairy industry. There is still debate in the literature as to whether mammary pathogenic E. coli (MPEC) is indeed a unique E. coli pathotype. The alternative is that these infections are merely opportunistic, caused by any E. coli isolate displaced from the bovine gastrointestinal tract to the environment and then into the udder. In this study, we conducted a thorough genomic analysis of 113 novel MPEC isolates from clinical mastitis cases and 100 bovine commensal E. coli isolates. A phylogenomic analysis indicated that MPEC and commensal E. coli isolates formed clades based on common sequence types and O antigens, but did not cluster based on mammary pathogenicity. A comparative genomic analysis of MPEC and commensal isolates identified nine genes that were part of either the core or the soft-core MPEC genome but were not found in any bovine commensal isolate. These apparent MPEC marker genes fall into three groups:
- genes involved in nutrient intake and metabolism [adeQ, adenine permease; nifJ, pyruvate-flavodoxin oxidoreductase; and yhjX, putative major facilitator superfamily (MFS)-type transporter];
- fitness and virulence factors commonly seen in uropathogenic E. coli (pqqL, zinc metallopeptidase, and fdeC, intimin-like adhesin, respectively);
- genes encoding putative proteins [yfiE, uncharacterized helix-turn-helix-type transcriptional activator; ygjI, putative inner membrane transporter; and ygjJ, putative periplasmic protein].

Further characterization of these highly conserved MPEC genes may be critical to understanding the pathobiology of MPEC.
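The marker-gene criterion above can be expressed as a small set computation over gene presence/absence data. A gene qualifies if it is present in the core or soft-core genome of the pathogenic group and absent from every commensal isolate. The sketch assumes a soft-core threshold of 95% of isolates, a common pan-genome convention (for example, Roary's soft-core definition). That threshold may not match the one used in the study. The data layout and function name are assumptions, and the isolate and gene names in the example are made up.

```python
from collections import Counter


def marker_genes(pathogen: dict[str, set[str]],
                 commensal: dict[str, set[str]],
                 min_fraction: float = 0.95) -> list[str]:
    """Genes present in at least min_fraction of pathogen isolates (core plus
    soft-core under an assumed threshold) and absent from every commensal isolate."""
    n = len(pathogen)
    counts = Counter(gene for genes in pathogen.values() for gene in genes)
    commensal_genes = set().union(*commensal.values())
    return sorted(gene for gene, c in counts.items()
                  if c / n >= min_fraction and gene not in commensal_genes)


if __name__ == "__main__":
    pathogen = {"p1": {"geneA", "geneB"}, "p2": {"geneA", "geneB", "geneC"}, "p3": {"geneA"}}
    commensal = {"c1": {"geneB"}, "c2": {"geneD"}}
    print(marker_genes(pathogen, commensal))  # ['geneA']
```

Lowering min_fraction admits more genes from the pathogen side. Any gene seen in even one commensal isolate is still excluded.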


Microbiology ◽  
2021 ◽  
Vol 167 (3) ◽  
Author(s):  
Sathi Mallick ◽  
Shanti Kiran ◽  
Tapas Kumar Maiti ◽  
Anindya S. Ghosh

Escherichia coli low-molecular-mass (LMM) penicillin-binding proteins (PBPs) help hydrolyse peptidoglycan fragments from the cell wall and recycle them back into the growing peptidoglycan matrix. They have also been reported to be involved in biofilm formation. Biofilms are external slime layers of extra-polymeric substances that sessile bacterial cells secrete to form a habitable niche for themselves. Here, we hypothesize that E. coli LMM PBPs are involved in regulating the nature of the exopolysaccharides (EPS) in its extra-polymeric substances during biofilm formation. We therefore assessed the physiological characteristics of E. coli CS109 LMM PBP deletion mutants, addressing biofilm formation ability, viability and surface adhesion. Finally, EPS from the parent CS109 and its ΔPBP4 and ΔPBP5 mutants were purified and analysed for the sugars present. Deletion of LMM PBPs reduced biofilm formation, bacterial adhesion and viability in biofilms. The deletions also diminished EPS production by the ΔPBP4 and ΔPBP5 mutants. The purified mutant EPS suggested an increased overall negative charge compared with that of the parent. EPS analyses from both mutants also revealed the appearance of an unusual sugar, xylose, that was absent in CS109. We therefore speculate that reduced biofilm formation in LMM PBP mutants results from the subsequent production of xylitol, together with a hindrance in the normal flow of the pentose phosphate pathway.


2020 ◽  
Author(s):  
Xun Zhu ◽  
Ti-Cheng Chang ◽  
Richard Webby ◽  
Gang Wu

idCOV is a phylogenetic pipeline for quickly identifying the clades of SARS-CoV-2 virus isolates from raw sequencing data, based on a selected list of clade-defining markers. Using a public dataset, we show that idCOV makes calls equivalent to those annotated by Nextstrain.org on all three common clade systems, working directly from user-uploaded FastQ files. Web and equivalent command-line interfaces are available. It can be deployed in any Linux environment, including personal computers, HPC clusters and the cloud. The source code is available at https://github.com/xz-stjude/idcov. Installation documentation can be found at https://github.com/xz-stjude/idcov/blob/master/README.md.
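The idea of calling a clade from a list of clade-defining markers can be sketched as follows. Each clade is defined by a set of (genome position, alternative allele) pairs. The sample is assigned to the clade whose markers it carries in the highest proportion, preferring the clade with more markers on ties. This scoring rule, the data layout and the positions in the example are hypothetical illustrations, not idCOV's actual marker list or decision procedure.

```python
from typing import Optional

Marker = tuple[int, str]  # (genome position, alternative allele); hypothetical layout


def call_clade(sample_variants: set[Marker],
               clade_markers: dict[str, set[Marker]]) -> Optional[str]:
    """Return the clade whose markers are most completely present in the sample.
    Ties on the fraction go to the clade with more markers (more specific), then
    to the larger clade name; returns None if no clade marker is observed."""
    scored = [
        (len(markers & sample_variants) / len(markers), len(markers), clade)
        for clade, markers in clade_markers.items()
        if markers
    ]
    if not scored:
        return None
    fraction, _, clade = max(scored)
    return clade if fraction > 0 else None


if __name__ == "__main__":
    clades = {"clade_X": {(100, "T")}, "clade_Y": {(100, "T"), (200, "G")}}
    print(call_clade({(100, "T"), (200, "G")}, clades))  # clade_Y
    print(call_clade({(100, "T")}, clades))              # clade_X
```

In practice, the variant set would come from aligning the FastQ reads to a reference and calling variants. That step is out of scope for this sketch.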


2018 ◽  
Author(s):  
Sandra Y. Wotzka ◽  
Markus Kreuzer ◽  
Lisa Maier ◽  
Mirjam Zünd ◽  
Markus Schlumberger ◽  
...  

Background and aims: Lactulose is a common food ingredient. It is widely used as a treatment for constipation or hepatic encephalopathy and as a substrate for hydrogen breath tests. Lactulose is fermented by the colon microbiota, resulting in the production of hydrogen (H₂). H₂ is a substrate for enteropathogens including Salmonella Typhimurium (S. Typhimurium), and increased H₂ production upon lactulose ingestion might favor the growth of H₂-consuming enteropathogens. We aimed to analyze the effects of single-dose lactulose ingestion on the growth of intrinsic Escherichia coli (E. coli). E. coli can be efficiently quantified by plating and shares most metabolic requirements with S. Typhimurium.
Methods: 32 healthy volunteers (18 females, 14 males) were recruited. Participants were randomized to single-dose ingestion of 50 g lactulose or 50 g sucrose (controls). After ingestion, H₂ in expiratory air and symptoms were recorded. Stool samples were acquired at days −1, 1 and 14. We analyzed 16S microbiota composition and abundance and the characteristics of E. coli isolates.
Results: Lactulose ingestion resulted in diarrhea in 14/17 individuals. In 14/17 individuals, H₂ levels in expiratory air increased by ≥20 ppm within 3 hours after lactulose challenge. H₂ levels correlated with the number of defecations within 6 hours. E. coli was detectable in the feces of all subjects (2 × 10² to 10⁹ CFU/g). However, the number of E. coli colony forming units (CFU) on selective media did not differ between any time points before or after challenge with sucrose or lactulose. The microbiota composition also remained stable upon lactulose exposure.
Conclusion: Ingestion of a single dose of 50 g lactulose does not significantly alter E. coli density in stool samples of healthy volunteers. A 50 g dose of lactulose therefore seems unlikely to alter growth conditions in the intestine enough to significantly predispose to infection with H₂-consuming enteropathogens such as S. Typhimurium (www.clinicaltrials.gov NCT02397512).


2020 ◽  
Author(s):  
N Goonasekera ◽  
A Mahmoud ◽  
J Chilton ◽  
E Afgan

Summary: The existence of more than 100 public Galaxy servers with service quotas indicates a need for greater availability of compute resources for Galaxy. GalaxyCloudRunner enables a Galaxy server to easily expand its available compute capacity by sending user jobs to cloud resources. User jobs are routed to the acquired resources based on a set of configurable rules. The resources can be acquired dynamically and automatically from any of four popular cloud providers (AWS, Azure, GCP or OpenStack).
Availability and implementation: GalaxyCloudRunner is implemented in Python and leverages Docker containers. The source code is MIT licensed and available at https://github.com/cloudve/galaxycloudrunner. The documentation is available at http://gcr.cloudve.org/.
Contact: Enis Afgan
Supplementary information: None
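Rule-based routing of this kind can be sketched as an ordered list of predicates over job attributes. The first matching rule decides the destination, and unmatched jobs stay on the local cluster. The rule names, job fields and destination labels below are hypothetical and do not reflect GalaxyCloudRunner's actual configuration format.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Rule:
    name: str
    matches: Callable[[dict[str, Any]], bool]  # predicate over job attributes
    destination: str


def route(job: dict[str, Any], rules: list[Rule], default: str = "local") -> str:
    """Return the destination of the first rule that matches the job,
    or the default destination if none does."""
    for rule in rules:
        if rule.matches(job):
            return rule.destination
    return default


if __name__ == "__main__":
    rules = [
        Rule("many-cores", lambda job: job.get("cores", 1) > 8, "cloud"),
        Rule("assembly-tools", lambda job: job.get("tool") in {"assembler"}, "cloud"),
    ]
    print(route({"tool": "assembler", "cores": 4}, rules))  # cloud
    print(route({"tool": "fastqc", "cores": 2}, rules))     # local
```

Keeping the rules ordered makes precedence explicit: a more specific rule placed earlier wins over a general one placed later.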


Microbiology ◽  
2021 ◽  
Vol 167 (10) ◽  
Author(s):  
James P. R. Connolly ◽  
Natasha C. A. Turner ◽  
Jennifer C. Hallam ◽  
Patricia T. Rimbi ◽  
Tom Flett ◽  
...  

Appropriate interpretation of environmental signals facilitates niche specificity in pathogenic bacteria. However, the responses of niche-specific pathogens to common host signals are poorly understood. D-Serine (D-ser) is a toxic metabolite present in highly variable concentrations at different colonization sites within the human host. We previously found that it is capable of inducing changes in gene expression. In this study, we made the striking observation that the global transcriptional responses to D-ser of three Escherichia coli pathotypes were highly distinct. The three pathotypes were enterohaemorrhagic E. coli (EHEC), uropathogenic E. coli (UPEC) and neonatal meningitis-associated E. coli (NMEC). In fact, we identified no single differentially expressed gene common to all three strains. We observed the induction of ribosome-associated genes only in the extraintestinal pathogens UPEC and NMEC. Purine metabolism genes were induced in gut-restricted EHEC and in UPEC. These patterns indicate distinct transcriptional responses to a common signal. UPEC and NMEC encode dsdCXA, a genetic locus required for detoxification and hence normal growth in the presence of D-ser. Specific transcriptional responses were induced in strains accumulating D-ser (WT EHEC and UPEC/NMEC mutants lacking the D-ser-responsive transcriptional activator DsdC). This corroborates the notion that D-ser is an unfavourable metabolite if not metabolized. Importantly, many of the UPEC-associated transcriptome alterations correlate with published data on the urinary transcriptome. This supports the hypothesis that D-ser sensing forms a key part of urinary niche adaptation in this pathotype. Collectively, our results demonstrate distinct pleiotropic responses to a common metabolite in diverse E. coli pathotypes, with important implications for niche selectivity.


2019 ◽  
Author(s):  
Kokulapalan Wimalanathan ◽  
Carolyn J. Lawrence-Dill

Annotating gene structures and functions in genome assemblies is necessary to make assembly resources useful for biological inference. Gene Ontology (GO) term assignment is the most pervasively used functional annotation system, and new methods for GO assignment have improved the quality of GO-based function predictions. The Gene Ontology Meta Annotator for Plants (GOMAP) is an optimized, high-throughput, and reproducible pipeline for genome-scale GO annotation of plant genomes. In maize, GOMAP’s methods have been shown to increase the number of genes annotated and the number of annotations assigned per gene. They also improve the quality of GO assignments, as measured by F-score. Here we report on the pipeline’s availability and performance for annotating large, repetitive plant genomes and describe how to deploy GOMAP to annotate additional plant genomes. We containerized GOMAP to increase portability and reproducibility, and optimized its performance for HPC environments. GOMAP has been used to annotate multiple maize lines and is currently being deployed to annotate other species, including wheat, rice, barley, cotton, soy, and others. Instructions, along with access to the GOMAP Singularity container, are freely available online at https://gomap-singularity.readthedocs.io/en/latest/. A list of annotated genomes and links to data is maintained at https://dill-picl.org/projects/gomap/gomap-datasets/.
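The F-score is the harmonic mean of precision and recall. The per-gene sketch below compares a predicted set of GO terms against a reference set. GOMAP's published evaluation may use a different variant (for example, hierarchy-aware precision and recall, or a score maximized over confidence thresholds). This example illustrates only the basic formula, and the GO term labels are placeholders rather than real identifiers.

```python
def f_score(predicted: set[str], reference: set[str]) -> float:
    """F1 = 2PR / (P + R), with P = |pred & ref| / |pred| and R = |pred & ref| / |ref|."""
    true_positives = len(predicted & reference)
    if true_positives == 0:
        return 0.0  # also covers an empty predicted or reference set
    precision = true_positives / len(predicted)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(f_score({"GO:term1", "GO:term2"}, {"GO:term2", "GO:term3"}))  # 0.5
```

Because the harmonic mean is dominated by the smaller of P and R, a pipeline cannot raise its F-score simply by assigning many more terms per gene. Doing so raises recall but lowers precision.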


Microbiology ◽  
2021 ◽  
Vol 167 (9) ◽  
Author(s):  
Nirbhay Singh ◽  
Anu Chauhan ◽  
Ram Kumar ◽  
Sudheer Kumar Singh

Branched-chain amino acids (BCAAs) are essential amino acids, but their biosynthetic pathway is absent in mammals. Ketol-acid reductoisomerase (IlvC) is a BCAA biosynthetic enzyme encoded by Rv3001c in Mycobacterium tuberculosis H37Rv (Mtb-Rv) and MRA_3031 in M. tuberculosis H37Ra (Mtb-Ra). IlvCs are essential in Mtb-Rv as well as in Escherichia coli. Compared to the wild-type and IlvC-complemented Mtb-Ra strains, the IlvC knockdown strain showed reduced survival at low pH and under combined low pH and starvation stress. Further, increased expression of IlvC was observed under low pH and starvation stress conditions. To confirm a role for IlvC in pH and starvation stress, we developed an E. coli BL21(DE3) IlvC knockout. The knockout was defective for growth in M9 minimal medium, but growth could be rescued by isoleucine and valine supplementation. Growth was also restored by complementation with overexpression constructs of Mtb-Ra and E. coli IlvCs. The E. coli knockout also had a survival deficit at pH 5.5 and 4.5 and was more susceptible to killing at pH 3.0. Biochemical characterization of Mtb-Ra and E. coli IlvCs confirmed that both have NADPH-dependent activity. In conclusion, this study demonstrates the functional complementation of E. coli IlvC by Mtb-Ra IlvC and also suggests that IlvC has a role in tolerance to low pH and starvation stress.


2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Marius Welzel ◽  
Anja Lange ◽  
Dominik Heider ◽  
Michael Schwarz ◽  
Bernd Freisleben ◽  
...  

Background: Sequencing of marker genes amplified from environmental samples, known as amplicon sequencing, allows us to resolve some of the hidden diversity of complex microbial communities. It also helps elucidate their evolutionary relationships and ecological processes. High-throughput sequencing technologies generate large numbers of samples at high sequencing depths, and analyzing them requires efficient, flexible, and reproducible bioinformatics pipelines. Only a few existing workflows can be run in a user-friendly, scalable, and reproducible manner on different computing devices using an efficient workflow management system.
Results: We present Natrix, an open-source bioinformatics workflow for preprocessing raw amplicon sequencing data. The workflow covers all analysis steps: quality assessment, read assembly, dereplication, chimera detection, split-sample merging, sequence representative assignment (OTUs or ASVs), and taxonomic assignment of the sequence representatives. The workflow is written using Snakemake, a workflow management engine for developing data analysis workflows, and uses Conda to manage the versions of the programs it runs. Snakemake thus ensures reproducibility, while Conda provides version control of the utilized programs. The encapsulation of rules and their dependencies supports hassle-free sharing of rules between workflows and easy adaptation and extension of existing workflows. Natrix is freely available on GitHub (https://github.com/MW55/Natrix) or as a Docker container on DockerHub (https://hub.docker.com/r/mw55/natrix).
Conclusion: Natrix is a user-friendly and highly extensible workflow for processing Illumina amplicon data.
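Dereplication is one of the steps in this workflow. It collapses identical reads into unique sequences annotated with their abundance, so that later steps such as chimera detection and OTU/ASV assignment work on far fewer sequences. The sketch below is minimal and assumes reads are already quality-filtered and assembled. It is not Natrix's implementation, and the min_size parameter is illustrative.

```python
from collections import Counter


def dereplicate(reads: list[str], min_size: int = 1) -> list[tuple[str, int]]:
    """Collapse identical reads into (sequence, abundance) pairs, dropping
    sequences seen fewer than min_size times. Results are sorted by decreasing
    abundance, then by sequence, for a deterministic order."""
    counts = Counter(reads)
    kept = [(seq, n) for seq, n in counts.items() if n >= min_size]
    return sorted(kept, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    reads = ["ACGT", "ACGA", "ACGT", "ACGT", "ACGA", "TTTT"]
    print(dereplicate(reads))              # [('ACGT', 3), ('ACGA', 2), ('TTTT', 1)]
    print(dereplicate(reads, min_size=2))  # [('ACGT', 3), ('ACGA', 2)]
```

Sorting by abundance matters downstream. Abundance-aware chimera detection typically treats the most abundant sequences as candidate parents of rarer ones.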

