CeMbio - The C. elegans microbiome resource

2020 ◽  
Author(s):  
Philipp Dirksen ◽  
Adrien Assié ◽  
Johannes Zimmermann ◽  
Fan Zhang ◽  
Adina-Malin Tietje ◽  
...  

Abstract The study of microbiomes by sequencing has revealed a plethora of correlations between microbial community composition and various life-history characteristics of the corresponding host species. However, inferring causation from correlation is often hampered by the sheer compositional complexity of microbiomes, even in simple organisms. Synthetic communities offer an effective approach to infer cause-effect relationships in host-microbiome systems. Yet the available communities suffer from several drawbacks, such as artificial (thus non-natural) choice of microbes, microbe-host mismatch (e.g., human microbes in gnotobiotic mice), or hosts lacking genetic tractability. Here we introduce CeMbio, a simplified natural Caenorhabditis elegans microbiota derived from our previous meta-analysis of the natural microbiome of this nematode. The CeMbio resource is amenable to all strengths of the C. elegans model system, strains included are readily culturable, they all colonize the worm gut individually, and comprise a robust community that distinctly affects nematode life-history. Several tools have additionally been developed for the CeMbio strains, including diagnostic PCR primers, completely sequenced genomes, and metabolic network models. With CeMbio, we provide a versatile resource and toolbox for the in-depth dissection of naturally relevant host-microbiome interactions in C. elegans. Dataset accession numbers: Whole genome sequencing data (PRJNA624308); microbiome sequencing [PRJEB37101 and PRJEB37035]; data supplement on the GSA Figshare Portal.

2020 ◽  
Vol 10 (9) ◽  
pp. 3025-3039 ◽  
Author(s):  
Philipp Dirksen ◽  
Adrien Assié ◽  
Johannes Zimmermann ◽  
Fan Zhang ◽  
Adina-Malin Tietje ◽  
...  

Abstract The study of microbiomes by sequencing has revealed a plethora of correlations between microbial community composition and various life-history characteristics of the corresponding host species. However, inferring causation from correlation is often hampered by the sheer compositional complexity of microbiomes, even in simple organisms. Synthetic communities offer an effective approach to infer cause-effect relationships in host-microbiome systems. Yet the available communities suffer from several drawbacks, such as artificial (thus non-natural) choice of microbes, microbe-host mismatch (e.g., human microbes in gnotobiotic mice), or hosts lacking genetic tractability. Here we introduce CeMbio, a simplified natural Caenorhabditis elegans microbiota derived from our previous meta-analysis of the natural microbiome of this nematode. The CeMbio resource is amenable to all strengths of the C. elegans model system, strains included are readily culturable, they all colonize the worm gut individually, and comprise a robust community that distinctly affects nematode life-history. Several tools have additionally been developed for the CeMbio strains, including diagnostic PCR primers, completely sequenced genomes, and metabolic network models. With CeMbio, we provide a versatile resource and toolbox for the in-depth dissection of naturally relevant host-microbiome interactions in C. elegans.


2021 ◽  
Author(s):  
Guhan Ram Venkataraman ◽  
Yosuke Tanigawa ◽  
Matti Pirinen ◽  
Manuel A Rivas

Rare-variant aggregate analysis from exome and whole genome sequencing data typically summarizes with a single statistic the signal for a gene or the unit that is being aggregated. However, when doing so, the effect profile within the unit may not be easily characterized across one or multiple phenotypes. Here, we present an approach we call Multiple Rare-Variants and Phenotypes Mixture Model (MRPMM), which clusters rare variants into groups based on their effects on the multivariate phenotype and makes statistical inferences about the properties of the underlying mixture of genetic effects. Using summary statistic data from a meta-analysis of exome sequencing data of 184,698 individuals in the UK Biobank across 6 populations, we demonstrate that our mixture model can identify clusters of variants responsible for significantly disparate effects across a multivariate phenotype; we study three lipid and three renal traits separately. The method is able to estimate (1) the proportion of non-null variants, (2) whether variants with the same predicted consequence in one gene behave similarly, (3) whether variants across genes share effect profiles across the multivariate phenotype, and (4) whether different annotations differ in the magnitude of their effects. As rare-variant data and aggregation techniques become more common, this method can be used to ascribe further meaning to association results.


2021 ◽  
Vol 5 (Supplement_2) ◽  
pp. 952-952
Author(s):  
Kenneth Westerman ◽  
Maura Walker ◽  
Jordi Merino ◽  
Alisa Manning

Abstract Objectives: Identification of robust gene-diet interactions (GDIs) impacting cardiometabolic traits has been limited due to low statistical power and poor replication across populations. Emerging statistical methods increase power by simultaneously testing genetic interactions with multiple exposures, an especially appealing strategy for complex and highly correlated dietary traits. Furthermore, meta-analysis across ancestrally and behaviorally diverse populations may allow for more robust discoveries. Here, our objective was to leverage multi-exposure interaction tests in a diverse set of cohorts to identify interactions involving macronutrient ratios and impacting multiple biomarkers of glycemia. Methods: Seven cohorts from the TOPMed consortium (total N ∼ 20,000) contributed whole-genome sequencing data, self-reported dietary data, and glycemic trait measurements (fasting glucose and insulin (FG and FI) and hemoglobin A1c). Three macronutrient ratios were defined to model realistic dietary exchanges and minimize collinearity: carbohydrate:fat, polyunsaturated:saturated fat, and fiber:carbohydrate. For each glycemic trait outcome and common genetic variant, we fit a model including all three ratios and their genetic interactions, with joint significance testing in each cohort followed by cross-cohort meta-analysis. Results: Four variants showed promising sub-threshold signals (p < 1 × 10⁻⁷), though none reached genome-wide significance. For example, diet ratios collectively interacted with rs2276620 in the oxysterol-binding OSBPL6 gene to influence FG (p = 6.5 × 10⁻⁸) and rs114448070 in the beta cell function-associated KAT2B gene (p = 5.4 × 10⁻⁸) to influence FI. These associations were contributed to by multiple cohorts and multiple dietary factors, emphasizing the value of this multi-cohort and multi-exposure approach to GDI variant detection. Conclusions: Our approach takes advantage of population diversity and multi-faceted dietary exposures to understand genetic effects on the diet-glycemia relationship and support the development of genome-based precision nutrition. Our results will be updated with additional cohorts and gene-based tests using rare genetic variants, which are less common but may have stronger effects. Funding Sources: KEW is supported by an NIDDK T32 award.


Author(s):  
Eric S Tvedte ◽  
Mark Gasser ◽  
Benjamin C Sparklin ◽  
Jane Michalski ◽  
Carl E Hjelmen ◽  
...  

Abstract The newest generation of DNA sequencing technology is highlighted by the ability to generate sequence reads hundreds of kilobases in length. Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) have pioneered competitive long read platforms, with more recent work focused on improving sequencing throughput and per-base accuracy. We used whole-genome sequencing data produced by three PacBio protocols (Sequel II CLR, Sequel II HiFi, RS II) and two ONT protocols (Rapid Sequencing and Ligation Sequencing) to compare assemblies of the bacterium Escherichia coli and the fruit fly Drosophila ananassae. In both organisms tested, Sequel II assemblies had the highest consensus accuracy, even after accounting for differences in sequencing throughput. ONT and PacBio CLR had the longest reads sequenced compared to PacBio RS II and HiFi, and genome contiguity was highest when assembling these datasets. ONT Rapid Sequencing libraries had the fewest chimeric reads in addition to superior quantification of E. coli plasmids versus ligation-based libraries. The quality of assemblies can be enhanced by adopting hybrid approaches using Illumina libraries for bacterial genome assembly or polishing eukaryotic genome assemblies, and an ONT-Illumina hybrid approach would be more cost-effective for many users. Genome-wide DNA methylation could be detected using both technologies; however, ONT libraries enabled the identification of a broader range of known E. coli methyltransferase recognition motifs in addition to undocumented D. ananassae motifs. The ideal choice of long read technology may depend on several factors including the question or hypothesis under examination. No single technology outperformed others in all metrics examined.


Author(s):  
Johanna L. Jones ◽  
Mark A. Corbett ◽  
Elise Yeaman ◽  
Duran Zhao ◽  
Jozef Gecz ◽  
...  

Abstract Inherited paediatric cataract is a rare Mendelian disease that results in visual impairment or blindness due to a clouding of the eye’s crystalline lens. Here we report an Australian family with isolated paediatric cataract, which we had previously mapped to Xq24. Linkage at Xq24–25 (LOD = 2.53) was confirmed, and the region refined with a denser marker map. In addition, two autosomal regions with suggestive evidence of linkage were observed. A segregating 127 kb deletion (chrX:g.118373226_118500408del) in the Xq24–25 linkage region was identified from whole-genome sequencing data. This deletion completely removed a commonly deleted long non-coding RNA gene LOC101928336 and truncated the protein coding progesterone receptor membrane component 1 (PGRMC1) gene following exon 1. A literature search revealed a report of two unrelated males with non-syndromic intellectual disability, as well as congenital cataract, who had contiguous gene deletions that accounted for their intellectual disability but also disrupted the PGRMC1 gene. A morpholino-induced pgrmc1 knockdown in a zebrafish model produced significant cataract formation, supporting a role for PGRMC1 in lens development and cataract formation. We hypothesise that the loss of PGRMC1 causes cataract through disrupted PGRMC1-CYP51A1 protein–protein interactions and altered cholesterol biosynthesis. The cause of paediatric cataract in this family is the truncating deletion of PGRMC1, which we report as a novel cataract gene.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Zhongbo Chen ◽  
David Zhang ◽  
Regina H. Reynolds ◽  
Emil K. Gustavsson ◽  
...  

Abstract Knowledge of genomic features specific to the human lineage may provide insights into brain-related diseases. We leverage high-depth whole genome sequencing data to generate a combined annotation identifying regions simultaneously depleted for genetic variation (constrained regions) and poorly conserved across primates. We propose that these constrained, non-conserved regions (CNCRs) have been subject to human-specific purifying selection and are enriched for brain-specific elements. We find that CNCRs are depleted from protein-coding genes but enriched within lncRNAs. We demonstrate that per-SNP heritability of a range of brain-relevant phenotypes is enriched within CNCRs. We find that genes implicated in neurological diseases have high CNCR density, including APOE, highlighting an unannotated intron-3 retention event. Using human brain RNA-sequencing data, we show the intron-3-retaining transcript to be more abundant in Alzheimer’s disease with more severe tau and amyloid pathological burden. Thus, we demonstrate potential association of human-lineage-specific sequences in brain development and neurological disease.

