Integrative analysis of rare variants and pathway information shows convergent results between immune pathways, drug targets and epilepsy genes

2018 ◽  
Author(s):  
Hoang T. Nguyen ◽  
Amanda Dobbyn ◽  
Alexander W. Charney ◽  
Julien Bryois ◽  
April Kim ◽  
...  

Abstract Trio family and case-control studies of next-generation sequencing data have proven integral to understanding the contribution of rare inherited and de novo single-nucleotide variants to the genetic architecture of complex disease. Ideally, such studies should identify individual risk genes of moderate to large effect size to generate novel treatment hypotheses for further follow-up. However, due to insufficient power, gene set enrichment analyses have come to be relied upon for detecting differences between cases and controls, implicating sets of hundreds of genes rather than specific targets for further investigation. Here, we present a Bayesian statistical framework, termed gTADA, that integrates gene-set membership information with gene-level de novo and rare inherited case-control counts, to prioritize risk genes with excess rare variant burden within enriched gene sets. Applying gTADA to available whole-exome sequencing datasets for several neuropsychiatric conditions, we replicated previously reported gene set enrichments and identified novel risk genes. For epilepsy (EPI), gTADA prioritized 40 risk genes (posterior probabilities > 0.95), 6 of which replicate in an independent whole-genome sequencing study. In addition, 30 of the 40 genes are novel. We found that epilepsy genes had high protein-protein interaction (PPI) network connectivity and showed specific expression during human brain development. Some of the top prioritized EPI genes were connected to a PPI subnetwork of immune genes and showed specific expression in prenatal microglia. We also identified multiple enriched drug-target gene sets for EPI, which included immunostimulants as well as known antiepileptics. Immune biology was supported specifically by case-control variants from familial epilepsies rather than de novo mutations in generalized encephalitic epilepsy.
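The posterior-probability threshold mentioned in the abstract can be illustrated with the generic two-hypothesis form of Bayes' rule. This is a minimal sketch, not the gTADA implementation: the function name `posterior_probability` and the example numbers are hypothetical, and the abstract does not say how gTADA derives the prior or the Bayes factor for a gene.

```python
def posterior_probability(prior: float, bayes_factor: float) -> float:
    """Posterior probability that a gene is a risk gene.

    Generic Bayes' rule for two hypotheses (risk gene vs. non-risk gene).
    `prior` is the prior probability of being a risk gene and
    `bayes_factor` is the evidence ratio in favour of the risk model.
    Both inputs are illustrative assumptions, not values from the paper.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    if bayes_factor < 0.0:
        raise ValueError("bayes_factor must be non-negative")
    weighted = prior * bayes_factor
    return weighted / (weighted + (1.0 - prior))


if __name__ == "__main__":
    # A hypothetical gene with a 5% prior and strong variant evidence.
    print(posterior_probability(prior=0.05, bayes_factor=361.0))
```

Under this form, a gene-set-informed model would raise the prior for genes inside an enriched set, so the same variant evidence yields a higher posterior.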

2020 ◽  
Author(s):  
Roozbeh Manshaei ◽  
Daniele Merico ◽  
Miriam S. Reuter ◽  
Worrawat Engchuan ◽  
Bahareh A. Mojarad ◽  
...  

Abstract Recent genome-wide studies of rare genetic variants have begun to implicate novel mechanisms for tetralogy of Fallot (TOF), a severe congenital heart defect (CHD). To provide statistical support for case-only data without parental genomes, we re-analyzed genome sequences of 231 individuals with TOF or related CHD. We adapted a burden test originally developed for de novo variants to assess singleton variant burden in individual genes, and in gene-sets corresponding to functional pathways and mouse phenotypes, accounting for highly correlated gene-sets, and for multiple testing. The gene burden test identified a significant burden of deleterious missense variants in NOTCH1 (Bonferroni-corrected p-value <0.01). These NOTCH1 variants showed significant enrichment for those affecting the extracellular domain, and especially for disruption of cysteine residues forming disulfide bonds (OR 39.8 vs gnomAD). Individuals with NOTCH1 variants, all with TOF, were enriched for positive family history of CHD. Other genes not previously implicated in TOF had more modest statistical support and singleton missense variant results were non-significant for gene-set burden. For singleton truncating variants, the gene burden test confirmed significant burden in FLT4. Gene-set burden tests identified a cluster of pathways corresponding to VEGF signaling (FDR=0%), and of mouse phenotypes corresponding to abnormal vasculature (FDR=0.8%), that suggested additional candidate genes not previously identified (e.g., WNT5A and ZFAND5). Analyses using unrelated sequencing datasets supported specificity of the findings for CHD. The findings support the importance of ultra-rare variants disrupting genes involved in VEGF and NOTCH signaling in the genetic architecture of TOF. These proof-of-principle data indicate that this statistical methodology could assist in analyzing case-only sequencing data in which ultra-rare variants, whether de novo or inherited, contribute to the genetic etiopathogenesis of a complex disorder.
Author summary We analyzed the ultra-rare nonsynonymous variant burden for genome sequencing data from 231 individuals with congenital heart defects, most with tetralogy of Fallot. We adapted a burden test originally developed for de novo variants. In line with other studies, we identified a significant truncating variant burden for FLT4 and deleterious missense burden for NOTCH1, both passing a stringent Bonferroni multiple-test correction. For NOTCH1, we observed frequent disruption of cysteine residues establishing disulfide bonds in the extracellular domain. We also identified genes with BH-FDR <10% that were not previously implicated. To overcome limited power for individual genes, we tested gene-sets corresponding to functional pathways and mouse phenotypes. Gene-set burden of truncating variants was significant for vascular endothelial growth factor signaling and abnormal vasculature phenotypes. These results confirmed previous findings and suggested additional candidate genes for experimental validation in future studies. This methodology can be extended to other case-only sequencing data in which ultra-rare variants make a substantial contribution to genetic etiology.
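The two multiple-testing corrections named in this abstract, Bonferroni and Benjamini-Hochberg FDR (BH-FDR), can be sketched as follows. This is a textbook illustration under assumed inputs, not the authors' pipeline; the function names and the example p-values are hypothetical.

```python
def bonferroni(pvalues):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical per-gene burden p-values.
    gene_pvalues = [0.01, 0.04, 0.03, 0.005]
    print(bonferroni(gene_pvalues))
    print(benjamini_hochberg(gene_pvalues))
```

Bonferroni controls the chance of any false positive and is the stricter of the two, which is consistent with the abstract reporting only FLT4 and NOTCH1 at that level and further genes at BH-FDR <10%.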


Author(s):  
Tan-Hoang Nguyen ◽  
Xin He ◽  
Ruth C Brown ◽  
Bradley T Webb ◽  
Kenneth S Kendler ◽  
...  

Abstract Motivation: Rare variant-based analyses are beginning to identify risk genes for neuropsychiatric disorders and other diseases. However, the identified genes only account for a fraction of predicted causal genes. Recent studies have shown that rare damaging variants are significantly enriched in specific gene-sets. Methods that jointly model rare variants and gene-sets to identify enriched gene-sets, and that use these enriched gene-sets to prioritize additional risk genes, could improve understanding of the genetic architecture of diseases. Results: We propose DECO (Integrated analysis of de novo mutations, rare case/control variants and omics information via gene-sets), an integrated method for rare-variant and gene-set analysis. The method can (i) test the enrichment of gene-sets directly within the statistical model, and (ii) use enriched gene-sets to rank existing genes and prioritize additional risk genes for tested disorders. In simulations, DECO performs better than a homologous method that uses only variant data. To demonstrate the application of the proposed protocol, we have applied this approach to rare-variant datasets of schizophrenia. Compared with a method which only uses variant information, DECO is able to prioritize additional risk genes. Availability: DECO can be used to analyze rare variants and biological pathways or cell types for any disease. The package is available on GitHub https://github.com/hoangtn/DECO.


2021 ◽  
Vol 22 (S10) ◽  
Author(s):  
Zhenmiao Zhang ◽  
Lu Zhang

Abstract Background Due to the complexity of microbial communities, de novo assembly on next generation sequencing data is commonly unable to produce complete microbial genomes. Metagenome assembly binning becomes an essential step that could group the fragmented contigs into clusters to represent microbial genomes based on contigs’ nucleotide compositions and read depths. These features work well on the long contigs, but are not stable for the short ones. Contigs can be linked by sequence overlap (assembly graph) or by the paired-end reads aligned to them (PE graph), where the linked contigs have a high chance of being derived from the same cluster. Results We developed METAMVGL, a multi-view graph-based metagenomic contig binning algorithm that integrates both assembly and PE graphs. It could rescue the short contigs and correct the binning errors from dead ends. METAMVGL learns the two graphs’ weights automatically and predicts the contig labels in a uniform multi-view label propagation framework. In experiments, we observed that METAMVGL made use of significantly more high-confidence edges from the combined graph and linked dead ends to the main graph. It also outperformed many state-of-the-art contig binning algorithms, including MaxBin2, MetaBAT2, MyCC, CONCOCT, SolidBin and GraphBin on the metagenomic sequencing data from simulation, two mock communities and Sharon infant fecal samples. Conclusions Our findings demonstrate that METAMVGL markedly improves short contig binning and outperforms the other existing contig binning tools on the metagenomic sequencing data from simulation, mock communities and infant fecal samples.
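The idea of propagating bin labels over a combination of the assembly graph and the PE graph can be sketched with a simple weighted-vote scheme. This is a simplified illustration, not METAMVGL itself: the abstract states that the tool learns the two graphs' weights automatically, whereas the sketch below takes fixed weights as parameters, and all names (`propagate_labels`, `w_assembly`, `w_pe`, the contig and bin identifiers) are hypothetical.

```python
from collections import defaultdict


def propagate_labels(assembly_edges, pe_edges, seed_labels,
                     w_assembly=0.5, w_pe=0.5, max_iters=10):
    """Spread bin labels from labelled contigs to unlabelled neighbours.

    Edges from both graphs are merged into one weighted graph. In each
    round, every unlabelled contig takes the label with the largest total
    edge weight among its already-labelled neighbours.
    """
    weights = defaultdict(float)
    for u, v in assembly_edges:
        weights[(u, v)] += w_assembly
        weights[(v, u)] += w_assembly
    for u, v in pe_edges:
        weights[(u, v)] += w_pe
        weights[(v, u)] += w_pe

    neighbors = defaultdict(list)
    for (u, v), w in weights.items():
        neighbors[u].append((v, w))

    labels = dict(seed_labels)
    for _ in range(max_iters):
        updates = {}
        for node in list(neighbors):
            if node in labels:
                continue
            votes = defaultdict(float)
            for nb, w in neighbors[node]:
                if nb in labels:
                    votes[labels[nb]] += w
            if votes:
                # Sorting first makes tie-breaking deterministic.
                updates[node] = max(sorted(votes), key=votes.get)
        if not updates:
            break
        labels.update(updates)
    return labels


if __name__ == "__main__":
    # c3 is a dead end in the assembly graph and is reached only via a PE edge.
    result = propagate_labels(
        assembly_edges=[("c1", "c2")],
        pe_edges=[("c2", "c3")],
        seed_labels={"c1": "binA"},
    )
    print(result)
```

In the usage example, the contig `c3` has no assembly-graph edge and would stay unlabelled on that graph alone; the PE edge is what lets the label reach it.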


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Lidong Guo ◽  
Mengyang Xu ◽  
Wenchao Wang ◽  
Shengqiang Gu ◽  
Xia Zhao ◽  
...  

Abstract Background Synthetic long reads (SLR) with long-range co-barcoding information are now widely applied in genomics research. Although several tools have been developed for each specific SLR technique, a robust standalone scaffolder with high efficiency is warranted for hybrid genome assembly. Results In this work, we developed a standalone scaffolding tool, SLR-superscaffolder, to link together contigs in draft assemblies using co-barcoding and paired-end read information. Our top-to-bottom scheme first builds a global scaffold graph based on Jaccard Similarity to determine the order and orientation of contigs, and then locally improves the scaffolds with the aid of paired-end information. We also exploited a screening algorithm to reduce the negative effect of misassembled contigs in the input assembly. We applied SLR-superscaffolder to a human single tube long fragment read sequencing dataset and increased the scaffold NG50 of its corresponding draft assembly 1349 fold. Moreover, benchmarking on different input contigs showed that this approach overall outperformed existing SLR scaffolders, providing longer contiguity and fewer misassemblies, especially for short contigs assembled by next-generation sequencing data. The open-source code of SLR-superscaffolder is available at https://github.com/BGI-Qingdao/SLR-superscaffolder. Conclusions SLR-superscaffolder can dramatically improve the contiguity of a draft assembly by integrating a hybrid assembly strategy.
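The Jaccard similarity used to build the scaffold graph can be illustrated on the sets of barcodes observed on two contigs. This is a minimal sketch of the standard set-based definition; how SLR-superscaffolder bins contigs or thresholds the score is not stated in the abstract, and the function name and barcode values below are hypothetical.

```python
def jaccard_similarity(barcodes_a, barcodes_b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two barcode collections."""
    a, b = set(barcodes_a), set(barcodes_b)
    union = a | b
    if not union:
        # Two empty sets share no evidence; treat the similarity as zero.
        return 0.0
    return len(a & b) / len(union)


if __name__ == "__main__":
    # Hypothetical co-barcodes seen on two contigs.
    contig_1 = {"bc01", "bc02", "bc03"}
    contig_2 = {"bc02", "bc03", "bc04"}
    print(jaccard_similarity(contig_1, contig_2))
```

Contigs that originate from nearby positions on the same long fragment tend to share barcodes, so a higher score suggests adjacency in the scaffold.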


2021 ◽  
Author(s):  
Jet van der Spek ◽  
Joery den Hoed ◽  
Lot Snijders Blok ◽  
Alexander J. M. Dingemans ◽  
Dick Schijven ◽  
...  

Interpretation of next-generation sequencing data of individuals with an apparent sporadic neurodevelopmental disorder (NDD) often focusses on pathogenic variants in genes associated with NDD, assuming full clinical penetrance with limited variable expressivity. Consequently, inherited variants in genes associated with dominant disorders may be overlooked when the transmitting parent is clinically unaffected. While de novo variants explain a substantial proportion of cases with NDDs, a significant number remains undiagnosed possibly explained by coding variants associated with reduced penetrance and variable expressivity. We characterized twenty families with inherited heterozygous missense or protein-truncating variants (PTVs) in CHD3, a gene in which de novo variants cause Snijders Blok-Campeau syndrome, characterized by intellectual disability, speech delay and recognizable facial features (SNIBCPS). Notably, the majority of the inherited CHD3 variants were maternally transmitted. Computational facial and human phenotype ontology-based comparisons demonstrated that the phenotypic features of probands with inherited CHD3 variants overlap with the phenotype previously associated with de novo variants in the gene, while carrier parents are mildly or not affected, suggesting variable expressivity. Additionally, similarly reduced expression levels of CHD3 protein in cells of an affected proband and of related healthy carriers with a CHD3 PTV, suggested that compensation of expression from the wildtype allele is unlikely to be an underlying mechanism. Our results point to a significant role of inherited variation in SNIBCPS, a finding that is critical for correct variant interpretation and genetic counseling and warrants further investigation towards understanding the broader contributions of such variation to the landscape of human disease.


Cancers ◽  
2020 ◽  
Vol 12 (5) ◽  
pp. 1159 ◽  
Author(s):  
Franz J. Gassner ◽  
Nadja Zaborsky ◽  
Daniel Feldbacher ◽  
Richard Greil ◽  
Roland Geisberger

Chronic lymphocytic leukemia (CLL) is a high incidence B cell leukemia with a highly variable clinical course, leading to survival times ranging from months to several decades. MicroRNAs (miRNAs) are small non-coding RNAs that regulate the expression levels of genes by binding to the untranslated regions of transcripts. Although miRNAs have been previously shown to play a crucial role in CLL development, progression and treatment resistance, their further processing and diversification by RNA editing (specifically adenosine to inosine or cytosine to uracil deamination) has not been addressed so far. In this study, we analyzed next generation sequencing data to provide a detailed map of adenosine to inosine and cytosine to uracil changes in miRNAs from CLL and normal B cells. Our results reveal that in addition to a CLL-specific expression pattern, there is also specific RNA editing of many miRNAs, particularly miR-3157 and miR-6503, in CLL. Our data shed further light on how miRNAs and miRNA editing might be implicated in the pathogenesis of the disease.


2017 ◽  
Author(s):  
Deidre R. Krupp ◽  
Rebecca A. Barnard ◽  
Yannis Duffourd ◽  
Sara A. Evans ◽  
Ryan M. Mulqueen ◽  
...  

Abstract Genetic risk factors for autism spectrum disorder (ASD) have yet to be fully elucidated. Postzygotic mosaic mutations (PMMs) have been implicated in several neurodevelopmental disorders and overgrowth syndromes. We systematically evaluated PMMs by leveraging whole-exome sequencing data on a large family-based ASD cohort, the Simons Simplex Collection. We found evidence that 11% of published single nucleotide variant (SNV) de novo mutations are potentially PMMs. We then developed a robust SNV PMM calling approach that leverages complementary callers, logistic regression modeling, and additional heuristics. Using this approach, we recalled SNVs and found that 22% of de novo mutations likely occur as PMMs in children. Unexpectedly, we found a significant burden of synonymous PMMs in probands that are predicted to alter splicing. We found no evidence of missense PMM burden in the full cohort. However, we did observe increased signal for missense PMMs in families without germline mutations in probands, which strengthens in genes intolerant to mutations. We also determined that 7-11% of parental mosaics are transmitted to children. Parental mosaic mutations make up 6.8% of all mutations newly germline in children, which has important implications for recurrence risk. PMMs intersect previously implicated high confidence and other ASD candidate risk genes, further suggesting that this class of mutations contributes to ASD risk. We also identified PMMs in novel candidate risk genes involved with chromatin remodeling or neurodevelopment. We estimate that PMMs contribute risk to 4-8% of simplex ASD cases. Overall, these findings argue for future studies of PMMs in ASD and related disorders.


2019 ◽  
Author(s):  
Soeren Lukassen ◽  
Foo Wei Ten ◽  
Roland Eils ◽  
Christian Conrad

Abstract Recent advances in single-cell RNA sequencing (scRNA-Seq) have driven the simultaneous measurement of the expression of 1,000s of genes in 1,000s of single cells. These growing data sets allow us to model gene sets in biological networks at an unprecedented level of detail, in spite of heterogeneous cell populations. Here, we propose an unsupervised deep neural network model that is a hybrid of matrix factorization and conditional variational autoencoders (CVA), which utilizes weights as matrix factorizations to obtain gene sets, while class-specific inputs to the latent variable space facilitate a plausible identification of cell types. This artificial neural network model seamlessly integrates functional gene set inference, experimental batch effect correction, and static gene identification, which we conceptually prove here for three single-cell RNA-Seq datasets and suggest for future single-cell-gene analytics.


2021 ◽  
Author(s):  
Gelana Khazeeva ◽  
Karolis Sablauskas ◽  
Bart van der Sanden ◽  
Wouter Steyaert ◽  
Michael Kwint ◽  
...  

De novo mutations (DNMs) are an important cause of genetic disorders. The accurate identification of DNMs from sequencing data is therefore fundamental to rare disease research and diagnostics. Unfortunately, identifying reliable DNMs remains a major challenge due to sequence errors, uneven coverage, and mapping artifacts. Here, we developed a deep convolutional neural network (CNN) DNM caller (DeNovoCNN) that encodes alignment of sequence reads for a trio as 160×164 resolution images. DeNovoCNN was trained on DNMs of whole exome sequencing (WES) of 2003 trios, achieving on average 99.2% recall and 93.8% precision. We find that DeNovoCNN has increased recall/sensitivity and precision compared to existing de novo calling approaches (GATK, DeNovoGear, Samtools) based on the Genome in a Bottle reference dataset. Sanger validations of DNMs called in both exome and genome datasets confirm that DeNovoCNN outperforms existing methods. Most importantly, we show that DeNovoCNN is robust against different exome sequencing and analysis approaches, thereby allowing it to be applied to other datasets. DeNovoCNN is freely available and can be run on existing alignment (BAM/CRAM) and variant calling (VCF) files from WES and WGS without a need for variant recalling.
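The recall and precision figures quoted for DNM calling follow the standard definitions based on true positives, false positives and false negatives. The sketch below shows those definitions only; the function name and the example counts are hypothetical and are not taken from the DeNovoCNN evaluation.

```python
def precision_recall(true_pos: int, false_pos: int, false_neg: int):
    """Return (precision, recall) for a set of variant calls.

    precision = TP / (TP + FP): fraction of called DNMs that are real.
    recall    = TP / (TP + FN): fraction of real DNMs that were called.
    A ratio with an empty denominator is reported as 0.0.
    """
    called = true_pos + false_pos
    real = true_pos + false_neg
    precision = true_pos / called if called else 0.0
    recall = true_pos / real if real else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical counts from comparing calls against a validated truth set.
    precision, recall = precision_recall(true_pos=90, false_pos=10, false_neg=30)
    print(f"precision={precision:.3f} recall={recall:.3f}")
```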

