A statistical framework for mapping risk genes from de novo mutations in whole-genome sequencing studies

2016 ◽  
Author(s):  
Yuwen Liu ◽  
Yanyu Liang ◽  
A. Ercument Cicek ◽  
Zhongshan Li ◽  
Jinchen Li ◽  
...  

Abstract Analysis of de novo mutations (DNMs) from sequencing data of nuclear families has identified risk genes for many complex diseases, including multiple neurodevelopmental and psychiatric disorders. Most of these efforts have focused on mutations in protein-coding sequences. Evidence from genome-wide association studies (GWAS) strongly suggests that variants important to human diseases often lie in non-coding regions. Extending DNM-based approaches to non-coding sequences is, however, challenging because the functional significance of non-coding mutations is difficult to predict. We propose a new statistical framework for analyzing DNMs from whole-genome sequencing (WGS) data. This method, TADA-Annotations (TADA-A), is a major advance over the TADA method we developed earlier for DNM analysis in coding regions. TADA-A can incorporate many functional annotations, such as conservation and enhancer marks; learn from the data which annotations are informative of pathogenic mutations; and combine coding and non-coding mutations at the gene level to detect risk genes. It also supports meta-analysis of multiple DNM studies while adjusting for study-specific technical effects. We applied TADA-A to WGS data from ∼300 autism family trios across five studies and discovered several new autism risk genes. The software is freely available for all research uses.
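Gene-level evidence in TADA-style analyses is a Bayes factor comparing a risk model against a null model of de novo counts. The sketch below assumes a simplified, coding-only version of that idea: under the null, a gene's DNM count x across N trios is Poisson with mean 2Nμ (μ is a per-gene, per-haploid-genome mutation rate); under the risk model the rate is scaled by a relative risk γ drawn from a Gamma prior with mean γ̄ and rate β. It leaves out TADA-A's annotation weights and meta-analysis adjustments, and the function names and default γ̄, β values are hypothetical.

```python
import math


def log_bayes_factor(x: int, lam: float, gamma_bar: float, beta: float) -> float:
    """Log Bayes factor (risk vs. null) for x de novo mutations in one gene.

    Assumed model, for illustration only:
      null:  x ~ Poisson(lam)
      risk:  x ~ Poisson(lam * gamma), gamma ~ Gamma(shape=gamma_bar * beta, rate=beta)
    Integrating gamma out gives a negative binomial marginal, so the
    Bayes factor has the closed form computed here in log space.
    """
    a, b = gamma_bar * beta, beta
    return (
        math.lgamma(x + a)
        - math.lgamma(a)
        + a * math.log(b / (b + lam))
        - x * math.log(b + lam)
        + lam
    )


def bayes_factor(x: int, n_trios: int, mu: float,
                 gamma_bar: float = 20.0, beta: float = 1.0) -> float:
    """Bayes factor for a gene with per-haploid-genome mutation rate mu."""
    lam = 2 * n_trios * mu  # expected DNM count under the null (two genomes per child)
    return math.exp(log_bayes_factor(x, lam, gamma_bar, beta))


if __name__ == "__main__":
    # Illustrative inputs only: ~300 trios and a hypothetical gene with mu = 1e-5.
    for count in range(4):
        print(count, round(bayes_factor(count, n_trios=300, mu=1e-5), 2))
```

Each additional observed DNM multiplies the Bayes factor by (x + γ̄β)/(β + 2Nμ), which is why even two or three hits in a low-mutability gene can carry strong evidence under this kind of model.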

2018 ◽  
Vol 102 (6) ◽  
pp. 1031-1047 ◽  
Author(s):  
Yuwen Liu ◽  
Yanyu Liang ◽  
A. Ercument Cicek ◽  
Zhongshan Li ◽  
Jinchen Li ◽  
...  

2018 ◽  
Author(s):  
Elizabeth K. Ruzzo ◽  
Laura Pérez-Cano ◽  
Jae-Yoon Jung ◽  
Lee-kai Wang ◽  
Dorna Kashef-Haghighi ◽  
...  

Abstract Genetic studies of autism spectrum disorder (ASD) have revealed a complex, heterogeneous architecture in which the contribution of rare inherited variation remains relatively unexplored. We performed whole-genome sequencing (WGS) in 2,308 individuals from families containing multiple affected children, including analysis of single nucleotide variants (SNVs) and structural variants (SVs). We identified 16 new ASD-risk genes, including many supported by inherited variation, and provide statistical support for 69 genes in total, including previously implicated genes. These risk genes are enriched in pathways involving negative regulation of synaptic transmission and organelle organization. We identify a significant protein-protein interaction (PPI) network seeded by inherited, predicted damaging variants disrupting highly constrained genes, including members of the BAF complex and established ASD risk genes. Analysis of WGS also identified SVs affecting non-coding regulatory regions in developing human brain, implicating NR3C2 and a recurrent 2.5 kb deletion within the promoter of DLG2. These data support studying multiplex families to identify inherited risk for ASD. We provide these data through the Hartwell Autism Research and Technology Initiative (iHART), an open-access cloud-computing repository for ASD genetics research.
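Deciding whether a structural variant affects a promoter, as with the DLG2 deletion above, comes down to an interval-overlap test against a strand-aware window around each transcription start site (TSS). The sketch assumes BED-style half-open, 0-based coordinates and a promoter of 2 kb upstream and 500 bp downstream of the TSS; those widths, the class and function names, and the gene and coordinates in the usage example are hypothetical, not the study's annotation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive (BED-style half-open)

    def overlaps(self, other: "Interval") -> bool:
        return self.chrom == other.chrom and self.start < other.end and other.start < self.end


def promoter(chrom: str, tss: int, strand: str,
             upstream: int = 2000, downstream: int = 500) -> Interval:
    """Hypothetical promoter window; the default widths are assumptions.

    On '+' the upstream bases lie at lower coordinates than the TSS;
    on '-' they lie at higher coordinates, so the window is mirrored.
    """
    if strand == "+":
        return Interval(chrom, max(0, tss - upstream), tss + downstream)
    return Interval(chrom, max(0, tss - downstream + 1), tss + upstream + 1)


def deletions_hitting_promoters(deletions, promoters):
    """Yield (deletion, gene) pairs where a deletion overlaps a promoter window."""
    for d in deletions:
        for gene, window in promoters.items():
            if d.overlaps(window):
                yield d, gene


if __name__ == "__main__":
    # Hypothetical gene and two 2.5 kb deletions; only the first overlaps the promoter.
    windows = {"GENE_A": promoter("chr1", 10_000, "+")}
    dels = [Interval("chr1", 7_000, 9_500), Interval("chr1", 20_000, 22_500)]
    print(list(deletions_hitting_promoters(dels, windows)))
```

A real analysis would use an interval tree or a sorted sweep rather than this nested loop, but the overlap predicate stays the same.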


2021 ◽  
Vol 5 (Supplement_1) ◽  
pp. A501-A502
Author(s):  
Michael P Dougherty ◽  
Lynn P Chorich ◽  
Lawrence Clarke Layman

Abstract Introduction: Mayer-Rokitansky-Küster-Hauser (MRKH) syndrome is characterized by the congenital absence of the uterus and vagina in 46,XX individuals. A subset of these patients also has associated renal, skeletal, cardiac and/or auditory defects. Familial cases suggest a genetic component, but to date only pathogenic variants in WNT4 and HNF1B have been confirmed. We hypothesize that de novo heterozygous variants in candidate genes will be present in some patients with MRKH. Methods: DNA samples from 30 quads (an MRKH proband and three relatives) underwent whole genome sequencing (WGS), and heterozygous variants in coding regions with a frequency < 0.02 were filtered by two different methods. In the first approach, variants were filtered by 1) top consequence (splice site, stop-gain, frameshift, and missense, respectively); 2) impact score; 3) mapping quality; 4) cytobands; 5) intolerance; 6) de novo status; and 7) plausibility based on familial genotype. The second approach considered only heterozygous variants found in the proband and absent in all other family members, which were then filtered by top consequence (splice donor and acceptor sites, stop-gain, frameshift). Results: Five pedigrees were excluded for inadequate sequence in one or more individuals. In the 25 remaining quads, 55,033 variants in coding regions with < 2% frequency were identified for analysis. Using the first approach, 42 candidate variants in 32 genes were identified: 12 splice variants, 10 stop-gains, 15 frameshift variants and 5 missense variants. Of these, MUC22 contained 2 missense variants from different families. Additionally, DICER1, which is essential for mouse urogenital tract development, had multiple splice variants. In the second approach, 39 candidate genes were identified: 6 splice variants in 6 genes, 18 stop-gains in 17 genes, and 17 frameshift variants in 16 genes. Zinc finger genes (ZNF418, ZNF646, ZNF135, and ZNF772) comprised the most frequent class among the 39 genes. Two genes (MIR4436A and ZNF418) contained attractive variants in two different families. Conclusion: WGS has been shown to detect variants in coding regions better than whole exome sequencing (WES). We previously performed WES on 111 MRKH probands without family members and analyzed variants in candidate genes suggested by mouse and preliminary human studies. Interestingly, only three genes in this study overlapped with previously suspected candidate genes. Here, we identified new candidates based on potential deleteriousness. These candidate genes will be studied further in our families to determine their role in Müllerian development.
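The second filtering approach can be written as a short filter over annotated variants. This sketch assumes each variant carries a gene name, a single top consequence, a population frequency and per-sample genotype strings; the Sequence Ontology consequence labels, the `Variant` layout and the function name are assumptions for illustration, and relatives with missing genotypes are treated conservatively as not confirmed absent.

```python
from dataclasses import dataclass

# Consequences kept by the second approach, spelled as Sequence Ontology terms
# (the exact labels used in the study are an assumption).
LOF_CONSEQUENCES = {
    "splice_donor_variant",
    "splice_acceptor_variant",
    "stop_gained",
    "frameshift_variant",
}


@dataclass
class Variant:
    gene: str
    consequence: str   # top consequence for the variant
    pop_freq: float    # population allele frequency
    genotypes: dict    # sample ID -> genotype string, e.g. "0/1"


def proband_only_candidates(variants, proband, relatives, max_freq=0.02):
    """Rare variants heterozygous in the proband, absent (0/0) in every relative."""
    kept = []
    for v in variants:
        if v.pop_freq >= max_freq:                 # keep only frequency < 0.02
            continue
        if v.genotypes.get(proband) != "0/1":      # proband must be heterozygous
            continue
        # A missing or non-reference call in any relative excludes the variant.
        if any(v.genotypes.get(r) != "0/0" for r in relatives):
            continue
        if v.consequence in LOF_CONSEQUENCES:
            kept.append(v)
    return kept
```

For a quad, `relatives` would hold the three non-proband sample IDs; the first approach's additional steps (impact score, mapping quality, intolerance and so on) would be further predicates in the same loop.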


2021 ◽  
Vol 6 (1) ◽  
Author(s):  
Brent S. Pedersen ◽  
Joe M. Brown ◽  
Harriet Dashnow ◽  
Amelia D. Wallace ◽  
Matt Velinder ◽  
...  

Abstract In studies of families with rare disease, it is common to screen for de novo mutations, as well as recessive or dominant variants that explain the phenotype. However, the filtering strategies and software used to prioritize high-confidence variants vary from study to study. In an effort to establish recommendations for rare disease research, we explore effective guidelines for variant (SNP and INDEL) filtering and report the expected number of candidates for de novo dominant, recessive, and autosomal dominant modes of inheritance. We derived these guidelines using two large family-based cohorts that underwent whole-genome sequencing, as well as two family cohorts with whole-exome sequencing. The filters are applied to common attributes, including genotype quality, sequencing depth, allele balance, and population allele frequency. The resulting guidelines yield ~10 candidate SNP and INDEL variants per exome, and 18 per genome, for recessive and de novo dominant modes of inheritance, with substantially more candidates for autosomal dominant inheritance. For family-based, whole-genome sequencing studies, this number comprises an average of three de novo, ten compound heterozygous, one autosomal recessive, and four X-linked variants; in addition, roughly 100 candidate variants follow autosomal dominant inheritance. The slivar software we developed to establish and rapidly apply these filters to VCF files is available at https://github.com/brentp/slivar under an MIT license and includes documentation and recommendations for best practices for rare disease analysis.
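The filters described above act on per-sample genotype attributes. The following is a minimal Python sketch of a de novo filter for a single trio, assuming genotype calls have already been parsed out of a VCF. It does not use slivar or its expression syntax, and every threshold (GQ ≥ 20, depth ≥ 10, child allele balance between 0.2 and 0.8, parental allele balance ≤ 0.02, population allele frequency ≤ 0.001) is an illustrative assumption rather than the paper's recommended value.

```python
from dataclasses import dataclass


@dataclass
class Call:
    gt: str       # genotype: "0/0", "0/1" or "1/1"
    gq: int       # genotype quality
    ad_ref: int   # reads supporting the reference allele
    ad_alt: int   # reads supporting the alternate allele

    @property
    def dp(self) -> int:
        """Depth taken as the sum of allele depths."""
        return self.ad_ref + self.ad_alt

    @property
    def ab(self) -> float:
        """Allele balance: fraction of reads supporting the alternate allele."""
        return self.ad_alt / self.dp if self.dp else 0.0


def is_de_novo(kid: Call, mom: Call, dad: Call, pop_af: float,
               min_gq: int = 20, min_dp: int = 10,
               kid_ab: tuple = (0.2, 0.8), max_parent_ab: float = 0.02,
               max_pop_af: float = 0.001) -> bool:
    """Hypothetical de novo filter; all default thresholds are assumptions."""
    if pop_af > max_pop_af:                       # rare in the population
        return False
    if any(c.gq < min_gq or c.dp < min_dp for c in (kid, mom, dad)):
        return False                              # confident calls in all three
    if kid.gt != "0/1" or not (kid_ab[0] <= kid.ab <= kid_ab[1]):
        return False                              # clean heterozygote in the child
    # Both parents homozygous reference with (almost) no alternate reads.
    return all(p.gt == "0/0" and p.ab <= max_parent_ab for p in (mom, dad))


if __name__ == "__main__":
    kid, mom, dad = Call("0/1", 60, 15, 15), Call("0/0", 60, 30, 0), Call("0/0", 60, 25, 0)
    print(is_de_novo(kid, mom, dad, pop_af=0.0))
```

Checking parental allele balance, not just the parental genotype call, is what screens out apparent de novo calls that are really low-level inherited or mosaic alleles the caller missed in a parent.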


2018 ◽  
Vol 8 (1) ◽  
Author(s):  
Gabriel Costa Monteiro Moreira ◽  
Clarissa Boschiero ◽  
Aline Silva Mello Cesar ◽  
James M. Reecy ◽  
Thaís Fernanda Godoy ◽  
...  

PLoS ONE ◽  
2021 ◽  
Vol 16 (6) ◽  
pp. e0253440
Author(s):  
Samantha Gunasekera ◽  
Sam Abraham ◽  
Marc Stegger ◽  
Stanley Pang ◽  
Penghao Wang ◽  
...  

Whole-genome sequencing is essential to many facets of infectious disease research. However, technical limitations such as bias in coverage and tagmentation, and difficulties characterising genomic regions with extreme GC content, have created significant obstacles to its use. Illumina has claimed that the recently released DNA Prep library preparation kit, formerly known as Nextera Flex, overcomes some of these limitations. This study aimed to assess coverage bias, tagmentation bias, GC-content-related bias, average fragment size distribution, and de novo assembly quality using both the Nextera XT and DNA Prep kits from Illumina. When performing whole-genome sequencing on Escherichia coli, and where coverage bias is the main concern, the DNA Prep kit may provide higher quality results, though de novo assembly quality, tagmentation bias and GC-content-related bias are unlikely to improve. Based on these results, laboratories with existing workflows based on Nextera XT would see only minor benefits from transitioning to the DNA Prep kit if they were primarily studying organisms with neutral GC content.
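One common way to quantify GC-content-related coverage bias is to split the genome into windows, compute each window's GC fraction, and compare mean depth per GC bin with the genome-wide mean. The sketch assumes per-base depth values are already available (for example, exported from an alignment); the window size, bin count and function names are hypothetical, and this is not necessarily the metric used in the study.

```python
from collections import defaultdict
from typing import Dict, Sequence


def gc_fraction(seq: str) -> float:
    """Fraction of G/C among unambiguous bases (A, C, G, T); NaN if none."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else float("nan")


def coverage_by_gc(genome: str, depth: Sequence[int],
                   window: int = 100, n_bins: int = 10) -> Dict[int, float]:
    """Mean window depth per GC bin, normalised to the genome-wide mean depth.

    A flat profile near 1.0 indicates little GC bias; values well below 1.0
    at the low- or high-GC bins indicate under-coverage of extreme-GC regions.
    """
    if len(genome) != len(depth):
        raise ValueError("need one depth value per base")
    overall = sum(depth) / len(depth)
    sums, counts = defaultdict(float), defaultdict(int)
    for start in range(0, len(genome) - window + 1, window):
        gc = gc_fraction(genome[start:start + window])
        if gc != gc:  # NaN: window made entirely of ambiguous bases
            continue
        b = min(int(gc * n_bins), n_bins - 1)  # GC of 1.0 goes in the top bin
        sums[b] += sum(depth[start:start + window]) / window
        counts[b] += 1
    return {b: (sums[b] / counts[b]) / overall for b in sorted(sums)}
```

Running the same function on data from two library preparations over the same genome gives two profiles that can be compared bin by bin.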


2020 ◽  
Vol 29 (6) ◽  
pp. 967-979 ◽  
Author(s):  
Revital Bronstein ◽  
Elizabeth E Capowski ◽  
Sudeep Mehrotra ◽  
Alex D Jansen ◽  
Daniel Navarro-Gomez ◽  
...  

Abstract Inherited retinal degenerations (IRDs) are a focus of current advances in genetic therapeutics. For a genetic treatment such as gene therapy to succeed, an accurate genetic diagnosis is required. Genetic diagnostics relies on assessing the probability that a given DNA variant is pathogenic. Non-coding variants present a unique challenge for such assessments compared with coding variants. For one, non-coding variants are far more numerous in the genome than coding variants. In addition, our understanding of the rules that govern the non-coding regions of the genome is less complete than our understanding of the coding regions. Methods that allow both the identification of candidate non-coding pathogenic variants and their functional validation may help overcome these caveats, allowing a greater number of patients to benefit from advances in genetic therapeutics. We present here an unbiased approach combining whole genome sequencing (WGS) with transcriptome analysis of retinal organoids (ROs) derived from patient induced pluripotent stem cells (iPSCs). With this approach, we identified and functionally validated a novel pathogenic non-coding variant in a small family with a previously unresolved genetic diagnosis.


2020 ◽  
Vol 29 (1) ◽  
pp. 184-193 ◽  
Author(s):  
Jonas Carlsson Almlöf ◽  
Sara Nystedt ◽  
Aikaterini Mechtidou ◽  
Dag Leonard ◽  
Maija-Leena Eloranta ◽  
...  

Abstract By performing whole-genome sequencing in a Swedish cohort of 71 parent-offspring trios, in which the child in each family is affected by systemic lupus erythematosus (SLE, OMIM 152700), we investigated the contribution of de novo variants to the risk of SLE. We found de novo single nucleotide variants (SNVs) to be significantly enriched in gene promoters in SLE patients compared with healthy controls, at a level corresponding to 26 more de novo promoter SNVs in each patient than expected. We identified 12 de novo SNVs in promoter regions of genes that have previously been implicated in SLE or that have functions that could be relevant to SLE. Furthermore, we detected three missense de novo SNVs, five de novo insertion-deletions, and three de novo structural variants with the potential to affect the expression of genes relevant to SLE. Based on enrichment analysis, disease-affecting de novo SNVs are expected to occur in one-third of SLE patients. This study shows that de novo variants in promoters commonly contribute to the genetic risk of SLE. The enrichment of de novo SNVs in promoter regions in SLE highlights the importance of using whole-genome sequencing to identify de novo variants.
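An enrichment claim of this kind can be illustrated with a simple count test. The sketch assumes the total number of de novo promoter SNVs across all patients is Poisson with a known expected value (for example, from a mutation-rate model or from controls) and reports the fold enrichment, the excess DNMs per patient (the same kind of quantity as "26 more de novo promoter SNVs in each patient than expected") and a one-sided p-value. The Poisson model, the function names and any counts passed in are hypothetical; the study's own enrichment method may differ.

```python
import math


def _log_pmf(i: int, lam: float) -> float:
    """log P(X = i) for X ~ Poisson(lam), lam > 0."""
    return -lam + i * math.log(lam) - math.lgamma(i + 1)


def poisson_upper_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), lam > 0, computed in log space."""
    if k <= 0:
        return 1.0
    if k <= lam:
        # Upper tail is large: take the complement of the lower tail.
        lower = sum(math.exp(_log_pmf(i, lam)) for i in range(k))
        return max(0.0, 1.0 - lower)
    # Upper tail is small: sum the (strictly decreasing) terms directly,
    # which keeps precision for very small p-values.
    total, i = 0.0, k
    while True:
        term = math.exp(_log_pmf(i, lam))
        total += term
        if term <= total * 1e-16:
            break
        i += 1
    return total


def promoter_enrichment(observed: int, expected: float, n_patients: int):
    """Return (fold enrichment, excess DNMs per patient, one-sided p-value)."""
    fold = observed / expected
    excess_per_patient = (observed - expected) / n_patients
    return fold, excess_per_patient, poisson_upper_tail(observed, expected)
```

Splitting the tail computation by whether k exceeds λ avoids both catastrophic cancellation in 1 − CDF for tiny p-values and underflow of e^(−λ) when the expected count is large.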

