Effective variant filtering and expected candidate variant yield in studies of rare human disease

Author(s):  
Brent S. Pedersen ◽  
Joe M. Brown ◽  
Harriet Dashnow ◽  
Amelia D. Wallace ◽  
Matt Velinder ◽  
...  

ABSTRACT In studies of families with rare disease, it is common to screen for de novo mutations, as well as recessive or dominant variants that explain the phenotype. However, the filtering strategies and software used to prioritize high-confidence variants vary from study to study. In an effort to establish recommendations for rare disease research, we derive effective guidelines for variant filtering and report the expected number of candidates for de novo dominant and recessive modes of inheritance. The filters are applied to common attributes, including genotype quality, sequencing depth, allele balance, and population allele frequency. The resulting guidelines yield approximately 10 candidate SNP and INDEL variants per exome, and 19 per genome. For whole genomes, this includes an average of three de novo, ten compound-heterozygotes, one autosomal recessive, four X-linked variants, and roughly 100 candidate variants following autosomal dominant inheritance. The slivar software we developed to establish and rapidly apply these filters to VCF files is available at https://github.com/brentp/slivar under an MIT license, and includes documentation and recommendations for best practices for rare disease analysis.

2021 ◽  
Vol 6 (1) ◽  
Author(s):  
Brent S. Pedersen ◽  
Joe M. Brown ◽  
Harriet Dashnow ◽  
Amelia D. Wallace ◽  
Matt Velinder ◽  
...  

Abstract In studies of families with rare disease, it is common to screen for de novo mutations, as well as recessive or dominant variants that explain the phenotype. However, the filtering strategies and software used to prioritize high-confidence variants vary from study to study. In an effort to establish recommendations for rare disease research, we explore effective guidelines for variant (SNP and INDEL) filtering and report the expected number of candidates for de novo dominant, recessive, and autosomal dominant modes of inheritance. We derived these guidelines using two large family-based cohorts that underwent whole-genome sequencing, as well as two family cohorts with whole-exome sequencing. The filters are applied to common attributes, including genotype quality, sequencing depth, allele balance, and population allele frequency. The resulting guidelines yield ~10 candidate SNP and INDEL variants per exome, and 18 per genome for recessive and de novo dominant modes of inheritance, with substantially more candidates for autosomal dominant inheritance. For family-based, whole-genome sequencing studies, this number includes an average of three de novo, ten compound heterozygous, one autosomal recessive, four X-linked variants, and roughly 100 candidate variants following autosomal dominant inheritance. The slivar software we developed to establish and rapidly apply these filters to VCF files is available at https://github.com/brentp/slivar under an MIT license, and includes documentation and recommendations for best practices for rare disease analysis.
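To make the four filtered attributes concrete, the following is a minimal Python sketch of a de novo check on a trio. It is not slivar's implementation or expression syntax. The `Sample` class, the function names, and every threshold value are illustrative assumptions, since the abstract does not state the cutoffs.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """Per-sample genotype attributes for one variant (hypothetical layout)."""
    alts: int       # number of alternate alleles in the genotype: 0, 1, or 2
    gq: int         # genotype quality
    depth: int      # total read depth at the site
    alt_depth: int  # reads supporting the alternate allele

    @property
    def allele_balance(self) -> float:
        # Fraction of reads supporting the alternate allele.
        return self.alt_depth / self.depth if self.depth else 0.0


# Illustrative thresholds only; the abstract does not give the actual values.
MIN_GQ = 20
MIN_DEPTH = 10
HET_AB_RANGE = (0.2, 0.8)
MAX_HOMREF_AB = 0.02
MAX_POP_AF = 0.001


def passes_quality(s: Sample) -> bool:
    return s.gq >= MIN_GQ and s.depth >= MIN_DEPTH


def is_de_novo_candidate(kid: Sample, mom: Sample, dad: Sample, pop_af: float) -> bool:
    """Heterozygous in the child, homozygous reference in both parents, rare in the population."""
    if pop_af > MAX_POP_AF:
        return False
    if not all(passes_quality(s) for s in (kid, mom, dad)):
        return False
    low, high = HET_AB_RANGE
    kid_het = kid.alts == 1 and low <= kid.allele_balance <= high
    parents_ref = all(
        p.alts == 0 and p.allele_balance <= MAX_HOMREF_AB for p in (mom, dad)
    )
    return kid_het and parents_ref
```

In this sketch, the allele-balance check on the parents rejects sites where a parent is called homozygous reference but still carries a few alternate reads, which is one plausible way such a filter removes false de novo calls.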


2021 ◽  
Vol 9 ◽  
pp. 232470962110146
Author(s):  
Erin Finn ◽  
Kimberly Kripps ◽  
Christina Chambers ◽  
Michele Rapp ◽  
Naomi J. L. Meeks ◽  
...  

Lipoid congenital adrenal hyperplasia (LCAH) is typically inherited as an autosomal recessive condition. There are 3 reports of individuals with a dominantly acting heterozygous variant leading to a clinically significant phenotype. We report a 46,XY child with a novel heterozygous intronic variant in STAR resulting in LCAH with an attenuated genital phenotype. The patient presented with neonatal hypoglycemia and had descended testes with a fused scrotum and small phallus. Evaluation revealed primary adrenal insufficiency with deficiencies of cortisol, aldosterone, and androgens. He was found to have a de novo heterozygous novel variant in STAR: c.65-2A>C. We report a case of a novel variant and review of other dominant mutations at the same position in the literature. Clinicians should be aware of the possibility of attenuated genital phenotypes of LCAH and the contribution of de novo variants in STAR at c.65-2 to the pathogenesis of that phenotype.


Author(s):  
Quan-Kuan Shen ◽  
Min-Sheng Peng ◽  
Adeniyi C Adeola ◽  
Ling Kui ◽  
Shengchang Duan ◽  
...  

Abstract Domestication of the helmeted guinea fowl (HGF; Numida meleagris) in Africa remains elusive. Here we report a high-quality de novo genome assembly for domestic HGF generated by long- and short-read sequencing together with optical and chromatin interaction mapping. Using this assembly as the reference, we performed population genomic analyses of newly sequenced whole genomes for 129 birds from Africa, Asia, and Europe, including domestic animals (n = 89), wild progenitors (n = 34), and their closely related wild species (n = 6). Our results reveal domestication of HGF in West Africa around 1,300-5,500 years ago. Scanning for selective signals characterized the functional genes in behavior and locomotion changes involved in domestication of HGF. The pleiotropy and linkage in genes affecting plumage color and fertility were revealed in the recent breeding of Italian domestic HGF. In addition to presenting a missing piece to the jigsaw puzzle of domestication in poultry, our study provides valuable genetic resources for researchers and breeders to improve production in this species.


2019 ◽  
Vol 20 (S22) ◽  
Author(s):  
Hang Zhang ◽  
Ke Wang ◽  
Juan Zhou ◽  
Jianhua Chen ◽  
Yizhou Xu ◽  
...  

Abstract Background: Variant calling and refinement from whole-genome/exome sequencing data is a fundamental task in genomics studies. Because of the limited accuracy of NGS sequencing and variant callers, IGV-based manual review is required to filter out remaining false-positive variants, which costs substantial labor and time and results in high inter- and intra-lab variability. Results: To overcome the limitations of manual review, we developed a novel approach, Variant Filter by Automated Scoring based on Tagged-signature (VariFAST), and also provide a pipeline integrating GATK Best Practices with VariFAST, which can be easily used for high-quality variant detection from raw data. Using the BAM and VCF files, VariFAST calculates, for each variant, a v-score as the sum of weighted metrics that cause false-positive variants, and marks tags in a manner highly consistent with manual review. We validated the performance of VariFAST for germline variant filtering using benchmark sequencing data from GIAB, and for somatic variant filtering using sequencing data from both malignant carcinomas and benign adenomas. VariFAST also includes a predictive model trained with the XGBoost algorithm for germline variant refinement, which achieves better MCC and AUC than the state-of-the-art VQSR and outperforms it especially in INDEL variant filtering. Conclusion: VariFAST can help researchers filter false-positive variants, both germline and somatic, efficiently and conveniently in NGS data analysis. The VariFAST source code and the pipeline integrating with GATK Best Practices are available at https://github.com/bioxsjtu/VariFAST.
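The scoring idea in this abstract, a per-variant score formed as a weighted sum of metrics associated with false positives, plus tags, can be sketched in a few lines of Python. This is only an illustration of a weighted-sum score, not VariFAST's code. The metric names, the weights, and the `v_score` function are hypothetical, as the abstract does not list them.

```python
from typing import Dict, List, Tuple

# Hypothetical metric names and weights, chosen only for illustration.
WEIGHTS: Dict[str, float] = {
    "low_depth": 2.0,
    "strand_bias": 1.5,
    "low_mapq": 1.0,
}


def v_score(metrics: Dict[str, float],
            weights: Dict[str, float] = WEIGHTS) -> Tuple[float, List[str]]:
    """Return (weighted sum of the triggered metrics, tags naming them)."""
    score = 0.0
    tags: List[str] = []
    for name, weight in weights.items():
        value = metrics.get(name, 0.0)
        if value > 0:
            score += weight * value
            tags.append(name)
    return score, tags
```

A variant flagged for low depth and low mapping quality would score 3.0 under these assumed weights and carry both tags, so a reviewer can see which metrics drove the score.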


2019 ◽  
Vol 30 (2) ◽  
pp. 93-95
Author(s):  
Alvin Oliver Payus ◽  
Cheong Lei Wah ◽  
Syahrul Sazliyana Shaharir

Hemophagocytic lymphohistiocytosis (HLH) is a life-threatening medical condition characterized by hyperphagocytosis secondary to inappropriate over-activation of macrophages and lymphocytes, which is driven by excessive cytokine production and results in cellular destruction. It can arise de novo as a result of an autosomal recessive genetic disorder, or in the background of an infection, malignancy or autoimmune disease. Dengue fever is one of the uncommon infectious causes of secondary HLH. Here, we present a case of dengue-associated HLH that was successfully treated with intravenous methylprednisolone and immunoglobulin G. The purpose of this case report is to illustrate the importance of early recognition and prompt initiation of appropriate treatment for patients with suspected HLH, who otherwise have a high mortality rate. Bangladesh J Medicine July 2019; 30(2) : 93-95


2019 ◽  
Vol 57 (4) ◽  
pp. 283-288 ◽  
Author(s):  
Joohyun Park ◽  
Bianca R Flores ◽  
Katalin Scherer ◽  
Hanna Kuepper ◽  
Mari Rossi ◽  
...  

Background: Charcot-Marie-Tooth disease (CMT) is a clinically and genetically heterogeneous disorder of the peripheral nervous system. Biallelic variants in SLC12A6 have been associated with autosomal-recessive hereditary motor and sensory neuropathy with agenesis of the corpus callosum (HMSN/ACC). We identified heterozygous de novo variants in SLC12A6 in three unrelated patients with intermediate CMT. Methods: We evaluated the clinical reports and electrophysiological data of three patients carrying de novo variants in SLC12A6 identified by diagnostic trio exome sequencing. For functional characterisation of the identified variants, potassium influx of mutated KCC3 cotransporters was measured in Xenopus oocytes. Results: We identified two different de novo missense changes (p.Arg207His and p.Tyr679Cys) in SLC12A6 in three unrelated individuals with early-onset progressive CMT. All presented with axonal/demyelinating sensorimotor neuropathy accompanied by spasticity in one patient. Cognition and brain MRI were normal. Modelling of the mutant KCC3 cotransporter in Xenopus oocytes showed a significant reduction in potassium influx for both changes. Conclusion: Our findings expand the genotypic and phenotypic spectrum associated with SLC12A6 variants from autosomal-recessive HMSN/ACC to dominant-acting de novo variants causing a milder clinical presentation with early-onset neuropathy.


2019 ◽  
Vol 6 (4) ◽  
pp. 810-824 ◽  
Author(s):  
Elaine A Ostrander ◽  
Guo-Dong Wang ◽  
Greger Larson ◽  
Bridgett M vonHoldt ◽  
Brian W Davis ◽  
...  

ABSTRACT Dogs are the most phenotypically diverse mammalian species, and they possess more known heritable disorders than any other non-human mammal. Efforts to catalog and characterize genetic variation across well-chosen populations of canines are necessary to advance our understanding of their evolutionary history and genetic architecture. To date, no organized effort has been undertaken to sequence the world's canid populations. The Dog10K Consortium (http://www.dog10kgenomes.org) is an international collaboration of researchers from across the globe who will generate 20× whole genomes from 10 000 canids in 5 years. This effort will capture the genetic diversity that underlies the phenotypic and geographical variability of modern canids worldwide. Breeds, village dogs, niche populations and extended pedigrees are currently being sequenced, and de novo assemblies of multiple canids are being constructed. This unprecedented dataset will address the genetic underpinnings of domestication, breed formation, aging, behavior and morphological variation. More generally, this effort will advance our understanding of human and canine health.


Blood ◽  
2011 ◽  
Vol 118 (21) ◽  
pp. 1229-1229
Author(s):  
Thomas L. Ortel ◽  
Gary Beecham ◽  
Dale Hedges ◽  
Patrice Whitehead ◽  
Ashley Beecham ◽  
...  

Abstract 1229 Background: Thrombotic storm (TS) is an extremely severe clinical phenotype that occurs in a very small subset of patients with venous thromboembolic disease. It is characterized by patients who exhibit two or more of the following in a short period of time: 1) > 2 acute arterial/venous thromboemboli, and/or thrombotic microangiopathy, 2) unusual location, 3) progressive/recent unexplained recurrence, and/or 4) refractory to and/or atypical response to therapy (Kitchens et al., Am J Med, 2011). We hypothesize these patients possess an underlying prothrombotic risk factor that results in an accelerated form of thrombosis following an initial event that provokes the attack in the relevant clinical context. Methods: To identify potential genetic risk variants we performed whole-exome sequencing on a TS participant and his unaffected parents and sibling. The proband was a 14 year old male who presented with thrombosis of the sagittal, right transverse and sigmoid sinuses following a sports-related knee injury. There was no personal or family history of venous thromboembolism, and a hypercoagulable workup, including testing for antiphospholipid antibodies, was negative. His course was complicated by the development of disseminated intravascular coagulation, delaying early initiation of anticoagulant therapy. Despite aggressive supportive care, which included anticoagulation therapy, the proband did not improve and expired after severe cerebral edema with herniation was diagnosed by clinical exam and CT imaging. At autopsy, bilateral pulmonary emboli and extensive pelvic vein thrombosis were also identified. DNA was extracted from whole blood and the relevant regions were captured using the Agilent Sure Select 50mb kit. Sequencing was performed on the Illumina HiSeq2000 under the manufacturer's recommended protocol. Alignment of reads to the reference was performed using BWA, and genotype calls were made with GATK. 
Variants were initially filtered based on quality (depth ≥ 8, phred-like quality ≥ 30), function (nonsense, missense, splicing), and novelty. Additional filters include inheritance mode (autosomal recessive or de novo heterozygote), conservation (phastcons score > 0.5, GERP score > 2), and damage prediction (SIFT or Polyphen). Potential variants were validated using Sanger sequencing. Results: Whole-exome sequencing identified over 127,000 variants in the nuclear family with at least one member having a high quality variant at the position. Filtering these variants based on function, novelty, and high quality in parents and affected proband reduced the list to 2,735 variants. Of these, 7 variants fit an autosomal recessive model (homozygous in the proband, heterozygous in both parents, not homozygous in the unaffected sibling); of these 7, two were at conserved sites, predicted to be damaging, and also called using SAMTOOLS. The first of the recessive variants is a nonsense variation in the EGFL8 gene (tyrosine to stop codon, at the 74th amino acid; tyr74stop), and the second is in HLA-E (gln276pro). Of the initial list of 2,735 variants there were 138 that fit a de novo heterozygous model (present in the affected proband, but not parents); of these 138, two were at a conserved site, predicted to be damaging, and were also called with SAMTOOLS. The first de novo heterozygote is in SLC26A2 (arg178stop), and the second variant is in PRMT7 (arg531trp). These four variants were resequenced using Sanger sequencing within the family. Three of the variants (EGFL8, SLC26A2, and PRMT7) were confirmed using Sanger; the fourth (HLA-E) is still being resequenced. Discussion: These variants represent excellent candidate loci for thrombotic storm risk. In particular, the EGFL8 variant is a homozygous change to a stop codon less than one quarter of the way through the open reading frame – a change that likely severely damages protein function. 
Additionally, EGFL8 (epidermal growth factor-like domain-containing protein 8) has two EGF domains, a common motif identified in hemostatic and fibrinolytic proteins, and is therefore potentially involved in coagulation. These variants will be further analyzed for frequency in controls and tested in animal models for functional significance. Disclosures: No relevant conflicts of interest to declare.
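The quality and inheritance filters described in the Methods of this abstract can be expressed as simple predicates over a family's genotypes. The Python sketch below uses the stated cutoffs (depth ≥ 8, phred-like quality ≥ 30) and the stated definitions of the recessive and de novo models. The `Call` class, its field names, and the use of a single site-level depth and quality value are assumptions for illustration, and this is not the authors' pipeline.

```python
from dataclasses import dataclass


@dataclass
class Call:
    """One variant site in the family; genotypes are alternate-allele counts (0, 1, 2)."""
    proband: int
    mother: int
    father: int
    sibling: int   # unaffected sibling
    depth: int     # assumed to be a single site-level depth
    qual: float    # phred-like quality


def passes_quality(c: Call) -> bool:
    # Cutoffs as stated in the abstract.
    return c.depth >= 8 and c.qual >= 30


def fits_recessive(c: Call) -> bool:
    # Homozygous in the proband, heterozygous in both parents,
    # not homozygous in the unaffected sibling.
    return c.proband == 2 and c.mother == 1 and c.father == 1 and c.sibling != 2


def fits_de_novo_het(c: Call) -> bool:
    # Heterozygous in the proband, absent from both parents.
    return c.proband == 1 and c.mother == 0 and c.father == 0


def classify(c: Call) -> str:
    if not passes_quality(c):
        return "filtered"
    if fits_recessive(c):
        return "autosomal_recessive"
    if fits_de_novo_het(c):
        return "de_novo_heterozygote"
    return "other"
```

The functional, novelty, conservation, and damage-prediction filters mentioned in the abstract would be applied as further predicates of the same shape and are omitted here.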

