Identification of misclassified ClinVar variants using disease population prevalence

2016 ◽  
Author(s):  
Naisha Shah ◽  
Ying-Chen Claire Hou ◽  
Hung-Chun Yu ◽  
Rachana Sainger ◽  
Eric Dec ◽  
...  

ABSTRACT There is significant interest in the standardized classification of human genetic variants. The availability of new large datasets generated through genome sequencing initiatives provides a ground for the computational evaluation of the supporting evidence. We used whole genome sequence data from 8,102 unrelated individuals to analyze the adequacy of estimated rates of disease on the basis of genetic risk and the expected population prevalence of the disease. Analyses included the ACMG recommended 56 gene-condition sets for incidental findings and 631 genes associated with 348 OrphaNet conditions. A total of 21,004 variants were used to identify patterns of inflation (i.e., excess genetic risk). Inflation, i.e., misclassification, increases as the level of evidence in ClinVar supporting the pathogenic nature of the variant decreases. The burden of rare variants was a main contributing factor to the observed inflation, indicating misclassified benign private mutations. We also analyzed the dynamics of re-classification of variant pathogenicity in ClinVar over time. The study strongly suggests that ClinVar includes a significant proportion of wrongly ascertained variants, and underscores the critical role of ClinVar in contrasting claims and fostering validation across submitters.
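The idea of comparing genetic risk with disease prevalence can be illustrated with a minimal sketch. It is not the authors' method: it assumes a dominant condition, independent variants in Hardy-Weinberg proportions, and a single penetrance value, and the function names `expected_carrier_fraction` and `inflation_ratio` are hypothetical. A ratio well above 1 would suggest that the variants labelled pathogenic predict more disease than is observed.

```python
from math import prod


def expected_carrier_fraction(allele_freqs):
    """Fraction of individuals carrying at least one of the listed alleles.

    Assumes independent variants in Hardy-Weinberg proportions, so the
    chance of carrying none of them is the product of (1 - f)^2.
    """
    return 1.0 - prod((1.0 - f) ** 2 for f in allele_freqs)


def inflation_ratio(allele_freqs, prevalence, penetrance=1.0):
    """Ratio of genetically predicted disease rate to observed prevalence.

    `penetrance` is an illustrative simplification (one value for all
    variants); it is not taken from the abstract.
    """
    if not 0.0 < prevalence <= 1.0:
        raise ValueError("prevalence must be in (0, 1]")
    predicted = expected_carrier_fraction(allele_freqs) * penetrance
    return predicted / prevalence
```

For example, `inflation_ratio([0.5], prevalence=0.25)` compares a predicted rate of 0.75 with an observed prevalence of 0.25; real pathogenic alleles would of course be far rarer than this toy frequency.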

Author(s):  
Viola Kurm ◽  
Ilse Houwers ◽  
Claudia E. Coipan ◽  
Peter Bonants ◽  
Cees Waalwijk ◽  
...  

Abstract Identification and classification of members of the Ralstonia solanacearum species complex (RSSC) is challenging due to the heterogeneity of this complex. Whole genome sequence data of 225 strains were used to classify strains based on average nucleotide identity (ANI) and multilocus sequence analysis (MLSA). Based on the ANI score (>95%), 191 out of 192 (99.5%) RSSC strains could be grouped into the three species R. solanacearum, R. pseudosolanacearum, and R. syzygii, and into the four phylotypes within the RSSC (I, II, III, and IV). R. solanacearum phylotype II could be split into two groups (IIA and IIB), of which IIB clustered in three subgroups (IIBa, IIBb, and IIBc). This division by ANI was in accordance with MLSA. The IIB subgroups found by ANI and MLSA also differed in the number of SNPs in the primer and probe sites of various assays. An in-silico analysis of eight TaqMan and 11 conventional PCR assays was performed using the whole genome sequences. Based on this analysis, several cases of potential false positives or false negatives can be expected when these assays are used for their intended target organisms. Two TaqMan assays and two PCR assays targeting the 16S rDNA sequence should be able to detect all phylotypes of the RSSC. We conclude that the increasing availability of whole genome sequences is not only useful for classification of strains, but also shows potential for selection and evaluation of clade-specific nucleic acid-based amplification methods within the RSSC.
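The ANI-based grouping can be pictured as a simple threshold rule. The sketch below is only an illustration under stated assumptions: ANI values against reference genomes are taken as already computed by some external tool, the function name `assign_species` is hypothetical, and the numbers in the usage note are invented. Only the >95% cutoff comes from the abstract.

```python
def assign_species(ani_to_refs, threshold=95.0):
    """Pick the reference species with the highest ANI above the threshold.

    `ani_to_refs` maps a reference species name to the query strain's ANI
    (in percent) against that reference. Returns None when no reference
    exceeds the threshold, i.e. the strain stays unassigned.
    """
    best = max(ani_to_refs.items(), key=lambda item: item[1], default=None)
    if best is None or best[1] <= threshold:
        return None
    return best[0]
```

With invented values, `assign_species({"R. solanacearum": 98.7, "R. syzygii": 92.1})` would return `"R. solanacearum"`, while a strain whose best ANI is 94.0 would remain unassigned.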


2013 ◽  
Vol 63 (Pt_7) ◽  
pp. 2742-2751 ◽  
Author(s):  
Henryk Urbanczyk ◽  
Yoshitoshi Ogura ◽  
Tetsuya Hayashi

Use of inadequate methods for classification of bacteria in the so-called Harveyi clade (family Vibrionaceae, Gammaproteobacteria) has led to incorrect assignment of strains and proliferation of synonymous species. In order to resolve taxonomic ambiguities within the Harveyi clade and to test usefulness of whole genome sequence data for classification of Vibrionaceae, draft genome sequences of 12 strains were determined and analysed. The sequencing included type strains of seven species: Vibrio sagamiensis NBRC 104589T, Vibrio azureus NBRC 104587T, Vibrio harveyi NBRC 15634T, Vibrio rotiferianus LMG 21460T, Vibrio campbellii NBRC 15631T, Vibrio jasicida LMG 25398T, and Vibrio owensii LMG 25443T. Draft genome sequences of strain LMG 25430, previously designated the type strain of [Vibrio communis], and two strains (MWB 21 and 090810c) from the ‘beijerinckii’ lineage were also determined. Whole genomes of two additional strains (ATCC 25919 and 200612B) that previously could not be assigned to any Harveyi clade species were also sequenced. Analysis of the genome sequence data revealed a clear case of synonymy between V. owensii and [V. communis], confirming an earlier proposal to synonymize both species. Both strains from the ‘beijerinckii’ lineage were classified as V. jasicida, while the strains ATCC 25919 and 200612B were classified as V. owensii and V. campbellii, respectively. We also found that two strains, AND4 and Ex25, are closely related to Harveyi clade bacteria, but could not be assigned to any species of the family Vibrionaceae. The use of whole genome sequence data for the taxonomic classification of the Harveyi clade bacteria and other members of the family Vibrionaceae is also discussed.


2020 ◽  
Author(s):  
Emily DiBlasi ◽  
Andrey A. Shabalin ◽  
Eric T. Monson ◽  
Brooks R. Keeshin ◽  
Amanda V. Bakian ◽  
...  

ABSTRACT Suicide death is a worldwide health crisis, claiming close to 800,000 lives per year. Recent evidence suggests that prediction and prevention challenges may be aided by discoveries of genetic risk factors. Here we focus on the role of rare (MAF <1%), putatively functional single nucleotide polymorphisms (SNPs) in suicide death using the large genetic resources available in the Utah Suicide Genetic Risk Study (USGRS). We conducted a single-variant association analysis of 30,377 rare putatively functional SNPs present on the PsychArray genotyping array in 2,672 USGRS suicides of non-Finnish European (NFE) ancestry and 51,583 publicly available NFE controls from gnomAD, with additional follow-up analyses using an independent control sample of 21,324 NFE controls from the Psychiatric Genomics Consortium. SNPs underwent rigorous quality control, and among SNPs meeting significance thresholds, we considered only those that were validated in sequence data. We identified five novel, high-impact, rare SNPs with significant associations with suicide death (SNAPC1, rs75418419; TNKS1BP1, rs143883793; ADGRF5, rs149197213; PER1, rs145053802; and ESS2, rs62223875). Both PER1 and SNAPC1 have other supporting gene-level evidence of suicide risk, and an association with bipolar disorder has been reported for PER1 and with schizophrenia for PER1, TNKS1BP1, and ESS2. Three genes (PER1, TNKS1BP1, and ADGRF5), together with additional genes implicated by GWAS of suicidal behavior, showed significant enrichment in immune system, homeostatic, and signal transduction processes. Pain, depression, and accidental trauma were the most prevalent phenotypes in electronic medical record data for the categories assessed. These findings suggest an important role for rare variants in suicide risk and provide new insights into the genetic architecture of suicide death. Furthermore, we demonstrate the added utility of careful assessment of genotyping arrays in rare variant discovery.


2014 ◽  
Vol 38 (S1) ◽  
pp. S13-S20 ◽  
Author(s):  
Yun Ju Sung ◽  
Keegan D. Korthauer ◽  
Michael D. Swartz ◽  
Corinne D. Engelman

2019 ◽  
Vol 10 (1) ◽  
Author(s):  
Masato Akiyama ◽  
Kazuyoshi Ishigaki ◽  
Saori Sakaue ◽  
Yukihide Momozawa ◽  
Momoko Horikoshi ◽  
...  

Abstract Human height is a representative phenotype to elucidate genetic architecture. However, the majority of large studies have been performed in European populations. To investigate the rare and low-frequency variants associated with height, we construct a reference panel (N = 3,541) for genotype imputation by integrating the whole-genome sequence data from 1,037 Japanese with that of the 1000 Genomes Project, and perform a genome-wide association study in 191,787 Japanese. We report 573 height-associated variants, including 22 rare and 42 low-frequency variants. These 64 variants explain 1.7% of the phenotypic variance. Furthermore, a gene-based analysis identifies two genes with multiple height-increasing rare and low-frequency nonsynonymous variants (SLC27A3 and CYP26B1; P(SKAT-O) < 2.5 × 10⁻⁶). Our analysis shows a general tendency of the effect sizes of rare variants towards increasing height, which is contrary to findings among Europeans, suggesting that height-associated rare variants are under different selection pressure in Japanese and European populations.


2015 ◽  
Vol 6 (1) ◽  
Author(s):  
Peter N. Taylor ◽  
Eleonora Porcu ◽  
Shelby Chew ◽  
Purdey J. Campbell ◽  
...  

Abstract Normal thyroid function is essential for health, but its genetic architecture remains poorly understood. Here, for the heritable thyroid traits thyrotropin (TSH) and free thyroxine (FT4), we analyse whole-genome sequence data from the UK10K project (N=2,287). Using additional whole-genome sequence and deeply imputed data sets, we report meta-analysis results for common variants (MAF≥1%) associated with TSH and FT4 (N=16,335). For TSH, we identify a novel variant in SYN2 (MAF=23.5%, P=6.15 × 10⁻⁹) and a new independent variant in PDE8B (MAF=10.4%, P=5.94 × 10⁻¹⁴). For FT4, we report a low-frequency variant near B4GALT6/SLC25A52 (MAF=3.2%, P=1.27 × 10⁻⁹) tagging a rare TTR variant (MAF=0.4%, P=2.14 × 10⁻¹¹). All common variants explain ≥20% of the variance in TSH and FT4. Analysis of rare variants (MAF<1%) using sequence kernel association testing reveals a novel association with FT4 in NRG1. Our results demonstrate that increased coverage in whole-genome sequence association studies identifies novel variants associated with thyroid function.


mSphere ◽  
2018 ◽  
Vol 3 (1) ◽  
Author(s):  
Hülya Kaya ◽  
Henrik Hasman ◽  
Jesper Larsen ◽  
Marc Stegger ◽  
Thor Bech Johannesen ◽  
...  

SCCmec in MRSA is acknowledged to be of importance not only because it contains the mecA or mecC gene but also for staphylococcal adaptation to different environments, e.g., in hospitals, the community, and livestock. Typing of SCCmec by PCR techniques has been challenging because of its heterogeneity, and whole-genome sequencing has only partially solved this since no good bioinformatic tools have been available. In this article, we describe the development of a new bioinformatic tool, SCCmecFinder, that covers most of the needs of infection control professionals and researchers regarding the interpretation of SCCmec elements. The software detects all of the SCCmec elements accepted by the International Working Group on the Classification of Staphylococcal Cassette Chromosome Elements, and users will be prompted if diverging and potential new elements are uploaded. Furthermore, SCCmecFinder will be curated and updated as new elements are found, and it is easy to use and freely accessible.


2017 ◽  
Author(s):  
Daniel Shriner ◽  
Charles N. Rotimi

ABSTRACT Five classical designations of sickle haplotypes are based on the presence/absence of restriction sites and named after ethnic groups or geographic regions from which patients originated. Each haplotype is thought to represent an independent occurrence of the sickle mutation. We investigated the origins of the sickle mutation using whole genome sequence data. We identified 156 carriers from the 1000 Genomes Project, the African Genome Variation Project, and Qatar. We defined a new haplotypic classification using 27 polymorphisms in linkage disequilibrium with rs334. Network analysis revealed a common haplotype that differed from the ancestral haplotype only by the derived sickle mutation at rs334 and correlated collectively with the Central African Republic/Bantu, Cameroon, and Arabian/Indian designations. Other haplotypes were derived from this haplotype and fell into two clusters, one comprised of haplotypes correlated with the Senegal designation and the other comprised of haplotypes correlated with both the Benin and Senegal designations. The near-exclusive presence of the original sickle haplotype in the Central African Republic, Kenya, Uganda, and South Africa is consistent with this haplotype predating the Bantu Expansion. Modeling of balancing selection indicated that the heterozygote advantage was 15.2%, an equilibrium frequency of 12.0% was reached after 87 generations, and the selective environment predated the mutation. The posterior distribution of the ancestral recombination graph yielded an age of the sickle mutation of 259 generations, corresponding to 7,300 years and the Holocene Wet Phase. These results clarify the origin of the sickle allele and improve and simplify the classification of sickle haplotypes.

