Rare Non-coding Variation Identified by Large Scale Whole Genome Sequencing Reveals Unexplained Heritability of Type 2 Diabetes

Mapping Intimacies ◽

10.1101/2020.11.13.20221812 ◽

2020 ◽

Author(s):

Jennifer Wessel ◽

Timothy D Majarian ◽

Heather M Highland ◽

Sridharan Raghavan ◽

Mindy D Szeto ◽

...

Keyword(s):

Type 2 Diabetes ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Large Scale ◽

Regulatory Elements ◽

Whole Genome Sequence ◽

European Ancestry ◽

Substantial Contribution ◽

Whole Genome

Type 2 diabetes (T2D) is increasing in all ancestry groups [1]. Part of its genetic basis may reside among the rare (minor allele frequency <0.1%) variants that make up the vast majority of human genetic variation [2]. We analyzed high-coverage (mean depth 38.2x) whole genome sequencing from 9,639 individuals with T2D and 34,994 controls in the NHLBI’s Trans-Omics for Precision Medicine (TOPMed) program [2] to show that rare, non-coding variants that are poorly captured by genotyping arrays or imputation panels contribute h²=53% (P=4.2×10⁻⁵) to the genetic component of risk in the largest (European) ancestry subset. We coupled sequence variation with islet epigenomic signatures [3] to annotate and group rare variants with respect to gene expression [4], chromatin state [5] and three-dimensional chromatin architecture [6], and show that pancreatic islet regulatory elements contribute to T2D genetic risk (h²=8%, P=2.4×10⁻³). We used islet annotation to create a non-coding framework for rare variant aggregation testing. This approach identified five loci containing rare alleles in islet regulatory elements that suggest novel biological mechanisms readily linked to hypotheses about variant-to-function. Large-scale whole genome sequence analysis reveals the substantial contribution of rare, non-coding variation to the genetic architecture of T2D and highlights the value of tissue-specific regulatory annotation for variant-to-function discovery.
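
The aggregation idea in this abstract (grouping rare variants by the regulatory element they fall in) can be illustrated with a minimal sketch. This is not the authors' pipeline: the input layout, the function name `burden_by_element`, and the simple allele-count burden score are all assumptions made for illustration. Only the rarity cutoff (minor allele frequency <0.1%) comes from the abstract.

```python
from collections import defaultdict

# "Rare" as defined in the abstract: minor allele frequency below 0.1%.
MAF_THRESHOLD = 0.001


def burden_by_element(variants, genotypes):
    """Sum rare alternate-allele counts per sample within each annotated element.

    variants:  list of dicts with keys 'id', 'maf', 'element'
               ('element' is None when the variant is outside any annotation).
    genotypes: dict mapping variant id -> {sample: alternate allele count}.
    Both layouts are hypothetical, chosen only to keep the sketch short.
    """
    burden = defaultdict(lambda: defaultdict(int))
    for variant in variants:
        # Skip variants that are unannotated or too common to count as rare.
        if variant["element"] is None or variant["maf"] >= MAF_THRESHOLD:
            continue
        for sample, dosage in genotypes.get(variant["id"], {}).items():
            burden[variant["element"]][sample] += dosage
    return {element: dict(samples) for element, samples in burden.items()}


variants = [
    {"id": "v1", "maf": 0.0005, "element": "enh1"},
    {"id": "v2", "maf": 0.0002, "element": "enh1"},
    {"id": "v3", "maf": 0.05, "element": "enh1"},   # common, ignored
    {"id": "v4", "maf": 0.0001, "element": None},   # unannotated, ignored
]
genotypes = {
    "v1": {"s1": 1, "s2": 0},
    "v2": {"s1": 1, "s2": 1},
    "v3": {"s1": 2, "s2": 2},
    "v4": {"s1": 1},
}
result = burden_by_element(variants, genotypes)
```

In a real analysis the per-element scores would feed an association test with covariates and relatedness adjustment; the sketch stops at the grouping step.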

1722-P: Colocalization of TOPMed Whole Genome Sequencing Analysis and Tissue-Specific eQTL Signals Detects Target Genes for Type 2 Diabetes Risk

Diabetes ◽

10.2337/db19-1722-p ◽

2019 ◽

Vol 68 (Supplement 1) ◽

pp. 1722-P

Author(s):

MINDY D. SZETO ◽

HEATHER M. HIGHLAND ◽

ALISA MANNING ◽

Keyword(s):

Type 2 Diabetes ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Target Genes ◽

Diabetes Risk ◽

Whole Genome ◽

Sequencing Analysis ◽

Tissue Specific

Epidemic Clostridioides difficile Ribotype 027 Lineages: Comparisons of Texas Versus Worldwide Strains

Open Forum Infectious Diseases ◽

10.1093/ofid/ofz013 ◽

2019 ◽

Vol 6 (2) ◽

Cited By ~ 7

Author(s):

Bradley T Endres ◽

Khurshida Begum ◽

Hua Sun ◽

Seth T Walk ◽

Ali Memariani ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Phylogenetic Trees ◽

Large Scale ◽

The United States ◽

Whole Genome Sequence ◽

Snp Analysis ◽

Whole Genome ◽

Ribotype 027 ◽

Clostridioides Difficile

Background: The epidemic Clostridioides difficile ribotype 027 strain resulted from the dissemination of 2 separate fluoroquinolone-resistant lineages: FQR1 and FQR2. Both lineages were reported to originate in North America; however, confirmatory large-scale investigations of C difficile ribotype 027 epidemiology using whole genome sequencing have not been undertaken in the United States. Methods: Whole genome sequencing and single-nucleotide polymorphism (SNP) analysis were performed on 76 clinical ribotype 027 isolates obtained from hospitalized patients in Texas with C difficile infection and compared with 32 previously sequenced worldwide strains. Maximum-likelihood phylogeny based on a set of core genome SNPs was used to construct phylogenetic trees investigating strain macro- and microevolution. Bayesian phylogenetic and phylogeographic analyses were used to incorporate temporal and geographic variables with the SNP strain analysis. Results: Whole genome sequence analysis identified 2841 SNPs, including 900 nonsynonymous mutations, 1404 synonymous substitutions, and 537 intergenic changes. Phylogenetic analysis separated the strains into 2 prominent groups, which grossly differed by 28 SNPs: the FQR1 and FQR2 lineages. Five isolates were identified as pre-epidemic strains. Phylogeny demonstrated unique clustering and resistance genes in Texas strains, indicating that spatiotemporal bias has defined the microevolution of ribotype 027 genetics. Conclusions: Clostridioides difficile ribotype 027 lineages emerged earlier than previously reported, coinciding with increased use of fluoroquinolones. Both FQR1 and FQR2 ribotype 027 epidemic lineages are present in Texas, but they have evolved geographically to represent region-specific public health threats.
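
As a rough illustration of what "differed by 28 SNPs" means computationally, the sketch below counts SNP differences between two aligned core-genome sequences. It is an assumption-laden toy, not the study's method: the function name `snp_distance` and the choice to skip ambiguous or gapped positions are illustrative decisions.

```python
VALID_BASES = frozenset("ACGT")


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count aligned positions where both sequences carry different unambiguous bases.

    Positions with a gap or an ambiguity code (for example 'N' or '-') in either
    sequence are skipped; that handling is an assumption of this sketch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in VALID_BASES and b in VALID_BASES and a != b
    )


# Toy alignment: one true difference (G vs C); the N position is ignored.
example = snp_distance("ACGTN", "ACCTA")
```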

Quality Control and Integration of Genotypes from Two Calling Pipelines for Whole Genome Sequence Data in the Alzheimer’s Disease Sequencing Project

10.1101/318857 ◽

2018 ◽

Cited By ~ 1

Author(s):

Adam C. Naj ◽

Honghuang Lin ◽

Badri N. Vardarajan ◽

Simon White ◽

Daniel Lancour ◽

...

Keyword(s):

Alzheimer's Disease ◽

Quality Control ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Large Scale ◽

Read Depth ◽

Whole Genome Sequence ◽

Whole Genome ◽

Sequencing Project

Abstract: The Alzheimer’s Disease Sequencing Project (ADSP) performed whole genome sequencing (WGS) of 584 subjects from 111 multiplex families at three sequencing centers. Genotype calling of single nucleotide variants (SNVs) and insertion-deletion variants (indels) was performed centrally using GATK-HaplotypeCaller and Atlas V2. The ADSP Quality Control (QC) Working Group applied QC protocols to project-level variant call format files (VCFs) from each pipeline, and developed and implemented a novel protocol, termed “consensus calling,” to combine genotype calls from both pipelines into a single high-quality set. QC was applied to autosomal bi-allelic SNVs and indels, and included pipeline-recommended QC filters, variant-level QC, and sample-level QC. Low-quality variants or genotypes were excluded, and sample outliers were noted. Quality was assessed by examining Mendelian inconsistencies (MIs) among 67 parent-offspring pairs, and MIs were used to establish additional genotype-specific filters for GATK calls. After QC, 578 subjects remained. Pipeline-specific QC excluded ~12.0% of GATK and 14.5% of Atlas SNVs. Between pipelines, ~91% of SNV genotypes across all QCed variants were concordant; 4.23% and 4.56% of genotypes were exclusive to Atlas or GATK, respectively; the remaining ~0.01% of discordant genotypes were excluded. For indels, variant-level QC excluded ~36.8% of GATK and 35.3% of Atlas indels. Between pipelines, ~55.6% of indel genotypes were concordant; 10.3% and 28.3% were exclusive to Atlas or GATK, respectively; and ~0.29% of discordant genotypes were excluded. The final WGS consensus dataset contains 27,896,774 SNVs and 3,133,926 indels and is publicly available.

Abbreviations: AD, Alzheimer’s disease; QC, Quality Control; LSSAC, Large-Scale Sequencing and Analysis Center; Broad, Broad Institute Genomics Service; Baylor, Baylor College of Medicine Human Genome Sequencing Center; WashU, Washington University-St. Louis McDonnell Genome Institute; WGS, whole genome sequencing; WES, whole exome sequencing; indel, insertion-deletion variants; VCF, variant call format; MI, Mendelian inconsistency; MC, Mendelian consistency; GWAS, genome-wide association study; VR, referent allele read depth; DP, overall read depth; MS, mapping score; GQ, genotype quality score; Ti/Tv, Transition/Transversion; CS, concordance code
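
The two ideas this abstract describes in words, combining genotypes from two callers and checking parent-offspring pairs for Mendelian inconsistencies, can be sketched as follows. This is a simplified reading of the abstract, not the ADSP protocol itself: the genotype encoding (allele index pairs, `None` for a missing or filtered call) and both function names are assumptions. The rule it encodes is the one the abstract reports: concordant genotypes and genotypes exclusive to one pipeline are kept, discordant genotypes are excluded.

```python
from typing import Optional, Tuple

# A biallelic genotype as a pair of allele indices, e.g. (0, 1).
# None means that pipeline produced no call or its call was removed by QC.
Genotype = Optional[Tuple[int, int]]


def consensus_call(gatk: Genotype, atlas: Genotype) -> Genotype:
    """Combine one genotype from two pipelines into a single consensus call."""
    if gatk is None and atlas is None:
        return None
    if gatk is None:
        return tuple(sorted(atlas))   # exclusive to Atlas: keep
    if atlas is None:
        return tuple(sorted(gatk))    # exclusive to GATK: keep
    if sorted(gatk) == sorted(atlas):
        return tuple(sorted(gatk))    # concordant: keep
    return None                       # discordant: exclude


def is_mendelian_inconsistent(parent: Tuple[int, int], child: Tuple[int, int]) -> bool:
    """For a parent-offspring pair, the child must share at least one allele with the parent."""
    return not (set(parent) & set(child))


concordant = consensus_call((0, 1), (1, 0))
atlas_only = consensus_call(None, (1, 1))
discordant = consensus_call((0, 0), (0, 1))
```

With only one parent available, the check can flag opposite homozygotes (for example parent `(0, 0)` and child `(1, 1)`) but cannot detect every inconsistency that a full trio would reveal.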

30-OR: Whole Genome Sequence Analysis of Type 2 Diabetes Risk in 44,713 Humans of Diverse Ancestry in the TOPMed Study

Diabetes ◽

10.2337/db19-30-or ◽

2019 ◽

Vol 68 (Supplement 1) ◽

pp. 30-OR

Author(s):

HEATHER M. HIGHLAND ◽

JENNIFER WESSEL ◽

ALISA MANNING ◽

Keyword(s):

Type 2 Diabetes ◽

Sequence Analysis ◽

Genome Sequence ◽

Diabetes Risk ◽

Whole Genome Sequence ◽

Whole Genome ◽

Genome Sequence Analysis

0306 Exploring the feasibility of using copy number variants as genetic markers through large-scale whole genome sequencing experiments

Journal of Animal Science ◽

10.2527/jam2016-0306 ◽

2016 ◽

Vol 94 (suppl_5) ◽

pp. 146-146

Author(s):

D. M. Bickhart ◽

L. Xu ◽

J. L. Hutchison ◽

J. B. Cole ◽

D. J. Null ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genetic Markers ◽

Genome Sequencing ◽

Copy Number ◽

Large Scale ◽

Copy Number Variants ◽

Whole Genome

Plasmids or no plasmids? A comparison between the agilent TapeStation and whole-genome sequencing data in a large-scale bacterial sequencing project

10.26226/morressier.56d5ba27d462b80296c95fe7 ◽

2016 ◽

Author(s):

Sarah Alexander

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Large Scale ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data ◽

Sequencing Project

SureSelect targeted enrichment, a new cost effective method for the whole genome sequencing of Candidatus Liberibacter asiaticus

Scientific Reports ◽

10.1038/s41598-019-55144-4 ◽

2019 ◽

Vol 9 (1) ◽

Cited By ~ 2

Author(s):

Weili Cai ◽

Schyler Nunziata ◽

John Rascoe ◽

Michael J. Stulberg

Keyword(s):

Whole Genome Sequencing ◽

Molecular Characterization ◽

Genome Sequencing ◽

Whole Genome Sequence ◽

Whole Genome ◽

Genome Coverage ◽

Metagenomic Sample ◽

Liberibacter Asiaticus

Abstract: Huanglongbing (HLB) is a deadly worldwide citrus disease caused by the phloem-limited bacterium ‘Candidatus Liberibacter asiaticus’ (CLas), vectored by Asian citrus psyllids. In order to effectively manage this disease, it is crucial to understand the relationship among the bacterial isolates from different geographical locations. Whole genome sequencing approaches will provide more precise molecular characterization of the diversity among populations. Due to the lack of in vitro culture, obtaining the whole genome sequence of CLas is still a challenge, especially for medium to low titer samples. Hundreds of millions of sequencing reads are needed to get good coverage of CLas from an HLB-positive citrus sample. In order to overcome this limitation, we present here a new method, Agilent SureSelect XT HS target enrichment, which can specifically enrich CLas from a metagenomic sample while greatly reducing cost and increasing whole genome coverage of the pathogen. In this study, the CLas genome was successfully sequenced with 99.3% genome coverage and over 72X sequencing coverage from low titer tissue samples (equivalent to 28.52 Cq using Li 16S qPCR). More importantly, this method also effectively captures regions of diversity in the CLas genome, which provides precise molecular characterization of different strains.

Abstract 343: Bayesian Selection of Modifier Genes in Hypertrophic Cardiomyopathy Through Whole Genome Sequencing

Circulation Research ◽

10.1161/res.117.suppl_1.343 ◽

2015 ◽

Vol 117 (suppl_1) ◽

Author(s):

Matthew Wheeler ◽

Daryl Waggott ◽

Megan Grove ◽

Frederick Dewey ◽

Cuiping Pan ◽

...

Keyword(s):

Hypertrophic Cardiomyopathy ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

A Priori ◽

Copy Number Variants ◽

Whole Genome Sequence ◽

Monogenic Disease ◽

Whole Genome ◽

Genetic Modifiers ◽

Structural Variants

Background: Technological advances have greatly reduced the cost of whole genome sequencing. For single individuals, clinical application is apparent, while exome sequencing in tens of thousands of people has allowed a more global view of genetic variation that can inform interpretation of specific variants in individuals. We hypothesized that genome sequencing of patients with monogenic cardiomyopathy would facilitate discovery of genetic modifiers of phenotype. Methods and Results: We identified 48 individuals diagnosed with cardiomyopathy and with putative mutations in MYH7, the gene encoding beta myosin heavy chain. We carried out whole genome sequencing and applied a newly developed analytical pipeline optimized for discovery of genes modifying severity of clinical presentation and outcomes. Using a combination of external priors and rare variant burden tests, we scored genes as potential modifiers. There were 96 genes that reached a modifier score of 6 out of 12 or better (9=2, 8=8, 7=17, 6=69). We identified NCKAP1, a gene that regulates actin filament dynamics, and CAMSAP1, a calmodulin-regulated gene that regulates microtubule dynamics, as top scoring modifiers of hypertrophic cardiomyopathy phenotypes (score=9), while LDB2, RYR2, FBN1 and ATP1A2 had modifier scores of 8. Of the top scoring genes, 21 out of 96 were identified as candidates a priori. Our candidate prioritization scheme identified the previously described modifiers of cardiomyopathy phenotype, FHOD3 and MYBPC3, as top scoring genes. We identified structural variants in 21 clinically sequenced cardiomyopathy-associated genes, 13 of which were at less than 10% frequency. Copy number variants in ILK and CSRP3 were nominally associated with ejection fraction (p=0.03), while 8 genes showed copy gains (GLA, FKTN, SGCD, TTN, SOS1, ANKRD1, VCL and NEBL). Structural variants were found in CSRP3, MYL3 and TNNC1, all of which have been implicated as causative for HCM.
Conclusion: Evaluation of the whole genome sequence, even in the case of putatively monogenic disease, leads to important diagnostic and scientific insights not revealed by panel-based sequencing.

A large-scale whole-genome sequencing analysis reveals false positives of bacterial essential genes

Applied Microbiology and Biotechnology ◽

10.1007/s00253-021-11702-3 ◽

2021 ◽

Author(s):

Yuanhao Li ◽

Bo Jiang ◽

Weijun Dai

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Large Scale ◽

False Positives ◽

Essential Genes ◽

Whole Genome ◽

Sequencing Analysis

Improving tuberculosis surveillance by detecting international transmission using publicly available whole-genome sequencing data

10.1101/834150 ◽

2019 ◽

Author(s):

Andrea Sanchini ◽

Christine Jandrasits ◽

Julius Tembrockhaus ◽

Thomas Andreas Kohl ◽

Christian Utpatel ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Large Scale ◽

Added Value ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

International Transmission ◽

The Public ◽

Public Dataset ◽

Public Repositories

Introduction: Improving the surveillance of tuberculosis (TB) is especially important for multidrug-resistant (MDR) and extensively drug-resistant (XDR)-TB. The large amount of publicly available whole-genome sequencing (WGS) data for TB gives us the chance to re-use data and to perform additional analysis at a large scale. Aim: We assessed the usefulness of raw WGS data of global MDR/XDR-TB isolates available from public repositories to improve TB surveillance. Methods: We extracted raw WGS data and the related metadata of Mycobacterium tuberculosis isolates available from the Sequence Read Archive. We compared this public dataset with WGS data and metadata of 131 MDR- and XDR-TB isolates from Germany in 2012-2013. Results: We aggregated a dataset that includes 1,081 MDR and 250 XDR isolates among which we identified 133 molecular clusters. In 16 clusters, the isolates were from at least two different countries. For example, cluster2 included 56 MDR/XDR isolates from Moldova, Georgia, and Germany. By comparing the WGS data from Germany and the public dataset, we found that 11 clusters contained at least one isolate from Germany and at least one isolate from another country. We could, therefore, connect TB cases despite missing epidemiological information. Conclusion: We demonstrated the added value of using WGS raw data from public repositories to contribute to TB surveillance. By comparing the German and the public dataset, we identified potential international transmission events. Thus, using this approach might support the interpretation of national surveillance results in an international context.
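
The step of finding molecular clusters and flagging those that span countries can be sketched in a few lines. The abstract does not state how clusters were defined, so everything specific here is an assumption: single-linkage grouping on pairwise SNP distance, the threshold value of 12, the toy distances, and the function names `cluster_isolates` and `international_clusters`.

```python
from collections import defaultdict


def cluster_isolates(distances, threshold):
    """Single-linkage clustering: link two isolates when their SNP distance <= threshold.

    distances: dict mapping (isolate_a, isolate_b) -> pairwise SNP distance.
    Returns a list of sets, one per cluster with at least two isolates.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), distance in distances.items():
        root_a, root_b = find(a), find(b)
        if distance <= threshold and root_a != root_b:
            parent[root_a] = root_b

    groups = defaultdict(set)
    for isolate in list(parent):
        groups[find(isolate)].add(isolate)
    return [members for members in groups.values() if len(members) > 1]


def international_clusters(clusters, country_of):
    """Keep clusters whose isolates come from at least two different countries."""
    return [c for c in clusters if len({country_of[i] for i in c}) >= 2]


# Toy data; the threshold of 12 SNPs is a hypothetical choice, not from the abstract.
distances = {("A", "B"): 3, ("B", "C"): 4, ("C", "D"): 40, ("D", "E"): 2}
country_of = {"A": "Germany", "B": "Moldova", "C": "Georgia", "D": "Germany", "E": "Germany"}
clusters = cluster_isolates(distances, threshold=12)
cross_border = international_clusters(clusters, country_of)
```

Single linkage chains isolates together (A and C end up in one cluster through B), which is a common choice for transmission clusters but tends to produce larger clusters than stricter linkage rules.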
