A description of large-scale metabolomics studies: increasing value by combining metabolomics with genome-wide SNP genotyping and transcriptional profiling

Georg Homuth; Alexander Teumer; Uwe Völker; Matthias Nauck

doi:10.1530/joe-12-0144

Journal of Endocrinology ◽

10.1530/joe-12-0144 ◽

2012 ◽

Vol 215 (1) ◽

pp. 17-28 ◽

Cited By ~ 18

Author(s):

Georg Homuth ◽

Alexander Teumer ◽

Uwe Völker ◽

Matthias Nauck

Keyword(s):

Blood Cells ◽

Large Scale ◽

Genetic Factors ◽

Association Studies ◽

Transcriptional Profiling ◽

Genome Wide Association Studies ◽

Protein Levels ◽

Future Developments ◽

Genome Wide ◽

Metabolome Data

The metabolome, defined as the reflection of metabolic dynamics derived from parameters measured primarily in easily accessible body fluids such as serum, plasma, and urine, can be considered the omics data pool that is closest to the phenotype because it integrates genetic influences as well as nongenetic factors. Metabolic traits can be related to genetic polymorphisms in genome-wide association studies, enabling the identification of underlying genetic factors, as well as to specific phenotypes, resulting in the identification of metabolome signatures primarily caused by nongenetic factors. Similarly, correlation of metabolome data with transcriptional and/or proteome profiles of blood cells also produces valuable data by revealing associations between metabolic changes and mRNA and protein levels. In recent years, progress in correlating genetic variation with metabolome profiles has been most impressive. This review therefore summarizes the most important of these studies and gives an outlook on future developments.

Genetic Determinants of Paget’s Disease of Bone

Current Osteoporosis Reports ◽

10.1007/s11914-021-00676-w ◽

2021 ◽

Author(s):

Navnit S. Makaram ◽

Stuart H. Ralston

Keyword(s):

Genetic Factors ◽

Association Studies ◽

Paget’s Disease ◽

Paget’s Disease Of Bone ◽

Genome Wide Association Studies ◽

Genome Wide ◽

Family Based

Purpose of Review: To provide an overview of the role of genes and loci that predispose to Paget’s disease of bone and related disorders. Recent Findings: Studies over the past ten years have seen major advances in knowledge on the role of genetic factors in Paget’s disease of bone (PDB). Genome-wide association studies have identified six loci that predispose to the disease, whereas family-based studies have identified a further eight genes that cause PDB. This brings the total number of genes and loci implicated in PDB to fourteen. Emerging evidence has shown that a number of these genes also predispose to multisystem proteinopathy syndromes where PDB is accompanied by neurodegeneration and myopathy due to the accumulation of abnormal protein aggregates, emphasising the importance of defects in autophagy in the pathogenesis of PDB. Summary: Genetic factors play a key role in the pathogenesis of PDB and the studies in this area have identified several genes previously not suspected to play a role in bone metabolism. Genetic testing coupled to targeted therapeutic intervention is being explored as a way of halting disease progression and improving outcome before irreversible skeletal damage has occurred.

GWASpro: a high-performance genome-wide association analysis server

Bioinformatics ◽

10.1093/bioinformatics/bty989 ◽

2018 ◽

Vol 35 (14) ◽

pp. 2512-2514 ◽

Cited By ~ 4

Author(s):

Bongsong Kim ◽

Xinbin Dai ◽

Wenchao Zhang ◽

Zhaohong Zhuang ◽

Darlene L Sanchez ◽

...

Keyword(s):

High Performance ◽

Large Scale ◽

Linear Mixed Model ◽

Association Studies ◽

Learning Curves ◽

Experimental Designs ◽

Genome Wide Association ◽

Supplementary Information ◽

Genome Wide Association Studies ◽

Genome Wide

Summary: We present GWASpro, a high-performance web server for the analysis of large-scale genome-wide association studies (GWAS). GWASpro was developed to provide data analyses for large-scale molecular genetic data coupled with complex replicated experimental designs, such as those found in plant science investigations, and to overcome the steep learning curves of existing GWAS software tools. GWASpro supports building complex design matrices, by which complex experimental designs that may include replications, treatments, locations and times can be accounted for in the linear mixed model. GWASpro is optimized to handle GWAS data that may consist of up to 10 million markers and 10 000 samples from replicable lines or hybrids. GWASpro provides an interface that significantly reduces the learning curve for new GWAS investigators. Availability and implementation: GWASpro is freely available at https://bioinfo.noble.org/GWASPRO. Supplementary information: Supplementary data are available at Bioinformatics online.
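
As a rough illustration of what a design matrix for a replicated experiment looks like, the sketch below one-hot encodes a single factor (here, location) so that each observation is tied to its factor level. This is a generic construction and not GWASpro's interface; the function name `one_hot_design` and the example labels are hypothetical.

```python
def one_hot_design(levels):
    """Build an indicator (one-hot) design matrix for one categorical factor.

    Returns the sorted category labels and a row-per-observation matrix in
    which column j is 1 when the observation belongs to category j.
    """
    categories = sorted(set(levels))
    matrix = [[1 if level == cat else 0 for cat in categories] for level in levels]
    return categories, matrix


# Hypothetical layout: four plots, two grown at location "A" and two at "B".
locations = ["A", "A", "B", "B"]
categories, Z = one_hot_design(locations)
for label, row in zip(locations, Z):
    print(label, row)
```

In a linear mixed model, one such block per factor (replication, treatment, location, time) would typically be placed side by side to form the full design matrix.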

Better estimation of SNP heritability from summary statistics provides a new understanding of the genetic architecture of complex traits

10.1101/284976 ◽

2018 ◽

Cited By ~ 6

Author(s):

Doug Speed ◽

David J Balding

Keyword(s):

Complex Traits ◽

Genetic Architecture ◽

Large Scale ◽

Association Studies ◽

Genome Wide Association Studies ◽

Summary Statistics ◽

Confounding Bias ◽

Conserved Regions ◽

Genome Wide ◽

Variation Explained

LD Score Regression (LDSC) has been widely applied to the results of genome-wide association studies. However, its estimates of SNP heritability are derived from an unrealistic model in which each SNP is expected to contribute equal heritability. As a consequence, LDSC tends to over-estimate confounding bias, under-estimate the total phenotypic variation explained by SNPs, and provide misleading estimates of the heritability enrichment of SNP categories. Therefore, we present SumHer, software for estimating SNP heritability from summary statistics using more realistic heritability models. After demonstrating its superiority over LDSC, we apply SumHer to the results of 24 large-scale association studies (average sample size 121 000). First we show that these studies have tended to substantially over-correct for confounding, and as a result the number of genome-wide significant loci has been under-reported by about 20%. Next we estimate enrichment for 24 categories of SNPs defined by functional annotations. A previous study using LDSC reported that conserved regions were 13-fold enriched, and found a further twelve categories with above 2-fold enrichment. By contrast, our analysis using SumHer finds that conserved regions are only 1.6-fold (SD 0.06) enriched, and that no category has enrichment above 1.7-fold. SumHer provides an improved understanding of the genetic architecture of complex traits, which enables more efficient analysis of future genetic data.
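
The equal-contribution model that the abstract criticises can be sketched as a simple regression. Under that model the expected test statistic of SNP j is commonly written as E[chi2_j] = intercept + (N * h2 / M) * l_j, where l_j is the LD score, N the sample size, M the number of SNPs, and an intercept above 1 is read as confounding. The toy code below fits that line by ordinary least squares on made-up, noise-free numbers. It is not the LDSC or SumHer software, and it omits the regression weights and standard-error estimation that real analyses use; all names and values are assumptions for illustration.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def ldsc_estimate(ld_scores, chi2, n_samples, n_snps):
    """Recover the intercept and SNP heritability from the fitted line.

    Assumes the equal-contribution model, where slope = N * h2 / M.
    """
    intercept, slope = fit_line(ld_scores, chi2)
    h2 = slope * n_snps / n_samples
    return intercept, h2


# Hypothetical, noise-free inputs generated from the model itself.
N, M = 100_000, 1_000_000
TRUE_H2, TRUE_INTERCEPT = 0.4, 1.05
ld_scores = [10.0, 50.0, 100.0, 200.0, 400.0]
chi2 = [TRUE_INTERCEPT + N * TRUE_H2 / M * score for score in ld_scores]

intercept, h2 = ldsc_estimate(ld_scores, chi2, N, M)
print(f"intercept={intercept:.3f}, h2={h2:.3f}")
```

Because the inputs are generated without noise, the fit returns the values used to simulate them. With real summary statistics the estimate depends on how well the assumed heritability model matches the data, which is the point the abstract makes.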

RAFFI: Accurate and fast familial relationship inference in large scale biobank studies using RaPID

PLoS Genetics ◽

10.1371/journal.pgen.1009315 ◽

2021 ◽

Vol 17 (1) ◽

pp. e1009315

Author(s):

Ardalan Naseri ◽

Junjie Shi ◽

Xihong Lin ◽

Shaojie Zhang ◽

Degui Zhi

Keyword(s):

Large Scale ◽

Association Studies ◽

Scale Up ◽

Data Driven ◽

Genome Wide Association Studies ◽

Inference Method ◽

Genome Wide ◽

Familial Relationship ◽

Kinship Coefficients ◽

Data Driven Approach

Inference of relationships from whole-genome genetic data of a cohort is a crucial prerequisite for genome-wide association studies. Typically, relationships are inferred by computing the kinship coefficients (ϕ) and the genome-wide probability of zero IBD sharing (π0) among all pairs of individuals. Current leading methods are based on pairwise comparisons, which may not scale up to very large cohorts (e.g., sample size >1 million). Here, we propose an efficient relationship inference method, RAFFI. RAFFI leverages the efficient RaPID method to call IBD segments first, then estimates ϕ and π0 from the detected IBD segments. This inference is achieved by a data-driven approach that adjusts the estimation based on phasing quality and genotyping quality. Using simulations, we showed that RAFFI is robust against phasing/genotyping errors, admixture events, and varying marker densities, and achieves higher accuracy compared to KING, the current leading method, especially for more distant relatives. When applied to the phased UK Biobank data with ~500K individuals, RAFFI is approximately 18 times faster than KING. We expect RAFFI will offer fast and accurate relatedness inference for even larger cohorts.
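
The step from IBD segments to ϕ and π0 can be illustrated with the textbook identities ϕ = k1/4 + k2/2 and π0 = 1 - k1 - k2, where k1 and k2 are the fractions of the genome shared IBD1 and IBD2. The sketch below applies only those identities; it does not reproduce RAFFI's data-driven adjustment for phasing and genotyping quality. The segment format and the genome length of 3500 cM are assumptions for illustration.

```python
# Hypothetical total genetic map length in centimorgans (assumed value).
GENOME_LENGTH_CM = 3500.0


def kinship_from_segments(segments, genome_length=GENOME_LENGTH_CM):
    """Estimate (phi, pi0) from IBD segments.

    segments: iterable of (length_cm, ibd_state), where ibd_state is 1 when
    the pair shares one haplotype over the segment and 2 when it shares both.
    """
    k1 = sum(length for length, state in segments if state == 1) / genome_length
    k2 = sum(length for length, state in segments if state == 2) / genome_length
    pi0 = 1.0 - k1 - k2          # fraction of the genome with no IBD sharing
    phi = k1 / 4.0 + k2 / 2.0    # kinship coefficient
    return phi, pi0


# Idealised parent-offspring pair: IBD1 across the whole genome.
parent_child = kinship_from_segments([(3500.0, 1)])

# Idealised full siblings: half the genome IBD1, a quarter IBD2.
full_sibs = kinship_from_segments([(1750.0, 1), (875.0, 2)])

print(parent_child, full_sibs)
```

Both idealised pairs have the same ϕ but different π0, which is why the two quantities are used together to separate relationship types.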

Germline Genetic Factors Influence Outcome of Interferon Alpha Therapy in Polycythemia Vera

Blood ◽

10.1182/blood.2020005792 ◽

2020 ◽

Author(s):

Roland Jäger ◽

Heinz Gisslinger ◽

Elisabeth Fuchs ◽

Edith Bogner ◽

Jelena D. Milosevic Feenstra ◽

...

Keyword(s):

Polycythemia Vera ◽

Interferon Alpha ◽

Genetic Factors ◽

Myeloproliferative Neoplasms ◽

Association Studies ◽

Study Cohort ◽

Molecular Response ◽

Genome Wide Association Studies ◽

Association Analyses ◽

Genome Wide

Interferon alpha (IFNα) based therapies can induce hematologic and molecular responses in polycythemia vera (PV); however, patients do not respond equally. Germline genetic factors have previously been implicated in differential drug response. We addressed the effect of common germline polymorphisms on hematologic and molecular response (HR/MR) in PV therapy within the PROUD-PV and CONTINUATION-PV studies including 122 patients with PV receiving ropeginterferon alfa-2b. Genome-wide association studies using longitudinal data on HR and MR over 36 months follow-up did not reveal any associations at genome-wide significance. Further, we performed targeted association analyses at the interferon lambda 4 (IFNL4) locus, well known for its role in hepatitis C viral clearance and recently reported to influence HR during therapy of myeloproliferative neoplasms. While we did not observe any association of IFNL4 polymorphisms with HR in our study cohort, we demonstrated a statistically significant effect of the functionally causative IFNL4 diplotype (haplotype pair including the protein-coding variants rs368234815/rs117648444) on MR (p = 3.91 × 10⁻⁴; OR = 10.80; 95% CI: [2.39-69.97]) as reflected in differential JAK2V617F mutational burden changes according to IFNL4 diplotype status. Stratification of PV patients based on IFNL4 functionality may allow for optimizing patient management during IFNα treatment.

Genetics of juvenile rheumatic diseases

10.1093/med/9780199642489.003.0043_update_002 ◽

2015 ◽

Author(s):

Anne Hinks ◽

Wendy Thomson

Keyword(s):

Risk Factors ◽

Rheumatic Diseases ◽

Large Scale ◽

Association Studies ◽

Genetic Diseases ◽

Response To Treatment ◽

Genome Wide Association Studies ◽

Established Risk Factor ◽

Genome Wide ◽

Juvenile Rheumatic Diseases

Juvenile rheumatic diseases are heterogeneous, complex genetic diseases; to date only juvenile idiopathic arthritis (JIA) has been extensively studied in terms of identifying genetic risk factors. The MHC region is a well-established risk factor but in the last few years candidate gene and large-scale genome-wide association studies have been utilized in the search for non-HLA risk factors. There are now 17 JIA susceptibility loci which reach the genome-wide significance threshold for association and a further 7 regions with evidence for association in more than one study. In addition, some subtype-specific associations are emerging. These risk loci now need to be investigated further using fine-mapping strategies and then appropriate functional studies to show how the variant alters the gene function. This knowledge will not only lead to a better understanding of disease pathogenesis for juvenile rheumatic diseases but may also aid in the classification of these heterogeneous diseases. It may identify new pathways for potential therapeutic targets and help in the prediction of disease outcome and response to treatment.

Understanding the genetic determinants of the brain with MOSTest

Nature Communications ◽

10.1038/s41467-020-17368-1 ◽

2020 ◽

Vol 11 (1) ◽

Cited By ~ 3

Author(s):

Dennis van der Meer ◽

Oleksandr Frei ◽

Tobias Kaufmann ◽

Alexey A. Shadrin ◽

Anna Devor ◽

...

Keyword(s):

Large Scale ◽

Association Studies ◽

Computational Design ◽

Brain Regions ◽

Brain Morphology ◽

Genome Wide Association Studies ◽

Small Individual ◽

Significance Threshold ◽

Regional Brain ◽

Genome Wide

Regional brain morphology has a complex genetic architecture, consisting of many common polymorphisms with small individual effects. This has proven challenging for genome-wide association studies (GWAS). Due to the distributed nature of genetic signal across brain regions, multivariate analysis of regional measures may enhance discovery of genetic variants. Current multivariate approaches to GWAS are ill-suited for complex, large-scale data of this kind. Here, we introduce the Multivariate Omnibus Statistical Test (MOSTest), with an efficient computational design enabling rapid and reliable inference, and apply it to 171 regional brain morphology measures from 26,502 UK Biobank participants. At the conventional genome-wide significance threshold of α = 5 × 10⁻⁸, MOSTest identifies 347 genomic loci associated with regional brain morphology, more than any previous study, improving upon the discovery of established GWAS approaches more than threefold. Our findings implicate more than 5% of all protein-coding genes and provide evidence for gene sets involved in neuron development and differentiation.
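
The conventional threshold quoted in the abstract is usually motivated as a Bonferroni correction of a 0.05 error rate over roughly one million independent common-variant tests. The short sketch below reproduces that arithmetic; the figure of one million tests is the customary approximation, not a number taken from this study, and the p-values are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error at alpha."""
    return alpha / n_tests


def is_genome_wide_significant(p_value, threshold):
    """True when a p-value falls below the given threshold."""
    return p_value < threshold


# Assumed number of independent tests (customary approximation).
threshold = bonferroni_threshold(0.05, 1_000_000)

# Hypothetical p-values for two variants.
print(threshold)
print(is_genome_wide_significant(3e-9, threshold))
print(is_genome_wide_significant(2e-7, threshold))
```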

A Review of the Hereditary Component of Triple Negative Breast Cancer: High- and Moderate-Penetrance Breast Cancer Genes, Low-Penetrance Loci, and the Role of Nontraditional Genetic Elements

Journal of Oncology ◽

10.1155/2019/4382606 ◽

2019 ◽

Vol 2019 ◽

pp. 1-10 ◽

Cited By ~ 11

Author(s):

Darrell L. Ellsworth ◽

Clesson E. Turner ◽

Rachel E. Ellsworth

Keyword(s):

Breast Cancer ◽

Triple Negative Breast Cancer ◽

Large Scale ◽

Triple Negative ◽

Association Studies ◽

African Ancestry ◽

Genome Wide Association Studies ◽

Genetic Elements ◽

Genome Wide ◽

Increased Risk

Triple negative breast cancer (TNBC), representing 10-15% of breast tumors diagnosed each year, is a clinically defined subtype of breast cancer associated with poor prognosis. The higher incidence of TNBC in certain populations such as young women and/or women of African ancestry and a unique pathological phenotype shared between TNBC and BRCA1-deficient tumors suggest that TNBC may be inherited through germline mutations. In this article, we describe genes and genetic elements, beyond BRCA1 and BRCA2, which have been associated with increased risk of TNBC. Multigene panel testing has identified high- and moderate-penetrance cancer predisposition genes associated with increased risk for TNBC. Development of large-scale genome-wide SNP assays coupled with genome-wide association studies (GWAS) has led to the discovery of low-penetrance TNBC-associated loci. Next-generation sequencing has identified variants in noncoding RNAs, viral integration sites, and genes in underexplored regions of the human genome that may contribute to the genetic underpinnings of TNBC. Advances in our understanding of the genetics of TNBC are driving improvements in risk assessment and patient management.

Secure large-scale genome-wide association studies using homomorphic encryption

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1918257117 ◽

2020 ◽

Vol 117 (21) ◽

pp. 11608-11613 ◽

Cited By ~ 1

Author(s):

Marcelo Blatt ◽

Alexander Gusev ◽

Yuriy Polyakov ◽

Shafi Goldwasser

Keyword(s):

Large Scale ◽

Homomorphic Encryption ◽

Association Studies ◽

Genome Wide Association ◽

Single Server ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

User Interactions ◽

Individual Level ◽

Genome Wide

Genome-wide association studies (GWASs) seek to identify genetic variants associated with a trait, and have been a powerful approach for understanding complex diseases. A critical challenge for GWASs has been the dependence on individual-level data that typically have strict privacy requirements, creating an urgent need for methods that preserve the individual-level privacy of participants. Here, we present a privacy-preserving framework based on several advances in homomorphic encryption and demonstrate that it can perform an accurate GWAS analysis for a real dataset of more than 25,000 individuals, keeping all individual data encrypted and requiring no user interactions. Our extrapolations show that it can evaluate GWASs of 100,000 individuals and 500,000 single-nucleotide polymorphisms (SNPs) in 5.6 h on a single server node (or in 11 min on 31 server nodes running in parallel). Our performance results are more than one order of magnitude faster than prior state-of-the-art results using secure multiparty computation, which requires continuous user interactions, with the accuracy of both solutions being similar. Our homomorphic encryption advances can also be applied to other domains where large-scale statistical analyses over encrypted data are needed.
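
The two runtime figures in the abstract can be checked against each other with a line of arithmetic. Assuming the work divides evenly across nodes with no coordination overhead (an assumption made here, not a claim from the paper), 5.6 hours on one node corresponds to just under 11 minutes on 31 nodes.

```python
# Figures quoted in the abstract.
SINGLE_NODE_HOURS = 5.6
NODES = 31


def ideal_parallel_minutes(single_node_hours, nodes):
    """Wall-clock minutes under perfectly linear scaling (assumed)."""
    return single_node_hours * 60.0 / nodes


ideal_minutes = ideal_parallel_minutes(SINGLE_NODE_HOURS, NODES)
print(f"{ideal_minutes:.1f} minutes")
```

The ideal figure rounds to the reported 11 minutes, so the extrapolation is consistent with near-linear scaling across nodes.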

Animal-ImputeDB: a comprehensive database with multiple animal reference panels for genotype imputation

Nucleic Acids Research ◽

10.1093/nar/gkz854 ◽

2019 ◽

Vol 48 (D1) ◽

pp. D659-D667 ◽

Cited By ~ 2

Author(s):

Wenqian Yang ◽

Yanbo Yang ◽

Cecheng Zhao ◽

Kun Yang ◽

Dongyang Wang ◽

...

Keyword(s):

Large Scale ◽

Association Studies ◽

Genotype Imputation ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

High Quality ◽

Single Nucleotide ◽

Genome Wide ◽

Whole Genome Resequencing ◽

Missing Genotypes

Animal-ImputeDB (http://gong_lab.hzau.edu.cn/Animal_ImputeDB/) is a public database with genomic reference panels of 13 animal species for online genotype imputation, genetic variant search, and free download. Genotype imputation is the process of estimating missing genotypes from the haplotypes and genotypes in a reference panel. It can effectively increase the density of single nucleotide polymorphisms (SNPs) and thus can be widely used in large-scale genome-wide association studies (GWASs) using relatively inexpensive and low-density SNP arrays. However, most animals except humans lack high-quality reference panels, which greatly limits the application of genotype imputation in animals. To overcome this limitation, we developed Animal-ImputeDB, which is dedicated to collecting genotype data and whole-genome resequencing data of nonhuman animals from various studies and databases. A computational pipeline was developed to process different types of raw data to construct reference panels. Finally, 13 high-quality reference panels including ∼400 million SNPs from 2265 samples were constructed. In Animal-ImputeDB, an easy-to-use online tool consisting of two popular imputation tools was designed for the purpose of genotype imputation. Collectively, Animal-ImputeDB serves as an important resource for animal genotype imputation and will greatly facilitate research on animal genomic selection and genetic improvement.
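
The idea of filling missing genotypes from a reference panel can be shown with a deliberately simple toy: pick the reference haplotype that best matches the target at its observed sites, then copy that haplotype's alleles into the gaps. Production imputation tools rely on probabilistic haplotype models rather than this single-best-match rule, and the code below is not the pipeline or the tools used by Animal-ImputeDB. The function name and the data are hypothetical.

```python
def impute_from_panel(target, reference_panel):
    """Fill missing alleles (None) in target from the closest reference haplotype.

    target: list of alleles coded 0/1, with None at untyped sites.
    reference_panel: list of fully typed haplotypes of the same length.
    """
    def mismatches(ref):
        # Count disagreements only at sites the target actually has typed.
        return sum(1 for t, r in zip(target, ref) if t is not None and t != r)

    best = min(reference_panel, key=mismatches)
    return [r if t is None else t for t, r in zip(target, best)]


# Hypothetical reference panel of three haplotypes over five SNPs.
panel = [
    [0, 0, 1, 1, 0],
    [1, 0, 1, 0, 1],
    [0, 1, 0, 1, 0],
]
# Low-density target: SNPs 2 and 4 were not on the array.
target = [1, None, 1, None, 1]

imputed = impute_from_panel(target, panel)
print(imputed)
```

Observed alleles are never overwritten; only the untyped positions take values from the panel, which is how imputation raises marker density for a low-density array.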
