Multilevel Twin Models: Geographical Region as a Third Level Variable

Abstract: The classical twin model can be reparametrized as an equivalent multilevel model. The multilevel parameterization has underexplored advantages, such as the possibility to include higher-level clustering variables in which lower levels are nested. When this higher-level clustering is not modeled, its variance is captured by the common environmental variance component. In this paper we illustrate the application of a 3-level multilevel model to twin data by analyzing the regional clustering of 7-year-old children’s height in the Netherlands. Our findings show that 1.8% of the phenotypic variance in children’s height is attributable to regional clustering, which is 7% of the variance explained by between-family or common environmental components. Since regional clustering may represent ancestry, we also investigate the effect of region after correcting for genetic principal components, in a subsample of participants with genome-wide SNP data. After correction, region no longer explained variation in height. Our results suggest that the phenotypic variance explained by region might represent ancestry effects on height.
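The two percentages quoted in the abstract together imply a rough size for the between-family component. The arithmetic can be sketched as below. This is a back-of-envelope illustration only: the variable names are hypothetical, the inputs are the rounded figures from the abstract, and it assumes the 7% is taken relative to the total between-family variance with region included. The implied share is not a result reported by the authors.

```python
# Back-of-envelope check of the proportions quoted in the abstract.
# All names are illustrative; inputs are the rounded figures from the text.

region_share_of_total = 0.018    # region: 1.8% of phenotypic variance
region_share_of_between = 0.07   # region: 7% of between-family variance (assumed to include region)

# If region is 7% of the between-family variance and 1.8% of the total,
# the between-family share of the total is the ratio of the two.
between_family_share = region_share_of_total / region_share_of_between

# Part of the between-family share that is not attributable to region.
family_only_share = between_family_share - region_share_of_total

print(f"implied between-family share of total variance: {between_family_share:.3f}")
print(f"of which not regional:                          {family_only_share:.3f}")
```

Because both inputs are rounded, the implied share should be read as an approximate order of magnitude rather than an estimate.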


Multilevel Twin Models: Geographical Region as a Third Level Variable

10.1101/2020.11.11.377820 ◽

2020 ◽

Author(s):

Zenab Tamimy ◽

Sofieke T. Kevenaar ◽

Jouke Jan Hottenga ◽

Michael D. Hunter ◽

Eveline L. de Zeeuw ◽

...

Keyword(s):

Multilevel Model ◽

Phenotypic Variance ◽

Environmental Variance ◽

Twin Model ◽

Snp Data ◽

Genome Wide ◽

Twin Models ◽

Regional Clustering ◽

Level Variable ◽

Variance Explained



GxEsum: a novel approach to estimate the phenotypic variance explained by genome-wide GxE interaction based on GWAS summary statistics for biobank-scale data

Genome Biology ◽

10.1186/s13059-021-02403-1 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Jisu Shin ◽

Sang Hong Lee

Keyword(s):

Complex Traits ◽

Error Rates ◽

Type I ◽

Phenotypic Variance ◽

Environment Interaction ◽

Summary Statistics ◽

Gxe Interaction ◽

Genome Wide ◽

Scale Data ◽

Variance Explained

Abstract: Genetic variation in response to the environment, that is, genotype-by-environment interaction (GxE), is fundamental in the biology of complex traits and diseases. However, existing methods are computationally demanding and infeasible for biobank-scale data. Here, we introduce GxEsum, a method for estimating the phenotypic variance explained by genome-wide GxE based on GWAS summary statistics. Through comprehensive simulations and analysis of UK Biobank with 288,837 individuals, we show that GxEsum can handle a large-scale biobank dataset with controlled type I error rates and unbiased GxE estimates, and its computational efficiency can be hundreds of times higher than existing GxE methods.


Inbreeding depression across the genome of Dutch Holstein Friesian dairy cattle

Genetics Selection Evolution ◽

10.1186/s12711-020-00583-1 ◽

2020 ◽

Vol 52 (1) ◽

Author(s):

Harmen P. Doekes ◽

Piter Bijma ◽

Roel F. Veerkamp ◽

Gerben de Jong ◽

Yvonne C. J. Wientjes ◽

...

Keyword(s):

Inbreeding Depression ◽

Relationship Matrix ◽

Phenotypic Variance ◽

Yield Traits ◽

Udder Health ◽

Snp Data ◽

Genome Wide ◽

Holstein Friesian ◽

Phenotype Data ◽

Regions Of Homozygosity

Abstract

Background: Inbreeding depression refers to the decrease in mean performance due to inbreeding. Inbreeding depression is caused by an increase in homozygosity and reduced expression of (on average) favourable dominance effects. Dominance effects and allele frequencies differ across loci, and consequently inbreeding depression is expected to differ along the genome. In this study, we investigated differences in inbreeding depression across the genome of Dutch Holstein Friesian cattle, by estimating dominance effects and effects of regions of homozygosity (ROH).

Methods: Genotype (75 k) and phenotype data of 38,792 cows were used. For nine yield, fertility and udder health traits, GREML models were run to estimate genome-wide inbreeding depression and estimate additive, dominance and ROH variance components. For this purpose, we introduced a ROH-based relationship matrix. Additive, dominance and ROH effects per SNP were obtained through back-solving. In addition, a single SNP GWAS was performed to identify significant additive, dominance or ROH associations.

Results: Genome-wide inbreeding depression was observed for all yield, fertility and udder health traits. For example, a 1% increase in genome-wide homozygosity was associated with a decrease in 305-d milk yield of approximately 99 kg. For yield traits only, including dominance and ROH effects in the GREML model resulted in a better fit (P < 0.05) than a model with only additive effects. After correcting for the effect of genome-wide homozygosity, dominance and ROH variance explained less than 1% of the phenotypic variance for all traits. Furthermore, dominance and ROH effects were distributed evenly along the genome. The most notable region with a favourable dominance effect for yield traits was on chromosome 5, but overall few regions with large favourable dominance effects and significant dominance associations were detected. No significant ROH-associations were found.

Conclusions: Inbreeding depression was distributed quite equally along the genome and was well captured by genome-wide homozygosity. These findings suggest that, based on 75 k SNP data, there is little benefit of accounting for region-specific inbreeding depression in selection schemes.


Extended Twin Study of Alcohol Use in Virginia and Australia

Twin Research and Human Genetics ◽

10.1017/thg.2018.21 ◽

2018 ◽

Vol 21 (3) ◽

pp. 163-178 ◽

Cited By ~ 2

Author(s):

Brad Verhulst ◽

Michael C. Neale ◽

Lindon J. Eaves ◽

Sarah E. Medland ◽

Andrew C. Heath ◽

...

Keyword(s):

Environmental Factors ◽

Variance Components ◽

Age Of Onset ◽

Environmental Variance ◽

Twin Design ◽

Twin Model ◽

Twin Models ◽

Extended Twin Design ◽

Drinking Behaviors ◽

High Level

Drinking alcohol is a normal behavior in many societies, and prior studies have demonstrated it has both genetic and environmental sources of variation. Using two very large samples of twins and their first-degree relatives (Australia ≈ 20,000 individuals from 8,019 families; Virginia ≈ 23,000 from 6,042 families), we examine whether there are differences: (1) in the genetic and environmental factors that influence four interrelated drinking behaviors (quantity, frequency, age of initiation, and number of drinks in the last week), (2) between the twin-only design and the extended twin design, and (3) between the Australian and Virginia samples. We find that while drinking behaviors are interrelated, there are substantial differences in the genetic and environmental architectures across phenotypes. Specifically, drinking quantity, frequency, and number of drinks in the past week have large broad genetic variance components, and smaller but significant environmental variance components, while age of onset is driven exclusively by environmental factors. Further, the twin-only design and the extended twin design come to similar conclusions regarding broad-sense heritability and environmental transmission, but the extended twin models provide a more nuanced perspective. Finally, we find a high level of similarity between the Australian and Virginia samples, especially for the genetic factors. The observed differences, when present, tend to be at the environmental level. Implications for the extended twin model and future directions are discussed.


VarExp: Estimating variance explained by Genome-Wide GxE summary statistics

10.1101/224634 ◽

2017 ◽

Cited By ~ 1

Author(s):

Vincent Laville ◽

Amy R. Bentley ◽

Florian Privé ◽

Xiaofeng Zhu ◽

Jim Gauderman ◽

...

Keyword(s):

Association Studies ◽

Statistical Genetics ◽

Interaction Effects ◽

Phenotypic Variance ◽

Genome Wide Association Studies ◽

Summary Statistics ◽

Ethical Challenges ◽

Gxe Interaction ◽

Genome Wide ◽

Variance Explained

Abstract: Many genomic analyses, such as genome-wide association studies (GWAS) or genome-wide screening for Gene-Environment (GxE) interactions, have been performed to elucidate the underlying mechanisms of human traits and diseases. When the analyzed outcome is quantitative, the overall contribution of identified genetic variants to the outcome is often expressed as the percentage of phenotypic variance explained. In practice, this is commonly estimated using individual genotype data. However, using individual-level data faces practical and ethical challenges when the GWAS results are derived in large consortia through meta-analysis of results from multiple cohorts. In this work, we present an R package, “VarExp”, that allows for the estimation of the percentage of phenotypic variance explained by variants of interest using summary statistics only. Our package allows for a range of models to be evaluated, including marginal genetic effects, GxE interaction effects, and main genetic and interaction effects jointly. Its implementation integrates all recent methodological developments on the topic and does not need external data to be uploaded by users. The R source code, tutorial and associated example are available at https://gitlab.pasteur.fr/statistical-genetics/VarExp.git.


Multilevel Dynamic Twin Modeling Preprint

10.31234/osf.io/x52af ◽

2020 ◽

Author(s):

Noémi Katalin Schuurman

Keyword(s):

Repeated Measures ◽

Multilevel Model ◽

Population Level ◽

Single Pair ◽

Data Set ◽

Intensive Longitudinal Data ◽

Twin Model ◽

Simplex Model ◽

Twin Models ◽

Environmental Component

Recent developments in the collection and modeling of intensive longitudinal data have made it possible to fit dynamic genetic twin models, in which within-person processes are separated into genetic and environmental components. A relatively well-known dynamic twin model is the genetic simplex model, which is fitted to a few repeated measures for a group of twins. A more recently developed model is the iFACE model, which is fitted to many repeated measures for a single pair of twins. In this paper we introduce a missing link between these two models: a multilevel extension that allows for making both population-level and twin-level inferences. We provide a proof-of-principle simulation study for this model, and apply it to an experience sampling data set on 148 monozygotic and 88 dizygotic twins. We use the multilevel model to examine the overlap and differences between the dynamic genetic twin models and the classic twin models, as well as their interpretation.


Systematic pathway-analysis of kinesin protein family (KIF) using genome-wide SNP data in patients with myocardial infarction: Genetic variation in KIFC3 gene associates with myocardial infarction

The Thoracic and Cardiovascular Surgeon ◽

10.1055/s-0029-1191643 ◽

2009 ◽

Vol 56 (S 01) ◽

Author(s):

S Eifert ◽

A Goetz ◽

P Linsel-Nitschke ◽

A Medack ◽

C Hengstenberg ◽

...

Keyword(s):

Myocardial Infarction ◽

Genetic Variation ◽

Pathway Analysis ◽

Protein Family ◽

Snp Data ◽

Genome Wide


Genome-wide SNPs redefines species boundaries and conservation units in the freshwater mussel genus Cyprogenia of North America

Scientific Reports ◽

10.1038/s41598-021-90325-0 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Kyung Seok Kim ◽

Kevin J. Roe

Keyword(s):

Phylogenetic Analyses ◽

Freshwater Mussel ◽

Conservation Strategies ◽

Nucleotide Polymorphisms ◽

Conservation Units ◽

Genetic Structuring ◽

The North ◽

Snp Data ◽

Genome Wide ◽

Significant Difference

Abstract: Detailed information on species delineation and population genetic structure is a prerequisite for designing effective restoration and conservation strategies for imperiled organisms. Phylogenomic and population genomic analyses based on genome-wide double digest restriction-site associated DNA sequencing (ddRAD-Seq) data have identified three allopatric lineages in the North American freshwater mussel genus Cyprogenia. Cyprogenia stegaria is restricted to the Eastern Highlands and displays little genetic structuring within this region. However, two allopatric lineages of C. aberti in the Ozark and Ouachita highlands exhibit substantial levels (mean uncorrected FST = 0.368) of genetic differentiation and each warrants recognition as a distinct evolutionary lineage. Lineages of Cyprogenia in the Ouachita and Ozark highlands are further subdivided, reflecting structuring at the level of river systems. Species tree inference and species delimitation in a Bayesian framework using single nucleotide polymorphism (SNP) data supported results from phylogenetic analyses, and supported three species of Cyprogenia over the currently recognized two species. A comparison of SNPs generated from both destructively and non-destructively collected samples revealed no significant difference in the SNP error rate, quality and amount of ddRAD sequence reads, indicating that nondestructive or trace samples can be effectively utilized to generate SNP data for organisms for which destructive sampling is not permitted.


Genome-wide SNP data unravel the ancestry and signatures of divergent selection in Ghurrah pigs of India

Livestock Science ◽

10.1016/j.livsci.2021.104587 ◽

2021 ◽

pp. 104587

Author(s):

Arnav Mehrotra ◽

Bharat Bhushan ◽

Karthikeyan A ◽

Akansha Singh ◽

Snehasmita Panda ◽

...

Keyword(s):

Divergent Selection ◽

Snp Data ◽

Genome Wide


Genome-Wide Association and Prediction of Male and Female Floral Hybrid Potential Traits in Elite Spring Bread Wheat Genotypes

Plants ◽

10.3390/plants10050895 ◽

2021 ◽

Vol 10 (5) ◽

pp. 895

Author(s):

Samira El Hanafi ◽

Souad Cherkaoui ◽

Zakaria Kehel ◽

Ayed Al-Abdallat ◽

Wuletaw Tadesse

Keyword(s):

Genomic Selection ◽

Bread Wheat ◽

Genome Wide Association ◽

Hybrid Breeding ◽

Floral Traits ◽

Phenotypic Variance ◽

Spring Bread Wheat ◽

Genome Wide ◽

Anther Length ◽

Anther Extrusion

Hybrid wheat breeding is one of the most promising technologies for further sustainable yield increases. However, the cleistogamous nature of wheat poses a major bottleneck for a successful hybrid breeding program. Thus, an optimized breeding strategy that develops appropriate parental lines with favorable floral trait combinations is the best way to enhance the outcrossing ability. This study, therefore, aimed to dissect the genetic basis of various floral traits using a genome-wide association study (GWAS) and to assess the potential of genome-wide prediction (GP) for anther extrusion (AE), visual anther extrusion (VAE), pollen mass (PM), pollen shedding (PSH), pollen viability (PV), anther length (AL), openness of the flower (OPF), duration of floret opening (DFO) and stigma length. To this end, we employed 196 ICARDA spring bread wheat lines evaluated for three years and genotyped with 10,477 polymorphic SNPs. In total, 70 significant markers associated with the various assessed traits were identified at FDR ≤ 0.05, each contributing a minor to large proportion of the phenotypic variance (8–26.9%) and affecting the traits either positively or negatively. GWAS revealed multi-marker-based associations among AE, VAE, PM, OPF and DFO, most likely linked markers, suggesting a potential genomic region controlling the genetic association of these complex traits. Of these markers, Kukri_rep_c103359_233 and wsnp_Ex_rep_c107911_91350930 deserve particular attention. The consistently significant markers with large effect could be useful for marker-assisted selection. Genomic selection revealed medium to high prediction accuracy ranging between 52% and 92% for the assessed traits, with the lowest and highest values observed for stigma length and visual anther extrusion, respectively. This indicates the feasibility of implementing genomic selection to predict the performance of hybrid floral traits with high reliability.
