Power Analysis Provides Bounds for Genetic Architecture and Insights to Challenges for Rare Variant Association Studies

Abstract Genome-wide association studies are now shifting their focus from common to uncommon and rare variants, in the hope of explaining additional heritability of complex traits. Because power to test individual rare variants is often low, various aggregate-level association tests have been proposed to detect genetic loci that may harbor clusters of susceptibility variants. Power calculations for such tests typically require specifying a large number of parameters, including the effect sizes and allele frequencies of individual variants, which makes them difficult to use in practice. In this report, we approximate power, to varying degrees of accuracy, using a smaller number of key parameters, including the total genetic variance explained by multiple variants within a locus. We perform extensive simulation studies to assess the accuracy of the proposed approximations in realistic settings. Using the simplified power calculations, we then develop an analytic framework to obtain bounds on the genetic architecture of an underlying trait given the results of a genome-wide study, and we observe important implications for the complete lack, or limited number, of findings in many currently reported studies. Finally, we provide insights into the quality of annotation/functional information required to identify likely causal variants well enough to make a meaningful improvement in the power of subsequent association tests. A Shiny application in R implementing the methods, Power Analysis for GEnetic AssociatioN Tests (PAGEANT), is publicly available.
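
As a rough illustration of how a single variance-explained parameter can stand in for many per-variant parameters, the Python sketch below computes approximate power for a two-sided 1-df test when a signal explains a fraction h2 of trait variance. It assumes the test statistic is noncentral chi-square with noncentrality n·h2/(1 − h2), a common textbook approximation for quantitative traits; it is not PAGEANT's implementation, and the function names are hypothetical. Aggregate tests with several degrees of freedom would need the full noncentral chi-square distribution (for example `scipy.stats.ncx2`) rather than the normal shortcut used here.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def power_one_df(h2: float, n: int, alpha: float = 5e-8) -> float:
    """Approximate power of a two-sided 1-df test for a quantitative trait.

    Assumption: the statistic is noncentral chi-square (1 df) with
    noncentrality n * h2 / (1 - h2), where h2 is the trait variance
    explained by the signal the test captures.
    """
    if not 0.0 <= h2 < 1.0:
        raise ValueError("h2 must lie in [0, 1)")
    ncp = n * h2 / (1.0 - h2)
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)  # two-sided critical value
    shift = ncp ** 0.5
    # A 1-df chi-square is Z**2, so power = P(|Z + shift| > z_crit), Z ~ N(0, 1)
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


def expected_discoveries(n_loci: int, h2_per_locus: float, n: int,
                         alpha: float = 5e-8) -> float:
    """Expected number of significant loci under a hypothetical architecture
    of n_loci independent loci that each explain h2_per_locus."""
    return n_loci * power_one_df(h2_per_locus, n, alpha)


print(power_one_df(0.002, 10_000))
print(expected_discoveries(50, 0.002, 10_000))
```

Summing power over loci gives an expected number of discoveries. If a study of n participants reports no significant loci while `expected_discoveries` is well above 1 for a hypothesized number of loci and per-locus h2, that architecture is hard to reconcile with the result, which is the spirit of the bounds described above.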

Genome-wide association studies of callus differentiation for the desert tree, Populus euphratica

Tree Physiology, 2020, Vol 40 (12), pp. 1762-1777, doi:10.1093/treephys/tpaa098

Author(s): Qianru Zhang, Zhifang Su, Yunqian Guo, Shilong Zhang, Libo Jiang, ...

Keyword(s): Quantitative Trait, Genetic Architecture, Association Studies, Genotypic Variation, Genetic Network, Genome Wide Association, Genome Wide Association Studies, Genome Wide, Callus Differentiation

Abstract Callus differentiation is a key developmental process in plant regeneration from cells. A better understanding of the genetic architecture of callus differentiation timing can help improve tissue transformation and the efficiency of artificial propagation. In this study, we investigated genotypic variation in callus differentiation capacity among 297 diverse P. euphratica trees sampled from a natural population. We conducted a genome-wide association study (GWAS) of binary and growth-based parameters to identify loci and to characterize the genetic architecture and genetic network that regulate callus differentiation in P. euphratica. The GWAS results suggested potential associations with both whether the callus could differentiate and the process of callus differentiation. We identified multiple significant quantitative trait loci (QTLs), including the genes LOG1 and LOG7 and a locus containing WOX1. We reconstructed a genetic network that visualizes how each QTL interacts uniquely with other variants, and we detected several core QTLs involved in the degree of callus differentiation that provide potential targets for selection. This study is one of the first to identify genetic variants affecting callus differentiation in a forest tree. Our results suggest that callus differentiation may be a typical qualitative-quantitative trait controlled by a major gene together with polygenes across the P. euphratica genome. This GWAS will help in designing more complex and specific molecular tools for systematically manipulating organ regeneration.

Novel Linkage Peaks Discovered for Diabetic Nephropathy in Individuals With Type 1 Diabetes

2021, doi:10.2337/figshare.13507995

Author(s): Jani Haukka, Niina Sandholm, Erkka Valo, Carol Forsblom, Valma Harjutsalo, ...

Keyword(s): Type 1 Diabetes, Diabetic Nephropathy, Rare Variants, Association Studies, Linkage Study, Study Cohort, Genome Wide Association Studies, Genome Wide

Genome-wide association studies (GWAS) and linkage studies have had only limited success in discovering genome-wide significant linkage regions or risk loci for diabetic nephropathy in individuals with type 1 diabetes (T1D). As GWAS cohorts have grown, they have also come to include more documented and undocumented familial relationships. Here, we computationally inferred and manually curated pedigrees in a study cohort of more than 6,000 individuals with T1D and their non-diabetic relatives. We performed a linkage study of 177 pedigrees consisting of 452 individuals with T1D and their relatives, using a genome-wide genotyping array with more than 300,000 SNPs and the PSEUDOMARKER software. The analysis revealed genome-wide significant linkage peaks in eight chromosomal regions on five chromosomes (logarithm of odds [LOD] > 3.3). The highest peak was located in the HLA region on chromosome 6p, but whether it originates from T1D or from diabetic nephropathy remains ambiguous. Among the other significant peaks, the chromosome 4q22 region lies on top of ARHGAP24, a gene associated with focal segmental glomerulosclerosis, suggesting that the gene may also play a role in diabetic nephropathy. Furthermore, rare variants near the 4q25 peak, which lies on top of CCSER1, have been associated with diabetic nephropathy and chronic kidney disease.
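
The LOD > 3.3 threshold can be made concrete with the classical two-point LOD score for phase-known, fully informative meioses. The Python sketch below is a textbook illustration under those simplifying assumptions, not the analysis performed with PSEUDOMARKER, and the function names are hypothetical. With no recombinants, 10 informative meioses give a LOD of about 3.01 and 11 give about 3.31, just above the threshold.

```python
import math


def two_point_lod(recombinants: int, nonrecombinants: int, theta: float) -> float:
    """LOD(theta) = log10[theta^R * (1 - theta)^NR / 0.5^(R + NR)] for phase-known meioses."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5]")
    n = recombinants + nonrecombinants
    lod = -n * math.log10(0.5)  # denominator: no linkage (theta = 0.5)
    if recombinants:
        if theta == 0.0:
            return -math.inf  # a recombinant is impossible under complete linkage
        lod += recombinants * math.log10(theta)
    if nonrecombinants:
        lod += nonrecombinants * math.log10(1.0 - theta)
    return lod


def max_lod(recombinants: int, nonrecombinants: int):
    """Maximize over theta; the MLE is R / (R + NR), capped at 0.5."""
    theta_hat = min(recombinants / (recombinants + nonrecombinants), 0.5)
    return theta_hat, two_point_lod(recombinants, nonrecombinants, theta_hat)


for meioses in (10, 11):
    print(meioses, round(two_point_lod(0, meioses, 0.0), 2))
```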

Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program

Nature, 2021, Vol 590 (7845), pp. 290-299, doi:10.1038/s41586-021-03205-y. Cited by ~22

Author(s): Daniel Taliun, Daniel N. Harris, Michael D. Kessler, Jedidiah Carlson, ...

Keyword(s): Rare Variants, Sequence Data, Association Studies, Genotype Imputation, Genome Wide Association Studies, Phenotypic Data, Treatment And Prevention, Genome Wide, Diverse Backgrounds, Unmapped Reads

Abstract The Trans-Omics for Precision Medicine (TOPMed) programme seeks to elucidate the genetic architecture and biology of heart, lung, blood and sleep disorders, with the ultimate goal of improving diagnosis, treatment and prevention of these diseases. The initial phases of the programme focused on whole-genome sequencing of individuals with rich phenotypic data and diverse backgrounds. Here we describe the TOPMed goals and design, as well as the available resources and early insights obtained from the sequence data. The resources include a variant browser, a genotype imputation server, and genomic and phenotypic data that are available through dbGaP (Database of Genotypes and Phenotypes). In the first 53,831 TOPMed samples, we detected more than 400 million single-nucleotide and insertion or deletion variants after alignment with the reference genome. Additional previously undescribed variants were detected through assembly of unmapped reads and customized analysis in highly variable loci. Among the more than 400 million detected variants, 97% have frequencies of less than 1%, and 46% are singletons that are present in only one individual (53% among unrelated individuals). These rare variants provide insights into mutational processes and recent human evolutionary history. The extensive catalogue of genetic variation in TOPMed studies provides unique opportunities for exploring the contributions of rare and noncoding sequence variants to phenotypic variation. Furthermore, combining TOPMed haplotypes with modern imputation methods improves the power and reach of genome-wide association studies to include variants down to a frequency of approximately 0.01%.
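
The frequency summary above rests on two simple definitions. The Python sketch below shows one way to compute them from per-variant allele counts, assuming that "frequency" means minor allele frequency and that a singleton is a variant whose minor allele is observed exactly once. These definitions and the `Variant` type are illustrative assumptions, not TOPMed's pipeline.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    allele_count: int   # copies of the alternate allele observed
    allele_number: int  # total called alleles (2 x number of diploid samples)


def rare_and_singleton_fractions(variants, rare_cutoff=0.01):
    """Return (fraction with minor allele frequency < cutoff, fraction of singletons)."""
    rare = singletons = 0
    for v in variants:
        minor_count = min(v.allele_count, v.allele_number - v.allele_count)
        if minor_count / v.allele_number < rare_cutoff:
            rare += 1
        if minor_count == 1:  # assumed definition: minor allele seen exactly once
            singletons += 1
    return rare / len(variants), singletons / len(variants)


# Hypothetical counts for four variants in 100 diploid samples (200 alleles)
example = [Variant(1, 200), Variant(5, 200), Variant(100, 200), Variant(199, 200)]
print(rare_and_singleton_fractions(example))
```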

A meta-analysis of genome-wide association studies for average daily gain and lean meat percentage in two Duroc pig populations

BMC Genomics, 2021, Vol 22 (1), doi:10.1186/s12864-020-07288-1

Author(s): Shenping Zhou, Rongrong Ding, Fanming Meng, Xingwang Wang, Zhanwei Zhuang, ...

Keyword(s): Candidate Genes, Growth And Development, Genetic Architecture, Association Studies, Meta Analysis, Average Daily Gain, Genome Wide Association, Genome Wide Association Studies, Genome Wide, Daily Gain

Abstract Background: Average daily gain (ADG) and lean meat percentage (LMP) are the main production performance indicators of pigs. Nevertheless, the genetic architecture of ADG and LMP remains elusive. Here, we conducted genome-wide association studies (GWAS) and a meta-analysis for ADG and LMP in 3770 American and 2090 Canadian Duroc pigs. Results: In the American Duroc pigs, one novel pleiotropic quantitative trait locus (QTL) on Sus scrofa chromosome 1 (SSC1), spanning 2.53 Mb (from 159.66 to 162.19 Mb), was found to be associated with ADG and LMP. In the Canadian Duroc pigs, two novel QTLs on SSC1 were detected for LMP, located in a 3.86 Mb region (from 157.99 to 161.85 Mb) and a 555 kb region (from 37.63 to 38.19 Mb). The meta-analysis identified 10 and 20 additional SNPs for ADG and LMP, respectively. Finally, four genes (PHLPP1, STC1, DYRK1B, and PIK3C2A) were found to be associated with ADG and/or LMP. Further bioinformatics analysis showed that the candidate genes for ADG are mainly involved in bone growth and development, whereas the candidate genes for LMP are mainly involved in adipose and muscle tissue growth and development. Conclusions: We performed GWAS and meta-analysis for ADG and LMP based on a large sample comprising two Duroc pig populations. One pleiotropic QTL, sharing a 2.19 Mb haplotype block from 159.66 to 161.85 Mb on SSC1, was found to affect ADG and LMP in both Duroc pig populations. Furthermore, combining single-population GWAS with meta-analysis improved the efficiency of detecting additional SNPs for the analyzed traits. Our results provide new insights into the genetic architecture of ADG and LMP in pigs. Moreover, some of the significant SNPs associated with ADG and/or LMP in this study may be useful for marker-assisted selection in pig breeding.
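
The coordinates in this abstract are internally consistent: the shared 2.19 Mb block in the Conclusions equals the intersection of the American QTL (159.66 to 162.19 Mb) and the Canadian QTL (157.99 to 161.85 Mb). A small Python check of that arithmetic follows; it treats each QTL as a closed interval in Mb, a simplification of ours, since the published block is a haplotype block and this check does not model haplotype structure. The function name is hypothetical.

```python
def overlap(a, b):
    """Intersection of two closed intervals (start, end), or None if they are disjoint."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start <= end else None


american_ssc1 = (159.66, 162.19)  # Mb, pleiotropic QTL in American Durocs
canadian_ssc1 = (157.99, 161.85)  # Mb, LMP QTL in Canadian Durocs

shared = overlap(american_ssc1, canadian_ssc1)
print(shared, round(shared[1] - shared[0], 2))
```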

Genetics of complex traits: prediction of phenotype, identification of causal polymorphisms and genetic architecture

Proceedings of the Royal Society B: Biological Sciences, 2016, Vol 283 (1835), pp. 20160569, doi:10.1098/rspb.2016.0569. Cited by ~52

Author(s): M. E. Goddard, K. E. Kemper, I. M. MacLeod, A. J. Chamberlain, B. J. Hayes

Keyword(s): Complex Traits, Genetic Architecture, Quantitative Traits, Association Studies, Genome Wide Association Studies, Nucleotide Polymorphisms, Crop Breeding, Single Nucleotide, Genome Wide, Phenotype Identification

Complex or quantitative traits are important in medicine, agriculture and evolution, yet, until recently, few of the polymorphisms that cause variation in these traits were known. Genome-wide association studies (GWAS), based on the ability to assay thousands of single nucleotide polymorphisms (SNPs), have revolutionized our understanding of the genetics of complex traits. We advocate the analysis of GWAS data by a statistical method that fits all SNP effects simultaneously, assuming that these effects are drawn from a prior distribution. We illustrate how this method can be used to predict future phenotypes, to map and identify the causal mutations, and to study the genetic architecture of complex traits. The genetic architecture of complex traits is even more complex than previously thought: in almost every trait studied there are thousands of polymorphisms that explain genetic variation. Methods of predicting future phenotypes, collectively known as genomic selection or genomic prediction, have been widely adopted in livestock and crop breeding, leading to increased rates of genetic improvement.
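
To make "fits all SNP effects simultaneously, assuming that these effects are drawn from a prior distribution" concrete, the Python sketch below takes the simplest case: a normal prior with one common variance for every SNP effect (often called SNP-BLUP, equivalent to ridge regression), solved by single-site Gauss-Seidel. The Gaussian prior, the variance ratio `lam` and the function names are assumptions for illustration; a Gaussian prior is only one possible choice and need not be the one the authors use.

```python
def snp_blup(X, y, lam, sweeps=100, tol=1e-10):
    """Jointly estimate SNP effects b under the prior b_j ~ N(0, sigma_b^2).

    Solves (X'X + lam * I) b = X'y with lam = sigma_e^2 / sigma_b^2, updating
    one SNP at a time (Gauss-Seidel). X is a list of rows (individuals) of
    centred genotype codes; y is a list of centred phenotypes.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # y - X b, with b = 0 at the start
    xtx = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(sweeps):
        max_change = 0.0
        for j in range(p):
            # x_j'(y - sum_{k != j} x_k b_k) = x_j' resid + x_j'x_j * b_j
            rhs = sum(X[i][j] * resid[i] for i in range(n)) + xtx[j] * b[j]
            new = rhs / (xtx[j] + lam)
            delta = new - b[j]
            for i in range(n):
                resid[i] -= X[i][j] * delta  # keep resid = y - X b
            b[j] = new
            max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b


def predict(X_new, b):
    """Genomic prediction: sum of genotype codes times estimated SNP effects."""
    return [sum(x * bj for x, bj in zip(row, b)) for row in X_new]


X = [[1, 1], [1, -1], [-1, 1], [-1, -1]]  # hypothetical centred genotypes
y = [1, 3, -3, -1]                        # hypothetical centred phenotypes
b_hat = snp_blup(X, y, lam=4.0)
print(b_hat, predict([[1, 1]], b_hat))
```

Larger `lam` shrinks every effect toward zero; with `lam=0` and these orthogonal genotype columns the sketch returns the ordinary least-squares effects.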

Better estimation of SNP heritability from summary statistics provides a new understanding of the genetic architecture of complex traits

2018, doi:10.1101/284976. Cited by ~6

Author(s): Doug Speed, David J Balding

Keyword(s): Complex Traits, Genetic Architecture, Large Scale, Association Studies, Genome Wide Association Studies, Summary Statistics, Confounding Bias, Conserved Regions, Genome Wide, Variation Explained

LD Score Regression (LDSC) has been widely applied to the results of genome-wide association studies. However, its estimates of SNP heritability are derived from an unrealistic model in which each SNP is expected to contribute equal heritability. As a consequence, LDSC tends to over-estimate confounding bias, under-estimate the total phenotypic variation explained by SNPs, and provide misleading estimates of the heritability enrichment of SNP categories. We therefore present SumHer, software for estimating SNP heritability from summary statistics using more realistic heritability models. After demonstrating its superiority over LDSC, we apply SumHer to the results of 24 large-scale association studies (average sample size 121,000). First, we show that these studies have tended to substantially over-correct for confounding, and as a result the number of genome-wide significant loci has been under-reported by about 20%. Next, we estimate enrichment for 24 categories of SNPs defined by functional annotations. A previous study using LDSC reported that conserved regions were 13-fold enriched, and found a further twelve categories with above 2-fold enrichment. By contrast, our analysis using SumHer finds that conserved regions are only 1.6-fold (SD 0.06) enriched, and that no category has enrichment above 1.7-fold. SumHer provides an improved understanding of the genetic architecture of complex traits, which enables more efficient analysis of future genetic data.
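
Fold enrichment, as commonly defined, compares a category's share of SNP heritability with its share of SNPs, so a value of 1 means no enrichment. The Python sketch below takes the heritability estimates as given; that input is exactly where LDSC and SumHer differ, because each estimates a category's heritability under a different heritability model. The numbers in the example are hypothetical, and the function name is ours.

```python
def enrichment(h2_category, h2_total, snps_category, snps_total):
    """Fold enrichment = (category's share of SNP heritability) / (its share of SNPs)."""
    share_h2 = h2_category / h2_total
    share_snps = snps_category / snps_total
    return share_h2 / share_snps


# Hypothetical: 2.5% of SNPs explain 4% of the SNP heritability
print(enrichment(0.012, 0.3, 25_000, 1_000_000))
```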

Impact of Pre and Post Variant Filtration Strategies on Imputation

2020, doi:10.21203/rs.3.rs-128366/v1

Author(s): Celine Charon, Rodrigue Allodji, Vincent Meyer, Jean-François Deleuze

Keyword(s): Quality Control, Rare Variants, Association Studies, Genome Wide Association Studies, Nucleotide Polymorphisms, Direct Effects, Single Nucleotide Variants, Single Nucleotide, Genome Wide

Abstract Quality control methods for genome-wide association studies and fine mapping are commonly applied before imputation; however, they result in the loss of many single nucleotide polymorphisms (SNPs). To investigate the consequences of filtration on imputation, we studied its direct effects on the number of markers, their allele frequencies, imputation quality scores and post-filtration events. We pre-phased 1,031 genotyped individuals from diverse ethnicities and, for additional validation, compared the imputed variants with those of 1,089 individuals recorded by NCBI. Without variant pre-filtration based on quality control (QC), we observed no impairment in the imputation of SNPs that failed QC, whereas with pre-filtration there was an overall loss of information. Significant differences between frequencies with and without pre-filtration were found only in the ranges of very rare (5E-04 to 1E-03) and rare (1E-03 to 5E-03) variants (p < 1E-04). Increasing the post-filtration imputation quality score threshold from 0.3 to 0.8 reduced the number of single nucleotide variants (SNVs) with frequency < 0.001 by 2.5-fold, with or without QC pre-filtration, and halved the number of very rare variants (5E-04). Therefore, to maintain confidence while retaining enough SNVs, we propose a two-step post-filtration approach that increases the number of very rare and rare variants compared with conservative post-filtration methods.
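
The effect of raising the post-imputation quality threshold can be tallied per frequency bin. The Python sketch below assumes each imputed variant is summarized as a (minor allele frequency, imputation quality score) pair and uses half-open bins built from the ranges quoted in the abstract; the data are hypothetical, the names are ours, and the authors' two-step procedure is not reproduced because the abstract does not specify it.

```python
BINS = {
    "very_rare": (5e-4, 1e-3),  # ranges from the abstract, treated as half-open [low, high)
    "rare": (1e-3, 5e-3),
}


def count_passing(variants, min_quality):
    """Count variants with quality >= min_quality in each frequency bin."""
    counts = {name: 0 for name in BINS}
    for maf, quality in variants:
        if quality < min_quality:
            continue  # removed by post-imputation filtering
        for name, (low, high) in BINS.items():
            if low <= maf < high:
                counts[name] += 1
    return counts


# Hypothetical imputed variants: (minor allele frequency, imputation quality score)
imputed = [(6e-4, 0.4), (7e-4, 0.9), (2e-3, 0.5), (3e-3, 0.85), (0.2, 0.99)]
for threshold in (0.3, 0.8):
    print(threshold, count_passing(imputed, threshold))
```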

Prioritization of genes associated with the pathogenesis of leukosis in cattle

Vavilov Journal of Genetics and Breeding, 2019, Vol 22 (8), pp. 1063-1069, doi:10.18699/vj18.451. Cited by ~1

Author(s): N. S. Yudin, N. L. Podkolodnyy, T. A. Agarkova, E. V. Ignatieva

Keyword(s): Protein Interactions, Genome Wide Association Study, Association Studies, Mammalian Species, Genome Wide Association, Farm Animals, Genome Wide Association Studies, Protein Protein Interactions, Genome Wide

Selection by means of genetic markers is a promising approach to the eradication of infectious diseases in farm animals, especially in the absence of effective methods of treatment and prevention. Bovine leukemia virus (BLV) is spread throughout the world and represents one of the biggest problems for livestock production and food security in Russia. However, recent genome-wide association studies have shown that sensitivity/resistance to BLV is polygenic. The aim of this study was to create a catalog of cattle genes, and genes of other mammalian species, involved in the pathogenesis of BLV-induced infection and to prioritize these genes using bioinformatics methods. Based on information manually collected from a range of open sources, a total of 446 genes were included in the catalog. The following criteria were used to prioritize the 446 genes in the catalog: (1) the gene is associated with leukemia according to a genome-wide association study; (2) the gene is associated with leukemia according to a case-control study; (3) the role of the gene in leukemia development has been studied using knockout mice; (4) protein-protein interactions exist between the gene-encoded protein and either viral particles or individual viral proteins; (5) the gene is annotated with Gene Ontology terms that are over-represented for a given list of genes; (6) the gene participates in biological pathways from the KEGG or REACTOME databases that are over-represented for a given list of genes; (7) the protein encoded by the gene has a high number of protein-protein interactions with proteins encoded by other genes from the catalog. A rank was assigned to each gene under each criterion; the ranks were then summed to determine an overall rank.
Prioritization of the 446 candidate genes identified 5 genes of interest (TNF, LTB, BOLA-DQA1, BOLA-DRB3, ATF2) that can affect the sensitivity/resistance of cattle to leukemia.
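
The rank-aggregation step can be sketched in a few lines of Python. The sketch assumes each criterion yields a score per gene (higher means stronger evidence, so binary criteria become 1 or 0), that tied genes share the mean of their ranks, and that overall priority is the sum of per-criterion ranks with the lowest sum first. The tie rule, the gene names and the scores are hypothetical; the abstract does not say how ties were handled.

```python
def average_ranks(per_gene):
    """Rank genes by score (1 = strongest); tied genes share the mean of their ranks."""
    ordered = sorted(per_gene, key=lambda g: -per_gene[g])
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        # extend j over the run of genes tied with ordered[i]
        while j + 1 < len(ordered) and per_gene[ordered[j + 1]] == per_gene[ordered[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for g in ordered[i:j + 1]:
            ranks[g] = mean_rank
        i = j + 1
    return ranks


def overall_ranking(scores):
    """scores: {criterion: {gene: score}}; sum per-criterion ranks, lowest sum first."""
    totals = {}
    for per_gene in scores.values():
        for gene, rank in average_ranks(per_gene).items():
            totals[gene] = totals.get(gene, 0.0) + rank
    return sorted(totals, key=lambda g: (totals[g], g)), totals


# Hypothetical scores for three genes under two of the seven criteria
scores = {
    "gwas_hit": {"geneA": 1, "geneB": 0, "geneC": 1},      # binary criterion
    "ppi_count": {"geneA": 12, "geneB": 30, "geneC": 5},  # interactions within the catalog
}
print(overall_ranking(scores))
```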

Meta-analysis of genome-wide association studies for body fat distribution in 694 649 individuals of European ancestry

Human Molecular Genetics, 2018, Vol 28 (1), pp. 166-174, doi:10.1093/hmg/ddy327. Cited by ~109

Author(s): Sara L Pulit, Charli Stoneman, Andrew P Morris, Andrew R Wood, Craig A Glastonbury, ...

Keyword(s): Body Fat, Association Studies, Meta Analysis, Fat Distribution, Body Fat Distribution, Genome Wide Association, European Ancestry, Genome Wide Association Studies, Genome Wide

Abstract More than one in three adults worldwide is either overweight or obese. Epidemiological studies indicate that the location and distribution of excess fat, rather than general adiposity, are more informative for predicting risk of obesity sequelae, including cardiometabolic disease and cancer. We performed a genome-wide association study meta-analysis of body fat distribution, measured by waist-to-hip ratio (WHR) adjusted for body mass index (WHRadjBMI), and identified 463 signals in 346 loci. Heritability and variant effects were generally stronger in women than men, and we found approximately one-third of all signals to be sexually dimorphic. The 5% of individuals carrying the most WHRadjBMI-increasing alleles were 1.62 times more likely than the bottom 5% to have a WHR above the thresholds used for metabolic syndrome. These data, made publicly available, will inform the biology of body fat distribution and its relationship with disease.
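
One reading of "1.62 times more likely" is a ratio of proportions: the share of people above the WHR threshold in the top 5% of a risk-allele score divided by that share in the bottom 5%. The Python sketch below computes that ratio under this assumption; the study may have used a different measure (for example an odds ratio), and the scores, outcomes and function name are hypothetical.

```python
def top_vs_bottom_ratio(scores, has_outcome, fraction=0.05):
    """Outcome prevalence in the top `fraction` of scores divided by the
    prevalence in the bottom `fraction`."""
    if len(scores) != len(has_outcome):
        raise ValueError("scores and has_outcome must have the same length")
    k = max(1, int(len(scores) * fraction))
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    bottom_rate = sum(has_outcome[i] for i in order[:k]) / k
    top_rate = sum(has_outcome[i] for i in order[-k:]) / k
    if bottom_rate == 0:
        raise ZeroDivisionError("no individuals with the outcome in the bottom group")
    return top_rate / bottom_rate


# Hypothetical: 100 people, score = count of trait-increasing alleles (here 0..99),
# outcome = WHR above a metabolic-syndrome threshold
allele_scores = list(range(100))
above_threshold = [i in (0, 95, 96) for i in range(100)]
print(top_vs_bottom_ratio(allele_scores, above_threshold))
```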
