Statistical models and computational tools for predicting complex traits and diseases

2021 ◽  
Vol 19 (4) ◽  
pp. e36
Author(s):  
Wonil Chung

Predicting individual traits and diseases from genetic variants is critical to fulfilling the promise of personalized medicine. The genetic variants from genome-wide association studies (GWAS), including variants well below GWAS significance, can be aggregated into highly significant predictions across a wide range of complex traits and diseases. The recent arrival of large-sample public biobanks enables highly accurate polygenic predictions based on genetic variants across the whole genome. Various statistical methodologies and diverse computational tools have been introduced and developed to compute the polygenic risk score (PRS) more accurately. However, many researchers utilize PRS tools without a thorough understanding of the underlying model and how to specify the parameters for the best performance. It is advantageous to study the statistical models implemented in computational tools for PRS estimation and the formulas for the parameters to be specified. Here, we review a variety of recent statistical methodologies and computational tools for PRS computation.
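As a rough illustration of what "computing a PRS" means in its simplest form, the sketch below scores each individual as a weighted sum of effect-allele dosages. This is only the basic additive form, not the specific model of any tool covered by the review; the function name, the effect sizes, and the genotypes are all hypothetical.

```python
from typing import Dict, List, Sequence


def polygenic_risk_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of effect-allele dosages (0, 1 or 2 copies per SNP)."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(w * x for w, x in zip(weights, dosages))


# Hypothetical per-SNP effect sizes (e.g. GWAS log odds ratios); illustrative only.
weights: List[float] = [0.5, -0.25, 1.0]

# Hypothetical genotypes: one dosage per SNP, in the same order as `weights`.
genotypes: Dict[str, List[int]] = {
    "person_a": [2, 0, 1],
    "person_b": [0, 2, 0],
}

scores = {name: polygenic_risk_score(g, weights) for name, g in genotypes.items()}
```

The methods such a review covers differ mainly in how the per-SNP weights are obtained and which SNPs are included, rather than in this final summation step.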

2019 ◽  
Author(s):  
Tom G Richardson ◽  
Gibran Hemani ◽  
Tom R Gaunt ◽  
Caroline L Relton ◽  
George Davey Smith

Abstract Background: Developing insight into tissue-specific transcriptional mechanisms can help improve our understanding of how genetic variants exert their effects on complex traits and disease. By applying the principles of Mendelian randomization, we have undertaken a systematic analysis to evaluate transcriptome-wide associations between gene expression across 48 different tissue types and 395 complex traits. Results: Overall, we identified 100,025 gene-trait associations based on conventional genome-wide corrections (P < 5 × 10⁻⁸) that also provided evidence of genetic colocalization. These results indicated that genetic variants which influence gene expression levels in multiple tissues are more likely to influence multiple complex traits. We identified many examples of tissue-specific effects, such as genetically predicted TPO, NR3C2 and SPATA13 expression only associating with thyroid disease in thyroid tissue. Additionally, FBN2 expression was associated with both cardiovascular and lung function traits, but only when analysed in heart and lung tissue respectively. We also demonstrate that conducting phenome-wide evaluations of our results can help flag adverse on-target side effects for therapeutic intervention, as well as propose drug repositioning opportunities. Moreover, we find that exploring the tissue-dependency of associations identified by genome-wide association studies (GWAS) can help elucidate the causal genes and tissues responsible for effects, as well as uncover putative novel associations. Conclusions: The atlas of tissue-dependent associations we have constructed should prove extremely valuable to future studies investigating the genetic determinants of complex disease. The follow-up analyses we have performed in this study are merely a guide for future research. Similar evaluations can be undertaken systematically at http://mrcieu.mrsoftware.org/Tissue_MR_atlas/.


2019 ◽  
Vol 20 (10) ◽  
pp. 765-780 ◽  
Author(s):  
Diana Cruz ◽  
Ricardo Pinto ◽  
Margarida Freitas-Silva ◽  
José Pedro Nunes ◽  
Rui Medeiros

Atrial fibrillation (AF) and stroke belong to a group of complex traits that have been studied in terms of their genetic susceptibility determinants. Since 2007, several genome-wide association studies (GWAS) aiming to identify genetic variants modulating AF risk have been conducted. Thus, 11 GWAS have identified 26 SNPs (p < 5 × 10⁻²), of which 19 reached genome-wide significance (p < 5 × 10⁻⁸). Of those variants, seven were also associated with cardioembolic stroke and three reached genome-wide significance in stroke GWAS. These associations may shed light on putative shared etiologic mechanisms between AF and cardioembolic stroke. Additionally, some of the identified variants have been incorporated into genetic risk scores in order to elucidate new approaches to stroke prediction, prevention and treatment.


2019 ◽  
Vol 25 (10) ◽  
pp. 2455-2467 ◽  
Author(s):  
Tim B. Bigdeli ◽  
Giulio Genovese ◽  
Penelope Georgakopoulos ◽  
Jacquelyn L. Meyers ◽  
...  

Abstract Schizophrenia is a common, chronic and debilitating neuropsychiatric syndrome affecting tens of millions of individuals worldwide. While rare genetic variants play a role in the etiology of schizophrenia, most of the currently explained liability is within common variation, suggesting that variation predating the human diaspora out of Africa harbors a large fraction of the common variant attributable heritability. However, common variant association studies in schizophrenia have concentrated mainly on cohorts of European descent. We describe genome-wide association studies of 6152 cases and 3918 controls of admixed African ancestry, and of 1234 cases and 3090 controls of Latino ancestry, representing the largest such study in these populations to date. Combining results from the samples with African ancestry with summary statistics from the Psychiatric Genomics Consortium (PGC) study of schizophrenia yielded seven newly genome-wide significant loci, and we identified an additional eight loci by incorporating the results from samples with Latino ancestry. Leveraging population differences in patterns of linkage disequilibrium, we achieve improved fine-mapping resolution at 22 previously reported and 4 newly significant loci. Polygenic risk score profiling revealed improved prediction based on trans-ancestry meta-analysis results for admixed African (Nagelkerke’s R² = 0.032; liability R² = 0.017; P < 10⁻⁵²), Latino (Nagelkerke’s R² = 0.089; liability R² = 0.021; P < 10⁻⁵⁸), and European individuals (Nagelkerke’s R² = 0.089; liability R² = 0.037; P < 10⁻¹¹³), further highlighting the advantages of incorporating data from diverse human populations.
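For readers unfamiliar with the Nagelkerke's R² reported above, the sketch below shows the standard textbook definition: the Cox-Snell pseudo-R² rescaled by its maximum attainable value. The abstract does not say how the authors computed their values (for example, which covariates were in the null model), so this is an assumed, generic formulation; the function name and the log-likelihood inputs are hypothetical.

```python
import math


def nagelkerke_r2(ll_null: float, ll_full: float, n: int) -> float:
    """Nagelkerke's pseudo-R² from the log-likelihoods of two logistic models.

    ll_null: log-likelihood of the model without the polygenic score
    ll_full: log-likelihood of the model including the polygenic score
    n:       number of individuals
    """
    # Cox-Snell R²: 1 - (L_null / L_full)^(2/n), written with log-likelihoods.
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_full) / n)
    # Largest value Cox-Snell R² can reach for this null model.
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell / max_cox_snell


# A score that adds no information gives 0; a perfect fit (log-likelihood 0) gives 1.
no_gain = nagelkerke_r2(ll_null=-100.0, ll_full=-100.0, n=200)
perfect = nagelkerke_r2(ll_null=-100.0, ll_full=0.0, n=200)
```

The liability-scale R² quoted alongside it is a different quantity that also depends on an assumed disease prevalence, and it is not computed here.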


2020 ◽  
Vol 38 (15_suppl) ◽  
pp. 1528-1528
Author(s):  
Heena Desai ◽  
Anh Le ◽  
Ryan Hausler ◽  
Shefali Verma ◽  
Anurag Verma ◽  
...  

1528 Background: The discovery of rare genetic variants associated with cancer has a tremendous impact on reducing cancer morbidity and mortality when such variants are identified; however, rare variants are found in less than 5% of cancer patients. Genome-wide association studies (GWAS) have identified hundreds of common genetic variants significantly associated with a number of cancers, but the clinical utility of individual variants or a polygenic risk score (PRS) derived from multiple variants is still unclear. Methods: We tested the ability of PRS models developed from genome-wide significant variants to differentiate cases versus controls in the Penn Medicine Biobank. Cases for 15 different cancers and cancer-free controls were identified using electronic health record billing codes for 11,524 European American and 5,994 African American individuals from the Penn Medicine Biobank. Results: The discriminatory ability of the 15 PRS models to distinguish their respective cancer cases versus controls ranged from 0.68-0.79 in European Americans and 0.74-0.93 in African Americans. Seven of the 15 cancer PRS trended towards an association with their cancer at p < 0.05 (Table), and PRS for prostate, thyroid and melanoma were significantly associated with their cancers at a Bonferroni-corrected p < 0.003 with OR 1.3-1.6 in European Americans. Conclusions: Our data demonstrate that common variants with significant associations from GWAS can distinguish cancer cases versus controls for some cancers in an unselected biobank population. Given the small effects, future studies are needed to determine how best to incorporate PRS with other risk factors in the precision prediction of cancer risk. [Table: see text]
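The Bonferroni-corrected threshold quoted in the Results is consistent with dividing a family-wise error rate of 0.05 by the 15 cancers tested, although the abstract does not state the calculation explicitly. A minimal sketch under that assumption, with hypothetical variable and function names:

```python
ALPHA = 0.05   # nominal significance level used in the abstract
N_TESTS = 15   # assumed number of tests: one PRS per cancer type

# Bonferroni correction: divide the nominal level by the number of tests.
bonferroni_threshold = ALPHA / N_TESTS  # roughly 0.0033, reported as p < 0.003


def passes_bonferroni(p_value: float) -> bool:
    """True if a single test's p-value survives the corrected threshold."""
    return p_value < bonferroni_threshold
```

Under this reading, a PRS with p = 0.01 would count among those that "trended towards" association at p < 0.05 but would not survive correction.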


2020 ◽  
Author(s):  
Min Zhao ◽  
Hong Qu

Abstract Background: Circular RNAs (circRNAs) play important roles in regulating gene expression through binding miRNAs and RNA binding proteins. Genetic variation of circRNAs may affect complex traits/diseases by changing their binding efficiency to target miRNAs and proteins. There is a growing demand for investigations of the functions of genetic changes using large-scale experimental evidence. However, there is no online genetic resource for circRNA genes. Results: We performed extensive genetic annotation of 295,526 circRNAs integrated from circBase, circNet and circRNAdb. All pre-computed genetic variants were presented at our online resource, circVAR, with data browsing and search functionality. We explored the chromosome-based distribution of circRNAs and their associated variants. We found that, based on mapping to the 1000 Genomes and ClinVAR databases, chromosome 17 has a relatively large number of circRNAs and associated common and health-related genetic variants. Following the annotation of genome wide association studies (GWAS)-based circRNA variants, we found many non-coding variants within circRNAs, suggesting novel mechanisms for common diseases reported from GWAS studies. For cancer-based somatic variants, we found that chromosome 7 has many highly complex mutations that have been overlooked in previous research. Conclusion: We used the circVAR database to collect SNPs and small insertions and deletions (INDELs) in putative circRNA regions and to identify their potential phenotypic information. To provide a reusable resource for the circRNA research community, we have published all the pre-computed genetic data concerning circRNAs and associated genes together with data query and browsing functions at http://soft.bioinfo-minzhao.org/circvar .


2018 ◽  
Author(s):  
Bingxin Zhao ◽  
Fei Zou

Polygenic risk score (PRS) is the state-of-the-art prediction method for complex traits using summary-level data from discovery genome-wide association studies (GWAS). The PRS, as its name suggests, is designed for polygenic traits by aggregating small genetic effects from a large number of causal SNPs and thus is viewed as a powerful method for predicting complex polygenic traits by the genetics community. However, one concern is that the prediction accuracy of PRS in practice remains low with little clinical utility, even for highly heritable traits. Another practical concern is whether genome-wide SNPs should be used in constructing PRS or not. To address the two concerns, we investigate PRS both empirically and theoretically. We show how the performance of PRS is influenced by the triplet (n, p, m), where n, p, m are the sample size, the number of SNPs studied, and the number of true causal SNPs, respectively. For a given heritability, we find that i) when PRS is constructed with all p SNPs (referred to as GWAS-PRS), its prediction accuracy is controlled by the p/n ratio; while ii) when PRS is built with a set of top-ranked SNPs that pass a pre-specified threshold (referred to as threshold-PRS), its accuracy varies depending on how sparse the true genetic signals are. Only when m is much smaller in magnitude than n, or genetic signals are sparse, can threshold-PRS perform well and outperform GWAS-PRS. Our results demystify the low performance of PRS in predicting highly polygenic traits, which will greatly increase researchers’ awareness of the power and limitations of PRS, and clear up some confusion on the clinical application of PRS.
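The two constructions contrasted above differ only in which SNPs enter the sum. The sketch below illustrates that difference on made-up numbers; the function names, effect sizes, p-values, and the 5 × 10⁻⁸ cutoff are assumptions for illustration, since the abstract only says the threshold is pre-specified.

```python
from typing import List, Optional, Sequence


def select_snps(p_values: Sequence[float], threshold: Optional[float] = None) -> List[int]:
    """Indices of SNPs to include; threshold=None keeps all p SNPs (GWAS-PRS)."""
    if threshold is None:
        return list(range(len(p_values)))
    return [j for j, p in enumerate(p_values) if p < threshold]


def prs(dosages: Sequence[float], betas: Sequence[float], keep: Sequence[int]) -> float:
    """Weighted sum of dosages over the selected SNPs only."""
    return sum(betas[j] * dosages[j] for j in keep)


# Hypothetical summary statistics for p = 4 SNPs and one individual's dosages.
betas = [0.5, -0.25, 1.0, 0.125]
p_values = [1e-9, 0.2, 3e-8, 0.04]
dosages = [1, 2, 0, 2]

gwas_prs = prs(dosages, betas, select_snps(p_values))             # all SNPs
threshold_prs = prs(dosages, betas, select_snps(p_values, 5e-8))  # top-ranked SNPs only
```

With these numbers the threshold-PRS keeps only the first and third SNPs, so the two scores differ; which one predicts better is the question the abstract answers in terms of n, p and m.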


2019 ◽  
Author(s):  
Sarah J. C. Craig ◽  
Ana M. Kenney ◽  
Junli Lin ◽  
Ian M. Paul ◽  
Leann L. Birch ◽  
...  

Abstract Obesity is highly heritable, yet only a small fraction of its heritability has been attributed to specific genetic variants. These variants are traditionally ascertained from genome-wide association studies (GWAS), which utilize samples with tens or hundreds of thousands of individuals for whom a single summary measurement (e.g., BMI) is collected. An alternative approach is to focus on a smaller, more deeply characterized sample in conjunction with advanced statistical models that leverage detailed phenotypes. Here we use novel functional data analysis (FDA) techniques to capitalize on longitudinal growth information and construct a polygenic risk score (PRS) for obesity in children followed from birth to three years of age. This score, composed of 24 single nucleotide polymorphisms (SNPs), is significantly higher in children with (vs. without) rapid infant weight gain, a predictor of obesity later in life. Using two independent cohorts, we show that genetic variants identified in early childhood are also informative in older children and in adults, consistent with early childhood obesity being predictive of obesity later in life. In contrast, PRSs based on SNPs identified by adult obesity GWAS are not predictive of weight gain in our cohort of children. Our research provides an example of a successful application of FDA to GWAS. We demonstrate that a deep, statistically sophisticated characterization of a longitudinal phenotype can provide increased statistical power to studies with relatively small sample sizes. This study shows how FDA approaches can be used as an alternative to the traditional GWAS. Author Summary: Finding genetic variants that confer an increased risk of developing a particular disease has long been a focus of modern genetics.
Genome-wide association studies (GWAS) have catalogued single nucleotide polymorphisms (SNPs) associated with a variety of complex diseases in humans, including obesity, but by and large have done so using increasingly large samples of tens or even hundreds of thousands of individuals, whose phenotypes are thus often only superficially characterized. This, in turn, may hide the intricacies of the genetic influence on disease. GWAS findings are also usually study-population dependent. We found that genetic risk scores based on SNPs from large adult obesity studies are not predictive of the propensity to gain weight in very young children. However, using a small cohort of a few hundred children deeply characterized with growth trajectories between birth and two years, and leveraging such trajectories through novel functional data analysis (FDA) techniques, we were able to produce a strong childhood obesity genetic risk score.


2020 ◽  
Author(s):  
Meng Luo ◽  
Shiliang Gu

Abstract Although genome-wide association studies have successfully identified thousands of markers associated with various complex traits and diseases, our ability to predict such phenotypes remains limited. A perhaps overlooked explanation lies in the limitations of the genetic models and statistical techniques commonly used in association studies. However, using genotype data for individuals to perform accurate genetic prediction of complex traits can promote genomic selection in animal and plant breeding and can lead to the development of personalized medicine in humans. Because most complex traits have a polygenic architecture, accurate genetic prediction often requires modeling genetic variants together via polygenic methods. Here, we use our proposed polygenic method, which we refer to as the iterative screen regression (ISR) model, for genomic prediction. We compared ISR with several commonly used prediction methods in simulations. We further applied ISR to predicting 15 traits across five species: cattle, rice, wheat, maize, and mice. The results of the study indicate that the ISR method performs better than several commonly used polygenic methods and with greater stability.


2014 ◽  
Vol 11 (94) ◽  
pp. 20130908 ◽  
Author(s):  
Beatriz Valcárcel ◽  
Timothy M. D. Ebbels ◽  
Antti J. Kangas ◽  
Pasi Soininen ◽  
Paul Elliot ◽  
...  

Current studies of phenotype diversity by genome-wide association studies (GWAS) are mainly focused on identifying genetic variants that influence level changes of individual traits without considering additional alterations at the system level. However, in addition to level alterations of single phenotypes, differences in association between phenotype levels are observed across different physiological states. Such differences in molecular correlations between states can potentially reveal information about the system state beyond that reported by changes in mean levels alone. In this study, we describe a novel methodological approach, which we refer to as genome metabolome integrated network analysis (GEMINi), consisting of a combination of correlation network analysis and genome-wide correlation study. The proposed methodology exploits differences in molecular associations to uncover genetic variants involved in phenotype variation. We test the performance of the GEMINi approach in a simulation study and illustrate its use in the context of obesity and detailed quantitative metabolomics data on systemic metabolism. Application of GEMINi revealed a set of metabolic associations which differ between normal and obese individuals. While no significant associations were found between genetic variants and body mass index using a standard GWAS approach, further investigation of the identified differences in metabolic association revealed a number of loci, several of which have been previously implicated in obesity-related processes. This study highlights the advantage of using molecular associations as an alternative phenotype when studying the genetic basis of complex traits and diseases.


2017 ◽  
Vol 60 (3) ◽  
pp. 335-346 ◽  
Author(s):  
Markus Schmid ◽  
Jörn Bennewitz

Abstract. Quantitative or complex traits are controlled by many genes and environmental factors. Most traits in livestock breeding are quantitative traits. Mapping genes and causative mutations generating the genetic variance of these traits is still a very active area of research in livestock genetics. Since genome-wide and dense SNP panels are available for most livestock species, genome-wide association studies (GWASs) have become the method of choice in mapping experiments. Different statistical models are used for GWASs. We will review the frequently used single-marker models and additionally describe Bayesian multi-marker models. The importance of nonadditive genetic and genotype-by-environment effects along with GWAS methods to detect them will be briefly discussed. Different mapping populations are used and will also be reviewed. Whenever possible, our own real-data examples are included to illustrate the reviewed methods and designs. Future research directions including post-GWAS strategies are outlined.

