transferGWAS: GWAS of images using deep transfer learning

2021 ◽  
Author(s):  
Matthias Kirchler ◽  
Stefan Konigorski ◽  
Matthias Norden ◽  
Christian Meltendorf ◽  
Marius Kloft ◽  
...  

Medical images can provide rich information about diseases and their biology. However, investigating their association with genetic variation requires non-standard methods. We propose transferGWAS, a novel approach to perform genome-wide association studies directly on full medical images. First, we learn semantically meaningful representations of the images based on a transfer learning task, during which a deep neural network is trained on independent but similar data. Then, we perform genetic association tests with these representations. We validate the type I error rates and power of transferGWAS in simulation studies of synthetic images. Then we apply transferGWAS in a genome-wide association study of retinal fundus images from the UK Biobank. This first-of-a-kind GWAS of full imaging data yielded 60 genomic regions associated with retinal fundus images, of which 7 are novel candidate loci for eye-related traits and diseases.
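
The second stage described above (association tests on learned image representations) can be sketched as follows. This is a minimal illustration, not the transferGWAS implementation: the embeddings are simulated in place of real network outputs, genotypes are drawn uniformly rather than from realistic allele frequencies, covariates are omitted, and the hypothetical helper `simple_assoc` uses a normal approximation to the test statistic.

```python
import math
import random


def simple_assoc(genotype, phenotype):
    """Regress one phenotype on one SNP dosage (0/1/2) by ordinary least squares.

    Returns (beta, z, p), where p is a two-sided p-value from a normal
    approximation to the test statistic (reasonable only for large samples).
    """
    n = len(genotype)
    mean_g = sum(genotype) / n
    mean_y = sum(phenotype) / n
    sxx = sum((g - mean_g) ** 2 for g in genotype)
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotype, phenotype))
    beta = sxy / sxx
    residuals = [(y - mean_y) - beta * (g - mean_g) for g, y in zip(genotype, phenotype)]
    sigma2 = sum(r * r for r in residuals) / (n - 2)
    z = beta / math.sqrt(sigma2 / sxx)
    p = math.erfc(abs(z) / math.sqrt(2))
    return beta, z, p


# Toy data (all assumed): 500 individuals, 5 SNPs, 3 embedding dimensions.
random.seed(0)
n, n_snps, n_dims = 500, 5, 3
genotypes = [[random.choice((0, 1, 2)) for _ in range(n)] for _ in range(n_snps)]
embeddings = [[random.gauss(0, 1) for _ in range(n)] for _ in range(n_dims)]

# Plant a true effect of SNP 0 on embedding dimension 0.
embeddings[0] = [e + 0.5 * g for e, g in zip(embeddings[0], genotypes[0])]

# Test every SNP against every embedding dimension, with Bonferroni correction.
results = {
    (s, d): simple_assoc(genotypes[s], embeddings[d])
    for s in range(n_snps)
    for d in range(n_dims)
}
threshold = 0.05 / (n_snps * n_dims)
hits = [pair for pair, (_, _, p) in results.items() if p < threshold]
```

In a real analysis the embeddings would come from a pretrained network applied to the images, and the regression would adjust for covariates such as age, sex, and ancestry components.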

Author(s):  
Greg Dyson ◽  
Charles F. Sing

We have developed a modified Patient Rule-Induction Method (PRIM) as an alternative strategy for analyzing representative samples of non-experimental human data to estimate and test the role of genomic variations as predictors of disease risk in etiologically heterogeneous sub-samples. The proposed strategy reaches a computational limit when the number of genomic variations (predictor variables) under study is large (>500), because permutations are used to generate a null distribution for testing the significance of a term (defined by values of particular variables) that characterizes a sub-sample of individuals through the peeling and pasting processes. As an alternative, in this paper we introduce a theoretical strategy that facilitates the quick calculation of Type I and Type II errors in the evaluation of terms in the peeling and pasting processes of a PRIM analysis; when a permutation-based hypothesis test is employed, these errors are under-estimated and non-existent, respectively. The resultant savings in computational time make it possible to consider larger numbers of genomic variations (an example genome-wide association study is given) in the selection of statistically significant terms in the formulation of PRIM prediction models.
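
To make the computational bottleneck concrete, the following sketch shows a generic permutation test for one candidate term. It is not the authors' PRIM code: the function name `permutation_p_value`, the use of the mean outcome inside the sub-sample as the test statistic, and the toy data are all assumptions for illustration. Each candidate term needs `n_perm` passes over the data, which is why the cost grows quickly with the number of predictor variables.

```python
import random


def permutation_p_value(outcomes, in_box, n_perm=2000, seed=0):
    """One-sided permutation p-value for the mean outcome inside a sub-sample.

    outcomes: list of 0/1 disease indicators, one per individual.
    in_box:   list of booleans marking the individuals selected by a term.
    """
    rng = random.Random(seed)

    def box_mean(values):
        selected = [v for v, keep in zip(values, in_box) if keep]
        return sum(selected) / len(selected)

    observed = box_mean(outcomes)
    shuffled = list(outcomes)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        # Shuffling outcomes breaks any link between the term and disease status.
        rng.shuffle(shuffled)
        if box_mean(shuffled) >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (at_least_as_extreme + 1) / (n_perm + 1)


# Toy usage (assumed data): 30 of 100 individuals fall inside the term's box.
toy_outcomes = [1] * 15 + [0] * 15 + [1] * 10 + [0] * 60
toy_in_box = [True] * 30 + [False] * 70
p_value = permutation_p_value(toy_outcomes, toy_in_box)
```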


2018 ◽  
Vol 2018 ◽  
pp. 1-9 ◽  
Author(s):  
Baolin Wu ◽  
James S. Pankow

Multiple correlated traits are often collected in genetic studies. By jointly analyzing multiple traits, we can increase power by aggregating multiple weak effects and reveal additional insights into the genetic architecture of complex human diseases. In this article, we propose a multivariate linear regression-based method to test the joint association of multiple quantitative traits. It flexibly accommodates any covariates, controls type I errors very accurately, and offers very competitive performance. We also discuss fast and accurate computation of significance p-values, especially for genome-wide association studies with small-to-medium sample sizes. We demonstrate the method's competitive performance through extensive numerical studies. Its usefulness is further illustrated with an application to genome-wide association analysis of diabetes-related traits in the Atherosclerosis Risk in Communities (ARIC) study. We found several associations with diabetes traits that have not been reported before. We implemented the proposed methods in a publicly available R package.


2018 ◽  
Vol 43 (1) ◽  
pp. 102-111 ◽  
Author(s):  
Jeremy A. Sabourin ◽  
Cheryl D. Cropp ◽  
Heejong Sung ◽  
Lawrence C. Brody ◽  
Joan E. Bailey-Wilson ◽  
...  

BMC Genetics ◽  
2005 ◽  
Vol 6 (Suppl 1) ◽  
pp. S134 ◽  
Author(s):  
Qiong Yang ◽  
Jing Cui ◽  
Irmarie Chazaro ◽  
L Adrienne Cupples ◽  
Serkalem Demissie

Genes ◽  
2021 ◽  
Vol 12 (5) ◽  
pp. 736
Author(s):  
Xiaotian Dai ◽  
Guifang Fu ◽  
Shaofei Zhao ◽  
Yifei Zeng

Although imbalance between case and control groups is prevalent in genome-wide association studies (GWAS), it is often overlooked. The problem is becoming more pressing as the rapid growth of biobanks and electronic health records has enabled the collection of thousands of phenotypes from large cohorts, in particular for diseases with low prevalence. Unbalanced binary traits pose serious challenges to traditional statistical methods in terms of both genomic selection and disease prediction. For example, the well-established linear mixed models (LMM) yield inflated type I error rates in the presence of unbalanced case-control ratios. In this article, we review multiple statistical approaches that have been developed to overcome the inaccuracy caused by unbalanced case-control ratios, and we comment on the advantages and limitations of each approach. In addition, we explore the potential of several powerful and popular state-of-the-art machine-learning approaches that have not yet been applied to the GWAS field. This review paves the way for better analysis and understanding of unbalanced case-control disease data in GWAS.


2020 ◽  
Author(s):  
Christiaan de Leeuw ◽  
Nancy Y. A. Sey ◽  
Danielle Posthuma ◽  
Hyejung Won

Hi-C coupled multimarker analysis of genomic annotation (H-MAGMA) was initially developed to advance MAGMA by assigning non-coding SNPs to their cognate genes based on three-dimensional chromatin architecture. Yurko and colleagues raised concerns that the SNP-wise mean gene-analysis model of MAGMA may allow inflation in type I errors. Accordingly, we updated MAGMA and found that the updated version (MAGMA v.1.08) effectively controls for error rate inflation. Intrigued by this result, we also updated H-MAGMA by implementing MAGMA v.1.08. As expected, H-MAGMA v.1.08 detected a smaller set of risk genes than its original version (v.1.07), but the overall statistical architecture remained largely unchanged between v.1.07 and v.1.08. H-MAGMA v.1.08 was then applied to genome-wide association studies (GWAS) of five psychiatric disorders, from which we recapitulated our previous findings that psychiatric disorder risk genes display neuronal and prenatal enrichment. Therefore, issues raised by Yurko and colleagues can be overcome by using (H-)MAGMA v.1.08.


2020 ◽  
Author(s):  
Wenjian Bi ◽  
Wei Zhou ◽  
Rounak Dey ◽  
Bhramar Mukherjee ◽  
Joshua N Sampson ◽  
...  

In genome-wide association studies (GWAS), ordinal categorical phenotypes are widely used to measure human behaviors, satisfaction, and preferences. However, due to the lack of analysis tools, methods designed for binary and quantitative traits have often been used inappropriately to analyze categorical phenotypes, which inflates type I error rates or reduces power. To accurately model the dependence of an ordinal categorical phenotype on covariates, we propose an efficient mixed model association test, the Proportional Odds Logistic Mixed Model (POLMM). POLMM is demonstrated to be computationally efficient for analyzing large datasets with hundreds of thousands of genetically related samples, to control type I error rates at a stringent significance level regardless of the phenotypic distribution, and to be more powerful than alternative methods. We applied POLMM to 258 ordinal categorical phenotypes on array-genotyped and imputed samples from 408,961 individuals in UK Biobank. In total, we identified 5,885 genome-wide significant variants, of which 424 variants (7.2%) are rare variants with MAF < 0.01.
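
For orientation, a proportional odds model with a random effect is commonly written as below. This is the textbook form rather than notation taken from the abstract, so every symbol is an assumption: $y_i \in \{1,\dots,J\}$ is the ordinal phenotype, $\epsilon_j$ are ordered cutpoints, $X_i$ are covariates, $G_i$ is the genotype of the tested variant, and $b_i$ is a random effect whose covariance is proportional to a relatedness matrix $\Psi$.

```latex
\begin{aligned}
\operatorname{logit} P(y_i \le j) &= \epsilon_j - \eta_i, \qquad j = 1, \dots, J-1, \\
\eta_i &= X_i^{\top}\beta + G_i\,\gamma + b_i, \qquad b \sim \mathcal{N}(0, \tau\Psi), \\
H_0 &: \gamma = 0 .
\end{aligned}
```

Under this form, a single coefficient $\gamma$ shifts the odds of falling at or below every category by the same factor, which is what distinguishes it from fitting a separate binary model at each cutpoint.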

