Assessing digital phenotyping to enhance genetic studies of human diseases

2019 ◽  
Author(s):  
Christopher DeBoever ◽  
Yosuke Tanigawa ◽  
Matthew Aguirre ◽  
Greg McInnes ◽  
Adam Lavertu ◽  
...  

Abstract Population-scale biobanks that combine genetic data and high-dimensional phenotyping for a large number of participants provide an exciting opportunity to perform genome-wide association studies (GWAS) to identify genetic variants associated with diverse quantitative traits and diseases. A major challenge for GWAS in population biobanks is ascertaining disease cases from heterogeneous data sources such as hospital records, digital questionnaire responses, or interviews. In this study, we use genetic parameters including genetic correlation to evaluate whether GWAS performed using cases in the UK Biobank ascertained from hospital records, questionnaire responses, and family history of diseases implicate similar disease genetics across a range of effect sizes. We find that hospital record and questionnaire GWAS largely identify similar genetic effects for many complex phenotypes and that combining both phenotyping methods improves power to detect genetic associations. We also show that family history GWAS, using cases ascertained on family history of disease, agrees with combined hospital record/questionnaire GWAS and has better power to detect genetic associations for some phenotypes. Overall, this work demonstrates that digital phenotyping and unstructured phenotype data can be combined with structured data such as hospital records to identify cases for GWAS in biobanks and improve the ability of such studies to identify genetic associations.

2020 ◽  
Author(s):  
Saori Sakaue ◽  
Masahiro Kanai ◽  
Yosuke Tanigawa ◽  
Juha Karjalainen ◽  
Mitja Kurki ◽  
...  

Abstract Current genome-wide association studies (GWASs) do not yet capture sufficient diversity in terms of populations and scope of phenotypes. To address an essential need to expand an atlas of genetic associations in non-European populations, we conducted 220 deep-phenotype GWASs (disease endpoints, biomarkers, and medication usage) in BioBank Japan (n = 179,000), by incorporating past medical history and text-mining results of electronic medical records. Meta-analyses with the harmonized phenotypes in the UK Biobank and FinnGen (total n = 628,000) identified over 4,000 novel loci, which substantially deepened the resolution of the genomic map of human traits, benefiting from East Asian endemic diseases and East Asian-specific variants. This atlas elucidated the globally shared landscape of pleiotropy as represented by the MHC locus, where we conducted fine-mapping by HLA imputation. Finally, to intensify the value of deep-phenotype GWASs, we performed statistical decomposition of matrices of phenome-wide summary statistics, and identified the latent genetic components, which pinpointed the responsible variants and shared biological mechanisms underlying current disease classifications across populations. The decomposed components enabled genetically informed subtyping of similar diseases (e.g., allergic diseases). Our study suggests a potential avenue for hypothesis-free re-investigation of human disease classifications through genetics.


2021 ◽  
Author(s):  
Shu-Yi Huang ◽  
Yu-Xiang Yang ◽  
Kevin Kuo ◽  
Hong-Qi Li ◽  
Xue-Ning Shen ◽  
...  

Abstract Background Observational studies have suggested that herpesvirus infection increases the risk of Alzheimer’s disease (AD), but it is unclear whether the association is causal. The aim of the present study is to evaluate the causal relationship between four herpesvirus infections and AD. Methods We performed a two-sample Mendelian randomization analysis to investigate the association of four active herpesvirus infections with AD using summary statistics from genome-wide association studies. The four herpesvirus infections (i.e., chickenpox, shingles, cold sores, mononucleosis) are caused by varicella-zoster virus, herpes simplex virus type 1, and Epstein-Barr virus (EBV), respectively. A large summary statistics dataset from the International Genomics of Alzheimer’s Project was used in the primary analysis, including 21,982 AD cases and 41,944 controls. Validation was further performed using family history of AD data from UK Biobank (27,696 cases of maternal AD, 14,338 cases of paternal AD and 272,244 controls). Results We found evidence of a suggestive association between mononucleosis (caused by EBV) and risk of AD (odds ratio [OR] = 1.634, 95% confidence interval [CI] = 1.092-2.446, P = 0.017) after Bonferroni correction. The validation analysis confirmed that mononucleosis is also associated with family history of AD (OR [95% CI] = 1.392 [1.061, 1.826], P = 0.017). Genetically predicted shingles were associated with AD risk (OR [95% CI] = 0.867 [0.784, 0.958], P = 0.005), while genetically predicted chickenpox was suggestively associated with increased family history of AD (OR [95% CI] = 1.147 [1.007, 1.307], P = 0.039). Conclusions Our findings provided evidence supporting a positive relationship between mononucleosis and AD, indicating a causal link between EBV infection and AD. Further elucidations of this association and underlying mechanisms are likely to identify feasible interventions to promote AD prevention.
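The "suggestive" versus "associated" wording in the results above can be read against a Bonferroni threshold. The following is a minimal sketch, assuming the correction family is the four herpesvirus exposures (the abstract does not state the number of tests) and a nominal alpha of 0.05; the `classify` helper and its labels are illustrative, not the authors' code.

```python
# Assumption: four tests (one per herpesvirus exposure), nominal alpha 0.05.
N_TESTS = 4
ALPHA = 0.05


def classify(p, n_tests=N_TESTS, alpha=ALPHA):
    """Label a p-value against the Bonferroni and nominal thresholds."""
    if p < alpha / n_tests:
        return "significant"      # passes the Bonferroni-adjusted threshold
    if p < alpha:
        return "suggestive"       # nominally significant only
    return "not significant"


# P-values as reported in the abstract above.
reported = {
    "mononucleosis -> AD": 0.017,
    "shingles -> AD": 0.005,
    "chickenpox -> family history of AD": 0.039,
}

labels = {name: classify(p) for name, p in reported.items()}
for name, label in labels.items():
    print(f"{name}: {label}")
```

Under these assumptions the adjusted threshold is 0.05 / 4 = 0.0125, so only the shingles result falls below it. Note that the chickenpox result comes from the validation analysis, which may form a separate test family.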


2021 ◽  
Author(s):  
Konrad Karczewski ◽  
Matthew Solomonson ◽  
Katherine R Chao ◽  
Julia K Goodrich ◽  
Grace Tiao ◽  
...  

Genome-wide association studies have successfully discovered thousands of common variants associated with human diseases and traits, but the landscape of rare variation in human disease has not been explored at scale. Exome sequencing studies of population biobanks provide an opportunity to systematically evaluate the impact of rare coding variation across a wide range of phenotypes to discover genes and allelic series relevant to human health and disease. Here, we present results from systematic association analyses of 3,700 phenotypes using single-variant and gene tests of 281,850 individuals in the UK Biobank with exome sequence data. We find that the discovery of genetic associations is tightly linked to frequency as well as correlated with metrics of deleteriousness and natural selection. We highlight biological findings elucidated by these data and release the dataset as a public resource alongside a browser framework for rapidly exploring rare variant association results.


2021 ◽  
Author(s):  
Suyash S Shringarpure ◽  
Wei Wang ◽  
Yunxuan Jiang ◽  
Alison Acevedo ◽  
Devika Dhamija ◽  
...  

A key challenge in the study of rare disease genetics is assembling large case cohorts for well-powered studies. We demonstrate the use of self-reported diagnosis data to study rare diseases at scale. We performed genome-wide association studies (GWAS) for 33 rare diseases using self-reported diagnosis phenotypes and re-discovered 29 known associations to validate our approach. In addition, we performed the first GWAS for Duane retraction syndrome, vestibular schwannoma and spontaneous pneumothorax, and report novel genome-wide significant associations for these diseases. We replicated these novel associations in non-European populations within the 23andMe, Inc. cohort as well as in the UK Biobank cohort. We also show that mixed model analyses including all ethnicities and related samples increase the power for finding associations in rare diseases. Our results, based on analysis of 19,084 rare disease cases for 33 diseases from 7 populations, show that large-scale online collection of self-reported data is a viable method for discovery and replication of genetic associations for rare diseases. This approach, which is complementary to sequencing-based approaches, will enable the discovery of more novel genetic associations for increasingly rare diseases across multiple ancestries and shed more light on the genetic architecture of rare diseases.


2021 ◽  
Vol 13 (1) ◽  
Author(s):  
Shu-Yi Huang ◽  
Yu-Xiang Yang ◽  
Kevin Kuo ◽  
Hong-Qi Li ◽  
Xue-Ning Shen ◽  
...  

Abstract Background Observational studies have suggested that herpesvirus infection increases the risk of Alzheimer’s disease (AD), but it is unclear whether the association is causal. The aim of the present study is to evaluate the causal relationship between four herpesvirus infections and AD. Methods We performed a two-sample Mendelian randomization analysis to investigate the association of four active herpesvirus infections with AD using summary statistics from genome-wide association studies. The four herpesvirus infections (i.e., chickenpox, shingles, cold sores, mononucleosis) are caused by varicella-zoster virus, herpes simplex virus type 1, and Epstein-Barr virus (EBV), respectively. A large summary statistics dataset from the International Genomics of Alzheimer’s Project was used in the primary analysis, including 21,982 AD cases and 41,944 controls. Validation was further performed using family history of AD data from UK Biobank (27,696 cases of maternal AD, 14,338 cases of paternal AD and 272,244 controls). Results We found evidence of a significant association between mononucleosis (caused by EBV) and risk of AD after false discovery rate (FDR) correction (odds ratio [OR] = 1.634, 95% confidence interval [CI] = 1.092–2.446, P = 0.017, FDR-corrected P = 0.034). The validation analysis confirmed that mononucleosis is also associated with family history of AD (OR [95% CI] = 1.392 [1.061, 1.826], P = 0.017). Genetically predicted shingles were associated with AD risk (OR [95% CI] = 0.867 [0.784, 0.958], P = 0.005, FDR-corrected P = 0.020), while genetically predicted chickenpox was suggestively associated with increased family history of AD (OR [95% CI] = 1.147 [1.007, 1.307], P = 0.039). Conclusions Our findings provided evidence supporting a positive relationship between mononucleosis and AD, indicating a causal link between EBV infection and AD. Further elucidations of this association and underlying mechanisms are likely to identify feasible interventions to promote AD prevention.
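The reported odds ratios, confidence intervals, and unadjusted P values can be cross-checked against one another. The sketch below assumes the intervals are symmetric Wald intervals on the log-odds scale and that P values come from a two-sided normal test; neither is stated in the abstract, and the function name `p_from_or_ci` is illustrative.

```python
import math


def p_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Recover the standard error, z statistic, and two-sided p-value
    from an odds ratio and its 95% CI (Wald interval on the log scale)."""
    log_or = math.log(odds_ratio)
    # The CI spans 2 * z_crit standard errors on the log scale.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = log_or / se
    # Two-sided p-value under a standard normal: erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p


# Values as reported in the abstract above.
estimates = {
    "mononucleosis -> AD": (1.634, 1.092, 2.446),
    "shingles -> AD": (0.867, 0.784, 0.958),
    "chickenpox -> family history of AD": (1.147, 1.007, 1.307),
}

results = {name: p_from_or_ci(*vals) for name, vals in estimates.items()}
for name, (se, z, p) in results.items():
    print(f"{name}: SE={se:.3f}, z={z:.2f}, p={p:.3f}")
```

Under these assumptions the recomputed P values agree with the reported 0.017, 0.005, and 0.039 to within rounding, which suggests the published figures are internally consistent.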


2020 ◽  
Vol 12 (1) ◽  
Author(s):  
Oliver S. Burren ◽  
Guillermo Reales ◽  
Limy Wong ◽  
John Bowes ◽  
James C. Lee ◽  
...  

Abstract Background Genome-wide association studies (GWAS) have identified pervasive sharing of genetic architectures across multiple immune-mediated diseases (IMD). By learning the genetic basis of IMD risk from common diseases, this sharing can be exploited to enable analysis of less frequent IMD where, due to limited sample size, traditional GWAS techniques are challenging. Methods Exploiting ideas from Bayesian genetic fine-mapping, we developed a disease-focused shrinkage approach to allow us to distill genetic risk components from GWAS summary statistics for a set of related diseases. We applied this technique to 13 larger GWAS of common IMD, deriving a reduced dimension “basis” that summarised the multidimensional components of genetic risk. We used independent datasets including the UK Biobank to assess the performance of the basis and characterise individual axes. Finally, we projected summary GWAS data for smaller IMD studies, with fewer than 1000 cases, to assess whether the approach was able to provide additional insights into genetic architecture of less common IMD or IMD subtypes, where cohort collection is challenging. Results We identified 13 IMD genetic risk components. The projection of independent UK Biobank data demonstrated the IMD specificity and accuracy of the basis even for traits with very limited case-size (e.g. vitiligo, 150 cases). Projection of additional IMD-relevant studies allowed us to add biological interpretation to specific components, e.g. related to raised eosinophil counts in blood and serum concentration of the chemokine CXCL10 (IP-10). On application to 22 rare IMD and IMD subtypes, we were able to not only highlight subtype-discriminating axes (e.g. for juvenile idiopathic arthritis) but also suggest eight novel genetic associations. Conclusions Requiring only summary-level data, our unsupervised approach allows the genetic architectures across any range of clinically related traits to be characterised in fewer dimensions. This facilitates the analysis of studies with modest sample size by matching shared axes of both genetic and biological risk across a wider disease domain, and provides an evidence base for possible therapeutic repurposing opportunities.


2019 ◽  
Author(s):  
Alexander S. Hatoum ◽  
Claire L. Morrison ◽  
Evann C. Mitchell ◽  
Max Lam ◽  
Chelsie E. Benca-Bachman ◽  
...  

Abstract Deficits in executive functions (EFs), cognitive processes that control goal-directed behaviors, are associated with psychopathology and neurological disorders. Little is known about the molecular bases of EF individual differences; existing EF genome-wide association studies (GWAS) used small sample sizes and/or focused on individual tasks that are imprecise measures of EF. We conducted a GWAS of a Common EF (cEF) factor based on multiple tasks in the UK Biobank (N = 427,037 European-descent individuals), finding 129 independent genome-wide significant lead variants in 112 distinct loci. cEF was associated with fast synaptic transmission processes (synaptic, potassium channel, and GABA pathways) in gene-based analyses. cEF was genetically correlated with measures of intelligence (IQ) and cognitive processing speed, but cEF and IQ showed differential genetic associations with psychiatric disorders and educational attainment. Results suggest that cEF is a genetically distinct cognitive construct that is particularly relevant to understanding the genetic variance in psychiatric disorders.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Camilo Broc ◽  
Therese Truong ◽  
Benoit Liquet

Abstract Background The increasing number of genome-wide association studies (GWAS) has revealed several loci that are associated with multiple distinct phenotypes, suggesting the existence of pleiotropic effects. Highlighting these cross-phenotype genetic associations could help to identify and understand common biological mechanisms underlying some diseases. Common approaches test the association between genetic variants and multiple traits at the SNP level. In this paper, we propose a novel gene- and pathway-level approach for the case where several independent GWAS on independent traits are available. The method is based on a generalization of the sparse group Partial Least Squares (sgPLS) to take into account groups of variables, and a Lasso penalization that links all independent data sets. This method, called joint-sgPLS, is able to convincingly detect signal at the variable level and at the group level. Results Our method has the advantage of proposing a global, readable model while coping with the architecture of the data. It can outperform traditional methods and provides wider insight in terms of a priori information. We compared the performance of the proposed method to other benchmark methods on simulated data and gave an example of application on real data with the aim of highlighting common susceptibility variants for breast and thyroid cancers. Conclusion The joint-sgPLS shows interesting properties for detecting a signal. As an extension of the PLS, the method is suited for data with a large number of variables. The choice of Lasso penalization copes with architectures of groups of variables and observation sets. Furthermore, although the method has been applied to a genetic study, its formulation is adapted to any data with a high number of variables and an exposed a priori architecture in other application fields.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Nadav Brandes ◽  
Nathan Linial ◽  
Michal Linial

Abstract The characterization of germline genetic variation affecting cancer risk, known as cancer predisposition, is fundamental to preventive and personalized medicine. Studies of genetic cancer predisposition typically identify significant genomic regions based on family-based cohorts or genome-wide association studies (GWAS). However, the results of such studies rarely provide biological insight or functional interpretation. In this study, we conducted a comprehensive analysis of cancer predisposition in the UK Biobank cohort using a new gene-based method for detecting protein-coding genes that are functionally interpretable. Specifically, we conducted proteome-wide association studies (PWAS) to identify genetic associations mediated by alterations to protein function. With PWAS, we identified 110 significant gene-cancer associations in 70 unique genomic regions across nine cancer types and pan-cancer. In 48 of the 110 PWAS associations (44%), estimated gene damage is associated with reduced rather than elevated cancer risk, suggesting a protective effect. Together with standard GWAS, we implicated 145 unique genomic loci with cancer risk. While most of these genomic regions are supported by external evidence, our results also highlight many novel loci. Based on the capacity of PWAS to detect non-additive genetic effects, we found that 46% of the PWAS-significant cancer regions exhibited exclusive recessive inheritance. These results highlight the importance of recessive genetic effects, without relying on familial studies. Finally, we show that many of the detected genes exert substantial cancer risk in the studied cohort determined by a quantitative functional description, suggesting their relevance for diagnosis and genetic consulting.


2021 ◽  
Author(s):  
Abhishek Nag ◽  
Lawrence Middleton ◽  
Ryan S Dhindsa ◽  
Dimitrios Vitsios ◽  
Eleanor M Wigmore ◽  
...  

Genome-wide association studies have established the contribution of common and low frequency variants to metabolic biomarkers in the UK Biobank (UKB); however, the role of rare variants remains to be assessed systematically. We evaluated rare coding variants for 198 metabolic biomarkers, including metabolites assayed by Nightingale Health, using exome sequencing in participants from four genetically diverse ancestries in the UKB (N = 412,394). Gene-level collapsing analysis, which evaluated a range of genetic architectures, identified a total of 1,303 significant relationships between genes and metabolic biomarkers (p < 1×10⁻⁸), encompassing 207 distinct genes. These include associations between rare non-synonymous variants in GIGYF1 and glucose and lipid biomarkers, SYT7 and creatinine, and others, which may provide insights into novel disease biology. Compared with a previous microarray-based genotyping study in the same cohort, we observed that 40% of gene-biomarker relationships identified in the collapsing analysis were novel. Finally, we applied Gene-SCOUT, a novel tool that utilises the gene-biomarker association statistics from the collapsing analysis to identify genes having similar biomarker fingerprints and thus expand our understanding of gene networks.
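The general idea behind a gene-level collapsing analysis can be illustrated as follows. This is a minimal sketch, not the authors' pipeline: it assumes a dominant model in which any individual carrying at least one qualifying rare allele in a gene counts as a carrier, a hypothetical allele-frequency cutoff of 0.001, a normal approximation for the test, and invented toy data. The function names are illustrative.

```python
import math
from statistics import mean, variance


def collapse_carriers(genotypes, allele_freqs, max_af=0.001):
    """Collapse per-variant allele counts into one carrier flag per person.

    genotypes: one row per individual, one allele count (0/1/2) per variant.
    allele_freqs: allele frequency of each variant in the gene.
    max_af: hypothetical cutoff defining a qualifying rare variant.
    """
    qualifying = [j for j, af in enumerate(allele_freqs) if af <= max_af]
    return [int(any(row[j] > 0 for j in qualifying)) for row in genotypes]


def carrier_z_test(carrier, biomarker):
    """Compare mean biomarker level in carriers vs non-carriers
    (unequal-variance z statistic, normal approximation)."""
    carriers = [y for c, y in zip(carrier, biomarker) if c]
    others = [y for c, y in zip(carrier, biomarker) if not c]
    se = math.sqrt(variance(carriers) / len(carriers)
                   + variance(others) / len(others))
    z = (mean(carriers) - mean(others)) / se
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Toy data (invented): 8 individuals, 3 variants in one gene.
allele_freqs = [0.0004, 0.05, 0.0008]   # the second variant is too common
genotypes = [
    [1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0],
    [0, 2, 0], [1, 0, 0], [0, 0, 0], [0, 1, 0],
]
biomarker = [6.1, 5.0, 6.4, 4.9, 5.2, 6.0, 5.1, 4.8]

carrier = collapse_carriers(genotypes, allele_freqs)
z, p = carrier_z_test(carrier, biomarker)
print(carrier, round(z, 2), p)
```

A real analysis would typically fit a regression with covariates such as age, sex, and ancestry components, and would repeat the test under several qualifying-variant definitions, which is what evaluating "a range of genetic architectures" presumably refers to.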

