Multitask group Lasso for Genome Wide association Studies in admixed populations

2021 ◽  
Author(s):  
Asma Nouira ◽  
Chloe-Agathe Azencott

Genome-Wide Association Studies, or GWAS, aim at finding Single Nucleotide Polymorphisms (SNPs) that are associated with a phenotype of interest. GWAS are known to suffer from the large dimensionality of the data with respect to the number of available samples. Other limiting factors include the dependency between SNPs, due to linkage disequilibrium (LD), and the need to account for population structure, that is to say, confounding due to genetic ancestry. We propose an efficient approach for the multivariate analysis of admixed GWAS data based on a multitask group Lasso formulation. Each task corresponds to a subpopulation of the data, and each group to an LD-block. This formulation alleviates the curse of dimensionality, and makes it possible to identify disease LD-blocks shared across populations/tasks, as well as some that are specific to one population/task. In addition, we use stability selection to increase the robustness of our approach. Finally, gap safe screening rules speed up computations enough that our method can run at a genome-wide scale. To our knowledge, this is the first framework for GWAS on admixed populations combining feature selection at the LD-group level, a multitask approach to address population structure, stability selection, and safe screening rules. We show that our approach outperforms state-of-the-art methods on both a simulated and a real-world cancer dataset.
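The group-level sparsity described in this abstract can be illustrated with the proximal operator of a group Lasso penalty, which either shrinks a whole block of coefficients or sets it to zero. The following is only a minimal sketch under stated assumptions, not the authors' implementation: it assumes a coefficient matrix with one column per task (subpopulation) and row blocks standing in for LD-blocks, and the function name `group_soft_threshold`, the toy matrix, and the penalty value are all hypothetical.

```python
import numpy as np

def group_soft_threshold(B, groups, lam):
    """Proximal operator of lam * sum_g ||B[g, :]||_F (block soft-thresholding).

    B      : (n_features, n_tasks) coefficient matrix, one column per task.
    groups : list of integer index arrays, one per (hypothetical) LD-block.
    lam    : non-negative penalty strength.
    """
    out = np.zeros_like(B, dtype=float)
    for idx in groups:
        block = B[idx, :]
        norm = np.linalg.norm(block)  # Frobenius norm of the whole block
        if norm > lam:
            # Shrink the block towards zero but keep it selected.
            out[idx, :] = (1.0 - lam / norm) * block
        # Otherwise the block stays at zero: the group is dropped for all tasks.
    return out

# Toy example: 4 SNPs in 2 blocks, 2 tasks (all values are made up).
B = np.array([[3.0, 4.0], [0.0, 0.0], [0.1, 0.0], [0.0, 0.1]])
groups = [np.array([0, 1]), np.array([2, 3])]
shrunk = group_soft_threshold(B, groups, lam=1.0)
```

In this toy case the first block has norm 5 and is scaled by 0.8, while the second block has a norm below the penalty and is removed entirely, which is the mechanism by which selection happens at the block level rather than per SNP.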

2013 ◽  
Vol 7 (1) ◽  
pp. 27-33 ◽  
Author(s):  
Gengxin Li ◽  
Hongjiang Zhu

With the availability of high-density genomic data containing millions of single nucleotide polymorphisms and tens or hundreds of thousands of individuals, genetic association studies are likely to identify the variants contributing to complex traits on a genome-wide scale. However, genome-wide association studies are confounded by spurious associations when sample structure (population structure, family structure and cryptic relatedness) is not properly accounted for. The absence of a complete genealogy of the population in genome-wide association study models strongly motivates the development of new methods to correct the inflation of false positives. In this process, linear mixed model based approaches, which have the advantage of capturing multilevel relatedness, have gained considerable ground. We summarize the current literature dealing with sample structure, and our review focuses on the following four areas: (i) the approaches handling population structure in genome-wide association studies; (ii) the linear mixed model based approaches in genome-wide association studies; (iii) the performance of linear mixed model based approaches in genome-wide association studies; and (iv) the unsolved issues and future work of linear mixed model based approaches.


2019 ◽  
Vol 22 (8) ◽  
pp. 1063-1069 ◽  
Author(s):  
N. S. Yudin ◽  
N. L. Podkolodnyy ◽  
T. A. Agarkova ◽  
E. V. Ignatieva

Selection by means of genetic markers is a promising approach to the eradication of infectious diseases in farm animals, especially in the absence of effective methods of treatment and prevention. Bovine leukemia virus (BLV) is spread throughout the world and represents one of the biggest problems for the livestock production and food security in Russia. However, recent genome-wide association studies have shown that sensitivity/resistance to BLV is polygenic. The aim of this study was to create a catalog of cattle genes and genes of other mammalian species involved in the pathogenesis of BLV-induced infection and to perform gene prioritization using bioinformatics methods. Based on manually collected information from a range of open sources, a total of 446 genes were included in the catalog of cattle genes and genes of other mammals involved in the pathogenesis of BLV-induced infection. The following criteria were used to prioritize 446 genes from the catalog: (1) the gene is associated with leukemia according to a genome-wide association study; (2) the gene is associated with leukemia according to a case-control study; (3) the role of the gene in leukemia development has been studied using knockout mice; (4) protein-protein interactions exist between the gene-encoded protein and either viral particles or individual viral proteins; (5) the gene is annotated with Gene Ontology terms that are overrepresented for a given list of genes; (6) the gene participates in biological pathways from the KEGG or REACTOME databases, which are over-represented for a given list of genes; (7) the protein encoded by the gene has a high number of protein-protein interactions with proteins encoded by other genes from the catalog. Based on each criterion, a rank was assigned to each gene. Then the ranks were summarized and an overall rank was determined. 
Prioritization of 446 candidate genes allowed us to identify 5 genes of interest (TNF, LTB, BOLA-DQA1, BOLA-DRB3, ATF2), which can affect the sensitivity/resistance of cattle to leukemia.
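The rank-based prioritization described above (one rank per criterion, then a combined overall rank) can be sketched as follows. This is a hedged illustration only: it assumes the per-criterion ranks are combined by simple summation and that ties share the average rank, neither of which is stated in the abstract. The gene labels "A", "B", "C", the scores, and the function names `rank_desc` and `prioritize` are hypothetical.

```python
def rank_desc(scores):
    """Rank genes for one criterion: rank 1 = highest score, ties share the average rank."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over all genes tied with the gene at position i.
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[ordered[k]] = avg
        i = j + 1
    return ranks

def prioritize(criteria):
    """Sum per-criterion ranks (an assumption) and order genes by the total."""
    genes = list(criteria[0])
    per_criterion = [rank_desc(c) for c in criteria]
    rank_sum = {g: sum(r[g] for r in per_criterion) for g in genes}
    return sorted(genes, key=lambda g: rank_sum[g]), rank_sum

# Two made-up criteria: a binary evidence flag and an interaction count.
criteria = [
    {"A": 1, "B": 0, "C": 1},
    {"A": 5, "B": 2, "C": 1},
]
order, rank_sum = prioritize(criteria)
```

A lower rank sum means a higher priority, so genes that score well on several criteria at once rise to the top even if they are not the best on any single one.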


2018 ◽  
Vol 28 (1) ◽  
pp. 166-174 ◽  
Author(s):  
Sara L Pulit ◽  
Charli Stoneman ◽  
Andrew P Morris ◽  
Andrew R Wood ◽  
Craig A Glastonbury ◽  
...  

More than one in three adults worldwide is either overweight or obese. Epidemiological studies indicate that the location and distribution of excess fat, rather than general adiposity, are more informative for predicting risk of obesity sequelae, including cardiometabolic disease and cancer. We performed a genome-wide association study meta-analysis of body fat distribution, measured by waist-to-hip ratio (WHR) adjusted for body mass index (WHRadjBMI), and identified 463 signals in 346 loci. Heritability and variant effects were generally stronger in women than men, and we found approximately one-third of all signals to be sexually dimorphic. The 5% of individuals carrying the most WHRadjBMI-increasing alleles were 1.62 times more likely than the bottom 5% to have a WHR above the thresholds used for metabolic syndrome. These data, made publicly available, will inform the biology of body fat distribution and its relationship with disease.


Genetics ◽  
2019 ◽  
Vol 213 (4) ◽  
pp. 1225-1236 ◽  
Author(s):  
Weimiao Wu ◽  
Zhong Wang ◽  
Ke Xu ◽  
Xinyu Zhang ◽  
Amei Amei ◽  
...  

Longitudinal phenotypes have been increasingly available in genome-wide association studies (GWAS) and electronic health record-based studies for identification of genetic variants that influence complex traits over time. For longitudinal binary data, there remain significant challenges in gene mapping, including misspecification of the model for phenotype distribution due to ascertainment. Here, we propose L-BRAT (Longitudinal Binary-trait Retrospective Association Test), a retrospective, generalized estimating equation-based method for genetic association analysis of longitudinal binary outcomes. We also develop RGMMAT, a retrospective, generalized linear mixed model-based association test. Both tests are retrospective score approaches in which genotypes are treated as random conditional on phenotype and covariates. They allow both static and time-varying covariates to be included in the analysis. Through simulations, we illustrated that retrospective association tests are robust to ascertainment and other types of phenotype model misspecification, and gain power over previous association methods. We applied L-BRAT and RGMMAT to a genome-wide association analysis of repeated measures of cocaine use in a longitudinal cohort. Pathway analysis implicated association with opioid signaling and axonal guidance signaling pathways. Lastly, we replicated important pathways in an independent cocaine dependence case-control GWAS. Our results illustrate that L-BRAT is able to detect important loci and pathways in a genome scan and to provide insights into genetic architecture of cocaine use.


2019 ◽  
Vol 8 (2) ◽  
pp. 275 ◽  
Author(s):  
Eun Hong ◽  
Bong Kim ◽  
Steve Cho ◽  
Jin Yang ◽  
Hyuk Choi ◽  
...  

Genome-wide association studies found genetic variations with modulatory effects for intracranial aneurysm (IA) formations in European and Japanese populations. We aimed to identify single nucleotide polymorphisms (SNPs) associated with susceptibility to IA in a Korean population consisting of 250 patients and 294 controls using the Asian-specific Axiom Precision Medicine Research Array. Twenty-nine SNPs reached the genome-wide significance threshold (5 × 10⁻⁸). The rs371331393 SNP, with a stop-gain function of ARHGAP32 (11q24.3), showed the most significant association with the risk of IA (OR = 43.57, 95% CI: 21.84–86.95; p = 9.3 × 10⁻²⁷). Eight of the 29 SNPs, namely GBA (rs75822236), TCF24 (rs112859779), OLFML2A (rs79134766), ARHGAP32 (rs371331393), CD163L1 (rs138525217), CUL4A (rs74115822), LOC102724084 (rs75861150), and LRRC3 (rs116969723), demonstrated sufficient statistical power (greater than or equal to 0.8). Two previously reported SNPs, rs700651 (BOLL, 2q33.1) and rs6841581 (EDNRA, 4q31.22), were validated in our genome-wide association study (GWAS). In a subsequent analysis, three SNPs showed a significant difference in expression: rs6741819 (RNF144A, 2p25.1) was down-regulated in adrenal gland tissue (p = 1.5 × 10⁻⁶), rs1052270 (TMOD1, 9q22.33) was up-regulated in testis tissue (p = 8.6 × 10⁻¹⁰), and rs6841581 (EDNRA, 4q31.22) was up-regulated in both esophagus (p = 5.2 × 10⁻¹²) and skin tissues (p = 1.2 × 10⁻⁶). Our GWAS showed novel candidate genes with Korean-specific variations in IA formations. Large population-based studies are thus warranted.


2018 ◽  
Vol 13 (5) ◽  
pp. 648-658 ◽  
Author(s):  
Yoichi Kakuta ◽  
Yosuke Kawai ◽  
Takeo Naito ◽  
Atsushi Hirano ◽  
Junji Umeno ◽  
...  

Background and Aims: Genome-wide association studies [GWASs] of European populations have identified numerous susceptibility loci for Crohn’s disease [CD]. Susceptibility genes differ by ethnicity, however, so GWASs specific for Asian populations are required. This study aimed to clarify the Japanese-specific genetic background for CD by a GWAS using the Japonica array [JPA] and subsequent imputation with the 1KJPN reference panel. Methods: Two independent Japanese case/control sets (Tohoku region [379 CD patients, 1621 controls] and Kyushu region [334 CD patients, 462 controls]) were included. GWASs were performed separately for each population, followed by a meta-analysis. Two additional replication sets [254 + 516 CD patients and 287 + 565 controls] were analysed for top hit single nucleotide polymorphisms [SNPs] from novel genomic regions. Results: Genotype data of 4 335 144 SNPs from 713 Japanese CD patients and 2083 controls were analysed. SNPs located in TNFSF15 [rs78898421, pmeta = 2.59 × 10⁻²⁶, odds ratio (OR) = 2.10], HLA-DQB1 [rs184950714, pmeta = 3.56 × 10⁻¹⁹, OR = 2.05], ZNF365, and 4p14 loci were significantly associated with CD in Japanese individuals. Replication analyses were performed for four novel candidate loci [p < 1 × 10⁻⁶], and rs488200 located upstream of RAP1A was significantly associated with CD [pcombined = 4.36 × 10⁻⁸, OR = 1.31]. Transcriptome analysis of CD4+ effector memory T cells from lamina propria mononuclear cells of CD patients revealed a significant association of rs488200 with RAP1A expression. Conclusions: RAP1A is a novel susceptibility locus for CD in the Japanese population.
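The step of running a GWAS separately in each cohort and then combining them by meta-analysis can be illustrated with a fixed-effect, inverse-variance weighted combination of per-cohort log odds ratios. This is a sketch under assumptions: the abstract does not say which meta-analysis model was used, and the function name `inverse_variance_meta` and the effect sizes and standard errors below are made up for illustration, not taken from the study.

```python
import math

def inverse_variance_meta(betas, ses):
    """Fixed-effect meta-analysis of per-cohort effect estimates.

    betas : per-cohort log odds ratios for one SNP.
    ses   : matching standard errors.
    Returns the pooled estimate, its standard error, the z-score,
    and a two-sided p-value from the normal approximation.
    """
    weights = [1.0 / se ** 2 for se in ses]  # precision of each cohort
    beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return beta, se, z, p

# Hypothetical estimates for one SNP in two cohorts.
beta, se, z, p = inverse_variance_meta(betas=[0.7, 0.8], ses=[0.1, 0.2])
pooled_or = math.exp(beta)  # back-transform the pooled log odds ratio
```

The cohort with the smaller standard error receives the larger weight, so the pooled estimate sits closer to its value; the pooled standard error is smaller than either cohort's, which is why a meta-analysis can reach significance thresholds that neither cohort reaches alone.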

