Quick approximation of threshold values for genome-wide association studies

2018 ◽  
Vol 20 (6) ◽  
pp. 2217-2223
Author(s):  
Zhiyu Hao ◽  
Li Jiang ◽  
Jin Gao ◽  
Jinhua Ye ◽  
Jingli Zhao ◽  
...  

Abstract Standard normal statistics, chi-squared statistics, Student’s t statistics and F statistics are used to map quantitative trait nucleotides for both small and large sample sizes. In genome-wide association studies (GWASs) of single-nucleotide polymorphisms (SNPs), the statistical distributions depend on both genetic effects and SNPs but are independent of SNPs under the null hypothesis of no genetic effects. Therefore, hypothesis testing when a nuisance parameter is present only under the alternative was introduced to quickly approximate the critical thresholds of these test statistics for GWASs. When only the statistical probabilities are available for high-throughput SNPs, the approximate critical thresholds can be estimated with chi-squared statistics, formulated by statistical probabilities with a degree of freedom of two. High similarities in the critical thresholds between the accurate and approximate estimations were demonstrated by extensive simulations and real data analysis.
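
The abstract does not state the formula behind the chi-squared statistic with two degrees of freedom. One standard identity that fits the description is that, for a p-value p that is uniform under the null hypothesis, -2 ln(p) follows a chi-squared distribution with two degrees of freedom. The sketch below illustrates only that identity; the function names and the Bonferroni-style threshold over m SNPs are illustrative assumptions, not the authors' procedure.

```python
import math


def p_to_chi2_df2(p):
    """Map a p-value to a chi-squared statistic with 2 degrees of freedom."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    return -2.0 * math.log(p)


def chi2_df2_sf(x):
    """Upper-tail probability of a chi-squared(2) variable: exp(-x / 2)."""
    return math.exp(-x / 2.0)


def chi2_df2_critical(alpha):
    """Critical value c such that P(chi-squared(2) > c) = alpha."""
    return -2.0 * math.log(alpha)


# Assumption for illustration: a Bonferroni-style level alpha / m for m SNPs.
m_snps = 500_000
genome_wide_alpha = 0.05 / m_snps
threshold = chi2_df2_critical(genome_wide_alpha)
print(f"chi-squared(2) threshold for {m_snps} SNPs: {threshold:.3f}")
```

Because the chi-squared(2) tail has the closed form exp(-x/2), no statistics library is needed, which matches the "quick" spirit of the abstract.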

Author(s):  
Yingjie Guo ◽  
Chenxi Wu ◽  
Zhian Yuan ◽  
Yansu Wang ◽  
Zhen Liang ◽  
...  

Among the many statistical methods for identifying gene–gene interactions in qualitative genome-wide association studies, gene-based methods are both statistically powerful and biologically interpretable. However, their detection power is limited by the assumptions they make about the association between traits and single nucleotide polymorphisms. This article therefore proposes GGInt-XGBoost, a gene-based method derived from XGBoost. Assuming that the log odds ratio of a disease trait is additive when a pair of genes does not interact, the difference in error between XGBoost models fitted with and without the additive constraint indicates gene–gene interaction; a permutation-based statistical test then assesses this difference and yields a p-value for the significance of the interaction. Experimental results on both simulated and real data showed that the approach performed better than previous experiments at detecting gene–gene interactions.
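
The permutation step can be sketched generically. The example below is a minimal illustration, assuming a user-supplied `statistic` callable; here a toy mean-gap statistic stands in for the error difference between the constrained and unconstrained models, which the abstract does not specify in detail. All names and data are hypothetical.

```python
import random


def permutation_p_value(statistic, labels, n_perm=999, seed=0):
    """One-sided permutation p-value for statistic(labels).

    Labels are shuffled n_perm times; the p-value is the share of shuffles
    whose statistic is at least as large as the observed one, with the usual
    +1 correction so that it is never exactly zero.
    """
    rng = random.Random(seed)
    observed = statistic(labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if statistic(shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy data: a stand-in statistic, NOT the XGBoost error difference.
values = [1, 2, 3, 4, 21, 22, 23, 24]
labels = [0, 0, 0, 0, 1, 1, 1, 1]


def mean_gap(lab):
    """Absolute difference in mean value between the two label groups."""
    cases = [v for v, l in zip(values, lab) if l == 1]
    controls = [v for v, l in zip(values, lab) if l == 0]
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))


p_value = permutation_p_value(mean_gap, labels)
print(f"permutation p-value: {p_value:.4f}")
```

In a real analysis the statistic would refit or re-score the models on each shuffled phenotype vector, which is why permutation tests of this kind are computationally expensive.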


2014 ◽  
Vol 26 (2) ◽  
pp. 567-582 ◽  
Author(s):  
Zhongxue Chen ◽  
Hon Keung Tony Ng ◽  
Jing Li ◽  
Qingzhong Liu ◽  
Hanwen Huang

In the past decade, hundreds of genome-wide association studies have been conducted to detect single-nucleotide polymorphisms that are significantly associated with certain diseases. However, most of the data from the X chromosome were not analyzed, and only a few significantly associated single-nucleotide polymorphisms on the X chromosome have been identified by genome-wide association studies. This is mainly due to the lack of powerful statistical tests. In this paper, we propose a novel statistical approach that efficiently combines the information on X-chromosome single-nucleotide polymorphisms from both males and females. The proposed approach avoids the need to make strong assumptions about the underlying genetic models. The test is robust and assumes only that the risk allele is the same for females and males if the single-nucleotide polymorphism is associated with the disease in both genders. Through a simulation study and a real data application, we show that the proposed procedure is robust and has excellent performance compared to existing methods. We expect that many more associated single-nucleotide polymorphisms on the X chromosome will be identified if the proposed approach is applied to currently available genome-wide association study data.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Declan Bennett ◽  
Donal O’Shea ◽  
John Ferguson ◽  
Derek Morris ◽  
Cathal Seoighe

Abstract Ongoing increases in the size of human genotype and phenotype collections offer the promise of improved understanding of the genetics of complex diseases. In addition to the biological insights that can be gained from the nature of the variants that contribute to the genetic component of complex trait variability, these data bring forward the prospect of predicting complex traits and the risk of complex genetic diseases from genotype data. Here we show that advances in phenotype prediction can be applied to improve the power of genome-wide association studies. We demonstrate a simple and efficient method to model genetic background effects using polygenic scores derived from SNPs that are not on the same chromosome as the target SNP. Using simulated and real data we found that this can result in a substantial increase in the number of variants passing genome-wide significance thresholds. This increase in power to detect trait-associated variants also translates into an increase in the accuracy with which the resulting polygenic score predicts the phenotype from genotype data. Our results suggest that advances in methods for phenotype prediction can be exploited to improve the control of background genetic effects, leading to more accurate GWAS results and further improvements in phenotype prediction.
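
The idea of a polygenic score built only from SNPs on other chromosomes can be sketched as a filtered weighted sum. This is a minimal illustration under assumed inputs (per-SNP allele counts, effect weights and chromosome labels for one individual); the function name and toy values are hypothetical, and how the weights are estimated is outside the sketch.

```python
def off_chromosome_score(genotypes, weights, chromosomes, target_chrom):
    """Weighted allele-count sum over SNPs NOT on the target SNP's chromosome."""
    if not (len(genotypes) == len(weights) == len(chromosomes)):
        raise ValueError("inputs must have the same length")
    return sum(
        g * w
        for g, w, c in zip(genotypes, weights, chromosomes)
        if c != target_chrom
    )


# Toy individual: four SNPs on chromosomes 1, 1, 2 and 3.
genotypes = [0, 1, 2, 1]            # allele counts (0, 1 or 2)
weights = [0.5, -0.25, 1.0, 2.0]    # hypothetical per-SNP effect weights
chromosomes = [1, 1, 2, 3]

# Score used as a background covariate when testing a SNP on chromosome 1.
score_for_chr1 = off_chromosome_score(genotypes, weights, chromosomes, 1)
print(score_for_chr1)
```

Excluding the target chromosome keeps the target SNP and its linked neighbours out of the covariate, so the score models background effects without absorbing the signal being tested.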


Author(s):  
Ting-Hao Chen ◽  
Chen-Cheng Yang ◽  
Kuei-Hau Luo ◽  
Chia-Yen Dai ◽  
Yao-Chung Chuang ◽  
...  

Aluminum (Al) toxicity is related to renal failure and the failure of other systems. Although genome-wide association studies (GWAS) have been conducted in Australia and England, to our knowledge none has examined Han Chinese populations. This research therefore used whole-genome genotypes from the Taiwan Biobank to explore the association between plasma Al concentrations and renal function. Participants, who completed questionnaire interviews, biomarker measurements, and genotyping, were drawn from the Taiwan Biobank database. We measured their plasma Al concentrations with ICP-MS in the laboratory at Kaohsiung Medical University. We used these data in genome-wide association (GWA) tests to look for candidate genes and to relate plasma Al concentration to renal function. Furthermore, we examined the path relationship between single nucleotide polymorphisms (SNPs), Al concentrations, and estimated glomerular filtration rate (eGFR) through mediation analysis with 3000 bootstrap replications. Following the principles of GWAS, we focused on three SNPs within the dipeptidyl peptidase-like protein 6 (DPP6) gene on chromosome 7: rs10224371, rs2316242, and rs10268004. The mediation analysis showed that all of the selected SNPs affected eGFR indirectly, through Al concentrations. Our analysis revealed an association between DPP6 SNPs, plasma Al concentrations, and eGFR. Further longitudinal studies and mechanistic research are needed. Nevertheless, this is the first study to explore the association between DPP6 SNPs and plasma Al in relation to eGFR.
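
A bootstrap mediation analysis of this kind can be sketched with the product-of-coefficients estimator. The example below is an illustration only: it assumes a linear model, uses synthetic data in place of the Taiwan Biobank measurements, and takes a percentile confidence interval over 3000 resamples. The abstract does not say which estimator or interval the authors used, and every name here is hypothetical.

```python
import random


def _cov(u, v):
    """Sample covariance of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


def indirect_effect(x, m, y):
    """a * b, where a is the slope of m on x and b is the slope of y on m
    adjusted for x. Returns None if the regressions are degenerate."""
    vx, vm = _cov(x, x), _cov(m, m)
    cxm, cxy, cmy = _cov(x, m), _cov(x, y), _cov(m, y)
    denom = vx * vm - cxm ** 2
    if vx == 0 or denom == 0:
        return None
    a = cxm / vx
    b = (cmy * vx - cxm * cxy) / denom
    return a * b


def bootstrap_ci(x, m, y, n_boot=3000, level=0.95, seed=1):
    """Percentile bootstrap interval for the indirect effect."""
    rng = random.Random(seed)
    n = len(x)
    draws = []
    while len(draws) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        est = indirect_effect(
            [x[i] for i in idx], [m[i] for i in idx], [y[i] for i in idx]
        )
        if est is not None:  # skip degenerate resamples
            draws.append(est)
    draws.sort()
    lo = draws[int((1 - level) / 2 * n_boot)]
    hi = draws[int((1 + level) / 2 * n_boot) - 1]
    return lo, hi


# Synthetic stand-ins: x = SNP allele count, m = mediator, y = outcome.
rng = random.Random(0)
x = [rng.choice([0, 1, 2]) for _ in range(200)]
m = [2.0 * xi + rng.gauss(0, 1) for xi in x]
y = [3.0 * mi + 1.0 * xi + rng.gauss(0, 1) for xi, mi in zip(x, m)]

point = indirect_effect(x, m, y)
ci_low, ci_high = bootstrap_ci(x, m, y)
print(f"indirect effect {point:.2f}, 95% CI ({ci_low:.2f}, {ci_high:.2f})")
```

An interval that excludes zero is the usual evidence for an indirect effect in this framework; the synthetic data are built so that the true value is 2 × 3 = 6.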


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Guomin Zhang ◽  
Rongsheng Wang ◽  
Juntao Ma ◽  
Hongru Gao ◽  
Lingwei Deng ◽  
...  

Abstract Background: Heilongjiang Province is a high-quality japonica rice cultivation area in China. One in ten bowls of Chinese rice is produced here. Increasing yield is one of the main aims of rice production in this area. However, yield is a complex quantitative trait composed of many factors. The purpose of this study was to determine how many genetic loci are associated with yield-related traits. Genome-wide association studies (GWAS) were performed on 450 accessions collected from northeast Asia, including Russia, Korea, Japan and Heilongjiang Province of China. These accessions consist of elite varieties and landraces introduced into Heilongjiang Province decades ago. Results: After resequencing of the 450 accessions, 189,019 single nucleotide polymorphisms (SNPs) were used for association studies with two different models, a general linear model (GLM) and a mixed linear model (MLM), examining four traits: days to heading (DH), plant height (PH), panicle weight (PW) and tiller number (TI). Over 25 SNPs were found to be associated with each trait. Among them, 22 SNPs were selected to identify candidate genes, and 2, 8, 1 and 11 SNPs were found to be located in the 3′ UTR, intron, coding and intergenic regions, respectively. Conclusions: All SNPs detected in this research may become candidates for further fine mapping and may be used in the molecular breeding of high-latitude rice.


2020 ◽  
Vol 117 (21) ◽  
pp. 11608-11613 ◽  
Author(s):  
Marcelo Blatt ◽  
Alexander Gusev ◽  
Yuriy Polyakov ◽  
Shafi Goldwasser

Genome-wide association studies (GWASs) seek to identify genetic variants associated with a trait, and have been a powerful approach for understanding complex diseases. A critical challenge for GWASs has been the dependence on individual-level data that typically have strict privacy requirements, creating an urgent need for methods that preserve the individual-level privacy of participants. Here, we present a privacy-preserving framework based on several advances in homomorphic encryption and demonstrate that it can perform an accurate GWAS analysis for a real dataset of more than 25,000 individuals, keeping all individual data encrypted and requiring no user interactions. Our extrapolations show that it can evaluate GWASs of 100,000 individuals and 500,000 single-nucleotide polymorphisms (SNPs) in 5.6 h on a single server node (or in 11 min on 31 server nodes running in parallel). Our performance results are more than one order of magnitude faster than prior state-of-the-art results using secure multiparty computation, which requires continuous user interactions, with the accuracy of both solutions being similar. Our homomorphic encryption advances can also be applied to other domains where large-scale statistical analyses over encrypted data are needed.


2020 ◽  
Vol 116 (9) ◽  
pp. 1620-1634
Author(s):  
Charlotte Glinge ◽  
Najim Lahrouchi ◽  
Reza Jabbari ◽  
Jacob Tfelt-Hansen ◽  
Connie R Bezzina

Abstract The genetic basis of cardiac electrical phenotypes has in the last 25 years been the subject of intense investigation. While in the first years, such efforts were dominated by the study of familial arrhythmia syndromes, in recent years, large consortia of investigators have successfully pursued genome-wide association studies (GWAS) for the identification of single-nucleotide polymorphisms that govern inter-individual variability in electrocardiographic parameters in the general population. We here provide a review of GWAS conducted on cardiac electrical phenotypes in the last 14 years and discuss the implications of these discoveries for our understanding of the genetic basis of disease susceptibility and variability in disease severity. Furthermore, we review functional follow-up studies that have been conducted on GWAS loci associated with cardiac electrical phenotypes and highlight the challenges and opportunities offered by such studies.


Animals ◽  
2020 ◽  
Vol 10 (8) ◽  
pp. 1300 ◽  
Author(s):  
Elisabetta Manca ◽  
Alberto Cesarani ◽  
Giustino Gaspa ◽  
Silvia Sorbolini ◽  
Nicolò P.P. Macciotta ◽  
...  

Genome-wide association studies (GWAS) are traditionally carried out with the single marker regression model, which often yields very few associations when only a small number of individuals is involved. Bayesian methods, such as BayesR, have obtained encouraging results when applied to GWAS. However, these approaches require that a posterior inclusion probability threshold be fixed a priori, which arbitrarily affects the associations obtained. To partially overcome these problems, a multivariate statistical algorithm was proposed. The basic idea was that animals with different phenotypic values of a specific trait carry different allelic combinations for the genes involved in its determinism. Three multivariate techniques were used to highlight the differences between individuals assembled in high and low phenotype groups: canonical discriminant analysis, discriminant analysis and stepwise discriminant analysis. The multivariate method was tested on both simulated and real data. The simulation study showed that the multivariate GWAS detected a greater number of truly associated single nucleotide polymorphisms (SNPs) and quantitative trait loci (QTLs) than the single marker model and the Bayesian approach. For example, with 3000 animals, the traditional GWAS highlighted only 29 significantly associated markers and 13 QTLs, whereas the multivariate method found 127 associated SNPs and 65 QTLs. The gap between the two approaches slowly decreased as the number of animals increased. The Bayesian method gave worse results than the other two. On average, with the real data, the multivariate GWAS found 108 associated markers for each trait under study, and around 63% of these SNPs were also found by the single marker approach. Among the top 118 associated markers, 76 SNPs harbored putative candidate genes.

