An efficient multi-locus mixed model framework for the detection of small and linked QTLs in F2

2018 ◽  
Vol 20 (5) ◽  
pp. 1913-1924 ◽  
Author(s):  
Yang-Jun Wen ◽  
Ya-Wen Zhang ◽  
Jin Zhang ◽  
Jian-Ying Feng ◽  
Jim M Dunwell ◽  
...  

Abstract In the genetic systems that regulate complex traits, metabolites, gene expression levels, RNA editing levels and DNA methylation, a series of small-effect and linked genes exists. To date, however, little is known about how to design an efficient framework for detecting these kinds of genes. In this article, we propose genome-wide composite interval mapping (GCIM) for F2 populations. First, the control of polygenic background by selecting markers during the genome scan of linkage analysis was replaced by the estimation of polygenic variance, as is done in genome-wide association studies. This allows large, medium and minor polygenic backgrounds to be controlled in the genome scan. Then, the additive and dominance effects of each putative quantitative trait locus (QTL) were scanned separately, so that a separate curve of negative logarithm P-values against genome position was obtained for each kind of effect. In each curve, all peaks were treated as potential QTLs, so that almost all small-effect and linked QTLs are included in a multi-locus model. Finally, the adaptive least absolute shrinkage and selection operator (adaptive lasso) was used to estimate all effects in the multi-locus model, and each nonzero effect was further tested by a likelihood ratio test to identify true QTLs. This method was used to reanalyze four rice traits. Among 25 known genes detected in this study, 16 small-effect genes were identified only by GCIM. To further evaluate GCIM, a series of Monte Carlo simulation experiments was performed. The results demonstrate that GCIM is more powerful than widely used methods for detecting closely linked and small-effect QTLs.
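The peak-calling step described above, where every peak in a -log10(P) curve becomes a candidate QTL, can be sketched in Python. This is a minimal illustration under our own assumptions and is not GCIM's implementation. The scan positions, P-values and the `min_score` cut-off are hypothetical, and the sketch leaves out the mixed-model scan and the adaptive lasso that surround this step.

```python
import math


def neg_log10_p(pvalues):
    """Convert P-values to -log10(P), the scale of a genome-scan curve."""
    return [-math.log10(p) for p in pvalues]


def find_peaks(positions, scores, min_score=0.0):
    """Return (position, score) for each local maximum of a scan curve.

    A flat run of equal scores counts as one peak (reported at its left
    edge) if it is strictly higher than the points on both sides of it.
    Curve ends are compared against their single neighbour. Peaks below
    the hypothetical `min_score` cut-off are dropped.
    """
    peaks = []
    n = len(scores)
    i = 0
    while i < n:
        # Extend j over a plateau of scores equal to scores[i].
        j = i
        while j + 1 < n and scores[j + 1] == scores[i]:
            j += 1
        left = scores[i - 1] if i > 0 else float("-inf")
        right = scores[j + 1] if j + 1 < n else float("-inf")
        if scores[i] > left and scores[i] > right and scores[i] >= min_score:
            peaks.append((positions[i], scores[i]))
        i = j + 1
    return peaks


# Hypothetical scan of one chromosome (positions in cM) for one effect type.
positions = [0, 5, 10, 15, 20, 25, 30]
pvals = [0.5, 1e-3, 0.2, 0.3, 1e-5, 1e-4, 0.6]
candidates = find_peaks(positions, neg_log10_p(pvals), min_score=2.0)
print([pos for pos, _ in candidates])  # [5, 20]
```

Because additive and dominance effects are scanned separately, this function would run once per curve. The two candidate lists would then be pooled into the multi-locus model.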

2021 ◽  
Author(s):  
Richard F Oppong ◽  
Pau Navarro ◽  
Chris S Haley ◽  
Sara Knott

We describe a genome-wide analytical approach, SNP and Haplotype Regional Heritability Mapping (SNHap-RHM), that provides regional estimates of heritability across locally defined regions of the genome. This approach uses genomic relationship matrices (GRMs) based on the sharing of SNP and haplotype alleles within local haplotype blocks delimited by recombination boundaries in the genome. Applying the approach to simulated data, we show that haplotype-based regional GRMs capture variation that is complementary to that captured by SNP-based regional GRMs, which justifies fitting the two GRMs jointly in a single analysis (SNHap-RHM). SNHap-RHM captures regions of the genome contributing to phenotypic variation that existing genome-wide analysis methods may fail to capture. We further demonstrate the real benefits of this approach by applying it to data from about 20,000 individuals in the Generation Scotland: Scottish Family Health Study. We analysed height and major depressive disorder (MDD). We identified seven genomic regions that are genome-wide significant for height, and three regions significant at a suggestive threshold (p-value < 1 × 10⁻⁵) for MDD. Genes were mapped within 400 kb of these significant regions. The genes mapped for height have been reported to be associated with height in humans, while those mapped for MDD have been reported to be associated with major depressive disorder and other psychiatric phenotypes. The results show that SNHap-RHM presents an exciting new opportunity to analyse complex traits by allowing the joint mapping of novel genomic regions tagged by either SNPs or haplotypes, potentially leading to the recovery of some of the "missing" heritability.
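As a rough illustration of what a regional SNP-based GRM contains, the sketch below standardizes the 0/1/2 genotypes of the SNPs inside one region and forms G = ZZ'/m. It assumes this common standardization, which the abstract does not specify, and it uses hypothetical genotypes. It does not build the haplotype-based GRM or perform the variance-component fit that SNHap-RHM carries out.

```python
import math


def regional_grm(genotypes):
    """Standardized-genotype GRM, G = Z Z' / m, for the SNPs of one region.

    genotypes: one row per individual, each a list of 0/1/2 allele counts
    for the SNPs inside the region. Monomorphic SNPs are skipped.
    """
    n = len(genotypes)
    columns = []
    for j in range(len(genotypes[0])):
        col = [genotypes[i][j] for i in range(n)]
        p = sum(col) / (2.0 * n)          # sample allele frequency
        if p == 0.0 or p == 1.0:
            continue                      # no variation, no information
        sd = math.sqrt(2.0 * p * (1.0 - p))
        columns.append([(g - 2.0 * p) / sd for g in col])
    if not columns:
        raise ValueError("region has no polymorphic SNPs")
    m = len(columns)
    return [[sum(z[a] * z[b] for z in columns) / m for b in range(n)]
            for a in range(n)]


# Three hypothetical individuals genotyped at three SNPs in one region.
region = [[0, 1, 2],
          [2, 1, 0],
          [1, 1, 1]]
G = regional_grm(region)
```

A haplotype-based GRM could be built in a similar way by treating each local haplotype allele as a marker. That construction is an assumption on our part and is not shown here.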


2020 ◽  
Author(s):  
Sarah W. Curtis ◽  
Daniel Chang ◽  
Myoung Keun Lee ◽  
John R. Shaffer ◽  
Karlijne Indencleef ◽  
...  

Abstract Nonsyndromic orofacial clefts (OFCs) are the most common craniofacial birth defect in humans and, like many complex traits, OFCs are phenotypically and etiologically heterogeneous. The phenotypic heterogeneity of OFCs extends beyond the structures affected by the cleft (e.g., cleft lip (CL) and cleft lip and palate (CLP)) to other features, such as the severity of the cleft. Here, we focus on bilateral and unilateral clefts as one dimension of OFC severity. Unilateral clefts are more frequent than bilateral clefts for both CL and CLP, but the genetic architecture of these subtypes is not well understood, and it is not known whether genetic variants predispose to the formation of one subtype over another. Therefore, we tested for subtype-specific genetic associations in 44 bilateral CL (BCL) cases, 434 unilateral CL (UCL) cases, 530 bilateral CLP (BCLP) cases, 1123 unilateral CLP (UCLP) cases, and unrelated controls (N = 1626), using the mixed-model approach implemented in GENESIS. While no novel loci were found in subtype-specific analyses comparing cases to controls, the genetic architecture of UCL was distinct from that of BCL, with 43.8% of suggestive loci (p < 1.0 × 10⁻⁵) having non-overlapping confidence intervals between the two subtypes. To further understand the genetic risk factors for severity differences, we then performed a genome-wide scan for modifiers using a similar mixed-model approach and found one genome-wide significant modifier locus on 20p11 (p = 7.53 × 10⁻⁹), 300 kb downstream of PAX1, associated with higher odds of BCL compared to UCL. This locus also replicated in an independent cohort (p = 0.0018) and showed no effect in BCLP (p > 0.05). We further found that SNPs at this locus were associated with normal human nasal shape. Taken together, these results suggest that bilateral and unilateral clefts may differ in their genetic architecture, especially between CL and CLP.
Moreover, our results suggest BCL, the rarest form of OFC, may be genetically distinct from the other OFC subtypes. This expands our understanding of genetic modifiers for subtypes of OFCs and further elucidates the genetic mechanisms behind the phenotypic heterogeneity in OFCs.
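The comparison of confidence intervals between UCL and BCL can be illustrated with a short sketch. It assumes 95% Wald intervals on log odds ratios, which the abstract does not state. The effect estimates below are hypothetical and are not values from the study.

```python
Z_95 = 1.959963984540054  # two-sided 95% standard normal quantile


def wald_ci(beta, se, z=Z_95):
    """Wald confidence interval (lower, upper) for a log odds ratio."""
    return beta - z * se, beta + z * se


def cis_overlap(ci_a, ci_b):
    """True if two closed intervals share at least one point."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]


def fraction_non_overlapping(loci):
    """loci: list of ((beta_ucl, se_ucl), (beta_bcl, se_bcl)) pairs."""
    if not loci:
        return 0.0
    distinct = sum(1 for ucl, bcl in loci
                   if not cis_overlap(wald_ci(*ucl), wald_ci(*bcl)))
    return distinct / len(loci)


# Two hypothetical suggestive loci: the first differs between subtypes.
loci = [((0.40, 0.05), (0.05, 0.10)),
        ((0.30, 0.08), (0.25, 0.20))]
print(fraction_non_overlapping(loci))  # 0.5
```

Non-overlapping intervals are a conservative criterion for a difference between subtypes. Two estimates can differ significantly even when their 95% intervals overlap slightly.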


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Morteza Bitaraf Sani ◽  
Javad Zare Harofte ◽  
Mohammad Hossein Banabazi ◽  
Saeid Esmaeilkhanian ◽  
Ali Shafei Naderi ◽  
...  

Abstract For thousands of years, camels have produced meat, milk, and fiber in harsh desert conditions. For sustainable development of protein resources from desert areas, attention must be paid to genetic improvement in camel breeding. Using the genotyping-by-sequencing (GBS) method, we produced over 14,500 genome-wide markers to conduct a genome-wide association study (GWAS) of birth weight, daily gain, and body weight in 96 dromedaries from the Iranian central desert. A total of 99 SNPs were associated with birth weight, daily gain, and body weight (p-value < 0.002). Genomic estimated breeding values (GEBVs) were estimated with the BGLR package using (i) all 14,522 SNPs and (ii) the 99 SNPs identified by GWAS. Twenty-eight SNPs were associated with birth weight, daily gain, and body weight (p-value < 0.001). Annotation of the genomic regions within ± 100 kb of the associated SNPs enabled the prediction of 36 candidate genes. The accuracy of GEBVs based on all 14,522 SNPs was more than 0.65, but the regression coefficients for birth weight, daily gain, and body weight were 0.39, 0.20, and 0.23, respectively. Because of the low sample size, GEBVs were also predicted using the associated SNPs from GWAS. The accuracy of GEBVs based on the 99 associated SNPs was 0.62, 0.82, and 0.57 for birth weight, daily gain, and body weight, respectively. This report is the first GWAS using GBS in dromedary camels, and it identifies markers associated with growth traits that could help plan breeding programs for genetic improvement. Further research with larger sample sizes, collaboration with camel farmers, and a deeper understanding of these traits will permit verification of the associated SNPs identified in this project. These preliminary results show that genomic selection could be an appropriate route to genetic improvement of body weight in dromedary camels, which is challenging because of the long generation interval, seasonal reproduction, and lack of records and pedigrees.
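GEBV accuracy and the regression coefficient are often computed in two ways. Accuracy is the correlation between GEBVs and observed values. The regression coefficient is the slope obtained by regressing observed values on GEBVs, and a slope well below 1 suggests that the GEBVs are over-dispersed. The sketch below assumes these definitions, which the abstract does not state, and uses made-up numbers. It does not reproduce the BGLR analysis.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def slope(y, x):
    """OLS slope b in y = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def gebv_accuracy_and_bias(observed, gebv):
    """(accuracy, regression coefficient); a coefficient of 1 means no bias."""
    return pearson(gebv, observed), slope(observed, gebv)


# Hypothetical validation animals: GEBVs and observed (e.g. corrected) weights.
gebv = [1.0, 2.0, 3.0, 4.0]
observed = [1.0, 3.0, 2.0, 4.0]
acc, b = gebv_accuracy_and_bias(observed, gebv)
print(round(acc, 3), round(b, 3))  # 0.8 0.8
```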


2021 ◽  
Vol 11 (1) ◽  
pp. 59
Author(s):  
Kirsten Voorhies ◽  
Joanne E. Sordillo ◽  
Michael McGeachie ◽  
Elizabeth Ampleford ◽  
Alberta L. Wang ◽  
...  

An unaddressed and important issue is the role that age plays in modulating response to short-acting β2-agonists in individuals with asthma. The objective of this study was to determine whether age modifies the genetic associations of single nucleotide polymorphisms (SNPs) with bronchodilator response (BDR) to β2-agonists. Using three cohorts with a total of 892 subjects, we ran a genome-wide interaction study (GWIS) in each cohort to examine SNP-by-age interactions with BDR. A fixed-effect meta-analysis was used to combine the results. To determine whether previously identified BDR SNPs had an age interaction, we also examined 16 polymorphisms in candidate genes from two published genome-wide association studies (GWAS) of BDR. There were no significant SNP-by-age interactions on BDR at the genome-wide significance level of 5 × 10⁻⁸. At a suggestive significance level of 5 × 10⁻⁶, three interactions, including one for a SNP within PRAG1 (rs4840337), were significant and replicated at a significance level of 0.05. Among candidate genes from two previous GWAS of BDR, three SNPs (rs10476900 (near ADRB2) [p-value = 0.009], rs10827492 (CREM) [p-value = 0.02], and rs72646209 (NCOA3) [p-value = 0.02]) had a marginally significant interaction with age on BDR (p < 0.05). Our results suggest that age may be an important modifier of genetic associations for BDR in asthma.
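A fixed-effect meta-analysis of per-cohort SNP-by-age interaction estimates is commonly done with inverse-variance weighting. The sketch below assumes that weighting and uses a normal approximation for the P-value. The cohort estimates are hypothetical, and the abstract does not state which software or weighting scheme was used.

```python
import math


def fixed_effect_meta(betas, ses):
    """Inverse-variance fixed-effect meta-analysis of per-cohort estimates.

    Returns (pooled beta, pooled SE, z, two-sided normal P-value).
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return beta, se, z, p


# Hypothetical SNP-by-age interaction estimates from three cohorts.
beta, se, z, p = fixed_effect_meta([0.012, 0.020, 0.015], [0.006, 0.009, 0.007])
suggestive = p < 5e-6      # suggestive threshold quoted in the abstract
genome_wide = p < 5e-8     # genome-wide threshold quoted in the abstract
```

Cohorts with smaller standard errors receive proportionally more weight. A fixed-effect model assumes that every cohort estimates the same underlying interaction.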


Author(s):  
Ying Zhang ◽  
Yuxin Song ◽  
Jin Gao ◽  
Hengyu Zhang ◽  
Ning Yang ◽  
...  

Abstract A hierarchical random regression model (Hi-RRM) was extended to genome-wide association analysis of longitudinal data, which significantly reduced the dimensionality of repeated measurements. Hi-RRM first modeled the phenotypic trajectory of each individual using a random regression model (RRM) and then associated the resulting phenotypic regressions with genetic markers using a multivariate linear mixed model (mvLMM). Through spectral decomposition of the genomic relationship and regression covariance matrices, the mvLMM was transformed into a multiple linear regression, which improved computational efficiency while implementing mvLMM associations in efficient mixed-model association expedited (EMMAX). Simulation experiments demonstrated the statistical utility of Hi-RRM compared with existing RRM-based association analyses. The proposed method was also applied to find the quantitative trait nucleotides controlling the growth pattern of egg weights in poultry data.
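The first stage of Hi-RRM summarizes each individual's repeated records by regression coefficients. That stage can be approximated by a per-individual least-squares fit on Legendre polynomials of standardized time, a basis frequently used in random regression models. This is only a sketch under that assumption. The polynomial order, the time scaling and the records are hypothetical, and the mvLMM/EMMAX association stage is not shown.

```python
def legendre_basis(x, order):
    """Legendre polynomials P_0..P_order at x in [-1, 1] (Bonnet recursion)."""
    values = [1.0, x]
    for n in range(1, order):
        values.append(((2 * n + 1) * x * values[n] - n * values[n - 1]) / (n + 1))
    return values[: order + 1]


def standardize(times, t_min, t_max):
    """Map recording times onto [-1, 1], the domain of the Legendre basis."""
    return [2.0 * (t - t_min) / (t_max - t_min) - 1.0 for t in times]


def solve(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_trajectory(times, values, order, t_min, t_max):
    """Least-squares Legendre coefficients for one individual's records."""
    rows = [legendre_basis(x, order) for x in standardize(times, t_min, t_max)]
    k = order + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical egg-weight records of one hen at weeks 0, 5 and 10.
coefs = fit_trajectory([0, 5, 10], [3.0, 0.0, 3.0], order=2, t_min=0, t_max=10)
```

Each individual's coefficient vector, not its raw repeated records, would then serve as the multivariate phenotype in the association stage. This is where the reduction in dimensionality comes from.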


2019 ◽  
Author(s):  
Jan A. Freudenthal ◽  
Markus J. Ankenbrand ◽  
Dominik G. Grimm ◽  
Arthur Korte

Abstract Motivation: Genome-wide association studies (GWAS) are one of the most commonly used methods to detect associations between complex traits and genomic polymorphisms. As both genotyping and phenotyping of large populations have become easier, typical modern GWAS have to cope with massive amounts of data. Thus, the computational demand of these analyses has grown remarkably during the last decades. This is especially true if one wants to implement permutation-based significance thresholds instead of the naïve Bonferroni threshold. Permutation-based methods have the advantage of providing a multiple-testing correction threshold that takes the underlying phenotypic distribution into account, which removes the need to find the correct transformation for non-Gaussian phenotypes. To enable efficient analyses of large datasets and the computation of permutation-based significance thresholds, we used the machine learning framework TensorFlow to develop a linear mixed model (GWAS-Flow) that can use the available CPU or GPU infrastructure to decrease analysis time, especially for large datasets.
Results: We show that GWAS-Flow outperforms custom GWAS scripts in speed without losing accuracy. Apart from p-values, GWAS-Flow also computes summary statistics, such as the effect size and its standard error, for each individual marker. The CPU-based version is the default choice for small data, while the GPU-based version of GWAS-Flow is especially suited for the analysis of big data.
Availability: GWAS-Flow is freely available on GitHub (https://github.com/Joyvalley/GWAS_Flow) and is released under the terms of the MIT License.
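The idea behind a permutation-based threshold can be shown in a few lines of pure Python. Shuffle the phenotype, record the strongest marker statistic in each permutation, and take an upper quantile of those maxima as the genome-wide threshold. This sketch deliberately replaces GWAS-Flow's TensorFlow linear mixed model with a simple |Pearson r| statistic that has no kinship correction, and it uses simulated data. It is not GWAS-Flow's code or API.

```python
import math
import random


def abs_correlations(markers, phenotype):
    """|Pearson r| between each marker (list of 0/1/2 codes) and the phenotype."""
    n = len(phenotype)
    my = sum(phenotype) / n
    syy = sum((y - my) ** 2 for y in phenotype)
    stats = []
    for g in markers:
        mg = sum(g) / n
        sgg = sum((x - mg) ** 2 for x in g)
        if sgg == 0.0 or syy == 0.0:
            stats.append(0.0)             # monomorphic marker: no signal
            continue
        sgy = sum((x - mg) * (y - my) for x, y in zip(g, phenotype))
        stats.append(abs(sgy) / math.sqrt(sgg * syy))
    return stats


def permutation_threshold(markers, phenotype, n_perm=1000, alpha=0.05, seed=1):
    """Genome-wide threshold from the permutation distribution of max |r|."""
    rng = random.Random(seed)
    y = list(phenotype)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(y)                    # break any genotype-phenotype link
        maxima.append(max(abs_correlations(markers, y)))
    maxima.sort(reverse=True)
    k = max(1, int(round(alpha * n_perm)))
    return maxima[k - 1]                  # k-th largest permutation maximum


# Simulated data: 20 markers on 40 individuals; marker 0 drives the trait.
rng = random.Random(0)
markers = [[rng.randint(0, 2) for _ in range(40)] for _ in range(20)]
phenotype = [g + rng.gauss(0.0, 0.1) for g in markers[0]]
threshold = permutation_threshold(markers, phenotype, n_perm=200, seed=3)
observed = abs_correlations(markers, phenotype)
print([j for j, r in enumerate(observed) if r > threshold])
```

The threshold is derived from the maximum over all markers under the null, so it accounts for correlation among markers. A Bonferroni threshold of 0.05/m, by contrast, treats all m tests as independent. Each permutation repeats the full scan, which is why permutation thresholds are computationally expensive.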


2019 ◽  
Vol 36 (12) ◽  
pp. 2890-2905 ◽  
Author(s):  
Christos Vlachos ◽  
Robert Kofler

Abstract Evolve and resequence (E&R) studies are frequently used to dissect the genetic basis of quantitative traits. By subjecting a population to truncating selection for several generations and estimating the allele frequency differences between selected and nonselected populations using next-generation sequencing (NGS), one may identify the loci contributing to the selected trait. The role of parameters such as the population size or the number of replicate populations has been examined in previous work. However, the influence of the selection regime, that is, the strength of truncating selection during the experiment, remains little explored. Using whole-genome, individual-based forward simulations of E&R studies, we found that the power to identify the causative alleles may be maximized by gradually increasing the strength of truncating selection during the experiment. Notably, such an optimal selection regime comes at little or no additional cost in terms of sequencing effort and experimental time. Interestingly, we also found that a selection regime that optimizes the power to identify the causative loci is not necessarily identical to a regime that maximizes the phenotypic response. Finally, our simulations suggest that an E&R study with an optimized selection regime may have higher power to identify the genetic basis of quantitative traits than a genome-wide association study, highlighting that E&R is a powerful approach for finding the loci underlying complex traits.
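Gradually increasing the strength of truncating selection amounts to shrinking the fraction of individuals kept as parents in each generation. The sketch below shows truncation selection together with one such schedule. The linear schedule and its end points are our assumptions, since the abstract does not give the schedule used, and no genome or allele-frequency simulation is included.

```python
def truncation_select(phenotypes, fraction):
    """Indices (ascending) of the top `fraction` of individuals by phenotype."""
    n_keep = max(1, round(len(phenotypes) * fraction))
    ranked = sorted(range(len(phenotypes)), key=lambda i: phenotypes[i], reverse=True)
    return sorted(ranked[:n_keep])


def increasing_strength_schedule(n_generations, start_fraction, end_fraction):
    """Hypothetical linear schedule of selected fractions per generation.

    A smaller fraction kept means stronger truncating selection, so a
    schedule that shrinks over time strengthens selection gradually.
    """
    if n_generations == 1:
        return [end_fraction]
    step = (end_fraction - start_fraction) / (n_generations - 1)
    return [start_fraction + g * step for g in range(n_generations)]


# Keep 80% of individuals in generation 1, falling to 20% by generation 4.
schedule = increasing_strength_schedule(4, 0.8, 0.2)
selected = truncation_select([5.0, 1.0, 9.0, 3.0], 0.5)  # [0, 2]
```

In a forward simulation, each generation would call `truncation_select` with the next fraction from the schedule and breed the next generation from the selected individuals.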


Blood ◽  
2008 ◽  
Vol 112 (11) ◽  
pp. 3788-3788
Author(s):  
Liliana H Mochmann ◽  
Konrad Neumann ◽  
Juliane Bock ◽  
Jutta Ortiz Tanchez ◽  
Arend Bohne ◽  
...  

Abstract The ETS-related gene, ERG, encodes a transcription factor with a vital role in hematopoiesis. Recent findings in ERG knockout mice have shown that a minimum of one functional allele is required to ensure embryonic blood development and adult stem cell maintenance. Moreover, enforced expression of ERG was earlier reported to induce oncogenic transformation in 3T3 cells. Overexpression of ERG, observed in a subset of patients with acute T-lymphoblastic or acute myeloid leukemia, was associated with an inferior outcome. However, how ERG contributes to this unfavourable phenotype has yet to be determined, as the downstream targets of ERG in leukemia remain unknown. Here, we conducted a genome-wide analysis of ERG target genes in T-lymphoblastic leukemia. Chromatin immunoprecipitation-on-chip (ChIP-on-chip) analyses were performed using two ERG-specific antibodies to enrich ERG-bound DNA templates in T-lymphoblastic leukemia cells (Jurkat), with input DNA or IgG-precipitated DNA as controls. Enriched DNA templates and control DNA were differentially labelled and co-hybridized to high-resolution promoter arrays carrying 770,000 probes of 50–75 nucleotides (50–75mers) representing 29,000 annotated human transcripts (NimbleGen). Based on two independent ChIP-on-chip assays, bioinformatic analysis (ACME) yielded statistically significant enriched peaks (using a sliding window of 1000 bp and a P-value < 0.0001), identifying the promoter regions of 365 potential ERG target genes. These genes were clustered by functional annotation using the DAVID database, and genes related to leukemia were then selected for quantitative PCR validation. Promoter primers were designed to include the highly conserved ETS GGAA DNA-binding site.
Genes with greater than two-fold enrichment (ERG ChIP versus control) included WNT2 (17-fold), OLIG2 (14-fold), WNT11 (7-fold), CCND1 (5-fold), WNT9A (4-fold), CD7 (3-fold), EPO (3-fold), ERBB4 (3-fold), RPBJL (3-fold), TRADD (3-fold), PIWIL1 (2-fold), TNFRSF25 (2-fold), TWIST1 (2-fold), and HDAC4 (2-fold). Interestingly, the enriched target genes involved in developmental processes (WNT2, WNT9A, WNT11, TWIST1, PIWIL1, ERBB4, and OLIG2) have shown oncogenic potential when mutated or overexpressed. We therefore hypothesize that overexpression of ERG may contribute to T-cell leukemogenesis by deregulating these oncogenic targets. Further characterization of ERG-directed downstream pathways may contribute to the design of specific treatment strategies (such as WNT inhibitors) that are particularly effective in ERG-deregulated leukemia.
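The sliding-window peak detection mentioned above can be illustrated with a simplified test. For each probe, count how many probes within a 1000 bp window have a ChIP/control log-ratio above a genome-wide quantile, then compute an exact binomial upper-tail P-value for that count. This is not ACME's implementation. The quantile, the binomial test and the probe data are assumptions made for the sketch.

```python
import math


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


def sliding_window_enrichment(positions, log_ratios, window_bp=1000, quantile=0.9):
    """Return (position, P-value) for a window centred on every probe.

    A probe is 'high' if its ChIP/control log-ratio exceeds the given
    genome-wide quantile; each window is tested for an excess of high probes.
    """
    ranked = sorted(log_ratios)
    cutoff = ranked[int(quantile * (len(ranked) - 1))]
    high = [r > cutoff for r in log_ratios]
    p_high = sum(high) / len(high)          # background rate of high probes
    half = window_bp / 2.0
    results = []
    for centre in positions:
        in_window = [h for q, h in zip(positions, high) if abs(q - centre) <= half]
        results.append((centre, binomial_upper_tail(sum(in_window), len(in_window), p_high)))
    return results


# Hypothetical probes every 100 bp with one cluster of high log-ratios.
positions = list(range(0, 4000, 100))
log_ratios = [0.001 * i for i in range(40)]
for idx, value in zip(range(20, 24), [3.0, 3.1, 3.2, 3.3]):
    log_ratios[idx] = value
results = sliding_window_enrichment(positions, log_ratios)
best_pos, best_p = min(results, key=lambda r: r[1])
```

On a real promoter array with hundreds of thousands of probes, the same per-window test would be followed by a stringent cut-off such as the P < 0.0001 quoted above. Adjacent significant windows would then be merged into peaks.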


2014 ◽  
Vol 32 (3_suppl) ◽  
pp. 42-42
Author(s):  
Eric Morgen ◽  
Xiaowei Shen ◽  
Thomas L. Vaughan ◽  
David Whiteman ◽  
Anna H. Wu ◽  
...  

42 Background: Methods of stratifying esophageal adenocarcinoma (EAC) patients into prognostic groups are needed, as are new insights into the genetic determinants of disease behaviour. Prognosis is likely to have non-negligible genetic influences, mediated by host responses to the tumor, resistance to therapeutic side effects, and/or an influence on tumor development. Prior studies have used candidate-gene approaches. We instead used an unbiased, genome-wide approach and novel analytic methods that may be better able to detect multi-gene interactions, which may account for the majority of genetic effects on many clinical phenotypes. Methods: Germline DNA from a Toronto-based cohort of EAC patients (n = 270) was analyzed on the Omni1 Quad microarray as part of the BEAGESS initiative. Quality control and analysis were performed using the PLINK, R, and GenABEL software packages. A Cox proportional hazards (CPH) model for progression-free survival tested each polymorphism for independent effects at a genome-wide significance level of P < 1E-07, adjusting for population stratification. Because classical analysis has limited ability to detect gene-gene interactions, a Random Survival Forest algorithm was used to detect effects based on complex interactions among the top 1,000 polymorphisms ranked by p-value. Results: After data cleaning and standard GWAS quality control procedures, 735,309 SNPs and 245 patients remained for analysis. The CPH model, adjusted for population stratification, produced a satisfactory Q-Q plot and showed one SNP (rs7844673, Chr 8) that was significant at p = 7.8E-8. In addition, Random Forest-based variable selection produced a set of 20 polymorphisms that (1) reproduced 86% of the predictive ability of the full 1,000 variables and (2) included the third-ranked polymorphism from CPH modeling (rs9290822, Chr 3), located upstream of the IGF2BP2 gene.
Conclusions: This genome-wide approach identified two previously undescribed SNPs with a potential influence on EAC prognosis through a combination of independent and interactive effects. Validation in an independent cohort is currently being pursued.
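A "satisfactory Q-Q plot" is often summarized numerically by the genomic inflation factor λGC. It is the median observed 1-df chi-square statistic divided by its expected median, which is about 0.455. The abstract does not report λGC. The sketch below uses Python's `statistics.NormalDist` only to show how λGC could be computed from per-SNP P-values, here simulated under the null.

```python
from statistics import NormalDist, median


def genomic_inflation(pvalues):
    """Genomic inflation factor lambda_GC from two-sided 1-df P-values in (0, 1]."""
    nd = NormalDist()
    # chi-square(1) statistic equivalent to each P-value: z^2 with z = Phi^-1(p/2)
    chisq = [nd.inv_cdf(p / 2.0) ** 2 for p in pvalues]
    expected_median = nd.inv_cdf(0.75) ** 2   # median of chi-square(1), ~0.455
    return median(chisq) / expected_median


# Evenly spaced P-values mimic a perfectly calibrated null scan.
null = [(i - 0.5) / 1001 for i in range(1, 1002)]
lam = genomic_inflation(null)   # ≈ 1.0 under the null
```

Values of λGC well above 1 usually indicate residual population stratification or other confounding. Adjusting for stratification, as done in the CPH model, is intended to bring λGC close to 1.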

