Optimizing the identification of causal variants across varying genetic architectures in crops

2018 ◽  
Author(s):  
Chenyong Miao ◽  
Jinliang Yang ◽  
James C. Schnable

Abstract. Background: Association studies use statistical links between genetic markers and variation in a phenotype’s value across many individuals to identify genes controlling variation in the target phenotype. However, this approach, particularly when conducted on a genome-wide scale (GWAS), has limited power to identify the genes responsible for variation in traits controlled by complex genetic architectures. Results: Here we employ simulation studies utilizing real-world genotype datasets from association populations in four species with distinct minor allele frequency distributions, population structures, and patterns of linkage disequilibrium to evaluate the impact of variation in both heritability and trait complexity on both conventional mixed linear model based GWAS and two new approaches specifically developed for complex traits. Mixed linear model based GWAS rapidly loses power for more complex traits. FarmCPU, a method based on multi-locus mixed linear models, provides the greatest statistical power for moderately complex traits. A Bayesian approach adopted from genomic prediction provides the greatest statistical power to identify causal genetic loci for extremely complex traits. Conclusions: Using estimates of the complexity of the genetic architecture of target traits can guide the selection of appropriate statistical methods and improve the overall accuracy and power of GWAS.

2020 ◽  
Author(s):  
Jiabo Wang ◽  
Zhiwu Zhang

Abstract. Genome-Wide Association Study (GWAS) and Genomic Prediction/Selection (GP/GS) are the two essential enterprises in genomic research. Due to the great magnitude and complexity of genomic data, analytical methods and their associated software packages are frequently advanced. GAPIT is a widely used Genomic Association and Prediction Integrated Tool. The first version was released to the public in 2012 with the implementation of the general linear model (GLM), mixed linear model (MLM), compressed MLM, and genomic Best Linear Unbiased Prediction (gBLUP). The second version was released in 2016 with several new implementations, including Enriched Compressed MLM and Settlement of mixed linear models Under Progressively Exclusive Relationship (SUPER). All the GWAS methods in these earlier versions are based on the single-locus test. For the first time, the current release, GAPIT version 3, implements three multiple-locus test methods: Multiple Loci Mixed Model (MLMM), Fixed and random model Circulating Probability Unification (FarmCPU), and Bayesian-information and Linkage-disequilibrium Iteratively Nested Keyway (BLINK). Additionally, two GP/GS methods were implemented: compressed BLUP, based on Compressed MLM, and SUPER BLUP, based on SUPER. These new implementations not only boost statistical power for GWAS and prediction accuracy for GP/GS, but also improve computing speed and increase the capacity to analyze big genomic data. Here, we document the current upgrade of GAPIT by describing the selection of the recently developed methods, their implementation, and potential impact. All documents, including source code, user manual, demo data, and tutorials, are freely available at the GAPIT website (http://zzlab.net/GAPIT).


2010 ◽  
Vol 42 (4) ◽  
pp. 355-360 ◽  
Author(s):  
Zhiwu Zhang ◽  
Elhan Ersoz ◽  
Chao-Qiang Lai ◽  
Rory J Todhunter ◽  
Hemant K Tiwari ◽  
...  

2004 ◽  
Vol 61 (1) ◽  
pp. 122-133 ◽  
Author(s):  
Yan Jiao ◽  
Yong Chen ◽  
David Schneider ◽  
Joe Wroblewski

Stock–recruitment (S–R) models are commonly fitted to S–R data with a least-squares method. Errors in modeling are usually assumed to be normal or lognormal, regardless of whether such an assumption is realistic. A Monte Carlo simulation approach was used to evaluate the impact of the assumption of error structure on S–R modeling. The generalized linear model, which can readily deal with different error structures, was used in estimating parameters. This study suggests that the quality of S–R parameter estimation, measured by estimation errors, can be influenced by the realism of error structure assumed in an estimation, the number of S–R data points, and the number of outliers in modeling. A small number of S–R data points and the presence of outliers in S–R data could increase the difficulty in identifying an appropriate error structure in modeling, which might lead to large biases in the S–R parameter estimation. This study shows that generalized linear model methods can help identify an appropriate error distribution in S–R modeling, leading to an improved estimation of parameters even when there are outliers and the number of S–R data points is small. We recommend the generalized linear model be used for quantifying stock–recruitment relationships.


2009 ◽  
Vol 296 (5) ◽  
pp. L713-L725 ◽  
Author(s):  
Li Gao ◽  
Kathleen C. Barnes

It has been well established that acute lung injury (ALI), and the more severe presentation of acute respiratory distress syndrome (ARDS), constitute complex traits characterized by a multigenic and multifactorial etiology. Identification and validation of genetic variants contributing to disease susceptibility and severity has been hampered by the profound heterogeneity of the clinical phenotype and the role of environmental factors, including treatment, on outcome. The critical nature of ALI and ARDS, compounded by the impact of phenotypic heterogeneity, has rendered the amassing of sufficiently powered studies especially challenging. Nevertheless, progress has been made in the identification of genetic variants in select candidate genes, which has enhanced our understanding of the specific pathways involved in disease manifestation. Identification of novel candidate genes for which genetic association studies have confirmed a role in disease has been greatly aided by the powerful tool of high-throughput expression profiling. This article will review these studies to date, summarizing candidate genes associated with ALI and ARDS, acknowledging those that have been replicated in independent populations, with a special focus on the specific pathways for which candidate genes identified so far can be clustered.


Genome ◽  
2010 ◽  
Vol 53 (11) ◽  
pp. 876-883 ◽  
Author(s):  
Ben Hayes ◽  
Mike Goddard

Results from genome-wide association studies in livestock and humans have led to the conclusion that the effects of individual quantitative trait loci (QTL) on complex traits, such as yield, are likely to be small; therefore, a large number of QTL are necessary to explain genetic variation in these traits. Given this genetic architecture, gains from marker-assisted selection (MAS) programs using only a small number of DNA markers to trace a limited number of QTL are likely to be small. This has led to the development of an alternative technology for using the available dense single nucleotide polymorphism (SNP) information, called genomic selection. Genomic selection uses a genome-wide panel of dense markers so that all QTL are likely to be in linkage disequilibrium with at least one SNP. The genomic breeding values are predicted to be the sum of the effects of these SNPs across the entire genome. In dairy cattle breeding, the accuracy of genomic estimated breeding values (GEBV) that can be achieved and the fact that these are available early in life have led to rapid adoption of the technology. Here, we discuss the design of experiments necessary to achieve accurate prediction of GEBV in future generations in terms of the number of markers necessary and the size of the reference population where marker effects are estimated. We also present a simple method for implementing genomic selection using a genomic relationship matrix. Future challenges discussed include using whole genome sequence data to improve the accuracy of genomic selection and management of inbreeding through genomic relationships.
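
As an illustration of what a genomic relationship matrix looks like in practice, the following is a minimal sketch. It assumes one commonly used construction (marker counts centered by twice the allele frequency and scaled by 2Σp(1−p)); the abstract does not say which construction the authors use, and the function name and the 0/1/2 genotype coding are hypothetical choices made for this example.

```python
def genomic_relationship_matrix(genotypes):
    """Build a marker-based relationship matrix.

    genotypes: list of individuals, each a list of allele counts (0, 1 or 2)
    at the same markers. Assumption: frequencies are estimated from the
    sample itself, which is only one of several possible choices.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Frequency of the counted allele at each marker.
    freqs = [sum(ind[j] for ind in genotypes) / (2 * n) for j in range(m)]
    # Center each genotype by its expected value 2p.
    centered = [[ind[j] - 2 * freqs[j] for j in range(m)] for ind in genotypes]
    # Scale so that the matrix is comparable to a pedigree relationship matrix.
    scale = 2 * sum(p * (1 - p) for p in freqs)
    if scale == 0:
        raise ValueError("all markers are monomorphic; matrix is undefined")
    return [
        [sum(centered[a][j] * centered[b][j] for j in range(m)) / scale
         for b in range(n)]
        for a in range(n)
    ]


# Three individuals genotyped at three markers (toy data, not from the paper).
toy = [[0, 1, 2], [2, 1, 0], [1, 1, 1]]
G = genomic_relationship_matrix(toy)
```

In a genomic BLUP setting a matrix like `G` would take the place of the pedigree-based relationship matrix in the mixed model equations; that step is not shown here.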


2020 ◽  
Author(s):  
Jiawen Chen ◽  
Jing You ◽  
Zijie Zhao ◽  
Zheng Ni ◽  
Kunling Huang ◽  
...  

Abstract. Polygenic risk scores (PRS) derived from summary statistics of genome-wide association studies (GWAS) have enjoyed great popularity in human genetics research. Applied to population cohorts, PRS can effectively stratify individuals by risk group and have promising applications in early diagnosis and clinical intervention. However, our understanding of within-family polygenic risk is incomplete, in part because the small number of samples per family significantly limits power. Here, to address this challenge, we introduce ORIGAMI, a computational framework that uses parental genotype data to simulate offspring genomes. ORIGAMI uses state-of-the-art genetic maps to simulate realistic recombination events on phased parental genomes and allows quantifying the prospective PRS variability within each family. We quantify and showcase the substantially reduced yet highly heterogeneous PRS variation within families for numerous complex traits. Further, we incorporate within-family PRS variability to improve the polygenic transmission disequilibrium test (pTDT). Through simulations, we demonstrate that modeling within-family risk substantially improves the statistical power of pTDT. Applied to 7,805 trios of autism spectrum disorder (ASD) probands and healthy parents, we successfully replicated previously reported over-transmission of ASD, educational attainment, and schizophrenia risk, and identified multiple novel traits with significant transmission disequilibrium. These results provided novel etiologic insights into the shared genetic basis of various complex traits and ASD.
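
The idea of simulating offspring from phased parental haplotypes and then scoring them can be sketched roughly as follows. This is not ORIGAMI's implementation: the function names are hypothetical, and a single uniform per-interval crossover probability stands in for the genetic maps the abstract mentions.

```python
import random


def make_gamete(hap_a, hap_b, crossovers, start_with_a=True):
    """Copy alleles from one parental haplotype, switching at each crossover index."""
    switch_points = set(crossovers)
    use_a = start_with_a
    gamete = []
    for i, (a, b) in enumerate(zip(hap_a, hap_b)):
        if i in switch_points:
            use_a = not use_a
        gamete.append(a if use_a else b)
    return gamete


def polygenic_score(hap1, hap2, weights):
    """Weighted sum of effect-allele counts over both haplotypes."""
    return sum(w * (x + y) for w, x, y in zip(weights, hap1, hap2))


def simulate_offspring_scores(mother, father, weights, n_offspring, rate, seed=0):
    """Simulate offspring scores for one family.

    mother, father: pairs of phased haplotypes (lists of 0/1 alleles).
    rate: assumed crossover probability between adjacent markers
          (a simplification; real genetic maps vary along the genome).
    """
    rng = random.Random(seed)
    n_markers = len(weights)
    scores = []
    for _ in range(n_offspring):
        gametes = []
        for hap_a, hap_b in (mother, father):
            crossovers = [i for i in range(1, n_markers) if rng.random() < rate]
            gametes.append(make_gamete(hap_a, hap_b, crossovers, rng.random() < 0.5))
        scores.append(polygenic_score(gametes[0], gametes[1], weights))
    return scores
```

The spread of the returned scores gives a rough picture of within-family score variability for one parental pair under these simplified assumptions.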


2019 ◽  
Vol 56 (1) ◽  
pp. 45-57
Author(s):  
Iwona Mejza ◽  
Katarzyna Ambroży-Deręgowska ◽  
Jan Bocianowski ◽  
Józef Błażewicz ◽  
Marek Liszewski ◽  
...  

Summary. The main purpose of this study was the model fitting of data deriving from a three-year experiment with barley malt. Two linear models were considered: a fixed linear model with fixed effects of years and other factors, and a mixed linear model with random effects of years and fixed effects of other factors. Two cultivars of brewing barley, Sebastian and Mauritia, six methods of nitrogen fertilization and four germination times were analyzed. Three quantitative traits were observed: practical extractivity of the malt, malting productivity, and a quality coefficient Q. The starting point for the statistical analyses was the available experimental material, which consisted of barley grain samples destined for malting. The analyses were performed over a series of years with respect to fixed or random effects of years. Due to the strong differentiation of the years of the study and some significant interactions of factors with years, annual analyses were also carried out.


2015 ◽  
Author(s):  
Guo-Bo Chen ◽  
Sang Hong Lee ◽  
Matthew R Robinson ◽  
Maciej Trzaskowski ◽  
Zhi-Xiang Zhu ◽  
...  

Genome-wide association studies (GWASs) have been successful in discovering replicable SNP-trait associations for many quantitative traits and common diseases in humans. Typically the effect sizes of SNP alleles are very small and this has led to large genome-wide association meta-analyses (GWAMA) to maximize statistical power. A trend towards ever-larger GWAMA is likely to continue, yet dealing with summary statistics from hundreds of cohorts increases logistical and quality control problems, including unknown sample overlap, and these can lead to both false positive and false negative findings. In this study we propose a new set of metrics and visualization tools for GWAMA, using summary statistics from cohort-level GWASs. We propose a pair of methods for examining the concordance between demographic information and summary statistics. In method I, we use the population genetics Fst statistic to verify the genetic origin of each cohort and their geographic location, and demonstrate using GWAMA data from the GIANT Consortium that geographic locations of cohorts can be recovered and outlier cohorts can be detected. In method II, we conduct principal component analysis based on reported allele frequencies, and are able to recover the ancestral information for each cohort. In addition, we propose a new statistic that uses the reported allelic effect sizes and their standard errors to identify significant sample overlap or heterogeneity between pairs of cohorts. Finally, to quantify unknown sample overlap across all pairs of cohorts we propose a method that uses randomly generated genetic predictors that does not require the sharing of individual-level genotype data and does not breach individual privacy.
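
To make method I more concrete, the sketch below computes a pairwise Fst between two cohorts from their reported allele frequencies. It assumes a basic heterozygosity-based estimator, (H_T − H_S) / H_T summed over SNPs as a ratio of sums, with equal cohort weights; the abstract does not state which Fst estimator the authors use, and the function name is hypothetical.

```python
def fst_from_frequencies(freqs_a, freqs_b):
    """Pairwise Fst from per-SNP allele frequencies reported by two cohorts.

    Assumptions: both lists refer to the same SNPs and the same reference
    allele, cohorts are weighted equally, and no sample-size correction
    is applied.
    """
    numerator = 0.0
    denominator = 0.0
    for p_a, p_b in zip(freqs_a, freqs_b):
        p_bar = (p_a + p_b) / 2
        # Expected heterozygosity in the pooled population.
        h_total = 2 * p_bar * (1 - p_bar)
        # Mean expected heterozygosity within the two cohorts.
        h_within = (2 * p_a * (1 - p_a) + 2 * p_b * (1 - p_b)) / 2
        numerator += h_total - h_within
        denominator += h_total
    # If every SNP is fixed for the same allele there is no variation to partition.
    return numerator / denominator if denominator > 0 else 0.0
```

Cohorts with identical frequencies give a value of 0, and cohorts fixed for opposite alleles give 1; a matrix of such pairwise values is the kind of input from which relative genetic distances between cohorts could be examined.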


Author(s):  
Jinglu Hu ◽  
Kotaro Hirasawa ◽  
Kousuke Kumamaru ◽  

This paper proposes a neurofuzzy approach to fault detection in linear systems. The system diagnosed is described by using a neurofuzzy model called LimNet that consists of a linear model and multiple local linear models with interpolation of a "fuzzy basis function". Fault detection is considered in two cases: when faults occur in the linear model part, a KDI-based robust fault detection is applied, where a multi-local-model part is treated as error due to nonlinear undermodeling; when faults occur in the multi-local-model part, a multi-model based fault detection method is developed, in which the identified LimNet is interpreted as several local ARMAX models, and KDI is used as an index to discriminate between each local model and its reference. This paper concentrates mainly on multi-model based fault detection.


2021 ◽  
Author(s):  
Hector Roux de Bezieux ◽  
Leandro Lima ◽  
Fanny Perraudeau ◽  
Arnaud Mary ◽  
Sandrine Dudoit ◽  
...  

Genome wide association studies (GWAS), aiming to find genetic variants associated with a trait, have widely been used on bacteria to identify genetic determinants of drug resistance or hypervirulence. Recent bacterial GWAS methods usually rely on k-mers, whose presence in a genome can denote variants ranging from single nucleotide polymorphisms to mobile genetic elements. Since many bacterial species include genes that are not shared among all strains, this approach avoids the reliance on a common reference genome. However, the same gene can exist in slightly different versions across different strains, leading to diluted effects when trying to detect its association to a phenotype through k-mer based GWAS. Here we propose to overcome this by testing covariates built from closed connected subgraphs of the De Bruijn graph defined over genomic k-mers. These covariates are able to capture polymorphic genes as a single entity, improving k-mer based GWAS in terms of power and interpretability. As the number of subgraphs is exponential in the number of nodes in the DBG, a method naively testing all possible subgraphs would result in very low statistical power due to multiple testing corrections, and the mere exploration of these subgraphs would quickly become computationally intractable. The concept of testable hypothesis has successfully been used to address both problems in similar contexts. We leverage this concept to test all closed connected subgraphs by proposing a novel enumeration scheme for these objects which fully exploits the pruning opportunity offered by testability, resulting in drastic improvements in computational efficiency. We illustrate this on both real and simulated datasets and also demonstrate how considering subgraphs leads to a more powerful and interpretable method. Our method integrates with existing visual tools to facilitate interpretation. 
We also provide an implementation of our method, as well as code to reproduce all results at https://github.com/HectorRDB/Caldera_Recomb.
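
For readers unfamiliar with the objects involved, here is a small sketch of how a De Bruijn graph over genomic k-mers and the per-strain k-mer presence pattern can be built. It only illustrates the underlying data structure, not the authors' enumeration scheme or testing procedure, and the function names and toy sequences are invented for the example.

```python
from collections import defaultdict


def kmers(sequence, k):
    """Return all overlapping substrings of length k, in order."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def build_de_bruijn(genomes, k):
    """Build a De Bruijn graph and a k-mer presence table.

    genomes: dict mapping a strain name to its sequence.
    Returns (edges, presence) where edges[kmer] is the set of k-mers that
    follow it in at least one genome (overlapping by k-1 characters) and
    presence[kmer] is the set of strains containing that k-mer.
    Simplification: reverse complements are not merged.
    """
    edges = defaultdict(set)
    presence = defaultdict(set)
    for strain, sequence in genomes.items():
        strain_kmers = kmers(sequence, k)
        for kmer in strain_kmers:
            presence[kmer].add(strain)
        for left, right in zip(strain_kmers, strain_kmers[1:]):
            edges[left].add(right)
    return edges, presence


# Two toy strains that differ by a single nucleotide.
toy_genomes = {"strain1": "ACGTA", "strain2": "ACGGA"}
edges, presence = build_de_bruijn(toy_genomes, k=3)
```

In this toy example the shared k-mer `ACG` branches into two strain-specific paths, which is the kind of local variation that a covariate built from a connected subgraph could summarize as a single entity.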

