The Current and Future Use of Ridge Regression for Prediction in Quantitative Genetics

2015 ◽  
Vol 2015 ◽  
pp. 1-18 ◽  
Author(s):  
Ronald de Vlaming ◽  
Patrick J. F. Groenen

In recent years, there has been a considerable amount of research on the use of regularization methods for inference and prediction in quantitative genetics. Such research mostly focuses on selection of markers and shrinkage of their effects. In this review paper, the use of ridge regression for prediction in quantitative genetics using single-nucleotide polymorphism (SNP) data is discussed. In particular, we consider (i) the theoretical foundations of ridge regression, (ii) its link to commonly used methods in animal breeding, (iii) the computational feasibility, and (iv) the scope for constructing prediction models with nonlinear effects (e.g., dominance and epistasis). Based on a simulation study we gauge the current and future potential of ridge regression for prediction of human traits using genome-wide SNP data. We conclude that, for outcomes with a relatively simple genetic architecture, given current sample sizes in most cohorts (i.e., N < 10,000), the predictive accuracy of ridge regression is slightly higher than that of the classical genome-wide association study approach of repeated simple regression (i.e., one regression per SNP). However, both capture only a small proportion of the heritability. Nevertheless, we find evidence that large-scale initiatives, such as biobanks, can achieve sample sizes at which ridge regression improves predictive accuracy substantially over the classical approach.
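
The contrast drawn in this abstract between ridge regression and repeated simple regression can be illustrated with a small simulation. The following is a minimal sketch, not the authors' simulation design: the function names, the sample size, the number of SNPs, the heritability value, and the choice of penalty (noise variance divided by per-SNP effect variance) are all assumptions made for illustration.

```python
import numpy as np

def ridge_effects(X, y, lam):
    """Joint ridge estimate of all SNP effects: solve (X'X + lam*I) b = X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def marginal_effects(X, y):
    """Repeated simple regression: one no-intercept least-squares slope per SNP.

    Assumes the columns of X and the outcome y are (at least roughly) centred.
    """
    return (X * y[:, None]).sum(axis=0) / (X ** 2).sum(axis=0)

def simulate(n, p, h2, rng):
    """Toy data (hypothetical setup): p independent SNPs, each with a small normal effect."""
    G = rng.binomial(2, 0.5, size=(n, p)).astype(float)
    X = (G - G.mean(axis=0)) / G.std(axis=0)        # standardise each SNP
    beta = rng.normal(0.0, np.sqrt(h2 / p), size=p)
    y = X @ beta + rng.normal(0.0, np.sqrt(1.0 - h2), size=n)
    return X, y - y.mean()

def holdout_r2(y, score):
    """Squared correlation between the outcome and a polygenic score."""
    return np.corrcoef(y, score)[0, 1] ** 2

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, p, h2 = 800, 1000, 0.5                       # illustrative values only
    X, y = simulate(n, p, h2, rng)
    # Centring uses the full sample, so the training rows are only roughly centred.
    train, test = slice(0, 600), slice(600, n)
    lam = p * (1.0 - h2) / h2                       # assumed penalty: noise var / effect var
    b_ridge = ridge_effects(X[train], y[train], lam)
    b_marg = marginal_effects(X[train], y[train])
    print("ridge    R2:", holdout_r2(y[test], X[test] @ b_ridge))
    print("marginal R2:", holdout_r2(y[test], X[test] @ b_marg))
```

Both estimators are scored on held-out individuals, which is the comparison the abstract describes. The simulated SNPs here are independent, so the sketch does not reproduce the linkage disequilibrium found in real genome-wide data.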

2021 ◽  
Vol 12 ◽  
Author(s):  
Jin Zhang ◽  
Min Chen ◽  
Yangjun Wen ◽  
Yin Zhang ◽  
Yunan Lu ◽  
...  

The mixed linear model (MLM) has been widely used in genome-wide association studies (GWAS) to dissect quantitative traits in human, animal, and plant genetics. Most methodologies treat all single nucleotide polymorphism (SNP) effects as random effects under the MLM framework and therefore fail to detect the joint minor effects of multiple genetic markers on a trait. As a result, polygenes with minor effects remain largely unexplored in today’s big data era. In this study, we developed a new algorithm under the MLM framework, called the fast multi-locus ridge regression (FastRR) algorithm. The FastRR algorithm first whitens the covariance matrix of the polygenic matrix K and environmental noise, then selects potentially related SNPs, those that are highly correlated with the target trait, from among the large-scale markers, and finally analyzes the selected subset using a multi-locus deshrinking ridge regression for true quantitative trait nucleotide (QTN) detection. Results from the analyses of both simulated and real data show that the FastRR algorithm is more powerful for both large and small QTN detection, is more accurate in QTN effect estimation, and gives more stable results under various polygenic backgrounds. Moreover, compared with existing methods, the FastRR algorithm has the advantage of high computing speed. In conclusion, the FastRR algorithm provides an alternative for multi-locus GWAS in high-dimensional genomic datasets.
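
The three stages named in this abstract (whitening, correlation screening, ridge regression on the selected subset) can be sketched generically as follows. This is not the FastRR implementation: the function names, the assumption that the variance ratio `h2` is known in advance, the fixed penalty `lam`, and the use of plain ridge regression in place of the authors' deshrinking step are all simplifications introduced here for illustration.

```python
import numpy as np

def whiten(K, h2):
    """Return W such that W V W' = I, where V = h2*K + (1-h2)*I.

    K is a kinship-like positive semi-definite matrix; h2 (assumed known,
    strictly below 1) is the share of variance attributed to the polygenic term.
    """
    vals, vecs = np.linalg.eigh(K)
    d = h2 * vals + (1.0 - h2)
    return (vecs / np.sqrt(d)).T                    # W = D^{-1/2} U'

def screen(X, y, k):
    """Indices of the k columns with the largest absolute correlation with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc)
    r = np.abs(Xc.T @ yc) / np.where(denom > 0, denom, np.inf)
    return np.argsort(r)[::-1][:k]

def ridge(X, y, lam):
    """Plain ridge estimate (no deshrinking): solve (X'X + lam*I) b = X'y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def fastrr_like(X, y, K, h2=0.5, k=50, lam=1.0):
    """Whiten, screen, then fit ridge on the screened markers (illustrative only)."""
    W = whiten(K, h2)
    Xw, yw = W @ X, W @ y                           # decorrelate the residual structure
    keep = screen(Xw, yw, k)
    effects = ridge(Xw[:, keep], yw, lam)
    return keep, effects
```

A call such as `keep, effects = fastrr_like(X, y, K, k=20)` returns the indices of the screened markers and their joint ridge effect estimates. Intercepts and covariates are omitted, so `X` and `y` are assumed to be centred beforehand.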


2020 ◽  
Vol 26 (33) ◽  
pp. 4195-4205
Author(s):  
Xiaoyu Ding ◽  
Chen Cui ◽  
Dingyan Wang ◽  
Jihui Zhao ◽  
Mingyue Zheng ◽  
...  

Background: Enhancing a compound’s biological activity is the central task for lead optimization in small-molecule drug discovery. However, it is laborious to perform many iterative rounds of compound synthesis and bioactivity tests. To address this issue, there is strong demand for high-quality in silico bioactivity prediction approaches that prioritize more active compound derivatives and reduce trial and error. Methods: Two kinds of bioactivity prediction models based on a large-scale structure-activity relationship (SAR) database were constructed. The first is based on the similarity of substituents and realized by matched molecular pair analysis, including SA, SA_BR, SR, and SR_BR. The second is based on SAR transferability and realized by matched molecular series analysis, including Single MMS pair, Full MMS series, and Multi single MMS pairs. Moreover, we also defined the application domain of the models by using a distance-based threshold. Results: Among the seven individual models, the Multi single MMS pairs bioactivity prediction model showed the best performance (R² = 0.828, MAE = 0.406, RMSE = 0.591), and the baseline model (SA) produced the lowest prediction accuracy (R² = 0.798, MAE = 0.446, RMSE = 0.637). The predictive accuracy could be further improved by consensus modeling (R² = 0.842, MAE = 0.397, RMSE = 0.563). Conclusion: An accurate prediction model for bioactivity was built with a consensus method, which was superior to all individual models. Our model should be a valuable tool for lead optimization.
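
For readers unfamiliar with the three reported metrics and with consensus modeling, the following is a minimal sketch of how they are commonly computed. It makes two assumptions that the abstract does not confirm: that the first metric is the coefficient of determination rather than a squared correlation, and that the consensus is an unweighted mean of the individual models' predictions. The activity values are made-up toy numbers.

```python
from math import sqrt

def mae(y, yhat):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, yhat)) / len(y)

def rmse(y, yhat):
    """Root mean squared error."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y))

def r2(y, yhat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

def consensus(predictions):
    """Unweighted mean across models (assumed scheme), one value per compound."""
    return [sum(col) / len(col) for col in zip(*predictions)]

# Toy example: two hypothetical models whose errors happen to cancel.
observed = [1.0, 2.0, 3.0, 4.0]
model_a = [1.5, 2.5, 2.5, 3.5]
model_b = [0.5, 1.5, 3.5, 4.5]
combined = consensus([model_a, model_b])
print(mae(observed, model_a), rmse(observed, model_a), r2(observed, model_a))
print(mae(observed, combined), rmse(observed, combined), r2(observed, combined))
```

Averaging helps when the individual models make partly independent errors, which is one plausible reason a consensus can outperform each of its members.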


2021 ◽  
Vol 4 (1) ◽  
Author(s):  
Daniel J. Panyard ◽  
Kyeong Mo Kim ◽  
Burcu F. Darst ◽  
Yuetiva K. Deming ◽  
Xiaoyuan Zhong ◽  
...  

The study of metabolomics and disease has enabled the discovery of new risk factors, diagnostic markers, and drug targets. For neurological and psychiatric phenotypes, the cerebrospinal fluid (CSF) is of particular importance. However, the CSF metabolome is difficult to study on a large scale due to the relative complexity of the procedure needed to collect the fluid. Here, we present a metabolome-wide association study (MWAS), which uses genetic and metabolomic data to impute metabolites into large samples with genome-wide association summary statistics. We conduct a metabolome-wide, genome-wide association analysis with 338 CSF metabolites, identifying 16 genotype-metabolite associations (metabolite quantitative trait loci, or mQTLs). We then build prediction models for all available CSF metabolites and test for associations with 27 neurological and psychiatric phenotypes, identifying 19 significant CSF metabolite-phenotype associations. Our results demonstrate the feasibility of MWAS to study omic data in scarce sample types.


PLoS ONE ◽  
2017 ◽  
Vol 12 (1) ◽  
pp. e0167742 ◽  
Author(s):  
Paul S. de Vries ◽  
Maria Sabater-Lleal ◽  
Daniel I. Chasman ◽  
Stella Trompet ◽  
Tarunveer S. Ahluwalia ◽  
...  

2021 ◽  
Author(s):  
Alexandra Ficht ◽  
Robert W. Bruce ◽  
Davoud Torkamaneh ◽  
Christopher Grainger ◽  
Milad Eskandari ◽  
...  

Soybean (Glycine max (L.) Merr.) is a crop of global importance for both human and animal consumption, which was domesticated in China more than 6000 years ago. Soybean researchers have expressed concern about the loss of genetic diversity resulting from decades of breeding. In order to develop new cultivars, it is critical for breeders to understand the genetic variability present for traits of interest in their program germplasm. Sucrose concentration is becoming an increasingly important trait for the production of soy-food products. The objective of this study was to use a genome-wide association study (GWAS) to identify putative QTL for sucrose concentration in soybean seed. A GWAS panel consisting of 266 historic and current soybean accessions was genotyped with 76k genotyping-by-sequencing (GBS) SNPs and phenotyped in four field locations in Ontario (Canada) from 2015 to 2017. Seven putative QTL were identified on chromosomes 1, 6, 8, 9, 10, 13 and 14. A key gene related to sucrose synthase (Glyma.06g182700) was found to be associated with the QTL found on chromosome 6. This information will facilitate efforts to increase the available genetic variability for sucrose concentration in soybean breeding programs and develop new and improved high-sucrose soybean cultivars suitable for the soy-food industry.


2020 ◽  
Author(s):  
Youwen Qin ◽  
Aki S Havulinna ◽  
Yang Liu ◽  
Pekka Jousilahti ◽  
Scott C Ritchie ◽  
...  

Co-evolution between humans and the microbial communities colonizing them has resulted in an intimate assembly of thousands of microbial species mutualistically living on and in their body and impacting multiple aspects of host physiology and health. Several studies examining whether human genetic variation can affect gut microbiota suggest a complex combination of environmental and host factors. Here, we leverage a single large-scale population-based cohort of 5,959 genotyped individuals with matched gut microbial shotgun metagenomes, dietary information and health records up to 16 years post-sampling, to characterize human genetic variations associated with microbial abundances, and predict possible causal links with various diseases using Mendelian randomization (MR). A genome-wide association study (GWAS) identified 583 independent SNP-taxon associations at genome-wide significance (p < 5.0×10⁻⁸), which included notably strong associations with LCT (p = 5.02×10⁻³⁵), ABO (p = 1.1×10⁻¹²), and MED13L (p = 1.84×10⁻¹²). A combination of genetics and dietary habits was shown to strongly shape the abundances of certain key bacterial members of the gut microbiota, and to explain their genetic association. Genetic effects from the LCT locus on Bifidobacterium and three other associated taxa significantly differed according to dairy intake. Variation in the abundance of mucin-degrading Faecalicatena lactaris was associated with ABO, highlighting a preferential utilization of secreted A/B/AB-antigens as an energy source in the gut, irrespective of fibre intake. Enterococcus faecalis levels showed a robust association with a variant in MED13L, with putative links to colorectal cancer. Finally, we identified putative causal relationships between gut microbes and complex diseases using MR, with a predicted effect of Morganella on major depressive disorder that was consistent with observational incident disease analysis.
Overall, we present striking examples of the intricate relationship between humans and their gut microbial communities, and highlight important health implications.

