The Current and Future Use of Ridge Regression for Prediction in Quantitative Genetics

In recent years, there has been a considerable amount of research on the use of regularization methods for inference and prediction in quantitative genetics. Such research mostly focuses on selection of markers and shrinkage of their effects. In this review paper, the use of ridge regression for prediction in quantitative genetics using single-nucleotide polymorphism (SNP) data is discussed. In particular, we consider (i) the theoretical foundations of ridge regression, (ii) its link to commonly used methods in animal breeding, (iii) the computational feasibility, and (iv) the scope for constructing prediction models with nonlinear effects (e.g., dominance and epistasis). Based on a simulation study, we gauge the current and future potential of ridge regression for prediction of human traits using genome-wide SNP data. We conclude that, for outcomes with a relatively simple genetic architecture, given current sample sizes in most cohorts (i.e., N < 10,000), the predictive accuracy of ridge regression is slightly higher than that of the classical genome-wide association study approach of repeated simple regression (i.e., one regression per SNP). However, both capture only a small proportion of the heritability. Nevertheless, we find evidence that for large-scale initiatives, such as biobanks, sample sizes can be achieved where ridge regression improves predictive accuracy substantially compared to the classical approach.
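
The contrast drawn above between ridge regression and repeated simple regression can be illustrated with a minimal sketch. It is not the review's own simulation: the sample size, number of SNPs, allele frequency, effect-size distribution, penalty `lam`, and all function names are assumptions chosen only for illustration.

```python
import numpy as np

def center(X, y):
    """Center genotypes and phenotype so that no intercept is needed."""
    return X - X.mean(axis=0), y - y.mean()

def ridge_effects(X, y, lam):
    """Joint ridge estimates: solve (X'X + lam*I) b = X'y on centered data."""
    Xc, yc = center(X, y)
    p = Xc.shape[1]
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)

def marginal_effects(X, y):
    """Repeated simple regression: b_j = x_j'y / x_j'x_j, one SNP at a time."""
    Xc, yc = center(X, y)
    return (Xc.T @ yc) / (Xc ** 2).sum(axis=0)

def predict(X_new, X_train, y_train, b):
    """Predict phenotypes for new individuals using the training means."""
    return y_train.mean() + (X_new - X_train.mean(axis=0)) @ b

# Toy simulation; every setting below is an illustrative assumption.
rng = np.random.default_rng(0)
n, p = 300, 500
X = rng.binomial(2, 0.3, size=(n, p)).astype(float)  # genotypes coded 0/1/2
beta = rng.normal(0.0, 0.05, size=p)                 # many small effects
y = X @ beta + rng.normal(0.0, 1.0, size=n)

train, test = np.arange(0, 200), np.arange(200, n)
b_ridge = ridge_effects(X[train], y[train], lam=100.0)
b_gwas = marginal_effects(X[train], y[train])

for name, b in [("ridge", b_ridge), ("repeated simple regression", b_gwas)]:
    y_hat = predict(X[test], X[train], y[train], b)
    print(name, "out-of-sample correlation:", np.corrcoef(y_hat, y[test])[0, 1])
```

The ridge estimates are fitted jointly, so each SNP effect is adjusted for the others and shrunk toward zero, whereas the per-SNP estimates ignore correlation between markers. In practice the penalty would be tuned (for example by cross-validation) rather than fixed as here.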

A Fast Multi-Locus Ridge Regression Algorithm for High-Dimensional Genome-Wide Association Studies

Frontiers in Genetics ◽ 10.3389/fgene.2021.649196 ◽ 2021 ◽ Vol 12

Author(s): Jin Zhang ◽ Min Chen ◽ Yangjun Wen ◽ Yin Zhang ◽ Yunan Lu ◽ ...

Keyword(s): Ridge Regression ◽ Large Scale ◽ Genome Wide Association Study ◽ Association Studies ◽ Real Data ◽ Genome Wide Association ◽ High Dimensional ◽ Minor Effect ◽ Genome Wide Association Studies ◽ Genome Wide

The mixed linear model (MLM) has been widely used in genome-wide association studies (GWAS) to dissect quantitative traits in human, animal, and plant genetics. Most methodologies treat all single nucleotide polymorphism (SNP) effects as random effects under the MLM framework and fail to detect the joint minor effect of multiple genetic markers on a trait. Therefore, polygenes with minor effects remain largely unexplored in today’s big data era. In this study, we developed a new algorithm under the MLM framework, called the fast multi-locus ridge regression (FastRR) algorithm. The FastRR algorithm first whitens the covariance matrix of the polygenic matrix K and environmental noise, then selects potentially related SNPs that have a high correlation with the target trait from among the large-scale markers, and finally analyzes the selected subset of variables using a multi-locus deshrinking ridge regression for true quantitative trait nucleotide (QTN) detection. Results from the analyses of both simulated and real data show that the FastRR algorithm is more powerful for both large and small QTN detection, more accurate in QTN effect estimation, and more stable under various polygenic backgrounds. Moreover, compared with existing methods, the FastRR algorithm has the advantage of high computing speed. In conclusion, the FastRR algorithm provides an alternative algorithm for multi-locus GWAS in high-dimensional genomic datasets.
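
The three stages named in this abstract (whitening, correlation screening, multi-locus ridge regression on the selected subset) can be sketched roughly as follows. This is not the FastRR implementation: the abstract gives no formulas, so the kinship form `K = XX'/p`, the fixed variance `ratio`, the penalty `lam`, the subset size `k`, and all function names are assumptions, and the deshrinking step is omitted entirely.

```python
import numpy as np

def whitening_matrix(K, ratio):
    """Return W with W (ratio*K + I) W' = I, from the eigendecomposition of K."""
    d, U = np.linalg.eigh(K)
    d = np.clip(d, 0.0, None)          # guard against tiny negative eigenvalues
    return (U / np.sqrt(ratio * d + 1.0)).T

def screen(Xw, yw, k):
    """Indices of the k columns of Xw with the largest normalized |x_j'y|."""
    score = np.abs(Xw.T @ yw) / (np.linalg.norm(Xw, axis=0) * np.linalg.norm(yw))
    return np.argsort(score)[::-1][:k]

def ridge(X, y, lam):
    """Plain ridge estimates on the screened subset (no deshrinking)."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# Toy data; every setting below is an illustrative assumption.
rng = np.random.default_rng(1)
n, p, k = 100, 400, 20
X = rng.binomial(2, 0.4, size=(n, p)).astype(float)
X -= X.mean(axis=0)
y = 1.0 * X[:, 5] + rng.normal(size=n)   # one simulated QTN at column 5
y -= y.mean()

K = X @ X.T / p                          # marker-based kinship (assumed form)
ratio = 0.5                              # assumed polygenic-to-residual variance ratio
W = whitening_matrix(K, ratio)
Xw, yw = W @ X, W @ y                    # whitened genotypes and phenotype

selected = screen(Xw, yw, k)             # stage 2: keep the top-k markers
effects = ridge(Xw[:, selected], yw, lam=1.0)  # stage 3: joint ridge fit
```

In a real analysis the variance ratio would be estimated from the data rather than fixed, and the final effects would need the deshrinking and significance testing that the abstract mentions but does not specify.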

Bioactivity Prediction Based on Matched Molecular Pair and Matched Molecular Series Methods

Current Pharmaceutical Design ◽ 10.2174/1381612826666200427111309 ◽ 2020 ◽ Vol 26 (33) ◽ pp. 4195-4205

Author(s): Xiaoyu Ding ◽ Chen Cui ◽ Dingyan Wang ◽ Jihui Zhao ◽ Mingyue Zheng ◽ ...

Keyword(s): Prediction Model ◽ Large Scale ◽ Prediction Models ◽ Predictive Accuracy ◽ Lead Optimization ◽ Consensus Method ◽ Molecular Pair ◽ Bioactivity Prediction ◽ Compound Synthesis ◽ Consensus Modeling

Background: Enhancing a compound’s biological activity is the central task of lead optimization in small-molecule drug discovery. However, performing many iterative rounds of compound synthesis and bioactivity testing is laborious. To address this issue, high-quality in silico bioactivity prediction approaches are in high demand, both to prioritize the more active compound derivatives and to reduce trial and error. Methods: Two kinds of bioactivity prediction models were constructed based on a large-scale structure-activity relationship (SAR) database. The first is based on the similarity of substituents and realized by matched molecular pair analysis; it includes SA, SA_BR, SR, and SR_BR. The second is based on SAR transferability and realized by matched molecular series analysis; it includes Single MMS pair, Full MMS series, and Multi single MMS pairs. We also defined the application domain of the models using a distance-based threshold. Results: Among the seven individual models, the Multi single MMS pairs bioactivity prediction model showed the best performance (R² = 0.828, MAE = 0.406, RMSE = 0.591), and the baseline model (SA) produced the lowest prediction accuracy (R² = 0.798, MAE = 0.446, RMSE = 0.637). The predictive accuracy could be further improved by consensus modeling (R² = 0.842, MAE = 0.397, RMSE = 0.563). Conclusion: An accurate prediction model for bioactivity was built with a consensus method, which was superior to all individual models. Our model should be a valuable tool for lead optimization.
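
For readers unfamiliar with the three reported metrics, the sketch below shows one common way to compute them, together with a simple consensus prediction. The abstract does not state how R² was defined or how the consensus was formed, so the coefficient-of-determination formula, the unweighted averaging in `consensus`, and the toy numbers are all assumptions.

```python
import numpy as np

def r2(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

def mae(y, y_hat):
    """Mean absolute error."""
    return np.mean(np.abs(y - y_hat))

def rmse(y, y_hat):
    """Root mean squared error."""
    return np.sqrt(np.mean((y - y_hat) ** 2))

def consensus(predictions):
    """Unweighted mean of several models' predictions (assumed consensus rule)."""
    return np.mean(np.asarray(predictions, dtype=float), axis=0)

# Toy activities and two hypothetical models that err in opposite directions.
y_true = np.array([1.0, 2.0, 3.0, 4.0])
model_a = np.array([1.0, 2.0, 3.0, 5.0])
model_b = np.array([1.0, 2.0, 3.0, 3.0])
combined = consensus([model_a, model_b])

for name, pred in [("model_a", model_a), ("model_b", model_b), ("consensus", combined)]:
    print(name, r2(y_true, pred), mae(y_true, pred), rmse(y_true, pred))
```

Averaging helps in this toy case because the two models' errors cancel; with real models the gain depends on how correlated their errors are.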

Publisher Correction: Genome-wide association study of individual differences of human lymphocyte profiles using large-scale cytometry data

Journal of Human Genetics ◽ 10.1038/s10038-020-00890-x ◽ 2021

Author(s): Daigo Okada ◽ Naotoshi Nakamura ◽ Kazuya Setoh ◽ Takahisa Kawaguchi ◽ Koichiro Higasa ◽ ...

Keyword(s): Individual Differences ◽ Association Study ◽ Human Lymphocyte ◽ Large Scale ◽ Genome Wide Association Study ◽ Genome Wide Association ◽ Genome Wide

Large-scale integration of meta-QTL and genome-wide association study discovers the genomic regions and candidate genes for yield and yield-related traits in bread wheat

Theoretical and Applied Genetics ◽ 10.1007/s00122-021-03881-4 ◽ 2021

Author(s): Yang Yang ◽ Aduragbemi Amo ◽ Di Wei ◽ Yongmao Chai ◽ Jie Zheng ◽ ...

Keyword(s): Association Study ◽ Candidate Genes ◽ Bread Wheat ◽ Large Scale ◽ Genome Wide Association Study ◽ Genome Wide Association ◽ Genome Wide ◽ Large Scale Integration ◽ Scale Integration ◽ Genomic Regions

Cerebrospinal fluid metabolomics identifies 19 brain-related phenotype associations

Communications Biology ◽ 10.1038/s42003-020-01583-z ◽ 2021 ◽ Vol 4 (1)

Author(s): Daniel J. Panyard ◽ Kyeong Mo Kim ◽ Burcu F. Darst ◽ Yuetiva K. Deming ◽ Xiaoyuan Zhong ◽ ...

Keyword(s): Cerebrospinal Fluid ◽ Drug Targets ◽ Large Scale ◽ Prediction Models ◽ Genome Wide Association ◽ Large Samples ◽ Genome Wide ◽ Metabolomic Data ◽ Related Phenotype ◽ Omic Data

The study of metabolomics and disease has enabled the discovery of new risk factors, diagnostic markers, and drug targets. For neurological and psychiatric phenotypes, the cerebrospinal fluid (CSF) is of particular importance. However, the CSF metabolome is difficult to study on a large scale due to the relative complexity of the procedure needed to collect the fluid. Here, we present a metabolome-wide association study (MWAS), which uses genetic and metabolomic data to impute metabolites into large samples with genome-wide association summary statistics. We conduct a metabolome-wide, genome-wide association analysis with 338 CSF metabolites, identifying 16 genotype-metabolite associations (metabolite quantitative trait loci, or mQTLs). We then build prediction models for all available CSF metabolites and test for associations with 27 neurological and psychiatric phenotypes, identifying 19 significant CSF metabolite-phenotype associations. Our results demonstrate the feasibility of MWAS to study omic data in scarce sample types.

Abstract 826: Large-scale genome-wide association study identifies multiple novel germline susceptibility variants associated with bladder cancer risk

10.1158/1538-7445.am2021-826 ◽ 2021

Author(s): Stella Koutros ◽ Lambertus A. Kiemeney ◽ Roger L. Milne ◽ Yuanqing Ye ◽ Vijai Joseph ◽ ...

Keyword(s): Bladder Cancer ◽ Cancer Risk ◽ Association Study ◽ Large Scale ◽ Genome Wide Association Study ◽ Genome Wide Association ◽ Bladder Cancer Risk ◽ Genome Wide

Comparison of HapMap and 1000 Genomes Reference Panels in a Large-Scale Genome-Wide Association Study

PLoS ONE ◽ 10.1371/journal.pone.0167742 ◽ 2017 ◽ Vol 12 (1) ◽ pp. e0167742 ◽ Cited By ~ 14

Author(s): Paul S. de Vries ◽ Maria Sabater-Lleal ◽ Daniel I. Chasman ◽ Stella Trompet ◽ Tarunveer S. Ahluwalia ◽ ...

Keyword(s): Association Study ◽ Large Scale ◽ Genome Wide Association Study ◽ Genome Wide Association ◽ 1000 Genomes ◽ Genome Wide

Multiple analyses of large-scale genome-wide association study highlight new risk pathways in lumbar spine bone mineral density

Oncotarget ◽ 10.18632/oncotarget.8948 ◽ 2016 ◽ Vol 7 (21) ◽ pp. 31429-31439 ◽ Cited By ~ 5

Author(s): Jinsong Wei ◽ Ming Li ◽ Feng Gao ◽ Rong Zeng ◽ Guiyou Liu ◽ ...

Keyword(s): Bone Mineral Density ◽ Lumbar Spine ◽ Association Study ◽ Bone Mineral ◽ Large Scale ◽ Genome Wide Association Study ◽ Spine Bone Mineral Density ◽ Genome Wide Association ◽ Mineral Density ◽ Genome Wide

Genetic analysis of sucrose concentration in soybean seeds using a historical soybean genomic panel

10.21203/rs.3.rs-158915/v1 ◽ 2021

Author(s): Alexandra Ficht ◽ Robert W. Bruce ◽ Davoud Torkamaneh ◽ Christopher Grainger ◽ Milad Eskandari ◽ ...

Keyword(s): Genetic Variability ◽ Genome Wide Association Study ◽ Sucrose Concentration ◽ Soybean Seed ◽ Breeding Programs ◽ SNP Data ◽ Genome Wide ◽ A Genome ◽ Important Trait ◽ Genotype By Sequencing

Soybean (Glycine max (L.) Merr) is a crop of global importance for both human and animal consumption, which was domesticated in China more than 6000 years ago. Soybean researchers have expressed concern about the loss of genetic diversity resulting from decades of breeding. In order to develop new cultivars, it is critical for breeders to understand the genetic variability present for traits of interest in their program germplasm. Sucrose concentration is becoming an increasingly important trait for the production of soy-food products. The objective of this study was to use a genome-wide association study (GWAS) to identify putative QTL for sucrose concentration in soybean seed. A GWAS panel consisting of 266 historic and current soybean accessions was genotyped with 76k genotype-by-sequencing (GBS) SNP data and phenotyped in four field locations in Ontario (Canada) from 2015 to 2017. Seven putative QTL were identified on chromosomes 1, 6, 8, 9, 10, 13 and 14. A key gene related to sucrose synthase (Glyma.06g182700) was found to be associated with the QTL on chromosome 6. This information will facilitate efforts to increase the available genetic variability for sucrose concentration in soybean breeding programs and develop new and improved high-sucrose soybean cultivars suitable for the soy-food industry.

Combined effects of host genetics and diet on human gut microbiota and incident disease in a single population cohort

10.1101/2020.09.12.20193045 ◽ 2020

Author(s): Youwen Qin ◽ Aki S Havulinna ◽ Yang Liu ◽ Pekka Jousilahti ◽ Scott C Ritchie ◽ ...

Keyword(s): Gut Microbiota ◽ Microbial Communities ◽ Large Scale ◽ Genome Wide Association Study ◽ Dietary Habits ◽ Population Based ◽ Genome Wide ◽ Scale Population ◽ Important Health ◽ Disease Analysis

Co-evolution between humans and the microbial communities colonizing them has resulted in an intimate assembly of thousands of microbial species living mutualistically on and in their bodies and impacting multiple aspects of host physiology and health. Several studies examining whether human genetic variation can affect gut microbiota suggest a complex combination of environmental and host factors. Here, we leverage a single large-scale population-based cohort of 5,959 genotyped individuals with matched gut microbial shotgun metagenomes, dietary information and health records up to 16 years post-sampling, to characterize human genetic variations associated with microbial abundances, and predict possible causal links with various diseases using Mendelian randomization (MR). A genome-wide association study (GWAS) identified 583 independent SNP-taxon associations at genome-wide significance (p < 5.0×10⁻⁸), which included notably strong associations with LCT (p = 5.02×10⁻³⁵), ABO (p = 1.1×10⁻¹²), and MED13L (p = 1.84×10⁻¹²). A combination of genetics and dietary habits was shown to strongly shape the abundances of certain key bacterial members of the gut microbiota, and to explain their genetic association. Genetic effects from the LCT locus on Bifidobacterium and three other associated taxa differed significantly according to dairy intake. Variation in the abundance of mucin-degrading Faecalicatena lactaris was associated with ABO, highlighting a preferential utilization of secreted A/B/AB-antigens as an energy source in the gut, irrespective of fibre intake. Enterococcus faecalis levels showed a robust association with a variant in MED13L, with putative links to colorectal cancer. Finally, we identified putative causal relationships between gut microbes and complex diseases using MR, with a predicted effect of Morganella on major depressive disorder that was consistent with observational incident disease analysis. Overall, we present striking examples of the intricate relationship between humans and their gut microbial communities, and highlight important health implications.
