TSGSIS: a high-dimensional grouped variable selection approach for detection of whole-genome SNP–SNP interactions
2017, Vol 33 (22), pp. 3595-3602
Author(s): Yao-Hwei Fang, Jie-Huei Wang, Chao A Hsiung

Mathematics, 2021, Vol 9 (3), pp. 222
Author(s): Juan C. Laria, M. Carmen Aguilera-Morillo, Enrique Álvarez, Rosa E. Lillo, Sara López-Taruella, ...

Over the last decade, regularized regression methods have offered alternatives for multi-marker analysis and feature selection in a whole-genome context. How to define the list of genes that characterizes an expression profile remains unclear: current practice relies on advanced statistics, taking either an agnostic point of view or incorporating some a priori knowledge, but overfitting remains a problem. This paper introduces a methodology for variable selection and model estimation in the high-dimensional setting, which can be particularly useful in the whole-genome context. Results are validated using simulated data and a real dataset from a triple-negative breast cancer study.
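The abstract does not spell out the proposed methodology, so what follows is only a generic illustration of the kind of regularized regression it builds on: a minimal lasso fitted by cyclic coordinate descent with soft-thresholding, in plain Python. It assumes centered predictors and response (no intercept) and minimizes (1/2n)·‖y − Xβ‖² + λ‖β‖₁. The names `soft_threshold`, `lasso_cd`, and `lam`, and the simulated data, are hypothetical and are not taken from the paper.

```python
import random

def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values inside [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_cd(X, y, lam, n_sweeps=200):
    """Hypothetical helper: lasso by cyclic coordinate descent.

    Minimizes (1/(2n)) * ||y - X beta||^2 + lam * ||beta||_1,
    assuming the columns of X and y are (roughly) centered, so no intercept.
    """
    n, p = len(X), len(X[0])
    cols = [[X[i][j] for i in range(n)] for j in range(p)]
    col_sq = [sum(v * v for v in c) / n for c in cols]
    beta = [0.0] * p
    resid = list(y)  # y - X @ beta, with beta = 0 at the start
    for _ in range(n_sweeps):
        for j in range(p):
            c = cols[j]
            # Inner product of x_j with the partial residual (predictor j left out).
            rho = sum(c[i] * (resid[i] + c[i] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                # Keep the residual consistent with the updated coefficient.
                for i in range(n):
                    resid[i] -= c[i] * delta
                beta[j] = new
    return beta

# Simulated example: only the first two of six predictors matter.
rng = random.Random(42)
X = [[rng.gauss(0, 1) for _ in range(6)] for _ in range(100)]
y = [3.0 * row[0] - 2.0 * row[1] + rng.gauss(0, 0.5) for row in X]
beta = lasso_cd(X, y, lam=0.2)
print([round(b, 2) for b in beta])
```

The soft-threshold step is what sets weak coefficients to exactly zero, which is why the lasso doubles as a feature selector; the active coefficients are also shrunk toward zero by roughly `lam`.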


2018, Vol 67 (4), pp. 813-839
Author(s): Anna Bonnet, Céline Lévy-Leduc, Elisabeth Gassiat, Roberto Toro, Thomas Bourgeron

Author(s): Wencan Zhu, Céline Lévy-Leduc, Nils Ternès

Abstract Motivation In genomic studies, identifying biomarkers associated with a variable of interest is a major concern in biomedical research. Regularized approaches are classically used to perform variable selection in high-dimensional linear models; however, these methods can fail in highly correlated settings. Results We propose a novel variable selection approach called WLasso that takes these correlations into account. It consists of rewriting the initial high-dimensional linear model to remove the correlation between the biomarkers (predictors) and then applying the generalized Lasso criterion. The performance of WLasso is assessed on synthetic data in several scenarios and compared with recent alternative approaches. The results show that when the biomarkers are highly correlated, WLasso outperforms the other approaches in sparse high-dimensional frameworks. The method is also illustrated on publicly available gene expression data in breast cancer. Availability and implementation Our method is implemented in the WLasso R package, which is available from the Comprehensive R Archive Network (CRAN). Supplementary information Supplementary data are available at Bioinformatics online.
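To make the "remove the correlation between the predictors" step concrete, here is a minimal plain-Python sketch of whitening a design matrix. It is not the WLasso package's code. It assumes n > p, so that the sample covariance S is positive definite, and uses a Cholesky factor S = L Lᵀ with whitened design Z = X_c L⁻ᵀ; WLasso may estimate the correlation matrix and its square root differently (when p > n the plain sample covariance is singular, so some structured or regularized estimate is needed). The helper names `center`, `sample_cov`, `cholesky`, and `whiten` are hypothetical.

```python
import math
import random

def center(X):
    """Subtract each column's mean."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in X]

def sample_cov(Xc):
    """Covariance of already-centered data, with divisor n."""
    n, p = len(Xc), len(Xc[0])
    return [[sum(r[a] * r[b] for r in Xc) / n for b in range(p)] for a in range(p)]

def cholesky(S):
    """Lower-triangular L with S = L L^T (S must be positive definite)."""
    p = len(S)
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = S[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def whiten(X):
    """Hypothetical helper: return Z = Xc L^{-T}, whose sample covariance is I."""
    Xc = center(X)
    L = cholesky(sample_cov(Xc))
    Z = []
    for row in Xc:
        # Each row of Z is L^{-1} x: solve L z = x by forward substitution.
        z = []
        for i in range(len(L)):
            z.append((row[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
        Z.append(z)
    return Z

# Two strongly correlated "biomarkers" plus one independent one.
rng = random.Random(0)
X = []
for _ in range(200):
    a = rng.gauss(0, 1)
    X.append([a, 0.95 * a + 0.3 * rng.gauss(0, 1), rng.gauss(0, 1)])
S_before = sample_cov(center(X))
S_after = sample_cov(whiten(X))
```

Because X_c β = Z (Lᵀβ), a sparse β generally becomes a dense coefficient vector Lᵀβ in the whitened model. Penalizing ‖β‖₁ = ‖L⁻ᵀ(Lᵀβ)‖₁ instead of the whitened coefficients is a generalized-lasso penalty with D = L⁻ᵀ, which is one way to read why the abstract turns to that criterion.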


2019
Author(s): Sierra Bainter, Thomas Granville McCauley, Tor D Wager, Elizabeth Reynolds Losin

In this paper we address the problem of selecting important predictors from a larger set of candidate predictors. Standard techniques are limited by low power and high false-positive rates. Stochastic search variable selection, a Bayesian variable selection approach widely used in biostatistics, can be used instead to combat these issues because it accounts for uncertainty in the other predictors in the model. We present Bayesian variable selection to aid researchers facing this common scenario, along with an online application (https://ssvsforpsych.shinyapps.io/ssvsforpsych/) to perform the analysis and visualize the results. Using an application to predict pain ratings, we demonstrate how this approach quickly identifies reliable predictors, even when the set of possible predictors is larger than the sample size. This technique is widely applicable to research questions that may be relatively data-rich but have limited information or theory to guide variable selection.
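The abstract describes stochastic search variable selection only at a high level. Below is a stripped-down plain-Python sketch of the core idea. Each coefficient gets a spike-and-slab normal prior: N(0, τ₀²) when its indicator γⱼ = 0 and N(0, τ₁²) when γⱼ = 1. A Gibbs sampler then alternates between drawing coefficients and indicators, and the fraction of kept draws with γⱼ = 1 estimates the posterior inclusion probability. For brevity, the sketch assumes a known noise standard deviation, updates one coefficient at a time, and uses hypothetical defaults (τ₀ = 0.05, τ₁ = 2, prior inclusion 0.5). It is not the code behind the authors' Shiny application, and those settings would need tuning to the scale of real data.

```python
import math
import random

def normal_logpdf(x, sd):
    """Log density of N(0, sd^2) at x."""
    return -0.5 * (x / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def ssvs(X, y, sigma=1.0, tau0=0.05, tau1=2.0, prior_incl=0.5,
         n_iter=1000, burn_in=200, seed=0):
    """Hypothetical single-site Gibbs sampler for SSVS with known noise sd.

    Returns one posterior inclusion probability per column of X.
    """
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    cols = [[X[i][j] for i in range(n)] for j in range(p)]
    xtx = [sum(v * v for v in c) for c in cols]
    beta, gamma = [0.0] * p, [1] * p
    resid = list(y)  # y - X @ beta, with beta = 0 at the start
    counts = [0] * p
    for it in range(n_iter):
        for j in range(p):
            c = cols[j]
            for i in range(n):  # add back predictor j's current contribution
                resid[i] += c[i] * beta[j]
            # beta_j | rest: normal with prior sd tau1 (slab) or tau0 (spike).
            tau = tau1 if gamma[j] else tau0
            prec = xtx[j] / sigma ** 2 + 1.0 / tau ** 2
            mean = sum(c[i] * resid[i] for i in range(n)) / sigma ** 2 / prec
            beta[j] = rng.gauss(mean, 1.0 / math.sqrt(prec))
            for i in range(n):
                resid[i] -= c[i] * beta[j]
            # gamma_j | beta_j: slab vs spike density at the sampled beta_j.
            log_odds = (math.log(prior_incl) - math.log(1 - prior_incl)
                        + normal_logpdf(beta[j], tau1)
                        - normal_logpdf(beta[j], tau0))
            gamma[j] = 1 if rng.random() < sigmoid(log_odds) else 0
            if it >= burn_in:
                counts[j] += gamma[j]
    return [k / (n_iter - burn_in) for k in counts]

# Simulated example: only the first of four predictors has an effect.
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1) for _ in range(4)] for _ in range(150)]
y = [1.5 * row[0] + data_rng.gauss(0, 1) for row in X]
pip = ssvs(X, y)
print([round(v, 2) for v in pip])
```

The ratio τ₁/τ₀ controls how sharply "included" is separated from "negligible"; a spike that is very narrow relative to the sampling noise of each coefficient makes the indicators sticky and the chain slow to mix.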


PLoS ONE, 2015, Vol 10 (10), pp. e0138903
Author(s): David C. Haws, Irina Rish, Simon Teyssedre, Dan He, Aurelie C. Lozano, ...

2021, Vol 53 (1)
Author(s): Theo Meuwissen, Irene van den Berg, Mike Goddard

Abstract Background Whole-genome sequence (WGS) data are increasingly available on large numbers of individuals in animal and plant breeding and in human genetics, through second-generation resequencing technologies, 1000 genomes projects, and large-scale genotype imputation from lower marker densities. Here, we present a computationally fast implementation of a variable selection genomic prediction method that can handle WGS data on more than 35,000 individuals, test its accuracy for across-breed predictions, and assess its quantitative trait locus (QTL) mapping precision. Methods The Markov chain Monte Carlo (MCMC) variable selection model (Bayes GC) simultaneously fits a genomic best linear unbiased prediction (GBLUP) term, i.e. a polygenic effect whose correlations are described by a genomic relationship matrix (G), and a Bayes C term, i.e. a set of single nucleotide polymorphisms (SNPs) with large effects selected by the model. Computational speed is improved by Metropolis–Hastings sampling that directs computations to the SNPs that are, a priori, most likely to be included in the model. Speed is also improved by running many relatively short MCMC chains. Memory requirements are reduced by storing the genotype matrix in binary form. The model was tested on a WGS dataset containing Holstein, Jersey and Australian Red cattle. The data contained 4,809,520 genotypes on 35,549 individuals together with their milk, fat and protein yields and fat and protein percentage traits. Results The prediction accuracies of the Jersey individuals improved by 1.5% when using across-breed GBLUP compared to within-breed predictions. Using WGS instead of 600k SNP-chip data yielded on average a 3% accuracy improvement for Australian Red cows. QTL were fine-mapped by locating the SNP with the highest posterior probability of being included in the model. Various QTL known from the literature were rediscovered, and a new SNP affecting milk production was discovered on chromosome 20 at 34.501126 Mb. Because of the high mapping precision, it was clear that many of the discovered QTL were the same across the five dairy traits. Conclusions Across-breed Bayes GC genomic prediction improved prediction accuracies compared to GBLUP. The combination of across-breed WGS data and Bayesian genomic prediction proved remarkably effective for the fine-mapping of QTL.
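The GBLUP term above relies on a genomic relationship matrix G, but the abstract does not say how G is built. A common construction is VanRaden's first method, G = Z Zᵀ / (2 Σⱼ pⱼ(1 − pⱼ)), where Z holds genotype codes 0/1/2 centered by twice the allele frequency pⱼ. The plain-Python sketch below implements that formula on a toy genotype matrix. The function name is hypothetical, and Bayes GC may compute G differently.

```python
def genomic_relationship(M):
    """Hypothetical helper: VanRaden-style G from genotype codes.

    M[i][j] is the number of copies (0, 1 or 2) of the counted allele
    carried by individual i at SNP j.
    """
    n, m = len(M), len(M[0])
    # Frequency of the counted allele at each SNP.
    freqs = [sum(row[j] for row in M) / (2 * n) for j in range(m)]
    # Center each SNP column by its expected genotype 2p.
    Z = [[row[j] - 2 * freqs[j] for j in range(m)] for row in M]
    scale = 2 * sum(q * (1 - q) for q in freqs)
    if scale == 0:
        raise ValueError("all SNPs are monomorphic; G is undefined")
    return [[sum(Z[a][k] * Z[b][k] for k in range(m)) / scale
             for b in range(n)] for a in range(n)]

# Three individuals genotyped at three SNPs.
M = [[0, 1, 2],
     [2, 1, 0],
     [1, 1, 1]]
G = genomic_relationship(M)
for row in G:
    print([round(v, 3) for v in row])
```

Here every allele frequency is 0.5, so the first two individuals (opposite homozygotes) get 4/3 on the diagonal and −4/3 between them, while the all-heterozygous third individual gets a row of zeros. Because each SNP column is centered with frequencies estimated from the same sample, every row of G sums to zero.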


2019, Vol 158 (5), pp. 210
Author(s): Bo Ning, Alexander Wise, Jessi Cisewski-Kehe, Sarah Dodson-Robinson, Debra Fischer

The Analyst, 2014, Vol 139 (19), pp. 4836
Author(s): Bai-chuan Deng, Yong-huan Yun, Yi-zeng Liang, Lun-zhao Yi
