BayesSUR: An R Package for High-Dimensional Multivariate Bayesian Variable and Covariance Selection in Linear Regression

Abstract

Background: Statistical analyses of biological problems in the life sciences often lead to high-dimensional linear models. To solve the corresponding system of equations, penalization approaches are often the methods of choice. They are especially useful in the case of multicollinearity, which arises when the number of explanatory variables exceeds the number of observations or for some biological reason. In these approaches, the model goodness of fit is penalized by some suitable function of interest. Prominent examples are the lasso, group lasso and sparse-group lasso. Here, we offer a fast and numerically cheap implementation of these operators via proximal gradient descent. The grid search for the penalty parameter is realized by warm starts. The step size between consecutive iterations is determined with backtracking line search. Finally, seagull, the R package presented here, produces complete regularization paths.

Results: Publicly available high-dimensional methylation data are used to compare seagull to the established R package SGL. The results of both packages enabled a precise prediction of biological age from DNA methylation status. Although the results of seagull and SGL were very similar (R² > 0.99), seagull computed the solution in a fraction of the time needed by SGL. Additionally, seagull enables the incorporation of weights for each penalized feature.

Conclusions: The following operators for linear regression models are available in seagull: lasso, group lasso, sparse-group lasso and Integrative LASSO with Penalty Factors (IPF-lasso). Thus, seagull is a convenient envelope of lasso variants.
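The combination of proximal gradient descent, backtracking line search and warm starts described in the abstract can be illustrated for the plain lasso. The following is a minimal Python sketch, not the seagull implementation: it assumes the objective ||y - Xb||² / (2n) + λ||b||₁, covers only the lasso (no group or sparse-group penalty), and all function and parameter names are hypothetical.

```python
def soft_threshold(z, t):
    """Proximal operator of t * |.|: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def predict(X, b):
    """Matrix-vector product X b for a list-of-rows matrix."""
    return [sum(xij * bj for xij, bj in zip(row, b)) for row in X]


def smooth_loss(X, y, b):
    """Least-squares part of the (assumed) objective: ||y - Xb||^2 / (2n)."""
    return sum((yi - fi) ** 2 for yi, fi in zip(y, predict(X, b))) / (2 * len(y))


def gradient(X, y, b):
    """Gradient of smooth_loss with respect to b."""
    resid = [fi - yi for fi, yi in zip(predict(X, b), y)]
    return [sum(row[j] * r for row, r in zip(X, resid)) / len(y)
            for j in range(len(b))]


def prox_grad_lasso(X, y, lam, b0, step=1.0, shrink=0.5, max_iter=500, tol=1e-10):
    """Proximal gradient descent for the lasso, started from b0."""
    b = list(b0)
    for _ in range(max_iter):
        g = gradient(X, y, b)
        f_b = smooth_loss(X, y, b)
        t = step
        while True:
            # Gradient step followed by the l1 proximal step.
            new = [soft_threshold(bj - t * gj, t * lam) for bj, gj in zip(b, g)]
            diff = [nj - bj for nj, bj in zip(new, b)]
            # Backtracking: accept t once the quadratic upper bound holds.
            bound = (f_b + sum(gj * dj for gj, dj in zip(g, diff))
                     + sum(dj * dj for dj in diff) / (2 * t))
            if smooth_loss(X, y, new) <= bound + 1e-12:
                break
            t *= shrink
        b = new
        if max(abs(dj) for dj in diff) < tol:
            break
    return b


def lasso_path(X, y, lambdas):
    """Regularization path: each solution warm-starts the next, smaller penalty."""
    b = [0.0] * len(X[0])
    path = []
    for lam in sorted(lambdas, reverse=True):
        b = prox_grad_lasso(X, y, lam, b)
        path.append((lam, b))
    return path
```

Sorting the penalties from largest to smallest means each fit starts from a nearby, sparser solution, which is the usual motivation for warm starts along a path.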


MLRMPA: An R package of multiple linear regression model population analysis based on a cluster sampling technique for variable selection of high dimensional data

Chemometrics and Intelligent Laboratory Systems, 2014, Vol 132, pp. 124-132. DOI: 10.1016/j.chemolab.2014.01.010. Cited By ~ 4

Author(s): Meihong Xie, Fangfang Deng, Xiaoyun Zhang, Yueli Tian, Peizhen Li, ...

Keyword(s): Linear Regression, Regression Model, Linear Regression Model, Population Analysis, Multiple Linear Regression Model, Sampling Technique, R Package, High Dimensional, Cluster Sampling, Selection Of


Variable Clustering in High-Dimensional Linear Regression: The R Package clere

The R Journal, 2016, Vol 8 (1), pp. 92. DOI: 10.32614/rj-2016-006. Cited By ~ 1

Author(s): Loïc Yengo, Julien Jacques, Christophe Biernacki, Mickael Canouil

Keyword(s): Linear Regression, R Package, High Dimensional, Variable Clustering


SPReM: Sparse Projection Regression Model For High-Dimensional Linear Regression

Journal of the American Statistical Association, 2015, Vol 110 (509), pp. 289-302. DOI: 10.1080/01621459.2014.892008. Cited By ~ 6

Author(s): Qiang Sun, Hongtu Zhu, Yufeng Liu, Joseph G. Ibrahim

Keyword(s): Linear Regression, Regression Model, High Dimensional, Sparse Projection


An R Package for Divergence Analysis of Omics Data

2019. DOI: 10.1101/720391

Author(s): Wikum Dinalankara, Qian Ke, Donald Geman, Luigi Marchionni

Keyword(s): High Throughput Sequencing, R Package, The Cancer Genome Atlas, High Dimensional, Omics Data, Sequencing Data, High Throughput Sequencing Data, Ternary Code, Cancer Genome Atlas, Level Analysis

Abstract

Given the ever-increasing amount of high-dimensional and complex omics data becoming available, it is increasingly important to discover simple but effective methods of analysis. Divergence analysis transforms each entry of a high-dimensional omics profile into a digitized (binary or ternary) code based on the deviation of the entry from a given baseline population. This is a novel framework that is significantly different from existing omics data analysis methods: it allows digitization of continuous omics data at the univariate or multivariate level, facilitates sample-level analysis, and is applicable to many different omics platforms. The divergence package, available on the R platform through the Bioconductor repository collection, provides easy-to-use functions for carrying out this transformation. Here we demonstrate how to use the package with sample high-throughput sequencing data from the Cancer Genome Atlas.
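The idea of coding each entry by its deviation from a baseline population can be illustrated at the univariate level. The following Python sketch is not the divergence package's estimator or API: it assumes, purely for illustration, that each feature's baseline range is a nearest-rank quantile interval of the baseline samples, and the names `baseline_interval`, `fit_intervals`, `ternary_code` and `binary_code` are hypothetical.

```python
def baseline_interval(values, lower=0.05, upper=0.95):
    """Baseline range for one feature from nearest-rank quantiles (assumed rule)."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return ordered[round(lower * last)], ordered[round(upper * last)]


def fit_intervals(baseline):
    """One (lo, hi) interval per feature; baseline rows are samples."""
    return [baseline_interval(column) for column in zip(*baseline)]


def ternary_code(profile, intervals):
    """-1 if below the baseline range, +1 if above it, 0 if inside it."""
    return [-1 if x < lo else (1 if x > hi else 0)
            for x, (lo, hi) in zip(profile, intervals)]


def binary_code(profile, intervals):
    """1 if the entry falls outside the baseline range in either direction."""
    return [abs(code) for code in ternary_code(profile, intervals)]


# Three baseline samples measured on two features.
baseline = [[1.0, 10.0], [2.0, 12.0], [3.0, 11.0]]
intervals = fit_intervals(baseline)
codes = ternary_code([0.5, 11.5], intervals)
```

Because every sample is coded independently against the same baseline intervals, the resulting codes can be compared across samples without further normalization, which is one reading of the "sample-level analysis" mentioned in the abstract.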


R/qtlcharts: interactive graphics for quantitative trait locus mapping

2014. DOI: 10.1101/011437. Cited By ~ 1

Author(s): Karl W Broman

Keyword(s): Quantitative Trait Locus, Quantitative Trait Locus Mapping, Quantitative Trait, Quantitative Traits, R Package, High Dimensional, Interactive Graphics, Phenotype Data, Trait Locus, Locus Mapping

Every data visualization can be improved with some level of interactivity. Interactive graphics hold particular promise for the exploration of high-dimensional data. R/qtlcharts is an R package to create interactive graphics for experiments to map quantitative trait loci (QTL; genetic loci that influence quantitative traits). R/qtlcharts serves as a companion to the R/qtl package, providing interactive versions of R/qtl's static graphs, as well as additional interactive graphs for the exploration of high-dimensional genotype and phenotype data.


An R package for divergence analysis of omics data

PLoS ONE, 2021, Vol 16 (4), pp. e0249002. DOI: 10.1371/journal.pone.0249002

Author(s): Wikum Dinalankara, Qian Ke, Donald Geman, Luigi Marchionni

Keyword(s): R Package, The Cancer Genome Atlas, High Dimensional, Omics Data, Ternary Code, Cancer Genome Atlas, Level Analysis, Data Analysis Methods, Genome Atlas, Omics Data Analysis

Given the ever-increasing amount of high-dimensional and complex omics data becoming available, it is increasingly important to discover simple but effective methods of analysis. Divergence analysis transforms each entry of a high-dimensional omics profile into a digitized (binary or ternary) code based on the deviation of the entry from a given baseline population. This is a novel framework that is significantly different from existing omics data analysis methods: it allows digitization of continuous omics data at the univariate or multivariate level, facilitates sample-level analysis, and is applicable to many different omics platforms. The divergence package, available on the R platform through the Bioconductor repository collection, provides easy-to-use functions for carrying out this transformation. Here we demonstrate how to use the package with data from the Cancer Genome Atlas.
