GALGO: an R package for multivariate variable selection using genetic algorithms

Victor Trevino; Francesco Falciani

doi:10.1093/bioinformatics/btl074

Correction to: MicroBVS: Dirichlet-tree multinomial regression models with Bayesian variable selection - an R package

BMC Bioinformatics, 2020, Vol 21 (1). doi:10.1186/s12859-020-03912-9

Author(s): Matthew D. Koslovsky; Marina Vannucci

Keyword(s): Variable Selection; Regression Models; R Package; Bayesian Variable Selection; Multinomial Regression

An amendment to this paper has been published and can be accessed via the original article.

Variable selection in STAR models with neighbourhood effects using genetic algorithms

Journal of Forecasting, 2010, Vol 29 (8), pp. 728-750. doi:10.1002/for.1164. Cited By ~ 4

Author(s): Isolina Alberto; Asunción Beamonte; Pilar Gargallo; Pedro M. Mateo; Manuel Salvador

Keyword(s): Genetic Algorithms; Variable Selection; Neighbourhood Effects; STAR Models

Genetic Operators Impact on Genetic Algorithms Based Variable Selection

Intelligent Decision Technologies - Smart Innovation, Systems and Technologies, 2020, pp. 211-221. doi:10.1007/978-981-15-5925-9_18

Author(s): Marco Vannucci; Valentina Colla; Silvia Cateni

Keyword(s): Genetic Algorithms; Variable Selection; Genetic Operators

Several Approaches to Variable Selection by Means of Genetic Algorithms

Intelligent Information Technologies, 2011, pp. 274-292. doi:10.4018/978-1-59904-941-0.ch013

Author(s): Marcos Gestal Pose; Alberto Cancela Carollo; José Manuel Andrade Garda; Mari Paz Gomez-Carracedo

Keyword(s): Analytical Chemistry; Genetic Algorithms; Variable Selection; Evolutionary Computation; Fitness Function; Classification Model; Classification Task; Practical Case; General Method

This chapter presents several approaches for determining the most relevant subset of variables for a classification task, with the aim of improving the performance and efficiency of the classification model. Genetic algorithms, a particular technique of evolutionary computation, are applied to obtain a general method of variable selection in which only the fitness function depends on the particular problem. The proposed solution is applied and tested on a practical case in analytical chemistry: the classification of apple beverages.
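
The idea of a general search in which only the fitness function is problem-specific can be illustrated with a minimal sketch. This is not the chapter's implementation: the function `ga_select`, its parameters, the choice of operators (binary tournament, one-point crossover, bit-flip mutation, elitism) and the toy fitness are all assumptions made for illustration.

```python
import random

def ga_select(n_vars, fitness, pop_size=30, generations=40, p_mut=0.05, seed=0):
    """Search 0/1 inclusion masks of length n_vars (requires n_vars >= 2).

    `fitness` maps a mask to a score (higher is better) and is the only
    problem-specific ingredient; everything else is generic.
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_vars)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _gen in range(generations):
        children = [best[:]]  # elitism: carry the current best mask forward
        while len(children) < pop_size:
            parents = []
            for _k in range(2):
                a, b = rng.sample(pop, 2)  # binary tournament selection
                parents.append(a if fitness(a) >= fitness(b) else b)
            cut = rng.randrange(1, n_vars)  # one-point crossover
            child = parents[0][:cut] + parents[1][cut:]
            # Bit-flip mutation: toggle each gene with probability p_mut
            child = [1 - g if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best

# Hypothetical toy problem: variables 0, 3 and 7 are "relevant".
# A real fitness would instead score a classifier trained on the
# selected variables, e.g. by cross-validated accuracy.
RELEVANT = {0, 3, 7}

def toy_fitness(mask):
    hits = sum(1 for i, g in enumerate(mask) if g and i in RELEVANT)
    extras = sum(1 for i, g in enumerate(mask) if g and i not in RELEVANT)
    return hits - 0.5 * extras  # reward relevant variables, penalise the rest

best_mask = ga_select(10, toy_fitness)
print(best_mask, toy_fitness(best_mask))
```

Swapping `toy_fitness` for a different scoring function changes the problem being solved without touching the search loop, which is the design point the abstract emphasises.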

Fast approximate inference for variable selection in Dirichlet process mixtures, with an application to pan-cancer proteomics

Statistical Applications in Genetics and Molecular Biology, 2019, Vol 18 (6). doi:10.1515/sagmb-2018-0065

Author(s): Oliver M. Crook; Laurent Gatto; Paul D. W. Kirk

Keyword(s): Variable Selection; Dirichlet Process; Bayesian Model; Bayesian Model Averaging; Model Averaging; R Package; The Cancer Genome Atlas; Fast Method; Model Based Clustering; Pan Cancer

Abstract: The Dirichlet Process (DP) mixture model has become a popular choice for model-based clustering, largely because it allows the number of clusters to be inferred. The sequential updating and greedy search (SUGS) algorithm (Wang & Dunson, 2011) was proposed as a fast method for performing approximate Bayesian inference in DP mixture models, by posing clustering as a Bayesian model selection (BMS) problem and avoiding the use of computationally costly Markov chain Monte Carlo methods. Here we consider how this approach may be extended to permit variable selection for clustering, and also demonstrate the benefits of Bayesian model averaging (BMA) in place of BMS. Through an array of simulation examples and well-studied examples from cancer transcriptomics, we show that our method performs competitively with the current state-of-the-art, while also offering computational benefits. We apply our approach to reverse-phase protein array (RPPA) data from The Cancer Genome Atlas (TCGA) in order to perform a pan-cancer proteomic characterisation of 5157 tumour samples. We have implemented our approach, together with the original SUGS algorithm, in an open-source R package named sugsvarsel, which accelerates analysis by performing intensive computations in C++ and provides automated parallel processing. The R package is freely available from: https://github.com/ococrook/sugsvarsel
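
The contrast between Bayesian model selection and Bayesian model averaging mentioned in this abstract can be shown with a small generic sketch. It is not the SUGS algorithm and does not use the sugsvarsel API; the helper `posterior_model_probs`, the log marginal likelihoods and the per-model predictions are all made-up values chosen for illustration.

```python
import math

def posterior_model_probs(log_evidences, log_priors=None):
    """Turn log marginal likelihoods (plus optional log priors) into
    posterior model probabilities."""
    if log_priors is None:
        log_priors = [0.0] * len(log_evidences)  # equal prior weight per model
    scores = [le + lp for le, lp in zip(log_evidences, log_priors)]
    m = max(scores)  # subtract the maximum before exponentiating (stability)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical inputs: three candidate models and one prediction from each.
log_evidences = [-105.2, -104.9, -110.0]
predictions = [2.0, 3.0, 10.0]

probs = posterior_model_probs(log_evidences)

# BMS: keep only the single most probable model.
bms_index = max(range(len(probs)), key=probs.__getitem__)
bms_prediction = predictions[bms_index]

# BMA: weight every model's prediction by its posterior probability.
bma_prediction = sum(p * y for p, y in zip(probs, predictions))

print(probs, bms_prediction, bma_prediction)
```

When two models have similar evidence, as the first two do here, BMS discards the runner-up entirely, whereas BMA lets it contribute in proportion to its posterior probability.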

Chemically driven variable selection by focused multimodal genetic algorithms in mid-IR spectra

Analytical and Bioanalytical Chemistry, 2007, Vol 389 (7-8), pp. 2331-2342. doi:10.1007/s00216-007-1608-1. Cited By ~ 4

Author(s): M. P. Gómez-Carracedo; M. Gestal; J. Dorado; J. M. Andrade

Keyword(s): Genetic Algorithms; Variable Selection; IR Spectra

NonpModelCheck: An R Package for Nonparametric Lack-of-Fit Testing and Variable Selection

Journal of Statistical Software, 2017, Vol 77 (10). doi:10.18637/jss.v077.i10. Cited By ~ 1

Author(s): Adriano Zanin Zambom; Michael G. Akritas

Keyword(s): Variable Selection; R Package; Lack Of Fit; Fit Testing

BayICE: A hierarchical Bayesian deconvolution model with stochastic search variable selection

2019. doi:10.1101/732743

Author(s): An-Shun Tai; George C. Tseng; Wen-Ping Hsieh

Keyword(s): Gene Expression; Variable Selection; Immune Cell; Expression Profiles; Gene Expression Profiles; R Package; Stochastic Search; Hierarchical Bayesian; Stochastic Search Variable Selection; Search Variable

Abstract: Gene expression deconvolution is a powerful tool for exploring the microenvironment of complex tissues comprised of multiple cell groups using transcriptomic data. Characterizing cell activities for a particular condition has been regarded as a primary mission against diseases. For example, cancer immunology aims to clarify the role of the immune system in the progression and development of cancer through analyzing the immune cell components of tumors. To that end, many deconvolution methods have been proposed for inferring cell subpopulations within tissues. Nevertheless, two problems limit the practicality of current approaches. First, all approaches use external purified data to preselect cell type-specific genes that contribute to deconvolution. However, some types of cells cannot be found in purified profiles and the genes specifically over- or under-expressed in them cannot be identified. This is particularly a problem in cancer studies. Hence, a preselection strategy that is independent from deconvolution is inappropriate. The second problem is that existing approaches do not recover the expression profiles of unknown cells present in bulk tissues, which results in biased estimation of unknown cell proportions. Furthermore, it causes the shift-invariant property of deconvolution to fail, which then affects the estimation performance. To address these two problems, we propose a novel deconvolution approach, BayICE, which employs hierarchical Bayesian modeling with stochastic search variable selection. We develop a comprehensive Markov chain Monte Carlo procedure through Gibbs sampling to estimate cell proportions, gene expression profiles, and signature genes. Simulation and validation studies illustrate that BayICE outperforms existing deconvolution approaches in estimating cell proportions. Subsequently, we demonstrate an application of BayICE in the RNA sequencing of patients with non-small cell lung cancer. The model is implemented in the R package “BayICE” and the algorithm is available for download.

Genetic Algorithms for Variable Selection and Pattern Recognition

Comprehensive Chemometrics, 2020, pp. 673-700. doi:10.1016/b978-0-12-409547-2.14888-7

Author(s): B.K. Lavine; Collin G. White; Charles E. Davidson

Keyword(s): Pattern Recognition; Genetic Algorithms; Variable Selection

MicroBVS: Dirichlet-tree multinomial regression models with Bayesian variable selection - an R package

BMC Bioinformatics, 2020, Vol 21 (1). doi:10.1186/s12859-020-03640-0. Cited By ~ 1

Author(s): Matthew D. Koslovsky; Marina Vannucci

Keyword(s): Variable Selection; Compositional Data; Human Microbiome; R Package; Bayesian Variable Selection; Multinomial Regression; Phylogenetic Structure; Prior Probabilities; Abundance Data; Model Selection Uncertainty

Abstract

Background: Understanding the relation between the human microbiome and modulating factors, such as diet, may help researchers design intervention strategies that promote and maintain healthy microbial communities. Numerous analytical tools are available to help identify these relations, oftentimes via automated variable selection methods. However, available tools frequently ignore evolutionary relations among microbial taxa, potential relations between modulating factors, as well as model selection uncertainty.

Results: We present MicroBVS, an R package for Dirichlet-tree multinomial models with Bayesian variable selection, for the identification of covariates associated with microbial taxa abundance data. The underlying Bayesian model accommodates phylogenetic structure in the abundance data and various parameterizations of covariates’ prior probabilities of inclusion.

Conclusion: While developed to study the human microbiome, our software can be employed in various research applications, where the aim is to generate insights into the relations between a set of covariates and compositional data with or without a known tree-like structure.
