GALGO: an R package for multivariate variable selection using genetic algorithms

2006 ◽  
Vol 22 (9) ◽  
pp. 1154-1156 ◽  
Author(s):  
Victor Trevino ◽  
Francesco Falciani
2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Matthew D. Koslovsky ◽  
Marina Vannucci

An amendment to this paper has been published and can be accessed via the original article.


2010 ◽  
Vol 29 (8) ◽  
pp. 728-750 ◽  
Author(s):  
Isolina Alberto ◽  
Asunción Beamonte ◽  
Pilar Gargallo ◽  
Pedro M. Mateo ◽  
Manuel Salvador

Author(s):  
Marcos Gestal Pose ◽  
Alberto Cancela Carollo ◽  
José Manuel Andrade Garda ◽  
Mari Paz Gomez-Carracedo

This chapter presents several approaches for determining the most relevant subset of variables for a classification task. Selecting such a subset can improve the classification model and make it more efficient. A particular evolutionary computation technique, genetic algorithms, is applied with the aim of obtaining a general method of variable selection in which only the fitness function depends on the particular problem. The proposed solution is applied and tested on a practical case in analytical chemistry: the classification of apple beverages.
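
The idea of a general genetic-algorithm search in which only the fitness function is problem-specific can be sketched as follows. This is a minimal illustration, not the chapter's implementation: the function names (`evolve`, `toy_fitness`), the operators (tournament selection, one-point crossover, bit-flip mutation, elitism), and all parameter values are assumptions chosen for brevity. The toy fitness simply rewards a hypothetical set of "informative" variables; in a real application it would be replaced by something like the cross-validated accuracy of a classifier trained on the selected variables.

```python
import random


def evolve(n_vars, fitness, pop_size=30, generations=40,
           crossover_rate=0.9, mutation_rate=0.05, seed=0):
    """Generic GA over 0/1 masks of length n_vars (n_vars >= 2).

    Only `fitness` (mask -> number, higher is better) is problem-specific.
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_vars)] for _ in range(pop_size)]
    best = max(pop, key=fitness)

    def tournament():
        # Pick two distinct individuals and keep the fitter one.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [best[:]]  # elitism: the best mask always survives
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, n_vars)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation: toggle each variable with small probability.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best


# Hypothetical stand-in for a real fitness such as classification accuracy:
# reward selecting "informative" variables, penalise every other variable.
INFORMATIVE = {0, 3, 7}


def toy_fitness(mask):
    hits = sum(mask[i] for i in INFORMATIVE)
    noise = sum(bit for i, bit in enumerate(mask) if i not in INFORMATIVE)
    return hits - 0.5 * noise


if __name__ == "__main__":
    mask = evolve(n_vars=10, fitness=toy_fitness)
    print("selected variables:", [i for i, bit in enumerate(mask) if bit])
```

Swapping `toy_fitness` for a different scoring function changes the problem being solved without touching `evolve`, which is the separation the abstract describes.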


Author(s):  
Oliver M. Crook ◽  
Laurent Gatto ◽  
Paul D. W. Kirk

Abstract The Dirichlet Process (DP) mixture model has become a popular choice for model-based clustering, largely because it allows the number of clusters to be inferred. The sequential updating and greedy search (SUGS) algorithm (Wang & Dunson, 2011) was proposed as a fast method for performing approximate Bayesian inference in DP mixture models, by posing clustering as a Bayesian model selection (BMS) problem and avoiding the use of computationally costly Markov chain Monte Carlo methods. Here we consider how this approach may be extended to permit variable selection for clustering, and also demonstrate the benefits of Bayesian model averaging (BMA) in place of BMS. Through an array of simulation examples and well-studied examples from cancer transcriptomics, we show that our method performs competitively with the current state-of-the-art, while also offering computational benefits. We apply our approach to reverse-phase protein array (RPPA) data from The Cancer Genome Atlas (TCGA) in order to perform a pan-cancer proteomic characterisation of 5157 tumour samples. We have implemented our approach, together with the original SUGS algorithm, in an open-source R package named sugsvarsel, which accelerates analysis by performing intensive computations in C++ and provides automated parallel processing. The R package is freely available from: https://github.com/ococrook/sugsvarsel
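
The difference between Bayesian model selection (BMS) and Bayesian model averaging (BMA) mentioned above can be illustrated with a small numerical sketch. It is not taken from the sugsvarsel package: the function name, the log marginal likelihood values, and the per-model estimates are all made up for illustration, and equal prior probabilities across models are assumed.

```python
import math


def posterior_model_probs(log_marginals):
    """Turn log marginal likelihoods into posterior model probabilities,
    assuming every model has the same prior probability."""
    m = max(log_marginals)  # subtract the max for numerical stability
    weights = [math.exp(lm - m) for lm in log_marginals]
    total = sum(weights)
    return [w / total for w in weights]


# Made-up values for three candidate models (e.g. three candidate partitions).
log_marginals = [-10.0, -10.5, -14.0]
# Made-up per-model estimates of one quantity of interest, for example
# whether two given samples fall in the same cluster (1.0) or not (0.0).
estimates = [1.0, 0.0, 1.0]

probs = posterior_model_probs(log_marginals)

# BMS: report the estimate from the single most probable model.
best = max(range(len(probs)), key=probs.__getitem__)
bms_estimate = estimates[best]

# BMA: weight every model's estimate by its posterior probability.
bma_estimate = sum(p * e for p, e in zip(probs, estimates))

if __name__ == "__main__":
    print("posterior model probabilities:", [round(p, 3) for p in probs])
    print("BMS estimate:", bms_estimate)
    print("BMA estimate:", round(bma_estimate, 3))
```

In this toy case the second model is nearly as probable as the first, so the averaged estimate reflects that uncertainty, whereas selection discards it.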


2007 ◽  
Vol 389 (7-8) ◽  
pp. 2331-2342 ◽  
Author(s):  
M. P. Gómez-Carracedo ◽  
M. Gestal ◽  
J. Dorado ◽  
J. M. Andrade

2019 ◽  
Author(s):  
An-Shun Tai ◽  
George C. Tseng ◽  
Wen-Ping Hsieh

Abstract Gene expression deconvolution is a powerful tool for exploring the microenvironment of complex tissues comprised of multiple cell groups using transcriptomic data. Characterizing cell activities for a particular condition has been regarded as a primary mission against diseases. For example, cancer immunology aims to clarify the role of the immune system in the progression and development of cancer through analyzing the immune cell components of tumors. To that end, many deconvolution methods have been proposed for inferring cell subpopulations within tissues. Nevertheless, two problems limit the practicality of current approaches. First, all approaches use external purified data to preselect cell type-specific genes that contribute to deconvolution. However, some types of cells cannot be found in purified profiles and the genes specifically over- or under-expressed in them cannot be identified. This is particularly a problem in cancer studies. Hence, a preselection strategy that is independent from deconvolution is inappropriate. The second problem is that existing approaches do not recover the expression profiles of unknown cells present in bulk tissues, which results in biased estimation of unknown cell proportions. Furthermore, it causes the shift-invariant property of deconvolution to fail, which then affects the estimation performance. To address these two problems, we propose a novel deconvolution approach, BayICE, which employs hierarchical Bayesian modeling with stochastic search variable selection. We develop a comprehensive Markov chain Monte Carlo procedure through Gibbs sampling to estimate cell proportions, gene expression profiles, and signature genes. Simulation and validation studies illustrate that BayICE outperforms existing deconvolution approaches in estimating cell proportions. Subsequently, we demonstrate an application of BayICE in the RNA sequencing of patients with non-small cell lung cancer. The model is implemented in the R package “BayICE” and the algorithm is available for download.


2020 ◽  
pp. 673-700
Author(s):  
B.K. Lavine ◽  
Collin G. White ◽  
Charles E. Davidson

2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Matthew D. Koslovsky ◽  
Marina Vannucci

Abstract Background: Understanding the relation between the human microbiome and modulating factors, such as diet, may help researchers design intervention strategies that promote and maintain healthy microbial communities. Numerous analytical tools are available to help identify these relations, oftentimes via automated variable selection methods. However, available tools frequently ignore evolutionary relations among microbial taxa, potential relations between modulating factors, as well as model selection uncertainty. Results: We present MicroBVS, an R package for Dirichlet-tree multinomial models with Bayesian variable selection, for the identification of covariates associated with microbial taxa abundance data. The underlying Bayesian model accommodates phylogenetic structure in the abundance data and various parameterizations of covariates’ prior probabilities of inclusion. Conclusion: While developed to study the human microbiome, our software can be employed in various research applications, where the aim is to generate insights into the relations between a set of covariates and compositional data with or without a known tree-like structure.

