Bayesian network feature finder (BANFF): an R package for gene network feature selection: Table 1.

In this article, we aim to develop a unified view of causal and non-causal feature selection methods. The unified view fills a gap in research on the relation between the two types of methods. Based on the Bayesian network framework and information theory, we first show that causal and non-causal feature selection methods share the same objective: to find the Markov blanket of a class attribute, the theoretically optimal feature set for classification. We then examine the assumptions made by causal and non-causal feature selection methods when searching for the optimal feature set, and unify the assumptions by mapping them to restrictions on the structure of the Bayesian network model of the studied problem. We further analyze in detail how the structural assumptions lead to the different levels of approximation employed by the methods in their search, which in turn result in approximations in the feature sets found by the methods with respect to the optimal feature set. With the unified view, we can interpret the output of non-causal methods from a causal perspective and derive the error bounds of both types of methods. Finally, we present a practical understanding of the relation between causal and non-causal methods using extensive experiments with synthetic data and various types of real-world data.
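As a rough illustration of the Markov blanket mentioned in the abstract above, the sketch below collects the parents, children, and co-parents of a target node in a known directed acyclic graph. The function name `markov_blanket`, the parent-set dictionary representation, and the toy network are assumptions made for this example. They are not taken from the article, which is concerned with estimating the blanket from data rather than reading it off a given structure.

```python
from typing import Dict, Set


def markov_blanket(parents: Dict[str, Set[str]], target: str) -> Set[str]:
    """Return the Markov blanket of `target` in a DAG.

    `parents` maps every node to the set of its parent nodes (hypothetical
    representation chosen for this sketch). The blanket is the union of the
    target's parents, its children, and the other parents of those children.
    """
    own_parents = set(parents.get(target, set()))
    # Children are the nodes that list the target among their parents.
    children = {node for node, ps in parents.items() if target in ps}
    # Co-parents ("spouses") share at least one child with the target.
    co_parents: Set[str] = set()
    for child in children:
        co_parents |= parents[child]
    blanket = own_parents | children | co_parents
    blanket.discard(target)  # the target is never part of its own blanket
    return blanket


# Hypothetical toy network: A -> C, C -> D, B -> D, D -> E, F isolated.
toy_dag = {
    "A": set(),
    "B": set(),
    "C": {"A"},
    "D": {"C", "B"},
    "E": {"D"},
    "F": set(),
}

blanket_of_c = markov_blanket(toy_dag, "C")
print(sorted(blanket_of_c))
```

In this toy network the class attribute `C` is shielded by its parent `A`, its child `D`, and the co-parent `B`, while `E` and `F` fall outside the blanket.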


Gene Network Learning Using Regulated Dynamic Bayesian Network Methods

2008 Seventh International Conference on Machine Learning and Applications ◽

10.1109/icmla.2008.119 ◽

2008 ◽

Author(s):

Xiaotong Lin ◽

Xue-wen Chen

Keyword(s):

Bayesian Network ◽

Gene Network ◽

Dynamic Bayesian Network ◽

Network Learning ◽

Network Methods


modelBuildR: an R package for model building and feature selection with erroneous classifications

PeerJ ◽

10.7717/peerj.10849 ◽

2021 ◽

Vol 9 ◽

pp. e10849

Author(s):

Maximilian Knoll ◽

Jennifer Furkel ◽

Juergen Debus ◽

Amir Abdollahi

Keyword(s):

Feature Selection ◽

Cross Validation ◽

Model Building ◽

Linear Models ◽

Binary Classification ◽

Ground Truth ◽

R Package ◽

Methylation Array ◽

Survival Difference ◽

Error Probabilities

Background: Model building is a crucial part of omics-based biomedical research to transfer classifications and obtain insights into underlying mechanisms. Feature selection is often based on minimizing the error between model predictions and a given classification (maximizing accuracy). Human ratings/classifications, however, might be error prone, with discordance rates between experts of 5–15%. We therefore evaluate whether a feature pre-filtering step might improve identification of features associated with true underlying groups.

Methods: Data were simulated for up to 100 samples and up to 10,000 features, 10% of which were associated with the ground truth comprising 2–10 normally distributed populations. Binary and semi-quantitative ratings with varying error probabilities were used as classification. For feature preselection, standard cross-validation (V2) was compared to a novel heuristic (V1) applying univariate testing, multiplicity adjustment and cross-validation on switched dependent (classification) and independent (features) variables. Preselected features were used to train logistic regression/linear models (backward selection, AIC). Predictions were compared against the ground truth (ROC, multiclass-ROC). As a use case, multiple feature selection/classification methods were benchmarked against the novel heuristic to identify prognostically different G-CIMP negative glioblastoma tumors from the TCGA-GBM 450k methylation array data cohort, starting from a fuzzy, UMAP-based, rough and erroneous separation.

Results: V1 yielded higher median AUC ranks for two true groups (ground truth), with smaller differences for true graduated differences (3–10 groups). Lower fractions of models were successfully fit with V1. Median AUCs for binary classification and two true groups were 0.91 (range: 0.54–1.00) for V1 (Benjamini-Hochberg) and 0.70 (range: 0.28–1.00) for V2. For 25 samples and 100 features, 13% (n = 616) of V2 models showed AUCs ≤ 50%. For larger numbers of features and samples, median AUCs were 0.75 (range: 0.59–1.00) for V1 and 0.54 (range: 0.32–0.75) for V2. In the TCGA-GBM data, modelBuildR allowed the best prognostic separation of patients, with the highest median overall survival difference (7.51 months), followed by a difference of 6.04 months for a random forest based method.

Conclusions: The proposed heuristic is beneficial for the retrieval of features associated with two true groups classified with errors. We provide the R package modelBuildR to simplify (comparative) evaluation/application of the proposed heuristic (http://github.com/mknoll/modelBuildR).
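The abstract above names Benjamini-Hochberg multiplicity adjustment as one ingredient of the V1 preselection heuristic. The following is a minimal sketch of that single step only, assuming per-feature p-values from some univariate test are already available. The function names `benjamini_hochberg` and `preselect`, the `alpha` threshold, and the example p-values are hypothetical. The sketch is written in Python for illustration and does not reproduce the modelBuildR implementation or its cross-validation on switched variables.

```python
from typing import List


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def preselect(p_values: List[float], alpha: float = 0.05) -> List[int]:
    """Return indices of features whose adjusted p-value is at most alpha."""
    adjusted = benjamini_hochberg(p_values)
    return [i for i, q in enumerate(adjusted) if q <= alpha]


# Hypothetical univariate p-values for four features.
example_p = [0.001, 0.8, 0.02, 0.5]
selected = preselect(example_p, alpha=0.05)
print(selected)
```

Features that survive this filter would then be passed on to model fitting; the cut-off `alpha` is a free choice in this sketch rather than a value reported in the abstract.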


Combining Clustering and Bayesian Network for Gene Network Inference

2008 Eighth International Conference on Intelligent Systems Design and Applications ◽

10.1109/isda.2008.183 ◽

2008 ◽

Cited By ~ 3

Author(s):

Suhaila Zainudin ◽

Safaai Deris

Keyword(s):

Bayesian Network ◽

Gene Network ◽

Network Inference ◽

Gene Network Inference


DiffGraph: an R package for identifying gene network rewiring using differential graphical models

Bioinformatics ◽

10.1093/bioinformatics/btx836 ◽

2017 ◽

Vol 34 (9) ◽

pp. 1571-1573 ◽

Cited By ~ 4

Author(s):

Xiao-Fei Zhang ◽

Le Ou-Yang ◽

Shuo Yang ◽

Xiaohua Hu ◽

Hong Yan

Keyword(s):

Graphical Models ◽

Gene Network ◽

R Package ◽

Network Rewiring


Thinning a Triangulation of a Bayesian Network or Undirected Graph to Create a Minimal Triangulation

International Journal of Uncertainty Fuzziness and Knowledge-Based Systems ◽

10.1142/s0218488517500143 ◽

2017 ◽

Vol 25 (03) ◽

pp. 1750014

Author(s):

Edmund Jones ◽

Vanessa Didelez

Keyword(s):

Bayesian Network ◽

Graphical Model ◽

Undirected Graph ◽

Computer Experiment ◽

R Package ◽

The Other ◽

Prime Decomposition ◽

Original Graph ◽

Minimal Triangulation

In one procedure for finding the maximal prime decomposition of a Bayesian network or undirected graphical model, the first step is to create a minimal triangulation of the network, and a common and straightforward way to do this is to create a triangulation that is not necessarily minimal and then thin this triangulation by removing excess edges. We show that the algorithm for thinning proposed in several previous publications is incorrect. A different version of this algorithm is available in the R package gRbase, but its correctness has not previously been proved. We prove that this version is correct and provide a simpler version, also with a proof. We compare the speed of the two corrected algorithms in three ways and find that asymptotically their speeds are the same, neither algorithm is consistently faster than the other, and in a computer experiment the algorithm used by gRbase is faster when the original graph is large, dense, and undirected, but usually slightly slower when it is directed.


Use of the circle segments visualization technique for neural network feature selection and analysis

Neurocomputing ◽

10.1016/j.neucom.2009.06.018 ◽

2010 ◽

Vol 73 (4-6) ◽

pp. 613-621 ◽

Cited By ~ 2

Author(s):

C.P. Lim ◽

S.L. Wang ◽

K.S. Tan ◽

J. Navarro ◽

L.C. Jain

Keyword(s):

Neural Network ◽

Feature Selection ◽

Visualization Technique ◽

Network Feature


corto: a lightweight R package for Gene Network Inference and Master Regulator Analysis

10.1101/2020.02.10.942623 ◽

2020 ◽

Cited By ~ 1

Author(s):

Daniele Mercatelli ◽

Gonzalo Lopez-Garcia ◽

Federico M. Giorgi

Keyword(s):

Gene Expression ◽

Gene Networks ◽

Gene Network ◽

Network Inference ◽

Human Tumor ◽

R Package ◽

Specific Gene ◽

Master Regulator ◽

Gene Network Inference ◽

Link Type

Motivation: Gene Network Inference and Master Regulator Analysis (MRA) have been widely adopted to define specific transcriptional perturbations from gene expression signatures. Several tools exist to perform such analyses, but most require a computer cluster or large amounts of RAM to be executed.

Results: We developed corto, a fast and lightweight R package to infer gene networks and perform MRA from gene expression data, with optional corrections for Copy Number Variations (CNVs) and able to run on signatures generated from RNA-Seq or ATAC-Seq data. We extensively benchmarked it to infer context-specific gene networks in 39 human tumor and 27 normal tissue datasets.

Availability: Cross-platform and multi-threaded R package on CRAN (stable version) https://cran.r-project.org/package=corto and GitHub (development release) https://github.com/federicogiorgi/[email protected]


M3Drop: dropout-based feature selection for scRNASeq

Bioinformatics ◽

10.1093/bioinformatics/bty1044 ◽

2018 ◽

Vol 35 (16) ◽

pp. 2865-2867 ◽

Cited By ~ 61

Author(s):

Tallulah S Andrews ◽

Martin Hemberg

Keyword(s):

Feature Selection ◽

Single Cell ◽

R Package ◽

Supplementary Information ◽

Supplementary Data ◽

Selection Methods ◽

Functional Responses ◽

Technical Noise ◽

New Methods ◽

Selection For

Motivation: Most genomes contain thousands of genes, but for most functional responses, only a subset of those genes are relevant. To facilitate many single-cell RNASeq (scRNASeq) analyses, the set of genes is often reduced through feature selection, i.e. by removing genes only subject to technical noise.

Results: We present M3Drop, an R package that implements popular existing feature selection methods and two novel methods which take advantage of the prevalence of zeros (dropouts) in scRNASeq data to identify features. We show these new methods outperform existing methods on simulated and real datasets.

Availability and implementation: M3Drop is freely available on GitHub as an R package and is compatible with other popular scRNASeq tools: https://github.com/tallulandrews/M3Drop.

Supplementary information: Supplementary data are available at Bioinformatics online.
