Random Forests with Latent Variables to Foster Feature Selection in the Context of Highly Correlated Variables. Illustration with a Bioinformatics Application.

Author(s):  
Christine Sinoquet ◽  
Kamel Mekhnacha

2021 ◽
Author(s):  
Zhuo Wang ◽  
Huan Li ◽  
Bin Nie ◽  
Jianqiang Du ◽  
Yuwen Du ◽  
...  

2011 ◽  
Vol 12 (1) ◽  
Author(s):  
Shengqiao Li ◽  
E James Harner ◽  
Donald A Adjeroh

2019 ◽  
Vol 31 (2) ◽  
pp. 233-269 ◽  
Author(s):  
Christophe Gardella ◽  
Olivier Marre ◽  
Thierry Mora

The principles of neural encoding and computations are inherently collective and usually involve large populations of interacting neurons with highly correlated activities. While theories of neural function have long recognized the importance of collective effects in populations of neurons, only in the past two decades has it become possible to record from many cells simultaneously using advanced experimental techniques with single-spike resolution and to relate these correlations to function and behavior. This review focuses on the modeling and inference approaches that have been recently developed to describe the correlated spiking activity of populations of neurons. We cover a variety of models describing correlations between pairs of neurons, as well as between larger groups, synchronous or delayed in time, with or without the explicit influence of the stimulus, and including or not latent variables. We discuss the advantages and drawbacks of each method, as well as the computational challenges related to their application to recordings of ever larger populations.
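To give a flavor of the collective effects the review is concerned with, the toy sketch below (an illustrative assumption, not one of the models covered by the review) simulates a population whose firing is driven by a shared latent "up" state, measures the resulting pairwise correlations, and compares the observed probability of population silence with the prediction of an independent-neuron model with matched firing rates. All parameter values and variable names are invented for the example.

```python
# Toy sketch: pairwise correlations and a simple collective effect in binned spike trains.
# Assumption-laden illustration only; none of the reviewed models (maximum entropy,
# latent-variable, stimulus-dependent) are implemented here.
import numpy as np

rng = np.random.default_rng(0)
n_neurons, n_bins = 20, 50_000

# A shared latent state raises every neuron's firing probability in "up" bins,
# which induces positive correlations across the whole population.
latent = rng.random(n_bins) < 0.3              # population-wide "up" state
p_spike = np.where(latent, 0.15, 0.03)         # per-bin spiking probability
spikes = (rng.random((n_neurons, n_bins)) < p_spike).astype(int)

# Pairwise Pearson correlations between the binary spike trains.
corr = np.corrcoef(spikes)
off_diag = corr[~np.eye(n_neurons, dtype=bool)]
print(f"mean pairwise correlation: {off_diag.mean():.3f}")

# Collective effect: probability that the whole population is silent, observed
# versus the prediction of an independent-neuron model with the same mean rates.
pop_count = spikes.sum(axis=0)                 # population spike count per bin
rates = spikes.mean(axis=1)                    # per-neuron firing rate
print(f"P(all silent) observed:    {np.mean(pop_count == 0):.3f}")
print(f"P(all silent) independent: {np.prod(1 - rates):.3f}")
```

The gap between the observed and independent-model silence probabilities is the kind of structure that the pairwise and latent-variable models surveyed in the review are designed to capture.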


Electronics ◽  
2020 ◽  
Vol 9 (5) ◽  
pp. 761
Author(s):  
Franc Drobnič ◽  
Andrej Kos ◽  
Matevž Pustišek

In the field of machine learning, a considerable amount of research is devoted to the interpretability of models and their decisions. Interpretability tends to conflict with model quality: Random Forests are among the best-performing machine learning techniques, but they operate as a “black box”. Among the quantifiable approaches to model interpretation are measures of association between predictors and the response; in the case of Random Forests, this usually means calculating the model’s feature importances. Known methods, including the built-in one, are less suitable in settings with strong multicollinearity among features. Therefore, we propose an experimental approach to the feature selection task: a greedy forward feature selection method with a least-trees-used criterion. It yields a set of the most informative features that can be used in a machine learning (ML) training process with prediction quality similar to that of the original feature set. We verify the results of the proposed method on two known datasets, one with weak feature multicollinearity and one with strong feature multicollinearity. The proposed method also allows a domain expert to help select among equally important features, an approach known as the human-in-the-loop approach.
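For readers unfamiliar with forward selection, the sketch below illustrates the general greedy loop the abstract refers to. It is not the authors' least-trees-used criterion, which is not reproduced here; it simply adds, at each step, the candidate feature that most improves cross-validated Random Forest accuracy and stops when no candidate helps. The dataset, hyperparameters, and stopping tolerance are illustrative assumptions.

```python
# Sketch of generic greedy forward feature selection with a Random Forest.
# Illustrative only: the selection criterion here is cross-validated accuracy,
# not the paper's least-trees-used criterion.
import numpy as np
from sklearn.datasets import load_breast_cancer     # stand-in dataset (assumption)
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
selected = []                                        # indices of chosen features
remaining = set(range(X.shape[1]))
best_score = 0.0

while remaining:
    # Score each candidate feature appended to the current selection.
    scores = {}
    for j in remaining:
        cols = selected + [j]
        rf = RandomForestClassifier(n_estimators=50, random_state=0, n_jobs=-1)
        scores[j] = cross_val_score(rf, X[:, cols], y, cv=5).mean()
    j_best = max(scores, key=scores.get)
    # Stop when adding another feature no longer improves CV accuracy.
    if scores[j_best] <= best_score + 1e-4:
        break
    selected.append(j_best)
    remaining.remove(j_best)
    best_score = scores[j_best]

print("selected feature indices:", selected)
print("cross-validated accuracy:", round(best_score, 3))
```

In a strongly multicollinear setting, several candidates may score nearly identically at a given step; that tie is exactly where a domain expert could choose among equally informative features, which is the human-in-the-loop element mentioned in the abstract.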

