Generalization-Based Acquisition of Training Data for Motor Primitive Learning by Neural Networks

Autonomous robot learning in unstructured environments often faces the problem that the dimensionality of the search space is too large for practical applications. Dimensionality reduction techniques have been developed to address this problem and describe motor skills in low-dimensional latent spaces. Most of these techniques require a sufficiently large database of example task executions to compute the latent space. However, generating many example task executions on a real robot is tedious and prone to errors and equipment failures. The main result of this paper is a new approach for efficient database gathering: a small number of task executions are performed with a real robot, and statistical generalization, e.g., Gaussian process regression, is applied to generate more data. Our experiments show that the data generated this way can be used for dimensionality reduction with autoencoder neural networks. The resulting latent spaces can be exploited to implement robot learning more efficiently. The proposed approach has been evaluated on the problem of robotic throwing at a target. Simulation and real-world results with the humanoid robot TALOS are provided. They confirm the effectiveness of generalization-based database acquisition and the efficiency of learning in a low-dimensional latent space.
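
As a rough illustration of the generalization step described above, the sketch below fits Gaussian process regression to a handful of executions and predicts motion parameters for many new targets. The squared-exponential kernel, its hyperparameters, the one-dimensional target distance, and the toy data are all assumptions made for this example; the abstract does not specify them.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel between 1-D input arrays a and b."""
    diff = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (diff / length_scale) ** 2)

def gp_predict(x_train, y_train, x_query, noise=1e-4, length_scale=1.0):
    """Posterior mean of a zero-mean GP at x_query (y_train may have several columns)."""
    k_tt = rbf_kernel(x_train, x_train, length_scale) + noise * np.eye(len(x_train))
    k_qt = rbf_kernel(x_query, x_train, length_scale)
    # Solve K alpha = y rather than forming the inverse explicitly.
    alpha = np.linalg.solve(k_tt, y_train)
    return k_qt @ alpha

# Hypothetical data: 5 real executions, target distance -> 2 motion parameters.
x_real = np.array([1.0, 1.5, 2.0, 2.5, 3.0])
y_real = np.column_stack([np.sin(x_real), 0.5 * x_real])

# Generate a denser synthetic database, e.g. as autoencoder training data.
x_new = np.linspace(1.0, 3.0, 50)
y_new = gp_predict(x_real, y_real, x_new)
```

Here `y_new` holds 50 generated parameter vectors obtained from only 5 measured ones; in practice the kernel hyperparameters would be fitted to the data rather than fixed by hand.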

Extracting Low-Dimensional Latent Structure from Time Series in the Presence of Delays

Neural Computation ◽

10.1162/neco_a_00759 ◽

2015 ◽

Vol 27 (9) ◽

pp. 1825-1856 ◽

Cited By ~ 19

Author(s):

Karthik C. Lakshmanan ◽

Patrick T. Sadtler ◽

Elizabeth C. Tyler-Kabara ◽

Aaron P. Batista ◽

Byron M. Yu

Keyword(s):

Time Series ◽

Time Delay ◽

Motor Cortex ◽

Dimensionality Reduction ◽

Gaussian Process ◽

Latent Variables ◽

Time Delays ◽

High Dimensional ◽

Latent Space ◽

Low Dimensional

Noisy, high-dimensional time series observations can often be described by a set of low-dimensional latent variables. Commonly used methods to extract these latent variables typically assume instantaneous relationships between the latent and observed variables. In many physical systems, changes in the latent variables manifest as changes in the observed variables after time delays. Techniques that do not account for these delays can recover a larger number of latent variables than are present in the system, thereby making the latent representation more difficult to interpret. In this work, we introduce a novel probabilistic technique, time-delay gaussian-process factor analysis (TD-GPFA), that performs dimensionality reduction in the presence of a different time delay between each pair of latent and observed variables. We demonstrate how using a gaussian process to model the evolution of each latent variable allows us to tractably learn these delays over a continuous domain. Additionally, we show how TD-GPFA combines temporal smoothing and dimensionality reduction into a common probabilistic framework. We present an expectation/conditional maximization either (ECME) algorithm to learn the model parameters. Our simulations demonstrate that when time delays are present, TD-GPFA is able to correctly identify these delays and recover the latent space. We then applied TD-GPFA to the activity of tens of neurons recorded simultaneously in the macaque motor cortex during a reaching task. TD-GPFA is able to better describe the neural activity using a more parsimonious latent space than GPFA, a method that has been used to interpret motor cortex data but does not account for time delays. More broadly, TD-GPFA can help to unravel the mechanisms underlying high-dimensional time series data by taking into account physical delays in the system.

Deciphering protein evolution and fitness landscapes with latent space models

Nature Communications ◽

10.1038/s41467-019-13633-0 ◽

2019 ◽

Vol 10 (1) ◽

Cited By ~ 4

Author(s):

Xinqiang Ding ◽

Zhengting Zou ◽

Charles L. Brooks III

Keyword(s):

Protein Evolution ◽

Fitness Landscape ◽

Dimensional Space ◽

Gaussian Process Regression ◽

Protein Sequences ◽

Fitness Landscapes ◽

Space Representation ◽

Latent Space ◽

Low Dimensional ◽

Latent Space Models

Protein sequences contain rich information about protein evolution, fitness landscapes, and stability. Here we investigate how latent space models trained using variational auto-encoders can infer these properties from sequences. Using both simulated and real sequences, we show that the low dimensional latent space representation of sequences, calculated using the encoder model, captures both evolutionary and ancestral relationships between sequences. Together with experimental fitness data and Gaussian process regression, the latent space representation also enables learning the protein fitness landscape in a continuous low dimensional space. Moreover, the model is also useful in predicting protein mutational stability landscapes and quantifying the importance of stability in shaping protein evolution. Overall, we illustrate that the latent space models learned using variational auto-encoders provide a mechanism for exploration of the rich data contained in protein sequences regarding evolution, fitness and stability and hence are well-suited to help guide protein engineering efforts.

Learning Brain Dynamics with Coupled Low-Dimensional Nonlinear Oscillators and Deep Recurrent Networks

Neural Computation ◽

10.1162/neco_a_01401 ◽

2021 ◽

pp. 1-40

Author(s):

Germán Abrevaya ◽

Guillaume Dumas ◽

Aleksandr Y. Aravkin ◽

Peng Zheng ◽

Jean-Christophe Gagnon-Audet ◽

...

Keyword(s):

Neural Networks ◽

Dynamical Systems ◽

Brain Imaging ◽

Recurrent Neural Networks ◽

Practical Importance ◽

Parameters Estimation ◽

Training Data ◽

Autoregressive Models ◽

Unseen Data ◽

Low Dimensional

Many natural systems, especially biological ones, exhibit complex multivariate nonlinear dynamical behaviors that can be hard to capture by linear autoregressive models. On the other hand, generic nonlinear models such as deep recurrent neural networks often require large amounts of training data, not always available in domains such as brain imaging; also, they often lack interpretability. Domain knowledge about the types of dynamics typically observed in such systems, such as a certain type of dynamical systems models, could complement purely data-driven techniques by providing a good prior. In this work, we consider a class of ordinary differential equation (ODE) models known as van der Pol (VDP) oscillators and evaluate their ability to capture a low-dimensional representation of neural activity measured by different brain imaging modalities, such as calcium imaging (CaI) and fMRI, in different living organisms: larval zebrafish, rat, and human. We develop a novel and efficient approach to the nontrivial problem of parameter estimation for a network of coupled dynamical systems from multivariate data and demonstrate that the resulting VDP models are both accurate and interpretable, as VDP's coupling matrix reveals anatomically meaningful excitatory and inhibitory interactions across different brain subsystems. VDP outperforms linear autoregressive models (VAR) in terms of both the data fit accuracy and the quality of insight provided by the coupling matrices and often tends to generalize better to unseen data when predicting future brain activity, being comparable to and sometimes better than the recurrent neural networks (LSTMs). Finally, we demonstrate that our (generative) VDP model can also serve as a data-augmentation tool leading to marked improvements in predictive accuracy of recurrent neural networks. Thus, our work contributes to both basic and applied dimensions of neuroimaging: gaining scientific insights and improving brain-based predictive models, an area of potentially high practical importance in clinical diagnosis and neurotechnology.

Tool-Use Model to Reproduce the Goal Situations Considering Relationship Among Tools, Objects, Actions and Effects Using Multimodal Deep Neural Networks

Frontiers in Robotics and AI ◽

10.3389/frobt.2021.748716 ◽

2021 ◽

Vol 8 ◽

Author(s):

Namiko Saito ◽

Tetsuya Ogata ◽

Hiroki Mori ◽

Shingo Murata ◽

Shigeki Sugano

Keyword(s):

Neural Networks ◽

Real Time ◽

Tool Use ◽

Deep Neural Networks ◽

Joint Angle ◽

Image Force ◽

Training Data ◽

Task Goal ◽

Latent Space ◽

Target Effects

We propose a tool-use model that enables a robot to act toward a provided goal. It is important to consider the features of four factors (tools, objects, actions, and effects) at the same time because they are related to each other and one factor can influence the others. The tool-use model is constructed with deep neural networks (DNNs) using multimodal sensorimotor data: image, force, and joint angle information. To allow the robot to learn tool use, we collect training data by controlling the robot to perform various object operations using several tools with multiple actions that lead to different effects. The tool-use model is then trained; it learns sensorimotor coordination and acquires the relationships among tools, objects, actions, and effects in its latent space. We can give the robot a task goal by providing an image showing the target placement and orientation of the object. Using the goal image with the tool-use model, the robot detects the features of tools and objects and determines how to act to reproduce the target effects automatically. The robot then generates actions that adjust to the situation in real time, even when the tools and objects are unknown and more complicated than the trained ones.

Memory Model for Morphological Semantics of Visual Stimuli Using Sparse Distributed Representation

Applied Sciences ◽

10.3390/app112210786 ◽

2021 ◽

Vol 11 (22) ◽

pp. 10786

Author(s):

Kyuchang Kang ◽

Changseok Bae

Keyword(s):

Neural Networks ◽

Visual Stimuli ◽

Training Data ◽

Memory Model ◽

Distributed Representation ◽

Data Sets ◽

Memory Process ◽

Practical Applications ◽

Proposed Model ◽

Continual Learning

Recent advances in convolutional neural network (CNN) and deep neural network (DNN) research have produced many practical applications in computer vision. However, these approaches require the construction of very large training data sets for the learning process. This paper seeks a way to perform continual learning that does not require costly prior construction of training data, by imitating a biological memory model. We employ sparse distributed representation (SDR), which is known as a representation model of the firing patterns of neurons in the neocortex, for information processing and the semantic memory model. This paper proposes a novel memory model that reflects the remembrance of morphological semantics of visual input stimuli. The proposed memory model treats the memory process and the recall process separately. First, the memory process converts input visual stimuli to a sparse distributed representation while preserving the morphological semantics of the stimuli. Next, the recall process compares the sparse distributed representation of a new input visual stimulus with the remembered sparse distributed representations. Superposition of sparse distributed representations is used to measure similarity. Experimental results using 10,000 images from the MNIST (Modified National Institute of Standards and Technology) and Fashion-MNIST data sets show that the sparse distributed representation of the proposed model efficiently preserves the morphological semantics of the input visual stimuli.
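
To make the memory and recall processes more concrete, here is a minimal sketch in which an SDR is stored as the set of its active bit indices and recall picks the stored pattern sharing the most active bits with the query. The threshold encoder, the overlap count used as the similarity measure, and the tiny 3x3 "images" are stand-ins chosen for illustration; the abstract does not describe the paper's actual encoder or superposition scheme.

```python
def to_sdr(pixels, threshold=128):
    """Hypothetical encoder: indices of pixels brighter than the threshold."""
    return {i for i, value in enumerate(pixels) if value > threshold}

def overlap(sdr_a, sdr_b):
    """Number of active bits shared by two SDRs."""
    return len(sdr_a & sdr_b)

def recall(query, memory):
    """Return the label of the stored SDR with the largest overlap."""
    return max(memory, key=lambda label: overlap(query, memory[label]))

# Memory process: store one SDR per pattern (flattened 3x3 images).
memory = {
    "vertical": to_sdr([0, 255, 0, 0, 255, 0, 0, 255, 0]),
    "horizontal": to_sdr([0, 0, 0, 255, 255, 255, 0, 0, 0]),
}

# Recall process: a partial, dimmer vertical stroke.
query = to_sdr([0, 200, 0, 0, 200, 0, 0, 0, 0])
best = recall(query, memory)
```

Because similarity depends only on which bits are active, the partial stroke still matches the stored vertical pattern.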

CiwGAN and fiwGAN: Encoding information in acoustic data to model lexical learning with Generative Adversarial Networks

10.31234/osf.io/mwb5u ◽

2020 ◽

Author(s):

Gasper Begus

Keyword(s):

Neural Network ◽

Neural Networks ◽

Deep Neural Networks ◽

Training Data ◽

Human Speech ◽

Acoustic Data ◽

Unique Information ◽

Latent Space ◽

Lexical Learning ◽

Lexical Items

How can deep neural networks encode information that corresponds to words in human speech into raw acoustic data? This paper proposes two neural network architectures for modeling unsupervised lexical learning from raw acoustic inputs, ciwGAN (Categorical InfoWaveGAN) and fiwGAN (Featural InfoWaveGAN), that combine a Deep Convolutional GAN architecture for audio data (WaveGAN; Donahue et al., 2019) with an information-theoretic extension of GAN, InfoGAN (Chen et al., 2016), and proposes a new latent space structure that can model featural learning simultaneously with a higher level classification. In addition to the Generator and the Discriminator networks, the architectures introduce a network that learns to retrieve latent codes from generated audio outputs. Lexical learning is thus modeled as emergent from an architecture that forces a deep neural network to output data such that unique information is retrievable from its acoustic outputs. The networks trained on lexical items from TIMIT learn to encode unique information corresponding to lexical items in the form of categorical variables in their latent space. By manipulating these variables, the network outputs specific lexical items. The network occasionally outputs innovative lexical items that violate training data, but are linguistically interpretable and highly informative for cognitive modeling and neural network interpretability. Innovative outputs suggest that phonetic and phonological representations learned by the network can be productively recombined and directly paralleled to productivity in human speech: a fiwGAN network trained on "suit" and "dark" outputs the innovative "start", even though it never saw "start" or even a [st] sequence in the training data. We also argue that setting latent featural codes to values well beyond the training range results in almost categorical generation of prototypical lexical items and reveals underlying values of each latent code. Probing deep neural networks trained on well understood dependencies in speech bears implications for latent space interpretability, understanding how deep neural networks learn meaningful representations, as well as a potential for unsupervised text-to-speech generation in the GAN framework.

Handling Black Swan Events in Deep Learning with Diversely Extrapolated Neural Networks

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/296 ◽

2020 ◽

Author(s):

Maxime Wabartha ◽

Audrey Durand ◽

Vincent François-Lavet ◽

Joelle Pineau

Keyword(s):

Neural Networks ◽

Deep Learning ◽

Expressive Power ◽

Training Data ◽

Imitation Learning ◽

Learning Problems ◽

Black Swan ◽

Regression Problem ◽

Data Points ◽

Low Dimensional

By virtue of their expressive power, neural networks (NNs) are well suited to fitting large, complex datasets, yet they are also known to produce similar predictions for points outside the training distribution. As such, they are, like humans, under the influence of the Black Swan theory: models tend to be extremely "surprised" by rare events, leading to potentially disastrous consequences, while justifying these same events in hindsight. To avoid this pitfall, we introduce DENN, an ensemble approach building a set of Diversely Extrapolated Neural Networks that fits the training data and is able to generalize more diversely when extrapolating to novel data points. This leads DENN to output highly uncertain predictions for unexpected inputs. We achieve this by adding a diversity term in the loss function used to train the model, computed at specific inputs. We first illustrate the usefulness of the method on a low-dimensional regression problem. Then, we show how the loss can be adapted to tackle anomaly detection during classification, as well as safe imitation learning problems.
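
The idea of adding a diversity term to the loss can be sketched as follows: the ensemble is rewarded for fitting the training data and for disagreeing at chosen inputs away from it. The specific form used here (mean squared error minus a weighted variance of member predictions at probe inputs), the weight, and the toy numbers are assumptions for illustration, not the loss defined in the paper.

```python
from statistics import mean, pvariance

def ensemble_loss(train_preds, train_targets, probe_preds, diversity_weight=0.1):
    """Fit term minus a weighted diversity term.

    train_preds[m][i]: prediction of member m on training point i.
    probe_preds[m][j]: prediction of member m on probe input j
                       (inputs chosen away from the training data).
    """
    # Average squared error of each member on the training data.
    fit = mean(
        [mean([(p - t) ** 2 for p, t in zip(member, train_targets)])
         for member in train_preds]
    )
    # Variance across members at each probe input, averaged over inputs.
    diversity = mean([pvariance(column) for column in zip(*probe_preds)])
    return fit - diversity_weight * diversity

targets = [0.0, 1.0]
fits_perfectly = [[0.0, 1.0], [0.0, 1.0]]  # two members, same training fit

loss_same = ensemble_loss(fits_perfectly, targets, [[2.0], [2.0]])
loss_diverse = ensemble_loss(fits_perfectly, targets, [[1.0], [3.0]])
```

Both ensembles fit the training data equally well, but the one whose members disagree at the probe input gets the lower loss, which is what pushes the spread of predictions (and hence the reported uncertainty) up for unexpected inputs.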

Learning Low-Dimensional Embeddings of Audio Shingles for Cross-Version Retrieval of Classical Music

Applied Sciences ◽

10.3390/app10010019 ◽

2019 ◽

Vol 10 (1) ◽

pp. 19 ◽

Cited By ~ 1

Author(s):

Frank Zalkow ◽

Meinard Müller

Keyword(s):

Neural Networks ◽

Dimensionality Reduction ◽

Classical Music ◽

Nearest Neighbor ◽

Nearest Neighbor Search ◽

Neighbor Search ◽

Reduction Methods ◽

Retrieval Problem ◽

Low Dimensional ◽

Western Classical Music

Cross-version music retrieval aims at identifying all versions of a given piece of music using a short query audio fragment. One previous approach, which is particularly suited for Western classical music, is based on a nearest neighbor search using short sequences of chroma features, also referred to as audio shingles. From the viewpoint of efficiency, indexing and dimensionality reduction are important aspects. In this paper, we extend previous work by adapting two embedding techniques: one is based on classical principal component analysis, and the other is based on neural networks with triplet loss. Furthermore, we report on systematically conducted experiments with Western classical music recordings and discuss the trade-off between retrieval quality and embedding dimensionality. As one main result, we show that, using neural networks, one can reduce the audio shingles from 240 to fewer than 8 dimensions with only a moderate loss in retrieval accuracy. In addition, we present extended experiments with databases of different sizes and different query lengths to test the scalability and generalizability of the dimensionality reduction methods. We also provide a more detailed view into the retrieval problem by analyzing the distances that appear in the nearest neighbor search.
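
The PCA-based variant of this pipeline can be sketched in a few lines: project 240-dimensional shingles onto a few principal components, then run the nearest neighbor search in the reduced space. Random vectors stand in for real chroma shingles, and the choice of 6 components and the brute-force search are assumptions for illustration only.

```python
import numpy as np

def fit_pca(data, n_components):
    """Return the data mean and the top principal directions (rows)."""
    mean = data.mean(axis=0)
    _, _, vt = np.linalg.svd(data - mean, full_matrices=False)
    return mean, vt[:n_components]

def embed(data, mean, components):
    """Project centered data onto the principal directions."""
    return (data - mean) @ components.T

rng = np.random.default_rng(0)
shingles = rng.normal(size=(500, 240))  # placeholder for real audio shingles

mean, components = fit_pca(shingles, n_components=6)
database = embed(shingles, mean, components)

# Brute-force nearest neighbor search in the 6-dimensional embedding.
query = embed(shingles[42:43], mean, components)
nearest = int(np.argmin(np.linalg.norm(database - query, axis=1)))
```

With real shingles, the query would come from a different recording of the same piece, and retrieval quality would be traded off against the number of components kept.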

Fully Learnable Model for Task-Driven Image Compressed Sensing

Sensors ◽

10.3390/s21144662 ◽

2021 ◽

Vol 21 (14) ◽

pp. 4662

Author(s):

Bowen Zheng ◽

Jianping Zhang ◽

Guiling Sun ◽

Xiangnan Ren

Keyword(s):

Neural Networks ◽

Compressed Sensing ◽

Generative Adversarial Networks ◽

Measurement Matrix ◽

Dimensional Representation ◽

Image Sensing ◽

Adversarial Networks ◽

Latent Space ◽

Sensing Model ◽

Low Dimensional

This study primarily investigates image sensing at low sampling rates with convolutional neural networks (CNN) for specific applications. To improve the image acquisition efficiency in energy-limited systems, this study, inspired by compressed sensing, proposes a fully learnable model for task-driven image-compressed sensing (FLCS). The FLCS, based on Deep Convolution Generative Adversarial Networks (DCGAN) and Variational Auto-encoder (VAE), divides the image-compressed sensing model into three learnable parts: the Sampler, the Solver, and the Rebuilder. Specifically, a measurement matrix suitable for a type of image is obtained by training the Sampler. The Solver calculates the image's low-dimensional representation from the measurements. The Rebuilder learns a mapping from the low-dimensional latent space to the image space. All three parts can be trained jointly or individually for a range of application scenarios. The pre-trained FLCS reconstructs images with few iterations for task-driven compressed sensing. The experimental results indicate that, compared with existing approaches, the proposed method can significantly improve the quality of the reconstructed images while decreasing the running time. This study is of great significance for the application of image-compressed sensing at low sampling rates.

Whole Heart Segmentation Using 3D FM-Pre-ResNet Encoder–Decoder Based Architecture with Variational Autoencoder Regularization

Applied Sciences ◽

10.3390/app11093912 ◽

2021 ◽

Vol 11 (9) ◽

pp. 3912

Author(s):

Marija Habijan ◽

Irena Galić ◽

Hrvoje Leventić ◽

Krešimir Romić

Keyword(s):

Three Dimensional ◽

Disease Diagnosis ◽

Input Image ◽

Medical Image Segmentation ◽

Training Data ◽

Treatment Manual ◽

Latent Space ◽

Variational Autoencoder ◽

Low Dimensional ◽

Whole Heart

Accurate whole heart segmentation (WHS) on medical images, including computed tomography (CT) and magnetic resonance (MR) images, plays a crucial role in many clinical applications, such as cardiovascular disease diagnosis, pre-surgical planning, and intraoperative treatment. Manual whole-heart segmentation is a time-consuming process, prone to subjectivity and error. Therefore, there is a need to develop quick, automatic, and accurate whole heart segmentation systems. Convolutional neural networks (CNNs) have emerged as a robust approach for medical image segmentation. In this paper, we first introduce a novel connectivity structure of residual unit that we refer to as a feature merge residual unit (FM-Pre-ResNet). The proposed connectivity allows the creation of distinctly deep models without an increase in the number of parameters compared to the pre-activation residual units. Second, we propose a three-dimensional (3D) encoder–decoder based architecture that successfully incorporates FM-Pre-ResNet units and a variational autoencoder (VAE). In the encoding stage, FM-Pre-ResNet units are used for learning a low-dimensional representation of the input. After that, the VAE reconstructs the input image from the low-dimensional latent space to provide a strong regularization of all model weights, simultaneously preventing overfitting on the training data. Finally, the decoding stage creates the final whole heart segmentation. We evaluate our method on the 40 test subjects of the MICCAI Multi-Modality Whole Heart Segmentation (MM-WHS) Challenge. The average Dice values of whole heart segmentation are 90.39% (CT images) and 89.50% (MRI images), which are both highly comparable to the state of the art.
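
For reference, the Dice value reported above measures the overlap between a predicted and a reference segmentation as 2|A ∩ B| / (|A| + |B|). The sketch below computes it for two small sets of voxel coordinates; the voxel sets are made up, and the challenge's exact evaluation protocol (for example, how scores are averaged over substructures and subjects) is not covered here.

```python
def dice_coefficient(predicted, reference):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two sets of voxel coordinates."""
    if not predicted and not reference:
        return 1.0  # two empty masks agree perfectly by convention
    return 2 * len(predicted & reference) / (len(predicted) + len(reference))

# Hypothetical segmentations, each given as a set of (x, y, z) voxels.
predicted = {(0, 0, 0), (0, 0, 1), (0, 1, 1), (1, 1, 1)}
reference = {(0, 0, 1), (0, 1, 1), (1, 1, 1), (1, 1, 0)}

score = dice_coefficient(predicted, reference)  # 3 shared voxels -> 6 / 8
```

A score of 1.0 means the two masks coincide exactly and 0.0 means they share no voxels.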
