Multimodal Representation Learning for Place Recognition Using Deep Hebbian Predictive Coding

Frontiers in Robotics and AI ◽

10.3389/frobt.2021.732023 ◽

2021 ◽

Vol 8 ◽

Author(s):

Martin J. Pearson ◽

Shirin Dora ◽

Oliver Struckmeier ◽

Thomas C. Knowles ◽

Ben Mitchinson ◽

...

Keyword(s):

Recognition Performance ◽

Predictive Coding ◽

Learning Rule ◽

Representation Learning ◽

Drop Out ◽

Place Recognition ◽

Data Registration ◽

Learning Techniques ◽

Biomimetic Robot ◽

Latent Representations

Recognising familiar places is a competence required in many engineering applications that interact with the real world, such as robot navigation. Combining information from different sensory sources promotes robustness and accuracy of place recognition. However, mismatches in data registration, dimensionality, and timing between modalities remain challenging problems in multisensory place recognition. Spurious data generated by sensor drop-out in multisensory environments is particularly problematic and often resolved through ad hoc and brittle solutions. An effective approach to these problems is demonstrated by animals as they gracefully move through the world. Therefore, we take a neuro-ethological approach by adopting self-supervised representation learning based on a neuroscientific model of visual cortex known as predictive coding. We demonstrate how this parsimonious network algorithm, which is trained using a local learning rule, can be extended to combine visual and tactile sensory cues from a biomimetic robot as it naturally explores a visually aliased environment. The place recognition performance obtained using joint latent representations generated by the network is significantly better than that of contemporary representation learning techniques. Further, we see evidence of improved robustness at place recognition in the face of unimodal sensor drop-out. The proposed multimodal deep predictive coding algorithm is also linearly extensible to accommodate more than two sensory modalities, thereby providing an intriguing example of the value of neuro-biologically plausible representation learning for multimodal navigation.

Deep Gated Hebbian Predictive Coding Accounts for Emergence of Complex Neural Response Properties Along the Visual Cortical Hierarchy

Frontiers in Computational Neuroscience ◽

10.3389/fncom.2021.666131 ◽

2021 ◽

Vol 15 ◽

Author(s):

Shirin Dora ◽

Sander M. Bohte ◽

Cyriel M. A. Pennartz

Keyword(s):

Network Architecture ◽

Hebbian Learning ◽

Predictive Coding ◽

Learning Rule ◽

Neuronal Populations ◽

Hebbian Learning Rule ◽

Cortical Hierarchy ◽

Latent Representations ◽

High Level ◽

Visual Cortical

Predictive coding provides a computational paradigm for modeling perceptual processing as the construction of representations accounting for causes of sensory inputs. Here, we developed a scalable, deep network architecture for predictive coding that is trained using a gated Hebbian learning rule and mimics the feedforward and feedback connectivity of the cortex. After training on image datasets, the models formed latent representations in higher areas that allowed reconstruction of the original images. We analyzed low- and high-level properties such as orientation selectivity, object selectivity and sparseness of neuronal populations in the model. As reported experimentally, image selectivity increased systematically across ascending areas in the model hierarchy. Depending on the strength of regularization factors, sparseness also increased from lower to higher areas. The results suggest a rationale as to why experimental results on sparseness across the cortical hierarchy have been inconsistent. Finally, representations for different object classes became more distinguishable from lower to higher areas. Thus, deep neural networks trained using a gated Hebbian formulation of predictive coding can reproduce several properties associated with neuronal responses along the visual cortical hierarchy.
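
The core loop that the abstract above describes (infer a latent representation that reconstructs the input, then adjust weights with a local rule) can be illustrated with a minimal single-layer sketch. This is a generic predictive-coding toy, not the gated Hebbian rule or the deep architecture of the paper: the function names, learning rates, layer sizes, and the two toy input patterns are all assumptions chosen for illustration.

```python
import random


def predict(W, r):
    """Top-down prediction of the input from the latent state r."""
    return [sum(w_ij * r_j for w_ij, r_j in zip(row, r)) for row in W]


def pc_step(x, W, lr_r=0.1, lr_w=0.05, n_infer=20):
    """Infer a latent state for x, then apply one local weight update.

    Returns the updated weights, the inferred latent state, and the squared
    prediction error measured after inference (before the weight update).
    """
    n_lat = len(W[0])
    r = [0.0] * n_lat
    for _ in range(n_infer):
        err = [x_i - p_i for x_i, p_i in zip(x, predict(W, r))]
        # Each latent unit moves to reduce the error it helps to predict.
        r = [r[j] + lr_r * sum(W[i][j] * err[i] for i in range(len(x)))
             for j in range(n_lat)]
    err = [x_i - p_i for x_i, p_i in zip(x, predict(W, r))]
    # Hebbian-style rule (assumed form): each weight changes in proportion to
    # the product of the error unit and the latent unit that it connects.
    W = [[W[i][j] + lr_w * err[i] * r[j] for j in range(n_lat)]
         for i in range(len(x))]
    return W, r, sum(e * e for e in err)


def train(patterns, n_latent=2, epochs=200, seed=0):
    """Repeatedly present the patterns; record the summed error per epoch."""
    rng = random.Random(seed)
    W = [[rng.uniform(-0.1, 0.1) for _ in range(n_latent)]
         for _ in range(len(patterns[0]))]
    history = []
    for _ in range(epochs):
        total = 0.0
        for x in patterns:
            W, _, sq_err = pc_step(x, W)
            total += sq_err
        history.append(total)
    return W, history


# Hypothetical toy data: two orthogonal 4-dimensional input patterns.
W, history = train([[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]])
print(f"squared error: first epoch {history[0]:.3f}, last epoch {history[-1]:.3f}")
```

The weight update uses only quantities available at the two ends of each connection, which is what makes the rule local; a deep variant would stack such layers so that each layer's latent state becomes the input predicted by the layer above.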

Improving Visual Place Recognition Performance by Maximising Complementarity

IEEE Robotics and Automation Letters ◽

10.1109/lra.2021.3088779 ◽

2021 ◽

Vol 6 (3) ◽

pp. 5976-5983

Author(s):

Maria Waheed ◽

Michael Milford ◽

Klaus McDonald-Maier ◽

Shoaib Ehsan

Keyword(s):

Recognition Performance ◽

Place Recognition ◽

Visual Place Recognition

Representation Learning for Fine-Grained Change Detection

Sensors ◽

10.3390/s21134486 ◽

2021 ◽

Vol 21 (13) ◽

pp. 4486

Author(s):

Niall O’Mahony ◽

Sean Campbell ◽

Lenka Krpalkova ◽

Anderson Carvalho ◽

Joseph Walsh ◽

...

Keyword(s):

Deep Learning ◽

Change Detection ◽

Model Calibration ◽

State Of The Art ◽

Representation Learning ◽

Machine Intelligence ◽

The State ◽

Sensor Data ◽

Fine Grained ◽

Learning Techniques

Fine-grained change detection in sensor data is very challenging for artificial intelligence though it is critically important in practice. It is the process of identifying differences in the state of an object or phenomenon where the differences are class-specific and are difficult to generalise. As a result, many recent technologies that leverage big data and deep learning struggle with this task. This review focuses on the state-of-the-art methods, applications, and challenges of representation learning for fine-grained change detection. Our research focuses on methods of harnessing the latent metric space of representation learning techniques as an interim output for hybrid human-machine intelligence. We review methods for transforming and projecting embedding space such that significant changes can be communicated more effectively and a more comprehensive interpretation of underlying relationships in sensor data is facilitated. We conduct this research in our work towards developing a method for aligning the axes of latent embedding space with meaningful real-world metrics so that the reasoning behind the detection of change in relation to past observations may be revealed and adjusted. This is an important topic in many fields concerned with producing more meaningful and explainable outputs from deep learning and also for providing means for knowledge injection and model calibration in order to maintain user confidence.

Clustering-Based Relational Unsupervised Representation Learning with an Explicit Distributed Representation

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2017/226 ◽

2017 ◽

Cited By ~ 2

Author(s):

Sebastijan Dumancic ◽

Hendrik Blockeel

Keyword(s):

Relational Learning ◽

Representation Learning ◽

Relational Data ◽

Distributed Representation ◽

Learning Tasks ◽

Wide Range ◽

Lower Complexity ◽

Classification Tasks ◽

Latent Representations

The goal of unsupervised representation learning is to extract a new representation of data, such that solving many different tasks becomes easier. Existing methods typically focus on vectorized data and offer little support for relational data, which additionally describes relationships among instances. In this work we introduce an approach for relational unsupervised representation learning. Viewing a relational dataset as a hypergraph, new features are obtained by clustering vertices and hyperedges. To find a representation suited for many relational learning tasks, a wide range of similarities between relational objects is considered, e.g. feature and structural similarities. We experimentally evaluate the proposed approach and show that models learned on such latent representations perform better, have lower complexity, and outperform the existing approaches on classification tasks.

CityLearn: Diverse Real-World Environments for Sample-Efficient Navigation Policy Learning

10.36227/techrxiv.12063582.v1 ◽

2020 ◽

Author(s):

Marvin Chancán

Keyword(s):

Real World ◽

Autonomous Driving ◽

Visual Navigation ◽

Place Recognition ◽

Visual Appearance ◽

Learning Techniques ◽

Benchmark Datasets ◽

Image Representations ◽

Visual Place Recognition ◽

Self Motion

Visual navigation tasks in real-world environments often require both self-motion and place recognition feedback. While deep reinforcement learning has shown success in solving these perception and decision-making problems in an end-to-end manner, these algorithms require large amounts of experience to learn navigation policies from high-dimensional data, which is generally impractical for real robots due to sample complexity. In this paper, we address these problems with two main contributions. We first leverage place recognition and deep learning techniques combined with goal destination feedback to generate compact, bimodal image representations that can then be used to effectively learn control policies from a small amount of experience. Second, we present an interactive framework, CityLearn, that enables for the first time training and deployment of navigation algorithms across city-sized, realistic environments with extreme visual appearance changes. CityLearn features more than 10 benchmark datasets, often used in visual place recognition and autonomous driving research, including over 100 recorded traversals across 60 cities around the world. We evaluate our approach on two CityLearn environments, training our navigation policy on a single traversal. Results show our method can be over 2 orders of magnitude faster than when using raw images, and can also generalize across extreme visual changes including day to night and summer to winter transitions.

Exponential Family Graph Embeddings

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.5737 ◽

2020 ◽

Vol 34 (04) ◽

pp. 3357-3364

Author(s):

Abdulkadir Celikkanat ◽

Fragkiskos D. Malliaros

Keyword(s):

Random Walk ◽

Exponential Family ◽

Representation Learning ◽

Learning Problems ◽

Interaction Patterns ◽

Network Representation ◽

Learning Tasks ◽

Learning Techniques ◽

Real World Datasets ◽

Low Dimensional

Representing networks in a low dimensional latent space is a crucial task with many interesting applications in graph learning problems, such as link prediction and node classification. A widely applied network representation learning paradigm is based on the combination of random walks for sampling context nodes and the traditional Skip-Gram model to capture center-context node relationships. In this paper, we focus on exponential family distributions to capture rich interaction patterns between nodes in random walk sequences. We introduce the generic exponential family graph embedding model, which generalizes random walk-based network representation learning techniques to exponential family conditional distributions. We study three particular instances of this model, analyzing their properties and showing their relationship to existing unsupervised learning models. Our experimental evaluation on real-world datasets demonstrates that the proposed techniques outperform well-known baseline methods in two downstream machine learning tasks.
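
The baseline paradigm mentioned in the abstract above, random walks that sample context nodes for a Skip-Gram model, can be sketched as follows. The sketch covers only the sampling stage (walk generation and center-context pair extraction), not the exponential family model itself; the function names, the uniform transition rule, the window size, and the toy graph are assumptions made for illustration.

```python
import random


def random_walk(adjacency, start, length, rng):
    """Uniform random walk of up to `length` nodes starting at `start`."""
    walk = [start]
    while len(walk) < length:
        neighbours = adjacency[walk[-1]]
        if not neighbours:  # dead end: stop early
            break
        walk.append(rng.choice(neighbours))
    return walk


def center_context_pairs(walk, window):
    """(center, context) pairs for nodes at most `window` steps apart."""
    pairs = []
    for i, center in enumerate(walk):
        lo, hi = max(0, i - window), min(len(walk), i + window + 1)
        pairs.extend((center, walk[j]) for j in range(lo, hi) if j != i)
    return pairs


# Hypothetical toy graph: a path 0 - 1 - 2 - 3.
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
rng = random.Random(0)
walk = random_walk(adjacency, start=0, length=5, rng=rng)
pairs = center_context_pairs(walk, window=1)
print(walk, pairs[:4])
```

In a full pipeline these pairs would be the training examples for the embedding model, which scores how likely each context node is given its center node.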

Multi-Label Classification with Label Graph Superimposing

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i07.6909 ◽

2020 ◽

Vol 34 (07) ◽

pp. 12265-12272

Author(s):

Ya Wang ◽

Dongliang He ◽

Fu Li ◽

Xiang Long ◽

Zhichao Zhou ◽

...

Keyword(s):

Recognition Performance ◽

Rapid Development ◽

Feature Learning ◽

Representation Learning ◽

Learning Technologies ◽

Label Graph ◽

Multiple Objects ◽

Label System ◽

Deep Layers ◽

Label Correlations

Images or videos always contain multiple objects or actions. Multi-label recognition has achieved good performance owing to the rapid development of deep learning technologies. Recently, graph convolution networks (GCNs) have been leveraged to boost the performance of multi-label recognition. However, the best way to model label correlations, and how feature learning can be improved with label system awareness, are still unclear. In this paper, we propose a label graph superimposing framework to improve the conventional GCN+CNN framework developed for multi-label recognition in the following two aspects. Firstly, we model the label correlations by superimposing a label graph built from statistical co-occurrence information onto the graph constructed from knowledge priors of labels, and then multi-layer graph convolutions are applied on the final superimposed graph for label embedding abstraction. Secondly, we propose to leverage the embedding of the whole label system for better representation learning. In detail, lateral connections between the GCN and CNN are added at shallow, middle and deep layers to inject information about the label system into the backbone CNN for label-awareness in the feature learning process. Extensive experiments are carried out on the MS-COCO and Charades datasets, showing that our proposed solution can greatly improve recognition performance and achieves new state-of-the-art recognition performance.
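
The first step described in the abstract above, building a label graph from co-occurrence statistics and superimposing it with a graph derived from prior knowledge, might look roughly like the sketch below. The abstract does not state how the two graphs are combined, so the element-wise convex mix used here is purely an assumption, as are the function names, the `alpha` parameter, the toy annotations, and the toy prior graph.

```python
def cooccurrence_graph(label_sets, n_labels):
    """Row-normalised co-occurrence: entry [i][j] estimates P(label j | label i)."""
    counts = [[0] * n_labels for _ in range(n_labels)]
    for labels in label_sets:
        for i in labels:
            for j in labels:
                counts[i][j] += 1
    graph = []
    for i in range(n_labels):
        occurrences = counts[i][i]  # number of samples that contain label i
        graph.append([
            counts[i][j] / occurrences if occurrences and i != j else 0.0
            for j in range(n_labels)
        ])
    return graph


def superimpose(statistical, prior, alpha=0.5):
    """Assumed combination rule: element-wise convex mix of the two graphs."""
    return [[alpha * s + (1.0 - alpha) * p for s, p in zip(s_row, p_row)]
            for s_row, p_row in zip(statistical, prior)]


# Hypothetical annotations over 3 labels, and a hypothetical prior graph.
label_sets = [{0, 1}, {0, 1, 2}, {0}, {2}]
prior = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
statistical = cooccurrence_graph(label_sets, n_labels=3)
combined = superimpose(statistical, prior, alpha=0.5)
for row in combined:
    print([round(value, 3) for value in row])
```

The resulting matrix would serve as the adjacency matrix on which graph convolutions operate; other combination rules (for example, taking the element-wise maximum) would fit the same interface.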

A Cross-Correlated Delay Shift Supervised Learning Method for Spiking Neurons with Application to Interictal Spike Detection in Epilepsy

International Journal of Neural Systems ◽

10.1142/s0129065717500022 ◽

2017 ◽

Vol 27 (03) ◽

pp. 1750002 ◽

Cited By ~ 24

Author(s):

Lilin Guo ◽

Zhenzhong Wang ◽

Mercedes Cabrerizo ◽

Malek Adjouadi

Keyword(s):

Adaptive Learning ◽

Learning Algorithm ◽

Recognition Performance ◽

Learning Rule ◽

Classification Performance ◽

Spiking Neurons ◽

Learning Performance ◽

Learning Method ◽

Neuron Models ◽

Interictal Spike

This study introduces a novel learning algorithm for spiking neurons, called CCDS, which is able to learn and reproduce arbitrary spike patterns in a supervised fashion allowing the processing of spatiotemporal information encoded in the precise timing of spikes. Unlike the Remote Supervised Method (ReSuMe), synapse delays and axonal delays in CCDS are variants which are modulated together with weights during learning. The CCDS rule is both biologically plausible and computationally efficient. The properties of this learning rule are investigated extensively through experimental evaluations in terms of reliability, adaptive learning performance, generality to different neuron models, learning in the presence of noise, effects of its learning parameters and classification performance. Results presented show that the CCDS learning method achieves learning accuracy and learning speed comparable with ReSuMe, but improves classification accuracy when compared to both the Spike Pattern Association Neuron (SPAN) learning rule and the Tempotron learning rule. The merit of CCDS rule is further validated on a practical example involving the automated detection of interictal spikes in EEG records of patients with epilepsy. Results again show that with proper encoding, the CCDS rule achieves good recognition performance.

Deep Regressor: Cross Subject Academic Performance Prediction System for University Level Students

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.k1254.09811s19 ◽

2019 ◽

Vol 8 (11S) ◽

pp. 1265-1267

Keyword(s):

Machine Learning ◽

Academic Performance ◽

Evaluation Process ◽

Education Institution ◽

Drop Out ◽

Machine Learning Techniques ◽

Learning Techniques ◽

Current Program ◽

The Right ◽

The University

Predicting the academic performance of students has been an important research topic in the educational field. The main aim of a higher education institution is to provide quality education for students. One way to accomplish a higher level of quality of education is by predicting students' academic performance and thereby taking early remedial actions to improve it. This paper presents a system which utilizes machine learning techniques to classify and predict the academic performance of students at the right time, before drop out occurs. The system first accepts the performance parameters of the basic level courses which the student has already passed, as these parameters also influence further study. To predict the performance in the current program, the system continuously accepts the academic performance parameters after each academic evaluation process. The system employs machine learning techniques to study the academic performance of the students after each evaluation process. The system also learns the basic rules followed by the University for assessing the students. Based on the present performance of the students, the system classifies the students into different levels and identifies the students at high risk. Earlier prediction can help the students to adopt suitable measures in advance to improve their performance. The system can also identify the factors affecting the performance of these students, which helps them to take remedial measures in advance.

Spatial position constraint for unsupervised learning of speech representations

PeerJ Computer Science ◽

10.7717/peerj-cs.650 ◽

2021 ◽

Vol 7 ◽

pp. e650

Author(s):

Mohammad Ali Humayun ◽

Hayati Yassin ◽

Pg Emeroylariffion Abas

Keyword(s):

Speech Processing ◽

Supervised Classification ◽

Representation Learning ◽

Keyword Spotting ◽

Learning Techniques ◽

Language Analysis ◽

Proposed Model ◽

Unlabelled Data ◽

Cepstral Features ◽

Classification Tasks

The success of supervised learning techniques for automatic speech processing does not always extend to problems with limited annotated speech. Unsupervised representation learning aims at utilizing unlabelled data to learn a transformation that makes speech easily distinguishable for classification tasks, whereby deep auto-encoder variants have been most successful in finding such representations. This paper proposes a novel mechanism to incorporate geometric position of speech samples within the global structure of an unlabelled feature set. Regression to the geometric position is also added as an additional constraint for the representation learning auto-encoder. The representation learnt by the proposed model has been evaluated over a supervised classification task for limited vocabulary keyword spotting, with the proposed representation outperforming the commonly used cepstral features by about 9% in terms of classification accuracy, despite using a limited amount of labels during supervision. Furthermore, a small keyword dataset has been collected for Kadazan, an indigenous, low-resourced Southeast Asian language. Analysis for the Kadazan dataset also confirms the superiority of the proposed representation for limited annotation. The results are significant as they confirm that the proposed method can learn unsupervised speech representations effectively for classification tasks with scarce labelled data.
