Pre-training with non-expert human demonstration for deep reinforcement learning

Abstract Deep reinforcement learning (deep RL) has achieved superior performance in complex sequential tasks by using deep neural networks as function approximators to learn directly from raw input images. However, learning directly from raw images is data inefficient. The agent must learn feature representation of complex states in addition to learning a policy. As a result, deep RL typically suffers from slow learning speeds and often requires a prohibitively large amount of training time and data to reach reasonable performance, making it inapplicable to real-world settings where data are expensive. In this work, we improve data efficiency in deep RL by addressing one of the two learning goals, feature learning. We leverage supervised learning to pre-train on a small set of non-expert human demonstrations and empirically evaluate our approach using the asynchronous advantage actor-critic algorithms in the Atari domain. Our results show significant improvements in learning speed, even when the provided demonstration is noisy and of low quality.

Download Full-text

FnnmOS-ELM: A Flexible Neural Network Mixed Online Sequential Elm

Applied Sciences ◽

10.3390/app9183772 ◽

2019 ◽

Vol 9 (18) ◽

pp. 3772

Author(s):

Xiali Li ◽

Shuai He ◽

Junzhi Yu ◽

Licheng Wu ◽

Zhao Yue

Keyword(s):

Neural Network ◽

Network Performance ◽

Classification Performance ◽

Feature Representation ◽

Mixed Structure ◽

Training Time ◽

Learning Speed ◽

Learning Machine ◽

Fully Connected ◽

The Relationship

The learning speed of online sequential extreme learning machine (OS-ELM) algorithms is much higher than that of convolutional neural networks (CNNs) or recurrent neural network (RNNs) on regression and simple classification datasets. However, the general feature extraction of OS-ELM makes it difficult to conveniently and effectively perform classification on some large and complex datasets, e.g., CIFAR. In this paper, we propose a flexible OS-ELM-mixed neural network, termed as fnnmOS-ELM. In this mixed structure, the OS-ELM can replace a part of fully connected layers in CNNs or RNNs. Our framework not only exploits the strong feature representation of CNNs or RNNs, but also performs at a fast speed in terms of classification. Additionally, it avoids the problem of long training time and large parameter size of CNNs or RNNs to some extent. Further, we propose a method for optimizing network performance by splicing OS-ELM after CNN or RNN structures. Iris, IMDb, CIFAR-10, and CIFAR-100 datasets are employed to verify the performance of the fnnmOS-ELM. The relationship between hyper-parameters and the performance of the fnnmOS-ELM is explored, which sheds light on the optimization of network performance. Finally, the experimental results demonstrate that the fnnmOS-ELM has a stronger feature representation and higher classification performance than contemporary methods.

Download Full-text

S$$^{2}$$ES: a stationary and scalable knowledge transfer approach for multiagent reinforcement learning

Complex & Intelligent Systems ◽

10.1007/s40747-021-00423-9 ◽

2021 ◽

Author(s):

Tonghao Wang ◽

Xingguang Peng ◽

Demin Xu

Keyword(s):

Reinforcement Learning ◽

Knowledge Transfer ◽

Shannon Entropy ◽

Empirical Studies ◽

The Other ◽

Learning Pattern ◽

Learning Speed ◽

Multiagent Reinforcement Learning ◽

Experience Replay ◽

Data Efficiency

AbstractKnowledge transfer is widely adopted in accelerating multiagent reinforcement learning (MARL). To accelerate the learning speed of MARL for learning-from scratch agents, in this paper, we propose a Stationary and Scalable knowledge transfer approach based on Experience Sharing (S$$^{2}$$ 2 ES). The mainframe of our approach is structured into three components: what kind of experience, how to learn, and when to transfer. Specifically, we first design an augmented form of experience. By sharing (i.e., transmitting) the experience from one agent to its peers, the learning speed can be effectively enhanced with guaranteed scalability. A synchronized learning pattern is then adopted, which reduces the nonstationarity brought by experience replay, and at the same time retains data efficiency. Moreover, to avoid redundant transfer when the agents’ policies have converged, we further design two trigger conditions, one is modified Q value-based and another is normalized Shannon entropy-based, to determine when to conduct experience sharing. Empirical studies indicate that the proposed approach outperforms the other knowledge transfer methods in efficacy, efficiency, and scalability. We also provide ablation experiments to demonstrate the necessity of the key ingredients.

Download Full-text

A Gaussian Process Latent Variable Model for Subspace Clustering

Complexity ◽

10.1155/2021/8864981 ◽

2021 ◽

Vol 2021 ◽

pp. 1-7

Author(s):

Shangfang Li

Keyword(s):

Gaussian Process ◽

Latent Variable ◽

Clustering Algorithms ◽

Feature Learning ◽

Subspace Clustering ◽

Feature Representation ◽

Learning Ability ◽

Latent Variable Model ◽

Superior Performance ◽

Variable Model

Effective feature representation is the key to success of machine learning applications. Recently, many feature learning models have been proposed. Among these models, the Gaussian process latent variable model (GPLVM) for nonlinear feature learning has received much attention because of its superior performance. However, most of the existing GPLVMs are mainly designed for classification and regression tasks, thus cannot be used in data clustering task. To address this issue and extend the application scope, this paper proposes a novel GPLVM for clustering (C-GPLVM). Specifically, by combining GPLVM with the subspace clustering method, our C-GPLVM can obtain more representative latent variable for clustering. Moreover, it can directly predict the new samples by introducing a back constraint in the model, thus being more suitable for big data learning tasks such as analysis of chaotic time series and so on. In the experiment, we compare it with the related GPLVMs and clustering algorithms. The experimental results show that the proposed model not only inherits the feature learning ability of GPLVM but also has superior clustering accuracy.

Download Full-text

Integrating multi-network topology for gene function prediction using deep neural networks

Briefings in Bioinformatics ◽

10.1093/bib/bbaa036 ◽

2020 ◽

Cited By ~ 3

Author(s):

Jiajie Peng ◽

Hansheng Xue ◽

Zhongyu Wei ◽

Idil Tuncali ◽

Jianye Hao ◽

...

Keyword(s):

Gene Function ◽

Biological Networks ◽

Feature Learning ◽

Learning Task ◽

Function Prediction ◽

Feature Representation ◽

Superior Performance ◽

Gene Function Prediction ◽

Multiple Networks ◽

Low Dimensional

Abstract Motivation The emergence of abundant biological networks, which benefit from the development of advanced high-throughput techniques, contributes to describing and modeling complex internal interactions among biological entities such as genes and proteins. Multiple networks provide rich information for inferring the function of genes or proteins. To extract functional patterns of genes based on multiple heterogeneous networks, network embedding-based methods, aiming to capture non-linear and low-dimensional feature representation based on network biology, have recently achieved remarkable performance in gene function prediction. However, existing methods do not consider the shared information among different networks during the feature learning process. Results Taking the correlation among the networks into account, we design a novel semi-supervised autoencoder method to integrate multiple networks and generate a low-dimensional feature representation. Then we utilize a convolutional neural network based on the integrated feature embedding to annotate unlabeled gene functions. We test our method on both yeast and human datasets and compare with three state-of-the-art methods. The results demonstrate the superior performance of our method. We not only provide a comprehensive analysis of the performance of the newly proposed algorithm but also provide a tool for extracting features of genes based on multiple networks, which can be used in the downstream machine learning task. Availability DeepMNE-CNN is freely available at https://github.com/xuehansheng/DeepMNE-CNN Contact [email protected]; [email protected]; [email protected]

Download Full-text

Integrating multi-network topology for gene function prediction using deep neural networks

10.1101/532408 ◽

2019 ◽

Author(s):

Hansheng Xue ◽

Jiajie Peng ◽

Xuequn Shang

Keyword(s):

Neural Network ◽

Gene Function ◽

State Of The Art ◽

Feature Learning ◽

Function Prediction ◽

Feature Representation ◽

Superior Performance ◽

Gene Function Prediction ◽

Multiple Networks ◽

Low Dimensional

AbstractMotivationThe emerging of abundant biological networks, which benefit from the development of advanced high-throughput techniques, contribute to describing and modeling complex internal interactions among biological entities such as genes and proteins. Multiple networks provide rich information for inferring the function of genes or proteins. To extract functional patterns of genes based on multiple heterogeneous networks, network embedding-based methods, aiming to capture non-linear and low-dimensional feature representation based on network biology, have recently achieved remarkable performance in gene function prediction. However, existing methods mainly do not consider the shared information among different networks during the feature learning process. Thus, we propose a novel multi-networks embedding-based function prediction method based on semi-supervised autoencoder and feature convolution neural network, named DeepMNE-CNN, which captures complex topological structures of multi-networks and takes the correlation among multi-networks into account.ResultsWe design a novel semi-supervised autoencoder method to integrate multiple networks and generate a low-dimensional feature representation. Then we utilize a convolutional neural network based on the integrated feature embedding to annotate unlabeled gene functions. We test our method on both yeast and human dataset and compare with four state-of-the-art methods. The results demonstrate the superior performance of our method over four state-of-the-art algorithms. From the future explorations, we find that semi-supervised autoencoder based multi-networks integration method and CNN-based feature learning methods both contribute to the task of function prediction.AvailabilityDeepMNE-CNN is freely available at https://github.com/xuehansheng/DeepMNE-CNN

Download Full-text

GateRL: Automated Circuit Design Framework of CMOS Logic Gates Using Reinforcement Learning

Electronics ◽

10.3390/electronics10091032 ◽

2021 ◽

Vol 10 (9) ◽

pp. 1032

Author(s):

Hyoungsik Nam ◽

Young In Kim ◽

Jina Bae ◽

Junhee Lee

Keyword(s):

Reinforcement Learning ◽

Circuit Design ◽

Logic Gates ◽

Connection Matrix ◽

Design Framework ◽

Network Layers ◽

Learning Speed ◽

Fully Connected ◽

Automated Circuit Design ◽

Circuit Configuration

This paper proposes a GateRL that is an automated circuit design framework of CMOS logic gates based on reinforcement learning. Because there are constraints in the connection of circuit elements, the action masking scheme is employed. It also reduces the size of the action space leading to the improvement on the learning speed. The GateRL consists of an agent for the action and an environment for state, mask, and reward. State and reward are generated from a connection matrix that describes the current circuit configuration, and the mask is obtained from a masking matrix based on constraints and current connection matrix. The action is given rise to by the deep Q-network of 4 fully connected network layers in the agent. In particular, separate replay buffers are devised for success transitions and failure transitions to expedite the training process. The proposed network is trained with 2 inputs, 1 output, 2 NMOS transistors, and 2 PMOS transistors to design all the target logic gates, such as buffer, inverter, AND, OR, NAND, and NOR. Consequently, the GateRL outputs one-transistor buffer, two-transistor inverter, two-transistor AND, two-transistor OR, three-transistor NAND, and three-transistor NOR. The operations of these resultant logics are verified by the SPICE simulation.

Download Full-text

Optimising Performance for NB-IoT UE Devices through Data Driven Models

Journal of Sensor and Actuator Networks ◽

10.3390/jsan10010021 ◽

2021 ◽

Vol 10 (1) ◽

pp. 21

Author(s):

Omar Nassef ◽

Toktam Mahmoodi ◽

Foivos Michelinakis ◽

Kashif Mahmood ◽

Ahmed Elmokashfi

Keyword(s):

Neural Network ◽

Reinforcement Learning ◽

Gradient Descent ◽

Deep Neural Network ◽

Narrow Band ◽

Learning Algorithm ◽

Base Station ◽

User Equipment ◽

Data Driven ◽

Superior Performance

This paper presents a data driven framework for performance optimisation of Narrow-Band IoT user equipment. The proposed framework is an edge micro-service that suggests one-time configurations to user equipment communicating with a base station. Suggested configurations are delivered from a Configuration Advocate, to improve energy consumption, delay, throughput or a combination of those metrics, depending on the user-end device and the application. Reinforcement learning utilising gradient descent and genetic algorithm is adopted synchronously with machine and deep learning algorithms to predict the environmental states and suggest an optimal configuration. The results highlight the adaptability of the Deep Neural Network in the prediction of intermediary environmental states, additionally the results present superior performance of the genetic reinforcement learning algorithm regarding its performance optimisation.

Download Full-text

Intelligent Ramp Control for Incident Response Using Dyna-QArchitecture

Mathematical Problems in Engineering ◽

10.1155/2015/896943 ◽

2015 ◽

Vol 2015 ◽

pp. 1-16

Author(s):

Chao Lu ◽

Yanan Zhao ◽

Jianwei Gong

Keyword(s):

Reinforcement Learning ◽

Travel Time ◽

Single Agent ◽

Superior Performance ◽

Model Free ◽

Road Users ◽

Total Travel Time ◽

The Uk ◽

Traffic Operation ◽

Ramp Control

Reinforcement learning (RL) has shown great potential for motorway ramp control, especially under the congestion caused by incidents. However, existing applications limited to single-agent tasks and based onQ-learning have inherent drawbacks for dealing with coordinated ramp control problems. For solving these problems, a Dyna-Qbased multiagent reinforcement learning (MARL) system named Dyna-MARL has been developed in this paper. Dyna-Qis an extension ofQ-learning, which combines model-free and model-based methods to obtain benefits from both sides. The performance of Dyna-MARL is tested in a simulated motorway segment in the UK with the real traffic data collected from AM peak hours. The test results compared with Isolated RL and noncontrolled situations show that Dyna-MARL can achieve a superior performance on improving the traffic operation with respect to increasing total throughput, reducing total travel time and CO2emission. Moreover, with a suitable coordination strategy, Dyna-MARL can maintain a highly equitable motorway system by balancing the travel time of road users from different on-ramps.

Download Full-text

mAHTPred: a sequence-based meta-predictor for improving the prediction of anti-hypertensive peptides using effective feature representation

Bioinformatics ◽

10.1093/bioinformatics/bty1047 ◽

2018 ◽

Vol 35 (16) ◽

pp. 2757-2765 ◽

Cited By ~ 63

Author(s):

Balachandran Manavalan ◽

Shaherin Basith ◽

Tae Hwan Shin ◽

Leyi Wei ◽

Gwang Lee

Keyword(s):

Nearest Neighbor ◽

Feature Representation ◽

Superior Performance ◽

Supplementary Information ◽

Gradient Boosting ◽

Support Vector ◽

Pharmaceutical Drugs ◽

K Nearest Neighbor ◽

Feature Descriptors ◽

Predicted Probability

AbstractMotivationCardiovascular disease is the primary cause of death globally accounting for approximately 17.7 million deaths per year. One of the stakes linked with cardiovascular diseases and other complications is hypertension. Naturally derived bioactive peptides with antihypertensive activities serve as promising alternatives to pharmaceutical drugs. So far, there is no comprehensive analysis, assessment of diverse features and implementation of various machine-learning (ML) algorithms applied for antihypertensive peptide (AHTP) model construction.ResultsIn this study, we utilized six different ML algorithms, namely, Adaboost, extremely randomized tree (ERT), gradient boosting (GB), k-nearest neighbor, random forest (RF) and support vector machine (SVM) using 51 feature descriptors derived from eight different feature encodings for the prediction of AHTPs. While ERT-based trained models performed consistently better than other algorithms regardless of various feature descriptors, we treated them as baseline predictors, whose predicted probability of AHTPs was further used as input features separately for four different ML-algorithms (ERT, GB, RF and SVM) and developed their corresponding meta-predictors using a two-step feature selection protocol. Subsequently, the integration of four meta-predictors through an ensemble learning approach improved the balanced prediction performance and model robustness on the independent dataset. Upon comparison with existing methods, mAHTPred showed superior performance with an overall improvement of approximately 6–7% in both benchmarking and independent datasets.Availability and implementationThe user-friendly online prediction tool, mAHTPred is freely accessible at http://thegleelab.org/mAHTPred.Supplementary informationSupplementary data are available at Bioinformatics online.

Download Full-text

Disentangled Feature Learning Network for Vehicle Re-Identification

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/66 ◽

2020 ◽

Author(s):

Yan Bai ◽

Yihang Lou ◽

Yongxing Dai ◽

Jun Liu ◽

Ziqian Chen ◽

...

Keyword(s):

State Of The Art ◽

Feature Learning ◽

Feature Representation ◽

Public Security ◽

The Public ◽

Common Features ◽

Learning Network ◽

Single Feature ◽

Art Performance

Vehicle Re-Identification (ReID) has attracted lots of research efforts due to its great significance to the public security. In vehicle ReID, we aim to learn features that are powerful in discriminating subtle differences between vehicles which are visually similar, and also robust against different orientations of the same vehicle. However, these two characteristics are hard to be encapsulated into a single feature representation simultaneously with unified supervision. Here we propose a Disentangled Feature Learning Network (DFLNet) to learn orientation specific and common features concurrently, which are discriminative at details and invariant to orientations, respectively. Moreover, to effectively use these two types of features for ReID, we further design a feature metric alignment scheme to ensure the consistency of the metric scales. The experiments show the effectiveness of our method that achieves state-of-the-art performance on three challenging datasets.

Download Full-text