An Intelligent Optimization Strategy Based on Deep Reinforcement Learning for Step Counting

2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Zhoubao Sun ◽  
Pengfei Chen ◽  
Xiaodong Zhang

With the popularity of Internet of Things technology and intelligent devices, accurate step counting has attracted more and more attention. To address two problems with existing algorithms, namely that they filter noise with fixed thresholds and that their parameters cannot be updated in time, an intelligent optimization strategy based on deep reinforcement learning is proposed. In this study, the counting problem is transformed into a sequential decision optimization, and noise recognition is integrated with user feedback to update the parameters. The end-to-end processing is direct: it alleviates the step-counting inaccuracy that two-stage processing inherits from inaccurate noise filtering, and it keeps the model parameters continuously updated. Finally, experimental results show that the proposed model achieves superior performance to existing approaches.
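
As a minimal sketch of the idea (not the authors' model), the toy Python below frames step counting as a sequential decision problem: a tabular Q-learning agent scans a synthetic accelerometer signal and decides, per sample, whether to count a step, with a reward standing in for the user feedback that keeps the parameters updated. The signal generator, discretization, and all constants are illustrative assumptions.

```python
# Hedged sketch: step counting as sequential decision-making via
# tabular Q-learning on a synthetic signal. Everything here is a
# stand-in for the paper's deep RL model.
import numpy as np

rng = np.random.default_rng(0)

def make_signal(n_steps=50, noise=0.3):
    """Synthetic magnitude signal: one peak per true step, plus noise."""
    sig, labels = [], []
    for _ in range(n_steps):
        sig += [1.0 + rng.normal(0, 0.1), 0.2, 0.2]   # peak, then rest
        labels += [1, 0, 0]
    return np.array(sig) + rng.normal(0, noise, len(sig)), np.array(labels)

def discretize(x, bins=np.linspace(-0.5, 1.5, 8)):
    return int(np.digitize(x, bins))                  # amplitude bin = state

signal, labels = make_signal()
Q = np.zeros((10, 2))              # actions: 0 = skip (noise), 1 = count step
alpha, gamma, eps = 0.2, 0.9, 0.1

for episode in range(200):
    for t in range(len(signal) - 1):
        s = discretize(signal[t])
        a = rng.integers(2) if rng.random() < eps else int(Q[s].argmax())
        # Reward stands in for user feedback: +1 for a correct decision.
        r = 1.0 if a == labels[t] else -1.0
        s2 = discretize(signal[t + 1])
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])

counted = sum(int(Q[discretize(x)].argmax()) for x in signal)
print(f"true steps: {labels.sum()}, counted: {counted}")
```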

2011 ◽  
Vol 2011 ◽  
pp. 1-12 ◽  
Author(s):  
Karim El-Laithy ◽  
Martin Bogdan

An integration of Hebbian-based and reinforcement learning (RL) rules is presented for dynamic synapses. The proposed framework permits the Hebbian rule to update the hidden synaptic model parameters regulating the synaptic response, rather than the synaptic weights. This is performed using both the value and the sign of the temporal difference in the reward signal after each trial. Applying this framework, a spiking network with spike-timing-dependent synapses is tested on learning the exclusive-OR computation on a temporally coded basis. Reward values are calculated from the distance between the output spike train of the network and a reference target train. Results show that the network is able to capture the required dynamics and that the proposed framework can indeed realise an integrated version of Hebbian learning and RL. The framework is tractable, computationally inexpensive, applicable to a wide class of synaptic models, and not restricted to the neural representation used here. This generality, along with the reported results, supports adopting the introduced approach to exploit biologically plausible synaptic models in a wide range of signal-processing applications.
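
The following toy sketch illustrates the flavour of such a rule under loudly stated assumptions: a Tsodyks-Markram-style hidden utilisation parameter U (not a weight) is nudged by a Hebbian co-activity term, gated by the value and sign of the trial-to-trial difference in reward. The simulation stand-in, the fictitious optimum at U = 0.7, and all constants are assumptions, not the paper's model.

```python
# Hedged sketch: Hebbian-gated, reward-difference-driven update of a
# hidden synaptic parameter U. run_trial() is a stand-in for the
# spiking simulation; its reward peaks at an arbitrary optimum.
import numpy as np

rng = np.random.default_rng(1)
U = 0.3                                   # hidden parameter (not a weight)

def run_trial(u):
    """Stand-in for the spiking network trial: Hebbian co-activity
    sample plus a distance-based reward peaking at u = 0.7."""
    pre, post = rng.random() < 0.8, rng.random() < 0.8
    return float(pre and post), 1.0 - abs(u - 0.7)

for trial in range(150):
    dU = 0.05 * rng.choice([-1.0, 1.0])            # candidate change
    hebb, r_old = run_trial(U)                     # reward before change
    _, r_new = run_trial(np.clip(U + dU, 0.05, 0.95))
    td = r_new - r_old                             # TD of the reward signal
    # Hebbian co-activity gates the update; sign(td) and |td| set its
    # direction and magnitude, consolidating reward-improving changes.
    U = np.clip(U + np.sign(td) * abs(td) * (dU / 0.05) * hebb, 0.05, 0.95)

print(f"learned U ≈ {U:.2f}")
```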


2021 ◽  
Vol 10 (1) ◽  
pp. 21
Author(s):  
Omar Nassef ◽  
Toktam Mahmoodi ◽  
Foivos Michelinakis ◽  
Kashif Mahmood ◽  
Ahmed Elmokashfi

This paper presents a data-driven framework for performance optimisation of Narrow-Band IoT user equipment. The proposed framework is an edge micro-service that suggests one-time configurations to user equipment communicating with a base station. Suggested configurations are delivered from a Configuration Advocate to improve energy consumption, delay, throughput, or a combination of those metrics, depending on the user-end device and the application. Reinforcement learning utilising gradient descent and a genetic algorithm is adopted synchronously with machine and deep learning algorithms to predict the environmental states and suggest an optimal configuration. The results highlight the adaptability of the deep neural network in predicting intermediary environmental states, and they show the superior performance of the genetic reinforcement learning algorithm with respect to performance optimisation.
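
A hedged sketch of the configuration-search ingredient: a small genetic algorithm over two hypothetical NB-IoT parameters (transmit power and a PSM timer), scored by a stand-in reward trading off energy, delay, and throughput. In the paper a learned model predicts the environmental state; here a fixed function plays that role, and all parameter names and ranges are assumptions.

```python
# Hedged sketch: genetic search over hypothetical NB-IoT configurations.
# reward() is a fixed stand-in for the learned state-prediction models.
import random

random.seed(42)
TX_POWER = range(0, 24)        # dBm, hypothetical range
PSM_TIMER = range(1, 61)       # minutes, hypothetical range

def reward(cfg):
    tx, psm = cfg
    energy = 0.04 * tx + 0.01 * psm       # stand-in predicted energy cost
    delay = 2.0 / (1 + tx) + 0.05 * psm   # stand-in predicted delay
    throughput = 0.1 * tx                 # stand-in predicted throughput
    return throughput - energy - delay

def mutate(cfg):
    tx, psm = cfg
    return (min(max(tx + random.choice([-1, 0, 1]), 0), 23),
            min(max(psm + random.choice([-5, 0, 5]), 1), 60))

pop = [(random.choice(TX_POWER), random.choice(PSM_TIMER)) for _ in range(20)]
for gen in range(30):
    pop.sort(key=reward, reverse=True)    # rank configurations by reward
    elite = pop[:5]                       # keep the best configurations
    pop = elite + [mutate(random.choice(elite)) for _ in range(15)]

print("suggested configuration (tx_power_dBm, psm_timer_min):", pop[0])
```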


2015 ◽  
Vol 2015 ◽  
pp. 1-16
Author(s):  
Chao Lu ◽  
Yanan Zhao ◽  
Jianwei Gong

Reinforcement learning (RL) has shown great potential for motorway ramp control, especially under congestion caused by incidents. However, existing applications are limited to single-agent tasks, and those based on Q-learning have inherent drawbacks for dealing with coordinated ramp control problems. To solve these problems, a Dyna-Q-based multiagent reinforcement learning (MARL) system named Dyna-MARL has been developed in this paper. Dyna-Q is an extension of Q-learning that combines model-free and model-based methods to obtain the benefits of both. The performance of Dyna-MARL is tested on a simulated motorway segment in the UK with real traffic data collected during AM peak hours. Test results compared with isolated RL and non-controlled situations show that Dyna-MARL achieves superior performance in improving traffic operation with respect to increasing total throughput and reducing total travel time and CO2 emissions. Moreover, with a suitable coordination strategy, Dyna-MARL can maintain a highly equitable motorway system by balancing the travel time of road users from different on-ramps.
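
Since Dyna-Q underpins the system, a minimal single-agent Dyna-Q loop may help: direct Q-learning from real experience plus n planning updates replayed from a learned model. The toy chain environment below stands in for the motorway segment; Dyna-MARL would run one such learner per ramp agent, with coordination on top.

```python
# Minimal Dyna-Q sketch on a toy chain world (a stand-in environment,
# not the motorway simulation): model-free updates from real steps,
# plus model-based planning updates replayed from learned transitions.
import random

random.seed(0)
N_STATES, GOAL = 6, 5
Q = [[0.0, 0.0] for _ in range(N_STATES)]    # actions: 0 = left, 1 = right
model = {}                                   # (s, a) -> (reward, next state)
alpha, gamma, eps, n_planning = 0.1, 0.95, 0.1, 10

def step(s, a):
    s2 = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
    return (1.0 if s2 == GOAL else 0.0), s2

def greedy(s):
    if Q[s][0] == Q[s][1]:
        return random.randrange(2)           # break ties randomly
    return 0 if Q[s][0] > Q[s][1] else 1

for episode in range(30):
    s = 0
    while s != GOAL:
        a = random.randrange(2) if random.random() < eps else greedy(s)
        r, s2 = step(s, a)
        # Direct RL: one-step Q-learning update from real experience.
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        model[(s, a)] = (r, s2)              # learn a deterministic model
        # Planning: n extra updates from experience replayed via the model.
        for _ in range(n_planning):
            (ps, pa), (pr, ps2) = random.choice(list(model.items()))
            Q[ps][pa] += alpha * (pr + gamma * max(Q[ps2]) - Q[ps][pa])
        s = s2

print("greedy policy (0=left, 1=right):", [greedy(s) for s in range(N_STATES)])
```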


2020 ◽  
Vol 34 (04) ◽  
pp. 3641-3648 ◽  
Author(s):  
Eli Chien ◽  
Antonia Tulino ◽  
Jaime Llorca

The geometric block model is a recently proposed generative model for random graphs that captures the inherent geometric properties of many community detection problems, providing more accurate characterizations of practical community structures than the popular stochastic block model. Galhotra et al. recently proposed a motif-counting algorithm for unsupervised community detection in the geometric block model that is provably near-optimal, and they characterized the regimes of the model parameters in which it achieves exact recovery. In this work, we initiate the study of active learning in the geometric block model. That is, we are interested in exactly recovering the community structure of random graphs following the geometric block model under arbitrary model parameters, possibly by querying the labels of a limited number of chosen nodes. We propose two active learning algorithms that combine motif-counting with two different label query policies. Our main contribution is to show that sampling the labels of a vanishingly small fraction of nodes (sub-linear in the total number of nodes) suffices for exact recovery in the regimes where the state-of-the-art unsupervised method fails. We validate the superior performance of our algorithms via numerical simulations on both real and synthetic datasets.
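
A toy sketch of the two ingredients being combined, under loudly labelled assumptions: a motif statistic (common-neighbour counts) decides whether two nodes likely share a community, and a sub-linear budget of label queries anchors the communities. The graph model, threshold, and fallback rule below are stand-ins, not the paper's geometric construction or query policies.

```python
# Hedged sketch: motif counting + a small label-query budget on a toy
# two-community graph (SBM-style stand-in, not the geometric model).
import itertools, random

random.seed(3)
n = 60
truth = [i % 2 for i in range(n)]                 # hidden community labels
adj = {i: set() for i in range(n)}
for i, j in itertools.combinations(range(n), 2):
    p = 0.5 if truth[i] == truth[j] else 0.08     # toy edge probabilities
    if random.random() < p:
        adj[i].add(j); adj[j].add(i)

def same_community(i, j, thresh=5):
    """Motif count: many common neighbours suggests same community."""
    return len(adj[i] & adj[j]) >= thresh

queried = random.sample(range(n), 6)              # sub-linear label budget
pred = {q: truth[q] for q in queried}             # oracle answers queries
for v in range(n):
    if v in pred:
        continue
    votes = [pred[q] for q in queried if same_community(v, q)]
    # Majority vote among anchored nodes; arbitrary fallback if no votes.
    pred[v] = max(set(votes), key=votes.count) if votes else 1 - pred[queried[0]]

acc = sum(pred[v] == truth[v] for v in range(n)) / n
print(f"recovery accuracy: {acc:.2f}")
```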


2018 ◽  
Vol 10 (11) ◽  
pp. 108 ◽  
Author(s):  
Eirini Tsiropoulou ◽  
George Kousis ◽  
Athina Thanou ◽  
Ioanna Lykourentzou ◽  
Symeon Papavassiliou

This paper addresses the problem of optimizing museum visitors’ Quality of Experience (QoE) by viewing and treating the museum environment as a cyber-physical social system. To achieve this goal, we harness visitors’ innate ability to intelligently sense their environment and make choices that improve their QoE: which museum touring option is best for them, and how much time to spend on their visit. We model the museum setting as a distributed non-cooperative game in which visitors selfishly maximize their own QoE. In this setting, we formulate the problem of Recommendation Selection and Visiting Time Management (RSVTM) and propose a two-stage distributed algorithm based on game theory and reinforcement learning, which learns from visitor behavior to make on-the-fly recommendation selections that maximize visitor QoE. The proposed framework enables autonomic, visitor-centric management in a personalized manner and lets visitors themselves decide on the best visiting strategies. Experimental results evaluating the performance of the proposed RSVTM algorithm under realistic simulation conditions indicate high operational effectiveness and superior performance compared to other recommendation approaches. Our results constitute a practical alternative for museums and exhibition spaces that aim to enhance visitor QoE in a flexible, efficient, and cost-effective manner.
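
One common way to realise a distributed game-plus-RL loop of this kind is a learning automaton per visitor; the sketch below uses a linear reward-inaction update over three hypothetical touring recommendations with toy QoE values. It illustrates the mechanism class, not the authors' RSVTM algorithm.

```python
# Hedged sketch: one visitor's learning automaton over touring
# recommendations. Recommendation names and QoE values are invented.
import random

random.seed(7)
RECS = ["highlights", "chronological", "thematic"]
qoe = {"highlights": 0.8, "chronological": 0.5, "thematic": 0.65}  # hidden

probs = [1 / len(RECS)] * len(RECS)   # selection probabilities
lr = 0.1

for visit in range(500):
    i = random.choices(range(len(RECS)), weights=probs)[0]
    reward = qoe[RECS[i]] + random.gauss(0, 0.1)      # noisy QoE feedback
    reward = min(max(reward, 0.0), 1.0)
    # Linear reward-inaction: shift probability mass toward rewarded
    # actions, proportionally to the received reward.
    for j in range(len(RECS)):
        if j == i:
            probs[j] += lr * reward * (1 - probs[j])
        else:
            probs[j] -= lr * reward * probs[j]

print({r: round(p, 2) for r, p in zip(RECS, probs)})
```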


2019 ◽  
pp. 027836491985944 ◽  
Author(s):  
David Surovik ◽  
Kun Wang ◽  
Massimo Vespignani ◽  
Jonathan Bruce ◽  
Kostas E Bekris

Tensegrity robots, which are prototypical examples of hybrid soft–rigid robots, exhibit dynamical properties that provide ruggedness and adaptability. They also bring about, however, major challenges for locomotion control. Owing to high dimensionality and the complex evolution of contact states, data-driven approaches are appropriate for producing viable feedback policies for tensegrities. Guided policy search (GPS), a sample-efficient hybrid framework for optimization and reinforcement learning, has previously been applied to generate periodic, axis-constrained locomotion by an icosahedral tensegrity on flat ground. Varying environments and tasks, however, create a need for more adaptive and general locomotion control that actively utilizes an expanded space of robot states. This implies significantly higher needs in terms of sample data and setup effort. This work mitigates such requirements by proposing a new GPS-based reinforcement learning pipeline, which exploits the vehicle’s high degree of symmetry and appropriately learns contextual behaviors that are sustainable without periodicity. Newly achieved capabilities include axially unconstrained rolling, rough terrain traversal, and rough incline ascent. These tasks are evaluated for a small variety of key model parameters in simulation and tested on the NASA hardware prototype, SUPERball. Results confirm the utility of symmetry exploitation and the adaptability of the vehicle. They also shed light on numerous strengths and limitations of the GPS framework for policy design and transfer to real hybrid soft–rigid robots.
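
The symmetry-exploitation idea can be illustrated in miniature: if the dynamics are invariant under a permutation group of the robot's actuators, one learned policy serves many headings by conjugating observations and actions with the symmetry map. The 3-element cyclic group below is a toy stand-in for SUPERball's icosahedral symmetries, and the "policy" is a placeholder function.

```python
# Hedged sketch: reusing one policy across symmetric headings by
# permuting observations into a canonical frame and permuting the
# resulting actions back. Group and policy are illustrative stand-ins.
import numpy as np

def base_policy(obs):
    """Stand-in for a learned controller trained for heading 0 only."""
    return np.tanh(obs)

PERMS = {0: [0, 1, 2], 1: [2, 0, 1], 2: [1, 2, 0]}   # toy cyclic symmetry

def symmetric_policy(obs, heading):
    perm = np.array(PERMS[heading])
    inv = np.argsort(perm)                 # inverse permutation
    # Map the observation into the canonical frame, apply the base
    # policy, then map the action back to the requested heading.
    return base_policy(obs[perm])[inv]

obs = np.array([0.3, -0.1, 0.5])
for h in range(3):
    print(f"heading {h}: action = {symmetric_policy(obs, h)}")
```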


Author(s):  
Wen Wu ◽  
Kate Saul ◽  
He (Helen) Huang

Abstract Reinforcement learning (RL) has the potential to provide innovative solutions to existing challenges in estimating joint moments in motion analysis, such as kinematic or electromyography (EMG) noise and unknown model parameters. Here we explore the feasibility of using RL to assist joint moment estimation for biomechanical applications. Forearm and hand kinematics and forearm EMGs from four muscles during free finger and wrist movement were collected from six healthy subjects. Using the Proximal Policy Optimization (PPO) approach, we trained and tested two types of RL agents that estimated joint moments based on measured kinematics or measured EMGs, respectively. To quantify the performance of the RL agents, the estimated joint moments were used to drive a forward dynamic model for estimating kinematics, which were then compared with the measured kinematics. The results demonstrate that both RL agents can accurately reproduce wrist and metacarpophalangeal joint motion. The correlation coefficients between estimated and measured kinematics, derived from the kinematics-driven agent and the subject-specific EMG-driven agents, were 0.98±0.01 and 0.94±0.03 for the wrist, respectively, and 0.95±0.02 and 0.84±0.06 for the metacarpophalangeal joint, respectively. In addition, a biomechanically reasonable joint moment-angle-EMG relationship (i.e., the dependence of joint moment on joint angle and EMG) was predicted using only 15 seconds of collected data. In conclusion, this study serves as a proof of concept that an RL approach can assist in biomechanical analysis and human-machine interface applications by deriving joint moments from kinematic or EMG data.
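
A minimal sketch of the validation idea only: an agent proposes a joint moment from the current state, a one-degree-of-freedom forward-dynamics model integrates it, and fitness is the tracking error against "measured" kinematics. The paper trains its agents with PPO; the crude random search below is a deliberately simple stand-in, and the inertia, gain, and trajectory are assumptions.

```python
# Hedged sketch: joint-moment estimation validated by forward dynamics.
# A linear "agent" (moment = k * tracking error) replaces the PPO policy;
# its gain is tuned by random search against synthetic measured motion.
import math, random

random.seed(5)
I, dt = 0.01, 0.01                          # inertia (kg*m^2), timestep (s)
measured = [0.3 * math.sin(2 * math.pi * t * dt) for t in range(200)]

def rollout(k):
    """Integrate forward dynamics under the agent's moments; return MSE-like
    tracking error between simulated and 'measured' joint angles."""
    angle, vel, err = 0.0, 0.0, 0.0
    for t in range(200):
        moment = k * (measured[t] - angle)   # agent's joint-moment estimate
        vel += (moment / I) * dt             # forward dynamics (Euler step)
        angle += vel * dt
        err += (angle - measured[t]) ** 2
    return err

best_k, best_err = 0.1, rollout(0.1)
for _ in range(100):                         # crude policy search, not PPO
    k = best_k * math.exp(random.gauss(0, 0.3))
    e = rollout(k)
    if e < best_err:
        best_k, best_err = k, e

print(f"gain ≈ {best_k:.3f}, tracking MSE ≈ {best_err / 200:.5f}")
```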


2020 ◽  
Vol 34 (10) ◽  
pp. 13905-13906
Author(s):  
Rohan Saphal ◽  
Balaraman Ravindran ◽  
Dheevatsa Mudigere ◽  
Sasikanth Avancha ◽  
Bharat Kaul

Reinforcement learning algorithms are sensitive to hyper-parameters and require tuning and tweaking for specific environments to improve performance. Ensembles of reinforcement learning models, on the other hand, are known to be much more robust and stable. However, training multiple models independently in an environment suffers from high sample complexity. We present a methodology to create multiple models from a single training instance, usable as an ensemble, through directed perturbation of the model parameters at regular intervals. This allows training a single model that converges to several local minima during the optimization process as a result of the perturbation. By saving the model parameters at each such instance, we obtain multiple policies during training that are ensembled during evaluation. We evaluate our approach on challenging discrete and continuous control tasks and also discuss various ensembling strategies. Our framework is substantially more sample-efficient and computationally inexpensive, and it is seen to outperform state-of-the-art (SOTA) approaches.
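
The core trick transfers to a toy optimisation problem, sketched below under obvious simplifications: run a single gradient-descent trajectory, snapshot the parameters at regular intervals, perturb them (randomly here; the paper directs the perturbation), and ensemble the snapshots at evaluation time.

```python
# Hedged sketch: snapshot-and-perturb ensembling on a toy multi-minima
# loss landscape, standing in for an RL policy's parameters. In RL one
# would ensemble the snapshot policies' actions rather than losses.
import numpy as np

rng = np.random.default_rng(9)

def loss(w):
    return np.sin(3 * w) + 0.1 * w ** 2      # several local minima

def grad(w, h=1e-5):
    return (loss(w + h) - loss(w - h)) / (2 * h)

w, lr = 2.0, 0.05
snapshots = []
for step in range(1, 601):
    w -= lr * grad(w)                        # ordinary gradient descent
    if step % 100 == 0:
        snapshots.append(w)                  # save an ensemble member
        w += rng.normal(0, 1.0)              # perturb to escape the minimum

print("snapshot params:", np.round(snapshots, 2))
print("member losses:", np.round([loss(s) for s in snapshots], 3))
```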

