Adaptive tensegrity locomotion: Controlling a compliant icosahedron with symmetry-reduced reinforcement learning

Tensegrity robots, which are prototypical examples of hybrid soft–rigid robots, exhibit dynamical properties that provide ruggedness and adaptability. However, they also pose major challenges for locomotion control. Because of their high dimensionality and the complex evolution of contact states, data-driven approaches are well suited to producing viable feedback policies for tensegrities. Guided policy search (GPS), a sample-efficient hybrid framework for optimization and reinforcement learning, has previously been applied to generate periodic, axis-constrained locomotion by an icosahedral tensegrity on flat ground. Varying environments and tasks, however, call for more adaptive and general locomotion control that actively exploits an expanded space of robot states, which substantially increases the required sample data and setup effort. This work mitigates these requirements by proposing a new GPS-based reinforcement learning pipeline that exploits the vehicle’s high degree of symmetry and learns contextual behaviors that are sustainable without periodicity. Newly achieved capabilities include axially unconstrained rolling, rough-terrain traversal, and rough-incline ascent. These tasks are evaluated in simulation across a small set of variations in key model parameters and tested on the NASA hardware prototype, SUPERball. The results confirm the utility of exploiting symmetry and the adaptability of the vehicle. They also shed light on numerous strengths and limitations of the GPS framework for policy design and for transfer to real hybrid soft–rigid robots.
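The symmetry idea can be illustrated with a toy sketch. Suppose each symmetry of the robot acts as a permutation of actuator and sensor indices. A policy trained only on states in one canonical orientation can then be lifted to every symmetric orientation in three steps: map the state into the canonical frame, query the policy, and map the action back. This is not the paper's GPS pipeline. The 3-fold cyclic group below is a hypothetical stand-in for the icosahedron's much larger symmetry group, and the helper names (`cyclic_perms`, `canonicalize`, `symmetric_policy`) and the largest-component canonicalization rule are illustrative assumptions.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Perm = List[int]


def cyclic_perms(n: int) -> List[Perm]:
    """All n cyclic shifts of n components; perm[i] is where component i moves."""
    return [[(i + k) % n for i in range(n)] for k in range(n)]


def apply(perm: Sequence[int], v: Sequence[float]) -> Vector:
    """Act on a vector with a symmetry: out[perm[i]] = v[i]."""
    out = [0.0] * len(v)
    for i, x in enumerate(v):
        out[perm[i]] = x
    return out


def invert(perm: Sequence[int]) -> Perm:
    """Inverse permutation, so apply(invert(p), apply(p, v)) == v."""
    inv = [0] * len(perm)
    for i, p in enumerate(perm):
        inv[p] = i
    return inv


def canonicalize(state: Sequence[float], perms: Sequence[Perm]) -> Tuple[Perm, Vector]:
    """Pick the first symmetry that moves the largest component to slot 0.

    Hypothetical stand-in for "rotate the robot so it rests on a reference face".
    """
    for perm in perms:
        s = apply(perm, state)
        if s[0] == max(s):
            return list(perm), s
    raise ValueError("no symmetry maps the state into the canonical region")


def symmetric_policy(canonical_policy: Callable[[Vector], Vector],
                     perms: Sequence[Perm]) -> Callable[[Sequence[float]], Vector]:
    """Lift a policy trained only on canonical states to the full state space."""
    def policy(state: Sequence[float]) -> Vector:
        perm, s_canon = canonicalize(state, perms)
        a_canon = canonical_policy(s_canon)
        # Map the canonical action back into the robot's actual frame.
        return apply(invert(perm), a_canon)
    return policy


if __name__ == "__main__":
    # Toy canonical policy: "actuate whatever sits in canonical slot 0".
    pi = symmetric_policy(lambda s: [1.0, 0.0, 0.0], cyclic_perms(3))
    print(pi([0.1, 0.9, 0.3]))  # actuates index 1, the largest component
```

The lifted policy is equivariant by construction: permuting the state permutes the action in the same way. The learner therefore only needs data from the canonical region, which is one way symmetry can reduce the samples a policy search requires.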


A Reinforcement Learning Framework for Spiking Networks with Dynamic Synapses

Computational Intelligence and Neuroscience; 10.1155/2011/869348; 2011; Vol 2011; pp. 1-12; Cited By ~ 3

Author(s): Karim El-Laithy; Martin Bogdan

Keyword(s): Reinforcement Learning; Spike Timing; Neural Representation; Model Parameters; Learning Framework; Reference Target; Wide Range; Spiking Network; Dynamic Synapses; Exclusive Or

An integration of Hebbian-based and reinforcement learning (RL) rules is presented for dynamic synapses. The proposed framework lets the Hebbian rule update the hidden synaptic model parameters that regulate the synaptic response, rather than the synaptic weights. The update uses both the value and the sign of the temporal difference in the reward signal after each trial. Using this framework, a spiking network with spike-timing-dependent synapses is tested on learning the exclusive-OR (XOR) computation on a temporally coded basis. Reward values are computed from the distance between the network's output spike train and a reference target spike train. The results show that the network captures the required dynamics and that the proposed framework indeed realizes an integrated version of Hebbian learning and RL. The framework is tractable and relatively inexpensive computationally. It is applicable to a wide class of synaptic models and is not restricted to the neural representation used here. This generality, together with the reported results, supports adopting the approach to benefit from biologically plausible synaptic models in a wide range of intuitive signal-processing applications.
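The reward and update steps described above can be sketched as follows. The abstract does not name the spike-train distance, so the van Rossum distance is an assumed choice. It uses an exponential kernel with time constant `tau` and is computed here in closed form. Defining the reward as the negative distance is also an assumption. The hidden synaptic parameters are treated as a generic list `params`, and the Hebbian term `hebb` is assumed to be computed elsewhere from pre- and postsynaptic activity. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def van_rossum_distance(a: Sequence[float], b: Sequence[float], tau: float = 5.0) -> float:
    """Closed-form van Rossum distance between two spike-time lists (same time unit as tau)."""
    def k(xs: Sequence[float], ys: Sequence[float]) -> float:
        return sum(math.exp(-abs(x - y) / tau) for x in xs for y in ys)
    d2 = 0.5 * (k(a, a) + k(b, b) - 2.0 * k(a, b))
    return math.sqrt(max(d2, 0.0))  # clamp tiny negative round-off


def reward(output: Sequence[float], target: Sequence[float], tau: float = 5.0) -> float:
    """Assumed reward: closer to the reference spike train means higher reward."""
    return -van_rossum_distance(output, target, tau)


def td_hebbian_update(params: Sequence[float], hebb: Sequence[float],
                      r_now: float, r_prev: float, lr: float = 0.01) -> List[float]:
    """Scale a Hebbian change of hidden synaptic parameters by the reward difference.

    The sign of (r_now - r_prev) sets the direction and its magnitude the step size.
    """
    delta = r_now - r_prev
    return [p + lr * delta * h for p, h in zip(params, hebb)]
```

Because the update is proportional to `r_now - r_prev`, an improvement reinforces the Hebbian change and a deterioration reverses it. This mirrors the abstract's use of both the value and the sign of the reward difference.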


A Hybrid Framework for Functional Verification using Reinforcement Learning and Deep Learning

Proceedings of the 2019 on Great Lakes Symposium on VLSI - GLSVLSI '19; 10.1145/3299874.3318039; 2019; Cited By ~ 2

Author(s): Karunveer Singh; Rishabh Gupta; Vikram Gupta; Arash Fayyazi; Massoud Pedram; ...

Keyword(s): Deep Learning; Reinforcement Learning; Functional Verification; Hybrid Framework


Analysis of Vehicle Dynamic Performance Based on MATLAB

Advanced Materials Research; 10.4028/www.scientific.net/amr.860-863.1725; 2013; Vol 860-863; pp. 1725-1728

Author(s): Fan Biao Bao

Keyword(s): Research Method; Dynamic Performance; Specific Model; Paper Analysis; Power Balance; Model Parameters; Vehicle Dynamic; Clear Physical Meaning; New Ideas; High Degree

This paper focuses on the dynamic performance characteristics of a car. MATLAB is used because it offers many advantages, such as intuitiveness, clear physical meaning, little programming effort, data visualization, and a high degree of merit. The paper computes and analyzes an example vehicle model. Using specific model parameters and MATLAB's data analysis and graphics capabilities, it analyzes the balance between the car's driving force and driving resistance, the power balance, and the power factor, and draws the relevant characteristic graphs. The paper provides a calculation and research method for analyzing and graphing the car's overall power performance, offers new ideas for vehicle parameter selection and design, and has some practical value.
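The driving-force and resistance balance mentioned above rests on a few standard textbook relations, sketched here in Python rather than MATLAB. These are the tractive force from engine torque and gear ratios, vehicle speed in km/h via the 0.377 factor, rolling resistance G·f, aerodynamic drag C_D·A·u²/21.15, and the dynamic factor D = (F_t − F_w)/G. These common textbook forms are assumed to correspond to what the abstract calls the driving-force balance and power factor. The `Vehicle` fields and example numbers are hypothetical.

```python
from dataclasses import dataclass

G_ACC = 9.81  # gravitational acceleration, m/s^2


@dataclass
class Vehicle:
    mass_kg: float
    wheel_radius_m: float
    final_drive: float       # i_0, final drive ratio
    efficiency: float        # eta_T, driveline efficiency
    rolling_coeff: float     # f, rolling resistance coefficient
    drag_coeff: float        # C_D
    frontal_area_m2: float   # A


def speed_kmh(v: Vehicle, engine_rpm: float, gear_ratio: float) -> float:
    """u_a = 0.377 * r * n / (i_g * i_0), in km/h."""
    return 0.377 * v.wheel_radius_m * engine_rpm / (gear_ratio * v.final_drive)


def tractive_force_n(v: Vehicle, torque_nm: float, gear_ratio: float) -> float:
    """F_t = T_tq * i_g * i_0 * eta_T / r, in N."""
    return torque_nm * gear_ratio * v.final_drive * v.efficiency / v.wheel_radius_m


def rolling_resistance_n(v: Vehicle) -> float:
    """F_f = G * f on level ground, in N."""
    return v.mass_kg * G_ACC * v.rolling_coeff


def air_resistance_n(v: Vehicle, u_kmh: float) -> float:
    """F_w = C_D * A * u_a^2 / 21.15 with u_a in km/h, in N."""
    return v.drag_coeff * v.frontal_area_m2 * u_kmh ** 2 / 21.15


def dynamic_factor(v: Vehicle, torque_nm: float, engine_rpm: float,
                   gear_ratio: float) -> float:
    """D = (F_t - F_w) / G: tractive force left after air drag, per unit weight."""
    u = speed_kmh(v, engine_rpm, gear_ratio)
    f_t = tractive_force_n(v, torque_nm, gear_ratio)
    return (f_t - air_resistance_n(v, u)) / (v.mass_kg * G_ACC)
```

To build the driving-force/resistance balance chart, sweep engine speed for each gear. At each point, evaluate `tractive_force_n` and `rolling_resistance_n(car) + air_resistance_n(car, u)`, then plot both against `speed_kmh`.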


Modeling of Coal Mill System Used for Fault Simulation

Energies; 10.3390/en13071784; 2020; Vol 13 (7); pp. 1784

Author(s): Yong Hu; Boyu Ping; Deliang Zeng; Yuguang Niu; Yaokui Gao

Keyword(s): Relative Error; Power Plants; Fault Simulation; Recognition Rate; Model Parameters; Actual Object; Fault Recognition; Sample Data; Grinding Mechanism; Types Of Faults

Monitoring and diagnosis of coal mill systems are critical to the safe operation of power plants. Traditional data-driven fault diagnosis methods often yield a low fault recognition rate, or even misjudgments, because fault data samples are scarce compared with normal data samples. To obtain large amounts of fault sample data efficiently, a dynamic model of the coal mill system suitable for fault simulation is established. It is based on an analysis of the primary air system, the grinding mechanism, and the energy conversion process. Then, according to the mechanisms of the various faults, three types of faults (coal interruption, coal blockage, and coal self-ignition) are simulated by modifying model parameters. The simulation shows that the dynamic characteristics of the model are consistent with those of the actual object. The relative error of each output variable is less than 2.53%, and the total average relative error of all outputs is about 1.2%. The model is sufficiently accurate and adaptable for fault simulation, and the proposed method effectively solves the problem of acquiring large numbers of fault samples.
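The accuracy figures quoted above are relative errors between simulated and measured outputs. The abstract does not give the exact definition, so the sketch below makes two assumptions. The per-variable error is taken as the mean of |simulated − measured| / |measured| over time, and the overall figure is the average across output variables. The variable names in the example are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def mean_relative_error(sim: Sequence[float], meas: Sequence[float]) -> float:
    """Mean of |sim - meas| / |meas| over one output's time series.

    Measured values must be non-zero, otherwise the ratio is undefined.
    """
    if len(sim) != len(meas) or not meas:
        raise ValueError("series must be non-empty and of equal length")
    return sum(abs(s - m) / abs(m) for s, m in zip(sim, meas)) / len(meas)


def error_report(sim: Dict[str, Sequence[float]],
                 meas: Dict[str, Sequence[float]]) -> Tuple[Dict[str, float], float]:
    """Per-variable mean relative error and its average over all variables."""
    per_var = {name: mean_relative_error(sim[name], meas[name]) for name in meas}
    overall = sum(per_var.values()) / len(per_var)
    return per_var, overall
```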


Using Reinforcement Learning to Estimate Human Joint Moments via EMG Signals or Joint Kinematics: An Alternative Solution to Musculoskeletal-Based Biomechanics

Journal of Biomechanical Engineering; 10.1115/1.4049333; 2020

Author(s): Wen Wu; Kate Saul; He (Helen) Huang

Keyword(s): Reinforcement Learning; Correlation Coefficients; Metacarpophalangeal Joint; Joint Moment; Biomechanical Analysis; Model Parameters; Optimization Approach; Joint Moments; Moment Estimation; Policy Optimization

Reinforcement learning (RL) has the potential to provide innovative solutions to existing challenges in estimating joint moments in motion analysis, such as kinematic or electromyography (EMG) noise and unknown model parameters. Here we explore the feasibility of using RL to assist joint moment estimation for biomechanical applications. Forearm and hand kinematics and forearm EMGs from four muscles were collected from six healthy subjects during free finger and wrist movements. Using the Proximal Policy Optimization approach, we trained and tested two types of RL agents, which estimated joint moments from measured kinematics or from measured EMGs, respectively. To quantify the performance of the RL agents, the estimated joint moments were used to drive a forward dynamic model that estimated kinematics, which were then compared with the measured kinematics. The results demonstrated that both RL agents can accurately reproduce wrist and metacarpophalangeal joint motion. The correlation coefficients between estimated and measured kinematics, derived from the kinematics-driven agent and the subject-specific EMG-driven agents, were 0.98±0.01 and 0.94±0.03, respectively, for the wrist, and 0.95±0.02 and 0.84±0.06, respectively, for the metacarpophalangeal joint. In addition, a biomechanically reasonable joint moment–angle–EMG relationship (i.e., the dependence of joint moment on joint angle and EMG) was predicted using only 15 seconds of collected data. In conclusion, this study serves as a proof of concept that an RL approach can assist in biomechanical analysis and human–machine interface applications by deriving joint moments from kinematic or EMG data.
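The evaluation loop described above can be sketched for a single joint: estimated moments drive a forward dynamic model, and the resulting kinematics are correlated with measurements. The one-degree-of-freedom model I·θ'' = τ − b·θ', the semi-implicit Euler integrator, and all parameter values are simplifying assumptions, not the paper's forearm and hand model. The correlation is the ordinary Pearson coefficient.

```python
import math
from typing import List, Sequence


def forward_dynamics(torque: Sequence[float], inertia: float, damping: float,
                     dt: float, theta0: float = 0.0, omega0: float = 0.0) -> List[float]:
    """Integrate I*theta'' = tau - b*theta' with semi-implicit Euler; return angles."""
    theta, omega = theta0, omega0
    angles = []
    for tau in torque:
        alpha = (tau - damping * omega) / inertia
        omega += alpha * dt       # update velocity first ...
        theta += omega * dt       # ... then position using the new velocity
        angles.append(theta)
    return angles


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)
```

In use, the agent's estimated moment sequence would be passed to `forward_dynamics`. The returned angles would then be compared with the measured joint angles via `pearson_r`.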


ERLP: Ensembles of Reinforcement Learning Policies (Student Abstract)

Proceedings of the AAAI Conference on Artificial Intelligence; 10.1609/aaai.v34i10.7225; 2020; Vol 34 (10); pp. 13905-13906

Author(s): Rohan Saphal; Balaraman Ravindran; Dheevatsa Mudigere; Sasikanth Avancha; Bharat Kaul

Keyword(s): Reinforcement Learning; State Of The Art; Multiple Models; Model Parameters; Continuous Control; Sample Complexity; Local Minima; Single Model; Learning Policies; Reinforcement Learning Models

Reinforcement learning algorithms are sensitive to hyperparameters and require environment-specific tuning to perform well. Ensembles of reinforcement learning models, on the other hand, are known to be much more robust and stable. However, training multiple models independently on an environment suffers from high sample complexity. We present a methodology that creates multiple models usable in an ensemble from a single training instance, by applying directed perturbations to the model parameters at regular intervals. As a result of these perturbations, a single model converges to several local minima during the optimization process. By saving the model parameters at each such point, we obtain multiple policies during training, which are then ensembled during evaluation. We evaluate our approach on challenging discrete and continuous control tasks and also discuss various ensembling strategies. Our framework is substantially sample-efficient and computationally inexpensive, and it is seen to outperform state-of-the-art (SOTA) approaches.
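The snapshot-and-perturb idea can be sketched on a toy problem. The abstract does not specify how the perturbation is directed, so plain Gaussian noise is used here as a substitute. The quadratic objective, the linear `policy(params, obs)` form, and averaging continuous actions as the ensembling strategy are also assumptions for illustration.

```python
import random
from typing import Callable, List, Sequence

Params = List[float]


def train_with_snapshots(init: Sequence[float],
                         grad_fn: Callable[[Params], Params],
                         steps: int, lr: float, perturb_every: int,
                         noise_scale: float, seed: int = 0) -> List[Params]:
    """Gradient descent that saves a snapshot and then perturbs every `perturb_every` steps."""
    rng = random.Random(seed)
    p = list(init)
    snapshots: List[Params] = []
    for step in range(1, steps + 1):
        g = grad_fn(p)
        p = [pi - lr * gi for pi, gi in zip(p, g)]
        if step % perturb_every == 0:
            snapshots.append(list(p))  # keep the model found so far
            # Random kick toward a (possibly) different basin; stands in for
            # the paper's directed perturbation.
            p = [pi + rng.gauss(0.0, noise_scale) for pi in p]
    return snapshots


def ensemble_action(snapshots: Sequence[Params],
                    policy: Callable[[Params, float], List[float]],
                    obs: float) -> List[float]:
    """Average the continuous actions proposed by every saved policy."""
    actions = [policy(s, obs) for s in snapshots]
    return [sum(a[i] for a in actions) / len(actions) for i in range(len(actions[0]))]
```

With a single-minimum objective, such as (p − 3)², all snapshots converge to the same point. On a multimodal loss, the perturbations would let the snapshots settle in different basins, which is what gives the ensemble its diversity.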


Application of Reinforcement Learning to a Robotic Drinking Assistant

Robotics; 10.3390/robotics9010001; 2019; Vol 9 (1); pp. 1; Cited By ~ 1

Author(s): Tejas Kumar Shastha; Maria Kyrarini; Axel Gräser

Keyword(s): Reinforcement Learning; Tactile Feedback; Human Robot Interaction; Assistive Robotics; Training Procedure; Assistive Robots; Assistive Robot; Markov Decision; Degree Of Acceptance; High Degree

Meal assistant robots form a very important part of the assistive robotics sector, since self-feeding is a priority activity of daily living (ADL) for people with physical disabilities such as tetraplegia. A quick survey of current trends in this domain reveals that, while tremendous progress has been made in developing assistive robots for feeding solid foods, feeding liquids from a cup remains largely underdeveloped. This paper therefore describes an assistive robot that focuses specifically on feeding liquids from a cup, using tactile feedback from force sensors with direct human–robot interaction (HRI). The main focus of the paper is the application of reinforcement learning (RL) to learn the best robotic actions based on the force applied by the user. The application environment is modeled as a Markov decision process, and a software training procedure is designed for quick development and testing. Five commonly used RL algorithms are investigated to find the best fit for training, and the system is tested in an experimental study. The preliminary results show a high degree of acceptance by the participants, and user feedback indicates that the assistive robot functions intuitively and effectively.
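To make the Markov-decision-process framing concrete, the sketch below runs tabular Q-learning on a hypothetical three-state problem. Q-learning is one standard RL algorithm; the abstract does not say which five were compared. The force classes, robot actions, and ±1 rewards are invented for illustration, and the next force class is drawn at random, independent of the action.

```python
import random
from typing import List, Tuple

# Hypothetical discretization of the user's applied force and of robot moves.
STATES = ["push_away", "neutral", "pull_toward"]
ACTIONS = ["retract", "hold", "tilt_to_mouth"]
INTENDED = [0, 1, 2]  # action index the user "wants" in each state (toy assumption)


def step(state: int, action: int, rng: random.Random) -> Tuple[float, int]:
    """Toy environment: +1 if the move matches the user's intent, else -1."""
    reward = 1.0 if action == INTENDED[state] else -1.0
    next_state = rng.randrange(len(STATES))  # next force signal, action-independent
    return reward, next_state


def q_learning(steps: int = 5000, alpha: float = 0.1, gamma: float = 0.9,
               epsilon: float = 0.1, seed: int = 0) -> List[List[float]]:
    """Tabular Q-learning with epsilon-greedy exploration on the toy MDP."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in STATES]
    s = rng.randrange(len(STATES))
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(len(ACTIONS))  # explore
        else:
            a = max(range(len(ACTIONS)), key=lambda i: q[s][i])  # exploit
        r, s_next = step(s, a, rng)
        # Move Q(s, a) toward the bootstrapped target r + gamma * max_a' Q(s', a').
        q[s][a] += alpha * (r + gamma * max(q[s_next]) - q[s][a])
        s = s_next
    return q
```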


ASM1 dynamic calibration and long-term validation for an intermittently aerated WWTP

Water Science & Technology; 10.2166/wst.2006.427; 2006; Vol 53 (12); pp. 247-256; Cited By ~ 7

Author(s): A. Marquot; A.-E. Stricker; Y. Racault

Keyword(s): Treatment Plant; Simulation Software; Biological Nutrient Removal; Model Parameters; Dynamic Calibration; Biological Degradation; Data Set; Degree Of Confidence; High Degree

Activated sludge models, and ASM1 in particular, are well-recognised and useful mathematical representations of the macroscopic processes involved in the biological degradation of the pollution carried by wastewater. Nevertheless, using these models through simulation software requires a careful methodology for their calibration (determination of the model parameters' values) and validation (verification against an independent data set). This paper presents the methodology and results of dynamic calibration and validation tasks, carried out as preliminary work for a modelling project aimed at defining a reference guideline for French designers and operators. To reach these goals, a biological nutrient removal (BNR) wastewater treatment plant (WWTP) with intermittent aeration was selected and monitored for 2 years. Two sets of calibrated parameters are given and discussed. The results of the long-term validation task are presented through a 2-month simulation with many operational changes. Finally, it is concluded that, even though calibrating ASM1 with a high degree of confidence using a single set of parameters was not possible, the calibration is sufficient to obtain satisfactory results in long-term dynamic simulation. However, simulating long periods reveals specific calibration issues, such as the variation of the nitrification capacity due to external events.
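The calibration/validation split can be illustrated with a single Monod-type rate term, of the kind ASM1 uses for heterotrophic growth. This is a deliberately reduced stand-in for the full model. The parameters `mu_max` and `k_s` are fitted by grid search (minimum sum of squared errors) on one synthetic data set, then checked by RMSE on an independent one. The grids and data are hypothetical.

```python
import math
from typing import Iterable, Sequence, Tuple

Data = Sequence[Tuple[float, float]]  # (substrate concentration, observed rate)


def monod_rate(s: float, mu_max: float, k_s: float) -> float:
    """Monod term mu_max * S / (K_S + S)."""
    return mu_max * s / (k_s + s)


def sse(params: Tuple[float, float], data: Data) -> float:
    """Sum of squared errors between model and observations."""
    mu_max, k_s = params
    return sum((monod_rate(s, mu_max, k_s) - r) ** 2 for s, r in data)


def calibrate(data: Data, mu_grid: Iterable[float], k_grid: Sequence[float]) -> Tuple[float, float]:
    """Calibration: grid search for the (mu_max, k_s) pair with the smallest SSE."""
    return min(((mu, k) for mu in mu_grid for k in k_grid), key=lambda p: sse(p, data))


def rmse(params: Tuple[float, float], data: Data) -> float:
    """Validation metric on an independent data set."""
    return math.sqrt(sse(params, data) / len(data))
```

A real calibration would use measured plant data and a proper optimizer. The point of the sketch is only that the parameters are chosen on one data set and judged on another.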


Contact length in grinding: Part 2: Evaluation of contact length models

Proceedings of the Institution of Mechanical Engineers, Part J: Journal of Engineering Tribology; 10.1243/1350650971542336; 1997; Vol 211 (1); pp. 77-85; Cited By ~ 2

Author(s): H. S. Qi; W. B. Rowe; B. Mills

Keyword(s): Roughness Parameter; Contact Length; Speed Ratio; Depth Of Cut; Model Parameters; Roughness Factor; Workpiece Material; Curve Fit; Material Hardness; High Degree

A non-linear curve-fit method was used to identify model parameters for predicting the contact length in grinding. The evaluation shows that the model based on a surface roughness factor demonstrates a high degree of correlation with experimental trends and behaviour. The roughness parameter Rr was found to be insensitive to depth of cut, speed ratio and material hardness. It was, however, sensitive to the wheel–workpiece material combination and to whether grinding is conducted with or without coolant.
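As an illustration of identifying a roughness parameter from contact-length data, the sketch below assumes the roughness-factor model takes the form commonly attributed to Rowe and Qi: l_c² = 8·Rr²·F_n'·d_e/(π·E*) + a_e·d_e. Here F_n' is the normal force per unit width, d_e the equivalent wheel diameter, a_e the depth of cut, and E* the combined elastic modulus, all in consistent units. Because l_c² − a_e·d_e is linear in Rr², Rr² can be fitted by one-parameter least squares. This linearized fit is a simplification, not the non-linear curve fit used in the paper.

```python
import math
from typing import Sequence, Tuple

# (measured l_c, F_n', d_e, a_e, E*) in consistent units, e.g. mm, N/mm, mm, mm, N/mm^2
Sample = Tuple[float, float, float, float, float]


def contact_length(r_r: float, f_n: float, d_e: float, a_e: float, e_star: float) -> float:
    """Assumed roughness-factor model: l_c = sqrt(8 Rr^2 F_n' d_e / (pi E*) + a_e d_e)."""
    return math.sqrt(8.0 * r_r ** 2 * f_n * d_e / (math.pi * e_star) + a_e * d_e)


def fit_roughness_factor(samples: Sequence[Sample]) -> float:
    """Least-squares estimate of Rr from y = Rr^2 * x, with y = l_c^2 - a_e d_e."""
    num = den = 0.0
    for l_c, f_n, d_e, a_e, e_star in samples:
        x = 8.0 * f_n * d_e / (math.pi * e_star)  # deformation term per unit Rr^2
        y = l_c ** 2 - a_e * d_e                  # measured excess over geometric length^2
        num += x * y
        den += x * x
    if den == 0.0 or num <= 0.0:
        raise ValueError("data do not determine a positive Rr^2")
    return math.sqrt(num / den)
```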


Monte-Carlo Simulation of the Theoretical Site Response Variability at Turkey Flat, California, Given the Uncertainty in the Geotechnically Derived Input Parameters

Earthquake Spectra; 10.1193/1.1585736; 1993; Vol 9 (4); pp. 669-701; Cited By ~ 12

Author(s): Edward H. Field; Klaus H. Jacob

Keyword(s): Monte Carlo; Physical Model; Site Response; Input Parameter; Model Parameters; Spectral Ratios; Parameter Uncertainties; Theoretical Predictions; High Degree; Geotechnical Studies

In the weak-motion phase of the Turkey Flat blind-prediction effort, it was found that, given a particular physical model of each sediment site, various theoretical techniques give similar estimates of the site response. However, it remained to be determined how uncertainties in the physical model parameters influence the theoretical predictions. We have studied this question by propagating the physical-parameter uncertainties into the theoretical site-response predictions using Monte Carlo simulations. The input-parameter uncertainties were estimated directly from the results of several independent geotechnical studies performed at Turkey Flat. The computed results generally agree with empirical site-response estimates (average spectral ratios of earthquake recordings). However, we found that the uncertainties lead to a high degree of variability in the theoretical predictions. Most of this variability comes from poor constraints on the shear-wave velocity and thickness of a thin (∼2 m) surface layer, and on the attenuation of the sediments. Our results suggest that site-response studies relying exclusively on geotechnically based theoretical predictions must recognize and account for the variability resulting from input-parameter uncertainties.
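A minimal Monte Carlo sketch of the uncertainty propagation follows. It assumes the site response is summarized by the fundamental resonance of a single uniform layer over rigid bedrock, f0 = Vs/(4H). Uncertain shear-wave velocity Vs and thickness H are sampled from normal distributions and pushed through this formula. The single-layer model, the normal distributions, and the parameter values are assumptions. The study itself used full theoretical site-response calculations.

```python
import random
import statistics
from typing import List


def fundamental_frequency(vs: float, h: float) -> float:
    """Quarter-wavelength resonance f0 = Vs / (4 H) of a uniform layer (Hz for m/s and m)."""
    return vs / (4.0 * h)


def monte_carlo_f0(vs_mean: float, vs_sd: float, h_mean: float, h_sd: float,
                   n: int = 10000, seed: int = 0) -> List[float]:
    """Sample Vs and H independently and return n values of f0."""
    rng = random.Random(seed)
    samples: List[float] = []
    while len(samples) < n:
        vs = rng.gauss(vs_mean, vs_sd)
        h = rng.gauss(h_mean, h_sd)
        if vs > 0 and h > 0:  # discard non-physical draws
            samples.append(fundamental_frequency(vs, h))
    return samples


if __name__ == "__main__":
    # Hypothetical thin (~2 m) surface layer with uncertain Vs and thickness.
    f0 = monte_carlo_f0(vs_mean=200.0, vs_sd=40.0, h_mean=2.0, h_sd=0.3)
    mean = statistics.mean(f0)
    print(f"mean f0 = {mean:.1f} Hz, coefficient of variation = {statistics.stdev(f0) / mean:.2f}")
```

Comparing the spread of the returned samples for different input standard deviations shows how strongly a thin layer's poorly constrained Vs and H drive the predicted variability.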
