Simpler Learning of Robotic Manipulation of Clothing by Utilizing DIY Smart Textile Technology

2020 · Vol 10 (12) · pp. 4088
Author(s): Andreas Verleysen, Thomas Holvoet, Remko Proesmans, Cedric Den Haese, Francis wyffels

Deformable objects such as ropes, wires, and clothing are omnipresent in society and industry but remain under-explored in robotics. This is due to the practically infinite number of state configurations that a deformable object can assume. Engineered approaches try to cope with this by implementing highly complex operations to estimate the state of the deformable object. This complexity can be circumvented with learning-based approaches, such as reinforcement learning, which can deal with the intrinsically high-dimensional state space of deformable objects. However, the reward function in reinforcement learning must measure the configuration of the highly deformable object. Vision-based reward functions are difficult to implement, given the high dimensionality of the state and the complex dynamic behavior. In this work, we propose looking beyond vision and incorporating other modalities that can be extracted from deformable objects. By integrating tactile sensor cells into a textile piece, the textile gains proprioceptive capabilities that are valuable because they provide a reward function to a reinforcement learning agent. We demonstrate on a low-cost dual robotic arm setup that a physical agent can learn, on a single CPU core, to fold a rectangular patch of textile in the real world based on a reward function learned from tactile information.
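
As a hedged illustration of the idea, the sketch below derives a scalar reward from a grid of tactile cells sewn into a textile patch. The names (N_CELLS, fold_reward) and the hand-coded contact heuristic are assumptions for illustration; the paper learns the reward from tactile data rather than hand-coding it.

```python
import numpy as np

N_CELLS = 16  # hypothetical 4x4 grid of tactile cells in the patch

def fold_reward(tactile: np.ndarray) -> float:
    """Map raw tactile activations (one pressure value per cell) to a reward.

    Intuition: when the patch is folded correctly, cells overlap and register
    contact, so more cells in contact means better fold progress.
    """
    assert tactile.shape == (N_CELLS,)
    contact = (tactile > 0.5).astype(float)  # binarise cell contact (threshold assumed)
    return float(contact.mean())             # fraction of cells registering contact

# Usage: the RL agent queries the reward after each manipulation step.
reading = np.random.rand(N_CELLS)            # stand-in for real sensor data
r = fold_reward(reading)
```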

2021 · Vol 103 (4)
Author(s): Bartomeu Rubí, Bernardo Morcego, Ramon Pérez

Abstract. A deep reinforcement learning approach for solving the quadrotor path following and obstacle avoidance problem is proposed in this paper. The problem is solved with two agents: one for the path following task and another for the obstacle avoidance task. A novel structure is proposed, in which the action computed by the obstacle avoidance agent becomes the state of the path following agent. Compared to traditional deep reinforcement learning approaches, the proposed method makes the training process outcomes interpretable, is faster, and can be safely trained on the real quadrotor. Both agents implement the Deep Deterministic Policy Gradient (DDPG) algorithm. The path following agent was developed in previous work. The obstacle avoidance agent uses the information provided by a low-cost LIDAR to detect obstacles around the vehicle. Since the LIDAR has a narrow field of view, an approach for providing the agent with a memory of previously seen obstacles is developed. A detailed description of the process of defining the state vector, the reward function, and the action of this agent is given. The agents are programmed in Python/TensorFlow and are trained and tested on the RotorS/Gazebo platform. Simulation results prove the validity of the proposed approach.
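
The structural idea, where the obstacle avoidance agent's action feeds into the path following agent's state, can be sketched as below. The stand-in policies replace trained DDPG actors, and all state layouts are illustrative assumptions.

```python
import numpy as np

def obstacle_avoidance_policy(lidar_state: np.ndarray) -> np.ndarray:
    """Stand-in for the trained obstacle-avoidance DDPG actor."""
    return np.tanh(lidar_state[:2])          # e.g. a 2D avoidance offset command

def path_following_policy(pf_state: np.ndarray) -> np.ndarray:
    """Stand-in for the path-following DDPG actor; returns velocity references."""
    return np.tanh(pf_state[:3])

lidar_state = np.random.rand(8)              # memory-augmented LIDAR features
path_error = np.random.rand(4)               # e.g. cross-track error, heading error

avoidance_action = obstacle_avoidance_policy(lidar_state)
# Key structural idea from the paper: the avoidance agent's action becomes
# part of the path-following agent's state.
pf_state = np.concatenate([path_error, avoidance_action])
control = path_following_policy(pf_state)
```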


2021 · Vol 2 (1) · pp. 1-25
Author(s): Yongsen Ma, Sheheryar Arshad, Swetha Muniraju, Eric Torkildson, Enrico Rantala, ...

In recent years, Channel State Information (CSI) measured by WiFi has been widely used for human activity recognition. In this article, we propose a deep learning design for location- and person-independent activity recognition with WiFi. The proposed design consists of three Deep Neural Networks (DNNs): a 2D Convolutional Neural Network (CNN) as the recognition algorithm, a 1D CNN as the state machine, and a reinforcement learning agent for neural architecture search. The recognition algorithm learns location- and person-independent features from different perspectives of CSI data. The state machine learns temporal dependency information from historical classification results. The reinforcement learning agent optimizes the neural architecture of the recognition algorithm using a Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM). The proposed design is evaluated in a lab environment with different WiFi device locations, antenna orientations, sitting/standing/walking locations/orientations, and multiple persons. The proposed design achieves 97% average accuracy when the testing devices and persons are not seen during training. It is also evaluated on two public datasets, with accuracies of 80% and 83%. The design requires very little human effort for ground-truth labeling, feature engineering, signal processing, and tuning of learning parameters and hyperparameters.
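
A minimal sketch of the first two components is given below, using tf.keras (the authors' framework is not stated here, so this is an assumption). All shapes and layer sizes are illustrative; in the paper the recognition architecture is itself chosen by the reinforcement-learning-based search.

```python
import tensorflow as tf

N_CLASSES, HISTORY = 6, 10                   # assumed numbers of activities / history steps

# Recognition algorithm: 2D CNN over a CSI window (time x subcarrier x antenna).
recognizer = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(128, 64, 3)),   # assumed CSI tensor shape
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(N_CLASSES, activation="softmax"),
])

# State machine: 1D CNN over the last HISTORY classification distributions,
# learning temporal dependency information from history results.
state_machine = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(HISTORY, N_CLASSES)),
    tf.keras.layers.Conv1D(16, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(N_CLASSES, activation="softmax"),
])
```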


2003 · Vol 19 · pp. 569-629
Author(s): B. Price, C. Boutilier

Imitation can be viewed as a means of enhancing learning in multiagent environments. It augments an agent's ability to learn useful behaviors by making intelligent use of the knowledge implicit in behaviors demonstrated by cooperative teachers or other more experienced agents. We propose and study a formal model of implicit imitation that can accelerate reinforcement learning dramatically in certain cases. Roughly, by observing a mentor, a reinforcement-learning agent can extract information about its own capabilities in, and the relative value of, unvisited parts of the state space. We study two specific instantiations of this model: one in which the learning agent and the mentor have identical abilities, and one designed to deal with agents and mentors that have different action sets. We illustrate the benefits of implicit imitation by integrating it with prioritized sweeping and demonstrating improved performance and convergence through observation of single and multiple mentors. Though we make some stringent assumptions regarding observability and possible interactions, we briefly comment on extensions of the model that relax these restrictions.
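
The mechanism can be sketched as an augmented Bellman backup: the learner backs up values over its own action models and, additionally, over the transition model estimated from the mentor's observed state sequence. The sketch below is a simplified rendering under the identical-abilities assumption; all data structures are illustrative.

```python
import numpy as np

def augmented_backup(s, R, T_own, T_mentor, V, gamma=0.95):
    """One value backup for state s.

    R[s]        : reward estimate for s
    T_own[s]    : dict mapping each of the learner's actions to a
                  next-state distribution (the learner's own model)
    T_mentor[s] : next-state distribution estimated from mentor observations
    V           : current value estimates over all states
    """
    own = max(np.dot(p, V) for p in T_own[s].values())
    mentor = np.dot(T_mentor[s], V)          # value implied by the mentor's behaviour
    return R[s] + gamma * max(own, mentor)   # back up whichever source looks better

# Toy usage on a 3-state chain.
V = np.array([0.0, 0.0, 1.0])
R = np.array([0.0, 0.0, 1.0])
T_own = {0: {"left": np.array([1.0, 0.0, 0.0]), "right": np.array([0.0, 1.0, 0.0])}}
T_mentor = {0: np.array([0.0, 0.0, 1.0])}    # mentor reaches the valuable state
V[0] = augmented_backup(0, R, T_own, T_mentor, V)
```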


2018 · Vol 9 (1) · pp. 277-294
Author(s): Rupam Bhattacharyya, Shyamanta M. Hazarika

Abstract. Within human Intent Recognition (IR), a popular approach to learning from demonstration is Inverse Reinforcement Learning (IRL). IRL extracts an unknown reward function from samples of observed behaviour. Traditional IRL systems require large datasets to recover the underlying reward function. Object affordances have been used for IR, but the existing literature on recognizing intents through object affordances falls short of utilizing their true potential. In this paper, we seek to develop an IRL system that drives human intent recognition and can handle high-dimensional demonstrations by exploiting object affordances. An architecture for recognizing human intent is presented that consists of an extended Maximum Likelihood Inverse Reinforcement Learning agent. Inclusion of a Symbolic Conceptual Abstraction Engine (SCAE) along with an advisor allows the agent to work on a Conceptually Abstracted Markov Decision Process. The agent recovers an object-affordance-based reward function from high-dimensional demonstrations. This function drives a Human Intent Recognizer through identification of probable intents. Performance of the resulting system on the standard CAD-120 dataset shows encouraging results.
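
As a hedged sketch of the Maximum Likelihood IRL core, the snippet below performs gradient ascent on the likelihood of demonstrated actions under a softmax policy, with a reward linear in (hypothetical) object-affordance features. The conceptual abstraction via the SCAE and the advisor is omitted; a one-step approximation stands in for the full MDP case.

```python
import numpy as np

def mlirl_step(theta, phi, demos, lr=0.1, beta=1.0):
    """One gradient-ascent step on the demonstration log-likelihood.

    phi[s] : (n_actions, n_features) affordance features per action in state s
    demos  : list of (state, demonstrated_action) pairs
    """
    grad = np.zeros_like(theta)
    for s, a in demos:
        logits = beta * phi[s] @ theta               # reward linear in features
        pi = np.exp(logits - logits.max())
        pi /= pi.sum()                               # softmax (Boltzmann) policy
        # log-likelihood gradient: demonstrated features minus policy expectation
        grad += beta * (phi[s][a] - pi @ phi[s])
    return theta + lr * grad

phi = {0: np.array([[1.0, 0.0], [0.0, 1.0]])}        # toy affordance features
theta = np.zeros(2)
for _ in range(50):                                  # theta comes to favour action 1
    theta = mlirl_step(theta, phi, demos=[(0, 1)])
```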


Author(s): N. Botteghi, R. Schulte, B. Sirmacek, M. Poel, C. Brune

Abstract. Autonomously exploring and mapping is one of the open challenges of robotics and artificial intelligence. Especially when the environments are unknown, choosing the optimal navigation directive is not straightforward. In this paper, we propose a reinforcement learning framework for navigating, exploring, and mapping unknown environments. The reinforcement learning agent is in charge of selecting the commands for steering the mobile robot, while a SLAM algorithm estimates the robot pose and maps the environment. To select optimal actions, the agent is trained to be curious about the world. This concept translates into the introduction of a curiosity-driven reward function that encourages the agent to steer the mobile robot towards unknown and unseen areas of the world and the map. We test our approach on exploration challenges in different indoor environments. The agent trained with the proposed reward function outperforms agents trained with the reward functions commonly used in the literature for solving such tasks.
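
One common formulation of such a curiosity-driven reward, offered here as an assumption rather than the paper's exact definition, rewards the agent for the number of occupancy-grid cells that become known after each SLAM map update:

```python
import numpy as np

UNKNOWN = -1  # occupancy-grid convention: -1 unknown, 0 free, 1 occupied

def curiosity_reward(prev_map: np.ndarray, new_map: np.ndarray,
                     scale: float = 0.01) -> float:
    """Reward proportional to cells newly discovered since the last map update."""
    newly_known = np.sum((prev_map == UNKNOWN) & (new_map != UNKNOWN))
    return scale * float(newly_known)

prev = np.full((10, 10), UNKNOWN)
new = prev.copy()
new[:3, :3] = 0                                  # the robot uncovered a free region
r = curiosity_reward(prev, new)                  # positive: exploration progress
```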


2020 · Vol 34 (04) · pp. 4328-4336
Author(s): Vishal Jain, William Fedus, Hugo Larochelle, Doina Precup, Marc G. Bellemare

Text-based games are a natural challenge domain for deep reinforcement learning algorithms. Their state and action spaces are combinatorially large, their reward function is sparse, and they are partially observable: the agent is informed of the consequences of its actions through textual feedback. In this paper we emphasize this latter point and consider the design of a deep reinforcement learning agent that can play from feedback alone. Our design recognizes and takes advantage of the structural characteristics of text-based games. We first propose a contextualisation mechanism, based on accumulated reward, which simplifies the learning problem and mitigates partial observability. We then study different methods that rely on the notion that most actions are ineffectual in any given situation, following Zahavy et al.'s idea of an admissible action. We evaluate these techniques in a series of text-based games of increasing difficulty based on the TextWorld framework, as well as the iconic game Zork. Empirically, we find that these techniques improve the performance of a baseline deep reinforcement learning agent applied to text-based games.
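
A hedged sketch of the two mechanisms named above follows, with stand-in models throughout: the encoding of the textual observation is contextualised on the accumulated reward (score), and candidate commands predicted to be inadmissible are masked out before action selection. All function names and the placeholder features are illustrative.

```python
import numpy as np

def encode(observation: str, score: int) -> np.ndarray:
    """Stand-in encoder; the accumulated reward (score) contextualises the state."""
    h = np.full(8, float(len(observation) % 7))  # placeholder text features
    return np.append(h, float(score))            # reward-based context feature

def admissible_prob(state_vec: np.ndarray, command: str) -> float:
    """Stand-in for a learned admissibility predictor."""
    return 0.9 if "open" in command else 0.2

state = encode("You are in a small kitchen. A door leads north.", score=5)
candidates = ["open door", "eat door", "go north"]
q_values = np.random.rand(len(candidates))       # stand-in Q estimates
mask = np.array([admissible_prob(state, c) for c in candidates]) > 0.5
q_masked = np.where(mask, q_values, -np.inf)     # drop inadmissible commands
action = candidates[int(np.argmax(q_masked))]
```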


Author(s): Harshad Khadilkar, Aditya Avinash Paranjape

The key to a successful adaptive enterprise lies in techniques and algorithms that enable the enterprise to learn about its environment and use that learning to make decisions that maximize its objectives. The volatile nature of the contemporary business environment means that learning needs to be continuous and reliable, and the decision-making rapid and accurate. In this chapter, the authors investigate two promising families of tools that can be used to design such algorithms: adaptive control and reinforcement learning. Both methodologies have evolved over the years into mathematically rigorous and practically reliable solutions. The authors review the foundations, the state of the art, and the limitations of these methodologies, and discuss possible ways to combine the two techniques so as to bring out the best of their capabilities.


Author(s): Tom Everitt, Victoria Krakovna, Laurent Orseau, Shane Legg

No real-world reward function is perfect. Sensory errors and software bugs may result in agents receiving higher (or lower) rewards than they should. For example, a reinforcement learning agent may prefer states where a sensory error gives it the maximum reward, even though the true reward there is actually small. We formalise this problem as a generalised Markov Decision Problem called a Corrupt Reward MDP (CRMDP). Traditional RL methods fare poorly in CRMDPs, even under strong simplifying assumptions and when trying to compensate for the possibly corrupt rewards. Two ways around the problem are investigated. First, by giving the agent richer data, as in inverse reinforcement learning and semi-supervised reinforcement learning, reward corruption stemming from systematic sensory errors can sometimes be managed completely. Second, by using randomisation to blunt the agent's optimisation, reward corruption can be partially managed under some assumptions.
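
The randomisation idea can be illustrated with a quantilising action rule, sketched below under assumed parameters: rather than always maximising the possibly corrupt reward estimate, the agent picks uniformly among a top fraction of actions, limiting how strongly it exploits any single corrupt high-reward reading.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantilise(reward_estimates: np.ndarray, top_fraction: float = 0.2) -> int:
    """Pick uniformly among the top `top_fraction` of actions by estimated reward."""
    k = max(1, int(np.ceil(top_fraction * len(reward_estimates))))
    top_actions = np.argsort(reward_estimates)[-k:]
    return int(rng.choice(top_actions))

estimates = np.array([0.1, 0.3, 9.9, 0.4, 0.5])   # 9.9 may be a corrupt reading
action = quantilise(estimates, top_fraction=0.4)  # top 2: sometimes avoids the outlier
```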


Sensors · 2020 · Vol 20 (13) · pp. 3664
Author(s): Qichen Zhang, Meiqiang Zhu, Liang Zou, Ming Li, Yong Zhang

Deep reinforcement learning (DRL) has been successfully applied to mapless navigation. An important issue in DRL is designing a reward function for evaluating the actions of agents. However, designing a robust and suitable reward function depends greatly on the designer's experience and intuition. To address this concern, we consider employing reward shaping from trajectories on similar navigation tasks without human supervision, and propose a general reward function based on a matching network (MN). The MN-based reward function gains experience by pre-training on trajectories from different navigation tasks and accelerates the training of DRL on new tasks. The proposed reward function keeps the optimal strategy of DRL unchanged. Simulation results on two static maps show that DRL converges in fewer iterations with the learned reward function than with state-of-the-art mapless navigation methods. The proposed method also performs well on dynamic maps with partially moving obstacles. Even when the test maps differ from the training maps, the proposed strategy is able to complete the navigation tasks without additional training.
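
Since the abstract stresses that the learned reward function leaves the optimal strategy unchanged, a natural reading is potential-based reward shaping, where the matching network would supply the potential. The sketch below uses a hand-coded stand-in potential and is an assumption about the construction, not the paper's exact formulation.

```python
import numpy as np

GOAL = np.array([5.0, 5.0])

def potential(state: np.ndarray) -> float:
    """Stand-in potential: negative distance to goal (the MN output would
    take this role under the potential-based reading)."""
    return -float(np.linalg.norm(state - GOAL))

def shaped_reward(r_env: float, s: np.ndarray, s_next: np.ndarray,
                  gamma: float = 0.99) -> float:
    """r' = r + gamma * Phi(s') - Phi(s): shaping that preserves optimal policies."""
    return r_env + gamma * potential(s_next) - potential(s)

s, s_next = np.array([0.0, 0.0]), np.array([1.0, 1.0])
r = shaped_reward(0.0, s, s_next)                # positive: moved toward the goal
```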


2019 · Vol 9 (1)
Author(s): Hiroshi Higashi, Tetsuto Minami, Shigeki Nakauchi

Abstract. It is widely known that reinforcement learning systems in the brain contribute to learning via interactions with the environment. These systems are capable of solving multidimensional problems, in which some dimensions are relevant to a reward while others are not. To solve these problems, computational models use Bayesian learning, a strategy supported by behavioral and neural evidence in humans. Bayesian learning takes into account beliefs, which represent a learner's confidence that a particular dimension is relevant to the reward. Beliefs are given as a posterior probability of the state-transition (reward) function that maps the optimal actions to the states in each dimension. However, when it comes to implementing this learning strategy, the order in which beliefs and state-transition functions are updated remains unclear. The present study investigates this update order using a trial-by-trial analysis of human behavior and electroencephalography signals during a task in which learners have to identify the reward-relevant dimension. Our behavioral and neural results reveal a cooperative update: within 300 ms after the outcome feedback, the state-transition functions are updated, followed by the beliefs for each dimension.
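
A hedged toy rendering of the suggested update order follows: after each outcome, the per-dimension reward estimates (standing in for state-transition functions) are updated first, then the beliefs over which dimension is relevant. The Bernoulli likelihood model and all numbers are illustrative assumptions.

```python
import numpy as np

n_dims = 3
beliefs = np.ones(n_dims) / n_dims               # P(dimension d is reward-relevant)
estimates = np.array([0.2, 0.5, 0.8])            # assumed per-dimension reward estimates

def update(beliefs, estimates, reward, lr=0.2):
    # Step 1: the state-transition (reward) functions are updated first.
    estimates = estimates + lr * (reward - estimates)
    # Step 2: the beliefs follow, favouring dimensions whose estimate better
    # explains the outcome (Bernoulli likelihood as an illustrative stand-in).
    likelihood = estimates if reward == 1 else 1.0 - estimates
    beliefs = beliefs * likelihood
    return beliefs / beliefs.sum(), estimates

beliefs, estimates = update(beliefs, estimates, reward=1)
```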

