action spaces
Recently Published Documents

TOTAL DOCUMENTS: 240 (five years: 91)
H-INDEX: 19 (five years: 4)

2021 · Vol 12 (1) · pp. 47
Author(s): Jamal Shams Khanzada, Wasif Muhammad, Muhammad Jehanzeb Irshad

Quadcopters are finding their place in almost every part of daily life, from transportation and delivery to hospitals and homes. In places where human intervention in quadcopter flight control is impossible, drones must be equipped with intelligent autopilot systems so that they can make decisions on their own. Previous reinforcement learning (RL)-based efforts at quadcopter flight control in complex, dynamic, and unstructured environments have failed during the training phase to avoid catastrophic failures of the naturally unstable quadcopter. In this work, we propose a complementary approach to quadcopter flight control that uses prediction error in the sensory space as the control policy reward, rather than rewards defined over unstable action spaces as in conventional RL approaches. The proposed predictive coding/biased competition with divisive input modulation (PC/BC-DIM) neural network learns a prediction-error-based flight control policy without physically actuating the quadcopter propellers, which ensures safety during training. Because the policy is learned without any physical flights, training time is reduced to almost zero. Simulation results show that the trained agent reaches the destination accurately: over 20 quadcopter flight trials, the average path deviation from the ground truth was 1.495 and the root mean square (RMS) error of goal reaching was 1.708.
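For readers unfamiliar with PC/BC-DIM, the sketch below illustrates the core divisive-input-modulation update (after Spratling's formulation) and one hypothetical way of turning prediction error into a reward signal; the variable names, epsilon constants, and the reward function are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def pcbc_dim_step(x, y, W, V, eps1=1e-6, eps2=1e-4):
    """One PC/BC-DIM iteration (divisive input modulation).

    x : sensory input vector, shape (n_in,)
    y : prediction-neuron activations, shape (n_pred,)
    W : feedforward weights, shape (n_pred, n_in)
    V : feedback (reconstruction) weights, shape (n_in, n_pred),
        typically a rescaled transpose of W
    Returns the error-neuron vector e and the updated y.
    """
    e = x / (eps2 + V @ y)        # error = input divided by its top-down reconstruction
    y = (eps1 + y) * (W @ e)      # prediction neurons amplified where they explain the error
    return e, y

def prediction_error_reward(e):
    """Hypothetical reward: e close to 1 everywhere means the input is fully
    predicted, so penalize the deviation of log(e) from zero."""
    return -np.abs(np.log(e + 1e-12)).mean()
```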


Author(s): Huizhen Yu

We consider the linear programming approach to constrained and unconstrained Markov decision processes (MDPs) under the long-run average-cost criterion, where the MDPs in our study have Borel state spaces and countable discrete action spaces. Under a strict unboundedness condition on the one-stage costs and a recently introduced majorization condition on the state transition stochastic kernel, we study infinite-dimensional linear programs for the average-cost MDPs and prove the absence of a duality gap as well as other optimality results. Our results do not require a lower-semicontinuous MDP model; thus, they can be applied to countable-action-space MDPs whose dynamics and one-stage costs are discontinuous in the state variable. Our proofs make use of the continuity property of Borel measurable functions asserted by Lusin's theorem.
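As background, the unconstrained average-cost problem is commonly written as an infinite-dimensional linear program over occupation measures; a standard formulation is sketched below, where X, A, the one-stage cost c, and the transition kernel P are generic notation and the paper's exact program and constraint set may differ.

```latex
\begin{align*}
\underset{\mu}{\text{minimize}} \quad & \int_{\mathbb{X}\times\mathbb{A}} c(x,a)\,\mu(dx,da) \\
\text{subject to} \quad & \mu(B\times\mathbb{A}) = \int_{\mathbb{X}\times\mathbb{A}} P(B \mid x,a)\,\mu(dx,da)
  \quad \text{for all Borel sets } B \subseteq \mathbb{X}, \\
& \mu \ \text{a probability measure on } \mathbb{X}\times\mathbb{A}.
\end{align*}
```

Under suitable conditions the optimal value of this program equals the minimal long-run average cost, and the absence of a duality gap ties it to a dual program over bounded functions satisfying an average-cost optimality inequality.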


2021
Author(s): Itai Arieli, Yakov Babichenko, Manuel Mueller-Frank

Naïve Learning in a Binary Action, Social Network Environment. In "Naïve Learning Through Probability Overmatching," I. Arieli, Y. Babichenko, and M. Mueller-Frank consider an environment in which privately informed agents repeatedly select a binary action while observing the past actions of their neighbors in a social network. Rational inference has been shown to be exceedingly complex in this environment. Instead, this paper focuses on boundedly rational agents who form beliefs according to discretized DeGroot updating and apply a decision rule that assigns a (mixed) action to each belief. It is shown that naïve learning, where the long-run actions of all agents are optimal given their pooled private information, can be achieved in any strongly connected network if beliefs exhibit a high level of inertia and the decision rule coincides with probability overmatching. The main difference from existing naïve learning results is that here naïve learning is shown to hold (1) for binary rather than uncountable action spaces and (2) even for network and information structures in which Bayesian agents fail to learn.
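A minimal illustration of the two ingredients named in the abstract, discretized DeGroot updating with inertia and a probability-(over)matching mixed action, is sketched below; the grid size, inertia weight, the exponent used to model overmatching, and the direct averaging of beliefs (rather than inference from observed actions) are simplifying assumptions for exposition only.

```python
import numpy as np

rng = np.random.default_rng(0)

def degroot_update(beliefs, adjacency, inertia=0.9, n_levels=21):
    """Discretized DeGroot step: mix each agent's belief with the average of its
    neighbors' beliefs (high inertia = small weight on neighbors), then snap the
    result to a finite belief grid."""
    W = adjacency / adjacency.sum(axis=1, keepdims=True)      # row-stochastic weights
    mixed = inertia * beliefs + (1.0 - inertia) * (W @ beliefs)
    grid = np.linspace(0.0, 1.0, n_levels)
    return grid[np.abs(mixed[:, None] - grid[None, :]).argmin(axis=1)]

def overmatching_actions(beliefs, gamma=2.0):
    """Probability-overmatching rule (illustrative): play action 1 with a
    probability that exaggerates the belief toward the more likely state."""
    p = beliefs**gamma / (beliefs**gamma + (1.0 - beliefs)**gamma)
    return (rng.random(beliefs.shape) < p).astype(int)
```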


2021 · pp. 1-36
Author(s): Ayush Raina, Jonathan Cagan, Christopher McComb

Abstract: Generative design problems often encompass complex action spaces that may be divergent over time, contain state-dependent constraints, or involve hybrid (discrete and continuous) domains. To address these challenges, this work introduces the Design Strategy Network (DSN), a data-driven deep hierarchical framework that can learn strategies over such arbitrarily complex action spaces. The hierarchical architecture decomposes every action decision into first predicting a preferred spatial region in the design space and then outputting a probability distribution over a set of possible actions from that region. The framework comprises a convolutional encoder to work with image-based design state representations, a multi-layer perceptron to predict a spatial region, and a weight-sharing network to generate a probability distribution over unordered, set-based inputs of feasible actions. Applied to a truss design study, the framework learns to predict the actions of the human designers in the study, capturing their truss generation strategies in the process. Results show that DSNs significantly outperform non-hierarchical methods of policy representation, demonstrating their superiority on complex action space problems.
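The sketch below shows one plausible reading of that hierarchy in PyTorch: a convolutional encoder over the design-state image feeding a region head and a weight-sharing scorer over a set of candidate action features. All layer sizes, the action feature dimension, and the module names are assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class DesignStrategySketch(nn.Module):
    """Rough sketch of a DSN-style hierarchical policy."""

    def __init__(self, n_regions=64, action_feat_dim=8, hidden=128):
        super().__init__()
        # Convolutional encoder for the image-based design state
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(32 * 4 * 4, hidden), nn.ReLU(),
        )
        # Head 1: distribution over spatial regions of the design space
        self.region_head = nn.Linear(hidden, n_regions)
        # Head 2: weight-sharing scorer applied to each candidate action's features
        self.action_scorer = nn.Sequential(
            nn.Linear(hidden + action_feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state_img, candidate_actions):
        # state_img: (B, 1, H, W); candidate_actions: (B, K, action_feat_dim)
        z = self.encoder(state_img)
        region_logits = self.region_head(z)                                  # (B, n_regions)
        z_rep = z.unsqueeze(1).expand(-1, candidate_actions.size(1), -1)
        scores = self.action_scorer(
            torch.cat([z_rep, candidate_actions], dim=-1)).squeeze(-1)       # (B, K)
        return region_logits.softmax(-1), scores.softmax(-1)
```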


2021
Author(s): Nicolo Botteghi, Khaled Alaa, Mannes Poel, Beril Sirmacek, Christoph Brune, ...

AI · 2021 · Vol 2 (3) · pp. 366-382
Author(s): Zhihan Xue, Tad Gonsalves

Research on autonomous obstacle avoidance for drones has recently received widespread attention, and an increasing number of researchers are using machine learning to train drones. These studies typically adopt supervised learning or reinforcement learning to train the networks. Supervised learning has the disadvantage that building the datasets takes a significant amount of time, because it is difficult to cover the complex and changeable drone flight environment in a single dataset. Reinforcement learning can overcome this problem by letting the drone collect data in the environment itself. However, current results based on reinforcement learning mainly focus on discrete action spaces, so the movement of the drone lacks precision and its flying behavior is somewhat unnatural. This study uses the soft actor-critic (SAC) algorithm to train a drone to perform autonomous obstacle avoidance in a continuous action space using only image data. The algorithm is trained and tested in a simulation environment built with AirSim. The results show that our algorithm enables the UAV to avoid obstacles in the training environment using only the depth map as input. Moreover, it also achieves a high obstacle avoidance rate in a reconfigured environment without retraining.
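For context, a SAC actor over a continuous action space that consumes only a depth map could look roughly like the sketch below; the convolutional backbone, the three-dimensional action (e.g., velocity commands), and all layer sizes are assumptions rather than the paper's network.

```python
import torch
import torch.nn as nn

class DepthSACActor(nn.Module):
    """Sketch of a SAC actor that maps a depth map to a squashed Gaussian action."""

    def __init__(self, action_dim=3, hidden=256):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 32, 8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(64 * 16, hidden), nn.ReLU(),
        )
        self.mu = nn.Linear(hidden, action_dim)
        self.log_std = nn.Linear(hidden, action_dim)

    def forward(self, depth_map):
        # depth_map: (B, 1, H, W)
        h = self.backbone(depth_map)
        mu, log_std = self.mu(h), self.log_std(h).clamp(-5, 2)
        dist = torch.distributions.Normal(mu, log_std.exp())
        raw = dist.rsample()                       # reparameterized sample
        action = torch.tanh(raw)                   # squash to [-1, 1] velocity commands
        # log-probability with the tanh correction, used in the SAC entropy term
        log_prob = (dist.log_prob(raw) - torch.log(1 - action.pow(2) + 1e-6)).sum(-1)
        return action, log_prob
```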


2021
Author(s): Kleber Padovani, Roberto Xavier, André Carvalho, Anna Reali, Annie Chateau, ...

Abstract: Genome assembly is one of the most relevant and computationally complex tasks in genomics projects. It aims to reconstruct a genome through the analysis of many small textual fragments of that genome, called reads. Ideally, the assembly should discard any errors contained in the reads and combine them optimally, thereby recovering the original genome. Assembly quality matters because the more reliable the genomes, the more accurate our understanding of the characteristics and functions of living beings, which enables many positive impacts on society, including the prevention and treatment of diseases. The assembly becomes even more complex (and is then termed de novo) when the assembler is not supplied with a similar genome to use as a reference. Current assemblers predominantly use heuristic strategies over computational graphs. Although such assemblers are widely used in genomics projects, there is still no irrefutably best assembler for every genome, and choosing an assembler and its configuration properly depends on Bioinformatics experts. Reinforcement learning has proven very promising for solving complex tasks without human supervision during the learning process, but its successful applications are predominantly focused on fictional and entertainment problems, such as games. Given this, this work aims to shed light on applying reinforcement learning to a relevant real-world problem: genome assembly. By expanding the only approach found in the literature that addresses this problem, we carefully explored agent learning with the Q-learning algorithm to understand its suitability for scenarios whose characteristics are closer to those faced by real genome projects. The improvements proposed here include changing the previously proposed reward system and adding state-space exploration optimization strategies based on dynamic pruning and mutual collaboration with evolutionary computing. These investigations were carried out on 23 new environments with larger inputs than those used previously; all of these environments are freely available online so the scientific community can build on this research. The results suggest consistent performance gains from the proposed improvements, but they also expose their limitations, especially those related to the high dimensionality of the state and action spaces. Finally, we outline paths that could be followed to tackle genome assembly efficiently in real scenarios, considering recent successful reinforcement learning applications, including deep reinforcement learning, from other domains dealing with high-dimensional inputs.
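The sketch below gives a toy tabular Q-learning setup in the spirit described, treating assembly as sequentially choosing the next read and rewarding the overlap gained; the overlap function, reward, and hyperparameters are illustrative assumptions, and the state space (tuples of already placed reads) grows factorially with the number of reads, which makes concrete the dimensionality problem the abstract points to.

```python
import random
from collections import defaultdict

def overlap(a, b, min_len=3):
    """Length of the longest suffix of read a that is a prefix of read b (naive scan)."""
    best = 0
    for k in range(min_len, min(len(a), len(b)) + 1):
        if a[-k:] == b[:k]:
            best = k
    return best

def q_learning_assembly(reads, episodes=2000, alpha=0.1, gamma=0.9, eps=0.2):
    """Toy Q-learning over read orderings: a state is the tuple of reads placed
    so far, an action is the index of the next read, and the reward is the
    overlap gained by appending it."""
    Q = defaultdict(float)
    for _ in range(episodes):
        state, remaining = (), set(range(len(reads)))
        while remaining:
            actions = list(remaining)
            if random.random() < eps:                       # epsilon-greedy exploration
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda x: Q[(state, x)])
            r = overlap(reads[state[-1]], reads[a]) if state else 0
            nxt = state + (a,)
            nxt_best = max((Q[(nxt, x)] for x in remaining - {a}), default=0.0)
            Q[(state, a)] += alpha * (r + gamma * nxt_best - Q[(state, a)])
            state, remaining = nxt, remaining - {a}
    return Q
```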

