Learning efficient navigation in vortical flow fields

2021
Vol 12 (1)
Author(s):
Peter Gunnarson
Ioannis Mandralis
Guido Novati
Petros Koumoutsakos
John O. Dabiri

Abstract: Efficient point-to-point navigation in the presence of a background flow field is important for robotic applications such as ocean surveying. In such applications, robots may only have knowledge of their immediate surroundings or be faced with time-varying currents, which limits the use of optimal control techniques. Here, we apply a recently introduced Reinforcement Learning algorithm to discover time-efficient navigation policies to steer a fixed-speed swimmer through unsteady two-dimensional flow fields. The algorithm entails inputting environmental cues into a deep neural network that determines the swimmer’s actions, and deploying Remember and Forget Experience Replay. We find that the resulting swimmers successfully exploit the background flow to reach the target, but that this success depends on the sensed environmental cue. Surprisingly, a velocity sensing approach significantly outperformed a bio-mimetic vorticity sensing approach, and achieved a near 100% success rate in reaching the target locations while approaching the time-efficiency of optimal navigation trajectories.
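To make the sensed-cue-to-action mapping concrete, here is a minimal sketch of a policy network that maps the locally sensed flow velocity (the better-performing cue) and the direction to the target onto a discrete set of fixed-speed headings. The network sizes, inputs, and action set are illustrative assumptions; the paper's agent is trained with Remember and Forget Experience Replay, which is not reproduced here.

```python
# Minimal sketch, assuming a small feedforward policy; sizes, inputs, and the
# discrete action set are illustrative, not taken from the paper.
import numpy as np

class SwimmerPolicy:
    def __init__(self, n_inputs=4, n_hidden=32, n_actions=8, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (n_hidden, n_inputs))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.1, (n_actions, n_hidden))
        self.b2 = np.zeros(n_actions)

    def act(self, sensed_velocity, target_direction):
        """sensed_velocity, target_direction: 2D arrays (environmental cues)."""
        x = np.concatenate([sensed_velocity, target_direction])
        h = np.tanh(self.W1 @ x + self.b1)   # hidden layer
        logits = self.W2 @ h + self.b2       # one score per heading
        return int(np.argmax(logits))        # index of the chosen heading
```

The returned index would select one of `n_actions` evenly spaced swim headings, all taken at the fixed swimming speed.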

2021
Vol 54 (3-4)
pp. 417-428
Author(s):
Yanyan Dai
KiDong Lee
SukGyu Lee

Rotary inverted pendulum systems are a standard benchmark for nonlinear control. Without a deep understanding of control, it is difficult to control a rotary inverted pendulum platform using classic control-engineering models, as shown in section 2.1. This paper therefore controls the platform by training and testing a reinforcement learning algorithm instead of relying on classic control theory. Despite many recent achievements in reinforcement learning (RL), there has been little research on quickly testing high-frequency RL algorithms in a real hardware environment. In this paper, we propose a real-time hardware-in-the-loop (HIL) control system to train and test the deep reinforcement learning algorithm from simulation through to real hardware implementation. The agent is implemented with the Double Deep Q-Network (DDQN) with prioritized experience replay, which requires no deep understanding of classical control engineering. For the real experiment, we define 21 actions to swing up the rotary inverted pendulum and balance it with smooth pendulum motion. Compared with the Deep Q-Network (DQN), the DDQN with prioritized experience replay removes the overestimation of the Q-value and decreases the training time. Finally, the paper presents experimental results comparing classic control theory with different reinforcement learning algorithms.
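For illustration, a hedged sketch of the two ingredients highlighted above: the Double DQN target, which decouples action selection (online network) from action evaluation (target network) to remove Q-value overestimation, and the |TD-error|-based priority used by prioritized experience replay. Function names and hyperparameters are assumptions, not the paper's code.

```python
# Hedged sketch of the DDQN target and the prioritized-replay priority.
import numpy as np

def double_dqn_target(reward, done, q_online_next, q_target_next, gamma=0.99):
    """q_online_next, q_target_next: next-state Q-values, shape (n_actions,)."""
    best_action = np.argmax(q_online_next)   # online net selects the action
    bootstrap = q_target_next[best_action]   # target net evaluates it
    return reward + (1.0 - float(done)) * gamma * bootstrap

def replay_priority(td_error, alpha=0.6, eps=1e-6):
    # Larger |TD-error| -> higher probability of being replayed.
    return (abs(td_error) + eps) ** alpha
```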


Author(s):
Umberto Morbiducci
Diana Massai
Diego Gallo
Raffaele Ponzini
Marco A. Deriu
...

It is widely accepted that the local hemodynamics in the arterial system affects the atherogenic process. In particular, the hemodynamic environment at the carotid artery bifurcation has been widely studied due to its predilection for atherosclerosis. Much effort has been spent in the past on image-based CFD models of the carotid bifurcation to assess the sensitivity of wall shear stress (WSS)-based parameters, used as indicators of abnormal flow, to several modeling assumptions. This luminal-surface-oriented approach was historically driven by histological observations on samples of the vessel wall; as a consequence, efforts to reduce the complexity of 4D flow fields focused mainly on WSS. However, few studies have provided the insight into the influence of these assumptions needed to confidently model the 4D hemodynamics within the bifurcation. Only recently has interest in the role played by the bulk flow in the development of arterial disease grown dramatically. This follows the emerging awareness that arterial hemodynamics, an intricate process involving interaction, reconnection, and continuous re-organization of flow structures, could play a primary role in regulating mass transfer and thereby its athero-protective/susceptible effect. Earlier works [1] pointed out the existence of a relationship between helical/vortical flow patterns and transport processes that could affect blood-vessel wall interaction, and might alter the residence time of atherogenic particles involved in the initiation of the inflammatory response. Recently we introduced robust quantitative descriptors of bulk flow that can “reduce” the inherent complexity associated with 4D flow fields in arteries [1]. Here we present a study on the impact of assumptions about blood rheology and outflow boundary conditions (BCs) on bulk flow features within healthy carotid bifurcations, using 4D flow descriptors. The final goal is to complement, integrate, and extend the description currently adopted to classify altered hemodynamics with a quantitative characterization of the bulk flow.
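One descriptor in this spirit is local normalized helicity (LNH), the cosine of the angle between velocity and vorticity, which distinguishes right- and left-handed helical flow structures. A minimal numpy sketch follows, assuming the velocity and vorticity fields have already been sampled on a grid; the array layout is an assumption made for illustration.

```python
# Minimal sketch: local normalized helicity (LNH) on pre-sampled fields.
import numpy as np

def local_normalized_helicity(v, omega, eps=1e-12):
    """v, omega: velocity and vorticity arrays of shape (..., 3)."""
    dot = np.sum(v * omega, axis=-1)
    norms = np.linalg.norm(v, axis=-1) * np.linalg.norm(omega, axis=-1)
    return dot / (norms + eps)   # +1: right-handed helix, -1: left-handed
```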


Author(s):
Jeonghwa Seo
Bumwoo Han
Shin Hyung Rhee

Effects of the free surface on the development of turbulent boundary layer and wake fields were investigated. Free-surface effects were identified by measuring the flow field around a surface-piercing cylinder at various advance speeds in a towing tank. A towed underwater Stereoscopic Particle Image Velocimetry (SPIV) system was used to measure the flow field under the free surface. The cross section of the test model was the waterplane shape of the Wigley hull, with a longitudinal length of 1.0 m and a width of 100 mm. With its sharp bow and slender cross section, flow separation was not expected in two-dimensional flow. Flow fields near the free surface and at a deep location, where a two-dimensional flow field was expected, were measured and compared to identify free-surface effects. Several planes perpendicular to the longitudinal direction, near the model surface and behind the model, were selected to track the development of the turbulent boundary layer. Froude numbers of the test conditions ranged from 0.126 to 0.40, with corresponding Reynolds numbers from 395,000 to 1,250,000. In the lowest Froude number condition, free-surface waves were hardly observed, so free-surface effects could be identified in isolation, whereas in the highest Froude number condition violent free-surface behavior due to wave-induced separation dominated the flow fields. From the instantaneous velocity fields, the time-mean velocity, turbulence kinetic energy, and flow structures derived by proper orthogonal decomposition (POD) were analyzed. The main free-surface effects observed were a retarded wake, free-surface waves, and wave-induced separation.
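As a concrete reference for the POD step, here is a hedged sketch of snapshot POD as commonly applied to PIV data: the modes are the left singular vectors of the mean-subtracted snapshot matrix, and the squared singular values give each mode's share of the fluctuation energy. The matrix layout is an assumption; the abstract does not specify the exact processing.

```python
# Hedged sketch of snapshot POD via the SVD.
import numpy as np

def snapshot_pod(snapshots):
    """snapshots: (n_points, n_snapshots) matrix of velocity samples."""
    fluctuations = snapshots - snapshots.mean(axis=1, keepdims=True)
    modes, sing_vals, time_coeffs = np.linalg.svd(fluctuations, full_matrices=False)
    energy = sing_vals**2 / np.sum(sing_vals**2)  # energy fraction per mode
    return modes, energy, time_coeffs
```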


2001
Vol 2001 (0)
pp. 97
Author(s):
Masato FURUKAWA
Kazutoyo YAMADA
Aritoshi IMAZATO
Masahiro INOUE

Author(s):
Manas C. Menon
H. Harry Asada

With the rise of smart material actuators, it has become possible to design and build systems with a large number of small actuators. Many of these actuators exhibit a host of nonlinearities, including hysteresis. Learning control algorithms can be used to guarantee good convergence of such systems even in the presence of these nonlinearities, but they have a difficult time dealing with certain classes of noise or disturbances. We present a neighbor learning algorithm for controlling systems of this type with multiple identical actuators, as well as a neighbor learning algorithm for a certain class of non-identical actuators. We prove that in certain situations these algorithms provide improved convergence compared to traditional iterative learning control techniques. Simulation results are presented that corroborate the proofs.
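A hedged sketch of the contrast being drawn: classic first-order iterative learning control (ILC) updates each actuator's input from its own tracking error across iterations, whereas a neighbor-learning variant updates each actuator from a neighbor's error. The cyclic neighbor assignment and the gains below are illustrative assumptions, not the paper's actual update laws.

```python
# Illustrative sketch only; the paper's update laws may differ.
import numpy as np

def ilc_update(u, error, gain=0.5):
    # Classic first-order ILC: u_{k+1}(t) = u_k(t) + gain * e_k(t)
    return u + gain * error

def neighbor_learning_update(U, E, gain=0.5):
    """U, E: (n_actuators, n_timesteps) inputs and errors at iteration k."""
    # Each identical actuator learns from a (cyclic) neighbor's error, so a
    # disturbance local to one actuator does not keep corrupting its own
    # learning signal across iterations.
    return U + gain * np.roll(E, shift=-1, axis=0)
```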


Author(s):
Xingxing Liang
Li Chen
Yanghe Feng
Zhong Liu
Yang Ma
...

Reinforcement learning, as an effective method for solving complex sequential decision-making problems, plays an important role in areas such as intelligent decision-making and behavioral cognition. It is well known that the experience replay mechanism has contributed to the development of deep reinforcement learning by reusing past samples to improve sample efficiency. However, the existing prioritized experience replay mechanism changes the sample distribution in the replay buffer by assigning higher sampling frequency to specific transitions, and it cannot be applied to actor-critic and other on-policy reinforcement learning algorithms. To address this, we propose an adaptive factor based on the TD-error, which further increases sample utilization by giving greater attention weight to samples with larger TD-error, and we embed it flexibly into the original Deep Q-Network (DQN) and Advantage Actor-Critic (A2C) algorithms to improve their performance. We then evaluated the proposed architecture on CartPole-V1 and six Atari game environments. Under both fixed-temperature and annealed-temperature conditions, the results, compared with those produced by the vanilla DQN and the original A2C, highlight the improved algorithms' advantages in cumulative reward and the speed at which it climbs.
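A hedged sketch of this idea: derive an attention weight for each sampled transition from its |TD-error| via a softmax with temperature (the abstract compares fixed and annealed temperatures) and use it to reweight the loss rather than the sampling distribution, which keeps the correction compatible with on-policy algorithms. The exact form of the adaptive factor is an assumption.

```python
# Hedged sketch: TD-error-based attention weights applied to the loss.
import numpy as np

def td_attention_weights(td_errors, temperature=1.0):
    """td_errors: (batch,) array; returns weights with mean 1."""
    scores = np.abs(td_errors) / temperature
    scores -= scores.max()               # numerical stability
    w = np.exp(scores)
    return w / w.sum() * len(td_errors)  # mean weight of 1 keeps the loss scale

def weighted_td_loss(td_errors, temperature=1.0):
    td = np.asarray(td_errors)
    w = td_attention_weights(td, temperature)
    return float(np.mean(w * td**2))     # attention-weighted squared TD-error
```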

