Learning efficient navigation in vortical flow fields

2021
Vol 12 (1)
Author(s):
Peter Gunnarson
Ioannis Mandralis
Guido Novati
Petros Koumoutsakos
John O. Dabiri

Abstract: Efficient point-to-point navigation in the presence of a background flow field is important for robotic applications such as ocean surveying. In such applications, robots may only have knowledge of their immediate surroundings or be faced with time-varying currents, which limits the use of optimal control techniques. Here, we apply a recently introduced Reinforcement Learning algorithm to discover time-efficient navigation policies to steer a fixed-speed swimmer through unsteady two-dimensional flow fields. The algorithm entails inputting environmental cues into a deep neural network that determines the swimmer’s actions, and deploying Remember and Forget Experience Replay. We find that the resulting swimmers successfully exploit the background flow to reach the target, but that this success depends on the sensed environmental cue. Surprisingly, a velocity sensing approach significantly outperformed a bio-mimetic vorticity sensing approach, and achieved a near 100% success rate in reaching the target locations while approaching the time-efficiency of optimal navigation trajectories.
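To make the sensed-cue-to-action mapping concrete, here is a minimal sketch of a policy network that maps the locally sensed flow velocity (the better-performing cue) and the direction to the target onto a discrete set of fixed-speed headings. The network sizes, inputs, and action set are illustrative assumptions; the paper's agent is trained with Remember and Forget Experience Replay, which is not reproduced here.

```python
# Minimal sketch, assuming a small feedforward policy; sizes, inputs, and the
# discrete action set are illustrative, not taken from the paper.
import numpy as np

class SwimmerPolicy:
    def __init__(self, n_inputs=4, n_hidden=32, n_actions=8, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (n_hidden, n_inputs))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.1, (n_actions, n_hidden))
        self.b2 = np.zeros(n_actions)

    def act(self, sensed_velocity, target_direction):
        """sensed_velocity, target_direction: 2D arrays (environmental cues)."""
        x = np.concatenate([sensed_velocity, target_direction])
        h = np.tanh(self.W1 @ x + self.b1)   # hidden layer
        logits = self.W2 @ h + self.b2       # one score per heading
        return int(np.argmax(logits))        # index of the chosen heading
```

The returned index would select one of `n_actions` evenly spaced swim headings, all taken at the fixed swimming speed.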

2021
Vol 54 (3-4)
pp. 417-428
Author(s):
Yanyan Dai
KiDong Lee
SukGyu Lee

Rotary inverted pendulum systems are a standard benchmark for nonlinear control. Without a deep understanding of control, it is difficult to control a rotary inverted pendulum platform using classic control-engineering models, as shown in section 2.1. This paper therefore controls the platform by training and testing a reinforcement learning algorithm instead of relying on classic control theory. Despite many recent achievements in reinforcement learning (RL), there has been little research on quickly testing high-frequency RL algorithms in a real hardware environment. In this paper, we propose a real-time hardware-in-the-loop (HIL) control system to train and test the deep reinforcement learning algorithm from simulation through to real hardware implementation. The agent is implemented with the Double Deep Q-Network (DDQN) with prioritized experience replay, which requires no deep understanding of classical control engineering. For the real experiment, we define 21 actions to swing up the rotary inverted pendulum and balance it with smooth pendulum motion. Compared with the Deep Q-Network (DQN), the DDQN with prioritized experience replay removes the overestimation of the Q-value and decreases the training time. Finally, the paper presents experimental results comparing classic control theory with different reinforcement learning algorithms.
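For illustration, a hedged sketch of the two ingredients highlighted above: the Double DQN target, which decouples action selection (online network) from action evaluation (target network) to remove Q-value overestimation, and the |TD-error|-based priority used by prioritized experience replay. Function names and hyperparameters are assumptions, not the paper's code.

```python
# Hedged sketch of the DDQN target and the prioritized-replay priority.
import numpy as np

def double_dqn_target(reward, done, q_online_next, q_target_next, gamma=0.99):
    """q_online_next, q_target_next: next-state Q-values, shape (n_actions,)."""
    best_action = np.argmax(q_online_next)   # online net selects the action
    bootstrap = q_target_next[best_action]   # target net evaluates it
    return reward + (1.0 - float(done)) * gamma * bootstrap

def replay_priority(td_error, alpha=0.6, eps=1e-6):
    # Larger |TD-error| -> higher probability of being replayed.
    return (abs(td_error) + eps) ** alpha
```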


Author(s):
Umberto Morbiducci
Diana Massai
Diego Gallo
Raffaele Ponzini
Marco A. Deriu
...

It is widely accepted that the local hemodynamics in the arterial system affects the atherogenic process. In particular, the hemodynamic environment at the carotid artery bifurcation has been widely studied due to its predilection for atherosclerosis. Much effort has been spent in the past on image-based CFD models of the carotid bifurcation to assess the sensitivity of wall shear stress (WSS)-based parameters, used as indicators of abnormal flow, to several modeling assumptions. This luminal-surface-oriented approach was historically driven by histological observations on samples of the vessel wall; as a consequence, efforts to reduce the complexity of 4D flow fields focused mainly on WSS. However, few studies have provided the insight into the influence of these assumptions needed to confidently model the 4D hemodynamics within the bifurcation. Only recently has interest in the role played by the bulk flow in the development of arterial disease grown dramatically. This follows the emerging awareness that arterial hemodynamics, an intricate process involving interaction, reconnection, and continuous re-organization of flow structures, could play a primary role in regulating mass transfer and thereby its athero-protective/susceptible effect. Earlier works [1] pointed out the existence of a relationship between helical/vortical flow patterns and transport processes that could affect blood-vessel wall interaction, and might alter the residence time of atherogenic particles involved in the initiation of the inflammatory response. Recently we introduced robust quantitative descriptors of bulk flow that can “reduce” the inherent complexity associated with 4D flow fields in arteries [1]. Here we present a study on the impact of assumptions about blood rheology and outflow boundary conditions (BCs) on bulk flow features within healthy carotid bifurcations, using 4D flow descriptors. The final goal is to complement, integrate, and extend the description currently adopted to classify altered hemodynamics with a quantitative characterization of the bulk flow.
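One descriptor in this spirit is local normalized helicity (LNH), the cosine of the angle between velocity and vorticity, which distinguishes right- and left-handed helical flow structures. A minimal numpy sketch follows, assuming the velocity and vorticity fields have already been sampled on a grid; the array layout is an assumption made for illustration.

```python
# Minimal sketch: local normalized helicity (LNH) on pre-sampled fields.
import numpy as np

def local_normalized_helicity(v, omega, eps=1e-12):
    """v, omega: velocity and vorticity arrays of shape (..., 3)."""
    dot = np.sum(v * omega, axis=-1)
    norms = np.linalg.norm(v, axis=-1) * np.linalg.norm(omega, axis=-1)
    return dot / (norms + eps)   # +1: right-handed helix, -1: left-handed
```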


Author(s):
Jeonghwa Seo
Bumwoo Han
Shin Hyung Rhee

Effects of the free surface on the development of turbulent boundary layer and wake fields were investigated. Free-surface effects were identified by measuring the flow field around a surface-piercing cylinder at various advance speeds in a towing tank. A towed underwater Stereoscopic Particle Image Velocimetry (SPIV) system was used to measure the flow field under the free surface. The cross section of the test model was the waterplane shape of the Wigley hull, with a longitudinal length of 1.0 m and a width of 100 mm. With its sharp bow and slender cross section, flow separation was not expected in two-dimensional flow. Flow fields near the free surface and at a deep location, where a two-dimensional flow field was expected, were measured and compared to identify free-surface effects. Several planes perpendicular to the longitudinal direction, near the model surface and behind the model, were selected to track the development of the turbulent boundary layer. Froude numbers of the test conditions ranged from 0.126 to 0.40, with corresponding Reynolds numbers from 395,000 to 1,250,000. In the lowest Froude number condition, free-surface waves were hardly observed, so free-surface effects could be identified in isolation, whereas in the highest Froude number condition violent free-surface behavior due to wave-induced separation dominated the flow fields. From the instantaneous velocity fields, the time-mean velocity, turbulence kinetic energy, and flow structures derived by proper orthogonal decomposition (POD) were analyzed. The main free-surface effects observed were a retarded wake, free-surface waves, and wave-induced separation.
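As a concrete reference for the POD step, here is a hedged sketch of snapshot POD as commonly applied to PIV data: the modes are the left singular vectors of the mean-subtracted snapshot matrix, and the squared singular values give each mode's share of the fluctuation energy. The matrix layout is an assumption; the abstract does not specify the exact processing.

```python
# Hedged sketch of snapshot POD via the SVD.
import numpy as np

def snapshot_pod(snapshots):
    """snapshots: (n_points, n_snapshots) matrix of velocity samples."""
    fluctuations = snapshots - snapshots.mean(axis=1, keepdims=True)
    modes, sing_vals, time_coeffs = np.linalg.svd(fluctuations, full_matrices=False)
    energy = sing_vals**2 / np.sum(sing_vals**2)  # energy fraction per mode
    return modes, energy, time_coeffs
```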


2001
Vol 2001 (0)
pp. 97
Author(s):
Masato FURUKAWA
Kazutoyo YAMADA
Aritoshi IMAZATO
Masahiro INOUE

Author(s):
Manas C. Menon
H. Harry Asada

With the rise of smart material actuators, it has become possible to design and build systems with a large number of small actuators. Many of these actuators exhibit a host of nonlinearities, including hysteresis. Learning control algorithms can be used to guarantee good convergence of such systems even in the presence of these nonlinearities, but they have a difficult time dealing with certain classes of noise or disturbances. We present a neighbor learning algorithm for controlling systems of this type with multiple identical actuators, as well as a neighbor learning algorithm for a certain class of non-identical actuators. We prove that in certain situations these algorithms provide improved convergence compared to traditional iterative learning control techniques. Simulation results are presented that corroborate the proofs.
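A hedged sketch of the contrast being drawn: classic first-order iterative learning control (ILC) updates each actuator's input from its own tracking error across iterations, whereas a neighbor-learning variant updates each actuator from a neighbor's error. The cyclic neighbor assignment and the gains below are illustrative assumptions, not the paper's actual update laws.

```python
# Illustrative sketch only; the paper's update laws may differ.
import numpy as np

def ilc_update(u, error, gain=0.5):
    # Classic first-order ILC: u_{k+1}(t) = u_k(t) + gain * e_k(t)
    return u + gain * error

def neighbor_learning_update(U, E, gain=0.5):
    """U, E: (n_actuators, n_timesteps) inputs and errors at iteration k."""
    # Each identical actuator learns from a (cyclic) neighbor's error, so a
    # disturbance local to one actuator does not keep corrupting its own
    # learning signal across iterations.
    return U + gain * np.roll(E, shift=-1, axis=0)
```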


Author(s):
Xingxing Liang
Li Chen
Yanghe Feng
Zhong Liu
Yang Ma
...

Reinforcement learning, as an effective method for solving complex sequential decision-making problems, plays an important role in areas such as intelligent decision-making and behavioral cognition. It is well known that the experience replay mechanism has contributed to the development of deep reinforcement learning by reusing past samples to improve sample efficiency. However, the existing prioritized experience replay mechanism changes the sample distribution in the replay buffer by assigning higher sampling frequency to specific transitions, and it cannot be applied to actor-critic and other on-policy reinforcement learning algorithms. To address this, we propose an adaptive factor based on the TD-error, which further increases sample utilization by giving greater attention weight to samples with larger TD-error, and we embed it flexibly into the original Deep Q-Network (DQN) and Advantage Actor-Critic (A2C) algorithms to improve their performance. We then evaluated the proposed architecture on CartPole-V1 and six Atari game environments. Under both fixed-temperature and annealed-temperature conditions, the results, compared with those produced by the vanilla DQN and the original A2C, highlight the improved algorithms' advantages in cumulative reward and the speed at which it climbs.
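A hedged sketch of this idea: derive an attention weight for each sampled transition from its |TD-error| via a softmax with temperature (the abstract compares fixed and annealed temperatures) and use it to reweight the loss rather than the sampling distribution, which keeps the correction compatible with on-policy algorithms. The exact form of the adaptive factor is an assumption.

```python
# Hedged sketch: TD-error-based attention weights applied to the loss.
import numpy as np

def td_attention_weights(td_errors, temperature=1.0):
    """td_errors: (batch,) array; returns weights with mean 1."""
    scores = np.abs(td_errors) / temperature
    scores -= scores.max()               # numerical stability
    w = np.exp(scores)
    return w / w.sum() * len(td_errors)  # mean weight of 1 keeps the loss scale

def weighted_td_loss(td_errors, temperature=1.0):
    td = np.asarray(td_errors)
    w = td_attention_weights(td, temperature)
    return float(np.mean(w * td**2))     # attention-weighted squared TD-error
```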

