Integrating human experience in deep reinforcement learning for multi-UAV collision detection and avoidance

2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Guanzheng Wang ◽  
Yinbo Xu ◽  
Zhihong Liu ◽  
Xin Xu ◽  
Xiangke Wang ◽  
...  

Purpose This paper aims to realize fully distributed multi-UAV collision detection and avoidance based on deep reinforcement learning (DRL), to address the low sample efficiency of DRL and speed up training, and to improve the applicability and reliability of the DRL-based approach in multi-UAV control problems. Design/methodology/approach A fully distributed collision detection and avoidance approach for multiple UAVs based on DRL is proposed. Human experience is integrated into policy training via a human experience-based adviser. The authors also propose a hybrid control method that combines the learning-based policy with traditional model-based control. Extensive experiments, including simulations, real flights and comparative experiments, are conducted to evaluate the performance of the approach. Findings A fully distributed multi-UAV collision detection and avoidance method based on DRL is realized. The reward curve shows that integrating human experience significantly accelerates training and yields a higher mean episode reward than the pure DRL method. The experimental results show that the DRL method with human experience integration significantly outperforms the pure DRL method for multi-UAV collision detection and avoidance. Moreover, the safer flight brought by the hybrid control method has also been validated. Originality/value The fully distributed architecture is suitable for large-scale unmanned aerial vehicle (UAV) swarms and real applications. The DRL method with human experience integration significantly accelerates training compared with the pure DRL method. The proposed hybrid control strategy compensates for the shortcomings of two-dimensional light detection and ranging and other practical problems in applications.
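
The abstract does not say how the adviser intervenes during training or how the learned policy and the model-based controller are combined. One possible arrangement, shown purely as an illustration in Python, is for the adviser's action to replace the policy's action with a decaying probability during training, and for a model-based fallback to take over at run time when the nearest obstacle is inside a safety distance. The function names, the decay schedule and the 2.0 m threshold are all hypothetical and are not taken from the paper.

```python
import random


def choose_training_action(policy_action, adviser_action, advice_prob, rng=random):
    """Return the adviser's action with probability advice_prob, else the policy's.

    Hypothetical scheme: the abstract does not describe the actual adviser mechanism.
    """
    if rng.random() < advice_prob:
        return adviser_action
    return policy_action


def decayed_advice_prob(initial_prob, decay, episode):
    """Exponentially reduce reliance on the adviser as training progresses."""
    return initial_prob * (decay ** episode)


def hybrid_command(policy_cmd, model_based_cmd, nearest_obstacle_m, safety_distance=2.0):
    """Use the learned policy unless an obstacle is inside the safety distance.

    The 2.0 m default is an assumed value for illustration only.
    """
    if nearest_obstacle_m < safety_distance:
        return model_based_cmd
    return policy_cmd
```

With such a scheme the adviser dominates early episodes and the learned policy gradually takes over, while the distance check keeps a conventional controller in the loop for close encounters.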

2020 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Jafar Tavoosi

Purpose In this paper, an innovative hybrid intelligent position control method for a vertical take-off and landing (VTOL) tiltrotor unmanned aerial vehicle (UAV) is proposed. The more accurately the reference position signals are tracked, the better the proposed control system performs. Design/methodology/approach In the proposed method, the model reference adaptive controller (MRAC) operates in the vertical flight mode and model predictive control (MPC) operates in horizontal flight. Since a linear model is used for both controllers and naturally has an error compared to the real nonlinear model, a neural network is used to compensate for this error. The main novelties of this paper are therefore a new hybrid control design (MRAC & MPC) and a neural network-based compensator for the tiltrotor UAV. Findings The simulation results clearly show the proper performance of the proposed control method. The results also show that the role of the compensator is very important and necessary, especially in extreme wind speed conditions and with uncertain parameters. Originality/value Novel hybrid control method. New method to use a neural network as a compensator in a UAV.


2018 ◽  
Vol 8 (1) ◽  
pp. 18
Author(s):  
Kees Bourgonje ◽  
Hubert J. Veringa ◽  
David M.J. Smeulders ◽  
Jeroen A. van Oijen

To speed up the torrefaction process in traditional torrefaction reactors, in particular auger reactors, the temperature of the reactor is substantially higher than the required torrefaction process temperature. This is due to the low heat conductivity of biomass. Unfortunately, the off-gas characteristics of biomass are very sensitive in the temperature window of 180-300°C which can cause a thermal runaway situation in which the process temperature exceeds the intended level. Due to this very sensitive temperature dependence of biomass pyrolysis and its accompanying gas production, a potential solution is to inject small amounts of air directly into the torrefaction reactor. It is found experimentally that this air injection can regulate the temperature of the biomass very rapidly compared to traditional temperature regulation by changing the reactor wall temperature. With this new torrefaction temperature control method, thermal runaway situations can be avoided and the temperature of the biomass in the reactor can be regulated better. Experiments with large beech wood samples show that the torrefaction reaction rate and the temperature in the core of the sample depend on the amount of injected air. Since the flow of combustible gasses (torr-gas) originating from the torrefaction process is very sensitive to temperature, the heat production by combusting the torr-gas can be controlled to some extent. This will result in both a more homogeneous torrefied product as well as a more stable processing of varying biomass types in large-scale torrefaction systems.


2017 ◽  
Vol 89 (2) ◽  
pp. 193-202 ◽  
Author(s):  
Halit Firat Erdogan ◽  
Ayhan Kural ◽  
Can Ozsoy

Purpose The purpose of this paper is to design a controller for an unmanned aerial vehicle (UAV). Design/methodology/approach In this study, a constrained multivariable multiple-input and multiple-output (MIMO) model predictive controller (MPC) is designed to control all outputs by manipulating the inputs. The aim of the UAV autopilot is to keep the UAV around the trim condition and to track airspeed commands. Findings The purpose of using this control method is to decrease the control effort under certain constraints and to deal with interactions between each output and input while tracking airspeed commands. Originality/value By using a constrained, multivariable (four inputs and seven outputs) MPC, unlike the relevant literature in this field, the UAV tracked airspeed commands with minimum control effort while dealing with interactions between each input and output under disturbances such as wind.


Author(s):  
Xiangyu Liu ◽  
Ping Zhang ◽  
Guanglong Du

Purpose – The purpose of this paper is to provide a hybrid adaptive impedance-leader-follower control algorithm for multi-arm coordination manipulators, which is significant for dealing with the problems of kinematics inconsistency and error accumulation of interactive force in multi-arm system. Design/methodology/approach – This paper utilized a motion mapping theory in Cartesian space to establish a centralized dynamic leader-follower control algorithm which helped to reduce the possibility of kinematics inconsistency for multiple manipulators. A virtual linear spring model (VLSM) was presented based on a recognition approach of characteristic marker. This paper accomplished an adaptive impedance control algorithm based on the VLSM, which took into account the non-rigid contact characteristic. Experimentally demonstrated results showed the proposed algorithm guarantees that the motion and interactive forces asymptotically converge to the prescribed values. Findings – The hybrid control method improves the accuracy and reliability of multi-arm coordination system, which presents a new control framework for multiple manipulators. Practical implications – This algorithm has significant commercial applications, as a means of controlling multi-arm coordination manipulators that could serve to handle large objects and assemble complicated objects in industrial and hazardous environment. Originality/value – This work presented a new control framework for multiple coordination manipulators, which can ensure consistent kinematics and reduce the influence of error accumulation, and thus can improve the accuracy and reliability of multi-arm coordination system.


2020 ◽  
Vol 6 (48) ◽  
pp. eabd6716
Author(s):  
Jianli Zhang ◽  
Junyan Yang ◽  
Yuanxing Zhang ◽  
Michael A. Bevan

We report a feedback control method to remove grain boundaries and produce circular shaped colloidal crystals using morphing energy landscapes and reinforcement learning–based policies. We demonstrate this approach in optical microscopy and computer simulation experiments for colloidal particles in ac electric fields. First, we discover how tunable energy landscape shapes and orientations enhance grain boundary motion and crystal morphology relaxation. Next, reinforcement learning is used to develop an optimized control policy to actuate morphing energy landscapes to produce defect-free crystals orders of magnitude faster than natural relaxation times. Morphing energy landscapes mechanistically enable rapid crystal repair via anisotropic stresses to control defect and shape relaxation without melting. This method is scalable for up to at least N = 10^3 particles with mean process times scaling as N^0.5. Further scalability is possible by controlling parallel local energy landscapes (e.g., periodic landscapes) to generate large-scale global defect-free hierarchical structures.
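
To make the stated scaling concrete: if the mean process time grows as the 0.5 power of the particle number N, a tenfold increase in N lengthens the process by a factor of about 3.16. The short Python sketch below computes that ratio; the function name is hypothetical, and the power law is assumed to hold exactly over the range considered.

```python
def process_time_ratio(n_small, n_large, exponent=0.5):
    """Ratio of mean process times for two system sizes, assuming t ∝ N**exponent."""
    return (n_large / n_small) ** exponent


# Going from 100 to 1000 particles under the assumed square-root scaling.
ratio = process_time_ratio(100, 1000)
print(f"{ratio:.2f}")  # prints 3.16
```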


2020 ◽  
Vol 12 (22) ◽  
pp. 9333
Author(s):  
Sangwook Han

This paper proposes a reinforcement learning-based approach that optimises bus and line control methods to solve the problem of short circuit currents in power systems. Expansion of power grids leads to concentrated power output and more lines for large-scale transmission, thereby increasing short circuit currents. The short circuit currents must be managed systematically by controlling the buses and lines such as separating, merging, and moving a bus, line, or transformer. However, there are countless possible control schemes in an actual grid. Moreover, to ensure compliance with power system reliability standards, no bus should exceed breaker capacity nor should lines or transformers be overloaded. For this reason, examining and selecting a plan requires extensive time and effort. To solve these problems, this paper introduces reinforcement learning to optimise control methods. By providing appropriate rewards for each control action, a policy is set, and the optimal control method is obtained through a maximising value method. In addition, a technique is presented that systematically defines the bus and line separation measures, limits the range of measures to those with actual power grid applicability, and reduces the optimisation time while increasing the convergence probability and enabling use in actual power grid operation. In the future, this technique will contribute significantly to establishing power grid operation plans based on short circuit currents.
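
The abstract does not detail the "maximising value method". As a rough illustration only, the Python sketch below picks the candidate control measure with the highest learned value among those that respect the two reliability limits mentioned above (breaker capacity and line or transformer loading). The dictionary keys, the function name and the example numbers are hypothetical, and this greedy selection is a stand-in rather than the paper's algorithm.

```python
def select_control_action(candidates, breaker_capacity_ka, max_loading_pct=100.0):
    """Pick the highest-value candidate that satisfies both reliability limits.

    Each candidate is a dict with hypothetical keys:
    'name', 'value', 'max_fault_current_ka', 'max_line_loading_pct'.
    Returns None when no candidate is feasible.
    """
    feasible = [
        c for c in candidates
        if c["max_fault_current_ka"] <= breaker_capacity_ka
        and c["max_line_loading_pct"] <= max_loading_pct
    ]
    if not feasible:
        return None
    return max(feasible, key=lambda c: c["value"])


# Made-up example measures; none of these numbers come from the paper.
candidates = [
    {"name": "split_bus_A", "value": 0.9, "max_fault_current_ka": 52.0, "max_line_loading_pct": 80.0},
    {"name": "open_line_3", "value": 0.7, "max_fault_current_ka": 45.0, "max_line_loading_pct": 95.0},
    {"name": "move_xfmr_2", "value": 0.8, "max_fault_current_ka": 48.0, "max_line_loading_pct": 110.0},
]
best = select_control_action(candidates, breaker_capacity_ka=50.0)
```

Filtering before maximising mirrors the idea of limiting the search to measures that are applicable in an actual grid, which also shrinks the set of options that has to be evaluated.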


2020 ◽  
Vol 31 (4) ◽  
pp. 615-635 ◽  
Author(s):  
Raymond P. Fisk ◽  
Linda Alkire (née Nasr) ◽  
Laurel Anderson ◽  
David E. Bowen ◽  
Thorsten Gruber ◽  
...  

Purpose Elevating the human experience (HX) through research collaborations is the purpose of this article. ServCollab facilitates and supports service research collaborations that seek to reduce human suffering and improve human well-being. Design/methodology/approach To catalyze this initiative, the authors introduce ServCollab's three human rights goals (serve, enable and transform), standards of justice for serving humanity (distributive, procedural and interactional justice) and research approaches for serving humanity (service design and community action research). Research implications ServCollab seeks to advance the service research field via large-scale service research projects that pursue theory building, research and action. Service inclusion is the first focus of ServCollab and is illustrated through two projects (transformative refugee services and virtual assistants in social care). This paper seeks to encourage collaboration in more large-scale service research projects that elevate the HX. Practical implications ServCollab seeks to raise the aspirations of service researchers, expand the skills of service research teams and build mutually collaborative service research approaches that transform human lives. Originality/value ServCollab is a unique organization within the burgeoning service research community. By collaborating with service researchers, with service research centers, with universities, with nonprofit agencies and with foundations, ServCollab will build research capacity to address large-scale human service system problems. ServCollab takes a broad perspective for serving humanity by focusing on the HX. Current business research focuses on the interactive roles of customer experience and employee experience. From the perspective of HX, such role labels are insufficient concepts for the full spectrum of human life.


Aerospace ◽  
2021 ◽  
Vol 8 (4) ◽  
pp. 99
Author(s):  
Yixin Huang ◽  
Xiaojia Xiang ◽  
Han Zhou ◽  
Dengqing Tang ◽  
Yihao Sun

To solve the problem of how to efficiently control a large-scale swarm Unmanned Aerial Vehicle (UAV) system that performs complex tasks with limited manpower in a non-ideal environment, this paper proposes a parallel UAV swarm control method. The key technology of parallel control is to establish, on the ground, an artificial UAV system that corresponds one-to-one to the aerial swarm UAVs. This paper focuses on the computational experiments algorithm for establishing the artificial UAV system, including data processing, model identification, model verification and state prediction. Furthermore, this paper performs a comprehensive flight mission with four common modes (climbing, level flight, turning and descending) for verification. The results of the identification experiment show good consistency between the outputs of the refined dynamics model and the real flight data. The prediction experiment results show that the prediction method in this paper can generally keep the state prediction error within 10% for about 16 s.
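
The abstract does not define how the prediction error is measured. Assuming a simple relative error between predicted and measured states, the Python sketch below finds how long a prediction stays inside a given tolerance. The function name and the sample data are made up for illustration and are not results from the paper.

```python
def time_within_error(times_s, predicted, measured, tolerance=0.10):
    """Return the last time stamp before the relative error first exceeds tolerance.

    Returns None if the very first sample already violates the tolerance.
    Assumes the measured values are non-zero.
    """
    last_ok = None
    for t, p, m in zip(times_s, predicted, measured):
        rel_err = abs(p - m) / abs(m)
        if rel_err > tolerance:
            break
        last_ok = t
    return last_ok


# Invented samples: the prediction drifts away from a constant measured value.
times_s = [0, 4, 8, 12, 16, 20]
measured = [100.0] * 6
predicted = [100.0, 102.0, 105.0, 108.0, 109.0, 115.0]
horizon = time_within_error(times_s, predicted, measured)
```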


Author(s):  
Pengcheng Wang ◽  
Dengfeng Zhang ◽  
Baochun Lu

Purpose This paper aims to address the collision problem between a robot and the external environment (including humans) in an unstructured situation. A new collision detection and torque optimization control method is proposed. Design/methodology/approach Firstly, when a collision occurs, a second-order Taylor observer is proposed to estimate the residual value. Secondly, a band-pass filter is used to reduce the high-frequency torque modeling dynamic uncertainty. With the estimate information and the torque value, a variable impedance control approach is then synthesized to guarantee that the collision is avoided or that the collision will be terminated with different contact models and positions. However, in terms of adaptive linear force error, the variation of the thickness of the boundary layer is controlled by the new proximity function. Findings Finally, the experimental results show the better performance of the proposed control method, realizing force control during the collision process. Originality/value Original approach and original experiment.

