CALCULATION OF MANIPULATOR EXPERIMENTAL MODEL FOR STUDYING THE METHODS OF LEARNING (WITH REINFORCEMENT)

2021 · Vol 11 (1) · pp. 147-154
Author(s): Dmitriy Stupnikov, Andrey Tolstyh, Sergey Malyukov, Aleksey Aksenov, Sergey Novikov

Reinforcement learning is a class of machine learning algorithms. These algorithms interact with a model of the environment in which the robotic system is intended to operate, and they make it possible to obtain relatively simple approximations of effective sets of system actions for achieving a given goal. Reinforcement learning allows the model to be trained on server hardware, while the final system uses the already trained neural networks, whose response-computation cost depends directly on their topology. In the presented work, a static strength calculation of a prototype robotic manipulator for bench studies of reinforcement learning systems has been carried out. The choice of design features and materials is substantiated, and the main units and design features are considered. The studies were carried out in the SolidWorks Simulation software. A prototype of a robotic manipulator with a sufficiently high safety margin was obtained. It is concluded that the main stress concentrator is the junction of the eyelet and the platform; however, the maximum stress value was 38.804 kgf/cm², which is insignificant. In this case, the maximum resultant displacement is concentrated in the upper part of the eyelet and shifts depending on the position of the manipulator arm. The maximum recorded displacement is 0.073 mm, which is negligible.
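As a rough illustration of the train-on-server, deploy-the-trained-policy split described above, the following minimal sketch trains a tabular Q-learning agent on a toy one-joint manipulator model. The environment, discretization, and reward shaping are illustrative assumptions, not the authors' bench model:

```python
import numpy as np

class ManipulatorEnv:
    """Toy 1-DOF arm: state = discretized joint angle, goal = target index."""
    def __init__(self, n_states=32, goal=24):
        self.n_states, self.goal = n_states, goal
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):                 # 0: joint -1 step, 1: joint +1 step
        delta = 1 if action == 1 else -1
        self.state = min(max(self.state + delta, 0), self.n_states - 1)
        done = self.state == self.goal
        # Shaped reward: penalty proportional to the distance from the goal.
        reward = 1.0 if done else -abs(self.goal - self.state) / self.n_states
        return self.state, reward, done

def train(env, episodes=500, alpha=0.1, gamma=0.95, eps=0.1):
    q = np.zeros((env.n_states, 2))
    for _ in range(episodes):
        s = env.reset()
        for _ in range(200):                # step cap per episode
            a = np.random.randint(2) if np.random.rand() < eps else int(q[s].argmax())
            s2, r, done = env.step(a)
            q[s, a] += alpha * (r + gamma * q[s2].max() - q[s, a])
            s = s2
            if done:
                break
    return q

q_table = train(ManipulatorEnv())            # "server-side" training
policy = lambda s: int(q_table[s].argmax())  # cheap lookup at deployment time
```

In a deployed system the lookup table would be replaced by a trained network whose forward-pass cost, as the abstract notes, depends on its topology.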

2014 · Vol 587-589 · pp. 2137-2140
Author(s): Xin Li, Feng Chen

Traffic emissions are one of the main sources of urban air pollution, and the traffic control scheme at an intersection strongly influences vehicle emissions. Research on low-emission traffic signal control has therefore become one of the focuses of intelligent transportation. Typical current emission-control methods optimize the average delay and number of stops. However, it is extremely difficult to calculate the delay and the number of stops analytically when an initial queue is present at the intersection. To solve this problem, we propose a traffic emission control algorithm based on reinforcement learning. Simulation experiments were carried out using microscopic traffic simulation software. The experimental results show that the reinforcement learning algorithm is more effective than the Hideki emission control scheme, reducing average vehicle emissions by 12.2% at high intersection saturation.
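To make the idea concrete, here is a minimal sketch of emission-aware signal control with tabular Q-learning, where the reward is the negative of an assumed per-interval emission estimate. The queue dynamics, arrival rates, and emission coefficients are illustrative placeholders, not the paper's calibrated microscopic model:

```python
import random

EMIT_IDLE, EMIT_STOP = 0.5, 2.0       # assumed costs per idling vehicle / per stop

def step(queues, action):
    """One signal interval: `action` picks the green approach (0 = NS, 1 = EW)."""
    ns, ew = queues
    if action == 0:
        ns = max(0, ns - 3)           # green approach discharges up to 3 vehicles
    else:
        ew = max(0, ew - 3)
    arrivals = (random.randint(0, 2), random.randint(0, 2))
    stops = arrivals[1 - action]      # arrivals on the red approach must stop
    ns = min(ns + arrivals[0], 10)    # cap queues to keep the state space finite
    ew = min(ew + arrivals[1], 10)
    reward = -(EMIT_IDLE * (ns + ew) + EMIT_STOP * stops)
    return (ns, ew), reward

def train(episodes=3000, alpha=0.1, gamma=0.9, eps=0.1):
    q = {}
    for _ in range(episodes):
        state = (0, 0)
        for _ in range(60):           # one simulated hour of 1-minute intervals
            q.setdefault(state, [0.0, 0.0])
            a = random.randrange(2) if random.random() < eps \
                else max((0, 1), key=lambda x: q[state][x])
            nxt, r = step(state, a)
            q.setdefault(nxt, [0.0, 0.0])
            q[state][a] += alpha * (r + gamma * max(q[nxt]) - q[state][a])
            state = nxt
    return q

q_table = train()                     # greedy phase choice = argmax over q_table
```

The key design choice mirrored here is that the agent never needs a closed-form delay or stop formula: it learns directly from simulated queue states, which is what makes the approach attractive when initial queues defeat analytical models.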


2020 · Vol 2020 · pp. 1-17
Author(s): Zhuang Wang, Hui Li, Haolin Wu, Zhaoxin Wu

In a one-on-one air combat game, the opponent's maneuver strategy is usually not deterministic, which forces us to consider a variety of opponent strategies when designing our own maneuver strategy. In this paper, an alternate freeze game framework based on deep reinforcement learning is proposed to generate the maneuver strategy in an air combat pursuit. Maneuver strategy agents for aircraft guidance on both sides are designed for a one-on-one air combat scenario at a fixed flight level and velocity. Middleware connecting the agents to the air combat simulation software is developed to provide a reinforcement learning environment for agent training. A reward shaping approach is used, which increases training speed and improves the performance of the generated trajectories. Agents are trained by alternate freeze games with a deep reinforcement learning algorithm to deal with nonstationarity. A league system is adopted to avoid the red queen effect in the game where both sides implement adaptive strategies. Simulation results show that the proposed approach can be applied to maneuver guidance in air combat, and typical angle fight tactics can be learnt by the deep reinforcement learning agents. For training against an opponent with an adaptive strategy, the winning rate can reach more than 50%, and the losing rate can be reduced to less than 15%. In a competition with all opponents, the winning rate of the strategic agent selected by the league system is more than 44%, and the probability of not losing is about 75%.
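The alternate freeze procedure can be shown schematically: one side trains while the other is frozen, roles then swap, and frozen snapshots accumulate into a league from which a final agent is selected. The Policy class and train_against function below are hypothetical stand-ins for the paper's DRL agent and air-combat middleware, kept deliberately abstract:

```python
import copy
import random

class Policy:
    """Stand-in for a DRL maneuver agent; `skill` abstracts its strength."""
    def __init__(self):
        self.skill = 0.0
    def improve(self, opponent):
        # Placeholder for a gradient update against a frozen opponent.
        self.skill += random.uniform(0, 1) / (1 + opponent.skill)

def train_against(learner, frozen, steps=100):
    for _ in range(steps):
        learner.improve(frozen)     # only the learner changes; `frozen` is fixed

red, blue, league = Policy(), Policy(), []
for generation in range(10):
    train_against(red, blue)        # blue frozen, red learns
    league.append(copy.deepcopy(red))
    train_against(blue, red)        # red frozen, blue learns
    league.append(copy.deepcopy(blue))

# League selection: keep the snapshot that does best against all other
# snapshots, guarding against red-queen cycling between the two sides.
best = max(league, key=lambda p: sum(p.skill >= o.skill for o in league))
```

Freezing one side restores a stationary environment for the learner at each stage, which is the framework's answer to the nonstationarity the abstract mentions.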


Symmetry · 2021 · Vol 13 (3) · pp. 471
Author(s): Jai Hoon Park, Kang Hoon Lee

Designing novel robots that can cope with a specific task is a challenging problem because of the enormous design space, which involves both morphological structures and control mechanisms. To this end, we present a computational method for automating the design of modular robots. Our method employs a genetic algorithm to evolve robotic structures as an outer optimization, and it applies a reinforcement learning algorithm to each candidate structure to train its behavior and evaluate its potential learning ability as an inner optimization. Compared with evolving both structure and behavior simultaneously, the size of the design space is reduced significantly by evolving only the robotic structure and performing behavioral optimization with a separate training algorithm. Mutual dependence between evolution and learning is achieved by taking the mean cumulative reward of a candidate structure in reinforcement learning as its fitness in the genetic algorithm. Therefore, our method searches for prospective robotic structures that can potentially lead to near-optimal behaviors if trained sufficiently. We demonstrate the usefulness of our method through several effective design results that were automatically generated in experiments with an actual modular robotics kit.
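A minimal sketch of this nested optimization follows: a genetic algorithm evolves module genomes in the outer loop, and each candidate's fitness is the mean cumulative reward from a short RL training run. The module alphabet, the rl_train stub, and the truncation-selection scheme are illustrative assumptions, not the authors' implementation:

```python
import random

def random_structure(n_modules=6):
    return [random.choice("ABCD") for _ in range(n_modules)]   # module genome

def rl_train(structure, episodes=20):
    """Inner optimization stub: returns mean cumulative reward after training.
    Here the reward is faked from a structural property for illustration."""
    base = structure.count("A") - structure.count("D")
    return sum(base + random.gauss(0, 1) for _ in range(episodes)) / episodes

def mutate(genome, rate=0.2):
    return [random.choice("ABCD") if random.random() < rate else g for g in genome]

def crossover(a, b):
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

population = [random_structure() for _ in range(16)]
for generation in range(30):
    # Fitness of a structure = mean cumulative reward of its trained behavior.
    scored = sorted(population, key=rl_train, reverse=True)
    parents = scored[:4]                       # truncation selection
    population = parents + [
        mutate(crossover(random.choice(parents), random.choice(parents)))
        for _ in range(12)
    ]
best = max(population, key=rl_train)
```

The point of the split is visible in the structure of the loop: the genome encodes morphology only, so the GA search space stays small, while behavioral quality enters solely through the fitness returned by the inner training run.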


2021 · Vol 6 (1)
Author(s): Peter Morales, Rajmonda Sulo Caceres, Tina Eliassi-Rad

Complex networks are often either too large for full exploration, partially accessible, or partially observed. Downstream learning tasks on these incomplete networks can produce low-quality results. In addition, reducing the incompleteness of the network can be costly and nontrivial. As a result, network discovery algorithms optimized for specific downstream learning tasks given resource collection constraints are of great interest. In this paper, we formulate the task-specific network discovery problem as a sequential decision-making problem. Our downstream task is selective harvesting, the optimal collection of vertices with a particular attribute. We propose a framework, called network actor critic (NAC), which learns a policy and a notion of future reward in an offline setting via a deep reinforcement learning algorithm. The NAC paradigm utilizes a task-specific network embedding to reduce the state space complexity. A detailed comparative analysis of popular network embeddings is presented with respect to their role in supporting offline planning. Furthermore, a quantitative study is presented on various synthetic and real benchmarks using NAC and several baselines. We show that offline models of reward and network discovery policies lead to significantly improved performance when compared to competitive online discovery algorithms. Finally, we outline learning regimes where planning is critical in addressing sparse and changing reward signals.
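The sequential decision-making formulation can be sketched as follows: at each step the agent probes one frontier vertex, earns reward 1 if it carries the target attribute, and updates an actor-critic pair with a one-step policy gradient. The toy graph, hand-crafted two-dimensional features, and linear actor/critic below stand in for the task-specific network embedding and deep networks that NAC actually uses:

```python
import math
import random

GRAPH = {0: [1, 2], 1: [0, 3], 2: [0, 4], 3: [1, 5], 4: [2, 5], 5: [3, 4]}
TARGET = {3, 4, 5}                    # vertices carrying the sought attribute

def feature(v, discovered):
    """Toy 2-d feature: bias term + number of already-discovered neighbors."""
    return [1.0, sum(u in discovered for u in GRAPH[v])]

theta = [0.0, 0.0]                    # actor (softmax policy) weights
w = [0.0, 0.0]                        # critic (value baseline) weights
alpha, beta = 0.05, 0.05

for episode in range(500):
    discovered, frontier = {0}, set(GRAPH[0])
    while frontier:
        cand = list(frontier)
        feats = [feature(v, discovered) for v in cand]
        logits = [theta[0] * f[0] + theta[1] * f[1] for f in feats]
        m = max(logits)               # stabilized softmax over the frontier
        exps = [math.exp(l - m) for l in logits]
        total = sum(exps)
        probs = [e / total for e in exps]
        idx = random.choices(range(len(cand)), weights=probs)[0]
        v, f = cand[idx], feats[idx]
        reward = 1.0 if v in TARGET else 0.0
        value = w[0] * f[0] + w[1] * f[1]
        td = reward - value           # one-step advantage estimate
        exp_f = [sum(p * ft[i] for p, ft in zip(probs, feats)) for i in range(2)]
        for i in range(2):
            w[i] += beta * td * f[i]                     # critic update
            theta[i] += alpha * td * (f[i] - exp_f[i])   # policy-gradient update
        discovered.add(v)
        frontier = (frontier | set(GRAPH[v])) - discovered
```

Even in this toy form, the state an agent acts on is a summary (here two features) of the partially observed graph rather than the graph itself, which is the role the paper's task-specific embedding plays in reducing state space complexity.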

