Intelligent train control for cooperative train formation: A deep reinforcement learning approach

Author(s):  
Danyang Zhang ◽  
Junhui Zhao ◽  
Yang Zhang ◽  
Qingmiao Zhang

Addressing the intelligent train control problem in the long-term evolution of metro systems, a new train-to-train communication-based train control system is proposed, in which cooperative train formation technology is introduced to enable a more flexible train operation mode. To overcome the limitations of centralized train control, a pre-exploration-based two-stage deep Q-learning algorithm is adopted for cooperative train formation, one of the first intelligent approaches to urban railway formation control. In addition, a comfort-aware variant of the algorithm is presented, in which optimization measures are taken to provide a superior passenger experience. Simulation results show that the optimized algorithm produces a smoother jerk curve during the train control process, improving passenger comfort. Furthermore, the proposed algorithm effectively accomplishes the train control task in multi-train tracking scenarios and meets the control requirements of the cooperative formation system.
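
The abstract does not reproduce the network architecture or reward design, but the two-stage idea it names (a random pre-exploration phase to seed experience, followed by epsilon-greedy Q-learning) can be sketched in tabular form on a toy stopping task. All states, actions, and rewards below are illustrative assumptions, not the paper's formulation:

```python
import random
from collections import defaultdict

random.seed(0)

# Toy line: positions 0..10; the train must stop exactly at position 10.
# Actions: 0 = hold, 1 = advance. The episode ends at the stop position.
GOAL, ACTIONS = 10, (0, 1)

def step(pos, action):
    nxt = min(GOAL, pos + action)
    done = nxt == GOAL
    # -1 per time step encourages reaching the stop point quickly.
    return nxt, (0.0 if done else -1.0), done

Q = defaultdict(float)          # Q[(state, action)]
replay = []

# Stage 1: pre-exploration -- random actions fill a replay buffer.
for _ in range(200):
    pos = 0
    while True:
        a = random.choice(ACTIONS)
        nxt, r, done = step(pos, a)
        replay.append((pos, a, r, nxt, done))
        if done:
            break
        pos = nxt

def update(s, a, r, s2, done, alpha=0.5, gamma=0.99):
    target = r if done else r + gamma * max(Q[(s2, b)] for b in ACTIONS)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

# Stage 2: learn from the stored experience, then run epsilon-greedy episodes.
for exp in replay:
    update(*exp)
for _ in range(300):
    pos, eps = 0, 0.1
    while True:
        if random.random() < eps:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda b: Q[(pos, b)])
        nxt, r, done = step(pos, a)
        update(pos, a, r, nxt, done)
        if done:
            break
        pos = nxt

# Greedy rollout: the learned policy should advance straight to the stop.
pos, steps = 0, 0
while pos != GOAL and steps < 50:
    pos = step(pos, max(ACTIONS, key=lambda b: Q[(pos, b)]))[0]
    steps += 1
print(steps)  # 10 with the learned greedy policy
```

The pre-exploration stage matters because epsilon-greedy learning from a cold start can fixate on early Q estimates; seeding the buffer with random trajectories gives every state-action pair some coverage before exploitation begins.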

Aerospace ◽  
2021 ◽  
Vol 8 (4) ◽  
pp. 113
Author(s):  
Pedro Andrade ◽  
Catarina Silva ◽  
Bernardete Ribeiro ◽  
Bruno F. Santos

This paper presents a Reinforcement Learning (RL) approach to optimize the long-term scheduling of maintenance for an aircraft fleet. The problem considers fleet status, maintenance capacity, and other maintenance constraints to schedule hangar checks over a specified time horizon. The checks are scheduled within an interval, and the goal is to schedule them as close as possible to their due date. In doing so, the number of checks is reduced and fleet availability increases. A Deep Q-learning algorithm is used to optimize the scheduling policy. The model is validated in a real scenario using maintenance data from 45 aircraft. The maintenance plan generated with our approach is compared with that of a previous study, which presented a Dynamic Programming (DP)-based approach, and with airline estimations for the same period. The results show a reduction in the number of checks scheduled, which indicates the potential of RL in solving this problem. The adaptability of RL is also tested by introducing small disturbances in the initial conditions. After training the model on these simulated scenarios, the results show the robustness of the RL approach and its ability to generate efficient maintenance plans in only a few seconds.
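
The paper's exact reward function is not given in the abstract, but its stated objective (schedule each check inside its interval, as close to the due date as possible, subject to hangar capacity) suggests a reward shaping along the following lines. The interval fields, capacity value, and penalty magnitudes are assumptions for illustration:

```python
from dataclasses import dataclass

@dataclass
class Check:
    due_day: int        # last admissible day for the check
    earliest_day: int   # opening of its scheduling interval

def schedule_reward(check, day, hangar_load, capacity=3):
    """Reward for scheduling `check` on `day` given current hangar usage.

    Scheduling early "wastes" the utilization between the scheduled day
    and the due date, so that slack is penalized; infeasible choices
    (outside the interval, or no free hangar slot) get a large penalty.
    """
    if day < check.earliest_day or day > check.due_day:
        return -100.0                        # outside the allowed interval
    if hangar_load >= capacity:
        return -100.0                        # no free hangar slot that day
    return float(-(check.due_day - day))     # 0 when scheduled on the due date

c = Check(due_day=30, earliest_day=20)
print(schedule_reward(c, 30, hangar_load=1))   # 0.0  -> best case
print(schedule_reward(c, 25, hangar_load=1))   # -5.0 -> five days of slack
print(schedule_reward(c, 25, hangar_load=3))   # -100.0 -> hangar full
```

Under a reward of this shape, a Q-learning agent that maximizes return naturally pushes checks toward their due dates, which is exactly the mechanism the abstract credits for reducing the total number of checks.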


2000 ◽  
Vol 14 (2) ◽  
pp. 243-258 ◽  
Author(s):  
V. S. Borkar

A simulation-based algorithm for learning good policies for a discrete-time stochastic control process with unknown transition law is analyzed when the state and action spaces are compact subsets of Euclidean spaces. This extends the Q-learning scheme of discrete state/action problems along the lines of Baker [4]. Almost sure convergence is proved under suitable conditions.
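
The analyzed scheme works directly over continuous (compact) state and action sets via interpolation; as a much simpler surrogate of that setting, the sketch below runs ordinary Q-learning on grid discretizations of a compact interval, with a tapering step size of the kind typical in stochastic-approximation convergence arguments. The dynamics and cost are invented for illustration:

```python
import random

random.seed(1)

# Surrogate problem: state x in [0, 1], action a in a small grid,
# noisy dynamics x' = clip(x + a + noise), reward -x^2 (favor x near 0).
S = [i / 10 for i in range(11)]          # state grid on [0, 1]
A = [-0.1, 0.0, 0.1]                     # action grid
Q = {(s, a): 0.0 for s in S for a in A}

def nearest(x):
    """Project a continuous state onto the grid."""
    return min(S, key=lambda s: abs(s - x))

x = 0.5
for t in range(5000):
    s = nearest(x)
    a = random.choice(A)                 # pure exploration, off-policy
    x2 = min(1.0, max(0.0, x + a + random.uniform(-0.02, 0.02)))
    s2 = nearest(x2)
    r = -s * s
    alpha = 1.0 / (1 + t // 100)         # tapering step size
    Q[(s, a)] += alpha * (r + 0.9 * max(Q[(s2, b)] for b in A) - Q[(s, a)])
    x = x2

# The greedy action in a high state should push the state toward 0.
best = max(A, key=lambda a: Q[(nearest(0.9), a)])
print(best)  # -0.1
```

The convergence result in the paper concerns a more refined interpolation-based scheme and is proved, not simulated; this sketch only conveys the shape of the learning loop on a compact space.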


2012 ◽  
Vol 3 (2) ◽  
pp. 39-57 ◽  
Author(s):  
Ioan Sorin Comsa ◽  
Mehmet Aydin ◽  
Sijing Zhang ◽  
Pierre Kuonen ◽  
Jean–Frédéric Wagen

An intelligent packet scheduling process is essential for making radio resource usage more efficient in recent high-bit-rate radio access technologies such as Long Term Evolution (LTE). The packet scheduling procedure can use various dispatching rules with different behaviors. In the literature, a single scheduling discipline is applied for the entire transmission session, and scheduler performance strongly depends on the chosen discipline. The method proposed in this paper provides a schedule within each transmission time interval (TTI) sub-frame using a mixture of dispatching disciplines per TTI, instead of a single rule adopted across the whole transmission, with the aim of maximizing system throughput while assuring the best user fairness. This requires a policy for mixing the rules and a refinement procedure that selects the best rule each time. Two scheduling policies for mixing the rules are proposed, with a Q-learning algorithm used to refine them. Simulation results indicate that the proposed methods outperform existing scheduling techniques, maximizing system throughput without harming user fairness.
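
The per-TTI rule-mixing idea can be sketched as a small Q-learning loop in which the state summarizes the current fairness regime and the actions are the candidate dispatching rules. The rule names, the fairness bucketing, and the reward are illustrative assumptions, not the paper's exact formulation:

```python
import random

random.seed(2)

RULES = ("max_throughput", "proportional_fair", "round_robin")

def fairness_bucket(jain_index):
    """Map Jain's fairness index in [0, 1] to a coarse discrete state."""
    return min(2, int(jain_index * 3))   # 0 = unfair .. 2 = fair

Q = {(s, r): 0.0 for s in range(3) for r in RULES}

def pick_rule(state, eps=0.1):
    """Epsilon-greedy selection of the dispatching rule for the next TTI."""
    if random.random() < eps:
        return random.choice(RULES)
    return max(RULES, key=lambda r: Q[(state, r)])

def update(state, rule, reward, next_state, alpha=0.1, gamma=0.9):
    best_next = max(Q[(next_state, r)] for r in RULES)
    Q[(state, rule)] += alpha * (reward + gamma * best_next - Q[(state, rule)])

# One illustrative TTI: in a low-fairness regime, a fairness-oriented rule
# earns a high reward, so the greedy policy learns to select it there.
s = fairness_bucket(0.2)                  # bucket 0: unfair regime
update(s, "round_robin", reward=1.0, next_state=fairness_bucket(0.6))
print(pick_rule(s, eps=0.0))              # "round_robin"
```

Because the rule is re-selected every TTI, the scheduler can apply a throughput-maximizing rule while fairness is healthy and switch to a fairness-restoring rule when it degrades, which is the throughput/fairness trade-off the abstract targets.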


Author(s):  
Chun-Yang Zhang ◽  
Dewang Chen ◽  
Jiateng Yin ◽  
Long Chen

Most existing automatic train operation (ATO) models are based on different train control algorithms and aim to closely track a target velocity curve optimized offline. This kind of model easily leads to problems such as frequent changes in the control outputs, inflexibility of real-time adjustment, reduced riding comfort, and increased energy consumption. A new data-driven train operation (DTO) model is proposed in this paper to control the train by employing expert knowledge learned from experienced drivers, an online optimization approach based on gradient descent, and a heuristic parking method. Rather than directly modeling the target velocity curve, the DTO model uses online and offline operation data to infer the basic control output according to domain expert knowledge. An online adjustment is then performed over the basic output to achieve stability. The proposed train operation model is evaluated on a simulation platform using field data collected on the Yizhuang Line of the Beijing Subway. Compared with curve-tracking approaches, the proposed DTO model achieves significant improvements in energy consumption and riding comfort. Furthermore, the DTO model offers additional advantages, including flexible timetable adjustment and fewer operation mode conversions, which benefit the service life of train operation systems. The DTO model also produces velocity trajectories and operation mode conversions similar to those of experienced drivers, while achieving lower energy consumption and smaller parking error. The robustness of the proposed algorithm is verified through numerical simulations with different system parameters, complicated velocity restrictions, diverse running times, and steep gradients.
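
The online-adjustment idea (a gradient-descent correction applied on top of the expert-derived basic output) can be sketched as follows. The toy longitudinal model, gains, and numbers are all assumptions; the paper's actual objective and dynamics are not reproduced here:

```python
# An expert-derived basic control output u_basic is corrected by a term
# "delta" that gradient descent tunes online to shrink the squared
# velocity-tracking error.

def simulate_velocity(u, v, dt=1.0, mass_factor=0.8):
    """Toy longitudinal model: applied control changes velocity linearly."""
    return v + mass_factor * u * dt

v_now, v_ref = 10.0, 12.0      # current and reference velocity (m/s)
u_basic = 1.5                  # basic output inferred from expert data
delta, lr = 0.0, 0.1

errors = []
for _ in range(30):
    v_next = simulate_velocity(u_basic + delta, v_now)
    err = v_next - v_ref
    errors.append(err * err)
    # d(err^2)/d(delta) = 2 * err * mass_factor * dt = 2 * err * 0.8
    delta -= lr * 2.0 * err * 0.8
print(errors[0] > errors[-1])  # True: the adjustment shrinks the error
```

Keeping the expert output as the baseline and letting gradient descent handle only the residual is what gives the DTO model driver-like trajectories while still correcting tracking error online.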


Author(s):  
Peter T. Katsumata

Rail is used as a form of transportation by millions of people each day. Many of these rail transit systems utilize automatic operation. Automatic operation of rail transit vehicles is provided by an Automatic Train Control (ATC) system, which is typically partitioned into three subsystems: Automatic Train Protection (ATP), Automatic Train Operation (ATO), and Automatic Train Supervision (ATS). This paper discusses the results of a post-incident safety analysis performed on an ATP system. A Fault Tree Analysis (FTA) was performed on a vehicle ATP subsystem following several incidents involving a compromise in system safety. The results of the FTA showed that the vehicle ATP subsystem did not meet the “fail safe” design criteria. This paper uses the results of the FTA to identify possible safety improvements.
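
A Fault Tree Analysis of the kind described combines basic-event failure probabilities upward through AND/OR gates to the top event. The tiny tree below is purely illustrative: the event names and probabilities are invented, and the paper's actual ATP fault tree is not reproduced. Assuming independent basic events:

```python
def and_gate(*probs):
    """AND gate: the output event needs every input event to occur."""
    p = 1.0
    for q in probs:
        p *= q
    return p

def or_gate(*probs):
    """OR gate: the output event occurs unless every input event is absent."""
    surviving = 1.0
    for q in probs:
        surviving *= (1.0 - q)
    return 1.0 - surviving

# Hypothetical top event: unsafe movement authority. It requires the
# speed-check channel to fail AND either of two redundant brake-command
# paths to fail (per-demand probabilities are made up for illustration).
p_speed_check = 1e-3
p_brake_path = or_gate(1e-2, 1e-2)           # 0.0199
p_top = and_gate(p_speed_check, p_brake_path)
print(p_top)                                  # 1.99e-05
```

A "fail safe" finding in an FTA typically means single-point basic events feed the top event through OR gates alone; the AND gate above is what redundancy buys, and its absence is the kind of weakness such an analysis exposes.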


2009 ◽  
Vol 28 (12) ◽  
pp. 3268-3270
Author(s):  
Chao WANG ◽  
Jing GUO ◽  
Zhen-qiang BAO
