Multiagent Learning-Based Approach to Transit Assignment Problem

Author(s):  
Mohammed Wahba ◽  
Amer Shalaby

This paper presents an operational prototype of an innovative framework for the transit assignment problem, structured as a multiagent system and built on a learning-based approach. The framework represents passengers, along with their learning and decision-making activities, explicitly. The underlying hypothesis is that individual passengers adjust their behavior (i.e., trip choices) according to their experience with transit system performance. A hypothetical transit network of 22 routes and 194 stops was developed within a microsimulation platform (Paramics), and a population of 3,000 passengers was synthesized to model the transit assignment process in the morning peak period. Using reinforcement learning to represent passengers’ adaptation, and accounting for differences in passengers’ preferences and the dynamics of the transit network, the prototype demonstrates that the proposed approach can simultaneously predict passengers’ route choices, estimate total passenger travel cost in a congested network, and estimate the loads on individual transit routes.
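To make the learning mechanism concrete, the sketch below shows one plausible form of such a day-to-day adjustment: an epsilon-greedy Q-learning update applied per passenger agent. The class, parameter values, and cost signal are illustrative assumptions, not the paper’s implementation.

```python
# Illustrative sketch (not the paper's code): a day-to-day Q-learning update
# in which each passenger agent adjusts route choices from experienced cost.
import random
from collections import defaultdict

ALPHA = 0.1    # learning rate (assumed value)
EPSILON = 0.2  # exploration rate (assumed value)

class PassengerAgent:
    def __init__(self, routes):
        self.routes = routes         # candidate transit routes
        self.q = defaultdict(float)  # learned (negative) cost per route

    def choose_route(self):
        # epsilon-greedy: mostly pick the route with the best remembered payoff
        if random.random() < EPSILON:
            return random.choice(self.routes)
        return max(self.routes, key=lambda r: self.q[r])

    def update(self, route, experienced_cost):
        # move the estimate toward the newly experienced generalized cost
        reward = -experienced_cost
        self.q[route] += ALPHA * (reward - self.q[route])
```

Iterating choose_route and update over simulated days would let a population of such agents settle toward an experience-based equilibrium of the kind the prototype predicts.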

2016 ◽  
Vol 113 (24) ◽  
pp. 6797-6802 ◽  
Author(s):  
Samuel D. McDougle ◽  
Matthew J. Boggess ◽  
Matthew J. Crossley ◽  
Darius Parvin ◽  
Richard B. Ivry ◽  
...  

When a person fails to obtain an expected reward from an object in the environment, they face a credit assignment problem: Did the absence of reward reflect an extrinsic property of the environment or an intrinsic error in motor execution? To explore this problem, we modified a popular decision-making task used in studies of reinforcement learning, the two-armed bandit task. We compared a version in which choices were indicated by key presses, the standard response in such tasks, to a version in which the choices were indicated by reaching movements, which affords execution failures. In the key press condition, participants exhibited a strong risk aversion bias; strikingly, this bias reversed in the reaching condition. This result can be explained by a reinforcement model wherein movement errors influence decision-making, either by gating reward prediction errors or by modifying an implicit representation of motor competence. Two further experiments support the gating hypothesis. First, we used a condition in which we provided visual cues indicative of movement errors but informed the participants that trial outcomes were independent of their actual movements. The main result was replicated, indicating that the gating process is independent of participants’ explicit sense of control. Second, individuals with cerebellar degeneration failed to modulate their behavior between the key press and reach conditions, providing converging evidence of an implicit influence of movement error signals on reinforcement learning. These results provide a mechanistically tractable solution to the credit assignment problem.
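As a rough illustration of the gating hypothesis, the following sketch discounts the reward prediction error when the outcome is attributed to a motor execution failure. The function name and parameter values are assumed for illustration, not taken from the paper.

```python
# Hedged sketch of the gating hypothesis: when feedback signals a motor
# execution error, the reward prediction error is discounted (gated) before
# updating the value of the chosen option. Parameter values are illustrative.
def update_value(value, reward, execution_error, alpha=0.1, gate=0.2):
    """Return the updated value of the chosen option.

    value           -- current estimate for the chosen option
    reward          -- outcome on this trial (e.g., 0 or 1)
    execution_error -- True if the miss was attributed to motor execution
    alpha           -- learning rate (assumed)
    gate            -- residual weight given to gated errors (assumed)
    """
    rpe = reward - value
    if execution_error:
        rpe *= gate  # execution failures largely bypass value learning
    return value + alpha * rpe
```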


2013 ◽  
Vol 368-370 ◽  
pp. 1876-1880 ◽  
Author(s):  
Ying Zeng ◽  
Jun Li ◽  
Hui Zhu

Few studies have adequately addressed passenger route choice behavior under congestion or provided useful guidance on passenger route choice and, by extension, the transit assignment model; this gap motivates the present paper. Taking congestion into account, travel cost is assessed and ways to reduce it are identified. Finally, an actual transit network in Chengdu is used as a case study to demonstrate the benefits of the proposed model. The results indicate that vehicle capacity is an important factor that cannot be ignored and that a better understanding of passenger route choice behavior could significantly benefit the public transit system.
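The abstract does not spell out the congestion cost function, but a common way to capture the capacity effect it highlights is a BPR-style crowding multiplier on in-vehicle time. The sketch below is a generic illustration with conventional coefficients, not the paper’s model.

```python
# Illustrative only: one common way to fold vehicle capacity into transit
# travel cost is a BPR-style crowding multiplier on in-vehicle time.
# The coefficients below are conventional defaults, not the paper's values.
def congested_in_vehicle_cost(base_time, passenger_load, capacity,
                              a=0.15, b=4.0):
    """In-vehicle cost grows with the load-to-capacity ratio."""
    return base_time * (1.0 + a * (passenger_load / capacity) ** b)
```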


2013 ◽  
Vol 2013 ◽  
pp. 1-10 ◽  
Author(s):  
Zhengfeng Huang ◽  
Gang Ren ◽  
Haixu Liu

Various factors make bus passenger demand uncertain to predict. In this study, a bilevel programming model for optimizing bus frequencies under uncertain passenger demand is formulated. The upper-level objective comprises two terms: transit network cost, consisting of the passengers’ expected travel time and operating costs, and transit network robustness, indicated by the variance in passenger travel time. The second term reflects the decision maker’s risk aversion and drives the optimized frequencies toward serving even highly uncertain demand. With the transit links’ proportional flow statistics (mean and covariance) obtained from the lower-level model, the upper-level objective is formulated analytically. In the lower-level model, these two statistics are calculated by analyzing the propagation of mean transit trips and their variation through the optimal-strategy transit assignment process. The genetic algorithm (GA) used to solve the model is tested on an example network. Finally, the model is applied to determining optimal bus frequencies in the city of Liupanshui, China, where it reduces the total cost of the transit system by about 6%.
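A minimal sketch of how the two-term upper-level objective could be evaluated inside a GA loop is shown below; `assign`, the risk weight, and the cost coefficient are illustrative placeholders for the paper’s analytical formulation.

```python
# Sketch of the upper-level objective described above (illustrative weights):
# expected passenger travel time plus operating cost, penalized by the
# variance of travel time to reflect the decision maker's risk aversion.
def upper_level_objective(frequencies, assign, risk_weight=0.5,
                          cost_per_run=1.0):
    """Evaluate one candidate vector of route frequencies.

    assign(frequencies) is a stand-in for the lower-level transit
    assignment, returning the mean and variance of passenger travel time.
    """
    mean_time, var_time = assign(frequencies)
    operating_cost = cost_per_run * sum(frequencies)
    return mean_time + operating_cost + risk_weight * var_time
```

A GA would then minimize this score over candidate frequency vectors, calling the lower-level assignment once per evaluation.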


Author(s):  
Sylvan Hoover ◽  
J. David Porter ◽  
Claudio Fuentes

Transit agencies have experienced dramatic changes in service and ridership because of the COVID-19 pandemic. As communities transition to a new normal, strategic measures are needed to support continuing disease suppression efforts. This research provides actionable results to transit agencies in the form of improved transit routes. A multi-objective heuristic optimization framework employing the non-dominated sorting genetic algorithm II (NSGA-II) generates multiple route solutions that allow transit agencies to balance the utility of service to riders against the susceptibility of routes to enabling the spread of disease in a community. This research uses origin–destination data from a sample population to assess the utility of routes to potential riders, allows vehicle capacity constraints to be varied to support social distancing efforts, and evaluates the transit encounter network produced from simulated use of transit as a proxy for the susceptibility of a transit system to facilitating disease transmission among its riders. A case study of transit at Oregon State University is presented, with multiple transit network solutions evaluated and the resulting encounter networks investigated. The improved network solution whose ridership is closest to the baseline (1.2% more riders) yields a 10.7% reduction in encounter network edges.
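The sketch below illustrates the two competing objectives such an NSGA-II run would trade off, together with the standard Pareto dominance test; `route_set`, `od_demand`, and `simulate_encounters` are hypothetical stand-ins, not the authors’ code.

```python
# Hedged sketch of the two objectives: rider utility (to maximize) versus
# encounter-network edges (to minimize). evaluate() would feed NSGA-II.
def evaluate(route_set, od_demand, simulate_encounters):
    # utility: trips from the origin-destination sample that the routes serve
    utility = sum(d.trips for d in od_demand if route_set.serves(d))
    encounter_edges = simulate_encounters(route_set)  # proxy for disease risk
    # NSGA-II conventionally minimizes, so negate the utility term
    return (-utility, encounter_edges)

def dominates(a, b):
    """Pareto dominance for minimization objectives."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))
```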


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Batel Yifrah ◽  
Ayelet Ramaty ◽  
Genela Morris ◽  
Avi Mendelsohn

Decision making can be shaped both by trial-and-error experience and by memory of unique contextual information, and both types of information can be acquired either through active experience or by observing others behave in similar situations. How the reinforcement learning parameters that inform decision updating interact with memory formation for declarative information in experienced and observational learning settings is, however, unknown. In the current study, participants took part in a probabilistic decision-making task involving situations that either yielded outcomes similar to those of an observed player or opposed them. By fitting alternative reinforcement learning models to each subject, we distinguished participants who learned similarly from experience and observation from those who assigned different weights to learning signals from the two sources. Participants who weighted their own experience differently from that of others displayed enhanced memory performance, as well as greater subjective memory strength, for episodes involving significant reward prospects. Conversely, the memory performance of participants who did not prioritize their own experience over others’ did not appear to be influenced by reinforcement learning parameters. These findings demonstrate that interactions between implicit and explicit learning systems depend on how individuals weigh relevant information conveyed via experience and observation.
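One simple way to express the model-comparison idea is a Q-learning rule with separate learning rates for experienced and observed outcomes, as sketched below; the function and parameter values are illustrative, not the fitted models themselves.

```python
# Illustrative model variant: a Q-learning rule with separate learning rates
# for one's own outcomes and for observed outcomes. Values are assumed.
def q_update(q, reward, own_experience, alpha_self=0.3, alpha_observed=0.1):
    """Update the value of the chosen option from one trial's outcome."""
    alpha = alpha_self if own_experience else alpha_observed
    return q + alpha * (reward - q)
```

Fitting alpha_self and alpha_observed per subject, and comparing against a single-rate model, is the kind of procedure that would separate the two groups described above.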


Author(s):  
Ming-Sheng Ying ◽  
Yuan Feng ◽  
Sheng-Gang Ying

The Markov decision process (MDP) offers a general framework for modelling sequential decision making where outcomes are random; in particular, it serves as a mathematical framework for reinforcement learning. This paper introduces an extension of the MDP, the quantum MDP (qMDP), that can serve as a mathematical model of decision making about quantum systems. We develop dynamic programming algorithms for policy evaluation and for finding optimal policies for qMDPs in the finite-horizon case. The results obtained in this paper provide useful mathematical tools for reinforcement learning techniques applied to the quantum world.
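For orientation, the classical construction that qMDP generalizes is finite-horizon dynamic programming by backward induction; a standard sketch for an ordinary MDP follows. This is the textbook baseline, not the paper’s quantum algorithm.

```python
# Classical baseline sketch: finite-horizon dynamic programming (backward
# induction) for an ordinary MDP, the construction that qMDP generalizes.
import numpy as np

def finite_horizon_dp(P, R, horizon):
    """P[a] is an |S|x|S| transition matrix, R an |S|x|A| reward array.

    Returns the optimal values and a per-stage greedy policy.
    """
    n_states, n_actions = R.shape
    v = np.zeros(n_states)
    policy = []
    for _ in range(horizon):  # sweep backward over stages
        q = np.stack([R[:, a] + P[a] @ v for a in range(n_actions)], axis=1)
        policy.append(q.argmax(axis=1))
        v = q.max(axis=1)
    policy.reverse()          # reorder so stage 0 comes first
    return v, policy
```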


2021 ◽  
Vol 35 (2) ◽  
Author(s):  
Nicolas Bougie ◽  
Ryutaro Ichise

Deep reinforcement learning methods have achieved significant successes in complex decision-making problems. However, they traditionally rely on well-designed extrinsic rewards, which limits their applicability to the many real-world tasks where rewards are naturally sparse. While cloning behaviors provided by an expert is a promising approach to the exploration problem, learning from a fixed set of demonstrations may be impractical due to lack of state coverage or to distribution mismatch, when the learner’s goal deviates from the demonstrated behaviors. Moreover, we are interested in learning how to reach a wide range of goals from the same set of demonstrations. In this work we propose a novel goal-conditioned method that leverages very small sets of goal-driven demonstrations to massively accelerate learning. Crucially, we introduce the concept of active goal-driven demonstrations, querying the demonstrator only in hard-to-learn and uncertain regions of the state space. We further present a strategy that prioritizes sampling goals where the disagreement between the expert and the policy is maximized. We evaluate our method on a variety of benchmark environments from the MuJoCo domain. Experimental results show that our method outperforms prior imitation learning approaches on most tasks in terms of exploration efficiency and average score.
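The prioritization strategy can be pictured as sampling goals in proportion to expert-policy disagreement; the sketch below is one plausible reading, with `expert_action`, `policy_action`, and the softmax temperature as assumptions rather than the authors’ implementation.

```python
# Sketch of the prioritization idea: sample goals where the expert and the
# current policy disagree most. expert_action and policy_action are assumed
# callables returning action vectors; the temperature is illustrative.
import numpy as np

def sample_goal(goals, expert_action, policy_action, temperature=1.0):
    """Pick a goal with probability increasing in expert-policy disagreement."""
    disagreement = np.array([
        np.linalg.norm(expert_action(g) - policy_action(g)) for g in goals
    ])
    logits = disagreement / temperature
    probs = np.exp(logits - logits.max())  # numerically stable softmax
    probs /= probs.sum()
    return goals[np.random.choice(len(goals), p=probs)]
```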

