temporal difference
Recently Published Documents


Total documents: 488 (five years: 110)

H-index: 32 (five years: 5)

2022 ◽  
Author(s):  
Zhen Zhang ◽  
Shiqing Zhang ◽  
Xiaoming Zhao ◽  
Linjian Chen ◽  
Jun Yao

The acceleration of industrialization and urbanization has recently brought about serious air pollution problems, which threaten human health and lives, environmental safety, and sustainable social development. Air quality prediction is an effective approach for providing early warning of air pollution and supporting cleaner industrial production. However, existing approaches have a weak ability to capture long-term dependencies and complex relationships in time series PM2.5 data. To address this problem, this paper proposes a new deep learning model called temporal difference-based graph transformer networks (TDGTN) to learn long-term temporal dependencies and complex relationships from time series PM2.5 data for air quality prediction. The proposed TDGTN comprises encoder and decoder layers associated with the developed graph attention mechanism. In particular, considering the similarity of different time moments and the importance of the temporal difference between two adjacent moments for air quality prediction, we first construct graph-structured data from the original time series PM2.5 data at different moments, which lack an explicit graph structure. Then, based on the constructed graph, we improve the self-attention mechanism with the temporal difference information and develop a new graph attention mechanism. Finally, the developed graph attention mechanism is embedded into the encoder and decoder layers of the proposed TDGTN to learn long-term temporal dependencies and complex relationships from a graph perspective on air quality PM2.5 prediction tasks. To verify the effectiveness of the proposed method, we conduct air quality prediction experiments on two real-world datasets in China, namely the Beijing PM2.5 dataset ranging from 01/01/2010 to 12/31/2014 and the Taizhou PM2.5 dataset ranging from 01/01/2017 to 12/31/2019. Compared with other air quality forecasting methods, such as autoregressive moving average (ARMA), support vector regression (SVR), convolutional neural network (CNN), long short-term memory (LSTM), and the original Transformer, our experimental results indicate that the proposed method achieves more accurate results on both short-term (1 hour) and long-term (6, 12, 24, 48 hours) air quality prediction tasks.


2021 ◽  
Vol 15 ◽  
Author(s):  
Arthur Prével ◽  
Ruth M. Krebs

In a new environment, humans and animals can detect and learn that cues predict meaningful outcomes, and use this information to adapt their responses. This process is termed Pavlovian conditioning. Pavlovian conditioning is also observed for stimuli that predict outcome-associated cues; this second type of conditioning is termed higher-order Pavlovian conditioning. In this review, we will focus on higher-order conditioning studies with simultaneous and backward conditioned stimuli. We will examine how the results from these experiments pose a challenge to models of Pavlovian conditioning such as the Temporal Difference (TD) models, in which learning is mainly driven by reward prediction errors. In contrast with this view, the results suggest that humans and animals can form complex representations of the (temporal) structure of the task and use this information to guide behavior, which seems consistent with model-based reinforcement learning. Future investigations involving these procedures could yield important new insights into the mechanisms that underlie Pavlovian conditioning.


2021 ◽  
Vol 580 ◽  
pp. 311-330
Author(s):  
Jiaqing Cao ◽  
Quan Liu ◽  
Fei Zhu ◽  
Qiming Fu ◽  
Shan Zhong

Author(s):  
Han-Chun Huang ◽  
Tsung-Yu Lee ◽  
Cheng-Han Tsai ◽  
Yao-Sing Su ◽  
Yi-Rong Chen ◽  
...  

The influence of circadian patterns on the incidence of out-of-hospital cardiac arrest (OHCA) has been demonstrated. However, the effect of temporal difference on the clinical outcomes of OHCA remains inconclusive. Therefore, we conducted a retrospective study in an urban city in Taiwan between January 2018 and December 2020 to investigate the relationship between temporal differences and the return of spontaneous circulation (ROSC), sustained (≥24 h) ROSC, and survival to discharge in patients with OHCA. Of the 842 patients with OHCA, 371 occurred in the daytime, 250 in the evening, and 221 at night. During nighttime, there was a decreased incidence of OHCA, but the outcomes of OHCA were significantly poorer than those of incidents during the daytime and evening. After multivariate adjustment for influencing factors, OHCAs occurring at night were independently associated with lower probabilities of achieving sustained ROSC (aOR = 0.489, 95%CI: 0.285–0.840, p = 0.009) and survival to discharge (aOR = 0.147, 95%CI: 0.03–0.714, p = 0.017). Subgroup analyses revealed significant temporal differences in male patients, older adult patients, those with longer response times (≥5 min), and witnessed OHCA. The effects of temporal difference on the outcome of OHCA may be a result of physiological factors, underlying etiology of arrest, resuscitative efforts in prehospital and in-hospital stages, or a combination of factors.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Harry J. Stewardson ◽  
Thomas D. Sambrook

Reinforcement learning in humans and other animals is driven by reward prediction errors: deviations between the amount of reward or punishment initially expected and that which is obtained. Temporal difference methods of reinforcement learning generate this reward prediction error at the earliest time at which a revision in reward or punishment likelihood is signalled, for example by a conditioned stimulus. Midbrain dopamine neurons, believed to compute reward prediction errors, generate this signal in response to both conditioned and unconditioned stimuli, as predicted by temporal difference learning. Electroencephalographic recordings of human participants have suggested that a component named the feedback-related negativity (FRN) is generated when this signal is carried to the cortex. If this is so, the FRN should be expected to respond equivalently to conditioned and unconditioned stimuli. However, very few studies have attempted to measure the FRN’s response to unconditioned stimuli. The present study attempted to elicit the FRN in response to a primary aversive stimulus (electric shock) using a design that varied reward prediction error while holding physical intensity constant. The FRN was strongly elicited, but earlier and more transiently than typically seen, suggesting that it may incorporate processes other than the midbrain dopamine system.
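The claim that temporal difference methods move the prediction error to the earliest predictive event can be illustrated with a minimal TD(0) sketch. This is not the study's model: the function name `run_td_trials`, the five-step trial, the learning rate, and the fixed zero value of the pre-stimulus baseline are all assumptions made for illustration.

```python
def run_td_trials(n_trials=500, n_steps=5, reward=1.0, alpha=0.1, gamma=1.0):
    """TD(0) on a simplified conditioning trial (hypothetical setup).

    State 0 is conditioned-stimulus (CS) onset; the outcome (US) is
    delivered on leaving the last state. The pre-CS baseline is assumed
    to keep a fixed value of 0 because the CS arrives unpredictably.
    Returns the (error at CS, error at US) pair for the first and the
    last trial.
    """
    values = [0.0] * n_steps
    history = []
    for _ in range(n_trials):
        # Prediction error on CS onset: baseline (value 0) -> state 0.
        delta_cs = gamma * values[0] - 0.0
        delta_us = 0.0
        for s in range(n_steps):
            is_last = s == n_steps - 1
            r = reward if is_last else 0.0
            next_value = 0.0 if is_last else values[s + 1]
            # TD error: outcome plus discounted next prediction,
            # minus the current prediction.
            delta = r + gamma * next_value - values[s]
            values[s] += alpha * delta
            if is_last:
                delta_us = delta
        history.append((delta_cs, delta_us))
    return history[0], history[-1]


first, last = run_td_trials()
print("first trial (CS error, US error):", first)
print("last trial  (CS error, US error):", last)
```

Under these assumptions the error sits at the outcome on the first trial and has shifted to stimulus onset by the last one, which is the qualitative pattern the abstract attributes to temporal difference learning.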


2021 ◽  
Author(s):  
Shreevanth Krishnaa Gopalakrishnan ◽  
Saba Al-Rubaye ◽  
Gokhan Inalhan

2021 ◽  
Author(s):  
Emily A. Williams ◽  
Ruth Ogden ◽  
Andrew James Stewart ◽  
Luke Anthony Jones

Trains of auditory clicks increase subsequent judgements of stimulus duration by approximately 10%. Scalar timing theory suggests this is due to a 10% increase in pacemaker rate, a main component of the internal clock. The effect has been demonstrated in many timing tasks, including verbal estimation, temporal generalisation, and temporal bisection. However, the effect of click trains has yet to be examined on temporal sensitivity, commonly measured by temporal difference thresholds. We sought to investigate this both experimentally, where we found no significant increase in temporal sensitivity, and computationally, by modelling the temporal difference threshold task according to scalar timing theory. Our experimental null result presented three possibilities, which we investigated by simulating a 10% increase in pacemaker rate in a newly created scalar timing theory model of thresholds. We found that a 10% increase in pacemaker rate led to a significant improvement in temporal sensitivity in only 8.66% of 10,000 simulations. When a 74% increase in pacemaker rate was modelled to simulate the filled-duration illusion, temporal sensitivity was significantly improved in 55.36% of simulations. Therefore, scalar timing theory does predict improved temporal sensitivity for a faster pacemaker, but the effect of click trains (a supposed 10% increase) appears to be too small to be reliably found in the temporal difference threshold task.
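The link between pacemaker rate and judged duration can be sketched with a toy pacemaker-accumulator simulation. This is not the authors' threshold model, whose details are not given here: the Poisson pacemaker, the 100 Hz baseline rate, and the helper names `count_pulses` and `mean_estimate` are assumptions chosen only to show why a 10% faster pacemaker yields roughly 10% longer estimates.

```python
import random


def count_pulses(rate_hz, duration_s, rng):
    """Count pulses emitted by a Poisson pacemaker during one interval."""
    elapsed = rng.expovariate(rate_hz)  # waiting time to the first pulse
    pulses = 0
    while elapsed <= duration_s:
        pulses += 1
        elapsed += rng.expovariate(rate_hz)
    return pulses


def mean_estimate(rate_hz, duration_s, baseline_rate_hz, n_trials, rng):
    """Mean judged duration when counts are read out at the baseline rate."""
    total = sum(count_pulses(rate_hz, duration_s, rng) for _ in range(n_trials))
    return total / n_trials / baseline_rate_hz


rng = random.Random(0)
baseline = mean_estimate(100.0, 1.0, 100.0, 2000, rng)
sped_up = mean_estimate(110.0, 1.0, 100.0, 2000, rng)  # 10% faster pacemaker
print("ratio of judged durations:", sped_up / baseline)
```

The sketch covers only the duration-judgement effect. It does not model memory or decision noise, so it says nothing about the threshold results reported in the abstract.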


2021 ◽  
Vol 11 (18) ◽  
pp. 8368
Author(s):  
Saeed Harati ◽  
Liliana Perez ◽  
Roberto Molowny-Horas

One of the complexities of social systems is the emergence of behavior norms that are costly for individuals. Study of such complexities is of interest in diverse fields ranging from marketing to sustainability. In this study we built a conceptual Agent-Based Model to simulate interactions between a group of agents and a governing agent, where the governing agent encourages other agents to perform, in exchange for recognition, an action that is beneficial for the governing agent but costly for the individual agents. We equipped the governing agent with six Temporal Difference Reinforcement Learning algorithms to find sequences of decisions that successfully encourage the group of agents to perform the desired action. Our results show that if the individual agents’ perceived cost of the action is low, then the desired action can become a trend in the society without the use of learning algorithms by the governing agent. If the perceived cost to individual agents is high, then the desired output may become rare in the space of all possible outcomes but can be found by appropriate algorithms. We found that Double Learning algorithms perform better than other algorithms we used. Through comparison with a baseline, we showed that our algorithms made a substantial difference in the rewards that can be obtained in the simulations.
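The abstract does not say which six algorithms were used. As one possible reading of "Double Learning", the sketch below shows a single tabular Double Q-learning update. The function name `double_q_update`, the dictionary-based tables, and the parameter values are assumptions for illustration, not the study's implementation.

```python
import random


def double_q_update(q_a, q_b, state, action, reward, next_state, actions,
                    alpha=0.1, gamma=0.9, rng=random):
    """Apply one tabular Double Q-learning update in place.

    q_a and q_b map (state, action) pairs to value estimates; missing
    entries count as 0. Terminal states are not handled in this sketch.
    """
    # Randomly pick the table to update; the other one evaluates.
    if rng.random() < 0.5:
        update, evaluate = q_a, q_b
    else:
        update, evaluate = q_b, q_a
    # Choose the greedy next action with the table being updated...
    best_next = max(actions, key=lambda a: update.get((next_state, a), 0.0))
    # ...but score it with the other table to reduce overestimation bias.
    target = reward + gamma * evaluate.get((next_state, best_next), 0.0)
    old = update.get((state, action), 0.0)
    update[(state, action)] = old + alpha * (target - old)


q_a, q_b = {}, {}
double_q_update(q_a, q_b, "s0", "encourage", 1.0, "s1", ["encourage", "wait"])
print(q_a, q_b)
```

Separating action selection from action evaluation is what distinguishes this update from ordinary Q-learning, where a single table does both.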

