Non-Markovian Reinforcement-Based on Self-Optimizing Memory Controller

Author(s):  
Hassab Elgawi Osman

This paper contributes to the design of a robotic self-optimizing memory controller for non-Markovian reinforcement tasks. Rather than searching the whole memory contents holistically, the model adopts associated feature analysis to successively memorize each new event's state-action pair as an action of past experience. Actor-Critic learning is used to adaptively tune the control parameters, while an on-line variant of the random forests (RF) learner serves as a memory-capable approximator of the Actor's policy and the Critic's value function. The learning capability of the proposed model is examined experimentally on a non-Markovian cart-pole balancing task. The results show that the self-optimizing memory controller acquired complex behaviors, such as balancing two poles simultaneously, and displayed long-term planning and generalization capacity based on past experiences.
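The Actor-Critic update mentioned in the abstract can be illustrated with a minimal sketch. It assumes a tabular setting with a softmax policy over action preferences, which is a stand-in for the paper's on-line random forest approximators and not the authors' method; the function names, step sizes, and state/action indexing are all hypothetical.

```python
import math


def softmax(prefs):
    """Turn action preferences into probabilities (numerically stable)."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def actor_critic_step(prefs, values, s, a, r, s_next, done,
                      alpha=0.1, beta=0.1, gamma=0.99):
    """Apply one one-step actor-critic update in place; return the TD error.

    prefs[s][a] holds the Actor's preference for action a in state s,
    values[s] holds the Critic's estimate of the state value.
    """
    # Critic: the TD error measures how much better or worse the outcome
    # was than the current value estimate predicted.
    target = r + (0.0 if done else gamma * values[s_next])
    delta = target - values[s]
    values[s] += alpha * delta

    # Actor: move preferences along the gradient of log pi(a|s),
    # scaled by the TD error.
    probs = softmax(prefs[s])
    for b in range(len(prefs[s])):
        grad = (1.0 if b == a else 0.0) - probs[b]
        prefs[s][b] += beta * delta * grad
    return delta
```

A positive TD error raises both the value of the visited state and the preference for the action taken there; a negative one lowers them.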

2013 ◽  
Vol 2013 ◽  
pp. 1-20 ◽  
Author(s):  
F. Gideon ◽  
Mark A. Petersen ◽  
Janine Mukuddem-Petersen ◽  
LNP Hlatshwayo

We validate the new Basel liquidity standards, as encapsulated by the net stable funding ratio, in a quantitative manner. In this regard, we consider the dynamics of the inverse net stable funding ratio as a measure to quantify a bank's prospects for stable funding over a period of a year. In essence, this justifies how Basel III liquidity standards can be effectively implemented to mitigate liquidity problems. We also discuss various classes of available stable funding and required stable funding. Furthermore, we discuss an optimal control problem for a continuous-time inverse net stable funding ratio. In particular, we make optimal choices for the inverse net stable funding targets in order to formulate its cost. This is normally done by obtaining an analytic solution of the value function. Finally, we provide a numerical example for the dynamics of the inverse net stable funding ratio to identify trends in which bank behavior conveys forward-looking information on long-term market liquidity developments.
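As a rough illustration of the quantity discussed above, the sketch below computes an inverse net stable funding ratio as required stable funding divided by available stable funding, each taken as a factor-weighted sum of balance-sheet items. It assumes the inverse ratio is simply the reciprocal of the Basel III ratio (available over required); the amounts and weighting factors are hypothetical placeholders, not figures from the paper or from the Basel text.

```python
def weighted_total(items):
    """Sum of amount * factor over (amount, factor) pairs."""
    return sum(amount * factor for amount, factor in items)


def inverse_nsfr(asf_items, rsf_items):
    """Required stable funding divided by available stable funding."""
    asf = weighted_total(asf_items)
    rsf = weighted_total(rsf_items)
    if asf <= 0:
        raise ValueError("available stable funding must be positive")
    return rsf / asf


# Hypothetical balance sheet: (amount, weighting factor) pairs.
available = [(100.0, 1.0), (200.0, 0.9)]   # e.g. capital, stable deposits
required = [(300.0, 0.5), (100.0, 0.85)]   # e.g. short-term and long-term loans
ratio = inverse_nsfr(available, required)
```

Under this reading, a value at or below 1 corresponds to a net stable funding ratio of at least 100%.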


2020 ◽  
Vol 12 (21) ◽  
pp. 8883
Author(s):  
Kun Jin ◽  
Wei Wang ◽  
Xuedong Hua ◽  
Wei Zhou

As a key element of urban transportation, taxi services provide significant convenience and comfort for residents' travel. In practice, however, they have not shown much efficiency. Previous research mainly aimed to optimize policies through order dispatch on ride-hailing services, which cannot be applied to cruising taxi services. This paper develops a reinforcement learning (RL) framework to optimize driving policies for cruising taxi services. First, we formulate the drivers' behaviours as a Markov decision process (MDP), considering the long-run influence of each action. The RL framework, using dynamic programming and data expansion, is employed to calculate the state-action value function. Following the value function, drivers can determine the best choice and quantify the expected future reward at a particular state. Using historic order data from Chengdu, we analyse the spatial distribution of the value function and demonstrate how the model can optimize driving policies. Finally, a realistic simulation of the on-demand platform is built. Compared with other benchmark methods, the results verify that the new model performs better in increasing total revenue and answer rate and in decreasing waiting time, with relative changes of up to 4.8%, 6.2% and −27.27%, respectively.
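Computing a state-action value function by dynamic programming can be sketched on a toy problem. The two-zone MDP below, its rewards, and the function names are hypothetical and only illustrate the general technique (Q-value iteration); the paper's state definition, data expansion step, and Chengdu order data are not reproduced.

```python
def q_value_iteration(transitions, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Compute Q(s, a) for a finite MDP.

    transitions[s][a] is a list of (probability, next_state, reward).
    """
    q = {s: {a: 0.0 for a in acts} for s, acts in transitions.items()}
    for _ in range(max_iter):
        max_change = 0.0
        for s, acts in transitions.items():
            for a, outcomes in acts.items():
                # Bellman optimality backup for one state-action pair.
                new = sum(p * (r + gamma * max(q[s2].values()))
                          for p, s2, r in outcomes)
                max_change = max(max_change, abs(new - q[s][a]))
                q[s][a] = new
        if max_change < tol:
            break
    return q


def greedy_policy(q):
    """Pick the highest-valued action in every state."""
    return {s: max(acts, key=acts.get) for s, acts in q.items()}


# Hypothetical example: a driver cruising between two zones.
toy_mdp = {
    "suburb": {"stay": [(1.0, "suburb", 1.0)],
               "go_center": [(1.0, "center", 0.0)]},
    "center": {"stay": [(1.0, "center", 3.0)],
               "go_suburb": [(1.0, "suburb", 0.0)]},
}
q = q_value_iteration(toy_mdp)
policy = greedy_policy(q)
```

In this toy case the driver gives up the small immediate reward in the suburb to reach the more profitable center, which is the kind of long-run trade-off the value function is meant to capture.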


1993 ◽  
Vol 7 (3) ◽  
pp. 369-385 ◽  
Author(s):  
Kyle Siegrist

We consider N sites (N ≤ ∞), each of which may be either occupied or unoccupied. Time is discrete, and at each time unit a set of occupied sites may attempt to capture a previously unoccupied site. The attempt will be successful with a probability that depends on the number of sites making the attempt, in which case the new site will also be occupied. A benefit is gained when new sites are occupied, but capture attempts are costly. The problem of optimal occupation is formulated as a Markov decision process in which the admissible actions are occupation strategies and the cost is a function of the strategy and the number of occupied sites. A partial order on the state-action pairs is used to obtain a comparison result for stationary policies and qualitative results concerning monotonicity of the value function for the n-stage problem (n ≤ ∞). The optimal policies are partially characterized when the cost depends on the action only through the total number of occupation attempts made.


2011 ◽  
Author(s):  
Anouk Festjens ◽  
Siegfried Dewitte ◽  
Enrico Diecidue ◽  
Sabrina Bruyneel

2010 ◽  
Vol 38 (3) ◽  
pp. 228-244 ◽  
Author(s):  
Nenggen Ding ◽  
Saied Taheri

Easy-to-use tire models for vehicle dynamics have been persistently studied for such applications as control design and model-based on-line estimation. This paper proposes a modified combined-slip tire model based on the Dugoff tire model. The proposed model emphasizes low computation time and uses a minimum set of parameters to express tire forces. The Dugoff tire model is modified in two respects: one is taking different tire/road friction coefficients for different magnitudes of slip, and the other is employing the concept of the friction ellipse. The proposed model is evaluated by comparison with the LuGre tire model. Although there are some discrepancies between the two models, the proposed combined-slip model is generally acceptable because of its simplicity and ease of use. With parameters extracted from the coefficients of a Magic Formula tire model based on measured tire data, the proposed model is further evaluated in a double lane change maneuver. Simulation results show that the trajectory obtained with the proposed tire model is closer to that obtained with the Magic Formula tire model than the trajectory obtained with the Dugoff tire model.
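For orientation, the baseline that the paper modifies can be sketched as follows. This is the commonly cited form of the unmodified Dugoff combined-slip model with a single friction coefficient; it does not include the paper's slip-dependent friction coefficients or friction ellipse. The parameter names, units, and sign conventions are assumptions.

```python
import math


def dugoff_forces(slip, alpha, fz, mu, c_sigma, c_alpha):
    """Return (Fx, Fy) from the baseline Dugoff tire model.

    slip: longitudinal slip ratio (must be > -1); alpha: slip angle in rad;
    fz: normal load; mu: tire/road friction coefficient;
    c_sigma, c_alpha: longitudinal and cornering stiffness.
    """
    # Forces the tire would produce if it never saturated.
    fx_lin = c_sigma * slip / (1.0 + slip)
    fy_lin = c_alpha * math.tan(alpha) / (1.0 + slip)
    demand = math.hypot(fx_lin, fy_lin)
    if demand == 0.0:
        return 0.0, 0.0

    # lam >= 1 means no saturation; lam < 1 scales both forces down.
    lam = mu * fz / (2.0 * demand)
    scale = 1.0 if lam >= 1.0 else lam * (2.0 - lam)
    return fx_lin * scale, fy_lin * scale
```

With this form the resultant force stays below `mu * fz` at any slip, while at small slip the model reduces to the linear stiffness relation.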


2019 ◽  
Vol 10 (1) ◽  
pp. 21-28
Author(s):  
Aniela Bălăcescu ◽  
Radu Șerban Zaharia

Tourist services represent a category of services in which the inseparability of production and consumption, the inability to be stored, immateriality and, last but not least, non-durability create a number of peculiarities and difficulties for tourism management. Under these circumstances, the development of medium-term strategies involves long-term studies of, on the one hand, the developments and characteristics of demand and, on the other hand, tourist potential at regional and local level. Although on-line booking by household users has grown tremendously over the past 20 years, tour operator agencies and agencies with sales activity continue to offer their specific services to a large number of tourists; in domestic tourism, that number increased 1.6 times for tour operators and 4.44 times for agencies with sales activity. At the same time, tourists' preferences regarding holiday destinations in Romania have changed. Starting from these considerations, the paper uses a logistic model to examine the evolution of the probabilities and scores corresponding to the way Romanian tourists spend their holidays, by type of tourism agency, tourist action and tourist area in Romania.


Psibernetika ◽  
2018 ◽  
Vol 11 (1) ◽  
Author(s):  
Devina Calista ◽  
Garvin Garvin

Child abuse by parents is common in households. Violence against children has short-term and long-term effects that can be linked to various emotional, behavioral and social problems later in life, especially in late adolescence, on the threshold of adulthood. Resilience factors increase the likelihood that adolescents who were victims of childhood violence recover from their past experiences, become stronger individuals and have a better life. The purpose of this study was to determine the sources of resilience in late adolescents who experienced violence from their parents in childhood. This research uses qualitative methods, with in-depth interviews as the method of data collection. The results show that three research participants have the aspects of "I Have", "I Am", and "I Can"; one participant has the "I Can" aspect as a source of resilience; and one other participant has no source of resilience. The study concluded that parental affection and acceptance of past experience play a role in the three sources of resilience (I Have, I Am, and I Can).

Keywords: Resilience, adolescence, violence, parents


Author(s):  
Rodrigo Cueva ◽  
Guillem Rufian ◽  
Maria Gabriela Valdes

The use of Customer Relationship Managers (CRMs) to foster customer loyalty has become one of the most common business strategies in recent years. However, CRM solutions do not by themselves deliver the many happily-ever-after relationships that businesses need, and each client perceives the buying process differently. The experience must therefore be precise in order to extend a customer's loyalty period as much as possible. One of the economic sectors in which CRMs have improved this experience is retailing, where personalized attention to the customer is a key factor. However, brick-and-mortar experiences are not enough to understand how environmental changes could affect industry trends in the long term. A unified theoretical framework must be taken as a base in order to develop an adaptable model for constructing or implementing CRMs in companies. With this approach, the information is complemented, and the outcome will increase the quality of any marketing or sales initiative. The first goal of this article is to explore the different factors, grouped into three main domains, within the impact of service quality from a consumer's perspective in both the on-line and off-line retailing sectors. Second, we go a step further and extract base guidelines from this analysis for designing CRM solutions focused on customer loyalty for a specific retailing sector and product: sports running shoes.


Author(s):  
Rajat Khurana ◽  
Alok Kumar Singh Kushwaha

Background & Objective: Identification of human actions from video has gathered much attention in the past few years. Many computer vision tasks, such as health care activity detection, suspicious activity detection and human-computer interaction, are based on the principle of activity detection, which is the automatic labelling of activity from video frames. The motivation of this work is to make the most of the data generated by sensors and use them for recognition of classes. Recognition of actions from video sequences is a growing field, driven by recent trends in deep neural networks. The automatic learning capability of Convolutional Neural Networks (CNNs) makes them a good choice compared with traditional handcrafted approaches. With the increasing use of RGB-D sensors, the combination of RGB and depth data is in great demand. This work uses dynamic images generated from RGB combined with depth maps for action recognition. We evaluated our approach on a pre-trained VGG-F model using the MSR Daily Activity dataset and the UTD-MHAD dataset, and we achieve state-of-the-art results. To support our research, we calculated metrics other than accuracy, such as precision, F-score and recall. Conclusion: The investigation confirms improvement in terms of accuracy, precision, F-score and recall. The proposed model is a 4-stream model that is prone to occlusion, can be used in real time, and fully utilizes the data from the RGB-D sensor.
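The evaluation metrics named in the abstract can be computed from predicted and true labels as in the sketch below. It uses the standard per-class definitions; the label names and data are hypothetical and unrelated to the paper's datasets or results.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def class_metrics(y_true, y_pred, label):
    """Return (precision, recall, F-score) for one class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_score


# Hypothetical labels for five video clips.
y_true = ["walk", "walk", "sit", "sit", "eat"]
y_pred = ["walk", "sit", "sit", "sit", "eat"]
overall = accuracy(y_true, y_pred)
sit_precision, sit_recall, sit_f = class_metrics(y_true, y_pred, "sit")
```

Reporting precision and recall per class alongside accuracy shows whether a model favors some activity classes over others, which accuracy alone can hide.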

