Carrier-borne aircraft aviation operation automated scheduling using multiplicative weights apprenticeship learning

2019 ◽  
Vol 16 (1) ◽  
pp. 172988141982891 ◽  
Author(s):  
Mao Zheng ◽  
Fangqing Yang ◽  
Zaopeng Dong ◽  
Shuo Xie ◽  
Xiumin Chu

Efficiency and safety are vital for aviation operations in order to improve the combat capacity of an aircraft carrier. In this article, apprenticeship learning, a kind of artificial intelligence technology, is applied to construct an automated scheduling method. First, a simulation model of aircraft launch and recovery was established within the Markov decision process framework. Second, the multiplicative weights apprenticeship learning algorithm was applied to create an optimized scheduling policy. In the situation with an expert to learn from, the learned policy matches the expert's demonstration quite well, and the total deviations can be limited to within 3%. Finally, in the situation without an expert's demonstration, the policy generated by the multiplicative weights apprenticeship learning algorithm shows an obvious superiority over the three human experts. The results of different operation situations show that the method is highly robust and functions well.
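For illustration, the following is a minimal sketch (not the paper's implementation) of the multiplicative-weights update at the core of MWAL-style apprenticeship learning. The helper best_response_features is a hypothetical stand-in for the inner MDP solver that, given reward weights, returns the feature expectations of a (near-)optimal scheduling policy; the rescaling of the game-matrix entries assumes features bounded in [0, 1].

```python
import numpy as np

def mwal_weights(expert_phi, best_response_features, n_iters=50, gamma=0.95):
    """Sketch of a multiplicative-weights apprenticeship learning loop.

    expert_phi: estimated expert feature expectations, shape (k,).
    best_response_features: hypothetical helper mapping reward weights w
        (shape (k,)) to the feature expectations of a policy that is
        (near-)optimal for reward r(s) = w . phi(s).
    """
    k = expert_phi.shape[0]
    beta = 1.0 / (1.0 + np.sqrt(2.0 * np.log(k) / n_iters))  # learning rate
    w = np.ones(k) / k                                        # uniform initial weights
    mixture_phi = []
    for _ in range(n_iters):
        phi_pi = best_response_features(w)                    # learner's best response
        # Game-matrix entry in [0, 1] (assumes features in [0, 1]): measures how
        # much the learner exceeds the expert on each feature.
        g = ((1.0 - gamma) * (phi_pi - expert_phi) + 1.0) / 2.0
        w = w * beta ** g                                     # multiplicative update
        w = w / w.sum()
        mixture_phi.append(phi_pi)
    # The mixture of per-iteration best responses approximates the expert.
    return w, np.mean(mixture_phi, axis=0)
```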

2017 ◽  
Vol 7 (1.5) ◽  
pp. 274
Author(s):  
D. Ganesha ◽  
Vijayakumar Maragal Venkatamuni

This research work presents an analysis of a modified SARSA learning algorithm. State-Action-Reward-State-Action (SARSA) is a technique for learning a Markov decision process (MDP) policy, used in reinforcement learning in the field of artificial intelligence (AI) and machine learning (ML). The modified SARSA algorithm selects better actions to obtain better rewards. Experiments are conducted to evaluate the performance of each agent individually; for comparison among the different agents, the same statistics were collected. This work considered various kinds of agents at different levels of the architecture for the experimental analysis. The Fungus world testbed, implemented in SWI-Prolog 5.4.6, was used for the experiments, with fixed obstructions placed at specific locations in the environment. Various parameters are introduced into the environment to test an agent's performance. The modified SARSA learning algorithm can be more suitable in the EMCAP architecture. The experiments show that the modified SARSA learning system obtains more rewards compared to the existing SARSA algorithm.
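As context for the modification, a minimal sketch of the standard tabular SARSA baseline is given below in Python (the testbed itself was implemented in SWI-Prolog); the env object and its reset/step/actions interface are assumptions for illustration.

```python
import random
from collections import defaultdict

def sarsa(env, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1):
    """Sketch of tabular SARSA, the baseline the modified algorithm builds on.

    env is a hypothetical object with reset() -> state,
    step(action) -> (next_state, reward, done), and actions() -> legal actions.
    """
    Q = defaultdict(float)

    def choose(state):
        # Epsilon-greedy action selection.
        if random.random() < epsilon:
            return random.choice(env.actions())
        return max(env.actions(), key=lambda a: Q[(state, a)])

    for _ in range(episodes):
        s = env.reset()
        a = choose(s)
        done = False
        while not done:
            s2, r, done = env.step(a)
            a2 = choose(s2)
            # On-policy update: bootstrap from the action actually taken next.
            Q[(s, a)] += alpha * (r + gamma * Q[(s2, a2)] * (not done) - Q[(s, a)])
            s, a = s2, a2
    return Q
```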


Author(s):  
Abdelghafour Harraz ◽  
Mostapha Zbakh

Artificial intelligence makes it possible to create engines that can explore and learn environments and thereby create policies to control them in real time with no human intervention. Through its reinforcement learning component, using frameworks such as temporal differences, State-Action-Reward-State-Action (SARSA), and Q-learning, to name a few, it can be applied to systems that can be modeled as a Markov decision process. This opens the door to applying reinforcement learning to cloud load balancing, in order to dispatch load dynamically to a given cloud system. The authors describe different techniques that can be used to implement a reinforcement-learning-based engine in a cloud system.
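As one hedged sketch of such an engine, the Q-learning dispatcher below assumes a state made of discretized per-server load levels, an action that picks the server for the next request, and a reward equal to the negative observed response time; these design choices are illustrative assumptions, not the authors' specification.

```python
import random
from collections import defaultdict

class QLearningDispatcher:
    """Sketch of a Q-learning load dispatcher for a cloud system."""

    def __init__(self, n_servers, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.n_servers = n_servers
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = defaultdict(float)

    def dispatch(self, state):
        # Epsilon-greedy choice of the server that receives the next request.
        if random.random() < self.epsilon:
            return random.randrange(self.n_servers)
        return max(range(self.n_servers), key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        # Off-policy Q-learning target: best next action regardless of the one taken.
        best_next = max(self.q[(next_state, a)] for a in range(self.n_servers))
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])
```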


2010 ◽  
Vol 44-47 ◽  
pp. 3611-3615 ◽  
Author(s):  
Zhi Cong Zhang ◽  
Kai Shun Hu ◽  
Hui Yu Huang ◽  
Shuai Li ◽  
Shao Yong Zhao

Reinforcement learning (RL) is a state- or action-value-based machine learning method which approximately solves large-scale Markov decision processes (MDPs) or semi-Markov decision processes (SMDPs). A multi-step RL algorithm called Sarsa(λ,k) is proposed, which is a compromise between Sarsa and Sarsa(λ). It is equivalent to Sarsa if k is 1 and equivalent to Sarsa(λ) if k is infinite. Sarsa(λ,k) adjusts its performance by setting the value of k. Two forms of Sarsa(λ,k), forward-view Sarsa(λ,k) and backward-view Sarsa(λ,k), are constructed and proved equivalent under off-line updating.
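A minimal backward-view sketch of such a compromise rule is shown below, under the assumption that k bounds how many of the most recently visited state-action pairs retain a nonzero eligibility trace; with k = 1 the update reduces to one-step Sarsa, and as k grows it approaches ordinary Sarsa(λ). The caller maintains Q as a defaultdict(float) and trace as an initially empty deque across the steps of an episode.

```python
from collections import deque

def sarsa_lambda_k_update(Q, trace, s, a, r, s2, a2,
                          alpha=0.1, gamma=0.9, lam=0.9, k=5):
    """One backward-view update of a Sarsa(lambda, k)-style rule (sketch).

    The interpretation of k used here (only the k most recently visited
    state-action pairs keep a nonzero eligibility trace) is an assumption
    made for illustration, not the paper's exact construction.
    """
    delta = r + gamma * Q[(s2, a2)] - Q[(s, a)]   # TD error
    trace.append((s, a))
    if len(trace) > k:                            # drop eligibility beyond k steps
        trace.popleft()
    e = 1.0
    for (si, ai) in reversed(trace):              # decay traces backward in time
        Q[(si, ai)] += alpha * delta * e
        e *= gamma * lam
    return Q, trace
```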


2020 ◽  
Vol 34 (04) ◽  
pp. 6720-6728
Author(s):  
Tom Zahavy ◽  
Alon Cohen ◽  
Haim Kaplan ◽  
Yishay Mansour

We consider the application of the Frank-Wolfe (FW) algorithm to Apprenticeship Learning (AL). In this setting, we are given a Markov Decision Process (MDP) without an explicit reward function. Instead, we observe an expert that acts according to some policy, and the goal is to find a policy whose feature expectations are closest to those of the expert policy. We formulate this problem as finding the projection of the expert's feature expectations onto the feature expectations polytope, the convex hull of the feature expectations of all deterministic policies in the MDP. We show that this formulation is equivalent to the AL objective and that solving this problem with the FW algorithm is equivalent to the well-known projection method of Abbeel and Ng (2004). This insight allows us to analyze AL with tools from the convex optimization literature and to derive tighter convergence bounds for AL. Specifically, we show that a variation of the FW method based on taking "away steps" achieves a linear rate of convergence when applied to AL, and that a stochastic version of the FW algorithm can be used to avoid precise estimation of feature expectations. We also show experimentally that this version outperforms the FW baseline. To the best of our knowledge, this is the first work that shows linear convergence rates for AL.
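A minimal sketch of this Frank-Wolfe view is given below, assuming a hypothetical linear-minimization oracle best_response_features(w) that returns the feature expectations of a deterministic policy optimal for the reward w · φ; maximizing a linear function over the polytope is exactly an MDP planning problem, which is why FW fits this setting. The plain 2/(t+2) step size is shown here rather than the away-step variant analyzed in the paper.

```python
import numpy as np

def frank_wolfe_al(expert_phi, best_response_features, n_iters=100):
    """Sketch of Frank-Wolfe projection onto the feature-expectations polytope.

    Objective: minimize f(mu) = 0.5 * ||mu - expert_phi||^2 over the polytope,
    using best_response_features(w) as the linear minimization oracle.
    """
    mu = best_response_features(expert_phi)        # start from some vertex
    for t in range(1, n_iters + 1):
        grad = mu - expert_phi                     # gradient of f at mu
        vertex = best_response_features(-grad)     # argmin_v <grad, v> over the polytope
        step = 2.0 / (t + 2.0)                     # standard FW step size
        mu = (1.0 - step) * mu + step * vertex     # move toward the chosen vertex
    return mu                                      # approximate projection of expert_phi
```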


Author(s):  
Md Mahmudul Hasan ◽  
Md Shahinur Rahman ◽  
Adrian Bell

Deep reinforcement learning (DRL) has transformed the field of artificial intelligence (AI), especially after the success of Google DeepMind. This branch of machine learning epitomizes a step toward building autonomous systems with an understanding of the visual world. Deep reinforcement learning is currently applied to various sorts of problems that were previously intractable. In this chapter, the authors first introduce the general field of reinforcement learning (RL) and Markov decision processes (MDPs). They then clarify the common DRL framework and the necessary components of RL settings. Moreover, they analyze stochastic gradient descent (SGD)-based optimizers such as ADAM and a non-specific multi-policy selection mechanism in a multi-objective Markov decision process. The chapter also includes a comparison of different deep Q-networks. In conclusion, the authors describe several challenges and trends in research within the deep reinforcement learning field.
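As a concrete reference for the optimizer discussion, a single Adam update step can be sketched as follows (NumPy, illustrative only; the chapter's own experiments are not reproduced here).

```python
import numpy as np

def adam_step(params, grads, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update step (sketch of the SGD-based optimizer discussed above).

    params, grads, m, v are NumPy arrays of the same shape; t is the 1-based
    step count. m and v hold the running first- and second-moment estimates.
    """
    m = beta1 * m + (1.0 - beta1) * grads            # first-moment (mean) estimate
    v = beta2 * v + (1.0 - beta2) * grads ** 2       # second-moment (uncentered variance)
    m_hat = m / (1.0 - beta1 ** t)                   # bias correction
    v_hat = v / (1.0 - beta2 ** t)
    params = params - lr * m_hat / (np.sqrt(v_hat) + eps)
    return params, m, v
```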


Author(s):  
Anna Nikolajeva ◽  
Artis Teilans

The research is dedicated to the use of artificial intelligence technology in digital marketing personalization. The doctoral thesis aims to create a machine learning algorithm that will increase sales through personalized marketing on an electronic commerce website. Machine learning algorithms can be used to find the unobservable probability density function in density estimation problems. Learning algorithms learn on their own based on previous experience and generate their own sequences of learning experiences, acquiring new skills through self-guided exploration and social interaction with humans. An entirely personalized advertising experience can become a reality in the near future, using learning algorithms with training data and unsupervised learning algorithms to detect the appearance of new behaviour patterns. Artificial intelligence technology will create website-specific adverts for each sales funnel individually.



