Implementation of English “Online and Offline” Hybrid Teaching Recommendation Platform Based on Reinforcement Learning

2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Danling Dong ◽  
Libo Wu

At present, there is a serious disconnect between online and offline teaching on large-scale English MOOC hybrid teaching recommendation platforms. This is mainly due to the cold start and matrix sparsity problems of the recommendation algorithm, and to the fact that the algorithm considers only the user's ratings while neglecting the user's personalized evaluations, making it difficult to fully capture the user's interest characteristics. To solve these problems, this paper proposes an online and offline hybrid English teaching recommendation platform built on reinforcement learning and user evaluation factors. First, the idea of value function estimation in reinforcement learning is introduced, and the difference between user state value functions replaces the previous similarity calculation, alleviating the matrix sparsity problem; a learning rate controls the convergence speed of the weight vector in the user state value function, alleviating the cold start problem. Second, by adding the learning of the user evaluation vector to the estimation of the state value function, the user's state value function can be approximated while reflecting the discrimination degree of the target user. Experimental results show that the proposed algorithm effectively alleviates the cold start and matrix sparsity problems of current collaborative filtering recommendation algorithms, digs deeper into users' interest characteristics, and further improves the accuracy of rating prediction.
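The following minimal sketch illustrates the two ideas just described: a per-user linear state value function whose weight vector converges at a rate set by the learning rate, and a similarity score computed from value-function differences rather than rating overlap. The class and feature names are illustrative assumptions, not the paper's code.

```python
import numpy as np

class UserValueModel:
    """Illustrative linear state-value estimator for one user:
    V(s) = w . x(s), where x(s) would concatenate item features with
    the user evaluation vector described in the abstract."""

    def __init__(self, n_features, lr=0.05):
        self.w = np.zeros(n_features)
        self.lr = lr  # learning rate controls convergence speed of w

    def value(self, x):
        return float(self.w @ x)

    def update(self, x, target):
        # Gradient step toward the observed rating/return signal.
        self.w += self.lr * (target - self.value(x)) * x

def value_based_similarity(user_a, user_b, states):
    """Replace rating-overlap similarity with a value-difference score:
    users whose value functions agree on shared states are similar,
    which sidesteps the sparse rating matrix."""
    diffs = [abs(user_a.value(x) - user_b.value(x)) for x in states]
    return 1.0 / (1.0 + np.mean(diffs))
```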

2020 ◽  
Vol 34 (04) ◽  
pp. 5717-5725
Author(s):  
Craig Sherstan ◽  
Shibhansh Dohare ◽  
James MacGlashan ◽  
Johannes Günther ◽  
Patrick M. Pilarski

Temporal abstraction is a key requirement for agents making decisions over long time horizons, a fundamental challenge in reinforcement learning. There are many reasons why value estimates at multiple timescales might be useful; recent work has shown that value estimates at different time scales can be the basis for creating more advanced discounting functions and for driving representation learning. Further, predictions at many different timescales serve to broaden an agent's model of its environment. One predictive approach of interest within an online learning setting is general value functions (GVFs), which represent models of an agent's world as a collection of predictive questions, each defined by a policy, a signal to be predicted, and a prediction timescale. In this paper we present Γ-nets, a method for generalizing value function estimation over timescale, allowing a given GVF to be trained and queried for arbitrary timescales so as to greatly increase the predictive ability and scalability of a GVF-based model. The key to our approach is to use timescale as one of the value estimator's inputs. As a result, the prediction target for any timescale is available at every timestep and we are free to train on any number of timescales. We first provide two demonstrations by 1) predicting a square wave and 2) predicting sensorimotor signals on a robot arm using a linear function approximator. Next, we empirically evaluate Γ-nets in the deep reinforcement learning setting using policy evaluation on a set of Atari video games. Our results show that Γ-nets can be effective for predicting arbitrary timescales, with only a small cost in accuracy as compared to learning estimators for fixed timescales. Γ-nets provide a method for accurately and compactly making predictions at many timescales without requiring a priori knowledge of the task, making it a valuable contribution to ongoing work on model-based planning, representation learning, and lifelong learning algorithms.
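As a hedged illustration of the key idea (timescale as an input to the estimator), the sketch below conditions a linear value estimator on γ and trains it on several timescales from the same transition. The GammaNet name and the two-term γ encoding are assumptions, not the paper's architecture.

```python
import numpy as np

def gamma_features(x, gamma):
    """Append the timescale to the input so one estimator can be
    queried for any gamma (the core Gamma-nets idea); a richer gamma
    embedding could be substituted here."""
    return np.concatenate([x, [gamma, 1.0 - gamma]])

class GammaNet:
    """Minimal linear sketch of a timescale-conditioned value estimator."""

    def __init__(self, n_features, lr=0.01):
        self.w = np.zeros(n_features + 2)
        self.lr = lr

    def predict(self, x, gamma):
        return float(self.w @ gamma_features(x, gamma))

    def td_update(self, x, reward, x_next, gammas):
        # Every timestep yields a valid TD target for every timescale,
        # so we are free to train on any number of gammas at once.
        for g in gammas:
            target = reward + g * self.predict(x_next, g)
            phi = gamma_features(x, g)
            self.w += self.lr * (target - self.w @ phi) * phi
```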


Author(s):  
Ling Pan ◽  
Qingpeng Cai ◽  
Qi Meng ◽  
Wei Chen ◽  
Longbo Huang

Value function estimation, i.e., prediction, is an important task in reinforcement learning. The Boltzmann softmax operator is a natural value estimator and offers several benefits. However, it does not satisfy the non-expansion property, and its direct use may fail to converge even in value iteration. In this paper, we propose updating the value function with the dynamic Boltzmann softmax (DBS) operator, which has good convergence properties in both the planning and learning settings. Experimental results on GridWorld show that the DBS operator enables better estimation of the value function and rectifies the convergence issue of the softmax operator. Finally, we propose the DBS-DQN algorithm by applying the DBS operator; it outperforms DQN substantially in 40 out of 49 Atari games.
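A minimal sketch of the operator and its use in value iteration follows, assuming a growing schedule β_t = t² (one choice consistent with the dynamic-β idea; the paper's exact schedule may differ).

```python
import numpy as np

def boltzmann_softmax(q_values, beta):
    """Boltzmann softmax operator: a weighted average of Q-values with
    weights softmax(beta * Q). As beta -> inf it approaches max."""
    w = np.exp(beta * (q_values - np.max(q_values)))  # stabilized
    w /= w.sum()
    return float(w @ q_values)

def dbs_value_iteration(P, R, gamma=0.9, iters=200):
    """Value iteration with a dynamic beta_t that grows with t, so the
    operator tends to max and convergence is recovered (the DBS idea).
    P: transition tensor (S, A, S); R: reward matrix (S, A)."""
    n_states = R.shape[0]
    V = np.zeros(n_states)
    for t in range(1, iters + 1):
        beta = float(t) ** 2                  # assumed growing schedule
        Q = R + gamma * P @ V                 # Q[s, a]
        V = np.array([boltzmann_softmax(Q[s], beta)
                      for s in range(n_states)])
    return V
```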


2019 ◽  
Author(s):  
Jordão Memória ◽  
José Maia

In this work, a model and algorithm based on multiagent reinforcement learning are developed for the elevator group dispatch problem. The main advantage is that, together with function approximation, this multi-agent solution reduces the state space, allowing complex states to be handled with a synthesizing evaluation function. Each elevator is treated as an agent that must decide between two actions: answering or ignoring a new call. Over successive iterations, the agents learn the weights of an evaluation function that approximates the state-action value function. The performance of the solution, measured as average waiting time (AWT) while varying the traffic pattern, flow of people, number of elevators, and number of floors, is comparable to other current proposals reported in the literature.
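The sketch below shows one plausible shape of such an agent: linear Q-values over synthesized state features and the two actions named above. The feature set and the reward (derived from negative waiting time) are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

class ElevatorAgent:
    """Sketch of one elevator agent with a linear evaluation function
    Q(s, a) = w_a . x(s); x(s) would synthesize features such as
    distance to the call, current load, and direction of travel."""

    ACTIONS = ("answer", "ignore")

    def __init__(self, n_features, lr=0.1, gamma=0.99, epsilon=0.1):
        self.w = {a: np.zeros(n_features) for a in self.ACTIONS}
        self.lr, self.gamma, self.epsilon = lr, gamma, epsilon

    def q(self, x, a):
        return float(self.w[a] @ x)

    def act(self, x):
        if np.random.rand() < self.epsilon:       # exploration
            return np.random.choice(self.ACTIONS)
        return max(self.ACTIONS, key=lambda a: self.q(x, a))

    def learn(self, x, a, reward, x_next):
        # One-step Q-learning update on the linear weights; reward
        # would penalize passenger waiting time.
        target = reward + self.gamma * max(self.q(x_next, b)
                                           for b in self.ACTIONS)
        self.w[a] += self.lr * (target - self.q(x, a)) * x
```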


Author(s):  
Hua Wei ◽  
Deheng Ye ◽  
Zhao Liu ◽  
Hao Wu ◽  
Bo Yuan ◽  
...  

Offline reinforcement learning (RL) tries to learn a near-optimal policy from recorded offline experience, without online exploration. Current offline RL research addresses two parts: 1) generative modeling, i.e., approximating a policy from fixed data; and 2) learning the state-action value function. While most research focuses on the state-action value function, reducing the bootstrapping error in value function approximation induced by the distribution shift of the training data, the effects of error propagation in generative modeling have been neglected. In this paper, we analyze the error in generative modeling and propose AQL (action-conditioned Q-learning), a residual generative model that reduces the policy approximation error in offline RL. We show that our method learns more accurate policy approximations on different benchmark datasets. In addition, we show that the proposed offline RL method can learn more competitive AI agents for complex control tasks in the multiplayer online battle arena (MOBA) game Honor of Kings.
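Based on this description alone, one way to read "residual generative model" is a base imitation policy plus a learned correction term conditioned on the base action. The schematic below reflects that reading; its architecture is an assumption, not the paper's AQL network.

```python
import torch
import torch.nn as nn

class ResidualPolicy(nn.Module):
    """Schematic residual generative policy for offline RL, under one
    reading of the abstract; layer sizes are placeholders."""

    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        # Base generator imitates actions seen in the fixed dataset.
        self.base = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim))
        # Residual head conditions on the base action and corrects it,
        # reducing the generator's policy approximation error.
        self.residual = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim))

    def forward(self, state):
        a0 = self.base(state)
        return a0 + self.residual(torch.cat([state, a0], dim=-1))
```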


Author(s):  
Qiaoling Zhou

Purpose
English original movies play an important role in English learning and communication. To help learners find the movies they need among a large number of English original movies and reviews, this paper proposes an improved deep reinforcement learning algorithm for movie recommendation. Although conventional movie recommendation algorithms have addressed the problem of information overload, they still have limitations under cold start and sparse data.

Design/methodology/approach
To solve the aforementioned problems of conventional movie recommendation algorithms, this paper proposes a recommendation algorithm based on deep reinforcement learning, which uses the deep deterministic policy gradient (DDPG) algorithm to address the cold start and sparse data problems and uses Item2vec to transform the discrete action space into a continuous one. Meanwhile, a reward function combining cosine distance and Euclidean distance is proposed to keep the neural network from converging prematurely to a local optimum.

Findings
To verify the feasibility and validity of the proposed algorithm, it is compared with the state of the art on RMSE, recall rate, and accuracy using the MovieLens English original movie data set. Experimental results show that the proposed algorithm is superior to the conventional algorithms on all indicators.

Originality/value
When the proposed algorithm is applied to recommending English original movies, the DDPG policy produces better recommendation results and alleviates the impact of cold start and sparse data.
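A minimal sketch of such a combined reward follows, assuming a simple weighted mix between the DDPG action embedding and the Item2vec embedding of the consumed movie; the weight lam and the exact combination are assumptions rather than the paper's definition.

```python
import numpy as np

def combined_reward(action_vec, item_vec, lam=0.5):
    """Reward mixing cosine similarity (higher is better) with
    negative Euclidean distance, so the two terms pull the policy
    in complementary ways and discourage premature convergence."""
    cos = action_vec @ item_vec / (
        np.linalg.norm(action_vec) * np.linalg.norm(item_vec) + 1e-8)
    euc = np.linalg.norm(action_vec - item_vec)
    return lam * cos - (1.0 - lam) * euc
```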


Electronics ◽  
2021 ◽  
Vol 10 (16) ◽  
pp. 1929
Author(s):  
Huan Shen ◽  
Yao Zhang ◽  
Jianguo Mao ◽  
Zhiwei Yan ◽  
Linwei Wu

To extend the flight time of Unmanned Aerial Vehicles (UAVs), this paper proposes a set of energy management strategies based on reinforcement learning for a hybrid-power agricultural UAV. The battery is used to optimize the operating point of the internal combustion engine as far as possible, while addressing the UAV's high power demand and the engine's slow response. First, a decision-oriented hybrid powertrain model and a UAV dynamic model are established. Because the energy management strategy (EMS) is based on reinforcement learning (RL), an intelligent optimization approach that has emerged in recent years, complex theoretical formula derivation is avoided in the modeling process. For the EMS, a double Q-learning algorithm with strong convergence is adopted. The algorithm separates the state-action value function used for action selection from the state-action value function being updated by the decision, so as to avoid the delay and oscillation in the convergence process caused by maximization bias. After this improvement, off-line training is carried out on a large amount of previously generated flight data. The simulation results demonstrate that the improved algorithm achieves better performance at lower learning cost than before, by virtue of the search strategy proposed in this paper. In the state space, time-based and residual-fuel-based selection are examined in turn, and the convergence rate and application effect are compared and analyzed. The results show that the learning algorithm has stronger robustness and faster convergence when the state space is chosen appropriately for the operating cycle type. After 120,000 training cycles, the fuel economy of the improved algorithm reaches more than 90% of that of the optimal solution and performs stably in actual flight.
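The separation described above is the standard double Q-learning decoupling of action selection from action evaluation. The sketch below shows one tabular update step; the hybrid-powertrain state and action encodings (battery state of charge, engine operating point, power demand) are the paper's and are left abstract here.

```python
import numpy as np

def double_q_update(Q_a, Q_b, s, a, r, s_next, lr=0.1, gamma=0.99):
    """One double Q-learning step: Q_a selects the greedy next action,
    Q_b evaluates it. Keeping the selecting and updating value
    functions separate avoids maximization bias and the resulting
    delay and oscillation during convergence."""
    a_star = int(np.argmax(Q_a[s_next]))        # select with Q_a
    target = r + gamma * Q_b[s_next, a_star]    # evaluate with Q_b
    Q_a[s, a] += lr * (target - Q_a[s, a])

# Usage: swap the roles of the two tables at random on each step.
# if np.random.rand() < 0.5: double_q_update(Q1, Q2, s, a, r, s2)
# else:                      double_q_update(Q2, Q1, s, a, r, s2)
```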


2021 ◽  
Vol 2 (1) ◽  
pp. 1-25
Author(s):  
Yongsen Ma ◽  
Sheheryar Arshad ◽  
Swetha Muniraju ◽  
Eric Torkildson ◽  
Enrico Rantala ◽  
...  

In recent years, Channel State Information (CSI) measured by WiFi has been widely used for human activity recognition. In this article, we propose a deep learning design for location- and person-independent activity recognition with WiFi. The proposed design consists of three Deep Neural Networks (DNNs): a 2D Convolutional Neural Network (CNN) as the recognition algorithm, a 1D CNN as the state machine, and a reinforcement learning agent for neural architecture search. The recognition algorithm learns location- and person-independent features from different perspectives of the CSI data. The state machine learns temporal dependency information from the history of classification results. The reinforcement learning agent optimizes the neural architecture of the recognition algorithm using a Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM). The proposed design is evaluated in a lab environment with different WiFi device locations, antenna orientations, sitting/standing/walking locations and orientations, and multiple persons. It achieves 97% average accuracy when the testing devices and persons are not seen during training. It is also evaluated on two public datasets, achieving accuracies of 80% and 83%. The proposed design requires very little human effort for ground truth labeling, feature engineering, signal processing, and tuning of learning parameters and hyperparameters.
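As a rough sketch of the recognition branch only, the module below consumes a CSI tensor shaped (antenna pairs × subcarriers × time). All layer sizes are placeholders, since the actual architecture is produced by the reinforcement-learning-based neural architecture search described above.

```python
import torch
import torch.nn as nn

class CSIRecognizer(nn.Module):
    """Minimal 2D-CNN sketch for CSI-based activity recognition;
    in_ch = antenna pairs, input height = subcarriers, width = time.
    Layer widths are illustrative placeholders."""

    def __init__(self, in_ch=3, n_classes=6):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_ch, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1))
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, csi):                    # csi: (B, in_ch, F, T)
        z = self.features(csi).flatten(1)      # -> (B, 32)
        return self.classifier(z)              # activity logits
```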

