Efficient Reinforcement Learning from Demonstration via Bayesian Network-Based Knowledge Extraction

2021 ◽  
Vol 2021 ◽  
pp. 1-16
Author(s):  
Yichuan Zhang ◽  
Yixing Lan ◽  
Qiang Fang ◽  
Xin Xu ◽  
Junxiang Li ◽  
...  

Reinforcement learning from demonstration (RLfD) is considered a promising approach to improving reinforcement learning (RL) by leveraging expert demonstrations as additional decision-making guidance. However, most existing RLfD methods regard demonstrations only as low-level knowledge instances under a certain task. Demonstrations are generally used either to provide additional rewards or to pretrain the neural network-based RL policy in a supervised manner, which usually results in poor generalization capability and weak robustness. Considering that human knowledge is not only interpretable but also suitable for generalization, we propose to exploit the potential of demonstrations by extracting knowledge from them via Bayesian networks, and we develop a novel RLfD method called Reinforcement Learning from demonstration via Bayesian Network-based Knowledge (RLBNK). The proposed RLBNK method uses the node influence with the Wasserstein distance metric (NIW) algorithm to obtain abstract concepts from demonstrations; a Bayesian network then conducts knowledge learning and inference based on the abstract data set, which yields a coarse policy with a corresponding confidence. When the coarse policy's confidence is low, an RL-based refine module further optimizes and fine-tunes the policy to form a (near) optimal hybrid policy. Experimental results show that the proposed RLBNK method improves the learning efficiency of the corresponding baseline RL algorithms under both normal and sparse reward settings. Furthermore, we demonstrate that our RLBNK method delivers better generalization capability and robustness than baseline methods.
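
The switching rule described in this abstract (use the coarse policy when it is confident, otherwise defer to the RL refine module) can be illustrated with a minimal sketch. The function names, the callable signatures, and the threshold value of 0.8 are all assumptions made for illustration; the abstract does not specify how confidence is computed or thresholded.

```python
from typing import Callable, Tuple

State = Tuple[float, ...]


def hybrid_policy(
    state: State,
    coarse_policy: Callable[[State], Tuple[int, float]],
    refine_policy: Callable[[State], int],
    confidence_threshold: float = 0.8,  # assumed value, not taken from the paper
) -> int:
    """Pick the coarse action when it is confident enough, else the refined one.

    coarse_policy stands in for Bayesian network inference and returns
    (action, confidence); refine_policy stands in for the RL refine module.
    Both are hypothetical placeholders.
    """
    action, confidence = coarse_policy(state)
    if confidence >= confidence_threshold:
        return action
    return refine_policy(state)


if __name__ == "__main__":
    # Toy stand-ins: the coarse policy is confident only for positive states.
    def coarse(s: State) -> Tuple[int, float]:
        return (1, 0.95) if s[0] > 0 else (1, 0.40)

    def refine(s: State) -> int:
        return 0

    print(hybrid_policy((0.5,), coarse, refine))   # coarse action is used
    print(hybrid_policy((-0.5,), coarse, refine))  # refine module is used
```

Keeping the two policies as plain callables makes the switching logic independent of how either one is trained.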

Author(s):  
Alexsander Voevoda ◽  
Dmitry Romannikov ◽  

The application of neural networks to the synthesis of control systems is considered. Examples are given of control system synthesis using reinforcement learning methods in which the state vector is used. The synthesis of a neural controller for objects with an inaccessible state vector is then discussed in two variants: 1) a neural network with recurrent feedback connections; 2) a neural network fed with an input error vector, in which each error (except the first) reaches the network input through a delay element. A disadvantage of the first variant is that existing reinforcement learning methods cannot be applied to such a network structure, so training requires a data set obtained, for example, from a previously calculated linear controller. The network structure used in the second variant does allow reinforcement learning methods to be applied, but the article states and proves that a neural network without recurrent connections cannot be used to synthesize a control system for objects with three or more integrators. Both structures are applied to examples of control system synthesis for the objects 1/s^2 and 1/s^3, represented in discrete form.
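
The second variant, in which the current error enters the network directly and earlier errors arrive through delay elements, amounts to a tapped delay line. A minimal sketch follows; the class name, the number of taps, and the zero initial conditions are assumptions for illustration and are not taken from the article.

```python
from collections import deque
from typing import List


class DelayedErrorInput:
    """Builds the vector [e(k), e(k-1), ..., e(k-n+1)] for a feedforward controller."""

    def __init__(self, n_taps: int) -> None:
        if n_taps < 1:
            raise ValueError("n_taps must be at least 1")
        # Assumed zero initial conditions for all delay elements.
        self._taps = deque([0.0] * n_taps, maxlen=n_taps)

    def push(self, error: float) -> List[float]:
        """Shift the delay line by one step and return the new input vector."""
        # appendleft on a full bounded deque drops the oldest error on the right.
        self._taps.appendleft(error)
        return list(self._taps)


if __name__ == "__main__":
    line = DelayedErrorInput(n_taps=3)
    print(line.push(1.0))  # [1.0, 0.0, 0.0]
    print(line.push(2.0))  # [2.0, 1.0, 0.0]
    print(line.push(3.0))  # [3.0, 2.0, 1.0]
    print(line.push(4.0))  # [4.0, 3.0, 2.0]
```

The returned list would be the input to a network without recurrent connections, which is the structure the article's statement about three or more integrators concerns.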


2020 ◽  
pp. 003329412097815
Author(s):  
Giovanni Briganti ◽  
Donald R. Williams ◽  
Joris Mulder ◽  
Paul Linkowski

The aim of this work is to explore the construct of autistic traits through the lens of network analysis with recently introduced Bayesian methods. A conditional dependence network structure was estimated from a data set composed of 649 university students who completed an autistic traits questionnaire. The connectedness of the network is also explored, as are sex differences in network connectivity between female and male subjects. The strongest connections in the network are found between items that measure similar autistic traits. Traits related to social skills are the most interconnected items in the network. Sex differences are found between female and male subjects. The Bayesian network analysis offers new insight into the connectivity of autistic traits and confirms several findings in the autism literature.


Author(s):  
Komsan Wongkalasin ◽  
Teerapon Upachaban ◽  
Wacharawish Daosawang ◽  
Nattadon Pannucharoenwong ◽  
Phadungsak Ratanadecho

This research aims to improve the watermelon quality selection process, which has traditionally been carried out by knocking on the fruit and sorting it by the character of the sound. The proposed method passes a sound spectrum through the watermelon and then analyzes the frequency and amplitude of the response signal using the Fast Fourier Transform (FFT). The obtained data were then used to train and verify a neural network processor. The results show that the frequencies of 129 and 172 Hz were suitable for the comparison. Thirty watermelons, randomly selected from the orchard, were used to create a data set and were then cut open so that the measurements could be checked manually against the quality of the fruit. The 129 Hz frequency gave a response of 13.57 and above in three groups of watermelon quality: not fully ripened, fully ripened, and close to rotten. The 172 Hz frequency gave a response of 11.11–12.72 in not fully ripened watermelons and of 13.00 or more in close-to-rotten and hollow watermelons. The response was then used as a training condition for the artificial neural network processor of the sorting machine prototype. The verification results provided a reasonable prediction of watermelon ripeness, and the system can serve as a pilot prototype for a modern watermelon quality selection tool, which could enhance the competitiveness of local farmers in product quality control.
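
Reading the response amplitude at two specific frequencies can be sketched as follows. This is an illustrative stand-in, not the authors' pipeline: it evaluates a direct single-frequency DFT rather than a full FFT, and the sample rate, the synthetic test signal, and the amplitude scaling are assumptions, so its output is not comparable to the response values reported in the abstract.

```python
import cmath
import math
from typing import Sequence


def amplitude_at(signal: Sequence[float], sample_rate: float, freq_hz: float) -> float:
    """Amplitude of the sinusoidal component of `signal` at `freq_hz`.

    Computes one DFT term directly. For a sine that completes a whole number
    of cycles in the window, the result equals the sine's amplitude.
    """
    n = len(signal)
    if n == 0:
        raise ValueError("signal must not be empty")
    acc = sum(
        x * cmath.exp(-2j * math.pi * freq_hz * k / sample_rate)
        for k, x in enumerate(signal)
    )
    return 2.0 * abs(acc) / n


if __name__ == "__main__":
    sample_rate = 8000.0  # assumed sampling rate in Hz
    # Synthetic one-second response containing the two frequencies of interest.
    response = [
        1.5 * math.sin(2 * math.pi * 129 * k / sample_rate)
        + 0.5 * math.sin(2 * math.pi * 172 * k / sample_rate)
        for k in range(8000)
    ]
    for f in (129.0, 172.0):
        print(f, round(amplitude_at(response, sample_rate, f), 6))
```

When the whole spectrum is needed, an FFT is the more efficient choice; a single-term DFT is enough when only a few frequencies are inspected.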


2005 ◽  
Vol 488-489 ◽  
pp. 793-796 ◽  
Author(s):  
Hai Ding Liu ◽  
Ai Tao Tang ◽  
Fu Sheng Pan ◽  
Ru Lin Zuo ◽  
Ling Yun Wang

A model was developed for the analysis and prediction of the correlation between composition and mechanical properties of Mg-Al-Zn (AZ) magnesium alloys by applying an artificial neural network (ANN). The inputs of the neural network (NN) are the alloy composition. The outputs of the NN model are important mechanical properties, including ultimate tensile strength, tensile yield strength and elongation. The model is based on a multilayer feedforward neural network. The NN was trained with a comprehensive data set collected from domestic and foreign literature. A very good performance of the neural network was achieved. The model can be used for the simulation and prediction of the mechanical properties of AZ system magnesium alloys as functions of composition.


Author(s):  
Tu Renwei ◽  
Zhu Zhongjie ◽  
Bai Yongqiang ◽  
Gao Ming ◽  
Ge Zhifeng

Unmanned aerial vehicle (UAV) inspection has become one of the main methods of transmission line inspection, but it still has shortcomings such as slow detection speed, low efficiency, and an inability to work in low-light environments. To address these issues, this paper proposes a deep learning detection model based on You Only Look Once (YOLO) v3. On the one hand, the neural network structure is simplified: the three feature maps of YOLO v3 are pruned to two to meet the specific detection requirements. Meanwhile, the K-means++ clustering method is used to calculate the anchor values for the data set to improve detection accuracy. On the other hand, 1000 sets of power tower and insulator data are collected, expanded by inversion and scaling, and further optimized by adding different illumination conditions and viewing angles. The experimental results show that the improved YOLO v3 model improves detection accuracy by 6.0%, FLOPs by 8.4%, and detection speed by about 6.0%.
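
Computing anchor values with K-means++ can be sketched in a few lines. The abstract does not state which distance is used, so the 1 - IoU distance between (width, height) pairs below is an assumption (it is a common choice for YOLO anchors); the function names, iteration limit, and example boxes are likewise illustrative.

```python
import random
from typing import List, Tuple

Box = Tuple[float, float]  # (width, height) of a labelled bounding box


def iou(a: Box, b: Box) -> float:
    """IoU of two boxes aligned at a common corner, so only sizes matter."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmeanspp_anchors(boxes: List[Box], k: int, iters: int = 50, seed: int = 0) -> List[Box]:
    """Cluster box sizes into k anchors, returned sorted by area."""
    rng = random.Random(seed)
    # K-means++ seeding: first center uniformly at random, each further center
    # with probability proportional to its squared distance to the nearest center.
    centers = [rng.choice(boxes)]
    while len(centers) < k:
        d2 = [min(1.0 - iou(b, c) for c in centers) ** 2 for b in boxes]
        if sum(d2) == 0.0:  # every box coincides with a center already
            centers.append(rng.choice(boxes))
        else:
            centers.append(rng.choices(boxes, weights=d2, k=1)[0])
    # Standard Lloyd iterations: assign by highest IoU, then average each cluster.
    for _ in range(iters):
        clusters: List[List[Box]] = [[] for _ in range(k)]
        for b in boxes:
            best = max(range(k), key=lambda i: iou(b, centers[i]))
            clusters[best].append(b)
        new_centers = [
            (sum(w for w, _ in c) / len(c), sum(h for _, h in c) / len(c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return sorted(centers, key=lambda b: b[0] * b[1])


if __name__ == "__main__":
    sizes = [(10, 10), (11, 10), (10, 11), (9, 10),
             (100, 200), (110, 200), (100, 210), (95, 190)]
    print(kmeanspp_anchors(sizes, k=2))
```

With two detection scales instead of three, the number of anchors `k` would be chosen to match the anchors assigned per remaining feature map.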


Author(s):  
Guiliang Liu ◽  
Oliver Schulte

A variety of machine learning models have been proposed to assess the performance of players in professional sports. However, they have only a limited ability to model how player performance depends on the game context. This paper proposes a new approach to capturing game context: we apply Deep Reinforcement Learning (DRL) to learn an action-value Q-function from 3M play-by-play events in the National Hockey League (NHL). The neural network representation integrates both continuous context signals and game history, using a possession-based LSTM. The learned Q-function is used to value players' actions under different game contexts. To assess a player's overall performance, we introduce a novel Game Impact Metric (GIM) that aggregates the values of the player's actions. Empirical evaluation shows that GIM is consistent throughout a playing season and correlates highly with standard success measures and future salary.
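
The aggregation step behind a metric of this kind can be illustrated with a short sketch. The abstract only says that GIM aggregates the values of a player's actions, so the plain per-player sum, the function name, and the example values below are assumptions rather than the authors' exact definition.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def aggregate_action_values(events: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Sum the value assigned to each action, grouped by the acting player.

    Each event is (player_id, action_value); the action value would come from
    a learned Q-function, which is not modelled here.
    """
    totals: Dict[str, float] = defaultdict(float)
    for player, value in events:
        totals[player] += value
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical play-by-play values for two players.
    plays = [("A", 0.25), ("B", -0.125), ("A", 0.5), ("B", 0.25)]
    print(aggregate_action_values(plays))  # {'A': 0.75, 'B': 0.125}
```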


2019 ◽  
Vol 2 (1) ◽  
Author(s):  
Jeffrey Micher

We present a method for building a morphological generator from the output of an existing analyzer for Inuktitut, in the absence of a two-way finite state transducer, which would normally provide this functionality. We make use of a sequence-to-sequence neural network which “translates” underlying Inuktitut morpheme sequences into surface character sequences. The neural network uses only the previous and the following morphemes as context. We report a morpheme accuracy of approximately 86%. We are able to increase this accuracy slightly by passing deep morphemes directly to output for unknown morphemes. We do not see significant improvement when increasing the training data set size, and we postulate possible causes for this.


2019 ◽  
Vol 2019 (02) ◽  
pp. 89-98
Author(s):  
Vijayakumar T

Predicting the category of a tumor and the type of cancer at an early stage remains essential for identifying the extent of the disease and the treatment available for it. Neural networks, which function similarly to the human nervous system, are widely used in tumor investigation and cancer prediction. This paper analyzes the performance of neural networks, namely the FNN (feedforward neural network), the RNN (recurrent neural network) and the CNN (convolutional neural network), in investigating tumors and predicting cancer. The results obtained by evaluating the neural networks on the Breast Cancer Wisconsin (Original) data set show that the CNN provides 43% better prediction than the FNN and 25% better prediction than the RNN.


2009 ◽  
Vol 2 (1) ◽  
pp. 421-475 ◽  
Author(s):  
A. Velo ◽  
F. F. Pérez ◽  
X. Lin ◽  
R. M. Key ◽  
T. Tanhua ◽  
...  

Abstract. Data on carbon and carbon-relevant hydrographic and hydrochemical parameters from previously non-publicly available cruise data sets in the Arctic Mediterranean Seas (AMS), Atlantic and Southern Ocean have been retrieved and merged into a new database: CARINA (CARbon IN the Atlantic). These data have gone through rigorous quality control (QC) procedures to assure the highest possible quality and consistency. The data for most of the measured parameters in the CARINA database were objectively examined in order to quantify systematic differences in the reported values, i.e. secondary quality control. Systematic biases found in the data have been corrected in the data products, i.e. three merged data files with measured, calculated and interpolated data for each of the three CARINA regions: AMS, Atlantic and Southern Ocean. Out of a total of 188 cruise entries in the CARINA database, 59 reported measured pH values. Here we present details of the secondary QC on pH for the CARINA database. Procedures of quality control, including crossover analysis between cruises and inversion analysis of all crossover data, are briefly described. Adjustments were applied to the pH values for 21 of the cruises in the CARINA data set. With these adjustments the CARINA database is consistent both internally and with GLODAP data, an oceanographic data set based on the World Hydrographic Program in the 1990s. Based on our analysis we estimate the internal accuracy of the CARINA pH data to be 0.005 pH units. The CARINA data are now suitable for accurate assessments of, for example, oceanic carbon inventories and uptake rates and for model validation.


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Oluwafemi Ajayi ◽  
Reolyn Heymann

Purpose: Energy management is critical to data centres (DCs), mainly because they are high energy-consuming facilities and demand for their services continues to rise with the rapidly increasing global demand for cloud services and other technological services. This projected sectoral growth is expected to translate into increased energy demand from a sector that is already considered a major energy consumer, unless innovative steps are taken to drive effective energy management systems. The purpose of this study is to provide insights into the expected energy demand of the DC and the impact each measured parameter has on the building's energy demand profile. This serves as a basis for the design of an effective energy management system.
Design/methodology/approach: This study proposes the novel tunicate swarm algorithm (TSA) for training an artificial neural network model used to predict the energy demand of a DC. The objective is to find the optimal weights and biases of the model while avoiding challenges commonly faced when using the backpropagation algorithm. The model implementation is based on historical energy consumption data of an anonymous DC operator in Cape Town, South Africa. The data set provided consists of variables such as ambient temperature, ambient relative humidity, chiller output temperature and computer room air conditioning air supply temperature, which serve as inputs to the neural network designed to predict the DC's hourly energy consumption for July 2020. After preprocessing of the data set, the total sample number for each represented variable was 464. An 80:20 splitting ratio was used to divide the data set into training and testing sets respectively, making 452 samples for the training set and 112 samples for the testing set. A weights-based approach has also been used to analyze the relative impact of the model's input parameters on the DC's energy demand pattern.
Findings: The performance of the proposed model has been compared with that of neural network models trained using state-of-the-art algorithms such as moth flame optimization, the whale optimization algorithm and the ant lion optimizer. The analysis found that the proposed TSA outperformed the other methods in training the model, based on their mean squared error, root mean squared error, mean absolute error, mean absolute percentage error and prediction accuracy. Analyzing the relative percentage contribution of the model's input parameters based on the weights of the neural network also shows that the ambient temperature of the DC has the highest impact on the building's energy demand pattern.
Research limitations/implications: The proposed novel model can be applied to solving other complex engineering problems such as regression and classification. The methodology for optimizing the multi-layered perceptron neural network can also be applied to other forms of neural networks for improved performance.
Practical implications: Based on the forecasted energy demand of the DC and an understanding of how the input parameters impact the building's energy demand pattern, neural networks can be deployed to optimize the cooling systems of the DC for reduced energy cost.
Originality/value: The use of TSA for optimizing the weights and biases of a neural network is novel. The application context of this study, DCs, is largely untapped in the literature, leaving many gaps for further research. The proposed prediction model can be applied to other regression and classification tasks. Another contribution of this study is the analysis of the neural network's input parameters, which provides insight into the degree to which each parameter influences the DC's energy demand profile.
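
The error measures named in the findings have standard definitions, which the following sketch computes for a pair of actual and predicted series. The function name and the example numbers are illustrative assumptions and are not results from the study; "prediction accuracy" is omitted because the abstract does not define it.

```python
import math
from typing import Dict, Sequence


def regression_errors(actual: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """Return MSE, RMSE, MAE and MAPE (in percent) for two equal-length series."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("series must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae, "mape": mape}


if __name__ == "__main__":
    # Hypothetical hourly energy values, for illustration only.
    measured = [100.0, 200.0, 400.0]
    forecast = [110.0, 190.0, 400.0]
    for name, value in regression_errors(measured, forecast).items():
        print(name, round(value, 4))
```

Comparing training algorithms on the same held-out test set with these measures keeps the comparison independent of how the network weights were found.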

