Reinforcement learning using quantum Boltzmann machines

We investigate whether quantum annealers with select chip layouts can outperform classical computers in reinforcement learning tasks. We associate a transverse field Ising spin Hamiltonian with a layout of qubits similar to that of a deep Boltzmann machine (DBM) and use simulated quantum annealing (SQA) to numerically simulate quantum sampling from this system. We design a reinforcement learning algorithm in which the visible nodes representing the states and actions of an optimal policy are the first and last layers of the deep network. In the absence of a transverse field, our simulations show that DBMs are trained more effectively than restricted Boltzmann machines (RBMs) with the same number of nodes. We then develop a framework for training the network as a quantum Boltzmann machine (QBM) in the presence of a significant transverse field for reinforcement learning. This method also outperforms the reinforcement learning method that uses RBMs.
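
To make the phrase "transverse field Ising spin Hamiltonian" concrete, the sketch below builds the dense matrix of such a Hamiltonian for a handful of spins. It is only an illustration: the sign convention H = -sum J_ij s_i s_j - sum h_i s_i - gamma * sum X_i, the function name `tfim_hamiltonian`, and the argument layout are assumptions made here, not details taken from the abstract. With gamma = 0 the matrix is diagonal and reduces to the classical Boltzmann machine energy.

```python
def tfim_hamiltonian(n, couplings, fields, gamma):
    """Dense matrix of a transverse field Ising Hamiltonian (assumed convention):

        H = -sum_{(i,j)} J_ij s_i s_j - sum_i h_i s_i - gamma * sum_i X_i

    written in the computational (sigma^z) basis. Bit i of a basis index b
    encodes spin i: bit 0 means s_i = +1, bit 1 means s_i = -1.
    `couplings` maps index pairs (i, j) to J_ij; `fields` lists the h_i.
    """
    dim = 1 << n
    H = [[0.0] * dim for _ in range(dim)]
    for b in range(dim):
        spins = [1 - 2 * ((b >> i) & 1) for i in range(n)]
        # Diagonal part: classical Ising energy of the spin configuration.
        energy = 0.0
        for (i, j), J in couplings.items():
            energy -= J * spins[i] * spins[j]
        for i, h in enumerate(fields):
            energy -= h * spins[i]
        H[b][b] = energy
        # Off-diagonal part: sigma^x on spin i flips bit i of the basis state.
        for i in range(n):
            H[b][b ^ (1 << i)] -= gamma
    return H


# Two coupled spins with a transverse field of strength 0.5.
H = tfim_hamiltonian(2, {(0, 1): 1.0}, [0.0, 0.0], 0.5)
```

The matrix has 2^n rows, so this construction is only practical for very small n; that exponential growth is one reason sampling methods such as SQA are used instead of exact diagonalization.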

Boltzmann machines with clusters of stochastic binary units

International Journal of Modeling Simulation and Scientific Computing ◽

10.1142/s1793962316500185 ◽

2016 ◽

Vol 07 (02) ◽

pp. 1650018

Author(s):

Da Teng ◽

Zhang Li ◽

Guanghong Gong ◽

Liang Han

Keyword(s):

Gaussian Distribution ◽

Learning Algorithm ◽

Hidden Variables ◽

Recognition Task ◽

Boltzmann Machine ◽

Restricted Boltzmann Machines ◽

Boltzmann Machines ◽

New Variant ◽

Deep Boltzmann Machine ◽

New Learning

The original restricted Boltzmann machines (RBMs) are extended by replacing the binary visible and hidden variables with clusters of binary units, and a new learning algorithm for training deep Boltzmann machines of this new variant is proposed. The sum of the binary units in each cluster is approximated by a Gaussian distribution. Experiments demonstrate that the proposed Boltzmann machines achieve good performance on the MNIST handwritten digit recognition task.
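
The Gaussian approximation mentioned above can be illustrated with a small numerical check. The sketch assumes, purely for illustration, that the n units of a cluster are independent and share one activation probability p, so their sum is binomial with mean n*p and variance n*p*(1-p); the abstract does not state these details, and the helper names are hypothetical.

```python
import math


def binomial_pmf(k, n, p):
    """Exact probability that k of n independent binary units are on."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def gaussian_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def cluster_sum_gaussian(n, p):
    """Mean and variance of the Gaussian that approximates the cluster sum."""
    return n * p, n * p * (1 - p)


# Compare the exact distribution of the sum with its Gaussian approximation.
n, p = 100, 0.3
mean, var = cluster_sum_gaussian(n, p)
max_gap = max(abs(binomial_pmf(k, n, p) - gaussian_pdf(k, mean, var))
              for k in range(n + 1))
```

For a cluster of this size the largest pointwise gap between the two curves is small; the approximation degrades for small clusters or for p close to 0 or 1.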

Automatically Mapped Transfer between Reinforcement Learning Tasks via Three-Way Restricted Boltzmann Machines

Advanced Information Systems Engineering - Lecture Notes in Computer Science ◽

10.1007/978-3-642-40991-2_29 ◽

2013 ◽

pp. 449-464 ◽

Cited By ~ 6

Author(s):

Haitham Bou Ammar ◽

Decebal Constantin Mocanu ◽

Matthew E. Taylor ◽

Kurt Driessens ◽

Karl Tuyls ◽

...

Keyword(s):

Reinforcement Learning ◽

Restricted Boltzmann Machines ◽

Boltzmann Machines ◽

Learning Tasks

Dempster–Shafer Fusion Based on a Deep Boltzmann Machine for Blood Pressure Estimation

Applied Sciences ◽

10.3390/app9010096 ◽

2018 ◽

Vol 9 (1) ◽

pp. 96 ◽

Cited By ~ 3

Author(s):

Soojeong Lee ◽

Joon-Hyuk Chang

Keyword(s):

Blood Pressure ◽

Middle Layer ◽

Upper And Lower Bounds ◽

Boltzmann Machine ◽

Restricted Boltzmann Machines ◽

Boltzmann Machines ◽

Estimation Uncertainty ◽

Deep Boltzmann Machine ◽

Blood Pressure Estimation ◽

Pressure Estimate

We propose a technique using Dempster–Shafer fusion based on a deep Boltzmann machine to classify and estimate systolic blood pressure and diastolic blood pressure categories using oscillometric blood pressure measurements. The deep Boltzmann machine is a state-of-the-art technology in which multiple restricted Boltzmann machines are stacked. Unlike in deep belief networks, each unit in the middle layer of the deep Boltzmann machine receives information from both the layer above and the layer below, which reduces uncertainty at the inference step. Dempster–Shafer fusion can be incorporated to combine independent estimates of the observations, and a confidence increase for a given deep Boltzmann machine estimate can be clearly observed. Our work provides an accurate blood pressure estimate, a blood pressure category with upper and lower bounds, and a solution that can reduce estimation uncertainty. This study is one of the first to use deep Boltzmann machine-based Dempster–Shafer fusion to classify and estimate blood pressure.
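
The confidence increase attributed to Dempster–Shafer fusion can be seen in a minimal sketch of Dempster's rule of combination. This is the textbook rule, not the authors' pipeline: the category labels "normal" and "high" and the mass values are hypothetical, chosen only to show how two independent sources that both favor one category reinforce each other.

```python
def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of hypotheses to its mass.
    Products whose focal sets do not intersect count as conflict and
    are removed by renormalizing the remaining mass.
    """
    combined = {}
    conflict = 0.0
    for b, mass_b in m1.items():
        for c, mass_c in m2.items():
            inter = b & c
            if inter:
                combined[inter] = combined.get(inter, 0.0) + mass_b * mass_c
            else:
                conflict += mass_b * mass_c
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; the rule is undefined")
    return {a: v / (1.0 - conflict) for a, v in combined.items()}


# Hypothetical categories and masses, for illustration only.
NORMAL = frozenset({"normal"})
EITHER = frozenset({"normal", "high"})  # mass left uncommitted
source_a = {NORMAL: 0.6, EITHER: 0.4}
source_b = {NORMAL: 0.7, EITHER: 0.3}
fused = dempster_combine(source_a, source_b)
```

Here the fused mass on "normal" is 0.88, higher than either source alone (0.6 and 0.7), because the uncommitted mass of each source is partly resolved by the other.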

Selective network discovery via deep reinforcement learning on embedded spaces

Applied Network Science ◽

10.1007/s41109-021-00365-8 ◽

2021 ◽

Vol 6 (1) ◽

Author(s):

Peter Morales ◽

Rajmonda Sulo Caceres ◽

Tina Eliassi-Rad

Keyword(s):

Reinforcement Learning ◽

Learning Algorithm ◽

Sequential Decision ◽

Network Discovery ◽

Learning Tasks ◽

Partially Observed ◽

Decision Making Problem ◽

Resource Collection ◽

Improved Performance ◽

Discovery Algorithms

Complex networks are often either too large for full exploration, partially accessible, or partially observed. Downstream learning tasks on these incomplete networks can produce low quality results. In addition, reducing the incompleteness of the network can be costly and nontrivial. As a result, network discovery algorithms optimized for specific downstream learning tasks given resource collection constraints are of great interest. In this paper, we formulate the task-specific network discovery problem as a sequential decision-making problem. Our downstream task is selective harvesting, the optimal collection of vertices with a particular attribute. We propose a framework, called network actor critic (NAC), which learns a policy and notion of future reward in an offline setting via a deep reinforcement learning algorithm. The NAC paradigm utilizes a task-specific network embedding to reduce the state space complexity. A detailed comparative analysis of popular network embeddings is presented with respect to their role in supporting offline planning. Furthermore, a quantitative study is presented on various synthetic and real benchmarks using NAC and several baselines. We show that offline models of reward and network discovery policies lead to significantly improved performance when compared to competitive online discovery algorithms. Finally, we outline learning regimes where planning is critical in addressing sparse and changing reward signals.

Approximate Learning Algorithm in Boltzmann Machines

Neural Computation ◽

10.1162/neco.2009.08-08-844 ◽

2009 ◽

Vol 21 (11) ◽

pp. 3130-3178 ◽

Cited By ~ 25

Author(s):

Muneki Yasuda ◽

Kazuyuki Tanaka

Keyword(s):

Markov Random Fields ◽

Belief Propagation ◽

Learning Algorithm ◽

Learning Algorithms ◽

Mean Field ◽

Spin Model ◽

Field Methods ◽

Approximate Methods ◽

Boltzmann Machines ◽

Ising Spin

Boltzmann machines can be regarded as Markov random fields. For binary cases, they are equivalent to the Ising spin model in statistical mechanics. Learning in Boltzmann machines is an NP-hard problem. Thus, in general, we have to use approximate methods to construct practical learning algorithms in this context. In this letter, we propose new and practical learning algorithms for Boltzmann machines by using the belief propagation algorithm and the linear response approximation, which are often referred to as advanced mean field methods. Finally, we show the validity of our algorithm using numerical experiments.
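
To show why approximate methods are needed, the sketch below compares exact expectations of a small binary Boltzmann machine, computed by enumerating all 2^n spin configurations, with a naive mean-field fixed point. Naive mean field is a simpler stand-in chosen here for brevity; it is not the belief propagation and linear response method the abstract proposes. The energy convention E(s) = -sum h_i s_i - sum_{i<j} J_ij s_i s_j with spins in {-1, +1} and unit temperature is an assumption.

```python
import math
from itertools import product


def exact_magnetizations(J, h):
    """Exact <s_i> by brute-force enumeration; cost grows as 2**n."""
    n = len(h)
    Z = 0.0
    m = [0.0] * n
    for s in product((-1, 1), repeat=n):
        energy = -sum(h[i] * s[i] for i in range(n))
        energy -= sum(J[i][j] * s[i] * s[j]
                      for i in range(n) for j in range(i + 1, n))
        weight = math.exp(-energy)
        Z += weight
        for i in range(n):
            m[i] += weight * s[i]
    return [x / Z for x in m]


def naive_mean_field(J, h, iters=200):
    """Iterate m_i = tanh(h_i + sum_j J_ij m_j); cost per sweep is n**2."""
    n = len(h)
    m = [0.0] * n
    for _ in range(iters):
        m = [math.tanh(h[i] + sum(J[i][j] * m[j] for j in range(n) if j != i))
             for i in range(n)]
    return m


# Weakly coupled three-spin model; J is symmetric with a zero diagonal.
J = [[0.0, 0.1, 0.1], [0.1, 0.0, 0.1], [0.1, 0.1, 0.0]]
h = [0.5, -0.2, 0.1]
exact = exact_magnetizations(J, h)
approx = naive_mean_field(J, h)
```

With no couplings the two results coincide, and for weak couplings they stay close. The gap widens as couplings grow, which is the regime that motivates more refined approximations.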

Reward-Free Reinforcement Learning Algorithm Using Prediction Network

Fuzzy Systems and Data Mining VI - Frontiers in Artificial Intelligence and Applications ◽

10.3233/faia200744 ◽

2020 ◽

Author(s):

Zhen Yu ◽

Yimin Feng ◽

Lijun Liu

Keyword(s):

Reinforcement Learning ◽

Learning Algorithm ◽

Value Functions ◽

Learning Method ◽

Reward Function ◽

Network Training ◽

Learning Tasks ◽

Reward Value ◽

Policy Gradient ◽

Reward Functions

In general reinforcement learning tasks, formulating the reward function is a very important step. In many systems, however, the reward function is not easy to formulate. Network training is sensitive to the reward function, and different reward functions lead to different results. For a class of systems that meet specific conditions, the traditional reinforcement learning method is improved. A state quantity function is designed to replace the reward function, which is more efficient than the traditional reward function. At the same time, a prediction network is designed so that the network can learn the value of general states from special states. The overall network structure builds on the Deep Deterministic Policy Gradient (DDPG) algorithm. Finally, the algorithm was successfully applied in the FrozenLake environment and achieved good performance. The experiment demonstrates the effectiveness of the algorithm and realizes reward-free reinforcement learning in a class of systems.

Learning algorithm in restricted Boltzmann machines using Kullback-Leibler importance estimation procedure

Nonlinear Theory and Its Applications IEICE ◽

10.1587/nolta.2.153 ◽

2011 ◽

Vol 2 (2) ◽

pp. 153-164 ◽

Cited By ~ 1

Author(s):

Muneki Yasuda ◽

Tetsuharu Sakurai ◽

Kazuyuki Tanaka

Keyword(s):

Learning Algorithm ◽

Estimation Procedure ◽

Restricted Boltzmann Machines ◽

Boltzmann Machines

Approximate Learning Algorithm for Restricted Boltzmann Machines

2008 International Conference on Computational Intelligence for Modelling Control & Automation ◽

10.1109/cimca.2008.57 ◽

2008 ◽

Cited By ~ 3

Author(s):

Muneki Yasuda ◽

Kazuyuki Tanaka

Keyword(s):

Learning Algorithm ◽

Restricted Boltzmann Machines ◽

Boltzmann Machines

Mode-assisted joint training of deep Boltzmann machines

Scientific Reports ◽

10.1038/s41598-021-98404-y ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Haik Manukian ◽

Massimiliano Di Ventra

Keyword(s):

Probability Distributions ◽

Great Success ◽

Boltzmann Machine ◽

Data Set ◽

Boltzmann Machines ◽

Compact Representations ◽

Hardware Implementations ◽

Deep Boltzmann Machine ◽

Unsupervised Training ◽

Performance Gains

The deep extension of the restricted Boltzmann machine (RBM), known as the deep Boltzmann machine (DBM), is an expressive family of machine learning models which can serve as compact representations of complex probability distributions. However, jointly training DBMs in the unsupervised setting has proven to be a formidable task. A recent technique we have proposed, called mode-assisted training, has shown great success in improving the unsupervised training of RBMs. Here, we show that the performance gains of mode-assisted training are even more dramatic for DBMs. In fact, DBMs jointly trained with the mode-assisted algorithm can represent the same data set with orders of magnitude fewer total parameters compared to state-of-the-art training procedures and even with respect to RBMs, provided a fan-in network topology is also introduced. This substantial saving in the number of parameters also makes this training method very appealing for hardware implementations.

Analyzing joint brand purchases by conditional restricted Boltzmann machines

Review of Managerial Science ◽

10.1007/s11846-021-00478-5 ◽

2021 ◽

Author(s):

Harald Hruschka

Keyword(s):

Hidden Variables ◽

Restricted Boltzmann Machine ◽

Boltzmann Machine ◽

Restricted Boltzmann Machines ◽

Market Basket ◽

Boltzmann Machines ◽

Independent Variables ◽

Product Categories ◽

Pseudo Likelihood ◽

Marketing Variables

We introduce the conditional restricted Boltzmann machine as a method to analyze brand-level market basket data of individual households. The conditional restricted Boltzmann machine includes marketing variables and household attributes as independent variables. To our knowledge, this is the first study comparing the conditional restricted Boltzmann machine to homogeneous and heterogeneous multivariate logit models for brand-level market basket data across several product categories. We explain how to estimate the conditional restricted Boltzmann machine starting from a restricted Boltzmann machine without independent variables. The conditional restricted Boltzmann machine turns out to outperform all the other investigated models in terms of log pseudo-likelihood for holdout data. We interpret the selected conditional restricted Boltzmann machine based on coefficients linking purchases to hidden variables, interdependences between brand pairs, and own and cross effects of marketing variables. The conditional restricted Boltzmann machine indicates pairwise relationships between brands that are more varied than those of the multivariate logit model. Based on the pairwise interdependences inferred from the restricted Boltzmann machine, we determine the competitive structure of brands by means of cluster analysis. Using counterfactual simulations, we investigate what three different models (independent logit, heterogeneous multivariate logit, conditional restricted Boltzmann machine) imply with respect to the retailer’s revenue if each brand is put on display. Finally, we mention possibilities for further research, such as applying the conditional restricted Boltzmann machine to other areas in marketing or retailing.
