Restricted Boltzmann Machines as Models of Interacting Variables

2021, pp. 1-36
Author(s): Nicola Bulso, Yasser Roudi

We study the types of distributions that restricted Boltzmann machines (RBMs) with different activation functions can express by investigating the effect of the hidden-node activation function on the marginal distribution the RBM imposes on observed binary nodes. We report an exact expression for these marginals in the form of a model of interacting binary variables, with the explicit form of the interactions depending on the hidden-node activation function. We study the properties of these interactions in detail and evaluate how the accuracy with which the RBM approximates distributions over binary variables depends on the hidden-node activation function and the number of hidden nodes. When the inferred RBM parameters are weak, an intuitive pattern emerges in the expression of the interaction terms, which substantially reduces the differences across activation functions. We show that this weak-parameter approximation holds well for different RBMs trained on the MNIST data set. Interestingly, in these cases, the mapping reveals that the inferred models are essentially low-order interaction models.
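To see the mapping concretely: integrating out the hidden layer of an RBM with binary visible units leaves log P(v) = a·v + Σ_j F(b_j + W[:, j]·v) − log Z, where F is fixed by the hidden-unit type, and it is exactly this F that encodes the effective interactions among the visible nodes. The following minimal Python sketch (the function names and tiny toy model are ours, not the authors') enumerates the exact marginal of a small RBM for two common hidden-unit choices:

```python
import numpy as np

# Integrating out the hidden layer of an RBM over binary visible units v gives
#     log P(v) = a.v + sum_j F(b_j + W[:, j].v) - log Z,
# where F depends on the hidden-unit type; F encodes the effective interactions.
HIDDEN_F = {
    "bernoulli": lambda x: np.logaddexp(0.0, x),  # softplus: log(1 + e^x)
    "gaussian":  lambda x: 0.5 * x ** 2,          # unit-variance Gaussian units
}

def unnormalized_log_marginal(v, a, b, W, hidden="bernoulli"):
    """log P(v) up to the constant log Z, for a binary visible vector v."""
    F = HIDDEN_F[hidden]
    return a @ v + F(b + v @ W).sum()

# Toy check: enumerate all 2^4 visible states of a tiny RBM and normalize.
rng = np.random.default_rng(0)
n_vis, n_hid = 4, 3
a, b = rng.normal(size=n_vis), rng.normal(size=n_hid)
W = rng.normal(scale=0.5, size=(n_vis, n_hid))
states = np.array([[(s >> i) & 1 for i in range(n_vis)] for s in range(2 ** n_vis)])
log_p = np.array([unnormalized_log_marginal(v, a, b, W) for v in states])
p = np.exp(log_p - log_p.max())
p /= p.sum()
print(p)  # exact marginal over the 16 visible configurations
```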

Author(s): Hüseyin Gürbüz

Activation functions are among the most significant properties of artificial neural networks (ANNs) because they directly determine the network's ability to learn or model a system or function. Moreover, the choice of an optimal activation function is directly related to the level of success an ANN can achieve. In this experimental study, the effects of different types of wire electrodes, cooling techniques and workpiece materials on surface roughness (Ra) and cutting speed (Vc) in wire electrical discharge machining (WEDM) were investigated by modelling them in ANNs with trainable activation functions (AFt). A number of methods were applied to the data set to predict the Ra and Vc results as accurately as possible. Among these methods, a randomized ANN with AFt was found to give the most robust predictions according to RMSE values: 0.280 for Vc and 0.2104 for Ra. The optimum activation functions for Ra and Vc were found to be first- and third-degree trainable functions, respectively.
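The abstract does not spell out how the trainable activation functions (AFt) are parameterized, but the mention of first- and third-degree optima suggests polynomials with learnable coefficients. Purely as an illustration of that idea, here is a hedged PyTorch sketch of a degree-d trainable activation; the class name, layer sizes and input encoding are our assumptions:

```python
import torch
import torch.nn as nn

class TrainablePoly(nn.Module):
    """Hypothetical trainable activation f(x) = sum_k c_k * x^k.

    `degree` selects the polynomial order; the abstract reports first- and
    third-degree trainable functions as optimal for Ra and Vc respectively.
    """
    def __init__(self, degree: int):
        super().__init__()
        self.coeffs = nn.Parameter(0.1 * torch.randn(degree + 1))

    def forward(self, x):
        powers = torch.stack([x ** k for k in range(len(self.coeffs))], dim=-1)
        return powers @ self.coeffs

# Illustrative regressor: three encoded process inputs (wire electrode,
# cooling technique, workpiece material) -> one predicted output (Ra or Vc).
model = nn.Sequential(nn.Linear(3, 16), TrainablePoly(degree=3), nn.Linear(16, 1))
print(model(torch.randn(8, 3)).shape)  # torch.Size([8, 1])
```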


2021, Vol 2122 (1), pp. 012005
Author(s): M.A. Novotný, Yaroslav Koshka, G. Inkoonv, Vivek Dixit

Abstract The design of a sixty-four-bit quantum dragon data set is presented, together with examples. A quantum dragon is a tight-binding model of a strongly disordered nanodevice that, when connected to appropriate semi-infinite leads, has complete electron transmission over a finite interval of energies. The labeled data set contains records that are quantum dragons, records that are not, and records that are indeterminate. The data set is designed so that it is difficult for both trained humans and machines to label a nanodevice with regard to its quantum dragon property. The 64-bit record length allows the data set to be used with restricted Boltzmann machines, which fit well onto the D-Wave 2000Q quantum annealer architecture.
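As a small practical note on the 64-bit record length: each record can be unpacked into one binary value per visible unit of a 64-node RBM. The helper names and bit ordering below are our assumptions, not part of the data set's specification:

```python
import numpy as np

def record_to_visible(record: int) -> np.ndarray:
    """Unpack a 64-bit record into a length-64 binary vector, one visible
    RBM unit per bit (LSB-first ordering is our assumption)."""
    return np.array([(record >> i) & 1 for i in range(64)], dtype=np.int8)

def visible_to_record(v: np.ndarray) -> int:
    """Inverse mapping from a binary vector back to a 64-bit integer."""
    return sum(int(bit) << i for i, bit in enumerate(v))

v = record_to_visible(0xDEADBEEFCAFEBABE)
assert visible_to_record(v) == 0xDEADBEEFCAFEBABE
```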


2019, Vol 12 (3), pp. 156-161
Author(s): Aman Dureja, Payal Pahwa

Background: Activation functions play an important role in the design of deep neural networks, and their choice affects both optimization and the quality of the results. Several activation functions have been introduced in machine learning for practical applications, but which activation function should be used in the hidden layers of deep neural networks has not been established. Objective: The primary objective of this analysis was to identify which activation function should be used in the hidden layers of deep neural networks to solve complex non-linear problems. Methods: The comparative model was configured using a two-class (Cat/Dog) data set. The network used 3 convolutional layers, each followed by a pooling layer. The data set was divided into two parts: the first 8000 images were used for training the network and the remaining 2000 images for testing it. Results: The experimental comparison was performed by analyzing the network with different activation functions (ReLU, Tanh, SELU, PReLU, ELU) in the hidden layers of the CNN, recording the validation error and accuracy on the Cat/Dog data set. Overall, ReLU gave the best performance, with a validation loss of 0.3912 and a validation accuracy of 0.8320 at the 25th epoch. Conclusion: A CNN model with ReLU in its hidden layers (3 hidden layers here) gives the best results and improves overall performance in terms of both accuracy and speed. These advantages of ReLU across the hidden layers support effective and fast retrieval of images from databases.
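A configuration like the one compared here is straightforward to reproduce. The following PyTorch sketch builds the described three-convolutional-layer network with a pooling layer after each convolution and a selectable hidden activation; channel counts and kernel sizes are our assumptions, since the abstract does not list them:

```python
import torch
import torch.nn as nn

ACTS = {"relu": nn.ReLU, "tanh": nn.Tanh, "selu": nn.SELU,
        "prelu": nn.PReLU, "elu": nn.ELU}

def make_cnn(act: str = "relu", n_classes: int = 2) -> nn.Sequential:
    """Three conv + pool blocks with a selectable hidden activation."""
    A = ACTS[act]
    return nn.Sequential(
        nn.Conv2d(3, 32, 3, padding=1), A(), nn.MaxPool2d(2),
        nn.Conv2d(32, 64, 3, padding=1), A(), nn.MaxPool2d(2),
        nn.Conv2d(64, 128, 3, padding=1), A(), nn.MaxPool2d(2),
        nn.Flatten(),
        nn.LazyLinear(n_classes),  # two outputs for the Cat/Dog classes
    )

print(make_cnn("elu")(torch.randn(1, 3, 64, 64)).shape)  # torch.Size([1, 2])
```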


Author(s): Volodymyr Shymkovych, Sergii Telenyk, Petro Kravets

Abstract This article introduces a method for realizing the Gaussian activation function of radial basis function (RBF) neural networks in hardware on field-programmable gate arrays (FPGAs). The results of modelling the Gaussian function on FPGA chips of different families are presented, and RBF neural networks of various topologies have been synthesized and investigated. The hardware component implemented by this algorithm is an RBF neural network with four hidden-layer neurons and one output neuron with a sigmoid activation function, realized on an FPGA using 16-bit fixed-point numbers and occupying 1193 look-up tables (LUTs). Each hidden-layer neuron of the RBF network is designed on the FPGA as a separate computing unit. The speed, measured as the total delay of the network's combinational circuit, was 101.579 ns. The implementation of the Gaussian activation functions of the hidden layer occupies 106 LUTs, with a delay of 29.33 ns and an absolute error of ±0.005. The Spartan-3 family of chips was used to obtain these results; modelling on chips of other series is also presented in the article. Hardware implementations of RBF neural networks with such speeds can be used in real-time control systems for high-speed objects.
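The hardware details are FPGA-specific, but the core idea of evaluating exp(−x²) in 16-bit fixed point via a precomputed table can be prototyped in software. The sketch below is our approximation of such a scheme: the Q4.12 format, table size, and input range are assumptions, chosen so the absolute error lands near the ±0.005 the article reports:

```python
import numpy as np

FRAC_BITS = 12                  # assumed Q4.12 split of the 16-bit word
SCALE = 1 << FRAC_BITS
X_MAX = 4.0                     # exp(-x^2) is effectively 0 beyond |x| = 4
N_ENTRIES = 512                 # table size is an assumption

# Quantized samples of exp(-x^2), standing in for the FPGA table contents.
_xs = np.linspace(0.0, X_MAX, N_ENTRIES)
_table = np.round(np.exp(-_xs ** 2) * SCALE).astype(np.int32)

def gaussian_fixed(x_fixed: int) -> int:
    """exp(-x^2) for a Q4.12 fixed-point input via nearest-entry lookup."""
    x = abs(x_fixed) / SCALE                        # the Gaussian is even
    idx = min(round(x / X_MAX * (N_ENTRIES - 1)), N_ENTRIES - 1)
    return int(_table[idx])

# Compare against floating point across the domain.
xs = np.linspace(-X_MAX, X_MAX, 2001)
errs = [abs(gaussian_fixed(round(x * SCALE)) / SCALE - np.exp(-x ** 2)) for x in xs]
print(max(errs))  # a few thousandths, in the range of the reported ±0.005
```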


2021, Vol 11 (15), pp. 6704
Author(s): Jingyong Cai, Masashi Takemoto, Yuming Qiu, Hironori Nakajo

Despite being heavily used in the training of deep neural networks (DNNs), multipliers are resource-intensive and often in short supply. Previous work has shown the benefit of computing activation functions such as the sigmoid with shift-and-add operations, although these approaches fail to remove multiplications from training altogether. In this paper, we propose an approach that converts all multiplications in the forward and backward passes of DNNs into shift-and-add operations. Because the model parameters and backpropagated errors of a large DNN model are typically clustered around zero, these values can be approximated by their sine values. Multiplications between the weights and error signals are then transformed into multiplications of their sine values, which can be replaced with simpler operations via the product-to-sum formula. In addition, a rectified sine activation function is used to convert layer inputs into sine values as well. In this way, the original multiplication-intensive operations can be computed through simple shift-and-add operations. This trigonometric approximation method provides an efficient training and inference alternative for devices with insufficient hardware multipliers. Experimental results demonstrate that the method achieves performance close to that of classical training algorithms. The proposed approach sheds new light on future hardware-customization research for machine learning.
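The mathematical identity at the heart of the method is easy to verify numerically. For values clustered near zero, x ≈ sin(x), so a product w·e can be approximated by sin(w)·sin(e) = ½[cos(w−e) − cos(w+e)], which needs only additions, a halving (one shift), and cosine evaluations. The NumPy sketch below checks the approximation; it abstracts away the paper's hardware-level shift-and-add realization:

```python
import numpy as np

def sine_mul(w, e):
    """Approximate w * e for values clustered near zero.

    With x ≈ sin(x) for small x, the product-to-sum identity
        sin(w) * sin(e) = 0.5 * (cos(w - e) - cos(w + e))
    turns the multiplication into additions, a halving (one shift), and
    cosine lookups; the hardware-level realization is abstracted here.
    """
    return 0.5 * (np.cos(w - e) - np.cos(w + e))

rng = np.random.default_rng(0)
w = rng.normal(scale=0.05, size=1000)  # weight-like values near zero
e = rng.normal(scale=0.05, size=1000)  # backpropagated-error-like values
print(np.max(np.abs(sine_mul(w, e) - w * e)))  # tiny, since sin(x) ≈ x here
```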


2021, Vol 14 (1)
Author(s): Martine De Cock, Rafael Dowsley, Anderson C. A. Nascimento, Davis Railsback, Jianwei Shen, ...

Abstract Background: In biomedical applications, valuable data is often split between owners who cannot openly share it because of privacy regulations and concerns. Training machine learning models on the joint data without violating privacy is a major technological challenge that can be addressed by combining techniques from machine learning and cryptography. When machine learning models are trained collaboratively with the cryptographic technique known as secure multi-party computation, the price paid for keeping the owners' data private is an increase in computational cost and runtime. A careful choice of machine learning techniques and of algorithmic and implementation optimizations is necessary to enable practical secure machine learning over distributed data sets, and such optimizations can be tailored to the kind of data and machine learning problem at hand. Methods: Our setup involves secure two-party computation protocols, along with a trusted initializer that distributes correlated randomness to the two computing parties. We use a gradient-descent-based algorithm for training a logistic-regression-like model with a clipped ReLU activation function, and we break the algorithm down into corresponding cryptographic protocols. Our main contributions are a new protocol for computing the activation function that requires neither secure comparison protocols nor Yao's garbled circuits, and a series of cryptographic engineering optimizations to improve performance. Results: For our largest gene expression data set, we train a model that requires over 7 billion secure multiplications; the training completes in about 26.90 s on a local area network. The implementation in this work is a further optimized version of the implementation with which we won first place in Track 4 of the iDASH 2019 secure genome analysis competition. Conclusions: In this paper, we present a secure logistic regression training protocol and its implementation, with a new subprotocol to securely compute the activation function. To the best of our knowledge, this is the fastest existing secure multi-party computation implementation for training logistic regression models on high-dimensional genome data distributed across a local area network.
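The paper's new activation-function protocol is its own contribution, but the trusted-initializer pattern it builds on can be illustrated with the standard Beaver-triple multiplication over additive secret shares, the primitive behind the billions of secure multiplications mentioned above. The following Python sketch is a plain, non-hardened illustration; the modulus and function names are our choices:

```python
import secrets

P = 2 ** 61 - 1  # prime modulus for additive secret sharing (illustrative)

def share(x):
    """Split x into two additive shares modulo P."""
    r = secrets.randbelow(P)
    return r, (x - r) % P

def trusted_init():
    """Trusted initializer: deal out shares of correlated randomness a*b = c."""
    a, b = secrets.randbelow(P), secrets.randbelow(P)
    return share(a), share(b), share((a * b) % P)

def secure_mul(x_sh, y_sh):
    """One Beaver-triple multiplication of secret-shared x and y.

    The parties jointly open only d = x - a and e = y - b, which are
    uniformly random and reveal nothing about x or y individually.
    """
    (a0, a1), (b0, b1), (c0, c1) = trusted_init()
    x0, x1 = x_sh
    y0, y1 = y_sh
    d = (x0 - a0 + x1 - a1) % P
    e = (y0 - b0 + y1 - b1) % P
    z0 = (c0 + d * b0 + e * a0 + d * e) % P  # party 0 adds the public d*e term
    z1 = (c1 + d * b1 + e * a1) % P
    return z0, z1

z0, z1 = secure_mul(share(42), share(99))
assert (z0 + z1) % P == 42 * 99
```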


2021, Vol 11 (1)
Author(s): Guanglei Xu, William S. Oates

Abstract Restricted Boltzmann machines (RBMs) have been proposed for developing neural networks for a variety of unsupervised machine learning applications such as image recognition, drug discovery, and materials design. The Boltzmann probability distribution is used as a model to identify network parameters by optimizing the likelihood of predicting an output given hidden states trained on available data. Training such networks often requires sampling over a large probability space that must be approximated during gradient-based optimization. Quantum annealing has been proposed as a means to search this space more efficiently, and it has been investigated experimentally on D-Wave hardware. The D-Wave implementation requires selecting an effective inverse temperature, or hyperparameter β, within the Boltzmann distribution, which can strongly influence optimization. Here, we show how this parameter can be estimated as a hyperparameter applied to D-Wave hardware during neural network training by maximizing the likelihood or minimizing the Shannon entropy. We find that both methods improve the training of RBMs, based on experimental validation on D-Wave hardware with an image recognition problem. Neural network image reconstruction errors are evaluated using Bayesian uncertainty analysis, which shows an image reconstruction error more than an order of magnitude lower when the hyperparameter is set by maximum likelihood than when it is optimized manually. The maximum likelihood method is also shown to outperform minimizing the Shannon entropy for image reconstruction.
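To see what maximum-likelihood estimation of the effective inverse temperature looks like in miniature: for a model small enough to enumerate, the per-sample log-likelihood is −βE_s − log Z(β), and β is chosen to maximize it. The toy NumPy/SciPy sketch below recovers a known β from samples; on D-Wave-scale RBMs, where Z is intractable, the paper's actual procedure necessarily works differently, so this is only a conceptual illustration:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def fit_beta(energies_all, energy_samples):
    """Maximum-likelihood effective inverse temperature for a toy model.

    With every state energy enumerable, the average negative log-likelihood
    of the samples is beta * mean(E_samples) + log Z(beta); minimize it.
    """
    def neg_log_lik(beta):
        log_z = np.logaddexp.reduce(-beta * energies_all)
        return beta * energy_samples.mean() + log_z
    return minimize_scalar(neg_log_lik, bounds=(1e-3, 10.0), method="bounded").x

# Toy check: draw samples at a known beta and recover it.
rng = np.random.default_rng(0)
E = rng.normal(size=64)                 # energies of an enumerable toy model
beta_true = 2.0
p = np.exp(-beta_true * E)
p /= p.sum()
samples = rng.choice(E, size=5000, p=p)
print(fit_beta(E, samples))             # close to 2.0
```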

