Multimodal transistors as ReLU activation functions in physical neural network classifiers

AbstractArtificial neural networks (ANNs) providing sophisticated, power-efficient classification are finding their way into thin-film electronics. Thin-film technologies require robust, layout-efficient devices with facile manufacturability. Here, we show how the multimodal transistor’s (MMT’s) transfer characteristic, with linear dependence in saturation, replicates the rectified linear unit (ReLU) activation function of convolutional ANNs (CNNs). Using MATLAB, we evaluate CNN performance using systematically distorted ReLU functions, then substitute measured and simulated MMT transfer characteristics as proxies for ReLU. High classification accuracy is maintained, despite large variations in geometrical and electrical parameters, as CNNs use the same activation functions for training and classification.

Download Full-text

Flatten-T Swish: a thresholded ReLU-Swish-like activation function for deep learning

International Journal of Advances in Intelligent Informatics ◽

10.26555/ijain.v4i2.249 ◽

2018 ◽

Vol 4 (2) ◽

pp. 76 ◽

Cited By ~ 2

Author(s):

Hock Hung Chieng ◽

Noorhaniza Wahid ◽

Ong Pauline ◽

Sai Raj Kishore Perla

Keyword(s):

Deep Learning ◽

Learning Community ◽

Classification Accuracy ◽

Deep Neural Network ◽

Threshold Value ◽

Activation Function ◽

Activation Functions ◽

Rectified Linear Unit ◽

Overall Performance ◽

Fully Connected

Activation functions are essential for deep learning methods to learn and perform complex tasks such as image classification. Rectified Linear Unit (ReLU) has been widely used and become the default activation function across the deep learning community since 2012. Although ReLU has been popular, however, the hard zero property of the ReLU has heavily hindering the negative values from propagating through the network. Consequently, the deep neural network has not been benefited from the negative representations. In this work, an activation function called Flatten-T Swish (FTS) that leverage the benefit of the negative values is proposed. To verify its performance, this study evaluates FTS with ReLU and several recent activation functions. Each activation function is trained using MNIST dataset on five different deep fully connected neural networks (DFNNs) with depth vary from five to eight layers. For a fair evaluation, all DFNNs are using the same configuration settings. Based on the experimental results, FTS with a threshold value, T=-0.20 has the best overall performance. As compared with ReLU, FTS (T=-0.20) improves MNIST classification accuracy by 0.13%, 0.70%, 0.67%, 1.07% and 1.15% on wider 5 layers, slimmer 5 layers, 6 layers, 7 layers and 8 layers DFNNs respectively. Apart from this, the study also noticed that FTS converges twice as fast as ReLU. Although there are other existing activation functions are also evaluated, this study elects ReLU as the baseline activation function.

Download Full-text

Maxout Networks for Visual Recognition

International Journal of Multimedia Data Engineering and Management ◽

10.4018/ijmdem.2019100101 ◽

2019 ◽

Vol 10 (4) ◽

pp. 1-25

Author(s):

Gabriel Castaneda ◽

Paul Morris ◽

Taghi M Khoshgoftaar

Keyword(s):

Neural Networks ◽

Visual Recognition ◽

Activation Function ◽

Entity Recognition ◽

Activation Functions ◽

Hyperbolic Tangent ◽

Rectified Linear Unit ◽

Facial Identification ◽

Leaky Rectified Linear Unit ◽

Better Than

This study investigates the effectiveness of multiple maxout activation variants on image classification, facial identification and verification tasks using convolutional neural networks. A network with maxout activation has a higher number of trainable parameters compared to networks with traditional activation functions. However, it is not clear if the activation function itself or the increase in the number of trainable parameters is responsible for yielding the best performance on different entity recognition tasks. This article investigates if an increase in the number of convolutional filters on the rectified linear unit activation performs equal-to or better-than maxout networks. Our experiments compare rectified linear unit, leaky rectified linear unit, scaled exponential linear unit, and hyperbolic tangent to four maxout variants. Throughout the experiments, we found that on average, across all datasets, the rectified linear unit networks perform better than any maxout activation when the number of convolutional filters is increased six times.

Download Full-text

Analysis of Non-Linear Activation Functions for Classification Tasks Using Convolutional Neural Networks

Recent Patents on Computer Science ◽

10.2174/2213275911666181025143029 ◽

2019 ◽

Vol 12 (3) ◽

pp. 156-161 ◽

Cited By ~ 3

Author(s):

Aman Dureja ◽

Payal Pahwa

Keyword(s):

Neural Networks ◽

Deep Neural Networks ◽

Activation Function ◽

Primary Objective ◽

Experimental Comparison ◽

Activation Functions ◽

Practical Applications ◽

Network Activation ◽

Non Linear ◽

Hidden Layer

Background: In making the deep neural network, activation functions play an important role. But the choice of activation functions also affects the network in term of optimization and to retrieve the better results. Several activation functions have been introduced in machine learning for many practical applications. But which activation function should use at hidden layer of deep neural networks was not identified. Objective: The primary objective of this analysis was to describe which activation function must be used at hidden layers for deep neural networks to solve complex non-linear problems. Methods: The configuration for this comparative model was used by using the datasets of 2 classes (Cat/Dog). The number of Convolutional layer used in this network was 3 and the pooling layer was also introduced after each layer of CNN layer. The total of the dataset was divided into the two parts. The first 8000 images were mainly used for training the network and the next 2000 images were used for testing the network. Results: The experimental comparison was done by analyzing the network by taking different activation functions on each layer of CNN network. The validation error and accuracy on Cat/Dog dataset were analyzed using activation functions (ReLU, Tanh, Selu, PRelu, Elu) at number of hidden layers. Overall the Relu gave best performance with the validation loss at 25th Epoch 0.3912 and validation accuracy at 25th Epoch 0.8320. Conclusion: It is found that a CNN model with ReLU hidden layers (3 hidden layers here) gives best results and improve overall performance better in term of accuracy and speed. These advantages of ReLU in CNN at number of hidden layers are helpful to effectively and fast retrieval of images from the databases.

Download Full-text

Hardware implementation of radial-basis neural networks with Gaussian activation functions on FPGA

Neural Computing and Applications ◽

10.1007/s00521-021-05706-3 ◽

2021 ◽

Author(s):

Volodymyr Shymkovych ◽

Sergii Telenyk ◽

Petro Kravets

Keyword(s):

Neural Networks ◽

Hardware Implementation ◽

Gaussian Function ◽

Activation Function ◽

Rbf Neural Networks ◽

Activation Functions ◽

Rbf Network ◽

Combination Scheme ◽

Radial Basis ◽

Hidden Layer

AbstractThis article introduces a method for realizing the Gaussian activation function of radial-basis (RBF) neural networks with their hardware implementation on field-programmable gaits area (FPGAs). The results of modeling of the Gaussian function on FPGA chips of different families have been presented. RBF neural networks of various topologies have been synthesized and investigated. The hardware component implemented by this algorithm is an RBF neural network with four neurons of the latent layer and one neuron with a sigmoid activation function on an FPGA using 16-bit numbers with a fixed point, which took 1193 logic matrix gate (LUTs—LookUpTable). Each hidden layer neuron of the RBF network is designed on an FPGA as a separate computing unit. The speed as a total delay of the combination scheme of the block RBF network was 101.579 ns. The implementation of the Gaussian activation functions of the hidden layer of the RBF network occupies 106 LUTs, and the speed of the Gaussian activation functions is 29.33 ns. The absolute error is ± 0.005. The Spartan 3 family of chips for modeling has been used to get these results. Modeling on chips of other series has been also introduced in the article. RBF neural networks of various topologies have been synthesized and investigated. Hardware implementation of RBF neural networks with such speed allows them to be used in real-time control systems for high-speed objects.

Download Full-text

Trigonometric Inference Providing Learning in Deep Neural Networks

Applied Sciences ◽

10.3390/app11156704 ◽

2021 ◽

Vol 11 (15) ◽

pp. 6704

Author(s):

Jingyong Cai ◽

Masashi Takemoto ◽

Yuming Qiu ◽

Hironori Nakajo

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Deep Neural Networks ◽

Activation Function ◽

Trigonometric Approximation ◽

Model Parameters ◽

Training Algorithms ◽

Activation Functions ◽

Classical Training ◽

Sum Formula

Despite being heavily used in the training of deep neural networks (DNNs), multipliers are resource-intensive and insufficient in many different scenarios. Previous discoveries have revealed the superiority when activation functions, such as the sigmoid, are calculated by shift-and-add operations, although they fail to remove multiplications in training altogether. In this paper, we propose an innovative approach that can convert all multiplications in the forward and backward inferences of DNNs into shift-and-add operations. Because the model parameters and backpropagated errors of a large DNN model are typically clustered around zero, these values can be approximated by their sine values. Multiplications between the weights and error signals are transferred to multiplications of their sine values, which are replaceable with simpler operations with the help of the product to sum formula. In addition, a rectified sine activation function is utilized for further converting layer inputs into sine values. In this way, the original multiplication-intensive operations can be computed through simple add-and-shift operations. This trigonometric approximation method provides an efficient training and inference alternative for devices with insufficient hardware multipliers. Experimental results demonstrate that this method is able to obtain a performance close to that of classical training algorithms. The approach we propose sheds new light on future hardware customization research for machine learning.

Download Full-text

A visual terrain classification method for mobile robots’ navigation based on convolutional neural network and support vector machine

Transactions of the Institute of Measurement and Control ◽

10.1177/0142331220987917 ◽

2021 ◽

pp. 014233122098791

Author(s):

Wanli Wang ◽

Botao Zhang ◽

Kaiqi Wu ◽

Sergey A Chepinskiy ◽

Anton A Zhilenkov ◽

...

Keyword(s):

Neural Network ◽

Support Vector Machine ◽

Mobile Robots ◽

Convolutional Neural Network ◽

Hybrid Method ◽

Classification Accuracy ◽

Support Vector ◽

High Classification Accuracy ◽

Enhance Efficiency ◽

Multi Class Classification

In this paper, a hybrid method based on deep learning is proposed to visually classify terrains encountered by mobile robots. Considering the limited computing resource on mobile robots and the requirement for high classification accuracy, the proposed hybrid method combines a convolutional neural network with a support vector machine to keep a high classification accuracy while improve work efficiency. The key idea is that the convolutional neural network is used to finish a multi-class classification and simultaneously the support vector machine is used to make a two-class classification. The two-class classification performed by the support vector machine is aimed at one kind of terrain that users are mostly concerned with. Results of the two classifications will be consolidated to get the final classification result. The convolutional neural network used in this method is modified for the on-board usage of mobile robots. In order to enhance efficiency, the convolutional neural network has a simple architecture. The convolutional neural network and the support vector machine are trained and tested by using RGB images of six kinds of common terrains. Experimental results demonstrate that this method can help robots classify terrains accurately and efficiently. Therefore, the proposed method has a significant potential for being applied to the on-board usage of mobile robots.

Download Full-text

Privacy-Preserving Deep Neural Network Methods: Computational and Perceptual Methods—An Overview

Electronics ◽

10.3390/electronics10111367 ◽

2021 ◽

Vol 10 (11) ◽

pp. 1367

Author(s):

Raghida El El Saj ◽

Ehsan Sedgh Sedgh Gooya ◽

Ayman Alfalou ◽

Mohamad Khalil

Keyword(s):

Neural Networks ◽

Classification Accuracy ◽

Deep Neural Networks ◽

Privacy Preserving ◽

Sensitive Data ◽

Encrypted Data ◽

High Classification Accuracy ◽

Hide Information ◽

Network Methods ◽

Cloud Environments

Privacy-preserving deep neural networks have become essential and have attracted the attention of many researchers due to the need to maintain the privacy and the confidentiality of personal and sensitive data. The importance of privacy-preserving networks has increased with the widespread use of neural networks as a service in unsecured cloud environments. Different methods have been proposed and developed to solve the privacy-preserving problem using deep neural networks on encrypted data. In this article, we reviewed some of the most relevant and well-known computational and perceptual image encryption methods. These methods as well as their results have been presented, compared, and the conditions of their use, the durability and robustness of some of them against attacks, have been discussed. Some of the mentioned methods have demonstrated an ability to hide information and make it difficult for adversaries to retrieve it while maintaining high classification accuracy. Based on the obtained results, it was suggested to develop and use some of the cited privacy-preserving methods in applications other than classification.

Download Full-text

BeautyNet: Joint Multiscale CNN and Transfer Learning Method for Unconstrained Facial Beauty Prediction

Computational Intelligence and Neuroscience ◽

10.1155/2019/1910624 ◽

2019 ◽

Vol 2019 ◽

pp. 1-14 ◽

Cited By ~ 4

Author(s):

Yikui Zhai ◽

He Cao ◽

Wenbo Deng ◽

Junying Gan ◽

Vincenzo Piuri ◽

...

Keyword(s):

Transfer Learning ◽

Classification Accuracy ◽

Learning Strategy ◽

State Of The Art ◽

Activation Function ◽

Training Data ◽

Fine Grained ◽

Pattern Recognition Problem ◽

Face Features ◽

Facial Beauty

Because of the lack of discriminative face representations and scarcity of labeled training data, facial beauty prediction (FBP), which aims at assessing facial attractiveness automatically, has become a challenging pattern recognition problem. Inspired by recent promising work on fine-grained image classification using the multiscale architecture to extend the diversity of deep features, BeautyNet for unconstrained facial beauty prediction is proposed in this paper. Firstly, a multiscale network is adopted to improve the discriminative of face features. Secondly, to alleviate the computational burden of the multiscale architecture, MFM (max-feature-map) is utilized as an activation function which can not only lighten the network and speed network convergence but also benefit the performance. Finally, transfer learning strategy is introduced here to mitigate the overfitting phenomenon which is caused by the scarcity of labeled facial beauty samples and improves the proposed BeautyNet’s performance. Extensive experiments performed on LSFBD demonstrate that the proposed scheme outperforms the state-of-the-art methods, which can achieve 67.48% classification accuracy.

Download Full-text

Deep neural network based on generalized neo-fuzzy neurons and its learning based on backpropagation

Artificial Intelligence ◽

10.15407/jai2021.01.032 ◽

2021 ◽

Vol 26 (jai2021.26(1)) ◽

pp. 32-41

Author(s):

Bodyanskiy Y ◽

◽

Antonenko T ◽

Keyword(s):

Neural Network ◽

Neural Networks ◽

Learning Process ◽

Activation Function ◽

Point Of View ◽

Basic Unit ◽

Theoretical Point ◽

Activation Functions ◽

Approximation Properties ◽

Network Training

Modern approaches in deep neural networks have a number of issues related to the learning process and computational costs. This article considers the architecture grounded on an alternative approach to the basic unit of the neural network. This approach achieves optimization in the calculations and gives rise to an alternative way to solve the problems of the vanishing and exploding gradient. The main issue of the article is the usage of the deep stacked neo-fuzzy system, which uses a generalized neo-fuzzy neuron to optimize the learning process. This approach is non-standard from a theoretical point of view, so the paper presents the necessary mathematical calculations and describes all the intricacies of using this architecture from a practical point of view. From a theoretical point, the network learning process is fully disclosed. Derived all necessary calculations for the use of the backpropagation algorithm for network training. A feature of the network is the rapid calculation of the derivative for the activation functions of neurons. This is achieved through the use of fuzzy membership functions. The paper shows that the derivative of such function is a constant, and this is a reason for the statement of increasing in the optimization rate in comparison with neural networks which use neurons with more common activation functions (ReLU, sigmoid). The paper highlights the main points that can be improved in further theoretical developments on this topic. In general, these issues are related to the calculation of the activation function. The proposed methods cope with these points and allow approximation using the network, but the authors already have theoretical justifications for improving the speed and approximation properties of the network. The results of the comparison of the proposed network with standard neural network architectures are shown

Download Full-text