RSigELU: A nonlinear activation function for deep neural networks

Most deep neural networks use simple, fixed activation functions, such as sigmoids or rectified linear units, regardless of domain or network structure. We introduce differential equation units (DEUs), an improvement to modern neural networks, which enables each neuron to learn a particular nonlinear activation function from a family of solutions to an ordinary differential equation. Specifically, each neuron may change its functional form during training based on the behavior of the other parts of the network. We show that using neurons with DEU activation functions results in a more compact network capable of achieving comparable, if not superior, performance when compared to much larger networks.

Ti3C2Tx MXene Enabled All-Optical Nonlinear Activation Function for On-Chip Photonic Deep Neural Networks

10.21203/rs.3.rs-919901/v1 ◽ 2021

Author(s): Adir Hazan ◽ Barak Ratzker ◽ Danzhen Zhang ◽ Aviad Katiyi ◽ Nachum Frage ◽ ...

Keyword(s): Machine Learning ◽ Neural Networks ◽ Deep Neural Networks ◽ Activation Function ◽ Physical Structure ◽ Major Step ◽ Promising Alternative ◽ Silicon Waveguide ◽ All Optical ◽ Nonlinear Activation Function

Neural networks are one of the first major milestones in developing artificial intelligence systems. The utilisation of integrated photonics in neural networks offers a promising alternative approach to microelectronic and hybrid optical-electronic implementations due to improvements in computational speed and low energy consumption in machine-learning tasks. However, at present, most neural network hardware systems are still electronic-based due to a lack of optical realisation of the nonlinear activation function. Here, we experimentally demonstrate two novel approaches for implementing an all-optical neural nonlinear activation function based on utilising unique light-matter interactions in 2D Ti3C2Tx (MXene) in the infrared (IR) range in two configurations: 1) a saturable absorber made of MXene thin film, and 2) a silicon waveguide with an MXene flake overlayer. These configurations may serve as nonlinear units in photonic neural networks, while their nonlinear transfer function can be flexibly designed to optimise the performance of different neuromorphic tasks, depending on the operating wavelength. The proposed configurations are reconfigurable and can therefore be adjusted for various applications without the need to modify the physical structure. We confirm the capability and feasibility of the obtained results in machine-learning applications via a Modified National Institute of Standards and Technology (MNIST) handwritten digit classification task, with near 99% accuracy. Our developed concept for an all-optical neuron is expected to constitute a major step towards the realisation of all-optically implemented deep neural networks.

Analysis of Non-Linear Activation Functions for Classification Tasks Using Convolutional Neural Networks

Recent Patents on Computer Science ◽ 10.2174/2213275911666181025143029 ◽ 2019 ◽ Vol 12 (3) ◽ pp. 156-161 ◽ Cited By ~ 3

Author(s): Aman Dureja ◽ Payal Pahwa

Keyword(s): Neural Networks ◽ Deep Neural Networks ◽ Activation Function ◽ Primary Objective ◽ Experimental Comparison ◽ Activation Functions ◽ Practical Applications ◽ Network Activation ◽ Non Linear ◽ Hidden Layer

Background: Activation functions play an important role in building deep neural networks, and the choice of activation function affects both how well the network optimizes and the quality of its results. Several activation functions have been introduced in machine learning for many practical applications, but which one should be used in the hidden layers of deep neural networks has not been identified. Objective: The primary objective of this analysis was to determine which activation function should be used in the hidden layers of deep neural networks to solve complex non-linear problems. Methods: The comparison used a two-class dataset (Cat/Dog). The network had three convolutional layers, each followed by a pooling layer. The dataset was divided into two parts: the first 8000 images were used for training the network and the next 2000 images for testing it. Results: The experimental comparison was carried out by applying different activation functions (ReLU, Tanh, SELU, PReLU, ELU) in the hidden layers of the CNN and analyzing the validation error and accuracy on the Cat/Dog dataset. Overall, ReLU gave the best performance, with a validation loss of 0.3912 and a validation accuracy of 0.8320 at the 25th epoch. Conclusion: A CNN model with ReLU hidden layers (three hidden layers here) gives the best results and improves overall performance in terms of accuracy and speed. These advantages of ReLU in the hidden layers of a CNN are helpful for effective and fast retrieval of images from databases.
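
For readers unfamiliar with the five activation functions compared in this abstract, the sketch below gives their usual textbook definitions in plain Python. It is only an illustration: the abstract does not state which parameter values the authors used, so the ELU `alpha`, the PReLU slope `a` (a learned parameter in practice, fixed here for demonstration), and the SELU constants are assumed common defaults, not values from the paper.

```python
import math

# Assumed constants: the widely used SELU defaults, not taken from the paper.
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805


def relu(x):
    """Rectified linear unit: passes positive inputs, zeroes the rest."""
    return max(0.0, x)


def tanh(x):
    """Hyperbolic tangent, bounded in (-1, 1)."""
    return math.tanh(x)


def elu(x, alpha=1.0):
    """Exponential linear unit; alpha=1.0 is an assumed default."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def selu(x):
    """Scaled ELU: an ELU with fixed alpha, multiplied by a fixed scale."""
    return SELU_SCALE * elu(x, SELU_ALPHA)


def prelu(x, a=0.25):
    """Parametric ReLU; the negative slope `a` is normally learned.
    The value 0.25 is an assumed starting point for illustration."""
    return x if x > 0 else a * x


ACTIVATIONS = {"ReLU": relu, "Tanh": tanh, "SELU": selu,
               "PReLU": prelu, "ELU": elu}

if __name__ == "__main__":
    # Evaluate each function at a negative, zero, and positive input.
    for name, fn in ACTIVATIONS.items():
        print(name, [round(fn(v), 4) for v in (-2.0, 0.0, 2.0)])
```

The functions differ mainly in how they treat negative inputs, which is the property a comparison like the one described above is probing.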

Trigonometric Inference Providing Learning in Deep Neural Networks

Applied Sciences ◽ 10.3390/app11156704 ◽ 2021 ◽ Vol 11 (15) ◽ pp. 6704

Author(s): Jingyong Cai ◽ Masashi Takemoto ◽ Yuming Qiu ◽ Hironori Nakajo

Keyword(s): Machine Learning ◽ Neural Networks ◽ Deep Neural Networks ◽ Activation Function ◽ Trigonometric Approximation ◽ Model Parameters ◽ Training Algorithms ◽ Activation Functions ◽ Classical Training ◽ Sum Formula

Despite being heavily used in the training of deep neural networks (DNNs), multipliers are resource-intensive and insufficient in many different scenarios. Previous work has shown the benefit of computing activation functions, such as the sigmoid, with shift-and-add operations, although these approaches fail to remove multiplications from training altogether. In this paper, we propose an innovative approach that can convert all multiplications in the forward and backward inferences of DNNs into shift-and-add operations. Because the model parameters and backpropagated errors of a large DNN model are typically clustered around zero, these values can be approximated by their sine values. Multiplications between the weights and error signals are transferred to multiplications of their sine values, which are replaceable with simpler operations with the help of the product-to-sum formula. In addition, a rectified sine activation function is utilized for further converting layer inputs into sine values. In this way, the original multiplication-intensive operations can be computed through simple shift-and-add operations. This trigonometric approximation method provides an efficient training and inference alternative for devices with insufficient hardware multipliers. Experimental results demonstrate that this method is able to obtain a performance close to that of classical training algorithms. The approach we propose sheds new light on future hardware customization research for machine learning.
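
The core identity behind this idea can be checked numerically. The sketch below is not the authors' implementation; it only illustrates the two facts the abstract relies on: for values near zero, x is close to sin(x), and the product-to-sum identity sin(a)·sin(b) = ½[cos(a − b) − cos(a + b)] turns a product into a subtraction of two cosines followed by a halving. The function names and the sample values are hypothetical, and how the cosine values are obtained in hardware is not stated in the abstract.

```python
import math


def sine_product(a, b):
    """sin(a) * sin(b) via the product-to-sum identity.

    Only an addition, two subtractions, and a halving are applied to the
    cosine values; in fixed-point hardware the halving would be a one-bit
    right shift (written here as a float multiply by 0.5 for readability).
    """
    return 0.5 * (math.cos(a - b) - math.cos(a + b))


def approx_product(w, e):
    """Approximate w * e for small w and e, using w ~ sin(w), e ~ sin(e)."""
    return sine_product(w, e)


if __name__ == "__main__":
    # Hypothetical small weight and error signal, clustered near zero.
    w, e = 0.03, -0.05
    print("exact product:      ", w * e)
    print("trigonometric value:", approx_product(w, e))
```

The approximation error grows as the operands move away from zero, which is why the abstract stresses that weights and backpropagated errors are typically clustered around zero.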

Efficient approximation of solutions of parametric linear transport equations by ReLU DNNs

Advances in Computational Mathematics ◽ 10.1007/s10444-020-09834-7 ◽ 2021 ◽ Vol 47 (1)

Author(s): Fabian Laakmann ◽ Philipp Petersen

Keyword(s): Neural Networks ◽ Deep Neural Networks ◽ Initial Conditions ◽ Activation Function ◽ Transport Equations ◽ High Dimensional ◽ Linear Transport ◽ Approximation Rates ◽ Curse Of Dimension ◽ Efficient Approximation

We demonstrate that deep neural networks with the ReLU activation function can efficiently approximate the solutions of various types of parametric linear transport equations. For non-smooth initial conditions, the solutions of these PDEs are high-dimensional and non-smooth. Therefore, approximation of these functions suffers from a curse of dimension. We demonstrate that through their inherent compositionality deep neural networks can resolve the characteristic flow underlying the transport equations and thereby allow approximation rates independent of the parameter dimension.

A parameterized activation function for learning fuzzy logic operations in deep neural networks

2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC) ◽ 10.1109/smc.2017.8122696 ◽ 2017 ◽ Cited By ~ 1

Author(s): Luke B. Godfrey ◽ Michael S. Gashler

Keyword(s): Neural Networks ◽ Fuzzy Logic ◽ Deep Neural Networks ◽ Activation Function ◽ Logic Operations

Discussion of: “Nonparametric regression using deep neural networks with ReLU activation function”

The Annals of Statistics ◽ 10.1214/19-aos1911 ◽ 2020 ◽ Vol 48 (4) ◽ pp. 1902-1905

Author(s): Gitta Kutyniok

Keyword(s): Neural Networks ◽ Nonparametric Regression ◽ Deep Neural Networks ◽ Activation Function

Relating the Slope of the Activation Function and the Learning Rate Within a Recurrent Neural Network

Neural Computation ◽ 10.1162/089976699300016340 ◽ 1999 ◽ Vol 11 (5) ◽ pp. 1069-1077 ◽ Cited By ~ 28

Author(s): Danilo P. Mandic ◽ Jonathon A. Chambers

Keyword(s): Neural Network ◽ Neural Networks ◽ Recurrent Neural Network ◽ Recurrent Neural Networks ◽ Degrees Of Freedom ◽ Learning Algorithm ◽ Activation Function ◽ Learning Rate ◽ Optimization Task ◽ Nonlinear Activation Function

A relationship between the learning rate η in the learning algorithm, and the slope β in the nonlinear activation function, for a class of recurrent neural networks (RNNs) trained by the real-time recurrent learning algorithm is provided. It is shown that an arbitrary RNN can be obtained via the referent RNN, with some deterministic rules imposed on its weights and the learning rate. Such relationships reduce the number of degrees of freedom when solving the nonlinear optimization task of finding the optimal RNN parameters.
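
The kind of relationship described here can be illustrated on a much simpler model than the paper's. The sketch below uses a single feedforward sigmoid neuron trained by plain online gradient descent, not an RNN trained by real-time recurrent learning, and the specific rule it checks (scale the weight by β and the learning rate by β²) is an assumption of this example rather than a statement quoted from the abstract. All names and sample values are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(w, beta, eta, samples, steps):
    """Online gradient descent on squared error for y = sigmoid(beta * w * x)."""
    for _ in range(steps):
        for x, target in samples:
            y = sigmoid(beta * w * x)
            # Gradient of 0.5 * (y - target)**2 with respect to w,
            # using sigmoid'(z) = y * (1 - y) and the chain rule factor beta * x.
            grad = (y - target) * y * (1.0 - y) * beta * x
            w -= eta * grad
    return w


# Hypothetical toy data: (input, target) pairs.
samples = [(0.5, 0.2), (-1.0, 0.9), (2.0, 0.4)]
beta, eta, w0 = 3.0, 0.02, 0.3

# Neuron A: slope beta, learning rate eta, initial weight w0.
w_a = train(w0, beta, eta, samples, steps=20)

# Neuron B: slope 1, learning rate beta**2 * eta, initial weight beta * w0.
w_b = train(beta * w0, 1.0, beta ** 2 * eta, samples, steps=20)

# Under the assumed rule, beta * w_a and w_b should agree up to rounding.
print(beta * w_a, w_b)
```

If the two printed values agree, the slope β carries no extra freedom in this toy setting: it can be absorbed into the weight and the learning rate, which is the sense in which such relationships reduce the number of degrees of freedom.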

Rejoinder: “Nonparametric regression using deep neural networks with ReLU activation function”

The Annals of Statistics ◽ 10.1214/19-aos1931 ◽ 2020 ◽ Vol 48 (4) ◽ pp. 1916-1921 ◽ Cited By ~ 1

Author(s): Johannes Schmidt-Hieber

Keyword(s): Neural Networks ◽ Nonparametric Regression ◽ Deep Neural Networks ◽ Activation Function

Nonparametric regression using deep neural networks with ReLU activation function

The Annals of Statistics ◽ 10.1214/19-aos1875 ◽ 2020 ◽ Vol 48 (4) ◽ pp. 1875-1897 ◽ Cited By ~ 6

Author(s): Johannes Schmidt-Hieber

Keyword(s): Neural Networks ◽ Nonparametric Regression ◽ Deep Neural Networks ◽ Activation Function

Differential Equation Units: Learning Functional Forms of Activation Functions from Data
