A note on the applications of one primary function in deep neural networks

Author(s):  
Hengjie Chen ◽  
Zhong Li

By applying fundamental mathematical tools, this paper proves that the function [Formula: see text], whose parameter is an integer no less than [Formula: see text], has the property that the difference between the function value at the midpoint of any two adjacent equidistant nodes on [Formula: see text] and the mean of the function values at these two nodes is a constant depending only on the number of nodes, if and only if [Formula: see text]. Using this result, we establish an important fact about deep neural networks: the function [Formula: see text] can be interpolated by a deep Rectified Linear Unit (ReLU) network with depth [Formula: see text] on the equidistant nodes in the interval [Formula: see text], and the approximation error is [Formula: see text]. Then, based on this main result and on the Chebyshev orthogonal polynomials, we construct a deep network and give error estimates for the approximation of polynomials and of continuous functions, respectively. In addition, the paper constructs a deep network with local sparse connections, shared weights and activation function [Formula: see text], and discusses its density and complexity.
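
The formulas in this abstract are elided in the copy above. As a purely illustrative example (an assumption, not taken from the paper), the quadratic f(x) = x^2 has exactly this kind of midpoint property: the gap between the midpoint value and the mean of the two endpoint values depends only on the node spacing, hence only on the number of equidistant nodes on a fixed interval.

\[
f(x) = x^{2}: \qquad
f\!\left(\tfrac{a+b}{2}\right) - \frac{f(a)+f(b)}{2}
= \frac{(a+b)^{2}}{4} - \frac{a^{2}+b^{2}}{2}
= -\frac{(b-a)^{2}}{4}.
\]

Properties of this type are what make piecewise-linear (ReLU) interpolation of such functions on equidistant grids analytically tractable, since the interpolation error at every midpoint is the same known constant.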

2019 ◽  
Vol 12 (3) ◽  
pp. 156-161 ◽  
Author(s):  
Aman Dureja ◽  
Payal Pahwa

Background: Activation functions play an important role in building deep neural networks, and the choice of activation function affects the network in terms of optimization and the quality of the results. Several activation functions have been introduced in machine learning for many practical applications, but which activation function should be used in the hidden layers of deep neural networks had not been established. Objective: The primary objective of this analysis was to determine which activation function should be used in the hidden layers of deep neural networks to solve complex non-linear problems. Methods: The comparative model was configured using a two-class (Cat/Dog) dataset. The network used three convolutional layers, each followed by a pooling layer. The dataset was divided into two parts: the first 8000 images were used for training the network and the remaining 2000 images were used for testing it. Results: The experimental comparison was done by analyzing the network with different activation functions at each layer of the CNN. The validation error and accuracy on the Cat/Dog dataset were analyzed for the activation functions ReLU, Tanh, SELU, PReLU and ELU in the hidden layers. Overall, ReLU gave the best performance, with a validation loss of 0.3912 and a validation accuracy of 0.8320 at the 25th epoch. Conclusion: A CNN model with ReLU in its hidden layers (3 hidden layers here) gives the best results and improves overall performance in terms of accuracy and speed. These advantages of ReLU in the hidden layers of a CNN help in the effective and fast retrieval of images from databases.
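
A minimal sketch of the comparison setup described above, assuming a PyTorch implementation (the abstract does not state the framework); the filter counts, input size and classifier head are placeholders chosen for illustration.

import torch
import torch.nn as nn

def build_model(act=nn.ReLU):
    # Three convolutional layers, each followed by max pooling, then a small
    # classifier head for the two-class Cat/Dog problem.
    return nn.Sequential(
        nn.Conv2d(3, 32, 3), act(), nn.MaxPool2d(2),
        nn.Conv2d(32, 64, 3), act(), nn.MaxPool2d(2),
        nn.Conv2d(64, 128, 3), act(), nn.MaxPool2d(2),
        nn.Flatten(),
        nn.LazyLinear(128), act(),
        nn.Linear(128, 1), nn.Sigmoid(),
    )

# Activations compared in the study:
candidates = {"ReLU": nn.ReLU, "Tanh": nn.Tanh, "SELU": nn.SELU,
              "PReLU": nn.PReLU, "ELU": nn.ELU}
for name, act in candidates.items():
    model = build_model(act)
    # train on the first 8000 images and validate on the remaining 2000,
    # e.g. with nn.BCELoss() for 25 epochs, recording loss and accuracy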


2021 ◽  
Vol 11 (15) ◽  
pp. 6704
Author(s):  
Jingyong Cai ◽  
Masashi Takemoto ◽  
Yuming Qiu ◽  
Hironori Nakajo

Despite being heavily used in the training of deep neural networks (DNNs), multipliers are resource-intensive and in short supply in many scenarios. Previous work has shown the benefit of computing activation functions such as the sigmoid with shift-and-add operations, although these approaches fail to remove multiplications from training altogether. In this paper, we propose an approach that converts all multiplications in the forward and backward passes of DNNs into shift-and-add operations. Because the model parameters and backpropagated errors of a large DNN model are typically clustered around zero, these values can be approximated by their sine values. Multiplications between weights and error signals are thereby turned into multiplications of their sine values, which can be replaced by simpler operations with the help of the product-to-sum formula. In addition, a rectified sine activation function is used to convert layer inputs into sine values as well. In this way, the original multiplication-intensive operations can be computed through simple shift-and-add operations. This trigonometric approximation method provides an efficient training and inference alternative for devices without sufficient hardware multipliers. Experimental results demonstrate that the method achieves performance close to that of classical training algorithms. The proposed approach sheds new light on future hardware customization research for machine learning.
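
A small numerical sketch of the idea (not the authors' code): products of near-zero weights and errors are approximated by products of their sines, and the product-to-sum identity sin(w)·sin(e) = (cos(w − e) − cos(w + e)) / 2 turns that into a difference of cosines, leaving only additions, subtractions and cosine evaluations, the latter being the part one would map to shift-and-add hardware. NumPy's exact cosine stands in for such hardware here.

import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(scale=0.05, size=10_000)   # "weights" clustered around zero
e = rng.normal(scale=0.05, size=10_000)   # "backpropagated errors"

exact = w * e
approx = (np.cos(w - e) - np.cos(w + e)) / 2.0   # == sin(w) * sin(e)

# Small, because sin(x) ~ x near zero:
print("max abs error:", np.max(np.abs(exact - approx)))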


2021 ◽  
Vol 47 (1) ◽  
Author(s):  
Fabian Laakmann ◽  
Philipp Petersen

Abstract: We demonstrate that deep neural networks with the ReLU activation function can efficiently approximate the solutions of various types of parametric linear transport equations. For non-smooth initial conditions, the solutions of these PDEs are high-dimensional and non-smooth, so their approximation suffers from a curse of dimensionality. We demonstrate that, through their inherent compositionality, deep neural networks can resolve the characteristic flow underlying the transport equations and thereby achieve approximation rates independent of the parameter dimension.
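
The abstract does not display the equations; a representative constant-coefficient instance of a parametric linear transport equation (an illustrative assumption, not the paper's general setting) is

\[
\partial_t u(x,t,y) + a(y)\cdot\nabla_x u(x,t,y) = 0, \qquad u(x,0,y) = u_0(x,y),
\]

whose solution follows the characteristic flow backwards in time,

\[
u(x,t,y) = u_0\bigl(x - a(y)\,t,\; y\bigr).
\]

The solution is thus a composition of a smooth characteristic map \((x,t,y) \mapsto x - a(y)\,t\) with the possibly non-smooth initial condition \(u_0\); it is this compositional structure that a deep ReLU network can mirror layer by layer, which is why the approximation rate need not degrade with the parameter dimension.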


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Jingwei Liu ◽  
Peixuan Li ◽  
Xuehan Tang ◽  
Jiaxin Li ◽  
Jiaming Chen

Abstract: Artificial neural networks (ANNs), which include deep learning neural networks (DNNs), have problems such as the local-minimum problem of the back-propagation neural network (BPNN), the instability problem of the radial basis function neural network (RBFNN) and the limited maximum precision of the convolutional neural network (CNN). The performance (training speed, precision, etc.) of BPNN, RBFNN and CNN is therefore expected to be improved. The main work is as follows. Firstly, based on the existing BPNN and RBFNN, a wavelet neural network (WNN) is implemented in order to obtain better performance and to further improve CNN: WNN adopts the network structure of BPNN to achieve faster training, and adopts a wavelet function as its activation function, whose form is similar to the radial basis function of RBFNN, in order to avoid the local-minimum problem. Secondly, a WNN-based convolutional wavelet neural network (CWNN) is proposed, in which the fully connected layers (FCL) of CNN are replaced by a WNN. Thirdly, comparative simulations of BPNN, RBFNN, CNN and CWNN on the MNIST and CIFAR-10 datasets are implemented and analyzed. Fourthly, a wavelet-based convolutional neural network (WCNN) is proposed, in which the wavelet transformation is adopted as the activation function in the Convolutional Pool Neural Network (CPNN) part of CNN. Fifthly, simulations of WCNN are implemented and analyzed on the MNIST dataset. The effects are as follows. Firstly, WNN solves the problems of BPNN and RBFNN and has better performance. Secondly, the proposed CWNN reduces the mean square error and the error rate of CNN, which means CWNN has better maximum precision than CNN. Thirdly, the proposed WCNN reduces the mean square error and the error rate of CWNN, which means WCNN has better maximum precision than CWNN.
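
A sketch of the CWNN idea above in PyTorch: a small CNN whose fully connected head uses a wavelet activation instead of ReLU. The Morlet-style wavelet psi(x) = cos(1.75x)·exp(−x²/2) is a common choice in the WNN literature; the abstract does not state which wavelet the authors use, so this choice, and the layer sizes, are assumptions.

import torch
import torch.nn as nn

def morlet(x: torch.Tensor) -> torch.Tensor:
    # Morlet-style wavelet used as an activation function (assumed choice).
    return torch.cos(1.75 * x) * torch.exp(-0.5 * x * x)

class CWNN(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(            # ordinary convolutional part
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.fc1 = nn.Linear(32 * 7 * 7, 128)     # "fully connected" head replaced by WNN
        self.fc2 = nn.Linear(128, num_classes)

    def forward(self, x):
        x = self.features(x).flatten(1)
        x = morlet(self.fc1(x))                   # wavelet activation instead of ReLU
        return self.fc2(x)

# model = CWNN(); logits = model(torch.randn(8, 1, 28, 28))  # MNIST-sized input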


2019 ◽  
Vol 2019 ◽  
pp. 1-7 ◽  
Author(s):  
Yu Fujinami-Yokokawa ◽  
Nikolas Pontikos ◽  
Lizhu Yang ◽  
Kazushige Tsunoda ◽  
Kazutoshi Yoshitake ◽  
...  

Purpose. To illustrate a data-driven deep learning approach to predicting the gene responsible for an inherited retinal disorder (IRD) in macular dystrophy caused by ABCA4 and RP1L1 gene aberrations, in comparison with retinitis pigmentosa caused by EYS gene aberrations and with normal subjects. Methods. Seventy-five subjects with an IRD or no ocular disease were ascertained from the database of the Japan Eye Genetics Consortium: 10 with ABCA4 retinopathy, 20 with RP1L1 retinopathy, 28 with EYS retinopathy, and 17 normal subjects. Horizontal/vertical cross-sectional spectral-domain optical coherence tomography (SD-OCT) scans at the central fovea were cropped/adjusted to a resolution of 400 pixels/inch and a size of 750 × 500 pixels for learning. Subjects were randomly split into training and test sets at a 3 : 1 ratio. The commercially available learning tool Medic Mind was applied to this four-class classification task. Classification accuracy, sensitivity, and specificity were calculated during the learning process. This process was repeated four times with random assignment to training and test sets to control for selection bias. For each training/testing run, the classification accuracy was calculated per gene category. Results. A total of 178 images from 75 subjects were included in this study. The mean training accuracy was 98.5%, ranging from 90.6% to 100.0%. The mean overall test accuracy was 90.9% (range 82.0–97.6%). The mean test accuracy per gene category was 100% for ABCA4, 78.0% for RP1L1, 89.8% for EYS, and 93.4% for Normal. The test accuracy for RP1L1 and EYS was low relative to the training accuracy, which suggests overfitting. Conclusion. This study highlights a novel application of deep neural networks to the prediction of the causative gene in IRD retinopathies from SD-OCT images, with high prediction accuracy. It is anticipated that deep neural networks will be integrated into general screening to support clinical/genetic diagnosis, as well as to enrich clinical education.
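
A sketch of the evaluation protocol described above (not the Medic Mind tool itself): a 3 : 1 random split into training and test sets, repeated four times, with overall and per-gene-category test accuracy. The classifier is a placeholder, and whether the split was stratified by class is an assumption; the study used a commercial deep learning service.

import numpy as np
from sklearn.model_selection import train_test_split

CLASSES = ["ABCA4", "RP1L1", "EYS", "Normal"]

def evaluate(images: np.ndarray, labels: np.ndarray, train_fn, seed: int):
    # 3 : 1 split (test_size=0.25); stratification is an assumption.
    X_tr, X_te, y_tr, y_te = train_test_split(
        images, labels, test_size=0.25, stratify=labels, random_state=seed)
    model = train_fn(X_tr, y_tr)              # placeholder for the actual CNN training
    y_hat = model.predict(X_te)
    overall = np.mean(y_hat == y_te)
    per_class = {c: np.mean(y_hat[y_te == i] == i) for i, c in enumerate(CLASSES)}
    return overall, per_class

# Repeat with four different random splits to control for selection bias:
# results = [evaluate(images, labels, train_fn, seed) for seed in range(4)]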


2019 ◽  
Vol 220 (1) ◽  
pp. 323-334
Author(s):  
Jing Zheng ◽  
Shuaishuai Shen ◽  
Tianqi Jiang ◽  
Weiqiang Zhu

SUMMARY: It is essential to pick P-wave and S-wave arrival times rapidly and accurately for microseismic monitoring systems, yet it is not easy to identify arrivals of the true phase automatically with traditional picking methods. This is one of the reasons many researchers are introducing deep neural networks to these problems. Convolutional neural networks (CNNs) are very attractive for designing automatic phase pickers, especially since the adoption of the fundamental network structure from the semantic segmentation field, which yields probability outputs for every labelled phase at every sample in the recordings. The typical segmentation architecture consists of two main parts: (1) an encoder trained to extract coarse semantic features; (2) a decoder responsible not only for recovering the input resolution at the output but also for obtaining a sparse representation of the objects. The fundamental segmentation structure performs well; however, the influence of its parameters on the pickers has not been investigated, which means that structure design still depends on experience and testing. In this paper, we address two main questions to give some guidance on network design. First, we show what sparse features CNNs learn from three-component microseismic recordings. Second, we analyse the influence on the pickers of two key parameters of the network, namely the depth of the decoder and the activation function. Increasing the number of levels in the decoder increases the demand for trainable parameters but benefits the accuracy of the model. A reasonable decoder depth balances prediction accuracy against the demand for labelled data, which is important for microseismic systems because the manual labelling process degrades real-time performance in monitoring tasks. The standard rectified linear unit (ReLU) and the leaky rectified linear unit (Leaky ReLU) with different negative slopes are compared in the analysis. Leaky ReLU with a small negative slope can improve the performance of a given model over the ReLU activation function by retaining some information from the negative parts.
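
A minimal sketch (not the authors' network) of a 1-D encoder-decoder phase picker of the kind discussed above: three-component waveforms in, per-sample probabilities for P, S and noise out. The decoder depth and the Leaky ReLU negative slope are exposed as the two parameters whose influence the paper analyses; the channel widths and kernel sizes here are assumptions.

import torch
import torch.nn as nn

def block(c_in, c_out, slope):
    # One conv block; slope == 0 falls back to plain ReLU.
    act = nn.LeakyReLU(slope) if slope > 0 else nn.ReLU()
    return nn.Sequential(nn.Conv1d(c_in, c_out, 7, padding=3), act)

class Picker(nn.Module):
    def __init__(self, depth: int = 3, negative_slope: float = 0.01):
        super().__init__()
        chans = [3] + [16 * 2**i for i in range(depth)]   # encoder channel widths
        self.encoder = nn.ModuleList(
            [block(chans[i], chans[i + 1], negative_slope) for i in range(depth)])
        self.decoder = nn.ModuleList(
            [block(chans[i + 1], chans[i], negative_slope) for i in reversed(range(depth))])
        self.head = nn.Conv1d(3, 3, 1)                    # P, S, noise logits

    def forward(self, x):                                 # x: (batch, 3, samples)
        for enc in self.encoder:
            x = nn.functional.max_pool1d(enc(x), 2)       # downsample
        for dec in self.decoder:
            x = dec(nn.functional.interpolate(x, scale_factor=2))  # upsample
        return self.head(x).softmax(dim=1)                # per-sample class probabilities

# probs = Picker(depth=3, negative_slope=0.01)(torch.randn(2, 3, 1024))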

