Weighted sigmoid gate unit for an activation function of deep neural network

2020 ◽  
Vol 135 ◽  
pp. 354-359 ◽  
Author(s):  
Masayuki Tanaka


2021 ◽  
pp. 1063293X2110251
Author(s):  
K Vijayakumar ◽  
Vinod J Kadam ◽  
Sudhir Kumar Sharma

Deep Neural Network (DNN) stands for a multilayered Neural Network (NN) that is capable of progressively learning more abstract and composite representations of the raw input features, with no need for any feature engineering. DNNs are advanced NNs with multiple hidden layers between the input and the final layer. The working principle of such a standard deep classifier is a hierarchy formed by the composition of linear functions and a defined nonlinear Activation Function (AF). It remains unclear why the DNN classifier can function so well, but many studies show that within a DNN, the choice of AF has a notable impact on training dynamics and task performance. In the past few years, many different AFs have been formulated, and the choice of AF remains an area of active study. Hence, in this study, a novel deep feedforward NN model with four AFs has been proposed for breast cancer classification: hidden layer 1: Swish, hidden layer 2: LeakyReLU, hidden layer 3: ReLU, and the final output layer: Sigmoid. The purpose of the study is twofold. Firstly, it is a step toward a more profound understanding of DNNs with layer-wise different AFs. Secondly, it aims to explore better DNN-based systems for building predictive models of breast cancer data with improved accuracy. The benchmark UCI dataset WDBC was used for validation of the framework, evaluated using ten-fold cross-validation (CV) and various performance indicators. Multiple simulations and experimental outcomes show that the proposed solution outperforms DNNs using Sigmoid, ReLU, LeakyReLU, or Swish uniformly across all layers, in terms of several performance metrics. This analysis contributes an expert and precise classification method for clinical breast cancer datasets. 
Furthermore, the model also achieved improved performance compared to many established state-of-the-art algorithms/models.
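As a minimal pure-Python sketch (illustrative only; the paper's layer sizes and training setup are not reproduced here), the four layer-wise activation functions described in the abstract can be written as:

```python
import math

# The four activations assigned layer-wise in the proposed model.
def sigmoid(x):                 # output layer
    return 1.0 / (1.0 + math.exp(-x))

def swish(x):                   # hidden layer 1
    return x * sigmoid(x)

def leaky_relu(x, alpha=0.01):  # hidden layer 2 (alpha: the common default slope)
    return x if x > 0 else alpha * x

def relu(x):                    # hidden layer 3
    return max(0.0, x)
```

Unlike ReLU, Swish is smooth and non-monotonic: it dips slightly below zero for small negative inputs before saturating toward zero, which is one reason layer-wise mixing of these functions behaves differently from using a single AF throughout.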


2021 ◽  
Vol 2021 ◽  
pp. 1-16
Author(s):  
Yichen Sun ◽  
Mingli Dong ◽  
Mingxin Yu ◽  
Jiabin Xia ◽  
Xu Zhang ◽  
...  

A photonic artificial intelligence chip based on an optical neural network (ONN) offers low power consumption, low delay, and strong anti-interference ability. The all-optical diffractive deep neural network has recently demonstrated its inference capabilities on the image classification task. However, the physical model has not been miniaturized and integrated, and optical nonlinearity has not been incorporated into the diffraction neural network. By introducing nonlinear characteristics into the network, complex tasks can be completed with high accuracy. In this study, a nonlinear all-optical diffraction deep neural network (N-D2NN) model based on a 10.6 μm wavelength is constructed by combining the ONN and complex-valued neural networks, with a nonlinear activation function introduced into the structure. To be specific, improved variants of the rectified linear unit (ReLU), i.e., Leaky-ReLU, parametric ReLU (PReLU), and randomized ReLU (RReLU), are selected as the activation function of the N-D2NN model. Through numerical simulation, it is shown that the N-D2NN model based on the 10.6 μm wavelength has excellent representation ability, enabling it to perform classification tasks on the MNIST handwritten digit dataset and the Fashion-MNIST dataset. The results show that the N-D2NN model with the RReLU activation function achieves the highest classification accuracy, 97.86% and 89.28% on the two datasets, respectively. These results provide a theoretical basis for the preparation of miniaturized and integrated N-D2NN photonic artificial intelligence chips.
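The three ReLU variants compared here differ only in how they treat negative inputs. A hypothetical pure-Python sketch of the electronic counterparts (the diffractive-optics realization itself is far more involved and not shown):

```python
import random

def leaky_relu(x, slope=0.01):
    # Fixed, small negative slope.
    return x if x > 0 else slope * x

def prelu(x, slope):
    # Same form, but the slope is a learned parameter rather than a constant.
    return x if x > 0 else slope * x

def rrelu(x, lower=1/8, upper=1/3, training=True):
    # Negative slope drawn uniformly at random during training,
    # replaced by its expected value at inference time.
    slope = random.uniform(lower, upper) if training else (lower + upper) / 2
    return x if x > 0 else slope * x
```

The randomized slope in RReLU acts as a regularizer during training, which is consistent with it yielding the best accuracy in the reported simulations.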


Sensors ◽  
2020 ◽  
Vol 20 (8) ◽  
pp. 2226
Author(s):  
Hashim Yasin ◽  
Mazhar Hussain ◽  
Andreas Weber

In this paper, we propose a novel and efficient framework for 3D action recognition using a deep learning architecture. First, we develop a 3D normalized pose space that consists of only 3D normalized poses, which are generated by discarding translation and orientation information. From these poses, we extract joint features and employ them further in a Deep Neural Network (DNN) in order to learn the action model. The architecture of our DNN consists of two hidden layers with the sigmoid activation function and an output layer with the softmax function. Furthermore, we propose a keyframe extraction methodology through which, from a motion sequence of 3D frames, we efficiently extract the keyframes that contribute substantially to the performance of the action. In this way, we eliminate redundant frames and reduce the length of the motion. More precisely, we ultimately summarize the motion sequence, while preserving the original motion semantics. We only consider the remaining essential informative frames in the process of action recognition, and the proposed pipeline is sufficiently fast and robust as a result. Finally, we evaluate our proposed framework extensively on publicly available benchmark Motion Capture (MoCap) datasets, namely HDM05 and CMU. Our experiments reveal that the proposed scheme significantly outperforms other state-of-the-art approaches.
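The softmax output stage described above maps per-class scores to probabilities; a numerically stable pure-Python sketch (the paper's layer widths and class count are not assumed):

```python
import math

def softmax(logits):
    # Subtract the max before exponentiating so large scores
    # cannot overflow math.exp.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Scores for three hypothetical action classes -> class probabilities.
probs = softmax([2.0, 1.0, 0.1])
```

The predicted action is simply the index of the largest probability.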


2018 ◽  
Vol 19 ◽  
pp. 01009
Author(s):  
Stanisław Płaczek ◽  
Aleksander Płaczek

In the article, emphasis is put on the modern artificial neural network structure, which in the literature is known as a deep neural network. The network includes more than one hidden layer and comprises many standard modules with the ReLU nonlinear activation function. A learning algorithm includes two standard steps, forward and backward, and its effectiveness depends on the way the learning error is transported back through all the layers to the first layer. Taking into account the dimensionalities of the matrices and the nonlinear characteristics of the ReLU activation function, the problem is very difficult from a theoretical point of view. To keep the analysis tractable, formal formulas are used to describe the relations between the structure of every layer and its internal input vector. In practical tasks, the internal layer matrices of neural networks with ReLU activation functions include many zero-valued weight coefficients. This phenomenon has a negative impact on the convergence of the learning algorithm. A theoretical analysis could help to build more effective algorithms.
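The convergence issue described above follows directly from ReLU's backward pass: any unit whose pre-activation is non-positive multiplies the transported error by zero. A minimal sketch of that gating (illustrative, not the article's formal derivation):

```python
def relu(x):
    return max(0.0, x)

def relu_grad(x):
    # Derivative of ReLU with respect to its input
    # (the subgradient at exactly 0 is taken as 0 here).
    return 1.0 if x > 0 else 0.0

# Backward step for a single unit: the error arriving from the next
# layer is gated by relu_grad of the pre-activation. When many
# zero-valued weights drive pre-activations non-positive, these units
# transport no error back, which slows convergence.
def backprop_unit(upstream_error, pre_activation):
    return upstream_error * relu_grad(pre_activation)
```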


Author(s):  
Kun Huang ◽  
Bingbing Ni ◽  
Xiaokang Yang

Quantization has shown stunning efficiency on deep neural networks, especially for portable devices with limited resources. Most existing works uncritically extend weight quantization methods to activations. However, we take the view that best performance can be obtained by applying different quantization methods to weights and activations respectively. In this paper, we design a new activation function dubbed CReLU from the quantization perspective and further complement this design with an appropriate initialization method and training procedure. Moreover, we develop a specific quantization strategy in which we formulate the forward and backward approximation of weights with binary values and quantize the activations to low bitwidth using a linear or logarithmic quantizer. We show, for the first time, that our final quantized model with binary weights and ultra-low-bitwidth activations outperforms the previous best models by large margins on ImageNet, as well as achieving nearly a 10.85× theoretical speedup with ResNet-18. Furthermore, ablation experiments and theoretical analysis demonstrate the effectiveness and robustness of CReLU in comparison with other activation functions.
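The weight/activation split can be illustrated with two standard quantizers (a generic sketch under common binary-network conventions; CReLU itself and the paper's exact scheme are not reproduced here):

```python
def binarize_weights(w):
    # Sign binarization with a per-tensor scale alpha = mean(|w|),
    # as used in common binary-weight schemes.
    alpha = sum(abs(v) for v in w) / len(w)
    return [alpha if v >= 0 else -alpha for v in w]

def linear_quantize(x, bits):
    # Uniform quantizer mapping a clipped activation in [0, 1]
    # onto 2**bits - 1 evenly spaced levels.
    levels = (1 << bits) - 1
    x = min(max(x, 0.0), 1.0)
    return round(x * levels) / levels
```

A logarithmic quantizer would instead space the levels as powers of two, trading resolution near 1 for resolution near 0.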


Author(s):  
Indrajeet Kumar ◽  
Abhishek Kumar ◽  
V D Ambeth Kumar ◽  
Ramani Kannan ◽  
Vrince Vimal ◽  
...  

Breast tumors are among the common diseases affecting women around the world. Classifying the various types of breast tumors contributes to treating them more efficiently. However, this classification task is often hindered by dense tissue patterns captured in mammograms. The present study proposes a dense tissue pattern characterization framework using a deep neural network. A total of 322 mammograms belonging to the mini-MIAS dataset and 4880 mammograms from the DDSM dataset have been taken, and an ROI of fixed size 224 × 224 pixels has been extracted from each mammogram. In this work, extensive experimentation has been executed using different combinations of training and testing sets and different activation functions with the AlexNet and ResNet-18 models. Data augmentation has been used to create similar virtual images for proper training of the DL model. After that, the testing set is applied to the trained model to validate the proposed model. During experiments, four different activation functions, 'sigmoid', 'tanh', 'ReLU', and 'LeakyReLU', are used, and the outcome for each function has been reported. It has been found that the activation function 'ReLU' consistently outperforms the others. For each experiment, classification accuracy and the kappa coefficient have been computed. The obtained accuracy and kappa value for the MIAS dataset using the ResNet-18 model are 91.3% and 0.803, respectively. For the DDSM dataset, an accuracy of 92.3% and a kappa coefficient of 0.846 are achieved. After combining the images of both datasets, the achieved accuracy is 91.9%, and the kappa coefficient is 0.839 using the ResNet-18 model. Finally, it has been concluded that the ResNet-18 model and the ReLU activation function yield outstanding performance for the task.
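Since results are reported as accuracy plus Cohen's kappa, the kappa statistic itself can be computed as follows (a generic sketch of the standard formula, not the study's code):

```python
def cohens_kappa(y_true, y_pred):
    # kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed
    # agreement and p_e the agreement expected by chance, computed
    # from the label marginals of each rater/classifier.
    n = len(y_true)
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    labels = set(y_true) | set(y_pred)
    p_e = sum(
        (y_true.count(c) / n) * (y_pred.count(c) / n) for c in labels
    )
    return (p_o - p_e) / (1 - p_e)
```

Unlike raw accuracy, kappa corrects for chance agreement, which matters when the class distribution (dense vs. non-dense tissue) is imbalanced.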


Sensors ◽  
2022 ◽  
Vol 22 (2) ◽  
pp. 656
Author(s):  
Jingyi Liu ◽  
Shuni Song ◽  
Jiayi Wang ◽  
Maimutimin Balaiti ◽  
Nina Song ◽  
...  

With the improvement of industrial requirements for the quality of cold rolled strips, flatness has become one of the most important indicators for measuring their quality. In this paper, the strip production data of a 1250 mm tandem cold mill in a steel plant is modeled by an improved deep neural network (the improved DNN) to improve the accuracy of strip shape prediction. Firstly, the type of activation function is analyzed, and the monotonicity of the activation function is deemed independent of the convexity of the loss function in the deep network: regardless of whether the activation function is monotonic, the loss function is not strictly convex. Secondly, the non-convex optimization of the loss function, extended from the deep linear network to the deep nonlinear network, is discussed, and the critical point of the deep nonlinear network is identified as the global minimum point. Finally, an improved Swish activation function based on batch normalization is proposed, and its performance is evaluated on the MNIST dataset. The experimental results show that the loss of the improved Swish function is lower than that of other activation functions. The prediction accuracy of a deep neural network (DNN) with the improved Swish function is 0.38% higher than that of a DNN with the regular Swish function. For the DNN with the improved Swish function, the mean square error of the flatness prediction for cold rolled strip is reduced to 65% of that of the regular DNN. The accuracy of the improved DNN meets and exceeds the industrial requirements. The shape prediction of the improved DNN will assist and guide the industrial production process, reducing the scrap rate and industrial cost.
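Regular Swish is x·sigmoid(x). As a hedged guess at what a "Swish based on batch normalization" might look like, one can standardize the pre-activations over a batch before applying Swish (the paper's exact formulation may well differ; the BN scale and shift parameters are omitted for brevity):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def swish(x):
    # Regular Swish: smooth and non-monotonic.
    return x * sigmoid(x)

def bn_swish(batch, eps=1e-5):
    # Assumed variant: standardize pre-activations over the batch,
    # then apply Swish. Learnable gamma/beta are left out here.
    mean = sum(batch) / len(batch)
    var = sum((v - mean) ** 2 for v in batch) / len(batch)
    return [swish((v - mean) / math.sqrt(var + eps)) for v in batch]
```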

