Heterogeneous "cell types" can improve performance of deep neural networks

2021 ◽  
Author(s):  
Briar Doty ◽  
Stefan Mihalas ◽  
Anton Arkhipov ◽  
Alex Piet

Deep convolutional neural networks (CNNs) are powerful computational tools for a large variety of tasks (Goodfellow, 2016). Their architecture, composed of layers of repeated identical neural units, draws inspiration from visual neuroscience. However, biological circuits contain a myriad of additional details and complexity not translated to CNNs, including diverse neural cell types (Tasic, 2018). Many possible roles for neural cell types have been proposed, including: learning, stabilizing excitation and inhibition, and diverse normalization (Marblestone, 2016; Gouwens, 2019). Here we investigate whether neural cell types, instantiated as diverse activation functions in CNNs, can assist in the feed-forward computational abilities of neural circuits. Our heterogeneous cell type networks mix multiple activation functions within each activation layer. We assess the value of mixed activation functions by comparing image classification performance to that of homogeneous control networks with only one activation function per network. We observe that mixing activation functions can improve the image classification abilities of CNNs. Importantly, we find larger improvements when the activation functions are more diverse, and in more constrained networks. Our results suggest a feed-forward computational role for diverse cell types in biological circuits. Additionally, our results open new avenues for the development of more powerful CNNs.
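A minimal PyTorch sketch of the idea (illustrative only; the split into channel groups and the particular activation functions are assumptions, not the authors' exact construction) mixes several activation functions within one activation layer and contrasts it with a homogeneous control network:

```python
import torch
import torch.nn as nn

class MixedActivation(nn.Module):
    """Apply a different activation function to each channel group (illustrative sketch)."""
    def __init__(self, activations=None):
        super().__init__()
        self.activations = nn.ModuleList(activations or [nn.ReLU(), nn.Tanh(), nn.ELU()])

    def forward(self, x):
        # Split the channel dimension into one group per activation function.
        chunks = torch.chunk(x, len(self.activations), dim=1)
        return torch.cat([act(c) for act, c in zip(self.activations, chunks)], dim=1)

# Homogeneous control: a single activation function for the whole layer.
homogeneous = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU())
# Heterogeneous "cell type" layer: several activation functions mixed within the layer.
heterogeneous = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), MixedActivation())

x = torch.randn(8, 3, 32, 32)
print(homogeneous(x).shape, heterogeneous(x).shape)  # both: torch.Size([8, 32, 32, 32])
```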

Sensors ◽  
2021 ◽  
Vol 21 (3) ◽  
pp. 764
Author(s):  
Zhiwen Huang ◽  
Quan Zhou ◽  
Xingxing Zhu ◽  
Xuming Zhang

In many medical image classification tasks, there is insufficient image data for deep convolutional neural networks (CNNs) to overcome the over-fitting problem. Lightweight CNNs are easy to train, but they usually have relatively poor classification performance. To improve the classification ability of lightweight CNN models, we propose a novel batch similarity-based triplet loss to guide the CNNs in learning their weights. The proposed loss uses the similarity among multiple samples in the input batches to evaluate the distribution of the training data. Reducing the proposed loss increases the similarity among images of the same category and reduces the similarity among images of different categories. In addition, it can easily be incorporated into regular CNNs. To evaluate the performance of the proposed loss, experiments were carried out on chest X-ray images and skin rash images, comparing it with several losses on popular lightweight CNN models such as EfficientNet, MobileNet, ShuffleNet and PeleeNet. The results demonstrate the applicability and effectiveness of our method in terms of classification accuracy, sensitivity and specificity.
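A hedged sketch of a batch-level similarity loss in this spirit (not the authors' exact formulation; the margin, embedding size and hinge form are placeholders) pulls same-class samples together and pushes different-class samples apart using all pairs in a batch:

```python
import torch
import torch.nn.functional as F

def batch_similarity_loss(embeddings, labels, margin=0.3):
    """Illustrative batch-level similarity loss (not the paper's exact formulation).

    Assumes each class appears at least twice in the batch so positive pairs exist.
    """
    emb = F.normalize(embeddings, dim=1)           # unit-norm feature vectors
    sim = emb @ emb.t()                            # pairwise cosine similarities
    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    eye = torch.eye(len(labels), dtype=torch.bool, device=labels.device)
    pos = sim[same & ~eye]                         # same-class pairs (excluding self-pairs)
    neg = sim[~same]                               # different-class pairs
    # Hinge on the gap between mean negative and mean positive similarity.
    return F.relu(neg.mean() - pos.mean() + margin)

# Usage: add to the ordinary cross-entropy loss of a lightweight CNN.
features = torch.randn(16, 128)                    # embeddings from, e.g., a MobileNet backbone
labels = torch.randint(0, 3, (16,))
print(batch_similarity_loss(features, labels))
```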


2020 ◽  
Vol 2020 (10) ◽  
pp. 28-1-28-7 ◽  
Author(s):  
Kazuki Endo ◽  
Masayuki Tanaka ◽  
Masatoshi Okutomi

Classification of degraded images is very important in practice because images are usually degraded by compression, noise, blurring, etc. Nevertheless, most research in image classification focuses only on clean images without any degradation. Some papers have already proposed deep convolutional neural networks composed of an image restoration network and a classification network to classify degraded images. This paper proposes an alternative approach in which a degraded image and an additional degradation parameter are used for classification. The proposed classification network has two inputs: the degraded image and the degradation parameter. An estimation network for the degradation parameters is also incorporated for cases where the degradation parameters of the input images are unknown. The experimental results show that the proposed method outperforms a straightforward approach in which the classification network is trained with degraded images only.
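A minimal PyTorch sketch of the two-input idea (layer sizes and the way the parameter is injected are assumptions, not the paper's architecture) concatenates a scalar degradation parameter with the image features before the classification head:

```python
import torch
import torch.nn as nn

class DegradationAwareClassifier(nn.Module):
    """Sketch of a two-input classifier: degraded image plus scalar degradation parameter."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.head = nn.Linear(32 + 1, num_classes)   # +1 slot for the degradation parameter

    def forward(self, image, degradation):
        feats = self.backbone(image)
        # Concatenate the (known or estimated) degradation level with the image features.
        return self.head(torch.cat([feats, degradation.unsqueeze(1)], dim=1))

model = DegradationAwareClassifier()
imgs = torch.randn(4, 3, 64, 64)                  # e.g. compressed, noisy or blurred inputs
sigma = torch.tensor([0.1, 0.3, 0.5, 0.7])        # assumed degradation strength per image
print(model(imgs, sigma).shape)                   # torch.Size([4, 10])
```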


2019 ◽  
Vol 12 (3) ◽  
pp. 156-161 ◽  
Author(s):  
Aman Dureja ◽  
Payal Pahwa

Background: Activation functions play an important role in building deep neural networks, and the choice of activation function affects both optimization and the quality of the results. Several activation functions have been introduced in machine learning for practical applications, but which activation function should be used in the hidden layers of deep neural networks has not been clearly established. Objective: The primary objective of this analysis was to determine which activation function should be used in the hidden layers of deep neural networks to solve complex non-linear problems. Methods: The comparative model was configured on a two-class (Cat/Dog) dataset. The network used three convolutional layers, with a pooling layer introduced after each convolutional layer. The dataset was divided into two parts: the first 8000 images were used for training the network and the remaining 2000 images were used for testing it. Results: The experimental comparison was performed by analyzing the network with different activation functions (ReLU, Tanh, SELU, PReLU, ELU) in the hidden layers, recording validation error and accuracy on the Cat/Dog dataset. Overall, ReLU gave the best performance, with a validation loss of 0.3912 and a validation accuracy of 0.8320 at the 25th epoch. Conclusion: A CNN model with ReLU in the hidden layers (three hidden layers here) gives the best results and improves overall performance in terms of accuracy and speed. These advantages of ReLU across the hidden layers support effective and fast retrieval of images from databases.
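The experimental setup can be illustrated with a short PyTorch sketch (illustrative only; layer widths and image resolution are assumptions, not the paper's exact configuration) that builds a three-convolutional-layer CNN with pooling after each convolution and swaps the hidden-layer activation among the candidates compared in the study:

```python
import torch
import torch.nn as nn

def make_cnn(act_fn):
    """Three conv layers, each followed by an activation and a pooling layer,
    ending in a binary (Cat/Dog) classifier."""
    return nn.Sequential(
        nn.Conv2d(3, 32, 3, padding=1), act_fn(), nn.MaxPool2d(2),
        nn.Conv2d(32, 64, 3, padding=1), act_fn(), nn.MaxPool2d(2),
        nn.Conv2d(64, 128, 3, padding=1), act_fn(), nn.MaxPool2d(2),
        nn.Flatten(), nn.Linear(128 * 8 * 8, 1), nn.Sigmoid())

candidates = {"ReLU": nn.ReLU, "Tanh": nn.Tanh, "SELU": nn.SELU,
              "PReLU": nn.PReLU, "ELU": nn.ELU}
x = torch.randn(2, 3, 64, 64)
for name, act in candidates.items():
    model = make_cnn(act)
    # Train each variant identically and compare validation loss/accuracy per epoch.
    print(name, model(x).shape)
```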


Author(s):  
Volodymyr Shymkovych ◽  
Sergii Telenyk ◽  
Petro Kravets

Abstract: This article introduces a method for realizing the Gaussian activation function of radial-basis-function (RBF) neural networks, with hardware implementation on field-programmable gate arrays (FPGAs). Results of modeling the Gaussian function on FPGA chips of different families are presented, and RBF neural networks of various topologies have been synthesized and investigated. The hardware component implemented by this algorithm is an RBF neural network with four hidden-layer neurons and one output neuron with a sigmoid activation function, realized on an FPGA using 16-bit fixed-point numbers and occupying 1193 lookup tables (LUTs). Each hidden-layer neuron of the RBF network is designed on the FPGA as a separate computing unit. The total delay of the combinational circuit of the RBF network block is 101.579 ns. The implementation of the Gaussian activation functions of the hidden layer occupies 106 LUTs, with a delay of 29.33 ns and an absolute error of ±0.005. These results were obtained using the Spartan-3 chip family; modeling on chips of other series is also presented in the article. Hardware implementation of RBF neural networks at this speed allows them to be used in real-time control systems for high-speed objects.
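As a software illustration of the idea (a Python sketch, not the authors' HDL design; the table size and fixed-point split are assumptions), the Gaussian activation can be evaluated through a precomputed lookup table with 16-bit fixed-point entries and compared against the floating-point reference:

```python
import numpy as np

FRAC_BITS = 12                                   # assumed 16-bit split: sign + integer bits + 12 fractional bits
SCALE = 1 << FRAC_BITS

def to_fixed(x):
    return np.round(x * SCALE).astype(np.int32)

def from_fixed(q):
    return q / SCALE

# Precompute exp(-x^2) over the input range [-4, 4] at 1024 grid points.
grid = np.linspace(-4.0, 4.0, 1024)
lut = to_fixed(np.exp(-grid ** 2))

def gaussian_lut(x):
    """Nearest-entry lookup, standing in for the FPGA LUT-based evaluation."""
    idx = np.clip(np.round((x + 4.0) / 8.0 * 1023).astype(int), 0, 1023)
    return from_fixed(lut[idx])

x = np.linspace(-4, 4, 10001)
err = np.abs(gaussian_lut(x) - np.exp(-x ** 2)).max()
print(f"max absolute error: {err:.4f}")           # small, in the same spirit as the reported ±0.005
```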


2021 ◽  
Vol 11 (15) ◽  
pp. 6704
Author(s):  
Jingyong Cai ◽  
Masashi Takemoto ◽  
Yuming Qiu ◽  
Hironori Nakajo

Despite being heavily used in the training of deep neural networks (DNNs), multipliers are resource-intensive, and hardware multipliers are in short supply in many scenarios. Previous work has shown the benefit of computing activation functions, such as the sigmoid, with shift-and-add operations, although such approaches fail to remove multiplications from training altogether. In this paper, we propose an approach that converts all multiplications in the forward and backward passes of DNNs into shift-and-add operations. Because the model parameters and backpropagated errors of a large DNN model are typically clustered around zero, these values can be approximated by their sine values. Multiplications between the weights and error signals are thus transformed into multiplications of their sine values, which can be replaced with simpler operations with the help of the product-to-sum formula. In addition, a rectified sine activation function is used to further convert layer inputs into sine values. In this way, the original multiplication-intensive operations can be computed through simple shift-and-add operations. This trigonometric approximation method provides an efficient training and inference alternative for devices with insufficient hardware multipliers. Experimental results demonstrate that this method achieves performance close to that of classical training algorithms. The proposed approach sheds new light on future hardware customization research for machine learning.
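The core trick can be sketched in a few lines of NumPy (illustrative only; on hardware the cosines would come from shift-and-add routines such as CORDIC rather than a floating-point library call):

```python
import numpy as np

def sine_product(w, e):
    """Approximate w * e for small-magnitude values via the product-to-sum identity:
    sin(w) * sin(e) = (cos(w - e) - cos(w + e)) / 2.
    np.cos stands in for the shift-and-add cosine evaluation used on hardware."""
    return 0.5 * (np.cos(w - e) - np.cos(w + e))

# Weights and backpropagated errors of a trained DNN cluster around zero,
# where sin(x) is close to x, so the error of the approximated product stays small.
w = np.random.normal(0.0, 0.05, 10000)
e = np.random.normal(0.0, 0.05, 10000)
exact = w * e
approx = sine_product(w, e)
print("max |error|:", np.abs(exact - approx).max())
```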


Author(s):  
Hannah Garcia Doherty ◽  
Roberto Arnaiz Burgueño ◽  
Roeland P. Trommel ◽  
Vasileios Papanastasiou ◽  
Ronny I. A. Harmanny

Abstract Identification of human individuals within a group of 39 persons using micro-Doppler (μ-D) features has been investigated. Deep convolutional neural networks with two different training procedures have been used to perform classification. Visualization of the inner network layers revealed the sections of the input image most relevant when determining the class label of the target. A convolutional block attention module is added to provide a weighted feature vector in the channel and feature dimension, highlighting the relevant μ-D feature-filled areas in the image and improving classification performance.
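A hedged sketch of a convolutional block attention module in the spirit of Woo et al. (hyper-parameters and feature-map sizes are illustrative, not taken from the paper) shows how channel and spatial attention re-weight the μ-D feature maps:

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Minimal convolutional block attention module sketch: channel then spatial attention."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels))
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        b, c, _, _ = x.shape
        # Channel attention: weight each feature map using pooled descriptors.
        avg = self.channel_mlp(x.mean(dim=(2, 3)))
        mx = self.channel_mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)
        # Spatial attention: highlight the μ-D feature-filled regions of the image.
        spatial = torch.cat([x.mean(dim=1, keepdim=True),
                             x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial_conv(spatial))

feat = torch.randn(2, 64, 32, 32)     # assumed feature maps from a μ-D spectrogram CNN
print(CBAM(64)(feat).shape)           # torch.Size([2, 64, 32, 32])
```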


Author(s):  
Bo Wang ◽  
Xiaoting Yu ◽  
Chengeng Huang ◽  
Qinghong Sheng ◽  
Yuanyuan Wang ◽  
...  

The excellent feature extraction ability of deep convolutional neural networks (DCNNs) has been demonstrated in many image processing tasks, in which image classification can achieve high accuracy with only raw input images. However, the specific image features that influence the classification results are not readily determinable, and what lies behind the predictions is unclear. This study proposes a method combining the Sobel and Canny operators and an Inception module for ship classification. The Sobel and Canny operators obtain enhanced edge features from the input images, and a convolutional layer is replaced with the Inception module, which can automatically select the proper convolution kernel for ship objects in different image regions. The principle is that the high-level features abstracted by the DCNN, and the features obtained by the multi-convolution concatenation of the Inception module, must ultimately derive from the edge information of the preprocessed input images. This indicates that the classification results are based on the input edge features, which indirectly interprets the classification results to some extent. Experimental results show that the combination of the edge features and the Inception module improves DCNN ship classification performance. The original model trained on the raw dataset has an average accuracy of 88.72%, whereas with enhanced edge features as input it achieves the best performance among all models, 90.54%. The model that replaces the fifth convolutional layer with the Inception module reaches 89.50%, performing close to VGG-16 on the raw dataset and significantly better than other deep neural networks. The results validate the functionality and feasibility of the proposed idea.
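As an illustration of the edge-enhancement step (an assumed preprocessing pipeline; kernel sizes, Canny thresholds and the file name are placeholders, not the authors' exact settings), the following OpenCV sketch derives Sobel and Canny edge maps and stacks them into a network input:

```python
import cv2
import numpy as np

def edge_enhanced_input(image_path):
    """Build an edge-enhanced input: raw intensity plus Sobel and Canny edge maps."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    sobel_x = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
    sobel_y = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)
    sobel = cv2.convertScaleAbs(cv2.magnitude(sobel_x, sobel_y))
    canny = cv2.Canny(gray, 100, 200)            # thresholds are illustrative
    # Stack the three maps as a 3-channel input for the DCNN.
    return np.dstack([gray, sobel, canny]).astype(np.float32) / 255.0

enhanced = edge_enhanced_input("ship.jpg")        # hypothetical file name
print(enhanced.shape)                             # (H, W, 3)
```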


2021 ◽  
Vol 26 (jai2021.26(1)) ◽  
pp. 32-41
Author(s):  
Bodyanskiy Y ◽  
Antonenko T

Modern deep neural networks face a number of issues related to the learning process and computational costs. This article considers an architecture grounded on an alternative approach to the basic unit of the neural network. This approach optimizes the calculations and gives rise to an alternative way to address the problems of vanishing and exploding gradients. The focus of the article is a deep stacked neo-fuzzy system, which uses a generalized neo-fuzzy neuron to optimize the learning process. Since this approach is non-standard from a theoretical point of view, the paper presents the necessary mathematical derivations and describes the practical intricacies of using this architecture. The network learning process is fully disclosed from a theoretical standpoint, and all calculations required to apply the backpropagation algorithm to network training are derived. A feature of the network is the rapid calculation of the derivatives of the neuron activation functions, achieved through the use of fuzzy membership functions: the paper shows that the derivative of such a function is a constant, which supports the claim of a higher optimization rate compared with neural networks that use neurons with more common activation functions (ReLU, sigmoid). The paper also highlights the main points that can be improved in further theoretical work on this topic; in general, these concern the calculation of the activation function. The proposed methods address these points and allow approximation with the network, and the authors already have theoretical justifications for further improving the speed and approximation properties of the network. Results of a comparison between the proposed network and standard neural network architectures are shown.
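As an illustration of the building block being discussed, the following NumPy sketch implements a generalized neo-fuzzy neuron with triangular membership functions (the grid size and input range are assumptions, not the authors' exact construction); because the membership functions are piecewise linear, the gradient with respect to each weight is simply a membership degree:

```python
import numpy as np

def triangular_memberships(x, centers):
    """Triangular membership functions on a uniform grid of centers (illustrative)."""
    width = centers[1] - centers[0]
    return np.clip(1.0 - np.abs(x - centers) / width, 0.0, None)

def neo_fuzzy_neuron(x, weights, centers):
    """Generalized neo-fuzzy neuron sketch: y = sum_i sum_j w[i, j] * mu_j(x_i)."""
    out = 0.0
    for i, xi in enumerate(x):
        out += weights[i] @ triangular_memberships(xi, centers)
    return out

centers = np.linspace(-1.0, 1.0, 5)               # 5 membership functions per input (assumed)
x = np.array([0.2, -0.4, 0.7])                    # 3 inputs
w = np.random.randn(3, 5) * 0.1
print(neo_fuzzy_neuron(x, w, centers))
# The gradient with respect to w[i, j] is just mu_j(x_i): no transcendental functions
# are needed, which is the basis of the article's argument for faster optimization.
```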

