Super Sparse Convolutional Neural Networks

Author(s):  
Yao Lu ◽  
Guangming Lu ◽  
Bob Zhang ◽  
Yuanrong Xu ◽  
Jinxing Li

To construct small mobile networks without performance loss and to address the over-fitting caused by scarce training datasets, this paper proposes a novel super sparse convolutional (SSC) kernel; its corresponding network is called SSC-Net. In an SSC kernel, every spatial kernel has only one non-zero parameter, and these non-zero spatial positions are all different. The SSC kernel can effectively select pixels from the feature maps according to its non-zero positions and operate on them. SSC can therefore preserve the general geometric characteristics and the differences between channels, preserving the quality of the retrieved features and meeting general accuracy requirements. Furthermore, SSC can be implemented entirely with the “shift” and “group point-wise” convolutional operations, without any spatial kernels (e.g., “3×3”). SSC is thus the first method to remove parameter redundancy from both the spatial and the channel extents, greatly decreasing the parameters and FLOPs as well as further reducing the im2col and col2im operations performed by low-level libraries. Meanwhile, SSC-Net improves sparsity and overcomes over-fitting more effectively than other mobile networks. Comparative experiments were performed on the scarce CIFAR and low-resolution ImageNet datasets. The results show that SSC-Nets can significantly decrease the parameters and computational FLOPs without any performance loss, and that they better address the over-fitting problem on the more challenging scarce datasets.
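The “shift + group point-wise” decomposition described above can be sketched in plain Python. This is a minimal illustration, not the paper’s implementation: the function names are invented, the weights are arbitrary, and grouping is omitted (a single group is shown).

```python
# Minimal sketch of a super sparse convolution: each 3x3 spatial kernel has
# one non-zero entry, so applying it equals shifting the channel by that
# entry's offset, after which a 1x1 (point-wise) convolution mixes channels.

def shift_channel(channel, dy, dx):
    """Shift a 2-D channel by (dy, dx), zero-padding at the border."""
    h, w = len(channel), len(channel[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = channel[sy][sx]
    return out

def ssc_layer(feature_maps, shifts, pointwise):
    """Per-channel shift followed by a point-wise mix of channels.

    feature_maps: list of C input channels (H x W nested lists)
    shifts:       one (dy, dx) per channel -- the non-zero position of its
                  spatial kernel; the paper requires all positions distinct
    pointwise:    C_out x C_in weight matrix of the 1x1 convolution
    """
    shifted = [shift_channel(ch, dy, dx)
               for ch, (dy, dx) in zip(feature_maps, shifts)]
    h, w = len(shifted[0]), len(shifted[0][0])
    return [[[sum(wgt * shifted[c][y][x] for c, wgt in enumerate(weights))
              for x in range(w)] for y in range(h)]
            for weights in pointwise]
```

Because the spatial step is a pure shift, no im2col/col2im buffers are needed; all multiply-accumulates live in the 1×1 stage.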

2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Andry Chowanda

Abstract. Social interactions are important for us humans, as social creatures, and emotions play an important part in them. Emotions usually convey meaning alongside the spoken utterances to the interlocutors. Automatic facial expression recognition is one technique to automatically capture, recognise, and understand the interlocutor's emotions. Many techniques have been proposed to increase the accuracy of emotion recognition from facial cues, and architectures such as convolutional neural networks have demonstrated promising results. However, most current convolutional neural network models require enormous computational power to train and to perform emotion recognition. This research aims to build compact networks with depthwise separable layers while maintaining performance. Three datasets and three similar architectures were compared with the proposed architecture. The results show that the proposed architecture performed best: it achieved up to 13% better accuracy and was 6–71% smaller than the other architectures. The best testing accuracy achieved by the architecture was 99.4%.
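The compactness gain from depthwise separable layers can be seen from a parameter count alone. The layer sizes below are illustrative, not taken from the paper:

```python
# Parameter count of a standard k x k convolution vs. a depthwise separable
# one (per-channel k x k kernel followed by a 1x1 point-wise convolution).

def standard_conv_params(c_in, c_out, k):
    # every output channel owns a k x k kernel over all input channels
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    # one k x k kernel per input channel, then a 1x1 point-wise mix
    return c_in * k * k + c_in * c_out

std = standard_conv_params(64, 128, 3)        # 73,728 parameters
sep = depthwise_separable_params(64, 128, 3)  # 8,768 parameters (~8.4x fewer)
```

The same replacement applied throughout a network is what yields the “6–71% smaller” models reported above.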


Author(s):  
G. Touya ◽  
F. Brisebard ◽  
F. Quinton ◽  
A. Courtial

Abstract. Visually impaired people cannot use classical maps but can learn to use tactile relief maps. These tactile maps are crucial at school, enabling visually impaired students to learn geography and history like the other students. They are produced manually by professional transcriptors in a very long and costly process. A platform able to generate tactile maps from maps scanned from geography textbooks would be extremely useful to these transcriptors, speeding up their production. As a first step towards such a platform, this paper proposes a method to infer the scale and the content of a map from its image. We used convolutional neural networks trained with a few hundred maps from French geography textbooks, and the results are promising both for inferring labels about the content of the map (e.g. "there are roads, cities and administrative boundaries") and for inferring the extent of the map (e.g. a map of France or of Europe).
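Predicting several content labels for one map image is a multi-label task, typically handled with one sigmoid output per label and a threshold rather than a single softmax class. The sketch below illustrates only that final step; the label names and threshold are assumptions, not the paper's exact setup:

```python
import math

# Illustrative label set for map content; the real platform would use
# whatever labels the transcriptors need.
LABELS = ["roads", "cities", "administrative boundaries", "rivers"]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict_labels(logits, threshold=0.5):
    """Turn raw per-label network outputs into a list of label strings."""
    return [name for name, z in zip(LABELS, logits)
            if sigmoid(z) >= threshold]
```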


2021 ◽  
Author(s):  
Kosuke Honda ◽  
Hamido Fujita

In recent years, template-based methods such as Siamese network trackers and correlation filter (CF) based trackers have achieved state-of-the-art performance on several benchmarks. Recent Siamese network trackers use deep features extracted from convolutional neural networks to locate the target. However, their tracking performance decreases when distractors similar to the object are present or the target object is deformed. On the other hand, CF-based trackers use handcrafted features (e.g., HOG features) to spatially locate the target. These two approaches have complementary characteristics due to differences in their learning methods, the features they use, and the size of their search regions. We also found that these trackers are complementary in terms of benchmark performance. Therefore, we propose the Complementary Tracking framework using Average peak-to-correlation energy (CTA). CTA is a generic object tracking framework that connects CF-trackers and Siamese-trackers in parallel and exploits their complementary characteristics. In CTA, when a tracking failure of the Siamese tracker is detected using the average peak-to-correlation energy (APCE), an evaluation index of the response map matrix, the CF-trackers correct the output. In experiments on OTB100, CTA significantly improves performance over the original trackers for several combinations of Siamese-trackers and CF-trackers.
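The failure signal above is the standard APCE measure: a sharp single peak in the response map (confident tracking) yields a high value, while a flat or multi-modal map (likely failure) yields a low one. A minimal sketch:

```python
# Average peak-to-correlation energy of a 2-D response map:
#   APCE = |F_max - F_min|^2 / mean((F - F_min)^2)

def apce(response_map):
    flat = [v for row in response_map for v in row]
    f_max, f_min = max(flat), min(flat)
    energy = sum((v - f_min) ** 2 for v in flat) / len(flat)
    return (f_max - f_min) ** 2 / energy
```

A framework like CTA would compare this value against a running threshold and hand control to the CF-tracker when it drops.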


2020 ◽  
Vol 16 (5) ◽  
pp. 155014772092048
Author(s):  
Miguel Ángel López-Medina ◽  
Macarena Espinilla ◽  
Chris Nugent ◽  
Javier Medina Quero

The automatic detection of falls within environments where sensors are deployed has attracted considerable research interest due to the prevalence and impact of falls, especially among the elderly. In this work, we analyze the capabilities of non-invasive thermal vision sensors to detect falls using several architectures of convolutional neural networks. First, we integrate two thermal vision sensors with different capabilities: (1) low resolution with a wide viewing angle and (2) high resolution with a central viewing angle. Second, we include a fuzzy representation of the thermal information. Third, we generate a large dataset from a small set of images using ad hoc data augmentation, which increases the original dataset size by generating new synthetic images. Fourth, we define three types of convolutional neural networks, each adapted to a thermal vision sensor, in order to evaluate the impact of the architecture on fall detection performance. The results show encouraging performance in single-occupancy contexts. In multiple occupancy, the low-resolution thermal vision sensor with a wide viewing angle obtains better performance and a shorter learning time than the high-resolution thermal vision sensor with a central viewing angle.
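A fuzzy representation of a thermal pixel maps its raw temperature to membership degrees in linguistic classes before it reaches the network. The concrete membership functions below (triangular, in degrees Celsius) are an assumption for illustration, not the ones defined in the paper:

```python
def triangular(x, a, b, c):
    """Triangular membership function peaking at b, zero outside [a, c]."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def fuzzify_temperature(t):
    """Map one thermal reading to degrees of 'background' vs. 'person'."""
    return {
        "background": triangular(t, 10.0, 20.0, 30.0),
        "person": triangular(t, 25.0, 35.0, 45.0),
    }
```

Feeding these bounded membership maps instead of raw temperatures makes the input robust to sensor calibration drift.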


Author(s):  
R. Rios-Cabrera ◽  
I Lopez-Juarez ◽  
Hsieh Sheng-Jen

An image processing methodology for the extraction of potato properties is explained. The objective is to determine potato quality by evaluating physical properties and using Artificial Neural Networks (ANNs) to find misshapen potatoes. A comparative analysis of three connectionist models (Backpropagation, Perceptron, and FuzzyARTMAP), evaluating their speed and stability in classifying the extracted properties, is presented. The methodology for image processing and pattern feature extraction is presented together with some results. These results showed that FuzzyARTMAP outperformed the other models due to its stability and convergence speed, with times as low as 1 ms per pattern, which demonstrates its suitability for real-time inspection. Several algorithms to detect potato defects such as greening, scab, and cracks are proposed, which can be effectively used for grading potatoes of different quality.
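One physical property such a pipeline could extract for the misshapen check is circularity, which is 1.0 for a perfect circular outline and drops for irregular ones. The formula is standard; the threshold below is illustrative, not taken from the paper:

```python
import math

def circularity(area, perimeter):
    """4*pi*A / P^2: shape compactness from a segmented outline."""
    return 4.0 * math.pi * area / (perimeter ** 2)

def looks_misshapen(area, perimeter, threshold=0.8):
    """Flag an outline whose compactness falls below the threshold."""
    return circularity(area, perimeter) < threshold
```

In practice a feature vector of several such descriptors, not a single threshold, would be fed to the ANN classifiers compared above.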


Jurnal INFORM ◽  
2020 ◽  
Vol 5 (2) ◽  
pp. 99
Author(s):  
Andi Sanjaya ◽  
Endang Setyati ◽  
Herman Budianto

This research was conducted to observe the use of the LeNet convolutional neural network (CNN) architecture for recognising Pandava mask objects. The dataset consisted of 200 images per class, 1,000 images in total. Input layers of 32×32, 64×64, 128×128, 224×224, and 256×256 were tested. The trial with the 32×32 input layer succeeded and trained faster than the other layers, and its accuracy and validation values showed neither under-fitting nor over-fitting. Furthermore, when the activation of the second dense layer was changed from ReLU to sigmoid, sigmoid gave better results in terms of time and was less prone to over-fitting. The mean accuracy achieved was 0.96.
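Part of why the smallest input layer trains fastest is simple arithmetic: the feature maps, and hence the work per layer, shrink quadratically with input size. The sketch below assumes a LeNet-style pattern (valid 5×5 convolutions followed by 2×2 pooling), which may differ in detail from the exact architecture used in the study:

```python
def feature_map_size(size, kernel=5, pool=2, blocks=2):
    """Spatial size after `blocks` rounds of valid conv + pooling."""
    for _ in range(blocks):
        size = (size - kernel + 1) // pool
    return size

# 32x32 input -> 5x5 maps before the dense layers (the classic LeNet sizes),
# while 224x224 leaves 53x53 maps, roughly a hundred times more activations.
```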


2018 ◽  
pp. 99-103
Author(s):  
D. S. Kolesnikov ◽  
D. A. Kuznetsov

State-of-the-art convolutional neural networks provide high accuracy on a wide range of problems. This is usually achieved by significantly increasing their computational complexity and by representing the network parameters as single-precision floating-point numbers. However, due to limited resources, applying such networks in embedded systems and real-time mobile applications is problematic. One method to solve this problem is to reduce the bit depth of the data and use integer arithmetic; for this purpose, the network parameters are quantized. When quantizing, it is necessary to ensure a minimal loss of recognition accuracy. The article proposes using an optimal uniform quantizer with an adaptive step. The quantizer step depends on the distribution function of the quantized parameters, which reduces the effect of quantization error on recognition accuracy. Approaches to improving the quality of quantization are also described. The proposed quantization method is evaluated on the CIFAR-10 database. It is shown that, with an 8-bit representation of the network parameters, the optimal uniform quantizer achieves the accuracy of the initially trained network on CIFAR-10.
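A uniform quantizer with a data-adapted step can be sketched in a few lines. The paper adapts the step to the parameter distribution function; the min/max range rule below is one simple stand-in for that, shown only to make the mechanism concrete:

```python
def quantize(values, bits=8):
    """Map floats to integers 0..2^bits-1 with a step fit to their range."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / (2 ** bits - 1)          # adaptive step from the data
    q = [round((v - lo) / step) for v in values]  # integer codes
    dequantized = [lo + n * step for n in q]      # reconstruction
    return q, dequantized
```

Inference then runs on the integer codes, and the worst-case reconstruction error per weight is half a step, which is what tying the step to the parameter distribution keeps small.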


Author(s):  
Josep Arús-Pous ◽  
Simon Johansson ◽  
Oleksii Prykhodko ◽  
Esben Jannik Bjerrum ◽  
Christian Tyrchan ◽  
...  

Recurrent neural networks (RNNs) trained with a set of molecules represented as unique (canonical) SMILES strings have shown the capacity to create large chemical spaces of valid and meaningful structures. Herein we perform an extensive benchmark on models trained with subsets of GDB-13 of different sizes (1 million, 10,000, and 1,000 molecules), with different SMILES variants (canonical, randomized, and DeepSMILES), with two different recurrent cell types (LSTM and GRU), and with different hyperparameter combinations. To guide the benchmark, new metrics were developed that characterize the generated chemical space with respect to its uniformity, closedness, and completeness. The results show that models using LSTM cells trained with 1 million randomized SMILES, a non-unique molecular string representation, generate larger chemical spaces than the other approaches and represent the target chemical space more accurately. Specifically, a model trained with randomized SMILES was able to generate almost all molecules from GDB-13 with a quasi-uniform probability. Models trained with smaller samples show an even bigger improvement when trained with randomized SMILES. Additionally, models were trained on molecules obtained from ChEMBL and illustrate again that training with randomized SMILES leads to models with a better representation of the drug-like chemical space. Namely, the model trained with randomized SMILES was able to generate at least twice as many unique molecules with the same property distribution as one trained with canonical SMILES.
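Metrics of the kind described above can be sketched over generated samples. The formulas below are illustrative stand-ins, not the paper's exact definitions: completeness as the fraction of the target set ever generated, and a crude uniformity check via the spread of generation counts:

```python
from collections import Counter

def completeness(generated, target_set):
    """Fraction of the target chemical space that was generated at least once."""
    return len(set(generated) & target_set) / len(target_set)

def max_to_min_count_ratio(generated):
    """Spread of per-molecule generation counts; 1.0 means perfectly uniform."""
    counts = Counter(generated)
    return max(counts.values()) / min(counts.values())
```

On real output these would be computed over millions of sampled SMILES after canonicalization, so that randomized variants of the same molecule count as one.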


2019 ◽  
Vol 11 (22) ◽  
pp. 2608 ◽  
Author(s):  
Dong Wang ◽  
Ying Li ◽  
Li Ma ◽  
Zongwen Bai ◽  
Jonathan Chan

In recent years, convolutional neural networks (CNNs) have shown promising performance in multispectral (MS) and panchromatic (PAN) image fusion (MS pansharpening). However, the small-scale data and the vanishing-gradient problem have prevented existing CNN-based fusion approaches from leveraging deeper networks, which potentially have a better ability to represent the complex nonlinear mapping between the input (source) and target (fused) images. In this paper, we introduce a very deep network with dense blocks and residual learning to tackle these problems. The proposed network takes advantage of the dense connections in dense blocks, which link any two convolution layers to facilitate gradient flow and implicit deep supervision during training. In addition, reusing feature maps reduces the number of parameters, which helps curb the overfitting that results from the small-scale data. Residual learning is explored to make it easier for the model to generate the MS image with high spatial resolution. The proposed network is evaluated in experiments on three datasets, achieving competitive or superior performance; e.g., the spectral angle mapper (SAM) is decreased by over 10% on GaoFen-2 compared with other state-of-the-art methods.
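The SAM metric cited in the results measures the angle between a fused pixel's spectral vector and the reference spectrum, with 0 meaning identical spectral direction. A minimal per-pixel sketch:

```python
import math

def sam(spectrum_a, spectrum_b):
    """Spectral angle (radians) between two per-pixel band vectors."""
    dot = sum(a * b for a, b in zip(spectrum_a, spectrum_b))
    na = math.sqrt(sum(a * a for a in spectrum_a))
    nb = math.sqrt(sum(b * b for b in spectrum_b))
    # clamp against floating-point drift before acos
    return math.acos(max(-1.0, min(1.0, dot / (na * nb))))
```

The reported dataset-level score is the average of this angle over all pixels; being scale-invariant, it isolates spectral distortion from brightness differences.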

