Hybrid pooling with wavelets for convolutional neural networks

State Of The Art ◽

Computational Cost ◽

Relevant Information ◽

Accuracy Improvement ◽

Proposed Model ◽

Benchmark Datasets ◽

Augmentation Techniques ◽

High Computational Cost

Detecting and classifying objects correctly is a constant challenge: recognizing them at different scales and in different scenarios, sometimes cropped or badly lit, is not an easy task. Convolutional neural networks (CNNs) have become a widely applied technique because they are fully trainable and well suited to feature extraction. However, the growing number of CNN applications constantly pushes for improved accuracy. Initially, those improvements involved large datasets, augmentation techniques, and complex algorithms, all of which may have a high computational cost. Nevertheless, feature extraction is known to be the heart of the problem. As a result, other approaches combine different technologies to extract better features and improve accuracy without the need for more powerful hardware. In this paper, we propose a hybrid pooling method that incorporates multiresolution analysis within the CNN layers to reduce the feature map size without losing details. To prevent relevant information from being lost during downsampling, an existing pooling method is combined with the wavelet transform, keeping those details "alive" and enriching other stages of the CNN. Achieving better-quality features improves CNN accuracy. To validate this study, ten pooling methods, including the proposed model, are tested on four benchmark datasets. The results are compared with four of the evaluated methods, which are also considered state of the art.
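The idea of combining a conventional pooling operation with a wavelet transform can be illustrated with a minimal sketch. It is not the authors' model: it assumes a single-level 2D Haar transform, 2x2 max pooling as the "existing pooling method", and a fixed blend weight `alpha`, none of which is specified in the abstract. The function names are hypothetical.

```python
def max_pool_2x2(fmap):
    """Non-overlapping 2x2 max pooling over a 2D feature map (list of rows)."""
    rows, cols = len(fmap), len(fmap[0])
    return [
        [max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
         for c in range(0, cols - 1, 2)]
        for r in range(0, rows - 1, 2)
    ]


def haar_ll_2x2(fmap):
    """Approximation (LL) subband of a single-level orthonormal 2D Haar transform.

    Each LL coefficient is the sum of a 2x2 block divided by 2.
    """
    rows, cols = len(fmap), len(fmap[0])
    return [
        [(fmap[r][c] + fmap[r][c + 1] + fmap[r + 1][c] + fmap[r + 1][c + 1]) / 2.0
         for c in range(0, cols - 1, 2)]
        for r in range(0, rows - 1, 2)
    ]


def hybrid_pool(fmap, alpha=0.5):
    """Blend max pooling with the rescaled Haar LL subband (assumed combination rule).

    LL / 2 equals the block mean, so both terms are on the same scale.
    """
    pooled_max = max_pool_2x2(fmap)
    ll = haar_ll_2x2(fmap)
    return [
        [alpha * m + (1.0 - alpha) * (a / 2.0) for m, a in zip(max_row, ll_row)]
        for max_row, ll_row in zip(pooled_max, ll)
    ]


feature_map = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
print(hybrid_pool(feature_map))  # 2x2 output, half the size in each dimension
```

Both branches halve the feature map in each dimension; the wavelet branch keeps a smoothed summary of every block, while the max branch keeps the strongest activation.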

LdsConv: Learned Depthwise Separable Convolutions by Group Pruning

Sensors ◽

10.3390/s20154349 ◽

2020 ◽

Vol 20 (15) ◽

pp. 4349

Author(s):

Wenxiang Lin ◽

Yan Ding ◽

Hua-Liang Wei ◽

Xinglin Pan ◽

Yutong Zhang

Keyword(s):

Neural Networks ◽

State Of The Art ◽

Computational Cost ◽

The State ◽

Direct Replacement ◽

Improved Accuracy ◽

Pruning Technique ◽

Strong Capacity

Standard convolutional filters usually capture overlapping features unnecessarily, which wastes computation. In this paper, we aim to solve this problem by proposing a novel Learned Depthwise Separable Convolution (LdsConv) operation that is smart but has a strong capacity for learning. It integrates the pruning technique into the design of convolutional filters, and is formulated as a generic convolutional unit that can be used as a direct replacement for convolutions without any adjustment to the architecture. To show the effectiveness of the proposed method, experiments are carried out using state-of-the-art convolutional neural networks (CNNs), including ResNet, DenseNet, SE-ResNet and MobileNet. The results show that simply replacing the original convolution with LdsConv in these CNNs achieves significantly improved accuracy while reducing computational cost. For ResNet50, the FLOPs can be reduced by 40.9% while the accuracy on ImageNet increases.
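To see why depthwise separable convolutions reduce cost, the following sketch counts multiply-accumulate operations (MACs) for a standard convolution and for a plain depthwise separable one. It describes the generic operation only, not LdsConv's learned, pruning-based variant; the layer sizes are made-up values for illustration, and stride 1 with "same" padding is assumed.

```python
def standard_conv_macs(h, w, c_in, c_out, k):
    """MACs of a standard k x k convolution producing an h x w x c_out output."""
    return h * w * c_in * c_out * k * k


def depthwise_separable_macs(h, w, c_in, c_out, k):
    """MACs of a k x k depthwise convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = h * w * c_in * k * k   # one k x k filter per input channel
    pointwise = h * w * c_in * c_out   # 1 x 1 convolution mixes the channels
    return depthwise + pointwise


# Hypothetical layer: 56 x 56 output, 64 input channels, 128 output channels, 3 x 3 kernel.
std = standard_conv_macs(56, 56, 64, 128, 3)
sep = depthwise_separable_macs(56, 56, 64, 128, 3)
print(std, sep, sep / std)  # the ratio equals 1/c_out + 1/k**2
```

The ratio of the two counts is 1/c_out + 1/k^2, so for 3x3 kernels and many output channels the separable form needs roughly one ninth of the operations.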

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

High Performance Gesture Recognition via Effective and Efficient Temporal Modeling

10.24963/ijcai.2019/141 ◽

2019 ◽

Author(s):

Yang Yi ◽

Feng Ni ◽

Yuexin Ma ◽

Xinge Zhu ◽

Yuankai Qi ◽

...

Keyword(s):

Neural Networks ◽

Gesture Recognition ◽

High Performance ◽

Short Term Memory ◽

State Of The Art ◽

Computational Cost ◽

Temporal Modeling ◽

Spatiotemporal Features ◽

Public Datasets

State-of-the-art hand gesture recognition methods have investigated spatiotemporal features based on 3D convolutional neural networks (3DCNNs) or convolutional long short-term memory (ConvLSTM). However, they often suffer from inefficiency due to the high computational complexity of their network structures. In this paper, we focus instead on 1D convolutional neural networks and propose a simple and efficient architectural unit, the Multi-Kernel Temporal Block (MKTB), which models multi-scale temporal responses by explicitly applying different temporal kernels. Then, we present a Global Refinement Block (GRB), an attention module for shaping the global temporal features based on cross-channel similarity. By incorporating the MKTB and GRB, our architecture can effectively explore spatiotemporal features within a tolerable computational cost. Extensive experiments conducted on public datasets demonstrate that our proposed model achieves state-of-the-art results with higher efficiency. Moreover, the proposed MKTB and GRB are plug-and-play modules, and experiments on other tasks, such as video understanding and video-based person re-identification, also show good efficiency and generalization capability.
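The phrase "explicitly applying different temporal kernels" can be pictured with a small sketch: the same 1D sequence is filtered with kernels of several lengths and the responses are merged. This is only an illustration of multi-scale 1D temporal filtering; the kernel values, the zero padding, and the merge by summation are assumptions, and the actual MKTB design is not described in the abstract.

```python
def conv1d_same(signal, kernel):
    """1D cross-correlation with zero padding; output length equals input length.

    Assumes an odd kernel length so the padding is symmetric.
    """
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [
        sum(kernel[j] * padded[i + j] for j in range(len(kernel)))
        for i in range(len(signal))
    ]


def multi_kernel_temporal(signal, kernels):
    """Apply every temporal kernel to the signal and sum the responses per time step."""
    responses = [conv1d_same(signal, kernel) for kernel in kernels]
    return [sum(values) for values in zip(*responses)]


# Hypothetical per-frame feature over four time steps, filtered at two temporal scales.
frames = [1.0, 2.0, 3.0, 4.0]
kernels = [[1.0], [1.0, 1.0, 1.0]]
print(multi_kernel_temporal(frames, kernels))
```

Short kernels respond to frame-level changes and longer kernels to slower motion, which is the sense in which the combined response is multi-scale.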

Multi-granularity pruning for deep residual networks

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-200771 ◽

2020 ◽

Vol 39 (5) ◽

pp. 7403-7410

Author(s):

Yangke Huang ◽

Zhiming Wang

Keyword(s):

Neural Networks ◽

Compression Ratio ◽

Gradient Descent ◽

Computational Cost ◽

Acceleration Ratio ◽

Network Pruning ◽

High Computational Cost ◽

Pruning Methods

Network pruning has been widely used to reduce the high computational cost of deep convolutional neural networks (CNNs). The dominant pruning method, channel pruning, removes filters in layers based on their importance or on sparsity training. However, such methods often give a limited acceleration ratio and encounter difficulties when pruning CNNs with skip connections. Block pruning methods take a sequence of consecutive layers (e.g., Conv-BN-ReLU) as a block and remove an entire block each time. However, previous methods usually introduce new parameters to help pruning, which leads to additional parameters and extra computation. This work proposes a novel multi-granularity pruning approach that combines block pruning with channel pruning (BPCP). The block pruning (BP) module removes blocks by directly searching for redundant blocks with gradient descent and leaves no extra parameters in the final model, which is friendly to hardware optimization. The channel pruning (CP) module removes redundant channels based on importance criteria and handles CNNs with skip connections properly, which further improves the overall compression ratio. As a result, for CIFAR10, BPCP reduces the number of parameters and MACs of a ResNet56 model by up to 78.9% and 80.3% respectively, with a <3% accuracy drop. In terms of speed, it gives a 3.17 acceleration ratio. Our code has been made available at https://github.com/Pokemon-Huang/BPCP.
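A percentage reduction can be restated as a ratio, which makes the reported figures easier to compare. The helper below is a hypothetical convenience function, not part of the BPCP code; it only applies the identity ratio = 1 / (1 - fraction removed) to the percentages quoted in the abstract.

```python
def reduction_to_ratio(fraction_removed):
    """Convert a removed fraction (0 <= f < 1) into an original/remaining ratio."""
    if not 0.0 <= fraction_removed < 1.0:
        raise ValueError("fraction_removed must be in [0, 1)")
    return 1.0 / (1.0 - fraction_removed)


# Figures quoted in the abstract for ResNet56 on CIFAR10.
print(round(reduction_to_ratio(0.789), 2))  # parameter compression ratio
print(round(reduction_to_ratio(0.803), 2))  # theoretical MAC reduction ratio
```

The 3.17 acceleration ratio in the abstract is reported as a speed figure and need not match the MAC-based ratio, since measured speed also depends on memory access and hardware.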

A Novel Electricity Theft Detection Scheme Based on Text Convolutional Neural Networks

Energies ◽

10.3390/en13215758 ◽

2020 ◽

Vol 13 (21) ◽

pp. 5758

Author(s):

Xiaofeng Feng ◽

Hengyu Hui ◽

Ziyang Liang ◽

Wenchong Guo ◽

Huakun Que ◽

...

Keyword(s):

Neural Networks ◽

Data Augmentation ◽

State Of The Art ◽

Electricity Consumption ◽

Temporal Data ◽

Detection Scheme ◽

Fine Grained ◽

Electricity Theft ◽

Proposed Model

Electricity theft decreases electricity revenues and poses risks to the safety of power usage, a problem that has become increasingly challenging. As the mainstream in the relevant studies, state-of-the-art data-driven approaches mainly detect electricity theft events from the perspective of the correlations between different daily or weekly loads, which is relatively inadequate for extracting features from hours or more of fine-grained temporal data. In view of these deficiencies, we propose a novel electricity theft detection scheme based on text convolutional neural networks (TextCNN). Specifically, we convert electricity consumption measurements over a horizon of interest into a two-dimensional time series containing the intraday electricity features. Based on this data structure, the proposed method can accurately capture various periodic features of electricity consumption. Moreover, a data augmentation method is proposed to cope with the imbalance of electricity theft data. Extensive experimental results based on realistic Chinese and Irish datasets indicate that the proposed model achieves better performance than other existing methods.
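The conversion of a consumption record into a two-dimensional time series can be sketched as a simple reshape. The layout used here (one row per day, one column per intraday reading) and the parameter name `readings_per_day` are assumptions for illustration; the abstract does not give the exact arrangement.

```python
def to_daily_matrix(readings, readings_per_day):
    """Reshape a flat list of meter readings into rows of one day each.

    Trailing readings that do not fill a complete day are dropped.
    """
    if readings_per_day <= 0:
        raise ValueError("readings_per_day must be positive")
    n_days = len(readings) // readings_per_day
    return [
        list(readings[day * readings_per_day:(day + 1) * readings_per_day])
        for day in range(n_days)
    ]


# Hypothetical meter with four readings per day, observed for two and a half days.
meter = [0.4, 0.9, 1.2, 0.5, 0.3, 1.0, 1.1, 0.6, 0.2, 0.8]
for row in to_daily_matrix(meter, readings_per_day=4):
    print(row)
```

In such a layout, reading along a row exposes intraday patterns and reading down a column exposes day-to-day patterns at the same time of day, which is what lets a convolution over the matrix pick up periodic behavior.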

Optimizing 3D Convolution Kernels on Stereo Matching for Resource Efficient Computations

Sensors ◽

10.3390/s21206808 ◽

2021 ◽

Vol 21 (20) ◽

pp. 6808

Author(s):

Jianqiang Xiao ◽

Dianbo Ma ◽

Satoshi Yamane

Keyword(s):

Neural Networks ◽

Computational Complexity ◽

Stereo Matching ◽

State Of The Art ◽

Computational Cost ◽

The State ◽

Matching Network ◽

Convolution Kernels ◽

Low Computational Cost

Although recent stereo matching algorithms achieve significant results on public benchmarks, the problem of heavy computation remains unsolved. Most works focus on designing an architecture to reduce the computational complexity, whereas we aim to solve the problem by optimizing the 3D convolution kernels of the Pyramid Stereo Matching Network (PSMNet). In this paper, we design a series of comparative experiments exploring the performance of well-known convolution kernels on PSMNet. Our model reduces the computational cost from 256.66G MAdd (Multiply-Add operations) to 69.03G MAdd (from 198.47G MAdd to 10.84G MAdd when considering only the 3D convolutional neural networks) without losing accuracy. On the Scene Flow and KITTI 2015 datasets, our model achieves results comparable to the state of the art at a low computational cost.

iCaps-Dfake: An Integrated Capsule-Based Model for Deepfake Image and Video Detection

Future Internet ◽

10.3390/fi13040093 ◽

2021 ◽

Vol 13 (4) ◽

pp. 93

Author(s):

Samar Samir Khalil ◽

Sherin M. Youssef ◽

Sherine Nagy Saleh

Keyword(s):

Neural Networks ◽

Performance Metrics ◽

State Of The Art ◽

Local Binary Patterns ◽

Extraction Methods ◽

Media Forensics ◽

Proposed Model ◽

Generalization Problem ◽

Video Detection ◽

Benchmark Datasets

Fake media is spreading like wildfire all over the internet as a result of the great advancement in deepfake creation tools and the huge interest researchers and corporations are showing in exploring their limits. Now anyone can create manipulated media to defame or humiliate others, or even scam them out of their money, with the click of a button. In this research, a new deepfake detection approach, iCaps-Dfake, is proposed that competes with state-of-the-art techniques of deepfake video detection and addresses their low generalization problem. Two feature extraction methods are combined, texture-based Local Binary Patterns (LBP) and a Convolutional Neural Network (CNN) based modified High-Resolution Network (HRNet), along with an application of capsule neural networks (CapsNets) implementing a concurrent routing technique. Experiments have been conducted on large benchmark datasets to evaluate the performance of the proposed model. Several performance metrics are applied and the experimental results are analyzed. The proposed model was primarily trained and tested on the DeepFakeDetectionChallenge-Preview (DFDC-P) dataset, then tested on Celeb-DF to examine its generalization capability. Experiments achieved an Area Under the Curve (AUC) score improvement of 20.25% over state-of-the-art models.

A hybrid differential evolution approach to designing deep convolutional neural networks for image classification

10.26686/wgtn.13158293 ◽

2020 ◽

Author(s):

Bin Wang ◽

Y Sun ◽

Bing Xue ◽

Mengjie Zhang

Keyword(s):

Neural Networks ◽

Differential Evolution ◽

Image Classification ◽

State Of The Art ◽

Variable Length ◽

De Algorithm ◽

Benchmark Datasets ◽

Encoding Strategy

Convolutional Neural Networks (CNNs) have demonstrated their superiority in image classification, and evolutionary computation (EC) methods have recently been surging as a way to design CNN architectures automatically, saving the tedious work of designing them manually. In this paper, a new hybrid differential evolution (DE) algorithm with a newly added crossover operator, named DECNN, is proposed to evolve CNN architectures of any length. There are three new ideas in the proposed DECNN method. Firstly, an existing effective encoding scheme is refined to cater for variable-length CNN architectures; secondly, new mutation and crossover operators are developed for variable-length DE to optimise the hyperparameters of CNNs; finally, a new second crossover is introduced to evolve the depth of the CNN architectures. The proposed algorithm is tested on six widely used benchmark datasets and the results are compared to 12 state-of-the-art methods, which shows that the proposed method is highly competitive with the state-of-the-art algorithms. Furthermore, the proposed method is also compared with IPPSO, a method using particle swarm optimisation with a similar encoding strategy, and the proposed DECNN outperforms IPPSO in terms of accuracy.
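For readers unfamiliar with differential evolution, the sketch below shows the classic DE/rand/1/bin step on fixed-length real vectors. It is not DECNN: the variable-length encoding and the second crossover described in the abstract are not reproduced, and the function name, the scale factor `f`, and the crossover rate `cr` are illustrative choices.

```python
import random


def de_rand_1_bin(population, target_idx, f=0.5, cr=0.9, rng=None):
    """Build one trial vector with classic DE/rand/1 mutation and binomial crossover.

    Returns (trial, mutant) so the caller can inspect both.
    """
    if len(population) < 4:
        raise ValueError("DE/rand/1 needs at least four individuals")
    rng = rng or random.Random()
    target = population[target_idx]

    # Pick three distinct individuals, none of them the target.
    candidates = [i for i in range(len(population)) if i != target_idx]
    a, b, c = (population[i] for i in rng.sample(candidates, 3))

    # Mutation: base vector plus a scaled difference of two others.
    mutant = [a[j] + f * (b[j] - c[j]) for j in range(len(target))]

    # Binomial crossover; one forced position guarantees the trial inherits from the mutant.
    forced = rng.randrange(len(target))
    trial = [
        mutant[j] if (j == forced or rng.random() < cr) else target[j]
        for j in range(len(target))
    ]
    return trial, mutant


population = [[0.0, 0.0, 0.0], [1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 5.0, 7.0]]
trial, mutant = de_rand_1_bin(population, target_idx=0, rng=random.Random(0))
print(trial)
```

In a full loop the trial vector would be evaluated and would replace the target only if its fitness is at least as good; for architecture search, that fitness would be the accuracy of the decoded network.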

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

Soft Filter Pruning for Accelerating Deep Convolutional Neural Networks

10.24963/ijcai.2018/309 ◽

2018 ◽

Cited By ~ 79

Author(s):

Yang He ◽

Guoliang Kang ◽

Xuanyi Dong ◽

Yanwei Fu ◽

Yi Yang

Keyword(s):

Neural Networks ◽

State Of The Art ◽

The State ◽

Training Data ◽

Inference Procedure ◽

Accuracy Improvement ◽

Large Capacity ◽

Pruning Methods

This paper proposes a Soft Filter Pruning (SFP) method to accelerate the inference procedure of deep Convolutional Neural Networks (CNNs). Specifically, the proposed SFP enables the pruned filters to be updated when training the model after pruning. SFP has two advantages over previous works: (1) Larger model capacity. Updating previously pruned filters provides our approach with a larger optimization space than fixing the filters to zero. Therefore, the network trained by our method has a larger model capacity to learn from the training data. (2) Less dependence on the pre-trained model. Large capacity enables SFP to train from scratch and prune the model simultaneously. In contrast, previous filter pruning methods must be conducted on the basis of a pre-trained model to guarantee their performance. Empirically, SFP from scratch outperforms the previous filter pruning methods. Moreover, our approach has been demonstrated to be effective for many advanced CNN architectures. Notably, on ILSVRC-2012, SFP reduces FLOPs by more than 42% on ResNet-101 while even improving top-5 accuracy by 0.2%, which advances the state of the art. Code is publicly available on GitHub: https://github.com/he-y/softfilter-pruning
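The difference between soft and hard pruning can be shown in a few lines: selected filters are set to zero but stay in the model, so the next gradient step may move them away from zero again. The sketch is a toy under stated assumptions, not the released implementation: filters are flat lists of weights, the L2 norm is assumed as the selection criterion, and the gradients are made-up numbers.

```python
import math


def l2_norm(filt):
    """L2 norm of one filter given as a flat list of weights."""
    return math.sqrt(sum(w * w for w in filt))


def soft_prune(filters, prune_ratio):
    """Zero the filters with the smallest L2 norms but keep them in the model.

    Returns the new filter list and the sorted indices that were zeroed.
    """
    n_prune = int(len(filters) * prune_ratio)
    order = sorted(range(len(filters)), key=lambda i: l2_norm(filters[i]))
    pruned = set(order[:n_prune])
    new_filters = [
        [0.0] * len(f) if i in pruned else list(f)
        for i, f in enumerate(filters)
    ]
    return new_filters, sorted(pruned)


def sgd_step(filters, grads, lr):
    """Plain gradient step applied to every filter, including the zeroed ones."""
    return [
        [w - lr * g for w, g in zip(f, gf)]
        for f, gf in zip(filters, grads)
    ]


filters = [[3.0, 4.0], [0.1, 0.0], [1.0, 1.0], [0.0, 0.2]]
filters, zeroed = soft_prune(filters, prune_ratio=0.5)
grads = [[1.0, -1.0]] * len(filters)      # made-up gradients for illustration
filters = sgd_step(filters, grads, lr=0.1)
print(zeroed, filters)                     # zeroed filters are non-zero again after the step
```

In a training loop, pruning and updating would alternate, and only the filters still selected at the end would be removed for inference.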

Adaptive Region Embedding for Text Classification

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33017314 ◽

2019 ◽

Vol 33 ◽

pp. 7314-7321

Author(s):

Liuyu Xiang ◽

Xiaoming Jin ◽

Lan Yi ◽

Guiguang Ding

Keyword(s):

Neural Networks ◽

Deep Learning ◽

Text Classification ◽

State Of The Art ◽

Context Information ◽

Great Success ◽

Recurrent Networks ◽

Learning Models ◽

Benchmark Datasets

Deep learning models such as convolutional neural networks and recurrent networks are widely applied in text classification. In spite of their great success, most deep learning models neglect the importance of modeling context information, which is crucial to understanding texts. In this work, we propose the Adaptive Region Embedding to learn context representations that improve text classification. Specifically, a meta-network is learned to generate a context matrix for each region, and each word interacts with its corresponding context matrix to produce the regional representation for further classification. Compared to previous models that are designed to capture context information, our model contains fewer parameters and is more flexible. We extensively evaluate our method on 8 benchmark datasets for text classification. The experimental results show that our method achieves state-of-the-art performance and effectively avoids word ambiguity.