Multiscale Convolutional Neural Networks for Hand Detection

2017 ◽  
Vol 2017 ◽  
pp. 1-13 ◽  
Author(s):  
Shiyang Yan ◽  
Yizhang Xia ◽  
Jeremy S. Smith ◽  
Wenjin Lu ◽  
Bailing Zhang

Unconstrained hand detection in still images plays an important role in many hand-related vision problems, for example, hand tracking, gesture analysis, human action recognition, human-machine interaction, and sign language recognition. Although hand detection has been extensively studied for decades, it is still a challenging task with many problems to be tackled. The contributing factors for this complexity include heavy occlusion, low resolution, varying illumination conditions, different hand gestures, and the complex interactions between hands and objects or other hands. In this paper, we propose a multiscale deep learning model for unconstrained hand detection in still images. Deep learning models, and deep convolutional neural networks (CNNs) in particular, have achieved state-of-the-art performance in many vision benchmarks. Building on the region-based CNN (R-CNN) model, we propose a hand detection scheme based on candidate regions generated by a generic region proposal algorithm, followed by multiscale information fusion from the popular VGG16 model. Two benchmark datasets were used to validate the proposed method, namely, the Oxford Hand Detection Dataset and the VIVA Hand Detection Challenge. We achieved state-of-the-art results on the Oxford Hand Detection Dataset and satisfactory performance in the VIVA Hand Detection Challenge.

2016 ◽  
Vol 21 (9) ◽  
pp. 998-1003 ◽  
Author(s):  
Oliver Dürr ◽  
Beate Sick

Deep learning methods are currently outperforming traditional state-of-the-art computer vision algorithms in diverse applications and recently even surpassed human performance in object recognition. Here we demonstrate the potential of deep learning methods for high-content screening–based phenotype classification. We trained a deep learning classifier in the form of convolutional neural networks with approximately 40,000 publicly available single-cell images from samples treated with compounds from four classes known to lead to different phenotypes. The input data consisted of multichannel images. The construction of appropriate feature definitions was part of the training and carried out by the convolutional network, without the need for expert knowledge or handcrafted features. We compare our results against the recent state-of-the-art pipeline in which predefined features are extracted from each cell using specialized software and then fed into various machine learning algorithms (support vector machine, Fisher linear discriminant, random forest) for classification. The performance of all classification approaches is evaluated on an untouched test image set with known phenotype classes. Compared to the best reference machine learning algorithm, the misclassification rate is reduced from 8.9% to 6.6%.
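
To put the reported error rates in perspective, the short calculation below derives the absolute and relative reduction in misclassification. Only the two rates (8.9% and 6.6%) come from the abstract; the function and variable names are illustrative.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction of the baseline error that the improved model removes."""
    return (baseline - improved) / baseline


baseline_error = 0.089  # best reference machine learning algorithm (from the abstract)
cnn_error = 0.066       # convolutional network (from the abstract)

absolute_gain = baseline_error - cnn_error
relative_gain = relative_reduction(baseline_error, cnn_error)

print(f"absolute reduction: {absolute_gain * 100:.1f} percentage points")
print(f"relative reduction: {relative_gain * 100:.1f}%")
```

In other words, a drop of 2.3 percentage points corresponds to removing roughly a quarter of the baseline's errors.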


2021 ◽  
Vol 5 (2 (113)) ◽  
pp. 44-54
Author(s):  
Chingiz Kenshimov ◽  
Samat Mukhanov ◽  
Timur Merembayev ◽  
Didar Yedilkhan

For people with disabilities, sign language is the most important means of communication. Therefore, a growing number of researchers around the world are proposing intelligent hand gesture recognition systems. Such a system is intended not only for those who wish to understand sign language, but also for those who wish to speak using gesture recognition software. In this paper, a new benchmark dataset for Kazakh fingerspelling, suitable for training deep neural networks, is introduced. The dataset contains more than 10122 gesture samples for the 42 letters of the alphabet. The alphabet has its own peculiarities, as some characters are shown in motion, which may influence sign recognition. The paper describes the study, comparison, and testing of the LeNet, AlexNet, ResNet, and EfficientNet (EfficientNetB7) convolutional neural networks. The EfficientNet architecture is state-of-the-art (SOTA) and is the newest of the architectures under consideration. On this dataset, we showed that the LeNet and EfficientNet networks outperform the other competing algorithms. Moreover, EfficientNet can achieve state-of-the-art performance on other hand gesture datasets. The architecture and operating principle of these algorithms reflect the effectiveness of their application in sign language recognition. The CNN models are evaluated using accuracy and the penalty matrix. Over the training epochs, LeNet and EfficientNet showed better results: accuracy and the loss function followed similar trends. The results of EfficientNet were explained with the tools of the SHapley Additive exPlanations (SHAP) framework. SHAP explored the model to detect complex relationships between features in the images. Focusing on the SHAP tool may help to further improve the accuracy of the model.


2020 ◽  
Vol 2 (2) ◽  
pp. 32-37
Author(s):  
P. RADIUK ◽  

Over the last decade, a set of machine learning algorithms called deep learning has led to significant improvements in computer vision and in natural language recognition and processing. This has led to the widespread use of a variety of commercial, learning-based products in various fields of human activity. Despite this success, deep neural networks remain a black box. Today, setting hyperparameters and designing a network architecture requires experience and a lot of trial and error, and is based more on chance than on a scientific approach. At the same time, the task of simplifying deep learning is extremely urgent. To date, no simple ways have been found to establish the optimal values of the training hyperparameters, namely learning rate, sample size, data set, momentum, and weight decay. Grid search and random search of the hyperparameter space are extremely resource intensive. The choice of hyperparameters is critical for the training time and the final result. In addition, experts often choose one of the standard architectures (for example, ResNets) and ready-made sets of hyperparameters. However, such sets are usually suboptimal for specific practical tasks. The presented work offers an approach to finding the optimal set of training hyperparameters for a convolutional neural network (CNN). An integrated approach to all hyperparameters is valuable because there is an interdependence between them. The aim of the work is to develop an approach for setting a set of hyperparameters that will reduce the time spent designing a CNN and ensure the efficiency of its operation. In recent decades, the introduction of deep learning methods, in particular convolutional neural networks (CNNs), has led to impressive success in image and video processing. However, the training of CNNs has mostly been based on the employment of quasi-optimal hyperparameters.
Such an approach usually requires huge computational and time costs to train the network and does not guarantee a satisfactory result. Yet hyperparameters play a crucial role in the effectiveness of a CNN, as different hyperparameters lead to models with significantly different characteristics. Poorly selected hyperparameters generally lead to low model performance. The issue of choosing optimal hyperparameters for CNNs has not been resolved yet. The presented work proposes several practical approaches to setting hyperparameters, which reduce training time and increase the accuracy of the model. The article examines the behavior of the training and validation loss during underfitting and overfitting and closes with guidelines for reaching the optimization point. The paper also considers the regulation of learning rate and momentum to accelerate network training. All experiments are based on the widely used CIFAR-10 and CIFAR-100 datasets.
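
The abstract does not say how learning rate and momentum are regulated. As one plausible illustration, the sketch below implements a triangular "one-cycle" policy in which the learning rate rises and then falls while momentum moves in the opposite direction. The function name, the bounds, and the choice of this particular policy are assumptions made for illustration, not the paper's stated method.

```python
def one_cycle(step, total_steps, lr_min=0.01, lr_max=0.1,
              mom_min=0.85, mom_max=0.95):
    """Return (learning_rate, momentum) for a triangular one-cycle schedule.

    All default bounds are hypothetical values chosen for illustration.
    """
    half = total_steps / 2
    # position is 0.0 at both ends of the cycle and 1.0 at the midpoint
    position = 1.0 - abs(step - half) / half
    lr = lr_min + (lr_max - lr_min) * position
    # momentum runs opposite to the learning rate: high when lr is low
    momentum = mom_max - (mom_max - mom_min) * position
    return lr, momentum


# One (learning_rate, momentum) pair per training step, steps 0..100.
schedule = [one_cycle(step, 100) for step in range(101)]
```

In a training loop, each pair would be handed to the optimizer before the corresponding update; the midpoint of the cycle uses the largest learning rate and the smallest momentum.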


2021 ◽  
pp. 1-11
Author(s):  
Tianshi Mu ◽  
Kequan Lin ◽  
Huabing Zhang ◽  
Jian Wang

Deep learning is gaining significant traction in a wide range of areas. However, recent studies have demonstrated that deep learning exhibits a fatal weakness to adversarial examples. Due to the black-box nature and lack of transparency of deep learning, it is difficult to explain why adversarial examples exist and hard to defend against them. This study focuses on improving the adversarial robustness of convolutional neural networks. We first explore how adversarial examples behave inside the network through visualization. We find that adversarial examples produce perturbations in hidden activations, which form an amplification effect that fools the network. Motivated by this observation, we propose an approach, termed sanitizing hidden activations, to help the network correctly recognize adversarial examples by eliminating or reducing the perturbations in hidden activations. To demonstrate the effectiveness of our approach, we conduct experiments on three widely used datasets, MNIST, CIFAR-10, and ImageNet, and compare it with state-of-the-art defense techniques. The experimental results show that our sanitizing approach generalizes better in defending against different kinds of attacks and can effectively improve the adversarial robustness of convolutional neural networks.


Mathematics ◽  
2021 ◽  
Vol 9 (2) ◽  
pp. 189
Author(s):  
Feng Liu ◽  
Xuan Zhou ◽  
Xuehu Yan ◽  
Yuliang Lu ◽  
Shudong Wang

Steganalysis is a method for detecting whether objects contain secret messages. With the popularity of deep learning, steganalytic schemes using convolutional neural networks (CNNs) have become the chief method of combating steganography in recent years. However, the diversity of filters has not been fully utilized in current research. This paper constructs a new, effective network with diverse filter modules (DFMs) and squeeze-and-excitation modules (SEMs), which can better capture the embedding artifacts. As the essential parts, the DFMs combine convolution filters at three different scales to process information diversely, and the SEMs enhance the effective channels output by the DFMs. The experiments showed that our CNN is effective against content-adaptive steganographic schemes with different payloads, such as the S-UNIWARD and WOW algorithms. Moreover, some state-of-the-art methods are compared with our approach to demonstrate its outstanding performance.
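
To make the role of a squeeze-and-excitation module concrete, here is a minimal pure-Python sketch of the generic technique: average each channel to one number, pass those numbers through two small fully connected layers, and rescale every channel by the resulting gate. It is not the paper's SEM implementation; the function name, the weights `w1` and `w2`, and the toy feature maps are hypothetical.

```python
import math


def squeeze_excite(feature_maps, w1, w2):
    """Generic squeeze-and-excitation over a list of 2D channels.

    feature_maps: list of channels, each a list of rows of floats.
    w1: hidden x channels weight matrix; w2: channels x hidden weight matrix.
    Returns (rescaled_feature_maps, gates).
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
                for ch in feature_maps]
    # Excitation: fully connected -> ReLU -> fully connected -> sigmoid.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [[[v * g for v in row] for row in ch]
              for ch, g in zip(feature_maps, gates)]
    return scaled, gates


# Toy input: two 2x2 channels with hypothetical weights.
maps = [[[1.0, 1.0], [1.0, 1.0]],
        [[3.0, 3.0], [3.0, 3.0]]]
w1 = [[0.5, 0.5]]        # 2 channels -> 1 hidden unit
w2 = [[1.0], [-1.0]]     # 1 hidden unit -> 2 channel gates
scaled, gates = squeeze_excite(maps, w1, w2)
```

With these toy weights the first channel receives a gate near 0.88 and the second a gate near 0.12, so the first channel is emphasized and the second suppressed.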


Author(s):  
Chen Xin ◽  
Minh Nguyen ◽  
Wei Qi Yan

Identifying fire flames is based on object recognition, which has valuable applications in intelligent surveillance. This chapter focuses on flame recognition using deep learning and its evaluation. To achieve this goal, the authors design a Multi-Flame Detection scheme (MFD) which utilises Convolutional Neural Networks (CNNs). The authors use TensorFlow with an NVIDIA GPU to train on an image dataset and construct a model for flame recognition. The contributions of this book chapter are: (1) data augmentation for flame recognition, (2) model construction for deep learning, and (3) result evaluations for flame recognition using deep learning.


2017 ◽  
Vol 37 (4-5) ◽  
pp. 513-542 ◽  
Author(s):  
Sen Wang ◽  
Ronald Clark ◽  
Hongkai Wen ◽  
Niki Trigoni

This paper studies visual odometry (VO) from the perspective of deep learning. After tremendous efforts in the robotics and computer vision communities over the past few decades, state-of-the-art VO algorithms have demonstrated incredible performance. However, since the VO problem is typically formulated as a pure geometric problem, one of the key features still missing from current VO systems is the capability to automatically gain knowledge and improve performance through learning. In this paper, we investigate whether deep neural networks can be effective and beneficial to the VO problem. An end-to-end, sequence-to-sequence probabilistic visual odometry (ESP-VO) framework is proposed for the monocular VO based on deep recurrent convolutional neural networks. It is trained and deployed in an end-to-end manner, that is, directly inferring poses and uncertainties from a sequence of raw images (video) without adopting any modules from the conventional VO pipeline. It can not only automatically learn effective feature representation encapsulating geometric information through convolutional neural networks, but also implicitly model sequential dynamics and relation for VO using deep recurrent neural networks. Uncertainty is also derived along with the VO estimation without introducing much extra computation. Extensive experiments on several datasets representing driving, flying and walking scenarios show competitive performance of the proposed ESP-VO to the state-of-the-art methods, demonstrating a promising potential of the deep learning technique for VO and verifying that it can be a viable complement to current VO systems.


Energies ◽  
2020 ◽  
Vol 13 (21) ◽  
pp. 5758
Author(s):  
Xiaofeng Feng ◽  
Hengyu Hui ◽  
Ziyang Liang ◽  
Wenchong Guo ◽  
Huakun Que ◽  
...  

Electricity theft decreases electricity revenues and brings risks to the safety of power usage, and it has become an increasingly challenging problem. The state-of-the-art data-driven approaches, which are the mainstream in relevant studies, mainly detect electricity theft from the perspective of the correlations between different daily or weekly loads, which is relatively inadequate for extracting features from hourly or more fine-grained temporal data. In view of these deficiencies, we propose a novel electricity theft detection scheme based on text convolutional neural networks (TextCNN). Specifically, we convert electricity consumption measurements over a horizon of interest into a two-dimensional time series containing the intraday electricity features. Based on this data structure, the proposed method can accurately capture various periodic features of electricity consumption. Moreover, a data augmentation method is proposed to cope with the imbalance of electricity theft data. Extensive experimental results based on realistic Chinese and Irish datasets indicate that the proposed model achieves better performance than other existing methods.
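
The conversion of a flat series of meter readings into a two-dimensional time series can be sketched as follows. The layout (one row per day, one column per intraday reading), the function name, and the toy sampling rate of four readings per day are assumptions for illustration; the abstract does not specify them.

```python
def to_daily_matrix(readings, readings_per_day):
    """Reshape a flat list of readings into rows of one day each."""
    if readings_per_day <= 0:
        raise ValueError("readings_per_day must be positive")
    if len(readings) % readings_per_day != 0:
        raise ValueError("series length must be a whole number of days")
    return [readings[start:start + readings_per_day]
            for start in range(0, len(readings), readings_per_day)]


# Three days of hypothetical consumption data, four readings per day.
series = [0.2, 0.9, 1.4, 0.5,
          0.3, 1.0, 1.3, 0.6,
          0.2, 0.8, 1.5, 0.4]
matrix = to_daily_matrix(series, readings_per_day=4)
```

Reading down a column of `matrix` compares the same time of day across days, while reading along a row follows consumption within a single day, which is what lets a convolutional filter see both kinds of periodicity.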


Information ◽  
2019 ◽  
Vol 10 (5) ◽  
pp. 157 ◽  
Author(s):  
Daniel S. Berman

Domain generation algorithms (DGAs) represent a class of malware used to generate large numbers of new domain names to achieve command-and-control (C2) communication between the malware program and its C2 server to avoid detection by cybersecurity measures. Deep learning has proven successful in serving as a mechanism to implement real-time DGA detection, specifically through the use of recurrent neural networks (RNNs) and convolutional neural networks (CNNs). This paper compares several state-of-the-art deep-learning implementations of DGA detection found in the literature with two novel models: a deeper CNN model and a one-dimensional (1D) Capsule Networks (CapsNet) model. The comparison shows that the 1D CapsNet model performs as well as the best-performing model from the literature.
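
Character-level neural detectors of this kind typically need each domain name turned into a fixed-length sequence of integers before it reaches an RNN or 1D CNN. The sketch below shows one common way to do that; the alphabet, the maximum length, and the function name are assumptions for illustration and are not taken from the paper.

```python
import string

# Hypothetical vocabulary: lowercase letters, digits, hyphen, and dot.
ALPHABET = string.ascii_lowercase + string.digits + "-."
# Index 0 is reserved for padding; unknown characters also map to 0.
CHAR_TO_INDEX = {ch: i + 1 for i, ch in enumerate(ALPHABET)}


def encode_domain(domain, max_len=16):
    """Map a domain name to a fixed-length list of character indices."""
    indices = [CHAR_TO_INDEX.get(ch, 0) for ch in domain.lower()[:max_len]]
    # Right-pad with zeros so every encoded domain has the same length.
    return indices + [0] * (max_len - len(indices))


encoded = encode_domain("ab.c", max_len=6)
```

Each integer would then index into an embedding layer, so the model learns its own representation of characters rather than relying on handcrafted lexical features.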

