Scale Invariant Fully Convolutional Network: Detecting Hands Efficiently

Author(s):  
Dan Liu ◽  
Dawei Du ◽  
Libo Zhang ◽  
Tiejian Luo ◽  
Yanjun Wu ◽  
...  

Existing hand detection methods usually follow a computationally expensive multi-stage pipeline: feature extraction, region proposal, bounding box regression, and additional layers for rotated region detection. In this paper, we propose a new Scale Invariant Fully Convolutional Network (SIFCN), trained end to end, to detect hands efficiently. Specifically, we merge the feature maps from high to low layers iteratively, which handles hands of different scales better and with less time overhead than simply concatenating them. Moreover, we develop the Complementary Weighted Fusion (CWF) block to make full use of the distinctive features among multiple layers to achieve scale invariance. To handle rotated hands, we introduce a rotation map that removes the need for complex rotation and derotation layers. In addition, we design a multi-scale loss scheme that significantly accelerates training by adding supervision to the intermediate layers of the network. Compared with state-of-the-art methods, our algorithm achieves comparable accuracy while running 4.23 times faster on the VIVA dataset, and achieves better average precision on the Oxford hand detection dataset at 62.5 fps.
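The iterative high-to-low merging described above can be illustrated with a minimal sketch. It assumes each deeper feature map is half the size of the previous one, and it uses nearest-neighbour upsampling with element-wise addition as a stand-in for the fusion step; the abstract does not specify how the CWF block combines layers, so `upsample2x` and `merge_top_down` are hypothetical names for illustration only.

```python
from typing import List

Grid = List[List[float]]


def upsample2x(fm: Grid) -> Grid:
    """Nearest-neighbour 2x upsampling of a 2-D feature map."""
    out: Grid = []
    for row in fm:
        wide = [v for v in row for _ in range(2)]  # repeat each column twice
        out.append(wide)
        out.append(list(wide))                     # repeat each row twice
    return out


def merge_top_down(pyramid: List[Grid]) -> Grid:
    """Merge maps one step at a time, from the deepest to the shallowest.

    pyramid[0] is the shallowest (largest) map; each later map is half
    the size. Addition is an assumed placeholder for the real fusion.
    """
    merged = pyramid[-1]
    for shallower in reversed(pyramid[:-1]):
        up = upsample2x(merged)
        merged = [[a + b for a, b in zip(r_up, r_sh)]
                  for r_up, r_sh in zip(up, shallower)]
    return merged


pyramid = [
    [[1.0] * 4 for _ in range(4)],  # 4x4 shallow layer
    [[2.0] * 2 for _ in range(2)],  # 2x2 middle layer
    [[3.0]],                        # 1x1 deep layer
]
fused = merge_top_down(pyramid)     # 4x4 map, same size as the shallowest layer
```

Each step only touches two maps at a time, whereas concatenation would first resize every layer to the largest resolution and stack them all.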

2020 ◽  
Vol 2020 ◽  
pp. 1-13
Author(s):  
Xiaodong Huang ◽  
Hui Zhang ◽  
Li Zhuo ◽  
Xiaoguang Li ◽  
Jing Zhang

Extracting the tongue body accurately from a digital tongue image is a challenge for automated tongue diagnosis because of the blurred edge of the tongue body, interference from pathological details, and the large variation in tongue size and shape. In this study, an automated tongue image segmentation method using an enhanced fully convolutional network with an encoder-decoder structure is presented. In the proposed network, a deep residual network is adopted as the encoder to obtain dense feature maps, and a Receptive Field Block is placed after the encoder. The Receptive Field Block can capture an adequate global contextual prior because of its multibranch convolution layers with varying kernels. Moreover, a Feature Pyramid Network is used as the decoder to fuse multiscale feature maps and gather sufficient positional information to recover a clear contour of the tongue body. Quantitative evaluation on 300 tongue images from the SIPL-tongue dataset showed that the average Hausdorff Distance, average Symmetric Mean Absolute Surface Distance, average Dice Similarity Coefficient, average precision, average sensitivity, and average specificity were 11.2963, 3.4737, 97.26%, 95.66%, 98.97%, and 98.68%, respectively. The proposed method outperformed four other deep-learning-based segmentation methods (SegNet, FCN, PSPNet, and DeepLab v3+). Similar results were obtained on the HIT-tongue dataset. The experimental results demonstrate that the proposed method can achieve accurate tongue image segmentation and meet the practical requirements of automated tongue diagnosis.
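For reference, the overlap-based scores named above (Dice Similarity Coefficient, precision, sensitivity, specificity) follow from the pixel-wise confusion counts of a binary mask. The sketch below uses the standard definitions on flattened 0/1 masks; the function name and the toy masks are illustrative and not taken from the paper.

```python
from typing import Dict, Sequence


def _ratio(num: float, den: float) -> float:
    """Safe division: returns 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def segmentation_scores(pred: Sequence[int], truth: Sequence[int]) -> Dict[str, float]:
    """Standard overlap metrics for two flattened binary masks (1 = tongue)."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)
    return {
        "dice": _ratio(2 * tp, 2 * tp + fp + fn),
        "precision": _ratio(tp, tp + fp),
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
    }


# Toy masks: 2 true positives, 1 false positive, 1 false negative, 4 true negatives.
pred = [1, 1, 1, 0, 0, 0, 0, 0]
truth = [1, 1, 0, 1, 0, 0, 0, 0]
scores = segmentation_scores(pred, truth)
```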


2021 ◽  
Vol 13 (16) ◽  
pp. 3211
Author(s):  
Tian Tian ◽  
Zhengquan Chu ◽  
Qian Hu ◽  
Li Ma

Semantic segmentation is a fundamental task in remote sensing image interpretation, which aims to assign a semantic label to every pixel in a given image. Accurate semantic segmentation is still challenging due to the complex distributions of various ground objects. With the development of deep learning, a series of segmentation networks represented by the fully convolutional network (FCN) has made remarkable progress on this problem, but the segmentation accuracy is still far from expectations. This paper focuses on the importance of class-specific features of different land cover objects and presents a novel end-to-end class-wise processing framework for segmentation. The proposed class-wise FCN (C-FCN) takes the form of an encoder-decoder structure with skip connections, in which the encoder is shared to produce general features for all categories and the decoder is class-wise to process class-specific features. Specifically, class-wise transition (CT), class-wise up-sampling (CU), class-wise supervision (CS), and class-wise classification (CC) modules are designed to achieve the class-wise transfer, recover the resolution of class-wise feature maps, bridge the encoder and modified decoder, and implement class-wise classifications, respectively. Class-wise and group convolutions are adopted in the architecture to control the number of parameters. The method is tested on the public ISPRS 2D semantic labeling benchmark datasets. Experimental results show that the proposed C-FCN significantly improves segmentation performance compared with many state-of-the-art FCN-based networks, revealing its potential for accurate segmentation of complex remote sensing images.
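The role of group convolutions in controlling parameter numbers can be seen from a simple weight count. With g groups, each output channel connects to only 1/g of the input channels, so the weights shrink by a factor of g. The channel sizes below are illustrative assumptions, not values from the paper.

```python
def conv_params(c_in: int, c_out: int, kernel: int, groups: int = 1) -> int:
    """Weight count of a 2-D convolution layer (bias terms ignored)."""
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    # Each output channel sees only c_in / groups input channels.
    return (c_in // groups) * c_out * kernel * kernel


dense = conv_params(64, 64, 3)               # ordinary 3x3 convolution
grouped = conv_params(64, 64, 3, groups=4)   # same shape, split into 4 groups
```

If each group is taken to serve one class, this is one plausible way a class-wise decoder stays compact as the number of classes grows; the abstract does not give the exact layout.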


2018 ◽  
Vol 30 (7) ◽  
pp. 1775-1800 ◽  
Author(s):  
Xiaopeng Guo ◽  
Rencan Nie ◽  
Jinde Cao ◽  
Dongming Zhou ◽  
Wenhua Qian

Because camera lenses always have a limited depth of field, captured images of the same scene are not all in focus. Multifocus image fusion is an efficient technology that can synthesize an all-in-focus image from several partially focused images. Previous methods have accomplished the fusion task in the spatial or transform domain. However, designing the fusion rules remains a problem in most methods. In this letter, from the perspective of focus region detection, we propose a novel multifocus image fusion method based on a fully convolutional network (FCN) learned from synthesized multifocus images. The primary novelty of this method is that the pixel-wise focus regions are detected through a learned FCN, and the entire image, not just image patches, is exploited to train the FCN. First, we synthesize 4500 pairs of multifocus images by repeatedly applying a Gaussian filter to each image from PASCAL VOC 2012, to train the FCN. After that, a pair of source images is fed into the trained FCN, and two score maps indicating the focus property are generated. Next, an inverted score map is averaged with the other score map to produce an aggregative score map, which takes full advantage of the focus probabilities in the two score maps. We apply a fully connected conditional random field (CRF) to the aggregative score map to obtain and refine a binary decision map for the fusion task. Finally, we use a weighted strategy based on the refined decision map to produce the fused image. To demonstrate the performance of the proposed method, we compare its fused results with those of several state-of-the-art methods on both a gray data set and a color data set. Experimental results show that the proposed method can achieve superior fusion performance in both human visual quality and objective assessment.
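The score-map aggregation and weighted fusion steps can be sketched as follows. The sketch assumes each score map holds, per pixel, the probability that the corresponding source image is in focus, and it replaces the fully connected CRF refinement with a plain threshold; both are simplifying assumptions, and the function names are hypothetical.

```python
from typing import List


def aggregate_scores(score_a: List[float], score_b: List[float]) -> List[float]:
    """Average score map A with the inverted score map B.

    A high value means "image A is in focus here"; B's evidence is
    inverted so that both maps vote in the same direction.
    """
    return [(a + (1.0 - b)) / 2.0 for a, b in zip(score_a, score_b)]


def fuse(img_a: List[float], img_b: List[float],
         score_a: List[float], score_b: List[float],
         threshold: float = 0.5) -> List[float]:
    """Weighted fusion driven by a binary decision map.

    The threshold is a stand-in for the CRF refinement in the paper.
    """
    agg = aggregate_scores(score_a, score_b)
    decision = [1.0 if s >= threshold else 0.0 for s in agg]
    return [d * pa + (1.0 - d) * pb
            for d, pa, pb in zip(decision, img_a, img_b)]


# Two flattened 1x2 images: A is sharp at pixel 0, B is sharp at pixel 1.
img_a, img_b = [10.0, 20.0], [30.0, 40.0]
score_a, score_b = [0.9, 0.2], [0.3, 0.6]
agg = aggregate_scores(score_a, score_b)
fused = fuse(img_a, img_b, score_a, score_b)
```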


Sensors ◽  
2020 ◽  
Vol 20 (7) ◽  
pp. 2069 ◽  
Author(s):  
Chuncheng Feng ◽  
Hua Zhang ◽  
Haoran Wang ◽  
Shuang Wang ◽  
Yonglong Li

Crack detection on dam surfaces is an important task in the safety inspection of hydropower stations. More and more deep-learning-based object detection methods are being applied to crack detection. However, most of these methods can only classify cracks and locate them roughly. Pixel-level crack detection can provide more intuitive and accurate detection results for dam health assessment. To realize pixel-level crack detection, a method of crack detection on dam surfaces (CDDS) using a deep convolutional network is proposed. First, we use an unmanned aerial vehicle (UAV) to collect dam surface images along a predetermined trajectory. Second, the raw images are cropped. Then crack regions are manually labelled on the cropped images to create the crack dataset, and the architecture of the CDDS network is designed. Finally, the CDDS network is trained, validated, and tested using the crack dataset. To validate the performance of the CDDS network, its predictions are compared with those of a ResNet152-based network, SegNet, UNet, and the fully convolutional network (FCN). In terms of crack segmentation, the recall, precision, F-measure, and IoU are 80.45%, 80.31%, 79.16%, and 66.76%, respectively. The results on the test dataset show that the CDDS network performs better for crack detection on dam surfaces.
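The four reported quantities are related through the pixel counts of true positives, false positives, and false negatives. A minimal sketch using the standard definitions follows; the counts are made up for illustration, and the abstract does not state how the paper aggregates the metrics across images.

```python
from typing import Dict


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    total = precision + recall
    return 2.0 * precision * recall / total if total else 0.0


def crack_metrics(tp: int, fp: int, fn: int) -> Dict[str, float]:
    """Pixel-level crack segmentation metrics from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return {
        "recall": recall,
        "precision": precision,
        "f_measure": f_measure(precision, recall),
        "iou": iou,
    }


metrics = crack_metrics(tp=80, fp=20, fn=20)  # illustrative counts only
```

Applied directly to the reported precision and recall, `f_measure(0.8031, 0.8045)` gives about 0.804 rather than the reported 79.16%, so the paper's F-measure was presumably aggregated in another way (for example, averaged per image); the abstract does not say.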


Electronics ◽  
2019 ◽  
Vol 8 (10) ◽  
pp. 1151 ◽  
Author(s):  
Xia Hua ◽  
Xinqing Wang ◽  
Ting Rui ◽  
Dong Wang ◽  
Faming Shao

To detect multiple objects and micro-objects in large-scene remote sensing images in real time, a cascaded convolutional neural network object-detection framework is proposed, which integrates visual perception and convolutional memory network reasoning. The detection framework is composed of two fully convolutional networks, namely, the strengthened object self-attention pre-screening fully convolutional network (SOSA-FCN) and the object accurate detection fully convolutional network (AD-FCN). SOSA-FCN introduces a self-attention module to extract attention feature maps and constructs a depth feature pyramid to optimize the attention feature maps by combining convolutional long short-term memory networks. It guides the acquisition of potential sub-regions of the object in the scene, reduces the computational complexity, and enhances the network’s ability to extract multi-scale object features. It adapts to the complex background and small object characteristics of a large-scene remote sensing image. In AD-FCN, the object mask and object orientation estimation layer are designed to achieve fine positioning of candidate frames. The performance of the proposed algorithm is compared with that of other advanced methods on NWPU_VHR-10, DOTA, UCAS-AOD, and other open datasets. The experimental results show that the proposed algorithm significantly improves the efficiency of object detection while maintaining detection accuracy, and that it is highly adaptable. It has extensive engineering application prospects.


Sensors ◽  
2020 ◽  
Vol 20 (23) ◽  
pp. 6969
Author(s):  
Xiangda Lei ◽  
Hongtao Wang ◽  
Cheng Wang ◽  
Zongze Zhao ◽  
Jianqi Miao ◽  
...  

Airborne laser scanning (ALS) point clouds have been widely used in various fields because ALS can acquire three-dimensional data with high accuracy on a large scale. However, because ALS data are discretely and irregularly distributed and contain noise, it is still a challenge to accurately identify various typical surface objects from 3D point clouds. In recent years, many researchers have reported better results in classifying 3D point clouds by using different deep learning methods. However, most of these methods require a large number of training samples and cannot be widely used in complex scenarios. In this paper, we propose an ALS point cloud classification method that integrates an improved fully convolutional network into transfer learning with multi-scale and multi-view deep features. First, shallow features of the ALS point cloud, such as height, intensity, and change of curvature, are extracted to generate feature maps by multi-scale voxel and multi-view projection. Second, these feature maps are fed into the pre-trained DenseNet201 model to derive deep features, which are used as input for a fully convolutional neural network with convolutional and pooling layers. By using this network, the local and global features are integrated to classify the ALS point cloud. Finally, a graph-cuts algorithm considering context information is used to refine the classification results. We tested our method on the semantic 3D labeling dataset of the International Society for Photogrammetry and Remote Sensing (ISPRS). Experimental results show that the overall accuracy and the average F1 score obtained by the proposed method are 89.84% and 83.62%, respectively, when only 16,000 points of the original data are used for training.
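Overall accuracy and average F1 score are both derived from a multi-class confusion matrix. The sketch below uses the standard definitions, with rows as true classes and columns as predicted classes; the 2x2 matrix is a toy example and not data from the paper.

```python
from typing import List

Matrix = List[List[int]]


def overall_accuracy(cm: Matrix) -> float:
    """Fraction of points whose predicted class equals the true class."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total if total else 0.0


def average_f1(cm: Matrix) -> float:
    """Unweighted mean of the per-class F1 scores."""
    n = len(cm)
    scores = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                       # true class k, predicted other
        fp = sum(cm[r][k] for r in range(n)) - tp  # predicted k, true class other
        den = 2 * tp + fp + fn
        scores.append(2 * tp / den if den else 0.0)
    return sum(scores) / n


cm = [[5, 1],
      [2, 4]]  # rows: true class, columns: predicted class (toy values)
oa = overall_accuracy(cm)
avg_f1 = average_f1(cm)
```

Because average F1 weights every class equally, it can sit well below overall accuracy when rare classes are classified poorly.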


2019 ◽  
Vol 13 (4) ◽  
pp. 583-590 ◽  
Author(s):  
Jianwei Zhang ◽  
Junting He ◽  
Tianfu Chen ◽  
Zhenmei Liu ◽  
Danni Chen

2019 ◽  
Vol 85 (10) ◽  
pp. 737-752
Author(s):  
Yihua Tan ◽  
Shengzhou Xiong ◽  
Zhi Li ◽  
Jinwen Tian ◽  
Yansheng Li

The analysis of built-up areas has always been a popular research topic in remote sensing applications. However, automatic extraction of built-up areas across a wide range of regions remains challenging. In this article, a fully convolutional network (FCN)–based strategy is proposed to address built-up area extraction. The proposed algorithm consists of two main steps. First, the remote sensing image is divided into blocks, and their deep features are extracted by a lightweight multi-branch convolutional neural network (LMB-CNN). Second, the deep features are rearranged into feature maps that are fed into a well-designed FCN for image segmentation. Our FCN is integrated with multi-branch blocks and outputs multi-channel segmentation masks that are used to balance false alarms against missed detections. Experiments demonstrate that the proposed algorithm achieves an overall classification accuracy of 98.75% on the test data set and processes images faster than existing state-of-the-art algorithms.
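The two steps above (block-wise feature extraction, then rearranging the features into a map) can be sketched as below. The block mean is used as a placeholder for the LMB-CNN features, which in the paper would be multi-channel deep feature vectors rather than a single scalar; `block_feature_map` and `mean_feature` are hypothetical names.

```python
from typing import Callable, List

Grid = List[List[float]]


def mean_feature(tile: Grid) -> float:
    """Placeholder extractor: the mean pixel value of a block."""
    flat = [v for row in tile for v in row]
    return sum(flat) / len(flat)


def block_feature_map(image: Grid, block: int,
                      extract: Callable[[Grid], float]) -> Grid:
    """Split the image into non-overlapping blocks and arrange the
    per-block features on a grid that mirrors the block layout."""
    rows, cols = len(image) // block, len(image[0]) // block
    fmap: Grid = []
    for br in range(rows):
        row_feats = []
        for bc in range(cols):
            tile = [image[r][bc * block:(bc + 1) * block]
                    for r in range(br * block, (br + 1) * block)]
            row_feats.append(extract(tile))
        fmap.append(row_feats)
    return fmap


image = [
    [1.0, 1.0, 2.0, 2.0],
    [1.0, 1.0, 2.0, 2.0],
    [3.0, 3.0, 4.0, 4.0],
    [3.0, 3.0, 4.0, 4.0],
]
feature_map = block_feature_map(image, block=2, extract=mean_feature)
```

The resulting map is far smaller than the input image, which is consistent with the speed advantage the abstract reports for the downstream FCN.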

