Genetic Feature Fusion for Object Skeleton Detection

Object skeleton detection requires convolutional neural networks to recognize objects and their parts against cluttered backgrounds, overcome the loss of image resolution caused by pooling layers, and predict the locations of skeleton pixels at different scale granularities. Most existing object skeleton detection methods devote great effort to designing side-output networks for multiscale feature fusion. Despite the considerable progress they have achieved, several problems still hinder the development of object skeleton detection: for example, manually designing networks is labor-intensive, and network initialization depends on models pretrained on large-scale datasets. To alleviate these issues, we propose a genetic NAS method that automatically searches a newly designed architecture search space for adaptive multiscale feature fusion. Furthermore, we introduce a symmetric encoder-decoder search space obtained by reversing the VGG network, in which the decoder can reuse the ImageNet-pretrained VGG model. The searched networks improve on the performance of state-of-the-art methods on commonly used skeleton detection benchmarks, demonstrating the efficacy of our method.
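
The abstract does not spell out the genetic search itself. As a hedged illustration only, a generic genetic algorithm over a feature-fusion encoding could look like the sketch below. The bit-vector encoding, `NUM_STAGES`, truncation selection, and especially the `fitness` function (a toy stand-in for training each candidate and measuring its skeleton F-measure) are all hypothetical, not the authors' design.

```python
import random

# Hypothetical encoding: bit i of a genome says whether encoder stage i
# feeds the fusion node. NUM_STAGES is an assumption, not from the paper.
NUM_STAGES = 5


def fitness(genome):
    """Toy stand-in for 'train the candidate, measure skeleton F-measure':
    rewards connections from stages 1-3 and penalizes every extra connection."""
    useful = {1, 2, 3}
    gain = sum(1.0 for i, bit in enumerate(genome) if bit and i in useful)
    return gain - 0.1 * sum(genome)


def mutate(genome, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return tuple(bit ^ 1 if rng.random() < rate else bit for bit in genome)


def crossover(a, b, rng):
    """Single-point crossover of two parent genomes."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def genetic_search(pop_size=10, generations=30, mutation_rate=0.2, seed=0):
    rng = random.Random(seed)
    population = [tuple(rng.randint(0, 1) for _ in range(NUM_STAGES))
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: keep the fitter half unchanged (elitism) ...
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        # ... and refill the population with mutated crossover offspring.
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), mutation_rate, rng))
        population = parents + children
    return max(population, key=fitness)


best = genetic_search()
print(best, fitness(best))
```

Because the fitter half survives each generation unchanged, the best fitness never decreases; in a real NAS setting each `fitness` call would be a (possibly shortened) training run, which is why the population and generation counts stay small.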

SM-NAS: Structural-to-Modular Neural Architecture Search for Object Detection

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i07.6958 ◽

2020 ◽

Vol 34 (07) ◽

pp. 12661-12668 ◽

Cited By ~ 3

Author(s):

Lewei Yao ◽

Hang Xu ◽

Wei Zhang ◽

Xiaodan Liang ◽

Zhenguo Li

Keyword(s):

Object Detection ◽

Feature Fusion ◽

State Of The Art ◽

Computational Cost ◽

Search Space ◽

Detection Methods ◽

Structural Level ◽

Neural Architecture ◽

Searching Strategy

State-of-the-art object detection methods are complicated, comprising various modules such as the backbone, RPN, feature fusion neck and RCNN head, each of which may have different designs and structures. How can the trade-off between computational cost and accuracy be balanced for both the structural combination and the modular selection of multiple modules? Neural architecture search (NAS) has shown great potential in finding optimal solutions. Existing NAS works for object detection focus only on searching for a better design of a single module, such as the backbone or the feature fusion neck, while neglecting the balance of the whole system. In this paper, we present a two-stage coarse-to-fine searching strategy named Structural-to-Modular NAS (SM-NAS) for searching a GPU-friendly design of both an efficient combination of modules and a better modular-level architecture for object detection. Specifically, the structural-level searching stage first aims to find an efficient combination of different modules; the modular-level searching stage then evolves each specific module and pushes the Pareto front forward to a faster task-specific network. We consider a multi-objective search whose search space covers many popular designs of detection methods. We directly search a detection backbone without pre-trained models or any proxy task by exploring a fast training-from-scratch strategy. The resulting architectures dominate state-of-the-art object detection systems in both inference time and accuracy and demonstrate their effectiveness on multiple detection datasets, e.g., halving the inference time with an additional 1% mAP improvement compared to FPN, and reaching 46% mAP at an inference time similar to that of Mask R-CNN.
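
SM-NAS is a multi-objective search that trades inference time against accuracy, and its modular-level stage "pushes the Pareto front forward". A minimal sketch of what that front is: the set of candidates that no other candidate beats on both objectives at once. The candidate names and (time in ms, mAP) values below are invented for illustration and are not results from the paper.

```python
def dominates(a, b):
    """True if a is no slower and no less accurate than b, and strictly better on one."""
    (time_a, map_a), (time_b, map_b) = a, b
    return time_a <= time_b and map_a >= map_b and (time_a < time_b or map_a > map_b)


def pareto_front(candidates):
    """Names of the non-dominated candidates, ordered by inference time."""
    front = {name: point for name, point in candidates.items()
             if not any(dominates(other, point) for other in candidates.values())}
    return sorted(front, key=lambda name: front[name][0])


# Hypothetical candidates: name -> (inference time in ms, mAP in %).
candidates = {"A": (50, 38.0), "B": (80, 42.0), "C": (90, 41.0),
              "D": (120, 46.0), "E": (60, 37.5)}
print(pareto_front(candidates))  # ['A', 'B', 'D']
```

Here C is dominated by B (faster and more accurate) and E by A, so only A, B and D remain; "pushing the front forward" means finding new architectures that dominate some of these.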

Efficient exploratory clustering analyses in large-scale exploration processes

The VLDB Journal ◽

10.1007/s00778-021-00716-y ◽

2021 ◽

Author(s):

Manuel Fritz ◽

Michael Behringer ◽

Dennis Tschechlov ◽

Holger Schwarz

Keyword(s):

Large Scale ◽

Clustering Algorithm ◽

Comprehensive Evaluation ◽

State Of The Art ◽

Clustering Algorithms ◽

Search Space ◽

Large Datasets ◽

Search Spaces ◽

Multiple Challenges

Clustering is a fundamental primitive in many applications. To achieve valuable results in exploratory clustering analyses, the parameters of the clustering algorithm have to be set appropriately, which is a tremendous pitfall. We observe multiple challenges for large-scale exploration processes. On the one hand, they require specific methods to efficiently explore large parameter search spaces. On the other hand, they often exhibit long runtimes, in particular when large datasets are analyzed using clustering algorithms with super-polynomial runtimes, which must be executed repeatedly within exploratory clustering analyses. We address these challenges as follows. First, we present LOG-Means and show that it provides estimates for the number of clusters in sublinear time with respect to the defined search space, i.e., it provably requires fewer executions of a clustering algorithm than existing methods. Second, we demonstrate how to exploit fundamental characteristics of exploratory clustering analyses to significantly accelerate the (repetitive) execution of clustering algorithms on large datasets. Third, we show how these challenges can be tackled at the same time. To the best of our knowledge, this is the first work that simultaneously addresses the above-mentioned challenges. In our comprehensive evaluation, we show that our proposed methods significantly outperform state-of-the-art methods, thus especially supporting novice analysts in exploratory clustering analyses in large-scale exploration processes.

An Efficient Pedestrian Detection Method Based on YOLOv2

Mathematical Problems in Engineering ◽

10.1155/2018/3518959 ◽

2018 ◽

Vol 2018 ◽

pp. 1-10 ◽

Cited By ~ 5

Author(s):

Zhongmin Liu ◽

Zhicai Chen ◽

Zhanming Li ◽

Wenjin Hu

Keyword(s):

Detection Method ◽

Size Range ◽

State Of The Art ◽

Pedestrian Detection ◽

Semantic Segmentation ◽

Detection Methods ◽

Detection Model ◽

Great Progress ◽

Good Trade ◽

Speed And Accuracy

In recent years, techniques based on deep detection models have achieved overwhelming improvements in detection accuracy, making them the most suitable choice for applications such as pedestrian detection. However, speed and accuracy are inherently in conflict, a tension that has long puzzled researchers, and achieving a good trade-off between them is a problem that must be considered when designing detectors. To this end, we apply YOLOv2, a state-of-the-art general-purpose detector, to pedestrian detection. We then modify the network parameters and structure according to the characteristics of pedestrians, making the method more suitable for detecting them. Experimental results on the INRIA pedestrian detection dataset show that it achieves a fairly high detection speed with only a small precision gap compared with state-of-the-art pedestrian detection methods. Furthermore, we add weak semantic segmentation networks after the shared convolution layers to highlight pedestrians, and employ a scale-aware structure in our model to handle the wide range of pedestrian sizes in the Caltech pedestrian detection dataset, which yields substantial further gains over the initial modifications.
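
As general background on YOLOv2 (not a claim about this paper's modifications), YOLOv2 chooses its anchor-box shapes by running k-means over the widths and heights of training boxes with the distance d = 1 - IoU instead of Euclidean distance. The abstract does not say whether the authors re-clustered anchors for pedestrians; the sketch below only illustrates the technique on a handful of hypothetical tall, narrow (w, h) boxes, using a deterministic initialization chosen for reproducibility.

```python
def wh_iou(a, b):
    """IoU of two boxes given only (width, height), aligned at a common centre."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    union = a[0] * a[1] + b[0] * b[1] - inter
    return inter / union


def kmeans_anchors(boxes, k, iters=20):
    """k-means over (w, h) pairs with distance d = 1 - IoU."""
    anchors = list(boxes[:k])  # deterministic init (assumption, for reproducibility)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            # Smallest 1 - IoU is the same as the largest IoU.
            nearest = max(range(k), key=lambda j: wh_iou(box, anchors[j]))
            clusters[nearest].append(box)
        # Each anchor moves to the mean width and height of its cluster.
        anchors = [
            (sum(w for w, _ in c) / len(c), sum(h for _, h in c) / len(c)) if c else anchors[j]
            for j, c in enumerate(clusters)
        ]
    return anchors


# Hypothetical tall, narrow pedestrian boxes as (width, height) in pixels.
boxes = [(20, 50), (22, 55), (40, 100), (42, 105), (18, 48), (38, 98)]
print(kmeans_anchors(boxes, k=2))  # [(20.0, 51.0), (40.0, 101.0)]
```

The IoU distance keeps large boxes from dominating the clustering the way raw Euclidean distance on (w, h) would.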

A Lightweight YOLOv4-Based Forestry Pest Detection Method Using Coordinate Attention and Feature Fusion

Entropy ◽

10.3390/e23121587 ◽

2021 ◽

Vol 23 (12) ◽

pp. 1587

Author(s):

Mingfeng Zha ◽

Wenbin Qian ◽

Wenlong Yi ◽

Jing Hua

Keyword(s):

Detection Method ◽

Feature Fusion ◽

State Of The Art ◽

Detection Methods ◽

Model Parameters ◽

Symmetric Structure ◽

Proposed Model ◽

Pest Detection ◽

Feature Information ◽

Small Targets

Traditional pest detection methods are difficult to use in complex forestry environments because of their low accuracy and speed. To address this issue, this paper proposes the YOLOv4_MF model. YOLOv4_MF uses MobileNetv2 as the feature extraction block and replaces traditional convolution with depthwise separable convolution to reduce the number of model parameters. In addition, a coordinate attention mechanism is embedded in MobileNetv2 to enhance feature information. A symmetric structure consisting of a three-layer spatial pyramid pooling module is presented, and an improved feature fusion structure is designed to fuse target information. For the loss function, focal loss is used instead of cross-entropy loss to strengthen the network's learning of small targets. The experimental results show that YOLOv4_MF achieves 4.24% higher mAP, 4.37% higher precision, and 6.68% higher recall than the YOLOv4 model, while the size of the proposed model is reduced to 1/6 of that of YOLOv4. Moreover, the proposed algorithm achieves 38.62% mAP on the COCO dataset in comparison with several state-of-the-art algorithms.
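
The abstract credits two standard components: depthwise separable convolution for a smaller model and focal loss for small targets. The sketch below shows the weight-count arithmetic for one hypothetical 3 x 3 layer with 256 input and 256 output channels, and the standard binary focal loss; `gamma=2.0` and `alpha=0.25` are common defaults, not values reported for YOLOv4_MF.

```python
import math


def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """k x k depthwise conv (one filter per input channel) plus 1 x 1 pointwise conv."""
    return k * k * c_in + c_in * c_out


def cross_entropy(p, y):
    """Binary cross-entropy for predicted positive-class probability p and label y."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25):
    """FL = -alpha_t * (1 - p_t)**gamma * log(p_t); gamma=0, alpha=0.5 gives CE / 2."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


print(conv_params(3, 256, 256), depthwise_separable_params(3, 256, 256))  # 589824 67840
for p in (0.9, 0.1):  # an easy and a hard positive example
    print(p, round(cross_entropy(p, 1), 4), round(binary_focal_loss(p, 1), 4))
```

For that layer the separable version needs 67,840 weights instead of 589,824, roughly 8.7 times fewer. The modulating factor (1 - p_t)^gamma shrinks the loss of a confident, correct prediction far more than that of a hard one, which shifts training effort toward difficult (often small) targets.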

SmokeNet: Satellite Smoke Scene Detection Using Convolutional Neural Network with Spatial and Channel-Wise Attention

Remote Sensing ◽

10.3390/rs11141702 ◽

2019 ◽

Vol 11 (14) ◽

pp. 1702 ◽

Cited By ~ 8

Author(s):

Rui Ba ◽

Chen Chen ◽

Jing Yuan ◽

Weiguo Song ◽

Siuming Lo

Keyword(s):

Neural Network ◽

Satellite Imagery ◽

Large Scale ◽

State Of The Art ◽

Kappa Coefficient ◽

Feature Representation ◽

Detection Methods ◽

Smoke Detection ◽

Training Images ◽

Moderate Resolution Imaging Spectroradiometer

A variety of environmental analysis applications have been advanced by the use of satellite remote sensing. Smoke detection based on satellite imagery is imperative for wildfire detection and monitoring. However, commonly used smoke detection methods mainly focus on discriminating smoke from a few specific classes, which limits their applicability in regions containing a wider variety of classes. To this end, in this paper, we present a new large-scale satellite imagery smoke detection benchmark based on Moderate Resolution Imaging Spectroradiometer (MODIS) data, namely USTC_SmokeRS, consisting of 6225 satellite images from six classes (i.e., cloud, dust, haze, land, seaside, and smoke) and covering various areas/regions around the world. To build a baseline for smoke detection in satellite imagery, we evaluate several state-of-the-art deep learning-based image classification models. Moreover, we propose a new convolutional neural network (CNN) model, SmokeNet, which incorporates spatial and channel-wise attention in the CNN to enhance feature representation for scene classification. The experimental results of our method using different proportions (16%, 32%, 48%, and 64%) of the training images show that our model outperforms other approaches with higher accuracy and Kappa coefficient. Specifically, the proposed SmokeNet model trained with 64% of the training images achieves the best accuracy of 92.75% and a Kappa coefficient of 0.9130. The model trained with 16% of the training images also improves the classification accuracy and Kappa coefficient by at least 4.99% and 0.06, respectively, over the state-of-the-art models.
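
The Kappa coefficient reported for SmokeNet is presumably Cohen's kappa, which corrects accuracy for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the chance agreement implied by the row and column totals. A minimal sketch computing it from a confusion matrix; the 2-class matrix in the example is made up (the benchmark itself has six classes, and the function handles any number).

```python
def cohen_kappa(confusion):
    """Cohen's kappa from a square confusion matrix (rows: true, columns: predicted)."""
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    p_o = sum(confusion[i][i] for i in range(k)) / n  # observed agreement (= accuracy)
    row_totals = [sum(confusion[i]) for i in range(k)]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)  # chance agreement
    return (p_o - p_e) / (1 - p_e)


# Made-up 2-class example: 85% accuracy, while 50% agreement would be expected by chance.
print(round(cohen_kappa([[45, 5], [10, 40]]), 4))  # 0.7
```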

A Fast Intrusion Detection Method for High-Speed Railway Clearance Based on Low-Cost Embedded GPUs

Sensors ◽

10.3390/s21217279 ◽

2021 ◽

Vol 21 (21) ◽

pp. 7279

Author(s):

Yao Wang ◽

Peizhi Yu

Keyword(s):

Neural Network ◽

Intrusion Detection ◽

High Speed ◽

Large Scale ◽

Feature Fusion ◽

Low Cost ◽

Obstacle Detection ◽

Detection Methods ◽

Detection Accuracy ◽

Real Time Processing

The efficiency and effectiveness of railway intrusion detection are crucial to the safety of railway transportation. Most current methods of railway intrusion detection or obstacle detection are unsuitable for large-scale applications because of their high cost or limited coverage. In this study, we present a fast, low-cost solution for intrusion detection on high-speed railways. To reduce the heavy computational burden of current convolutional-neural-network-based detection methods, the proposed method is built around a novel neural network based on the SSD framework, which includes a feature extractor using an improved MobileNet and a lightweight, efficient feature fusion module. In addition, to improve the detection accuracy for small objects, feature-map weights are introduced through convolution operations to fuse features at different scales. TensorRT is employed to optimize and deploy the proposed network on a low-cost embedded GPU platform, the NVIDIA Jetson TX2, to improve efficiency. The experimental results show that the proposed method achieves 89% mAP on the railway intrusion detection dataset, with an average processing time of 38.6 ms per frame on the Jetson TX2 module, which satisfies the requirement for real-time processing.
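
The abstract says feature-map weights are produced by convolution to fuse features at different scales, without giving the exact form. One plausible reading, sketched below in NumPy under loudly stated assumptions: each scale gets a 1 x 1 convolution that predicts a per-pixel score, the scores are softmax-normalized across scales, and the coarse map is nearest-neighbour upsampled before the weighted sum. The function names, channel counts, and softmax normalization are hypothetical. (For scale, 38.6 ms per frame corresponds to about 1000 / 38.6 ≈ 25.9 frames per second.)

```python
import numpy as np


def upsample_nearest(x, factor):
    """Nearest-neighbour upsampling of a (C, H, W) feature map."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)


def conv1x1(x, weight, bias):
    """1 x 1 convolution: (C_in, H, W) -> (C_out, H, W); weight has shape (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, x) + bias[:, None, None]


def weighted_fusion(fine, coarse, w_fine, w_coarse):
    """Fuse a fine map with an upsampled coarse map using per-pixel weights.

    w_fine / w_coarse are (weight, bias) pairs of hypothetical 1 x 1 convs that
    each predict one score per pixel; a softmax across the two scales turns the
    scores into weights that sum to 1 at every pixel.
    """
    coarse_up = upsample_nearest(coarse, fine.shape[1] // coarse.shape[1])
    scores = np.stack([conv1x1(fine, *w_fine)[0], conv1x1(coarse_up, *w_coarse)[0]])
    weights = np.exp(scores - scores.max(axis=0))
    weights /= weights.sum(axis=0)  # shape (2, H, W)
    return weights[0] * fine + weights[1] * coarse_up


rng = np.random.default_rng(0)
fine = rng.standard_normal((8, 16, 16))   # higher-resolution features
coarse = rng.standard_normal((8, 8, 8))   # lower-resolution, more semantic features
w_f = (0.1 * rng.standard_normal((1, 8)), np.zeros(1))
w_c = (0.1 * rng.standard_normal((1, 8)), np.zeros(1))
fused = weighted_fusion(fine, coarse, w_f, w_c)
print(fused.shape)  # (8, 16, 16)
```

Learned per-pixel weights let the network lean on fine features where small objects are and on coarse features elsewhere; with all-zero weights the fusion reduces to a plain average of the two scales.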

MFF-Net: Deepfake Detection Network Based on Multi-Feature Fusion

Entropy ◽

10.3390/e23121692 ◽

2021 ◽

Vol 23 (12) ◽

pp. 1692

Author(s):

Lei Zhao ◽

Mingcheng Zhang ◽

Hongwei Ding ◽

Xiaohui Cui

Keyword(s):

Feature Extraction ◽

Semantic Information ◽

Feature Fusion ◽

State Of The Art ◽

Detection Methods ◽

Textural Features ◽

Detection Technology ◽

RGB Images ◽

Signal Processing Methods

Significant progress has been made in generating counterfeit images and videos. Forged videos generated by deepfake techniques have spread widely and caused severe societal impacts, stirring up public concern about automatic deepfake detection technology. Recently, many deepfake detection methods based on forged features have been proposed. Among the popular forged features, textural features are widely used. However, most current texture-based detection methods extract textures directly from RGB images, ignoring mature spectral analysis methods. Therefore, this research proposes MFF-Net, a deepfake detection network that fuses RGB features with textural information extracted by neural networks and signal processing methods. Specifically, it consists of four key components: (1) a feature extraction module that further extracts textural and frequency information using Gabor convolution and residual attention blocks; (2) a texture enhancement module that zooms into subtle textural features in shallow layers; (3) an attention module that forces the classifier to focus on the forged parts; and (4) two instances of feature fusion, which first fuse the textural features from the shallow RGB branch and the feature extraction module, and then fuse the textural features with semantic information. Moreover, we introduce a new diversity loss that forces the feature extraction module to learn features of different scales and directions. The experimental results show that MFF-Net generalizes well and achieves state-of-the-art performance on various deepfake datasets.
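
MFF-Net's feature extraction module uses Gabor convolution. The abstract does not define its Gabor layer, so the sketch below only shows the standard real-valued 2-D Gabor kernel (a Gaussian envelope multiplied by an oriented cosine) and a small bank of orientations such a layer could convolve with. The kernel size, sigma, wavelength `lambd`, aspect ratio `gamma`, and the four orientations are illustrative choices, not the paper's settings.

```python
import numpy as np


def gabor_kernel(size, sigma, theta, lambd, gamma=0.5, psi=0.0):
    """Real part of a 2-D Gabor filter sampled on a size x size grid (size odd).

    sigma: Gaussian envelope std, theta: orientation (radians),
    lambd: wavelength of the cosine, gamma: spatial aspect ratio, psi: phase offset.
    """
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    x_rot = x * np.cos(theta) + y * np.sin(theta)
    y_rot = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_rot**2 + gamma**2 * y_rot**2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * x_rot / lambd + psi)
    return envelope * carrier


# A small bank over four orientations, as an oriented texture filter layer might use.
bank = np.stack([gabor_kernel(7, sigma=2.0, theta=t, lambd=4.0)
                 for t in np.linspace(0, np.pi, 4, endpoint=False)])
print(bank.shape)  # (4, 7, 7)
```

Each kernel responds most strongly to stripes at its orientation and wavelength, which is why Gabor banks are a natural fit for capturing direction- and scale-specific texture artifacts.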

A Single Shot Framework with Multi-Scale Feature Fusion for Geospatial Object Detection

Remote Sensing ◽

10.3390/rs11050594 ◽

2019 ◽

Vol 11 (5) ◽

pp. 594 ◽

Cited By ~ 11

Author(s):

Shuo Zhuang ◽

Ping Wang ◽

Boran Jiang ◽

Gang Wang ◽

Cong Wang

Keyword(s):

Remote Sensing ◽

Object Detection ◽

Large Scale ◽

Feature Fusion ◽

Aerial Images ◽

Detection Methods ◽

Single Shot ◽

Feature Maps ◽

Scale Feature ◽

Multi Scale

With rapid advances in remote-sensing technologies and the growing number of satellite images, fast and effective object detection plays an important role in understanding and analyzing image information, with applications in both civilian and military fields. Recently, object detection methods based on region-based convolutional neural networks have shown excellent performance. However, these two-stage methods comprise separate region proposal generation and object detection procedures, which results in low computation speed. In addition, because manual annotation is expensive, well-annotated aerial images are scarce, which also limits progress in geospatial object detection for remote sensing. In this paper, on the one hand, we construct and release a large-scale remote-sensing dataset for geospatial object detection (RSD-GOD) that consists of 5 different categories with 18,187 annotated images and 40,990 instances. On the other hand, we design a single shot detection framework with multi-scale feature fusion. The feature maps from different layers are fused through up-sampling and concatenation blocks to predict the detection results. High-level features with semantic information and low-level features with fine details are fully exploited for detection tasks, especially for small objects. Meanwhile, a soft non-maximum suppression strategy is used to select the final detection results. Extensive experiments on two datasets evaluate the designed network. Results show that the proposed approach achieves good detection performance, with a mean average precision of 89.0% on the newly constructed RSD-GOD dataset and 83.8% on the Northwestern Polytechnical University very high spatial resolution-10 (NWPU VHR-10) dataset, at 18 frames per second (FPS) on an NVIDIA GTX-1080Ti GPU.
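
The framework selects its final detections with soft non-maximum suppression. The abstract does not say which variant is used; the sketch below implements the Gaussian form, which decays the score of each box overlapping a kept detection by exp(-IoU^2 / sigma) instead of deleting it. `sigma=0.5` and `score_thresh=0.001` are assumed values, and the three boxes are hypothetical.

```python
import numpy as np


def box_area(b):
    """Area of (x1, y1, x2, y2) boxes; works for one box or an (N, 4) array."""
    return (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])


def iou(box, boxes):
    """IoU between one box and an (N, 4) array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    return inter / (box_area(box) + box_area(boxes) - inter)


def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian soft-NMS. Returns kept indices and their final (decayed) scores."""
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float).copy()
    remaining = list(range(len(scores)))
    keep, kept_scores = [], []
    while remaining:
        best = max(remaining, key=lambda i: scores[i])
        remaining.remove(best)
        if scores[best] < score_thresh:
            break  # every remaining score is lower still
        keep.append(best)
        kept_scores.append(float(scores[best]))
        if remaining:
            # Decay, rather than delete, boxes that overlap the one just kept.
            overlaps = iou(boxes[best], boxes[remaining])
            scores[remaining] *= np.exp(-(overlaps ** 2) / sigma)
    return keep, kept_scores


# Hypothetical detections: two overlapping boxes and one separate box.
boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]]
keep, kept_scores = soft_nms(boxes, [0.9, 0.8, 0.7])
print(keep, [round(s, 2) for s in kept_scores])  # [0, 2, 1] [0.9, 0.7, 0.32]
```

The second box overlaps the first with IoU 81/119 ≈ 0.68, so its score drops from 0.8 to about 0.32 rather than being removed outright, while the isolated third box keeps 0.7. This softer behaviour helps when genuine objects, such as densely parked vehicles in aerial images, overlap heavily.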

Improving Object Tracking by Added Noise and Channel Attention

Sensors ◽

10.3390/s20133780 ◽

2020 ◽

Vol 20 (13) ◽

pp. 3780 ◽

Cited By ~ 2

Author(s):

Mustansar Fiaz ◽

Arif Mahmood ◽

Ki Yeol Baek ◽

Sehar Shahzad Farooq ◽

Soon Ki Jung

Keyword(s):

Large Scale ◽

Data Augmentation ◽

Feature Fusion ◽

State Of The Art ◽

Computational Cost ◽

Training Data ◽

Superior Performance ◽

Input Noise ◽

Offline Learning ◽

Benchmark Datasets

CNN-based trackers, especially those based on Siamese networks, have recently attracted considerable attention because of their relatively good performance and low computational cost. For many Siamese trackers, learning a generic object model from a large-scale dataset is still challenging. In this study, we introduce input noise into the training data as regularization to improve the generalization of the learned model. We propose an Input-Regularized Channel Attentional Siamese (IRCA-Siam) tracker, which generalizes better than current state-of-the-art trackers. In particular, we exploit offline learning by introducing additive noise for input data augmentation to mitigate overfitting. We propose fusing features from noisy and clean input channels, which improves target localization. Channel attention integrated into our framework helps find more useful target features, further improving performance. The proposed IRCA-Siam enhances discrimination between target and background and improves fault tolerance and generalization. An extensive experimental evaluation on six benchmark datasets, including OTB2013, OTB2015, TC128, UAV123, VOT2016 and VOT2017, demonstrates the superior performance of IRCA-Siam compared with 30 existing state-of-the-art trackers.
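
IRCA-Siam combines additive input noise, fusion of noisy and clean channels, and channel attention. The abstract fixes none of the details, so the sketch below is a hypothetical NumPy illustration: Gaussian noise with an assumed standard deviation, fusion by channel concatenation, and squeeze-and-excitation-style attention (global average pool, two small fully connected layers, sigmoid gate). The Siamese backbone is omitted, so a random array stands in for the extracted features.

```python
import numpy as np


def add_input_noise(x, std, rng):
    """Additive zero-mean Gaussian noise used as input regularization."""
    return x + rng.normal(0.0, std, size=x.shape)


def channel_attention(features, w1, w2):
    """SE-style channel attention on a (C, H, W) map:
    global average pool -> FC + ReLU -> FC + sigmoid -> rescale each channel."""
    squeeze = features.mean(axis=(1, 2))               # (C,)
    hidden = np.maximum(w1 @ squeeze, 0.0)             # (C // r,)
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))        # (C,), each value in (0, 1)
    return features * gate[:, None, None]


rng = np.random.default_rng(0)
clean = rng.standard_normal((4, 8, 8))                 # stand-in for clean-input features
noisy = add_input_noise(clean, std=0.1, rng=rng)       # noisy branch (std is assumed)
fused = np.concatenate([clean, noisy], axis=0)         # fuse by channel concatenation
w1 = 0.1 * rng.standard_normal((2, 8))                 # reduction ratio r = 4 (assumed)
w2 = 0.1 * rng.standard_normal((8, 2))
out = channel_attention(fused, w1, w2)
print(out.shape)  # (8, 8, 8)
```

The gate rescales each channel by a value in (0, 1) computed from global context, so channels that carry useful target evidence can be emphasized relative to the rest.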

An Effective Cloud Detection Method for Gaofen-5 Images via Deep Learning

Remote Sensing ◽

10.3390/rs12132106 ◽

2020 ◽

Vol 12 (13) ◽

pp. 2106 ◽

Cited By ~ 1

Author(s):

Junchuan Yu ◽

Yichuan Li ◽

Xiangxiang Zheng ◽

Yufeng Zhong ◽

Peng He

Keyword(s):

Deep Learning ◽

Large Scale ◽

Feature Fusion ◽

Detection Methods ◽

Learning Technology ◽

Cloud Detection ◽

Learning Capability ◽

Quantitative Remote Sensing ◽

Recent Developments ◽

Speed Up

Recent developments in hyperspectral satellites have dramatically promoted the wide application of large-scale quantitative remote sensing. As an essential part of preprocessing, cloud detection is of great significance for subsequent quantitative analysis. For Gaofen-5 (GF-5) data producers, the daily cloud detection of hundreds of scenes is a challenging task. Traditional cloud detection methods cannot meet the strict demands of large-scale data production, especially for GF-5 satellites, which have massive data volumes. Deep learning technology, however, is able to perform cloud detection efficiently for massive repositories of satellite data and can even dramatically speed up processing by utilizing thumbnails. Inspired by the outstanding learning capability of convolutional neural networks (CNNs) for feature extraction, we propose a new dual-branch CNN architecture for cloud segmentation for GF-5 preview RGB images, termed a multiscale fusion gated network (MFGNet), which introduces pyramid pooling attention and spatial attention to extract both shallow and deep information. In addition, a new gated multilevel feature fusion module is also employed to fuse features at different depths and scales to generate pixelwise cloud segmentation results. The proposed model is extensively trained on hundreds of globally distributed GF-5 satellite images and compared with current mainstream CNN-based detection networks. The experimental results indicate that our proposed method has a higher F1 score (0.94) and fewer parameters (7.83 M) than the compared methods.
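
MFGNet is scored with an F1 score, which for pixelwise cloud segmentation is the harmonic mean of precision and recall computed over cloud pixels. A minimal sketch on a tiny, made-up mask pair; it assumes binary masks where 1 marks cloud.

```python
import numpy as np


def mask_f1(pred, truth):
    """Pixelwise F1 (harmonic mean of precision and recall) for binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.sum(pred & truth)    # cloud predicted as cloud
    fp = np.sum(pred & ~truth)   # clear sky predicted as cloud
    fn = np.sum(~pred & truth)   # cloud that was missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


truth = [[1, 1, 0], [1, 0, 0]]
pred = [[1, 1, 1], [0, 0, 0]]
print(round(float(mask_f1(pred, truth)), 4))  # 0.6667
```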
