SM-NAS: Structural-to-Modular Neural Architecture Search for Object Detection

State-of-the-art object detection methods are complicated, comprising various modules such as the backbone, RPN, feature fusion neck and RCNN head, each of which may have different designs and structures. How can the trade-off between computational cost and accuracy be balanced, both for the structural combination of multiple modules and for the selection of each module's design? Neural architecture search (NAS) has shown great potential in finding optimal solutions. Existing NAS works for object detection focus only on searching for a better design of a single module, such as the backbone or the feature fusion neck, while neglecting the balance of the whole system. In this paper, we present a two-stage, coarse-to-fine search strategy named Structural-to-Modular NAS (SM-NAS) for searching a GPU-friendly design of both an efficient combination of modules and better module-level architectures for object detection. Specifically, the structural-level search stage first aims to find an efficient combination of different modules; the modular-level search stage then evolves each specific module and pushes the Pareto front forward toward a faster task-specific network. We consider a multi-objective search whose search space covers many popular designs of detection methods. We search a detection backbone directly, without pre-trained models or any proxy task, by exploring a fast training-from-scratch strategy. The resulting architectures dominate state-of-the-art object detection systems in both inference time and accuracy, and their effectiveness is demonstrated on multiple detection datasets: for example, they halve the inference time with an additional 1% mAP improvement compared to FPN, and reach 46% mAP with inference time similar to that of Mask R-CNN.
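
The multi-objective search described above keeps candidate architectures on the Pareto front of inference time versus accuracy. The snippet below is a minimal sketch of that dominance test only, not SM-NAS's search code; the `(name, latency_ms, map_pct)` record and the numbers in the usage are hypothetical.

```python
from typing import List, Tuple

# Hypothetical candidate record: (name, inference time in ms, mAP in %).
Candidate = Tuple[str, float, float]


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no slower and no less accurate than b, and strictly better in at least one."""
    _, time_a, map_a = a
    _, time_b, map_b = b
    return time_a <= time_b and map_a >= map_b and (time_a < time_b or map_a > map_b)


def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    """Keep only non-dominated candidates, ordered from fastest to slowest."""
    front = [c for c in cands if not any(dominates(o, c) for o in cands)]
    return sorted(front, key=lambda c: c[1])


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    pool = [
        ("A", 30.0, 38.0),
        ("B", 45.0, 41.0),
        ("C", 50.0, 40.0),  # slower and less accurate than B, so dominated
        ("D", 80.0, 46.0),
    ]
    print([name for name, _, _ in pareto_front(pool)])  # ['A', 'B', 'D']
```

"Pushing the Pareto front forward" then means finding candidates that dominate some member of the current front.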

Real-Time Small Drones Detection Based on Pruned YOLOv4

Sensors, 2021, Vol 21 (10), pp. 3374. DOI: 10.3390/s21103374

Author(s): Hansen Liu, Kuangang Fan, Qinghua Ouyang, Na Li

Keyword(s): Object Detection; Real Time; Processing Speed; State Of The Art; Detection Methods; Detection Accuracy; Small Object; Art Object; Real Time Detection; Small Object Detection

To address the threat of drones intruding into high-security areas, real-time drone detection is urgently required to protect these areas. Real-time drone detection faces two main difficulties. First, drones move quickly, which requires faster detectors. Second, small drones are difficult to detect. In this paper, we first achieve high detection accuracy by evaluating four state-of-the-art object detection methods: RetinaNet, FCOS, YOLOv3 and YOLOv4. Then, to address the first problem, we prune the convolutional channels and shortcut layers of YOLOv4 to develop thinner and shallower models. Furthermore, to improve the accuracy of small drone detection, we implement a special augmentation for small object detection by copying and pasting small drones. Experimental results show that, compared to YOLOv4, our pruned-YOLOv4 model, with a channel prune rate of 0.8 and 24 pruned layers, achieves 90.5% mAP while its processing speed increases by 60.4%. Additionally, after small object augmentation, the precision and recall of the pruned-YOLOv4 increase by almost 22.8% and 12.7%, respectively. These results verify that our pruned-YOLOv4 is an effective and accurate approach for drone detection.
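
The abstract does not state how channels are ranked for pruning. A common choice for YOLO-family models is network-slimming-style pruning, which ranks channels by the magnitude of their batch-normalization scale factor γ. The sketch below assumes that criterion and applies one global threshold at a given prune rate (0.8 in the abstract); the layer names and γ values are hypothetical, and the shortcut-layer pruning step is not shown.

```python
from typing import Dict, List


def channel_masks(bn_gammas: Dict[str, List[float]], prune_rate: float,
                  min_keep: int = 1) -> Dict[str, List[bool]]:
    """Global channel pruning by BN scale magnitude (network-slimming-style assumption).

    bn_gammas maps a layer name to the BN scale factors of its output channels.
    One global threshold removes roughly `prune_rate` of all channels; ties at the
    threshold are pruned too. Each layer keeps at least `min_keep` channels.
    """
    ranked = sorted(abs(g) for gs in bn_gammas.values() for g in gs)
    n_prune = int(len(ranked) * prune_rate)
    # Channels whose |gamma| is strictly above this value survive.
    threshold = ranked[n_prune - 1] if n_prune > 0 else float("-inf")
    masks: Dict[str, List[bool]] = {}
    for name, gs in bn_gammas.items():
        keep = [abs(g) > threshold for g in gs]
        if sum(keep) < min_keep:
            # Fall back to this layer's strongest channels so the layer is not emptied.
            top = sorted(range(len(gs)), key=lambda i: abs(gs[i]), reverse=True)[:min_keep]
            keep = [i in top for i in range(len(gs))]
        masks[name] = keep
    return masks


if __name__ == "__main__":
    gammas = {  # hypothetical BN scale factors
        "conv1": [0.9, 0.05, 0.4, 0.01, 0.7],
        "conv2": [0.02, 0.03, 0.04, 0.06, 0.08],
    }
    for layer, mask in channel_masks(gammas, prune_rate=0.8).items():
        print(layer, mask)
```

In a real framework the boolean masks would then be used to slice the convolution weights (and the next layer's input channels) before fine-tuning.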

Improving YOLOv5 with Attention Mechanism for Detecting Boulders from Planetary Images

Remote Sensing, 2021, Vol 13 (18), pp. 3776. DOI: 10.3390/rs13183776

Author(s): Linlin Zhu, Xun Geng, Zheng Li, Chun Liu

Keyword(s): Object Detection; Detection Method; Feature Fusion; Attention Mechanism; Detection Methods; Visual Object; Art Object; Geological Processes; Landing Sites; New Feature

Applying object detection methods to automatically detect boulders in planetary images and analyze their distribution is of great significance: it contributes to the selection of candidate landing sites and to the understanding of geological processes. This paper improves the state-of-the-art object detection method YOLOv5 with an attention mechanism and designs a pyramid-based approach to detect boulders in planetary images. A new feature fusion layer is designed to capture more shallow features of small boulders. Attention modules that combine the convolutional block attention module (CBAM) and the efficient channel attention network (ECA-Net) are also added to YOLOv5 to highlight the information that contributes to boulder detection. Evaluations on the Pascal Visual Object Classes 2007 (VOC2007) dataset, which is widely used for object detection evaluations, and on a boulder dataset we constructed from images of the asteroid Bennu show that these improvements increase the precision of YOLOv5 by 3.4%. Using the improved YOLOv5 detector, the pyramid-based approach extracts several layers of images at different resolutions from large planetary images and detects boulders of different scales in different layers. We have also applied the proposed approach to detect boulders on the asteroid Bennu, and we analyze and present their distribution.
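
ECA-Net, one of the two attention modules named above, gates each channel with a cheap 1D convolution across neighbouring channel descriptors instead of fully connected layers. Below is a plain-Python sketch of that gating on a `[channel][row][col]` feature map, including ECA-Net's adaptive kernel-size rule; the kernel weights are passed in by hand in place of learned ones, and how the authors combine ECA with CBAM inside YOLOv5 is not reproduced here.

```python
import math
from typing import List

FeatureMap = List[List[List[float]]]  # indexed as [channel][row][col]


def eca_kernel_size(channels: int, gamma: int = 2, b: int = 1) -> int:
    """ECA-Net's adaptive kernel size: |(log2(C) + b) / gamma|, bumped to the next odd number."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1


def eca(x: FeatureMap, weights: List[float]) -> FeatureMap:
    """Global average pool -> 1D conv across channels -> sigmoid -> rescale each channel.

    `weights` stands in for the learned 1D kernel and should have odd length.
    """
    c, k = len(x), len(weights)
    pad = k // 2
    # One descriptor per channel via global average pooling.
    desc = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    out: FeatureMap = []
    for i in range(c):
        # 1D convolution over neighbouring channels, zero-padded at the edges.
        s = sum(weights[j] * desc[i + j - pad] for j in range(k) if 0 <= i + j - pad < c)
        gate = 1.0 / (1.0 + math.exp(-s))  # sigmoid
        out.append([[gate * v for v in row] for row in x[i]])
    return out
```

Because the kernel only spans `k` neighbouring channels, the module adds just `k` parameters per layer, which is why it is cheap enough to insert into a real-time detector.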

Genetic Feature Fusion for Object Skeleton Detection

Security and Communication Networks, 2021, Vol 2021, pp. 1-9. DOI: 10.1155/2021/6621760

Author(s): Yang Qiao, Yunjie Tian, Yue Liu, Jianbin Jiao

Keyword(s): Large Scale; Feature Fusion; State Of The Art; Search Space; Detection Methods; Genetic Feature; Great Progress; Cluttered Background; Scale Granularity; Image Definition

Object skeleton detection requires convolutional neural networks to recognize objects and their parts against cluttered backgrounds, overcome the degradation of image definition caused by pooling layers, and predict the locations of skeleton pixels at different scale granularities. Most existing object skeleton detection methods devote great effort to designing side-output networks for multiscale feature fusion. Despite the great progress they have achieved, many problems still hinder the development of object skeleton detection: for example, manually designing networks is labor-intensive, and network initialization depends on models pretrained on large-scale datasets. To alleviate these issues, we propose a genetic NAS method that automatically searches a newly designed architecture search space for adaptive multiscale feature fusion. Furthermore, we introduce a symmetric encoder-decoder search space based on reversing the VGG network, in which the decoder can reuse the ImageNet-pretrained VGG model. The searched networks improve on the performance of state-of-the-art methods on commonly used skeleton detection benchmarks, which demonstrates the efficacy of our method.
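
A genetic NAS of this kind typically encodes each candidate fusion architecture as a bit string and evolves a population of them. The sketch below is a generic genetic algorithm (tournament selection, one-point crossover, bit-flip mutation, elitism), not the authors' method; the bit-per-connection encoding is hypothetical, and the toy `fitness` in the usage stands in for training and validating each searched network on a skeleton benchmark.

```python
import random
from typing import Callable, List

# Hypothetical encoding: bit i = 1 if side-output i feeds the fusion node, else 0.
Genome = List[int]


def evolve(fitness: Callable[[Genome], float], n_bits: int, pop_size: int = 20,
           generations: int = 30, p_mut: float = 0.05, seed: int = 0) -> Genome:
    """Generic GA: tournament selection, one-point crossover, bit-flip mutation, elitism."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        best = max(pop, key=fitness)
        nxt = [best[:]]  # elitism: the best genome survives unchanged
        while len(nxt) < pop_size:
            # Size-2 tournament for each parent.
            p1 = max(rng.sample(pop, 2), key=fitness)
            p2 = max(rng.sample(pop, 2), key=fitness)
            cut = rng.randrange(1, n_bits)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)


if __name__ == "__main__":
    target = [1, 0, 1, 1, 0, 0, 1, 0]  # toy stand-in for "what validates well"

    def toy_fitness(g: Genome) -> float:
        return float(sum(a == b for a, b in zip(g, target)))

    best = evolve(toy_fitness, n_bits=len(target))
    print(best, toy_fitness(best))
```

In a real search each fitness call is a (shortened) training run, so caching scores per genome matters far more than in this toy.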

EMPIRICAL EVALUATION OF STATE-OF-THE-ART OBJECT DETECTION METHODS FOR DOCUMENT IMAGE UNDERSTANDING

FAIR - NGHIÊN CỨU CƠ BẢN VÀ ỨNG DỤNG CÔNG NGHỆ THÔNG TIN - 2017 (Fundamental and Applied Information Technology Research, 2017). DOI: 10.15625/vap.2017.00022

Author(s): Nguyen D. Vo, Khanh Nguyen, Tam V. Nguyen, Khang Nguyen

Keyword(s): Object Detection; State Of The Art; Empirical Evaluation; Image Understanding; Document Image; Detection Methods; Art Object; Document Image Understanding

Deep-Learning-Based Road Crack Detection Frameworks for Dashcam-captured Images under Different Illumination Conditions

2021. DOI: 10.21203/rs.3.rs-685762/v1

Author(s): Da-Ren Chen, Wei-Min Chiu

Keyword(s): Object Detection; Large Scale; Crack Detection; State Of The Art; Gaussian Mixture Models; Gaussian Mixture; Machine Learning Techniques; Detection Accuracy; The Road; Art Object

Machine learning techniques have been used to increase the accuracy of detecting cracks in road surfaces. Most studies have failed to consider variable illumination conditions on the target of interest (ToI) and focus only on detecting the presence or absence of road cracks. This paper proposes a new road crack detection method, IlumiCrack, which integrates Gaussian mixture models (GMM) and object detection CNN models. This work makes the following contributions: 1) for the first time, a large-scale road crack image dataset covering a range of illumination conditions (e.g., day and night) is prepared using a dashcam; 2) based on GMM, experimental evaluations with 2 to 4 brightness levels are conducted to find the optimal classification; 3) the IlumiCrack framework integrates state-of-the-art object detection methods with CNNs to classify road crack images into eight types with high accuracy. Experimental results show that IlumiCrack outperforms state-of-the-art R-CNN object detection frameworks.
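
Contribution 2) evaluates 2 to 4 GMM-derived brightness levels. As a sketch, assuming each image is summarized by a single feature such as its mean grey level (the abstract does not say which features are used), a one-dimensional GMM fitted with expectation-maximization (EM) can assign every image a brightness level. The function names are hypothetical and the sample values in the test are made up.

```python
import math
from typing import List, Tuple


def _weighted_pdf(x: float, w: float, m: float, v: float) -> float:
    """Mixture weight times the Gaussian density N(x; m, v)."""
    return w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)


def fit_gmm_1d(xs: List[float], k: int, iters: int = 100) -> Tuple[List[float], List[float], List[float]]:
    """Fit a 1D Gaussian mixture with EM; returns (weights, means, variances)."""
    n = len(xs)
    ordered = sorted(xs)
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]  # spread initial means over quantiles
    mu = sum(xs) / n
    variances = [max(sum((x - mu) ** 2 for x in xs) / n, 1e-6)] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in xs:
            p = [_weighted_pdf(x, w, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means and variances from the responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-12
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj, 1e-6)
    return weights, means, variances


def brightness_level(x: float, weights: List[float], means: List[float], variances: List[float]) -> int:
    """Index of the most likely component, renumbered so that level 0 is the darkest."""
    best = max(range(len(means)), key=lambda j: _weighted_pdf(x, weights[j], means[j], variances[j]))
    by_mean = sorted(range(len(means)), key=lambda j: means[j])
    return by_mean.index(best)
```

Fitting with `k = 2, 3, 4` corresponds to the brightness-level settings the abstract compares; how the authors pick among them is not specified here.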

An Experimental Analysis of Model Compression Techniques for Object Detection

2020. DOI: 10.5753/kdmile.2020.11958

Author(s): Andrey De Aguiar Salvi, Rodrigo Coelho Barros

Keyword(s): Object Detection; Experimental Analysis; State Of The Art; Neural Architecture; Model Compression; Processing Power; Benchmark Datasets; The Difference; And Performance; Consumption Constraints

Recent research on convolutional neural networks (CNNs) focuses on creating models with fewer parameters and a smaller storage size while preserving the model's ability to perform its task, so that the best CNNs can automate tasks on limited devices with reduced processing power, memory, or energy consumption. The literature offers many different approaches: removing parameters, reducing floating-point precision, creating smaller models that mimic larger ones, neural architecture search (NAS), and so on. With all these possibilities, it is difficult to say which approach provides the best trade-off between model reduction and performance, because the approaches differ, as do their respective models, benchmark datasets, and training details. This article therefore contributes to the literature by comparing three state-of-the-art model compression approaches applied to a well-known convolutional object detector, YOLOv3. Our experimental analysis shows that, by pruning parameters, it is possible to create a reduced version of YOLOv3 with 90% fewer parameters that still outperforms the original model. We also create models that require only 0.43% of the original model's inference effort.
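
The abstract reports a YOLOv3 variant with 90% fewer parameters obtained by pruning, but it does not say which pruning criterion was used. The sketch below shows one common option, global unstructured magnitude pruning, which zeroes the smallest-magnitude weights across all layers; the function names, layer names and weights are hypothetical.

```python
from typing import Dict, List


def magnitude_prune(weights: Dict[str, List[float]], sparsity: float) -> Dict[str, List[float]]:
    """Zero out roughly the `sparsity` fraction of smallest-|w| weights using one global threshold.

    Weights tied with the threshold are zeroed as well, so the result can be slightly sparser.
    """
    mags = sorted(abs(w) for ws in weights.values() for w in ws)
    n_zero = int(len(mags) * sparsity)
    if n_zero == 0:
        return {name: list(ws) for name, ws in weights.items()}
    threshold = mags[n_zero - 1]
    return {name: [0.0 if abs(w) <= threshold else w for w in ws] for name, ws in weights.items()}


def sparsity_of(weights: Dict[str, List[float]]) -> float:
    """Fraction of weights that are exactly zero."""
    flat = [w for ws in weights.values() for w in ws]
    return sum(w == 0.0 for w in flat) / len(flat)
```

Unstructured zeros reduce the parameter count, but they typically only speed up inference with sparse kernels or storage formats; channel-level (structured) pruning gives up some flexibility in exchange for speedups on ordinary dense hardware.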

Neural Architecture Search for a Highly Efficient Network with Random Skip Connections

Applied Sciences, 2020, Vol 10 (11), pp. 3712. DOI: 10.3390/app10113712

Author(s): Dongjing Shan, Xiongwei Zhang, Wenhua Shi, Li Li

Keyword(s): State Of The Art; Cell Structure; Frequency Discrimination; Search Space; Short Term; Cell Parameters; Neural Architecture; Proposed Model; Initialization Scheme

In sequence learning with neural networks, a central problem is how to capture long-term dependencies and alleviate vanishing gradients. To address this problem, we propose a neural network with random connections obtained through a neural architecture search scheme. First, a dense network is designed and trained to construct a search space; another network is then generated by random sampling in this space, and its skip connections can transmit information directly across multiple time steps, capturing long-term dependencies more efficiently. Moreover, we devise a novel cell structure that requires less memory and computational power than long short-term memory (LSTM) cells. Finally, we apply a special initialization scheme to the cell parameters that permits unhindered gradient propagation along the time axis at the beginning of training. In the experiments, we evaluate four sequential tasks (adding, copying, frequency discrimination, and image classification) and compare against several state-of-the-art methods. The experimental results demonstrate that our proposed model achieves the best performance.
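
To illustrate why skip connections help with long-term dependencies: with a direct connection from step t-d to step t, information (and gradient) crosses d time steps in one hop instead of d. The scalar recurrence below is a toy sketch with hypothetical fixed weights and randomly sampled delays; it is not the authors' cell structure, search procedure or initialization scheme.

```python
import math
import random
from typing import List


def sample_skips(max_delay: int, n_skips: int, seed: int = 0) -> List[int]:
    """Randomly choose which delays d >= 2 get a direct h_{t-d} -> h_t connection."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(2, max_delay + 1), n_skips))


def run_rnn(xs: List[float], skips: List[int], w_in: float = 0.5,
            w_rec: float = 0.5, w_skip: float = 0.5) -> List[float]:
    """Toy scalar RNN: h_t = tanh(w_in*x_t + w_rec*h_{t-1} + w_skip * sum over d in skips of h_{t-d})."""
    hs: List[float] = []
    for t, x in enumerate(xs):
        prev = hs[t - 1] if t >= 1 else 0.0
        skip = sum(hs[t - d] for d in skips if t - d >= 0)
        hs.append(math.tanh(w_in * x + w_rec * prev + w_skip * skip))
    return hs
```

Feeding an impulse `[1.0, 0.0, 0.0, 0.0]` with `skips=[2]` lets step 2 see the input at step 0 directly, so its activation is larger than with `skips=[]`, where the signal must pass through two squashing steps.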

TasselNetV2+: A Fast Implementation for High-Throughput Plant Counting From High-Resolution RGB Imagery

Frontiers in Plant Science, 2020, Vol 11. DOI: 10.3389/fpls.2020.541960

Author(s): Hao Lu, Zhiguo Cao

Keyword(s): High Resolution; Object Detection; High Throughput; Graphics Processing Units; State Of The Art; Image Resolution; Plant Phenotyping; Art Object; Bounding Boxes; Computational Bottleneck

Plant counting runs through almost every stage of agricultural production, from seed breeding, germination, cultivation, fertilization and pollination to yield estimation and harvesting. With the prevalence of digital cameras, graphics processing units and deep learning-based computer vision, plant counting has gradually shifted from traditional manual observation to vision-based automated solutions. One popular solution is a state-of-the-art object detection technique, Faster R-CNN, where plant counts are estimated from the number of detected bounding boxes. It has become a standard configuration for many plant counting systems in plant phenotyping. Faster R-CNN, however, is computationally expensive, particularly on high-resolution images. Unfortunately, high-resolution imagery is frequently used in modern plant phenotyping platforms such as unmanned aerial vehicles, which makes image analysis inefficient. This inefficiency largely limits the throughput of a phenotyping system. The goal of this work is hence to provide an effective and efficient tool for high-throughput plant counting from high-resolution RGB imagery. In contrast to conventional object detection, we encourage another promising paradigm, termed object counting, in which plant counts are regressed directly from images without detecting bounding boxes. By profiling the computational bottleneck, we implement a fast version of a state-of-the-art plant counting model, TasselNetV2, with several minor yet effective modifications, and we provide insights into why these modifications make sense. This fast version, TasselNetV2+, runs an order of magnitude faster than TasselNetV2, achieving around 30 fps at an image resolution of 1980 × 1080 while retaining the same level of counting accuracy. We validate its effectiveness on three plant counting tasks: wheat ear counting, maize tassel counting, and sorghum head counting.
To encourage the use of this tool, our implementation has been made available online at https://tinyurl.com/TasselNetV2plus.
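
To make the object-counting paradigm concrete: instead of detecting boxes, a counting network predicts how many plants fall inside each (possibly overlapping) image window, and the window predictions are merged into an image-level count. The sketch below is loosely modelled on that local-count regression idea; the window size, stride and the uniform redistribute-then-average rule are assumptions, and `local_count` is a hypothetical stand-in for the trained regressor.

```python
from typing import Callable


def merge_local_counts(h: int, w: int, win: int, stride: int,
                       local_count: Callable[[int, int], float]) -> float:
    """Merge per-window count predictions into one image-level count.

    local_count(r, c) stands in for the regressor's output on the win x win window whose
    top-left corner is (r, c). Each window's count is spread evenly over its pixels, and
    overlapping contributions are averaged per pixel before summing. Pixels that no window
    covers (when the stride does not tile the image) are ignored.
    """
    acc = [[0.0] * w for _ in range(h)]
    cover = [[0] * w for _ in range(h)]
    for r in range(0, h - win + 1, stride):
        for c in range(0, w - win + 1, stride):
            share = local_count(r, c) / (win * win)
            for i in range(r, r + win):
                for j in range(c, c + win):
                    acc[i][j] += share
                    cover[i][j] += 1
    return sum(acc[i][j] / cover[i][j] for i in range(h) for j in range(w) if cover[i][j])
```

Averaging per pixel keeps the total consistent whether windows overlap or not: on a 4 × 4 map where every 2 × 2 window holds one plant, both stride 1 and stride 2 give a total of 4.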

Object Detection Network Based on Feature Fusion and Attention Mechanism

Future Internet, 2019, Vol 11 (1), pp. 9. DOI: 10.3390/fi11010009. Cited by ~6

Author(s): Ying Zhang, Yimin Chen, Chen Huang, Mingke Gao

Keyword(s): Object Detection; Feature Fusion; Empirical Evaluation; Attention Mechanism; Detection Accuracy; Small Object; Art Object; Pascal Voc; Almost All; The Impact

In recent years, almost all top-performing object detection networks have relied on CNN (convolutional neural network) features. In this work, we add feature fusion to the object detection network to obtain better CNN features, combining deep, semantically strong features with shallow, high-resolution ones, thus improving performance on small objects. We also apply an attention mechanism in our object detection network, AF R-CNN (attention mechanism and convolution feature fusion based object detection), to enhance the impact of significant features and weaken background interference. Our AF R-CNN is a single end-to-end network. We use the pre-trained VGG-16 network to extract CNN features. Our detection network is trained on the PASCAL VOC 2007 and 2012 datasets. Empirical evaluation on the PASCAL VOC 2007 dataset demonstrates the effectiveness and improvement of our approach: AF R-CNN achieves an object detection accuracy of 75.9% on PASCAL VOC 2007, six points higher than Faster R-CNN.
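
The abstract does not specify how deep and shallow features are fused. A common minimal pattern, assumed here for illustration, is to upsample the coarse, semantic map to the shallow map's resolution and concatenate the two along the channel axis. The plain-Python sketch below uses nearest-neighbour upsampling on `[channel][row][col]` lists; the function names are hypothetical.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # indexed as [channel][row][col]


def upsample_nearest(x: FeatureMap, factor: int) -> FeatureMap:
    """Nearest-neighbour upsampling of every channel by an integer factor."""
    return [[[row[j // factor] for j in range(len(row) * factor)]
             for row in ch for _ in range(factor)]
            for ch in x]


def fuse(shallow: FeatureMap, deep: FeatureMap) -> FeatureMap:
    """Upsample the deep (coarse, semantic) map to the shallow map's size and concatenate channels."""
    factor = len(shallow[0]) // len(deep[0])
    up = upsample_nearest(deep, factor)
    if (len(up[0]), len(up[0][0])) != (len(shallow[0]), len(shallow[0][0])):
        raise ValueError("spatial sizes must differ by an integer factor")
    return shallow + up
```

The fused map keeps the shallow map's spatial detail (useful for small objects) while every location also carries the deeper layer's semantic channels.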

A Fast Orientation Invariant Detector Based on the One-stage Method

MATEC Web of Conferences, 2018, Vol 232, pp. 04036. DOI: 10.1051/matecconf/201823204036

Author(s): Jun Yin, Huadong Pan, Hui Su, Zhonggeng Liu, Zhirong Peng

Keyword(s): Object Detection; Loss Function; High Efficiency; Detection Method; State Of The Art; Orientation Angle; Detection Methods; Detection Algorithms; Bounding Boxes; The One

We propose an object detection method, based on YOLO (You Only Look Once), one of the top detection algorithms in both accuracy and speed, that predicts oriented bounding boxes (OBB) to estimate the locations, scales and orientations of objects. Existing object detection methods detect targets with horizontal bounding boxes (HBB), which are not robust to orientation variations. The proposed orientation invariant YOLO (OIYOLO) detector can effectively handle bird's-eye-view images in which object orientation angles are arbitrary. To estimate the rotation angle of objects, we design a new angle loss function. Training OIYOLO thus forces the network to learn the annotated orientation angles of objects, making OIYOLO orientation invariant. The proposed approach of predicting OBB can also be applied to other detection frameworks. In addition, to evaluate the proposed OIYOLO detector, we create the UAV-DAHUA dataset, accurately annotated with object locations, scales and orientation angles. Extensive experiments on the UAV-DAHUA and DOTA datasets demonstrate that OIYOLO achieves state-of-the-art detection performance with high efficiency compared with the baseline YOLO algorithms.
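
The abstract does not give the form of OIYOLO's angle loss. One issue any angle loss must handle is wrap-around: a naive L1 loss treats 179° and -1° as 180° apart, even though a rectangle rotated by 180° is the same box. The sketch below shows one periodic alternative, assumed for illustration rather than taken from the paper; both function names are hypothetical.

```python
import math


def angle_loss(pred: float, target: float, period: float = math.pi) -> float:
    """Wrap-aware angle loss in [0, 1] (angles in radians).

    It is zero whenever pred and target agree modulo `period` and peaks at 1 when they are
    half a period apart. With period = pi (a rectangle is unchanged by a 180-degree turn)
    it equals sin^2(pred - target).
    """
    delta = 2.0 * math.pi * (pred - target) / period
    return 0.5 * (1.0 - math.cos(delta))


def naive_l1(pred: float, target: float) -> float:
    """Plain absolute difference, which ignores wrap-around."""
    return abs(pred - target)
```

For `pred = radians(179)` and `target = radians(-1)`, `naive_l1` returns about π, its worst case, while `angle_loss` returns essentially 0 because the two boxes coincide.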
