M2Det: A Single-Shot Object Detector Based on Multi-Level Feature Pyramid Network

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33019259 ◽

2019 ◽

Vol 33 ◽

pp. 9259-9266 ◽

Cited By ~ 78

Author(s):

Qijie Zhao ◽

Tao Sheng ◽

Yongtao Wang ◽

Zhi Tang ◽

Ying Chen ◽

...

Keyword(s):

Feature Fusion ◽

State Of The Art ◽

Single Shot ◽

Multi Scale ◽

One Stage ◽

Single Scale ◽

Feature Pyramid ◽

Multi Level ◽

Multiple Levels ◽

Inference Strategy

Feature pyramids are widely exploited by both state-of-the-art one-stage object detectors (e.g., DSSD, RetinaNet, RefineDet) and two-stage object detectors (e.g., Mask RCNN, DetNet) to alleviate the problem arising from scale variation across object instances. Although these object detectors with feature pyramids achieve encouraging results, they are limited because they simply construct the feature pyramid according to the inherent multi-scale, pyramidal architecture of backbones that were originally designed for the object classification task. In this work, we present the Multi-Level Feature Pyramid Network (MLFPN) to construct more effective feature pyramids for detecting objects of different scales. First, we fuse multi-level features (i.e., multiple layers) extracted by the backbone as the base feature. Second, we feed the base feature into a block of alternating joint Thinned U-shape Modules and Feature Fusion Modules and exploit the decoder layers of each U-shape module as the features for detecting objects. Finally, we gather up the decoder layers with equivalent scales (sizes) to construct a feature pyramid for object detection, in which every feature map consists of the layers (features) from multiple levels. To evaluate the effectiveness of the proposed MLFPN, we design and train a powerful end-to-end one-stage object detector, which we call M2Det, by integrating it into the architecture of SSD, and achieve better detection performance than state-of-the-art one-stage detectors. Specifically, on the MSCOCO benchmark, M2Det achieves an AP of 41.0 at a speed of 11.8 FPS with a single-scale inference strategy and an AP of 44.2 with a multi-scale inference strategy, which are the new state-of-the-art results among one-stage detectors. The code will be made available at https://github.com/qijiezhao/M2Det.
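The final step of the abstract above, gathering decoder layers of equivalent scale from every level into one pyramid, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the helper names `make_map` and `gather_by_scale`, the three levels, the two channels per output, and the spatial sizes 4, 2, and 1 are all assumptions chosen to keep the toy example small.

```python
from collections import defaultdict


def make_map(channels, size, fill):
    """Build a toy feature map: `channels` planes of size x size filled with `fill`."""
    return [[[fill] * size for _ in range(size)] for _ in range(channels)]


def gather_by_scale(levels):
    """Concatenate same-scale decoder outputs from every level along the channel axis.

    `levels` is a list (shallow to deep) of dicts mapping spatial size -> feature map,
    where a feature map is a list of channel planes.
    """
    pyramid = defaultdict(list)
    for decoder_outputs in levels:
        for size, feature_map in decoder_outputs.items():
            pyramid[size].extend(feature_map)  # channel-wise concatenation
    # Order the pyramid from the largest spatial size to the smallest.
    return dict(sorted(pyramid.items(), key=lambda item: item[0], reverse=True))


# Three hypothetical levels, each emitting 2 channels at sizes 4, 2 and 1.
# The fill value records which level a channel came from.
levels = [{s: make_map(2, s, fill=float(i)) for s in (4, 2, 1)} for i in range(3)]
pyramid = gather_by_scale(levels)

for size, feature_map in pyramid.items():
    print(size, len(feature_map))  # each scale now holds channels from all levels
```

In this toy setup every scale of the resulting pyramid carries channels from all three levels, which is the property the abstract describes as "every feature map consists of the layers (features) from multiple levels".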


Rethinking the Bottom-Up Framework for Query-Based Video Localization

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i07.6627 ◽

2020 ◽

Vol 34 (07) ◽

pp. 10551-10558 ◽

Cited By ~ 1

Author(s):

Long Chen ◽

Chujie Lu ◽

Siliang Tang ◽

Jun Xiao ◽

Dong Zhang ◽

...

Keyword(s):

State Of The Art ◽

Ground Truth ◽

Video Frame ◽

Top Down ◽

Bottom Up ◽

Multi Scale ◽

Feature Pyramid ◽

Multi Level ◽

Classification And Regression ◽

Model Graph

In this paper, we focus on the task of query-based video localization, i.e., localizing a query in a long and untrimmed video. The prevailing solutions for this problem can be grouped into two categories: i) Top-down approach: It pre-cuts the video into a set of moment candidates, then performs classification and regression for each candidate; ii) Bottom-up approach: It injects the whole query content into each video frame, then predicts the probability of each frame being a ground truth segment boundary (i.e., start or end). Both frameworks have shortcomings: the top-down models suffer from heavy computation and are sensitive to heuristic rules, while the performance of bottom-up models has so far lagged behind that of their top-down counterparts. However, we argue that the performance of the bottom-up framework is severely underestimated because of unreasonable current designs in both the backbone and the head network. To this end, we design a novel bottom-up model: Graph-FPN with Dense Predictions (GDP). For the backbone, GDP first generates a frame feature pyramid to capture multi-level semantics, then utilizes graph convolution to encode the plentiful scene relationships, which incidentally mitigates the semantic gaps in the multi-scale feature pyramid. For the head network, GDP regards all frames falling in the ground truth segment as the foreground, and each foreground frame regresses the unique distances from its location to the bi-directional boundaries. Extensive experiments on two challenging query-based video localization tasks (natural language video localization and video relocalization), involving four challenging benchmarks (TACoS, Charades-STA, ActivityNet Captions, and Activity-VRL), have shown that GDP surpasses the state-of-the-art top-down models.
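The dense-prediction head described in the abstract above treats every frame inside the ground truth segment as foreground and has it regress its distances to both boundaries. A minimal sketch of how such training targets could be built follows. The function names, the inclusive frame indexing, and measuring distances in frames are assumptions made for illustration; the abstract does not specify them.

```python
def dense_targets(num_frames, start, end):
    """Build per-frame targets for a segment [start, end] (inclusive, in frame indices).

    Returns a foreground label per frame and, for foreground frames only,
    the pair (distance to start boundary, distance to end boundary).
    """
    labels, distances = [], []
    for t in range(num_frames):
        if start <= t <= end:
            labels.append(1)
            distances.append((t - start, end - t))
        else:
            labels.append(0)
            distances.append(None)  # background frames carry no regression target
    return labels, distances


def decode(t, d_start, d_end):
    """Recover the segment boundaries from one frame's predicted distances."""
    return t - d_start, t + d_end


labels, distances = dense_targets(num_frames=8, start=2, end=5)
print(labels)                     # foreground mask over the 8 frames
print(decode(3, *distances[3]))   # any foreground frame decodes to the same segment
```

Because every foreground frame yields its own boundary estimate, a clip with a long ground truth segment contributes many positive training samples rather than only two boundary frames.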


Adaptive Feature Pyramid Network to Predict Crisp Boundaries via NMS Layer and ODS F-Measure Loss Function

Information ◽

10.3390/info13010032 ◽

2022 ◽

Vol 13 (1) ◽

pp. 32

Author(s):

Gang Sun ◽

Hancheng Yu ◽

Xiangtao Jiang ◽

Mingkui Feng

Keyword(s):

Edge Detection ◽

Loss Function ◽

State Of The Art ◽

Cross Entropy ◽

Post Processing ◽

Multi Scale ◽

Feature Pyramid ◽

Multi Level ◽

Different Levels ◽

F Measure

Edge detection is one of the fundamental computer vision tasks. Recent methods for edge detection based on a convolutional neural network (CNN) typically employ the weighted cross-entropy loss. Their predicted edges are thick and need post-processing before the optimal dataset scale (ODS) F-measure can be calculated for evaluation. To achieve end-to-end training, we propose a non-maximum suppression (NMS) layer to obtain sharp boundaries without the need for post-processing. The ODS F-measure can then be calculated directly from these sharp boundaries, so we propose an ODS F-measure loss function to train the network. In addition, we propose an adaptive multi-level feature pyramid network (AFPN) to better fuse different levels of features. Furthermore, to enrich the multi-scale features learned by AFPN, we introduce a pyramid context module (PCM) that uses dilated convolution to extract multi-scale features. Experimental results indicate that the proposed AFPN achieves state-of-the-art performance on the BSDS500 dataset (ODS F-score of 0.837) and the NYUDv2 dataset (ODS F-score of 0.780).
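For readers unfamiliar with the metric named in the abstract above, the sketch below computes an ODS F-measure in its usual evaluation form: match counts are pooled over the whole dataset at each threshold, and the best F-measure over a single dataset-wide threshold is reported. This illustrates the metric only, not the differentiable loss the paper proposes, and the per-image counts are invented for the example.

```python
def f_measure(tp, fp, fn):
    """F-measure from match counts; equals 2PR / (P + R) written as 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def ods_f_measure(counts):
    """Pick the single threshold that maximizes the F-measure over the whole dataset.

    `counts` maps threshold -> list of (tp, fp, fn) tuples, one tuple per image.
    """
    best_threshold, best_f = None, -1.0
    for threshold, per_image in counts.items():
        tp = sum(c[0] for c in per_image)
        fp = sum(c[1] for c in per_image)
        fn = sum(c[2] for c in per_image)
        score = f_measure(tp, fp, fn)
        if score > best_f:
            best_threshold, best_f = threshold, score
    return best_threshold, best_f


# Made-up counts for a two-image dataset at three thresholds.
counts = {
    0.3: [(8, 4, 2), (6, 6, 4)],
    0.5: [(7, 1, 3), (7, 3, 3)],
    0.7: [(4, 0, 6), (5, 1, 5)],
}
threshold, score = ods_f_measure(counts)
print(threshold, round(score, 3))
```

The counts presuppose thin, matched boundaries, which is why thick predictions normally need post-processing before this metric can be evaluated.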


Adaptive Weighted Multi-Level Fusion of Multi-Scale Features: A New Approach to Pedestrian Detection

Future Internet ◽

10.3390/fi13020038 ◽

2021 ◽

Vol 13 (2) ◽

pp. 38

Author(s):

Yao Xu ◽

Qin Yu

Keyword(s):

Deep Learning ◽

Feature Fusion ◽

Pedestrian Detection ◽

Feature Maps ◽

Scale Feature ◽

Multi Scale ◽

One Stage ◽

Current State ◽

Multi Level ◽

Feature Utilization

Great achievements have been made in pedestrian detection through deep learning. For detectors based on deep learning, making better use of features has become the key to detection performance. While current pedestrian detectors have made efforts in feature utilization to improve their detection performance, their feature utilization is still inadequate. To solve this problem, we propose the Multi-Level Feature Fusion Module (MFFM) and its Multi-Scale Feature Fusion Unit (MFFU) sub-module, which connect feature maps of the same scale and different scales by using horizontal and vertical connections and shortcut structures. All of these connections are accompanied by weights that can be learned; thus, they can be used as adaptive multi-level and multi-scale feature fusion modules to fuse the best features. Then, we build a complete pedestrian detector, the Adaptive Feature Fusion Detector (AFFDet), an anchor-free one-stage pedestrian detector that can make full use of features for detection. As a result, compared with other methods, our method has better performance on the challenging Caltech Pedestrian Detection Benchmark (Caltech) and has quite competitive speed. It is the current state-of-the-art one-stage pedestrian detection method.
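The idea of connections "accompanied by weights that can be learned" can be illustrated with a small weighted-fusion sketch. The abstract does not say how the weights are normalized, so the softmax normalization here is an assumption, as are the function names and the flattened toy feature vectors.

```python
import math


def softmax(raw_weights):
    """Normalize raw weights into positive values that sum to 1."""
    peak = max(raw_weights)
    exps = [math.exp(w - peak) for w in raw_weights]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]


def weighted_fuse(features, raw_weights):
    """Fuse same-shaped (flattened) feature maps as a weighted sum.

    In a real network `raw_weights` would be trainable parameters;
    here they are plain numbers.
    """
    weights = softmax(raw_weights)
    length = len(features[0])
    return [sum(w * f[i] for w, f in zip(weights, features)) for i in range(length)]


features = [[1.0, 2.0], [3.0, 4.0]]

# Equal raw weights reduce to a plain average of the inputs.
print(weighted_fuse(features, [0.0, 0.0]))

# A strongly preferred first input dominates the fused result.
print(weighted_fuse(features, [10.0, -10.0]))
```

During training, gradients flowing into the raw weights would let the network decide how much each connection contributes, which is what makes such a fusion adaptive.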


Improved SSD-assisted algorithm for surface defect detection of electromagnetic luminescence

Proceedings of the Institution of Mechanical Engineers Part O Journal of Risk and Reliability ◽

10.1177/1748006x21995388 ◽

2021 ◽

pp. 1748006X2199538

Author(s):

Zhenying Xu ◽

Ziqian Wu ◽

Wei Fan

Keyword(s):

Defect Detection ◽

Feature Fusion ◽

Recognition Rate ◽

Detection Methods ◽

Small Scale ◽

Detection Accuracy ◽

Single Shot ◽

Surface Defect Detection ◽

Feature Pyramid ◽

Small Feature

Defect detection of electromagnetic luminescence (EL) cells is the core step in the production and preparation of solar cell modules to ensure conversion efficiency and long service life of batteries. However, because it lacks feature extraction capability for small-feature defects, the traditional single shot multibox detector (SSD) algorithm does not achieve high accuracy in EL defect detection. Consequently, an improved SSD algorithm with modified feature fusion in the framework of deep learning is proposed to improve the recognition rate of EL multi-class defects. A dataset containing images with four different types of defects is established for the EL through rotation, denoising, and binarization. Drawing on the idea of feature pyramid networks, the proposed algorithm greatly improves the detection accuracy of small-scale defects. An experimental study on the detection of EL defects shows the effectiveness of the proposed algorithm. Moreover, a comparison study shows that the proposed method outperforms other detection methods, such as SIFT, Faster R-CNN, and YOLOv3, in detecting EL defects.


DCPNet: A Densely Connected Pyramid Network for Monocular Depth Estimation

Sensors ◽

10.3390/s21206780 ◽

2021 ◽

Vol 21 (20) ◽

pp. 6780

Author(s):

Zhitong Lai ◽

Rui Tian ◽

Zhiguo Wu ◽

Nannan Ding ◽

Linjian Sun ◽

...

Keyword(s):

Multiple Scales ◽

Feature Fusion ◽

State Of The Art ◽

Depth Estimation ◽

Multi Scale ◽

Pyramid Structure ◽

Benchmark Datasets ◽

The Common ◽

Monocular Depth ◽

Multiple Stages

Pyramid architecture is a useful strategy to fuse multi-scale features in deep monocular depth estimation approaches. However, most pyramid networks fuse features only between adjacent stages of the pyramid structure. To take full advantage of the pyramid structure, inspired by the success of DenseNet, this paper presents DCPNet, a densely connected pyramid network that fuses multi-scale features from multiple stages of the pyramid structure. DCPNet performs feature fusion not only between adjacent stages but also between non-adjacent stages. To fuse these features, we design a simple and effective dense connection module (DCM). In addition, we offer a new consideration of the common upscale operation in our approach. We believe DCPNet offers a more efficient way to fuse features from multiple scales in a pyramid-like network. We perform extensive experiments using both outdoor and indoor benchmark datasets (i.e., the KITTI and the NYU Depth V2 datasets), and DCPNet achieves state-of-the-art results.
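Dense connection across pyramid stages, as opposed to fusing only neighbours, can be sketched as follows. The abstract does not describe the internals of the DCM or its upscale operation, so this toy version makes two labeled assumptions: features are resized with nearest-neighbour sampling, and fusion is a simple channel stack. The function names and the single-channel square planes are likewise hypothetical.

```python
def resize_nearest(plane, size):
    """Resize a square 2D plane to size x size with nearest-neighbour sampling."""
    src = len(plane)
    return [
        [plane[r * src // size][c * src // size] for c in range(size)]
        for r in range(size)
    ]


def dense_fuse(stages):
    """For each stage, stack every stage's plane resized to that stage's resolution.

    `stages` is a list of square single-channel planes at different resolutions.
    The result has one entry per stage: a list of channels, one from each stage,
    so adjacent and non-adjacent stages all contribute.
    """
    fused = []
    for target in stages:
        size = len(target)
        fused.append([resize_nearest(plane, size) for plane in stages])
    return fused


stages = [
    [[1, 2], [3, 4]],                      # 2 x 2 stage
    [[5]],                                 # 1 x 1 stage
    [[9] * 4 for _ in range(4)],           # 4 x 4 stage
]
fused = dense_fuse(stages)
print(len(fused[0]), fused[0][1])  # the 2 x 2 stage sees the 1 x 1 stage upsampled
```

A pyramid that fuses only adjacent stages would give the 4 x 4 stage no direct view of the 1 x 1 stage; in the dense variant it receives that feature in a single hop.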


Mosaic Super-resolution via Sequential Feature Pyramid Networks

10.36227/techrxiv.11402130 ◽

2019 ◽

Author(s):

Mehrdad Shoeiby ◽

Mohammad Ali Armin ◽

Sadegh Aliakbarian ◽

Saeed Anwar ◽

Lars Petersson

Keyword(s):

State Of The Art ◽

Super Resolution ◽

Autonomous Driving ◽

Single Shot ◽

Current State ◽

Wide Range ◽

Feature Pyramid ◽

Novel Method ◽

Convolutional Lstm ◽

Mosaic Images

Advances in the design of multi-spectral cameras have led to great interests in a wide range of applications, from astronomy to autonomous driving. However, such cameras inherently suffer from a trade-off between the spatial and spectral resolution. In this paper, we propose to address this limitation by introducing a novel method to carry out super-resolution on raw mosaic images, multi-spectral or RGB Bayer, captured by modern real-time single-shot mosaic sensors. To this end, we design a deep super-resolution architecture that benefits from a sequential feature pyramid along the depth of the network. This, in fact, is achieved by utilizing a convolutional LSTM (ConvLSTM) to learn the inter-dependencies between features at different receptive fields. Additionally, by investigating the effect of different attention mechanisms in our framework, we show that a ConvLSTM inspired module is able to provide superior attention in our context. Our extensive experiments and analyses evidence that our approach yields significant super-resolution quality, outperforming current state-of-the-art mosaic super-resolution methods on both Bayer and multi-spectral images. Additionally, to the best of our knowledge, our method is the first specialized method to super-resolve mosaic images, whether it be multi-spectral or Bayer.


Multi-scale Feature Fusion Single Shot Object Detector Based on DenseNet

Intelligent Robotics and Applications - Lecture Notes in Computer Science ◽

10.1007/978-3-030-27541-9_37 ◽

2019 ◽

pp. 450-460 ◽

Cited By ~ 1

Author(s):

Minghao Zhai ◽

Junchen Liu ◽

Wei Zhang ◽

Chen Liu ◽

Wei Li ◽

...

Keyword(s):

Feature Fusion ◽

Single Shot ◽

Scale Feature ◽

Multi Scale


Multi Scale Object Detection Based on Single Shot Multibox Detector with Feature Fusion and Inception Network

The Journal of Korean Institute of Information Technology ◽

10.14801/jkiit.2018.16.10.93 ◽

2018 ◽

Vol 16 (10) ◽

pp. 93-100 ◽

Cited By ~ 1

Author(s):

Md Foysal Haque ◽

Dae-Seong Kang

Keyword(s):

Object Detection ◽

Feature Fusion ◽

Single Shot ◽

Multi Scale


CovidMulti-Net: A Parallel-Dilated Multi Scale Feature Fusion Architecture for the Identification of COVID-19 Cases from Chest X-ray Images

10.1101/2021.05.19.21257430 ◽

2021 ◽

Author(s):

Md. Saikat Islam Khan ◽

Anichur Rahman ◽

Md. Razaul Karim ◽

Nasima Islam Bithi ◽

Shahab Band ◽

...

Keyword(s):

Healthcare Professionals ◽

Feature Fusion ◽

State Of The Art ◽

Research Community ◽

Diagnostic Model ◽

X Ray ◽

Multi Scale ◽

The World ◽

Chest X Ray ◽

Learning Concept

COVID-19 is an emerging respiratory infectious disease that has had a significant impact on the health and life of many people around the world. Therefore, early identification of COVID-19 patients is the fastest way to restrain the spread of the pandemic. However, as the number of cases grows at an alarming pace, most developing countries are now facing a shortage of medical resources and testing kits. Besides, using testing kits to detect COVID-19 cases is a time-consuming, expensive, and cumbersome procedure. Faced with these obstacles, most physicians, researchers, and engineers have advocated for the advancement of computer-aided deep learning models to assist healthcare professionals in quickly and inexpensively recognizing COVID-19 cases from chest X-ray (CXR) images. With this motivation, this paper proposes a CovidMulti-Net architecture based on the transfer learning concept to classify COVID-19 cases from normal and other pneumonia cases using three publicly available datasets that include 1341, 1341, and 446 CXR images from healthy samples and 902, 1564, and 1193 CXR images infected with Viral Pneumonia, Bacterial Pneumonia, and COVID-19 diseases. In the proposed framework, features from CXR images are extracted using three well-known pre-trained models: DenseNet-169, ResNet-50, and VGG-19. The extracted features are then fed into a concatenate layer, making a robust hybrid model. The proposed framework achieved a classification accuracy of 99.4%, 95.2%, and 94.8% for the 2-Class, 3-Class, and 4-Class datasets, exceeding all the other state-of-the-art models. These results demonstrate the CovidMulti-Net framework's ability to discriminate individuals with COVID-19 infection from healthy ones and suggest that it could be used as a diagnostic model in clinics and hospitals. We also made all the materials publicly accessible for the research community at: https://github.com/saikat15010/CovidMulti-Net-Architecture.git.
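The fusion step in the abstract above, where features from three pre-trained backbones are fed into a concatenate layer, amounts to joining the per-model feature vectors end to end. The sketch below shows only that step. The helper name `concat_features`, the tiny vector lengths, and the constant values are placeholders, not the real feature dimensions of DenseNet-169, ResNet-50, or VGG-19.

```python
def concat_features(feature_vectors):
    """Join several 1D feature vectors end to end, as a concatenate layer would."""
    fused = []
    for vector in feature_vectors:
        fused.extend(vector)
    return fused


# Hypothetical stand-ins for pooled feature vectors from the three backbones.
densenet_feats = [0.1] * 4
resnet_feats = [0.2] * 6
vgg_feats = [0.3] * 2

fused = concat_features([densenet_feats, resnet_feats, vgg_feats])
print(len(fused))  # the fused length is the sum of the three input lengths
```

A classifier placed after this step sees all three representations at once, so its input width equals the combined feature length of the backbones.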


GourmetNet: Food Segmentation Using Multi-Scale Waterfall Features with Spatial and Channel Attention

Sensors ◽

10.3390/s21227504 ◽

2021 ◽

Vol 21 (22) ◽

pp. 7504

Author(s):

Udit Sharma ◽

Bruno Artacho ◽

Andreas Savakis

Keyword(s):

Feature Extraction ◽

State Of The Art ◽

Extraction Process ◽

Feature Representation ◽

Post Processing ◽

Multi Scale ◽

Spatial Pooling ◽

Current State ◽

Nutrition Monitoring ◽

Multiple Levels

We propose GourmetNet, a single-pass, end-to-end trainable network for food segmentation that achieves state-of-the-art performance. Food segmentation is an important problem as the first step for nutrition monitoring and for food volume and calorie estimation. Our novel architecture incorporates both channel attention and spatial attention information in an expanded multi-scale feature representation using our advanced Waterfall Atrous Spatial Pooling module. GourmetNet refines the feature extraction process by merging features from multiple levels of the backbone through the two attention modules. The refined features are processed with the advanced multi-scale waterfall module, which combines the benefits of cascade filtering and pyramid representations without requiring a separate decoder or post-processing. Our experiments on two food datasets show that GourmetNet significantly outperforms current state-of-the-art methods.
