Adaptive Feature Pyramid Network to Predict Crisp Boundaries via NMS Layer and ODS F-Measure Loss Function

Information ◽  
2022 ◽  
Vol 13 (1) ◽  
pp. 32
Author(s):  
Gang Sun ◽  
Hancheng Yu ◽  
Xiangtao Jiang ◽  
Mingkui Feng

Edge detection is one of the fundamental computer vision tasks. Recent methods for edge detection based on convolutional neural networks (CNNs) typically employ the weighted cross-entropy loss; their predicted results are thick and need post-processing before the optimal dataset scale (ODS) F-measure can be calculated for evaluation. To achieve end-to-end training, we propose a non-maximum suppression (NMS) layer that obtains sharp boundaries without the need for post-processing. The ODS F-measure can be calculated directly from these sharp boundaries, so an ODS F-measure loss function is proposed to train the network. In addition, we propose an adaptive multi-level feature pyramid network (AFPN) to better fuse different levels of features. Furthermore, to enrich the multi-scale features learned by AFPN, we introduce a pyramid context module (PCM) that uses dilated convolution to extract multi-scale features. Experimental results indicate that the proposed AFPN achieves state-of-the-art performance on the BSDS500 dataset (ODS F-score of 0.837) and the NYUDv2 dataset (ODS F-score of 0.780).
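The effect of non-maximum suppression on an edge map can be illustrated outside the network. The sketch below is a simplified NumPy approximation, not the authors' differentiable layer: each pixel of an edge probability map survives only if it is a local maximum along its dominant gradient direction, which thins thick responses into crisp boundaries.

```python
import numpy as np

def thin_edges(prob):
    """Suppress non-maximal pixels of an edge probability map: a pixel is
    kept only if it is a local maximum along the axis (horizontal or
    vertical) of its dominant gradient. A coarse stand-in for the
    direction-aware NMS used by edge detectors."""
    gy, gx = np.gradient(prob)
    horiz = np.abs(gx) >= np.abs(gy)          # dominant direction per pixel
    left  = np.roll(prob, 1, axis=1); right = np.roll(prob, -1, axis=1)
    up    = np.roll(prob, 1, axis=0); down  = np.roll(prob, -1, axis=0)
    keep_h = (prob >= left) & (prob >= right)  # horizontal local maximum
    keep_v = (prob >= up) & (prob >= down)     # vertical local maximum
    return np.where(np.where(horiz, keep_h, keep_v), prob, 0.0)
```

Applied to a three-pixel-thick vertical edge, only the central, highest-probability column survives.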

2020 ◽  
Vol 34 (07) ◽  
pp. 10551-10558 ◽  
Author(s):  
Long Chen ◽  
Chujie Lu ◽  
Siliang Tang ◽  
Jun Xiao ◽  
Dong Zhang ◽  
...  

In this paper, we focus on the task of query-based video localization, i.e., localizing a query in a long and untrimmed video. The prevailing solutions for this problem can be grouped into two categories: i) the top-down approach, which pre-cuts the video into a set of moment candidates and then performs classification and regression for each candidate; ii) the bottom-up approach, which injects the whole query content into each video frame and then predicts the probability of each frame being a ground-truth segment boundary (i.e., start or end). Both frameworks have shortcomings: top-down models suffer from heavy computation and are sensitive to heuristic rules, while the performance of bottom-up models has so far lagged behind that of their top-down counterparts. However, we argue that the performance of the bottom-up framework is severely underestimated because of current unreasonable designs in both the backbone and the head network. To this end, we design a novel bottom-up model: Graph-FPN with Dense Predictions (GDP). For the backbone, GDP first generates a frame feature pyramid to capture multi-level semantics, then utilizes graph convolution to encode the plentiful scene relationships, which incidentally mitigates the semantic gaps in the multi-scale feature pyramid. For the head network, GDP regards all frames falling in the ground-truth segment as foreground, and each foreground frame regresses the unique distances from its location to the bi-directional boundaries. Extensive experiments on two challenging query-based video localization tasks (natural language video localization and video relocalization), involving four challenging benchmarks (TACoS, Charades-STA, ActivityNet Captions, and Activity-VRL), show that GDP surpasses the state-of-the-art top-down models.
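The dense-prediction head can be read as follows: every foreground frame independently proposes a segment from its two regressed boundary distances. A minimal decoding sketch (picking the proposal of the most confident frame is an assumption for illustration; the paper's actual aggregation may differ):

```python
import numpy as np

def decode_segment(fg_prob, d_start, d_end):
    """Each frame i proposes the segment [i - d_start[i], i + d_end[i]].
    Return the proposal of the most confident foreground frame."""
    i = int(np.argmax(fg_prob))
    return i - d_start[i], i + d_end[i]
```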


Author(s):  
Qijie Zhao ◽  
Tao Sheng ◽  
Yongtao Wang ◽  
Zhi Tang ◽  
Ying Chen ◽  
...  

Feature pyramids are widely exploited by both state-of-the-art one-stage object detectors (e.g., DSSD, RetinaNet, RefineDet) and two-stage object detectors (e.g., Mask RCNN, DetNet) to alleviate the problem arising from scale variation across object instances. Although these object detectors with feature pyramids achieve encouraging results, they have some limitations because they simply construct the feature pyramid according to the inherent multi-scale, pyramidal architecture of backbones that were originally designed for the object classification task. In this work, we present the Multi-Level Feature Pyramid Network (MLFPN) to construct more effective feature pyramids for detecting objects of different scales. First, we fuse multi-level features (i.e., multiple layers) extracted by the backbone as the base feature. Second, we feed the base feature into a block of alternating joint Thinned U-shape Modules and Feature Fusion Modules and exploit the decoder layers of each U-shape module as features for detecting objects. Finally, we gather the decoder layers with equivalent scales (sizes) to construct a feature pyramid for object detection, in which every feature map consists of layers (features) from multiple levels. To evaluate the effectiveness of the proposed MLFPN, we design and train a powerful end-to-end one-stage object detector, called M2Det, by integrating MLFPN into the architecture of SSD, and achieve better detection performance than state-of-the-art one-stage detectors. Specifically, on the MS-COCO benchmark, M2Det achieves an AP of 41.0 at 11.8 FPS with a single-scale inference strategy and an AP of 44.2 with a multi-scale inference strategy, which are the new state-of-the-art results among one-stage detectors. The code will be made available on https://github.com/qijiezhao/M2Det.
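The final gathering step — collecting same-scale decoder features from every U-shape module into one pyramid level — can be sketched as follows. This is a hedged illustration with assumed channel-first array shapes; channel-wise concatenation stands in for the paper's fusion:

```python
import numpy as np

def gather_pyramid(levels):
    """levels: list of dicts mapping scale -> feature map of shape (C, H, W),
    one dict per U-shape module. Concatenate the same-scale features from
    all modules along the channel axis to form each pyramid level."""
    scales = levels[0].keys()
    return {s: np.concatenate([lv[s] for lv in levels], axis=0)
            for s in scales}
```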


Author(s):  
Zhenzhen Yang ◽  
Pengfei Xu ◽  
Yongpeng Yang ◽  
Bing-Kun Bao

The U-Net has become the most popular structure in medical image segmentation in recent years. Although its performance for medical image segmentation is outstanding, a large number of experiments demonstrate that the classical U-Net network architecture seems to be insufficient when the size of segmentation targets changes and imbalance arises between target and background in different forms of segmentation. To improve the U-Net network architecture, we develop a new architecture named the densely connected U-Net (DenseUNet) network in this article. The proposed DenseUNet network adopts a dense block to improve the feature extraction capability and employs a multi-feature fuse block that fuses feature maps of different levels to increase the accuracy of feature extraction. In addition, in view of the respective advantages of the cross-entropy and Dice loss functions, a new loss function for the DenseUNet network is proposed to deal with the imbalance between target and background. Finally, we test the proposed DenseUNet network and compare it with the multi-resolutional U-Net (MultiResUNet) and the classic U-Net networks on three different datasets. The experimental results show that the DenseUNet network performs significantly better than the MultiResUNet and the classic U-Net networks.
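The abstract does not give the exact form of the combined loss; a common formulation matching its motivation (cross-entropy for per-pixel accuracy, Dice for target/background imbalance) is sketched below, with an assumed equal weighting between the two terms:

```python
import numpy as np

def bce_dice_loss(pred, target, w=0.5, eps=1e-7):
    """pred: predicted foreground probabilities in (0, 1); target: {0, 1} mask.
    Returns w * binary cross-entropy + (1 - w) * Dice loss."""
    pred = np.clip(pred, eps, 1 - eps)
    bce = -np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred))
    inter = np.sum(pred * target)
    dice = 1 - (2 * inter + eps) / (np.sum(pred) + np.sum(target) + eps)
    return w * bce + (1 - w) * dice
```

A perfect prediction drives both terms, and hence the combined loss, toward zero; a fully wrong one is penalized by both terms at once.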


Author(s):  
Xiao Yang ◽  
Madian Khabsa ◽  
Miaosen Wang ◽  
Wei Wang ◽  
Ahmed Hassan Awadallah ◽  
...  

Community-based question answering (CQA) websites represent an important source of information. As a result, the problem of matching the most valuable answers to their corresponding questions has become an increasingly popular research topic. We frame this task as a binary (relevant/irrelevant) classification problem and present an adversarial training framework to alleviate the label imbalance issue. We employ a generative model to iteratively sample a subset of challenging negative samples to fool our classification model. Both models are alternately optimized using the REINFORCE algorithm. The proposed method is completely different from previous ones, in which negative samples in the training set are used directly or uniformly down-sampled. Further, we propose Multi-scale Matching, which explicitly inspects the correlation between words and n-grams at different levels of granularity. We evaluate the proposed method on the SemEval 2016 and SemEval 2017 datasets and achieve state-of-the-art or comparable performance.
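The generator's role — preferring negatives that most confuse the classifier over uniform down-sampling — can be sketched as below. The softmax-over-scores sampling distribution and the temperature parameter are illustrative assumptions, not the paper's exact generator:

```python
import numpy as np

def sample_hard_negatives(scores, k, temperature=1.0, rng=None):
    """Sample k negative answers with probability proportional to
    softmax(score / temperature), so higher-scoring (more confusing)
    negatives are drawn more often than under uniform sampling."""
    rng = rng or np.random.default_rng(0)
    z = scores / temperature
    p = np.exp(z - z.max())
    p /= p.sum()
    return rng.choice(len(scores), size=k, replace=False, p=p)
```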


2018 ◽  
Vol 14 (7) ◽  
pp. 155014771879075 ◽  
Author(s):  
Chi Yoon Jeong ◽  
Hyun S Yang ◽  
KyeongDeok Moon

In this article, we propose a fast method for detecting the horizon line in maritime scenarios by combining a multi-scale approach and region-of-interest detection. Recently, several methods that adopt a multi-scale approach have been proposed, because edge detection at a single scale is insufficient to detect edges of all sizes. However, these methods suffer from high processing times, requiring tens of seconds to complete horizon detection. Moreover, the resolution of images captured from cameras mounted on vessels is increasing, which further reduces processing speed. Using a region of interest is an efficient way of reducing the amount of information to be processed, so we explore how to use the region of interest efficiently for horizon detection. The proposed method first detects the region of interest using a property of maritime scenes; multi-scale edge detection is then performed to extract edges at each scale, and the results are combined into a single edge map. Finally, the Hough transform and a least-squares method are sequentially used to estimate the horizon line accurately. We compared the performance of the proposed method with state-of-the-art methods using two publicly available databases, namely the Singapore Marine Dataset and the buoy dataset. Experimental results show that the proposed region-of-interest detection reduces the processing time of horizon detection, and that the accuracy with which the proposed method identifies the horizon is superior to that of state-of-the-art methods.
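The final Hough-then-least-squares step can be sketched as follows: the Hough transform supplies a coarse line, and edge points near that line are refit exactly. The inlier tolerance is an assumed parameter, not taken from the paper:

```python
import numpy as np

def refine_horizon(xs, ys, slope, intercept, tol=3.0):
    """Least-squares refit of y = slope * x + intercept using only the
    edge points within `tol` pixels (vertically) of the coarse Hough line."""
    resid = np.abs(ys - (slope * xs + intercept))
    inliers = resid < tol
    return np.polyfit(xs[inliers], ys[inliers], 1)  # refined (slope, intercept)
```

Because the coarse line gates which points enter the fit, a spurious edge point far from the horizon is simply ignored.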


2020 ◽  
Vol 34 (07) ◽  
pp. 11037-11044
Author(s):  
Lianghua Huang ◽  
Xin Zhao ◽  
Kaiqi Huang

A key capability of a long-term tracker is to search for targets in very large areas (typically the entire image) to handle possible target absences or tracking failures. However, there is currently a lack of such a strong baseline for global instance search. In this work, we aim to bridge this gap. Specifically, we propose GlobalTrack, a pure global instance search based tracker that makes no assumption about the temporal consistency of the target's positions and scales. GlobalTrack is built on two-stage object detectors and is able to perform full-image, multi-scale search for arbitrary instances with only a single query as the guide. We further propose a cross-query loss to improve the robustness of our approach against distractors. With no online learning, no penalty for position or scale changes, no scale smoothing, and no trajectory refinement, our pure global instance search based tracker achieves comparable, and sometimes much better, performance on four large-scale tracking benchmarks (i.e., 52.1% AUC on LaSOT, 63.8% success rate on TLP, 60.3% MaxGM on OxUvA, and 75.4% normalized precision on TrackingNet) compared to state-of-the-art approaches that typically require complex post-processing. More importantly, our tracker runs without cumulative errors, i.e., any temporary tracking failure will not affect its performance on future frames, making it ideal for long-term tracking. We hope this work will serve as a strong baseline for long-term tracking and will stimulate future work in this area.


Author(s):  
Kai Zhao ◽  
Wei Shen ◽  
Shanghua Gao ◽  
Dandan Li ◽  
Ming-Ming Cheng

In natural images, the scales (thickness) of object skeletons may vary dramatically among objects and object parts. Robust skeleton detection therefore requires powerful multi-scale feature integration. To address this issue, we present a new convolutional neural network (CNN) architecture for object skeleton detection that introduces a novel hierarchical feature integration mechanism, named Hi-Fi. The proposed CNN-based approach intrinsically captures high-level semantics from deeper layers as well as low-level details from shallower layers. By hierarchically integrating different CNN feature levels with bidirectional guidance, our approach (1) enables mutual refinement across features of different levels, and (2) possesses a strong ability to capture both rich object context and high-resolution details. Experimental results show that our method significantly outperforms state-of-the-art methods in effectively fusing features from very different scales, as evidenced by considerable performance improvements on several benchmarks.


Sensors ◽  
2021 ◽  
Vol 21 (22) ◽  
pp. 7504
Author(s):  
Udit Sharma ◽  
Bruno Artacho ◽  
Andreas Savakis

We propose GourmetNet, a single-pass, end-to-end trainable network for food segmentation that achieves state-of-the-art performance. Food segmentation is an important problem, as it is the first step in nutrition monitoring and in food volume and calorie estimation. Our novel architecture incorporates both channel attention and spatial attention information in an expanded multi-scale feature representation using our advanced Waterfall Atrous Spatial Pooling module. GourmetNet refines the feature extraction process by merging features from multiple levels of the backbone through the two attention modules. The refined features are processed with the advanced multi-scale waterfall module, which combines the benefits of cascade filtering and pyramid representations without requiring a separate decoder or post-processing. Our experiments on two food datasets show that GourmetNet significantly outperforms current state-of-the-art methods.


Author(s):  
Yingzi Wang ◽  
Nicholas Jing Yuan ◽  
Yu Sun ◽  
Chuan Qin ◽  
Xing Xie

Product sales forecasting enables comprehensive understanding of products' future development, making it of particular interest for companies to improve their business, for investors to measure the values of firms, and for users to capture the trends of a market. Recent studies show that the complex competition interactions among products directly influence products' future development. However, most existing approaches fail to model the evolutionary competition among products and lack the capability to organically reflect multi-level competition analysis in sales forecasting. To address these problems, we propose the Evolutionary Hierarchical Competition Model (EHCM), which effectively considers the time-evolving multi-level competition among products. The EHCM model systematically integrates hierarchical competition analysis with multi-scale time series forecasting. Extensive experiments using a real-world app download dataset show that EHCM outperforms state-of-the-art methods in various forecasting granularities.


Author(s):  
Qunsheng Ruan ◽  
Qingfeng Wu ◽  
Junfeng Yao ◽  
Yingdong Wang ◽  
Hsien-Wei Tseng ◽  
...  

In intelligent tongue image processing, one of the most important tasks is to accurately segment the tongue body from the whole tongue image, and high-quality processing of the tongue body edge is of great significance for subsequent tongue feature extraction. To improve the performance of the segmentation model for tongue images, we propose an efficient tongue segmentation model based on U-Net. Three important improvements are made: optimizing the model's main network, designing a new network that specifically handles tongue edge segmentation, and proposing a weighted binary cross-entropy loss function. The purpose of optimizing the main segmentation network is to make the model recognize the foreground and background features of the tongue image as well as possible. A novel tongue edge segmentation network focuses on handling the tongue edge, because the edge of the tongue contains much important information. Furthermore, the proposed loss function is adopted to enhance the pixel-level supervision of tongue images. Moreover, owing to the scarcity of tongue image resources in Traditional Chinese Medicine (TCM), special measures are adopted to augment the training samples. Comparative experiments on two datasets were conducted to verify the performance of the segmentation model. The experimental results indicate that the loss of our model converges faster than that of the others, showing that our model has better stability and robustness when segmenting tongue images captured in poor environments. The results also indicate that our model outperforms the state-of-the-art ones on the two most important tongue image segmentation indexes, IoU and Dice. Moreover, experiments on augmented samples demonstrate that our model performs better.
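The weighted binary cross-entropy is not specified in detail in the abstract; a standard class-balanced form, with weights derived from the foreground frequency (an assumption for illustration), would be:

```python
import numpy as np

def weighted_bce(pred, target, eps=1e-7):
    """Class-balanced binary cross-entropy: the rarer class (typically the
    tongue foreground) receives a proportionally larger weight, so the
    abundant background does not dominate the loss."""
    pred = np.clip(pred, eps, 1 - eps)
    beta = target.mean()  # foreground fraction of the mask
    loss = -((1 - beta) * target * np.log(pred)
             + beta * (1 - target) * np.log(1 - pred))
    return loss.mean()
```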

