Weakly-Supervised Video Re-Localization with Multiscale Attention Model

Yung-Han Huang; Kuang-Jui Hsu; Shyh-Kang Jeng; Yen-Yu Lin

doi:10.1609/aaai.v34i07.6763

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i07.6763 ◽

2020 ◽

Vol 34 (07) ◽

pp. 11077-11084

Author(s):

Yung-Han Huang ◽

Kuang-Jui Hsu ◽

Shyh-Kang Jeng ◽

Yen-Yu Lin

Keyword(s):

Feature Extraction ◽

State Of The Art ◽

The State ◽

Attention Mechanism ◽

Attention Model ◽

Temporal Features ◽

Target Segment ◽

Public Dataset ◽

Art Performance ◽

Weakly Supervised

Video re-localization aims to localize a sub-sequence, called the target segment, in an untrimmed reference video that is similar to a given query video. In this work, we propose an attention-based model to accomplish this task in a weakly supervised setting; that is, we train our CNN-based model without using the annotated locations of the target segments in the reference videos. Our model contains three modules. First, it employs a pre-trained C3D network for feature extraction. Second, we design an attention mechanism to extract multiscale temporal features, which are then used to estimate the similarity between the query video and a reference video. Third, a localization layer detects where the target segment is in the reference video by determining whether each frame in the reference video is consistent with the query video. The resulting CNN model is trained with the proposed co-attention loss, which discriminatively separates the target segment from the rest of the reference video. This loss maximizes the similarity between the query video and the target segment while minimizing the similarity between the target segment and the rest of the reference video. Our model can also be adapted to fully supervised re-localization. Evaluated on a public dataset, our method achieves state-of-the-art performance under both weakly supervised and fully supervised settings.
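The co-attention objective described above can be illustrated with a small sketch. It assumes that each reference frame receives a soft foreground score in [0, 1] from the localization layer, that segments are summarized by (weighted) mean pooling, and that similarity is cosine similarity; the function names and these choices are hypothetical, and the paper's actual features, pooling, and loss form may differ.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (eps avoids division by zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-8)

def weighted_mean(frames, weights):
    """Pool per-frame feature vectors using non-negative per-frame weights."""
    total = sum(weights) + 1e-8
    dim = len(frames[0])
    return [sum(w * f[d] for f, w in zip(frames, weights)) / total for d in range(dim)]

def co_attention_loss(query_frames, ref_frames, fg_scores):
    """Hypothetical co-attention-style loss (lower is better).

    fg_scores[t] is the predicted probability that reference frame t belongs to
    the target segment. The loss rewards similarity between the query and the
    soft target segment and penalizes similarity between the target segment
    and the remaining (background) reference frames.
    """
    query = [sum(col) / len(query_frames) for col in zip(*query_frames)]
    target = weighted_mean(ref_frames, fg_scores)
    background = weighted_mean(ref_frames, [1.0 - s for s in fg_scores])
    return -cosine(query, target) + cosine(target, background)

query = [[1.0, 0.0], [1.0, 0.0]]
reference = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
good = co_attention_loss(query, reference, [1.0, 0.0, 0.0])  # localizes frame 0
bad = co_attention_loss(query, reference, [0.0, 1.0, 1.0])   # localizes the wrong frames
print(good, bad)
```

Because no frame-level labels enter the loss, only the query video and the predicted scores, a sketch like this fits the weakly supervised setting: a correct localization gives a loss near -1, while the wrong one gives a loss near 0.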

SANTM: Efficient Self-attention-driven Network for Text Matching

ACM Transactions on Internet Technology ◽

10.1145/3426971 ◽

2022 ◽

Vol 22 (3) ◽

pp. 1-21

Author(s):

Prayag Tiwari ◽

Amit Kumar Jaiswal ◽

Sahil Garg ◽

Ilsun You

Keyword(s):

Natural Language ◽

State Of The Art ◽

The State ◽

Attention Mechanism ◽

Matching Problems ◽

Attention Model ◽

Extra Information ◽

Textual Entailment ◽

Benchmark Datasets ◽

Text Matching

Self-attention mechanisms have recently been adopted for a broad range of text-matching applications. A self-attention model takes only one sentence as input, with no extra information; i.e., the sentence representation is obtained from the final hidden state or by pooling. However, text-matching problems can be interpreted in either symmetrical or asymmetrical scopes. For instance, paraphrase detection is a symmetrical task, while textual entailment classification and question-answer matching are considered asymmetrical tasks. In this article, we leverage attractive properties of the self-attention mechanism and propose an attention-based network that incorporates three key components for inter-sequence attention: global pointwise features, preceding attentive features, and contextual features, while updating the rest of the components. We evaluate our model on two benchmark datasets covering textual entailment and question-answer matching. The proposed efficient Self-attention-driven Network for Text Matching (SANTM) outperforms the state of the art on the Stanford Natural Language Inference and WikiQA datasets with far fewer parameters.
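As a point of reference for the distinction between single-sentence self-attention and inter-sequence attention, the sketch below shows generic single-head scaled dot-product attention. It is not the SANTM architecture: the learned query, key, and value projections are replaced by identity maps for brevity, and the function names are hypothetical.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(queries, keys):
    """Scaled dot-product attention with identity projections.

    Each query vector is replaced by a softmax-weighted average of the key
    vectors (which also serve as values here), weighted by q.k / sqrt(d).
    """
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * k[j] for w, k in zip(weights, keys)) for j in range(d)])
    return out

def self_attention(sentence):
    """Intra-sequence attention: a single sentence attends to itself."""
    return attend(sentence, sentence)

premise = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hypothesis = [[1.0, 0.0], [0.5, 0.5]]
contextual = self_attention(premise)   # one input sentence, no extra information
aligned = attend(hypothesis, premise)  # inter-sequence: hypothesis tokens summarize the premise
```

For an asymmetrical task such as entailment, swapping the arguments of `attend` generally gives a different result, which is one way to see why the direction of inter-sequence attention matters.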

Non-Local Spatial and Temporal Attention Network for Video-Based Person Re-Identification

Applied Sciences ◽

10.3390/app10155385 ◽

2020 ◽

Vol 10 (15) ◽

pp. 5385

Author(s):

Zheng Liu ◽

Feixiang Du ◽

Wang Li ◽

Xu Liu ◽

Qiang Zou

Keyword(s):

Feature Extraction ◽

Long Range ◽

State Of The Art ◽

The State ◽

Feature Representation ◽

Excellent Performance ◽

Average Precision ◽

Temporal Features ◽

Non Local ◽

The Relationship

Given a video containing a person, the video-based person re-identification (Re-ID) task aims to identify the same person in videos captured by different cameras. How to embed the spatial-temporal information of a video into its feature representation is a crucial challenge. Most existing methods fail to make full use of the relationships between frames during feature extraction. In this work, we propose a plug-and-play non-local attention module (NLAM) for frame-level feature extraction. NLAM, based on global spatial attention and channel attention, helps the network determine the location of the person in each frame. In addition, we propose a non-local temporal pooling (NLTP) method for aggregating temporal features, which effectively captures long-range and global dependencies among the frames of the video. Our model obtains impressive results on different datasets compared with state-of-the-art methods. In particular, it achieves a rank-1 accuracy of 86.3% on the MARS (Motion Analysis and Re-identification Set) dataset without re-ranking, which is 1.4% higher than the previous state-of-the-art method. On the DukeMTMC-VideoReID (Duke Multi-Target Multi-Camera Video Re-identification) dataset, our method achieves 95% rank-1 accuracy and 94.5% mAP (mean Average Precision).
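To make "non-local" temporal aggregation concrete, here is a minimal sketch in the style of a generic non-local block applied along the time axis, followed by average pooling. The exact NLTP design is not given in the abstract; identity embeddings stand in for the learned transformations, and the function name is hypothetical.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def non_local_temporal_pool(frames):
    """Hypothetical non-local temporal pooling over T frame-level feature vectors.

    1. Each frame attends to all T frames (itself included) through a softmax
       over dot-product affinities, so it sees global, long-range context.
    2. The attended context is added back to the frame (residual connection).
    3. The refined frames are averaged into one clip-level descriptor.
    """
    d = len(frames[0])
    refined = []
    for f in frames:
        weights = softmax([sum(a * b for a, b in zip(f, g)) for g in frames])
        context = [sum(w * g[j] for w, g in zip(weights, frames)) for j in range(d)]
        refined.append([f[j] + context[j] for j in range(d)])
    return [sum(r[j] for r in refined) / len(refined) for j in range(d)]

clip = [[0.2, 0.9], [0.3, 0.8], [0.9, 0.1]]  # e.g. the last frame is partly occluded
descriptor = non_local_temporal_pool(clip)
```

Unlike plain average pooling, every frame here is first refined using all other frames, which is the kind of dependency among frames that the abstract says existing methods underuse.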

A Multi-Branch Feature Fusion Strategy Based on an Attention Mechanism for Remote Sensing Image Scene Classification

Remote Sensing ◽

10.3390/rs13101950 ◽

2021 ◽

Vol 13 (10) ◽

pp. 1950

Author(s):

Cuiping Shi ◽

Xin Zhao ◽

Liguo Wang

Keyword(s):

Remote Sensing ◽

Feature Extraction ◽

Classification Accuracy ◽

Feature Fusion ◽

State Of The Art ◽

Rapid Development ◽

Remote Sensing Image ◽

Classification Performance ◽

Attention Mechanism ◽

Scene Classification

In recent years, with the rapid development of computer vision, increasing attention has been paid to remote sensing image scene classification. To improve classification performance, many studies have increased the depth and width of convolutional neural networks (CNNs) to extract deeper features, which increases model complexity. To address this problem, we propose a lightweight convolutional neural network based on attention-oriented multi-branch feature fusion (AMB-CNN) for remote sensing image scene classification. First, we propose two convolution combination modules for feature extraction, through which the deep features of images can be fully extracted by multiple cooperating convolutions. Then, the feature weights are computed, and the extracted deep features are sent to the attention mechanism for further feature extraction. Next, all of the extracted features are fused by multiple branches. Finally, depthwise separable convolution and asymmetric convolution are used to greatly reduce the number of parameters. Experimental results show that, compared with several state-of-the-art methods, the proposed method retains a clear advantage in classification accuracy with very few parameters.
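The parameter savings from depthwise separable and asymmetric convolution can be checked with simple arithmetic. The sketch below counts weights only (biases ignored) for one layer; the kernel size and channel counts are illustrative assumptions, not AMB-CNN's actual configuration.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights of a k x k convolution mapping c_in to c_out channels (bias ignored)."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """k x k depthwise conv (one filter per input channel) plus a 1 x 1 pointwise conv."""
    return k * k * c_in + c_in * c_out

def asymmetric_params(k, c_in, c_out):
    """k x 1 conv (c_in -> c_out) followed by a 1 x k conv (c_out -> c_out)."""
    return k * c_in * c_out + k * c_out * c_out

k, c_in, c_out = 3, 64, 64  # illustrative layer size
print(standard_conv_params(k, c_in, c_out))        # 36864
print(depthwise_separable_params(k, c_in, c_out))  # 4672
print(asymmetric_params(k, c_in, c_out))           # 24576
```

In general, the separable layer costs a fraction 1/c_out + 1/k² of the standard layer, here about 0.127, i.e. roughly 7.9 times fewer weights; factorizing k × k into k × 1 and 1 × k saves a factor of k/2 when input and output widths match.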

A Graph-based Evolutionary Algorithm for Automated Machine Learning

10.37686/ser.v1i2.77 ◽

2020 ◽

Author(s):

Fei Qi ◽

Zhaohui Xia ◽

Gaoyang Tang ◽

Hang Yang ◽

Yu Song ◽

...

Keyword(s):

Machine Learning ◽

Evolutionary Algorithm ◽

Parameter Optimization ◽

State Of The Art ◽

The State ◽

Complex Structures ◽

Architecture Evolution ◽

Automated Machine Learning ◽

Art Performance

As an emerging field, Automated Machine Learning (AutoML) aims to reduce or eliminate manual operations that require machine learning expertise. In this paper, a graph-based architecture is employed to represent flexible combinations of ML models, which provides a larger search space than tree-based and stacking-based architectures. On this basis, an evolutionary algorithm is proposed to search for the best architecture, in which the mutation and heredity operators are key to architecture evolution. Combined with Bayesian hyperparameter optimization, the proposed approach can automate the machine learning workflow. On the PMLB dataset, the proposed approach achieves state-of-the-art performance compared with TPOT, Autostacker, and auto-sklearn. Some of the optimized models have complex structures that would be difficult to obtain by manual design.
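A toy version of evolutionary search over graph-structured pipelines is sketched below. Assumptions, all hypothetical: an architecture is a set of directed edges between n pipeline nodes, restricted to edges i -> j with i < j so every candidate stays acyclic; mutation toggles one edge; heredity is uniform crossover between two parents; and `toy_fitness` stands in for a real cross-validated score. The paper's actual encoding, operators, and Bayesian hyperparameter tuning are not reproduced.

```python
import random

def forward_edges(n):
    """All edges i -> j with i < j; graphs built only from these are always acyclic."""
    return [(i, j) for i in range(n) for j in range(i + 1, n)]

def mutate(genome, n, rng):
    """Mutation: toggle one random forward edge (add it if absent, drop it if present)."""
    return genome ^ frozenset([rng.choice(forward_edges(n))])

def crossover(a, b, n, rng):
    """Heredity (uniform crossover): each edge's presence is copied from a random parent."""
    return frozenset(e for e in forward_edges(n) if e in (a if rng.random() < 0.5 else b))

def evolve(fitness, n=5, pop_size=12, generations=30, seed=0):
    """Truncation-selection search over DAG genomes; returns (best, best-per-generation)."""
    rng = random.Random(seed)
    pop = [frozenset(e for e in forward_edges(n) if rng.random() < 0.5) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]  # the better half survives unchanged (elitism)
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, n, rng), n, rng))
        pop = parents + children
        history.append(max(fitness(g) for g in pop))
    return max(pop, key=fitness), history

# Hypothetical stand-in for validation accuracy: closeness to a hidden "good" pipeline graph.
TARGET = frozenset({(0, 1), (1, 2), (2, 4), (0, 3)})

def toy_fitness(genome):
    return -len(genome ^ TARGET)

best, history = evolve(toy_fitness)
```

Keeping the better half unchanged means the best score per generation can never decrease; in a real AutoML system each fitness call would train and validate the pipeline, so evaluations, not the operators, dominate the cost.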

The State-of-the-Art Technology of Currency Identification

International Journal of Digital Crime and Forensics ◽

10.4018/ijdcf.2017070106 ◽

2017 ◽

Vol 9 (3) ◽

pp. 58-72 ◽

Cited By ~ 1

Author(s):

Guangyu Wang ◽

Xiaotian Wu ◽

WeiQi Yan

Keyword(s):

Feature Extraction ◽

New Zealand ◽

Comparative Study ◽

State Of The Art ◽

The State ◽

Classification Algorithms ◽

Security Issue ◽

The Public ◽

Us Dollar ◽

Chinese Yuan

The security of currency has attracted public attention. Despite the development of various anti-counterfeiting methods for currency notes, counterfeiters are still able to produce illegal copies and circulate them in the market without being detected. Building on a review of related work in currency security, this paper focuses on a comparative study of feature extraction and classification algorithms for currency note authentication. We extract various computational features from a dataset consisting of US dollar (USD), Chinese Yuan (CNY), and New Zealand Dollar (NZD) notes and apply classification algorithms to currency identification. Our contributions are to find and implement various algorithms from the existing literature and to choose the best approaches for use.

The State-of-the-Art Technology of Currency Identification

Digital Currency ◽

10.4018/978-1-5225-6201-6.ch014 ◽

2018 ◽

pp. 252-269

Author(s):

Guangyu Wang ◽

Xiaotian Wu ◽

WeiQi Yan

Keyword(s):

Feature Extraction ◽

New Zealand ◽

Comparative Study ◽

State Of The Art ◽

The State ◽

Classification Algorithms ◽

Security Issue ◽

The Public ◽

Us Dollar ◽

Chinese Yuan

Local Feature-Aware Siamese Matching Model for Vehicle Re-Identification

Applied Sciences ◽

10.3390/app10072474 ◽

2020 ◽

Vol 10 (7) ◽

pp. 2474

Author(s):

Honglie Wang ◽

Shouqian Sun ◽

Lunan Zhou ◽

Lilin Guo ◽

Xin Min ◽

...

Keyword(s):

Feature Extraction ◽

Large Scale ◽

Feature Matching ◽

State Of The Art ◽

Intelligent Transportation ◽

The State ◽

Local Feature ◽

Public Security ◽

Matching Model ◽

Image Deformation

Vehicle re-identification is attracting increasing attention in intelligent transportation and is widely used in public security. Compared with person re-identification, vehicle re-identification is more challenging because vehicles with different IDs are produced on a unified pipeline and can only be distinguished by subtle differences in features such as lights, ornaments, and decorations. In this paper, we propose a local feature-aware Siamese matching model for vehicle re-identification. The model focuses on the informative parts of an image, which are the parts most likely to differ among vehicles with different IDs. In addition, we use Siamese feature matching to better supervise our attention. Furthermore, a perspective transformer network, which can eliminate image deformation, is designed for feature extraction. We conducted extensive experiments on three large-scale vehicle re-ID datasets, namely VeRi-776, VehicleID, and PKU-VD, and the results show that our method is superior to state-of-the-art methods.
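The abstract does not state which matching loss supervises the Siamese branch. As a generic illustration only, the sketch below implements the classic contrastive loss for a pair of embeddings: same-ID pairs are pulled together, and different-ID pairs are pushed at least `margin` apart. The embeddings and the margin value are assumptions.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def contrastive_loss(emb_a, emb_b, same_id, margin=1.0):
    """Classic contrastive loss for one Siamese pair.

    Same-ID pairs pay their squared distance; different-ID pairs pay only
    if they are closer than `margin`.
    """
    d = euclidean(emb_a, emb_b)
    if same_id:
        return d ** 2
    return max(0.0, margin - d) ** 2
```

With subtle inter-vehicle differences, the margin term is what forces near-identical vehicles with different IDs apart, so it is typically applied to features from the discriminative local parts rather than to the whole image.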

Relational Prototypical Network for Weakly Supervised Temporal Action Localization

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i07.6760 ◽

2020 ◽

Vol 34 (07) ◽

pp. 11053-11060

Author(s):

Linjiang Huang ◽

Yan Huang ◽

Wanli Ouyang ◽

Liang Wang

Keyword(s):

State Of The Art ◽

Previous Method ◽

The State ◽

Localization Method ◽

Action Localization ◽

Art Methods ◽

Prototype Learning ◽

Weakly Supervised ◽

Temporal Action

In this paper, we propose a weakly supervised temporal action localization method for untrimmed videos based on prototypical networks. We observe two challenges posed by weak supervision, namely action-background separation and action relation construction. Unlike previous methods, we propose to achieve action-background separation using only the original videos. To achieve this, a clustering loss is adopted to separate actions from backgrounds and learn intra-compact features, which helps in detecting complete action instances. In addition, a similarity weighting module is devised to further separate actions from backgrounds. To effectively identify actions, we propose to construct relations among actions for prototype learning. A GCN-based prototype embedding module is introduced to generate relational prototypes. Experiments on the THUMOS14 and ActivityNet1.2 datasets show that our method outperforms state-of-the-art methods.
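The prototype idea underlying the method can be sketched as follows: each class prototype is the mean embedding of its members, a snippet is assigned to its nearest prototype, and a compactness (clustering-style) term measures how tightly features gather around their own prototype. This is a generic prototypical-network sketch, not the paper's model: the snippet-level labels used here are illustrative (the weakly supervised setting has only video-level labels), and the GCN-based relational embedding and similarity weighting module are not shown.

```python
def prototypes(features, labels):
    """Mean embedding per class label."""
    sums, counts = {}, {}
    for f, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(f))
        for j, x in enumerate(f):
            acc[j] += x
        counts[y] = counts.get(y, 0) + 1
    return {y: [x / counts[y] for x in s] for y, s in sums.items()}

def sq_dist(u, v):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def classify(feature, protos):
    """Assign a feature to the label of its nearest prototype."""
    return min(protos, key=lambda y: sq_dist(feature, protos[y]))

def compactness_loss(features, labels, protos):
    """Mean squared distance of each feature to its own class prototype."""
    return sum(sq_dist(f, protos[y]) for f, y in zip(features, labels)) / len(features)

feats = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]]
labels = ["background", "background", "action", "action"]
protos = prototypes(feats, labels)
```

Treating background as its own prototype is what lets a clustering term separate actions from backgrounds: minimizing the compactness loss pulls each snippet toward its prototype and away from the other cluster.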

Highly efficient thermophones based on freestanding single-walled carbon nanotube films

Nanoscale Horizons ◽

10.1039/c9nh00164f ◽

2019 ◽

Vol 4 (5) ◽

pp. 1158-1163 ◽

Cited By ~ 8

Author(s):

Stepan A. Romanov ◽

Ali E. Aliev ◽

Boris V. Fine ◽

Anton S. Anisimov ◽

Albert G. Nasibulin

Keyword(s):

Carbon Nanotubes ◽

Carbon Nanotube ◽

State Of The Art ◽

Single Walled Carbon Nanotubes ◽

The State ◽

Single Walled Carbon Nanotube ◽

Single Walled Carbon ◽

Highly Efficient ◽

Art Performance ◽

Walled Carbon Nanotubes

We present the state-of-the-art performance of air-coupled thermophones made of thin, freestanding films of randomly oriented single-walled carbon nanotubes (SWCNTs).

SOGNet: Scene Overlap Graph Network for Panoptic Segmentation

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i07.6955 ◽

2020 ◽

Vol 34 (07) ◽

pp. 12637-12644 ◽

Cited By ~ 1

Author(s):

Yibo Yang ◽

Hongyang Li ◽

Xia Li ◽

Qijie Zhao ◽

Jianlong Wu ◽

...

Keyword(s):

State Of The Art ◽

Ground Truth ◽

The State ◽

Graph Representation ◽

Weak Supervision ◽

Scene Graph ◽

Art Performance ◽

Segmentation Task ◽

Instance Segmentation ◽

Relation Matrix

The panoptic segmentation task requires a unified result from semantic and instance segmentation outputs that may contain overlaps. However, current studies widely ignore modeling overlaps. In this study, we aim to model overlap relations among instances and resolve them for panoptic segmentation. Inspired by scene graph representation, we formulate the overlapping problem as a simplified case, named the scene overlap graph. We leverage each object's category, geometry, and appearance features to perform relational embedding, and output a relation matrix that encodes overlap relations. To overcome the lack of supervision, we introduce a differentiable module that resolves the overlap between any pair of instances. The mask logits after overlap removal are fed into per-pixel instance ID classification, which leverages the panoptic supervision to assist in modeling overlap relations. In addition, we generate an approximate ground truth of overlap relations as weak supervision, to quantify the accuracy of the overlap relations predicted by our method. Experiments on COCO and Cityscapes demonstrate that our method accurately predicts overlap relations and outperforms state-of-the-art methods for panoptic segmentation. Our method also won the Innovation Award in the COCO 2019 challenge.
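One way to picture a differentiable overlap-resolution step is sketched below. It is an assumption, not SOGNet's published formulation: given per-instance mask probabilities and a relation matrix whose entry `occludes[i][j]` scores how likely instance i lies on top of instance j, each instance's mask is softly suppressed where an occluding instance is present, and each pixel then takes the ID of the strongest remaining instance.

```python
def resolve_overlaps(mask_probs, occludes):
    """Softly suppress occluded instances, then assign each pixel an instance ID.

    mask_probs[i][p]: probability that pixel p belongs to instance i.
    occludes[i][j]:   score in [0, 1] that instance i lies on top of instance j.
    Instance j keeps m_j(p) * prod_{i != j} (1 - occludes[i][j] * m_i(p)), which
    is differentiable in both the masks and the relation matrix.
    """
    n = len(mask_probs)
    num_pixels = len(mask_probs[0])
    resolved = []
    for j in range(n):
        row = []
        for p in range(num_pixels):
            keep = mask_probs[j][p]
            for i in range(n):
                if i != j:
                    keep *= 1.0 - occludes[i][j] * mask_probs[i][p]
            row.append(keep)
        resolved.append(row)
    # Per-pixel instance ID: index of the strongest resolved mask (argmax).
    ids = [max(range(n), key=lambda i: resolved[i][p]) for p in range(num_pixels)]
    return resolved, ids

masks = [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]]  # two instances overlapping at pixel 1
_, ids_0_on_top = resolve_overlaps(masks, [[0.0, 1.0], [0.0, 0.0]])
_, ids_1_on_top = resolve_overlaps(masks, [[0.0, 0.0], [1.0, 0.0]])
```

Flipping the single relation entry changes which instance owns the shared pixel, which is how per-pixel ID supervision can send a training signal back into the relation matrix. The final argmax itself is not differentiable, so in training the resolved masks would instead feed a per-pixel classification loss.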
