scholarly journals Weakly-Supervised Video Re-Localization with Multiscale Attention Model

2020 ◽  
Vol 34 (07) ◽  
pp. 11077-11084
Author(s):  
Yung-Han Huang ◽  
Kuang-Jui Hsu ◽  
Shyh-Kang Jeng ◽  
Yen-Yu Lin

Video re-localization aims to localize a sub-sequence, called target segment, in an untrimmed reference video that is similar to a given query video. In this work, we propose an attention-based model to accomplish this task in a weakly supervised setting. Namely, we derive our CNN-based model without using the annotated locations of the target segments in reference videos. Our model contains three modules. First, it employs a pre-trained C3D network for feature extraction. Second, we design an attention mechanism to extract multiscale temporal features, which are then used to estimate the similarity between the query video and a reference video. Third, a localization layer detects where the target segment is in the reference video by determining whether each frame in the reference video is consistent with the query video. The resultant CNN model is derived based on the proposed co-attention loss which discriminatively separates the target segment from the reference video. This loss maximizes the similarity between the query video and the target segment while minimizing the similarity between the target segment and the rest of the reference video. Our model can be modified to fully supervised re-localization. Our method is evaluated on a public dataset and achieves the state-of-the-art performance under both weakly supervised and fully supervised settings.

2022 ◽  
Vol 22 (3) ◽  
pp. 1-21
Author(s):  
Prayag Tiwari ◽  
Amit Kumar Jaiswal ◽  
Sahil Garg ◽  
Ilsun You

Self-attention mechanisms have recently been embraced for a broad range of text-matching applications. Self-attention model takes only one sentence as an input with no extra information, i.e., one can utilize the final hidden state or pooling. However, text-matching problems can be interpreted either in symmetrical or asymmetrical scopes. For instance, paraphrase detection is an asymmetrical task, while textual entailment classification and question-answer matching are considered asymmetrical tasks. In this article, we leverage attractive properties of self-attention mechanism and proposes an attention-based network that incorporates three key components for inter-sequence attention: global pointwise features, preceding attentive features, and contextual features while updating the rest of the components. Our model follows evaluation on two benchmark datasets cover tasks of textual entailment and question-answer matching. The proposed efficient Self-attention-driven Network for Text Matching outperforms the state of the art on the Stanford Natural Language Inference and WikiQA datasets with much fewer parameters.


2020 ◽  
Vol 10 (15) ◽  
pp. 5385
Author(s):  
Zheng Liu ◽  
Feixiang Du ◽  
Wang Li ◽  
Xu Liu ◽  
Qiang Zou

Given a video containing a person, the video-based person re-identification (Re-ID) task aims to identify the same person from videos captured under different cameras. How to embed spatial-temporal information of a video into its feature representation is a crucial challenge. Most existing methods have failed to make full use of the relationship between frames during feature extraction. In this work, we propose a plug-and-play non-local attention module (NLAM) for frame-level feature extraction. NLAM, based on global spatial attention and channel attention, helps the network to determine the location of the person in each frame. Besides, we propose a non-local temporal pooling (NLTP) method used for temporal features’ aggregation, which can effectively capture long-range and global dependencies among the frames of the video. Our model obtained impressive results on different datasets compared to the state-of-the-art methods. In particular, it achieved the rank-1 accuracy of 86.3% on the MARS (Motion Analysis and Re-identification Set) dataset without re-ranking, which is 1.4% higher than the state-of-the-art way. On the DukeMTMC-VideoReID (Duke Multi-Target Multi-Camera Video Reidentification) dataset, our method also had an excellent performance of 95% rank-1 accuracy and 94.5% mAP (mean Average Precision).


2021 ◽  
Vol 13 (10) ◽  
pp. 1950
Author(s):  
Cuiping Shi ◽  
Xin Zhao ◽  
Liguo Wang

In recent years, with the rapid development of computer vision, increasing attention has been paid to remote sensing image scene classification. To improve the classification performance, many studies have increased the depth of convolutional neural networks (CNNs) and expanded the width of the network to extract more deep features, thereby increasing the complexity of the model. To solve this problem, in this paper, we propose a lightweight convolutional neural network based on attention-oriented multi-branch feature fusion (AMB-CNN) for remote sensing image scene classification. Firstly, we propose two convolution combination modules for feature extraction, through which the deep features of images can be fully extracted with multi convolution cooperation. Then, the weights of the feature are calculated, and the extracted deep features are sent to the attention mechanism for further feature extraction. Next, all of the extracted features are fused by multiple branches. Finally, depth separable convolution and asymmetric convolution are implemented to greatly reduce the number of parameters. The experimental results show that, compared with some state-of-the-art methods, the proposed method still has a great advantage in classification accuracy with very few parameters.


2020 ◽  
Author(s):  
Fei Qi ◽  
Zhaohui Xia ◽  
Gaoyang Tang ◽  
Hang Yang ◽  
Yu Song ◽  
...  

As an emerging field, Automated Machine Learning (AutoML) aims to reduce or eliminate manual operations that require expertise in machine learning. In this paper, a graph-based architecture is employed to represent flexible combinations of ML models, which provides a large searching space compared to tree-based and stacking-based architectures. Based on this, an evolutionary algorithm is proposed to search for the best architecture, where the mutation and heredity operators are the key for architecture evolution. With Bayesian hyper-parameter optimization, the proposed approach can automate the workflow of machine learning. On the PMLB dataset, the proposed approach shows the state-of-the-art performance compared with TPOT, Autostacker, and auto-sklearn. Some of the optimized models are with complex structures which are difficult to obtain in manual design.


2017 ◽  
Vol 9 (3) ◽  
pp. 58-72 ◽  
Author(s):  
Guangyu Wang ◽  
Xiaotian Wu ◽  
WeiQi Yan

The security issue of currency has attracted awareness from the public. De-spite the development of applying various anti-counterfeit methods on currency notes, cheaters are able to produce illegal copies and circulate them in market without being detected. By reviewing related work in currency security, the focus of this paper is on conducting a comparative study of feature extraction and classification algorithms of currency notes authentication. We extract various computational features from the dataset consisting of US dollar (USD), Chinese Yuan (CNY) and New Zealand Dollar (NZD) and apply the classification algorithms to currency identification. Our contributions are to find and implement various algorithms from the existing literatures and choose the best approaches for use.


2018 ◽  
pp. 252-269
Author(s):  
Guangyu Wang ◽  
Xiaotian Wu ◽  
WeiQi Yan

The security issue of currency has attracted awareness from the public. De-spite the development of applying various anti-counterfeit methods on currency notes, cheaters are able to produce illegal copies and circulate them in market without being detected. By reviewing related work in currency security, the focus of this paper is on conducting a comparative study of feature extraction and classification algorithms of currency notes authentication. We extract various computational features from the dataset consisting of US dollar (USD), Chinese Yuan (CNY) and New Zealand Dollar (NZD) and apply the classification algorithms to currency identification. Our contributions are to find and implement various algorithms from the existing literatures and choose the best approaches for use.


2020 ◽  
Vol 10 (7) ◽  
pp. 2474
Author(s):  
Honglie Wang ◽  
Shouqian Sun ◽  
Lunan Zhou ◽  
Lilin Guo ◽  
Xin Min ◽  
...  

Vehicle re-identification is attracting an increasing amount of attention in intelligent transportation and is widely used in public security. In comparison to person re-identification, vehicle re-identification is more challenging because vehicles with different IDs are generated by a unified pipeline and cannot only be distinguished based on the subtle differences in their features such as lights, ornaments, and decorations. In this paper, we propose a local feature-aware Siamese matching model for vehicle re-identification. A local feature-aware Siamese matching model focuses on the informative parts in an image and these are the parts most likely to differ among vehicles with different IDs. In addition, we utilize Siamese feature matching to better supervise our attention. Furthermore, a perspective transformer network, which can eliminate image deformation, has been designed for feature extraction. We have conducted extensive experiments on three large-scale vehicle re-ID datasets, i.e., VeRi-776, VehicleID, and PKU-VD, and the results show that our method is superior to the state-of-the-art methods.


2020 ◽  
Vol 34 (07) ◽  
pp. 11053-11060
Author(s):  
Linjiang Huang ◽  
Yan Huang ◽  
Wanli Ouyang ◽  
Liang Wang

In this paper, we propose a weakly supervised temporal action localization method on untrimmed videos based on prototypical networks. We observe two challenges posed by weakly supervision, namely action-background separation and action relation construction. Unlike the previous method, we propose to achieve action-background separation only by the original videos. To achieve this, a clustering loss is adopted to separate actions from backgrounds and learn intra-compact features, which helps in detecting complete action instances. Besides, a similarity weighting module is devised to further separate actions from backgrounds. To effectively identify actions, we propose to construct relations among actions for prototype learning. A GCN-based prototype embedding module is introduced to generate relational prototypes. Experiments on THUMOS14 and ActivityNet1.2 datasets show that our method outperforms the state-of-the-art methods.


2019 ◽  
Vol 4 (5) ◽  
pp. 1158-1163 ◽  
Author(s):  
Stepan A. Romanov ◽  
Ali E. Aliev ◽  
Boris V. Fine ◽  
Anton S. Anisimov ◽  
Albert G. Nasibulin

We present the state-of-the-art performance of air-coupled thermophones made of thin, freestanding films of randomly oriented single-walled carbon nanotubes (SWCNTs).


2020 ◽  
Vol 34 (07) ◽  
pp. 12637-12644 ◽  
Author(s):  
Yibo Yang ◽  
Hongyang Li ◽  
Xia Li ◽  
Qijie Zhao ◽  
Jianlong Wu ◽  
...  

The panoptic segmentation task requires a unified result from semantic and instance segmentation outputs that may contain overlaps. However, current studies widely ignore modeling overlaps. In this study, we aim to model overlap relations among instances and resolve them for panoptic segmentation. Inspired by scene graph representation, we formulate the overlapping problem as a simplified case, named scene overlap graph. We leverage each object's category, geometry and appearance features to perform relational embedding, and output a relation matrix that encodes overlap relations. In order to overcome the lack of supervision, we introduce a differentiable module to resolve the overlap between any pair of instances. The mask logits after removing overlaps are fed into per-pixel instance id classification, which leverages the panoptic supervision to assist in the modeling of overlap relations. Besides, we generate an approximate ground truth of overlap relations as the weak supervision, to quantify the accuracy of overlap relations predicted by our method. Experiments on COCO and Cityscapes demonstrate that our method is able to accurately predict overlap relations, and outperform the state-of-the-art performance for panoptic segmentation. Our method also won the Innovation Award in COCO 2019 challenge.


Sign in / Sign up

Export Citation Format

Share Document