Aerial Video Multi-target Detection with Memory Module

Author(s):  
Haihong Chi ◽  
Xiangrui Gao


2020 ◽
Vol 34 (07) ◽  
pp. 12984-12992 ◽  
Author(s):  
Wentian Zhao ◽  
Xinxiao Wu ◽  
Xiaoxun Zhang

Generating stylized captions for images is a challenging task since it requires not only describing the content of the image accurately but also expressing the desired linguistic style appropriately. In this paper, we propose MemCap, a novel stylized image captioning method that explicitly encodes knowledge about linguistic styles with a memory mechanism. Rather than relying heavily on a language model to capture style factors, as existing methods do, our method memorizes stylized elements learned from the training corpus. In particular, we design a memory module comprising a set of embedding vectors that encode style-related phrases in the training corpus. To acquire the style-related phrases, we develop a sentence decomposition algorithm that splits a stylized sentence into a style-related part that reflects the linguistic style and a content-related part that contains the visual content. When generating captions, MemCap first extracts content-relevant style knowledge from the memory module via an attention mechanism and then incorporates the extracted knowledge into a language model. Extensive experiments on two stylized image captioning datasets (SentiCap and FlickrStyle10K) demonstrate the effectiveness of our method.
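
To make the memory mechanism concrete, here is a minimal PyTorch sketch of an attention-based style-memory readout. The module name, dimensions, and the way the retrieved style vector is fused with the caption decoder are illustrative assumptions, not the authors' exact MemCap implementation.

```python
# Minimal sketch of an attention-based style memory, loosely following the
# description above. Sizes and names are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class StyleMemory(nn.Module):
    def __init__(self, num_slots: int = 128, dim: int = 512):
        super().__init__()
        # Each row is a learnable slot encoding one style-related phrase.
        self.memory = nn.Parameter(torch.randn(num_slots, dim) * 0.02)

    def forward(self, content_query: torch.Tensor) -> torch.Tensor:
        # content_query: (batch, dim) summary of the visual content.
        scores = content_query @ self.memory.t() / self.memory.size(1) ** 0.5
        attn = F.softmax(scores, dim=-1)
        # Weighted sum of slots = content-relevant style knowledge.
        return attn @ self.memory  # (batch, dim)

if __name__ == "__main__":
    mem = StyleMemory()
    query = torch.randn(4, 512)      # e.g. pooled image features
    style_vec = mem(query)           # passed to the caption decoder alongside visual features
    print(style_vec.shape)           # torch.Size([4, 512])
```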


Author(s):  
Zhedong Zheng ◽  
Yi Yang

This work focuses on the unsupervised scene adaptation problem of learning from both labeled source data and unlabeled target data. Existing approaches focus on minimizing the inter-domain gap between the source and target domains. However, the intra-domain knowledge and inherent uncertainty learned by the network are under-explored. In this paper, we propose an orthogonal method, called memory regularization in vivo, to exploit the intra-domain knowledge and regularize model training. Specifically, we treat the segmentation model itself as the memory module and minimize the discrepancy between its two classifiers, i.e., the primary classifier and the auxiliary classifier, to reduce prediction inconsistency. Without introducing extra parameters, the proposed method is complementary to most existing domain adaptation methods and can generally improve their performance. Albeit simple, memory regularization proves effective on two synthetic-to-real benchmarks, GTA5 → Cityscapes and SYNTHIA → Cityscapes, yielding +11.1% and +11.3% mIoU improvement over the baseline model, respectively. In addition, a similar +12.0% mIoU improvement is observed on the cross-city benchmark Cityscapes → Oxford RobotCar.
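
As an illustration of the regularization described above, the following PyTorch sketch penalizes disagreement between a primary and an auxiliary segmentation classifier on the same image. The symmetric-KL form, the function name, and the class count are assumptions for illustration; the paper's exact loss may differ.

```python
# Consistency term between the two classifiers' per-pixel predictions.
# A simplified sketch, not the authors' exact formulation.
import torch
import torch.nn.functional as F

def memory_regularization(logits_primary: torch.Tensor,
                          logits_auxiliary: torch.Tensor) -> torch.Tensor:
    """Symmetric KL divergence; logits_*: (batch, num_classes, H, W)."""
    log_p = F.log_softmax(logits_primary, dim=1)
    log_q = F.log_softmax(logits_auxiliary, dim=1)
    kl_pq = F.kl_div(log_q, log_p.exp(), reduction="batchmean")  # KL(p || q)
    kl_qp = F.kl_div(log_p, log_q.exp(), reduction="batchmean")  # KL(q || p)
    return 0.5 * (kl_pq + kl_qp)

if __name__ == "__main__":
    prim = torch.randn(2, 19, 64, 128)   # e.g. 19 Cityscapes classes
    aux = torch.randn(2, 19, 64, 128)
    loss = memory_regularization(prim, aux)  # added to the usual adaptation losses
    print(loss.item())
```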


2018 ◽  
Vol 11 (1) ◽  
pp. 14 ◽  
Author(s):  
Jing Li ◽  
Yanran Dai ◽  
Congcong Li ◽  
Junqi Shu ◽  
Dongdong Li ◽  
...  

Moving target detection plays a primary and pivotal role in avionics visual analysis, which aims to completely and accurately detect moving objects against complex backgrounds. However, because targets in aerial video are relatively small, many deep networks that succeed at normal-size object detection suffer from high rates of false alarms and missed detections. To address this problem, we propose a novel visual detail augmented mapping approach for small aerial target detection. Concretely, we first present a multi-cue foreground segmentation algorithm that combines motion and grayscale information to extract potential regions. Then, based on the visual detail augmented mapping approach, the regions that might contain moving targets are magnified at multiple resolutions to recover detailed target information and rearranged into a new foreground space for visual enhancement. The original small targets are thus mapped to a more efficient foreground augmented map that favors accurate detection. Finally, driven by the success of deep detection networks, small moving targets can be detected well from aerial video. Extensive experiments demonstrate that the proposed method succeeds at small aerial target detection without changing the structure of the deep network. In addition, compared with state-of-the-art object detection algorithms, it performs favorably with high efficiency and robustness.
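
The following OpenCV/NumPy sketch illustrates the general idea of the pipeline above: a motion-plus-grayscale cue proposes candidate regions, each candidate is magnified, and the crops are packed into a new foreground canvas that a standard detector can consume. The thresholds, cue combination, and packing layout are simplified assumptions, not the authors' exact method.

```python
# Simplified sketch of foreground proposal + detail-augmented mapping.
# Thresholds and layout are illustrative assumptions.
import cv2
import numpy as np

def candidate_regions(prev_gray: np.ndarray, curr_gray: np.ndarray, min_area: int = 25):
    diff = cv2.absdiff(curr_gray, prev_gray)                    # motion cue
    _, mask = cv2.threshold(diff, 20, 255, cv2.THRESH_BINARY)   # grayscale cue
    mask = cv2.dilate(mask, np.ones((3, 3), np.uint8), iterations=2)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) >= min_area]

def augmented_map(frame: np.ndarray, boxes, tile: int = 128):
    """Magnify each candidate crop of the color frame and pack the crops on a grid."""
    crops = [cv2.resize(frame[y:y + h, x:x + w], (tile, tile),
                        interpolation=cv2.INTER_CUBIC) for x, y, w, h in boxes]
    if not crops:
        return None
    cols = int(np.ceil(np.sqrt(len(crops))))
    rows = int(np.ceil(len(crops) / cols))
    canvas = np.zeros((rows * tile, cols * tile, 3), dtype=frame.dtype)
    for i, crop in enumerate(crops):
        r, c = divmod(i, cols)
        canvas[r * tile:(r + 1) * tile, c * tile:(c + 1) * tile] = crop
    return canvas  # feed to the deep detector, then map detections back to the frame
```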


2021 ◽  
Vol 12 ◽  
Author(s):  
Zhaoyang Ge ◽  
Huiqing Cheng ◽  
Zhuang Tong ◽  
Lihong Yang ◽  
Bing Zhou ◽  
...  

Remote ECG diagnosis has been widely used in the clinical ECG workflow. For patients with pacemakers in particular, doctors must determine from limited medical-history information whether the patient has a pacemaker and also diagnose other abnormalities. An automatic pacing-ECG detection method can help cardiologists reduce their workload and the rate of misdiagnosis. In this paper, we propose a novel autoencoder framework that detects pacing ECG from remote ECG recordings. First, we add a memory module to the traditional autoencoder. The memory module records and queries the typical features of the pacing ECG type seen during training. The framework does not feed the encoder features directly into the decoder but instead uses them to retrieve the most relevant items in the memory module. During training, the memory items are updated to represent the latent features of the input pacing ECG. During detection, the decoder reconstructs the data from the fused features retrieved from the memory module, so the reconstruction tends to be close to a pacing ECG. Meanwhile, we introduce an objective function based on metric learning. In the context of pacing ECG detection, the error of the objective function between the input and reconstructed data serves as the detection indicator: if the input does not belong to the pacing ECG class, the objective function yields a large error. Furthermore, we introduce a new pacing ECG database comprising 800 patients with a total of 8,000 heartbeats. Experimental results demonstrate that our method achieves an average F1-score of 0.918. To further validate the generalization of the proposed method, we also experiment on the widely used MIT-BIH arrhythmia database.
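
The sketch below shows a memory-augmented autoencoder in the spirit of the method above: the encoder output queries a learnable memory, the decoder reconstructs from the retrieved memory items, and the per-sample reconstruction error acts as the detection score. Layer sizes, the cosine-similarity addressing, and the MSE-based score are assumptions for illustration, not the paper's exact design.

```python
# Memory-augmented autoencoder sketch; reconstruction error as detection score.
# Architecture details are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MemoryAutoencoder(nn.Module):
    def __init__(self, sig_len: int = 360, latent: int = 64, slots: int = 50):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(sig_len, 128), nn.ReLU(), nn.Linear(128, latent))
        self.decoder = nn.Sequential(nn.Linear(latent, 128), nn.ReLU(), nn.Linear(128, sig_len))
        self.memory = nn.Parameter(torch.randn(slots, latent) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.encoder(x)                                   # (batch, latent)
        # Cosine-similarity addressing over the memory slots.
        attn = F.softmax(F.normalize(z, dim=1) @ F.normalize(self.memory, dim=1).t(), dim=-1)
        z_hat = attn @ self.memory                            # fused memory features
        return self.decoder(z_hat)

def detection_score(model: MemoryAutoencoder, beat: torch.Tensor) -> torch.Tensor:
    # Per-sample reconstruction error; thresholding this score flags non-pacing beats.
    recon = model(beat)
    return F.mse_loss(recon, beat, reduction="none").mean(dim=1)
```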


2005 ◽  
Vol 19 (3) ◽  
pp. 216-231 ◽  
Author(s):  
Albertus A. Wijers ◽  
Maarten A.S. Boksem

Abstract. We recorded event-related potentials in an illusory conjunction task, in which subjects were cued on each trial to search for a particular colored letter in a subsequently presented test array consisting of three different letters in three different colors. In a proportion of trials the target letter was present; in other trials none of the relevant features were present. In still other trials one of the features (color or letter identity) was present, or both features were present but not combined in the same display element. When relevant features were present, this resulted in an early posterior selection negativity (SN) and a frontal selection positivity (FSP). When a target was presented, the FSP was enhanced after 250 ms compared to when both relevant features were present but not combined in the same display element, suggesting that this effect reflects an extra process of attending to both features bound to the same object. There were no differences between the ERPs in feature error and conjunction error trials, contrary to the idea that these two types of errors arise from different (perceptual and attentional) mechanisms. The P300 in conjunction error trials was much reduced relative to the P300 in correct target detection trials. A similar, error-related negativity-like component was visible in the response-locked averages in correct target detection trials, feature error trials, and conjunction error trials. Dipole modeling of this component yielded a source in a deep medial-frontal location. These results suggest that this type of task induces a high level of response conflict, in which decision-related processes may play a major role.

