Aerial Video Multi-target Detection with Memory Module

Author(s):  
Haihong Chi ◽  
Xiangrui Gao


2020 ◽
Vol 34 (07) ◽  
pp. 12984-12992 ◽  
Author(s):  
Wentian Zhao ◽  
Xinxiao Wu ◽  
Xiaoxun Zhang

Generating stylized captions for images is a challenging task since it requires not only describing the content of the image accurately but also expressing the desired linguistic style appropriately. In this paper, we propose MemCap, a novel stylized image captioning method that explicitly encodes knowledge about linguistic styles with a memory mechanism. Rather than relying heavily on a language model to capture style factors, as existing methods do, our method memorizes stylized elements learned from the training corpus. In particular, we design a memory module comprising a set of embedding vectors that encode style-related phrases in the training corpus. To acquire the style-related phrases, we develop a sentence decomposition algorithm that splits a stylized sentence into a style-related part that reflects the linguistic style and a content-related part that contains the visual content. When generating captions, MemCap first extracts content-relevant style knowledge from the memory module via an attention mechanism and then incorporates the extracted knowledge into a language model. Extensive experiments on two stylized image captioning datasets (SentiCap and FlickrStyle10K) demonstrate the effectiveness of our method.
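
To make the memory mechanism concrete, here is a minimal PyTorch sketch of an attention-based style-memory readout. The module name, dimensions, and the way the retrieved style vector is fused with the caption decoder are illustrative assumptions, not the authors' exact MemCap implementation.

```python
# Minimal sketch of an attention-based style memory, loosely following the
# description above. Sizes and names are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class StyleMemory(nn.Module):
    def __init__(self, num_slots: int = 128, dim: int = 512):
        super().__init__()
        # Each row is a learnable slot encoding one style-related phrase.
        self.memory = nn.Parameter(torch.randn(num_slots, dim) * 0.02)

    def forward(self, content_query: torch.Tensor) -> torch.Tensor:
        # content_query: (batch, dim) summary of the visual content.
        scores = content_query @ self.memory.t() / self.memory.size(1) ** 0.5
        attn = F.softmax(scores, dim=-1)
        # Weighted sum of slots = content-relevant style knowledge.
        return attn @ self.memory  # (batch, dim)

if __name__ == "__main__":
    mem = StyleMemory()
    query = torch.randn(4, 512)      # e.g. pooled image features
    style_vec = mem(query)           # passed to the caption decoder alongside visual features
    print(style_vec.shape)           # torch.Size([4, 512])
```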


Author(s):  
Zhedong Zheng ◽  
Yi Yang

This work focuses on the unsupervised scene adaptation problem of learning from both labeled source data and unlabeled target data. Existing approaches focus on minimizing the inter-domain gap between the source and target domains. However, the intra-domain knowledge and inherent uncertainty learned by the network are under-explored. In this paper, we propose an orthogonal method, called memory regularization in vivo, to exploit the intra-domain knowledge and regularize model training. Specifically, we treat the segmentation model itself as the memory module and minimize the discrepancy between its two classifiers, i.e., the primary classifier and the auxiliary classifier, to reduce prediction inconsistency. Without introducing extra parameters, the proposed method is complementary to most existing domain adaptation methods and can generally improve their performance. Albeit simple, memory regularization proves effective on two synthetic-to-real benchmarks, GTA5 → Cityscapes and SYNTHIA → Cityscapes, yielding +11.1% and +11.3% mIoU improvement over the baseline model, respectively. In addition, a similar +12.0% mIoU improvement is observed on the cross-city benchmark Cityscapes → Oxford RobotCar.
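
As an illustration of the regularization described above, the following PyTorch sketch penalizes disagreement between a primary and an auxiliary segmentation classifier on the same image. The symmetric-KL form, the function name, and the class count are assumptions for illustration; the paper's exact loss may differ.

```python
# Consistency term between the two classifiers' per-pixel predictions.
# A simplified sketch, not the authors' exact formulation.
import torch
import torch.nn.functional as F

def memory_regularization(logits_primary: torch.Tensor,
                          logits_auxiliary: torch.Tensor) -> torch.Tensor:
    """Symmetric KL divergence; logits_*: (batch, num_classes, H, W)."""
    log_p = F.log_softmax(logits_primary, dim=1)
    log_q = F.log_softmax(logits_auxiliary, dim=1)
    kl_pq = F.kl_div(log_q, log_p.exp(), reduction="batchmean")  # KL(p || q)
    kl_qp = F.kl_div(log_p, log_q.exp(), reduction="batchmean")  # KL(q || p)
    return 0.5 * (kl_pq + kl_qp)

if __name__ == "__main__":
    prim = torch.randn(2, 19, 64, 128)   # e.g. 19 Cityscapes classes
    aux = torch.randn(2, 19, 64, 128)
    loss = memory_regularization(prim, aux)  # added to the usual adaptation losses
    print(loss.item())
```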


2018 ◽  
Vol 11 (1) ◽  
pp. 14 ◽  
Author(s):  
Jing Li ◽  
Yanran Dai ◽  
Congcong Li ◽  
Junqi Shu ◽  
Dongdong Li ◽  
...  

Moving target detection plays a primary and pivotal role in avionics visual analysis, which aims to completely and accurately detect moving objects against complex backgrounds. However, because targets in aerial video are relatively small, many deep networks that succeed at normal-size object detection suffer from high rates of false alarms and missed detections. To address this problem, we propose a novel visual detail augmented mapping approach for small aerial target detection. Concretely, we first present a multi-cue foreground segmentation algorithm that combines motion and grayscale information to extract potential regions. Then, based on the visual detail augmented mapping approach, the regions that might contain moving targets are magnified at multiple resolutions to recover detailed target information and rearranged into a new foreground space for visual enhancement. The original small targets are thus mapped to a more efficient foreground augmented map that favors accurate detection. Finally, driven by the success of deep detection networks, small moving targets can be detected well from aerial video. Extensive experiments demonstrate that the proposed method succeeds at small aerial target detection without changing the structure of the deep network. In addition, compared with state-of-the-art object detection algorithms, it performs favorably with high efficiency and robustness.
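
The following OpenCV/NumPy sketch illustrates the general idea of the pipeline above: a motion-plus-grayscale cue proposes candidate regions, each candidate is magnified, and the crops are packed into a new foreground canvas that a standard detector can consume. The thresholds, cue combination, and packing layout are simplified assumptions, not the authors' exact method.

```python
# Simplified sketch of foreground proposal + detail-augmented mapping.
# Thresholds and layout are illustrative assumptions.
import cv2
import numpy as np

def candidate_regions(prev_gray: np.ndarray, curr_gray: np.ndarray, min_area: int = 25):
    diff = cv2.absdiff(curr_gray, prev_gray)                    # motion cue
    _, mask = cv2.threshold(diff, 20, 255, cv2.THRESH_BINARY)   # grayscale cue
    mask = cv2.dilate(mask, np.ones((3, 3), np.uint8), iterations=2)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) >= min_area]

def augmented_map(frame: np.ndarray, boxes, tile: int = 128):
    """Magnify each candidate crop of the color frame and pack the crops on a grid."""
    crops = [cv2.resize(frame[y:y + h, x:x + w], (tile, tile),
                        interpolation=cv2.INTER_CUBIC) for x, y, w, h in boxes]
    if not crops:
        return None
    cols = int(np.ceil(np.sqrt(len(crops))))
    rows = int(np.ceil(len(crops) / cols))
    canvas = np.zeros((rows * tile, cols * tile, 3), dtype=frame.dtype)
    for i, crop in enumerate(crops):
        r, c = divmod(i, cols)
        canvas[r * tile:(r + 1) * tile, c * tile:(c + 1) * tile] = crop
    return canvas  # feed to the deep detector, then map detections back to the frame
```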


2021 ◽  
Vol 12 ◽  
Author(s):  
Zhaoyang Ge ◽  
Huiqing Cheng ◽  
Zhuang Tong ◽  
Lihong Yang ◽  
Bing Zhou ◽  
...  

Remote ECG diagnosis has been widely used in the clinical ECG workflow. For patients with pacemakers in particular, doctors must determine from limited medical-history information whether the patient has a pacemaker and also diagnose other abnormalities. An automatic pacing-ECG detection method can help cardiologists reduce their workload and the rate of misdiagnosis. In this paper, we propose a novel autoencoder framework that detects pacing ECG from remote ECG recordings. First, we add a memory module to the traditional autoencoder. The memory module records and queries the typical features of the pacing ECG type seen during training. The framework does not feed the encoder features directly into the decoder but instead uses them to retrieve the most relevant items in the memory module. During training, the memory items are updated to represent the latent features of the input pacing ECG. During detection, the decoder reconstructs the data from the fused features retrieved from the memory module, so the reconstruction tends to be close to a pacing ECG. Meanwhile, we introduce an objective function based on metric learning. In the context of pacing ECG detection, the error of the objective function between the input and reconstructed data serves as the detection indicator: if the input does not belong to the pacing ECG class, the objective function yields a large error. Furthermore, we introduce a new pacing ECG database comprising 800 patients with a total of 8,000 heartbeats. Experimental results demonstrate that our method achieves an average F1-score of 0.918. To further validate the generalization of the proposed method, we also experiment on the widely used MIT-BIH arrhythmia database.
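
The sketch below shows a memory-augmented autoencoder in the spirit of the method above: the encoder output queries a learnable memory, the decoder reconstructs from the retrieved memory items, and the per-sample reconstruction error acts as the detection score. Layer sizes, the cosine-similarity addressing, and the MSE-based score are assumptions for illustration, not the paper's exact design.

```python
# Memory-augmented autoencoder sketch; reconstruction error as detection score.
# Architecture details are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MemoryAutoencoder(nn.Module):
    def __init__(self, sig_len: int = 360, latent: int = 64, slots: int = 50):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(sig_len, 128), nn.ReLU(), nn.Linear(128, latent))
        self.decoder = nn.Sequential(nn.Linear(latent, 128), nn.ReLU(), nn.Linear(128, sig_len))
        self.memory = nn.Parameter(torch.randn(slots, latent) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.encoder(x)                                   # (batch, latent)
        # Cosine-similarity addressing over the memory slots.
        attn = F.softmax(F.normalize(z, dim=1) @ F.normalize(self.memory, dim=1).t(), dim=-1)
        z_hat = attn @ self.memory                            # fused memory features
        return self.decoder(z_hat)

def detection_score(model: MemoryAutoencoder, beat: torch.Tensor) -> torch.Tensor:
    # Per-sample reconstruction error; thresholding this score flags non-pacing beats.
    recon = model(beat)
    return F.mse_loss(recon, beat, reduction="none").mean(dim=1)
```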


2005 ◽  
Vol 19 (3) ◽  
pp. 216-231 ◽  
Author(s):  
Albertus A. Wijers ◽  
Maarten A.S. Boksem

Abstract. We recorded event-related potentials in an illusory conjunction task, in which subjects were cued on each trial to search for a particular colored letter in a subsequently presented test array consisting of three different letters in three different colors. In a proportion of trials the target letter was present; in other trials none of the relevant features were present. In still other trials one of the features (color or letter identity) was present, or both features were present but not combined in the same display element. When relevant features were present, this resulted in an early posterior selection negativity (SN) and a frontal selection positivity (FSP). When a target was presented, the FSP was enhanced after 250 ms compared to when both relevant features were present but not combined in the same display element, suggesting that this effect reflects an extra process of attending to both features bound to the same object. There were no differences between the ERPs in feature error and conjunction error trials, contrary to the idea that these two types of errors arise from different (perceptual and attentional) mechanisms. The P300 in conjunction error trials was much reduced relative to the P300 in correct target detection trials. A similar, error-related negativity-like component was visible in the response-locked averages in correct target detection trials, feature error trials, and conjunction error trials. Dipole modeling of this component yielded a source in a deep medial-frontal location. These results suggest that this type of task induces a high level of response conflict, in which decision-related processes may play a major role.

