Distractor-Aware Tracking with Multi-Task and Dynamic Feature Learning

Author(s):  
Weichun Liu ◽  
Xiaoan Tang ◽  
Chenglin Zhao

Recently, deep trackers based on the Siamese network have enjoyed increasing popularity in the tracking community. Generally, these trackers learn a high-level semantic embedding space for feature representation but lose low-level fine-grained details. Meanwhile, the learned high-level semantic features are not updated during online tracking, which results in tracking drift in the presence of target appearance variation and similar distractors. In this paper, we present a novel end-to-end trainable Convolutional Neural Network (CNN) based on the Siamese network for distractor-aware tracking. It enhances target appearance representation in both the offline training stage and the online tracking stage. In the offline training stage, the network learns both the low-level fine-grained details and the high-level coarse-grained semantics simultaneously in a multi-task learning framework. The low-level features, which have better resolution, are complementary to the semantic features and are able to distinguish the foreground target from background distractors. In the online stage, the learned low-level features are fed into a correlation filter layer and updated in an interpolated manner to encode target appearance variation adaptively. The learned high-level features are fed into a cross-correlation layer without online update. Therefore, the proposed tracker benefits from both the adaptability of the fine-grained correlation filter and the generalization capability of the semantic embedding. Extensive experiments are conducted on the public OTB100 and UAV123 benchmark datasets. Our tracker achieves state-of-the-art performance while running at a real-time frame rate.
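
The two online mechanisms named in this abstract, an interpolated update and a cross-correlation against a fixed template, can be illustrated with a one-dimensional toy. This is only a sketch under stated assumptions: the function names, the `rate` parameter and the sample values are hypothetical, and the code is not the authors' network or their correlation filter layer.

```python
def interpolate_update(model, observation, rate=0.01):
    """Blend the stored template with a new observation by linear interpolation.

    `rate` is a hypothetical learning rate; the abstract does not give a value.
    """
    if len(model) != len(observation):
        raise ValueError("model and observation must have the same length")
    return [(1.0 - rate) * m + rate * o for m, o in zip(model, observation)]


def cross_correlate(search, template):
    """Return the dot product of the template with every valid window of the search signal."""
    window = len(template)
    return [
        sum(search[i + j] * template[j] for j in range(window))
        for i in range(len(search) - window + 1)
    ]


# Adaptive branch: the template drifts toward the newest observation.
template = interpolate_update([1.0, 0.0], [0.0, 1.0], rate=0.25)

# Fixed branch: the template is never updated; the peak of the response marks the best match.
response = cross_correlate([0, 1, 2, 1, 0], [1, 2, 1])
best_offset = response.index(max(response))
```

A small `rate` keeps the template stable against distractors, while a larger one follows appearance changes more quickly; which value suits a given tracker is an empirical question.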

Sensors ◽  
2020 ◽  
Vol 20 (18) ◽  
pp. 5279
Author(s):  
Yang Li ◽  
Huahu Xu ◽  
Junsheng Xiao

Language-based person search retrieves images of a target person using a natural language description and is a challenging fine-grained cross-modal retrieval task. A novel hybrid attention network is proposed for the task. The network has three components. First, a cubic attention mechanism for the person image combines cross-layer spatial attention and channel attention. It can fully exploit both important mid-level details and key high-level semantics to obtain a more discriminative fine-grained feature representation of a person image. Second, a text attention network for the language description is based on a bidirectional LSTM (BiLSTM) and a self-attention mechanism. It can better learn the bidirectional semantic dependency and capture the key words of sentences, so as to extract the context information and key semantic features of the language description more effectively and accurately. Third, a cross-modal attention mechanism and a joint loss function for cross-modal learning pay more attention to the relevant parts shared by text and image features. They better exploit both the cross-modal and intra-modal correlation and better address the problem of cross-modal heterogeneity. Extensive experiments have been conducted on the CUHK-PEDES dataset. Our approach obtains higher performance than state-of-the-art approaches, demonstrating the advantage of the proposed approach.


2019 ◽  
Author(s):  
Michael B. Bone ◽  
Fahad Ahmad ◽  
Bradley R. Buchsbaum

When recalling an experience of the past, many of the component features of the original episode may be, to a greater or lesser extent, reconstructed in the mind’s eye. There is strong evidence that the pattern of neural activity that occurred during an initial perceptual experience is recreated during episodic recall (neural reactivation), and that the degree of reactivation is correlated with the subjective vividness of the memory. However, while we know that reactivation occurs during episodic recall, we have lacked a way of precisely characterizing the contents of a reactivated memory in terms of its featural constituents. Here we present a novel approach, feature-specific informational connectivity (FSIC), that leverages hierarchical representations of image stimuli derived from a deep convolutional neural network to decode neural reactivation in fMRI data collected while participants performed an episodic recall task. We show that neural reactivation associated with low-level visual features (e.g. edges), high-level visual features (e.g. facial features), and semantic features (e.g. “terrier”) occurs throughout the dorsal and ventral visual streams and extends into the frontal cortex. Moreover, we show that reactivation of both low- and high-level visual features correlates with the vividness of the memory, whereas only reactivation of low-level features correlates with recognition accuracy when the lure and target images are semantically similar. In addition to demonstrating the utility of FSIC for mapping feature-specific reactivation, these findings resolve the relative contributions of low- and high-level features to the vividness of visual memories, clarify the role of the frontal cortex during episodic recall, and challenge a strict interpretation of the posterior-to-anterior visual hierarchy.


1994 ◽  
Vol 6 (3) ◽  
pp. 365-374 ◽  
Author(s):  
Philip T. Leat ◽  
Jane H. Scarrow

From at least the Early Jurassic to the Miocene, eastward subduction of oceanic crust took place beneath the Antarctic Peninsula. Magmatism associated with the subduction generated a N-S linear belt of volcanic rocks known as the Antarctic Peninsula Volcanic Group (APVG), which erosion has now exposed at about the plutonic/volcanic interface. Large central volcanoes from the APVG are described here for the first time. The structures are situated in north-west Palmer Land within the main Mesozoic magmatic arc. One centre, Zonda Towers, is recognized by the presence of a 160 m thick silicic ignimbrite containing accidental lava blocks up to 25 m in diameter. This megabreccia is interpreted as a caldera-fill deposit which formed by landsliding of steep caldera walls during ignimbrite eruption and deposition. A larger centre, Mount Edgell-Wright Spires, is dominated by coarse-grained debris flow deposits and silicic ignimbrites which, with minor lavas and fine-grained tuffs, form a volcanic succession some 1.5 km thick. Basic, intermediate and silicic sills c. 50 m thick intrude the succession. A central gabbro-granite intrusion is interpreted to be a high-level magma chamber of the Mount Edgell volcano.


2021 ◽  
Author(s):  
Loris Naspi ◽  
Paul Hoffman ◽  
Barry Devereux ◽  
Alexa Morcom

When encoding new episodic memories, visual and semantic processing are proposed to make distinct contributions to accurate memory and memory distortions. Here, we used functional magnetic resonance imaging (fMRI) and representational similarity analysis to uncover the representations that predict true and false recognition of unfamiliar objects. Two semantic models captured coarse-grained taxonomic categories and specific object features, respectively, while two perceptual models embodied low-level visual properties. Twenty-eight female and male participants encoded images of objects during fMRI scanning, and later had to discriminate studied objects from similar lures and novel objects in a recognition memory test. Both perceptual and semantic models predicted true memory. When studied objects were later identified correctly, neural patterns corresponded to low-level visual representations of these object images in the early visual cortex, lingual, and fusiform gyri. In a similar fashion, alignment of neural patterns with fine-grained semantic feature representations in the fusiform gyrus also predicted true recognition. However, emphasis on coarser taxonomic representations predicted forgetting more anteriorly in ventral anterior temporal lobe, left perirhinal cortex, and left inferior frontal gyrus. In contrast, false recognition of similar lure objects was associated with weaker visual analysis posteriorly in early visual and left occipitotemporal cortex. The results implicate multiple perceptual and semantic representations in successful memory encoding and suggest that fine-grained semantic as well as visual analysis contributes to accurate later recognition, while processing visual image detail is critical for avoiding false recognition errors.


2012 ◽  
Vol 2012 ◽  
pp. 1-15 ◽  
Author(s):  
Ilia Lebedev ◽  
Christopher Fletcher ◽  
Shaoyi Cheng ◽  
James Martin ◽  
Austin Doupnik ◽  
...  

We present a highly productive approach to hardware design based on a many-core microarchitectural template used to implement compute-bound applications expressed in a high-level data-parallel language such as OpenCL. The template is customized on a per-application basis via a range of high-level parameters such as the interconnect topology or processing element architecture. The key benefits of this approach are that it (i) allows programmers to express parallelism through an API defined in a high-level programming language, (ii) supports coarse-grained multithreading and fine-grained threading while permitting bit-level resource control, and (iii) reduces the effort required to repurpose the system for different algorithms or different applications. We compare template-driven design to both full-custom and programmable approaches by studying implementations of a compute-bound data-parallel Bayesian graph inference algorithm across several candidate platforms. Specifically, we examine a range of template-based implementations on both FPGA and ASIC platforms and compare each against full custom designs. Throughout this study, we use a general-purpose graphics processing unit (GPGPU) implementation as a performance and area baseline. We show that our approach, similar in productivity to programmable approaches such as GPGPU applications, yields implementations with performance approaching that of full-custom designs on both FPGA and ASIC platforms.


Symmetry ◽  
2021 ◽  
Vol 13 (3) ◽  
pp. 447
Author(s):  
Aditi Gupta ◽  
Rinkaj Goyal

Software clones are code fragments with similar or nearly similar functionality or structures. These clones are introduced in a project either accidentally or deliberately during the software development or maintenance process. The presence of clones poses a significant threat to the maintenance of software systems and is at the top of the list of code smell types. Clones can be simple (fine-grained) or high-level (coarse-grained), depending on the chosen granularity of code for the clone detection. Simple clones are generally viewed at the line/statement level, whereas high-level clones have the granularity of a block, method, class, or file. High-level clones are said to be composed of multiple simple clones. This study aims to detect high-level conceptual code clones (with Java methods as the granularity) in Java-based projects, and the approach is extendable to projects developed in other languages as well. Conceptual code clones are the ones implementing a similar higher-level abstraction such as an Abstract Data Type (ADT) list. Based on the assumption that “similar documentation implies similar methods”, the proposed mechanism uses the “documentation” associated with methods to identify method-level concept clones. As the complete documentation does not contribute to the method’s semantics, we extracted only the description part of the method’s documentation, which led to two benefits: increased efficiency and reduced text corpus size. Further, we used Latent Semantic Indexing (LSI) with different combinations of weight and similarity measures to identify similar descriptions in the text corpus. To show the efficacy of the proposed approach, we validated it using three Java open source systems of sufficient length. The findings suggest that the proposed mechanism can detect methods implementing similar high-level concepts with improved recall values.
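
The idea that “similar documentation implies similar methods” can be sketched by comparing two method descriptions as term-count vectors. Note the substitution: this is plain bag-of-words cosine similarity, not LSI (there is no singular value decomposition and no term weighting scheme), and the function names and example descriptions are hypothetical.

```python
import math
from collections import Counter


def term_vector(description):
    """Count lower-cased, whitespace-separated terms in a method description."""
    return Counter(description.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors; 0.0 if either is empty."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical descriptions taken from two methods' documentation comments.
same = cosine_similarity(
    term_vector("Appends an element to the list"),
    term_vector("appends an element to the list"),
)
unrelated = cosine_similarity(
    term_vector("Appends an element to the list"),
    term_vector("closes socket connection"),
)
```

Descriptions that share no terms score zero here even when they describe the same concept in different words; projecting into a lower-dimensional latent space, as LSI does, is one way to address that limitation.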


Smart Cities ◽  
2021 ◽  
Vol 4 (1) ◽  
pp. 204-216
Author(s):  
Xinyue Ye ◽  
Lian Duan ◽  
Qiong Peng

Spatiotemporal prediction of crime is crucial for public safety and smart city operation. As crime incidents are distributed sparsely across space and time, existing deep-learning methods constrained by a coarse spatial scale offer only limited value in predicting crime density. This paper proposes the use of deep inception-residual networks (DIRNet) to conduct fine-grained, theft-related crime prediction based on non-emergency service request data (311 events). Specifically, it outlines the use of inception units comprising asymmetrical convolution layers to extract low-level spatiotemporal dependencies hidden in crime events and complaint records in the 311 dataset. Afterward, this paper details how residual units can be applied to capture high-level spatiotemporal features from low-level spatiotemporal dependencies for the final prediction. The effectiveness of the proposed DIRNet is evaluated based on theft-related crime data and 311 data in New York City from 2010 to 2015. The results confirm that DIRNet obtains an average F1 of 71%, which is better than other prediction models.
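
For readers unfamiliar with the metric reported above, F1 is the harmonic mean of precision and recall. The sketch below computes it from confusion counts; the function name and the counts are hypothetical, and the abstract does not say how its average F1 was aggregated.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall; 0.0 when either is undefined or both are zero."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    if predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: precision = 6/8 = 0.75, recall = 6/10 = 0.6.
score = f1_score(true_positives=6, false_positives=2, false_negatives=4)
```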


Sensors ◽  
2020 ◽  
Vol 20 (21) ◽  
pp. 6204
Author(s):  
Qinghong Liu ◽  
Yong Qin ◽  
Zhengyu Xie ◽  
Zhiwei Cao ◽  
Limin Jia

Trains operate in semi-open environments, and the surrounding environment plays an important role in the safety of train operation. The weather is one of the factors that affect the surrounding environment of railways. Under haze conditions, railway monitoring and staff vision can be blurred, threatening railway safety. This paper tackles image dehazing for railways. The contributions of this paper to railway video image dehazing are as follows: (1) This paper proposes an end-to-end residual block-based haze removal method, called RID-Net (Railway Image Dehazing Network), that consists of two subnetworks, a fine-grained network and a coarse-grained network, and can directly generate the clean image from the input hazy image. (2) A combined loss function (per-pixel loss and perceptual loss functions) is proposed to capture both low-level and high-level features so as to generate high-quality restored images. (3) We use full-reference criteria (PSNR and SSIM), object detection, running time, and sensory vision to evaluate the proposed dehazing method. Experimental results on a synthesized railway dataset, a benchmark indoor dataset, and a real-world dataset demonstrate that our method has superior performance compared to state-of-the-art methods.
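
Of the evaluation criteria listed above, PSNR is simple enough to show directly: it is 10·log10(MAX² / MSE) between a reference image and a restored one. The sketch below works on flat lists of pixel intensities; the function name, the 8-bit `max_value` default and the sample pixels are assumptions, not details from the paper.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(reference) != len(restored) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - s) ** 2 for r, s in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: no noise to measure
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical pixels: a maximally wrong restoration versus a perfect one.
worst = psnr([0, 0], [255, 255])
perfect = psnr([10, 20], [10, 20])
```

Higher values indicate a restoration closer to the reference. PSNR measures only per-pixel error, which is why it is usually reported alongside a structural measure such as SSIM.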

