Multi-Task Driven Feature Models for Thermal Infrared Tracking

2020, Vol 34 (07), pp. 11604-11611
Author(s): Qiao Liu, Xin Li, Zhenyu He, Nana Fan, Di Yuan, ...

Existing deep thermal infrared (TIR) trackers usually rely on the feature models of RGB trackers for representation. However, these feature models, learned on RGB images, neither represent TIR objects effectively nor take fine-grained TIR information into account. To this end, we develop a multi-task framework that learns TIR-specific discriminative features and fine-grained correlation features for TIR tracking. First, we use an auxiliary classification network to guide the generation of TIR-specific discriminative features that distinguish TIR objects belonging to different classes. Second, we design a fine-grained aware module that captures subtler information for distinguishing TIR objects belonging to the same class. The two kinds of features complement each other, recognizing TIR objects at the inter-class and intra-class levels, respectively. Both feature models are learned within a multi-task matching framework and jointly optimized on the TIR tracking task. In addition, we build a large-scale TIR training dataset to adapt the model to the TIR domain. Extensive experiments on three benchmarks show that the proposed algorithm achieves a relative gain of 10% over the baseline and performs favorably against state-of-the-art methods. The code and the proposed TIR dataset are available at https://github.com/QiaoLiuHit/MMNet.
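
The abstract does not spell out the training objective. As a minimal sketch, assuming a SiamFC-style logistic loss on the response map for the matching task and a softmax cross-entropy for the auxiliary classification network, a joint multi-task loss could look like the following. The function names and the weight `lam` are hypothetical, and the fine-grained aware module is not modelled.

```python
import math


def logistic_matching_loss(scores, labels):
    """Mean logistic loss over a flattened response map.

    Assumption: SiamFC-style labels, +1 near the target centre and -1 elsewhere.
    """
    return sum(math.log1p(math.exp(-y * s)) for s, y in zip(scores, labels)) / len(scores)


def cross_entropy(logits, target):
    """Softmax cross-entropy for the auxiliary TIR object classifier."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[target]


def multi_task_loss(scores, labels, class_logits, class_target, lam=0.1):
    """Hypothetical joint objective: matching loss + lam * auxiliary classification loss."""
    return logistic_matching_loss(scores, labels) + lam * cross_entropy(class_logits, class_target)
```

Weighting the auxiliary term keeps the tracking (matching) objective dominant while the classifier shapes the shared features toward class-discriminative TIR cues.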

Author(s): Anil S. Baslamisli, Partha Das, Hoang-An Le, Sezer Karaoglu, Theo Gevers

In general, intrinsic image decomposition algorithms interpret shading as one unified component that includes all photometric effects. Because shading transitions are generally smoother than reflectance (albedo) changes, these methods may fail to distinguish strong photometric effects from reflectance variations. Therefore, in this paper, we propose to decompose the shading component into direct shading (illumination) and indirect shading (ambient light and shadows) subcomponents, with the aim of distinguishing strong photometric effects from reflectance variations. An end-to-end deep convolutional neural network (ShadingNet) is proposed that operates in a fine-to-coarse manner, with a specialized fusion and refinement unit that exploits the fine-grained shading model. It is designed to learn specific reflectance cues separated from specific photometric effects, in order to analyze its disentanglement capability. A large-scale dataset of scene-level synthetic images of outdoor natural environments is provided with fine-grained intrinsic image ground truths. Large-scale experiments show that our approach using fine-grained shading decompositions outperforms state-of-the-art algorithms that use unified shading on the NED, MPI Sintel, GTA V, IIW, MIT Intrinsic Images, 3DRMS and SRD datasets.
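
The abstract does not give the image formation model. A common intrinsic-image assumption is I = R · S per pixel; the sketch below further assumes, for illustration only, that shading splits additively into direct and indirect terms. It shows why a method with unified shading can confuse a cast shadow with a reflectance edge: both scenes produce the same pixel values.

```python
def compose(reflectance, direct, indirect):
    """Recompose pixels under the assumed model I = R * (S_direct + S_indirect)."""
    return [r * (d + i) for r, d, i in zip(reflectance, direct, indirect)]


# Scene A: a reflectance edge (darker paint on pixel 2) under uniform lighting.
scene_a = compose([0.8, 0.4], direct=[0.9, 0.9], indirect=[0.1, 0.1])

# Scene B: uniform paint, but a shadow removes part of the direct light on pixel 2.
scene_b = compose([0.8, 0.8], direct=[0.9, 0.4], indirect=[0.1, 0.1])
```

Modelling direct and indirect shading separately gives a network a place to attribute the drop in scene B to the shadow term rather than to reflectance.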


Author(s): Xiaoxiao Sun, Liyi Chen, Jufeng Yang

Fine-grained classification focuses on recognizing the subordinate categories within a single domain. It requires a large number of labeled images, which are expensive to annotate. Using web data is an attractive way to meet the training-data demands of convolutional neural networks (CNNs), especially when well-labeled data is scarce. However, training directly on such easily obtained images often leads to unsatisfactory performance because of factors such as noisy labels. This problem has conventionally been addressed by reducing the noise level of web data. In this paper, we take a fundamentally different view and propose an adversarial discriminative loss that encourages representation coherence between standard and web data. This loss is further encapsulated in a simple, scalable and end-to-end trainable multi-task learning framework. We evaluate the effectiveness and generalizability of the proposed approach on three public datasets using large-scale web data. Extensive experiments demonstrate that our approach performs favorably against state-of-the-art methods.
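
The exact form of the adversarial discriminative loss is not given here. A minimal GAN-style sketch, assuming a discriminator that outputs the probability that a feature comes from standard (well-labeled) data, is shown below; in a multi-task setup this term would be added to the usual classification loss. All names are illustrative.

```python
import math


def bce(p, y):
    """Binary cross-entropy for a predicted probability p and a label y in {0, 1}."""
    eps = 1e-12  # clamp to avoid log(0)
    p = min(max(p, eps), 1 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def discriminator_loss(d_standard, d_web):
    """Discriminator learns to output 1 for standard-data features and 0 for web-data features."""
    losses = [bce(p, 1) for p in d_standard] + [bce(p, 0) for p in d_web]
    return sum(losses) / len(losses)


def coherence_loss(d_web):
    """Feature extractor is rewarded when web-data features are scored as standard data.

    Assumption: a non-saturating GAN-style generator loss stands in for the paper's term.
    """
    return sum(bce(p, 1) for p in d_web) / len(d_web)
```

The two losses pull in opposite directions: the discriminator tries to tell the sources apart, while the shared feature extractor tries to make web features indistinguishable from standard ones.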


2021, Vol 13 (9), pp. 1765
Author(s): Juan M. Sánchez, César Coll, Raquel Niclòs

The combination of the state-of-the-art in the thermal infrared (TIR) domain [...]


Author(s): Chen Liu, Bo Li, Jun Zhao, Ming Su, Xu-Dong Liu

Detecting newly emerging malware variants in real time is crucial for mitigating cyber risks and proactively blocking intrusions. In this paper, we propose MG-DVD, a novel detection framework based on dynamic heterogeneous graph learning, to detect malware variants in real time. In particular, MG-DVD first models the fine-grained execution event streams of malware variants as dynamic heterogeneous graphs and investigates real-world meta-graphs between malware objects, which effectively characterize more discriminative malicious evolutionary patterns between malware and their variants. Then, MG-DVD presents two dynamic walk-based heterogeneous graph learning methods to learn more comprehensive representations of malware variants, which significantly reduce the cost of retraining on the entire graph. As a result, MG-DVD can detect malware variants in real time, and it offers better interpretability by introducing meaningful meta-graphs. Comprehensive experiments on large-scale samples show that MG-DVD outperforms state-of-the-art methods in detecting malware variants in terms of both effectiveness and efficiency.
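
The two walk-based methods are not described in the abstract. The sketch below is a simplified stand-in, assuming meta-path-guided random walks (in the spirit of metapath2vec, with meta-graphs reduced to meta-paths), and it illustrates the incremental idea: when a new execution event arrives, only walks starting from the affected nodes are regenerated. The class, the helper `refresh_walks`, and node types such as `process` and `file` are hypothetical.

```python
import random
from collections import defaultdict


class DynamicHeteroGraph:
    """Toy dynamic heterogeneous graph: nodes carry a type, edges arrive over time."""

    def __init__(self):
        self.node_type = {}
        self.adj = defaultdict(set)

    def add_event(self, src, src_type, dst, dst_type):
        """Insert one execution event (e.g. a process writing a file) as an undirected edge."""
        self.node_type[src] = src_type
        self.node_type[dst] = dst_type
        self.adj[src].add(dst)
        self.adj[dst].add(src)
        return {src, dst}  # nodes whose walks are now stale

    def metapath_walk(self, start, metapath, length, rng):
        """Random walk whose node types follow a cyclic meta-path such as ('process', 'file')."""
        walk = [start]
        for step in range(1, length):
            wanted = metapath[step % len(metapath)]
            nbrs = sorted(n for n in self.adj[walk[-1]] if self.node_type[n] == wanted)
            if not nbrs:
                break  # no neighbour of the required type: stop early
            walk.append(rng.choice(nbrs))
        return walk


def refresh_walks(graph, walks, touched, metapath, length, rng):
    """Regenerate walks only for nodes affected by new events, not for the whole graph."""
    for node in touched:
        if graph.node_type[node] == metapath[0]:
            walks[node] = graph.metapath_walk(node, metapath, length, rng)
    return walks
```

The walks would then feed a skip-gram-style embedding step; the point of the sketch is that the update cost scales with the number of touched nodes rather than the graph size.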


2008, Vol 14 (4), pp. 547-573
Author(s): Roberto Navigli

The semantic annotation of texts with senses from a computational lexicon is a complex and often subjective task. In fact, the fine granularity of the WordNet sense inventory [Fellbaum, Christiane (ed.). 1998. WordNet: An Electronic Lexical Database. MIT Press], a de facto standard within the research community, is one of the main causes of low inter-tagger agreement (between 70% and 80%) and of the disappointing performance of automated fine-grained disambiguation systems (around 65% for the state of the art in the Senseval-3 English all-words task). To improve the performance of both manual and automated sense taggers, we can either change the sense inventory (e.g. by adopting a new dictionary or clustering WordNet senses) or aim to resolve the disagreements between annotators by dealing with the fineness of sense distinctions. The former approach is not viable in the short term, as wide-coverage resources are not publicly available and no large-scale, reliable clustering of WordNet senses has been released to date. The latter approach requires the ability to distinguish between subtle or misleading sense distinctions. In this paper, we propose the use of structural semantic interconnections, a specific kind of lexical chain, for the adjudication of disagreed sense assignments to words in context. The approach exploits the structure of the lexicon to smooth possible divergences between sense annotators and to foster coherent choices. We perform a twofold experimental evaluation of the approach, applied to manual annotations from the SemCor corpus and to automatic annotations from the Senseval-3 English all-words competition. Both sets of experiments and results are entirely novel: structural adjudication improves the state-of-the-art performance in all-words disambiguation by 3.3 points (achieving a 68.5% F1-score) and attains around 80% precision and 60% recall in the adjudication of disagreements from human annotators.
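
Structural semantic interconnections rely on relation paths in the WordNet graph, which are not reproduced here. As a heavily simplified sketch of the adjudication idea, assuming that a candidate's support is just the number of direct lexicon links to the senses already chosen for context words, a disagreement can be settled by picking the best-supported candidate. The function, sense labels and relations are hypothetical toy data.

```python
def adjudicate(candidates, context_senses, relations):
    """Pick the candidate sense with the most lexicon links to the context senses.

    candidates: senses proposed by disagreeing annotators.
    context_senses: senses already assigned to surrounding words.
    relations: dict mapping a sense to the set of senses it is directly related to.
    """
    def support(sense):
        # Count context senses directly linked to this candidate.
        return sum(1 for c in context_senses if c in relations.get(sense, set()))

    # max() returns the first best candidate, so ties favour the first listed choice.
    return max(candidates, key=support)


# Hypothetical toy lexicon: two annotators disagree on "bank" in a finance context.
relations = {
    "bank/finance": {"money/currency", "deposit/payment"},
    "bank/river": {"water/liquid", "shore/land"},
}
choice = adjudicate(["bank/river", "bank/finance"], ["money/currency", "deposit/payment"], relations)
```

The real approach scores richer interconnection paths, but the principle is the same: the lexicon's structure, not a vote, decides which annotator's sense fits the context.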


Author(s): Ximing Zhang, Mingang Wang, Lin Cao

Most tracking-by-detection trackers employ an online model update scheme based on the spatiotemporal consistency of visual cues. In the presence of self-deformation, abrupt motion and heavy occlusion, these trackers degrade and are prone to drifting. Models trained offline, such as Siamese networks, remain invariant under these conditions. However, offline methods can be too slow for real-time tracking. In this paper, we propose a novel collaborative tracker that decomposes the tracking task into online and offline modes. The tracker switches between the two modes automatically according to its status, which is inferred by the proposed tracking-failure detection method based on a dispersal measure of the response map. The proposed Real-Time Thermal Infrared Collaborative Online and Offline Tracker (TCOOT) achieves state-of-the-art tracking performance while maintaining real-time speed. Experiments on the VOT-TIR-2015 benchmark dataset show that our tracker outperforms the Staple and SiamFC trackers by 3.3% and 3.6%, respectively, on the precision criterion, and by 3.8% and 5%, respectively, on the success criterion.
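
The abstract does not define its dispersal measure. As an assumption, the sketch below uses the average peak-to-correlation energy (APCE), a known confidence measure for correlation response maps, together with a hypothetical threshold rule: keep the fast online tracker while the response has one sharp peak, and hand over to the offline Siamese model when the response becomes dispersed.

```python
def apce(response):
    """Average peak-to-correlation energy of a flattened response map.

    High for a single sharp peak; low when the response is dispersed
    (e.g. under occlusion), which signals a likely tracking failure.
    """
    f_max, f_min = max(response), min(response)
    energy = sum((f - f_min) ** 2 for f in response) / len(response)
    return (f_max - f_min) ** 2 / energy if energy > 0 else 0.0


def choose_mode(response, threshold):
    """Hypothetical switching rule: online while the peak is sharp, offline otherwise."""
    return "online" if apce(response) >= threshold else "offline"
```

In practice the threshold would be tuned on validation sequences, or set relative to the running average of APCE over recent frames.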


2021, Vol 2021, pp. 1-10
Author(s): Qian Yi, Guixuan Zhang, Shuwu Zhang

Distant supervision is an effective method for automatically collecting large-scale datasets for relation extraction (RE). Automatically constructed datasets usually contain two types of noise: intrasentence noise and wrongly labeled sentences. To address the issues caused by these two types of noise and improve distantly supervised relation extraction, this paper proposes a novel distantly supervised relation extraction model consisting of an entity-based gated convolution sentence encoder and a multilevel sentence selective attention (Matt) module. Specifically, we first apply an entity-based gated convolution operation that forces the sentence encoder to extract entity-pair-related features and filter out useless intrasentence noise. Furthermore, the multilevel attention scheme fuses bag information to obtain a fine-grained, bag-specific query vector, which better identifies valid sentences and reduces the influence of wrongly labeled sentences. Experimental results on a large-scale benchmark dataset show that our model effectively reduces the influence of both types of noise and achieves state-of-the-art performance in relation extraction.
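
Bag-level selective attention, which the Matt module extends, can be sketched as a softmax over sentence-query scores followed by a weighted sum of sentence vectors. This is a minimal single-level sketch; how the multilevel scheme fuses bag information into the query is not reproduced, so the query vector is simply passed in as an assumed input.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def selective_attention(sentences, query):
    """Bag representation as an attention-weighted sum of sentence vectors.

    sentences: encoded sentence vectors in one bag; query: a bag-specific query vector.
    Returns (weights, bag_vector).
    """
    scores = [dot(s, query) for s in sentences]
    m = max(scores)  # stabilise the softmax
    exps = [math.exp(x - m) for x in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(sentences[0])
    bag = [sum(w * s[k] for w, s in zip(weights, sentences)) for k in range(dim)]
    return weights, bag
```

Sentences that align poorly with the query, such as wrongly labeled ones, receive small weights and contribute little to the bag vector used for relation classification.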


Author(s): Nicola Messina, Giuseppe Amato, Andrea Esuli, Fabrizio Falchi, Claudio Gennaro, ...

Despite the evolution of deep-learning-based visual-textual processing systems, precise multi-modal matching remains a challenging task. In this work, we tackle cross-modal retrieval through image-sentence matching based on word-region alignments, using supervision only at the global image-sentence level. Specifically, we present a novel approach called Transformer Encoder Reasoning and Alignment Network (TERAN). TERAN enforces a fine-grained match between the underlying components of images and sentences (i.e., image regions and words, respectively) to preserve the informative richness of both modalities. TERAN obtains state-of-the-art results on the image retrieval task on both the MS-COCO and Flickr30k datasets. Moreover, on MS-COCO, it also outperforms current approaches on the sentence retrieval task. With a focus on scalable cross-modal information retrieval, TERAN is designed to keep the visual and textual data pipelines well separated. Cross-attention links would rule out extracting visual and textual features separately, which is needed for the online search and offline indexing steps of large-scale retrieval systems. For this reason, TERAN merges information from the two domains only during the final alignment phase, immediately before the loss computation. We argue that the fine-grained alignments produced by TERAN pave the way toward effective and efficient methods for large-scale cross-modal information retrieval. We compare the effectiveness of our approach against relevant state-of-the-art methods. On the MS-COCO 1K test set, we obtain Recall@1 improvements of 5.7% and 3.5% on the image and sentence retrieval tasks, respectively. The code used for the experiments is publicly available on GitHub at https://github.com/mesnico/TERAN.
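
Because the two pipelines stay separate, region and word embeddings can be computed independently (regions even indexed offline) and compared only at the end. A minimal sketch of turning word-region alignments into a global image-sentence score, assuming cosine similarity and a max-over-regions, sum-over-words pooling (one of several possible pooling choices), is shown below; all vectors are assumed to be non-zero.

```python
import math


def cosine(u, v):
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def global_score(regions, words):
    """Image-sentence score from word-region alignments.

    Each word is matched to its most similar region (max over regions),
    and the per-word maxima are summed (sum over words).
    """
    return sum(max(cosine(w, r) for r in regions) for w in words)
```

No operation here mixes the two modalities before the final similarity, which is what keeps separate feature extraction, and hence offline indexing, possible.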


2022, Vol 2022, pp. 1-13
Author(s): Qiong Lou, Junfeng Li, Yaguan Qian, Anlin Sun, Fang Lu

RGB-infrared (RGB-IR) person reidentification is a challenging problem in computer vision because of the large cross-modality difference between RGB and IR images. Most traditional methods only perform feature alignment, which ignores the distinctive nature of modality differences and struggles to eliminate the large gap between RGB and IR images. In this paper, a novel AGF network based on the idea of global and local alignment is proposed for the RGB-IR re-ID task. Globally, the AGF network distinguishes pedestrians across modalities by combining pixel alignment and feature alignment; locally, it highlights more of a person's structural information by weighting channels with SE-ResNet-50. It consists of three modules: an alignGAN module (A), a cross-modality paired-images generation module (G), and a feature alignment module (F). First, at the pixel level, RGB images are converted into IR images through the pixel alignment strategy, directly reducing the cross-modality difference between RGB and IR images. Second, at the feature level, cross-modality paired images are generated by exchanging the modality-specific features of RGB and IR images, enabling both global set-level and fine-grained instance-level alignment. Finally, the SE-ResNet-50 network replaces the commonly used ResNet-50 network. By automatically learning the importance of different channel features, it strengthens the network's ability to extract fine-grained structural information about persons across modalities. Extensive experiments on the SYSU-MM01 dataset demonstrate that the proposed method outperforms state-of-the-art methods. In addition, we evaluate the proposed method on a stronger baseline, and the results show that an RGB-IR re-ID method performs better on a stronger baseline.
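
The channel weighting that SE-ResNet-50 adds to ResNet-50 comes from the squeeze-and-excitation block. A dependency-free sketch of one block on a tiny C × N feature map (N = flattened H × W) follows; the weight matrices are illustrative placeholders, whereas in the real network they are learned and their shapes are set by a reduction ratio.

```python
import math


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(w, x):
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def squeeze_excite(feature_map, w1, w2):
    """Squeeze-and-excitation over a C x N feature map.

    Squeeze: global average pool each channel to one number.
    Excite: FC -> ReLU -> FC -> sigmoid gives one weight per channel.
    Scale: multiply every activation in a channel by that channel's weight.
    Returns (rescaled_feature_map, channel_weights).
    """
    z = [sum(ch) / len(ch) for ch in feature_map]
    hidden = [relu(v) for v in matvec(w1, z)]
    s = [sigmoid(v) for v in matvec(w2, hidden)]
    return [[v * sc for v in ch] for ch, sc in zip(feature_map, s)], s
```

Channels whose learned weight approaches 1 pass through almost unchanged, while channels pushed toward 0 are suppressed, which is how the block emphasizes informative structural features.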


1995, Vol 38 (5), pp. 1126-1142
Author(s): Jeffrey W. Gilger

This paper is an introduction to behavioral genetics for researchers and practitioners in language development and disorders. Its specific aims are to illustrate some essential concepts and to show how behavioral genetic research can be applied to the language sciences. Past genetic research on language-related traits has tended to focus on simple etiology (i.e., the heritability or familiality of language skills). The current state of the art, however, suggests that great promise lies in addressing more complex questions through behavioral genetic paradigms. As future goals, it is suggested that: (a) more behavioral genetic work of all types should be done, including replications and expansions of preliminary studies already in print; (b) work should focus on fine-grained, theory-based phenotypes, with research designs that can address complex questions in language development; and (c) work in this area should use a variety of samples and methods (e.g., twin and family samples, heritability and segregation analyses, linkage and association tests).
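
Heritability is often introduced through the classical twin design. As background (not taken from this paper), Falconer's approximation estimates the variance components of a trait from monozygotic (MZ) and dizygotic (DZ) twin correlations, assuming purely additive genetic effects and equally shared environments for both twin types. The correlations below are hypothetical.

```python
def falconer(r_mz, r_dz):
    """Split trait variance into additive genetic (h2), shared (c2) and non-shared (e2) parts."""
    h2 = 2 * (r_mz - r_dz)  # MZ twins share all additive genetic variance, DZ twins half on average
    c2 = r_mz - h2          # shared environment; equivalently 2 * r_dz - r_mz
    e2 = 1 - r_mz           # what MZ twins do not share, including measurement error
    return h2, c2, e2


# Hypothetical twin correlations for a language measure.
h2, c2, e2 = falconer(r_mz=0.8, r_dz=0.5)
```

Such single-number estimates are the "simple etiology" the paper contrasts with richer designs; model-fitting approaches relax the additivity and equal-environment assumptions.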

