Crossmodality Person Reidentification Based on Global and Local Alignment

2022 ◽  
Vol 2022 ◽  
pp. 1-13
Author(s):  
Qiong Lou ◽  
Junfeng Li ◽  
Yaguan Qian ◽  
Anlin Sun ◽  
Fang Lu

RGB-infrared (RGB-IR) person reidentification is a challenging problem in computer vision due to the large crossmodality difference between RGB and IR images. Most traditional methods only carry out feature alignment, which ignores the uniqueness of modality differences and struggles to eliminate the large differences between RGB and IR. In this paper, a novel AGF network is proposed for the RGB-IR re-ID task, based on the idea of global and local alignment. The AGF network distinguishes pedestrians in different modalities globally by combining pixel alignment and feature alignment, and highlights more structural information of persons locally by weighting channels with SE-ResNet-50, achieving strong results. It consists of three modules: an alignGAN module (A), a crossmodality paired-images generation module (G), and a feature alignment module (F). First, at the pixel level, the RGB images are converted into IR images through the pixel alignment strategy to directly reduce the crossmodality difference between RGB and IR images. Second, at the feature level, crossmodality paired images are generated by exchanging the modality-specific features of RGB and IR images to perform global set-level and fine-grained instance-level alignment. Finally, the SE-ResNet-50 network is used to replace the commonly used ResNet-50 network. By automatically learning the importance of different channel features, it strengthens the ability of the network to extract more fine-grained structural information of persons across modalities. Extensive experimental results on the SYSU-MM01 dataset demonstrate that the proposed method outperforms state-of-the-art methods. In addition, we evaluate the proposed method on a stronger baseline, and the results show that an RGB-IR re-ID method performs better on a stronger baseline.
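The channel weighting mentioned in the abstract can be illustrated with a minimal squeeze-and-excitation sketch in plain Python. This is not the authors' AGF code: the function name `se_reweight`, the tiny feature map, and the weight matrices `w_reduce` and `w_expand` are hypothetical, and a real SE-ResNet-50 learns these weights during training.

```python
import math

def se_reweight(feature_map, w_reduce, w_expand):
    """Rescale each channel of feature_map by a learned-style gate in (0, 1).

    feature_map: list of C channels, each an H x W list of floats.
    w_reduce:    hypothetical weights of shape (R, C) for the bottleneck layer.
    w_expand:    hypothetical weights of shape (C, R) for the expansion layer.
    """
    # Squeeze: global average pooling gives one summary value per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Excitation, step 1: bottleneck linear layer followed by ReLU.
    hidden = [
        max(0.0, sum(w * s for w, s in zip(row, squeezed)))
        for row in w_reduce
    ]
    # Excitation, step 2: expansion linear layer followed by a sigmoid gate.
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w_expand
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [
        [[value * gate for value in row] for row in channel]
        for channel, gate in zip(feature_map, gates)
    ]
    return scaled, gates

# Illustrative input: two 2x2 channels and made-up weights (assumptions).
features = [[[1.0, 2.0], [3.0, 4.0]], [[0.5, 0.5], [0.5, 0.5]]]
scaled, gates = se_reweight(features, [[0.3, -0.2]], [[1.0], [-1.0]])
print(gates)
```

Channels whose gate is close to 1 pass through almost unchanged, while channels with a small gate are suppressed, which is the sense in which the network "learns the importance" of each channel.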

2020 ◽  
Vol 34 (07) ◽  
pp. 11604-11611 ◽  
Author(s):  
Qiao Liu ◽  
Xin Li ◽  
Zhenyu He ◽  
Nana Fan ◽  
Di Yuan ◽  
...  

Existing deep Thermal InfraRed (TIR) trackers usually use the feature models of RGB trackers for representation. However, these feature models, learned on RGB images, neither represent TIR objects effectively nor take fine-grained TIR information into account. To this end, we develop a multi-task framework to learn TIR-specific discriminative features and fine-grained correlation features for TIR tracking. Specifically, we first use an auxiliary classification network to guide the generation of TIR-specific discriminative features for distinguishing TIR objects belonging to different classes. Second, we design a fine-grained aware module to capture more subtle information for distinguishing TIR objects belonging to the same class. These two kinds of features complement each other and recognize TIR objects at the inter-class and intra-class levels, respectively. The two feature models are learned using a multi-task matching framework and are jointly optimized on the TIR tracking task. In addition, we develop a large-scale TIR training dataset to train the network and adapt the model to the TIR domain. Extensive experimental results on three benchmarks show that the proposed algorithm achieves a relative gain of 10% over the baseline and performs favorably against state-of-the-art methods. Code and the proposed TIR dataset are available at https://github.com/QiaoLiuHit/MMNet.


Author(s):  
Wentao Ding ◽  
Guanji Gao ◽  
Linfeng Shi ◽  
Yuzhong Qu

Recognizing time expressions is a fundamental and important task in many applications of natural language understanding, such as reading comprehension and question answering. Several recent state-of-the-art approaches have achieved good performance on recognizing time expressions. These approaches are black boxes or are based on heuristic rules, which makes the temporal information difficult to understand. In contrast, classic rule-based or semantic parsing approaches can capture rich structural information, but their recognition performance is weaker. In this paper, we propose a pattern-based approach, called PTime, which automatically generates and selects patterns for recognizing time expressions. In this approach, time expressions in training text are abstracted into type sequences by using fine-grained token types, so the problem becomes one of selecting an appropriate subset of the sequential patterns. We use the Extended Budgeted Maximum Coverage (EBMC) model to optimize the pattern selection. The main idea is to maximize the number of correct token sequences matched by the selected patterns while limiting the number of mistakes to an adjustable budget. The interpretability of the patterns and the adjustability of the permitted number of mistakes make PTime a very promising approach for many applications. Experimental results show that PTime achieves very competitive performance compared with existing state-of-the-art approaches.
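The budgeted selection idea described above can be sketched with a simple greedy heuristic. This is only an illustration of coverage under a mistake budget, not the EBMC formulation or solver used by PTime. The pattern names, the match sets, and the function `select_patterns` are hypothetical.

```python
def select_patterns(patterns, budget):
    """Greedily pick patterns that add the most correct matches.

    patterns: dict mapping a pattern name to a pair
              (set of correctly matched ids, set of wrongly matched ids).
    budget:   maximum number of distinct mistakes allowed in total.
    """
    selected = []
    covered = set()    # correct token sequences matched so far
    mistakes = set()   # wrong matches accumulated so far
    remaining = dict(patterns)
    while remaining:
        best_name, best_gain = None, 0
        for name, (correct, wrong) in remaining.items():
            # Skip patterns that would push the mistakes over the budget.
            if len(mistakes | wrong) > budget:
                continue
            gain = len(correct - covered)
            if gain > best_gain:
                best_name, best_gain = name, gain
        if best_name is None:
            break  # nothing affordable adds new coverage
        correct, wrong = remaining.pop(best_name)
        selected.append(best_name)
        covered |= correct
        mistakes |= wrong
    return selected, covered, mistakes

# Hypothetical type-sequence patterns with made-up match ids.
candidates = {
    "MONTH DAY": ({1, 2, 3}, set()),
    "NUM": ({1, 2, 3, 4, 5}, {101, 102, 103}),
    "DAY_OF_WEEK": ({4}, {104}),
    "NUM UNIT": ({5, 6}, {105}),
}
chosen, covered, mistakes = select_patterns(candidates, budget=1)
print(chosen)  # ['MONTH DAY', 'NUM UNIT']
```

Raising `budget` lets broader but noisier patterns such as the hypothetical `NUM` enter the selection, which mirrors the adjustable trade-off the abstract describes.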


Author(s):  
Wenzhe Wang ◽  
Mengdan Zhang ◽  
Runnan Chen ◽  
Guanyu Cai ◽  
Penghao Zhou ◽  
...  

Multi-modal cues present in videos are usually beneficial for the challenging video-text retrieval task on internet-scale datasets. Recent video retrieval methods take advantage of multi-modal cues by aggregating them into holistic high-level semantics for matching with text representations in a global view. In contrast to this global alignment, the local alignment of detailed semantics encoded within both multi-modal cues and distinct phrases is still not well handled. Thus, in this paper, we leverage hierarchical video-text alignment to fully explore the detailed, diverse characteristics in multi-modal cues for fine-grained alignment with local semantics from phrases, as well as to capture a high-level semantic correspondence. Specifically, multi-step attention is learned for progressively comprehensive local alignment, and a holistic transformer is utilized to summarize multi-modal cues for global alignment. With hierarchical alignment, our model outperforms state-of-the-art methods on three public video retrieval datasets.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Xusi Han ◽  
Genki Terashi ◽  
Charles Christoffer ◽  
Siyang Chen ◽  
Daisuke Kihara

An increasing number of density maps of biological macromolecules have been determined by cryo-electron microscopy (cryo-EM) and stored in the public database, EMDB. To interpret the structural information contained in EM density maps, alignment of maps is an essential step for structure modeling, comparison of maps, and for database search. Here, we developed VESPER, which captures the similarity of underlying molecular structures embedded in density maps by taking local gradient directions into consideration. Compared to existing methods, VESPER achieved substantially more accurate global and local alignment of maps as well as database retrieval.


2021 ◽  
Vol 251 ◽  
pp. 03028
Author(s):  
Tadeas Bilka ◽  
Jakub Kandra ◽  
Claus Kleinwort ◽  
Radek Zlebcik

The alignment of the Belle II tracking system, composed of pixel and strip vertex detectors and a central drift chamber, is described by approximately sixty thousand parameters, ranging from local alignment of sensors and wires to relative global alignment of the sub-detectors. In the next data reprocessing, scheduled since Spring 2021, we aim to determine all parameters in a simultaneous fit by Millepede II, where recent developments allow a direct solution of the full problem in about one hour, making it practically feasible for regular detector alignment. The tracking detectors and the alignment technique are described, and the alignment strategy is discussed in the context of studies on simulations and experience obtained from recorded data. Preliminary results and further refinements based on studies of real Belle II data are presented.


1995 ◽  
Vol 38 (5) ◽  
pp. 1126-1142 ◽  
Author(s):  
Jeffrey W. Gilger

This paper is an introduction to behavioral genetics for researchers and practitioners in language development and disorders. The specific aims are to illustrate some essential concepts and to show how behavioral genetic research can be applied to the language sciences. Past genetic research on language-related traits has tended to focus on simple etiology (i.e., the heritability or familiality of language skills). The current state of the art, however, suggests that great promise lies in addressing more complex questions through behavioral genetic paradigms. In terms of future goals it is suggested that: (a) more behavioral genetic work of all types should be done, including replications and expansions of preliminary studies already in print; (b) work should focus on fine-grained, theory-based phenotypes with research designs that can address complex questions in language development; and (c) work in this area should utilize a variety of samples and methods (e.g., twin and family samples, heritability and segregation analyses, linkage and association tests, etc.).


1998 ◽  
Vol 38 (2) ◽  
pp. 9-15 ◽  
Author(s):  
J. Guan ◽  
T. D. Waite ◽  
R. Amal ◽  
H. Bustamante ◽  
R. Wukasch

A rapid method of determining the structure of aggregated particles using small angle laser light scattering is applied here to assemblages of bacteria from wastewater treatment systems. The structure information so obtained is suggestive of fractal behaviour as found by other methods. Strong dependencies are shown to exist between the fractal structure of the bacterial aggregates and the behaviour of the biosolids in zone settling and dewatering by both pressure filtration and centrifugation methods. More rapid settling and significantly higher solids contents are achievable for “looser” flocs characterised by lower fractal dimensions. The rapidity of determination of structural information and the strong dependencies of the effectiveness of a number of wastewater treatment processes on aggregate structure suggest that this method may be particularly useful as an on-line control tool.
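The abstract does not state how the fractal dimension is extracted from the scattering data. A common approach, assumed here, uses the power-law relation I(q) ∝ q^(-D_f) between scattered intensity I and scattering vector magnitude q in the fractal regime, so that D_f is the negative slope of a log-log fit. The function name `fractal_dimension` and the synthetic data are illustrative only.

```python
import math

def fractal_dimension(q_values, intensities):
    """Estimate D_f as the negative slope of log(I) versus log(q).

    Assumes all points lie in the power-law (fractal) regime,
    where I(q) is proportional to q ** (-D_f).
    """
    xs = [math.log(q) for q in q_values]
    ys = [math.log(i) for i in intensities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Ordinary least-squares slope of y on x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return -sxy / sxx

# Synthetic, noise-free data generated with D_f = 2.1 (not measured values).
q = [0.1, 0.2, 0.4, 0.8]
intensity = [value ** -2.1 for value in q]
print(round(fractal_dimension(q, intensity), 3))
```

With real measurements, only the q range that is linear on the log-log plot should be passed to the fit.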


1999 ◽  
Vol 18 (3-4) ◽  
pp. 265-273
Author(s):  
Giovanni B. Garibotto

The paper is intended to provide an overview of advanced robotic technologies within the context of Postal Automation services. The main functional requirements of the application are briefly reviewed, as well as the state of the art and new emerging solutions. Image Processing and Pattern Recognition have always played a fundamental role in Address Interpretation and Mail sorting, and the new challenging objective is now off-line handwritten cursive recognition, in order to be able to handle all kinds of addresses in a uniform way. On the other hand, advanced electromechanical and robotic solutions are extremely important to solve the problems of mail storage, transportation and distribution, as well as for material handling and logistics. Finally, new Postal Automation services are briefly described, including emerging services for hybrid mail and paper to electronic conversion.


2021 ◽  
Vol 13 (12) ◽  
pp. 2417
Author(s):  
Savvas Karatsiolis ◽  
Andreas Kamilaris ◽  
Ian Cole

Estimating the height of buildings and vegetation in single aerial images is a challenging problem. A task-focused Deep Learning (DL) model that combines architectural features from successful DL models (U-NET and Residual Networks) and learns the mapping from a single aerial image to a normalized Digital Surface Model (nDSM) was proposed. The model was trained on aerial images whose corresponding Digital Surface Models (DSM) and Digital Terrain Models (DTM) were available and was then used to infer the nDSM of images with no elevation information. The model was evaluated with a dataset covering a large area of Manchester, UK, as well as the 2018 IEEE GRSS Data Fusion Contest LiDAR dataset. The results suggest that the proposed DL architecture is suitable for the task and surpasses other state-of-the-art DL approaches by a large margin.
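For readers unfamiliar with the training target: an nDSM is conventionally the per-cell difference between the surface model and the terrain model, which leaves the height of objects above ground. The sketch below shows that subtraction on tiny made-up grids. The function name `normalized_dsm` is hypothetical, and clipping negative values to zero is an assumption, not a step stated in the abstract.

```python
def normalized_dsm(dsm, dtm):
    """Return per-cell object heights: surface elevation minus terrain elevation.

    dsm, dtm: equally sized 2D lists of elevations in metres.
    Negative differences are clipped to 0.0 (an assumed cleanup step).
    """
    if len(dsm) != len(dtm) or any(len(a) != len(b) for a, b in zip(dsm, dtm)):
        raise ValueError("DSM and DTM grids must have the same shape")
    return [
        [max(0.0, surface - terrain) for surface, terrain in zip(dsm_row, dtm_row)]
        for dsm_row, dtm_row in zip(dsm, dtm)
    ]

# Made-up 2x2 elevation grids in metres.
surface = [[12.0, 10.0], [15.5, 9.5]]
terrain = [[10.0, 10.0], [10.5, 10.0]]
print(normalized_dsm(surface, terrain))  # [[2.0, 0.0], [5.0, 0.0]]
```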


2021 ◽  
Vol 40 (3) ◽  
pp. 1-13
Author(s):  
Lumin Yang ◽  
Jiajie Zhuang ◽  
Hongbo Fu ◽  
Xiangzhi Wei ◽  
Kun Zhou ◽  
...  

We introduce SketchGNN, a convolutional graph neural network for semantic segmentation and labeling of freehand vector sketches. We treat an input stroke-based sketch as a graph with nodes representing the sampled points along input strokes and edges encoding the stroke structure information. To predict the per-node labels, our SketchGNN uses graph convolution and a static-dynamic branching network architecture to extract the features at three levels, i.e., point-level, stroke-level, and sketch-level. SketchGNN significantly improves the accuracy of the state-of-the-art methods for semantic sketch segmentation (by 11.2% in the pixel-based metric and 18.2% in the component-based metric over a large-scale challenging SPG dataset) and has orders of magnitude fewer parameters than both image-based and sequence-based methods.
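The graph representation described above can be sketched as follows. This is not the SketchGNN implementation: it assumes that "stroke structure" means linking consecutive sampled points within each stroke, and the function name `sketch_to_graph` and the sample strokes are hypothetical.

```python
def sketch_to_graph(strokes):
    """Convert a list of strokes into a node list, an edge list, and stroke ids.

    strokes: list of strokes, each a list of (x, y) sampled points.
    Edges connect consecutive points of the same stroke (an assumption).
    """
    nodes = []       # one node per sampled point
    edges = []       # pairs of node indices
    stroke_ids = []  # which stroke each node came from
    for stroke_index, stroke in enumerate(strokes):
        start = len(nodes)
        nodes.extend(stroke)
        stroke_ids.extend([stroke_index] * len(stroke))
        # Link each point to the next point in the same stroke.
        for offset in range(len(stroke) - 1):
            edges.append((start + offset, start + offset + 1))
    return nodes, edges, stroke_ids

# Two made-up strokes: a three-point line and a two-point line.
strokes = [[(0, 0), (1, 0), (2, 0)], [(0, 1), (1, 1)]]
nodes, edges, stroke_ids = sketch_to_graph(strokes)
print(edges)       # [(0, 1), (1, 2), (3, 4)]
print(stroke_ids)  # [0, 0, 0, 1, 1]
```

The `stroke_ids` list is one simple way to keep the stroke-level grouping available alongside the point-level nodes.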

