STA: Spatial-Temporal Attention for Large-Scale Video-Based Person Re-Identification

Author(s):  
Yang Fu ◽  
Xiaoyang Wang ◽  
Yunchao Wei ◽  
Thomas Huang

In this work, we propose a novel Spatial-Temporal Attention (STA) approach to tackle the large-scale person re-identification task in videos. Unlike most existing methods, which simply compute representations of video clips using frame-level aggregation (e.g. average pooling), the proposed STA adopts a more effective way of producing a robust clip-level feature representation. Concretely, our STA fully exploits the discriminative parts of a target person in both the spatial and temporal dimensions, which results in a 2-D attention score matrix, obtained via inter-frame regularization, that measures the importance of spatial parts across different frames. A more robust clip-level feature representation can then be generated by a weighted sum operation guided by the mined 2-D attention score matrix. In this way, challenging cases for video-based person re-identification, such as pose variation and partial occlusion, can be handled well by the STA. We conduct extensive experiments on two large-scale benchmarks, i.e. MARS and DukeMTMC-VideoReID. In particular, the mAP reaches 87.7% on MARS, which outperforms the state of the art by a large margin of more than 11.6%.
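The weighted-sum step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `clip_feature`, the nested-list data layout, and the choice to normalize each spatial part's scores so that they sum to one across frames are all assumptions made for illustration, and the inter-frame regularization used to learn the scores is omitted.

```python
def clip_feature(features, scores):
    """Pool per-frame, per-part features into one clip-level vector.

    features[t][k] is the feature vector (list of floats) of spatial part k
    in frame t; scores[t][k] is a non-negative attention score for the same
    cell. Both layouts are assumptions made for this sketch.
    """
    num_frames = len(features)
    num_parts = len(features[0])
    clip = []
    for k in range(num_parts):
        # Normalize the scores of part k across frames (assumed scheme).
        total = sum(scores[t][k] for t in range(num_frames))
        dim = len(features[0][k])
        pooled = [0.0] * dim
        for t in range(num_frames):
            # Fall back to plain averaging if every score for this part is zero.
            weight = scores[t][k] / total if total > 0 else 1.0 / num_frames
            for d in range(dim):
                pooled[d] += weight * features[t][k][d]
        # Concatenate the pooled part features into the clip-level vector.
        clip.extend(pooled)
    return clip
```

With uniform scores this reduces to average pooling over frames, which is the baseline the abstract contrasts against; non-uniform scores let a frame in which a given body part is occluded contribute less to that part's pooled feature.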

2021 ◽  
pp. 1-15
Author(s):  
Benjamin G. Martin ◽  
Elisabeth Piller

Photographs of the German and Soviet pavilions facing off at the Paris International Exposition in 1937 offer an iconic image of the interwar period, and with good reason. This image captures the interwar period's great conflict of ideologies, the international interconnectedness of the age and the aestheticisation of political and ideological conflict in the age of mass media and mass spectacle. [Figure 1] Last but not least, it captures the importance in the 1930s of what we now call cultural diplomacy. Both pavilions – Germany's, in Albert Speer's neo-classical tower block crowned with a giant swastika, and the Soviet Union's, housed in Boris Iofan's forward-thrusting structure topped by Vera Mukhina's monumental sculptural group – represented the outcome of a large-scale collaboration between political leaders and architects, artists, intellectuals and graphic and industrial designers seeking to present their country to foreign visitors in a manner designed to advance the country's interests in the international arena. Each pavilion, that is, made an outreach that was diplomatic – in the sense that it sought to mediate between distinct polities – using means that were cultural – in the sense that they deployed refined aesthetic practices (like the arts and architecture) and in the sense that they highlighted the distinctive features, or ‘culture’, of a particular group (like the German nation or the Soviet state).


2021 ◽  
Vol 13 (3) ◽  
pp. 433
Author(s):  
Junge Shen ◽  
Tong Zhang ◽  
Yichen Wang ◽  
Ruxin Wang ◽  
Qi Wang ◽  
...  

Remote sensing images contain complex backgrounds and multi-scale objects, which make scene classification a challenging task. Performance is highly dependent on the capacity of the scene representation as well as the discriminability of the classifier. Although multiple models possess better properties than a single model in these respects, the fusion strategy for these models is a key component in maximizing the final accuracy. In this paper, we construct a novel dual-model architecture with a grouping-attention-fusion strategy to improve the performance of scene classification. Specifically, the model employs two different convolutional neural networks (CNNs) for feature extraction, where the grouping-attention-fusion strategy is used to fuse the features of the CNNs in a fine and multi-scale manner. In this way, the resultant feature representation of the scene is enhanced. Moreover, to address the issue of similar appearances between different scenes, we develop a loss function which encourages small intra-class diversity and large inter-class distances. Extensive experiments are conducted on four scene classification datasets, including the UCM land-use dataset, the WHU-RS19 dataset, the AID dataset, and the OPTIMAL-31 dataset. The experimental results demonstrate the superiority of the proposed method in comparison with state-of-the-art methods.


2021 ◽  
Vol 12 (1) ◽  
pp. 20
Author(s):  
Claudia Maria Astorino

ABSTRACT: Throughout its history, Venice has attracted a considerable number of tourists. Embedded in such a singular geographic setting, its canals, gondolas, bridges, campi, architectural and artistic treasures constitute a unique legacy and, consequently, an unparalleled tourist offering that has captured the imagination of potential tourists of the most diverse origins. The present study aimed to analyze how tourist activity in Venice has evolved and how it has been depicted in the arts, especially in cinema, music and the visual arts. To this end, a study corpus was formed, composed of Italian and foreign films and video clips, in addition to works of visual art, in order to compare them with the stages of tourism over time. It is, therefore, a qualitative, descriptive and comparative study. The methodology consisted of bibliographic research in secondary sources, in order to trace the evolution of tourism in Venice, followed by the composition of the study corpus, analysis of the works selected for this corpus and, finally, a comparison between fiction and reality.
Keywords: Venice. Tourism. Fiction x reality. Films and video clips. Visual arts.


Author(s):  
Cao Liu ◽  
Shizhu He ◽  
Kang Liu ◽  
Jun Zhao

Because they provide natural language responses, natural answers are favored in real-world Question Answering (QA) systems. Generative models learn to automatically generate natural answers from large-scale question-answer pairs (QA-pairs). However, they suffer from the uncontrollable and uneven quality of QA-pairs crawled from the Internet. To address this problem, we propose a curriculum learning based framework for natural answer generation (CL-NAG), which is able to take full advantage of the valuable learning data in a noisy and uneven-quality corpus. Specifically, we employ two practical measures to automatically assess the quality (complexity) of QA-pairs. Based on these measurements, CL-NAG first uses simple and low-quality QA-pairs to learn a basic model, and then gradually learns to produce better answers with richer content and more complete syntax based on more complex and higher-quality QA-pairs. In this way, all valuable information in the noisy and uneven-quality corpus can be fully exploited. Experiments demonstrate that CL-NAG outperforms the state of the art, improving accuracy by 6.8% and 8.7% for simple and complex questions, respectively.
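The staged training order described in the abstract can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say which two measures CL-NAG uses, so the `answer_length` scorer below is a hypothetical stand-in, and `curriculum_stages` is an illustrative helper name rather than part of the authors' framework.

```python
def answer_length(pair):
    """Hypothetical complexity proxy: number of whitespace-separated answer tokens."""
    _question, answer = pair
    return len(answer.split())


def curriculum_stages(qa_pairs, score_fn, n_stages):
    """Split QA-pairs into stages ordered from lowest to highest score.

    A trainer would consume the returned stages in order, so that the model
    sees simple pairs first and more complex pairs later.
    """
    ranked = sorted(qa_pairs, key=score_fn)
    if not ranked:
        return []
    # Ceiling division so that at most n_stages stages are produced.
    size = -(-len(ranked) // n_stages)
    return [ranked[i:i + size] for i in range(0, len(ranked), size)]
```

For example, `curriculum_stages(pairs, answer_length, 2)` places the shorter-answer half of `pairs` in the first stage and the longer-answer half in the second.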


2017 ◽  
Vol 14 (4) ◽  
pp. 172988141770907
Author(s):  
Hanbo Wu ◽  
Xin Ma ◽  
Zhimeng Zhang ◽  
Haibo Wang ◽  
Yibin Li

Human daily activity recognition has been an active topic in the field of computer vision for many decades. Despite best efforts, activity recognition in naturally uncontrolled settings remains a challenging problem. Recently, by perceiving depth and visual cues simultaneously, RGB-D cameras have greatly boosted the performance of activity recognition. However, due to some practical difficulties, the publicly available RGB-D data sets are not sufficiently large for benchmarking when considering the diversity of their activities, subjects, and backgrounds. This severely limits the applicability of complicated learning-based recognition approaches. To address the issue, this article provides a large-scale RGB-D activity data set by merging five public RGB-D data sets that differ from each other in many aspects, such as length of actions, nationality of subjects, or camera angles. This data set comprises 4528 samples depicting 7 action categories (up to 46 subcategories) performed by 74 subjects. To verify how challenging the data set is, three feature representation methods are evaluated: depth motion maps, spatiotemporal depth cuboid similarity feature, and curvature space scale. Results show that the merged large-scale data set is more realistic and challenging and therefore more suitable for benchmarking.


2018 ◽  
Vol 28 (1) ◽  
pp. 71-76
Author(s):  
Gareth Edwards ◽  
Nicholas O’Regan

A recent interview with Vicki Heywood, Chair of the Royal Society of the Arts (RSA), highlights the role that the arts can play in dealing with complex problems in society today, particularly from an international perspective. The message from this interview resonates with recent literature on leadership that also recognizes the importance of the arts in leading successfully through wicked problems. The importance of linking arts interpretations of leadership with culture and place is also taken into consideration within the analysis of the interview. The article concludes by suggesting that future leadership practice should promote leading through art to uncover the multiple identities and belonging that shape global society. More specifically, the article proposes that by leading through art, artists can help uncover and discover complex intricacies within context and culture, which may help to problematize the large-scale generalizations that have become the epitome of serious global issues.


2019 ◽  
Vol 12 (3) ◽  
pp. 205979911989078
Author(s):  
Ewa Sidorenko

In this article, I discuss a performance arts–based visual methodology based on the use of archaic wet collodion photography. The collaboration between the Street Collodion Art photography collective and myself, as a researcher, had two aims: to generate a large-scale photographic and narrative portrait of Lower Silesia in Poland, and to explore identities in the region, where nearly all of the inhabitants represent recent migrant populations. Data generated through this project include collodion portraits, their interpretations and narratives collected through unstructured interviews. Initial data analysis has generated identity narratives linked to work, place and belonging, and ethnicity/nationality. In addition, in 2016 and 2017, three exhibitions of the portraits and a selection of edited stories took place in Lubin, Legnica and Wrocław, attended by local inhabitants, including project participants. The examination of the arts-based methodology finds that the ritual character of the wet collodion photographic encounter has acted as a form of artistic intervention which, in generating memory narratives, enabled an articulation of social identities in a climate dominated by nationalist discourses. Such symbolic work emerging out of the project reveals a critical potential in the collaboration between the arts and social research. Furthermore, the project has shown that, despite different traditions of practice, a collaboration between artists and social researchers can yield rich data and access participants in ways that conventional methodologies cannot.


Arts ◽  
2020 ◽  
Vol 9 (1) ◽  
pp. 19
Author(s):  
Déirdre Kelly

It seems inherent in the nature of contemporary artists’ book production to continue to question the context for the genre in contemporary art practice, notwithstanding the medium’s potential for dissemination via mass production and its unquestionable advantage of portability for distribution. Artists, curators and editors operating in this sector look to create contexts for books in a variety of imaginative ways, through exhibition, commission, installation, performance and, of course, as documentation. Broadening the discussion of the idea of the book within contemporary art practice, this paper examines the presence and role of book works within the context of the art biennale, in particular the Venice Art Biennale, of which the 58th iteration (2019) is entitled ‘May You Live In Interesting Times’ and curated by Ralph Rugoff, with an overview of the independent international cultural offerings and the function of the ‘Book Pavilion’. Venetian museums and institutions continue to present vibrant, diverse works within the arena of large-scale exhibitions, recognising the position that the book occupies in the history of the city. This year, the first appearance of the ‘Book Biennale’ opens up a new and interesting dialogue, taking the measure of how the book is being promoted and its particular function for visual communication within the arts in Venice and beyond.


Author(s):  
Zhiyong Wang ◽  
Dagan Feng

Visual information has been widely used in various domains such as the web, education, health, and digital libraries, owing to advances in computing technologies. Meanwhile, users find it increasingly difficult to locate desired visual content such as images. Though traditional content-based retrieval (CBR) systems allow users to access visual information through query-by-example with low-level visual features (e.g. color, shape, and texture), the semantic gap is widely recognized as a hurdle for practical adoption of CBR systems. Abundant visual information (e.g. user-generated visual content) enables us to derive new knowledge at a large scale, which will significantly facilitate visual information management. Besides semantic concept detection, semantic relationships among concepts can also be explored in the visual domain, not only in the traditional textual domain. Therefore, this chapter aims to provide an overview of the state of the art in discovering semantics in the visual domain from two aspects: semantic concept detection and knowledge discovery from visual information at the semantic level. For the first aspect, various facets of visual information annotation are discussed, including content representation, machine learning based annotation methodologies, and widely used datasets. For the second aspect, a novel data-driven approach is introduced to discover semantic relevance among concepts in the visual domain. Future research topics are also outlined.


Author(s):  
Hong Liu ◽  
Jie Li ◽  
Yongjian Wu ◽  
Rongrong Ji

Symmetric positive definite (SPD) matrices have attracted increasing research attention in image/video analysis, owing to their ability to capture Riemannian geometry in a structured 2D feature representation. However, computation in the vector space on SPD matrices cannot capture their geometric properties, which degrades classification performance. To this end, Riemannian-based deep networks have become a promising solution for SPD matrix classification, because of their strength in performing non-linear learning over SPD matrices. In addition, Riemannian metric learning typically adopts a kNN classifier that cannot be extended to large-scale datasets, which limits its application in many time-critical scenarios. In this paper, we propose a Bag-of-Matrix-Summarization (BoMS) method to be combined with a Riemannian network, which handles the above issues and yields a highly efficient and scalable SPD feature representation. Our key innovation lies in the idea of summarizing data in a Riemannian geometric space instead of the vector space. First, the whole training set is compressed into a small number of matrix features to ensure high scalability. Second, given such a compressed set, a constant-length vector representation is extracted by efficiently measuring the distribution variations between the summarized data and the latent features of the Riemannian network. Finally, the proposed BoMS descriptor is integrated into the Riemannian network, upon which the whole framework is trained end-to-end via matrix back-propagation. Experiments on four different classification tasks demonstrate the superior performance of the proposed method over state-of-the-art methods.
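To make the contrast between vector-space and Riemannian computation on SPD matrices concrete, the sketch below computes the log-Euclidean distance between two 2x2 SPD matrices. The log-Euclidean metric is one commonly used Riemannian metric on SPD matrices and is chosen here only as an assumption; the abstract does not state which metric BoMS relies on. The helper names and the restriction to 2x2 matrices (which permits a closed-form matrix logarithm) are likewise illustrative.

```python
import math


def spd_log_2x2(m):
    """Matrix logarithm of a 2x2 symmetric positive definite matrix [[a, b], [b, c]]."""
    (a, b), (_, c) = m
    mean = (a + c) / 2.0
    radius = math.sqrt(((a - c) / 2.0) ** 2 + b * b)
    l1, l2 = mean + radius, mean - radius  # eigenvalues, l1 >= l2 > 0
    if radius < 1e-12:
        # The matrix is (numerically) a multiple of the identity.
        v = math.log(mean)
        return [[v, 0.0], [0.0, v]]
    # Sylvester's formula: log(M) = g1 * (M - l2*I) + g2 * (M - l1*I)
    g1 = math.log(l1) / (l1 - l2)
    g2 = math.log(l2) / (l2 - l1)
    off_diagonal = (g1 + g2) * b
    return [
        [g1 * (a - l2) + g2 * (a - l1), off_diagonal],
        [off_diagonal, g1 * (c - l2) + g2 * (c - l1)],
    ]


def log_euclidean_distance(x, y):
    """Frobenius norm of log(x) - log(y) for 2x2 SPD matrices x and y."""
    lx, ly = spd_log_2x2(x), spd_log_2x2(y)
    return math.sqrt(sum((lx[i][j] - ly[i][j]) ** 2
                         for i in range(2) for j in range(2)))
```

Under this metric the distance depends on ratios of eigenvalues rather than on their differences, so scaling both matrices by the same positive factor leaves the distance unchanged, a property the plain Frobenius distance between the raw matrices does not have.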

