Fusion-Attention Network for person search with free-form natural language

2018 ◽  
Vol 116 ◽  
pp. 205-211 ◽  
Author(s):  
Zhong Ji ◽  
Shengjia Li ◽  
Yanwei Pang
Author(s):  
Lianli Gao ◽  
Pengpeng Zeng ◽  
Jingkuan Song ◽  
Yuan-Fang Li ◽  
Wu Liu ◽  
...  

To date, visual question answering (VQA) (i.e., image QA and video QA) remains a holy grail of vision and language understanding, especially video QA. Whereas image QA focuses primarily on understanding the associations between image region-level details and the corresponding questions, video QA requires a model to jointly reason across both the spatial and the long-range temporal structures of a video, as well as the text, to provide an accurate answer. In this paper, we specifically tackle the problem of video QA by proposing a Structured Two-stream Attention network, namely STA, to answer a free-form or open-ended natural language question about the content of a given video. First, we infer rich long-range temporal structures in videos using our structured segment component and encode text features. Then, our structured two-stream attention component simultaneously localizes important visual instances, reduces the influence of background video, and focuses on the relevant text. Finally, the structured two-stream fusion component incorporates different segments of the query-aware and video-aware context representations and infers the answers. Experiments on the large-scale video QA dataset TGIF-QA show that our proposed method significantly surpasses the best counterpart (i.e., with one representation for the video input) by 13.0%, 13.5% and 11.0% on the Action, Trans. and FrameQA tasks, and by 0.3 on the Count task. It also outperforms the best competitor (i.e., with two representations) on the Action, Trans. and FrameQA tasks by 4.1%, 4.7% and 5.1%, respectively.
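The abstract does not give STA's equations. The sketch below is only a rough illustration of segment-based, question-guided temporal attention. Per-frame features are mean-pooled into contiguous segments. A question vector then scores each segment with a dot product followed by a softmax.

The function names, the even segment split and the dot-product scoring are assumptions for illustration. This is not the authors' structured segment or two-stream attention component, and only the visual stream is shown.

```python
import math
from typing import List, Tuple

Vector = List[float]


def segment_means(frames: List[Vector], num_segments: int) -> List[Vector]:
    """Split per-frame features into contiguous segments and mean-pool each one."""
    n = len(frames)
    if not 1 <= num_segments <= n:
        raise ValueError("num_segments must be between 1 and len(frames)")
    dim = len(frames[0])
    segments = []
    for k in range(num_segments):
        # Integer boundaries spread frames as evenly as possible; no chunk is empty.
        start, end = k * n // num_segments, (k + 1) * n // num_segments
        chunk = frames[start:end]
        segments.append([sum(f[d] for f in chunk) / len(chunk) for d in range(dim)])
    return segments


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def question_guided_pool(segments: List[Vector], question: Vector) -> Tuple[List[float], Vector]:
    """Attend over segments with a question vector; return weights and pooled context.

    Assumption (illustrative only): question and segment features share one dimension.
    """
    scores = [sum(s * q for s, q in zip(seg, question)) for seg in segments]
    weights = softmax(scores)
    dim = len(segments[0])
    context = [sum(w * seg[d] for w, seg in zip(weights, segments)) for d in range(dim)]
    return weights, context


if __name__ == "__main__":
    frames = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]  # toy 2-D frame features
    segs = segment_means(frames, num_segments=2)               # [[1.0, 0.0], [0.0, 1.0]]
    weights, context = question_guided_pool(segs, question=[0.0, 5.0])
    print(weights, context)  # the second segment receives most of the weight
```

Segments that match the question dominate the pooled context. Background segments receive small weights but are not discarded outright.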


2014 ◽  
Vol 08 (03) ◽  
pp. 249-255
Author(s):  
Joseph R. Barr ◽  
Dimitri Popolov

This paper discusses principles for the design of natural language processing (NLP) systems that automatically extract data from doctors' notes, laboratory results and other medical documents in free-form text. We argue that, rather than searching for "atom units of meaning" in the text and then trying to generalize them to a broader set of documents through an increasingly complicated system of rules, an NLP practitioner should treat concepts as whole, meaningful units of text. This simplifies the rules and makes the NLP system easier to maintain and adapt. The point of departure is purely practical; however, a deeper investigation of typical problems in implementing such systems leads us to a discussion of the broader linguistic theories underlying NLP practice, such as theories of metaphor and models of human communication.
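The following is a toy illustration of the "concepts as whole units" principle. It assumes a hypothetical lexicon that maps entire phrases, including a negated one, to concept labels, plus a greedy longest-match scan. The lexicon entries and function name are invented for illustration and are not taken from the paper.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical concept lexicon: each whole multi-word phrase maps to one concept label.
CONCEPTS: Dict[Tuple[str, ...], str] = {
    ("shortness", "of", "breath"): "DYSPNEA",
    ("chest", "pain"): "CHEST_PAIN",
    ("no", "chest", "pain"): "CHEST_PAIN_NEGATED",
}


def extract_concepts(note: str, lexicon: Dict[Tuple[str, ...], str] = CONCEPTS) -> List[str]:
    """Greedy longest-match scan that emits one label per matched concept phrase."""
    tokens = re.findall(r"[a-z]+", note.lower())
    max_len = max(len(key) for key in lexicon)
    found: List[str] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first so "no chest pain" beats "chest pain".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in lexicon:
                found.append(lexicon[key])
                i += n
                break
        else:
            i += 1  # no concept starts at this token
    return found


if __name__ == "__main__":
    print(extract_concepts("Patient reports shortness of breath; no chest pain."))
```

Because "no chest pain" is stored as one unit, the negation needs no separate rule. The trade-off is a larger lexicon.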


Author(s):  
Thi Thanh Thuy Pham ◽  
Dinh-Duc Nguyen ◽  
Ba Hoang Phuc Ta ◽  
Thuy-Binh Nguyen ◽  
Thi-Ngoc-Diep Do ◽  
...  

2020 ◽  
pp. 205-228
Author(s):  
George A. Khachatryan

Instruction modeling is still in its early stages. This chapter discusses promising directions in which instruction modeling could develop in coming years. This includes increasing the richness of interfaces used in instruction modeling programs (e.g., by allowing students to enter responses in free form and have them graded via natural language processing); applying instruction modeling to subjects beyond mathematics, including English, foreign language, and science; using educational data mining to create automated “coaches” to help teachers better implement instruction modeling programs in their classrooms; creating approaches to instruction modeling that allow for rapid authorship of content; redesigning schools (in schedules as well as architecture) to optimize the use of instruction modeling; and putting in place government policies to encourage the use of comprehensive blended learning programs (such as those developed through instruction modeling).


Sensors ◽  
2020 ◽  
Vol 20 (18) ◽  
pp. 5279
Author(s):  
Yang Li ◽  
Huahu Xu ◽  
Junsheng Xiao

Language-based person search retrieves images of a target person using natural language descriptions and is a challenging fine-grained cross-modal retrieval task. We propose a novel hybrid attention network for the task, which has three components. First, a cubic attention mechanism for person images combines cross-layer spatial attention and channel attention. It fully exploits both important mid-level details and key high-level semantics to obtain more discriminative fine-grained feature representations of a person image. Second, a text attention network for language descriptions is based on a bidirectional LSTM (BiLSTM) and a self-attention mechanism. It better learns bidirectional semantic dependencies and captures the key words of sentences, so it extracts the contextual information and key semantic features of the language description more effectively and accurately. Third, a cross-modal attention mechanism and a joint loss function for cross-modal learning pay more attention to the relevant parts of the text and image features. Together they better exploit both cross-modal and intra-modal correlations and better address the problem of cross-modal heterogeneity. Extensive experiments on the CUHK-PEDES dataset show that our approach outperforms state-of-the-art approaches, demonstrating its advantages.
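The cross-modal attention idea can be sketched in a generic text-to-image form. Each word feature attends over image-region features, and the resulting word/context cosine similarities are averaged into a matching score.

This follows a common formulation of cross-modal attention and is an assumption, not the paper's exact mechanism. The BiLSTM, the cubic attention and the joint loss are omitted. The function names and the `temperature` parameter are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Vector, b: Vector) -> float:
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return 0.0 if na == 0 or nb == 0 else dot(a, b) / (na * nb)


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_score(words: List[Vector], regions: List[Vector], temperature: float = 1.0) -> float:
    """Text-to-image attention: each word attends over image regions, and the
    word/attended-region cosine similarities are averaged into one score.

    Assumption: word and region features already live in a shared embedding space.
    """
    sims = []
    for w in words:
        weights = softmax([dot(w, r) / temperature for r in regions])
        # Attended image context for this word: weighted sum of region features.
        attended = [sum(a * r[d] for a, r in zip(weights, regions)) for d in range(len(w))]
        sims.append(cosine(w, attended))
    return sum(sims) / len(sims)


if __name__ == "__main__":
    regions = [[1.0, 0.0], [0.0, 1.0]]  # toy region features
    print(cross_modal_score([[1.0, 0.0]], regions), cross_modal_score([[-1.0, 0.0]], regions))
```

In retrieval, candidate images would be ranked by this score for a given description. A lower `temperature` makes each word focus more sharply on a single region.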


2017 ◽  
Author(s):  
Han Yang ◽  
Marta R. Costa-jussà ◽  
José A. R. Fonollosa

Author(s):  
Xiangtan Lin ◽  
Pengzhen Ren ◽  
Yun Xiao ◽  
Xiaojun Chang ◽  
Alex Hauptmann

Person search has drawn increasing attention due to its real-world applications and research significance. Person search aims to find a probe person in a gallery of scene images and has a wide range of applications, such as criminal search, multi-camera tracking and missing-person search. Early person search works focused on image-based person search, which uses a person image as the search query. Text-based person search is another major category, which uses free-form natural language as the search query. Person search is challenging, and the corresponding solutions are diverse and complex; systematic surveys of the topic are therefore essential. This paper surveys recent works on image-based and text-based person search from the perspective of challenges and solutions. Specifically, we provide a brief analysis of highly influential person search methods with respect to three significant challenges: discriminative person features, the query-person gap, and detection-identification inconsistency. We summarise and compare evaluation results. Finally, we discuss open issues and some promising directions for future research.
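Once queries and gallery entries are embedded, both image-based and text-based person search reduce at evaluation time to ranking gallery entries by similarity to the query. The sketch below assumes a shared vector space and pre-cropped gallery entries with identity labels, which sidesteps the detection step. It reports top-k accuracy, a commonly reported metric: the fraction of queries whose correct identity appears among the k highest-ranked entries. The function names and data layout are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Entry = Tuple[str, Vector]  # (person identity, embedding); hypothetical layout


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 0.0 if na == 0 or nb == 0 else dot / (na * nb)


def rank_gallery(query: Vector, gallery: Sequence[Entry]) -> List[Entry]:
    """Sort gallery entries by cosine similarity to the query, most similar first."""
    return sorted(gallery, key=lambda entry: cosine(query, entry[1]), reverse=True)


def top_k_accuracy(queries: Sequence[Entry], gallery: Sequence[Entry], k: int) -> float:
    """Fraction of queries whose true identity appears in the top-k ranked gallery entries."""
    hits = 0
    for target_id, query in queries:
        top = rank_gallery(query, gallery)[:k]
        if any(pid == target_id for pid, _ in top):
            hits += 1
    return hits / len(queries)


if __name__ == "__main__":
    gallery = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [0.7, 0.7])]
    queries = [("a", [0.9, 0.1]), ("b", [0.6, 0.8])]
    print(top_k_accuracy(queries, gallery, k=1), top_k_accuracy(queries, gallery, k=2))  # prints 0.5 1.0
```

The second query ranks the distractor "c" first, so it only counts as a hit once k reaches 2. This is why results are usually reported at several values of k.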


2020 ◽  
Vol 32 (18) ◽  
pp. 14963-14973
Author(s):  
Meina Song ◽  
Wen Zhao ◽  
E. HaiHong

Natural language inference (NLI) is a basic task underlying many applications, such as question answering and paraphrase recognition. Existing methods have addressed the key issue of how an NLI model can benefit from external knowledge. Inspired by this, we further explore two problems: (1) how to make better use of external knowledge when its total amount is fixed, and (2) how to bring external knowledge into the NLI model more conveniently in application scenarios. In this paper, we propose a novel joint training framework that consists of a modified graph attention network, called the knowledge graph attention network, and an NLI model. We demonstrate that the proposed method outperforms the existing method that introduces external knowledge, and that it improves the performance of multiple NLI models without additional external knowledge.
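The abstract does not describe how the knowledge graph attention network modifies the standard graph attention layer. The sketch below therefore shows only the standard single-head GAT update (Veličković et al., 2018) that such a modification would start from. Each neighbour is scored by applying a learned vector to the concatenated projected features of the node and the neighbour, followed by a LeakyReLU. The scores are softmax-normalised, and the projected neighbour features are summed with those weights. The node names, weights and function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, x: Vector) -> Vector:
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gat_update(node: str, features: Dict[str, Vector], neighbours: Dict[str, List[str]],
               W: Matrix, a: Vector) -> Tuple[Dict[str, float], Vector]:
    """Single-head GAT-style update of one node (a self-loop is included).

    e_ij = LeakyReLU(a . [W h_i || W h_j]); alpha_ij = softmax over j of e_ij;
    h_i' = sum_j alpha_ij W h_j (the usual output nonlinearity is omitted here).
    """
    nbrs = [node] + [j for j in neighbours.get(node, []) if j != node]
    Wh = {j: matvec(W, features[j]) for j in nbrs}
    # List concatenation Wh[node] + Wh[j] implements [W h_i || W h_j]; len(a) == 2 * len(W).
    scores = [leaky_relu(sum(ai * xi for ai, xi in zip(a, Wh[node] + Wh[j]))) for j in nbrs]
    alpha = softmax(scores)
    h_new = [sum(al * Wh[j][d] for al, j in zip(alpha, nbrs)) for d in range(len(W))]
    return dict(zip(nbrs, alpha)), h_new


if __name__ == "__main__":
    features = {"cat": [1.0, 0.0], "animal": [0.0, 1.0], "pet": [1.0, 1.0]}  # toy node features
    neighbours = {"cat": ["animal", "pet"]}
    alpha, h = gat_update("cat", features, neighbours, W=[[1.0, 0.0], [0.0, 1.0]], a=[0.0, 0.0, 1.0, 1.0])
    print(alpha, h)
```

The attention weights decide how much each related concept contributes to a node's updated representation. This is the mechanism a knowledge-graph variant would adapt.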


2017 ◽  
Vol 11 (03) ◽  
pp. 345-371
Author(s):  
Avani Chandurkar ◽  
Ajay Bansal

Since the inception of the World Wide Web, the amount of data on the Internet has grown tremendously, which makes navigating this enormous amount of data quite difficult for users. As users struggle to navigate this wealth of information, the need for an automated system that can extract the required information becomes urgent. This paper presents a Question Answering system to ease the process of information retrieval. Question Answering systems have been around for quite some time and are a sub-field of information retrieval and natural language processing. The task of any Question Answering system is to find an answer to a free-form factual question. The difficulty of pinpointing and verifying the precise answer makes question answering more challenging than the simple information retrieval performed by search engines. The research objective of this paper is to develop a novel approach to Question Answering based on a composition of conventional Information Retrieval (IR) and Natural Language Processing (NLP) approaches. The focus is on using a structured and annotated knowledge base instead of an unstructured one. The knowledge base used here is DBpedia, and the final system is evaluated on the Text REtrieval Conference (TREC) 2004 questions dataset.

