Position Focused Attention Network for Image-Text Matching

Author(s):  
Yaxiong Wang ◽  
Hao Yang ◽  
Xueming Qian ◽  
Lin Ma ◽  
Jing Lu ◽  
...  

Image-text matching tasks have recently attracted a lot of attention in the computer vision field. The key point of this cross-domain problem is how to accurately measure the similarity between the visual and the textual contents, which demands a fine understanding of both modalities. In this paper, we propose a novel position focused attention network (PFAN) to investigate the relation between the visual and the textual views. In this work, we integrate the object position cue to enhance visual-text joint-embedding learning. We first split the images into blocks, from which we infer the relative position of each region in the image. Then, an attention mechanism is proposed to model the relations between image regions and blocks and to generate a valuable position feature, which is further utilized to enhance the region expression and to model a more reliable relationship between the visual image and the textual sentence. Experiments on the popular datasets Flickr30K and MS-COCO show the effectiveness of the proposed method. Besides the public datasets, we also conduct experiments on our collected practical news dataset (Tencent-News) to validate the practical application value of the proposed method. To the best of our knowledge, this is the first attempt to evaluate image-text matching in such a practical application setting. Our method achieves state-of-the-art performance on all three datasets.
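The position attention described in the abstract (a region feature attending over image-block position embeddings, with the attended position feature enhancing the region expression) can be sketched roughly as follows. This is a minimal NumPy illustration, not the paper's implementation; all names, dimensions, and the concatenation-based enhancement are assumptions for illustration only.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def position_enhanced_region(region_feat, block_embs):
    """Attend over block-position embeddings using a region feature as query,
    then enhance the region feature with the attended position feature."""
    scores = block_embs @ region_feat       # (K,) relevance of each block
    weights = softmax(scores)               # attention distribution over K blocks
    pos_feat = weights @ block_embs         # (D,) weighted position feature
    return np.concatenate([region_feat, pos_feat])  # (2D,) enhanced expression

rng = np.random.default_rng(0)
region = rng.standard_normal(8)             # one region feature (D = 8)
blocks = rng.standard_normal((16, 8))       # 16 image blocks, 8-dim embeddings
enhanced = position_enhanced_region(region, blocks)
```

The enhanced vector would then feed into the joint-embedding similarity computation between image and sentence.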

2021 ◽  
Author(s):  
Yang Liu ◽  
Huaqiu Wang ◽  
Fanyang Meng ◽  
Mengyuan Liu ◽  
Hong Liu

Sensors ◽  
2021 ◽  
Vol 21 (15) ◽  
pp. 5172
Author(s):  
Yuying Dong ◽  
Liejun Wang ◽  
Shuli Cheng ◽  
Yongming Li

Considerable research and surveys indicate that skin lesions are an early symptom of skin cancer, and segmentation of skin lesions remains a hot research topic. Dermatological datasets used in skin lesion segmentation tasks generate a large number of parameters when the data are augmented, limiting the application of smart assisted medicine in real life. Hence, this paper proposes an effective feedback attention network (FAC-Net). The network is equipped with a feedback fusion block (FFB) and an attention mechanism block (AMB); through the combination of these two modules, we obtain richer and more specific feature maps without data augmentation. We conducted numerous experiments on public datasets (ISIC2018, ISBI2017, ISBI2016), using metrics such as the Jaccard index (JA) and Dice coefficient (DC) to evaluate the segmentation results. On the ISIC2018 dataset, we obtained a DC of 91.19% and a JA of 83.99%; compared with the baseline network, both of these main metrics improved by more than 1%. The metrics also improved on the other two datasets. The experiments demonstrate that, without any augmentation of the datasets, our lightweight model achieves better segmentation performance than most deep learning architectures.
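The abstract does not detail the attention mechanism block (AMB); one common form such a block may resemble is squeeze-and-excitation-style channel attention, sketched below in NumPy. The shapes, weight names, and the SE-style design are assumptions for illustration, not the FAC-Net architecture itself.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    """SE-style channel attention: squeeze spatially, then gate each channel."""
    # feat: (C, H, W) feature map; w1: (C//r, C) and w2: (C, C//r) projections.
    squeezed = feat.mean(axis=(1, 2))                        # (C,) global average pool
    gates = sigmoid(w2 @ np.maximum(w1 @ squeezed, 0.0))     # (C,) per-channel gates
    return feat * gates[:, None, None]                       # reweighted feature map

rng = np.random.default_rng(1)
feat = rng.standard_normal((8, 4, 4))        # hypothetical 8-channel 4x4 map
w1 = rng.standard_normal((2, 8))             # reduction ratio r = 4
w2 = rng.standard_normal((8, 2))
out = channel_attention(feat, w1, w2)
```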


Author(s):  
Lianli Gao ◽  
Pengpeng Zeng ◽  
Jingkuan Song ◽  
Yuan-Fang Li ◽  
Wu Liu ◽  
...  

To date, visual question answering (VQA) (i.e., image QA and video QA) is still a holy grail in vision and language understanding, especially for video QA. Compared with image QA, which focuses primarily on understanding the associations between image region-level details and corresponding questions, video QA requires a model to jointly reason across both the spatial and long-range temporal structures of a video as well as the text to provide an accurate answer. In this paper, we specifically tackle the problem of video QA by proposing a Structured Two-stream Attention network, namely STA, to answer a free-form or open-ended natural language question about the content of a given video. First, we infer rich long-range temporal structures in videos using our structured segment component and encode text features. Then, our structured two-stream attention component simultaneously localizes important visual instances, reduces the influence of the background video, and focuses on the relevant text. Finally, the structured two-stream fusion component incorporates different segments of the query- and video-aware context representations and infers the answers. Experiments on the large-scale video QA dataset TGIF-QA show that our proposed method significantly surpasses the best counterpart (i.e., with one representation for the video input) by 13.0%, 13.5%, 11.0% and 0.3 on the Action, Trans., FrameQA and Count tasks. It also outperforms the best competitor (i.e., with two representations) on the Action, Trans., and FrameQA tasks by 4.1%, 4.7%, and 5.1%, respectively.
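The two-stream idea above (question attending over video segments, and the video attending back over the question words, with the two attended contexts fused) can be sketched minimally in NumPy. This is an illustrative toy under assumed shapes and a simple dot-product attention, not the STA model.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(query, keys):
    """Dot-product attention of one query vector over a set of key vectors."""
    weights = softmax(keys @ query)      # (N,) attention distribution
    return weights @ keys, weights       # attended context vector and weights

rng = np.random.default_rng(2)
q_text = rng.standard_normal(16)             # encoded question (16-dim, assumed)
segments = rng.standard_normal((10, 16))     # 10 video segment features
words = rng.standard_normal((6, 16))         # 6 question word features

# Stream 1: the question attends over video segments.
# Stream 2: a pooled video vector attends back over the question words.
v_ctx, v_w = attend(q_text, segments)
t_ctx, t_w = attend(segments.mean(axis=0), words)
fused = np.concatenate([v_ctx, t_ctx])       # simple fusion of both streams
```

A real model would feed the fused representation to an answer decoder and learn all projections end to end.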


2022 ◽  
pp. 1-1
Author(s):  
Kun Zhang ◽  
Zhendong Mao ◽  
Anan Liu ◽  
Yongdong Zhang

Complexity ◽  
2020 ◽  
Vol 2020 ◽  
pp. 1-9 ◽  
Author(s):  
Xiaochao Fan ◽  
Hongfei Lin ◽  
Liang Yang ◽  
Yufeng Diao ◽  
Chen Shen ◽  
...  

Humor refers to the quality of being amusing. With the development of artificial intelligence, humor recognition is attracting a lot of research attention. Although phonetics and ambiguity have been introduced by previous studies, existing recognition methods still lack suitable feature designs for neural networks. In this paper, we illustrate that the phonetic structure and the ambiguity associated with confusing words need to be learned as their own representations by the neural network. We then propose the Phonetics and Ambiguity Comprehension Gated Attention network (PACGA) to learn phonetic structures and semantic representations for humor recognition. The PACGA model can well represent phonetic information and semantic information with ambiguous words, which is of great benefit to humor recognition. Experimental results on two public datasets demonstrate the effectiveness of our model.
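A gated combination of a phonetic and a semantic representation, as the gated-attention idea above suggests, can be sketched in NumPy. The gating form (a learned sigmoid gate interpolating the two streams per dimension) and all shapes are assumptions for illustration, not the published PACGA architecture.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(phonetic, semantic, w_g):
    """Per-dimension gate deciding how much phonetic vs. semantic signal to keep."""
    gate = sigmoid(w_g @ np.concatenate([phonetic, semantic]))  # (D,) in (0, 1)
    return gate * phonetic + (1.0 - gate) * semantic

rng = np.random.default_rng(3)
p = rng.standard_normal(8)            # phonetic-structure representation
s = rng.standard_normal(8)            # semantic (ambiguity-aware) representation
w_g = rng.standard_normal((8, 16))    # gate projection (hypothetical)
fused = gated_fusion(p, s, w_g)
```

Because the gate lies in (0, 1), each fused dimension is a convex combination of the corresponding phonetic and semantic values.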


2020 ◽  
Author(s):  
Zhichao Xia ◽  
Cheng Wang ◽  
Roeland Hancock ◽  
Maaike Vandermosten ◽  
Fumiko Hoeft

The importance of (inherited) genetic impact in reading development is well established. De novo mutation is another important contributor that has recently been gathering interest as a major liability in neurodevelopmental disorders, but it has been neglected in reading research to date. Paternal age at childbirth (PatAGE) is known as the most prominent risk factor for de novo mutation, as molecular genetic studies have repeatedly shown. As one of the first efforts, we performed a preliminary investigation of the relationship between PatAGE, offspring's reading, and brain structure in a longitudinal neuroimaging study following 51 children from kindergarten through third grade. The results showed that greater PatAGE was significantly associated with worse reading, explaining an additional 9.5% of the variance after controlling for a number of confounds, including familial factors and cognitive-linguistic reading precursors. Moreover, this effect was mediated by volumetric maturation of the left posterior thalamus from ages 5 to 8. Complementary analyses using offspring's diffusion MRI data, brain atlases, and public datasets indicated that the PatAGE-related thalamic region was most likely located in the pulvinar nuclei and related to the dorsal attention network. Altogether, these findings provide novel insights into the neurocognitive mechanisms underlying the PatAGE effect on reading acquisition during its earliest phase and suggest promising areas of future research.
Highlights:
Paternal age at childbirth (PatAGE) is negatively correlated with reading in offspring.
PatAGE is related to volumetric maturation of the thalamus.
Brain maturation mediates the PatAGE effect on reading.
The PatAGE-related thalamic area is connected to the dorsal attention network.
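The mediation claim above (PatAGE acting on reading through thalamic maturation) follows a standard regression-based decomposition of total effect into indirect and direct paths, sketched here on synthetic data. Variable names, effect sizes, and the simple Baron-Kenny-style estimator are illustrative only and are not the study's analysis pipeline.

```python
import numpy as np

def mediation(x, m, y):
    """Indirect effect a*b from X -> M -> Y, plus the direct X -> Y path."""
    X1 = np.column_stack([np.ones_like(x), x])
    a = np.linalg.lstsq(X1, m, rcond=None)[0][1]     # path a: X -> M
    X2 = np.column_stack([np.ones_like(x), x, m])
    coef = np.linalg.lstsq(X2, y, rcond=None)[0]
    direct, b = coef[1], coef[2]                     # direct X -> Y, path b: M -> Y
    return a * b, direct

rng = np.random.default_rng(4)
pat_age = rng.standard_normal(500)                            # standardized predictor
thalamus = 0.5 * pat_age + 0.1 * rng.standard_normal(500)     # synthetic mediator
reading = -0.8 * thalamus + 0.1 * rng.standard_normal(500)    # synthetic outcome
indirect, direct = mediation(pat_age, thalamus, reading)
```

With this fully mediated synthetic setup, the indirect effect recovers roughly 0.5 * (-0.8) = -0.4 while the direct path stays near zero.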


Author(s):  
N. A.  Simbirtseva ◽  

The article discusses the importance of competent perception and interpretation of a visual image / visual text by a person of the XXI century through the development of critical thinking skills, which are relevant in the study of the humanities. Existing methods and technologies rest on the psychological foundations of visual perception and the mental operations associated with it, which determine the "grasping" of an image in its integrity, the subjectivity of experience, associative thinking, and memorization. The main components of critical thinking are highlighted. The purposeful nature of the method of segment analysis of visual text and its pedagogical significance in the humanities are emphasized. The main results of the study are as follows: the psychological characteristics of the perception of a visual image / visual text and the features of critical thinking in the perception of visual information are revealed, and the method of segment analysis and the practice of its application in humanities education are described.


IEEE Access ◽  
2020 ◽  
Vol 8 ◽  
pp. 38438-38447
Author(s):  
Zhong Ji ◽  
Zhigang Lin ◽  
Haoran Wang ◽  
Yuqing He
