Look, Listen, and Attend: Co-Attention Network for Self-Supervised Audio-Visual Representation Learning

Author(s):  
Ying Cheng ◽  
Ruize Wang ◽  
Zhihao Pan ◽  
Rui Feng ◽  
Yuejie Zhang
2021 ◽  
Vol 2 (2) ◽  
pp. 1-18
Author(s):  
Hongchao Gao ◽  
Yujia Li ◽  
Jiao Dai ◽  
Xi Wang ◽  
Jizhong Han ◽  
...  

Recognizing irregular text in natural scene images is challenging due to the unconstrained appearance of text, such as curvature, orientation, and distortion. Recent recognition networks treat this task as a text sequence labeling problem, and most capture the sequence from only a single-granularity visual representation, which limits recognition performance. In this article, we propose a hierarchical attention network that captures multi-granularity deep local representations for recognizing irregular scene text. It consists of several hierarchical attention blocks, each containing a Local Visual Representation Module (LVRM) and a Decoder Module (DM). Based on this hierarchical attention network, we build a scene text recognition network. Extensive experiments show that the proposed network achieves state-of-the-art performance on several benchmark datasets, including IIIT-5K, SVT, CUTE, SVT-Perspective, and ICDAR, with shorter training time.
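The abstract does not give implementation details, so the following is only a minimal PyTorch-style sketch of how one hierarchical attention block could pair a Local Visual Representation Module with an attention decoder. Everything beyond the names LVRM and DM (module internals, shapes, head counts) is an assumption, not the authors' design.

import torch
import torch.nn as nn

class LVRM(nn.Module):
    # Local Visual Representation Module: refines the previous block's
    # feature map into a finer-granularity local representation (assumed conv-based).
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, feats):            # feats: (B, C, H, W)
        return self.conv(feats)

class DM(nn.Module):
    # Decoder Module: attends over the local features to predict character logits.
    def __init__(self, channels, hidden, num_classes):
        super().__init__()
        self.proj = nn.Linear(channels, hidden)
        self.attn = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
        self.cls = nn.Linear(hidden, num_classes)

    def forward(self, feats, queries):   # queries: (B, T, hidden) character queries
        B, C, H, W = feats.shape
        kv = self.proj(feats.flatten(2).transpose(1, 2))  # (B, H*W, hidden)
        ctx, _ = self.attn(queries, kv, kv)               # attend to local features
        return self.cls(ctx)                              # (B, T, num_classes)

Stacking several such blocks, each decoding from a different feature granularity, would realize the multi-granularity idea the abstract describes.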


2019 ◽  
Author(s):  
Stephan Spiegel ◽  
Imtiaz Hossain ◽  
Christopher Ball ◽  
Xian Zhang

Motivation: The clustering of biomedical images according to their phenotype is an important step in early drug discovery. Modern high-content screening devices easily produce thousands of cell images, but the resulting data is usually unlabelled, and extra effort is required to construct a visual representation that supports grouping according to the presented morphological characteristics.
Results: We introduce a novel approach to visual representation learning that is guided by metadata. In high-content screening, metadata can typically be derived from the experimental layout, which links each cell image of a particular assay to the tested chemical compound and the corresponding compound concentration. In general, there exists a one-to-many relationship between phenotype and compound, since various molecules and different dosages can lead to one and the same alteration in biological cells. Our empirical results show that metadata-guided visual representation learning is an effective approach for clustering biomedical images. We have evaluated our proposed approach on both benchmark and real-world biological data. Furthermore, we have juxtaposed implicit and explicit learning techniques, which differ in both loss function and batch construction. Our experiments demonstrate that metadata-guided visual representation learning is able to identify commonalities and distinguish differences in visual appearance that lead to meaningful clusters, even without image-level annotations.
Note: Please refer to the supplementary material for implementation details on metadata-guided visual representation learning strategies.
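As a rough illustration of the explicit, metadata-guided strategy, the sketch below uses compound identifiers from the experimental layout as weak labels in a batch-hard triplet loss. The function name, the batch construction, and the margin value are assumptions for illustration; the authors' actual loss and batching are described in their supplementary material.

import torch
import torch.nn.functional as F

def metadata_triplet_loss(embeddings, compound_ids, margin=0.2):
    # embeddings: (N, D) image embeddings from any encoder
    # compound_ids: (N,) integer metadata labels from the plate layout
    dist = torch.cdist(embeddings, embeddings)                     # (N, N) pairwise distances
    same = compound_ids.unsqueeze(0) == compound_ids.unsqueeze(1)  # images share a compound?
    eye = torch.eye(len(embeddings), dtype=torch.bool, device=embeddings.device)
    hardest_pos = dist.masked_fill(~same | eye, float('-inf')).amax(dim=1)
    hardest_neg = dist.masked_fill(same, float('inf')).amin(dim=1)
    return F.relu(hardest_pos - hardest_neg + margin).mean()

# Usage with a batch sampled so that each compound appears several times:
emb = F.normalize(torch.randn(32, 128), dim=1)
ids = torch.randint(0, 8, (32,))
loss = metadata_triplet_loss(emb, ids)

Pulling together images of the same compound and pushing apart different compounds is one way to exploit the one-to-many phenotype-compound relationship without image-level annotations.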


Author(s):  
J. Hemavathy ◽  
E. Arul Jothi ◽  
R. Nishalini ◽  
M. Oviya

Recently, machine learning has achieved considerable success in multi-view representation learning. Since the effectiveness of data mining methods depends heavily on the quality of the data representation, multi-view representation learning has become a very promising topic with widespread applications. It is an emerging direction in data mining that exploits multi-view learning to improve generalization performance. Multi-view learning is also known as data fusion or data integration from multiple feature sets. In general, multi-view representation learning is able to learn informative and cohesive representations that improve the performance of predictors. It has therefore been widely applied in many real-world applications, including multimedia retrieval, natural language processing, video analysis, and recommender systems. We consider two main strategies for multi-view representation learning: (i) multi-view representation alignment, which aims to capture the relationships among different views through feature alignment; and (ii) multi-view representation fusion, which seeks to combine the features learned from multiple different views into a single compact representation. Both strategies exploit the complementary information contained in multiple views to represent the data comprehensively. In this project we use canonical correlation analysis (CCA) to align the views. Encouraged by the success of deep learning, deep multi-view representation learning has attracted much attention in multimedia retrieval due to its ability to learn expressive representations.
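Since the project builds on canonical correlation analysis, here is a minimal two-view alignment example using scikit-learn's classical CCA. The deep multi-view variants discussed above would replace this linear model, and the synthetic data is purely illustrative.

import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
view1 = rng.normal(size=(200, 20))                 # e.g. visual features
view2 = view1 @ rng.normal(size=(20, 15)) \
        + 0.1 * rng.normal(size=(200, 15))         # correlated second view (e.g. text)

cca = CCA(n_components=4)
z1, z2 = cca.fit_transform(view1, view2)           # aligned latent representations

# Correlation per canonical component; high values indicate well-aligned views.
print([round(float(np.corrcoef(z1[:, k], z2[:, k])[0, 1]), 3) for k in range(4)])

CCA projects both views into a shared latent space where corresponding components are maximally correlated, which is exactly the alignment objective that strategy (i) describes.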


2021 ◽  
Author(s):  
Zhenda Xie ◽  
Yutong Lin ◽  
Zheng Zhang ◽  
Yue Cao ◽  
Stephen Lin ◽  
...  

IEEE Access ◽  
2019 ◽  
Vol 7 ◽  
pp. 172683-172693 ◽  
Author(s):  
Baoyuan Wu ◽  
Weidong Chen ◽  
Yanbo Fan ◽  
Yong Zhang ◽  
Jinlong Hou ◽  
...  
