Regenerating Image Caption with High-Level Semantics

Author(s):  
Wei-Dong Tian ◽  
Nan-Xun Wang ◽  
Yue-Lin Sun ◽  
Zhong-Qiu Zhao
2020 ◽  
Vol 34 (05) ◽  
pp. 9571-9578 ◽  
Author(s):  
Wei Zhang ◽  
Yue Ying ◽  
Pan Lu ◽  
Hongyuan Zha

Personalized image captioning, a natural extension of the standard image captioning task, requires generating brief image descriptions tailored to users' writing styles and traits, and is more practical for meeting users' real demands. Only a few recent studies address this task, and they learn static user representations that capture long-term literal-preference. This is insufficient for satisfactory performance, because users exhibit not only long-term literal-preference but also short-term literal-preference, which is associated with their recent states. To bridge this gap, we develop a novel multimodal hierarchical transformer network (MHTN) for personalized image captioning. At the low level, a short-term user encoder learns short-term user literal-preference from users' recent captions. At the high level, a multimodal encoder integrates target image representations with short-term literal-preference, as well as long-term literal-preference learned from user IDs. Both encoders are built on transformer networks. Extensive experiments on two real datasets show the effectiveness of considering the two types of user literal-preference simultaneously, and better performance than state-of-the-art models.


2019 ◽  
Vol 123 ◽  
pp. 89-95 ◽  
Author(s):  
Songtao Ding ◽  
Shiru Qu ◽  
Yuling Xi ◽  
Arun Kumar Sangaiah ◽  
Shaohua Wan

Author(s):  
Polina Kuznetsova ◽  
Vicente Ordonez ◽  
Tamara L. Berg ◽  
Yejin Choi

We present a new tree-based approach to composing expressive image descriptions that makes use of naturally occurring web images with captions. We investigate two related tasks: image caption generalization and generation, where the former is an optional subtask of the latter. The high-level idea of our approach is to harvest expressive phrases (as tree fragments) from existing image descriptions, then to compose a new description by selectively combining the extracted (and optionally pruned) tree fragments. Key algorithmic components are tree composition and compression, both integrating tree structure with sequence structure. Our proposed system attains significantly better performance than previous approaches for both image caption generalization and generation. In addition, our work is the first to show the empirical benefit of automatically generalized captions for composing natural image descriptions.


2018 ◽  
Vol 24 (3) ◽  
pp. 325-362
Author(s):  
A. BELZ ◽  
T.L. BERG ◽  
L. YU

Work in computer vision and natural language processing involving images and text has been experiencing explosive growth over the past decade, with a particular boost coming from the neural network revolution. The present volume brings together five research articles from several different corners of the area: multilingual multimodal image description (Frank et al.), multimodal machine translation (Madhyastha et al., Frank et al.), image caption generation (Madhyastha et al., Tanti et al.), visual scene understanding (Silberer et al.), and multimodal learning of high-level attributes (Sorodoc et al.). In this article, we touch upon all of these topics as we review work involving images and text under the three main headings of image description (Section 2), visually grounded referring expression generation (REG) and comprehension (Section 3), and visual question answering (VQA) (Section 4).


Author(s):  
David P. Bazett-Jones ◽  
Mark L. Brown

A multisubunit RNA polymerase enzyme is ultimately responsible for transcription initiation and elongation of RNA, but recognition of the proper start site by the enzyme is regulated by general, temporal and gene-specific trans-factors interacting at promoter and enhancer DNA sequences. To understand the molecular mechanisms which precisely regulate the transcription initiation event, it is crucial to elucidate the structure of the transcription factor/DNA complexes involved. Electron spectroscopic imaging (ESI) provides the opportunity to visualize individual DNA molecules. Enhancement of DNA contrast with ESI is accomplished by imaging with electrons that have interacted with inner shell electrons of phosphorus in the DNA backbone. Phosphorus detection at this intermediately high level of resolution (≈1 nm) permits selective imaging of the DNA, to determine whether the protein factors compact, bend or wrap the DNA. Simultaneously, mass analysis and phosphorus content can be measured quantitatively, using adjacent DNA or tobacco mosaic virus (TMV) as mass and phosphorus standards. These two parameters provide stoichiometric information relating the ratios of protein:DNA content.
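
As a rough illustration of how a total mass and a phosphorus count could be turned into a protein:DNA ratio, the sketch below splits a complex's mass into DNA and protein parts. It is not the authors' procedure. The function name, the input values in the usage note, and the figure of about 330 Da per nucleotide (one phosphorus atom each) are assumptions made for this example, and the protein is assumed to contribute no phosphorus.

```python
# Assumption: average mass of one DNA nucleotide residue, in daltons.
# Each nucleotide carries exactly one backbone phosphorus atom.
AVG_NUCLEOTIDE_MASS_DA = 330.0


def protein_dna_stoichiometry(total_mass_da, phosphorus_atoms, protein_subunit_mass_da):
    """Estimate DNA length and protein copy number for one protein/DNA complex.

    total_mass_da: measured mass of the whole complex (Da).
    phosphorus_atoms: measured phosphorus content of the complex (atoms).
    protein_subunit_mass_da: known mass of one protein copy (Da).
    Assumes all detected phosphorus belongs to the DNA backbone.
    """
    dna_mass = phosphorus_atoms * AVG_NUCLEOTIDE_MASS_DA
    protein_mass = total_mass_da - dna_mass
    if protein_mass < 0:
        raise ValueError("phosphorus count implies more DNA mass than the total mass")
    return {
        "base_pairs": phosphorus_atoms / 2,  # two nucleotides per base pair
        "dna_mass_da": dna_mass,
        "protein_mass_da": protein_mass,
        "protein_copies": protein_mass / protein_subunit_mass_da,
    }
```

With hypothetical inputs such as `protein_dna_stoichiometry(232_000, 400, 50_000)`, the phosphorus count fixes the DNA share of the mass, and the remainder is attributed to protein and divided by the subunit mass to give a copy number.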


Author(s):  
J. S. Wall

The forte of the Scanning Transmission Electron Microscope (STEM) is high resolution imaging with high contrast on thin specimens, as demonstrated by visualization of single heavy atoms. Of equal importance for biology is the efficient utilization of all available signals, permitting low dose imaging of unstained single molecules such as DNA. Our work at Brookhaven has concentrated on: 1) design and construction of instruments optimized for a narrow range of biological applications and 2) use of such instruments in a very active user/collaborator program. Therefore our program is highly interactive with a strong emphasis on producing results which are interpretable with a high level of confidence. The major challenge we face at the moment is specimen preparation. The resolution of the STEM is better than 2.5 Å, but measurements of resolution vs. dose level off at a resolution of 20 Å at a dose of 10 el/Å² on a well-behaved biological specimen such as TMV (tobacco mosaic virus). To track down this problem we are examining all aspects of specimen preparation: purification of biological material, deposition on the thin film substrate, washing, fast freezing and freeze drying. As we attempt to improve our equipment/technique, we use image analysis of TMV internal controls included in all STEM samples as a monitor sensitive enough to detect even a few percent improvement. For delicate specimens, carbon films can be very harsh, leading to disruption of the sample. Therefore we are developing conducting polymer films as alternative substrates, as described elsewhere in these Proceedings. For specimen preparation studies, we have identified (from our user/collaborator program) a variety of “canary” specimens, each uniquely sensitive to one particular aspect of sample preparation, so we can attempt to separate the variables involved.


2020 ◽  
Vol 29 (4) ◽  
pp. 738-761
Author(s):  
Tess K. Koerner ◽  
Melissa A. Papesh ◽  
Frederick J. Gallun

Purpose: A questionnaire survey was conducted to collect information from clinical audiologists about rehabilitation options for adult patients who report significant auditory difficulties despite having normal or near-normal hearing sensitivity. This work aimed to provide more information about what audiologists are currently doing in the clinic to manage auditory difficulties in this patient population and their views on the efficacy of recommended rehabilitation methods. Method: A questionnaire survey containing multiple-choice and open-ended questions was developed and disseminated online. Invitations to participate were delivered via e-mail listservs and through business cards provided at annual audiology conferences. All responses were anonymous at the time of data collection. Results: Responses were collected from 209 participants. The majority of participants reported seeing at least one normal-hearing patient per month who reported significant communication difficulties. However, few respondents indicated that their location had specific protocols for the treatment of these patients. Counseling was reported as the most frequent rehabilitation method, but results revealed that audiologists across various work settings are also successfully starting to fit patients with mild-gain hearing aids. Responses indicated that patient compliance with computer-based auditory training methods was regarded as low, with patients generally preferring device-based rehabilitation options. Conclusions: Results from this questionnaire survey strongly suggest that audiologists frequently see normal-hearing patients who report auditory difficulties, but that few clinicians are equipped with established protocols for diagnosis and management. While many feel that mild-gain hearing aids provide considerable benefit for these patients, very little research has been conducted to date to support the use of hearing aids or other rehabilitation options for this unique patient population. This study reveals the critical need for additional research to establish evidence-based practice guidelines that will empower clinicians to provide a high level of clinical care and effective rehabilitation strategies to these patients.

