Evaluating Gender-Neutral Training Data for Automated Image Captioning

Author(s):  
Jack J Amend ◽  
Albatool Wazzan ◽  
Richard Souvenir
2020 ◽  
Vol 34 (03) ◽  
pp. 2693-2700
Author(s):  
Paul Hongsuck Seo ◽  
Piyush Sharma ◽  
Tomer Levinboim ◽  
Bohyung Han ◽  
Radu Soricut

Human ratings are currently the most accurate way to assess the quality of an image captioning model, yet most often the only outcome used from an expensive human rating evaluation is a few overall statistics over the evaluation dataset. In this paper, we show that the signal from instance-level human caption ratings can be leveraged to improve captioning models, even when the amount of caption ratings is several orders of magnitude smaller than the caption training data. We employ a policy gradient method to maximize the human ratings as rewards in an off-policy reinforcement learning setting, where policy gradients are estimated from samples drawn from a distribution that focuses on the captions in a caption ratings dataset. Our empirical evidence indicates that the proposed method learns to generalize the human raters' judgments to a previously unseen set of images, as judged by a different set of human judges, and additionally under a different, multi-dimensional side-by-side human evaluation procedure.
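The off-policy setup described above can be sketched as an importance-weighted REINFORCE estimator: samples come from a behavior distribution concentrated on the rated captions, while the gradient updates the learned caption policy. The following toy sketch (all names are illustrative; the paper's actual policy is a full captioning model, not a categorical distribution over a fixed candidate set) computes the exact importance-weighted gradient over a small candidate pool:

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def offpolicy_pg_grad(logits, behavior_probs, ratings):
    """Importance-weighted REINFORCE gradient for a categorical policy.

    Samples are drawn from `behavior_probs` (concentrated on rated captions),
    while `logits` parameterize the learned caption policy pi. The estimator is
        E_b[ w * r * grad log pi ],   with importance weight w = pi / b.
    Here we sum the exact expectation over the small candidate set.
    """
    pi = softmax(logits)
    grads = [0.0] * len(logits)
    for i, (b, r) in enumerate(zip(behavior_probs, ratings)):
        w = pi[i] / b  # importance weight pi(i) / b(i)
        for j in range(len(logits)):
            # d/d logit_j of log pi_i is (1[i == j] - pi_j)
            indicator = 1.0 if i == j else 0.0
            grads[j] += b * w * r * (indicator - pi[j])
    return grads
```

With a uniform policy and a single positively rated caption, the gradient pushes that caption's logit up and the others down, as expected for a reward-maximizing update.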


Author(s):  
Xianyu Chen ◽  
Ming Jiang ◽  
Qi Zhao

Image captioning models depend on training with paired image-text corpora, which poses various challenges in describing images containing novel objects absent from the training data. While previous novel object captioning methods rely on external image taggers or object detectors to describe novel objects, we present the Attention-based Novel Object Captioner (ANOC), which complements novel object captioners with human attention features that characterize generally important, task-independent information. It introduces a gating mechanism that adaptively combines human attention with self-learned machine attention, together with a Constrained Self-Critical Sequence Training method that addresses exposure bias while maintaining the constraints of novel object descriptions. Extensive experiments conducted on the nocaps and Held-Out COCO datasets demonstrate that our method considerably outperforms state-of-the-art novel object captioners. Our source code is available at https://github.com/chenxy99/ANOC.
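The gating idea described above can be illustrated with a minimal sketch: a scalar sigmoid gate blends a machine attention map with a human attention prior and renormalizes the result. This is only an assumption-level illustration of the mechanism the abstract names; in ANOC itself the gate is learned from features rather than passed in as a logit:

```python
import math

def gated_attention(machine_att, human_att, gate_logit):
    """Blend self-learned machine attention with a human attention prior.

    `gate_logit` is squashed to a gate g in (0, 1); the fused map is
    g * machine + (1 - g) * human, renormalized to a distribution.
    """
    g = 1.0 / (1.0 + math.exp(-gate_logit))  # sigmoid gate
    fused = [g * m + (1.0 - g) * h for m, h in zip(machine_att, human_att)]
    s = sum(fused)
    return [f / s for f in fused]  # renormalize so weights sum to 1
```

A large positive gate logit recovers the machine attention almost exactly, while a large negative one defers to the human attention prior.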


Author(s):  
Kota Akshith Reddy ◽  
Satish C J ◽  
Jahnavi Polsani ◽  
Teja Naveen Chintapalli ◽  
...  

Automatic image caption generation is one of the core problems in the field of deep learning. Data augmentation is a technique for increasing the amount of training data at hand by transforming existing samples with operations such as flipping, rotating, zooming, and brightening. In this work, we create an image captioning model and check its robustness against the major types of image augmentation techniques. The results show the fuzziness of the model when working with the same image under different augmentations: a different caption is produced each time a different augmentation technique is employed. We also show the change in the model's performance after applying these augmentation techniques. The Flickr8k dataset is used for this study, with BLEU score as the evaluation metric for the image captioning model.
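The robustness check described above amounts to captioning each augmented view of an image and counting how often the caption matches the original. A minimal sketch, using nested lists as toy grayscale images and a caller-supplied captioner (all function names here are illustrative, not from the paper):

```python
def hflip(img):
    """Horizontal flip: reverse each row of a 2D grayscale image."""
    return [row[::-1] for row in img]

def brighten(img, delta=30):
    """Brighten: add `delta` to every pixel, clipped to 255."""
    return [[min(255, p + delta) for p in row] for row in img]

def caption_consistency(captioner, img, augmentations):
    """Caption each augmented view of `img` and count how many views
    receive the same caption as the unaugmented original."""
    base = captioner(img)
    return sum(1 for aug in augmentations if captioner(aug(img)) == base)
```

In practice the captioner would be a trained model and the comparison would use BLEU against reference captions rather than exact string equality, but the control flow is the same.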


2022 ◽  
Vol 4 ◽  
Author(s):  
Ziyan Yang ◽  
Leticia Pinto-Alva ◽  
Franck Dernoncourt ◽  
Vicente Ordonez

People are able to describe images using thousands of languages, but all languages share a single visual world. The aim of this work is to use the learned intermediate visual representations from a deep convolutional neural network to transfer information across languages for which paired data is not available in any form. Our work proposes backpropagation-based decoding coupled with transformer-based multilingual-multimodal language models in order to obtain translations between any pair of languages used during training. We particularly show the capabilities of this approach on the translation of German-Japanese and Japanese-German sentence pairs, given training data of images freely associated with text in English, German, and Japanese, where no single image carries annotations in both Japanese and German. Moreover, we demonstrate that our approach is also generally useful for the multilingual image captioning task when sentences in a second language are available at test time. Our method also compares favorably on the Multi30k dataset against recently proposed methods that likewise leverage images as an intermediate source of translations.
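The pivot-through-vision idea above can be caricatured in a few lines: optimize a continuous vector by gradient descent toward the source sentence's visual representation, then snap it to the nearest target-language embedding. This is only a toy stand-in, under stated assumptions, for the paper's backpropagation-based decoding through a multilingual-multimodal transformer; the embeddings and names below are invented for illustration:

```python
def backprop_decode(visual_vec, tgt_embeddings, steps=200, lr=0.1):
    """Gradient-based decoding sketch.

    Start from a zero vector, descend on the squared distance to the
    visual representation, then return the target-language entry whose
    embedding is closest to the optimized vector.
    """
    x = [0.0] * len(visual_vec)
    for _ in range(steps):
        # gradient of ||x - v||^2 with respect to x is 2 * (x - v)
        x = [xi - lr * 2.0 * (xi - vi) for xi, vi in zip(x, visual_vec)]

    def d2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    return min(tgt_embeddings, key=lambda w: d2(x, tgt_embeddings[w]))
```

The essential point the sketch preserves is that no paired German-Japanese text is consulted: the source and target only meet in the shared visual space.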


2018 ◽  
Vol 17 (4) ◽  
pp. 193-203 ◽  
Author(s):  
Tanja Hentschel ◽  
Lisa Kristina Horvath ◽  
Claudia Peus ◽  
Sabine Sczesny

Abstract. Entrepreneurship programs often aim to increase women’s entrepreneurial activity, which remains lower than men’s. We investigate how advertisements for entrepreneurship programs can be designed to increase women’s application intentions. Results of an experiment with 156 women showed that women indicate (1) lower self-ascribed fit to and interest in the program after viewing a male-typed image (compared to a gender-neutral or female-typed image) in the advertisement; and (2) lower self-ascribed fit to and interest in the program as well as lower application intentions if the German masculine linguistic form of the term “entrepreneur” (compared to the gender-fair word pair “female and male entrepreneur”) is used in the recruitment advertisement. Women’s reactions are most negative when both a male-typed image and the masculine linguistic form appear in the advertisement. Self-ascribed fit and program interest mediate the effect of advertisement characteristics on application intentions.


2011 ◽  
Vol 131 (8) ◽  
pp. 1459-1466
Author(s):  
Yasunari Maeda ◽  
Hideki Yoshida ◽  
Masakiyo Suzuki ◽  
Toshiyasu Matsushima

2016 ◽  
Vol 136 (12) ◽  
pp. 898-907 ◽  
Author(s):  
Joao Gari da Silva Fonseca Junior ◽  
Hideaki Ohtake ◽  
Takashi Oozeki ◽  
Kazuhiko Ogimoto

2011 ◽  
Vol 9 (2) ◽  
pp. 99
Author(s):  
Alex J Auseon ◽  
Albert J Kolibash

Background: Educating trainees during cardiology fellowship is a process in constant evolution, with program directors regularly adapting to increasing demands and regulations as they strive to prepare graduates for practice in today’s healthcare environment. Methods and Results: In a 10-year follow-up to a previous manuscript regarding fellowship education, we reviewed the literature regarding the most topical issues facing training programs in 2010, describing our approach at The Ohio State University. Conclusion: In the midst of challenges posed by the increasing complexity of training requirements and documentation, work hour restrictions, and the new definitions of quality and safety, we propose methods of curricula revision and collaboration that may serve as an example to other medical centers.

