Visual Dialog Agent Based On Deep Q Learning and Memory Module Networks

Author(s):  
Arundhati Raj ◽  
Shubhangi Srivastava ◽  
Aniruddh Suresh Pillai ◽  
Ajay Kumar

Over the past several years, there has been a marked increase in methods that combine Computer Vision and Natural Language Processing to solve problems, and new algorithms and systems for this class of problems are being developed every day. The Visual Dialog Agent is one such system, drawing on algorithms from both Computer Vision and Natural Language Processing. Many variants of Visual Dialog Agents have been designed to date, along with many algorithms created exclusively for them. In this paper we propose a Visual Dialog Agent that uses present state-of-the-art End-to-End Memory Networks together with Reinforcement Learning policies to answer the questions posed by the user and to understand the user's inclination in the conversation it holds. The goal of the proposed Visual Dialog Agent is to sustain a more engaging conversation with the highest user inclination.
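The core building block the abstract refers to, the attention "hop" of an End-to-End Memory Network, can be sketched minimally as follows. This is an illustrative toy, not the authors' system: real models use learned embeddings, multiple hops, and a trained policy; the vectors below are made up.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def memory_hop(query, memories):
    # Score each stored memory against the query (dot product),
    # normalise with softmax, and add the attention-weighted
    # readout back to the query -- one "hop" of an end-to-end
    # memory network.
    scores = [sum(q * m for q, m in zip(query, mem)) for mem in memories]
    probs = softmax(scores)
    dim = len(query)
    readout = [sum(p * mem[i] for p, mem in zip(probs, memories))
               for i in range(dim)]
    return [q + r for q, r in zip(query, readout)], probs

# Toy dialog history: three memory vectors; the second is most
# aligned with the query, so it should receive the most attention.
memories = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
query = [0.1, 0.9]
state, probs = memory_hop(query, memories)
```

Stacking several such hops lets the agent iteratively refine what it attends to in the dialog history before producing an answer.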

2019 ◽  
Vol 8 (4) ◽  
pp. 3656-3659

This paper discusses the concept of giving an artificial intelligence artificial perception by integrating NLP and CV, which should be able to solve 50% of the problems where the data is in a raw format not understandable by the machine. This method helps in automatically labelling and understanding the data, making it easier for the machine to interpret it and assist in our day-to-day tasks. “Perception is the ability to become aware of something internal or in the external environment through the use of the five senses”; this is a natural capability of humans but has never properly been achieved in a machine. In the past five years massive strides have been made in both natural language processing and computer vision, but none of these advancements has increased the intelligence and perception of computer systems in the dramatic way that was expected. This gap between what was expected and what has been delivered exists because the two fields have evolved separately, whereas perception requires the two dimensions of hearing (Natural Language Processing) and vision (Computer Vision) to be integrated.


Author(s):  
Santosh Kumar Mishra ◽  
Rijul Dhir ◽  
Sriparna Saha ◽  
Pushpak Bhattacharyya

Image captioning is the process of generating a textual description of an image that aims to describe its salient parts. It is an important problem, as it involves computer vision and natural language processing, where computer vision is used for understanding images and natural language processing is used for language modeling. A great deal of work has been done on image captioning for the English language. In this article, we develop a model for image captioning in the Hindi language. Hindi is the official language of India and the fourth most spoken language in the world, spoken in India and South Asia. To the best of our knowledge, this is the first attempt to generate image captions in the Hindi language. A dataset is manually created by translating the well-known MSCOCO dataset from English to Hindi. Finally, different types of attention-based architectures are developed for image captioning in Hindi. These attention mechanisms are new for the Hindi language, as they have never been used for it before. The results of the proposed model are compared with several baselines in terms of BLEU scores, and they show that our model performs better than the others. Manual evaluation of the obtained captions in terms of adequacy and fluency also reveals the effectiveness of our proposed approach. Availability of resources: the code for the article is available at https://github.com/santosh1821cs03/Image_Captioning_Hindi_Language ; the dataset will be made available at http://www.iitp.ac.in/∼ai-nlp-ml/resources.html .
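The BLEU comparison mentioned above rests on modified n-gram precision. A toy BLEU-1 (unigrams only, no brevity penalty or geometric mean over higher n-grams, which full BLEU includes) looks like this; the example sentences are made up:

```python
from collections import Counter

def bleu1(candidate, reference):
    # Modified unigram precision: each candidate word is credited
    # at most as many times as it appears in the reference.
    cand, ref = Counter(candidate), Counter(reference)
    overlap = sum(min(count, ref[word]) for word, count in cand.items())
    return overlap / max(1, sum(cand.values()))

score = bleu1("a cat on the mat".split(), "the cat sat on the mat".split())
```

Here four of the five candidate words are matched in the reference, giving a score of 0.8; production evaluations would use an established implementation with n-grams up to 4 and a brevity penalty.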


2017 ◽  
Vol 49 (4) ◽  
pp. 1-44 ◽  
Author(s):  
Peratham Wiriyathammabhum ◽  
Douglas Summers-Stay ◽  
Cornelia Fermüller ◽  
Yiannis Aloimonos

Author(s):  
Oksana Chulanova

The article discusses the capabilities of artificial intelligence technologies, including natural language processing, intelligent decision support, computer vision, speech recognition and synthesis, and promising artificial intelligence methods. The results of the author's study and an analysis of artificial intelligence technologies and their capabilities for optimizing work with staff are presented. The study allowed the author to develop a concept for integrating artificial intelligence technologies into work with personnel in the digital paradigm.


2021 ◽  
pp. 111-127
Author(s):  
Rajat Koner ◽  
Hang Li ◽  
Marcel Hildebrandt ◽  
Deepan Das ◽  
Volker Tresp ◽  
...  

Abstract: Visual Question Answering (VQA) is concerned with answering free-form questions about an image. Since it requires a deep semantic and linguistic understanding of the question and the ability to associate it with various objects that are present in the image, it is an ambitious task and requires multi-modal reasoning from both computer vision and natural language processing. We propose Graphhopper, a novel method that approaches the task by integrating knowledge graph reasoning, computer vision, and natural language processing techniques. Concretely, our method is based on performing context-driven, sequential reasoning based on the scene entities and their semantic and spatial relationships. As a first step, we derive a scene graph that describes the objects in the image, as well as their attributes and their mutual relationships. Subsequently, a reinforcement learning agent is trained to autonomously navigate in a multi-hop manner over the extracted scene graph to generate reasoning paths, which are the basis for deriving answers. We conduct an experimental study on the challenging dataset GQA, based on both manually curated and automatically generated scene graphs. Our results show that we keep up with human performance on manually curated scene graphs. Moreover, we find that Graphhopper outperforms another state-of-the-art scene graph reasoning model on both manually curated and automatically generated scene graphs by a significant margin.
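The multi-hop navigation idea can be illustrated with a toy scene graph. This sketch is not Graphhopper itself (there, a reinforcement learning agent learns which relation edge to follow at each step); here the relation sequence and the graph are hypothetical, hand-written inputs:

```python
def multi_hop(scene_graph, start, relations):
    # Walk the scene graph by following one named relation per hop;
    # the entity reached at the end is taken as the answer.
    current, path = start, [start]
    for rel in relations:
        current = scene_graph[current][rel]
        path.append(current)
    return current, path

# Hypothetical scene graph and question decomposition for
# "What is on the table to the left of the chair?"
graph = {
    "chair": {"left_of": "table"},
    "table": {"supports": "cup"},
}
answer, path = multi_hop(graph, "chair", ["left_of", "supports"])
```

The recorded path ("chair" → "table" → "cup") is exactly the kind of reasoning path the abstract describes as the basis for deriving answers.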


2016 ◽  
Vol 57 ◽  
pp. 345-420 ◽  
Author(s):  
Yoav Goldberg

Over the past few years, neural networks have re-emerged as powerful machine-learning models, yielding state-of-the-art results in fields such as image recognition and speech processing. More recently, neural network models started to be applied also to textual natural language signals, again with very promising results. This tutorial surveys neural network models from the perspective of natural language processing research, in an attempt to bring natural-language researchers up to speed with the neural techniques. The tutorial covers input encoding for natural language tasks, feed-forward networks, convolutional networks, recurrent networks and recursive networks, as well as the computation graph abstraction for automatic gradient computation.
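Two of the tutorial's opening topics, input encoding and feed-forward networks, can be sketched together in a few lines. This is an illustrative toy: the vocabulary and weights are made up, and real models would learn dense embeddings rather than use raw one-hot vectors:

```python
import math

def one_hot(index, size):
    # Simplest input encoding: a vector with a single 1.0
    # at the word's vocabulary index.
    vec = [0.0] * size
    vec[index] = 1.0
    return vec

def feed_forward(x, W, b):
    # Single hidden layer: h = tanh(Wx + b)
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + bi)
            for row, bi in zip(W, b)]

vocab = ["the", "cat", "sat"]
x = one_hot(vocab.index("cat"), len(vocab))   # input encoding
W = [[0.2, -0.1, 0.4], [0.0, 0.3, -0.2]]      # made-up weights
b = [0.1, -0.1]
h = feed_forward(x, W, b)
```

In practice the gradients of such layers are obtained automatically via the computation-graph abstraction the tutorial also covers, rather than derived by hand.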


2020 ◽  
Vol 2020 ◽  
pp. 1-13 ◽  
Author(s):  
Haoran Wang ◽  
Yue Zhang ◽  
Xiaosheng Yu

In recent years, with the rapid development of artificial intelligence, image captioning has gradually attracted the attention of many researchers in the field and has become an interesting and arduous task. Image captioning, the automatic generation of natural language descriptions of the content observed in an image, is an important part of scene understanding and combines knowledge from computer vision and natural language processing. Its applications are extensive and significant, for example in human-computer interaction. This paper summarizes the related methods and focuses on the attention mechanism, which plays an important role in computer vision and has recently been widely used in image caption generation. Furthermore, the advantages and shortcomings of these methods are discussed, and the commonly used datasets and evaluation criteria in this field are provided. Finally, the paper highlights some open challenges in the image captioning task.
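One widely surveyed form of the attention mechanism in captioning is additive (Bahdanau-style) attention over image-region features. The sketch below is illustrative only: the decoder state, region features, and weight matrices are made-up toy values, whereas in a real captioner they would be learned.

```python
import math

def additive_attention(state, regions, w_s, w_r, v):
    # Additive attention: score each image-region feature r_i as
    # v . tanh(W_s s + W_r r_i) against the decoder state s,
    # softmax the scores, and return the attention-weighted
    # context vector together with the attention weights.
    def matvec(W, x):
        return [sum(w * xi for w, xi in zip(row, x)) for row in W]
    proj_s = matvec(w_s, state)
    scores = []
    for r in regions:
        hidden = [math.tanh(a + b) for a, b in zip(proj_s, matvec(w_r, r))]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    alphas = [e / z for e in exps]
    context = [sum(a * r[i] for a, r in zip(alphas, regions))
               for i in range(len(regions[0]))]
    return context, alphas

# Made-up decoder state, two region features, and tiny weights.
state = [1.0, 0.0]
regions = [[0.0, 1.0], [1.0, 0.0]]
w_s = [[0.5, 0.0], [0.0, 0.5]]
w_r = [[0.5, 0.0], [0.0, 0.5]]
v = [1.0, 1.0]
context, alphas = additive_attention(state, regions, w_s, w_r, v)
```

At each decoding step the context vector is fed to the language model, so the caption generator can "look at" different image regions for different words.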


2011 ◽  
Vol 17 (2) ◽  
pp. 141-144
Author(s):  
ANSSI YLI-JYRÄ ◽  
ANDRÁS KORNAI ◽  
JACQUES SAKAROVITCH

For the past two decades, specialised events on finite-state methods have been successful in presenting interesting studies on natural language processing to the public through journals and collections. The FSMNLP workshops have become well-known among researchers and are now the main forum of the Association for Computational Linguistics' (ACL) Special Interest Group on Finite-State Methods (SIGFSM). The current issue on finite-state methods and models in natural language processing was planned in 2008 in this context as a response to a call for special issue proposals. In 2010, the issue received a total of sixteen submissions, some of which were extended and updated versions of workshop papers, and others which were completely new. The final selection, consisting of only seven papers that could fit into one issue, is not fully representative, but complements the prior special issues in a nice way. The selected papers showcase a few areas where finite-state methods have less than obvious and sometimes even groundbreaking relevance to natural language processing (NLP) applications.


Author(s):  
NANA AMPAH ◽  
Matthew Sadiku ◽  
Omonowo Momoh ◽  
Sarhan Musa

Computational humanities is at the intersection of computing technologies and the disciplines of the humanities. Research in this field has steadily increased over the past years. Computational tools supporting textual search, large database analysis, data mining, network mapping, and natural language processing are employed by the humanities researcher.  This opens up new realms for analysis and understanding.  This paper provides a brief introduction into computational humanities.

