Computer Vision and Natural Language Processing

Image captioning is the task of generating a textual description of the salient parts of a given image. It is an important problem because it combines computer vision, which is used to understand the image, with natural language processing, which is used for language modeling. Much work has been done on image captioning for English. In this article, we develop a model for image captioning in Hindi. Hindi is the official language of India, and it is the fourth most spoken language in the world, spoken in India and South Asia. To the best of our knowledge, this is the first attempt to generate image captions in Hindi. A dataset is created manually by translating the well-known MSCOCO dataset from English to Hindi. Different types of attention-based architectures are then developed for image captioning in Hindi; these attention mechanisms have not previously been applied to the Hindi language. The results of the proposed model are compared with several baselines in terms of BLEU scores, and they show that our model performs better than the others. Manual evaluation of the generated captions in terms of adequacy and fluency also confirms the effectiveness of the proposed approach. Availability of resources: The code for the article is available at https://github.com/santosh1821cs03/Image_Captioning_Hindi_Language ; the dataset will be made available at http://www.iitp.ac.in/~ai-nlp-ml/resources.html .
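
The BLEU comparison mentioned in this abstract can be illustrated with a minimal sketch. The code below is not the authors' evaluation script: it assumes a simplified sentence-level BLEU with a single reference, uniform weights over unigrams and bigrams, and no smoothing. The function names and the example captions are hypothetical placeholders.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=2):
    """Simplified sentence-level BLEU against one reference (assumption:
    uniform n-gram weights, no smoothing)."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        ref = ngrams(reference, n)
        total = sum(cand.values())
        # Clipped counts: an n-gram is credited at most as often as it
        # occurs in the reference.
        overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
        if total == 0 or overlap == 0:
            return 0.0
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: candidates shorter than the reference are penalised.
    c, r = len(candidate), len(reference)
    bp = 1.0 if c > r else math.exp(1 - r / c)
    # Geometric mean of the n-gram precisions, scaled by the penalty.
    return bp * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    # Hypothetical captions; real use would tokenize Hindi text instead.
    generated = "a dog runs on grass".split()
    reference = "a dog runs on the grass".split()
    print(bleu(generated, reference))
```

A score of 1.0 means the candidate matches the reference exactly; published evaluations usually report corpus-level BLEU with up to 4-grams and several references per image.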


Explainability in Time Series Forecasting, Natural Language Processing, and Computer Vision

Explainable Artificial Intelligence: An Introduction to Interpretable Machine Learning ◽

10.1007/978-3-030-83356-5_7 ◽

2021 ◽

pp. 261-302

Author(s):

Uday Kamath ◽

John Liu

Keyword(s):

Computer Vision ◽

Time Series ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Time Series Forecasting


The Concept of Integrating Artificial Intelligence Technologies Into Human Resources in a Digital Paradigm

Management of the personnel and intellectual resources in Russia ◽

10.12737/2305-7807-2020-5-9 ◽

2020 ◽

Vol 9 (2) ◽

pp. 5-9

Author(s):

Oksana Chulanova

Keyword(s):

Artificial Intelligence ◽

Computer Vision ◽

Natural Language Processing ◽

Decision Support ◽

Speech Recognition ◽

Human Resources ◽

Natural Language ◽

Language Processing

The article discusses the capabilities of artificial intelligence technologies, including natural language processing, intelligent decision support, computer vision, speech recognition and synthesis, and promising methods of artificial intelligence. It presents the results of the author's study and an analysis of these technologies and their capabilities for optimizing work with staff. Based on this study, the author develops a concept for integrating artificial intelligence technologies into work with personnel in the digital paradigm.


Graphhopper: Multi-hop Scene Graph Reasoning for Visual Question Answering

10.1007/978-3-030-88361-4_7 ◽

2021 ◽

pp. 111-127

Author(s):

Rajat Koner ◽

Hang Li ◽

Marcel Hildebrandt ◽

Deepan Das ◽

Volker Tresp ◽

...

Keyword(s):

Computer Vision ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Human Performance ◽

Question Answering ◽

Scene Graph ◽

Visual Question Answering ◽

Learning Agent ◽

Modal Reasoning

Visual Question Answering (VQA) is concerned with answering free-form questions about an image. Since it requires a deep semantic and linguistic understanding of the question and the ability to associate it with various objects that are present in the image, it is an ambitious task and requires multi-modal reasoning from both computer vision and natural language processing. We propose Graphhopper, a novel method that approaches the task by integrating knowledge graph reasoning, computer vision, and natural language processing techniques. Concretely, our method is based on performing context-driven, sequential reasoning based on the scene entities and their semantic and spatial relationships. As a first step, we derive a scene graph that describes the objects in the image, as well as their attributes and their mutual relationships. Subsequently, a reinforcement learning agent is trained to autonomously navigate in a multi-hop manner over the extracted scene graph to generate reasoning paths, which are the basis for deriving answers. We conduct an experimental study on the challenging dataset GQA, based on both manually curated and automatically generated scene graphs. Our results show that we keep up with human performance on manually curated scene graphs. Moreover, we find that Graphhopper outperforms another state-of-the-art scene graph reasoning model on both manually curated and automatically generated scene graphs by a significant margin.
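
The idea of multi-hop reasoning paths over a scene graph can be sketched as follows. This is not the Graphhopper method: the abstract describes a trained reinforcement learning agent, whereas this sketch swaps in an exhaustive breadth-first enumeration of paths up to a hop limit. The triples, relation names, and function names are all hypothetical.

```python
from collections import deque

# Hypothetical scene graph as (subject, relation, object) triples.
TRIPLES = [
    ("man", "holding", "umbrella"),
    ("umbrella", "has_color", "red"),
    ("man", "standing_on", "street"),
    ("street", "has_attribute", "wet"),
]


def build_graph(triples):
    """Map each subject to its outgoing (relation, object) edges."""
    graph = {}
    for subj, rel, obj in triples:
        graph.setdefault(subj, []).append((rel, obj))
    return graph


def reasoning_paths(graph, start, max_hops):
    """Enumerate every path of 1..max_hops edges starting at `start`.

    Each path is a list of (subject, relation, object) steps. A learned
    policy would pick one edge per hop; here all edges are expanded.
    """
    paths = []
    queue = deque([(start, [])])
    while queue:
        node, path = queue.popleft()
        if len(path) == max_hops:
            continue  # hop budget exhausted, do not expand further
        for rel, nxt in graph.get(node, []):
            new_path = path + [(node, rel, nxt)]
            paths.append(new_path)
            queue.append((nxt, new_path))
    return paths


def answers_for(paths, relations):
    """Return the end entity of each path whose relations match exactly."""
    return [p[-1][2] for p in paths if [step[1] for step in p] == relations]


if __name__ == "__main__":
    graph = build_graph(TRIPLES)
    paths = reasoning_paths(graph, "man", max_hops=2)
    # "What color is the thing the man is holding?"
    print(answers_for(paths, ["holding", "has_color"]))
```

Exhaustive enumeration grows quickly with the hop limit and the node degree, which is one reason a learned navigation policy is attractive on larger graphs.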


An Overview of Image Caption Generation Methods

Computational Intelligence and Neuroscience ◽

10.1155/2020/3062706 ◽

2020 ◽

Vol 2020 ◽

pp. 1-13 ◽

Cited By ~ 2

Author(s):

Haoran Wang ◽

Yue Zhang ◽

Xiaosheng Yu

Keyword(s):

Artificial Intelligence ◽

Computer Vision ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Rapid Development ◽

Evaluation Criteria ◽

Arduous Task ◽

Image Caption Generation ◽

Image Caption

In recent years, with the rapid development of artificial intelligence, image captioning has gradually attracted the attention of many researchers in the field and has become an interesting and arduous task. Image captioning, the automatic generation of natural language descriptions of the content observed in an image, is an important part of scene understanding and combines knowledge from computer vision and natural language processing. Its applications are extensive and significant, for example in human-computer interaction. This paper summarizes the related methods and focuses on the attention mechanism, which plays an important role in computer vision and has recently been widely used in image caption generation tasks. Furthermore, the advantages and shortcomings of these methods are discussed, and the commonly used datasets and evaluation criteria in this field are presented. Finally, this paper highlights some open challenges in the image caption task.
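
As a rough illustration of the attention mechanism this survey focuses on, the sketch below computes soft attention over a set of image region features. It assumes plain dot-product scoring, which is only one of several variants covered in the literature; the function names and the toy feature vectors are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, features):
    """Dot-product soft attention.

    query    : decoder state, a list of floats
    features : one feature vector per image region, same length as query
    Returns (weights, context), where context is the weighted sum of the
    region features.
    """
    scores = [sum(q * f for q, f in zip(query, feat)) for feat in features]
    weights = softmax(scores)
    dim = len(features[0])
    context = [
        sum(w * feat[d] for w, feat in zip(weights, features))
        for d in range(dim)
    ]
    return weights, context


if __name__ == "__main__":
    # Two toy regions; the query is aligned with the first one.
    regions = [[1.0, 0.0], [0.0, 1.0]]
    weights, context = attend([2.0, 0.0], regions)
    print(weights, context)
```

In a captioning model the context vector would be recomputed at every decoding step, so that each generated word can draw on a different part of the image.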


Enabling Intelligence through Deep Learning using IoT in a Classroom Environment based on a multimodal approach

Turkish Journal of Computer and Mathematics Education (TURCOMAT) ◽

10.17762/turcomat.v12i2.818 ◽

2021 ◽

Vol 12 (2) ◽

pp. 381-393

Author(s):

Lakshaga Jyothi M et al.

Keyword(s):

Computer Vision ◽

Deep Learning ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Classroom Environment ◽

Educational Institution ◽

Daily Lives ◽

Learning Principles ◽

Intelligent Classroom

Smart classrooms are becoming very popular. Recent technologies such as the Internet of Things are rapidly equipping a diverse set of fields, and every educational institution has set some benchmark for adopting these technologies in their daily lives. However, owing to various constraints and setbacks, the adoption of IoT in the educational sector is still at an early stage. The success of any technological evolution depends on its full-fledged implementation across society. In recent years, Deep Learning has achieved a breakthrough by outperforming traditional machine learning models on many tasks, especially Computer Vision and Natural Language Processing problems. The fusion of Computer Vision and Natural Language Processing has emerged as a new field in recent years. Combining such fused models with IoT platforms is a challenging task that has so far received little attention from researchers, although earlier work has explored designing an intelligent classroom in other contexts. To fill this gap, we propose a conceptual model in which Deep Learning architectures are fused with IoT systems to produce an Intelligent Classroom. We also discuss the major challenges, limitations, and opportunities that can arise with Deep Learning-based IoT solutions, and we summarize the available applications of these technologies that suit our solution. This paper can thus serve as a starting point for our research, giving an overview of the available papers relevant to the proposed approach.


VQAR: Review on Information Retrieval Techniques based on Computer Vision and Natural Language Processing

2019 3rd International Conference on Computing Methodologies and Communication (ICCMC) ◽

10.1109/iccmc.2019.8819803 ◽

2019 ◽

Author(s):

Shivangi Modi ◽

Dhatri Pandya

Keyword(s):

Computer Vision ◽

Information Retrieval ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing


Visual Dialog Agent Based On Deep Q Learning and Memory Module Networks

International Journal of Advanced Research in Science, Communication and Technology ◽

10.48175/ijarsct-1207 ◽

2021 ◽

pp. 41-47

Author(s):

Arundhati Raj ◽

Shubhangi Srivastava ◽

Aniruddh Suresh Pillai ◽

Ajay Kumar

Keyword(s):

Computer Vision ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Memory Module ◽

Agent Based ◽

Q Learning ◽

The Past ◽

Learning Policies ◽

New Algorithms

Over the past several years, a growing number of methods have addressed problems whose solutions combine Computer Vision and Natural Language Processing. New algorithms and systems are being developed every day to solve such problems. The Visual Dialog Agent is one of them; it uses both Computer Vision and Natural Language Processing algorithms. Many variants of Visual Dialog Agents have been designed to date, and many dedicated algorithms have been created for them. In this paper we propose a Visual Dialog Agent that uses state-of-the-art End-to-End Memory Module Networks together with Reinforcement Learning policies to answer the questions posed by the user and to understand the user's inclination in the conversation it holds. The goal of the proposed Visual Dialog Agent is to have a more engaging conversation with the highest user inclination.
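
The reinforcement learning component named in the title can be illustrated with the basic Q-learning update. This is a tabular simplification, not the Deep Q-Learning agent the paper proposes (which would replace the table with a neural network); the state names, action names, and reward values are hypothetical.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions,
             alpha=0.5, gamma=0.9):
    """Apply one tabular Q-learning update in place.

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    Returns the updated value of Q(s, a).
    """
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


if __name__ == "__main__":
    # Hypothetical dialog actions and states.
    actions = ["answer", "ask_followup"]
    q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
    # The user reacted well to an answer in turn_2: reward 1.0.
    q_update(q, "turn_2", "answer", 1.0, "turn_3", actions)
    # A follow-up question in turn_1 led to turn_2: no immediate reward,
    # but the value learned for turn_2 propagates backwards.
    q_update(q, "turn_1", "ask_followup", 0.0, "turn_2", actions)
    print(dict(q))
```

The second update shows how a reward earned late in a conversation raises the value of earlier actions that led to it, which is the property a dialog agent would rely on to keep the user engaged.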


Integrating Computer Vision and Natural Language Processing to Guide Blind Movements

Position Papers of the 2019 Federated Conference on Computer Science and Information Systems ◽

10.15439/2019f345 ◽

2019 ◽

Author(s):

Lenard Nkalubo

Keyword(s):

Computer Vision ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing


Review of Artificial Intelligence Adversarial Attack and Defense Technologies

Applied Sciences ◽

10.3390/app9050909 ◽

2019 ◽

Vol 9 (5) ◽

pp. 909 ◽

Cited By ~ 21

Author(s):

Shilin Qiu ◽

Qihe Liu ◽

Shijie Zhou ◽

Chunjiang Wu

Keyword(s):

Artificial Intelligence ◽

Computer Vision ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Physical World ◽

Research Progress ◽

Testing Stage ◽

Adversarial Attack ◽

Attack And Defense

In recent years, artificial intelligence technologies have been widely used in computer vision, natural language processing, automatic driving, and other fields. However, artificial intelligence systems are vulnerable to adversarial attacks, which limits the application of artificial intelligence (AI) technologies in key security fields. Improving the robustness of AI systems against adversarial attacks has therefore played an increasingly important role in the further development of AI. This paper aims to comprehensively summarize the latest research progress on adversarial attack and defense technologies in deep learning. According to the stage of the target model at which the attack occurs, this paper describes adversarial attack methods in the training stage and in the testing stage. Then, we sort out the applications of adversarial attack technologies in computer vision, natural language processing, cyberspace security, and the physical world. Finally, we describe the existing adversarial defense methods in three main categories, i.e., modifying data, modifying models, and using auxiliary tools.
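
A testing-stage attack of the kind surveyed here can be sketched with the fast gradient sign method (FGSM). The abstract does not name a specific attack, so FGSM is chosen here only as a well-known example, and the target is assumed to be a tiny logistic-regression model so that the input gradient can be written by hand. The weights, input, and epsilon are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    """Probability of class 1 under a logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def fgsm(w, b, x, y, eps):
    """One FGSM step: move x by eps in the direction that increases the loss.

    For cross-entropy loss on logistic regression, the gradient of the
    loss with respect to the input is (p - y) * w.
    """
    p = predict(w, b, x)
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]


if __name__ == "__main__":
    w, b = [2.0, -1.0], 0.0       # hypothetical trained weights
    x, y = [1.0, 0.5], 1          # clean input with true label 1
    x_adv = fgsm(w, b, x, y, eps=1.0)
    print(predict(w, b, x), predict(w, b, x_adv))
```

With these toy values the clean input is classified as class 1 while the perturbed input falls below the 0.5 decision threshold; in image models the same step is applied per pixel with a much smaller epsilon.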
