Explainability in Time Series Forecasting, Natural Language Processing, and Computer Vision

Author(s):  
Uday Kamath ◽  
John Liu
Author(s):  
Santosh Kumar Mishra ◽  
Rijul Dhir ◽  
Sriparna Saha ◽  
Pushpak Bhattacharyya

Image captioning is the task of generating a textual description of an image that covers its salient content. It is an important problem because it combines computer vision and natural language processing: computer vision for understanding the image, and natural language processing for language modeling. Much work has been done on image captioning for English. In this article, we develop a model for image captioning in the Hindi language. Hindi is the official language of India and the fourth most spoken language in the world, used across India and South Asia. To the best of our knowledge, this is the first attempt to generate image captions in Hindi. A dataset was created manually by translating the well-known MSCOCO dataset from English to Hindi. Finally, several attention-based architectures are developed for image captioning in Hindi; these attention mechanisms have not previously been applied to the language. The proposed model is compared with several baselines in terms of BLEU scores, and the results show that it outperforms them. Manual evaluation of the generated captions in terms of adequacy and fluency also confirms the effectiveness of the proposed approach. Availability of resources: the code for the article is available at https://github.com/santosh1821cs03/Image_Captioning_Hindi_Language; the dataset will be made available at http://www.iitp.ac.in/~ai-nlp-ml/resources.html.


Author(s):  
Seonho Kim ◽  
Jungjoon Kim ◽  
Hong-Woo Chun

Interest in research on health and medical information analysis based on artificial intelligence, and deep learning in particular, has been increasing. Most research in this field has focused on discovering new knowledge for predicting and diagnosing disease by revealing relations between diseases and various features of the data. These features are extracted by analyzing clinical pathology data, such as electronic health records (EHRs), and academic literature with data analysis and natural language processing techniques. However, more research and attention are still needed on applying the latest artificial-intelligence-based analysis techniques to bio-signal data, i.e., continuous physiological records such as EEG (electroencephalography) and ECG (electrocardiogram). Unlike other types of data, bio-signal data take the form of real-valued time series, and applying deep learning to them raises issues in preprocessing, learning, and analysis that remain unresolved: feature selection is left to black-box learning components, effective features are difficult to recognize and identify, computational complexity is high, and so on. In this paper, to address these issues, we propose Wave2vec, an encoding-based time series classifier that combines signal processing with deep learning-based natural language processing. To demonstrate its advantages, we report three experiments on the EEG data of the University of California, Irvine, a real-world benchmark bio-signal dataset. The proposed model first encodes the bio-signals, which are real-valued time series, into a sequence of symbols, or a sequence of wavelet patterns mapped to symbols; it then vectorizes the symbols by learning the sequence with deep learning-based natural language processing.
A model for each class can be constructed by learning from the vectorized wavelet patterns and the training data, and the resulting models can be used for disease prediction and diagnosis by classifying new data. By converting real-valued time series into symbol sequences, the proposed method improves data readability and makes feature selection and the learning process more intuitive, facilitating easy recognition and identification of influential patterns. Furthermore, because the encoding simplifies the data, it drastically reduces computational complexity without degrading analysis performance, enabling the real-time, large-scale analysis that is essential for real-time diagnostic systems.
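The abstract above does not give the encoding itself, but the first step it describes, turning a real-valued series into a short symbol sequence before any sequence learning, can be sketched roughly as follows. This is a minimal illustration in the spirit of SAX-style discretization, not the paper's actual Wave2vec procedure; the segment count, alphabet, and breakpoints are hypothetical choices.

```python
import math

def znormalize(series):
    # Zero-mean, unit-variance normalization of the raw signal.
    mean = sum(series) / len(series)
    var = sum((x - mean) ** 2 for x in series) / len(series)
    std = math.sqrt(var) or 1.0
    return [(x - mean) / std for x in series]

def paa(series, n_segments):
    # Piecewise Aggregate Approximation: average each equal-length segment.
    seg_len = len(series) // n_segments
    return [sum(series[i * seg_len:(i + 1) * seg_len]) / seg_len
            for i in range(n_segments)]

def to_symbols(series, n_segments=4, alphabet="abcd"):
    # Discretize segment means into alphabet symbols via fixed breakpoints
    # (illustrative values, not the Gaussian quantiles a SAX encoder uses).
    breakpoints = [-0.67, 0.0, 0.67]
    symbols = []
    for v in paa(znormalize(series), n_segments):
        idx = sum(v > b for b in breakpoints)
        symbols.append(alphabet[idx])
    return "".join(symbols)

# Toy "bio-signal": a flat stretch, a high burst, flat again, a low burst.
signal = [0.1, 0.2, 0.1, 0.3, 2.0, 2.1, 1.9, 2.2,
          0.2, 0.1, 0.3, 0.2, -2.0, -1.9, -2.1, -2.2]
word = to_symbols(signal)  # e.g. "cdca": mid, high, mid, low
```

The resulting symbol words are what a word2vec-style sequence model would then consume in place of the raw real numbers.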


2017 ◽  
Vol 49 (4) ◽  
pp. 1-44 ◽  
Author(s):  
Peratham Wiriyathammabhum ◽  
Douglas Summers-Stay ◽  
Cornelia Fermüller ◽  
Yiannis Aloimonos

2019 ◽  
Author(s):  
Josephine Lukito ◽  
Prathusha K Sarma ◽  
Jordan Foley ◽  
Aman Abhishek

Author(s):  
Oksana Chulanova

The article discusses the capabilities of artificial intelligence technologies, including natural language processing, intelligent decision support, computer vision, speech recognition and synthesis, and other promising methods of artificial intelligence. The author presents the results of a study analyzing these technologies and their potential for optimizing work with staff. The study led to an original concept for integrating artificial intelligence technologies into personnel work within the digital paradigm.


2021 ◽  
pp. 111-127
Author(s):  
Rajat Koner ◽  
Hang Li ◽  
Marcel Hildebrandt ◽  
Deepan Das ◽  
Volker Tresp ◽  
...  

Visual Question Answering (VQA) is concerned with answering free-form questions about an image. Since it requires a deep semantic and linguistic understanding of the question and the ability to associate it with the various objects present in the image, it is an ambitious task that demands multi-modal reasoning across both computer vision and natural language processing. We propose Graphhopper, a novel method that approaches the task by integrating knowledge graph reasoning, computer vision, and natural language processing techniques. Concretely, our method performs context-driven, sequential reasoning over the scene entities and their semantic and spatial relationships. As a first step, we derive a scene graph that describes the objects in the image as well as their attributes and mutual relationships. Subsequently, a reinforcement learning agent is trained to autonomously navigate in a multi-hop manner over the extracted scene graph to generate reasoning paths, which are the basis for deriving answers. We conduct an experimental study on the challenging GQA dataset, using both manually curated and automatically generated scene graphs. Our results show that we keep up with human performance on manually curated scene graphs. Moreover, we find that Graphhopper outperforms another state-of-the-art scene graph reasoning model on both manually curated and automatically generated scene graphs by a significant margin.
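The multi-hop idea described above, walking a scene graph relation by relation until an answer entity is reached, can be illustrated with a toy example. This is only a sketch: the graph triples and the fixed relation sequence are hypothetical, and a trained Graphhopper-style agent would choose each hop from the question context rather than follow a given list.

```python
# Toy scene graph: (subject, relation, object) triples, as a scene-graph
# extractor might emit for an image of a cat on a red table next to a cup.
triples = [
    ("cat", "on", "table"),
    ("table", "has_attribute", "red"),
    ("cup", "next_to", "cat"),
]

def neighbors(graph, entity):
    # Outgoing edges from an entity: list of (relation, target) pairs.
    return [(r, o) for s, r, o in graph if s == entity]

def multi_hop(graph, start, relations):
    # Follow the given relation sequence hop by hop; return None
    # if some hop has no matching edge.
    current = start
    for rel in relations:
        targets = [o for r, o in neighbors(graph, current) if r == rel]
        if not targets:
            return None
        current = targets[0]
    return current

# "What color is the thing the cat is on?" -> hop "on", then "has_attribute".
answer = multi_hop(triples, "cat", ["on", "has_attribute"])  # "red"
```

Each hop narrows the context to one entity's neighborhood, which is what keeps the reasoning path short and interpretable.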


2020 ◽  
Vol 2020 ◽  
pp. 1-13 ◽  
Author(s):  
Haoran Wang ◽  
Yue Zhang ◽  
Xiaosheng Yu

In recent years, with the rapid development of artificial intelligence, image captioning has gradually attracted the attention of many researchers in the field and has become an interesting and arduous task. Image captioning, the automatic generation of natural language descriptions of the content observed in an image, is an important part of scene understanding and combines knowledge from computer vision and natural language processing. Its applications are extensive and significant, for example in human-computer interaction. This paper surveys the related methods and focuses on the attention mechanism, which plays an important role in computer vision and has recently been widely used in image caption generation. Furthermore, the advantages and shortcomings of these methods are discussed, and the commonly used datasets and evaluation criteria in this field are presented. Finally, the paper highlights some open challenges in the image captioning task.
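The attention mechanism the survey centers on can be reduced to one step: score each image region against the decoder's current state, normalize the scores, and form a weighted context vector. The following is a minimal sketch of that soft-attention step with dot-product scoring; the region features, state vector, and dimensions are all illustrative, not taken from any particular captioning model.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def attend(region_feats, decoder_state):
    # Score each image region by its dot product with the decoder state,
    # then return attention weights and the weighted context vector.
    scores = [sum(f * h for f, h in zip(feat, decoder_state))
              for feat in region_feats]
    weights = softmax(scores)
    dim = len(region_feats[0])
    context = [sum(w * feat[d] for w, feat in zip(weights, region_feats))
               for d in range(dim)]
    return weights, context

# Toy example: three image regions with 2-D features; the decoder state
# is aligned with region 0, so most attention mass should land there.
regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
weights, context = attend(regions, [2.0, 0.0])
```

At each decoding step the context vector is fed to the word predictor, so the caption can "look at" a different region for every word it emits.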


Author(s):  
Lakshaga Jyothi M, Et. al.

Smart classrooms are becoming very popular, driven by the boom in recent technologies such as the Internet of Things (IoT), which are equipping every corner of a diverse set of fields. Every educational institution has set benchmarks for adopting these technologies in daily practice, but due to various constraints and setbacks, IoT adoption in the educational sector is still at a premature stage. The success of any technological evolution rests on a full-fledged implementation that fits society's broader concerns. In recent years, deep learning has outperformed traditional machine learning models on many tasks, especially computer vision and natural language processing problems, and the fusion of computer vision with natural language processing has emerged as an astonishing new field. Combining such hybrid models with IoT platforms is a challenging task that has so far attracted few researchers across the globe, although many past researchers have designed intelligent classrooms in other contexts. To fill this gap, we propose a conceptual model in which deep learning architectures fused into IoT systems yield an intelligent classroom. We also discuss the major challenges, limitations, and opportunities of deep learning-based IoT solutions, and summarize the available applications of these technologies that suit our solution. This paper can thus serve as a starting point for our research, offering a glimpse of the literature relevant to the success of our proposed approach.

