Video captioning with stacked attention and semantic hard pull

2021 · Vol 7 · pp. e664
Author(s): Md. Mushfiqur Rahman, Thasin Abedin, Khondokar S.S. Prottoy, Ayana Moshruba, Fazlul Hasan Siddiqui

Video captioning, i.e., the task of generating captions from video sequences, creates a bridge between the Natural Language Processing and Computer Vision domains of computer science. The task of generating a semantically accurate description of a video is quite complex. Considering the complexity of the problem, the results obtained in recent research works are praiseworthy. However, there is plenty of scope for further investigation. This paper addresses this scope and proposes a novel solution. Most video captioning models comprise two sequential/recurrent layers—one as a video-to-context encoder and the other as a context-to-caption decoder. This paper proposes a novel architecture, namely Semantically Sensible Video Captioning (SSVC), which modifies the context generation mechanism by using two novel approaches—“stacked attention” and “spatial hard pull”. As there are no exclusive metrics for evaluating video captioning models, we emphasize both quantitative and qualitative analysis of our model. Hence, we have used the BLEU scoring metric for quantitative analysis and have proposed a human evaluation metric for qualitative analysis, namely the Semantic Sensibility (SS) scoring metric. The SS score overcomes the shortcomings of common automated scoring metrics. This paper reports that the use of the aforementioned novelties improves the performance of state-of-the-art architectures.
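As an illustration of the quantitative evaluation mentioned above, a minimal sketch of BLEU scoring with NLTK follows; the caption and reference are made-up examples, not outputs of SSVC.

```python
# Hedged sketch: scoring a generated caption against a reference with BLEU,
# as is standard in video captioning evaluation (not the authors' exact setup).
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [["a", "man", "is", "playing", "a", "guitar"]]  # hypothetical reference
candidate = ["a", "man", "plays", "the", "guitar"]          # hypothetical model output

# BLEU@4 with smoothing to avoid zero scores on short captions
score = sentence_bleu(
    reference, candidate,
    weights=(0.25, 0.25, 0.25, 0.25),
    smoothing_function=SmoothingFunction().method1,
)
print(f"BLEU@4: {score:.3f}")
```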

Informatics · 2019 · Vol 6 (2) · pp. 19
Author(s): Rajat Pandit, Saptarshi Sengupta, Sudip Kumar Naskar, Niladri Sekhar Dash, Mohini Mohan Sardar

Semantic similarity is a long-standing problem in natural language processing (NLP). It is a topic of great interest, as its understanding can provide a look into how human beings comprehend meaning and make associations between words. However, when this problem is looked at from the viewpoint of machine understanding, particularly for under-resourced languages, it poses a different problem altogether. In this paper, semantic similarity is explored in Bangla, a less-resourced language. To ameliorate the situation in such languages, the most rudimentary method (path-based) and the latest state-of-the-art method (Word2Vec) for semantic similarity calculation were augmented using cross-lingual resources in English, and the results obtained are truly astonishing. Two semantic similarity approaches are explored in Bangla, namely the path-based and the distributional model, and their cross-lingual counterparts were synthesized in light of the English WordNet and corpora. The proposed methods were evaluated on a dataset comprising 162 Bangla word pairs, which were annotated by five expert raters. The correlation scores obtained between the four metrics and the human evaluation scores demonstrate the marked enhancement that the cross-lingual approach brings to the process of semantic similarity calculation for Bangla.
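For readers unfamiliar with the two similarity families compared here, the following sketch shows a WordNet path-based score and a Word2Vec cosine score, assuming the cross-lingual step (mapping Bangla words to English) has already been done; the toy corpus and word pair are illustrative only.

```python
# Illustrative sketch of the two approaches: WordNet path-based similarity
# and Word2Vec (distributional) cosine similarity over English resources.
from nltk.corpus import wordnet as wn
from gensim.models import Word2Vec

# Path-based similarity over the English WordNet (values in (0, 1])
dog, cat = wn.synsets("dog")[0], wn.synsets("cat")[0]
print("path similarity:", dog.path_similarity(cat))

# Distributional similarity from a Word2Vec model trained on a toy corpus
sentences = [["the", "dog", "barks"], ["the", "cat", "meows"]] * 50
model = Word2Vec(sentences, vector_size=50, min_count=1, seed=1)
print("cosine similarity:", model.wv.similarity("dog", "cat"))
```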


2020 · pp. 109-116
Author(s): Antonella Poce, Francesca Amenduni, Carlo De Medio, Alessandra Norgini

The role of Higher Education (HE) in promoting Critical Thinking (CT) is increasingly acknowledged. Constructed-response tasks (CRT) are recognized as necessary for CT assessment, though they present problems related to scoring quality and cost (Ku, 2009). Researchers (Liu, Frankel, Roohr, 2014) have proposed using automated scoring to address these concerns. The present work compares the features of different Natural Language Processing (NLP) techniques adopted to improve the reliability of a prototype designed to automatically assess six sub-skills of CT in CRT: use of language, argumentation, relevance, importance, critical evaluation, and novelty (Poce, 2017). We present the first (1.0) and the second (2.0) version of the CT prototype and their respective reliability results. Our research question is the following: What level of reliability is shown by the 1.0 and the 2.0 automatic CT assessment prototype, respectively, compared to expert human evaluation? Data were collected in two phases, to measure the reliability of CT prototypes 1.0 and 2.0 respectively, from a total of 264 participants and 592 open-ended answers. Two human assessors rated all of these responses on each of the sub-skills on a scale of 1-5. Similarly, NLP approaches were adopted to compute a feature on each dimension. Quadratic Weighted Kappa and the Pearson product-moment correlation were used to evaluate between-human agreement and human-NLP agreement. Preliminary findings based on the first data set suggest an adequate level of between-human rating agreement and a lower level of human-NLP agreement (r = .43 for the Relevance and Importance subscales). We are continuing the analysis of the data collected in the second phase and expect to complete it in June 2020.
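The two agreement statistics named above are standard; a minimal sketch of how they can be computed follows, with made-up toy ratings in place of the study's data.

```python
# Quadratic Weighted Kappa for between-human agreement on 1-5 ratings and
# Pearson's r for human-NLP agreement; the values below are toy examples.
from sklearn.metrics import cohen_kappa_score
from scipy.stats import pearsonr

human_a = [3, 4, 2, 5, 4, 3]                        # rater 1, one CT sub-skill
human_b = [3, 5, 2, 4, 4, 3]                        # rater 2, same answers
nlp_feature = [0.61, 0.83, 0.35, 0.77, 0.70, 0.55]  # automated score per answer

qwk = cohen_kappa_score(human_a, human_b, weights="quadratic")
r, _ = pearsonr(human_a, nlp_feature)
print(f"between-human QWK: {qwk:.2f}, human-NLP Pearson r: {r:.2f}")
```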


2020
Author(s): Yuqi Kong, Fanchao Meng, Ben Carterette

Comparing document semantics is one of the toughest tasks in both Natural Language Processing and Information Retrieval. To date, tools for this task are still rare, and most relevant methods are devised from the statistical or the vector space model perspective, but nearly none from a topological perspective. In this paper, we hope to strike a different note. A novel algorithm based on topological persistence for comparing semantic similarity between two documents is proposed. Our experiments are conducted on a document dataset with human judges’ results. A collection of state-of-the-art methods is selected for comparison. The experimental results show that our algorithm can produce highly human-consistent results, and it also beats most state-of-the-art methods, though it ties with NLTK.
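One concrete way to realize a persistence-based comparison, assuming each document is represented as a point cloud of word embeddings, is sketched below with the GUDHI library; this illustrates the general idea rather than the authors' exact algorithm.

```python
# Hedged sketch: compare two documents by the bottleneck distance between
# the H1 persistence diagrams of Vietoris-Rips complexes built over their
# (stand-in) word embedding point clouds.
import numpy as np
import gudhi

def persistence_diagram(points, max_edge=2.0):
    """Finite H1 persistence intervals of a Rips complex over the points."""
    rips = gudhi.RipsComplex(points=points, max_edge_length=max_edge)
    st = rips.create_simplex_tree(max_dimension=2)
    st.compute_persistence()
    diag = st.persistence_intervals_in_dimension(1)
    return [p for p in diag if np.isfinite(p[1])]  # drop infinite bars

rng = np.random.default_rng(0)
doc_a = rng.normal(size=(40, 5))  # stand-in for document A's embeddings
doc_b = rng.normal(size=(40, 5))  # stand-in for document B's embeddings

# Smaller bottleneck distance => more similar topological structure
d = gudhi.bottleneck_distance(persistence_diagram(doc_a), persistence_diagram(doc_b))
print("bottleneck distance:", d)
```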


Author(s): Davide Picca, Dominique Jaccard, Gérald Eberlé

In the last decades, Natural Language Processing (NLP) has achieved a high level of success. Interactions between NLP and Serious Games have begun, and some Serious Games already include NLP techniques. The objectives of this paper are twofold: on the one hand, providing a simple framework to enable analysis of potential uses of NLP in Serious Games and, on the other hand, applying the NLP framework to existing Serious Games and giving an overview of the use of NLP in pedagogical Serious Games. In this paper we present 11 Serious Games exploiting NLP techniques. We present them systematically, according to the following structure: first, we highlight possible uses of NLP techniques in Serious Games; second, we describe the type of NLP implemented in each specific Serious Game; and third, we provide a link to possible purposes of use for the different actors interacting in the Serious Game.


2018 · Vol 2018 (1) · pp. 127-144
Author(s): Lucy Simko, Luke Zettlemoyer, Tadayoshi Kohno

Source code attribution classifiers have recently become powerful. We consider the possibility that an adversary could craft code with the intention of causing a misclassification, i.e., creating a forgery of another author’s programming style in order to hide the forger’s own identity or blame the other author. We find that it is possible for a non-expert adversary to defeat such a system. In order to inform the design of adversarially resistant source code attribution classifiers, we conduct two studies with C/C++ programmers to explore the potential tactics and capabilities both of such adversaries and, conversely, of human analysts doing source code authorship attribution. Through quantitative and qualitative analysis of these studies, we (1) evaluate a state-of-the-art machine classifier against forgeries, (2) evaluate programmers as human analysts/forgery detectors, and (3) compile a set of modifications made to create forgeries. Based on our analyses, we then suggest features that future source code attribution systems might incorporate in order to be adversarially resistant.
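As background, the following is a minimal stand-in for the kind of attribution classifier studied here: character n-gram features feeding a random forest, in the spirit of common code stylometry pipelines. The snippets, authors, and features are invented for illustration and are not the paper's classifier.

```python
# Toy code stylometry sketch: character n-grams capture layout and spacing
# habits, and a random forest attributes a snippet to an author.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline

snippets = [
    "for(int i=0;i<n;i++){sum+=a[i];}",              # compact style, author A
    "for (int i = 0; i < n; ++i) { sum += a[i]; }",  # spaced style, author B
    "while(i<n){total+=v[i];i++;}",
    "while (i < n) { total += v[i]; ++i; }",
]
authors = ["A", "B", "A", "B"]

clf = make_pipeline(
    TfidfVectorizer(analyzer="char", ngram_range=(2, 4)),
    RandomForestClassifier(n_estimators=100, random_state=0),
)
clf.fit(snippets, authors)
print(clf.predict(["for (int j = 0; j < m; ++j) { s += b[j]; }"]))  # likely ['B']
```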


Author(s): Fazel Keshtkar, Ledong Shi, Syed Ahmad Chan Bukhari

Finding our favorite dishes has become a hard task, since restaurants are providing more choices and varieties. On the other hand, comments and reviews of restaurants are a good place to look for the answer. The purpose of this study is to use computational linguistics and natural language processing to categorize and find semantic relations among various dishes based on reviewers’ comments and menu descriptions. Our goal is to implement state-of-the-art computational linguistics methods such as word embedding models (word2vec), topic modeling, PCA, and classification algorithms. For visualization, t-Distributed Stochastic Neighbor Embedding (t-SNE) was used to explore the relations within dishes and their reviews. We also aim to extract the common patterns between different dishes among restaurants and review comments, and, in reverse, to explore the dishes with their semantic relations. A dataset of restaurant-related articles, with dishes located within the articles, was used to find comment patterns. We then applied t-SNE visualizations to identify the root of each feature of the dishes. As a result, our model is able to assist users in finding a dish from several words of description and their interests. Our dataset contains 1,000 articles from a food review agency on a variety of dishes from different cultures: American (e.g., ‘steak’, ‘hamburger’), Chinese (e.g., ‘stir fry’, ‘dumplings’), and Japanese (e.g., ‘sushi’).
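A minimal sketch of the embedding-plus-projection pipeline described above follows: word2vec trained on toy review sentences, with dish vectors projected by t-SNE. The sentences and dish names are made up for illustration.

```python
# Train word2vec on toy review text, then project dish vectors to 2D
# with t-SNE to explore relations between dishes.
from gensim.models import Word2Vec
from sklearn.manifold import TSNE

reviews = [
    ["the", "sushi", "was", "fresh"],
    ["crispy", "dumplings", "with", "stir", "fry"],
    ["juicy", "steak", "and", "fries"],
] * 20  # repeated so the toy corpus has enough co-occurrences

model = Word2Vec(reviews, vector_size=50, min_count=1, window=3, seed=1)
dishes = ["sushi", "dumplings", "steak"]
vectors = model.wv[dishes]

# perplexity must be smaller than the number of samples
coords = TSNE(n_components=2, perplexity=2, random_state=0).fit_transform(vectors)
for dish, (x, y) in zip(dishes, coords):
    print(f"{dish}: ({x:.1f}, {y:.1f})")
```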


2020 · Vol 2020 · pp. 1-15
Author(s): Mingfu Xue, Chengxiang Yuan, Jian Wang, Weiqiang Liu

Recently, natural language processing- (NLP-) based intelligent question and answer (Q&A) robots have been used ubiquitously. However, the robustness and security of current Q&A robots are still unsatisfactory; e.g., a slight typo in the user’s question may leave the Q&A robot unable to return the correct answer. In this paper, we propose a fast and automatic test dataset generation method for the robustness and security evaluation of current Q&A robots, which can work in black-box scenarios and thus can be applied to a variety of different Q&A robots. Specifically, we propose a dependency parse-based adversarial examples generation (DPAEG) method for Q&A robots. DPAEG first uses the proposed dependency parse-based keyword extraction algorithm to extract keywords from a question. Then, the proposed algorithm generates adversarial words according to the extracted keywords, which include typos and words that are spelled similarly to the keywords. Finally, these adversarial words are used to generate a large number of adversarial questions. The generated adversarial questions, which are similar to the original questions, do not affect human understanding, but the Q&A robots cannot answer them correctly. Moreover, the proposed method works in a black-box scenario, which means it does not need knowledge of the target Q&A robots. Experimental results show that the generated adversarial examples have a high success rate against two state-of-the-art Q&A robots, DrQA and Google Assistant. In addition, the generated adversarial examples not only affect the correct answer (top-1) returned by DrQA but also affect the top-k candidate answers returned by DrQA: the adversarial examples make the top-k candidate answers contain fewer correct answers and make the correct answers rank lower. Human evaluation shows that participants with different genders, ages, and mother tongues can understand the meaning of most of the generated adversarial examples, which means that the generated adversarial examples do not affect human understanding.
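To make the two DPAEG stages concrete, here is a hedged sketch: keywords picked from a dependency parse with spaCy, then perturbed with a simple adjacent-swap typo rule. The extraction heuristic and typo rule are generic illustrations, not the paper's exact algorithms, and the example assumes the en_core_web_sm model is installed.

```python
# Stage 1: rough dependency-parse-based keyword extraction (content words).
# Stage 2: generate adversarial words as simple typos of the keywords.
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes this spaCy model is available

def keywords(question):
    doc = nlp(question)
    return [t.text for t in doc
            if t.pos_ in {"NOUN", "PROPN", "VERB"} and not t.is_stop]

def typo_variants(word):
    # adjacent-character swaps, one simple class of adversarial words
    return {word[:i] + word[i + 1] + word[i] + word[i + 2:]
            for i in range(len(word) - 1)}

question = "Who wrote the play Hamlet?"
for kw in keywords(question):
    for variant in sorted(typo_variants(kw)):
        print(question.replace(kw, variant))
```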


2017 · Vol 13 (4) · pp. 89-108
Author(s): Santosh Kumar Bharti, Ramkrushna Pradhan, Korra Sathya Babu, Sanjay Kumar Jena

In Natural Language Processing (NLP), sarcasm analysis in text is considered one of the most challenging tasks. It has been broadly researched in recent years. The property of sarcasm that makes it harder to detect is the gap between the literal and the intended meaning. It is a particular kind of sentiment which is capable of flipping the entire sense of a text. Sarcasm is often expressed verbally through the use of high pitch with heavy tonal stress. Other clues of sarcasm are various gestures, such as a gentle sloping of the eyes, hand movements, head shaking, etc. However, these clues are absent in textual data, which makes the detection of sarcasm dependent upon several other factors. In this article, six algorithms are proposed to analyze sarcasm in Twitter tweets. These algorithms are based on the possible occurrences of sarcasm in tweets. Finally, the experimental results of the proposed algorithms were compared with some of the existing state-of-the-art approaches.
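Although the paper's six algorithms are not reproduced here, one widely used textual sarcasm cue, a positive sentiment word co-occurring with a negative situation, can be sketched as follows; the word lists are tiny stand-ins, not the paper's lexicons.

```python
# Toy sarcasm cue: a positive sentiment word together with a negative
# situation often signals sarcasm in tweets ("I just love being stuck...").
POSITIVE = {"love", "great", "wonderful", "fantastic"}
NEGATIVE_SITUATION = {"stuck", "traffic", "monday", "rain", "delayed"}

def looks_sarcastic(tweet):
    words = {w.strip("#!,.").lower() for w in tweet.split()}
    return bool(words & POSITIVE) and bool(words & NEGATIVE_SITUATION)

print(looks_sarcastic("I just love being stuck in traffic!"))  # True
print(looks_sarcastic("I love this song"))                     # False
```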


Algorithms · 2019 · Vol 12 (11) · pp. 240
Author(s): Stefan Klus, Patrick Gelß

Interest in machine learning with tensor networks has been growing rapidly in recent years. We show that tensor-based methods developed for learning the governing equations of dynamical systems from data can, in the same way, be used for supervised learning problems, and we propose two novel approaches for image classification. One is a kernel-based reformulation of the previously introduced multidimensional approximation of nonlinear dynamics (MANDy); the other is an alternating ridge regression in the tensor train format. We apply both methods to the MNIST and Fashion-MNIST data sets and show that the approaches are competitive with state-of-the-art neural network-based classifiers.
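As a pointer to how a kernel-based reformulation can look in practice: a pixelwise trigonometric feature map [cos(πx/2), sin(πx/2)] induces the product kernel k(x, y) = ∏_i cos(π(x_i − y_i)/2), which can be plugged into any kernel method. The sketch below uses this kernel with a generic SVM on toy data; it illustrates the induced kernel only, not the paper's tensor-train solver.

```python
# Product cosine kernel induced by the feature map [cos(pi*x/2), sin(pi*x/2)]
# per pixel, used here with a standard kernel classifier on toy data.
import numpy as np
from sklearn.svm import SVC

def product_cosine_kernel(X, Y):
    diff = X[:, None, :] - Y[None, :, :]   # pairwise differences (n, m, d)
    return np.prod(np.cos(0.5 * np.pi * diff), axis=2)

rng = np.random.default_rng(0)
X_train = rng.random((100, 16))            # stand-in for flattened images in [0, 1]
y_train = (X_train.mean(axis=1) > 0.5).astype(int)

clf = SVC(kernel=product_cosine_kernel).fit(X_train, y_train)
print("train accuracy:", clf.score(X_train, y_train))
```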


Author(s): Shaoxiang Chen, Yu-Gang Jiang

Sequence-to-sequence models incorporating an attention mechanism have shown promising improvements on video captioning. While there is rich information both inside and between frames, spatial attention is rarely explored, and motion information is usually handled by 3D-CNNs as just another modality for fusion. On the other hand, research on human perception suggests that apparent motion can attract attention. Motivated by this, we aim to learn spatial attention on video frames under the guidance of motion information for caption generation. We present a novel video captioning framework utilizing Motion Guided Spatial Attention (MGSA). The proposed MGSA exploits the motion between video frames by learning spatial attention from stacked optical flow images with a custom CNN. To further relate the spatial attention maps of video frames, we design a Gated Attention Recurrent Unit (GARU) to adaptively incorporate previous attention maps. The whole framework can be trained in an end-to-end manner. We evaluate our approach on two benchmark datasets, MSVD and MSR-VTT. The experiments show that our model generates better video representations, and state-of-the-art results are obtained under popular evaluation metrics such as BLEU@4, CIDEr, and METEOR.
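A minimal PyTorch sketch of the core idea follows: a small CNN predicts a spatial attention map from stacked optical-flow images and uses it to pool frame features. Shapes and layer sizes are illustrative, not the paper's, and the GARU is omitted.

```python
# Hedged sketch of motion-guided spatial attention: a CNN over stacked
# optical flow yields per-location attention logits, which softmax-pool
# the frame's convolutional features into a single vector.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MotionGuidedSpatialAttention(nn.Module):
    def __init__(self, flow_channels=10):  # e.g., 5 stacked flows x 2 components
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(flow_channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 1),            # one attention logit per location
        )

    def forward(self, flow, frame_feat):
        # flow: (B, flow_channels, H, W); frame_feat: (B, C, H, W)
        logits = self.cnn(flow)                                    # (B, 1, H, W)
        attn = F.softmax(logits.flatten(2), dim=-1).view_as(logits)
        return (attn * frame_feat).sum(dim=(2, 3))                 # (B, C)

mgsa = MotionGuidedSpatialAttention()
flow = torch.randn(2, 10, 7, 7)
feat = torch.randn(2, 512, 7, 7)
print(mgsa(flow, feat).shape)  # torch.Size([2, 512])
```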

