MRN: Moment Relation Network for Natural Language Video Localization with Transfer Learning

Author(s):  
Siyu Jiang ◽  
Guobin Wu

In this paper, we tackle the task of natural language video localization (NLVL): given an untrimmed video and a natural language query, the goal is to localize the temporal segment within the video that best matches the query. NLVL is challenging at the intersection of language and video understanding because a video may contain multiple segments of interest and the language may describe complicated temporal dependencies. Though existing approaches have achieved good performance, most of them did not fully consider the inherent differences between the language and video modalities. Here, we propose the Moment Relation Network (MRN) to reduce the divergence between the probability distributions of these two modalities. Specifically, MRN trains video and language subnets, and then uses transfer learning techniques to map the extracted features into a shared embedding space, where we calculate the similarity of the two modalities using the Mahalanobis distance metric, which is used to localize moments. Extensive experiments on benchmark datasets show that the proposed MRN outperforms the state of the art by a large margin under the widely used metrics.
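
As a rough illustration of the distance-based matching step described above, the following sketch scores candidate moments against a query with a Mahalanobis distance, d(x, y) = sqrt((x - y)^T M (x - y)). It is not the authors' implementation: the abstract does not say how the matrix M is obtained or how candidate moments are generated, so the matrix `m`, the helper `best_moment`, and the toy vectors are all assumptions made for illustration.

```python
import math


def mahalanobis(x, y, m):
    """Return sqrt((x - y)^T M (x - y)) for a positive semi-definite matrix M."""
    diff = [a - b for a, b in zip(x, y)]
    # Matrix-vector product M @ diff, computed row by row.
    m_diff = [sum(m_ij * d_j for m_ij, d_j in zip(row, diff)) for row in m]
    return math.sqrt(sum(d_i * md_i for d_i, md_i in zip(diff, m_diff)))


def best_moment(query_vec, moment_vecs, m):
    """Hypothetical helper: index of the candidate moment closest to the query."""
    distances = [mahalanobis(query_vec, v, m) for v in moment_vecs]
    return min(range(len(distances)), key=distances.__getitem__)


# Toy embeddings in an assumed shared space; M is fixed here, not learned.
identity = [[1.0, 0.0], [0.0, 1.0]]
candidates = [[3.0, 4.0], [1.0, 0.0], [0.0, 2.0]]
closest = best_moment([0.0, 0.0], candidates, identity)
```

With the identity matrix the measure reduces to Euclidean distance; a non-identity M stretches or shrinks individual directions of the embedding space before distances are compared.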

Author(s):  
Siva Reddy ◽  
Mirella Lapata ◽  
Mark Steedman

In this paper we introduce a novel semantic parsing approach to query Freebase in natural language without requiring manual annotations or question-answer pairs. Our key insight is to represent natural language via semantic graphs whose topology shares many commonalities with Freebase. Given this representation, we conceptualize semantic parsing as a graph matching problem. Our model converts sentences to semantic graphs using CCG and subsequently grounds them to Freebase guided by denotations as a form of weak supervision. Evaluation experiments on a subset of the Free917 and WebQuestions benchmark datasets show our semantic parser improves over the state of the art.


Author(s):  
Ali Fakhry

Deep Q-Networks (DQNs) are applied throughout reinforcement learning, a large subfield of machine learning. Using CarRacing-v0, a classic 2D car racing environment from OpenAI, alongside a custom modification of that environment, a DQN was created to solve both the classic and the custom environments. The environments are tested using custom-made CNN architectures and transfer learning from ResNet18. While DQNs were state of the art years ago, using them for CarRacing-v0 appears somewhat unappealing and not as effective as other reinforcement learning techniques. Overall, while the model did train and the agent learned various parts of the environment, reaching the reward threshold for the environment with this technique seems problematic and difficult, and other techniques would likely be more useful.


2022 ◽  
Vol 22 (3) ◽  
pp. 1-21
Author(s):  
Prayag Tiwari ◽  
Amit Kumar Jaiswal ◽  
Sahil Garg ◽  
Ilsun You

Self-attention mechanisms have recently been embraced for a broad range of text-matching applications. A self-attention model takes only one sentence as input with no extra information, i.e., one can utilize the final hidden state or pooling. However, text-matching problems can be interpreted in either symmetrical or asymmetrical scopes. For instance, paraphrase detection is a symmetrical task, while textual entailment classification and question-answer matching are considered asymmetrical tasks. In this article, we leverage the attractive properties of the self-attention mechanism and propose an attention-based network that incorporates three key components for inter-sequence attention: global pointwise features, preceding attentive features, and contextual features, while updating the rest of the components. We evaluate our model on two benchmark datasets covering the tasks of textual entailment and question-answer matching. The proposed efficient Self-attention-driven Network for Text Matching outperforms the state of the art on the Stanford Natural Language Inference and WikiQA datasets with far fewer parameters.


Author(s):  
Yu Gong ◽  
Xusheng Luo ◽  
Yu Zhu ◽  
Wenwu Ou ◽  
Zhao Li ◽  
...  

Slot filling is a critical task in natural language understanding (NLU) for dialog systems. State-of-the-art approaches treat it as a sequence labeling problem and adopt models such as BiLSTM-CRF. While these models work relatively well on standard benchmark datasets, they face challenges in the context of E-commerce, where the slot labels are more informative and carry richer expressions. In this work, inspired by the unique structure of the E-commerce knowledge base, we propose a novel multi-task model with cascade and residual connections, which jointly learns segment tagging, named entity tagging and slot filling. Experiments show the effectiveness of the proposed cascade and residual structures. Our model has a 14.6% advantage in F1 score over strong baseline methods on a new Chinese E-commerce shopping assistant dataset, while achieving competitive accuracy on a standard dataset. Furthermore, an online test deployed on a dominant E-commerce platform shows a 130% improvement in the accuracy of understanding user utterances. Our model has already gone into production on the E-commerce platform.


2018 ◽  
Author(s):  
Debanjan Mahata ◽  
John Kuriakose ◽  
Rajiv Ratn Shah ◽  
Roger Zimmermann

Keyphrase extraction is a fundamental task in natural language processing that facilitates mapping of documents to a set of representative phrases. In this paper, we present an unsupervised technique (Key2Vec) that leverages phrase embeddings for ranking keyphrases extracted from scientific articles. Specifically, we propose an effective way of processing text documents for training multi-word phrase embeddings that are used for thematic representation of scientific articles and ranking of keyphrases extracted from them using theme-weighted PageRank. Evaluations are performed on benchmark datasets producing state-of-the-art results.
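
The ranking step above can be pictured as a biased (personalized) PageRank in which the teleport distribution favors candidates that are close to the document's theme. The sketch below is only a generic power-iteration version under that assumption, not the Key2Vec formulation itself: the function name, the edge weights, and the theme scores are hypothetical, and in practice the scores would come from phrase-embedding similarity rather than being typed in by hand.

```python
def theme_weighted_pagerank(edges, theme_weight, damping=0.85, iterations=50):
    """Biased PageRank on an undirected weighted graph.

    edges: {(u, v): weight} with positive weights.
    theme_weight: {node: non-negative score}; must contain every node in edges.
    """
    nodes = sorted(theme_weight)
    total = sum(theme_weight.values())
    # Teleport distribution: proportional to each candidate's theme score.
    bias = {n: theme_weight[n] / total for n in nodes}

    neighbors = {n: {} for n in nodes}
    for (u, v), w in edges.items():
        neighbors[u][v] = w
        neighbors[v][u] = w
    out_weight = {n: sum(neighbors[n].values()) for n in nodes}

    rank = dict(bias)
    for _ in range(iterations):
        new_rank = {}
        for n in nodes:
            # Each neighbor u passes on a share of its rank proportional to the edge weight.
            incoming = sum(rank[u] * w / out_weight[u] for u, w in neighbors[n].items())
            new_rank[n] = (1 - damping) * bias[n] + damping * incoming
        rank = new_rank
    return rank


# Toy candidate graph; the theme scores are made up for illustration.
edges = {("neural network", "deep learning"): 1.0, ("deep learning", "phrase embedding"): 1.0}
scores = {"neural network": 1.0, "deep learning": 1.0, "phrase embedding": 3.0}
ranks = theme_weighted_pagerank(edges, scores)
ranked = sorted(ranks, key=ranks.get, reverse=True)
```

Isolated nodes are not treated specially in this sketch, so the ranks sum to one only when every candidate has at least one edge.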


2022 ◽  
Vol 40 (1) ◽  
pp. 1-22
Author(s):  
Lianghao Xia ◽  
Chao Huang ◽  
Yong Xu ◽  
Huance Xu ◽  
Xiang Li ◽  
...  

As deep learning techniques have expanded to real-world recommendation tasks, many deep neural network based Collaborative Filtering (CF) models have been developed to project user-item interactions into a latent feature space, based on various neural architectures such as multi-layer perceptrons, autoencoders, and graph neural networks. However, the majority of existing collaborative filtering systems are not well designed to handle missing data. In particular, to inject negative signals in the training phase, these solutions largely rely on negative sampling from unobserved user-item interactions and simply treat them as negative instances, which degrades recommendation performance. To address these issues, we develop a Collaborative Reflection-Augmented Autoencoder Network (CRANet), which is capable of exploring transferable knowledge from observed and unobserved user-item interactions. The network architecture of CRANet is an integrative structure with a reflective receptor network and an information fusion autoencoder module, which endows our recommendation framework with the ability to encode users' implicit pairwise preferences on both interacted and non-interacted items. Additionally, a parametric regularization-based tied-weight scheme is designed to perform robust joint training of the two-stage CRANet model. Finally, we experimentally validate CRANet on four diverse benchmark datasets corresponding to two recommendation tasks, showing that debiasing the negative signals of user-item interactions improves performance compared to various state-of-the-art recommendation techniques. Our source code is available at https://github.com/akaxlh/CRANet.


Author(s):  
Zhiguo Wang ◽  
Wael Hamza ◽  
Radu Florian

Natural language sentence matching is a fundamental technology for a variety of tasks. Previous approaches either match sentences from a single direction or only apply single granular (word-by-word or sentence-by-sentence) matching. In this work, we propose a bilateral multi-perspective matching (BiMPM) model. Given two sentences P and Q, our model first encodes them with a BiLSTM encoder. Next, we match the two encoded sentences in two directions: P against Q and Q against P. In each matching direction, each time step of one sentence is matched against all time steps of the other sentence from multiple perspectives. Then, another BiLSTM layer is utilized to aggregate the matching results into a fixed-length matching vector. Finally, based on the matching vector, a decision is made through a fully connected layer. We evaluate our model on three tasks: paraphrase identification, natural language inference and answer sentence selection. Experimental results on standard benchmark datasets show that our model achieves state-of-the-art performance on all tasks.
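
The bilateral matching step can be sketched as follows. The abstract does not spell out the matching function, so this example assumes a cosine similarity computed under several per-dimension weight vectors (one per perspective) and keeps, for each time step, the element-wise maximum over all time steps of the other sentence. The BiLSTM encoder and aggregation layers are omitted, and all function names and toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def multi_perspective_match(u, v, perspectives):
    """One cosine score per perspective; each perspective reweights the dimensions."""
    return [
        cosine([w * a for w, a in zip(weights, u)], [w * b for w, b in zip(weights, v)])
        for weights in perspectives
    ]


def match_direction(source, target, perspectives):
    """Match every source time step against all target time steps (max over targets)."""
    matched = []
    for s in source:
        scores = [multi_perspective_match(s, t, perspectives) for t in target]
        matched.append([max(column) for column in zip(*scores)])
    return matched


def bilateral_match(p, q, perspectives):
    """Return the matching vectors for both directions: P against Q and Q against P."""
    return match_direction(p, q, perspectives), match_direction(q, p, perspectives)


# Toy "encoded" sentences: two time steps for P, one for Q, two perspectives.
p = [[1.0, 0.0], [0.0, 1.0]]
q = [[1.0, 0.0]]
perspectives = [[1.0, 1.0], [1.0, 0.0]]
p_to_q, q_to_p = bilateral_match(p, q, perspectives)
```

Each direction yields one vector per time step with one entry per perspective; in a full model these sequences would be fed to the aggregation layer.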


2020 ◽  
Vol 32 (23) ◽  
pp. 17309-17320
Author(s):  
Rolandos Alexandros Potamias ◽  
Georgios Siolas ◽  
Andreas - Georgios Stafylopatis

Figurative language (FL) seems ubiquitous in all social media discussion forums and chats, posing extra challenges to sentiment analysis endeavors. Identification of FL schemas in short texts remains largely an unresolved issue in the broader field of natural language processing, mainly due to their contradictory and metaphorical meaning content. The main FL expression forms are sarcasm, irony and metaphor. In the present paper, we employ advanced deep learning methodologies to tackle the problem of identifying the aforementioned FL forms. Significantly extending our previous work (Potamias et al., in: International conference on engineering applications of neural networks, Springer, Berlin, pp 164–175, 2019), we propose a neural network methodology that builds on a recently proposed pre-trained transformer-based network architecture, which is further enhanced with a recurrent convolutional neural network. With this setup, data preprocessing is kept to a minimum. The performance of the devised hybrid neural architecture is tested on four benchmark datasets and contrasted with other relevant state-of-the-art methodologies and systems. Results demonstrate that the proposed methodology achieves state-of-the-art performance on all benchmark datasets, outperforming, even by a large margin, all other methodologies and published studies.


2021 ◽  
Vol 11 (15) ◽  
pp. 7160
Author(s):  
Ramon Ruiz-Dolz ◽  
Montserrat Nofre ◽  
Mariona Taulé ◽  
Stella Heras ◽  
Ana García-Fornes

The application of the latest Natural Language Processing breakthroughs in computational argumentation has shown promising results, which have raised the interest in this area of research. However, the available corpora with argumentative annotations are often limited to a very specific purpose or are not of adequate size to take advantage of state-of-the-art deep learning techniques (e.g., deep neural networks). In this paper, we present VivesDebate, a large, richly annotated and versatile professional debate corpus for computational argumentation research. The corpus has been created from 29 transcripts of a debate tournament in Catalan and has been machine-translated into Spanish and English. The annotation contains argumentative propositions, argumentative relations, debate interactions and professional evaluations of the arguments and argumentation. The presented corpus can be useful for research on a heterogeneous set of computational argumentation underlying tasks such as Argument Mining, Argument Analysis, Argument Evaluation or Argument Generation, among others. All this makes VivesDebate a valuable resource for computational argumentation research within the context of massive corpora aimed at Natural Language Processing tasks.


Author(s):  
Arshia Rehman ◽  
Saeeda Naz ◽  
Ahmed Khan ◽  
Ahmad Zaib ◽  
Imran Razzak

Background: Coronavirus disease (COVID-19) is an infectious disease caused by a new virus. Its exponential growth is not only threatening lives, but also impacting businesses and disrupting travel around the world.
Aim: The aim of this work is to develop an efficient diagnosis of COVID-19 by differentiating it from viral pneumonia, bacterial pneumonia and healthy cases using deep learning techniques.
Method: In this work, we used pre-trained knowledge to improve diagnostic performance through transfer learning techniques and compared the performance of different CNN architectures.
Results: Evaluation using 10-fold cross-validation showed that we achieved state-of-the-art performance, with an overall accuracy of 98.75% on CT and X-ray cases taken as a whole.
Conclusion: Quantitative evaluation showed high accuracy for automatic diagnosis of COVID-19. The pre-trained deep learning models developed in this study could be used for early screening of coronavirus; however, a more extensive dataset of CT or X-ray images is needed to develop a reliable application.

