MKPM: Multi keyword-pair matching for natural language sentences

Author(s):  
Xin Lu ◽  
Yao Deng ◽  
Ting Sun ◽  
Yi Gao ◽  
Jun Feng ◽  
...  

Sentence matching is widely used in various natural language tasks, such as natural language inference, paraphrase identification and question answering. For these tasks, we need to understand the logical and semantic relationship between two sentences. Most current methods use all information within a sentence to build a model and hence determine its relationship to another sentence. However, the information contained in some sentences may cause redundancy or introduce noise, impeding the performance of the model. Therefore, we propose a sentence matching method based on multi keyword-pair matching (MKPM), which uses keyword pairs in the two sentences to represent the semantic relationship between them, avoiding the interference of redundancy and noise. Specifically, we first propose a sentence-pair-based attention mechanism, sp-attention, to select the most important word pairs from the two sentences as keyword pairs, and then propose a Bi-task architecture to model the semantic information of these keyword pairs. The Bi-task architecture is as follows: 1. To understand the word-level semantic relationship between the two sentences, we design a word-pair task (WP-Task), which uses these keyword pairs to complete sentence matching independently. 2. We design a sentence-pair task (SP-Task) to understand the sentence-level semantic relationship between the two sentences by sentence denoising. Through the integration of the two tasks, our model can understand sentences more accurately from the two granularities of word and sentence. Experimental results show that our model can achieve state-of-the-art performance on several tasks. Our source code is publicly available.
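
As a rough illustration of the idea (not the authors' released implementation), the sketch below scores every cross-sentence word pair with a learned bilinear, sp-attention-style scorer, keeps the top-scoring pairs as keyword pairs for a WP-Task head, and feeds attention-pooled "denoised" sentence vectors to an SP-Task head. The bilinear scorer, the pooling choices, and all module names are assumptions made for illustration only.

```python
import torch
import torch.nn as nn

class MKPMSketch(nn.Module):
    def __init__(self, dim=128, n_pairs=4, n_classes=2):
        super().__init__()
        self.score = nn.Bilinear(dim, dim, 1)          # assumed sp-attention-style pair scorer
        self.n_pairs = n_pairs
        self.wp_head = nn.Linear(2 * dim, n_classes)   # WP-Task: classify from keyword pairs
        self.sp_head = nn.Linear(2 * dim, n_classes)   # SP-Task: classify from denoised sentences

    def forward(self, a, b):
        # a: (La, dim), b: (Lb, dim) contextual word embeddings of the two sentences
        La, Lb, d = a.size(0), b.size(0), a.size(-1)
        # Score every cross-sentence word pair (i, j).
        pair_scores = self.score(a.unsqueeze(1).expand(La, Lb, d).reshape(-1, d),
                                 b.unsqueeze(0).expand(La, Lb, d).reshape(-1, d)).view(La, Lb)
        # Keep the highest-scoring pairs as keyword pairs.
        flat = pair_scores.flatten()
        top = torch.topk(flat, k=min(self.n_pairs, flat.numel())).indices
        i = torch.div(top, Lb, rounding_mode='floor')
        j = top % Lb
        pair_repr = torch.cat([a[i], b[j]], dim=-1).mean(dim=0)   # pooled keyword-pair feature
        # "Denoised" sentence vectors: attention-weighted pooling (one possible reading).
        wa = torch.softmax(pair_scores.max(dim=1).values, dim=0)
        wb = torch.softmax(pair_scores.max(dim=0).values, dim=0)
        sent_repr = torch.cat([wa @ a, wb @ b], dim=-1)
        return self.wp_head(pair_repr), self.sp_head(sent_repr)   # WP-Task and SP-Task logits

# Example: two sentences of 7 and 9 tokens with 128-dim embeddings.
wp_logits, sp_logits = MKPMSketch()(torch.randn(7, 128), torch.randn(9, 128))
```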

Author(s):  
Zhongbin Xie ◽  
Shuai Ma

Semantically matching two text sequences (usually two sentences) is a fundamental problem in NLP. Most previous methods either encode each of the two sentences into a vector representation (sentence-level embedding) or leverage word-level interaction features between the two sentences. In this study, we propose to take the sentence-level embedding features and the word-level interaction features as two distinct views of a sentence pair, and unify them with a framework of Variational Autoencoders such that the sentence pair is matched in a semi-supervised manner. The proposed model is referred to as Dual-View Variational AutoEncoder (DV-VAE), where the optimization of the variational lower bound can be interpreted as an implicit Co-Training mechanism for two matching models over distinct views. Experiments on SNLI, Quora and a Community Question Answering dataset demonstrate the superiority of our DV-VAE over several strong semi-supervised and supervised text matching models.
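
The following is a loose sketch of how two views of a sentence pair might share a latent code under a variational objective, in the spirit of the DV-VAE description above. The encoders, the symmetric consistency term standing in for the implicit co-training effect, and all names are assumptions, not the paper's implementation; the decoder/reconstruction term of the ELBO is omitted to keep the sketch short.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualViewSketch(nn.Module):
    def __init__(self, view_dim=256, z_dim=64, n_classes=2):
        super().__init__()
        self.q1 = nn.Linear(view_dim, 2 * z_dim)  # sentence-level embedding view -> (mu, logvar)
        self.q2 = nn.Linear(view_dim, 2 * z_dim)  # word-level interaction view  -> (mu, logvar)
        self.cls = nn.Linear(z_dim, n_classes)    # matching head on the latent code

    @staticmethod
    def kl(mu, logvar):
        # KL( N(mu, sigma^2) || N(0, I) ) for a diagonal Gaussian posterior
        return 0.5 * torch.sum(mu.pow(2) + logvar.exp() - logvar - 1, dim=-1)

    def forward(self, v1, v2, label=None):
        mu1, lv1 = self.q1(v1).chunk(2, dim=-1)
        mu2, lv2 = self.q2(v2).chunk(2, dim=-1)
        z1 = mu1 + torch.randn_like(mu1) * (0.5 * lv1).exp()   # reparameterization
        z2 = mu2 + torch.randn_like(mu2) * (0.5 * lv2).exp()
        loss = (self.kl(mu1, lv1) + self.kl(mu2, lv2)).mean()
        if label is not None:
            # Labeled pairs: both views must predict the same match label.
            loss = loss + F.cross_entropy(self.cls(z1), label) + F.cross_entropy(self.cls(z2), label)
        else:
            # Unlabeled pairs: each view is pulled toward the other's soft prediction.
            p1, p2 = self.cls(z1).softmax(-1), self.cls(z2).softmax(-1)
            loss = loss + 0.5 * (F.kl_div(p1.log(), p2.detach(), reduction='batchmean')
                                 + F.kl_div(p2.log(), p1.detach(), reduction='batchmean'))
        return loss
```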


Author(s):  
Seonhoon Kim ◽  
Inho Kang ◽  
Nojun Kwak

Sentence matching is widely used in various natural language tasks such as natural language inference, paraphrase identification, and question answering. These tasks require understanding the logical and semantic relationship between two sentences, which remains challenging. Although attention mechanisms are useful for capturing the semantic relationship and properly aligning the elements of two sentences, previous attention-based methods simply use a summation operation, which does not sufficiently retain the original features. Inspired by DenseNet, a densely connected convolutional network, we propose a densely-connected co-attentive recurrent neural network, each layer of which uses the concatenated information of attentive features as well as the hidden features of all preceding recurrent layers. This preserves the original and co-attentive feature information from the bottommost word embedding layer to the uppermost recurrent layer. To alleviate the problem of ever-increasing feature vector size due to dense concatenation operations, we also propose to use an autoencoder after dense concatenation. We evaluate our proposed architecture on highly competitive benchmark datasets for sentence matching. Experimental results show that our architecture, which retains recurrent and attentive features, achieves state-of-the-art performance on most of the tasks.
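
A minimal sketch of one densely-connected co-attentive layer, under assumed shapes: the layer's input is carried forward alongside new recurrent and co-attentive features, and a linear bottleneck (standing in for the paper's autoencoder) keeps the feature width from growing without bound. This is illustrative only, not the authors' model.

```python
import torch
import torch.nn as nn

class DenseCoAttentiveLayer(nn.Module):
    def __init__(self, in_dim, hidden=128):
        super().__init__()
        self.rnn = nn.LSTM(in_dim, hidden, batch_first=True, bidirectional=True)
        self.compress = nn.Linear(in_dim + 4 * hidden, in_dim)  # autoencoder-style bottleneck

    @staticmethod
    def co_attention(h_a, h_b):
        # Soft-align b's states to each position of a (plain dot-product attention).
        attn = torch.softmax(h_a @ h_b.transpose(1, 2), dim=-1)
        return attn @ h_b

    def forward(self, x_a, x_b):
        # x_a: (B, La, in_dim), x_b: (B, Lb, in_dim) -- outputs of all previous layers, concatenated
        h_a, _ = self.rnn(x_a)
        h_b, _ = self.rnn(x_b)
        att_a = self.co_attention(h_a, h_b)
        att_b = self.co_attention(h_b, h_a)
        # Dense connection: keep the layer input alongside the new recurrent and attentive
        # features, then compress so the width stays fixed for the next layer.
        out_a = self.compress(torch.cat([x_a, h_a, att_a], dim=-1))
        out_b = self.compress(torch.cat([x_b, h_b, att_b], dim=-1))
        return out_a, out_b
```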


Terminology ◽  
1998 ◽  
Vol 5 (2) ◽  
pp. 203-228 ◽  
Author(s):  
Bernardo Magnini

The role of generic lexical resources as well as specialized terminology is crucial in the design of complex dialogue systems, where a human interacts with the computer using Natural Language. Lexicon and terminology are supposed to store information for several purposes, including the discrimination of semantically inconsistent interpretations, the use of lexical variations, the compositional construction of a semantic representation for a complex sentence and the ability to access equivalencies across different languages. For these purposes it is necessary to rely on representational tools that are both theoretically motivated and operationally well defined. In this paper we propose a solution to lexical and terminology representation which is based on the combination of a linguistically motivated upper model and a multilingual WordNet. The upper model accounts for the linguistic analysis at the sentence level, while the multilingual WordNet accounts for lexical and conceptual relations at the word level.
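
For the word-level, cross-lingual part of such a design, off-the-shelf multilingual WordNet access can be illustrated with NLTK and the Open Multilingual Wordnet, as below (requires the 'wordnet' and 'omw-1.4' NLTK data packages). This is a generic illustration, not the system described in the paper.

```python
# Generic sketch: map a word to candidate equivalents in another language
# via the synsets the two languages share in the (Open Multilingual) WordNet.
from nltk.corpus import wordnet as wn

def cross_lingual_equivalents(word, src='eng', dst='ita'):
    """Return {synset name: lemmas in the target language} for each sense of `word`."""
    equivalents = {}
    for synset in wn.synsets(word, lang=src):
        equivalents[synset.name()] = synset.lemma_names(lang=dst)
    return equivalents

# Example: candidate Italian lemmas for each sense of an English term.
print(cross_lingual_equivalents('treatment'))
```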


2020 ◽  
Vol 2020 ◽  
pp. 1-10 ◽  
Author(s):  
Hanqian Wu ◽  
Mumu Liu ◽  
Shangbin Zhang ◽  
Zhike Wang ◽  
Siliang Cheng

Online product reviews are proliferating on e-commerce platforms, and mining the aspect-level product information contained in those reviews has great economic benefit. Aspect category classification is a basic task in aspect-level sentiment analysis, which has become a hot research topic in the natural language processing (NLP) field over the last decades. On various e-commerce platforms, user-generated question-answering (QA) reviews have emerged that generally contain much aspect-related product information. Although some researchers have devoted their efforts to aspect category classification for traditional product reviews, existing deep learning-based approaches cannot be well applied to represent QA-style reviews. Thus, we propose a 4-dimension (4D) textual representation model that models the text at different levels of representation: word level, sentence level, QA interaction level, and hyperinteraction level. In our experiments, empirical studies on datasets from three domains demonstrate that our proposals perform better than traditional sentence-level representation approaches, especially in the Digit domain.
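
A rough sketch of a four-level representation (word, sentence, QA interaction, hyperinteraction) of a QA-style review is given below; the specific encoders and operators are assumptions chosen for illustration, not the paper's exact model.

```python
import torch
import torch.nn as nn

class FourLevelSketch(nn.Module):
    def __init__(self, word_dim=100, hidden=64, n_aspects=10):
        super().__init__()
        self.sent_enc = nn.GRU(word_dim, hidden, batch_first=True, bidirectional=True)  # word -> sentence
        self.inter = nn.Bilinear(2 * hidden, 2 * hidden, 2 * hidden)                    # QA interaction
        self.hyper = nn.GRU(2 * hidden, hidden, batch_first=True, bidirectional=True)   # hyperinteraction
        self.cls = nn.Linear(2 * hidden, n_aspects)

    def encode_sentences(self, sents):
        # sents: (n_sents, n_words, word_dim) -> one vector per sentence (final hidden states).
        _, h = self.sent_enc(sents)
        return torch.cat([h[0], h[1]], dim=-1)                 # (n_sents, 2*hidden)

    def forward(self, q_sents, a_sents):
        q = self.encode_sentences(q_sents)                     # question-side sentence vectors
        a = self.encode_sentences(a_sents)                     # answer-side sentence vectors
        # QA interaction level: one feature per (question sentence, answer sentence) pair.
        pairs = self.inter(q.unsqueeze(1).expand(-1, a.size(0), -1).reshape(-1, q.size(-1)),
                           a.unsqueeze(0).expand(q.size(0), -1, -1).reshape(-1, a.size(-1)))
        # Hyperinteraction level: aggregate all pair features into one review representation.
        _, h = self.hyper(pairs.unsqueeze(0))
        review = torch.cat([h[0, 0], h[1, 0]], dim=-1)
        return self.cls(review)                                # aspect category scores

# Example: a QA review with 2 question sentences and 3 answer sentences, 20 words each.
scores = FourLevelSketch()(torch.randn(2, 20, 100), torch.randn(3, 20, 100))
```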


2021 ◽  
Author(s):  
Anshuman Mishra ◽  
Dhruvesh Patel ◽  
Aparna Vijayakumar ◽  
Xiang Lorraine Li ◽  
Pavan Kapanipathi ◽  
...  

2021 ◽  
Vol 2021 ◽  
pp. 1-16
Author(s):  
Chinh Trong Nguyen ◽  
Dang Tuan Nguyen

Recently, many deep learning models have achieved high results on the question answering task, with overall F1 scores above 0.88 on SQuAD datasets. However, many of these models have quite low F1 scores on why-questions, ranging from 0.57 to 0.7 on the SQuAD v1.1 development set. This means these models are better suited to extracting answers for factoid questions than for why-questions. Why-questions are asked when explanations are needed, and these explanations are possibly arguments or simply subjective opinions. Therefore, we propose an approach to finding answers for why-questions using discourse analysis and natural language inference. In our approach, natural language inference is applied to identify implicit arguments at the sentence level; it is also applied in sentence similarity calculation. Discourse analysis is applied to identify the explicit arguments and the opinions at the sentence level in documents. The results from these two methods are the answer candidates from which the final answer for each why-question is selected. We also implement a system based on our approach, which provides an answer given a why-question and a document, as in a reading comprehension test. We test our system with a Vietnamese-translated test set containing all why-questions of the SQuAD v1.1 development set. The test results show that our system cannot beat a deep learning model in F1 score; however, our system can answer more questions (answer rate of 77.0%) than the deep learning model (answer rate of 61.0%).
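
A much-simplified sketch of the candidate-selection idea: sentences carrying explicit causal discourse markers become explanation candidates, and a generic entailment scorer (any NLI model exposing an entailment probability; here an assumed placeholder function) ranks them against the question. This is illustrative only and omits the paper's discourse analysis and sentence similarity machinery.

```python
import re

CAUSAL_MARKERS = ("because", "since", "due to", "as a result", "therefore", "so that")

def explicit_candidates(sentences):
    """Keep sentences that carry an explicit causal discourse cue."""
    return [s for s in sentences if any(m in s.lower() for m in CAUSAL_MARKERS)]

def rank_candidates(question, sentences, entail_prob):
    """Rank explanation candidates for a why-question.

    entail_prob(premise, hypothesis) -> float in [0, 1] is a placeholder for any
    NLI scorer; the question is crudely turned into a statement by dropping "why".
    """
    statement = re.sub(r"^why\s+", "", question.strip().rstrip("?"), flags=re.I)
    scored = [(s, entail_prob(s, statement)) for s in explicit_candidates(sentences)]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```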


2021 ◽  
Vol 14 (4) ◽  
pp. 1-24
Author(s):  
Sushant Kafle ◽  
Becca Dingman ◽  
Matt Huenerfauth

There are style guidelines for authors who highlight important words in static text, e.g., bolded words in student textbooks, yet little research has investigated highlighting in dynamic texts, e.g., captions during educational videos for Deaf or Hard of Hearing (DHH) users. In our experimental study, DHH participants subjectively compared design parameters for caption highlighting, including: decoration (underlining vs. italicizing vs. boldfacing), granularity (sentence level vs. word level), and whether to highlight only the first occurrence of a repeating keyword. In partial contrast to recommendations in prior research, which had not been based on experimental studies with DHH users, we found that DHH participants preferred boldface, word-level highlighting in captions. Our empirical results provide guidance for the design of keyword highlighting during captioned videos for DHH users, especially in educational video genres.


2007 ◽  
Vol 33 (1) ◽  
pp. 105-133 ◽  
Author(s):  
Catalina Hallett ◽  
Donia Scott ◽  
Richard Power

This article describes a method for composing fluent and complex natural language questions, while avoiding the standard pitfalls of free text queries. The method, based on Conceptual Authoring, is targeted at question-answering systems where reliability and transparency are critical, and where users cannot be expected to undergo extensive training in question composition. This scenario is found in most corporate domains, especially in applications that are risk-averse. We present a proof-of-concept system we have developed: a question-answering interface to a large repository of medical histories in the area of cancer. We show that the method allows users to successfully and reliably compose complex queries with minimal training.

