sentence pair
Recently Published Documents


Total documents: 32 (22 in the last five years)
H-index: 3 (2 in the last five years)

2021
Author(s): Yibo Chen, Zuping Zhang, Xin Huang, Xing Xiang, Zhiqiang He, ...

Abstract Discriminating the homology and heterogeneity of two documents is an important and difficult step in information retrieval. Apart from manual verification, which incurs a high human-resource cost, existing methods mainly focus on word-based document duplicate checking or sentence-pair matching. Word-based duplicate checking cannot judge the similarity of two documents at the semantic level, and sentence-pair matching methods cannot effectively mine the semantic information in long texts, which are frequent in retrieval results. A concept-based Multi-Feature Semantic Fusion Model (MFSFM) is proposed. It uses multi-feature enhanced semantics to construct a concept map that represents each document, and a multi-convolution mixed residual CNN module that introduces a local attention mechanism to improve sensitivity to conceptual boundary information. To verify the feasibility of the concept-map-based MFSFM, two multi-feature document datasets were built, each consisting of about 500 real scientific and technological project feasibility reports. Experimental results on these datasets show that MFSFM converges quickly and surpasses the latest natural language matching methods in accuracy.
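The abstract does not describe the internals of the multi-convolution mixed residual CNN module. As a rough illustration only, the sketch below assumes a common pattern: several parallel 1D convolutions with different odd kernel widths over a single feature channel, averaged, passed through ReLU, and added back to the input through a residual connection. The function names and kernel choices are hypothetical and are not MFSFM's actual design.

```python
from typing import List, Sequence

# Hypothetical sketch of a multi-kernel residual 1D convolution block;
# NOT the MFSFM authors' implementation (the abstract gives no details).


def conv1d_same(seq: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Cross-correlate `seq` with an odd-width `kernel`, zero-padding so the output length equals the input length."""
    if len(kernel) % 2 == 0:
        raise ValueError("kernel width must be odd to keep outputs aligned")
    half = len(kernel) // 2
    out = []
    for i in range(len(seq)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + j - half
            if 0 <= idx < len(seq):  # positions outside the sequence count as zero
                acc += w * seq[idx]
        out.append(acc)
    return out


def multi_conv_residual(seq: Sequence[float], kernels: Sequence[Sequence[float]]) -> List[float]:
    """Average parallel convolution branches, apply ReLU, then add the input back (residual connection)."""
    branches = [conv1d_same(seq, k) for k in kernels]
    mixed = [sum(vals) / len(branches) for vals in zip(*branches)]
    return [x + max(0.0, m) for x, m in zip(seq, mixed)]


if __name__ == "__main__":
    # A single spike; kernel widths 1, 3 and 5 see progressively wider local context.
    signal = [0.0, 0.0, 1.0, 0.0, 0.0]
    kernels = [[1.0], [1 / 3] * 3, [0.2] * 5]
    print(multi_conv_residual(signal, kernels))
```

Mixing several kernel widths lets each position aggregate context at more than one scale, while the residual path keeps the original signal intact; the peak in the example stays at position 2.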


Author(s): Xin Lu, Yao Deng, Ting Sun, Yi Gao, Jun Feng, ...

Abstract Sentence matching is widely used in natural language tasks such as natural language inference, paraphrase identification and question answering. These tasks require understanding the logical and semantic relationship between two sentences. Most current methods use all the information within a sentence to build a model and hence determine its relationship to another sentence. However, some of the information in a sentence may be redundant or noisy, which impedes model performance. We therefore propose a sentence matching method based on multi-keyword-pair matching (MKPM), which uses keyword pairs from the two sentences to represent the semantic relationship between them, avoiding the interference of redundancy and noise. Specifically, we first propose a sentence-pair-based attention mechanism, sp-attention, to select the most important word pair from the two sentences as a keyword pair, and then propose a Bi-task architecture to model the semantic information of these keyword pairs. The Bi-task architecture consists of two tasks:
1. A word-pair task (WP-Task), which uses the keyword pairs alone to perform sentence matching, so that the model captures the word-level semantic relationship between the two sentences.
2. A sentence-pair task (SP-Task), which captures the sentence-level semantic relationship between the two sentences through sentence denoising.
By integrating the two tasks, our model understands sentences more accurately at two granularities: word and sentence. Experimental results show that our model achieves state-of-the-art performance on several tasks. Our source code is publicly available.
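The abstract does not define how sp-attention scores word pairs. A minimal sketch, assuming dot-product scores between word embeddings normalized with a softmax over all cross-sentence pairs, shows how the highest-weighted pairs could be picked as keyword pairs. The function name, the toy embeddings and the `top_k` parameter are illustrative assumptions, not the MKPM implementation.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical sketch of keyword-pair selection; the actual sp-attention in MKPM may differ.


def select_keyword_pairs(
    words_a: Sequence[str],
    words_b: Sequence[str],
    emb: Dict[str, Sequence[float]],
    top_k: int = 1,
) -> List[Tuple[str, str, float]]:
    """Return the top_k cross-sentence word pairs ranked by softmax-normalized dot-product score."""
    pairs = [
        (wa, wb, sum(x * y for x, y in zip(emb[wa], emb[wb])))
        for wa in words_a
        for wb in words_b
    ]
    # Softmax over all |A| x |B| pair scores (subtract the max for numerical stability).
    peak = max(score for _, _, score in pairs)
    exps = [math.exp(score - peak) for _, _, score in pairs]
    total = sum(exps)
    weighted = [(wa, wb, e / total) for (wa, wb, _), e in zip(pairs, exps)]
    weighted.sort(key=lambda t: t[2], reverse=True)
    return weighted[:top_k]


if __name__ == "__main__":
    # Toy 2-d embeddings, invented for illustration.
    emb = {
        "the": [0.1, 0.1], "cat": [1.0, 0.0], "sat": [0.0, 1.0],
        "a": [0.1, 0.1], "kitten": [0.9, 0.1], "rested": [0.1, 0.8],
    }
    for wa, wb, w in select_keyword_pairs(["the", "cat", "sat"], ["a", "kitten", "rested"], emb, top_k=2):
        print(f"{wa} <-> {wb}: {w:.3f}")
```

With these toy vectors the two selected pairs are cat/kitten and sat/rested, while function-word pairs such as the/a receive low weight, which is the kind of noise filtering the abstract motivates.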


10.2196/23099
2021, Vol 9 (5), pp. e23099
Author(s): Mark Ormerod, Jesús Martínez del Rincón, Barry Devereux

Background: Semantic textual similarity (STS) is a natural language processing (NLP) task that involves assigning a similarity score to 2 snippets of text based on their meaning. This task is particularly difficult in the clinical domain, where text often features specialized language and frequent abbreviations.

Objective: We created an NLP system to predict similarity scores for sentence pairs as part of the Clinical Semantic Textual Similarity track in the 2019 n2c2/OHNLP Shared Task on Challenges in Natural Language Processing for Clinical Data. We subsequently analyzed the intermediary token vectors extracted from our models while processing a pair of clinical sentences to identify where and how representations of semantic similarity are built in transformer models.

Methods: Given a clinical sentence pair, we take the average predicted similarity score across several independently fine-tuned transformers. In our model analysis, we investigated the relationship between the final model's loss and surface features of the sentence pairs, and assessed the decodability and representational similarity of the token vectors generated by each model.

Results: Our model achieved a correlation of 0.87 with the ground-truth similarity score, placing 6th out of 33 teams (the first-place score was 0.90). In detailed qualitative and quantitative analyses of the model's loss, we identified the system's failure to correctly model semantic similarity when both sentences of a pair contain details of medical prescriptions, as well as its general tendency to overpredict semantic similarity when the sentences share many tokens. The token vector analysis revealed divergent representational strategies for predicting textual similarity between bidirectional encoder representations from transformers (BERT)–style models and XLNet. We also found that a large amount of information relevant to predicting STS can be captured using a combination of a classification token and the cosine distance between sentence-pair representations in the first layer of a transformer model that did not produce the best predictions on the test set.

Conclusions: We designed and trained a system that uses state-of-the-art NLP models to achieve competitive results on a new clinical STS dataset. As our approach uses no hand-crafted rules, it serves as a strong deep learning baseline for this task. Our key contribution is a detailed analysis of the model's outputs and an investigation of the heuristic biases learned by transformer models. We suggest future improvements based on these findings. In our representational analysis, we explore how different transformer models converge or diverge in their representation of semantic signals as the tokens of the sentences are augmented by successive layers. This analysis sheds light on how these "black box" models integrate semantic similarity information in intermediate layers, and points to new research directions in model distillation and sentence embedding extraction for applications in clinical NLP.
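The ensemble step described in the Methods (averaging predicted scores across independently fine-tuned models), the comparison against ground-truth scores, and the token-overlap effect noted in the Results can be sketched in plain Python. This sketch assumes Pearson correlation as the evaluation metric and Jaccard similarity over whitespace tokens as the overlap measure; the helper names and all example numbers are hypothetical, not the authors' data or code.

```python
import math
from statistics import mean
from typing import List, Sequence

# Hypothetical helpers; the metric (Pearson) and overlap measure (Jaccard) are assumptions.


def ensemble_average(per_model: Sequence[Sequence[float]]) -> List[float]:
    """Average each sentence pair's score across models; per_model[m][i] is model m's score for pair i."""
    return [mean(scores) for scores in zip(*per_model)]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between predictions and gold scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def token_overlap(s1: str, s2: str) -> float:
    """Jaccard overlap of lowercased whitespace tokens, a crude surface feature."""
    t1, t2 = set(s1.lower().split()), set(s2.lower().split())
    union = t1 | t2
    return len(t1 & t2) / len(union) if union else 0.0


if __name__ == "__main__":
    # Invented scores for 4 sentence pairs from 3 models, plus invented gold labels.
    preds = [[4.5, 1.0, 3.0, 0.5], [4.0, 1.5, 2.5, 1.0], [5.0, 0.5, 3.5, 0.0]]
    gold = [4.8, 1.2, 2.9, 0.3]
    avg = ensemble_average(preds)
    print("ensemble:", avg)
    print("pearson r:", round(pearson(avg, gold), 3))
    print("overlap:", token_overlap("Take 2 tablets daily", "take 1 tablet daily"))
```

A feature such as `token_overlap` can be plotted against the ensemble's error to check for the overprediction pattern the authors report; on Python 3.10+ `statistics.correlation` could replace the hand-written `pearson`.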


2020, Vol 3 (4)
Author(s): Adi Sutrisno

Google Translate is a free and practical online translation service that allows millions of people around the globe to translate words, phrases, sentences, and paragraphs into a target language. However, in 2015 some Google Translate users in Indonesia filed complaints asserting that the machine was often inaccurate, speculating that it could translate only at the micro-level of words and phrases rather than complete sentences or paragraphs. This research examines the accuracy and shortcomings of Google Translate for English-to-Indonesian translation in order to critically engage with these complaints. For this study, 80 English sentences were translated using Google Translate and assessed for accuracy using a table adapted from the Memsource criteria. Both the original sentences and their translations were analyzed using a sentence pair matrix to identify the machine's failings and areas for improvement. The results challenged the initial speculation that Google Translate is effective only with words and phrases. On the contrary, the Memsource-based assessment proved useful in demonstrating that Google Translate achieves a reasonable level of accuracy, accurately translating 60.37% of Indonesian-English sentences and vice versa.



