Paraphrase Generation
Recently Published Documents

TOTAL DOCUMENTS: 88 (FIVE YEARS: 69)
H-INDEX: 5 (FIVE YEARS: 1)

2022 ◽  
Vol 12 (1) ◽  
pp. 499
Author(s):  
Ying Zhou ◽  
Xiaokang Hu ◽  
Vera Chung

Paraphrase detection and generation are important natural language processing (NLP) tasks. Yet the term paraphrase is broad enough to cover many fine-grained relations. This leads to different tolerance levels of semantic divergence within the positive paraphrase class across publicly available paraphrase datasets. Such variation can affect the generalisability of paraphrase classification models, and it may also impact the predictability of paraphrase generation models. This paper presents a new model that automatically constructs corpora of fine-grained paraphrase relations using language inference models. The fine-grained sentence-level paraphrase relations are defined based on their word- and phrase-level counterparts. We demonstrate that the fine-grained labels from our proposed system make it possible to generate paraphrases at a desired semantic level. The new labels could also contribute to general sentence embedding techniques.
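The idea of deriving fine-grained sentence-level relations from language inference models can be sketched by running an NLI judgment in both directions: mutual entailment suggests a paraphrase, one-directional entailment a more specific/general pair. The `nli_label` function below is a toy stand-in for a real NLI model, and the label names are illustrative assumptions, not the paper's exact taxonomy.

```python
# Toy sketch: map bidirectional NLI judgments to fine-grained
# sentence-level paraphrase relations.

def nli_label(premise: str, hypothesis: str) -> str:
    """Toy stand-in for an NLI model: call it entailment when every
    word of the hypothesis already appears in the premise."""
    p, h = set(premise.lower().split()), set(hypothesis.lower().split())
    return "entailment" if h <= p else "neutral"

def fine_grained_relation(s1: str, s2: str) -> str:
    """Combine the two directional judgments into one relation label."""
    fwd = nli_label(s1, s2)   # does s1 entail s2?
    bwd = nli_label(s2, s1)   # does s2 entail s1?
    if fwd == "entailment" and bwd == "entailment":
        return "paraphrase"            # mutual entailment
    if fwd == "entailment":
        return "forward entailment"    # s1 is more specific
    if bwd == "entailment":
        return "backward entailment"   # s2 is more specific
    return "non-paraphrase"

print(fine_grained_relation("the cat sat on the mat", "the cat sat"))
# → forward entailment
```

A real system would replace `nli_label` with a trained inference model and could then filter a paraphrase corpus to whichever tolerance level of semantic divergence is desired.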


2021 ◽  
Vol 7 ◽  
pp. e759
Author(s):  
G. Thomas Hudson ◽  
Noura Al Moubayed

Multitask learning has led to significant advances in Natural Language Processing, including the decaNLP benchmark where question answering is used to frame 10 natural language understanding tasks in a single model. In this work we show how models trained to solve decaNLP fail with simple paraphrasing of the question. We contribute a crowd-sourced corpus of paraphrased questions (PQ-decaNLP), annotated with paraphrase phenomena. This enables analysis of how transformations such as swapping the class labels and changing the sentence modality lead to a large performance degradation. Training both MQAN and the newer T5 model using PQ-decaNLP improves their robustness and for some tasks improves the performance on the original questions, demonstrating the benefits of a model which is more robust to paraphrasing. Additionally, we explore how paraphrasing knowledge is transferred between tasks, with the aim of exploiting the multitask property to improve the robustness of the models. We explore the addition of paraphrase detection and paraphrase generation tasks, and find that while both models are able to learn these new tasks, knowledge about paraphrasing does not transfer to other decaNLP tasks.


2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Xiaoqiang Chi ◽  
Yang Xiang

Paraphrase generation is an essential yet challenging task in natural language processing. Neural-network-based approaches to paraphrase generation have achieved remarkable success in recent years. Previous neural paraphrase generation approaches ignore linguistic knowledge, such as part-of-speech information, regardless of its availability. The underlying assumption is that neural nets can learn such information implicitly when given sufficient data. However, it is difficult for neural nets to learn such information properly when data are scarce. In this work, we probe the efficacy of explicit part-of-speech information for the task of paraphrase generation in low-resource scenarios. To this end, we devise three mechanisms to fuse part-of-speech information under the framework of sequence-to-sequence learning. We demonstrate the utility of part-of-speech information in low-resource paraphrase generation through extensive experiments on multiple datasets of varying sizes and genres.
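One of the simplest ways to fuse part-of-speech information into a seq2seq encoder is to concatenate a POS-tag embedding onto each word embedding before encoding. The sketch below uses toy random lookup tables; the dimensions and tags are illustrative assumptions, not the paper's actual mechanisms.

```python
# Minimal sketch of word/POS embedding concatenation for a seq2seq input.
import random

random.seed(0)
WORD_DIM, POS_DIM = 4, 2

def embed(table, key, dim):
    """Lazily assign a random vector to each new word or tag (toy lookup)."""
    if key not in table:
        table[key] = [random.uniform(-1, 1) for _ in range(dim)]
    return table[key]

word_table, pos_table = {}, {}

def fused_inputs(tokens, pos_tags):
    """Concatenate the word and POS embeddings token by token."""
    return [embed(word_table, w, WORD_DIM) + embed(pos_table, t, POS_DIM)
            for w, t in zip(tokens, pos_tags)]

vecs = fused_inputs(["dogs", "bark"], ["NOUN", "VERB"])
print(len(vecs), len(vecs[0]))  # 2 tokens, each of dimension 4 + 2 = 6
```

In a real model the fused vectors would feed an encoder; concatenation at the input layer is only one of several possible fusion points (others include fusing at the attention or decoder side).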


Author(s):  
Ahmet Bagci ◽  
Mehmet Fatih Amasyali

Author(s):  
Zilu Guo ◽  
Zhongqiang Huang ◽  
Kenny Q. Zhu ◽  
Guandan Chen ◽  
Kaibo Zhang ◽  
...  

Paraphrase generation plays key roles in NLP tasks such as question answering, machine translation, and information retrieval. In this paper, we propose a novel framework for paraphrase generation. It simultaneously decodes the output sentence using a pretrained wordset-to-sequence model and a round-trip translation model. We evaluate this framework on Quora, WikiAnswers, MSCOCO and Twitter, and show its advantage over previous state-of-the-art unsupervised methods and distantly-supervised methods by significant margins on all datasets. For Quora and WikiAnswers, our framework even performs better than some strongly supervised methods with domain adaptation. Further, we show that the generated paraphrases can be used to augment the training data for machine translation to achieve substantial improvements.
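A common way to realise the simultaneous decoding described above is to mix the two models' next-token distributions at every decoding step and pick greedily from the mixture. The two toy "models" and the mixing weight `alpha` below are illustrative assumptions, not the paper's actual components.

```python
# Sketch: combine the next-token distributions of two decoders.
VOCAB = ["the", "cat", "sat", "<eos>"]

def model_a(prefix):  # stand-in for the wordset-to-sequence model
    return {"the": 0.5, "cat": 0.3, "sat": 0.1, "<eos>": 0.1}

def model_b(prefix):  # stand-in for the round-trip translation model
    return {"the": 0.2, "cat": 0.5, "sat": 0.2, "<eos>": 0.1}

def joint_step(prefix, alpha=0.5):
    """Mix the two distributions and return the greedy (argmax) token."""
    pa, pb = model_a(prefix), model_b(prefix)
    mixed = {w: alpha * pa[w] + (1 - alpha) * pb[w] for w in VOCAB}
    return max(mixed, key=mixed.get)

print(joint_step([]))  # "cat": 0.5*0.3 + 0.5*0.5 = 0.40 beats "the" at 0.35
```

A full decoder would repeat this step until `<eos>`, typically with beam search instead of the greedy argmax shown here.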


2021 ◽  
Author(s):  
Bo Sun ◽  
Fei Zhang ◽  
Jing Yuan ◽  
Zhao Wei ◽  
Shu Ting

Abstract
Background: As people increasingly prefer to obtain medical knowledge online, medical intelligent question-answering systems based on question matching have attracted more and more attention, especially in China. However, the development of this field is limited by the lack of paraphrase corpora for medical questions.
Objective: We propose a paraphrase generation method suitable for the Chinese medical field, and for the first time use deep learning models instead of manual evaluation. The method is designed to automatically construct high-quality Chinese medical paraphrases.
Methods: Validation experiments were carried out on two Chinese paraphrase datasets (one general, one medical). Neural machine translation is used to generate paraphrases: a sentence is translated into another language and then translated back into the original language to obtain the corresponding paraphrase. BLEU and ROUGE are used as quantitative evaluation metrics. Three deep text matching models are used to evaluate the generated paraphrases in place of manual evaluation, with precision, recall, F1 and AUC as qualitative evaluation metrics.
Results: 49,908 and 4,062 paraphrases were generated on the two datasets, with generation efficiencies of 97.03% and 98.38%, respectively. For both domains, the generated and original paraphrase pairs are very similar under the quantitative and qualitative evaluation metrics, especially in the medical domain. Taking the medical data as an example, the BLEU scores of the generated and original paraphrase pairs are 0.556 and 0.626, respectively, and the mean difference in AUC between the two groups was 0.015.
Conclusions: We are the first to propose a paraphrase generation method based on neural machine translation and to use deep text matching models instead of manual evaluation to assess the generated paraphrases. Analysis of the evaluation metrics shows that the paraphrase generation method reaches or even exceeds the level of manual construction at the semantic level, especially in the medical field, and that the deep text matching model can replace manual evaluation, enabling automated paraphrase generation. This is of great significance for the development of Chinese medical paraphrase generation.
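The round-trip ("back-translation") recipe described above can be sketched as: translate the source sentence into a pivot language, then translate the result back, relying on the translation systems' lexical choices to introduce variation. The two word-level lexicons below are hypothetical stand-ins for real MT systems.

```python
# Toy round-trip translation: English -> French -> English, where the
# reverse lexicon deliberately picks synonyms to yield a paraphrase.
EN_TO_FR = {"the": "le", "doctor": "médecin", "helps": "aide"}
FR_TO_EN = {"le": "the", "médecin": "physician", "aide": "assists"}

def round_trip(sentence: str) -> str:
    """Translate word by word into the pivot language and back."""
    pivot = [EN_TO_FR.get(w, w) for w in sentence.split()]
    back = [FR_TO_EN.get(w, w) for w in pivot]
    return " ".join(back)

print(round_trip("the doctor helps"))  # → the physician assists
```

With real MT systems the variation comes from the models' decoding choices rather than a fixed lexicon, and the generated pairs would then be filtered by the text matching models described above.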


Entropy ◽  
2021 ◽  
Vol 23 (5) ◽  
pp. 566
Author(s):  
Xiaoqiang Chi ◽  
Yang Xiang

Paraphrase generation is an important yet challenging task in natural language processing. Neural network-based approaches have achieved remarkable success in sequence-to-sequence learning. Previous paraphrase generation work generally ignores syntactic information regardless of its availability, with the assumption that neural nets could learn such linguistic knowledge implicitly. In this work, we make an endeavor to probe into the efficacy of explicit syntactic information for the task of paraphrase generation. Syntactic information can appear in the form of dependency trees, which could be easily acquired from off-the-shelf syntactic parsers. Such tree structures could be conveniently encoded via graph convolutional networks to obtain more meaningful sentence representations, which could improve generated paraphrases. Through extensive experiments on four paraphrase datasets with different sizes and genres, we demonstrate the utility of syntactic information in neural paraphrase generation under the framework of sequence-to-sequence modeling. Specifically, our graph convolutional network-enhanced models consistently outperform their syntax-agnostic counterparts using multiple evaluation metrics.
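A single graph-convolution step over a dependency tree can be sketched as: each token's representation becomes an average over itself and its tree neighbours (the adjacency matrix with self-loops, A + I). This sketch omits the learned weight matrix and nonlinearity a real GCN layer would apply, to keep the dependency-structure part in focus; the example tree and features are illustrative.

```python
# One simplified graph-convolution step over a dependency tree.
def gcn_layer(features, edges, n):
    """features: list of n vectors; edges: (head, dependent) index pairs."""
    # Build an undirected adjacency list with self-loops (A + I).
    neigh = {i: {i} for i in range(n)}
    for h, d in edges:
        neigh[h].add(d)
        neigh[d].add(h)
    dim = len(features[0])
    out = []
    for i in range(n):
        group = neigh[i]
        # Mean-aggregate the neighbourhood, dimension by dimension.
        out.append([sum(features[j][k] for j in group) / len(group)
                    for k in range(dim)])
    return out

# Toy sentence "dogs bark loudly": bark (index 1) is the root, with
# dependents dogs (0) and loudly (2). One-hot features for illustration.
feats = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
new = gcn_layer(feats, edges=[(1, 0), (1, 2)], n=3)
print(new[1])  # the root now mixes all three tokens' features
```

Stacking such layers lets information flow along longer dependency paths, which is how the tree structure enriches the sentence representations fed to the seq2seq decoder.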


Author(s):  
Divesh Kubal ◽  
Hemant Palivela

Paraphrase Generation is one of the most important and challenging tasks in the field of Natural Language Generation. Paraphrasing techniques help to identify or to extract/generate phrases and sentences conveying a similar meaning. The paraphrasing task can be bifurcated into two sub-tasks, namely Paraphrase Identification (PI) and Paraphrase Generation (PG). Most existing state-of-the-art systems can solve only one of these problems at a time. This paper proposes a light-weight unified model that can simultaneously classify whether a given pair of sentences are paraphrases of each other and generate multiple paraphrases for a given input sentence. The Paraphrase Generation module aims to generate fluent and semantically similar paraphrases, while the Paraphrase Identification system aims to classify whether a pair of sentences are paraphrases of each other or not. The proposed approach combines data sampling for data variety with a granularly fine-tuned Text-To-Text Transfer Transformer (T5) model, using carefully selected data points. The highlight of this study is that the same light-weight model, trained with the objective of Paraphrase Generation, can also be used to solve the Paraphrase Identification task. Hence, the proposed system is light-weight in terms of both the model's size and the data used to train it, which facilitates quick learning without compromising the results.
The proposed system is evaluated against popular evaluation metrics, namely BLEU (BiLingual Evaluation Understudy), ROUGE (Recall-Oriented Understudy for Gisting Evaluation), METEOR, WER (Word Error Rate) and GLEU (Google-BLEU) for Paraphrase Generation, and classification metrics, namely accuracy, precision, recall and F1-score, for the Paraphrase Identification system. The proposed model achieves state-of-the-art results on both the Paraphrase Identification and Paraphrase Generation tasks.
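The usual way a single text-to-text model serves both tasks is to prepend a task prefix to the input string, so that generation and identification become different input formats for the same model. The exact prefixes below are illustrative assumptions, not the paper's.

```python
# Sketch: T5-style task prefixes for a unified PI/PG model.
def format_generation(sentence: str) -> str:
    """Input format for the Paraphrase Generation task."""
    return f"paraphrase: {sentence}"

def format_identification(s1: str, s2: str) -> str:
    """Input format for the Paraphrase Identification task."""
    return f"is paraphrase: sentence1: {s1} sentence2: {s2}"

print(format_generation("How do I learn NLP?"))
print(format_identification("How do I learn NLP?",
                            "What is the best way to study NLP?"))
```

At inference time the model's decoded text is either the paraphrase itself or a label string (e.g. "yes"/"no"), which is how one set of weights covers both tasks.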


Author(s):  
Xiaoqiang Chi ◽  
Yang Xiang

Paraphrase generation is an important yet challenging task in NLP. Neural network-based approaches have achieved remarkable success in sequence-to-sequence (seq2seq) learning. Previous paraphrase generation work generally ignores syntactic information regardless of its availability, with the assumption that neural nets could learn such linguistic knowledge implicitly. In this work, we make an endeavor to probe into the efficacy of explicit syntactic information for the task of paraphrase generation. Syntactic information can appear in the form of dependency trees, which could be easily acquired from off-the-shelf syntactic parsers. Such tree structures could be conveniently encoded via graph convolutional networks (GCNs) to obtain more meaningful sentence representations, which could improve generated paraphrases. Through extensive experiments on four paraphrase datasets with different sizes and genres, we demonstrate the utility of syntactic information in neural paraphrase generation under the framework of seq2seq modeling. Specifically, our GCN-enhanced models consistently outperform their syntax-agnostic counterparts in multiple evaluation metrics.

