Unsupervised Controllable Text Formalization

Author(s):  
Parag Jain ◽  
Abhijit Mishra ◽  
Amar Prakash Azad ◽  
Karthik Sankaranarayanan

We propose a novel framework for controllable natural language transformation. Recognizing that the requirement of a parallel corpus is practically unsustainable for controllable generation tasks, we introduce an unsupervised training scheme. The crux of the framework is a deep neural encoder-decoder that is reinforced with text-transformation knowledge through auxiliary modules (called scorers). These scorers, based on off-the-shelf language processing tools, decide the learning scheme of the encoder-decoder based on its actions. We apply this framework to the text-transformation task of formalizing an input text by improving its readability grade; the degree of formalization required can be controlled by the user at run-time. Experiments on public datasets demonstrate the efficacy of our model towards: (a) transforming a given text to a more formal style, and (b) varying the amount of formalness in the output text based on the specified input control. Our code and datasets are released for academic use.
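Since the framework controls formalization through readability grade, one concrete scorer can be illustrated with the standard Flesch-Kincaid grade-level formula. The sketch below is not the authors' implementation: the syllable counter is a crude vowel-group heuristic, and `formality_reward` is a hypothetical example of how such a scorer might steer the encoder-decoder toward a user-specified grade.

```python
import re

def count_syllables(word):
    # Heuristic: count groups of consecutive vowels (over-counts silent 'e').
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def fk_grade(text):
    """Flesch-Kincaid grade: 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words)) - 15.59)

def formality_reward(candidate, target_grade):
    """Hypothetical scorer signal: reward outputs whose grade nears the user's target."""
    return -abs(fk_grade(candidate) - target_grade)
```

Longer words and sentences raise the grade, so a more formal paraphrase of the same content scores higher.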

2008 ◽  
Vol 34 (4) ◽  
pp. 597-614 ◽  
Author(s):  
Trevor Cohn ◽  
Chris Callison-Burch ◽  
Mirella Lapata

Automatic paraphrasing is an important component in many natural language processing tasks. In this article we present a new parallel corpus with paraphrase annotations. We adopt a definition of paraphrase based on word alignments and show that it yields high inter-annotator agreement. As Kappa is suited to nominal data, we employ an alternative agreement statistic which is appropriate for structured alignment tasks. We discuss how the corpus can be usefully employed in evaluating paraphrase systems automatically (e.g., by measuring precision, recall, and F1) and also in developing linguistically rich paraphrase models based on syntactic structure.
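Evaluating paraphrase systems by precision, recall, and F1 over word alignments, as described above, reduces to set comparison of alignment links. A minimal sketch (not the authors' code), treating each alignment as a set of (source index, target index) pairs:

```python
def alignment_prf(gold, predicted):
    """Precision, recall, and F1 over sets of word-alignment links (i, j)."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)                    # links present in both alignments
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```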


2017 ◽  
Vol 13 (1) ◽  
Author(s):  
Ewa Rudnicka ◽  
Francis Bond ◽  
Łukasz Grabowski ◽  
Maciej Piasecki ◽  
Tadeusz Piotrowski

Abstract The paper focuses on the issue of creating equivalence links in the domain of bilingual computational lexicography. The existing interlingual links between plWordNet and Princeton WordNet synsets (sets of synonymous lexical units, i.e., lemma and sense pairs) are re-analysed from the perspective of equivalence types as defined in traditional lexicography and translation. Special attention is paid to cognitive and translational equivalents. A proposal for mapping lexical units is presented. Three types of links are defined: super-strong equivalence, strong equivalence and weak implied equivalence. The super-strong and strong equivalences share a common set of formal, semantic and usage features, with some of the feature values slightly loosened for strong equivalence. These links will be introduced manually by trained lexicographers. The sense-mapping will partly draw on the results of the existing synset mapping. The lexicographers will analyse lists of pairs of synsets linked by interlingual relations such as synonymy, partial synonymy, hyponymy and hypernymy. They will also consult bilingual dictionaries and check translation probabilities in a parallel corpus. The results of the proposed mapping have great application potential in the areas of natural language processing, translation and language learning.
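The three-way distinction could, in principle, be operationalized as feature agreement between the linked senses. The toy sketch below assumes a hypothetical feature inventory (part of speech, register, semantic field, countability); the actual mapping procedure described in the paper is manual and considerably richer.

```python
# Hypothetical feature inventory; the paper's formal/semantic/usage features are richer.
FEATURES = ["part_of_speech", "register", "semantic_field", "countability"]

def equivalence_type(src_sense, tgt_sense):
    """Classify a cross-lingual sense link by how many features the two senses share:
    all features -> super-strong; all but one (slightly loosened) -> strong;
    otherwise -> weak implied equivalence."""
    matches = sum(src_sense.get(f) == tgt_sense.get(f) for f in FEATURES)
    if matches == len(FEATURES):
        return "super-strong"
    if matches >= len(FEATURES) - 1:
        return "strong"
    return "weak-implied"
```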


2021 ◽  
pp. 147387162110388
Author(s):  
Mohammad Alharbi ◽  
Matthew Roach ◽  
Tom Cheesman ◽  
Robert S Laramee

In general, Natural Language Processing (NLP) algorithms exhibit black-box behavior: users supply input text and receive output with no explanation of how the results are obtained. To increase understanding and trust, users value transparent processing that explains derived results and makes the underlying routines understandable. Many NLP tools are opaque by design and provide no means to steer or manipulate the intermediate NLP steps. We present an interactive, customizable, visual framework that enables users to observe and participate in the NLP pipeline processes, explicitly manipulate the parameters of each step, and explore the result visually based on their preferences. The visible NLP (VNLP) pipeline design is then applied to a text similarity application to demonstrate the utility and advantages of a visible, transparent NLP pipeline in supporting users to understand and justify both the process and its results. We also report feedback on our framework from a modern languages expert.
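A transparent pipeline of this kind can be approximated by recording each step's name, parameters, and intermediate output, so that a UI can display them and let users adjust them. This is a minimal sketch with toy steps, not the VNLP implementation:

```python
class VisiblePipeline:
    """Each step is named and parameterized, and every intermediate output is
    recorded in a trace that a front-end could render and let users inspect."""

    def __init__(self):
        self.steps = []   # list of (name, function, parameters)
        self.trace = {}   # step name -> intermediate output

    def add_step(self, name, fn, **params):
        self.steps.append((name, fn, params))
        return self       # allow chaining

    def run(self, text):
        data = text
        for name, fn, params in self.steps:
            data = fn(data, **params)
            self.trace[name] = data   # expose the intermediate result
        return data

pipe = (VisiblePipeline()
        .add_step("lowercase", lambda t: t.lower())
        .add_step("tokenize", lambda t: t.split())
        .add_step("filter_short",
                  lambda toks, min_len=3: [t for t in toks if len(t) >= min_len]))
result = pipe.run("Transparent NLP is a pipeline")
```

Because every step's parameters live in `self.steps`, a user could change `min_len` and re-run, observing how each intermediate result shifts.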


2018 ◽  
Vol 18 (1) ◽  
pp. 18-24
Author(s):  
Sri Reski Anita Muhsini

The measurement of semantic similarity plays a very important role in several areas of Natural Language Processing (NLP), where its results often serve as the basis for further NLP tasks. One application is measuring multilingual semantic similarity between words. This work is motivated by the fact that many information-retrieval systems must now deal with multilingual texts or documents. A pair of words is considered semantically similar if the words share the same meaning or concept. In this study, semantic similarity is computed between words in two different languages, English and Spanish. The corpus used is the Europarl Parallel Corpus for English and Spanish. Word contexts are drawn from the Swadesh list, and the resulting similarity scores are compared against the SemEval 2017 Cross-lingual Semantic Similarity gold-standard dataset to measure correlation. The results show that the PMI method achieves a correlation of 0.5781 (Pearson) and 0.5762 (Spearman). It can be concluded that measuring cross-lingual semantic similarity with Pointwise Mutual Information (PMI) yields the best correlation. For future work, we recommend using other datasets to verify how effective the Pointwise Mutual Information (PMI) method is at measuring cross-lingual semantic similarity between words.
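The PMI score used above can be computed directly from co-occurrence counts over aligned sentence pairs: PMI(x, y) = log2(p(x, y) / (p(x) p(y))). A minimal sketch on a toy three-sentence parallel corpus, not the study's actual setup:

```python
import math
from collections import Counter

def pmi(pair_counts, x_counts, y_counts, total):
    """Return a PMI scorer over cross-lingual word pairs: log2(p(x,y) / (p(x)*p(y)))."""
    def score(x, y):
        p_xy = pair_counts[(x, y)] / total
        if p_xy == 0:
            return float("-inf")   # the pair never co-occurs in aligned sentences
        return math.log2(p_xy / ((x_counts[x] / total) * (y_counts[y] / total)))
    return score

# Toy aligned corpus: (English sentence, Spanish sentence) pairs.
corpus = [("the dog runs", "el perro corre"),
          ("the dog sleeps", "el perro duerme"),
          ("the cat sleeps", "el gato duerme")]

x_counts, y_counts, pair_counts = Counter(), Counter(), Counter()
for en, es in corpus:
    en_words, es_words = set(en.split()), set(es.split())
    x_counts.update(en_words)
    y_counts.update(es_words)
    pair_counts.update((xw, yw) for xw in en_words for yw in es_words)

score = pmi(pair_counts, x_counts, y_counts, total=len(corpus))
```

True translation pairs such as ("dog", "perro") co-occur in every aligned sentence containing either word, so their PMI is positive, while unrelated pairs score lower.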


Author(s):  
Noriko Ito ◽  
Toru Sugimoto ◽  
Yusuke Takahashi ◽  
Shino Iwashita ◽  
...  

We propose two computational models - one of a language within context based on systemic functional linguistic theory and one of context-sensitive language understanding. The model of a language within context called the Semiotic Base characterizes contextual, semantic, lexicogrammatical, and graphological aspects of input texts. The understanding process is divided into shallow and deep analyses. Shallow analysis consists of morphological and dependency analyses and word concept and case relation assignment, mainly by existing natural language processing tools and machine-readable dictionaries. Results are used to detect the contextual configuration of input text in contextual analysis. This is followed by deep analyses of lexicogrammar, semantics, and concepts, conducted by referencing a subset of resources related to the detected context. Our proposed models have been implemented in Java and verified by integrating them into such applications as dialog-based question-and-answer (Q&A).


2020 ◽  
Vol 34 (03) ◽  
pp. 3041-3048 ◽  
Author(s):  
Chuxu Zhang ◽  
Huaxiu Yao ◽  
Chao Huang ◽  
Meng Jiang ◽  
Zhenhui Li ◽  
...  

Knowledge graphs (KGs) serve as useful resources for various natural language processing applications. Previous KG completion approaches require a large number of training instances (i.e., head-tail entity pairs) for every relation. In reality, however, very few entity pairs are available for most relations, and this few-shot setting has not been well studied. Existing work on one-shot learning does not generalize to few-shot scenarios and does not fully exploit the supervisory information. In this work, we propose a novel few-shot relation learning model (FSRL) that aims at discovering facts of new relations with few-shot references. FSRL can effectively capture knowledge from heterogeneous graph structure, aggregate representations of few-shot references, and match similar entity pairs of the reference set for every relation. Extensive experiments on two public datasets demonstrate that FSRL outperforms the state-of-the-art.
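The matching step can be illustrated in miniature: aggregate the embeddings of the few-shot reference pairs for a relation, then rank candidate pairs by similarity to the aggregate. FSRL uses a learned aggregator and matching network; the mean-pooling and cosine similarity below are simplified stand-ins.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def aggregate(reference_pairs):
    """Mean-pool the few-shot reference pair embeddings for one relation
    (a stand-in for FSRL's learned recurrent aggregator)."""
    dim = len(reference_pairs[0])
    return [sum(p[i] for p in reference_pairs) / len(reference_pairs)
            for i in range(dim)]

def score_candidates(reference_pairs, candidates):
    """Rank candidate (name, pair-embedding) tuples against the aggregated reference."""
    ref = aggregate(reference_pairs)
    return sorted(candidates, key=lambda c: cosine(c[1], ref), reverse=True)

# Two references for one relation, two candidate head-tail pairs (toy embeddings).
refs = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1]]
cands = [("pair_a", [0.95, 0.15, 0.05]), ("pair_b", [0.0, 1.0, 0.9])]
ranked = score_candidates(refs, cands)
```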


2021 ◽  
pp. 1-31
Author(s):  
Miroslav Blšták ◽  
Viera Rozinajová

Abstract Automatic question generation is one of the most challenging tasks in Natural Language Processing. It requires "bidirectional" language processing: first, the system has to understand the input text (Natural Language Understanding), and then it has to generate questions, also in the form of text (Natural Language Generation). In this article, we introduce our framework for generating factual questions from unstructured text in the English language. It combines traditional linguistic approaches based on sentence patterns with several machine learning methods. We first obtain lexical, syntactic and semantic information from an input text, and then construct a hierarchical set of patterns for each sentence. A set of features is extracted from the patterns and used for the automated learning of new transformation rules. Our learning process is entirely data-driven because the transformation rules are obtained from a set of initial sentence–question pairs. The advantages of this approach lie in the simple addition of new transformation rules, which allows us to generate various types of questions, and in the continuous improvement of the system through reinforcement learning. The framework also includes a question evaluation module which estimates the quality of generated questions; it serves as a filter for selecting the best questions and eliminating incorrect ones or duplicates. We have performed several experiments to evaluate the correctness of the generated questions, and we have also compared our system with several state-of-the-art systems. Our results indicate that the quality of our generated questions surpasses that of the state-of-the-art systems, and that our questions are comparable to questions created by humans. We have also created and published an interface with all created data sets and evaluated questions, so it is possible to follow up on our work.
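Pattern-based transformation rules of the kind described can be sketched as template rewrites. The framework's learned rules operate over lexical, syntactic and semantic patterns, so the two regular-expression rules below are purely illustrative:

```python
import re

# Each rule pairs a sentence pattern with a question template — a toy stand-in
# for the transformation rules learned from sentence-question pairs.
RULES = [
    (re.compile(r"^(?P<subj>.+) was born in (?P<obj>.+)\.$"),
     "Where was {subj} born?"),
    (re.compile(r"^(?P<subj>.+) invented (?P<obj>.+)\.$"),
     "What did {subj} invent?"),
]

def generate_questions(sentence):
    """Apply every matching rule to a declarative sentence, yielding factual questions."""
    questions = []
    for pattern, template in RULES:
        m = pattern.match(sentence)
        if m:
            questions.append(template.format(**m.groupdict()))
    return questions
```

Adding a new question type is just adding a (pattern, template) pair, which mirrors the article's point that the rule set is simple to expand.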


Author(s):  
Nibedita Roy ◽  
Apurbalal Senapati

Machine Translation (MT) is the process of automatically converting one natural language into another while preserving the exact meaning of the input text in the output text. It is one of the classical problems in the Natural Language Processing (NLP) domain and has wide application in daily life. Although MT research for English and a few other languages is at a relatively advanced stage, for most languages it remains far from human-level translation performance. From a computational point of view, MT requires substantial preprocessing along with basic NLP tools and resources. This study gives an overview of the available basic NLP resources in the context of Assamese-English machine translation.


2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Zhenyu Yang ◽  
Lei Wang ◽  
Bo Ma ◽  
Yating Yang ◽  
Rui Dong ◽  
...  

Extracting entities and relations from unstructured sentences is one of the most widely studied tasks in the field of natural language processing. However, most existing works process entity and relation information in a fixed order and suffer from error propagation. In this paper, we introduce a relational triplet joint tagging network (RTJTN), which consists of a joint entity-and-relation tagging layer and a relational triplet judgment layer. In the joint tagging layer, instead of extracting entities and relations separately, we propose a tagging method that allows the model to extract them simultaneously from unstructured sentences, preventing error propagation. To solve the relation overlapping problem, we propose a relational triplet judgment network that judges the correct triples among a group of triples sharing the same relation in a sentence. In our experiments, we evaluate the network on the English public dataset NYT and the Chinese public datasets DuIE 2.0 and CMED. The F1 score of our model improves on the best baseline by 1.1, 6.0, and 5.1 points on the NYT, DuIE 2.0, and CMED datasets, respectively. An in-depth analysis of the model's performance on overlapping problems and sentence complexity shows that our model achieves gains in all cases.
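One way to see how a single tagging pass can carry both entity and relation information is to combine an entity role and a relation label in each tag, e.g. `B-SUBJ/works_for`. This is an illustrative toy scheme, not RTJTN's actual tagging layer:

```python
def decode(tokens, tags):
    """Decode (subject, relation, object) triplets from tags that jointly encode
    an entity role (SUBJ/OBJ) and a relation label in BIO style."""
    spans = {}                                  # role -> (entity text, relation)
    current_role, current_rel, buf = None, None, []
    for tok, tag in list(zip(tokens, tags)) + [(None, "O")]:  # sentinel flushes the last span
        if tag.startswith("B-"):
            if current_role:
                spans[current_role] = (" ".join(buf), current_rel)
            current_role, current_rel = tag[2:].split("/")
            buf = [tok]
        elif tag.startswith("I-") and current_role:
            buf.append(tok)
        else:
            if current_role:
                spans[current_role] = (" ".join(buf), current_rel)
            current_role, current_rel, buf = None, None, []
    if "SUBJ" in spans and "OBJ" in spans:
        return [(spans["SUBJ"][0], spans["SUBJ"][1], spans["OBJ"][0])]
    return []

tokens = ["John", "Smith", "works", "for", "Acme", "Corp"]
tags = ["B-SUBJ/works_for", "I-SUBJ/works_for", "O", "O",
        "B-OBJ/works_for", "I-OBJ/works_for"]
triplets = decode(tokens, tags)
```

Because entities and the relation are read off in the same pass, no entity-recognition errors feed into a separate relation-classification stage.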


Author(s):  
Yue Yuan ◽  
Xiaofei Zhou ◽  
Shirui Pan ◽  
Qiannan Zhu ◽  
Zeliang Song ◽  
...  

Joint extraction of entities and relations is an important task in natural language processing (NLP) that aims to capture all relational triplets from plain text. This is a big challenge because some of the triplets extracted from one sentence may have overlapping entities. Most existing methods perform entity recognition followed by relation detection between every possible entity pair, which usually involves numerous redundant operations. In this paper, we propose a relation-specific attention network (RSAN) to handle this issue. RSAN uses a relation-aware attention mechanism to construct a specific sentence representation for each relation, and then performs sequence labeling to extract the corresponding head and tail entities. Experiments on two public datasets show that our model can effectively extract overlapping triplets and achieves state-of-the-art performance.
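Relation-aware attention of the kind described can be sketched as dot-product attention where the query is a relation embedding and the keys/values are token vectors, yielding a per-relation sentence representation. A toy, pure-Python sketch, not the RSAN architecture:

```python
import math

def softmax(xs):
    m = max(xs)                                  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def relation_attention(token_vecs, relation_vec):
    """Weight each token by its dot product with the relation embedding, then
    return the weighted sum as a relation-specific sentence representation."""
    scores = [sum(t * r for t, r in zip(tok, relation_vec)) for tok in token_vecs]
    weights = softmax(scores)
    dim = len(token_vecs[0])
    sent = [sum(w * tok[i] for w, tok in zip(weights, token_vecs))
            for i in range(dim)]
    return sent, weights

# Toy 2-d token embeddings and a toy embedding for one relation.
tokens = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
born_in = [4.0, 0.0]
rep, weights = relation_attention(tokens, born_in)
```

Running the same tokens through a different relation embedding produces a different weighting, which is what lets sequence labeling find different head/tail entities per relation.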

