Towards equivalence links between senses in plWordNet and Princeton WordNet

2017
Vol 13 (1)
Author(s):
Ewa Rudnicka
Francis Bond
Łukasz Grabowski
Maciej Piasecki
Tadeusz Piotrowski

Abstract: The paper focuses on the issue of creating equivalence links in the domain of bilingual computational lexicography. The existing interlingual links between plWordNet and Princeton WordNet synsets (sets of synonymous lexical units, i.e. lemma and sense pairs) are re-analysed from the perspective of equivalence types as defined in traditional lexicography and translation. Special attention is paid to cognitive and translational equivalents. A proposal for mapping lexical units is presented. Three types of links are defined: super-strong equivalence, strong equivalence and weak implied equivalence. Super-strong and strong equivalence share a common set of formal, semantic and usage features, with some feature values slightly loosened for strong equivalence. These links will be introduced manually by trained lexicographers. The sense mapping will partly draw on the results of the existing synset mapping: the lexicographers will analyse lists of pairs of synsets linked by interlingual relations such as synonymy, partial synonymy, hyponymy and hypernymy. They will also consult bilingual dictionaries and check translation probabilities in a parallel corpus. The results of the proposed mapping have great application potential in natural language processing, translation and language learning.
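The abstract mentions checking translation probabilities in a parallel corpus as one source of evidence for sense links. The sketch below shows one way such probabilities might be estimated from word-aligned parallel data; the input format, the `estimate_translation_probs` function and the toy lemma pairs are illustrative assumptions, not part of the published method.

```python
# Hypothetical sketch: estimating lexical translation probabilities
# p(target_lemma | source_lemma) from word-aligned parallel sentences.
# The input format (a flat list of aligned lemma pairs) is an assumption
# made for illustration only.
from collections import Counter, defaultdict

def estimate_translation_probs(aligned_pairs):
    """aligned_pairs: iterable of (source_lemma, target_lemma) tuples
    harvested from a word-aligned parallel corpus."""
    cooc = defaultdict(Counter)   # source lemma -> Counter of target lemmas
    for src, tgt in aligned_pairs:
        cooc[src][tgt] += 1
    probs = {}
    for src, counter in cooc.items():
        total = sum(counter.values())
        probs[src] = {tgt: n / total for tgt, n in counter.items()}
    return probs

# Toy example with invented Polish-English lemma pairs:
pairs = [("pies", "dog"), ("pies", "dog"), ("pies", "hound"), ("kot", "cat")]
probs = estimate_translation_probs(pairs)
print(probs["pies"])  # e.g. {'dog': 0.67, 'hound': 0.33}
```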

2020
Vol 0 (0)
Author(s):
Fridah Katushemererwe
Andrew Caines
Paula Buttery

Abstract: This paper describes an endeavour to build natural language processing (NLP) tools for Runyakitara, a group of four closely related Bantu languages spoken in western Uganda. In contrast with major world languages such as English, for which corpora are comparatively abundant and NLP tools are well developed, computational linguistic resources for Runyakitara are in short supply. First, therefore, we need to collect corpora for these languages before we can proceed to the design of a spell-checker, grammar-checker and applications for computer-assisted language learning (CALL). We explain how we are collecting primary data for a new Runya Corpus of speech and writing, outline the design of a morphological analyser, and discuss how we can use these new resources to build NLP tools. We are initially working with Runyankore–Rukiga, a closely related pair of Runyakitara languages, and we frame our project in the context of NLP for low-resource languages, as well as CALL for the preservation of endangered languages. We put our project forward as a test case for the revitalization of endangered languages through education and technology.
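As a rough illustration of the kind of spell-checker that can be bootstrapped from a newly collected corpus, the sketch below ranks correction candidates by string similarity against a corpus-derived wordlist. The wordlist, the `suggest` function and the toy tokens are hypothetical; the paper does not specify this particular approach.

```python
# Hypothetical sketch: a corpus-driven spell-checker for a low-resource
# language. Candidates come from a frequency list built over the collected
# corpus and are ranked by similarity (via difflib) and corpus frequency.
import difflib
from collections import Counter

def build_wordlist(corpus_tokens):
    """Count token frequencies over the (tokenised) corpus."""
    return Counter(corpus_tokens)

def suggest(word, wordlist, n=3):
    """Return up to n plausible corrections for an unseen word form."""
    if word in wordlist:
        return [word]                      # already attested; accept as-is
    candidates = difflib.get_close_matches(word, wordlist.keys(), n=n, cutoff=0.7)
    # Prefer frequent forms among the close matches.
    return sorted(candidates, key=lambda w: -wordlist[w])

# Toy usage with invented Runyankore-like tokens:
wordlist = build_wordlist(["okushoma", "okushoma", "okugamba", "omwana"])
print(suggest("okushomaa", wordlist))  # likely ['okushoma']
```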


2008
Vol 34 (4)
pp. 597-614
Author(s):
Trevor Cohn
Chris Callison-Burch
Mirella Lapata

Automatic paraphrasing is an important component in many natural language processing tasks. In this article, we present a new parallel corpus with paraphrase annotations. We adopt a definition of paraphrase based on word alignments and show that it yields high inter-annotator agreement. Because Kappa is suited to nominal data rather than to alignments, we employ an alternative agreement statistic that is appropriate for structured alignment tasks. We discuss how the corpus can be usefully employed in evaluating paraphrase systems automatically (e.g., by measuring precision, recall, and F1) and in developing linguistically rich paraphrase models based on syntactic structure.
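The abstract mentions evaluating paraphrase systems with precision, recall, and F1 over word alignments. A minimal sketch of that computation, assuming alignments are represented as sets of (source index, target index) pairs, is shown below; the representation and the function name are illustrative assumptions.

```python
# Hypothetical sketch: precision, recall and F1 of predicted word
# alignments against gold alignments, each represented as a set of
# (source_index, target_index) pairs.
def alignment_prf(predicted, gold):
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Toy example:
gold = {(0, 0), (1, 2), (2, 1)}
pred = {(0, 0), (1, 2), (2, 3)}
print(alignment_prf(pred, gold))  # approximately (0.667, 0.667, 0.667)
```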


2003
Vol 17 (5)
Author(s):
Anne Vandeventer Faltin

This paper illustrates the usefulness of natural language processing (NLP) tools for computer-assisted language learning (CALL) through the presentation of three NLP tools integrated within a CALL software package for French. These tools are (i) a sentence structure viewer; (ii) an error diagnosis system; and (iii) a conjugation tool. The sentence structure viewer helps language learners grasp the structure of a sentence by providing lexical and grammatical information. This information is derived from a deep syntactic analysis. Two different outputs are presented. The error diagnosis system is composed of a spell checker, a grammar checker, and a coherence checker. The spell checker makes use of alpha-codes, phonological reinterpretation, and some ad hoc rules to provide correction proposals. The grammar checker employs constraint relaxation and phonological reinterpretation as diagnosis techniques. The coherence checker compares the underlying "semantic" structures of a stored answer and of the learner's input to detect semantic discrepancies. The conjugation tool is a resource with enhanced capabilities when put in an electronic format, enabling searches from inflected and ambiguous verb forms.
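The error diagnosis description mentions phonological reinterpretation as a correction technique. The sketch below illustrates the general idea with a deliberately crude phonetic key for French; the key rules, the tiny dictionary and the function names are simplifications invented for illustration and do not reproduce the system's actual alpha-codes.

```python
# Hypothetical sketch: correction proposals via a crude phonetic key.
# Words that sound alike map to the same key, so a misspelling such as
# "fotographie" can be matched to "photographie".
import re
from collections import defaultdict

# Invented, simplified grapheme-to-key rules for French (illustrative only).
RULES = [("ph", "f"), ("qu", "k"), ("c", "k"), ("y", "i"), ("h", "")]

def phonetic_key(word):
    key = word.lower()
    for pattern, repl in RULES:
        key = key.replace(pattern, repl)
    key = re.sub(r"(.)\1+", r"\1", key)   # collapse doubled letters
    return key

def build_index(dictionary):
    """Group dictionary words by their phonetic key."""
    index = defaultdict(list)
    for w in dictionary:
        index[phonetic_key(w)].append(w)
    return index

def proposals(word, index):
    """Return dictionary words sharing the misspelling's phonetic key."""
    return index.get(phonetic_key(word), [])

index = build_index(["photographie", "quand", "chat"])
print(proposals("fotographie", index))  # ['photographie']
```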


2011
Vol 4 (3)
Author(s):
Treveur Bretaudière
Samuel Cruz-Lara
Lina María Rojas Barahona

We present our current research activities associating automatic natural language processing with serious games and virtual worlds. Several interesting scenarios have been developed: language learning, natural language generation, multilingual information, emotion detection, real-time translation, and non-intrusive access to linguistic information such as definitions or synonyms. Part of our work has contributed to the specification of the Multi Lingual Information Framework (MLIF; ISO FDIS 24616, 2011). Standardization will grant stability, interoperability and sustainability to an important part of our research activities, in particular within the framework of representing and managing multilingual textual information.


Author(s):
Ming-Shin Lu
Yu-Chun Wang
Jen-Hsiang Lin
Chao-Lin Liu
...

Using techniques of natural language processing to assist the preparation of educational resources for language learning has become an important field. We report on two software systems designed to assist the tasks of test item translation and test item authoring. We built a software environment to help experts translate the test items for the Trends in International Mathematics and Science Study (TIMSS). Test items of TIMSS are prepared in American English and will be translated into traditional Chinese. We also built a software environment for composing test items for introductory Chinese courses. The system currently aids the preparation of four important categories of test items, and the resulting test items can be administered on the Internet.


2018
Vol 18 (1)
pp. 18-24
Author(s):  
Sri Reski Anita Muhsini

The implementation of semantic similarity measurement plays an important role in several areas of Natural Language Processing (NLP), where its results often serve as the basis for further NLP tasks. One application is the measurement of cross-lingual semantic similarity between words. This work is motivated by the fact that many information retrieval systems now have to deal with multilingual texts or documents. A pair of words is considered semantically similar if the two words share the same meaning or concept. In this study, semantic similarity is computed between words in two different languages, English and Spanish. The corpus used is the Europarl Parallel Corpus for English and Spanish. Context words are taken from the Swadesh list, and the resulting similarity scores are compared against the SemEval 2017 Cross-lingual Semantic Similarity gold-standard dataset to measure their correlation. The evaluation shows that the PMI method achieves a correlation of 0.5781 (Pearson) and 0.5762 (Spearman). It can be concluded that measuring cross-lingual semantic similarity with Pointwise Mutual Information (PMI) yields the best correlation in this setting. For future work, we recommend using other datasets to assess how effective the PMI measure is for cross-lingual word similarity.
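As a rough sketch of the PMI computation behind these scores, the code below builds positive-PMI vectors over a shared set of context words (as a Swadesh list provides) and compares words across languages with cosine similarity. The corpus format, function names and toy data are assumptions for illustration and do not reproduce the exact experimental setup.

```python
# Hypothetical sketch: PPMI vectors over shared context words, compared
# across languages with cosine similarity.
#   pmi(w, c) = log2( p(w, c) / (p(w) * p(c)) ), clipped at 0 (positive PMI)
import math
from collections import Counter

def ppmi_vector(target, contexts, sentences):
    """sentences: list of token lists from one language's corpus."""
    word_counts = Counter()
    cooc = Counter()
    total = len(sentences)
    for tokens in sentences:
        toks = set(tokens)
        word_counts.update(toks)
        if target in toks:
            for c in contexts:
                if c in toks:
                    cooc[c] += 1
    vec = []
    for c in contexts:
        if cooc[c] == 0 or word_counts[target] == 0 or word_counts[c] == 0:
            vec.append(0.0)
            continue
        p_wc = cooc[c] / total
        p_w = word_counts[target] / total
        p_c = word_counts[c] / total
        vec.append(max(0.0, math.log2(p_wc / (p_w * p_c))))
    return vec

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Toy usage: contexts are aligned across languages via a Swadesh-style list
# (water <-> agua, runs <-> corre).
en = [["the", "dog", "runs", "water"], ["dog", "drinks", "water"], ["the", "cat", "sleeps"]]
es = [["el", "perro", "corre", "agua"], ["perro", "bebe", "agua"], ["el", "gato", "duerme"]]
v_en = ppmi_vector("dog", ["water", "runs"], en)
v_es = ppmi_vector("perro", ["agua", "corre"], es)
print(cosine(v_en, v_es))  # close to 1.0 for this symmetric toy example
```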


2021
pp. 50-57
Author(s):
A. Katinskaya
J. Hou
R. Yangarber

One of the promising areas of computational linguistics is the development of educational applications. In this article, we show, using the Revita system as an example, how tools for automatic natural language processing can be used to build a language-learning service. Revita is being developed at the University of Helsinki. It is a tool for automatically creating grammar exercises based on texts that the teacher or the learner uploads to the system. Revita is intended for intermediate or advanced students; exercises are selected automatically for each student according to their level of language competence, for which the system analyses data on how each student performs the tasks. Revita also provides tools for organising students into groups, with which the teacher can easily share materials and exercises, as well as a mode for tracking students' progress.
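As a rough sketch of the kind of exercise generation described here (automatically turning an uploaded text into grammar practice), the code below blanks out selected word forms and checks a learner's answer. The tokenisation, target selection and function names are illustrative assumptions, not Revita's actual implementation.

```python
# Hypothetical sketch: generating a simple fill-in-the-blank (cloze)
# exercise from an uploaded text by hiding selected word forms.
import random
import re

def make_cloze(text, n_gaps=2, seed=0):
    """Return (exercise_text, answers) with n_gaps words blanked out."""
    tokens = re.findall(r"\w+|\W+", text)          # keep punctuation and spacing
    word_positions = [i for i, t in enumerate(tokens) if t.isalpha()]
    random.seed(seed)
    gaps = sorted(random.sample(word_positions, min(n_gaps, len(word_positions))))
    answers = []
    for k, i in enumerate(gaps, start=1):
        answers.append(tokens[i])
        tokens[i] = f"({k})_____"
    return "".join(tokens), answers

def check_answer(given, expected):
    """Case-insensitive comparison of the learner's answer with the key."""
    return given.strip().lower() == expected.strip().lower()

exercise, key = make_cloze("The teacher uploads a text and the system builds exercises.")
print(exercise)
print(key)
print(check_answer(key[0].upper(), key[0]))  # case-insensitive -> True
```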

