A NEW FORMAL DEFINITION OF LANGUAGE FOR NATURAL LANGUAGE PROCESSING

Author(s):  
Yi WANG ◽  
Xiaojing WANG
2008 ◽  
Vol 34 (4) ◽  
pp. 597-614 ◽  
Author(s):  
Trevor Cohn ◽  
Chris Callison-Burch ◽  
Mirella Lapata

Automatic paraphrasing is an important component in many natural language processing tasks. In this article we present a new parallel corpus with paraphrase annotations. We adopt a definition of paraphrase based on word alignments and show that it yields high inter-annotator agreement. As Kappa is suited to nominal data, we employ an alternative agreement statistic which is appropriate for structured alignment tasks. We discuss how the corpus can be usefully employed in evaluating paraphrase systems automatically (e.g., by measuring precision, recall, and F1) and also in developing linguistically rich paraphrase models based on syntactic structure.


Author(s):  
Virginie Goepp ◽  
Nada Matta ◽  
Emmanuel Caillaud ◽  
Françoise Feugeas

AbstractCommunity of Practice (CoP) efficiency evaluation is a great deal in research. Indeed, having the possibility to know if a given CoP is successful or not is essential to better manage it over time. The existing approaches for efficiency evaluation are difficult and time-consuming to put into action on real CoPs. They require either to evaluate subjective constructs making the analysis unreliable, either to work out a knowledge interaction matrix that is difficult to set up. However, these approaches build their evaluation on the fact that a CoP is successful if knowledge is exchanged between the members. It is the case if there are some interactions between the actors involved in the CoP. Therefore, we propose to analyze these interactions through the exchanges of emails thanks to Natural Language Processing. Our approach is systematic and semi-automated. It requires the e-mails exchanged and the definition of the speech-acts that will be retrieved. We apply it on a real project-based CoP: the SEPOLBE research project that involves different expertise fields. It allows us to identify the CoP core group and to emphasize learning processes between members with different backgrounds (Microbiology, Electrochemistry and Civil engineering).


Algorithms ◽  
2020 ◽  
Vol 13 (10) ◽  
pp. 262
Author(s):  
Fabio Massimo Zanzotto ◽  
Giorgio Satta ◽  
Giordano Cristini

Parsing is a key task in computer science, with applications in compilers, natural language processing, syntactic pattern matching, and formal language theory. With the recent development of deep learning techniques, several artificial intelligence applications, especially in natural language processing, have combined traditional parsing methods with neural networks to drive the search in the parsing space, resulting in hybrid architectures using both symbolic and distributed representations. In this article, we show that existing symbolic parsing algorithms for context-free languages can cross the border and be entirely formulated over distributed representations. To this end, we introduce a version of the traditional Cocke–Younger–Kasami (CYK) algorithm, called distributed (D)-CYK, which is entirely defined over distributed representations. D-CYK uses matrix multiplication on real number matrices of a size independent of the length of the input string. These operations are compatible with recurrent neural networks. Preliminary experiments show that D-CYK approximates the original CYK algorithm. By showing that CYK can be entirely performed on distributed representations, we open the way to the definition of recurrent layer neural networks that can process general context-free languages.


Terminology ◽  
2010 ◽  
Vol 16 (1) ◽  
pp. 30-50 ◽  
Author(s):  
Anne Condamines

The study of variation in terminology came to the fore over the last fifteen years in connection with advances in textual terminology. This new approach to terminology could be a way of improving the management of risk related to language use in the workplace and to contribute to the definition of a “linguistics of the workplace”. As a theoretical field of study, linguistics has hardly found any application in the workplace. Two of its applied branches, however, Sociolinguistics and Natural Language Processing (NLP) are relevant. Both deal with lexical phenomena, — i.e. terminology — sociolinguistics taking into account very subtle inter-individual variations and NLP being more interested in stability in the use. So, taking into account variations in building terminologies could be a means of considering both description and prescription, use and norm. This approach to terminology, which has been made possible thanks to NLP and Knowledge Engineering could be a way of meeting needs in the workplace concerning risk management related to language use.


2020 ◽  
pp. 3-17
Author(s):  
Peter Nabende

Natural Language Processing for under-resourced languages is now a mainstream research area. However, there are limited studies on Natural Language Processing applications for many indigenous East African languages. As a contribution to covering the current gap of knowledge, this paper focuses on evaluating the application of well-established machine translation methods for one heavily under-resourced indigenous East African language called Lumasaaba. Specifically, we review the most common machine translation methods in the context of Lumasaaba including both rule-based and data-driven methods. Then we apply a state of the art data-driven machine translation method to learn models for automating translation between Lumasaaba and English using a very limited data set of parallel sentences. Automatic evaluation results show that a transformer-based Neural Machine Translation model architecture leads to consistently better BLEU scores than the recurrent neural network-based models. Moreover, the automatically generated translations can be comprehended to a reasonable extent and are usually associated with the source language input.


Sign in / Sign up

Export Citation Format

Share Document