Terminology
Latest Publications


Total documents: 598 (five years: 53)
H-index: 16 (five years: 2)

Published by John Benjamins Publishing Company

ISSN: 0929-9971

Terminology, 2022
Author(s): Ayla Rigouts Terryn, Véronique Hoste, Els Lefever

Abstract As with many tasks in natural language processing, automatic term extraction (ATE) is increasingly approached as a machine learning problem. So far, most machine learning approaches to ATE broadly follow the traditional hybrid methodology: they first extract a list of unique candidate terms and then classify these candidates based on the predicted probability that they are valid terms. However, with the rise of neural networks and word embeddings, the next development in ATE might be towards sequential approaches, i.e., classifying each occurrence of each token within its original context. To test the validity of such approaches for ATE, two sequential methodologies were developed, evaluated, and compared: a feature-based conditional random fields classifier and an embedding-based recurrent neural network. An additional comparison was made with a machine learning interpretation of the traditional approach. All systems were trained and evaluated on identical data in multiple languages and domains to identify their respective strengths and weaknesses. The sequential methodologies proved to be valid approaches to ATE, and the neural network even outperformed the more traditional approach. Interestingly, a combination of multiple approaches can outperform each of them individually, which suggests new ways to push the state of the art in ATE.
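
To make the contrast with the candidate-list approach concrete: a sequential system such as the CRF or RNN described in this abstract outputs one label per token, and those labels still have to be turned into term occurrences and, for comparison with a candidate-list system, into unique term types. The sketch below assumes a simple IOB labelling scheme (B = first token of a term, I = inside a term, O = outside); the abstract does not state which scheme the authors used, and the function names are hypothetical.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start token index, end token index exclusive, surface text)


def iob_to_terms(tokens: List[str], labels: List[str]) -> List[Span]:
    """Turn per-token IOB labels into term occurrences.

    Assumed scheme: "B" starts a term, "I" continues it, "O" is outside.
    A stray "I" with no open term is treated as the start of a new term.
    """
    spans: List[Tuple[int, int]] = []
    start = None
    for i, label in enumerate(labels):
        if label == "B" or (label == "I" and start is None):
            if start is not None:  # a "B" closes any term already open
                spans.append((start, i))
            start = i
        elif label == "O" and start is not None:
            spans.append((start, i))
            start = None
        # an "I" with an open term simply extends it
    if start is not None:  # close a term that runs to the end of the sentence
        spans.append((start, len(labels)))
    return [(s, e, " ".join(tokens[s:e])) for s, e in spans]


def unique_terms(spans: List[Span]) -> List[str]:
    """Collapse occurrences into lower-cased unique term types, as in a candidate list."""
    return sorted({text.lower() for _, _, text in spans})


if __name__ == "__main__":
    tokens = ["the", "heart", "rate", "and", "blood", "pressure", "rose"]
    labels = ["O", "B", "I", "O", "B", "I", "O"]
    spans = iob_to_terms(tokens, labels)
    print(spans)
    print(unique_terms(spans))
```

Keeping occurrences and types separate reflects the difference the abstract draws: the sequential system decides per occurrence, so the same word sequence can be a term in one context and not in another, whereas a candidate list only ever contains types.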


Terminology, 2022
Author(s): Paolo Frassi

Abstract We propose to identify, for the French language, the senses and subsenses of travail in the field of international commerce. We also intend to present the main weak idioms containing this form, drawn from a corpus built ex novo within the DIACOM-fr project (Department of Foreign Languages, University of Verona), part of the Excellence Project “Le Digital Humanities applicate alle lingue e letterature straniere” (“Digital Humanities applied to foreign modern languages and literatures”). The senses and subsenses, as well as the weak idioms, classified on the basis of a number of semantic labels, will be represented in a draft terminological network.


Terminology, 2021
Author(s): Marta García González

Abstract The paper discusses the main results of an analysis of Spanish accounting terminology based on three different corpora. The analysis aimed to measure the level of terminological variation in Spanish accounting and to assess the suitability of accounting standards and companies’ financial statements for terminology extraction in the translation of accounting texts. The results show terminological variation of around 25% in international accounting standards and a considerable lack of consistency in the use of accounting terminology in the financial statements of Spanish companies, both in the Spanish originals and in their English translations.


Terminology, 2021
Author(s): Belén López Arroyo, Lucía Sanz Valdivieso

Abstract Specialized genres are bound to the communicative context of their discourse community. However, certain genres extend beyond one specific domain, remaining unchanged at different linguistic levels across domains. This seems to be the case with wine and olive oil tasting notes, since both analyze and evaluate sensory descriptions. The present study aims to describe and compare the lexical chunks of wine and olive oil tasting notes at a semantic level to determine whether the same genre varies across domains; we will not only describe, classify and compare lexical chunks, but also identify how this knowledge is structured and construed in the same genre in both domains. We will test our methodology on a corpus of English tasting notes from both domains written by three different writer profiles: professionals, amateurs and wineries/mills. Our results will be useful for scholars as well as for technical writers when writing tasting notes.


Terminology, 2021
Author(s): Gisle Andersen

Abstract The development of terminologies for domains where these are lacking is a time-consuming and costly task. This article takes a methodological perspective and addresses a general question: how can we, with limited funding, make maximal use of existing language resources to create a terminology at relatively low cost? Although Norway has been an important player in the maritime industries for many centuries, it has not prioritised the systematic development of an official maritime terminology. The article therefore focuses specifically on efforts to develop a national resource for maritime domains. It describes the creation of a corpus of popular science texts and a parallel corpus of technical texts. Six different term extraction methods are applied, including corpus-based statistical analyses of frequency, collocation and keyness, as well as bilingual term extraction. Finally, the pros and cons of each method are evaluated by means of a cost-benefit analysis.
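
Of the methods listed, keyness is perhaps the least self-explanatory: it compares how often a word occurs in a study corpus (for instance, maritime texts) with how often it occurs in a general reference corpus. The abstract does not say which keyness statistic was used; the sketch below assumes the log-likelihood statistic that is common in corpus linguistics, and the function names and toy corpora are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """Log-likelihood keyness score.

    a: frequency of the word in the study corpus, c: study corpus size (tokens)
    b: frequency in the reference corpus,         d: reference corpus size
    """
    e1 = c * (a + b) / (c + d)  # expected frequency in the study corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:  # 0 * log(0) is taken as 0
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def keywords(study: List[str], reference: List[str], top_n: int = 10) -> List[Tuple[str, float]]:
    """Rank words that are relatively more frequent in the study corpus by log-likelihood."""
    f_study, f_ref = Counter(study), Counter(reference)
    c, d = len(study), len(reference)
    scored = []
    for word, a in f_study.items():
        b = f_ref[word]  # Counter returns 0 for words it has not seen
        if a / c > b / d:  # keep positive keywords only
            scored.append((word, log_likelihood(a, b, c, d)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


if __name__ == "__main__":
    study = ["hull"] * 5 + ["the"] * 5      # toy "maritime" corpus
    reference = ["the"] * 10 + ["sea"] * 10  # toy general corpus
    print(keywords(study, reference))
```

With these toy corpora, "the" has the same relative frequency in both corpora and is filtered out, while "hull", which occurs only in the study corpus, is returned as a keyword candidate.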


Terminology, 2021, Vol. 27 (2), pp. 219-253
Author(s): Natalia Rivas, Gabriel Quiroz, John Jairo Giraldo

Abstract This paper analyzes nested-abbreviated terms from a linguistic perspective by describing their morphological, syntactic, and semantic features for terminology purposes. Nested-abbreviated terms can be characterized as abbreviated forms, either initialisms or acronyms, that contain another abbreviated term within their meaning. To carry out the analysis, 433 nested-abbreviated terms were extracted from two specialized English dictionaries. Data analysis showed that, from the morphological and semantic perspectives, nested-abbreviated terms behave like typical abbreviations. Important differences were found from a syntactic standpoint, where nested-abbreviated terms act as premodifiers in the noun phrase (NP) in 98.93% of cases. As this is the first time nested-abbreviated terms have been studied, they were not only described but also analyzed and defined. Although nested-abbreviated terms account for a relatively low percentage of the abbreviations in the dictionaries (less than 1%), the study found it highly relevant to examine this growing phenomenon in specialized languages, both for terminology extraction and for other purposes.
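
The characterization above can be turned into a simple check over an abbreviation dictionary: an entry counts as nested if its expansion contains another abbreviation listed in the same dictionary. This is a minimal sketch with a hypothetical toy dictionary; it is not the extraction procedure used in the paper, and because it matches tokens exactly and case-sensitively, it would miss inner abbreviations written in lower case or joined with slashes.

```python
import re
from typing import Dict, List


def find_nested(abbreviations: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each nested abbreviation to the inner abbreviations found in its expansion."""
    nested: Dict[str, List[str]] = {}
    for abbr, expansion in abbreviations.items():
        # Split the expansion into alphanumeric tokens, e.g. "DNS Security Extensions"
        tokens = re.findall(r"[A-Za-z0-9]+", expansion)
        # Exact, case-sensitive match against the other dictionary headwords
        inner = [t for t in tokens if t in abbreviations and t != abbr]
        if inner:
            nested[abbr] = inner
    return nested


if __name__ == "__main__":
    # Hypothetical toy dictionary, not data from the study
    toy = {
        "DNS": "Domain Name System",
        "DNSSEC": "DNS Security Extensions",
        "IP": "Internet Protocol",
        "VoIP": "Voice over IP",
        "TCP": "Transmission Control Protocol",
    }
    print(find_nested(toy))
```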


Terminology, 2021
Author(s): Ayla Rigouts Terryn, Véronique Hoste, Els Lefever

Abstract Automatic term extraction (ATE) is an important natural language processing task, both in its own right and as a preprocessing step for other tasks. In recent years, research has moved far beyond the traditional hybrid approach, in which candidate terms are extracted based on part-of-speech patterns and then filtered and sorted with statistical termhood and unithood measures. While there has been an explosion of different types of features and algorithms, including machine learning methodologies, some fundamental problems remain unsolved, such as the ambiguous nature of the concept “term”. This has been a hurdle in the creation of data for ATE, so datasets for both training and testing are scarce, and system evaluations are often limited and rarely cover multiple languages and domains. The ACTER corpora (Annotated Corpora for Term Extraction Research) contain manual term annotations in four domains and three languages and have been used to investigate a supervised machine learning approach to ATE, using a binary random forest classifier with multiple types of features. The resulting system, HAMLET (Hybrid Adaptable Machine Learning approach to Extract Terminology), provides detailed insights into its strengths and weaknesses. It highlights a certain unpredictability as an important drawback of machine learning methodologies, but also shows that the system appears to have learnt a robust definition of terms: it produces state-of-the-art results that contain few errors that are not (part of) terms in any way. Both the amount and the relevance of the training data have a substantial effect on results, and by varying the training data, it appears to be possible to adapt the system to various desired outputs, e.g., different types of terms. While certain issues, such as the extraction of rare terms and multiword terms, remain difficult, this study shows that supervised machine learning is a promising methodology for ATE.
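
The traditional hybrid approach that this abstract uses as its starting point can be sketched in two steps: extract candidates with a part-of-speech pattern, then rank them with a statistical measure. This sketch assumes pre-tagged input using Universal POS tags, a single pattern of adjectives and nouns ending in a noun, and raw frequency as a crude stand-in for proper termhood and unithood measures. HAMLET itself uses a random forest classifier over many feature types, which is not shown here; the function names are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

TaggedToken = Tuple[str, str]  # (token, Universal POS tag)


def candidates(sentence: List[TaggedToken]) -> List[str]:
    """Extract maximal runs matching the pattern (ADJ|NOUN)* NOUN."""
    found = []
    run: List[TaggedToken] = []
    for token, tag in sentence + [("", "END")]:  # sentinel flushes the last run
        if tag in ("ADJ", "NOUN"):
            run.append((token, tag))
            continue
        # The run has ended: drop trailing adjectives so the candidate ends in a noun
        while run and run[-1][1] != "NOUN":
            run.pop()
        if run:
            found.append(" ".join(t for t, _ in run).lower())
        run = []
    return found


def rank_by_frequency(sentences: List[List[TaggedToken]]) -> List[Tuple[str, int]]:
    """Rank candidate terms by raw corpus frequency (a placeholder for termhood measures)."""
    counts = Counter(c for s in sentences for c in candidates(s))
    return counts.most_common()


if __name__ == "__main__":
    corpus = [
        [("The", "DET"), ("cardiac", "ADJ"), ("output", "NOUN"), ("increased", "VERB")],
        [("Cardiac", "ADJ"), ("output", "NOUN"), ("is", "AUX"), ("low", "ADJ"), (".", "PUNCT")],
    ]
    print(rank_by_frequency(corpus))
```

In a real pipeline, the frequency step would be replaced by measures of termhood (how domain-specific a candidate is) and unithood (how strongly its parts belong together), which is exactly the filtering stage the abstract describes.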


Terminology, 2021
Author(s): Oi Yee Kwong

Abstract In this paper, we address the issue of evaluating commercial term extraction tools from the users’ perspective. We first revisit the gold standard approach commonly practised among researchers and discuss the challenges it may pose for end users, taking translators as a typical example. Given the very different motivations and needs of users and researchers, a user-driven approach is proposed as a variant of, and alternative to, the gold standard approach, allowing users to assess and understand the performance of commercial tools more objectively. Its feasibility and usefulness are demonstrated in a case study with SDL MultiTerm Extract, using a benchmarking dataset of English-Chinese financial terms produced by multiple annotators. The results also provide insights for the future development of term extractors designed for translators, which will hopefully generate more accurate candidates, offer more customised features, enable a better user experience, and enjoy wider popularity as computer-aided translation tools.
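
The gold standard approach discussed in this abstract typically scores a tool's candidate list against a reference term list using precision, recall and F1. The following is a minimal sketch, assuming exact matching after lower-casing and whitespace trimming (real evaluations may also credit partial or variant matches); the function name and example terms are hypothetical and are not taken from the paper's dataset.

```python
from typing import Dict, Iterable


def evaluate(extracted: Iterable[str], gold: Iterable[str]) -> Dict[str, float]:
    """Precision, recall and F1 of an extracted term list against a gold standard."""
    ext = {t.strip().lower() for t in extracted}
    ref = {t.strip().lower() for t in gold}
    tp = len(ext & ref)  # candidates that also appear in the gold standard
    precision = tp / len(ext) if ext else 0.0
    recall = tp / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    tool_output = ["interest rate", "bond", "the market", "Equity"]
    gold_list = ["equity", "bond", "interest rate", "dividend", "yield"]
    print(evaluate(tool_output, gold_list))
```

The scores depend heavily on how the gold list was built, for example whether annotators' lists are merged or intersected, which is one reason a single research-oriented gold standard may not reflect what a particular translator needs.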

