Automatic term extraction

Terminology ◽  
2022 ◽  
Author(s):  
Ayla Rigouts Terryn ◽  
Véronique Hoste ◽  
Els Lefever

As with many tasks in natural language processing, automatic term extraction (ATE) is increasingly approached as a machine learning problem. So far, most machine learning approaches to ATE broadly follow the traditional hybrid methodology: they first extract a list of unique candidate terms and then classify these candidates based on the predicted probability that they are valid terms. However, with the rise of neural networks and word embeddings, the next development in ATE might be towards sequential approaches, i.e., classifying each occurrence of each token within its original context. To test the validity of such approaches for ATE, two sequential methodologies were developed, evaluated, and compared: one feature-based conditional random fields classifier and one embedding-based recurrent neural network. An additional comparison was made with a machine learning interpretation of the traditional approach. All systems were trained and evaluated on identical data in multiple languages and domains to identify their respective strengths and weaknesses. The sequential methodologies proved to be valid approaches to ATE, and the neural network even outperformed the more traditional approach. Interestingly, a combination of multiple approaches can outperform all of them separately, showing new ways to push the state of the art in ATE.
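To make the contrast between the two output formats concrete, the following minimal sketch turns per-token labels (the sequential view) into a list of unique terms (the candidate-list view). The IOB label scheme, the function name `terms_from_iob`, and the example sentence are assumptions made for illustration; the abstract does not state which labelling scheme the authors used.

```python
from typing import List


def terms_from_iob(tokens: List[str], labels: List[str]) -> List[str]:
    """Collect term occurrences from per-token IOB labels.

    Assumed scheme: "B" starts a term, "I" continues it, "O" is outside.
    """
    terms: List[str] = []
    current: List[str] = []
    for token, label in zip(tokens, labels):
        if label == "B":
            # A new term starts; close any term that is still open.
            if current:
                terms.append(" ".join(current))
            current = [token]
        elif label == "I" and current:
            current.append(token)
        else:
            # "O", or a stray "I" with no open term: close any open term.
            if current:
                terms.append(" ".join(current))
            current = []
    if current:
        terms.append(" ".join(current))
    return terms


# Hypothetical sentence with hypothetical labels from a sequence labeller.
tokens = ["Automatic", "term", "extraction", "relies", "on", "word", "embeddings", "."]
labels = ["B", "I", "I", "O", "O", "B", "I", "O"]

occurrences = terms_from_iob(tokens, labels)
# Lowercasing and deduplicating gives the list-of-unique-terms view.
unique_terms = sorted({term.lower() for term in occurrences})
```

A sequential system is scored on every labelled occurrence, whereas the list-based view only keeps each term once, which is why the two kinds of system need a conversion step like this before they can be compared on the same data.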


2021 ◽  
Vol 19 (2) ◽  
pp. 5-16
Author(s):  
E. P. Bruches ◽  
T. V. Batura

We propose a method for extracting scientific terms from Russian texts based on weakly supervised learning. This approach does not require a large amount of hand-labeled data. To implement the method, we collected a list of terms semi-automatically and then annotated the texts of scientific articles with these terms. We used these annotated texts to train a model, and then used the model's predictions on another part of the text collection to extend the training set. A second model was trained on both text collections: the one annotated with the dictionary and the one annotated by the first model. The results show that additional data, even when annotated automatically, improves the quality of scientific term extraction.
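The dictionary annotation step described in this abstract could look roughly like the sketch below, which labels tokens by greedy longest match against a term list. The function name `annotate_with_dictionary`, the B/I/O labels, the lowercase matching, and the English example are all assumptions for illustration; the authors' actual matching procedure for Russian text (for instance, any lemmatization) is not described here.

```python
from typing import List, Set, Tuple


def annotate_with_dictionary(tokens: List[str],
                             term_dict: Set[Tuple[str, ...]]) -> List[str]:
    """Produce weak B/I/O labels by greedy longest match against a term list."""
    max_len = max((len(term) for term in term_dict), default=0)
    lowered = [token.lower() for token in tokens]
    labels = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest possible dictionary entry starting at position i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(lowered[i:i + n]) in term_dict:
                matched = n
                break
        if matched:
            labels[i] = "B"
            for j in range(i + 1, i + matched):
                labels[j] = "I"
            i += matched
        else:
            i += 1
    return labels


# Hypothetical term list and sentence, for illustration only.
term_dict = {("neural", "network"), ("term", "extraction"), ("corpus",)}
tokens = ["The", "neural", "network", "improves", "term", "extraction",
          "on", "this", "corpus"]
weak_labels = annotate_with_dictionary(tokens, term_dict)
```

Labels produced this way are noisy, since the dictionary misses unseen terms and ignores context. That is the gap the second training round is meant to narrow, by adding texts labelled with the first model's predictions.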


Author(s):  
Ngoc Tan Le ◽  
Fatiha Sadat

With the emergence of neural network-based approaches, research on information extraction has benefited from large-scale raw texts, which are leveraged through pre-trained embeddings and other data augmentation techniques to address challenges in natural language processing tasks. In this paper, we propose an approach that uses sequence-to-sequence neural network-based models for term extraction in low-resource domains. Our experiments on the multilingual ACTER dataset, provided in the LREC-TermEval 2020 shared task on automatic term extraction, demonstrated the effectiveness of the deep learning approach for automatic term extraction in low-data settings.


2021 ◽  
Vol 9 (1) ◽  
pp. 30-38
Author(s):  
Jurgita Mikelionienė ◽  
Jurgita Motiejūnienė

Artificial Intelligence (AI), as a multidisciplinary field, combines computer science, robotics and cognitive science, with a growing number of applications in diverse areas such as engineering, business, medicine, weather forecasting, industry, translation, natural language and linguistics. In Europe, interest in AI has been rising over the last decade. One of the greatest hurdles for researchers in the automated processing of technical documentation is the large amount of specific terminology. The aim of this research is to analyse semi-automatically extracted AI-related terminology and the most common phrases related to artificial intelligence in English and Lithuanian in terms of their structure, multidisciplinarity and connotation. Two programmes were chosen for the selection and analysis of terms in this study, namely SynchroTerm and SketchEngine. The paper presents the outcomes of an AI terminological project carried out with SynchroTerm and provides an analysis of a special corpus compiled in the field of artificial intelligence using the SketchEngine platform. The analysis of semi-automatic term extraction and corpus-based techniques for AI-related terminology revealed that AI, as a specialized domain, contains multidisciplinary terminology and is complex and dynamic. The empirical data show that context is essential for evaluating the concept under analysis and reveals the different connotations of a term.


Author(s):  
Carlos Periñán Pascual ◽  
Ricardo Mairal Usón

Following previous research on automatic term extraction, the primary aim of this paper is to propose a more robust and consistent framework of analysis for the comparative evaluation of term extractors. Among the different views of software quality outlined in ISO standards, our proposal focuses on the criterion of external quality, and in particular on the characteristics of functionality, usability and efficiency, together with the subcharacteristics of suitability, precision, operability and time behavior. The evaluation phase is completed by comparing four online open-access automatic term extractors: TermoStat, GaleXtract, BioTex and DEXTER. The latter resource forms part of the virtual functional laboratory for natural language processing (FUNK Lab) developed by our research group. Finally, the results obtained from the comparative analysis are discussed.
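As an illustration of how the output of term extractors is commonly scored, the sketch below computes precision, recall and F1 for an extracted term list against a gold-standard list. This is the standard information-retrieval definition, assumed here for illustration; the abstract does not say how the paper itself operationalizes precision. The function name, the case-insensitive matching and the example lists are hypothetical.

```python
from typing import Iterable, Tuple


def precision_recall_f1(extracted: Iterable[str],
                        gold: Iterable[str]) -> Tuple[float, float, float]:
    """Score a list of extracted terms against a gold-standard term list."""
    # Assumption: terms match if they are equal after lowercasing and trimming.
    ext = {term.lower().strip() for term in extracted}
    ref = {term.lower().strip() for term in gold}
    true_positives = len(ext & ref)
    precision = true_positives / len(ext) if ext else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical gold list and extractor output, for illustration only.
gold = ["term extraction", "corpus", "neural network", "word embedding"]
extracted = ["Term extraction", "corpus", "paper", "result"]
p, r, f = precision_recall_f1(extracted, gold)
```

Running the same function over the output of each extractor on a shared corpus gives directly comparable figures, provided the gold list and the matching rule are held constant.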

