term extraction
Recently Published Documents


TOTAL DOCUMENTS: 339 (FIVE YEARS 103)

H-INDEX: 16 (FIVE YEARS 4)

Terminology ◽  
2022 ◽  
Author(s):  
Ayla Rigouts Terryn ◽  
Véronique Hoste ◽  
Els Lefever

Abstract As with many tasks in natural language processing, automatic term extraction (ATE) is increasingly approached as a machine learning problem. So far, most machine learning approaches to ATE broadly follow the traditional hybrid methodology, by first extracting a list of unique candidate terms, and classifying these candidates based on the predicted probability that they are valid terms. However, with the rise of neural networks and word embeddings, the next development in ATE might be towards sequential approaches, i.e., classifying each occurrence of each token within its original context. To test the validity of such approaches for ATE, two sequential methodologies were developed, evaluated, and compared: one feature-based conditional random fields classifier and one embedding-based recurrent neural network. An additional comparison was added with a machine learning interpretation of the traditional approach. All systems were trained and evaluated on identical data in multiple languages and domains to identify their respective strengths and weaknesses. The sequential methodologies were proven to be valid approaches to ATE, and the neural network even outperformed the more traditional approach. Interestingly, a combination of multiple approaches can outperform all of them separately, showing new ways to push the state-of-the-art in ATE.
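The difference between the two framings in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions: the abstract does not say which label scheme the authors used, so the B/I/O encoding, the `iob_labels` and `candidate_list` helpers, and the sample sentence are all hypothetical.

```python
from typing import List, Tuple

def iob_labels(tokens: List[str], term_spans: List[Tuple[int, int]]) -> List[str]:
    """Sequential view: one label per token occurrence, in context.

    B = first token of a term, I = inside a term, O = outside any term.
    term_spans are hypothetical gold annotations given as
    (start, end) token indices with end exclusive.
    """
    labels = ["O"] * len(tokens)
    for start, end in term_spans:
        labels[start] = "B"
        for i in range(start + 1, end):
            labels[i] = "I"
    return labels

def candidate_list(tokens: List[str], term_spans: List[Tuple[int, int]]) -> List[str]:
    """Traditional view: a list of unique candidate terms, context discarded."""
    return sorted({" ".join(tokens[s:e]).lower() for s, e in term_spans})

# Hypothetical example sentence with the same term occurring twice.
tokens = "Heart failure is treated ; heart failure recurs".split()
spans = [(0, 2), (5, 7)]

labels = iob_labels(tokens, spans)
candidates = candidate_list(tokens, spans)
print(list(zip(tokens, labels)))
print(candidates)
```

In the sequential view both occurrences of the term are labelled separately, whereas the candidate list collapses them into a single entry that a classifier would then score once.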


2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Chunhua Yao ◽  
Xinyu Song ◽  
Xuelei Zhang ◽  
Weicheng Zhao ◽  
Ao Feng

Aspect-level sentiment analysis identifies the sentiment polarity of aspect terms in complex sentences, which is useful in a wide range of applications. It is a highly challenging task and attracts the attention of many researchers in the natural language processing field. In order to obtain a better aspect representation, a wide range of existing methods design complex attention mechanisms to establish the connection between entity words and their context. With the limited size of data collections in aspect-level sentiment analysis, mainly because of the high annotation workload, the risk of overfitting is greatly increased. In this paper, we propose a Shared Multitask Learning Network (SMLN), which jointly trains auxiliary tasks that are highly related to aspect-level sentiment analysis. Specifically, we use opinion term extraction due to its high correlation with the main task. Through a custom-designed Cross Interaction Unit (CIU), effective information of the opinion term extraction task is passed to the main task, with performance improvement in both directions. Experimental results on SemEval-2014 and SemEval-2015 datasets demonstrate the competitive performance of SMLN in comparison to baseline methods.


2021 ◽  
pp. 85-92
Author(s):  
Sigita Rackevičienė ◽  
Liudmila Mockienė ◽  
Andrius Utka ◽  
Aivaras Rokas

The aim of the paper is to present a methodological framework for the development of an English-Lithuanian bilingual termbase in the cybersecurity domain, which can be applied as a model for other language pairs and other specialised domains. It is argued that the presented methodological approach can ensure the creation of high-quality bilingual termbases even with limited available resources. The paper touches upon the methods and problems of dataset (corpora) compilation, terminology annotation, automatic bilingual term extraction (BiTE) and alignment, knowledge-rich context extraction, and linguistic linked open data (LLOD) technologies. The paper presents theoretical considerations as well as arguments for the effectiveness of the described methods. The theoretical analysis and a pilot study suggest that: 1) a combination of parallel and comparable corpora makes it possible to considerably expand the amount and variety of data sources that can be used for terminology extraction; this methodology is especially important for less-resourced languages, which often lack parallel data; 2) deep learning systems trained on manually annotated data (gold standard corpora) allow effective automation of the extraction of terminological data and metadata, which makes it possible to update termbases regularly with minimal manual input; 3) LLOD technologies make it possible to integrate the terminological data into the global linguistic data ecosystem and make it reusable, searchable and discoverable across the Web.


Author(s):  
Oliver Streiter ◽  
Natascia Ralli ◽  
Isabella Ties ◽  
Leonhard Voltmer

BISTRO is an online platform which supports the translation process in various phases. The phases which can be distinguished are the terminological preparation of the source text, the creation of terminological glossaries and the retrieval of related documents and their terminological elaboration. For this purpose BISTRO hyperlinks a terminology database with bilingual and trilingual corpora. Term tools such as term extraction (TE), term recognition (TR) and keyword-in-context (KWIC) may be applied to the query results, which consist of retrieved terms or corpus segments. BISTRO's architecture is open for new tools and contents, providing at the same time the interface for the management of the underlying data structure and the constant update of the terminological data.
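As a rough illustration of the keyword-in-context (KWIC) tool named above, the sketch below lists each occurrence of a keyword together with a fixed window of neighbouring tokens. It is not BISTRO's implementation: the `kwic` function, its `window` parameter and the sample text are assumptions made for the example.

```python
from typing import List, Tuple

def kwic(tokens: List[str], keyword: str, window: int = 3) -> List[Tuple[str, str, str]]:
    """Return (left context, keyword, right context) for every match.

    Matching is case-insensitive on single tokens; window is the number
    of tokens kept on each side of the keyword.
    """
    rows = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            rows.append((left, tok, right))
    return rows

# Hypothetical corpus segment.
text = "the contract binds the parties and the contract may be terminated"
rows = kwic(text.split(), "contract", window=2)
for left, kw, right in rows:
    print(f"{left:>10} [{kw}] {right}")
```

Aligning the keyword in a central column, as the `print` call does, is the conventional concordance layout and makes recurring left and right collocates easy to spot.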


JAMIA Open ◽  
2021 ◽  
Vol 4 (4) ◽  
Author(s):  
Chrysoula Zerva ◽  
Samuel Taylor ◽  
Axel J Soto ◽  
Nhung T H Nguyen ◽  
Sophia Ananiadou

Abstract The COVID-19 pandemic resulted in an unprecedented production of scientific literature spanning several fields. To facilitate navigation of the scientific literature related to various aspects of the pandemic, we developed an exploratory search system. The system is based on automatically identified technical terms, document citations, and their visualization, accelerating identification of relevant documents. It offers a multi-view interactive search and navigation interface, bringing together unsupervised approaches of term extraction and citation analysis. We conducted a user evaluation with domain experts, including epidemiologists, biochemists, medicinal chemists, and medicine students. In general, most users were satisfied with the relevance and speed of the search results. More interestingly, participants mostly agreed on the capacity of the system to enable exploration and discovery of the search space using the graph visualization and filters. The system is updated on a weekly basis and it is publicly available at http://www.nactem.ac.uk/cord/.


Tertium ◽  
2021 ◽  
Vol 6 (1) ◽  
pp. 257-283
Author(s):  
Agnieszka Rzepkowska

The paper discusses two approaches to compiling lists of labour-law basic terminology (BT): a frequency-based approach and a concept-based one. The purpose of the paper is to compare each of the methods based on two sets of basic terminology selected in accordance with them. Using the first method, terms are selected via an automatic search of keywords and terms and organised according to frequency with the use of Sketch Engine. The second means of term extraction is a concept-based approach in which terms are selected based on the table of contents of the Polish Labour Code, which, for the purposes of the study, is assumed to outline the terminological system of Polish labour law. The results of this research are reviewed from the viewpoint of terms’ frequency, the number of words they consist of, systemic relations between terms in the labour-law terminological system, and potential users and their needs. This has allowed the author to draw a few conclusions as to the characteristics of the approaches taken, and the applicability and usefulness of lists of BT compiled on their bases.
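The frequency-based approach can be pictured with a minimal sketch that counts word n-grams and ranks them by raw frequency. This is an assumption-laden simplification, not Sketch Engine's method: the `ngram_frequencies` helper, the choice of unigrams and bigrams, and the sample tokens are hypothetical.

```python
from collections import Counter
from typing import List

def ngram_frequencies(tokens: List[str], max_n: int = 2) -> Counter:
    """Count every contiguous n-gram of length 1..max_n in the token list."""
    counts: Counter = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts

# Hypothetical tokenised snippet from a labour-law text.
tokens = "employment contract termination of employment contract".split()
counts = ngram_frequencies(tokens)

# Rank candidates by frequency, most frequent first.
for candidate, freq in counts.most_common(5):
    print(freq, candidate)
```

A raw count like this favours short, common items, which is one reason the paper also compares term length and systemic relations when contrasting the two lists.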


Terminology ◽  
2021 ◽  
Author(s):  
Gisle Andersen

Abstract The development of terminologies for domains where these are lacking is a time-consuming and costly task. This article takes a methodological perspective and addresses a general question: how can we, with limited funding, make maximal use of existing language resources to create a terminology at a relatively low cost? Although an important player in the maritime industries for many centuries, Norway has not prioritised the systematic development of an official maritime terminology. The article therefore focuses specifically on efforts to develop a national resource for maritime domains. The article describes efforts to create a corpus of popular science and a parallel corpus of technical texts. Six different term extraction methods are applied. These include corpus-based statistical analyses of frequency, collocation and keyness, as well as bilingual term extraction. Finally, the pros and cons of each method are evaluated by means of a cost-benefit analysis.
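Keyness, one of the statistical analyses listed above, compares how often a word occurs in a domain corpus with how often it occurs in a reference corpus. The abstract does not say which keyness statistic was used; the sketch below assumes the widely used log-likelihood measure, and the function name and all counts are hypothetical.

```python
import math

def log_likelihood(freq_focus: int, freq_ref: int, size_focus: int, size_ref: int) -> float:
    """Log-likelihood keyness of a word.

    freq_focus / freq_ref: occurrences in the domain and reference corpus.
    size_focus / size_ref: total token counts of the two corpora.
    Higher values mean the relative frequencies differ more strongly.
    """
    total = size_focus + size_ref
    # Expected counts if the word were equally likely in both corpora.
    expected_focus = size_focus * (freq_focus + freq_ref) / total
    expected_ref = size_ref * (freq_focus + freq_ref) / total
    ll = 0.0
    if freq_focus > 0:
        ll += freq_focus * math.log(freq_focus / expected_focus)
    if freq_ref > 0:
        ll += freq_ref * math.log(freq_ref / expected_ref)
    return 2 * ll

# Hypothetical counts: a domain word versus an evenly distributed word.
domain_word = log_likelihood(50, 5, 1000, 1000)
general_word = log_likelihood(10, 10, 1000, 1000)
print(domain_word, general_word)
```

The statistic itself does not show direction, so in practice one would also check that the word is relatively more frequent in the domain corpus before treating it as a term candidate.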


2021 ◽  
Vol 65 (6) ◽  
Author(s):  
Chen Chen ◽  
Houfeng Wang ◽  
Qingqing Zhu ◽  
Junfei Liu
