term candidate

2021 ◽  
Vol 2078 (1) ◽  
pp. 012031
Author(s):  
Ani Song ◽  
Xiaoxia Jia ◽  
Wei Jiang

With the development of military intelligence, higher requirements are being placed on automatic term recognition in the military field. Military requirement documents use flexible and diverse naming and have no annotated corpus. To address this, the method in this paper uses an existing military-domain core database and matches the data set against it using the Aho-Corasick algorithm and word segmentation, so that the terms to be recognized in the data set can be divided into three types. The possible word-formation rules of military terms are summarized, and phrases in the documents that conform to these rules form the term candidate set. The core library and the TF-IDF method are used to calculate a value for each candidate term, and candidates whose value exceeds the threshold are selected iteratively as real terms. The experimental results show that the F1 value of this method reaches 0.719, which is better than the traditional C-value method. The proposed method can therefore achieve better automatic term recognition for unannotated military requirement documents.
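The abstract names Aho-Corasick as the algorithm used to match the data set against the core database, but gives no implementation details. The following is a minimal Python sketch of textbook Aho-Corasick multi-pattern matching, not the authors' code. The function names `build_automaton` and `find_all` and the sample strings are hypothetical, and matching is done on characters rather than on segmented words.

```python
from collections import deque


def build_automaton(patterns):
    """Build a trie with failure links over the given patterns (hypothetical helper)."""
    goto = [{}]   # goto[node][char] -> child node
    fail = [0]    # failure link of each node
    out = [[]]    # patterns that end at each node

    # Phase 1: insert every pattern into the trie.
    for pattern in patterns:
        node = 0
        for ch in pattern:
            if ch not in goto[node]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[node][ch] = len(goto) - 1
            node = goto[node][ch]
        out[node].append(pattern)

    # Phase 2: breadth-first pass to set failure links and merge outputs.
    queue = deque(goto[0].values())
    while queue:
        node = queue.popleft()
        for ch, child in goto[node].items():
            queue.append(child)
            f = fail[node]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            out[child] = out[child] + out[fail[child]]
    return goto, fail, out


def find_all(text, automaton):
    """Return (start_index, pattern) for every occurrence of any pattern in text."""
    goto, fail, out = automaton
    node = 0
    hits = []
    for i, ch in enumerate(text):
        while node and ch not in goto[node]:
            node = fail[node]
        node = goto[node].get(ch, 0)
        for pattern in out[node]:
            hits.append((i - len(pattern) + 1, pattern))
    return hits
```

For example, `find_all("ushers", build_automaton(["he", "she", "his", "hers"]))` returns `[(1, 'she'), (2, 'he'), (2, 'hers')]`. The text is scanned once regardless of how many patterns the dictionary contains, which is why the algorithm suits matching against a large term database.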


Terminology ◽  
2018 ◽  
Vol 24 (1) ◽  
pp. 122-147
Author(s):  
Mercè Vàzquez ◽  
Antoni Oliver

The identification of reliable terms from domain-specific corpora using computational methods is a task that has to be validated manually by specialists, which is a highly time-consuming activity. To reduce this effort and improve term candidate selection, we implemented the Token Slot Recognition method, a filtering method based on terminological tokens which is used to rank extracted term candidates from domain-specific corpora. This paper presents the implementation of the term candidate filtering method we developed, applied to linguistic and statistical approaches to automatic term extraction using several domain-specific corpora in different languages. We observed that the filtering method improves term candidate selection, ranking more terms at the top of the term candidate list than raw frequency does, and for statistical term extraction the improvement is between 15% and 25% in both precision and recall. Our analyses further revealed a reduction in the number of term candidates to be validated manually by specialists. In conclusion, the number of term candidates extracted automatically from domain-specific corpora has been reduced significantly using the Token Slot Recognition filtering method, so term candidates can be easily and quickly validated by specialists.
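The abstract describes the filter only as ranking candidates by terminological tokens, so the scoring below is an assumption rather than the published method. This hypothetical Python sketch scores a candidate by the share of its tokens that occupy the same slot (first, middle, or last) in some known reference term. The names `slot_of`, `build_slot_index`, and `rank_candidates`, and the sample terms, are invented for illustration.

```python
def slot_of(index, length):
    """Name the position of a token inside a term (assumed three-slot scheme)."""
    if index == 0:
        return "first"
    if index == length - 1:
        return "last"
    return "middle"


def build_slot_index(reference_terms):
    """Collect (token, slot) pairs observed in known reference terms."""
    slots = set()
    for term in reference_terms:
        tokens = term.lower().split()
        for i, token in enumerate(tokens):
            slots.add((token, slot_of(i, len(tokens))))
    return slots


def rank_candidates(candidates, reference_terms):
    """Sort candidates by the fraction of tokens found in a matching slot."""
    slots = build_slot_index(reference_terms)

    def score(candidate):
        tokens = candidate.lower().split()
        if not tokens:
            return 0.0
        hits = sum(
            (token, slot_of(i, len(tokens))) in slots
            for i, token in enumerate(tokens)
        )
        return hits / len(tokens)

    # sorted() is stable, so tied candidates keep their input order.
    return sorted(candidates, key=score, reverse=True)
```

With reference terms `["brake pad", "disc brake"]`, the candidates `["the car", "brake fluid", "brake pad"]` are reordered so that "brake pad" comes first and "the car" last, which is the kind of reordering that lets a specialist validate the top of the list first.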


Terminology ◽  
2014 ◽  
Vol 20 (1) ◽  
pp. 50-73 ◽  
Author(s):  
Gabriel Bernier-Colborne ◽  
Patrick Drouin

In this paper, we describe a methodology used to create a test corpus for the evaluation of term extractors. This methodology relies on term annotation: terms in a corpus on automotive engineering are selected based on specific criteria pertaining to the terminological setting as well as linguistic and formal properties of terms and term variations. The test corpus accounts for the variety of ways in which terms are realized in running text, and provides a means of automatically evaluating the relevance of term candidate lists produced by term extractors. Due to the XML annotation scheme used, the corpus can be customized, e.g. by filtering out some of the annotated terms based on the type of term or term variation, or frequency. In this paper, we focus on the methodological aspects of this work.
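To illustrate how an annotated test corpus can score a term extractor automatically, here is a minimal Python sketch. The XML element `term`, its `type` attribute, and the sample sentences are assumptions; the abstract does not specify the actual annotation scheme. The functions `gold_terms` and `precision_recall` are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Invented sample; the real corpus uses its own XML scheme.
SAMPLE = """<corpus>
  <s>The <term type="noun">brake caliper</term> presses the <term type="noun">brake pad</term>.</s>
  <s>Check the <term type="variant">pads</term> regularly.</s>
</corpus>"""


def gold_terms(xml_text, exclude_types=()):
    """Collect annotated terms, optionally filtering out some annotation types."""
    root = ET.fromstring(xml_text)
    return {
        element.text.strip().lower()
        for element in root.iter("term")
        if element.text and element.get("type") not in exclude_types
    }


def precision_recall(candidates, gold):
    """Compare an extractor's candidate list with the gold term set."""
    found = {candidate.lower() for candidate in candidates}
    true_positives = len(found & gold)
    precision = true_positives / len(found) if found else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall
```

Passing `exclude_types=("variant",)` mirrors the customization the abstract mentions: the same corpus yields a different gold set, and therefore different scores, depending on which kinds of terms or variations are kept.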


Terminology ◽  
2008 ◽  
Vol 14 (2) ◽  
pp. 204-229 ◽  
Author(s):  
Chunyu Kit ◽  
Xiaoyue Liu

Terminology as a set of concept carriers crystallizes our special knowledge about a subject. Automatic term recognition (ATR) plays a critical role in the processing and management of various kinds of information, knowledge and documents, e.g., knowledge acquisition via text mining. Measuring termhood properly is one of the core issues involved in ATR. This article presents a novel approach to termhood measurement for mono-word terms via corpus comparison, which quantifies the termhood of a term candidate as its rank difference in a domain and a background corpus. Our ATR experiments to identify legal terms in Hong Kong (HK) legal texts with the British National Corpus (BNC) as background corpus provide evidence to confirm the validity and effectiveness of this approach. Without any prior knowledge and ad hoc heuristics, it achieves a precision of 97.0% on the top 1000 candidates and a precision of 96.1% on the top 10% candidates that are most highly ranked by the termhood measure, illustrating a state-of-the-art performance on mono-word ATR in the field.
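The rank-difference idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the article's exact formula: ranks are taken from frequency order, normalized by vocabulary size, and words missing from the background corpus are given rank 0. Ties are broken alphabetically for determinism, which is also a simplification. The function names are hypothetical.

```python
from collections import Counter


def normalized_ranks(tokens):
    """Map each word to rank / vocabulary size; the most frequent word gets 1.0."""
    counts = Counter(tokens)
    # Ascending frequency, so a higher rank means a more frequent word.
    ordered = sorted(counts, key=lambda word: (counts[word], word))
    size = len(ordered)
    return {word: (i + 1) / size for i, word in enumerate(ordered)}


def rank_difference_termhood(domain_tokens, background_tokens):
    """Score each domain word by its normalized rank in the domain corpus
    minus its normalized rank in the background corpus."""
    domain = normalized_ranks(domain_tokens)
    background = normalized_ranks(background_tokens)
    return {
        word: rank - background.get(word, 0.0)
        for word, rank in domain.items()
    }
```

A word that ranks high in the domain corpus but low in the background corpus gets a large positive score, while common function words that rank high in both score near or below zero. Sorting the scores in descending order gives the candidate ranking on which a precision-at-top-N figure can be measured.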


1986 ◽  
Vol 15 (1) ◽  
pp. 3-19 ◽  
Author(s):  
Charles Weiss ◽  
Stan Jarvis

Interactive videodisc (IVD) technology, which combines the visual power of television with the flexibility of the microcomputer, offers attractive possibilities for applications to training and education in developing countries. For example, it can increase the efficiency of training, can teach students in scattered locations, and can supplement the work of live teachers when the latter are scarce. But IVD requires substantial front-end costs, as well as a local team with a unique blend of talents. The first trial applications in developing countries should provide a high pay-off in the relatively short term. Candidate applications drawn from the work of development assistance agencies include remedial science teaching for prospective college entrants, and training in the operation, maintenance, and repair of complicated machinery.

