automatic term recognition
Recently Published Documents


TOTAL DOCUMENTS: 33 (five years: 2)

H-INDEX: 10 (five years: 0)

2021 ◽  
Vol 2078 (1) ◽  
pp. 012031
Author(s):  
Ani Song ◽  
Xiaoxia Jia ◽  
Wei Jiang

Abstract With the development of military intelligence, higher requirements are placed on automatic term recognition in the military field. Because military requirement documents use flexible and diverse naming and lack an annotated corpus, the method of this paper uses an existing military-domain core database and matches the data set against it using the Aho-Corasick algorithm and word segmentation, so that the terms to be recognized in the data set can be divided into three types. Likely word-formation rules of military terms are summarized, and phrases that conform to these rules are collected from the documents as the candidate term set. The core database and the TF-IDF method are used to score the candidate terms, and candidates whose score exceeds a threshold are selected iteratively as real terms. The experimental results show that the F1 score of this method reaches 0.719, which is better than the traditional C-value method. The proposed method can therefore achieve good automatic term recognition for military requirement documents without annotation.
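The TF-IDF thresholding step the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's implementation: the candidate terms and toy documents below are invented, and TF here is simply corpus-wide frequency.

```python
import math
from collections import Counter

def tfidf_scores(candidates, documents):
    """Score each candidate term by TF-IDF over a small corpus.

    TF is the candidate's total frequency across the corpus; IDF is
    log(N / df), where df is the number of documents containing it.
    """
    n_docs = len(documents)
    tf = Counter()
    df = Counter()
    for doc in documents:
        for cand in candidates:
            count = doc.count(cand)
            if count:
                tf[cand] += count
                df[cand] += 1
    return {c: tf[c] * math.log(n_docs / df[c]) for c in candidates if df[c]}

# Toy requirement-document snippets (hypothetical, for illustration only).
docs = [
    "radar system requirement for target tracking radar",
    "communication system requirement and data link",
    "target tracking accuracy of the radar",
]
scores = tfidf_scores(["radar", "system requirement", "data link"], docs)
# Candidates scoring above a threshold are kept as real terms.
accepted = {c for c, s in scores.items() if s > 1.0}
```

In the paper this selection runs iteratively, with accepted terms feeding back into the core database; the sketch shows only a single pass.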


2021 ◽  
Vol 0 (0) ◽  
Author(s):  
Dominika Kováříková

Abstract The method of automatic term recognition based on machine learning focuses primarily on the most important quantitative term attributes. It successfully distinguishes terms from non-terms (with a success rate of more than 95%) and identifies the characteristic features of a term as a terminological unit. A single-word term can be characterized as a low-frequency word that occurs considerably more often in specialized texts than in non-academic texts, occurs in a small number of disciplines, and is unevenly distributed in the corpus, as is the distance between its successive occurrences. A multi-word term is a collocation consisting of low-frequency words that contains at least one single-word term. Because the method is based on quantitative features, the algorithms can be applied across multiple disciplines and in cross-lingual applications (verified on Czech and English).
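A few of the quantitative attributes the abstract mentions — relative frequency in specialized versus general texts, and document dispersion — can be computed directly. This is a hedged sketch with invented toy corpora; the actual feature set and classifier of the study are not reproduced here.

```python
def term_features(word, specialized_docs, general_docs):
    """Quantitative features of the kind the abstract describes:
    relative frequency in specialized vs. general texts, their ratio,
    and dispersion (share of specialized documents containing the word)."""
    spec_tokens = [t for d in specialized_docs for t in d.split()]
    gen_tokens = [t for d in general_docs for t in d.split()]
    spec_freq = spec_tokens.count(word) / max(len(spec_tokens), 1)
    gen_freq = gen_tokens.count(word) / max(len(gen_tokens), 1)
    dispersion = sum(word in d.split() for d in specialized_docs) / len(specialized_docs)
    ratio = spec_freq / gen_freq if gen_freq else float("inf")
    return {"spec_freq": spec_freq, "gen_freq": gen_freq,
            "spec_general_ratio": ratio, "dispersion": dispersion}

# Toy corpora: "phoneme" behaves like a single-word term (specialized only).
spec = ["the phoneme inventory of the dialect", "each phoneme is transcribed"]
gen = ["the weather is nice today", "a nice day in the city"]
feats = term_features("phoneme", spec, gen)
```

A classifier trained on such features would then separate terms from non-terms; uneven spacing between occurrences (mentioned in the abstract) would need positional data not modeled in this sketch.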


Terminology ◽  
2018 ◽  
Vol 24 (1) ◽  
pp. 41-65 ◽  
Author(s):  
Leonie Grön ◽  
Ann Bertels

Abstract Due to its specific linguistic properties, the language found in clinical records has been characterized as a distinct sublanguage. Even within the clinical domain, though, there are major differences in language use, which has led to more fine-grained distinctions based on medical fields and document types. However, previous work has mostly neglected the influence of term variation. By contrast, we propose to integrate the potential for term variation in the characterization of clinical sublanguages. By analyzing a corpus of clinical records, we show that the different sections of these records vary systematically with regard to their lexical, terminological and semantic composition, as well as their potential for term variation. These properties have implications for automatic term recognition, as they influence the performance of frequency-based term weighting.


Terminology ◽  
2015 ◽  
Vol 21 (2) ◽  
pp. 180-204 ◽  
Author(s):  
Malgorzata Marciniak ◽  
Agnieszka Mykowiecka

Domain corpora are often small, and even important terms may occur in them not as isolated maximal phrases but only within more complex constructions. Appropriate recognition of nested terms can thus influence both the content of the extracted candidate term list and its order. We propose a new method for identifying nested terms that combines two aspects: grammatical correctness and normalised pointwise mutual information (NPMI), computed for all bigrams in a given corpus. NPMI is typically used to recognize strong word connections, but our solution uses it to find the weakest points, suggesting the best place to divide a phrase into two parts. By creating at most two nested phrases in each step, we introduce a binary term structure. We test the impact of the proposed method, applied together with the C-value ranking method, on the automatic term recognition task performed on three corpora, two in Polish and one in English.
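The weakest-point splitting idea can be sketched with NPMI over toy counts. The counts, vocabulary, and phrase below are invented for illustration; the paper additionally checks grammatical correctness, which this sketch omits.

```python
import math

def npmi(bigram_counts, unigram_counts, total_bigrams, x, y):
    """Normalised PMI in [-1, 1] for the bigram (x, y)."""
    p_xy = bigram_counts[(x, y)] / total_bigrams
    p_x = unigram_counts[x] / total_bigrams  # unigram probability, approximated on the same scale
    p_y = unigram_counts[y] / total_bigrams
    pmi = math.log(p_xy / (p_x * p_y))
    return pmi / -math.log(p_xy)

def weakest_split(phrase, bigram_counts, unigram_counts, total):
    """Split a multi-word phrase at its lowest-NPMI bigram, yielding two
    nested candidates (one step of the binary term structure)."""
    words = phrase.split()
    scores = [npmi(bigram_counts, unigram_counts, total, a, b)
              for a, b in zip(words, words[1:])]
    i = scores.index(min(scores)) + 1
    return " ".join(words[:i]), " ".join(words[i:])

# Invented corpus statistics: "renal failure" is a strong collocation,
# "failure treatment" is weak, so the split lands between them.
bigram_counts = {("acute", "renal"): 8, ("renal", "failure"): 20,
                 ("failure", "treatment"): 2}
unigram_counts = {"acute": 30, "renal": 25, "failure": 40, "treatment": 60}
head, tail = weakest_split("acute renal failure treatment",
                           bigram_counts, unigram_counts, 1000)
```

Applying the split recursively to each half yields the binary term structure the abstract describes.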


2015 ◽  
Vol 41 (6) ◽  
pp. 336-349 ◽  
Author(s):  
N. A. Astrakhantsev ◽  
D. G. Fedorenko ◽  
D. Yu. Turdakov

2014 ◽  
Vol 1049-1050 ◽  
pp. 1544-1549
Author(s):  
Wen Xiong

Machine-aided human translation (MAHT) of patent abstracts is an important step in the deep processing of patent data, where terms have significant application value. This paper investigates automatic term recognition (ATR) and proposes a new hybrid method based on two-phase analysis and statistics to generate English candidate terms. Segments containing stop words are not simply discarded; instead, the second phase applies a rewriting method using beginning, ending, and inner patterns to process these segments. Generalized statistical measures, such as generalized mutual information (MI), the log-likelihood ratio (LLR), and the C-value, are then used to evaluate the candidates, filtering out low-scoring candidate terms and taking the intersection of the resulting sets. Experiments on randomly extracted patent abstract texts demonstrate the effectiveness of the method.
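Of the measures this abstract combines, the C-value is the one most specific to term recognition; a minimal sketch of it (following Frantzi and Ananiadou's formulation, with invented frequencies) is below. The MI and LLR components of the hybrid method are not reproduced here.

```python
import math

def c_value(term, freq, longer_terms):
    """C-value termhood: long, frequent candidates score high, but
    frequency inherited from longer candidates that nest this term is
    discounted.  `freq` maps candidate strings to corpus frequencies;
    `longer_terms` lists candidates containing `term` as a substring."""
    weight = math.log2(len(term.split()))  # some variants use log2(|a| + 1) to cover unigrams
    if not longer_terms:
        return weight * freq[term]
    nested = sum(freq[t] for t in longer_terms)
    return weight * (freq[term] - nested / len(longer_terms))

# Invented frequencies: "mutual information" occurs 10 times, 4 of which
# are inside the longer candidate "generalized mutual information".
freq = {"mutual information": 10, "generalized mutual information": 4}
full_score = c_value("generalized mutual information", freq, [])
nested_score = c_value("mutual information", freq, ["generalized mutual information"])
```

In the hybrid method, candidates would also need to clear the MI and LLR filters, with the final term list taken as the intersection of the three.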


Corpora ◽  
2014 ◽  
Vol 9 (1) ◽  
pp. 83-107 ◽  
Author(s):  
María José Marín

Specialised texts are characterised by, amongst other features, the presence of terminology which conveys domain-specific concepts that are essential for the specialist who is interested in analysing such texts. Automatic Term Recognition (ATR) methods are employed to identify those terms automatically, which is especially helpful in view of the large size of corpora nowadays. However, they tend to concentrate on the identification of Multi-Word Terms (MWTs), neglecting Single-Word Terms (SWTs) to a certain extent. This might be related to the greater number of the former found in fields such as biomedicine. However, so far as legal English is concerned, testing has shown that SWTs represent 65.22 percent of the items in the specialised glossary employed for the evaluation of the ATR methods examined herein. This paper presents the evaluation of five SWT recognition methods, namely, those of Chung (2003), Drouin (2003), Kit and Liu (2008), Keywords (2008), and TF-IDF (term frequency-inverse document frequency). These were tested on the United Kingdom Supreme Court Corpus (UKSCC), a legal corpus of 2.6 million words which was compiled for this purpose. The results indicate that Drouin's TermoStat software is the best performing method, achieving 73.45 percent precision on the top 2,000 candidate terms.
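One family of SWT recognition methods evaluated here ranks words by how much more frequent they are in the specialised corpus than in a reference corpus, in the spirit of Chung (2003). The sketch below uses invented counts and corpus sizes; it is not the paper's evaluation setup.

```python
def keyness_ratio(word, spec_freq, ref_freq, spec_total, ref_total):
    """Frequency-ratio keyness: relative frequency in the specialised
    corpus divided by relative frequency in the reference corpus.
    Words absent from the reference corpus rank highest."""
    spec_rel = spec_freq.get(word, 0) / spec_total
    ref_rel = ref_freq.get(word, 0) / ref_total
    return spec_rel / ref_rel if ref_rel else float("inf")

# Toy counts: "tort" is legal-only, "claimant" is heavily skewed legal,
# "house" is ordinary general vocabulary.
spec = {"tort": 120, "claimant": 300, "house": 40}
ref = {"house": 900, "claimant": 10}
ranked = sorted(spec, reverse=True,
                key=lambda w: keyness_ratio(w, spec, ref, 100_000, 1_000_000))
```

A cutoff on the ratio (or on the ranked list) then separates candidate SWTs from general vocabulary, which is where precision figures like those reported here are measured.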


Author(s):  
Ioannis Korkontzelos ◽  
Sophia Ananiadou

Automatic extraction of metadata from free text is key to digesting stored literature information, especially in dynamic and rapidly evolving fields such as biomedicine. Moreover, more and more applications depend heavily on knowledge and ontologies. Successfully recognizing or extracting terms and their relations in scientific and technical documents without human intervention is crucial to semantically structuring literature and populating ontologies. This task has been recognized as the bottleneck in exploiting fields that involve complex and dynamically changing terms, and has thus become an important research topic in Natural Language Processing. This chapter presents a brief but complete overview of automatic term recognition techniques and discusses a number of crucial practical issues. Subsequently, it focuses on evaluation, discusses available resources, and highlights a number of applications.


Author(s):  
Udo Kruschwitz ◽  
Nick Webb ◽  
Richard Sutcliffe

The theme of this chapter is the improvement of Information Retrieval and Question Answering systems by the analysis of query logs. Two case studies are discussed. The first describes an intranet search engine working on a university campus which can present sophisticated query modifications to the user. It does this via a hierarchical domain model built using multi-word term co-occurrence data. The usage log was analysed using mutual information scores between a query and its refinement, between a query and its replacement, and between two queries occurring in the same session. The results can be used to validate refinements in the domain model, and to suggest replacements such as domain-dependent spelling corrections. The second case study describes a dialogue-based question answering system working over a closed document collection largely derived from the Web. Logs here are based around explicit sessions in which an analyst interacts with the system. Analysis of the logs has shown that certain types of interaction lead to increased precision of the results. Future versions of the system will encourage these forms of interaction. The conclusions of this chapter are firstly that there is a growing literature on query log analysis, much of it reviewed here, secondly that logs provide many forms of useful information for improving a system, and thirdly that mutual information measures taken with automatic term recognition algorithms and hierarchy construction techniques comprise one approach for enhancing system performance.

