Measuring mono-word termhood by rank difference via corpus comparison

Terminology ◽  
2008 ◽  
Vol 14 (2) ◽  
pp. 204-229 ◽  
Author(s):  
Chunyu Kit ◽  
Xiaoyue Liu

Terminology as a set of concept carriers crystallizes our special knowledge about a subject. Automatic term recognition (ATR) plays a critical role in the processing and management of various kinds of information, knowledge and documents, e.g., knowledge acquisition via text mining. Measuring termhood properly is one of the core issues involved in ATR. This article presents a novel approach to termhood measurement for mono-word terms via corpus comparison, which quantifies the termhood of a term candidate as its rank difference in a domain and a background corpus. Our ATR experiments to identify legal terms in Hong Kong (HK) legal texts with the British National Corpus (BNC) as background corpus provide evidence to confirm the validity and effectiveness of this approach. Without any prior knowledge or ad hoc heuristics, it achieves a precision of 97.0% on the top 1000 candidates and a precision of 96.1% on the top 10% of candidates that are most highly ranked by the termhood measure, illustrating state-of-the-art performance on mono-word ATR in the field.
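The rank-difference idea in the abstract above can be sketched in a few lines of Python. This is an illustration only, not the authors' implementation: the abstract does not give the exact formula, so the sketch assumes that ranks are normalized by vocabulary size, that the most frequent word receives the highest rank, and that words missing from the background corpus get a background rank of 0. The function names and toy sentences are hypothetical.

```python
from collections import Counter


def normalized_ranks(tokens):
    """Map each word to rank / vocabulary size (assumed normalization).

    Words are sorted by ascending frequency, so the most frequent word
    gets rank |V| and a normalized rank of 1.0. Ties are broken
    alphabetically, which is a simplification made for this sketch.
    """
    counts = Counter(tokens)
    ordered = sorted(counts, key=lambda w: (counts[w], w))
    size = len(ordered)
    return {word: (i + 1) / size for i, word in enumerate(ordered)}


def rank_difference_termhood(domain_tokens, background_tokens):
    """Score each domain word by its normalized rank difference.

    Assumption: a word absent from the background corpus has rank 0 there.
    """
    domain = normalized_ranks(domain_tokens)
    background = normalized_ranks(background_tokens)
    return {w: r - background.get(w, 0.0) for w, r in domain.items()}


# Toy data (made up): a "legal" snippet against a "general" snippet.
domain = "the court shall grant the injunction the court".split()
background = "the cat sat on the mat and the dog sat".split()

scores = rank_difference_termhood(domain, background)
for word in sorted(scores, key=scores.get, reverse=True):
    print(f"{word}\t{scores[word]:.2f}")
```

In this toy run, "the" is top-ranked in both corpora and so scores zero, while "court" is frequent in the domain text and absent from the background, which places it first in the candidate list.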

2002 ◽  
Vol 7 (1) ◽  
pp. 87-106
Author(s):  
Eugenia Eumeridou

Automatic term recognition is a natural language processing technology which is gaining increasing prominence in our information-overloaded society. Apart from its use for quick and efficient updating of terminologies and thesauri, it has also been used for machine translation, information retrieval, document indexing and classification as well as content representation. Until very recently, term identification techniques rested solely on the mapping of term linguistic properties onto computational procedures. However, actual terminological practice has shown that context is also important for term identification and interpretation as terms may appear in different forms depending on the situation of use. The aim of this article is to show the importance of contextual information for automatic term recognition by exploiting the relation between verbal semantic content and term occurrence in three subcorpora drawn from the British National Corpus.


2013 ◽  
Vol 17 (3) ◽  
pp. 445-463 ◽  
Author(s):  
JOHANNA GERWIN

Constructions involving a ditransitive verb, a direct theme object, and an indirect recipient object have been extensively studied – especially in the contexts of the ‘dative’ and the ‘benefactive alternations’, i.e. the alternations between a double-object construction (DOC) (e.g. She gave him a book) and a corresponding prepositional construction (PREP) either with to (e.g. She gave a book to him) or with for (e.g. She bought a book for him). The present study focuses on a ditransitive phenomenon which occurs in British dialects: when both objects are pronouns, three variants of encoding are possible: DOC (e.g. Give me it!), PREP (e.g. Give it to me!) and the alternative double-object construction (altDOC) (e.g. Give it me!). The regional distribution and diachronic development of the three constructions are traced using two corpora containing regional speech: the Freiburg English Dialect Corpus (FRED) and the online version of the British National Corpus (BNCweb). In concentrating on a dialect phenomenon, in taking language-external determinants of the ‘dative/benefactive alternation’ into consideration, and in investigating these empirically, the present study takes a novel approach to the much-discussed topic of ditransitives in English.


2020 ◽  
pp. 007542422097914
Author(s):  
Karin Aijmer

Well has a long history and is found as an intensifier already in older English. It is argued that diachronically well has developed from its etymological meaning (‘in a good way’) on a cline of adverbialization to an intensifier and to a discourse marker. Well is replaced by other intensifiers in the fourteenth century but emerges in new uses in Present-Day English. The changes in frequency and use of the new intensifier are explored on the basis of a twenty-year time gap between the old British National Corpus (1994) and the new Spoken British National Corpus (2014). The results show that well increases in frequency over time and that it spreads to new semantic types of adjectives and participles, and is found above all in predicative structures with a copula. The emergence of a new well and its increase in frequency are also related to social factors such as the age, gender, and social class of the speakers, and the informal character of the conversation.


2014 ◽  
Vol 12 (4) ◽  
pp. 319-340
Author(s):  
Anu Koskela

This paper explores the lexicographic representation of a type of polysemy that arises when the meaning of one lexical item can either include or contrast with the meaning of another, as in the case of dog/bitch, shoe/boot, finger/thumb and animal/bird. A survey of how such pairs are represented in monolingual English dictionaries showed that dictionaries mostly represent as explicitly polysemous those lexical items whose broader and narrower readings are more distinctive and clearly separable in definitional terms. They commonly only represented the broader readings for terms that are in fact frequently used in the narrower reading, as shown by data from the British National Corpus.  


2021 ◽  
Vol 3 (1) ◽  
pp. 9-21
Author(s):  
Namkil Kang

The ultimate goal of this paper is to provide a comparative analysis of rely on and depend on in the Corpus of Contemporary American English (COCA) and the British National Corpus (BNC). The COCA clearly shows that the expression rely on government is the most preferred by Americans, followed by rely on people and rely on data. The COCA further indicates that the expression depend on slate is the most preferred by Americans, followed by depend on government and depend on people. The BNC shows, on the other hand, that the expression rely on others is the most preferred by the British, followed by rely on people and rely on friends. The BNC further indicates that depend on factors and depend on others are the most preferred by the British, followed by depend on age and depend on food. Finally, in the COCA, the nouns government, luck, welfare, people, information, state, fossil, water, family, oil, food, and things are linked to both rely on and depend on, but many nouns are still not linked to both of them. In the BNC, by contrast, only the nouns state, chance, government, and others are linked to both rely on and depend on, and many nouns are still not linked to both. It can thus be inferred that rely on is slightly different from depend on in its use.


2016 ◽  
Vol 6 (2) ◽  
pp. 109
Author(s):  
Bei Yang ◽  
Bin Chen

Semantic prosody is a concept that has been subject to considerable criticism and debate. One big concern is to what extent semantic prosody is domain or register-related. Previous studies reach the agreement that CAUSE has an overwhelmingly negative meaning in general English. Its semantic prosody remains controversial in academic writing, however, because of the size and register of the corpus used in different studies. In order to minimize the role that corpus choice has to play in determining the research findings, this paper uses sub-corpora from the British National Corpus to investigate the usage of CAUSE in different types of scientific writing. The results show that the occurrence of CAUSE is the highest in social science, less frequent in applied science, and the lowest in natural and pure science. Its semantic prosody is overwhelmingly negative in social science and applied science, and mainly neutral in natural and pure science. It seems that the verb CAUSE lacks its normal negative semantic prosody in contexts that do not refer to human beings. The implications of the findings for language learning are also discussed.


2015 ◽  
Vol 83 (2) ◽  
pp. 1519-1529 ◽  
Author(s):  
Shariq Mahmood Khan ◽  
R. Nilavalan ◽  
Abdulhafid F. Sallama

1997 ◽  
Vol 2 (1) ◽  
pp. 133-152 ◽  
Author(s):  
Paul Rayson ◽  
Geoffrey N. Leech ◽  
Mary Hodges

In this article, we undertake selective quantitative analyses of the demographically-sampled spoken English component of the British National Corpus (for brevity, referred to here as the "Conversational Corpus"). This is a subcorpus of c. 4.5 million words, in which speakers and respondents (see I below) are identified by such factors as gender, age, social group, and geographical region. Using a corpus analysis tool developed at Lancaster, we undertake a comparison of the vocabulary of speakers, highlighting those differences which are marked by a very high χ² value of difference between different sectors of the corpus according to gender, age, and social group. A fourth variable, that of geographical region of the United Kingdom, is not investigated in this article, although it remains a promising subject for future research. (As background we also briefly examine differences between spoken and written material in the British National Corpus [BNC].) This study is illustrative of the potentiality of the Conversational Corpus for future corpus-based research on social differentiation in the use of language. There are evident limitations, including (a) the reliance on vocabulary frequency lists and (b) the simplicity of the transcription system employed for the spoken part of the BNC. The conclusion of the article considers future advances in the research paradigm illustrated here.
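The chi-squared comparison of word frequencies between two sectors of a corpus, as described in the abstract above, can be illustrated with a small Python sketch. It is not the Lancaster tool: the abstract does not specify how the statistic was computed, so the sketch assumes a standard Pearson chi-squared test on a 2x2 table (occurrences of the word versus all other tokens, in each of two sectors). The function name and the counts are made up for illustration.

```python
def chi_square_2x2(freq_a, total_a, freq_b, total_b):
    """Pearson chi-squared for one word across two corpus sectors.

    freq_a, freq_b: occurrences of the word in sectors A and B.
    total_a, total_b: total token counts of sectors A and B.
    Assumes the word occurs at least once overall and is not the only
    word in the corpus, so that no expected count is zero.
    """
    observed = [
        [freq_a, total_a - freq_a],
        [freq_b, total_b - freq_b],
    ]
    row_sums = [total_a, total_b]
    col_sums = [freq_a + freq_b, (total_a - freq_a) + (total_b - freq_b)]
    grand_total = total_a + total_b

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / grand_total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


# Hypothetical counts: a word used 30 times per 100 tokens by one group
# of speakers and 10 times per 100 tokens by another.
print(chi_square_2x2(30, 100, 10, 100))
```

A word with the same relative frequency in both sectors yields a value of 0; larger values mark words whose usage differs more strongly between the two groups.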

