Measuring mono-word termhood by rank difference via corpus comparison

Terminology, as a set of concept carriers, crystallizes our specialized knowledge of a subject. Automatic term recognition (ATR) plays a critical role in processing and managing various kinds of information, knowledge, and documents, e.g., in knowledge acquisition via text mining. Measuring termhood properly is one of the core issues in ATR. This article presents a novel approach to measuring the termhood of mono-word terms via corpus comparison: the termhood of a term candidate is quantified as the difference between its ranks in a domain corpus and in a background corpus. Our ATR experiments identifying legal terms in Hong Kong (HK) legal texts, with the British National Corpus (BNC) as the background corpus, provide evidence of the validity and effectiveness of this approach. Without any prior knowledge or ad hoc heuristics, it achieves a precision of 97.0% on the top 1000 candidates and 96.1% on the top 10% of candidates as ranked by the termhood measure, demonstrating state-of-the-art performance in mono-word ATR.
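The rank-difference idea can be illustrated with a small sketch. It is not the authors' exact formula; it assumes that words are ranked by descending frequency, that ranks are divided by vocabulary size so corpora of different sizes are comparable, that a candidate absent from the background corpus receives the worst background rank (1.0), and that tokenization is plain whitespace splitting. The function names and toy sentences are hypothetical, and `precision_at_k` only mirrors the kind of top-k precision reported above.

```python
from collections import Counter


def normalized_ranks(tokens):
    """Map each word type to its frequency rank divided by vocabulary size.

    Assumption: rank 1 is the most frequent word; ties are broken
    alphabetically so the result is deterministic.
    """
    counts = Counter(tokens)
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    size = len(ordered)
    return {w: (i + 1) / size for i, w in enumerate(ordered)}


def termhood_by_rank_difference(domain_tokens, background_tokens):
    """Score each domain word as (background rank - domain rank).

    A word ranked high in the domain corpus but low in the background
    corpus gets a large positive score. Words missing from the background
    corpus get the worst normalized rank, 1.0 (an assumption of this sketch).
    """
    domain = normalized_ranks(domain_tokens)
    background = normalized_ranks(background_tokens)
    scores = {w: background.get(w, 1.0) - r for w, r in domain.items()}
    # Highest termhood first; ties broken alphabetically.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


def precision_at_k(ranked, gold_terms, k):
    """Share of the top-k candidates that appear in a gold-standard term list."""
    top = [w for w, _ in ranked[:k]]
    return sum(w in gold_terms for w in top) / len(top)


# Hypothetical toy corpora, whitespace-tokenized.
domain = "the court shall order the plaintiff to pay the defendant damages the court".split()
background = "the cat sat on the mat and the dog ate the food of the court".split()
ranked = termhood_by_rank_difference(domain, background)
print(ranked[:3])
print(precision_at_k(ranked, {"court", "damages", "defendant", "order", "plaintiff"}, 5))  # 0.8
```

On this toy data, frequent function words such as "the" sink to the bottom because they rank similarly high in both corpora, while domain-specific words rise to the top.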

The contribution of verbal semantic content towards term recognition

International Journal of Corpus Linguistics ◽

10.1075/ijcl.7.1.05eum ◽

2002 ◽

Vol 7 (1) ◽

pp. 87-106

Author(s):

Eugenia Eumeridou

Keyword(s):

Language Processing ◽

Contextual Information ◽

Semantic Content ◽

Document Indexing ◽

British National Corpus ◽

Computational Procedures ◽

Automatic Term Recognition ◽

Identification Techniques ◽

Term Identification ◽

National Corpus

Automatic term recognition is a natural language processing technology that is gaining prominence in our information-overloaded society. Apart from its use in quickly and efficiently updating terminologies and thesauri, it has also been applied to machine translation, information retrieval, document indexing and classification, and content representation. Until very recently, term identification techniques rested solely on mapping the linguistic properties of terms onto computational procedures. However, terminological practice has shown that context is also important for identifying and interpreting terms, since terms may appear in different forms depending on the situation of use. This article aims to show the importance of contextual information for automatic term recognition by exploiting the relation between verbal semantic content and term occurrence in three subcorpora drawn from the British National Corpus.

Give it me!: pronominal ditransitives in English dialects

English Language and Linguistics ◽

10.1017/s1360674313000117 ◽

2013 ◽

Vol 17 (3) ◽

pp. 445-463 ◽

Cited By ~ 7

Author(s):

JOHANNA GERWIN

Keyword(s):

Regional Distribution ◽

Online Version ◽

Double Object ◽

Double Object Construction ◽

Object Construction ◽

Novel Approach ◽

British National Corpus ◽

English Dialect ◽

Diachronic Development ◽

National Corpus

Constructions involving a ditransitive verb, a direct theme object, and an indirect recipient object have been extensively studied, especially in the context of the ‘dative’ and ‘benefactive alternations’, i.e. the alternations between a double-object construction (DOC) (e.g. She gave him a book) and a corresponding prepositional construction (PREP) with either to (e.g. She gave a book to him) or for (e.g. She bought a book for him). The present study focuses on a ditransitive phenomenon found in British dialects. When both objects are pronouns, three encoding variants are possible: DOC (e.g. Give me it!), PREP (e.g. Give it to me!) and the alternative double-object construction (altDOC) (e.g. Give it me!). The regional distribution and diachronic development of the three constructions are traced using two corpora containing regional speech: the Freiburg English Dialect Corpus (FRED) and the online version of the British National Corpus (BNCweb). By concentrating on a dialect phenomenon, taking language-external determinants of the ‘dative/benefactive alternation’ into account, and investigating these empirically, the present study takes a novel approach to the much-discussed topic of ditransitives in English.

A novel approach to Internet connectivity for mobile ad hoc networks

IET International Conference on Wireless Mobile and Multimedia Networks Proceedings (ICWMMN 2006) ◽

10.1049/cp:20061210 ◽

2006 ◽

Author(s):

Shuigen Yang ◽

Huachun Zhou ◽

Hongke Zhang ◽

Yajuan Qin ◽

Ping Dong

Keyword(s):

Ad Hoc Networks ◽

Mobile Ad Hoc Networks ◽

Ad Hoc ◽

Internet Connectivity ◽

Novel Approach ◽

Mobile Ad Hoc ◽

Hoc Networks

“That’s well good”: A Re-emergent Intensifier in Current British English

Journal of English Linguistics ◽

10.1177/0075424220979143 ◽

2020 ◽

pp. 007542422097914

Author(s):

Karin Aijmer

Keyword(s):

Social Class ◽

Fourteenth Century ◽

Social Factors ◽

British English ◽

Discourse Marker ◽

Time Gap ◽

British National Corpus ◽

Semantic Types ◽

Over Time ◽

National Corpus

Well has a long history and is already found as an intensifier in older English. It is argued that, diachronically, well developed from its etymological meaning (‘in a good way’) along a cline of adverbialization to an intensifier and to a discourse marker. Well was replaced by other intensifiers in the fourteenth century but has re-emerged in new uses in Present-Day English. Changes in the frequency and use of the new intensifier are explored across a twenty-year gap, comparing the old British National Corpus (1994) with the new Spoken British National Corpus (2014). The results show that well increases in frequency over time, spreads to new semantic types of adjectives and participles, and occurs above all in predicative structures with a copula. The emergence of the new well and its increase in frequency are also related to social factors such as the speakers' age, gender, and social class, and to the informal character of the conversation.

Inclusion, Contrast and Polysemy in Dictionaries: The Relationship between Theory, Language Use and Lexicographic Practice

Research in Language ◽

10.1515/rela-2015-0001 ◽

2014 ◽

Vol 12 (4) ◽

pp. 319-340

Author(s):

Anu Koskela

Keyword(s):

Language Use ◽

Lexical Item ◽

British National Corpus ◽

Lexical Items ◽

The Relationship ◽

National Corpus

This paper explores the lexicographic representation of a type of polysemy that arises when the meaning of one lexical item can either include or contrast with the meaning of another, as in the case of dog/bitch, shoe/boot, finger/thumb and animal/bird. A survey of how such pairs are represented in monolingual English dictionaries showed that dictionaries mostly represent as explicitly polysemous those lexical items whose broader and narrower readings are more distinctive and more clearly separable in definitional terms. For terms that are in fact frequently used in their narrower reading, as data from the British National Corpus show, dictionaries commonly represent only the broader reading.

A Corpora-Based Analysis of Rely on and Depend on

Journal of Critical Studies in Language and Literature ◽

10.46809/jcsll.v3i1.119 ◽

2021 ◽

Vol 3 (1) ◽

pp. 9-21

Author(s):

Namkil Kang

Keyword(s):

Comparative Analysis ◽

American English ◽

The Other ◽

Information State ◽

Other Hand ◽

British National Corpus ◽

National Corpus

The ultimate goal of this paper is to provide a comparative analysis of rely on and depend on in the Corpus of Contemporary American English (COCA) and the British National Corpus (BNC). The COCA shows that rely on government is the expression most preferred by Americans, followed by rely on people and rely on data. It further indicates that depend on slate is the expression most preferred by Americans, followed by depend on government and depend on people. The BNC, on the other hand, shows that rely on others is the expression most preferred by the British, followed by rely on people and rely on friends. It further indicates that depend on factors and depend on others are the most preferred by the British, followed by depend on age and depend on food. Finally, in the COCA the nouns government, luck, welfare, people, information, state, fossil, water, family, oil, food, and things are linked to both rely on and depend on, but many nouns are still not linked to both. In the BNC, by contrast, only the nouns state, chance, government, and others are linked to both rely on and depend on, and many nouns are still not linked to both. It can thus be inferred that rely on differs slightly from depend on in use.
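A comparison of this kind rests on two steps: ranking the nouns that follow each verb, and intersecting the two noun sets. The sketch below shows both on a hypothetical token list. It assumes the text is already lowercased and tokenized and simply takes the token immediately after "rely on" or "depend on" as the collocate; a real study would rely on part-of-speech tagging and lemmatization (e.g. to catch relies on or depended on), and this code does not reproduce the paper's COCA or BNC queries. The function name is hypothetical.

```python
from collections import Counter


def verb_on_collocates(tokens, verb):
    """Count the token directly following '<verb> on' (assumed to be the noun)."""
    return Counter(
        tokens[i + 2]
        for i in range(len(tokens) - 2)
        if tokens[i] == verb and tokens[i + 1] == "on"
    )


# Hypothetical, pre-tokenized toy text.
tokens = ("we rely on others and rely on friends but prices "
          "depend on factors and depend on others").split()

rely = verb_on_collocates(tokens, "rely")
depend = verb_on_collocates(tokens, "depend")

shared = sorted(set(rely) & set(depend))       # nouns linked to both verbs
rely_only = sorted(set(rely) - set(depend))    # nouns linked only to rely on
depend_only = sorted(set(depend) - set(rely))  # nouns linked only to depend on
print(rely.most_common(), shared, rely_only, depend_only)
```

`Counter.most_common()` gives the preference ranking for each verb, while the set operations separate shared collocates from those attested with only one verb.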

The Correlations Between Combinational Arrangements and Semantic Implications of Utterly in the British National Corpus

The Journal of Humanities and Social sciences 21 ◽

10.22143/hss21.12.6.25 ◽

2021 ◽

Vol 12 (6) ◽

pp. 349-360

Author(s):

Jungyull Lee

Keyword(s):

British National Corpus ◽

National Corpus

The Usage of CAUSE in Three Branches of Science

Higher Education Studies ◽

10.5539/hes.v6n2p109 ◽

2016 ◽

Vol 6 (2) ◽

pp. 109

Author(s):

Bei Yang ◽

Bin Chen

Keyword(s):

Social Science ◽

Language Learning ◽

Academic Writing ◽

Applied Science ◽

Human Beings ◽

Pure Science ◽

Research Findings ◽

British National Corpus ◽

Semantic Prosody ◽

National Corpus

Semantic prosody is a concept that has been subject to considerable criticism and debate. One major concern is the extent to which semantic prosody is domain- or register-related. Previous studies agree that CAUSE has an overwhelmingly negative meaning in general English. Its semantic prosody in academic writing, however, remains controversial, because the corpora used in different studies differ in size and register. To minimize the role that corpus choice plays in determining the research findings, this paper uses sub-corpora from the British National Corpus to investigate the usage of CAUSE in different types of scientific writing. The results show that CAUSE occurs most frequently in social science, less frequently in applied science, and least frequently in natural and pure science. Its semantic prosody is overwhelmingly negative in social science and applied science, and mainly neutral in natural and pure science. It seems that the verb CAUSE lacks its usual negative semantic prosody in contexts that do not refer to human beings. The implications of the findings for language learning are also discussed.
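Comparing how often a word occurs across sub-corpora of different lengths requires normalizing raw counts, commonly to occurrences per million words; semantic prosody is then often reported as the share of concordance lines judged negative, neutral, or positive. The sketch below shows both calculations. The hit counts, sub-corpus sizes, and labels are hypothetical (they are not the study's data), and the labels are assumed to have been assigned by a human annotator beforehand.

```python
from collections import Counter


def per_million(count, corpus_size):
    """Normalized frequency: occurrences per million words."""
    return count * 1_000_000 / corpus_size


def prosody_profile(labels):
    """Proportion of concordance lines carrying each prosody label."""
    counts = Counter(labels)
    total = len(labels)
    return {lab: counts[lab] / total for lab in ("negative", "neutral", "positive")}


# Hypothetical (hits, sub-corpus size in words) for CAUSE in three sub-corpora.
subcorpora = {
    "social science": (120, 400_000),
    "applied science": (60, 300_000),
    "natural and pure science": (30, 250_000),
}
rates = {name: per_million(hits, size) for name, (hits, size) in subcorpora.items()}
print(rates)

# Hypothetical hand-annotated concordance lines for one sub-corpus.
print(prosody_profile(["negative", "negative", "neutral", "positive"]))
```

Normalizing first matters because a larger sub-corpus would otherwise show more hits simply by being larger.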

A Novel Approach for Reliable Route Discovery in Mobile Ad-Hoc Network

Wireless Personal Communications ◽

10.1007/s11277-015-2461-8 ◽

2015 ◽

Vol 83 (2) ◽

pp. 1519-1529 ◽

Cited By ~ 8

Author(s):

Shariq Mahmood Khan ◽

R. Nilavalan ◽

Abdulhafid F. Sallama

Keyword(s):

Ad Hoc Network ◽

Ad Hoc ◽

Mobile Ad Hoc Network ◽

Route Discovery ◽

Novel Approach ◽

Mobile Ad Hoc

Social Differentiation in the Use of English Vocabulary

International Journal of Corpus Linguistics ◽

10.1075/ijcl.2.1.07ray ◽

1997 ◽

Vol 2 (1) ◽

pp. 133-152 ◽

Cited By ~ 62

Author(s):

Paul Rayson ◽

Geoffrey N. Leech ◽

Mary Hodges

Keyword(s):

Social Group ◽

Geographical Region ◽

Social Differentiation ◽

Future Research ◽

Analysis Tool ◽

Spoken English ◽

Transcription System ◽

Group A ◽

British National Corpus ◽

National Corpus

In this article, we undertake selective quantitative analyses of the demographically sampled spoken English component of the British National Corpus (for brevity, referred to here as the "Conversational Corpus"). This is a subcorpus of c. 4.5 million words, in which speakers and respondents (see I below) are identified by such factors as gender, age, social group, and geographical region. Using a corpus analysis tool developed at Lancaster, we compare the vocabulary of speakers, highlighting those differences that are marked by a very high χ² value between different sectors of the corpus according to gender, age, and social group. A fourth variable, the geographical region of the United Kingdom, is not investigated in this article, although it remains a promising subject for future research. (As background, we also briefly examine differences between spoken and written material in the British National Corpus [BNC].) This study illustrates the potential of the Conversational Corpus for future corpus-based research on social differentiation in the use of language. There are evident limitations, including (a) the reliance on vocabulary frequency lists and (b) the simplicity of the transcription system employed for the spoken part of the BNC. The conclusion of the article considers future advances in the research paradigm illustrated here.
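The χ² comparison of vocabulary between two sectors of a corpus can be sketched one word at a time as a 2×2 contingency table: occurrences of the word versus all other words, in each of the two sectors. This sketch assumes Pearson's χ² without Yates' correction and a cutoff of 10.83 (the critical value for one degree of freedom at p < 0.001); the article's exact statistic, tool, and threshold may differ. The function names and word counts below are hypothetical.

```python
from collections import Counter


def chi_square_2x2(freq1, size1, freq2, size2):
    """Pearson chi-square (no Yates correction) for one word in two corpora.

    Table rows are the two corpora; columns are (this word, all other words).
    """
    a, b = freq1, size1 - freq1
    c, d = freq2, size2 - freq2
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denominator


def distinctive_words(counts1, counts2, threshold=10.83):
    """Words whose chi-square between the two corpora meets the threshold."""
    size1, size2 = sum(counts1.values()), sum(counts2.values())
    scored = []
    for word in set(counts1) | set(counts2):
        # Counter returns 0 for words absent from one of the corpora.
        chi = chi_square_2x2(counts1[word], size1, counts2[word], size2)
        if chi >= threshold:
            scored.append((word, chi))
    return sorted(scored, key=lambda item: -item[1])


# Hypothetical word counts for two speaker groups (10,000 words each).
group_a = Counter({"lovely": 50, "the": 500, "and": 9450})
group_b = Counter({"lovely": 10, "the": 500, "and": 9490})
print(distinctive_words(group_a, group_b))
```

Here only "lovely" (about 26.7) exceeds the cutoff, while "the", with identical counts in both groups, scores 0. Note that χ² grows with corpus size, which is one reason later corpus-comparison work often prefers log-likelihood.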
