The contribution of verbal semantic content towards term recognition

2002 ◽  
Vol 7 (1) ◽  
pp. 87-106
Author(s):  
Eugenia Eumeridou

Automatic term recognition is a natural language processing technology that is gaining increasing prominence in our information-overloaded society. Apart from its use for the quick and efficient updating of terminologies and thesauri, it has also been applied to machine translation, information retrieval, document indexing and classification, and content representation. Until very recently, term identification techniques rested solely on mapping the linguistic properties of terms onto computational procedures. However, actual terminological practice has shown that context is also important for term identification and interpretation, as terms may appear in different forms depending on the situation of use. The aim of this article is to show the importance of contextual information for automatic term recognition by exploiting the relation between verbal semantic content and term occurrence in three subcorpora drawn from the British National Corpus.

Terminology ◽  
2008 ◽  
Vol 14 (2) ◽  
pp. 204-229 ◽  
Author(s):  
Chunyu Kit ◽  
Xiaoyue Liu

Terminology, as a set of concept carriers, crystallizes our specialized knowledge about a subject. Automatic term recognition (ATR) plays a critical role in the processing and management of many kinds of information, knowledge, and documents, e.g., knowledge acquisition via text mining. Measuring termhood properly is one of the core issues in ATR. This article presents a novel approach to termhood measurement for mono-word terms via corpus comparison, which quantifies the termhood of a term candidate as the difference between its frequency ranks in a domain corpus and in a background corpus. Our ATR experiments identifying legal terms in Hong Kong (HK) legal texts, with the British National Corpus (BNC) as the background corpus, confirm the validity and effectiveness of this approach. Without any prior knowledge or ad hoc heuristics, it achieves a precision of 97.0% on the top 1000 candidates and 96.1% on the top 10% of candidates ranked most highly by the termhood measure, illustrating state-of-the-art performance on mono-word ATR.
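The rank-difference idea behind this termhood measure can be sketched in a few lines. Everything below is invented for illustration: the toy legal "corpus", the background "corpus", and the normalization choice are assumptions, not the authors' implementation or data.

```python
# Minimal sketch of rank-difference termhood: a candidate's score is its
# normalized frequency rank in a domain corpus minus its normalized rank
# in a background corpus. Toy data only; not the paper's implementation.
from collections import Counter

def rank_map(tokens):
    """Map each word to a normalized rank: 1.0 = most frequent, ~0 = rarest."""
    freq = Counter(tokens)
    ordered = [w for w, _ in freq.most_common()]
    n = len(ordered)
    return {w: 1.0 - i / n for i, w in enumerate(ordered)}

def termhood(word, domain_tokens, background_tokens):
    dom = rank_map(domain_tokens)
    bg = rank_map(background_tokens)
    # Words absent from the background corpus get rank 0 there, so words
    # prominent in the domain but rare in general English score highest.
    return dom.get(word, 0.0) - bg.get(word, 0.0)

domain = "plaintiff court plaintiff statute court plaintiff the the".split()
background = "the the the court of and a of the and".split()

print(termhood("plaintiff", domain, background))  # high: frequent only in domain
print(termhood("the", domain, background))        # low: frequent everywhere
```

Ranking candidates by this score puts domain-specific vocabulary at the top and function words at the bottom, which is the behavior the precision figures above reflect.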


2002 ◽  
Vol 28 (3) ◽  
pp. 357-388 ◽  
Author(s):  
Maria Lapata

This article addresses the interpretation of nominalizations, a particular class of compound nouns whose head noun is derived from a verb and whose modifier is interpreted as an argument of this verb. Any attempt to automatically interpret nominalizations needs to take into account: (a) the selectional constraints imposed by the nominalized compound head, (b) the fact that the relation of the modifier and the head noun can be ambiguous, and (c) the fact that these constraints can be easily overridden by contextual or pragmatic factors. The interpretation of nominalizations poses a further challenge for probabilistic approaches since the argument relations between a head and its modifier are not readily available in the corpus. Even an approximation that maps the compound head to its underlying verb provides insufficient evidence. We present an approach that treats the interpretation task as a disambiguation problem and show how we can “re-create” the missing distributional evidence by exploiting partial parsing, smoothing techniques, and contextual information. We combine these distinct information sources using Ripper, a system that learns sets of rules from data, and achieve an accuracy of 86.1% (over a baseline of 61.5%) on the British National Corpus.


2021 ◽  
pp. 1-17
Author(s):  
J. Shobana ◽  
M. Murali

Text sentiment analysis is the process of predicting whether a segment of text is opinionated or objective and of analyzing the polarity of its sentiment. Understanding the needs and behavior of target customers plays a vital role in the success of a business, so sentiment analysis helps marketers improve product quality and helps shoppers choose the right product. Owing to its automatic learning capability, deep learning is a current research focus in natural language processing. The proposed model uses the skip-gram architecture for better extraction of the semantic relationships and contextual information of words. The main contribution of this work, however, is an Adaptive Particle Swarm Optimization (APSO) algorithm-based LSTM for sentiment analysis. The LSTM captures complex patterns in textual data; to improve its performance, its weight parameters are tuned with the adaptive PSO algorithm. Combining the opposition-based learning (OBL) method with the PSO algorithm yields the APSO classifier, which assists the LSTM in selecting optimal weights in fewer iterations. APSO-LSTM's ability to adjust attributes such as optimal weights and learning rates, combined with good hyperparameter choices, leads to improved accuracy and reduced loss. Extensive experiments on four datasets show that the proposed APSO-LSTM model achieves higher accuracy than classical methods such as a traditional LSTM, ANN, and SVM. According to the simulation results, the proposed model outperforms other existing models.
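The optimization step can be illustrated with a bare-bones particle swarm optimizer that uses opposition-based initialization. This is a sketch only: the quadratic "loss" stands in for a network's validation loss, and the bounds, swarm size, and coefficients below are made-up example values, not the authors' APSO-LSTM configuration.

```python
# Toy PSO with opposition-based initialization, minimizing a surrogate loss.
# In the paper's setting the positions would be LSTM weight parameters and
# the loss a validation loss; here everything is a simplified stand-in.
import random

def loss(w):
    # Toy surrogate for a network's validation loss; minimum at w = (1, -2).
    return (w[0] - 1) ** 2 + (w[1] + 2) ** 2

def pso(loss, dim=2, lo=-5.0, hi=5.0, n=20, iters=100, seed=0):
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n)]
    # Opposition-based learning: also evaluate each particle's "opposite"
    # point (lo + hi - x) and keep whichever starting point is better.
    pos = [p if loss(p) <= loss([lo + hi - x for x in p])
           else [lo + hi - x for x in p] for p in pos]
    vel = [[0.0] * dim for _ in range(n)]
    best = [p[:] for p in pos]                      # per-particle best
    gbest = min(best, key=loss)[:]                  # swarm-wide best
    for _ in range(iters):
        for i in range(n):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.5 * r1 * (best[i][d] - pos[i][d])
                             + 1.5 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            if loss(pos[i]) < loss(best[i]):
                best[i] = pos[i][:]
                if loss(best[i]) < loss(gbest):
                    gbest = best[i][:]
    return gbest

w = pso(loss)
print(w)  # should land near the optimum (1, -2)
```

The opposition trick gives the swarm a better spread of starting points, which is the mechanism the abstract credits for reaching good weights in fewer iterations.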


2020 ◽  
pp. 007542422097914
Author(s):  
Karin Aijmer

Well has a long history and is attested as an intensifier already in older stages of English. It is argued that well has developed diachronically from its etymological meaning ('in a good way') along a cline of adverbialization, first to an intensifier and then to a discourse marker. Well was replaced by other intensifiers in the fourteenth century but emerges in new uses in Present-Day English. The changes in the frequency and use of the new intensifier are explored on the basis of the twenty-year gap between the old British National Corpus (1994) and the new Spoken British National Corpus (2014). The results show that well increases in frequency over time, that it spreads to new semantic types of adjectives and participles, and that it is found above all in predicative structures with a copula. The emergence of a new well and its increase in frequency are also related to social factors such as the age, gender, and social class of the speakers, and to the informal character of the conversation.


2014 ◽  
Vol 12 (4) ◽  
pp. 319-340
Author(s):  
Anu Koskela

This paper explores the lexicographic representation of a type of polysemy that arises when the meaning of one lexical item can either include or contrast with the meaning of another, as in the case of dog/bitch, shoe/boot, finger/thumb and animal/bird. A survey of how such pairs are represented in monolingual English dictionaries showed that dictionaries mostly represent as explicitly polysemous those lexical items whose broader and narrower readings are more distinctive and clearly separable in definitional terms. They commonly represented only the broader readings of terms that are in fact frequently used in the narrower reading, as shown by data from the British National Corpus.


2021 ◽  
Vol 3 (1) ◽  
pp. 9-21
Author(s):  
Namkil Kang

The ultimate goal of this paper is to provide a comparative analysis of rely on and depend on in the Corpus of Contemporary American English (COCA) and the British National Corpus (BNC). The COCA clearly shows that the expression rely on government is the one most preferred by Americans, followed by rely on people and rely on data. The COCA further indicates that depend on slate is the expression most preferred by Americans, followed by depend on government and depend on people. The BNC shows, on the other hand, that rely on others is the expression most preferred by the British, followed by rely on people and rely on friends. The BNC further indicates that depend on factors and depend on others are the most preferred by the British, followed by depend on age and depend on food. Finally, in the COCA, the nouns government, luck, welfare, people, information, state, fossil, water, family, oil, food, and things are linked to both rely on and depend on, but many nouns are still not linked to both. In the BNC, by contrast, only the nouns state, chance, government, and others are linked to both rely on and depend on; again, many nouns are still not linked to both. It can thus be inferred that rely on differs slightly from depend on in its use.
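Collocate rankings of this kind can be gathered by tallying the word that follows each occurrence of "rely on" or "depend on". The two-sentence mini-corpus below is invented for illustration; it is not COCA or BNC data, and the pattern matching is deliberately cruder than a real corpus query.

```python
# Toy collocate extraction: count the word immediately following
# "rely on" / "depend on" in a text. Invented sample sentences only.
import re
from collections import Counter, defaultdict

def collocates(text, verbs=("rely", "depend")):
    counts = defaultdict(Counter)
    for verb in verbs:
        # Capture the single word right after "<verb> on".
        for obj in re.findall(rf"\b{verb}\s+on\s+(\w+)", text.lower()):
            counts[f"{verb} on"][obj] += 1
    return counts

sample = ("We rely on government data, but farmers depend on rainfall. "
          "Many people rely on government support and depend on others.")

c = collocates(sample)
print(c["rely on"].most_common())   # "government" leads in this tiny sample
print(c["depend on"].most_common())
```

A real study would lemmatize the verb (relies, relied, depending) and skip intervening modifiers, but the ranking-by-count step is the same as in the comparison above.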


2016 ◽  
Vol 6 (2) ◽  
pp. 109
Author(s):  
Bei Yang ◽  
Bin Chen

Semantic prosody is a concept that has been subject to considerable criticism and debate. One major concern is the extent to which semantic prosody is domain- or register-related. Previous studies agree that CAUSE has an overwhelmingly negative meaning in general English. Its semantic prosody in academic writing remains controversial, however, because of the size and register of the corpora used in different studies. In order to minimize the role that corpus choice plays in determining the research findings, this paper uses sub-corpora from the British National Corpus to investigate the usage of CAUSE in different types of scientific writing. The results show that CAUSE occurs most frequently in social science, less frequently in applied science, and least frequently in natural and pure science. Its semantic prosody is overwhelmingly negative in social science and applied science, and mainly neutral in natural and pure science. It seems that the verb CAUSE lacks its usual negative semantic prosody in contexts that do not refer to human beings. The implications of the findings for language learning are also discussed.
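One simple way to quantify a semantic prosody of this kind is to collect the objects of CAUSE and score what share of them carries negative meaning. The tiny polarity lexicon and the two sentences below are invented for illustration; real studies use large sub-corpora and manual coding of collocates rather than a fixed word list.

```python
# Toy semantic-prosody score: proportion of CAUSE's objects that belong
# to a (hypothetical) negative-word lexicon. Invented data only.
import re

NEGATIVE = {"damage", "death", "problems", "pain", "concern"}  # toy lexicon

def prosody(text):
    # Objects of cause/causes/caused/causing (single following word only).
    objects = re.findall(r"\bcaus(?:e|es|ed|ing)\s+(\w+)", text.lower())
    neg = sum(1 for o in objects if o in NEGATIVE)
    return neg / len(objects) if objects else 0.0

social = "The policy caused concern and caused problems for families."
natural = "The reaction causes precipitation; heating causes expansion."

print(prosody(social))   # negative collocates dominate
print(prosody(natural))  # neutral, non-human-related collocates
```

Comparing this proportion across sub-corpora mirrors the paper's contrast between the negative prosody in social and applied science and the neutral prosody in natural and pure science.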


1997 ◽  
Vol 2 (1) ◽  
pp. 133-152 ◽  
Author(s):  
Paul Rayson ◽  
Geoffrey N. Leech ◽  
Mary Hodges

In this article, we undertake selective quantitative analyses of the demographically sampled spoken English component of the British National Corpus (for brevity, referred to here as the "Conversational Corpus"). This is a subcorpus of c. 4.5 million words, in which speakers and respondents (see I below) are identified by such factors as gender, age, social group, and geographical region. Using a corpus analysis tool developed at Lancaster, we undertake a comparison of the vocabulary of speakers, highlighting those differences which are marked by a very high chi-squared (χ²) value of difference between different sectors of the corpus according to gender, age, and social group. A fourth variable, geographical region of the United Kingdom, is not investigated in this article, although it remains a promising subject for future research. (As background, we also briefly examine differences between spoken and written material in the British National Corpus [BNC].) This study illustrates the potential of the Conversational Corpus for future corpus-based research on social differentiation in the use of language. There are evident limitations, including (a) the reliance on vocabulary frequency lists and (b) the simplicity of the transcription system employed for the spoken part of the BNC. The conclusion of the article considers future advances in the research paradigm illustrated here.
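The chi-squared comparison for a single word between two sectors of a corpus reduces to a 2x2 contingency table: the word's frequency versus all other words, in sector A versus sector B. The counts below are hypothetical, chosen only to show the computation, and the word is a made-up example.

```python
# 2x2 chi-squared statistic for one word across two subcorpora
# (1 degree of freedom). Counts below are invented for illustration.
def chi_squared(freq_a, total_a, freq_b, total_b):
    table = [[freq_a, total_a - freq_a],
             [freq_b, total_b - freq_b]]
    grand = total_a + total_b
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            row = sum(table[i])
            col = table[0][j] + table[1][j]
            expected = row * col / grand   # expected count under independence
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2

# Hypothetical counts: a word used 300 times in 1M words by one group
# and 100 times in 1M words by the other.
print(chi_squared(300, 1_000_000, 100, 1_000_000))  # well above 3.84 (p < .05)
```

Ranking the vocabulary by this statistic surfaces exactly the "very high χ² value" items the article highlights for gender, age, and social group.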


2005 ◽  
Vol 10 (4) ◽  
pp. 517-541 ◽  
Author(s):  
Mike Thelwall

The Web has recently been used as a corpus for linguistic investigations, often with the help of a commercial search engine. We discuss some potential problems with collecting data from commercial search engines and with using the Web as a corpus. We outline an alternative strategy for data collection, using a personal Web crawler. As a case study, the university Web sites of three nations (Australia, New Zealand and the UK) were crawled. The most frequent words were broadly consistent with non-Web written English, but with some academic-related words amongst the top 50 most frequent. It was also evident that the university Web sites contained a significant amount of non-English text, and academic Web English seems to be more future-oriented than British National Corpus written English.
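The downstream step after crawling can be sketched as follows: strip markup from the fetched pages and rank the most frequent words. This is not Thelwall's crawler; the two HTML snippets are invented, the tag stripping is deliberately crude, and a real pipeline would fetch pages (e.g. with urllib over a site's own link graph) rather than use stored strings.

```python
# Toy frequency ranking over already-fetched pages: remove tags, lowercase,
# tokenize, and count. Invented example pages; not the paper's pipeline.
import re
from collections import Counter

def top_words(pages, n=5):
    counts = Counter()
    for html in pages:
        text = re.sub(r"<[^>]+>", " ", html)      # crude tag stripping
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return counts.most_common(n)

pages = ["<html><body>The research programme and the research staff</body></html>",
         "<p>Research funding for the department</p>"]
print(top_words(pages, 3))
```

Comparing such a ranked list against a BNC frequency list is how Web-specific and academic-related items (like the top-50 words mentioned above) become visible.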

