The Written British National Corpus 2014 – design and comparability

Author(s):  
Vaclav Brezina ◽  
Abi Hawtin ◽  
Tony McEnery

Abstract The British National Corpus 2014 is a major project led by Lancaster University to create a 100-million-word corpus of present day British English. This corpus has been constructed as a comparable counterpart of the original British National Corpus (referred to as the BNC1994 in this article), which was compiled in the early 1990s. This article starts with the justification of the project answering the question of ‘Why do we need a new BNC?’. We then provide a general overview of the construction of the Written British National Corpus 2014 (Written BNC2014); we also briefly discuss some issues of data collection before looking in detail at the design of the corpus. Compiling a large general corpus such as the Written BNC2014 has been a major undertaking involving teamwork and collaboration. It also required generosity on the part of the many individuals and organisations who contributed to the data collection.

2017 ◽  
Vol 22 (3) ◽  
pp. 319-344 ◽  
Author(s):  
Robbie Love ◽  
Claire Dembry ◽  
Andrew Hardie ◽  
Vaclav Brezina ◽  
Tony McEnery

Abstract This paper introduces the Spoken British National Corpus 2014, an 11.5-million-word corpus of orthographically transcribed conversations among L1 speakers of British English from across the UK, recorded in the years 2012–2016. After showing that a survey of the recent history of corpora of spoken British English justifies the compilation of this new corpus, we describe the main stages of the Spoken BNC2014’s creation: design, data and metadata collection, transcription, XML encoding, and annotation. In doing so we aim to (i) encourage users of the corpus to approach the data with sensitivity to the many methodological issues we identified and attempted to overcome while compiling the Spoken BNC2014, and (ii) inform (future) compilers of spoken corpora of the innovations we implemented to attempt to make the construction of corpora representing spontaneous speech in informal contexts more tractable, both logistically and practically, than in the past.


2020 ◽  
pp. 007542422097914
Author(s):  
Karin Aijmer

Well has a long history and is found as an intensifier already in older English. It is argued that diachronically well has developed from its etymological meaning (‘in a good way’) on a cline of adverbialization to an intensifier and to a discourse marker. Well is replaced by other intensifiers in the fourteenth century but emerges in new uses in Present-Day English. The changes in frequency and use of the new intensifier are explored on the basis of a twenty-year time gap between the old British National Corpus (1994) and the new Spoken British National Corpus (2014). The results show that well increases in frequency over time and that it spreads to new semantic types of adjectives and participles, and is found above all in predicative structures with a copula. The emergence of a new well and its increase in frequency are also related to social factors such as the age, gender, and social class of the speakers, and the informal character of the conversation.


2017 ◽  
Vol 22 (3) ◽  
pp. 375-402 ◽  
Author(s):  
Jacqueline Laws ◽  
Chris Ryder ◽  
Sylvia Jaworska

Abstract The aim of this paper is to ascertain the degree to which lexical diversity, density and creativity in everyday spoken British English have changed over a 20-year period, as a function of age and gender. Usage patterns of four verb-forming suffixes, -ate, -en, -ify and -ize, were compared in contemporary speech from the Spoken British National Corpus 2014 Sample (Spoken BNC2014S) with its 20-year old counterpart, the BNC1994’s demographically-sampled component (the Spoken BNC1994DS). Frequency comparisons revealed that verb suffixation is denser in the Spoken BNC2014S than in the Spoken BNC1994DS, with the exception of the -en suffix, the use of which has decreased, particularly among female and younger speakers in general. Male speakers and speakers in the 35–59 age range showed the greatest type diversity; there is evidence that this peak is occurring earlier in the more recent corpus. Contrary to expectations, female rather than male speakers produced the largest number of neologisms and rare forms.


Author(s):  
Robbie Love

Abstract This paper investigates changes in swearing usage in informal speech using large-scale corpus data, comparing the occurrence and social distribution of swear words in two corpora of informal spoken British English: the demographically-sampled part of the Spoken British National Corpus 1994 (BNC1994) and the Spoken British National Corpus 2014 (BNC2014); the compilation of the latter has facilitated large-scale, diachronic analyses of authentic spoken data on a scale which has, until now, not been possible. A form and frequency analysis of a set of 16 ‘pure’ swear word lemma forms is presented. The findings reveal that swearing occurrence is significantly lower in the Spoken BNC2014 but still within a comparable range to previous studies. Furthermore, FUCK is found to overtake BLOODY as the most popular swear word lemma. Finally, the social distribution of swearing across gender and age groups generally supports the findings of previous research: males still swear more than females, and swearing still peaks in the twenties and declines thereafter. However, the distribution of swearing according to socio-economic status is found to be more complex than expected in the 2010s and requires further investigation. This paper also reflects on some of the methodological challenges associated with making comparisons between the two corpora.


Author(s):  
Dr. Hamad Abdullah H Aldawsari

Many people use pause fillers such as um, erm, and er to signal to the other person that they have not finished speaking yet. This paper aims to investigate pause fillers and their relationship with the two sociolinguistic variables of age and gender. The data-driven analysis is based on the British National Corpus (BNC). The results show that the sociolinguistic variables of age and gender influence the use of pause fillers among British English speakers, an influence that the paper proposes is linked to advancing age and to improved fluency among female speakers.


Corpora ◽  
2010 ◽  
Vol 5 (1) ◽  
pp. 45-74 ◽  
Author(s):  
Soili Nokkonen

This paper explores need to, a semi-modal of obligation and necessity, and its semantic variation in connection with the sociolinguistic variables of gender, age and social class in the spoken demographic part of the British National Corpus. The semantic/pragmatic uses of need to include internal, deontic, dynamic and epistemic domains based both on traditional concepts and cross-linguistic studies. The sociolinguistic analysis applies the generalisations by Labov, but pays attention to the interactional styles and the communicative needs of the various social groups as well. The results reveal that need to is undergoing change. It shows monotonic distribution among adults, but it is slightly more common among men than women, and, in terms of social class, the upper middle class takes the lead. The semantic variation corroborates these findings – older speakers stick to the more traditional domains – but also reflects the gendered life stages and discourse styles of the speaker groups.


English Today ◽  
2018 ◽  
Vol 35 (1) ◽  
pp. 54-58
Author(s):  
Daria Bębeniec

The British National Corpus (BNC) has been available to the research community for more than two decades. Over the course of its three editions to date, this 100-million-word database, containing samples of both transcribed speech and written texts representing British English of the 1990s and earlier, has established itself as a valuable resource used around the world in a wide range of language-related applications.


2020 ◽  
pp. 1-26 ◽  
Author(s):  
Paula Rautionaho ◽  
Robert Fuchs

The spread of the progressive from dynamic to stative verbs started in the seventeenth century, and slowed down in the late twentieth century. The present study investigates recent change in the use of stative progressives in conversational British English from the early 1990s to the early 2010s. The analysis focuses on a total of 100 stative verb lemmata in the spoken, demographic sections of the original and new British National Corpus, restricted to a variable context where a progressive could potentially occur. Results indicate that overall, stative progressives have not become more frequent in the last twenty years, and that the group of stative verbs is highly heterogeneous. However, particular verbs, such as expect and think, do indeed combine more frequently with the progressive now, which could be the cause of the popular impression of the continuing spread of stative progressives. In addition to a frequency-based analysis, a distinctive collexeme analysis offers a more fine-grained analysis of the collostructional preferences of individual verb lemmata and semantic classes of stative verbs. This analysis reveals that the stative verbs are heterogeneous and that the lemmata most distinctly associated with the progressive belong to the group of stance verbs.


2004 ◽  
Vol 13 (3) ◽  
pp. 235-268 ◽  
Author(s):  
Anthony McEnery ◽  
Zhonghua Xiao

Swearing is a part of everyday language use. To date it has been infrequently studied, though some recent work on swearing in American English, Australian English and British English has addressed the topic. Nonetheless, there is still no systematic account of swear-words in English. In terms of approaches, swearing has been approached from the points of view of history, lexicography, psycholinguistics and semantics. There have been few studies of swearing based on sociolinguistic variables such as gender, age and social class. Such a study has been difficult in the absence of corpus resources. With the production of the British National Corpus (BNC), a 100,000,000-word balanced corpus of modern British English, such a study became possible. In addition to parts of speech, the corpus is richly annotated with metadata pertaining to demographic features such as age, gender and social class, and textual features such as register, publication medium and domain. While bad language may be related to religion (e.g. Jesus, heaven, hell and damn), sex (e.g. fuck), racism (e.g. nigger), defecation (e.g. shit), homophobia (e.g. queer) and other matters, we will, in this article, examine only the pattern of uses of fuck and its morphological variants, because this is a typical swear-word that occurs frequently in the BNC. This article will build and expand upon the examination of fuck by McEnery et al. (2000) by examining the distribution pattern of fuck within and across spoken and written registers.


2015 ◽  
Vol 6 (3) ◽  
pp. 309-339 ◽  
Author(s):  
Jean-Marc Dewaele

Abstract The present study investigates the differences between 414 L1 speakers of British and 556 L1 speakers of American English in self-reported frequency of swearing and in the understanding of the meaning, the perceived offensiveness and the frequency of use of 30 negative words extracted from the British National Corpus. Words ranged from mild to highly offensive, insulting and taboo. Statistical analyses revealed no significant differences between the groups in self-reported frequency of swearing. The British English L1 participants reported a significantly better understanding of nearly half the chosen words from the corpus. They gave significantly higher offensiveness scores to four words (including “bollocks”) while the American English L1 participants rated a third of words as significantly more offensive (including “jerk”). British English L1 participants reported significantly more frequent use of a third of words (including “bollocks”) while the American English L1 participants reported more frequent use of half of the words (including “jerk”). This is interpreted as evidence of differences in semantic and conceptual representations of these words in both variants of English.

