The Spoken British National Corpus 2014 – a new initiative launched by Lancaster University and Cambridge University Press

The British National Corpus (BNC) has been available to the research community for more than two decades. Over the course of its three editions to date, this 100-million-word database, containing samples of both transcribed speech and written texts representing British English of the 1990s and earlier, has established itself as a valuable resource used around the world in a wide range of language-related applications.

“That’s well good”: A Re-emergent Intensifier in Current British English

Journal of English Linguistics ◽

10.1177/0075424220979143 ◽

2020 ◽

pp. 007542422097914

Author(s):

Karin Aijmer

Keyword(s):

Social Class ◽

Fourteenth Century ◽

Social Factors ◽

British English ◽

Discourse Marker ◽

Time Gap ◽

British National Corpus ◽

Semantic Types ◽

Over Time ◽

National Corpus

Well has a long history and is found as an intensifier already in older English. It is argued that diachronically well has developed from its etymological meaning (‘in a good way’) on a cline of adverbialization to an intensifier and to a discourse marker. Well is replaced by other intensifiers in the fourteenth century but emerges in new uses in Present-Day English. The changes in frequency and use of the new intensifier are explored on the basis of a twenty-year time gap between the old British National Corpus (1994) and the new Spoken British National Corpus (2014). The results show that well increases in frequency over time and that it spreads to new semantic types of adjectives and participles, and is found above all in predicative structures with a copula. The emergence of a new well and its increase in frequency are also related to social factors such as the age, gender, and social class of the speakers, and the informal character of the conversation.

LOVE in English and Polish

Cognitive Studies | Études cognitives ◽

10.11649/cs.1312 ◽

2017 ◽

Author(s):

Małgorzata Brożyna Reczko

Keyword(s):

Analytical Method ◽

Theoretical Basis ◽

Abstract Concept ◽

Contrastive Analysis ◽

The World ◽

Similarities And Differences ◽

British National Corpus ◽

National Corpus

The paper presents a sample contrastive analysis of the linguistic picture of love in English and Polish. The material used in the survey is drawn from lexicographic data, including the British National Corpus and Narodowy Korpus Języka Polskiego [National Corpus of Polish]. The paper focuses on the similarities and differences in conceptualizing the abstract concept of love in the English and Polish languages. An analytical method, developed by Bartmiński and associates, serves as the theoretical basis for the reconstruction of the linguistic picture of the world.

Polish abstract (translated): MIŁOŚĆ [LOVE] in English and Polish. This article attempts a contrastive comparison of the linguistic picture of the world for LOVE in English and Polish. The research material comes mainly from lexicographic sources: dictionaries and corpora (Narodowy Korpus Języka Polskiego and the English-language British National Corpus). The aim of the study was to look for similarities and differences in the conceptualization of LOVE in the two languages. The research method is taken from the work of J. Bartmiński and concerns the reconstruction of the linguistic picture of the world for various concepts.

A diachronic corpus-based study into the effects of age and gender on the usage patterns of verb-forming suffixation in spoken British English

International Journal of Corpus Linguistics ◽

10.1075/ijcl.22.3.04law ◽

2017 ◽

Vol 22 (3) ◽

pp. 375-402 ◽

Cited By ~ 1

Author(s):

Jacqueline Laws ◽

Chris Ryder ◽

Sylvia Jaworska

Keyword(s):

Age And Gender ◽

Lexical Diversity ◽

British English ◽

Usage Patterns ◽

British National Corpus ◽

And Gender ◽

Age Range ◽

National Corpus

The aim of this paper is to ascertain the degree to which lexical diversity, density and creativity in everyday spoken British English have changed over a 20-year period, as a function of age and gender. Usage patterns of four verb-forming suffixes, -ate, -en, -ify and -ize, were compared in contemporary speech from the Spoken British National Corpus 2014 Sample (Spoken BNC2014S) with its 20-year-old counterpart, the BNC1994’s demographically-sampled component (the Spoken BNC1994DS). Frequency comparisons revealed that verb suffixation is denser in the Spoken BNC2014S than in the Spoken BNC1994DS, with the exception of the -en suffix, the use of which has decreased, particularly among female and younger speakers in general. Male speakers and speakers in the 35–59 age range showed the greatest type diversity; there is evidence that this peak is occurring earlier in the more recent corpus. Contrary to expectations, female rather than male speakers produced the largest number of neologisms and rare forms.

Swearing in informal spoken English: 1990s–2010s

Text & Talk - An Interdisciplinary Journal of Language Discourse Communication Studies ◽

10.1515/text-2020-0051 ◽

2021 ◽

Vol 0 (0) ◽

Author(s):

Robbie Love

Keyword(s):

Large Scale ◽

Economic Status ◽

Age Groups ◽

British English ◽

Spoken English ◽

Gender And Age ◽

The Social ◽

Social Distribution ◽

British National Corpus ◽

National Corpus

This paper investigates changes in swearing usage in informal speech using large-scale corpus data, comparing the occurrence and social distribution of swear words in two corpora of informal spoken British English: the demographically-sampled part of the Spoken British National Corpus 1994 (BNC1994) and the Spoken British National Corpus 2014 (BNC2014); the compilation of the latter has facilitated large-scale, diachronic analyses of authentic spoken data on a scale which has, until now, not been possible. A form and frequency analysis of a set of 16 ‘pure’ swear word lemma forms is presented. The findings reveal that swearing occurrence is significantly lower in the Spoken BNC2014 but still within a comparable range to previous studies. Furthermore, FUCK is found to overtake BLOODY as the most popular swear word lemma. Finally, the social distribution of swearing across gender and age groups generally supports the findings of previous research: males still swear more than females, and swearing still peaks in the twenties and declines thereafter. However, the distribution of swearing according to socio-economic status is found to be more complex than expected in the 2010s and requires further investigation. This paper also reflects on some of the methodological challenges associated with making comparisons between the two corpora.

A Corpus-based Study of the Use of Pause Fillers Among British English Speakers

Journal of English Language Teaching and Applied Linguistics ◽

10.32996/jeltal.2021.3.12.2 ◽

2021 ◽

Vol 3 (12) ◽

pp. 09-16

Author(s):

Dr. Hamad Abdullah H Aldawsari

Keyword(s):

The Other ◽

Data Driven ◽

English Speakers ◽

Age And Gender ◽

British English ◽

Gender Influence ◽

British National Corpus ◽

And Gender ◽

National Corpus

Many people use pause fillers such as um, erm, and er to signal to the other person that they have not finished speaking yet. This paper aims to investigate pause fillers and their relationship with the two sociolinguistic variables of age and gender. The data-driven analysis is based on the British National Corpus (BNC). The results show that age and gender both influence the use of pause fillers among British English speakers, an effect the author proposes is linked to advancing age and to improved fluency among female speakers.

Creating the Thai National Corpus

MANUSYA ◽

10.1163/26659077-01003001 ◽

2007 ◽

Vol 10 (3) ◽

pp. 4-17 ◽

Cited By ~ 6

Author(s):

Wirote Aroonmanakun

Keyword(s):

Comparative Study ◽

Language Teaching ◽

Written Language ◽

Text Encoding ◽

Text Encoding Initiative ◽

Linguistic Research ◽

Written Texts ◽

Text Types ◽

British National Corpus ◽

National Corpus

This paper reports on the progress of Thai National Corpus (TNC) development. The TNC is designed as a general corpus of standard Thai. Only written texts are collected in the first phase. It aims to include at least eighty million words. Various text types produced by various authors are included in the TNC so that it closely represents written language in general. Texts are word segmented and tagged following the Text Encoding Initiative (TEI) guidelines on text encoding. The TNC was designed as a resource for general applications, such as lexicography, language teaching, and linguistic research. In addition, the TNC is designed to be comparable to the British National Corpus so that a comparative study between the two languages is also possible.

‘How many taxis there needs to be?’ The sociolinguistic variation of need to in spoken British English

Corpora ◽

10.3366/cor.2010.0003 ◽

2010 ◽

Vol 5 (1) ◽

pp. 45-74 ◽

Cited By ~ 2

Author(s):

Soili Nokkonen

Keyword(s):

Social Class ◽

Middle Class ◽

Social Groups ◽

Life Stages ◽

Sociolinguistic Variation ◽

British English ◽

Semantic Variation ◽

British National Corpus ◽

Upper Middle Class ◽

National Corpus

This paper explores need to, a semi-modal of obligation and necessity, and its semantic variation in connection with the sociolinguistic variables of gender, age and social class in the spoken demographic part of the British National Corpus. The semantic/pragmatic uses of need to include internal, deontic, dynamic and epistemic domains based both on traditional concepts and cross-linguistic studies. The sociolinguistic analysis applies the generalisations by Labov, but pays attention to the interactional styles and the communicative needs of the various social groups as well. The results reveal that need to is undergoing change. It shows monotonic distribution among adults, but it is slightly more common among men than women, and, in terms of social class, the upper middle class takes the lead. The semantic variation corroborates these findings – older speakers stick to the more traditional domains – but also reflects the gendered life stages and discourse styles of the speaker groups.

Computational Tools for Analysing Talk

Nordic Journal of Linguistics ◽

10.1017/s0332586500002262 ◽

1990 ◽

Vol 13 (2) ◽

pp. 187-199

Author(s):

Kim Plunkett

Keyword(s):

Data Exchange ◽

Child Language ◽

Research Community ◽

Exchange System ◽

Computational Tools ◽

Future Developments ◽

Software Packages ◽

Wide Range ◽

The World ◽

Language Data

The Child Language Data Exchange System (CHILDES) is the largest child language archive in the world. The archive includes a wide range of languages covering both normal and abnormal populations. The database is freely accessible to the research community and the user is supported with guidelines for carrying out transcription work and software packages for the automatic analysis of transcriptions. The article provides a brief overview of the CHAT transcription notation and the CLAN programs that can be used to analyse transcripts written in CHAT format. Current drawbacks of the CHILDES system are discussed and some pointers to future developments highlighted.

Recent change in stative progressives: a collostructional investigation of British English in 1994 and 2014

English Language and Linguistics ◽

10.1017/s136067431900042x ◽

2020 ◽

pp. 1-26 ◽

Cited By ~ 1

Author(s):

Paula Rautionaho ◽

Robert Fuchs

Keyword(s):

Twentieth Century ◽

Seventeenth Century ◽

Recent Change ◽

British English ◽

Fine Grained ◽

Semantic Classes ◽

Late Twentieth ◽

British National Corpus ◽

Late Twentieth Century ◽

National Corpus

The spread of the progressive from dynamic to stative verbs started in the seventeenth century, and slowed down in the late twentieth century. The present study investigates recent change in the use of stative progressives in conversational British English from the early 1990s to the early 2010s. The analysis focuses on a total of 100 stative verb lemmata in the spoken, demographic sections of the original and new British National Corpus, restricted to a variable context where a progressive could potentially occur. Results indicate that overall, stative progressives have not become more frequent in the last twenty years, and that the group of stative verbs is highly heterogeneous. However, particular verbs, such as expect and think, do indeed combine more frequently with the progressive now, which could be the cause of the popular impression of the continuing spread of stative progressives. In addition to a frequency-based analysis, a distinctive collexeme analysis offers a more fine-grained analysis of the collostructional preferences of individual verb lemmata and semantic classes of stative verbs. This analysis reveals that the stative verbs are heterogeneous and that the lemmata most distinctly associated with the progressive belong to the group of stance verbs.

Swearing in Modern British English: The Case of Fuck in the BNC

Language and Literature ◽

10.1177/0963947004044873 ◽

2004 ◽

Vol 13 (3) ◽

pp. 235-268 ◽

Cited By ~ 46

Author(s):

Anthony McEnery ◽

Zhonghua Xiao

Keyword(s):

Social Class ◽

American English ◽

Systematic Account ◽

British English ◽

Parts Of Speech ◽

Points Of View ◽

British National Corpus ◽

Australian English ◽

Morphological Variants ◽

National Corpus

Swearing is a part of everyday language use. To date it has been infrequently studied, though some recent work on swearing in American English, Australian English and British English has addressed the topic. Nonetheless, there is still no systematic account of swear-words in English. In terms of approaches, swearing has been approached from the points of view of history, lexicography, psycholinguistics and semantics. There have been few studies of swearing based on sociolinguistic variables such as gender, age and social class. Such a study has been difficult in the absence of corpus resources. With the production of the British National Corpus (BNC), a 100,000,000-word balanced corpus of modern British English, such a study became possible. In addition to parts of speech, the corpus is richly annotated with metadata pertaining to demographic features such as age, gender and social class, and textual features such as register, publication medium and domain. While bad language may be related to religion (e.g. Jesus, heaven, hell and damn), sex (e.g. fuck), racism (e.g. nigger), defecation (e.g. shit), homophobia (e.g. queer) and other matters, we will, in this article, examine only the pattern of uses of fuck and its morphological variants, because this is a typical swear-word that occurs frequently in the BNC. This article will build and expand upon the examination of fuck by McEnery et al. (2000) by examining the distribution pattern of fuck within and across spoken and written registers.
