FUNCTIONAL LOAD: TRANSCRIPTION AND ANALYSIS OF THE 10,000 MOST FREQUENT WORDS IN SPOKEN ENGLISH

Not all aspects of a language have equal importance for speakers or for learners. From the point of view of language description, functional load is a construct that attempts to establish quantifiable hierarchies of relevance among elements of a linguistic class. This paper makes use of analyses conducted on the 10-million-word spoken subcorpus of the British National Corpus in order to characterize what amounts to approximately 97% of the phonological forms and components heard and produced by fluent speakers in a range of contexts. Our aim is to provide segmental, sequential, and syllabic level rankings of spoken English that can serve as the basis for reference and subsequent work by language educators and researchers.

Social Differentiation in the Use of English Vocabulary

International Journal of Corpus Linguistics ◽

10.1075/ijcl.2.1.07ray ◽

1997 ◽

Vol 2 (1) ◽

pp. 133-152 ◽

Cited By ~ 62

Author(s):

Paul Rayson ◽

Geoffrey N. Leech ◽

Mary Hodges

Keyword(s):

Social Group ◽

Geographical Region ◽

Social Differentiation ◽

Future Research ◽

Analysis Tool ◽

Spoken English ◽

Transcription System ◽

Group A ◽

British National Corpus ◽

National Corpus

In this article, we undertake selective quantitative analyses of the demographically-sampled spoken English component of the British National Corpus (for brevity, referred to here as the "Conversational Corpus"). This is a subcorpus of c. 4.5 million words, in which speakers and respondents (see I below) are identified by such factors as gender, age, social group, and geographical region. Using a corpus analysis tool developed at Lancaster, we undertake a comparison of the vocabulary of speakers, highlighting those differences which are marked by a very high χ² value of difference between different sectors of the corpus according to gender, age, and social group. A fourth variable, that of geographical region of the United Kingdom, is not investigated in this article, although it remains a promising subject for future research. (As background we also briefly examine differences between spoken and written material in the British National Corpus [BNC].) This study is illustrative of the potentiality of the Conversational Corpus for future corpus-based research on social differentiation in the use of language. There are evident limitations, including (a) the reliance on vocabulary frequency lists and (b) the simplicity of the transcription system employed for the spoken part of the BNC. The conclusion of the article considers future advances in the research paradigm illustrated here.
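The chi-squared comparison mentioned in this abstract can be illustrated with a minimal sketch. It assumes a plain Pearson chi-squared statistic on a 2x2 table (a word versus all other words, in two sectors of a corpus) with no continuity correction; the abstract does not specify the exact variant used. The function name `chi_squared_2x2` and all counts are hypothetical.

```python
def chi_squared_2x2(freq_a, total_a, freq_b, total_b):
    """Pearson chi-squared for one word's frequency in two subcorpora.

    freq_a, freq_b: occurrences of the word in sectors A and B.
    total_a, total_b: total word tokens in sectors A and B.
    Assumption: no continuity correction is applied.
    """
    observed = [
        [freq_a, total_a - freq_a],  # sector A: the word, all other words
        [freq_b, total_b - freq_b],  # sector B: the word, all other words
    ]
    row_totals = [total_a, total_b]
    grand_total = total_a + total_b
    col_totals = [freq_a + freq_b, grand_total - freq_a - freq_b]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand_total
            if expected > 0:  # skip empty cells to avoid division by zero
                chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


# Hypothetical counts: the word occurs 30 times per 100 tokens in one
# sector and 10 times per 100 tokens in the other.
print(chi_squared_2x2(30, 100, 10, 100))
```

A larger value indicates that the word's frequency differs more between the two sectors than would be expected if it were used at the same rate in both.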

Basic Paradigmatic Correlations in the Semantic Field of the Concept Summer

Bulletin of Science and Practice ◽

10.33619/2414-2948/53/61 ◽

2020 ◽

Vol 6 (4) ◽

pp. 516-522

Author(s):

E. Grudeva

Keyword(s):

Russian Language ◽

Point Of View ◽

Semantic Field ◽

Species Relationships ◽

British National Corpus ◽

Time Of Year ◽

The Russian Language ◽

Illustrative Material ◽

National Corpus ◽

Comprehensive Study

This article is based on the materials of a comprehensive study of the concepts summer and autumn from the point of view of their perception by representatives of Russian and English linguistic cultures. The paper shows the features of the paradigmatic relations of the Russian and English concept summer. The study was built on the identification of synonymous (quasi-synonymous), antonymic, (hypo-)hyperonymic, or genus-species relationships, as well as the incompatibility relations of this concept. The study draws on explanatory dictionaries and dictionaries of synonyms and antonyms of the Russian and English languages; illustrative material was taken from the National Corpus of the Russian Language and the British National Corpus. The analysis made it possible to conclude that the paradigmatic explication of the content of the concept summer most clearly actualizes only one of the four previously identified cognitive features of the concept, namely the sign ‘time of year, season’.

Swearing in informal spoken English: 1990s–2010s

Text & Talk - An Interdisciplinary Journal of Language Discourse Communication Studies ◽

10.1515/text-2020-0051 ◽

2021 ◽

Vol 0 (0) ◽

Author(s):

Robbie Love

Keyword(s):

Large Scale ◽

Economic Status ◽

Age Groups ◽

British English ◽

Spoken English ◽

Gender And Age ◽

The Social ◽

Social Distribution ◽

British National Corpus ◽

National Corpus

This paper investigates changes in swearing usage in informal speech using large-scale corpus data, comparing the occurrence and social distribution of swear words in two corpora of informal spoken British English: the demographically-sampled part of the Spoken British National Corpus 1994 (BNC1994) and the Spoken British National Corpus 2014 (BNC2014); the compilation of the latter has facilitated large-scale, diachronic analyses of authentic spoken data on a scale which has, until now, not been possible. A form and frequency analysis of a set of 16 ‘pure’ swear word lemma forms is presented. The findings reveal that swearing occurrence is significantly lower in the Spoken BNC2014 but still within a comparable range to previous studies. Furthermore, FUCK is found to overtake BLOODY as the most popular swear word lemma. Finally, the social distribution of swearing across gender and age groups generally supports the findings of previous research: males still swear more than females, and swearing still peaks in the twenties and declines thereafter. However, the distribution of swearing according to socio-economic status is found to be more complex than expected in the 2010s and requires further investigation. This paper also reflects on some of the methodological challenges associated with making comparisons between the two corpora.
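Comparing occurrence across two corpora of different sizes usually requires normalising raw counts first. The sketch below shows one common convention, frequency per million words; it is an illustrative assumption, not a description of the paper's procedure, and the function name `per_million` and all counts are hypothetical.

```python
def per_million(count, corpus_size):
    """Convert a raw count into a frequency per million word tokens."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return count * 1_000_000 / corpus_size


# Hypothetical counts for one lemma in an older and a newer corpus.
older = per_million(500, 5_000_000)
newer = per_million(800, 10_000_000)
print(older, newer)
```

With these made-up figures the raw count is higher in the newer corpus, yet the normalised rate is lower, which is why raw counts alone can mislead in diachronic comparisons.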

How large a vocabulary is needed for reading and listening?

10.26686/wgtn.12552221.v1 ◽

2020 ◽

Author(s):

Paul Nation

Keyword(s):

Vocabulary Size ◽

Written Text ◽

Modern Language ◽

Spoken English ◽

Spoken Text ◽

British National Corpus ◽

National Corpus

This article has two goals: to report on the trialling of fourteen 1,000 word-family lists made from the British National Corpus, and to use these lists to see what vocabulary size is needed for unassisted comprehension of written and spoken English. The trialling showed that the lists were properly sequenced and there were no glaring omissions from the lists. If 98% coverage of a text is needed for unassisted comprehension, then an 8,000 to 9,000 word-family vocabulary is needed for comprehension of written text and a vocabulary of 6,000 to 7,000 for spoken text. © 2006 The Canadian Modern Language Review/La Revue canadienne des langues vivantes.
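The notion of text coverage used in this abstract can be sketched as the share of running words in a text that appear in a known vocabulary. This is a simplified illustration: it matches surface word forms rather than word families (so "cats" would not match "cat"), uses a naive regular-expression tokenizer, and the function name `coverage` and the sample data are hypothetical.

```python
import re


def coverage(text, known_words):
    """Return the proportion of tokens in `text` found in `known_words`.

    Assumption: tokens are lower-cased runs of letters and apostrophes,
    and matching is on exact word forms, not word families.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    known = sum(1 for token in tokens if token in known_words)
    return known / len(tokens)


# Hypothetical vocabulary and text: 5 of the 6 tokens are known.
vocabulary = {"the", "cat", "sat", "on"}
share = coverage("The cat sat on the mat", vocabulary)
print(f"{share:.1%}")
```

Under the 98% threshold discussed in the abstract, a reader would need to know all but about one token in every fifty.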

On the interchangeability of actually and really in spoken English: quantitative and qualitative evidence from corpora

English Language and Linguistics ◽

10.1017/s1360674311000323 ◽

2012 ◽

Vol 16 (1) ◽

pp. 151-170 ◽

Cited By ~ 3

Author(s):

MARK GRAY

Keyword(s):

Political Discussion ◽

Spoken Discourse ◽

Spoken English ◽

Qualitative Evidence ◽

Current Thinking ◽

Quantitative Analyses ◽

British National Corpus ◽

Test Current ◽

Semantic Properties ◽

National Corpus

Much of the research that has been carried out into the functions of actually and – to a lesser extent – really has focused on their so-called ‘discourse functions’. However, when they appear medially both actually and really are usually classified as intensifiers, and it has been argued that they are often interchangeable (see for example Lenk 1998; Oh 2000; Taglicht 2001). The purpose of this article is to test current thinking on this question by casting further light on the way medial actually and really are used in spoken discourse. Two complementary approaches are taken. Firstly, the interchangeability hypothesis is assessed on the basis of quantitative analyses of data from the British National Corpus. Secondly, the question of the extent to which actually and/or really function as intensifiers in preverbal position is addressed via a detailed qualitative analysis of data from a small corpus of recent BBC radio broadcasts of the panel-based political discussion programme Any Questions. The analyses presented here suggest that the interchangeability hypothesis is untenable and that the two adverbs have different core meanings, with any intensifying function being largely the result of interplay between the distinct semantic properties of each adverb and the discourse context.

The word on the street

English Today ◽

10.1017/s0266078400008415 ◽

1995 ◽

Vol 11 (3) ◽

pp. 29-35

Author(s):

Michael Rundell

Keyword(s):

Spoken English ◽

British National Corpus ◽

National Corpus

New insights on spoken English from the British National Corpus

The perfect participle paradox: some implications for the architecture of grammar

English Language and Linguistics ◽

10.1017/s1360674314000124 ◽

2014 ◽

Vol 18 (3) ◽

pp. 449-470 ◽

Cited By ~ 3

Author(s):

CARSTEN BREUL

Keyword(s):

Internal Structure ◽

American English ◽

Phonological Representation ◽

Point Of View ◽

Distributed Morphology ◽

Generative Syntax ◽

Know How ◽

Verb Phrase ◽

British National Corpus ◽

National Corpus

The topic of this article can be exemplified by the final clause of the following attested sentence: I don't know how he found out that she belonged to that lass, but find out he has. Clauses like this one show a preposed verb phrase that is headed by a plain verb whereas the non-preposed verb phrase of their canonical counterparts is obligatorily headed by a perfect participle (i.e. he has {found / *find} out). This peculiarity of verb phrase preposing, which will be referred to as the perfect participle paradox, has seldom been discussed. The article starts by showing that clauses that manifest the paradox are more frequent in the Corpus of Contemporary American English and in the British National Corpus than their non-paradoxical analogues with preposed canonical perfect participles. The article then looks at the paradox from the point of view of generative syntax, discusses and rejects previous analyses, and argues that a solution entails the rejection of two assumptions that have been associated with a lexicalist position, especially by proponents of distributed morphology. These are the assumptions that (a) a syntactic terminal is an item supplied by the lexicon and comprising a phonological representation and (b) that syntax may not manipulate the internal structure of syntactic terminals. The article proposes an analysis that is not based on these assumptions, but argues that the analysis does not entail the superiority of a distributed morphology framework.

COMMUNICATIVE APPROACH WITH REFERENCE TO THE CORPUS-BASED DATA ANALYSIS BY TEACHING FOREIGN LANGUAGE (THE CASE OF TENSE-ASPECT FORMS OF THE VERB)

Vestnik SSUGT (Siberian State University of Geosystems and Technologies) ◽

10.33764/2411-1759-2021-26-1-163-168 ◽

2021 ◽

Vol 26 (1) ◽

pp. 163-168

Author(s):

Arsentiy I. Bochkarev ◽

Sergey S. Zhdanov

Keyword(s):

Data Analysis ◽

Foreign Language ◽

Language Education ◽

Point Of View ◽

British English ◽

Corpus Data ◽

British National Corpus ◽

Communicative Approach ◽

Oriented Approach ◽

National Corpus

The paper deals with the frequency of tense-aspect forms in British English as a basis for justifying the selection of language phenomena from the linguistic point of view. This approach is applied in the educational process at universities. Moreover, a communicatively oriented approach to language education should be based on this selection, since it presupposes an educational orientation to real communicative situations. Based on an analysis of corpus data from the British National Corpus, all tense-aspect forms can be divided into four groups: rare, occasional, frequent and constant. The authors have developed an algorithm for learning tense-aspect forms in British English based on the frequency of these forms.

Words We Would Want: Comparison of Three Pre-programmed Vocabulary Sets With Frequently Used Words in English

Perspectives on Augmentative and Alternative Communication ◽

10.1044/aac17.4.156 ◽

2008 ◽

Vol 17 (4) ◽

pp. 156-164

Author(s):

Bruce Helmbold

Keyword(s):

Descriptive Study ◽

Spoken English ◽

British National Corpus ◽

Word Frequencies ◽

National Corpus

In this descriptive study, three pre-programmed vocabulary sets were examined for word-based vocabulary content and keystrokes per word: Picture WordPower 45 location (Inman Innovations), Unity 45 Full vs. 4.06 (Prentke-Romich Company), and Gateway 60 vs. 1.06.18 (Dynavox Technologies). The vocabulary contents of each set were then compared to the thousand most common words as identified by two different listings: that published in Word Frequencies in Written and Spoken English, based on the British National Corpus (BNC), and the Wiktionary TV/Movie Frequency Lists (2006). The pre-programmed vocabulary set best representing these frequency lists was Unity 45 Full, followed by Gateway 60 and Picture WordPower. The vocabulary sets using the fewest average keystrokes per word, based on frequency lists, were Picture WordPower and Gateway 60, followed by Unity 45 Full. Results provide an aid for evaluating the comparative merits of pre-programmed vocabulary sets, such as inclusion of frequently used English words and relative keystroke savings.

Based on Research Connecting Word Corpus of Spoken English

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.1030-1032.2689 ◽

2014 ◽

Vol 1030-1032 ◽

pp. 2689-2692

Author(s):

Yong Mei Peng ◽

Yun Hua Qu

Keyword(s):

Native Speakers ◽

Chinese Students ◽

Spoken Word ◽

English Teaching ◽

Spoken English ◽

English Majors ◽

Reference Corpus ◽

Native Speakers Of English ◽

British National Corpus ◽

National Corpus

This paper examines how Chinese English majors use connecting words in spoken English and the characteristics of that use. The learner data come from the spoken component (SECCL2.0) of the Spoken and Written English Corpus of Chinese Learners (SWECCL2.0); the reference corpus is the spoken component of the British National Corpus (BNC/S). The study found that English majors and native speakers of English show both commonalities and differences in their use of connecting words in speech. In addition, Chinese English majors show instances of overuse and misuse of connecting words in speech. Based on these findings, the article offers some suggestions for the teaching of spoken English.
