Identifying and describing functional discourse units in the BNC Spoken 2014

Author(s):  
Jesse Egbert ◽  
Stacey Wizner ◽  
Daniel Keller ◽  
Douglas Biber ◽  
Tony McEnery ◽  
...  

Abstract: On the surface, it appears that conversational language is produced in a stream of spoken utterances. In reality, conversation is composed of contiguous units that are characterized by coherent communicative purposes. A large number of important research questions about the nature of conversational discourse could be addressed if researchers could investigate linguistic variation across functional discourse units. To date, however, no corpus of conversational language has been annotated according to functional units, and there are no existing methods for carrying out this type of annotation. We introduce a new method for segmenting transcribed conversation files into discourse units and characterizing those units based on their communicative purposes. In this paper, the development and piloting of this method are described in detail and the final framework is presented. We conclude with a discussion of an ongoing project where we are applying this coding framework to the British National Corpus Spoken 2014.

sjesr ◽  
2020 ◽  
Vol 3 (4) ◽  
pp. 262-267
Author(s):  
Abdul Ghaffar Bhatti ◽  
Muhammad Imran ◽  
Muhammad Younas

Technology plays a pivotal role in the ESL teaching and education sector. In language teaching, gender and language research mostly favors the idea of potential differences in language use between men and women. This paper explores different indicators of gender in the writing of males and females in a large subset of the British National Corpus (BNC) covering the domain of fiction, using a corpus tool. Robin Lakoff's four key linguistic terms that mark female language have been used as benchmarks against which the study has been conducted. Previous researchers such as Argamon, Koppel, and Shimoni claim that females use more pronouns and fewer nouns than men. The hits and frequencies of Lakoff's terms and the researchers' claims have been checked against the BNC to obtain empirical findings. Taking the BNC as a general corpus, a corpus research method has been used to answer the research questions. The study found a substantial difference between texts written by male and female authors. It was also found that females use many more pronouns and males use many more nouns. Assumptions made regarding Lakoff's terms have been partially substantiated, since the results vary a little concerning the use of empty adjectives like 'cute' and 'divine'. The work is a valuable addition to the existing body of knowledge about gender differences in language, and it provides space for researchers to work in even broader perspectives.


2005 ◽  
Vol 10 (4) ◽  
pp. 489-516 ◽  
Author(s):  
Bayan Abu Shawar ◽  
Eric Steven Atwell

A chatbot is a machine conversation system which interacts with human users via natural conversational language. Software to machine-learn conversational patterns from a transcribed dialogue corpus has been used to generate a range of chatbots speaking various languages and sublanguages including varieties of English, as well as French, Arabic and Afrikaans. This paper presents a program to learn from spoken transcripts of the Dialogue Diversity Corpus of English, the Minnesota French Corpus, the Corpus of Spoken Afrikaans, the Qur'an Arabic-English parallel corpus, and the British National Corpus of English; we discuss the problems which arose during learning and testing. Two main goals were achieved from the automation process. One was the ability to generate different versions of the chatbot in different languages, bringing chatbot technology to languages with few if any NLP resources: the corpus-based learning techniques transferred straightforwardly to develop chatbots for Afrikaans and Qur'anic Arabic. The second achievement was the ability to learn a very large number of categories within a short time, saving effort and errors in doing such work manually: we generated more than one million AIML categories or conversation-rules from the BNC corpus, 20 times the size of existing AIML rule-sets, and probably the biggest AI Knowledge-Base ever.
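The idea of turning adjacent dialogue turns into AIML categories can be illustrated with a minimal sketch. It is not the authors' program: the function names `normalize_pattern` and `turns_to_aiml` are hypothetical, and the sketch assumes that the transcript is already a flat list of alternating turns, that the first reply seen for a pattern is kept, and that patterns are upper-cased with punctuation removed.

```python
import re
from xml.sax.saxutils import escape


def normalize_pattern(utterance):
    """Upper-case the utterance and keep only alphanumeric words (assumed normalization)."""
    words = re.findall(r"[A-Za-z0-9]+", utterance.upper())
    return " ".join(words)


def turns_to_aiml(turns):
    """Pair each turn with the turn that follows it and emit one AIML category per pattern."""
    categories = {}
    for prompt, reply in zip(turns, turns[1:]):
        pattern = normalize_pattern(prompt)
        # Keep only the first reply observed for a given pattern.
        if pattern and pattern not in categories:
            categories[pattern] = reply.strip()

    lines = ['<aiml version="1.0.1">']
    for pattern, template in categories.items():
        lines.append(
            "  <category><pattern>" + escape(pattern) + "</pattern>"
            "<template>" + escape(template) + "</template></category>"
        )
    lines.append("</aiml>")
    return "\n".join(lines)


if __name__ == "__main__":
    sample_turns = ["Hello there!", "Hi, how are you?", "Fine & well."]
    print(turns_to_aiml(sample_turns))
```

A real transcript would also need speaker labels, overlap markers, and other annotation removed before pairing turns, which this sketch leaves out.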


2021 ◽  
Vol 25 (1) ◽  
pp. 39-42
Author(s):  
Shuochao Yao ◽  
Jinyang Li ◽  
Dongxin Liu ◽  
Tianshi Wang ◽  
Shengzhong Liu ◽  
...  

Future mobile and embedded systems will be smarter and more user-friendly. They will perceive the physical environment, understand human context, and interact with end-users in a human-like fashion. Daily objects will be capable of leveraging sensor data to perform complex estimation and recognition tasks, such as recognizing visual inputs, understanding voice commands, tracking objects, and interpreting human actions. This raises important research questions on how to endow low-end embedded and mobile devices with the appearance of intelligence despite their resource limitations.


2020 ◽  
pp. 007542422097914
Author(s):  
Karin Aijmer

Well has a long history and is found as an intensifier already in older English. It is argued that diachronically well has developed from its etymological meaning (‘in a good way’) on a cline of adverbialization to an intensifier and to a discourse marker. Well is replaced by other intensifiers in the fourteenth century but emerges in new uses in Present-Day English. The changes in frequency and use of the new intensifier are explored on the basis of a twenty-year time gap between the old British National Corpus (1994) and the new Spoken British National Corpus (2014). The results show that well increases in frequency over time and that it spreads to new semantic types of adjectives and participles, and is found above all in predicative structures with a copula. The emergence of a new well and its increase in frequency are also related to social factors such as the age, gender, and social class of the speakers, and the informal character of the conversation.


2014 ◽  
Vol 12 (4) ◽  
pp. 319-340
Author(s):  
Anu Koskela

This paper explores the lexicographic representation of a type of polysemy that arises when the meaning of one lexical item can either include or contrast with the meaning of another, as in the case of dog/bitch, shoe/boot, finger/thumb and animal/bird. A survey of how such pairs are represented in monolingual English dictionaries showed that dictionaries mostly represent as explicitly polysemous those lexical items whose broader and narrower readings are more distinctive and clearly separable in definitional terms. They commonly only represented the broader readings for terms that are in fact frequently used in the narrower reading, as shown by data from the British National Corpus.  


2021 ◽  
Vol 3 (1) ◽  
pp. 9-21
Author(s):  
Namkil Kang

The ultimate goal of this paper is to provide a comparative analysis of rely on and depend on in the Corpus of Contemporary American English (COCA) and the British National Corpus (BNC). The COCA clearly shows that the expression rely on government is the most preferred by Americans, followed by rely on people, and rely on data. The COCA further indicates that the expression depend on slate is the most preferred by Americans, followed by depend on government, and depend on people. The BNC shows, on the other hand, that the expression rely on others is the most preferred by the British, followed by rely on people, and rely on friends. The BNC further indicates that depend on factors and depend on others are the most preferred by the British, followed by depend on age, and depend on food. Finally, in the COCA, the nouns government, luck, welfare, people, information, state, fossil, water, family, oil, food, and things are linked to both rely on and depend on, but many nouns are still not linked to both of them. On the other hand, in the BNC, only the nouns state, chance, government, and others are linked to both rely on and depend on, but many nouns are still not linked to both rely on and depend on. It can thus be inferred from this that rely on is slightly different from depend on in its use.
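The kind of comparison described above, counting what follows rely on and depend on and checking which items occur with both, can be sketched roughly as follows. This is an illustration under simplifying assumptions, not the paper's procedure: the function `following_words` is hypothetical, the text is tokenized with a plain regular expression, inflected forms such as relies on are ignored, and no part-of-speech filter restricts the results to nouns.

```python
import re
from collections import Counter


def following_words(text, phrase):
    """Count the word that immediately follows each occurrence of `phrase`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    target = phrase.lower().split()
    n = len(target)
    counts = Counter()
    # Stop early enough that a following token always exists.
    for i in range(len(tokens) - n):
        if tokens[i:i + n] == target:
            counts[tokens[i + n]] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "We rely on people. They depend on people and rely on data. "
        "Others rely on people."
    )
    rely = following_words(sample, "rely on")
    depend = following_words(sample, "depend on")
    shared = set(rely) & set(depend)
    print(rely.most_common(3), depend.most_common(3), shared)
```

On a real corpus, a tagged version of the text would let the count be limited to nouns, which is closer to what the abstract reports.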


2016 ◽  
Vol 6 (2) ◽  
pp. 109
Author(s):  
Bei Yang ◽  
Bin Chen

Semantic prosody is a concept that has been subject to considerable criticism and debate. One big concern is to what extent semantic prosody is domain or register-related. Previous studies reach the agreement that CAUSE has an overwhelmingly negative meaning in general English. Its semantic prosody remains controversial in academic writing, however, because of the size and register of the corpus used in different studies. In order to minimize the role that corpus choice has to play in determining the research findings, this paper uses sub-corpora from the British National Corpus to investigate the usage of CAUSE in different types of scientific writing. The results show that the occurrence of CAUSE is the highest in social science, less frequent in applied science, and the lowest in natural and pure science. Its semantic prosody is overwhelmingly negative in social science and applied science, and mainly neutral in natural and pure science. It seems that the verb CAUSE lacks its normal negative semantic prosody in contexts that do not refer to human beings. The implications of the findings for language learning are also discussed.


1997 ◽  
Vol 2 (1) ◽  
pp. 133-152 ◽  
Author(s):  
Paul Rayson ◽  
Geoffrey N. Leech ◽  
Mary Hodges

In this article, we undertake selective quantitative analyses of the demographically sampled spoken English component of the British National Corpus (for brevity, referred to here as the "Conversational Corpus"). This is a subcorpus of c. 4.5 million words, in which speakers and respondents (see I below) are identified by such factors as gender, age, social group, and geographical region. Using a corpus analysis tool developed at Lancaster, we undertake a comparison of the vocabulary of speakers, highlighting those differences which are marked by a very high χ² value of difference between different sectors of the corpus according to gender, age, and social group. A fourth variable, that of geographical region of the United Kingdom, is not investigated in this article, although it remains a promising subject for future research. (As background we also briefly examine differences between spoken and written material in the British National Corpus [BNC].) This study is illustrative of the potentiality of the Conversational Corpus for future corpus-based research on social differentiation in the use of language. There are evident limitations, including (a) the reliance on vocabulary frequency lists and (b) the simplicity of the transcription system employed for the spoken part of the BNC. The conclusion of the article considers future advances in the research paradigm illustrated here.
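A chi-squared comparison of one word's frequency in two sectors of a corpus can be sketched as below. The abstract does not specify the exact computation used by the Lancaster tool, so this is only an assumed form: a Pearson statistic on a 2×2 table (the word versus all other words, in sector A versus sector B) without continuity correction. The function names and the toy counts are hypothetical.

```python
def chi_squared_2x2(freq_a, size_a, freq_b, size_b):
    """Pearson chi-squared (no continuity correction) for one word in two subcorpora."""
    observed = [
        [freq_a, size_a - freq_a],  # sector A: the word, all other words
        [freq_b, size_b - freq_b],  # sector B: the word, all other words
    ]
    total = size_a + size_b
    row_totals = [size_a, size_b]
    col_totals = [freq_a + freq_b, total - (freq_a + freq_b)]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:
                chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


def rank_by_chi_squared(counts_a, size_a, counts_b, size_b):
    """Return (word, chi2) pairs sorted from most to least distinctive."""
    words = set(counts_a) | set(counts_b)
    scored = [
        (w, chi_squared_2x2(counts_a.get(w, 0), size_a, counts_b.get(w, 0), size_b))
        for w in words
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy counts, invented purely for illustration.
    sector_a = {"lovely": 20, "the": 50}
    sector_b = {"lovely": 5, "the": 50}
    for word, score in rank_by_chi_squared(sector_a, 1000, sector_b, 1000):
        print(word, round(score, 2))
```

A high value only signals that the word's relative frequency differs between the two sectors; it does not show which sector uses the word more, so the raw counts still need to be inspected.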


2005 ◽  
Vol 10 (4) ◽  
pp. 517-541 ◽  
Author(s):  
Mike Thelwall

The Web has recently been used as a corpus for linguistic investigations, often with the help of a commercial search engine. We discuss some potential problems with collecting data from commercial search engines and with using the Web as a corpus. We outline an alternative strategy for data collection, using a personal Web crawler. As a case study, the university Web sites of three nations (Australia, New Zealand and the UK) were crawled. The most frequent words were broadly consistent with non-Web written English, but with some academic-related words amongst the top 50 most frequent. It was also evident that the university Web sites contained a significant amount of non-English text, and academic Web English seems to be more future-oriented than British National Corpus written English.

