Exploiting a Large Spoken Corpus

1999 ◽  
Vol 4 (1) ◽  
pp. 29-52 ◽  
Author(s):  
Ylva Berglund

The British National Corpus (BNC) contains a spoken component of about 10 million words, consisting of spoken language of various kinds produced by different speakers in a variety of situations. Starting from an end-user's perspective, this paper surveys the potential of this resource and some possible problems one might encounter if not fully versed in the details of the compilation and coding plans. Among the issues touched upon are questions relating to the composition of the component, the transcription principles employed, and points relating to the nature and coverage of the mark-up. By way of illustration, examples are drawn from a case study of the variant forms gonna and going to.

2005 ◽  
Vol 10 (4) ◽  
pp. 517-541 ◽  
Author(s):  
Mike Thelwall

The Web has recently been used as a corpus for linguistic investigations, often with the help of a commercial search engine. We discuss some potential problems with collecting data from commercial search engines and with using the Web as a corpus. We outline an alternative strategy for data collection, using a personal Web crawler. As a case study, the university Web sites of three nations (Australia, New Zealand and the UK) were crawled. The most frequent words were broadly consistent with non-Web written English, but with some academic-related words amongst the top 50 most frequent. It was also evident that the university Web sites contained a significant amount of non-English text, and academic Web English seems to be more future-oriented than British National Corpus written English.


1997 ◽  
Vol 2 (1) ◽  
pp. 23-64 ◽  
Author(s):  
Beth Levin ◽  
Grace Song

This paper demonstrates the essential role of corpus data in the development of a theory that explains and predicts word behavior. We make this point through a case study of verbs of sound, drawing our evidence primarily from the British National Corpus. We begin by considering pretheoretic notions of the verbs of sound as presented in corpus-based dictionaries and then contrast them with the predictions made by a theory of syntax, as represented by Chomsky's Government-Binding framework. We identify and classify the transitive uses of sixteen representative verbs of sound found in the corpus data. Finally, we consider what a linguistic account with both syntactic and lexical semantic components has to offer as an explanation of observed differences in the behavior of the sample verbs.


2019 ◽  
Vol 1 (2) ◽  
pp. 34
Author(s):  
Entusiastik -

This paper analysed the use of corpus and spoken language features in the English Language Teaching (ELT) coursebook “Touchstone”. The corpus analysis was carried out by using the British National Corpus (BNC), which was chosen for its easy and free access. In doing the spoken language analysis, I refer to McCarthy and Carter’s (2015, p.5) argument, which takes the grammar of conversation as ‘the benchmark for a grammar of speaking’ by considering features such as ellipsis, heads and tails, lexical bundles, and vagueness. The analysis indicated that the language used in this coursebook signified a certain level of authentic and natural language, although areas of improvement were also found.


2014 ◽  
Vol 1 (2) ◽  
pp. 236-251
Author(s):  
Yan Ding ◽  
Dirk Noël

This paper addresses the question of conceptual diversity in the seat of emotions via a corpus-based case study of diachronic variation in the metaphorical containers of sadness in English. Data sourced from Literature Online, Early English Books Online and the British National Corpus reveal three types of metaphorical containers of sadness: (1) the human body in general and whatever is either literally internal to it, or at least often conceptualized as such, such as the heart and the soul; (2) external body parts and different kinds of superficial body features, such as the eyes and the voice; and (3) containers that are not inherently connected with the human body, such as a room and a sonnet. A comparison between the types of metaphorical containers in different periods shows that whereas the percentage of the third type of containers remains constant by and large, there has been a noticeable increase in the percentage of the second type of containers and a quite obvious decrease in the percentage of the first type of containers. It is argued that the diachronic variation in the relative frequencies of the two types of containers may have been related to a shift in the general conception of body and emotions, and specifically to the gradual disintegration of humoral theory.


2018 ◽  
Vol 23 (1) ◽  
pp. 1-27 ◽  
Author(s):  
Jacqueline Laws ◽  
Chris Ryder

The aim of this paper is to identify the effect of register variation in spoken British English on the occurrence of the four principal verb-forming suffixes: ‑ate, ‑en, ‑ify and ‑ize, by building on the work of Biber et al. (1999), Plag et al. (1999) and Schmid (2011). Register variation effects were compared between the less formal Demographically-Sampled and the more formal Context-Governed components of the original 1994 version of the British National Corpus. The pattern of ‑ize derivatives revealed the most marked register-based differences with respect to frequency counts and the creation of neologisms, whereas ‑en derivatives varied the least compared with the other three suffixes. Quantitative and qualitative analyses of these suffix profiles in the context of spoken language reveal markers of register formality that have not hitherto been explored; derivative usage patterns provide an additional dimension to previous research on register variation which has mainly focused on grammatical and lexical features of language.


ReCALL ◽  
2010 ◽  
Vol 22 (2) ◽  
pp. 191-211
Author(s):  
Silvia Molina-Plaza ◽  
Eduardo de Gregorio-Godeo

Within the context of on-going research, this paper explores the pedagogical implications of contrastive analyses of multiword units in English and Spanish based on electronic corpora as a CALL resource. The main tenets of collocations from a contrastive perspective – and the points of contact and departure between both languages – are discussed prior to examining the commonest types of verb + noun combinations as a significant case of so-called ‘de-lexicalized’, ‘light’, ‘empty’, ‘thin’, ‘stretched’ or ‘support verbs’. A qualitatively and quantitatively-oriented case study is accordingly conducted, determining the weight of dar in support verb constructions within the Corpus de Referencia del Español Actual (CREA) and of the English equivalent stretched verb constructions with give within the British National Corpus (BNC). Based on the empirical data obtained in this way, this paper provides relevant insights for more accurate translations, helping to enhance the collocational competence of L2 students, who tend to avoid constructions including empty verbs like give in favour of full-verb forms. The detailed findings in this paper come to shed light on the potential of CALL resources for improving the collocational usage of foreign-language learners, as quantitative and qualitative comparisons of collocations based on electronic corpora serve to highlight the similarities and, more importantly, the lexical and typological differences between both languages, thereby substantiating the invaluable role that corpus analysis may play for language teaching in general and for collocational knowledge and proficiency in particular.


1997 ◽  
Vol 2 (2) ◽  
pp. 259-280 ◽  
Author(s):  
Aquilino Sánchez ◽  
Pascual Cantos-Gomez

Various research centres and publishing companies all around the world have been developing corpus resources for many years, and there has been a growing awareness throughout the eighties of their importance to linguistic and lexicographic work. To give some idea of scale, the British National Corpus contains 100 million words, and its counterpart for Spanish—compiled by the Spanish Real Academia de la Lengua—will reach 100 million words at first and 200 million words in a second stage. However, little convincing research has been done in the direction of sample size—directly connected to a further topic: representativeness. We shall investigate here a related issue: Is it possible to predict the different word forms and lemmas of a given corpus? And if so, how? A positive answer to this question may contribute to decision making regarding some aspects of representativeness in given fields. We shall attempt further to find a reliable procedure to predict the total number of word forms (types) and lemmas in a specific corpus.


Corpora ◽  
2014 ◽  
Vol 9 (2) ◽  
pp. 137-154 ◽  
Author(s):  
Catherine Smith ◽  
Svenja Adolphs ◽  
Kevin Harvey ◽  
Louise Mullany

The abundance of language data that is now available in digital form, and the rise of distinct language varieties that are used for digital communication, mean that issues of non-standard spellings and spelling errors are, in future, likely to become more prominent for compilers of corpora. This paper examines the effect of spelling variation on keywords in a born-digital corpus in order to explore the extent and impact of this variation for future corpus studies. The corpus used in this study consists of e-mails about health concerns that were sent to a health website by adolescents. Keywords are generated using the original version of the corpus and a version with spelling errors corrected, and the British National Corpus (BNC) acts as the reference corpus. The ranks of the keywords are shown to be very similar and, therefore, suggest that, depending on the research goals, keywords could be generated reliably without any need for spelling correction.


Healthcare ◽  
2021 ◽  
Vol 9 (7) ◽  
pp. 854
Author(s):  
Dalia Almaghaslah ◽  
Abdulrhman Alsayari ◽  
Saleh Ali Alyahya ◽  
Rana Alshehri ◽  
Khawlah Alqadi ◽  
...  

Introduction: Design thinking, an innovative problem-solving approach, has gained wide popularity in healthcare disciplines. The aim of this work is to improve outpatients’ experiences in hospital pharmacies in two hospitals in Asir region, Saudi Arabia. Methods: The design thinking approach, adopted from Stanford University’s D-School, was used in this study. Results: Several problems were identified: lack of comfortable environment in the pharmacies’ waiting area, lack of a queue management system, and workflow inefficiencies related to ordering and supplies of medicines. A prototype was proposed to overcome these challenges. Discussion and Conclusion: The design thinking approach helped in identifying end-user (patients visiting outpatient pharmacies) values and desires and provided an understanding of their struggles. It also proposed tailored solutions that could improve patients’ experiences while using the services of the outpatient pharmacies.

