Research in Corpus Linguistics
Latest Publications


TOTAL DOCUMENTS

83
(FIVE YEARS 67)

H-INDEX

2
(FIVE YEARS 2)

Published By Research In Corpus Linguistics

2243-4712
Updated Sunday, 17 October 2021

2021 ◽  
Vol 9 (2) ◽  
pp. 114-130
Author(s):  
Javier Calle-Martín

The Early Middle English period witnessed the massive borrowing and adoption of the Latin system of abbreviations in England. Mediaeval writers appropriated those symbols that were directly transferable from Latin exemplars, especially suspensions and brevigraphs, while contractions and superior letters were incorporated somewhat later. The existing accounts of abbreviations in handwritten documents are fragmentary as they offer the picture of the literary compositions of the period, which have been traditionally taken as the source of evidence for handbooks on palaeography. In addition to this, most of these accounts are limited to the description of their use and typology in independent witnesses, so that in many cases the results cannot be extrapolated beyond the practice of individual scribes. The present paper takes that step beyond individuality and pursues the study of abbreviations from a variationist perspective with the following objectives: a) to analyse the use and distribution of abbreviations in Late Middle English and Early Modern English (1350–1700), and b) to evaluate the relevance of these abbreviations across different text types of medical writing. The data used as a source of evidence come from The Málaga Corpus of Early English Scientific Prose, both the Late Middle English and the Early Modern English components (1350–1500 and 1500–1700, respectively).


2021 ◽  
Vol 10 (1) ◽  
pp. 45-62
Author(s):  
Ece Genç-Yöntem ◽  
Evrim Eveyik-Aydın

Although compiling a spoken learner corpus is not a recent enterprise, the number of developmental learner spoken corpora in the field of corpus linguistics is not satisfactory. This report describes the compilation of the Yeditepe Spoken Corpus of Learner English (YESCOLE), a 119,787-word corpus of Turkish students’ spoken English at tertiary level. YESCOLE was compiled to generate a developmental corpus of spoken interlanguage by collecting samples from learners of different English proficiency levels at regular short intervals over seven months. In order to shed light on the laborious methodology of compiling the developmental spoken learner corpus, this paper elucidates the steps taken to build YESCOLE and discusses its potential benefits for research and instructional purposes.


2021 ◽  
Vol 10 (1) ◽  
pp. 31-44
Author(s):  
Gaëtanelle Gilquin

The Process Corpus of English in Education (PROCEED) is a learner corpus of English which, in addition to written texts, consists of data that make the writing process visible in the form of keystroke log files and screencast videos. It comes with rich metadata about each learner, including indices of exposure to the target language and cognitive measures such as working memory or fluid intelligence. It also includes an L1 component which is made up of similar data produced by the learners in their mother tongue. PROCEED opens new perspectives in the study of learner writing, by going beyond the written product. It makes it possible to investigate aspects such as writing fluency, use of online resources, cognitive phenomena like automaticity and avoidance, or theoretical modelling of the writing process. It also has applications for teaching, e.g. by showing students screencast video clips from the corpus illustrating effective writing strategies, as well as for testing, e.g. by establishing a corpus-derived standard of writing fluency for learners at a certain proficiency level.


2021 ◽  
Vol 9 (2) ◽  
pp. 90-113
Author(s):  
José Santaemilia

Violence Against Women (VAW) is a very sensitive, and highly ideological, topic in Spanish society, as well as in Western societies generally. In Spain, media accounts of VAW are very closely related to two quality newspapers, El País and El Mundo, providing a variety of naming practices for VAW, with differing ideological and evaluative implications. In this paper, I compare and contrast these two dailies in their use of the three main naming practices used in VAW news: violencia de género ‘gender-based violence’, violencia doméstica ‘domestic violence’ and violencia machista ‘male violence’. To do so, I resort to the news values approach proposed by Bednarek and Caple (2012, 2014, 2017), which involves paying attention to the combined insights from both Corpus Linguistics and Critical Discourse Analysis (cf. Baker et al. 2008, Partington et al. 2013).


2021 ◽  
Vol 9 (2) ◽  
pp. 131-151
Author(s):  
Jake Flatt ◽  
Laura Esteban-Segura

Rural dialects are slowly disappearing and giving way to larger, more generalised ways of speaking (Trudgill 2004; Kortmann 2008; Beal 2010; Braber 2015). This paper is concerned with the study of the specific subdialect of Nottinghamshire, known as ‘Notts’ or ‘Nottinghamese’, and aims at describing its linguistic features. For this purpose, a personalised corpus of approximately 26,000 words has been compiled. The corpus consists of oral texts, which have been transcribed, from a TV show set in the area. The analysis is focused on three facets of the dialectal variation surrounding the county of Nottinghamshire, namely the linguistic levels of phonology, morphosyntax and lexis. Several conclusions have been reached, including the /æ/ phoneme as an indicator of a northern dialect, the usage of the velar nasal plus cluster, as well as the pronunciation of continuous forms and past tense irregularities. In terms of lexical analysis, a justification for the evolution of language use in the area is provided.


2021 ◽  
Vol 9 (1) ◽  
pp. i-viii
Author(s):  
Tanja Säily ◽  
Jukka Tyrkkö

Recent advances in the availability of ever larger and more varied electronic datasets, both historical and modern, provide unprecedented opportunities for corpus linguistics and the digital humanities. However, combining unstructured text with images, video, audio as well as structured metadata poses a variety of challenges to corpus compilers. This paper presents an overview of the topic to contextualise this special issue of Research in Corpus Linguistics. The aim of the special issue is to highlight some of the challenges faced and solutions developed in several recent and ongoing corpus projects. Rather than providing overall descriptions of corpora, each contributor discusses specific challenges they faced in the corpus development process, summarised in this paper. We hope that the special issue will benefit future corpus projects by providing solutions to common problems and by paving the way for new best practices for the compilation and development of rich-data corpora. We also hope that this collection of articles will help keep the conversation going on the theoretical and methodological challenges of corpus compilation.


2021 ◽  
Vol 10 (1) ◽  
pp. 1-30
Author(s):  
Tieu-Thuy Chung ◽  
Luyen-Thi Bui ◽  
Peter Crosthwaite

Appraisal theory (Martin and White 2005), an approach to discourse analysis dealing with evaluative language, has been employed in analysing newspaper articles and spoken discourses in several earlier studies, and it is now gaining in popularity as a framework for comparing first and second language (L1/L2) writing. This study investigated 40 English majors’ Vietnamese and English paragraphs for evaluative language, a key component of successful academic writing, as realised under Appraisal theory. To this purpose, we collected L1 Vietnamese and L2 English data from the same student writers across the same topics and used a corpus-informed Contrastive Interlanguage Analysis approach to the annotation and analysis of appraisal. A range of commonalities were present in the use of appraisal across the two language varieties, while the results also suggest significant differences between students’ evaluative expressions in Vietnamese as a mother tongue and English as a second or foreign language. This variation includes the comparative under- and over-use of specific appraisal resources employed in L1 and L2 writing respectively, in particular, regarding writers’ employment of attitudinal features. The findings serve to inform future pedagogical applications regarding explicit instruction in stance and appraisal features for novice L2 English writers in Vietnam.


2021 ◽  
Vol 9 (2) ◽  
pp. 64-89
Author(s):  
Lucía Loureiro-Porto

The second half of the twentieth century witnessed the emergence and expansion of linguistic changes associated with a number of processes related to changes in socio-cultural norms, such as colloquialization, informalization and democratization. This paper focuses on the latter, a phenomenon that has been claimed to be responsible for several ongoing changes in inner-circle varieties of English, but is rather unexplored in outer-circle varieties. The paper explores Hong Kong English and studies two linguistic sets of markers that include items that represent the (old) undemocratic alternative and the (new) democratic option, namely modal must vs. semi-modals have (got) to, need (to) and want to, and epicene pronouns including undemocratic generic he, on the one hand, and democratic singular they and conjoined he or she, on the other. Using the Hong Kong component of the International Corpus of English, and adopting a register approach, the paper reaches conclusions regarding the role played by prescriptivism in the diffusion of democratic items.


2021 ◽  
Vol 9 (1) ◽  
pp. 104-131
Author(s):  
Lassi Saario ◽  
Tanja Säily ◽  
Samuli Kaislaniemi ◽  
Terttu Nevalainen

This paper discusses the process of part-of-speech tagging the Corpus of Early English Correspondence Extension (CEECE), as well as the end result. The process involved normalisation of historical spelling variation, conversion from a legacy format into TEI-XML, and finally, tokenisation and tagging by the CLAWS software. At each stage, we had to face and work around problems such as whether to retain original spelling variants in corpus markup, how to implement overlapping hierarchies in XML, and how to calculate the accuracy of tagging in a way that acknowledges errors in tokenisation. The final tagged corpus is estimated to have an accuracy of 94.5 per cent (in the C7 tagset), which is circa two percentage points (pp) lower than that of present-day corpora but respectable for Late Modern English. The most accurate tag groups include pronouns and numerals, whereas adjectives and adverbs are among the least accurate. Normalisation increased the overall accuracy of tagging by circa 3.7pp. The combination of POS tagging and social metadata will make the corpus attractive to linguists interested in the interplay between language-internal and -external factors affecting variation and change.


2021 ◽  
Vol 9 (1) ◽  
pp. 35-62
Author(s):  
Nele Põldvere ◽  
Johan Frid ◽  
Victoria Johansson ◽  
Carita Paradis

This article aims to describe key challenges of preparing and releasing audio material for spoken data and to propose solutions to these challenges. We draw on our experience of compiling the new London-Lund Corpus 2 (LLC-2), where transcripts are released together with the audio files. However, making the audio material publicly available required careful consideration of how to, most effectively, 1) align the transcripts with the audio and 2) anonymise personal information in the recordings. First, audio-to-text alignment was solved through the insertion of timestamps in front of speaker turns in the transcription stage, which, as we show in the article, may later be used as a valuable complement to more robust automatic segmentation. Second, anonymisation was done by means of a Praat script, which replaced all personal information with a sound that made the lexical information incomprehensible but retained the prosodic characteristics. The public release of the LLC-2 audio material is a valuable feature of the corpus that allows users to extend the corpus data relative to their own research interests and, thus, broaden the scope of corpus linguistics. To illustrate this, we present three studies that have successfully used the LLC-2 audio material.

