Corpus linguistics and language testing: Navigating uncharted waters

2017 ◽  
Vol 34 (4) ◽  
pp. 555-564 ◽  
Author(s):  
Jesse Egbert

The use of corpora and corpus linguistic methods in language testing research is increasing at an accelerated pace. The growing body of language testing research that uses corpus linguistic data is a testament to their utility in test development and validation. Although there are many reasons to be optimistic about the future of using corpus data in language testing, the convergence of these two fields introduces uncharted waters that should be traversed carefully to ensure that high standards of methodological rigor are maintained. The objectives of this paper are as follows: (1) to describe and evaluate the ways corpora and corpus data have been used in language testing to date; and (2) to offer recommendations for best practices to encourage rigorous and appropriate corpus linguistic methods for language testing purposes. This is accomplished with the aid of examples from papers in this special issue, as well as other previous work in this area. The future holds great promise for a useful methodological synergy between corpus linguistics and language testing. The choices researchers make as they navigate the uncharted and challenging waters that lie ahead will ultimately determine whether that potential is fully realized.

1998 ◽  
Vol 3 (2) ◽  
pp. 189-210 ◽  
Author(s):  
Jan Aarts ◽  
Hans van Halteren ◽  
Nelleke Oostdijk

The article discusses the role of linguistic annotation in corpus linguistics as opposed to annotation in natural language processing. In corpus linguistics, annotation is an integral part of the process of linguistic interpretation and description of the data. Tagging and parsing are discussed as the automatic counterparts of, respectively, the paradigmatic and the syntagmatic description of corpus data. The requirements for a corpus linguistic annotation system are considered. An account is given of the TOSCA analysis system as representative of such an annotation system. Performance results of the system are given, and an evaluation is made.
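The paradigmatic side of annotation described above, assigning each word its word-class label, can be sketched with a minimal unigram tagger. This is an illustrative toy, not the TOSCA system: the training pairs, tagset, and function names below are invented for the example, and a real annotation system would combine far richer lexical and syntactic analysis.

```python
from collections import Counter, defaultdict

# Toy tagged training data: (word, part-of-speech) pairs -- illustrative only.
TRAINING = [
    ("the", "DET"), ("corpus", "NOUN"), ("contains", "VERB"),
    ("the", "DET"), ("annotated", "ADJ"), ("data", "NOUN"),
    ("we", "PRON"), ("parse", "VERB"), ("the", "DET"), ("data", "NOUN"),
]

def train_unigram_tagger(pairs):
    """Map each word to its most frequent tag in the training data."""
    counts = defaultdict(Counter)
    for word, tag_label in pairs:
        counts[word][tag_label] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}

def tag(sentence, model, default="NOUN"):
    """Assign each token its most likely tag, falling back to a default."""
    return [(w, model.get(w, default)) for w in sentence]

model = train_unigram_tagger(TRAINING)
print(tag(["we", "parse", "the", "corpus"], model))
# -> [('we', 'PRON'), ('parse', 'VERB'), ('the', 'DET'), ('corpus', 'NOUN')]
```

The fallback tag for unseen words stands in for the much more sophisticated disambiguation a full tagger performs.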


Corpora ◽  
2010 ◽  
Vol 5 (1) ◽  
pp. 1-27 ◽  
Author(s):  
Antti Arppe ◽  
Gaëtanelle Gilquin ◽  
Dylan Glynn ◽  
Martin Hilpert ◽  
Arne Zeschel

Within cognitive linguistics, there is an increasing awareness that the study of linguistic phenomena needs to be grounded in usage. Ideally, research in cognitive linguistics should be based on authentic language use, its results should be replicable, and its claims falsifiable. Consequently, more and more studies now turn to corpora as a source of data. While corpus-based methodologies have increased in sophistication, the use of corpus data is also associated with a number of unresolved problems. The study of cognition through off-line linguistic data is, arguably, indirect, even if such data fulfils desirable qualities such as being natural, representative and plentiful. Several topics in this context stand out as particularly pressing issues. This discussion note addresses (1) converging evidence from corpora and experimentation, (2) whether corpora mirror psychological reality, (3) the theoretical value of corpus linguistic studies of ‘alternations’, (4) the relation of corpus linguistics and grammaticality judgments, and, lastly, (5) the nature of explanations in cognitive corpus linguistics. We do not claim to resolve these issues nor to cover all possible angles; instead, we strongly encourage reactions and further discussion.


2016 ◽  
Vol 9 (7) ◽  
pp. 10
Author(s):  
Eman Saleh Akeel

The growing field of corpus linguistics has been applied extensively in language pedagogy over the last two decades. This has encouraged researchers to explore further applications of corpora in language teaching and learning, and has led to the emergence of corpus use in language testing. The aim of this article is to provide an overview of using corpus data for vocabulary test design. It presents some native and learner corpora that are available for item writers to use. It covers the benefits and limitations of using corpora in language testing and argues for the importance and usefulness of native as well as learner corpora as tools for designing vocabulary tests. The article illustrates how both native and learner corpora can be used in language testing in general and in the development of vocabulary tests in particular.


2009 ◽  
Vol 14 (3) ◽  
pp. 393-417 ◽  
Author(s):  
Lynne Flowerdew

This article reviews and discusses four somewhat contentious issues in the application of corpus linguistics to pedagogy, ESP in particular. Corpus linguistic techniques have been criticized on the grounds that they encourage a more bottom-up rather than top-down processing of text in which concordance lines are examined atomistically. One criticism levelled against corpus data is that a corpus presents language out of its original context. For this reason, some corpus linguists have underscored the importance of ‘pedagogic mediation’ to contextualize the data for the students’ own writing environment. Concerns relating to the inductive approach associated with corpus-based pedagogy have also been raised as this approach may not always be the most appropriate one. A final consideration relates to the issue of whether a corpus is always the most appropriate resource to use among the wealth of other resources available.


2015 ◽  
Vol 24 (2) ◽  
pp. 129-147 ◽  
Author(s):  
Peter Stockwell ◽  
Michaela Mahlberg

We suggest an innovative approach to literary discourse by using corpus linguistic methods to address research questions from cognitive poetics. In this article, we focus on the way that readers engage in mind-modelling in the process of characterisation. The article sets out our cognitive poetic model of characterisation that emphasises the continuity between literary characterisation and real-life human relationships. The model also aims to deal with the modelling of the author’s mind in line with the modelling of the minds of fictional characters. Crucially, our approach to mind-modelling is text-driven. Therefore we are able to employ corpus linguistic techniques systematically to identify textual patterns that function as cues triggering character information. In this article, we explore our understanding of mind-modelling through the characterisation of Mr. Dick from David Copperfield by Charles Dickens. Using the CLiC tool (Corpus Linguistics in Cheshire) developed for the exploration of 19th-century fiction, we investigate the textual traces in non-quotations around this character, in order to draw out the techniques of characterisation other than speech presentation. We show that Mr. Dick is a thematically and authorially significant character in the novel, and we move towards a rigorous account of the reader’s modelling of authorial intention.
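The kind of textual-pattern retrieval described above can be sketched as a simple keyword-in-context (KWIC) concordance. This is a minimal stand-in, not the CLiC tool itself; the example sentence (adapted from David Copperfield) and the window size are chosen purely for illustration.

```python
def kwic(tokens, node, window=4):
    """Return keyword-in-context lines as (left context, node word, right context)."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == node.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines

# A pre-tokenized snippet around Mr. Dick, used here only as sample input.
text = ("Mr Dick shook his head , as utterly renouncing the suggestion ; "
        "and having replied a great many times , and with great confidence , "
        "' No beggar , no beggar , no beggar , sir ! '").split()

for left, node, right in kwic(text, "beggar"):
    print(f"{left:>30} | {node} | {right}")
```

Sorting or grouping such lines by their left or right context is one way repeated patterns around a character become visible.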


2021 ◽  
Vol 8 (1) ◽  
Article 205395172110214
Author(s):  
Martin Schweinberger ◽  
Michael Haugh ◽  
Sam Hames

Public discourse about COVID-19 on Twitter and other social media platforms provides useful insights into public concerns and responses to the pandemic. However, public discourse around COVID-19 is multi-faceted and evolves over time, which poses both analytical and ontological challenges. Studies that use text-mining approaches to analyse responses to major events commonly treat public discourse on social media as an undifferentiated whole, without systematically examining the extent to which that discourse consists of distinct sub-discourses, or which phases characterize its development. They also conflate structured behavioural data (i.e., tagging) with unstructured user-generated data (i.e., the content of tweets) in their sampling methods. The present study demonstrates how both sets of challenges can be addressed by combining corpus linguistic methods with a data-driven text-mining approach to gain a better understanding of how the public discourse around COVID-19 developed over time, and of which topics combine to form this discourse, in the Australian Twittersphere over a period of nearly four months. By combining text mining and corpus linguistics, this study exemplifies how the two approaches can complement each other productively.
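One standard corpus-linguistic ingredient in such a combined pipeline is keyness: scoring words by how strongly their frequency in a target sample (e.g., pandemic-period tweets) departs from a reference sample. The sketch below computes Dunning's log-likelihood statistic; the two miniature word lists are invented for illustration and do not come from the study's data.

```python
import math
from collections import Counter

def log_likelihood(freq_t, size_t, freq_r, size_r):
    """Dunning's log-likelihood keyness statistic for one word,
    comparing its frequency in a target corpus against a reference corpus."""
    expected_t = size_t * (freq_t + freq_r) / (size_t + size_r)
    expected_r = size_r * (freq_t + freq_r) / (size_t + size_r)
    ll = 0.0
    if freq_t:
        ll += freq_t * math.log(freq_t / expected_t)
    if freq_r:
        ll += freq_r * math.log(freq_r / expected_r)
    return 2 * ll

# Invented miniature corpora standing in for tweet and reference samples.
target = "lockdown cases lockdown vaccine cases lockdown".split()
reference = "weather football vaccine election weather football".split()

t_counts, r_counts = Counter(target), Counter(reference)
for word in sorted(set(target), key=lambda w: -t_counts[w]):
    score = log_likelihood(t_counts[word], len(target), r_counts[word], len(reference))
    print(f"{word:10s} LL = {score:.2f}")
```

Words with high scores ("lockdown" here) are candidate keywords; a text-mining step such as topic modelling could then group them into sub-discourses.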


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Muhammad Javed Iqbal ◽  
Zeeshan Javed ◽  
Haleema Sadia ◽  
Ijaz A. Qureshi ◽  
Asma Irshad ◽  
...  

Artificial intelligence (AI) is the use of mathematical algorithms to mimic human cognitive abilities and to address difficult healthcare challenges, including complex biological abnormalities such as cancer. The exponential growth of AI in the last decade shows its potential as a platform for optimal decision-making by super-intelligence, where the human mind is limited in its capacity to process huge amounts of data in a narrow time range. Cancer is a complex and multifaceted disorder with thousands of genetic and epigenetic variations. AI-based algorithms hold great promise for identifying these genetic mutations and aberrant protein interactions at a very early stage. Modern biomedical research is also focused on bringing AI technology to the clinic safely and ethically. AI-based assistance to pathologists and physicians could be a great leap forward in the prediction of disease risk, diagnosis, prognosis, and treatment. Clinical applications of AI and machine learning (ML) in cancer diagnosis and treatment are the future of medical guidance, enabling faster mapping of a new treatment for every individual. By using an AI-based systems approach, researchers can collaborate in real time and share knowledge digitally, potentially helping to heal millions. In this review, we present this potentially game-changing technology for future clinics by connecting biology with artificial intelligence, and explain how AI-based assistance can help oncologists deliver precise treatment.


Author(s):  
P. Kadochnikov ◽  
M. Ptashkina

The US and the EU are negotiating a comprehensive Transatlantic Trade and Investment Partnership (TTIP). The main purposes of the agreement are to stimulate economic growth and employment, to facilitate trade and investment, and to raise competitiveness on both sides of the Atlantic. The US and the EU are each other's biggest trade and investment partners, as well as the most important partners for a number of other countries. The transatlantic free trade agreement would not only facilitate bilateral cooperation but also has the potential to set new, more advanced international trade and investment rules and practices. The agreement aims, among other points, at resolving some of the existing problems in bilateral relations, such as differences in regulatory practices, market access conditions, government procurement, intellectual property rights (IPR) and investor protection. However, some of these differences are deeply embedded in the regulatory systems and have become the reasons for numerous disputes. Although the negotiations on TTIP are still in progress, it is already possible to identify and assess the underlying differences that could hamper the creation of deep provisions in the future agreement. The paper aims at analyzing the most difficult areas of negotiation and offering predictions for the future provisions. First, the paper gives an overview of the scope and structure of bilateral relations between the US and the EU. Second, the authors give a detailed analysis of the most important points on the negotiating agenda, emphasizing the underlying differences in domestic regulation and assessing the depth of those differences. The conclusions are as follows: while some areas, such as tariffs, labor and environment, SMEs, state enterprises and others, are relatively easy to agree upon, as both economies strive to achieve high standards, negotiations on other issues, such as government procurement, NTM regulation and IPR, are less likely to achieve high standards.


2021 ◽  
Vol 66 (3) ◽  
pp. 491-506
Author(s):  
Valeria Chernyavskaya ◽  
Olga Kamshilova

The present investigation responds to discourse-analytical methodology expanded by corpus linguistic techniques. Within a discursive approach, the university's identity is seen as existing in, and being constructed through, discourse. The research interest is in how ideology and the obligation models set by the state construct the university's self-image and university-based research as its core mission. The study is generally consistent with current trends in social constructivism, where identity is considered a process of identity construction rather than a rigid category. It is presumed that key factors are developed within a definite socio-cultural practice, which then shape the concept of collective identity. Detecting and analyzing such factors on the basis of Russian realities and the modern Russian university is becoming a new research objective. The focus of this article is on how certain values can be foregrounded in texts presenting university strategies to the public. The research employs corpus linguistic methods in discourse analysis. The paper is organized as follows. First, it outlines the socio-political context in which the transformation of academic values and organizational principles of Russian national universities is embedded. Second, it discusses corpus findings obtained from an original research corpus that includes mission statements posted on the websites of Russian national research and federal universities. The conclusions concerning the university mission statements reflect ongoing transformations of the universities' role in society. The rhetoric of the statements is declarative and foregrounds new values. The linguistic data analysis shows their socially constructive nature, as they build a framework for currently relevant uniform ideas and concepts.

