Corpus linguistics and language testing: Navigating uncharted waters

2017 ◽  
Vol 34 (4) ◽  
pp. 555-564 ◽  
Author(s):  
Jesse Egbert

The use of corpora and corpus linguistic methods in language testing research is increasing at an accelerated pace. The growing body of language testing research that uses corpus linguistic data is a testament to their utility in test development and validation. Although there are many reasons to be optimistic about the future of using corpus data in language testing, the convergence of these two fields introduces uncharted waters that should be traversed carefully to ensure that high standards of methodological rigor are maintained. The objectives of this paper are as follows: (1) to describe and evaluate the ways corpora and corpus data have been used in language testing to date; and (2) to offer recommendations for best practices to encourage rigorous and appropriate corpus linguistic methods for language testing purposes. This is accomplished with the aid of examples from papers in this special issue, as well as other previous work in this area. The future holds great promise for a useful methodological synergy between corpus linguistics and language testing. The choices researchers make as they navigate the uncharted and challenging waters that lie ahead will ultimately determine whether that potential is fully realized.

1998 ◽  
Vol 3 (2) ◽  
pp. 189-210 ◽  
Author(s):  
Jan Aarts ◽  
Hans van Halteren ◽  
Nelleke Oostdijk

The article discusses the role of linguistic annotation in corpus linguistics as opposed to annotation in natural language processing. In corpus linguistics, annotation is an integral part of the process of linguistic interpretation and description of the data. Tagging and parsing are discussed as the automatic counterparts of, respectively, the paradigmatic and the syntagmatic description of corpus data. The requirements for a corpus linguistic annotation system are considered. An account is given of the TOSCA analysis system as representative of such an annotation system. Performance results of the system are given, and an evaluation is made.
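The paradigmatic side of annotation described above, assigning each word its word-class label, can be sketched with a minimal unigram tagger. This is an illustrative toy, not the TOSCA system: the training pairs, tagset, and function names below are invented for the example, and a real annotation system would combine far richer lexical and syntactic analysis.

```python
from collections import Counter, defaultdict

# Toy tagged training data: (word, part-of-speech) pairs -- illustrative only.
TRAINING = [
    ("the", "DET"), ("corpus", "NOUN"), ("contains", "VERB"),
    ("the", "DET"), ("annotated", "ADJ"), ("data", "NOUN"),
    ("we", "PRON"), ("parse", "VERB"), ("the", "DET"), ("data", "NOUN"),
]

def train_unigram_tagger(pairs):
    """Map each word to its most frequent tag in the training data."""
    counts = defaultdict(Counter)
    for word, tag_label in pairs:
        counts[word][tag_label] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}

def tag(sentence, model, default="NOUN"):
    """Assign each token its most likely tag, falling back to a default."""
    return [(w, model.get(w, default)) for w in sentence]

model = train_unigram_tagger(TRAINING)
print(tag(["we", "parse", "the", "corpus"], model))
# -> [('we', 'PRON'), ('parse', 'VERB'), ('the', 'DET'), ('corpus', 'NOUN')]
```

The fallback tag for unseen words stands in for the much more sophisticated disambiguation a full tagger performs.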


Corpora ◽  
2010 ◽  
Vol 5 (1) ◽  
pp. 1-27 ◽  
Author(s):  
Antti Arppe ◽  
Gaëtanelle Gilquin ◽  
Dylan Glynn ◽  
Martin Hilpert ◽  
Arne Zeschel

Within cognitive linguistics, there is an increasing awareness that the study of linguistic phenomena needs to be grounded in usage. Ideally, research in cognitive linguistics should be based on authentic language use, its results should be replicable, and its claims falsifiable. Consequently, more and more studies now turn to corpora as a source of data. While corpus-based methodologies have increased in sophistication, the use of corpus data is also associated with a number of unresolved problems. The study of cognition through off-line linguistic data is, arguably, indirect, even if such data fulfils desirable qualities such as being natural, representative and plentiful. Several topics in this context stand out as particularly pressing issues. This discussion note addresses (1) converging evidence from corpora and experimentation, (2) whether corpora mirror psychological reality, (3) the theoretical value of corpus linguistic studies of ‘alternations’, (4) the relation of corpus linguistics and grammaticality judgments, and, lastly, (5) the nature of explanations in cognitive corpus linguistics. We do not claim to resolve these issues nor to cover all possible angles; instead, we strongly encourage reactions and further discussion.


2016 ◽  
Vol 9 (7) ◽  
pp. 10
Author(s):  
Eman Saleh Akeel

The growing field of corpus linguistics has been applied extensively in language pedagogy over the last two decades. This has encouraged researchers to explore further applications of corpora in language teaching and learning, and has led to the emergence of corpus use in language testing. The aim of this article is to provide an overview of using corpus data for vocabulary test design. It presents some native and learner corpora that are available for item writers to use. It covers the benefits and limitations of using corpora in language testing and argues for the importance and usefulness of native as well as learner corpora as tools for designing vocabulary tests. The article illustrates how both native and learner corpora can be used in language testing in general and in the development of vocabulary tests in particular.


2009 ◽  
Vol 14 (3) ◽  
pp. 393-417 ◽  
Author(s):  
Lynne Flowerdew

This article reviews and discusses four somewhat contentious issues in the application of corpus linguistics to pedagogy, ESP in particular. Corpus linguistic techniques have been criticized on the grounds that they encourage a more bottom-up rather than top-down processing of text in which concordance lines are examined atomistically. One criticism levelled against corpus data is that a corpus presents language out of its original context. For this reason, some corpus linguists have underscored the importance of ‘pedagogic mediation’ to contextualize the data for the students’ own writing environment. Concerns relating to the inductive approach associated with corpus-based pedagogy have also been raised as this approach may not always be the most appropriate one. A final consideration relates to the issue of whether a corpus is always the most appropriate resource to use among the wealth of other resources available.


2015 ◽  
Vol 24 (2) ◽  
pp. 129-147 ◽  
Author(s):  
Peter Stockwell ◽  
Michaela Mahlberg

We suggest an innovative approach to literary discourse by using corpus linguistic methods to address research questions from cognitive poetics. In this article, we focus on the way that readers engage in mind-modelling in the process of characterisation. The article sets out our cognitive poetic model of characterisation that emphasises the continuity between literary characterisation and real-life human relationships. The model also aims to deal with the modelling of the author’s mind in line with the modelling of the minds of fictional characters. Crucially, our approach to mind-modelling is text-driven. Therefore we are able to employ corpus linguistic techniques systematically to identify textual patterns that function as cues triggering character information. In this article, we explore our understanding of mind-modelling through the characterisation of Mr. Dick from David Copperfield by Charles Dickens. Using the CLiC tool (Corpus Linguistics in Cheshire) developed for the exploration of 19th-century fiction, we investigate the textual traces in non-quotations around this character, in order to draw out the techniques of characterisation other than speech presentation. We show that Mr. Dick is a thematically and authorially significant character in the novel, and we move towards a rigorous account of the reader’s modelling of authorial intention.
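The kind of textual-pattern retrieval described above can be sketched as a simple keyword-in-context (KWIC) concordance. This is a minimal stand-in, not the CLiC tool itself; the example sentence (adapted from David Copperfield) and the window size are chosen purely for illustration.

```python
def kwic(tokens, node, window=4):
    """Return keyword-in-context lines as (left context, node word, right context)."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == node.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines

# A pre-tokenized snippet around Mr. Dick, used here only as sample input.
text = ("Mr Dick shook his head , as utterly renouncing the suggestion ; "
        "and having replied a great many times , and with great confidence , "
        "' No beggar , no beggar , no beggar , sir ! '").split()

for left, node, right in kwic(text, "beggar"):
    print(f"{left:>30} | {node} | {right}")
```

Sorting or grouping such lines by their left or right context is one way repeated patterns around a character become visible.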


2021 ◽  
Vol 8 (1) ◽  
Article 205395172110214
Author(s):  
Martin Schweinberger ◽  
Michael Haugh ◽  
Sam Hames

Public discourse about COVID-19 on Twitter and other social media platforms provides useful insights into public concerns and responses to the pandemic. However, public discourse around COVID-19 is multi-faceted and evolves over time, which poses both analytical and ontological challenges. Studies that use text-mining approaches to analyse responses to major events commonly treat public discourse on social media as an undifferentiated whole, without systematically examining the extent to which that discourse consists of distinct sub-discourses, or which phases characterize its development. They also conflate structured behavioural data (i.e., tagging) with unstructured user-generated data (i.e., the content of tweets) in their sampling methods. The present study demonstrates how both sets of challenges can be addressed by combining corpus linguistic methods with a data-driven text-mining approach to gain a better understanding of how the public discourse around COVID-19 developed over time, and of which topics combine to form this discourse, in the Australian Twittersphere over a period of nearly four months. By combining text mining and corpus linguistics, this study exemplifies how the two approaches can complement each other productively.
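One standard corpus-linguistic ingredient in such a combined pipeline is keyness: scoring words by how strongly their frequency in a target sample (e.g., pandemic-period tweets) departs from a reference sample. The sketch below computes Dunning's log-likelihood statistic; the two miniature word lists are invented for illustration and do not come from the study's data.

```python
import math
from collections import Counter

def log_likelihood(freq_t, size_t, freq_r, size_r):
    """Dunning's log-likelihood keyness statistic for one word,
    comparing its frequency in a target corpus against a reference corpus."""
    expected_t = size_t * (freq_t + freq_r) / (size_t + size_r)
    expected_r = size_r * (freq_t + freq_r) / (size_t + size_r)
    ll = 0.0
    if freq_t:
        ll += freq_t * math.log(freq_t / expected_t)
    if freq_r:
        ll += freq_r * math.log(freq_r / expected_r)
    return 2 * ll

# Invented miniature corpora standing in for tweet and reference samples.
target = "lockdown cases lockdown vaccine cases lockdown".split()
reference = "weather football vaccine election weather football".split()

t_counts, r_counts = Counter(target), Counter(reference)
for word in sorted(set(target), key=lambda w: -t_counts[w]):
    score = log_likelihood(t_counts[word], len(target), r_counts[word], len(reference))
    print(f"{word:10s} LL = {score:.2f}")
```

Words with high scores ("lockdown" here) are candidate keywords; a text-mining step such as topic modelling could then group them into sub-discourses.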


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Muhammad Javed Iqbal ◽  
Zeeshan Javed ◽  
Haleema Sadia ◽  
Ijaz A. Qureshi ◽  
Asma Irshad ◽  
...  

Artificial intelligence (AI) is the use of mathematical algorithms to mimic human cognitive abilities and to address difficult healthcare challenges, including complex biological abnormalities such as cancer. The exponential growth of AI in the last decade shows its potential as a platform for optimal decision-making by super-intelligence, where the human mind is limited in its capacity to process huge amounts of data in a narrow time range. Cancer is a complex and multifaceted disorder with thousands of genetic and epigenetic variations. AI-based algorithms hold great promise for identifying these genetic mutations and aberrant protein interactions at a very early stage. Modern biomedical research is also focused on bringing AI technology to the clinic safely and ethically. AI-based assistance to pathologists and physicians could be a great leap forward in the prediction of disease risk, diagnosis, prognosis, and treatment. Clinical applications of AI and machine learning (ML) in cancer diagnosis and treatment are the future of medical guidance, enabling faster mapping of a new treatment for every individual. By using an AI-based systems approach, researchers can collaborate in real time and share knowledge digitally, potentially helping to heal millions. In this review, we present this potentially game-changing technology for future clinics by connecting biology with artificial intelligence, and explain how AI-based assistance can help oncologists deliver precise treatment.


Author(s):  
P. Kadochnikov ◽  
M. Ptashkina

The US and the EU are negotiating a comprehensive Transatlantic Trade and Investment Partnership (TTIP). The main purposes of the agreement are to stimulate economic growth and employment, to facilitate trade and investment, and to raise competitiveness on both sides of the Atlantic. The US and the EU are each other's biggest trade and investment partners, as well as the most important partners for a number of other countries. The transatlantic free trade agreement would not only facilitate bilateral cooperation but also has the potential to set new, more advanced international trade and investment rules and practices. The agreement aims, among other points, at resolving some of the existing problems in bilateral relations, such as differences in regulatory practices, market access conditions, government procurement, intellectual property rights (IPR) and investor protection. However, some of these differences are deeply embedded in the regulatory systems and have become the reasons for numerous disputes. Although the negotiations on TTIP are still in progress, it is already possible to identify and assess the underlying differences that could hamper the creation of deep provisions in the future agreement. The paper aims at analyzing the most difficult areas of negotiation and offering predictions for the future provisions. First, the paper gives an overview of the scope and structure of bilateral relations between the US and the EU. Second, the authors give a detailed analysis of the most important points on the negotiating agenda, emphasizing the underlying differences in domestic regulation and assessing the depth of those differences. The conclusions are as follows: while some areas, such as tariffs, labor and environment, SMEs, state enterprises and others, are relatively easy to agree upon, as both economies strive to achieve high standards, negotiations on other issues, such as government procurement, NTM regulation and IPR, are less likely to achieve high standards.


2021 ◽  
Vol 66 (3) ◽  
pp. 491-506
Author(s):  
Valeria Chernyavskaya ◽  
Olga Kamshilova

The present investigation responds to discourse-analytical methodology expanded by corpus linguistic techniques. Within a discursive approach, the university's identity is seen as existing in, and being constructed through, discourse. The research interest is in how ideology and the obligation models set by the state construct the university's self-image and university-based research as its core mission. The study is generally consistent with current trends in social constructivism, where identity is considered a process of identity construction rather than a rigid category. It is presumed that key factors are developed within a definite socio-cultural practice, which then shape the concept of collective identity. Detecting and analyzing such factors on the basis of Russian realities and the modern Russian university is becoming a new research objective. The focus of this article is on how certain values can be foregrounded in texts presenting university strategies to the public. The research employs corpus linguistic methods in discourse analysis. The paper is organized as follows. First, it outlines the socio-political context in which the transformation of academic values and organizational principles of Russian national universities is embedded. Second, it discusses corpus findings obtained from an original research corpus that includes mission statements posted on the websites of Russian national research and federal universities. The conclusions concerning the university mission statements reflect ongoing transformations of the universities' role in society. The rhetoric of the statements is declarative and foregrounds new values. The linguistic data analysis shows their socially constructive nature, as they build a framework for currently relevant uniform ideas and concepts.

