Language and Disciplinary Concepts in Corpus Linguistics: Investigating Corpus Data

2021 ◽  
Vol 8 (2) ◽  
pp. 79-91
Author(s):  
Zuraidah Mohd Don ◽  
Gerry Knowles

This paper is intended for researchers involved in or contemplating research in corpus linguistics, and is concerned in particular with the language of corpus linguistics. It introduces and explains technical terms in the context in which they are normally used. Technical terms lead on to the concepts to which they refer, and the concepts are related to the procedures, including tagging and parsing, by which they are implemented. English and Malay are used as the languages of illustration, and for the benefit of readers who do not know Malay, Malay examples are translated into English. The paper has a historical dimension, and the language of corpus linguistics is traced to traditional usage in the language classroom, and in particular to the study of Latin in Europe. The inheritance from the past is evident in the design of MaLex, a working device for empirical Malay corpus linguistics, which is presented here as a contribution to the digital humanities.

2019 ◽  
Vol 15 (2) ◽  
pp. 383-417 ◽  
Author(s):  
Roland Schäfer

Abstract Over the past years, multifactorial corpus-based explorations of alternations in grammar have become an accepted major tool in cognitively oriented corpus linguistics. For example, prototype theory as a theory of similarity-based and inherently probabilistic linguistic categorization has received support from studies showing that alternating constructions and items often occur with probabilities influenced by prototypical formal, semantic or contextual factors. In this paper, I analyze a low-frequency alternation effect in German noun inflection in terms of prototype theory, based on strong hypotheses from the existing literature that I integrate into an established theoretical framework of usage-based probabilistic morphology, which allows us to account for similarity effects even in seemingly regular areas of the grammar. Specifically, the so-called weak masculine nouns in German, which follow an unusual pattern of case marking and often have characteristic lexical properties, sporadically occur in forms of the dominant strong masculine nouns. Using data from the nine-billion-token DECOW12A web corpus of contemporary German, I demonstrate that the probability of the alternation is influenced by the presence or absence of semantic, phonotactic, and paradigmatic features. Token frequency is also shown to have an effect on the alternation, in line with common assumptions about the relation between frequency and entrenchment. I use a version of prototype theory with weighted features and polycentric categories, but I also discuss the question of whether such corpus data can be taken as strong evidence for or against specific models of cognitive representation (prototypes vs. exemplars).
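As a rough illustration of what it means for the probability of the alternation to depend on the presence or absence of a feature, the sketch below cross-tabulates, for one hypothetical binary feature, the share of tokens that occur in the strong rather than the weak form. The lemmas, the feature, and the token records are invented placeholders, and a single cross-tabulation is not the multifactorial model used in the study; it only shows the kind of conditional proportion such a model estimates jointly over many features.

```python
from collections import defaultdict

# Hypothetical token records: (lemma, has_feature, form).
# "form" is either "weak" or "strong"; all values are invented placeholders,
# not data from DECOW12A.
Token = tuple[str, bool, str]

TOKENS: list[Token] = [
    ("noun_1", True, "weak"),
    ("noun_1", True, "weak"),
    ("noun_2", True, "weak"),
    ("noun_2", True, "strong"),
    ("noun_3", False, "weak"),
    ("noun_3", False, "strong"),
]


def strong_rate_by_feature(tokens: list[Token]) -> dict[bool, float]:
    """Share of tokens in the strong form, split by presence/absence of the feature."""
    # Maps has_feature -> [number of strong tokens, total number of tokens]
    counts: dict[bool, list[int]] = defaultdict(lambda: [0, 0])
    for _lemma, has_feature, form in tokens:
        counts[has_feature][1] += 1
        if form == "strong":
            counts[has_feature][0] += 1
    return {feat: strong / total for feat, (strong, total) in counts.items()}


print(strong_rate_by_feature(TOKENS))  # {True: 0.25, False: 0.5}
```

In a real analysis each token would be coded for many features at once, and a regression model would estimate their effects jointly; lemma token frequency, which the abstract also reports as relevant, could enter such a model as a further predictor.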


2021 ◽  
Vol 55 (1) ◽  
Author(s):  
Daniel F. O'Kennedy

The kingdom of God in the Old Testament: A brief survey. The kingdom of God is a central concept in the teaching of Jesus, but the question posed by this article is the following: What does the Old Testament say about the kingdom of God? Several Old Testament terms convey the concept of kingdom, kingship and rule of God. This article focuses on the Hebrew and Aramaic ‘technical’ terms for kingdom: mamlākâ, malkût, mělûkâ and malkû. One finds only a few Old Testament references where these terms are directly connected to God, most of them in the post-exilic literature: 1 Chronicles 17:14; 28:5; 29:11; 2 Chronicles 13:8; Psalm 22:29; 103:19; 145:11–13; Daniel 2:44; 3:33 (4:3); 4:31 (4:34); 6:27; 7:14, 18, 27; Obadiah 21. A brief study of these specific references leads to a few preliminary conclusions: The kingdom of God refers to a realm and the reign of God, the God of the kingdom is depicted in different ways, God’s kingdom is eternal and incomparable with earthly kingdoms, the scope of the kingdom is particularistic and universalistic, the Old Testament testifies about a kingdom that is and one that is yet to come, et cetera. It seems that there is no real difference when comparing the ‘kingdom of God’ with the ‘God is King’ passages. One cannot unequivocally declare that ‘kingdom of God’ is the central concept in the Old Testament. However, we must acknowledge that Jesus’s teaching about the kingdom of God did not evolve in a vacuum. His followers probably knew about the Old Testament perspective on the kingdom of God.

Contribution: The concept ‘kingdom of God’ is relevant for the church in South Africa, especially congregations who strive to be missional. Unfortunately, the Old Testament perspective was neglected in the past. The purpose of this brief survey is to stimulate academics and church leaders in their further reflection on the kingdom of God.


ReCALL ◽  
2009 ◽  
Vol 21 (1) ◽  
pp. 55-75 ◽  
Author(s):  
Pascual Pérez-Paredes ◽  
Jose M. Alcaraz-Calero

Abstract Although annotation is a widely researched topic in Corpus Linguistics (CL), its potential role in Data Driven Learning (DDL) has not been addressed in depth by Foreign Language Teaching (FLT) practitioners. Furthermore, most of the research in the use of DDL methods pays little attention to annotation in the design and implementation of corpus-based/driven language teaching. In this paper, we set out to examine the process of development of SACODEYL Annotator, an application that seeks to assist SACODEYL system users in annotating XML multilingual corpora. First, we discuss the role of annotation in DDL and the dominating paradigm in general corpus applications. In the context of the language classroom, we argue that it is essential that corpora should be pedagogically motivated (Braun, 2005 and 2007a). Then, we move on to deal with the analysis and design stages of our annotation solution by illustrating its main features. Some of these include a user-friendly hierarchical and extensible taxonomy tree to facilitate the learner-oriented annotation of the corpora; real-time graphical representation of the annotated corpus matching the XML TEI-compliant (Text Encoding Initiative) standard; and an intuitive management of the different data sections and associated metadata. SACODEYL (System Aided Compilation and Open Distribution of European Youth Language) is an EU-funded MINERVA project which aims to develop an ICT-based system for the assisted compilation and open distribution of multimedia European teen talk in the context of language education. This research lays emphasis on the functionalities of the application within the SACODEYL context. However, our paper equally addresses the needs of potential multimedia language corpus administrators in general who are on the lookout for powerful annotation-assisting software. SACODEYL Annotator is free to use and can be downloaded from our website.


2018 ◽  
Vol 16 (1) ◽  
pp. 113-133
Author(s):  
Tim Vandenhoek

Corpora provide teachers and materials developers with the ability to ensure that the instruction they give in class and in teaching materials accurately reflects natural use. This paper examines the ways in which grammar reference books and two types of EFL/ESL materials present the past perfect aspect and whether they do so accurately. It will be argued that there are several issues concerning how these books present the grammar point. Many of the books surveyed provide incomplete explanations of when and how the form is used, and several contain usage guidelines that are not supported by available corpus data. The paper ends with several recommendations to improve how the form is presented to teachers and learners.


Corpora ◽  
2017 ◽  
Vol 12 (3) ◽  
pp. 459-482 ◽  
Author(s):  
William Allen

Researchers using corpora can visualise their data and analyses using a growing number of tools. Visualisations are especially valuable in environments where researchers communicate and work with public-facing partners under the auspices of ‘knowledge exchange’ or ‘impact’, and where corpus data are increasingly available thanks to digital methods. However, although the field of corpus linguistics continues to generate its own range of techniques, it largely remains orientated towards finding ways for academics to communicate results directly with other academics rather than with or through groups outside universities. Also, there is a lack of discussion about how communication, motivations and values also feature in the process of making corpus data visible. My argument is that these sociocultural and practical factors influence visualisation outputs alongside technical aspects. I draw upon two corpus-based projects about press portrayal of migrants, conducted by an intermediary organisation that links university researchers with users outside academia. Analysing these projects' visualisation outputs in their organisational and communication contexts produces key lessons for researchers wanting to visualise text: consider the aims and values of partners; develop communication strategies that acknowledge different areas of expertise; and link visualisation choices with wider project objectives.


Author(s):  
Erla Hallsteinsdóttir

Multiword expressions (i.e. phraseological units) such as idioms and collocations are among the most interesting parts of every language. In this article, I investigate phraseological units from a lexicographical point of view. I discuss the theoretical and methodological basis of phraseography as a discipline that includes aspects of lexicography, phraseology, corpus linguistics and theories of language learning. I demonstrate the importance of corpora as a source for the lexicographer and the use of corpus data. I also discuss the requirements for the lexicographical treatment of phraseological units through the compilation of a phraseological database for language learners, in relation to learners' assumed needs, which have already been described in detail.


2015 ◽  
Vol 37 (4) ◽  
pp. 195-214
Author(s):  
Sarah M. Loose

This article focuses on digital humanities and Renaissance studies in Canada, highlighting established projects such as Iter and newer efforts such as Serai, and addressing recent interest in historical GIS. This survey of projects demonstrates how the work of Renaissance studies faculty and graduate students in Canada is increasing accessibility to sources, creating new knowledge environments and spaces for collaboration, and encouraging new ways to map and visualize Renaissance data, with an end result that enhances our understanding of the past and the ways that digital technology is changing humanities scholarship. The article also suggests that from the perspective of graduate students, participation in these endeavours provides not only training in digital technologies but also the opportunity to contribute knowledge to the field in concrete ways and the chance to establish a foundation in methodologies and practices that will shape approaches to Renaissance studies research and teaching in the future.


1996 ◽  
Vol 115 (1) ◽  
pp. 123-131 ◽  
Author(s):  
Michael Rothschild

During the past 25 years, the Internet has grown tremendously. Starting as four academic computers linked by the Department of Defense, it has become a major technical and cultural entity that is accessible to millions of persons outside the realm of government and academia. The field of medicine has been well served by this telecommunications system, in which many applications have been developed to assist in research, clinical medicine, and education. More recently, resources of specific interest to otolaryngologists have been implemented at various academic departments and national organizations. This review is intended to simplify the Internet for otolaryngologists who do not have extensive experience in computers or telecommunication. The Internet is described in basic, minimally technical terms, and specific examples are provided of ways that on-line resources can be used in the practice of otolaryngology-head and neck surgery.


Artnodes ◽  
2019 ◽  
Author(s):  
Ana Rodriguez Granell

It gives us great pleasure to present the 23rd issue of the magazine, a heterogeneous collection that brings together selected articles submitted in response to three different calls for contributions. First, this second series of texts brings the volume focusing on media archaeology to a close. The section on Digital Humanities comprises an interesting series of contributions related to the 3rd Congress of the International Society of Hispanic Digital Humanities. The last section of this issue brings together another set of articles submitted in response to the magazine’s regular call for contributions, offering different perspectives on issues that fall within the magazine’s scope of interest. Although the sections and the research they contain are unavoidably disparate, taken as a whole they share a common thread: the impact that certain technologies have had on the way we view the past. The historical reach of these technologies does not operate in a single direction, but across time as a whole.


2019 ◽  
Vol 8 (2) ◽  
pp. 327-369
Author(s):  
Shlomo Klapper

Abstract Rarely is a new yardstick of legal meaning created. But over the past decade, corpus linguistics has begun to be utilized as a new tool to measure ordinary meaning in statutory interpretation and original public meaning in constitutional interpretation. The legal application of corpus linguistics posits that an examination of every use of a term in a wide variety of documents can yield a more complete, impartial understanding of a word than can dictionaries, intuition, or an unsystematic survey of sources. Corpora could supplement, or even supplant, dictionaries and native-speaker intuition in legal analyses. For originalism in particular, legal corpus linguistics promises to offer what would be a more scientific methodology for a point of view which, until now, has lacked one. However, corpus linguistics, as applied to legal problems, falls prey to a fatal methodological criticism – the frequency fallacy. The criticism states that in a corpus, an unusual meaning can have many corpus entries while a perfectly ordinary meaning can be completely absent from the corpus. That is, frequency is not a good measure of meaning. Since legal corpus linguistics relies on frequency, the corpus cannot inform legal meaning. This article parries this otherwise fatal critique. It argues that while the frequency fallacy is self-evidently true, the fallacy is not inherent to the corpus, but rather is an artifact of misinterpreting the corpus by treating it like a dictionary. This defense consists of a number of steps. The first step distinguishes between two different methods of discerning ordinary meaning: extension and abstraction. As illustrated by Yates v. United States and United States v. Marshall, extension entails extending the statutory term to varying facts, while abstraction keeps the facts constant and abstracts out key qualities to find an appropriate term. Critically, this article argues that abstraction offers a way to avoid the frequency fallacy. 
Second, to use abstraction properly, one must analyze not only the presence of the legal term in question but also its absence; that is, one must determine the presence or absence of other terms to describe a similar factual scenario to distinguish between artifacts of language and facts about the world. This article concludes by arguing that this method has a beneficial emergent quality. Not only does this answer make legal corpus analysis methodologically sound, but it also paves the way for the first tool to approximate how an ordinary person would read the law, thus potentially furthering the rule of law.
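The contrast between extension and abstraction can be sketched with hand-coded concordance lines, each recording the term used and the factual scenario it describes. This is a minimal sketch under that assumption: the terms, scenario labels, and data are hypothetical placeholders, and the article's own procedure is not reproduced here. Extension holds the term constant and asks which scenarios it covers; abstraction holds the scenario constant and compares the share of competing terms used to describe it, so that the absence of a term is measured against the alternatives actually chosen.

```python
from collections import Counter

# Hypothetical hand-coded concordance lines: (term used, factual scenario described).
# All terms and scenario labels are illustrative placeholders, not corpus data.
CodedLine = tuple[str, str]

LINES: list[CodedLine] = [
    ("term_a", "scenario_1"),
    ("term_a", "scenario_1"),
    ("term_a", "scenario_2"),
    ("term_b", "scenario_2"),
    ("term_b", "scenario_2"),
    ("term_b", "scenario_2"),
]


def extension(lines: list[CodedLine], term: str) -> Counter:
    """Hold the term constant: count the scenarios it is used to describe."""
    return Counter(scenario for t, scenario in lines if t == term)


def abstraction(lines: list[CodedLine], scenario: str) -> dict[str, float]:
    """Hold the scenario constant: share of descriptions using each term."""
    counts = Counter(t for t, s in lines if s == scenario)
    total = sum(counts.values())
    return {t: n / total for t, n in counts.items()} if total else {}


print(extension(LINES, "term_a"))
print(abstraction(LINES, "scenario_2"))
```

In this toy data, extension shows term_a applied to scenario_2 once, which on its own says little about whether that use is ordinary; abstraction shows that when scenario_2 is described, term_b accounts for three of the four descriptions, which is the kind of comparison with competing terms that the article argues for.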

