What do (some of) our association measures measure (most)? Association?

Author(s):  
Stefan Th. Gries

Abstract: This paper discusses the degree to which some of the most widely used measures of association in corpus linguistics are not particularly valid, in the sense that they measure not association as such but an amalgam of a lot of frequency and a little association. The paper demonstrates these issues on the basis of hypothetical and actual corpus data and outlines implications of the findings. I then outline how to design an association measure that only measures association and show that its behavior supports using the log odds ratio as a true association-only measure, kept separate from frequency; in addition, this paper sets the stage for an analogous review of dispersion measures in corpus linguistics.
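To make the contrast between frequency and association concrete, here is a minimal sketch in Python. It is not taken from the paper. The 2x2 co-occurrence counts are hypothetical, the function names are illustrative, and the log-likelihood ratio (G²) is used here only as an assumed example of a widely used measure; the abstract does not say which measures it examines. The sketch shows that multiplying every cell by ten leaves the log odds ratio unchanged while G² grows tenfold.

```python
import math

def log_odds_ratio(a, b, c, d, smoothing=0.5):
    """Log odds ratio of a 2x2 co-occurrence table.

    a: word1 with word2      b: word1 without word2
    c: word2 without word1   d: neither word
    If any cell is zero, `smoothing` is added to all four cells
    (a common convention, assumed here) to avoid division by zero.
    """
    if 0 in (a, b, c, d):
        a, b, c, d = (x + smoothing for x in (a, b, c, d))
    return math.log((a * d) / (b * c))

def g_squared(a, b, c, d):
    """Log-likelihood ratio statistic G^2 for the same table."""
    n = a + b + c + d
    observed = [a, b, c, d]
    expected = [
        (a + b) * (a + c) / n,  # row 1, column 1
        (a + b) * (b + d) / n,  # row 1, column 2
        (c + d) * (a + c) / n,  # row 2, column 1
        (c + d) * (b + d) / n,  # row 2, column 2
    ]
    # Cells with an observed count of zero contribute nothing to the sum.
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)

# Hypothetical counts: the same proportions at two corpus sizes.
small = (10, 90, 40, 860)
large = tuple(10 * x for x in small)

lor_small, lor_large = log_odds_ratio(*small), log_odds_ratio(*large)
g2_small, g2_large = g_squared(*small), g_squared(*large)

print(f"log odds ratio: {lor_small:.4f} vs {lor_large:.4f}")
print(f"G^2:            {g2_small:.4f} vs {g2_large:.4f}")
```

Because the two tables have identical proportions, any difference between their scores reflects sample size rather than strength of association.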

2013 ◽  
Vol 18 (1) ◽  
pp. 137-166 ◽  
Author(s):  
Stefan Th. Gries

This paper explores ways in which research into collocation should be improved. After a discussion of the parameters underlying the notion of collocation, the paper has three main parts. First, I argue that corpus linguistics would benefit from taking more seriously the understudied fact that collocations are not necessarily symmetric, contrary to what most association measures imply. I also introduce an association measure from the associative learning literature that can identify asymmetric collocations and show that it can also distinguish well between collocations with high and low association strengths. Second, I summarize some advantages of this measure and brainstorm about ways in which it can help re-examine previous studies as well as support further applications. Finally, I adopt a broader perspective and discuss a variety of ways in which all association measures – directional or not – in corpus linguistics should be improved in order for us to obtain better and more reliable results.
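The abstract does not name the directional measure. As an illustration only, the sketch below assumes it is ΔP, a measure from associative learning that is often used for this purpose. The counts are hypothetical and the function name is illustrative. The point is that one 2x2 table yields two different values depending on which word is treated as the cue.

```python
def delta_p(a, b, c, d):
    """Return both directional Delta-P values for a 2x2 co-occurrence table.

    a: word1 with word2      b: word1 without word2
    c: word2 without word1   d: neither word
    """
    # P(word2 | word1) - P(word2 | not word1): how well word1 predicts word2.
    w1_cues_w2 = a / (a + b) - c / (c + d)
    # P(word1 | word2) - P(word1 | not word2): how well word2 predicts word1.
    w2_cues_w1 = a / (a + c) - b / (b + d)
    return w1_cues_w2, w2_cues_w1

# Hypothetical counts: word1 is frequent and occurs in many contexts,
# while word2 is rare and occurs almost only next to word1.
forward, backward = delta_p(a=50, b=950, c=5, d=99000)

print(f"word1 -> word2: {forward:.4f}")
print(f"word2 -> word1: {backward:.4f}")
```

With these counts, word2 predicts word1 far better than the reverse, an asymmetry that a single bidirectional score for the pair would hide.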


Corpora ◽  
2017 ◽  
Vol 12 (3) ◽  
pp. 459-482 ◽  
Author(s):  
William Allen

Researchers using corpora can visualise their data and analyses using a growing number of tools. Visualisations are especially valuable in environments where researchers communicate and work with public-facing partners under the auspices of ‘knowledge exchange’ or ‘impact’, and corpus data are more available thanks to digital methods. However, although the field of corpus linguistics continues to generate its own range of techniques, it largely remains orientated towards finding ways for academics to communicate results directly with other academics rather than with or through groups outside universities. There is also a lack of discussion about how communication, motivations and values feature in the process of making corpus data visible. I argue that these sociocultural and practical factors influence visualisation outputs alongside technical aspects. I draw upon two corpus-based projects about press portrayal of migrants, conducted by an intermediary organisation that links university researchers with users outside academia. Analysing these projects' visualisation outputs in their organisational and communication contexts produces key lessons for researchers wanting to visualise text: consider the aims and values of partners; develop communication strategies that acknowledge different areas of expertise; and link visualisation choices with wider project objectives.


Author(s):  
Erla Hallsteinsdóttir

Multiword expressions – i.e. phraseological units – like idioms and collocations are among the most interesting parts of every language. In this article, I investigate phraseological units from a lexicographical point of view. I discuss the theoretical and methodological basis of phraseography as a discipline that includes aspects of lexicography, phraseology, corpus linguistics and theories of language learning. I demonstrate the importance of corpora as a source for the lexicographer and the use of corpus data. I also discuss the requirements for the lexicographical treatment of phraseological units in the compilation of a phraseological database for language learners, in relation to their assumed needs, which have already been described in detail.


2022 ◽  
Vol 16 (1) ◽  
Author(s):  
Baoying Yang ◽  
Wenbo Wu ◽  
Xiangrong Yin

2009 ◽  
Vol 52 (1) ◽  
pp. 125-138 ◽  
Author(s):  
Ramesh C. Gupta

2021 ◽  
Vol 8 (2) ◽  
pp. 79-91
Author(s):  
Zuraidah Mohd Don ◽  
Gerry Knowles

This paper is intended for researchers involved in or contemplating research in corpus linguistics, and is concerned in particular with the language of corpus linguistics. It introduces and explains technical terms in the context in which they are normally used. Technical terms lead on to the concepts to which they refer, and the concepts are related to the procedures, including tagging and parsing, by which they are implemented. English and Malay are used as the languages of illustration, and for the benefit of readers who do not know Malay, Malay examples are translated into English. The paper has a historical dimension, and the language of corpus linguistics is traced to traditional usage in the language classroom, and in particular to the study of Latin in Europe. The inheritance from the past is evident in the design of MaLex, which is a working device that does empirical Malay corpus linguistics, and is presented here as a contribution to the digital humanities.


Biometrics ◽  
1986 ◽  
Vol 42 (4) ◽  
pp. 949 ◽  
Author(s):  
N. E. Breslow ◽  
J. Cologne
