What do (most of) our dispersion measures measure (most)? Dispersion?

Journal of Second Language Studies ◽

10.1075/jsls.21029.gri ◽

2021 ◽

Author(s):

Stefan Th. Gries

Keyword(s):

Lexical Decision ◽

Corpus Linguistics ◽

Predictive Power ◽

Dispersion Measure ◽

Dispersion Measures ◽

Measures Of Dispersion ◽

Decision Times

Abstract: This paper discusses the degree to which most of the most widely used measures of dispersion in corpus linguistics are not particularly valid, in the sense that they measure not dispersion but an amalgam of a lot of frequency and a little dispersion. The paper demonstrates these issues on the basis of data from a variety of corpora. I then outline how to design a dispersion measure that measures only dispersion and show (i) that it indeed captures information that differs from frequency in an intuitive way and (ii) that it predicts lexical decision times from the MALD database better than nearly all other measures in nearly all corpora tested.

Dispersions and adjusted frequencies in corpora

International Journal of Corpus Linguistics ◽

10.1075/ijcl.13.4.02gri ◽

2008 ◽

Vol 13 (4) ◽

pp. 403-437 ◽

Cited By ~ 106

Author(s):

Stefan Th. Gries

Keyword(s):

Corpus Linguistics ◽

Linguistic Variable ◽

Alternative Measure ◽

Linguistic Variables ◽

Degree Of Dispersion ◽

Simple Alternative ◽

Dispersion Measures ◽

Measures Of Dispersion

The most frequent statistics in corpus linguistics are frequencies of occurrence and frequencies of co-occurrence of two or more linguistic variables. However, such frequencies in isolation may sometimes be misleading since they do not take into consideration the degree of dispersion of the relevant linguistic variable. Many dispersion measures and adjusted frequency measures have been suggested but are neither widely known nor applied. Another unfortunate aspect of such measures is that many also come with a variety of problems. I pursue three objectives with this article. First, I want to raise awareness of this issue and make the available measures more widely known, so I present an overview of many measures of dispersion and adjusted frequencies. Second, I propose a conceptually simple alternative measure, DP, explain and exemplify it, and compare it to previously discussed measures. Third and most importantly, I urge corpus linguists to explore the notion of dispersion in more detail and outline a few proposals as to which steps to take next.
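
DP (deviation of proportions) compares, for each corpus part, the share of the word's occurrences that falls into that part with the share of the corpus that the part makes up. The sketch below assumes the commonly cited formulation, half the sum of the absolute differences between the two shares; the function name `gries_dp` and the toy part sizes are illustrative only, not taken from the article.

```python
def gries_dp(word_counts, part_sizes):
    """Deviation of proportions (DP) for one word (sketch, hypothetical name).

    word_counts: occurrences of the word in each corpus part
    part_sizes:  size of each corpus part (e.g. in words)
    0 means the word is spread exactly in proportion to part sizes; higher
    values mean it is concentrated in fewer parts (the maximum is 1 minus
    the smallest part's share of the corpus).
    """
    if len(word_counts) != len(part_sizes):
        raise ValueError("need one count and one size per corpus part")
    total_count = sum(word_counts)
    total_size = sum(part_sizes)
    if total_count == 0 or total_size == 0:
        raise ValueError("word must occur and corpus must be non-empty")
    # Half the summed absolute differences between the observed share of
    # the word's occurrences and the expected share (the part's size).
    return 0.5 * sum(
        abs(count / total_count - size / total_size)
        for count, size in zip(word_counts, part_sizes)
    )


parts = [10_000, 10_000, 10_000, 10_000]   # hypothetical, equal-sized parts
print(gries_dp([5, 5, 5, 5], parts))       # → 0.0  (evenly spread)
print(gries_dp([20, 0, 0, 0], parts))      # → 0.75 (all in one part)
```

Both toy words have the same frequency (20) but very different DP values, which is the kind of information raw frequency alone hides.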

Homography and Polysemy as Factors in Bilingual Word Recognition

South African Journal of Psychology ◽

10.1177/008124639202200102 ◽

1992 ◽

Vol 22 (1) ◽

pp. 10-16 ◽

Cited By ~ 1

Author(s):

Denise Klein ◽

Estelle Ann Doctor

Keyword(s):

Word Recognition ◽

Lexical Decision ◽

Response Times ◽

Semantic Representation ◽

Bilingual Memory ◽

Lexical Decisions ◽

Interlingual Homographs ◽

Decision Times

This study reports an experiment that examines semantic representation in lexical decisions as a source of interconnection between words in bilingual memory. Lexical decision times were compared for interlingual polysemes such as HAND, which share spelling and meaning in both languages, and interlingual homographs such as KIND, which share spelling but not meaning. The main result was faster response times for polysemes than for interlingual homographs. Current theories of monolingual word recognition and bilingual semantic representation are discussed, and the findings are accommodated within the model of bilingual word recognition proposed by Doctor and Klein.

Predictors of second language English lexical recognition: Further insights from a large database of second language lexical decision times

10.31219/osf.io/cpdjs ◽

2019 ◽

Author(s):

Stephen Skalicky ◽

Scott Crossley ◽

Cynthia M. Berger

Keyword(s):

Second Language ◽

Lexical Decision ◽

The United States ◽

English Learning ◽

Large Database ◽

Linguistic Features ◽

Contextual Diversity ◽

Orthographic Similarity ◽

Statistical Measures ◽

Decision Times

In this study we analyze a large database of lexical decision times for English content words made by speakers of English as an additional language residing in the United States. Our first goal was to test whether the use of statistical measures better able to model variation associated with participants and items would replicate findings of a previous analysis of this data (Berger, Crossley, & Skalicky, 2019). Our second goal was to determine whether variables related to experiences using and learning English would interact with linguistic features of the target words. Results from our statistical analysis suggest affirmative answers to both of these questions. First, our results included significant effects for linguistic features related to contextual diversity and contextual distinctiveness, providing a replication of findings from the original study in that words appearing in more textual and lexical contexts were responded to more quickly. Second, a measure of length of English learning and a measure of daily English use interacted with a measure of orthographic similarity. Our study provides further evidence regarding how a large, crowdsourced database can be used to obtain a better understanding of second language lexical recognition behavior and provides suggestions for further research.
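
One common way to operationalize contextual diversity is to count the number of documents (texts, films, conversations) in which a word occurs at least once, regardless of how often it occurs in each. The sketch below assumes that document-count definition and naive whitespace tokenization; it is not the study's own pipeline, and its measure of lexical contexts is not reproduced here. The function name and the toy corpus are hypothetical.

```python
from collections import Counter


def contextual_diversity(documents):
    """Count, for each word, how many documents it occurs in at least once.

    Hypothetical helper; assumes lowercase whitespace tokenization.
    """
    cd = Counter()
    for doc in documents:
        # set() so that repeated tokens within one document count only once
        cd.update(set(doc.lower().split()))
    return cd


docs = ["the cat sat", "the dog barked", "a cat saw a cat"]  # toy corpus
cd = contextual_diversity(docs)
print(cd["cat"], cd["dog"])   # → 2 1
```

Here "cat" occurs three times in total but in only two documents, so its contextual diversity is 2.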

Statistical Measurements of Dispersion Measure Fluctuations of FRBs

The Astrophysical Journal ◽

10.3847/2041-8213/ac399c ◽

2021 ◽

Vol 922 (2) ◽

pp. L31

Author(s):

Siyao Xu ◽

David H. Weinberg ◽

Bing Zhang

Keyword(s):

Electron Density ◽

Large Scale ◽

Density Fluctuations ◽

Clear Correlation ◽

Dispersion Measure ◽

Large Dispersion ◽

Dispersion Measures ◽

The Difference ◽

Electron Density Fluctuations ◽

Scale Turbulence

Abstract: Extragalactic fast radio bursts (FRBs) have large dispersion measures (DMs) and are unique probes of intergalactic electron density fluctuations. By using the recently released First CHIME/FRB Catalog, we reexamined the structure function (SF) of DM fluctuations. It shows a large DM fluctuation similar to that previously reported in Xu & Zhang, but no clear correlation hinting toward large-scale turbulence is reproduced with this larger sample. To suppress the distortion effect from FRB distances and their host DMs, we focus on a subset of the CHIME catalog with DM < 500 pc cm⁻³. A trend of nonconstant SF and nonzero correlation function (CF) at angular separations θ less than 10° is seen, but with large statistical uncertainties. The difference found between SF and that derived from CF at θ ≲ 10° can be ascribed to the large statistical uncertainties or the density inhomogeneities on scales on the order of 100 Mpc. The possible correlation of electron density fluctuations and inhomogeneities of density distribution should be tested when several thousands of FRBs are available.
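
In its simplest form, the structure function of DM fluctuations at angular separation θ is the mean squared DM difference over all pairs of sources separated by roughly θ. The sketch below assumes that plain estimator with user-chosen separation bins; the paper's actual estimator, binning, and error treatment may differ, and the function names and sample sources are hypothetical.

```python
import math
from itertools import combinations


def angular_separation(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (inputs in degrees).

    Uses the spherical law of cosines, which is adequate for the
    degree-scale separations considered here.
    """
    r1, d1, r2, d2 = map(math.radians, (ra1, dec1, ra2, dec2))
    cos_t = (math.sin(d1) * math.sin(d2)
             + math.cos(d1) * math.cos(d2) * math.cos(r1 - r2))
    # Clamp to guard against rounding slightly outside [-1, 1]
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def dm_structure_function(sources, bin_edges):
    """Mean squared DM difference of source pairs per separation bin.

    sources: list of (ra_deg, dec_deg, dm) tuples (hypothetical format)
    bin_edges: increasing separations in degrees; bins are [lo, hi)
    Returns one value per bin, or None for bins with no pairs.
    """
    n_bins = len(bin_edges) - 1
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for (ra1, de1, dm1), (ra2, de2, dm2) in combinations(sources, 2):
        theta = angular_separation(ra1, de1, ra2, de2)
        for k in range(n_bins):
            if bin_edges[k] <= theta < bin_edges[k + 1]:
                sums[k] += (dm1 - dm2) ** 2
                counts[k] += 1
                break
    return [s / n if n else None for s, n in zip(sums, counts)]
```

A DM cut like the one in the abstract would be applied before the call, e.g. `dm_structure_function([s for s in sources if s[2] < 500], edges)`. The pairwise loop is O(N²), which is fine for catalogs of a few hundred to a few thousand bursts.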

What do (some of) our association measures measure (most)? Association?

Journal of Second Language Studies ◽

10.1075/jsls.21028.gri ◽

2021 ◽

Author(s):

Stefan Th. Gries

Keyword(s):

Corpus Linguistics ◽

Odds Ratio ◽

Association Measure ◽

Measures Of Association ◽

Association Measures ◽

Log Odds ◽

Dispersion Measures ◽

Corpus Data ◽

Behavior Supports ◽

True Association

Abstract: This paper discusses the degree to which some of the most widely used measures of association in corpus linguistics are not particularly valid, in the sense that they measure not association but an amalgam of a lot of frequency and a little association. The paper demonstrates these issues on the basis of hypothetical and actual corpus data and outlines implications of the findings. I then outline how to design an association measure that measures only association and show that its behavior supports the use of the log odds ratio as a true association-only measure, to be considered separately from frequency; in addition, this paper sets the stage for an analogous review of dispersion measures in corpus linguistics.
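
The log odds ratio of a 2×2 co-occurrence table reflects the proportions in the table, not their absolute size, which is why it can serve as an association-only measure alongside a separate frequency measure. The sketch below assumes the usual cell layout for a word W and a construction C; the optional 0.5 added to every cell (a Haldane-Anscombe-style correction for zero cells) is an assumption of this sketch, not something the abstract specifies.

```python
import math


def log_odds_ratio(a, b, c, d, correction=0.5):
    """Natural-log odds ratio of a 2x2 co-occurrence table (sketch).

    a: word W in construction C      b: W outside C
    c: other words in C              d: other words outside C
    correction: added to every cell so zero cells do not break the log
    (assumed Haldane-Anscombe-style adjustment; set to 0 to disable).
    """
    a, b, c, d = (x + correction for x in (a, b, c, d))
    return math.log((a * d) / (b * c))


# Same proportions at ten times the frequency, no correction applied:
print(log_odds_ratio(20, 10, 10, 20, correction=0))     # both equal math.log(4)
print(log_odds_ratio(200, 100, 100, 200, correction=0))
```

Multiplying every cell by 10 leaves the uncorrected value unchanged, whereas significance-based measures such as log-likelihood or a p-value would grow with the sample; frequency then has to be reported as a separate dimension.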

Pulsar Astrometry

Symposium - International Astronomical Union ◽

10.1017/s0074180900078451 ◽

1984 ◽

Vol 110 ◽

pp. 347-353

Author(s):

Joseph H. Taylor ◽

Carl R. Gwinn ◽

Joel M. Weisberg ◽

Lloyd A. Rawley

Keyword(s):

Electron Density ◽

Reference Frames ◽

Neutral Hydrogen ◽

Space Velocity ◽

Solar Neighborhood ◽

Dispersion Measure ◽

Direct Measurements ◽

Pulse Timing ◽

Local Space ◽

Dispersion Measures

High-precision measurements of the celestial coordinates of pulsars are desirable for a number of reasons. If carried out at several epochs, the measurements can yield angular proper motions; together with distance estimates based on dispersion measure, the proper motion of a pulsar reveals two of the three components of its space velocity, and consequently provides important kinematic information on pulsar ages (see, for example, Manchester, Taylor and Van 1974; Lyne, Anderson and Salter 1982; and references therein). Direct measurements of annual parallaxes are also possible in principle, and are marginally feasible with present techniques for a few of the closest pulsars. Model-independent distances obtained from parallax measurements, together with observed pulsar dispersion measures, yield the electron density along the line of sight to the pulsar. Knowledge of the interstellar electron density in the solar neighborhood provides a calibration of the dispersion-based distance scale that is complementary to the calibration derived from neutral hydrogen absorption measurements of more distant pulsars (Weisberg et al. 1980), and permits appropriate statistical analyses to be made of the local space density of pulsars and their birthrate (e.g. Taylor and Manchester 1977). Finally, pulsar astrometry can be expected to yield important information on the relative orientations of fundamental reference frames. In particular, pulse timing observations yield positions in a reference frame based on motions of the planets, while interferometric position measurements are based on an Earth-equatorial system. At present the relative orientation of these two coordinate systems is known to only accuracy, though the potential precision of both types of measurements is much higher.
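
Because the dispersion measure is the electron density integrated along the line of sight (DM = ∫ n_e dl), dividing an observed DM by an independent parallax distance gives the mean electron density toward the pulsar. The sketch below assumes d [pc] = 1000 / parallax [mas] and ignores measurement uncertainties and parallax bias corrections; the function names and the example pulsar are hypothetical.

```python
def parallax_distance_pc(parallax_mas):
    """Distance in parsecs from an annual parallax in milliarcseconds (d = 1/p)."""
    if parallax_mas <= 0:
        raise ValueError("parallax must be positive")
    return 1000.0 / parallax_mas


def mean_electron_density(dm_pc_cm3, parallax_mas):
    """Line-of-sight mean electron density <n_e> = DM / d, in cm^-3."""
    return dm_pc_cm3 / parallax_distance_pc(parallax_mas)


# Hypothetical pulsar: DM = 15 pc cm^-3, parallax = 2 mas (d = 500 pc)
print(mean_electron_density(15.0, 2.0))  # about 0.03 cm^-3
```

Inverting the same relation with an assumed n_e, d = DM / <n_e>, gives the dispersion-based distance scale that such parallax measurements help calibrate.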

Well‐being and the accessibility of pleasant and unpleasant concepts

European Journal of Personality ◽

10.1002/per.613 ◽

2007 ◽

Vol 21 (2) ◽

pp. 169-189 ◽

Cited By ~ 15

Author(s):

Peter Borkenau ◽

Nadine Mauer

Keyword(s):

Individual Differences ◽

Lexical Decision ◽

Lexical Decision Task ◽

Reaction Times ◽

Well Being ◽

Decision Task ◽

Related Concept ◽

Trait Affect ◽

Negative Trait ◽

Decision Times

The trait–congruency hypothesis predicts that persons high in positive or negative trait affect more readily process pleasant or unpleasant stimuli, respectively. In two studies, participants were administered measures of personality and affect. Moreover, a yes/no lexical decision task with pleasant, unpleasant and neutral words was administered in Study 1, whereas a go/no‐go task was used in Study 2. Several methods to increase reliabilities of differences in reaction times are explored. Correlations of measures of personality and trait affect with decision times were mostly consistent with the trait–congruency hypothesis, particularly for decision times in the go/no‐go task that measured individual differences in valence‐specific decision times more reliably. The findings suggest that trait‐related concept accessibility is one source of trait congruity. Copyright © 2006 John Wiley & Sons, Ltd.

The use of film subtitles to estimate word frequencies

Applied Psycholinguistics ◽

10.1017/s014271640707035x ◽

2007 ◽

Vol 28 (4) ◽

pp. 661-677 ◽

Cited By ~ 122

Author(s):

BORIS NEW ◽

MARC BRYSBAERT ◽

JEAN VERONIS ◽

CHRISTOPHE PALLIER

Keyword(s):

Lexical Decision ◽

Word Frequency ◽

The Internet ◽

Human Interactions ◽

Explained Variance ◽

French Words ◽

Word Frequencies ◽

Decision Times ◽

Text Writing

We examine the use of film subtitles as an approximation of word frequencies in human interactions. Because subtitle files are widely available on the Internet, they may present a fast and easy way to obtain word frequency measures in language registers other than text writing. We compiled a corpus of 52 million French words, coming from a variety of films. Frequency measures based on this corpus compared well to other spoken and written frequency measures, and explained variance in lexical decision times in addition to what is accounted for by the available French written frequency measures.
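
To compare counts from corpora of different sizes (such as a 52-million-word subtitle corpus and a written corpus), raw counts are usually normalized to occurrences per million words, and a log transform is commonly applied before using them as predictors of lexical decision times. The sketch below assumes a log10(per-million + 1) transform as one common choice; it is not necessarily the transform used in the paper, and the word count of 520 is hypothetical.

```python
import math


def per_million(count, corpus_size):
    """Normalize a raw count to occurrences per million words."""
    return count / corpus_size * 1_000_000


def log_frequency(count, corpus_size):
    """log10(per-million frequency + 1); the +1 keeps unseen words at 0.

    Assumed transform for illustration, not taken from the paper.
    """
    return math.log10(per_million(count, corpus_size) + 1)


# Hypothetical word seen 520 times in a 52-million-word subtitle corpus
print(per_million(520, 52_000_000))  # about 10 per million
```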