What do (most of) our dispersion measures measure (most)? Dispersion?

Author(s):  
Stefan Th. Gries

Abstract This paper discusses the degree to which most of the most widely-used measures of dispersion in corpus linguistics are not particularly valid in the sense of actually measuring dispersion rather than some amalgam of a lot of frequency and a little dispersion. The paper demonstrates these issues on the basis of data from a variety of corpora. I then outline how to design a dispersion measure that only measures dispersion and show that (i) it indeed measures information that is different from frequency in an intuitive way and (ii) it predicts lexical decision times from the MALD database better than nearly all other measures in nearly all corpora tested.

2008
Vol 13 (4)
pp. 403-437
Author(s):  
Stefan Th. Gries

The most frequent statistics in corpus linguistics are frequencies of occurrence and frequencies of co-occurrence of two or more linguistic variables. However, such frequencies in isolation may sometimes be misleading since they do not take into consideration the degree of dispersion of the relevant linguistic variable. Many dispersion measures and adjusted frequency measures have been suggested but are neither widely known nor applied. Another unfortunate aspect of such measures is that many also come with a variety of problems. I pursue three objectives with this article. First, I want to raise awareness of this issue and make the available measures more widely known, so I present an overview of many measures of dispersion and adjusted frequencies. Second, I propose a conceptually simple alternative measure, DP, explain and exemplify it, and compare it to previously discussed measures. Third and most importantly, I urge corpus linguists to explore the notion of dispersion in more detail and outline a few proposals for which steps to take next.
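
The abstract above names DP without defining it. As a minimal sketch, assuming the commonly cited definition (half the sum of absolute differences between the share of a word's occurrences found in each corpus part and that part's share of the whole corpus), it could be computed as follows. The function name `dp`, its arguments, and the toy numbers are illustrative assumptions, not taken from the article.

```python
def dp(counts, part_sizes):
    """Sketch of DP (deviation of proportions) under the assumed definition.

    counts     -- occurrences of the word in each corpus part
    part_sizes -- size (in tokens) of each corpus part, same order
    Returns a value near 0 for an even distribution and
    approaching 1 for a maximally clumped one.
    """
    if len(counts) != len(part_sizes):
        raise ValueError("counts and part_sizes must have the same length")
    total_count = sum(counts)
    total_size = sum(part_sizes)
    if total_count == 0 or total_size == 0:
        raise ValueError("word frequency and corpus size must be non-zero")
    # Observed share of the word per part vs. expected share given part size.
    return 0.5 * sum(
        abs(c / total_count - s / total_size)
        for c, s in zip(counts, part_sizes)
    )


# Hypothetical three-part corpus with equally sized parts.
even = dp([10, 10, 10], [1000, 1000, 1000])    # word spread evenly
clumped = dp([30, 0, 0], [1000, 1000, 1000])   # same frequency, one part only
```

Both toy words have the same overall frequency (30), yet the evenly spread one scores lower than the clumped one, which is the kind of information that raw frequency alone does not capture.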


1992
Vol 22 (1)
pp. 10-16
Author(s):
Denise Klein
Estelle Ann Doctor

This study reports an experiment which examines semantic representation in lexical decisions as a source of interconnection between words in bilingual memory. Lexical decision times were compared for interlingual polysemes such as HAND which share spelling and meaning in both languages, and interlingual homographs such as KIND which share spelling but not meaning. The main result was faster response times for polysemes than for interlingual homographs. Current theories of monolingual word recognition and bilingual semantic representation are discussed, and the findings are accommodated within the model of bilingual word recognition proposed by Doctor and Klein.


2019
Author(s):
Stephen Skalicky
Scott Crossley
Cynthia M. Berger

In this study we analyze a large database of lexical decision times for English content words made by speakers of English as an additional language residing in the United States. Our first goal was to test whether the use of statistical measures better able to model variation associated with participants and items would replicate findings of a previous analysis of this data (Berger, Crossley, & Skalicky, 2019). Our second goal was to determine whether variables related to experiences using and learning English would interact with linguistic features of the target words. Results from our statistical analysis suggest affirmative answers to both of these questions. First, our results included significant effects for linguistic features related to contextual diversity and contextual distinctiveness, providing a replication of findings from the original study in that words appearing in more textual and lexical contexts were responded to more quickly. Second, a measure of length of English learning and a measure of daily English use interacted with a measure of orthographic similarity. Our study provides further evidence regarding how a large, crowdsourced database can be used to obtain a better understanding of second language lexical recognition behavior and provides suggestions for further research.


2021
Vol 922 (2)
pp. L31
Author(s):
Siyao Xu
David H. Weinberg
Bing Zhang

Abstract Extragalactic fast radio bursts (FRBs) have large dispersion measures (DMs) and are unique probes of intergalactic electron density fluctuations. By using the recently released First CHIME/FRB Catalog, we reexamined the structure function (SF) of DM fluctuations. It shows a large DM fluctuation similar to that previously reported in Xu & Zhang, but no clear correlation hinting toward large-scale turbulence is reproduced with this larger sample. To suppress the distortion effect from FRB distances and their host DMs, we focus on a subset of the CHIME catalog with DM < 500 pc cm⁻³. A trend of nonconstant SF and nonzero correlation function (CF) at angular separations θ less than 10° is seen, but with large statistical uncertainties. The difference found between SF and that derived from CF at θ ≲ 10° can be ascribed to the large statistical uncertainties or the density inhomogeneities on scales on the order of 100 Mpc. The possible correlation of electron density fluctuations and inhomogeneities of density distribution should be tested when several thousands of FRBs are available.


Author(s):  
Stefan Th. Gries

Abstract This paper discusses the degree to which some of the most widely-used measures of association in corpus linguistics are not particularly valid in the sense of actually measuring association rather than some amalgam of a lot of frequency and a little association. The paper demonstrates these issues on the basis of hypothetical and actual corpus data and outlines implications of the findings. I then outline how to design an association measure that only measures association and show that its behavior supports the use of the log odds ratio as a true association-only measure but separately from frequency; in addition, this paper sets the stage for an analogous review of dispersion measures in corpus linguistics.


1984
Vol 110
pp. 347-353
Author(s):
Joseph H. Taylor
Carl R. Gwinn
Joel M. Weisberg
Lloyd A. Rawley

High precision measurements of the celestial coordinates of pulsars are desirable for a number of reasons. If carried out at several epochs, the measurements can yield angular proper motions; together with distance estimates based on dispersion measure, the proper motion of a pulsar reveals two of three components of its space velocity, and consequently provides important kinematic information on pulsar ages (see, for example, Manchester, Taylor and Van 1974; Lyne, Anderson and Salter 1982; and references therein). Direct measurements of annual parallaxes are also possible in principle, and are marginally feasible with present techniques for a few of the closest pulsars. Model independent distances obtained from parallax measurements, together with observed pulsar dispersion measures, yield the electron density along the line of sight to the pulsar. Knowledge of the interstellar electron density in the solar neighborhood provides a calibration of the dispersion-based distance scale that is complementary to the calibration derived from neutral hydrogen absorption measurements of more distant pulsars (Weisberg et al. 1980), and permits appropriate statistical analyses to be made of the local space density of pulsars and their birthrate (e.g. Taylor and Manchester 1977). Finally, pulsar astrometry can be expected to yield important information on the relative orientations of fundamental reference frames. In particular, pulse timing observations yield positions in a reference frame based on motions of the planets, while interferometric position measurements are based on an Earth-equatorial system. At present the relative orientation of these two coordinate systems is known to only accuracy, though the potential precision of both types of measurements is much higher.


2007
Vol 21 (2)
pp. 169-189
Author(s):
Peter Borkenau
Nadine Mauer

The trait–congruency hypothesis predicts that persons high in positive or negative trait affect more readily process pleasant or unpleasant stimuli, respectively. In two studies, participants were administered measures of personality and affect. Moreover, a yes/no lexical decision task with pleasant, unpleasant and neutral words was administered in Study 1, whereas a go/no‐go task was used in Study 2. Several methods to increase reliabilities of differences in reaction times are explored. Correlations of measures of personality and trait affect with decision times were mostly consistent with the trait–congruency hypothesis, particularly for decision times in the go/no‐go task that measured individual differences in valence‐specific decision times more reliably. The findings suggest that trait‐related concept accessibility is one source of trait congruity. Copyright © 2006 John Wiley & Sons, Ltd.


2007
Vol 28 (4)
pp. 661-677
Author(s):
BORIS NEW
MARC BRYSBAERT
JEAN VERONIS
CHRISTOPHE PALLIER

We examine the use of film subtitles as an approximation of word frequencies in human interactions. Because subtitle files are widely available on the Internet, they may present a fast and easy way to obtain word frequency measures in language registers other than text writing. We compiled a corpus of 52 million French words, coming from a variety of films. Frequency measures based on this corpus compared well to other spoken and written frequency measures, and explained variance in lexical decision times in addition to what is accounted for by the available French written frequency measures.

