Accessing and Comparing Word Frequency Data

Author(s):  
Matthew L. Jockers ◽  
Rosamond Thalken
2003 ◽  
Vol 26 (4) ◽  
pp. 479-479 ◽  
Author(s):  
Marc Brysbaert ◽  
Denis Drieghe

Reichle et al. claim to successfully simulate a frequency effect of 60% on skipping rate in human data, whereas the original article reports an effect of only 4%. We suspect that the deviation is attributable to the length of the words in the different conditions, which implies that E-Z Reader is wrong in its conception of eye guidance between words.


Author(s):  
David Allen

Abstract Research has demonstrated that cognates are processed and acquired more readily than noncognates regardless of whether the languages share a common script or etymological background (e.g., Japanese and English). Very little research, however, has focused on the prevalence and frequency of cognates in orthographically distinct languages. Using Japanese word frequency data, the present study demonstrates that between 49% and 22% of the most common 10,000 words in English are cognate in Japanese, depending on the frequency threshold used. The analysis is extended to the Academic Word List (Coxhead 2000), which is shown to be between 59% and 30% cognate. Finally, a lexical familiarity study revealed that Japanese cognate frequency was a reliable indicator of whether the word was known to the majority of Japanese speakers. Based on the findings and drawing upon research in psycholinguistics, a number of recommendations are put forward for future studies in applied linguistics.


2014 ◽  
Vol 9 (1) ◽  
pp. 131-140
Author(s):  
Quratulain H. Khan ◽  
Lori Buchanan

Performance on word processing tasks is known to be influenced by the frequency with which words occur in a language. Large and robust effects of word frequency occur across languages, and the processes thought to be sensitive to word frequency are considered fundamentally important characteristics of the mental lexicon. To our knowledge, word frequency data is non-existent for Urdu. This important language has characteristics that make it appealing to psycholinguists. Unfortunately, most of the Urdu published electronically is in the form of image files rather than text and has therefore been largely inaccessible to programs designed to generate word counts. Consequently, unlike other important orthographies (e.g., English), orthographic word frequencies in Urdu are not readily available. We describe here a database that addresses this methodological gap. We have constructed a word frequency database for written Urdu and describe that development. We also describe data from simple tests of the effects of Urdu word frequency to demonstrate that our measure results in effects considered to be the hallmark of frequency effects. The frequency counts from this database will help psycholinguists and cognitive psychologists conduct and control future studies on the mental lexicon using Urdu. This database can be downloaded from http://web2.uwindsor.ca/psychology/urdufrequency/


2021 ◽  
Author(s):  
Mahesh Srinivasan ◽  
Bodo Winter

Metaphors and other tropes are commonly thought to reflect asymmetries in concreteness, with concrete sources being used to talk about relatively more abstract targets synchronically. Similarly, originating senses in diachronic semantic change have often been argued to be more concrete than extended senses. In this paper, we use a dataset of cross-linguistically attested semantic changes to empirically test the idea that asymmetries in figurative language are predicted by asymmetries in concreteness. We find only weak evidence for the role of concreteness and argue that concreteness is not a helpful notion when it comes to describing changes where both originating and extended senses are highly concrete (e.g., skin > bark, liver > lungs). Moreover, we find that word frequency data from English and other languages is a stronger predictor of these typologically common semantic changes. We discuss the implications of our findings for metaphor theory and theories of semantic change.


2017 ◽  
Vol 12 (2) ◽  
pp. 234-262 ◽  
Author(s):  
C. Sophia Rammell ◽  
Diana Van Lancker Sidtis ◽  
David B. Pisoni

Abstract Background: Formulaic expressions, including idioms and other fixed expressions, comprise a significant proportion of discourse. Although much has been written about this topic, controversy remains about their psychological status. An important claim about formulaic expressions, that they are known to native speakers, has seldom been directly demonstrated. This study tested the hypothesis that formulaic expressions are known and stored as whole-unit mental representations by performing three perceptual experiments. Method: Listeners transcribed two kinds of spectrally degraded spoken sentences, half formulaic and half novel (newly created) expressions, matched for grammar and length. Two familiarity ratings, usage and exposure, were obtained from listeners for each expression. Text frequency data for the stimuli and their constituent words were obtained using a spoken corpus. Results: Participants transcribed formulaic utterances more successfully than literal utterances. Usage and familiarity ratings correlated with accuracy, but formulaic utterances with low ratings were also transcribed correctly. Phrase types differed significantly in text frequency, but word frequency counts did not differentiate the two kinds of expressions. Discussion: These studies provide new converging evidence that formulaic expressions are encoded and processed as whole units, supporting a dual-process model of language processing, which assumes that grammatical and formulaic expressions are differentially processed.

