Evaluating lists of high-frequency words

2016 ◽  
Vol 167 (2) ◽  
pp. 132-158 ◽  
Author(s):  
Thi Ngoc Yen Dang ◽  
Stuart Webb

This study compared the lexical coverage provided by four wordlists [West’s (1953) General Service List (GSL), Nation’s (2006) most frequent 2,000 British National Corpus word families (BNC2000), Nation’s (2012) most frequent 2,000 British National Corpus and Corpus of Contemporary American English word families (BNC/COCA2000), and Brezina and Gablasova’s (2015) New-GSL list] in 18 corpora. The comparison revealed that the headwords in the BNC/COCA2000 tended to provide the greatest average coverage. However, when the coverage of the most frequent 1,000, 1,500, and 1,996 headwords in the lists was compared, the New-GSL provided the highest coverage. The GSL performed worst by both criteria. Pedagogical and methodological implications related to second language (L2) vocabulary learning and teaching are discussed in detail.
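
As a rough illustration of the coverage measure used in such comparisons, the sketch below computes the percentage of corpus tokens that belong to a word-family list. It is a minimal sketch, not the authors' scripts; the file names and the one-family-per-line format are assumptions.

```python
# Minimal sketch: lexical coverage = share of corpus tokens that belong to a
# word-family list. File names and list format are hypothetical placeholders.
import re
from pathlib import Path

def load_families(path):
    """Each line: a headword followed by its family members, whitespace-separated."""
    token_to_headword = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        members = line.strip().lower().split()
        if members:
            for form in members:
                token_to_headword[form] = members[0]
    return token_to_headword

def coverage(corpus_path, token_to_headword):
    text = Path(corpus_path).read_text(encoding="utf-8").lower()
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text)
    covered = sum(1 for t in tokens if t in token_to_headword)
    return 100.0 * covered / len(tokens) if tokens else 0.0

families = load_families("bnc_coca_2000_families.txt")  # hypothetical file
print(f"Coverage: {coverage('spoken_corpus.txt', families):.2f}%")
```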

2020 ◽  
Vol 9 (2) ◽  
pp. 1-8
Author(s):  
G. Denison ◽  
I. Custance

In this article, we describe the pedagogical basis for class vocabulary lists (CVLs) and their implementation using Google Sheets. CVLs allow students to collaborate and build “notebooks” of vocabulary that they feel is important to learn. CVL choices of students (N = 53) in three classes of mixed non-English majors and one informatics class were compared against frequency-based lists (British National Corpus/Corpus of Contemporary American English Word Family Lists [BNC/COCA], New General Service List [NGSL], Test of English for International Communication [TOEIC] Service List [TSL]) using the Compleat Web Vocabulary Profiler (Web VP) to determine the usefulness of the selected vocabulary. An information technology keywords list, constructed using AntConc and AntCorGen, was compared against the informatics group’s CVL to determine whether those students were choosing field-appropriate vocabulary. Results suggest that when given autonomy to choose vocabulary, students generally select useful and relevant words for their contexts (e.g., simulation, virtual, privacy, artificial, denuclearization, aftershock, heatstroke) and that CVLs supplement frequency-based lists in beneficial ways.
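
The check performed by a vocabulary profiler can be approximated as below. This is a hedged sketch, not the Compleat Web VP tool itself; the list file names are placeholders, and real reference lists group words into families rather than flat sets.

```python
# Hedged sketch: report which reference list(s) each class-chosen word
# appears in, or mark it as off-list. File names are hypothetical.
from pathlib import Path

def load_list(path):
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return {line.strip().lower() for line in lines if line.strip()}

reference_lists = {
    "NGSL": load_list("ngsl.txt"),
    "BNC/COCA": load_list("bnc_coca.txt"),
    "TSL": load_list("tsl.txt"),
}

cvl_words = ["simulation", "virtual", "privacy", "artificial", "heatstroke"]
for word in cvl_words:
    hits = [name for name, words in reference_lists.items() if word in words]
    print(word, "->", ", ".join(hits) if hits else "off-list")
```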


2020 ◽  
pp. 136216882091118
Author(s):  
Thi Ngoc Yen Dang ◽  
Stuart Webb ◽  
Averil Coxhead

With a number of word lists available to choose from, teachers and students need to know which list provides the best return for learning. Four well-established lists were compared, and it was found that the BNC/COCA2000 (British National Corpus / Corpus of Contemporary American English 2000) and the New General Service List (New-GSL) provided the greatest lexical coverage in spoken and written corpora. The present study further compared these two lists using teacher perceptions of word usefulness and learner vocabulary knowledge as the criteria. First, 78 experienced teachers of English as a second language / English as a foreign language (ESL/EFL) rated the usefulness of the 973 non-overlapping items between the two lists for their learners. Second, 135 Vietnamese EFL learners completed 15 yes/no tests which measured their knowledge of the same 973 words. Teachers perceived the BNC/COCA2000 as having more useful words, and items in this list were also better known by the learners. This suggests that the BNC/COCA2000 is the more useful high-frequency wordlist for second language (L2) learners.
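
For illustration, the sketch below aggregates yes/no test responses to compare how well the items unique to each list are known. The study's exact scoring procedure is not given in this abstract, so the data structures and example figures are hypothetical.

```python
# Illustrative sketch only: proportion of 'yes' responses per source list.
from collections import defaultdict

# responses[learner][word] = True if the learner reported knowing the word
responses = {
    "learner_01": {"purchase": True, "whilst": True, "gosh": False},
    "learner_02": {"purchase": True, "whilst": False, "gosh": False},
}
# which list each non-overlapping item comes from (hypothetical mapping)
word_to_list = {"purchase": "BNC/COCA2000", "whilst": "New-GSL", "gosh": "New-GSL"}

known = defaultdict(list)
for answers in responses.values():
    for word, is_known in answers.items():
        known[word_to_list[word]].append(is_known)

for list_name, flags in sorted(known.items()):
    print(f"{list_name}: {100 * sum(flags) / len(flags):.1f}% of responses 'known'")
```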


2019 ◽  
pp. 273-291
Author(s):  
Tatiana N. Chugaeva ◽  
Olga V. Baiburova ◽  
Anton A. Vakhotin ◽  
Svetlana Y. Dmitrieva ◽  
...  

Corpus research presents obvious benefits, though linguists approach the material in various ways. Corpus linguists, for example, approach data in an exploratory way, whereas psycholinguists more often tend to combine corpus data with experimental research. The current work uses a theoretical systemic approach to describe the two frequency strata of three corpora (the Russian National Corpus, the British National Corpus and the Open American National Corpus) and to build a classification of phonetic word types in Russian and English (British and American). The aim of the research is to draw up a phonetic (perceptive) classification of the corresponding languages and to describe the specific character of their sound systems on the basis of these types. The frequency strata of the three corpora, including the high-frequency stratum, were analyzed to identify words characterized by the following linguistic features: length in syllables, stressed vowel, rhythmic structure, etc. The comparison of the data revealed more differences than similarities among the three corpora...
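
A minimal sketch of how such phonetic word types can be derived is given below; it assumes ARPAbet-style transcriptions with CMU-dictionary stress digits, which is an illustrative choice rather than the authors' method.

```python
# Sketch: rhythmic structure = (syllable count, position of primary stress),
# read off an ARPAbet transcription where vowels carry stress digits.
def rhythmic_structure(phones):
    """phones: e.g. ['IH0', 'G', 'Z', 'AE1', 'M', 'P', 'AH0', 'L'] for 'example'."""
    vowels = [p for p in phones if p[-1].isdigit()]
    n_syllables = len(vowels)
    stressed_position = next((i + 1 for i, v in enumerate(vowels) if v.endswith("1")), None)
    return n_syllables, stressed_position

print(rhythmic_structure(["IH0", "G", "Z", "AE1", "M", "P", "AH0", "L"]))  # (3, 2)
```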


2021 ◽  
Vol 3 (1) ◽  
pp. 9-21
Author(s):  
Namkil Kang

The ultimate goal of this paper is to provide a comparative analysis of rely on and depend on in the Corpus of Contemporary American English (COCA) and the British National Corpus (BNC). The COCA clearly shows that the expression rely on government is the most preferred by Americans, followed by rely on people and rely on data. The COCA further indicates that the expression depend on slate is the most preferred by Americans, followed by depend on government and depend on people. The BNC shows, on the other hand, that the expression rely on others is the most preferred by the British, followed by rely on people and rely on friends. The BNC further indicates that depend on factors and depend on others are the most preferred by the British, followed by depend on age and depend on food. Finally, in the COCA, the nouns government, luck, welfare, people, information, state, fossil, water, family, oil, food, and things are linked to both rely on and depend on, but many nouns are still not linked to both of them. In the BNC, on the other hand, only the nouns state, chance, government, and others are linked to both rely on and depend on, but many nouns are still not linked to both verbs. It can thus be inferred that rely on differs slightly from depend on in its use.
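
A rough approximation of this kind of collocate count is sketched below; it is not the authors' corpus queries, and the corpus file name is a placeholder.

```python
# Hedged sketch: count the word immediately following "rely on" / "depend on"
# in a plain-text sample, skipping a few common determiners.
import re
from collections import Counter
from pathlib import Path

text = Path("corpus_sample.txt").read_text(encoding="utf-8").lower()  # placeholder file
collocates = {}
for verb in ("rely on", "depend on"):
    pattern = rf"\b{verb}\s+(?:the\s+|a\s+|an\s+|their\s+|its\s+)?([a-z]+)"
    collocates[verb] = Counter(re.findall(pattern, text))

for verb, counter in collocates.items():
    print(verb, counter.most_common(5))
```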


2021 ◽  
Author(s):  
Zheng Wei

The research first proposes a vocabulary learning technique, the word part technique, and then tests its effectiveness in aiding vocabulary learning and retention. The first part of the thesis centers around the idea that the knowledge of the first 2000 words language learners already possess may give them easier access to words at other frequency levels, because the root parts of low-frequency new words share form and meaning similarities with the high-frequency known words. The research addresses the issue in two stages: quantifying the number of words that can be accessed through analysis of their word roots, and analyzing the pedagogical usefulness of the accessible words. A Comprehensive Etymological Dictionary of the English Language (Klein, 1966) was used as the source to show the possible formal and meaning connections among words. All the words in the first 2000 word list were first looked up individually, and all the cognates provided under each of these words were collected and placed under each of the high-frequency words if they met the requirement that their roots share more than one letter and/or more than one phoneme with the roots of the first 2000 known words. After the data was roughly gathered, three criteria were applied to filter it: the frequency criterion, the meaning criterion and the form criterion. In applying the frequency criterion, words at frequency levels beyond the tenth 1000 were removed from the data. In applying the meaning criterion, hints were given to show the semantic relations between the higher-frequency words and the first 2000 words. The hints were then rated on a scale measuring meaning transparency. Words rated at level 5 on the scale were considered inaccessible; words rated at levels 1, 2a, 2b, 2c, and 3a were considered easy to access. In applying the form criterion, calculations were done for each semantically accessible word to show its phonological and orthographic similarity to the known word. Words whose phonological or orthographic similarity scores were larger than 0.5 were considered phonologically or orthographically easy to access. Finally, the "find" function of Microsoft Word was used to check the data by picking up any words that might have been missed in the first round of data gathering. These procedures resulted in 2156 word families that can be accessed through the meaning and form relations of their root parts with the first 2000 words. Among the 2156 word families, 739 can be accessed easily and are therefore more pedagogically useful, while 259 can be accessed only with difficulty. Twenty-one pedagogically useful form constants were selected because they give access to more unknown lower-frequency words than other form constants.

In the second part of the thesis, an experiment was conducted to test the effectiveness of the word part technique in comparison with the keyword technique and self-strategy learning. The results show that, with the experienced Chinese EFL learners, the keyword technique is slightly inferior to the word part technique and to self-strategy learning.
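
The 0.5 similarity threshold could be operationalized in several ways; the sketch below uses a normalized character-overlap ratio over root spellings as one plausible instantiation. The thesis's actual formula is not given in this abstract, so the measure and the example root pair are assumptions for illustration only.

```python
# Hedged sketch: one plausible orthographic similarity measure with the
# 0.5 accessibility cut-off described above. Not the thesis's actual formula.
from difflib import SequenceMatcher

def orthographic_similarity(root_a, root_b):
    """Return a ratio in [0, 1]; 1.0 means identical spellings."""
    return SequenceMatcher(None, root_a.lower(), root_b.lower()).ratio()

known_root, candidate_root = "ceive", "cept"   # hypothetical root pair
score = orthographic_similarity(known_root, candidate_root)
print(f"{score:.2f}", "easy to access" if score > 0.5 else "hard to access")
```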


2020 ◽  
pp. 323-330
Author(s):  
A.S. Dautova

The article presents the experience of studying the semantic structure of English verbs with the meaning of leaving. The author focuses on the problem of modulation of the meaning of the English verbs “leave” and “depart” and their transition into another lexical-semantic group. The relevance of the study lies in its addressing the category of space as one of the basic linguistic forms of conceptualization and interpretation of extra-linguistic reality, which humans operate with in the process of cognition and interpretation of the surrounding world. The research problem is addressed by describing the modulation of meaning in terms of the concept of a space of sets, as one of the factors contributing to change of meaning. The research hypothesis is verified through the analysis of lexicographical sources and data from the British National Corpus and the Corpus of Contemporary American English.


2019 ◽  
Vol 28 (3) ◽  
pp. 203-220 ◽  
Author(s):  
Roi Tartakovsky ◽  
Yeshayahu Shen

A novel distinction is proposed between two types of closed similes: the standard and the non-standard. While the standard simile presents a ground that is a salient feature of the source term (e.g. meek as a lamb), the non-standard simile somewhat enigmatically supplies a non-salient ground (e.g. meek as milk). The latter thus violates a deep-seated norm of similes and presents interpreters with unexpected difficulty, whereby the concept set up to be an exemplar of a quality is actually less than ideal to fulfil this role. The main question addressed here is how these two simile types are relatively distributed across poetic and non-poetic corpora. We elaborate the criteria for what constitutes the non-standard simile, including separating it out from adjacent phenomena like the ironic simile (e.g. brave as a mouse), and go on to explain our operational criteria for salience. Then, we report culling 329 closed similes from an anthology of poetry and 350 closed similes from two corpora of non-poetic discourse, the Corpus of Historical American English and the British National Corpus. An independent judge rated the salience of each ground-and-source pair of each of the similes, presented in randomized order. Results show that while the standard simile is found in both types of discourse, the non-standard kind is only marginally present in the non-poetic corpora but makes up over 40% of the similes in the poetic corpus. We conclude by discussing the implications of these results for theories of poetic language and literariness.
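
As an illustration of how such a distribution might be tallied once salience ratings are in hand, see the sketch below; the rating scale, cut-off, and records are assumptions, not the authors' instrument or data.

```python
# Illustrative sketch only: share of non-standard similes per corpus, given
# judged salience ratings for each ground-source pair (hypothetical data).
from collections import defaultdict

# (corpus, ground, source, salience_rating on an assumed 1-7 scale)
rated_similes = [
    ("poetry", "meek", "milk", 2),
    ("poetry", "meek", "lamb", 7),
    ("COHA/BNC", "brave", "lion", 6),
]
SALIENCE_CUTOFF = 4  # ratings below this treat the ground as non-salient

tallies = defaultdict(lambda: [0, 0])  # corpus -> [non_standard, total]
for corpus, _ground, _source, rating in rated_similes:
    tallies[corpus][0] += rating < SALIENCE_CUTOFF
    tallies[corpus][1] += 1

for corpus, (non_standard, total) in tallies.items():
    print(f"{corpus}: {100 * non_standard / total:.0f}% non-standard similes")
```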


2012 ◽  
Vol 47 (4) ◽  
pp. 484-503 ◽  
Author(s):  
Norbert Schmitt ◽  
Diane Schmitt

The high-frequency vocabulary of English has traditionally been thought to consist of the 2,000 most frequent word families, and low-frequency vocabulary as that beyond the 10,000 frequency level. This paper argues that these boundaries should be reassessed on pedagogic grounds. Based on a number of perspectives (including frequency and acquisition studies, the amount of vocabulary necessary for English usage, the range of graded readers, and dictionary defining vocabulary), we argue that high-frequency English vocabulary should include the most frequent 3,000 word families. We also propose that the low-frequency vocabulary boundary should be lowered to the 9,000 level, on the basis that 8–9,000 word families are sufficient to provide the lexical resources necessary to be able to read a wide range of authentic texts (Nation 2006). We label the vocabulary between high-frequency (3,000) and low-frequency (9,000+) as mid-frequency vocabulary. We illustrate the necessity of mid-frequency vocabulary for proficient language use, and make some initial suggestions for research addressing the pedagogical challenge raised by mid-frequency vocabulary.
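
The proposed band boundaries can be expressed directly; the small sketch below applies them to word-family frequency ranks.

```python
# Band boundaries as proposed in the paper: families ranked 1-3,000 are
# high-frequency, 3,001-9,000 mid-frequency, and beyond 9,000 low-frequency.
def frequency_band(family_rank):
    if family_rank <= 3000:
        return "high-frequency"
    if family_rank <= 9000:
        return "mid-frequency"
    return "low-frequency"

for rank in (150, 4500, 12000):
    print(rank, "->", frequency_band(rank))
```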

