A Rasch-based validation of the Vietnamese version of the Listening Vocabulary Levels Test

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Hung Tan Ha

Abstract: The Listening Vocabulary Levels Test (LVLT), created by McLean et al. (Language Teaching Research 19:741–760, 2015), filled an important gap in the field of second language assessment by introducing an instrument for the measurement of phonological vocabulary knowledge. However, few attempts have been made to provide further validity evidence for the LVLT, and no Vietnamese version of the test has been created to date. The present study describes the development and validation of the Vietnamese version of the LVLT. Data were collected from 311 Vietnamese university students and analyzed with the Rasch model using several aspects of Messick's validation framework (Educational Measurement, 1989; American Psychologist 50:741–749, 1995). Supportive evidence for the test's validity was provided. First, the test items showed very good fit to the Rasch model and presented a sufficient spread of difficulty. Second, the items displayed sound unidimensionality and were locally independent. Finally, the Vietnamese version of the LVLT showed a high degree of generalizability and correlated positively with the IELTS listening test (r = 0.65).
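The item fit and difficulty spread reported above are evaluated against the dichotomous Rasch model. As a minimal illustrative sketch (not the study's analysis code; the function name and example values are ours), the model's response probability can be written in a few lines of Python:

```python
import math

def rasch_probability(ability: float, difficulty: float) -> float:
    """P(correct) under the dichotomous Rasch model:
    P(X=1 | theta, b) = exp(theta - b) / (1 + exp(theta - b))."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

# A person whose ability (in logits) equals the item's difficulty
# has a 50% chance of a correct response; easier items raise it.
print(rasch_probability(0.0, 0.0))              # 0.5
print(round(rasch_probability(0.0, -2.0), 2))   # 0.88
```

Both person ability and item difficulty live on the same logit scale, which is what lets a single test spread its items across the ability range of the sample.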

2018 ◽  
Vol 11 (2) ◽  
pp. 129-144 ◽  
Author(s):  
Mona Tabatabaee-Yazdi ◽  
◽  
Khalil Motallebzadeh ◽  
Hamid Ashraf ◽  
Purya Baghaei ◽  
...  

2018 ◽  
Vol 3 (1) ◽  
pp. 73
Author(s):  
Yulinda Erma Suryani

<p class="IABSTRAK"><strong>Abstract:</strong> Objective measurement in the social sciences and educational assessment must satisfy five criteria: 1) it gives a linear measure with equal intervals; 2) it conducts a proper estimation process; 3) it detects misfitting items and outliers; 4) it handles missing data; 5) it generates replicable measurements (independent of the parameters studied). To date, only the Rasch model satisfies all five conditions. Intelligence measurements made with the Rasch model therefore have the same quality as measurements made in the physical dimensions of physics. The logit scale (log odds unit) produced by the Rasch model is an equal-interval, linear scale derived from the odds ratio of the data. Analysis of the IST test instrument shows that, in general, the quality of the IST test falls into the good category. Of the 176 IST test items, only 1 item is poor, i.e., item 155 (WU19), so item 155 should be discarded. The DIF analysis shows that 28 items favor one gender only, so these twenty-eight items should be revised.</p>
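The logit scale described above is simply the natural log of the odds of success, which is what makes it linear and equal-interval. A short illustrative sketch (function names are ours, not from the article):

```python
import math

def proportion_to_logit(p: float) -> float:
    """Log odds unit: ln(p / (1 - p)) for a proportion correct p."""
    return math.log(p / (1.0 - p))

def logit_to_proportion(logit: float) -> float:
    """Inverse transform: logit back to a proportion correct."""
    return 1.0 / (1.0 + math.exp(-logit))

# Equal steps on the logit scale multiply the odds by the same factor,
# whereas equal steps on the raw-proportion scale do not.
print(round(proportion_to_logit(0.5), 2))    # 0.0
print(round(proportion_to_logit(0.75), 2))   # 1.1
```

This is why raw percent-correct scores are not an interval scale, while logit measures from a Rasch calibration are.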


2016 ◽  
Vol 25 (2) ◽  
pp. 142-152 ◽  
Author(s):  
Iris H.-Y. Ng ◽  
Kathy Y. S. Lee ◽  
Joffee H. S. Lam ◽  
C. Andrew van Hasselt ◽  
Michael C. F. Tong

Purpose The purpose of this study was to describe an attempt to apply item-response theory (IRT) and the Rasch model to the construction of speech-recognition tests. A set of word-recognition test items applicable to children as young as 3 years old, with any level of hearing sensitivity and with or without hearing devices, was developed. Method Test items were constructed through expert consultation and by reference to established language corpora, then validated with 121 participants with various degrees of hearing loss and 255 with typical hearing. IRT and the Rasch model were applied to evaluate item quality. Results Eighty disyllabic word items were selected in accordance with IRT. The speech-recognition abilities of the 376 young participants are reported, and the IRT analyses of this data set are discussed. Conclusions A new set of speech-recognition test materials in Cantonese Chinese has been developed. Short equivalent lists may be constructed in accordance with IRT item qualities. Clinical applications of this test tool in this language population are discussed.


2017 ◽  
Vol 41 (5) ◽  
Author(s):  
Carla Barros ◽  
Liliana Cunha ◽  
Pilar Baylina ◽  
Alexandra Oliveira ◽  
Álvaro Rocha

2018 ◽  
Vol 40 (1) ◽  
pp. 5 ◽  
Author(s):  
Tim Stoeckel ◽  
Phil Bennett ◽  
Tomoko Ishii

This paper describes the development and initial validation of a Japanese-English bilingual version of the New General Service List Test (NGSLT; Stoeckel & Bennett, 2015). The New General Service List (NGSL; Browne, 2013) consists of 2,800 high-frequency words and is intended to provide maximal coverage of texts for learners of English. The NGSLT is a diagnostic instrument designed to identify gaps in knowledge of words on the NGSL. It is a multiple-choice test consisting of 5 levels, each assessing knowledge of 20 randomly sampled words from a 560-word frequency-based level of the NGSL. The bilingual version was developed to minimize the risk of conflating vocabulary knowledge with understanding of the answer choices. A validation study with 382 Japanese high school and university learners found the instrument to be reliable (α = .97) and unidimensional, and to demonstrate good fit to the Rasch model.


2020 ◽  
Vol 7 (1) ◽  
pp. 1736849
Author(s):  
Sara Kazemi ◽  
Hamid Ashraf ◽  
Khalil Motallebzadeh ◽  
Mitra Zeraatpishe

2013 ◽  
Vol 2013 ◽  
pp. 1-7 ◽  
Author(s):  
Michaela Wagner-Menghin ◽  
Ingrid Preusche ◽  
Michael Schmidts

Background. The relevant literature reports no increase in individual scores when test items are reused, but information on changes in item difficulty is lacking. Purpose. To test an approach for quantifying the effect of item reuse on item difficulty. Methods. A total of 671 students sat a newly introduced exam in four testing shifts. The test forms experimentally combined published, unused, and reused items. Reuse effects were quantified by using the Rasch model to compare item difficulties estimated from different person samples. Results. The observed decrease in mean item difficulty for reused items was not statistically significant. Students who self-scheduled into the last testing shift performed worse than the other students. Conclusion. The availability of leaked material did not translate into higher individual scores, as mastering leaked material does not guarantee transfer of knowledge to new exam items. Exam quality will not automatically deteriorate when a low proportion of randomly selected items is reused.
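The core comparison above, item difficulty at first use versus at reuse, can be illustrated with a rough log-odds calibration. This is a deliberate simplification of full Rasch estimation, and all names and data here are hypothetical:

```python
import math

def item_difficulties(responses):
    """Rough per-item difficulty: the negative log odds of the proportion
    correct (a first-pass approximation, not a full Rasch calibration)."""
    n = len(responses)
    diffs = []
    for i in range(len(responses[0])):
        p = sum(row[i] for row in responses) / n
        diffs.append(-math.log(p / (1.0 - p)))
    return diffs

def mean_difficulty_shift(first_use, reuse):
    """Mean change in difficulty from first administration to reuse;
    a negative value means reused items became easier on average."""
    return sum(b - a for a, b in zip(first_use, reuse)) / len(first_use)

# Hypothetical response matrices (rows = examinees, columns = items).
first = item_difficulties([[1, 0], [1, 1], [1, 0], [0, 1]])
later = item_difficulties([[1, 1], [1, 1], [1, 0], [0, 1]])
print(round(mean_difficulty_shift(first, later), 3))
```

In the study, shifts of this kind across the four testing shifts were small enough to be statistically insignificant, supporting the conclusion that limited random reuse does not degrade exam quality.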

