Utility of a goodness-of-fit index for the graded response model with small sample sizes: a Monte Carlo investigation.

2012 ◽  
Author(s):  
Christina Studts
1991 ◽  
Vol 21 (1) ◽  
pp. 58-65 ◽  
Author(s):  
Dennis E. Jelinski

Chi-square (χ2) tests are analytic procedures that are often used to test the hypothesis that animals use a particular food item or habitat in proportion to its availability. Unfortunately, several sources of error are common to the use of χ2 analysis in studies of resource utilization. The goodness-of-fit and homogeneity tests have been incorrectly used interchangeably, regardless of whether resource availabilities are estimated or known a priori. An empirical comparison of the two methods demonstrates that the χ2 test of homogeneity may generate results contrary to the χ2 goodness-of-fit test. Failure to recognize the conservative nature of the χ2 homogeneity test, when "expected" values are known a priori, may lead to erroneous conclusions owing to the increased possibility of committing a type II error. Conversely, proper use of the goodness-of-fit method is predicated on the availability of accurate maps of resource abundance, or on estimates of resource availability based on very large sample sizes. Where resource availabilities have been estimated from small sample sizes, the use of the χ2 goodness-of-fit test may lead to type I errors beyond the nominal level of α. Both tests require adherence to specific critical assumptions that often have been violated, and accordingly, these assumptions are reviewed here. Alternatives to the Pearson χ2 statistic are also discussed.
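
The contrast between the two tests can be illustrated with a small sketch. The counts and availability proportions below are hypothetical and are not taken from the paper, and the function names are illustrative only. Three habitat classes are used so that both tests have 2 degrees of freedom, for which the χ2 upper-tail probability has the closed form exp(-x/2) and no statistics library is needed.

```python
import math


def chi2_gof(observed, proportions):
    """Pearson chi-square goodness-of-fit statistic.

    Expected counts come from availability proportions treated as known
    a priori (for example, measured from an accurate habitat map).
    """
    n = sum(observed)
    expected = [n * p for p in proportions]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi2_homogeneity(row_a, row_b):
    """Pearson chi-square statistic for a 2 x k test of homogeneity.

    Both rows are treated as samples, so expected counts are built
    from the pooled column totals.
    """
    total_a, total_b = sum(row_a), sum(row_b)
    grand = total_a + total_b
    stat = 0.0
    for a, b in zip(row_a, row_b):
        col = a + b
        for obs, row_total in ((a, total_a), (b, total_b)):
            exp = row_total * col / grand
            stat += (obs - exp) ** 2 / exp
    return stat


def p_value_df2(stat):
    """Upper-tail p-value of a chi-square variate with exactly 2 df."""
    return math.exp(-stat / 2.0)


# Hypothetical data: 100 animal locations across three habitat classes.
used = [30, 50, 20]
# Hypothetical availability, assumed known a priori.
availability = [0.2, 0.5, 0.3]
# The same availability wrongly treated as a second sample of 100 points.
availability_as_sample = [20, 50, 30]

gof = chi2_gof(used, availability)
hom = chi2_homogeneity(used, availability_as_sample)
print("goodness-of-fit:", round(gof, 3), "p =", round(p_value_df2(gof), 4))
print("homogeneity:    ", round(hom, 3), "p =", round(p_value_df2(hom), 4))
```

With these made-up counts the homogeneity statistic is less than half the goodness-of-fit statistic, so the two tests fall on opposite sides of α = .05. This is the conservative behaviour the abstract describes when availabilities are in fact known a priori.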


2015 ◽  
Vol 19 (1) ◽  
pp. 69-81
Author(s):  
Sri Yamtinah ◽  
Budiyono Budiyono

The purposes of this study were to: (1) develop an ordered multiple-choice instrument for diagnosing learning difficulties in Stoichiometry in class X, (2) determine the characteristics of the instrument, and (3) produce diagnostic profiles of learners as an informative report. This development research used the Borg & Gall model. A preliminary study identified Stoichiometry as the most difficult topic. The attributes and the attribute hierarchy were determined through a focus group discussion followed by two rounds of the Delphi technique. Indicators and item wordings were reviewed by experts to establish content validity using the Aiken formula. A limited tryout and a medium-scale tryout were conducted in schools of the high, medium, and low categories. The feasibility tryout was conducted in schools in the regions of Surakarta, Karanganyar, Boyolali, and Sragen. Conclusions: (1) the research succeeded in developing three Ordered Multiple Choice (OMC) item packages for detecting learners' difficulties in class X Stoichiometry; (2) the OMC items in packages A, B, and C have good construct validity, with Goodness of Fit (GoF) values based on Smart PLS above 0.36, namely 0.437, 0.466, and 0.433. The instruments have high reliability, 0.79, 0.81, and 0.75 respectively; (3) the learner profiles take the form of diagnostic reports on the attributes that learners have and have not yet mastered.
Keywords: Attribute Hierarchy Method, Ordered Multiple Choice, Graded Response Model

DEVELOPING A DIAGNOSTIC INSTRUMENT FOR LEARNING DIFFICULTIES IN CHEMISTRY IN SECONDARY HIGH SCHOOL

Abstract: The purposes of this study are to: (1) develop Ordered Multiple Choice (OMC) instruments based on the Attribute Hierarchy Method (AHM) for diagnosing stoichiometry learning difficulties in secondary high school chemistry, (2) determine the characteristics of the developed instruments based on the Graded Response Model (GRM), and (3) create a diagnostic profile of learners as an informative report. This development research used the Borg & Gall model. The instrument was developed using the AHM in OMC form. The attributes and the attribute hierarchy were determined in a focus group discussion (FGD) with three experts, six teachers, and two measurement experts, followed by two rounds of the Delphi technique with three experts. A limited tryout was conducted in high, medium, and low category schools. A feasibility tryout was conducted in high, medium, and low category schools in the regions of Surakarta, Karanganyar, Boyolali, and Sragen. The results are as follows. (1) This research has developed three OMC packages to detect students' learning difficulties in Chemistry, specifically in Stoichiometry in class X. (2) The OMC test items in packages A, B, and C have good construct validity, with Goodness of Fit (GoF) values greater than 0.36, namely 0.437, 0.466, and 0.433. (3) The learners' profiles take the form of a diagnostic report on the attributes that the learners have and have not mastered.

Keywords: Attribute Hierarchy Method, Ordered Multiple Choice, Graded Response Model


2020 ◽  
pp. 001316442095806
Author(s):  
Shiyang Su ◽  
Chun Wang ◽  
David J. Weiss

[Formula: see text] is a popular item fit index that is available in commercial software packages such as flexMIRT. However, no research has systematically examined the performance of [Formula: see text] for detecting item misfit within the context of the multidimensional graded response model (MGRM). The primary goal of this study was to evaluate the performance of [Formula: see text] under two practical misfit scenarios: first, all items are misfitting due to model misspecification, and second, a small subset of items violate the underlying assumptions of the MGRM. Simulation studies showed that caution should be exercised when reporting item fit results of polytomous items using [Formula: see text] within the context of the MGRM, because of its inflated false positive rates (FPRs), especially with a small sample size and a long test. [Formula: see text] performed well when detecting overall model misfit as well as item misfit for a small subset of items when the ordinality assumption was violated. However, under a number of conditions of model misspecification or items violating the homogeneous discrimination assumption, even though true positive rates (TPRs) of [Formula: see text] were high when a small sample size was coupled with a long test, the inflated FPRs were generally directly related to increasing TPRs. There was also a suggestion that performance of [Formula: see text] was affected by the magnitude of misfit within an item. There was no evidence that FPRs for fitting items were exacerbated by the presence of a small percentage of misfitting items among them.


2002 ◽  
Vol 95 (3) ◽  
pp. 837-842 ◽  
Author(s):  
M. T. Bradley ◽  
D. Smith ◽  
G. Stoica

A Monte-Carlo study was done with true effect sizes in deviation units ranging from 0 to 2 and a variety of sample sizes. The purpose was to assess the amount of bias created by considering only effect sizes that passed a statistical cut-off criterion of α = .05. The deviation values obtained at the .05 level, jointly determined by the set effect sizes and sample sizes, are presented. This table is useful when summarizing sets of studies to judge whether published results reflect an accurate appraisal of an underlying effect or a distorted estimate expected because significant studies are published and nonsignificant results are not. The table shows that the magnitudes of error are substantial with small sample sizes and inherently small effect sizes. Thus, reviews based on published literature could be misleading, especially if true effect sizes were close to zero. A researcher should be particularly cautious when small samples show large effect sizes while larger samples show progressively smaller effects.
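
The selection effect described here can be sketched with a small simulation. This is not the authors' design: it assumes a one-sample z-test with known unit standard deviation, so each study's observed effect is drawn directly as a normal variate with standard error 1/sqrt(n). The function name, seed, and replication count are illustrative choices.

```python
import math
import random


def mean_published_effect(true_d, n, reps=20000, z_crit=1.96, seed=0):
    """Average observed effect among simulated studies with |z| > z_crit.

    Assumption: each study is a one-sample z-test with known SD = 1, so
    the observed effect is normal with mean true_d and SE = 1/sqrt(n).
    """
    rng = random.Random(seed)
    se = 1.0 / math.sqrt(n)
    significant = []
    for _ in range(reps):
        d_hat = rng.gauss(true_d, se)
        # Keep only "publishable" studies: two-sided test at alpha = .05.
        if abs(d_hat / se) > z_crit:
            significant.append(d_hat)
    if not significant:
        return float("nan")
    return sum(significant) / len(significant)


if __name__ == "__main__":
    # Hypothetical true effect of 0.2 deviation units at three sample sizes.
    for n in (10, 40, 400):
        print(n, round(mean_published_effect(0.2, n), 3))
```

Under these assumptions the average "published" effect at n = 10 is several times the true value of 0.2, while at n = 400 it is close to 0.2. This matches the abstract's warning about small samples showing large effects.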

