Grade-Related Differential Item Functioning in General English Proficiency Test-Kids Listening

Differential Item Functioning (DIF) analysis is always an indispensable methodology for detecting item and test bias in the arena of language testing. This study investigated grade-related DIF in the General English Proficiency Test-Kids (GEPT-Kids) listening section. Quantitative data were test scores collected from 791 test takers (Grade 5 = 398; Grade 6 = 393) from eight Chinese-speaking cities, and qualitative data were expert judgments collected from two primary school English teachers in Guangdong province. Two R packages “difR” and “difNLR” were used to perform five types of DIF analysis (two-parameter item response theory [2PL IRT] based Lord’s chi-square and Raju’s area tests, Mantel-Haenszel [MH], logistic regression [LR], and nonlinear regression [NLR] DIF methods) on the test scores, which altogether identified 16 DIF items. ShinyItemAnalysis package was employed to draw item characteristic curves (ICCs) for the 16 items in RStudio, which presented four different types of DIF effect. Besides, two experts identified reasons or sources for the DIF effect of four items. The study, therefore, may shed some light on the sustainable development of test fairness in the field of language testing: methodologically, a mixed-methods sequential explanatory design was adopted to guide further test fairness research using flexible methods to achieve research purposes; practically, the result indicates that DIF analysis does not necessarily imply bias. Instead, it only serves as an alarm that calls test developers’ attention to further examine the appropriateness of test items.

Download Full-text

Differential Item Functioning in Brief Instruments of Disordered Eating

European Journal of Psychological Assessment ◽

10.1027/1015-5759/a000472 ◽

2019 ◽

Vol 35 (6) ◽

pp. 823-833 ◽

Cited By ~ 4

Author(s):

Desiree Thielemann ◽

Felicitas Richter ◽

Bernd Strauss ◽

Elmar Braehler ◽

Uwe Altmann ◽

...

Keyword(s):

Differential Item Functioning ◽

Disordered Eating ◽

Structural Equation ◽

Young Female ◽

Eating Attitudes ◽

Equation Model ◽

German Population ◽

Test Fairness ◽

Item Functioning ◽

Multiple Indicator

Abstract. Most instruments for the assessment of disordered eating were developed and validated in young female samples. However, they are often used in heterogeneous general population samples. Therefore, brief instruments of disordered eating should assess the severity of disordered eating equally well between individuals with different gender, age, body mass index (BMI), and socioeconomic status (SES). Differential item functioning (DIF) of two brief instruments of disordered eating (SCOFF, Eating Attitudes Test [EAT-8]) was modeled in a representative sample of the German population ( N = 2,527) using a multigroup item response theory (IRT) and a multiple-indicator multiple-cause (MIMIC) structural equation model (SEM) approach. No DIF by age was found in both questionnaires. Three items of the EAT-8 showed DIF across gender, indicating that females are more likely to agree than males, given the same severity of disordered eating. One item of the EAT-8 revealed slight DIF by BMI. DIF with respect to the SCOFF seemed to be negligible. Both questionnaires are equally fair across people with different age and SES. The DIF by gender that we found with respect to the EAT-8 as screening instrument may be also reflected in the use of different cutoff values for men and women. In general, both brief instruments assessing disordered eating revealed their strengths and limitations concerning test fairness for different groups.

Download Full-text

Linguistic Distance and Translation Differential Item Functioning on Trends in International Mathematics and Science Study Mathematics Assessment Items

Journal of Psychoeducational Assessment ◽

10.1177/07342829211010537 ◽

2021 ◽

pp. 073428292110105

Author(s):

Semirhan Gökçe ◽

Giray Berberoğlu ◽

Craig S. Wells ◽

Stephen G. Sireci

Keyword(s):

Differential Item Functioning ◽

Test Scores ◽

Mathematics Assessment ◽

Paired Comparisons ◽

Science Study ◽

Linguistic Distance ◽

Test Characteristic ◽

Item Functioning ◽

Mathematics And Science ◽

Country Specific

The 2015 Trends in International Mathematics and Science Study (TIMSS) involved 57 countries and 43 different languages to assess students’ achievement in mathematics and science. The purpose of this study is to evaluate whether items and test scores are affected as the differences between language families and cultures increase. Using differential item functioning (DIF) procedures, we compared the consistency of students’ performance across three combinations of languages and countries: (a) same language but different countries, (b) same countries but different languages, and (c) different languages and different countries. The analyses consisted of the detection of the number of DIF items for all paired comparisons within each condition, the direction of DIF, the magnitude of DIF, and the differences between test characteristic curves. As the countries were more distant with respect to cultures and language families, the presence of DIF increased. The magnitude of DIF was greatest when both language and country differed, and smallest when the languages were same, but the countries were different. Results suggest that when TIMSS results are compared across countries, the language- and country-specific differences which could reflect cultural, curriculum, or other differences should be considered.

Download Full-text

A comparability study between the General English Proficiency Test- Advanced and the Internet-Based Test of English as a Foreign Language

Language Testing in Asia ◽

10.1186/s40468-017-0048-x ◽

2017 ◽

Vol 7 (1) ◽

Author(s):

Antony John Kunnan ◽

Nathan Carr

Keyword(s):

Foreign Language ◽

Proficiency Test ◽

English Proficiency ◽

The Internet ◽

General English Proficiency Test

Download Full-text

The Power of General English Proficiency Test on Taiwanese Society and Its Tertiary English Education

Assessing Chinese Learners of English ◽

10.1057/9781137449788_13 ◽

2016 ◽

pp. 270-286

Author(s):

Shwu-Wen Lin

Keyword(s):

Proficiency Test ◽

English Education ◽

English Proficiency ◽

General English Proficiency Test

Download Full-text

Correction of Differentially Functioning Items: Basis for Maintaining and Enhancing Test Validity and Reliability

World Journal of Educational Research ◽

10.22158/wjer.v4n1p62 ◽

2016 ◽

Vol 4 (1) ◽

pp. 62

Author(s):

Jose Q. Pedrajita

Keyword(s):

Differential Item Functioning ◽

Internal Consistency ◽

Private School ◽

Content Validity ◽

Test Scores ◽

Concurrent Validity ◽

Test Validity ◽

Internal Consistency Reliability ◽

School Type ◽

Item Functioning

This study looked into differentially functioning items in a Chemistry Achievement Test. It also examined the effect of eliminating differentially functioning items on the content and concurrent validity, and internal consistency reliability of the test. Test scores of two hundred junior high school students matched on school type were subjected to Differential Item Functioning (DIF) analysis. One hundred students came from a public school, while the other 100 were private school examinees. The descriptive-comparative research design utilizing differential item functioning analysis and validity and reliability analysis was employed. The Chi-Square, Distractor Response Analysis, Logistic Regression, and the Mantel-Haenszel Statistic were the methods used in the DIF analysis. A six-point scale ranging from inadequate to adequate was used to assess the content validity of the test. Pearson r was used in the concurrent validity analysis. The KR-20 formula was used for estimating the internal consistency reliability of the test. The findings revealed the presence of differentially functioning items between the public and private school examinees. The DIF methods differed in the number of differentially functioning items identified. However, there was a high degree of correspondence between the Logistic Regression and Mantel-Haenszel Statistic. After the elimination of the differentially functioning items, the content and the concurrent validity, and the internal consistency reliability differed per DIF method used. The content validity of the test differed ranging from slightly adequate to moderately adequate in the number of items retained. The concurrent validity of the test also differed but all were positive and indicate moderate relationship between the examinees’ test scores and their GPA in Science III. Likewise, the internal consistency reliability of the test differed. The more differentially functioning items eliminated, the lesser was the content and concurrent validity, and internal consistency reliability of the test becomes. Elimination of differentially functioning items diminishes content and concurrent validity, and internal consistency reliability, but could be use as basis in enhancing content, concurrent as well as internal consistency reliability by replacing eliminated DIF items.

Download Full-text

Determining differential item functioning and its effect on the test scores of selected pib indexes, using item response theory techniques

SA Journal of Industrial Psychology ◽

10.4102/sajip.v27i2.783 ◽

2001 ◽

Vol 27 (2) ◽

Author(s):

Pieter Schaap

Keyword(s):

Item Response Theory ◽

Differential Item Functioning ◽

Item Response ◽

Test Scores ◽

Response Theory ◽

South Africans ◽

Test Characteristics ◽

Potential Index ◽

Item Functioning

The objective of this article is to present the results of an investigation into the item and test characteristics of two tests of the Potential Index Batteries (PIB) in terms of differential item functioning (DIP) and the effect thereof on test scores of different race groups. The English Vocabulary (Index 12) and Spelling Tests (Index 22) of the PIB were analysed for white, black and coloured South Africans. Item response theory (IRT) methods were used to identify items which function differentially for white, black and coloured race groups. Opsomming Die doel van hierdie artikel is om die resultate van n ondersoek na die item- en toetseienskappe van twee PIB (Potential Index Batteries) toetse in terme van itemsydigheid en die invloed wat dit op die toetstellings van rassegroepe het, weer te gee. Die Potential Index Batteries (PIB) se Engelse Woordeskat (Index 12) en Spellingtoetse (Index 22) is ten opsigte van blanke, swart en gekleurde Suid-Afrikaners ontleed. Itemresponsteorie (IRT) is gebruik om items te identifiseer wat as sydig (DIP) vir die onderskeie rassegroepe beskou kan word.

Download Full-text

The Washback of the General English Proficiency Test on University Policies: A Taiwan Case Study

Language Assessment Quarterly ◽

10.1080/15434301003664196 ◽

2010 ◽

Vol 7 (3) ◽

pp. 234-254 ◽

Cited By ~ 15

Author(s):

Chih-Min Shih

Keyword(s):

Proficiency Test ◽

English Proficiency ◽

General English Proficiency Test ◽

University Policies

Download Full-text

An Empirical Comparison of the Area Methods, Lord'S Chi-Square Test, and the Mantel-Haenszel Technique for Assessing Differential Item Functioning

Educational and Psychological Measurement ◽

10.1177/0013164493053002001 ◽

1993 ◽

Vol 53 (2) ◽

pp. 301-314 ◽

Cited By ~ 13

Author(s):

Nambury S. Raju ◽

Fritz Drasgow ◽

Jeffrey A. Slinde

Keyword(s):

Differential Item Functioning ◽

Empirical Comparison ◽

Chi Square ◽

Item Functioning ◽

Chi Square Test

Download Full-text

Test review: GEPT: General English Proficiency Test

Language Testing ◽

10.1177/0265532208090159 ◽

2008 ◽

Vol 25 (3) ◽

pp. 403-408 ◽

Cited By ~ 33

Author(s):

Carsten Roever ◽

Yi-Ching Pan

Keyword(s):

Proficiency Test ◽

English Proficiency ◽

Test Review ◽

General English Proficiency Test

Download Full-text

Undergraduates' Intentions to Take a Second Language Proficiency Test: A Comparison of Predictions from the Theory of Planned Behavior and Social Cognitive Theory

Psychological Reports ◽

10.2466/pr0.106.3.798-810 ◽

2010 ◽

Vol 106 (3) ◽

pp. 798-810 ◽

Cited By ~ 5

Author(s):

Bih-Jiau Lin ◽

Wen-Bin Chiou

Keyword(s):

College Students ◽

Theory Of Planned Behavior ◽

Social Cognitive Theory ◽

Planned Behavior ◽

Proficiency Test ◽

Social Cognitive ◽

Cognitive Theory ◽

English Proficiency ◽

Outcome Expectancy ◽

General English Proficiency Test

English competency has become essential for obtaining a better job or succeeding in higher education in Taiwan. Thus, passing the General English Proficiency Test is important for college students in Taiwan. The current study applied Ajzen's theory of planned behavior and the notions of outcome expectancy and self-efficacy from Bandura's social cognitive theory to investigate college students' intentions to take the General English Proficiency Test. The formal sample consisted of 425 undergraduates (217 women, 208 men; M age = 19.5 yr., SD = 1.3). The theory of planned behavior showed greater predictive ability ( R2 = 33%) of intention than the social cognitive theory ( R2 = 7%) in regression analysis and made a unique contribution to prediction of actual test-taking behavior one year later in logistic regression. Within-model analyses indicated that subjective norm in theory of planned behavior and outcome expectancy in social cognitive theory are crucial factors in predicting intention. Implications for enhancing undergraduates' intentions to take the English proficiency test are discussed.

Download Full-text