A Propensity Score Method for Investigating Differential Item Functioning in Performance Assessment

2019, Vol 80 (3), pp. 476-498
Author(s): Michelle Y. Chen, Yan Liu, Bruno D. Zumbo

This study introduces a novel differential item functioning (DIF) method based on propensity score matching that tackles two challenges in analyzing performance assessment data, namely, continuous task scores and the lack of a reliable internal variable as a proxy for ability or aptitude. The proposed DIF method consists of two main stages. First, propensity score matching is used to eliminate preexisting group differences before the test, ideally creating equivalent groups as in a randomized experimental study. Then, linear mixed effects models are adopted to perform DIF analysis based on the matched data set. We demonstrate this propensity score DIF method using a high-stakes functional English language proficiency test. DIF due to education was investigated in the writing component, which consists of two continuously scored performance-based tasks. Although the proposed method is demonstrated in the context of language testing, it can be applied to other types of performance assessments.
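The first stage pairs each focal-group member with a reference-group member who has a similar estimated propensity score. The Python sketch below assumes the scores have already been estimated (for example, by logistic regression on background variables) and uses greedy 1:1 nearest-neighbor matching without replacement inside a caliper. The function name `greedy_nn_match` and the 0.05 caliper are illustrative assumptions, not necessarily the matching algorithm the authors used, and the second stage (linear mixed-effects DIF models on the matched data) is not shown.

```python
def greedy_nn_match(scores, treated, caliper=0.05):
    """Greedy 1:1 nearest-neighbor propensity score matching without replacement.

    scores:  estimated propensity scores, one per test taker
    treated: 0/1 group indicator (1 = focal or "treated" group)
    caliper: largest allowed |score difference| for a match (assumed value)
    Returns a list of (treated_index, control_index) pairs.
    """
    if len(scores) != len(treated):
        raise ValueError("scores and treated must have the same length")
    controls = [j for j, t in enumerate(treated) if t == 0]
    # Visit treated units in ascending score order (an arbitrary but deterministic choice).
    focal = sorted((i for i, t in enumerate(treated) if t == 1), key=lambda i: scores[i])
    used = set()
    pairs = []
    for i in focal:
        # Unused controls whose score lies within the caliper of this treated unit.
        candidates = [(abs(scores[i] - scores[j]), j)
                      for j in controls
                      if j not in used and abs(scores[i] - scores[j]) <= caliper]
        if candidates:
            _, j = min(candidates)  # closest control; ties go to the lower index
            used.add(j)
            pairs.append((i, j))
    return pairs


scores = [0.30, 0.32, 0.70, 0.50, 0.71, 0.90]
treated = [1, 0, 1, 0, 0, 1]
print(greedy_nn_match(scores, treated))  # [(0, 1), (2, 4)]
```

Unit 5 stays unmatched because no control lies within the caliper; discarding such units is what makes the matched groups comparable, at the cost of sample size.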

Data, 2019, Vol 4 (3), pp. 119
Author(s): Shifang Tang, Fuhui Tong, Xiuhong Lu

We sought to quantify the effectiveness of a gifted and talented (GT) program provided to university students in China who demonstrated a talent for learning English as a foreign language (EFL). To do so, we used propensity score matching (PSM) techniques to analyze data collected from a tier-1 university where an English talent (ET) program was provided. Specifically, we provide (a) a step-by-step guide to PSM analysis using R, (b) the code for PSM analysis and visualization, and (c) the final analysis of baseline equivalence and treatment effect based on the matched sample. Collectively, the results of descriptive statistics, visualization, and baseline equivalence indicate that PSM is an effective matching technique for generating an unbiased counterfactual analysis. Moreover, the ET program yields a statistically significant, positive effect on ET students’ English language proficiency.
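Baseline equivalence after matching is often summarized by the standardized mean difference (SMD) of each pretreatment covariate between the treated and comparison groups. The abstract describes an R workflow; the Python sketch below is only an illustration of the SMD with a pooled-standard-deviation denominator. The function name and data are hypothetical. Commonly cited rules of thumb treat a small absolute SMD (for example, below 0.1) as acceptable balance, but the abstract does not state which criterion the authors applied.

```python
import math
from statistics import mean, variance


def standardized_mean_difference(treated, control):
    """Standardized mean difference of one baseline covariate between two groups.

    The denominator is the pooled SD sqrt((s_t^2 + s_c^2) / 2), where each s^2 is
    the sample variance (n - 1 denominator) within a group.
    """
    pooled_sd = math.sqrt((variance(treated) + variance(control)) / 2)
    return (mean(treated) - mean(control)) / pooled_sd


# Hypothetical pretest scores for a matched sample (not data from the study).
et_group = [1, 2, 3, 4]
comparison = [2, 3, 4, 5]
smd = standardized_mean_difference(et_group, comparison)
print(round(smd, 4))  # -0.7746
```

An |SMD| this large would signal that the groups are not yet balanced on the pretest, so the matching specification would need revisiting before estimating a treatment effect.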


The purpose of this study was to examine differences in the sensitivity of three methods, IRT-Likelihood Ratio (IRT-LR), Mantel-Haenszel (MH) and Logistic Regression (LR), in detecting gender differential item functioning (DIF) on the National Mathematics Examination (Ujian Nasional: UN) for the 2014/2015 academic year in North Sumatera Province, Indonesia. A DIF item indicates unfairness: it advantages test takers from one group and disadvantages those from another group even when they have the same ability. DIF was examined by gender, with men as the reference group (R) and women as the focal group (F). The study used an experimental 3×1 design with one factor (DIF detection method) at three levels, namely the three DIF detection methods. The 2015 UN Mathematics test was issued in five packages (codes 1107, 2207, 3307, 4407 and 5507). Package 2207 was taken as the sample data, consisting of 5000 participants (3067 women, 1933 men) and 40 UN items. Item selection based on classical test theory (CTT) retained 32 of the 40 items, and selection based on item response theory (IRT) retained 18 items. Using R 3.333 and IRTLRDIF 2.0, 5 items were flagged as DIF by the IRT-LR method, 4 by the LR method, and 3 by the MH method. Because a single DIF analysis is not enough to test the sensitivity of the three methods, six analysis groups were formed, each defined by (sample size, number of items): (4400, 40), (4400, 32), (4400, 18), (3000, 40), (3000, 32) and (3000, 18). In each group, 40 random data sets were generated (without repetition), and DIF detection was conducted on the items in each data set. Although model fit was poor, the three-parameter logistic (3PL) model was chosen as the most suitable model.
Tukey's HSD post hoc tests showed that the IRT-LR method was more sensitive than the MH and LR methods in groups (4400, 40) and (3000, 40). In groups (4400, 32) and (3000, 32), the IRT-LR method was no longer more sensitive than LR, but was still more sensitive than MH. In groups (4400, 18) and (3000, 18), the IRT-LR method was more sensitive than LR, but not significantly more sensitive than MH. The LR method was consistently more sensitive than the MH method across all analysis groups.
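Of the three methods compared, Mantel-Haenszel is the simplest to state: examinees are stratified by total score, a 2×2 table (group × correct/incorrect) is formed at each score level, and the tables are pooled into a common odds ratio, which ETS convention rescales to the delta metric (MH D-DIF = -2.35 ln α). The Python sketch below computes both quantities from invented counts; the function name is hypothetical, and it omits the MH chi-square significance test and any item purification a full analysis would include.

```python
import math


def mantel_haenszel_dif(strata):
    """Mantel-Haenszel common odds ratio and ETS delta (MH D-DIF) for one item.

    strata: list of (A, B, C, D) counts, one tuple per total-score level:
        A = reference group correct,  B = reference group incorrect,
        C = focal group correct,      D = focal group incorrect.
    """
    num = den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty score level contributes nothing
        num += a * d / n
        den += b * c / n
    alpha_mh = num / den
    # ETS delta scale: negative values indicate DIF favoring the reference group.
    mh_d_dif = -2.35 * math.log(alpha_mh)
    return alpha_mh, mh_d_dif


# Hypothetical counts for two total-score levels (not data from the study).
alpha, delta = mantel_haenszel_dif([(30, 10, 20, 20), (40, 10, 30, 20)])
print(round(alpha, 3), round(delta, 2))  # 2.818 -2.43
```

Here the reference group has higher odds of success at both score levels, so α exceeds 1 and the delta value is negative.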


Author(s): Yamin Qian

While rubrics have been widely recognized as an effective instructional tool for teachers evaluating students’ written products, fewer studies have explored how students use them in their writing process in EFL university academic writing classes. This study explores the application of process-oriented rubrics in two EFL writing programs and investigates whether English language proficiency, motivation to write, and previous experience with writing programs significantly affect the use of the rubrics. The participants (N = 190) came from two student cohorts of 95 participants each. The data set includes students’ self- and peer-use of the rubrics, the instructor’s use of the rubrics, and students’ written reflections on peer feedback. The data showed that the rubrics can guide students through a writing process and that the 20-item rubric was statistically reliable. The rubric data also showed that participants were more critical of their peers’ writing, and the reflection data showed students’ awareness of revision strategies. The qualitative data seemed to suggest that peer reviews, and reflections on those reviews, could enhance students’ revision strategies. The article concludes by offering some pedagogical suggestions for EFL contexts.


2011, Vol 35 (8), pp. 604-622
Author(s): Hirotaka Fukuhara, Akihito Kamata

A differential item functioning (DIF) detection method for testlet-based data was proposed and evaluated in this study. The proposed DIF model is an extension of a bifactor multidimensional item response theory (MIRT) model for testlets. Unlike traditional item response theory (IRT) DIF models, the proposed model takes testlet effects into account, thus estimating DIF magnitude appropriately when a test is composed of testlets. A fully Bayesian estimation method was adopted for parameter estimation. The recovery of parameters was evaluated for the proposed DIF model. Simulation results revealed that the proposed bifactor MIRT DIF model produced better estimates of DIF magnitude and higher DIF detection rates than the traditional IRT DIF model for all simulation conditions. A real data analysis was also conducted by applying the proposed DIF model to a statewide reading assessment data set.
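For intuition about how a testlet term enters the item response function, the Python sketch below evaluates a generic bifactor two-parameter logistic item with an added uniform DIF shift, logit P = a_g·θ + a_s·γ + d + β·G, where γ is the examinee's effect for the testlet containing the item and G marks focal-group membership. This parameterization and all parameter names are illustrative assumptions; the abstract does not give the authors' exact model, and the sketch makes no attempt at their fully Bayesian estimation.

```python
import math


def bifactor_dif_prob(theta, gamma, a_general, a_testlet, d, dif, focal):
    """P(correct) for an illustrative bifactor 2PL item with a uniform DIF term.

    theta:     examinee's general ability
    gamma:     examinee's random effect for the testlet containing the item
    a_general: item slope on the general dimension
    a_testlet: item slope on its testlet dimension
    d:         item intercept (easiness)
    dif:       intercept shift applied to the focal group (uniform DIF)
    focal:     1 if the examinee is in the focal group, else 0
    """
    z = a_general * theta + a_testlet * gamma + d + dif * focal
    return 1.0 / (1.0 + math.exp(-z))


# Same ability and testlet effect, different groups: a negative DIF term
# lowers the focal group's probability of success on this item.
p_ref = bifactor_dif_prob(0.0, 0.0, 1.2, 0.8, 0.0, -0.5, focal=0)
p_foc = bifactor_dif_prob(0.0, 0.0, 1.2, 0.8, 0.0, -0.5, focal=1)
print(round(p_ref, 3), round(p_foc, 3))  # 0.5 0.378
```

Dropping the a_s·γ term gives the traditional unidimensional DIF model; when items share a testlet, that omitted dependence can then distort the estimated DIF shift, which is the problem the bifactor extension addresses.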


Author(s): Donna M. Velliaris

In many Asian countries, tertiary education remains a much desired but seemingly unattainable goal for high school graduates because of rigorous unified national examinations. Against this background, international students invest millions of dollars annually attempting to enter Australian higher education (HE). Students arrive with high expectations, but in the early stages of their study abroad experience they face a range of transitional difficulties centered on ‘academic English’. An author-developed semi-structured questionnaire included the open-ended question: In your own words, how would you describe your English language ability in terms of (1) listening, (2) speaking, (3) reading, and (4) writing? The data set captured the ‘voice’ of 209 pathway students attending the Eynesbury Institute of Business and Technology (EIBT). Their self-reported narratives share personal perceptions of their own English language proficiency across the four domains, largely within the context of their enrolment at the institute.


2017, Vol 9 (2), pp. 169-186
Author(s): Liang Zhao, Tsvi Vinig

Purpose: In the existing literature on crowdfunding project performance, previous studies have given little attention to the impact of investors’ hedonic and utilitarian value on project results. In a crowdfunding setting, utilitarian value is hard to satisfy because of information asymmetry and adverse selection problems. Projects with more hedonic value can therefore be more attractive to potential investors. A lucky draw is one way to increase consumer hedonic value and, as a result, can influence investors’ behavior. The authors hypothesize that projects with a hedonic treatment (lucky draw) may be more likely than others to succeed in their campaigns. The paper aims to discuss these issues.
Design/methodology/approach: The analysis sample is a unique, self-extracted real data set covering two years of a Chinese crowdfunding platform. The authors first employ propensity score matching methods to control for the endogeneity of hedonic treatment adoption (lucky draw). They then run OLS and probit regressions to test the hypotheses.
Findings: The analysis suggests a significant positive relationship not only between project lottery adoption and project results but also between project lottery adoption and project popularity.
Originality/value: The results suggest that an often ignored factor, hedonic treatment (lucky draw), can play an important role in crowdfunding project performance.
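Propensity score matching starts by estimating, for each project, the probability of adopting the treatment (here, a lucky draw) given its observed covariates. The abstract does not say which covariates or link function the authors used for this step, so the Python sketch below simply assumes a logistic model fitted by plain batch gradient ascent on hypothetical covariate values, to show what the score is. The function names are illustrative; in practice a statistics library's logistic or probit regression would replace the hand-written fit.

```python
import math


def sigmoid(z):
    """Logistic function mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_propensity(X, treated, lr=0.1, n_iter=5000):
    """Estimate propensity scores with a logistic model fitted by batch gradient ascent.

    X:       list of covariate rows (lists of floats), one row per project
    treated: 0/1 indicator of adopting the treatment (here, a lucky draw)
    Returns the estimated probability of treatment for each row.
    """
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)  # w[0] is the intercept, w[1:] the covariate weights

    def linear(row):
        return w[0] + sum(wj * xj for wj, xj in zip(w[1:], row))

    for _ in range(n_iter):
        grad = [0.0] * (k + 1)
        for row, t in zip(X, treated):
            err = t - sigmoid(linear(row))  # d(log-likelihood)/d(linear predictor)
            grad[0] += err
            for j, xj in enumerate(row):
                grad[j + 1] += err * xj
        for j in range(k + 1):
            w[j] += lr * grad[j] / n  # ascent step on the average log-likelihood
    return [sigmoid(linear(row)) for row in X]


# Hypothetical single covariate and treatment flags (not data from the study).
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
treated = [0, 0, 1, 0, 1, 1]
scores = fit_propensity(X, treated)
print([round(s, 2) for s in scores])
```

Because the fitted weight on the covariate is positive, projects with larger covariate values receive higher scores; at convergence the scores sum to the number of treated projects, a property of maximum-likelihood logistic regression with an intercept.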


2016, Vol 43 (10), pp. 1031-1048
Author(s): Roberto Zotti, Nino Speziale, Cristian Barra

Purpose: The purpose of this paper is to investigate the effect of religious involvement on subjective well-being (SWB), specifically accounting for the selection effects that may explain religious influence, using data from the British Household Panel Survey.
Design/methodology/approach: To measure the level of religious involvement, the authors construct different indices based on individual religious belonging, participation and beliefs, and apply a propensity score matching estimator.
Findings: The results show that active religious participation plays a relevant role among the different aspects of religiosity; moreover, having a strong religious identity (that is, simultaneously belonging to any religion, attending religious services once a week or more, and believing that religion makes a great difference in life) has a large causal impact on SWB. The authors’ findings are robust to different aspects of life satisfaction.
Originality/value: The authors offer an econometric account of the causal impact of different aspects of religiosity, finding evidence that the causal effect of religious involvement on SWB is better captured this way than through typical regression methodologies focussing on the mean effects of the explanatory variables.
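Once each religiously involved respondent has been matched to a comparable respondent without that involvement, a basic matching estimator of the average treatment effect on the treated (ATT) is the mean of the within-pair outcome differences. The Python sketch below assumes 1:1 matched pairs are already available; the function name and data are hypothetical, and the abstract does not describe the authors' specific matching estimator (number of neighbors, replacement, or how standard errors were obtained).

```python
from statistics import mean


def att_from_pairs(outcome, pairs):
    """Average treatment effect on the treated from 1:1 matched pairs.

    outcome: outcome value per respondent (e.g., a life-satisfaction score)
    pairs:   list of (treated_index, control_index) tuples from a matching step
    """
    return mean(outcome[t] - outcome[c] for t, c in pairs)


# Hypothetical well-being scores and matched pairs (not BHPS data).
swb = [6, 5, 7, 4, 5, 6]
pairs = [(0, 1), (2, 3), (5, 4)]
print(round(att_from_pairs(swb, pairs), 3))  # 1.667
```

Averaging within-pair differences compares each treated respondent only with a similar untreated one, which is how matching targets selection effects that a single mean-effects regression can blur; a real analysis would add standard errors and balance diagnostics.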

