differential item function
Recently Published Documents


TOTAL DOCUMENTS

27
(FIVE YEARS 10)

H-INDEX

5
(FIVE YEARS 2)

2021 ◽  
Vol 2098 (1) ◽  
pp. 012020
Author(s):  
F H Dewi ◽  
A Samsudin ◽  
D T Chandra

Abstract Fluid dynamics involves some complex and unobservable concepts; however, tests are rarely developed to measure students’ understanding of these concepts. This makes it difficult to map students’ mental models with existing instruments. To address this problem, this study aims to develop the FD-MT (fluid dynamic multi-tier test) as a diagnostic test for measuring students’ mental models of fluid dynamics concepts. The ADDIE (Analyzing, Designing, Developing, Implementing, Evaluating) model is the research method used in this study. The data collected with the FD-MT were analyzed using Rasch model analysis, including reliability, validity, item fit, and differential item function. Students’ mental models were classified as Scientific (Sc), Synthesis-A (Sy-A), Synthesis-B (Sy-B), Synthesis-C (Sy-C), Synthesis-D (Sy-D), and Initial (In). The participants were 20 eleventh-grade students (9 boys and 11 girls, aged 16-17 on average) at a high school in Bandung. The results show that students’ mental models were distributed as Sc (8.18%), Sy-A (3.18%), Sy-B (21.36%), Sy-C (19.09%), Sy-D (38.18%), and In (10%), with no incomplete answers (Nr = 0%). We conclude that the FD-MT is able to measure students’ mental models of fluid dynamics concepts.
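As a rough illustration of the item fit step named in this abstract, the following sketch computes the usual infit and outfit mean-square statistics for one item. It assumes a dichotomous Rasch model with person abilities and the item difficulty already estimated; the abstract does not say how the FD-MT tiers were scored, and the function names `rasch_p` and `item_fit` are hypothetical.

```python
import math

def rasch_p(theta, b):
    """Rasch probability of a correct answer for ability theta and difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def item_fit(responses, thetas, b):
    """Return (infit, outfit) mean squares for one dichotomous item."""
    sq_resid, variances, z_sq = [], [], []
    for x, theta in zip(responses, thetas):
        p = rasch_p(theta, b)
        var = p * (1.0 - p)               # model variance of the response
        sq_resid.append((x - p) ** 2)     # squared raw residual
        variances.append(var)
        z_sq.append((x - p) ** 2 / var)   # squared standardized residual
    infit = sum(sq_resid) / sum(variances)   # information-weighted mean square
    outfit = sum(z_sq) / len(z_sq)           # unweighted mean square
    return infit, outfit

# Hypothetical data: four persons located exactly at the item difficulty.
print(item_fit([1, 0, 1, 0], [0.0, 0.0, 0.0, 0.0], 0.0))
```

Values near 1 indicate responses about as noisy as the model expects; an unexpected response from a person far from the item difficulty inflates outfit in particular.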


Author(s):  
Anju Devianee Keetharuth ◽  
Jakob Bue Bjorner ◽  
Michael Barkham ◽  
John Browne ◽  
Tim Croudace ◽  
...  

Abstract Purpose: ReQoL-10 and ReQoL-20 have been developed for use as outcome measures with individuals aged 16 and over, experiencing mental health difficulties. This paper reports modelling results from the item response theory (IRT) analyses that were used for item reduction. Methods: From several stages of preparatory work including focus groups and a previous psychometric survey, a pool of items was developed. After confirming that the ReQoL item pool was sufficiently unidimensional for scoring, IRT model parameters were estimated using Samejima’s Graded Response Model (GRM). All 39 mental health items were evaluated with respect to item fit and differential item function regarding age, gender, ethnicity, and diagnosis. Scales were evaluated regarding overall measurement precision and known-groups validity (by care setting type and self-rating of overall mental health). Results: The study recruited 4266 participants with a wide range of mental health diagnoses from multiple settings. The IRT parameters demonstrated excellent coverage of the latent construct with the centres of item information functions ranging from −0.98 to 0.21 and with discrimination slope parameters from 1.4 to 3.6. We identified only two poorly fitting items and no evidence of differential item functioning of concern. Scales showed excellent measurement precision and known-groups validity. Conclusion: The results from the IRT analyses confirm the robust structure properties and internal construct validity of the ReQoL instruments. The strong psychometric evidence generated guided item selection for the final versions of the ReQoL measures.
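To make the model named in the Methods concrete, here is a minimal sketch of how Samejima’s Graded Response Model turns a slope and ordered thresholds into response-category probabilities. The slope 1.4 is taken from the low end of the reported range; the four thresholds and the function name `grm_category_probs` are hypothetical and not taken from the paper.

```python
import math

def grm_category_probs(theta, a, thresholds):
    """GRM probabilities for categories 0..K, given ordered thresholds b_1 < ... < b_K."""
    # Cumulative curves P(X >= k): 1 for k = 0, logistic in between, 0 past the top.
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cum += [0.0]
    # Each category probability is the difference of adjacent cumulative curves.
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]

# Hypothetical five-category item evaluated at theta = 0.
probs = grm_category_probs(0.0, 1.4, [-1.5, -0.5, 0.5, 1.5])
print([round(p, 3) for p in probs])
```

Because the thresholds are ordered, every category probability is positive and the probabilities sum to one at any value of theta.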


2020 ◽  
pp. 003329412092249
Author(s):  
Hae-Deok Song ◽  
Ah Jeong Hong ◽  
Yunseong Jo

Work engagement is considered the core factor that affects various outcomes at the organizational and individual levels including absenteeism, turnover rate, profitability, and productivity. Therefore, the concept is drawing substantial attention in the practical and academic fields. There have been several attempts to measure work engagement to enable its effective management. The Utrecht Work Engagement Scale-17 is a representative tool for measuring work engagement, which is used in several organizations worldwide. However, despite its popularity, the validity of the Utrecht Work Engagement Scale-17 is often questioned. Especially in Korea, the Utrecht Work Engagement Scale-17 is one of the most commonly utilized tools to measure work engagement, but there is limited psychometric evidence on its validity. Thus, the present study aimed to test the validity of the Utrecht Work Engagement Scale-17 in a Korean sample, using the Rasch measurement model to examine validity pertaining to different dimensions. The analysis of item fitness to test the content validity of the tool indicated that two of the items require reconsideration. Furthermore, the person-item map to test its substantive validity indicated that the Utrecht Work Engagement Scale-17 did not reflect the level of work engagement adequately in the Korean sample. The Rasch factor analysis conducted to test the structural validity of the tool indicated that the Utrecht Work Engagement Scale-17 comprises three subscales. Finally, the differential item function between male and female participants was examined to gather evidence on the generalizability aspect of the tool’s validity. Findings revealed that only 9 out of the 17 items expressed adequate differentiation between males and females.


2020 ◽  
Author(s):  
Sarah Bauermeister ◽  
John Gallacher

Abstract Background: Neuroticism has been described as a broad and pervasive personality dimension or ‘heterogeneous’ trait measuring components of mood instability such as worry; anxiety; irritability; moodiness; self-consciousness; sadness and irritability. Consistent with depression and anxiety-related disorders, increased neuroticism leaves an individual vulnerable to other unipolar and bipolar mood disorders. However, the measurement of neuroticism remains a challenge. Our aim was to identify psychometrically efficient items and inform the inclusion of redundant items across the 12-item EPQ-R Neuroticism scale using Item Response Theory (IRT). Methods: The 12-item binary EPQ-R Neuroticism scale was evaluated by estimating a two-parameter (2-PL) IRT model on data from 502,591 UK Biobank participants aged 37 to 73 years (M = 56.53 years; SD = 8.05), 54% female. Models were run listwise (n = 401,648) and post-estimation mathematical assumptions were computed. All analyses were conducted in STATA 16 SE on the Dementias Platform UK (DPUK) Data Portal. Results: A plot of θ values (Item Information functions) showed that most items clustered around the mid-range where discrimination values ranged from 1.34 to 2.28. Difficulty values for individual item θ scores ranged from −0.13 to 1.41. A Mokken analysis suggested a weak to medium level of monotonicity between the items, and no item reached strong scalability (H = 0.35-0.47). Systematic item deletions and rescaling found that a 7-item scale is more efficient, with information (discrimination) ranging from 1.56 to 2.57 and a stronger range of scalability (H = 0.47-0.52). A 3-item scale is highly discriminatory but offers a narrow range of person ability (difficulty). A logistic regression differential item function (DIF) analysis exposed significant gender item bias functioning uniformly across all versions of the scale.
Conclusions: Across 401,648 UK Biobank participants, the 12-item EPQ-R neuroticism scale exhibited psychometric inefficiency with poor discrimination at the extremes of the scale-range. High and low scores are relatively poorly represented and uninformative, suggesting that high neuroticism scores derived from the EPQ-R are a function of cumulative mid-range values. The scale also shows evidence of gender item bias, and future scale development should take this bias into account along with item deletions.
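The remark about poor discrimination at the extremes follows from the shape of the 2-PL item information function, I(θ) = a² P(θ) (1 − P(θ)), which peaks at θ = b with height a²/4 and decays quickly on either side. The sketch below is only an illustration: the two (a, b) pairs combine the endpoints of the reported ranges and are hypothetical, not estimated items from the paper.

```python
import math

def p_2pl(theta, a, b):
    """2-PL probability of endorsing an item with slope a and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2-PL item at ability theta."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

# Hypothetical (a, b) pairs built from the endpoints of the reported ranges.
items = [(1.34, -0.13), (2.28, 1.41)]
for theta in (-3.0, 0.0, 3.0):
    total = sum(item_information(theta, a, b) for a, b in items)
    print(f"theta={theta:+.1f}  summed information={total:.3f}")
```

When every difficulty sits in a narrow mid-range band, the summed information is small at both tails, which is the pattern the Conclusions describe.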


2019 ◽  
Vol 1315 ◽  
pp. 012036
Author(s):  
M Ardiyaningrum ◽  
L Badriah ◽  
Trisniawati ◽  
Suhartini ◽  
S A Widodo

2019 ◽  
Author(s):  
Sarah Bauermeister ◽  
John Gallacher

Abstract Background: Neuroticism has been described as a broad and pervasive personality dimension or ‘heterogeneous’ trait measuring components of mood instability such as worry; anxiety; irritability; moodiness; self-consciousness; sadness and irritability. Consistent with depression and anxiety-related disorders, increased neuroticism leaves an individual vulnerable to other unipolar and bipolar mood disorders. However, the measurement of neuroticism remains a challenge. Our aim was to identify psychometrically efficient items and inform the inclusion of redundant items across the 12-item EPQ-R Neuroticism scale using Item Response Theory (IRT). Methods: The 12-item binary EPQ-R Neuroticism scale was evaluated by estimating a two-parameter (2-PL) IRT model on data from 502,591 UK Biobank participants aged 37 to 73 years (M = 56.53 years; SD = 8.05), 54% female. Models were run listwise (n = 401,648) and post-estimation mathematical assumptions were computed. All analyses were conducted in STATA 16 SE on the Dementias Platform UK (DPUK) Data Portal. Results: A plot of θ values (Item Information functions) showed that most items clustered around the mid-range where discrimination values ranged from 1.34 to 2.28. Difficulty values for individual item θ scores ranged from −0.13 to 1.41. A Mokken analysis suggested a weak to medium level of monotonicity between the items, and no item reached strong scalability (H = 0.35-0.47). Systematic item deletions and rescaling found that a 7-item scale is more efficient, with information (discrimination) ranging from 1.56 to 2.57 and a stronger range of scalability (H = 0.47-0.52). A 3-item scale is highly discriminatory but offers a narrow range of person ability (difficulty). A logistic regression differential item function (DIF) analysis exposed significant gender item bias functioning uniformly across all versions of the scale.
Conclusions: Across 401,648 UK Biobank participants, the 12-item EPQ-R neuroticism scale exhibited psychometric inefficiency with poor discrimination at the extremes of the scale-range. High and low scores are relatively poorly represented and uninformative, suggesting that high neuroticism scores derived from the EPQ-R are a function of cumulative mid-range values. The scale also shows evidence of gender item bias, and future scale development should take this bias into account along with item deletions.
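For readers unfamiliar with logistic regression DIF, the idea can be sketched as follows: regress an item response on a matching score, then test whether adding group membership improves the fit (uniform DIF). This is not the authors’ STATA analysis. The data are simulated under a Rasch model, the rest score is an assumed matching variable, and every function name and parameter value here is hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def fit_logistic(X, y, iters=25):
    """Newton-Raphson logistic regression; returns (coefficients, log-likelihood)."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for row, yi in zip(X, y):
            p = sigmoid(sum(bj * xj for bj, xj in zip(beta, row)))
            for i in range(k):
                grad[i] += (yi - p) * row[i]
                for j in range(k):
                    hess[i][j] += p * (1.0 - p) * row[i] * row[j]
        beta = [bj + sj for bj, sj in zip(beta, solve(hess, grad))]
    loglik = 0.0
    for row, yi in zip(X, y):
        p = sigmoid(sum(bj * xj for bj, xj in zip(beta, row)))
        loglik += math.log(p) if yi else math.log(1.0 - p)
    return beta, loglik

def simulate(n=2000, n_items=10, dif_shift=0.8, seed=0):
    """Simulated Rasch responses; item 0 is easier for group 1 by dif_shift logits."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        group, theta = rng.randint(0, 1), rng.gauss(0.0, 1.0)
        resp = []
        for i in range(n_items):
            b = -1.0 + 2.0 * i / (n_items - 1)      # difficulties spread over [-1, 1]
            shift = dif_shift * group if i == 0 else 0.0
            resp.append(1 if rng.random() < sigmoid(theta - b + shift) else 0)
        rows.append((group, resp))
    return rows

def uniform_dif_test(rows, item):
    """Return (group coefficient, likelihood-ratio statistic) for one item."""
    y = [resp[item] for _, resp in rows]
    rest = [sum(resp) - resp[item] for _, resp in rows]   # matching variable
    mean_rest = sum(rest) / len(rest)
    X0 = [[1.0, r - mean_rest] for r in rest]             # score-only model
    X1 = [x + [float(g)] for x, (g, _) in zip(X0, rows)]  # score + group model
    _, ll0 = fit_logistic(X0, y)
    beta1, ll1 = fit_logistic(X1, y)
    return beta1[2], 2.0 * (ll1 - ll0)

coef, lr = uniform_dif_test(simulate(), item=0)
print(f"group coefficient={coef:.2f}, LR statistic={lr:.1f}")
```

The likelihood-ratio statistic is compared against a chi-square distribution with 1 degree of freedom (critical value 3.84 at the 0.05 level). Testing non-uniform DIF would add a score-by-group interaction term to the larger model.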


2019 ◽  
Vol 43 (2) ◽  
pp. 211-220 ◽  
Author(s):  
Jack A. Cerchiara ◽  
Kerry J. Kim ◽  
Eli Meir ◽  
Mary Pat Wenderoth ◽  
Jennifer H. Doherty

The basis for understanding neurophysiology is understanding ion movement across cell membranes. Students in introductory courses recognize ion concentration gradients as a driving force for ion movement but struggle to simultaneously account for electrical charge gradients. We developed a 17-item multiple-choice assessment of students’ understanding of electrochemical gradients and resistance in neurophysiology, the Electrochemical Gradients Assessment Device (EGAD). We investigated the internal evidence validity of the assessment by analyzing item characteristic curves of score probability and student ability for each question, and a Wright map of student scores and ability. We used linear mixed-effect regression to test student performance and ability. Our assessment discriminated students with average ability (weighted likelihood estimate: −2 to 1.5 Θ); however, it was not as effective at discriminating students at the highest ability (weighted likelihood estimate: >2 Θ). We determined the assessment could capture changes in both assessment scores (model r² = 0.51, P < 0.001, n = 444) and ability estimates (model r² = 0.47, P < 0.001, n = 444) after a simulation-based laboratory and course instruction for 222 students. Differential item function analysis determined that each item on the assessment performed equitably for all students, regardless of gender, race/ethnicity, or economic status. Overall, we found that men scored higher (r² = 0.51, P = 0.014, n = 444) and had higher ability scores (P = 0.003) on the EGAD assessment. Caucasian students of both genders were positively correlated with score (r² = 0.51, P < 0.001, n = 444) and ability (r² = 0.47, P < 0.001, n = 444). Based on the evidence gathered through our analyses, the scores obtained from the EGAD can distinguish between levels of content knowledge on neurophysiology principles for students in introductory physiology courses.


2019 ◽  
Vol 40 (4) ◽  
pp. 422-429 ◽  
Author(s):  
Guiping Liu ◽  
Alexander C. Peterson ◽  
Kevin Wing ◽  
Trafford Crump ◽  
Alastair Younger ◽  
...  

Background: Significant ankle arthritis results in functional limitations and patient morbidity. There is a need to measure symptoms and the impact of interventions on patients’ quality of life using valid and reliable patient-reported measurement instruments. The objective of this research was to validate the Ankle Osteoarthritis Scale instrument in the preoperative setting using factor analysis, item response theory, and differential item function methods. Methods: This research is based on secondary analysis of patients scheduled for ankle arthrodesis or total ankle replacement in Vancouver, Canada. Participants completed the instrument between September 2014 and August 2017. Item response theory was used to estimate item difficulty and discrimination parameters, controlling for study participants’ underlying level of ankle function. Differential item function was examined for sex, age group, and surgery. There were 88 participants. Results: Modification indices suggested that item 10, “walking around the house,” would better fit the pain domain rather than the disability domain. Items in the pain domain displayed a range of discrimination and difficulty. Items in the disability domain exhibited a range of discrimination, though the disability domain had low difficulty. Differential item functioning for sex, age group, and ankle arthrodesis or total ankle replacement appeared to be ignorable. Conclusion: This evaluation of the Ankle Osteoarthritis Scale found the instrument to be a strong measure of the effect of pain and dysfunction among patients with end-stage ankle arthritis, even when removing items 7 and 8, supporting its prior use in numerous clinical studies. Level of Evidence: Level II, prospective comparative study.

