Correction of Differentially Functioning Items: Basis for Maintaining and Enhancing Test Validity and Reliability

2016 ◽  
Vol 4 (1) ◽  
pp. 62
Author(s):  
Jose Q. Pedrajita

This study examined differentially functioning items in a Chemistry Achievement Test. It also examined the effect of eliminating differentially functioning items on the content validity, concurrent validity, and internal consistency reliability of the test. Test scores of 200 junior high school students matched on school type were subjected to Differential Item Functioning (DIF) analysis. One hundred students came from a public school, while the other 100 were private school examinees. A descriptive-comparative research design utilizing DIF analysis and validity and reliability analysis was employed. The Chi-Square, Distractor Response Analysis, Logistic Regression, and the Mantel-Haenszel Statistic were the methods used in the DIF analysis. A six-point scale ranging from inadequate to adequate was used to assess the content validity of the test. Pearson r was used in the concurrent validity analysis. The KR-20 formula was used to estimate the internal consistency reliability of the test. The findings revealed the presence of differentially functioning items between the public and private school examinees. The DIF methods differed in the number of differentially functioning items identified. However, there was a high degree of correspondence between the Logistic Regression and the Mantel-Haenszel Statistic. After the elimination of the differentially functioning items, the content validity, concurrent validity, and internal consistency reliability differed per DIF method used. The content validity of the test ranged from slightly adequate to moderately adequate in terms of the number of items retained. The concurrent validity of the test also differed, but all coefficients were positive and indicated a moderate relationship between the examinees’ test scores and their GPA in Science III. Likewise, the internal consistency reliability of the test differed. The more differentially functioning items were eliminated, the lower the content validity, concurrent validity, and internal consistency reliability of the test became. Elimination of differentially functioning items diminishes content validity, concurrent validity, and internal consistency reliability, but it could be used as a basis for enhancing all three by replacing the eliminated DIF items.
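As an illustration of one of the DIF methods named in this abstract, the Mantel-Haenszel common odds ratio can be sketched as follows. This is a minimal sketch, not the author's analysis: the function names, the tuple layout, and the toy counts are assumptions made for the example. Examinees are stratified by matched total score, and each stratum contributes a 2x2 table of group (reference or focal) by item response (correct or incorrect). The `mh_delta` helper applies the commonly used ETS delta transformation, which the abstract does not mention.

```python
import math


def mantel_haenszel_odds_ratio(strata):
    """Common odds ratio across matched score levels.

    strata: iterable of (a, b, c, d) counts per score level, where
      a = reference group, item correct
      b = reference group, item incorrect
      c = focal group, item correct
      d = focal group, item incorrect
    (This layout is a convention chosen for the sketch.)
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty score level carries no information
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0:
        raise ValueError("odds ratio is undefined for these counts")
    return numerator / denominator


def mh_delta(alpha_mh):
    """ETS delta scale: negative values indicate the item favors the reference group."""
    return -2.35 * math.log(alpha_mh)


# Hypothetical counts for one item at two score levels.
balanced = [(10, 10, 10, 10), (20, 5, 20, 5)]
print(mantel_haenszel_odds_ratio(balanced))
```

A ratio near 1 (delta near 0) suggests that matched examinees in both groups have similar odds of answering the item correctly, while values far from 1 would flag the item for DIF review.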

Assessment ◽  
2020 ◽  
pp. 107319112098176
Author(s):  
Cristina Espinosa da Silva ◽  
Heather A. Pines ◽  
Thomas L. Patterson ◽  
Shirley Semple ◽  
Alicia Harvey-Vera ◽  
...  

Shame may increase HIV risk among stigmatized populations. The Personal Feelings Questionnaire–2 (PFQ-2) measures shame, but has not been validated in Spanish-speaking or nonclinical stigmatized populations disproportionately affected by HIV in resource-limited settings. We examined the psychometric properties of the Spanish-translated PFQ-2 shame subscale among female sex workers in two Mexico–U.S. border cities. From 2016 to 2017, 602 HIV-negative female sex workers in Tijuana and Ciudad Juarez participated in an efficacy trial evaluating a behavior change maintenance intervention. Interviewer-administered surveys collected information on shame (10-item PFQ-2 subscale), psychosocial factors, and sociodemographics. Item performance, confirmatory factor analysis, internal consistency, differential item functioning by city, and concurrent validity were assessed. Response options were collapsed to 3-point responses to improve item performance, and one misfit item was removed. The revised 9-item shame subscale supported a single construct and had good internal consistency (Cronbach’s α = .86). Notable differential item functioning was found but resulted in a negligible effect on overall scores. Correlations between the revised shame subscale and guilt ( r = .79, p < .01), depression ( r = .69, p < .01), and emotional support ( r = −.28, p < .01) supported concurrent validity. The revised PFQ-2 shame subscale showed good reliability and concurrent validity in our sample, and should be explored in other stigmatized populations.
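For readers unfamiliar with the internal consistency coefficient reported here, Cronbach's alpha can be computed as sketched below. This is a minimal sketch under stated assumptions: the function name and the toy score matrix are hypothetical and are not the study's data. The formula is alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), where k is the number of items.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondent rows (one score per item)."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of the respondents' total scores.
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical 3-point responses from three respondents on three items.
toy = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
print(cronbach_alpha(toy))
```

When every item is scored 0 or 1, the same formula reduces to KR-20, provided the item and total-score variances are computed with the same denominator.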


2021 ◽  
pp. 073428292110105
Author(s):  
Semirhan Gökçe ◽  
Giray Berberoğlu ◽  
Craig S. Wells ◽  
Stephen G. Sireci

The 2015 Trends in International Mathematics and Science Study (TIMSS) involved 57 countries and 43 different languages to assess students’ achievement in mathematics and science. The purpose of this study is to evaluate whether items and test scores are affected as the differences between language families and cultures increase. Using differential item functioning (DIF) procedures, we compared the consistency of students’ performance across three combinations of languages and countries: (a) same language but different countries, (b) same countries but different languages, and (c) different languages and different countries. The analyses consisted of the detection of the number of DIF items for all paired comparisons within each condition, the direction of DIF, the magnitude of DIF, and the differences between test characteristic curves. As the countries became more distant with respect to cultures and language families, the presence of DIF increased. The magnitude of DIF was greatest when both language and country differed, and smallest when the languages were the same but the countries were different. Results suggest that when TIMSS results are compared across countries, the language- and country-specific differences, which could reflect cultural, curriculum, or other differences, should be considered.


2021 ◽  
pp. 003022282110162
Author(s):  
Adalberto Campo-Arias ◽  
Andrés Felipe Tirado-Otálvaro ◽  
Isabel Álvarez-Solorza ◽  
Carlos Arturo Cassiani-Miranda

The study aimed to assess the factor structure, internal consistency, gender differential item functioning, and discriminant validity of the Fear of COVID-5 Scale in emerging adult students of a university in Mexico. Confirmatory factor analysis, internal consistency (Cronbach's alpha and McDonald's omega), and gender differential item functioning (Kendall tau b correlation) were estimated. The Fear of COVID-5 Scale showed a one-dimensional structure (RMSEA = 0.07, CFI = 0.98, TLI = 0.96, and SRMR = 0.02), with high internal consistency (Cronbach's alpha of 0.78 and McDonald's omega of 0.81), no gender differential item functioning (Kendall tau b between 0.07 and 0.10), and significant discriminant validity (higher scores for fear of COVID-19 were observed in high clinical anxiety levels). In conclusion, the Fear of COVID-5 Scale presents a clear one-dimensional structure similar to that found in a previous study.


2001 ◽  
Vol 27 (2) ◽  
Author(s):  
Pieter Schaap

The objective of this article is to present the results of an investigation into the item and test characteristics of two tests of the Potential Index Batteries (PIB) in terms of differential item functioning (DIF) and the effect thereof on the test scores of different race groups. The English Vocabulary (Index 12) and Spelling Tests (Index 22) of the PIB were analysed for white, black and coloured South Africans. Item response theory (IRT) methods were used to identify items which function differentially for white, black and coloured race groups.


1993 ◽  
Vol 73 (3_part_1) ◽  
pp. 995-1004
Author(s):  
Jane L. Garthoeffner ◽  
Carolyn S. Henry ◽  
Linda C. Robinson

This study was designed to evaluate a modification of the Interpersonal Relationship Scale and to establish subscales representing dimensions of intimacy (N = 356). The initial self-report scale was tested for internal consistency reliability. Next, subscales were identified using principal components factoring with varimax rotation. Internal consistency reliability and concurrent validity of the modified over-all scale and subscales were examined. The modified scale and subscales provided reliable and valid measures of the quality of interpersonal relationships in young adults.


1984 ◽  
Vol 54 (2) ◽  
pp. 629-630 ◽  
Author(s):  
Kelly J. Grover ◽  
Lois A. Paff-Bergen ◽  
Candyce S. Russell ◽  
Walter R. Schumm

The Kansas Marital Satisfaction Scale was administered by survey to 51 wives between the ages of 32 and 71 yr. Further support for the internal consistency reliability of the scale (α = 0.92) was obtained, and patterns of differences between the item means paralleled previous research. Evidence was found for the concurrent validity of the scale, which correlated significantly with six of seven items from the satisfaction subscale of Spanier's Dyadic Adjustment Scale.


2022 ◽  
Vol 22 (1) ◽  
Author(s):  
Carolyn Ingram ◽  
Yanbing Chen ◽  
Conor Buggy ◽  
Vicky Downey ◽  
Mary Archibald ◽  
...  

Abstract Background: Despite widespread COVID-19 vaccination programs, there is an ongoing need for targeted disease prevention and control efforts in high-risk occupational settings. This study aimed to develop, pilot, and validate an instrument for surveying occupational COVID-19 infection prevention and control (IPC) measures available to workers in diverse geographic and occupational settings. Methods: A 44-item online survey was developed in English and validated for face and content validity according to literature review, expert consultation, and pre-testing. The survey was translated and piloted with 890 workers from diverse industries in Canada, Ireland, Argentina, Poland, Nigeria, China, the US, and the UK. Odds ratios generated from univariable and multivariable logistic regression assessed differences in ‘feeling protected at work’ according to gender, age, occupation, country of residence, professional role, and vaccination status. Exploratory factor analysis (EFA) was conducted, and internal consistency reliability verified with Cronbach’s alpha. Hypothesis testing using two-sample t-tests verified construct validity (i.e., discriminant validity, known-groups technique), and criterion validity. Results: After adjustment for occupational sector, characteristics associated with feeling protected at work included being male (AOR = 1.88; 95% CI = 1.18,2.99), being over 55 (AOR = 2.17; 95% CI = 1.25,3.77) and working in a managerial position (AOR = 3.1; 95% CI = 1.99,4.83). EFA revealed nine key IPC domains relating to: environmental adjustments, testing and surveillance, education, costs incurred, restricted movements, physical distancing, masking, isolation strategies, and areas for improvement. Each domain showed sufficient internal consistency reliability (Cronbach’s alpha ≥0.60). Hypothesis testing revealed differences in survey responses by country and occupational sector, confirming construct validity (p < 0.001), criterion validity (p = 0.04), and discriminant validity (p < 0.001). Conclusions: The online survey, developed in English to identify the COVID-19 protective measures used in diverse workplace settings, showed strong face validity, content validity, internal consistency, criterion validity, and construct validity. Translations in Chinese, Spanish, French, Polish, and Hindi demonstrated adaptability of the survey for use in international working environments. The multi-lingual tool can be used by decision makers in the distribution of IPC resources, and to guide occupational safety and health (OSH) recommendations for preventing COVID-19 and future infectious disease outbreaks.


2019 ◽  
Author(s):  
Chris B. Agala ◽  
Bruce J. Fried ◽  
James C. Thomas ◽  
Heidi W. Reynolds ◽  
Kristen Hassmiller Lich ◽  
...  

Abstract Background: Adherence to antiretroviral therapy is critical to the achievement of the third target of the UNAIDS Fast-Track Initiative goals of 2020-2030. Reliable, valid and accurate measurement of adherence is important for correct assessment of adherence and in predicting the efficacy of ART. The Simplified Medication Adherence Questionnaire is a six-item scale which assesses the perception of persons living with HIV about their adherence to ART. Despite recent widespread use, its measurement properties have yet to be carefully documented beyond the original study in Spain. The objective of this paper was to conduct internal consistency reliability, concurrent validity and measurement invariance tests for the SMAQ. Methods: HIV-positive women who were receiving ART services from 51 service providers in two sub-cities of Addis Ababa, Ethiopia completed the SMAQ in an HIV treatment referral network study between 2011 and 2012. Two cross-sections of 402 and 524 female patients of reproductive age, respectively, from the two sub-cities were randomly selected and interviewed at baseline and follow-up. We used Cronbach’s coefficient alpha (α) to assess internal consistency reliability, Pearson product-moment correlation (r) to assess concurrent validity and multiple-group confirmatory factor analysis to analyze factorial structure and measurement invariance of the SMAQ. Results: All participants were female with a mean age of 33 years (33.06-33.74; median: 34 years; range: 18-45 years). Cronbach’s alphas for the six items of the SMAQ were 0.66, 0.68, 0.75 and 0.75 for T1 control, T1 intervention, T2 control, and T2 intervention groups, respectively. Pearson correlation coefficients between T1 and T2 were 0.78, 0.49, 0.52, 0.48, 0.76 and 0.80 for items 1 to 6, respectively.
We found invariance for factor loadings, observed item intercepts and factor variances, also known as strong measurement invariance, when we compared latent adherence levels between and across patient-groups. Conclusions: Our results show that the six-item SMAQ scale has adequate reliability and validity indices for this sample, in addition to being invariant across comparison groups. The findings of this study strengthen the evidence in support of the increasing use of SMAQ by interventionists and researchers to examine, pool and compare adherence scores across groups and time periods.


2021 ◽  
Author(s):  
NAYARA RODRIGUES GOMES OLIVEIRA ◽  
CIBELLE MARTINS ROBERTO FORMIGA ◽  
BRUNA ABREU RAMOS ◽  
RAFAELA NOLETO DOS SANTOS ◽  
NAYARA NUBIA DE SOUSA MOREIRA ◽  
...  

Abstract Objectives: To verify the correlation between the Neonatal Infant Pain Scale (NIPS) and the Premature Infant Pain Profile – Revised (PIPP-R), the internal consistency of each scale, and the inter-rater reliability in the assessment of pain during the aspiration procedure in premature newborns. Methods: An observational, prospective study. Fifty infants who met the following inclusion criteria participated in the study: preterm newborns (NB) (GA > 26 weeks and < 36 weeks and five days) with low birth weight (< 2500 g), hemodynamically stable, with minimal sedation or without sedation, or on mechanical ventilation in CPAP, nasal O2 catheter, or room air, who needed the aspiration procedure during hospitalization. The newborns were evaluated during three different aspiration procedures: aspiration 1 (no intervention), aspiration 2 (use of gentle touch), and aspiration 3 (use of sucrose). Two evaluation instruments, NIPS and PIPP-R, were applied. Internal consistency was determined by Cronbach's alpha, inter-rater reliability by the intraclass correlation coefficient, and concurrent validity by the Spearman test. Results: Internal consistency was high for NIPS (r = 0.824) and moderate for PIPP-R (0.655). Inter-rater reliability was high in the three conditions, respectively: 0.991, 0.987, and 0.993 on the NIPS scale and 0.997, 0.986, and 0.977 on the PIPP-R scale. Concurrent validity was observed only in the first aspiration. Conclusion: The NIPS seems to have better clinical utility than the PIPP-R; however, the two scales showed good inter-rater reliability and internal consistency, making them a good choice for evaluating pain during the aspiration procedure.

