Correction of Differentially Functioning Items: Basis for Maintaining and Enhancing Test Validity and Reliability

This study investigated differentially functioning items in a Chemistry Achievement Test. It also examined the effect of eliminating such items on the test's content validity, concurrent validity, and internal consistency reliability. Test scores of 200 junior high school students matched on school type were subjected to Differential Item Functioning (DIF) analysis: 100 students came from a public school and 100 from a private school. A descriptive-comparative research design utilizing DIF analysis and validity and reliability analysis was employed. The Chi-Square, Distractor Response Analysis, Logistic Regression, and Mantel-Haenszel Statistic were the methods used in the DIF analysis. A six-point scale ranging from inadequate to adequate was used to assess the content validity of the test. Pearson r was used in the concurrent validity analysis, and the KR-20 formula was used to estimate internal consistency reliability. The findings revealed the presence of differentially functioning items between the public and private school examinees. The DIF methods differed in the number of differentially functioning items identified; however, there was a high degree of correspondence between Logistic Regression and the Mantel-Haenszel Statistic. After the differentially functioning items were eliminated, the content validity, concurrent validity, and internal consistency reliability differed by the DIF method used. Content validity ranged from slightly adequate to moderately adequate, depending on the number of items retained. Concurrent validity also differed, but all coefficients were positive and indicated a moderate relationship between the examinees’ test scores and their GPA in Science III. Likewise, the internal consistency reliability of the test differed. The more differentially functioning items were eliminated, the lower the content validity, concurrent validity, and internal consistency reliability of the test became. Eliminating differentially functioning items thus diminishes content and concurrent validity and internal consistency reliability, but the results could be used as a basis for enhancing all three by replacing the eliminated DIF items.
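
As a rough illustration of one of the DIF methods named in this abstract, the sketch below computes the Mantel-Haenszel common odds ratio for a single item, stratifying examinees by total test score. It is a minimal sketch, not the study's procedure: the function names, the `expand` helper, the choice of which group is the reference group, and all counts are assumptions made for illustration.

```python
import math
from collections import defaultdict


def mantel_haenszel_odds_ratio(records):
    """Common odds ratio for one item across matched score strata.

    records: iterable of (group, total_score, correct) tuples, where
    group is "ref" or "focal" and correct is 1 (right) or 0 (wrong).
    """
    # One 2x2 table per stratum: [correct, incorrect] counts per group.
    tables = defaultdict(lambda: {"ref": [0, 0], "focal": [0, 0]})
    for group, total_score, correct in records:
        tables[total_score][group][0 if correct else 1] += 1

    numerator = denominator = 0.0
    for table in tables.values():
        a, b = table["ref"]    # reference group: correct, incorrect
        c, d = table["focal"]  # focal group: correct, incorrect
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0:
        raise ValueError("odds ratio undefined: no discordant cells")
    return numerator / denominator


def mh_delta(odds_ratio):
    """ETS delta metric (MH D-DIF); negative values favor the reference group."""
    return -2.35 * math.log(odds_ratio)


def expand(group, total_score, n_correct, n_incorrect):
    """Hypothetical helper: turn cell counts into individual records."""
    return ([(group, total_score, 1)] * n_correct
            + [(group, total_score, 0)] * n_incorrect)


# Invented counts for a single item, two total-score strata.
records = (expand("ref", 10, 8, 2) + expand("focal", 10, 5, 5)
           + expand("ref", 20, 9, 1) + expand("focal", 20, 6, 4))
alpha_mh = mantel_haenszel_odds_ratio(records)
print(alpha_mh, mh_delta(alpha_mh))
```

An odds ratio near 1 (delta near 0) suggests the item behaves similarly for both groups once ability is matched; a full analysis would also attach a significance test to each item.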

Psychometric Evaluation of the Personal Feelings Questionnaire–2 (PFQ-2) Shame Subscale Among Spanish-Speaking Female Sex Workers in Mexico

Assessment, DOI 10.1177/1073191120981768, 2020, pp. 107319112098176

Author(s): Cristina Espinosa da Silva, Heather A. Pines, Thomas L. Patterson, Shirley Semple, Alicia Harvey-Vera, ...

Keyword(s): Differential Item Functioning, Internal Consistency, Sex Workers, Concurrent Validity, Female Sex Workers, Good Reliability, Response Options, Spanish Speaking, Female Sex, Item Functioning

Shame may increase HIV risk among stigmatized populations. The Personal Feelings Questionnaire–2 (PFQ-2) measures shame, but has not been validated in Spanish-speaking or nonclinical stigmatized populations disproportionately affected by HIV in resource-limited settings. We examined the psychometric properties of the Spanish-translated PFQ-2 shame subscale among female sex workers in two Mexico–U.S. border cities. From 2016 to 2017, 602 HIV-negative female sex workers in Tijuana and Ciudad Juarez participated in an efficacy trial evaluating a behavior change maintenance intervention. Interviewer-administered surveys collected information on shame (10-item PFQ-2 subscale), psychosocial factors, and sociodemographics. Item performance, confirmatory factor analysis, internal consistency, differential item functioning by city, and concurrent validity were assessed. Response options were collapsed to 3-point responses to improve item performance, and one misfit item was removed. The revised 9-item shame subscale supported a single construct and had good internal consistency (Cronbach’s α = .86). Notable differential item functioning was found but resulted in a negligible effect on overall scores. Correlations between the revised shame subscale and guilt (r = .79, p < .01), depression (r = .69, p < .01), and emotional support (r = −.28, p < .01) supported concurrent validity. The revised PFQ-2 shame subscale showed good reliability and concurrent validity in our sample, and should be explored in other stigmatized populations.
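
The internal consistency figure reported here is Cronbach's alpha. As a minimal sketch of how that coefficient is computed, the example below uses invented 3-point responses (0, 1, 2), not data from the study; the function name and the data layout (one list of item scores per respondent) are assumptions.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents' item-score lists."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item across respondents (columns of the matrix).
    item_variances = [variance(item) for item in zip(*responses)]
    # Variance of each respondent's total score.
    total_variance = variance([sum(person) for person in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Invented 3-point responses (0, 1, 2) from five respondents on four items.
responses = [
    [0, 0, 1, 0],
    [1, 1, 1, 0],
    [1, 2, 1, 1],
    [2, 2, 2, 1],
    [2, 2, 2, 2],
]
print(round(cronbach_alpha(responses), 3))
```

Alpha rises when items covary strongly relative to their individual variances, which is why removing a misfitting item can leave the coefficient high even with fewer items.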

Linguistic Distance and Translation Differential Item Functioning on Trends in International Mathematics and Science Study Mathematics Assessment Items

Journal of Psychoeducational Assessment, DOI 10.1177/07342829211010537, 2021, pp. 073428292110105

Author(s): Semirhan Gökçe, Giray Berberoğlu, Craig S. Wells, Stephen G. Sireci

Keyword(s): Differential Item Functioning, Test Scores, Mathematics Assessment, Paired Comparisons, Science Study, Linguistic Distance, Test Characteristic, Item Functioning, Mathematics And Science, Country Specific

The 2015 Trends in International Mathematics and Science Study (TIMSS) involved 57 countries and 43 different languages to assess students’ achievement in mathematics and science. The purpose of this study is to evaluate whether items and test scores are affected as the differences between language families and cultures increase. Using differential item functioning (DIF) procedures, we compared the consistency of students’ performance across three combinations of languages and countries: (a) same language but different countries, (b) same countries but different languages, and (c) different languages and different countries. The analyses consisted of the detection of the number of DIF items for all paired comparisons within each condition, the direction of DIF, the magnitude of DIF, and the differences between test characteristic curves. As the countries became more distant with respect to cultures and language families, the presence of DIF increased. The magnitude of DIF was greatest when both language and country differed, and smallest when the languages were the same but the countries were different. Results suggest that when TIMSS results are compared across countries, the language- and country-specific differences, which could reflect cultural, curriculum, or other differences, should be considered.

Confirmatory Factor Analysis, Internal Consistency, Gender Differential Item Functioning and Discriminant Validity of the Fear of COVID-5 Scale Amidst Emerging Adult University Students in Mexico

OMEGA - Journal of Death and Dying, DOI 10.1177/00302228211016216, 2021, pp. 003022282110162

Author(s): Adalberto Campo-Arias, Andrés Felipe Tirado-Otálvaro, Isabel Álvarez-Solorza, Carlos Arturo Cassiani-Miranda

Keyword(s): Factor Analysis, Confirmatory Factor Analysis, Differential Item Functioning, Internal Consistency, Discriminant Validity, Emerging Adult, One Dimension, Item Functioning, Confirmatory Factor, Gender Differential

The study aimed to assess the factor structure, internal consistency, gender differential item functioning, and discriminant validity of the Fear of COVID-5 Scale among emerging adult students at a university in Mexico. Confirmatory factor analysis, internal consistency (Cronbach's alpha and McDonald's omega), and gender differential item functioning (Kendall tau b correlation) were estimated. The Fear of COVID-5 Scale showed a one-dimension structure (RMSEA = 0.07, CFI = 0.98, TLI = 0.96, and SRMR = 0.02), with high internal consistency (Cronbach's alpha of 0.78 and McDonald's omega of 0.81), no gender differential item functioning (Kendall tau b between 0.07 and 0.10), and significant discriminant validity (higher fear of COVID-19 scores were observed at high clinical anxiety levels). In conclusion, the Fear of COVID-5 Scale presents a clear one-dimension structure similar to that found in a previous study.
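
The abstract does not spell out how the Kendall tau b coefficient was applied. One plausible reading, assumed here, is that a binary gender code is correlated with each item score, with values near zero taken as no gender DIF. The sketch below implements tau b with the usual tie correction; the function name and the data are invented for illustration.

```python
import math
from itertools import combinations


def kendall_tau_b(x, y):
    """Kendall's tau-b between two equal-length sequences, correcting for ties."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = tied_x = tied_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0:
            tied_x += 1          # pair tied on x (possibly on y as well)
        if dy == 0:
            tied_y += 1          # pair tied on y (possibly on x as well)
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    denominator = math.sqrt((n_pairs - tied_x) * (n_pairs - tied_y))
    if denominator == 0:
        raise ValueError("tau-b undefined: one variable is constant")
    return (concordant - discordant) / denominator


# Invented data: gender coded 0/1 and scores on one 4-point item.
gender = [0, 0, 0, 0, 1, 1, 1, 1]
item_score = [1, 2, 2, 3, 1, 2, 3, 3]
print(round(kendall_tau_b(gender, item_score), 3))
```

The quadratic pair loop is fine for illustration; for real samples a library routine such as SciPy's `kendalltau` would be the usual choice.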

Measuring Upper Limb Capacity in Poststroke Patients: Development, Fit of the Monotone Homogeneity Model, Unidimensionality, Fit of the Double Monotonicity Model, Differential Item Functioning, Internal Consistency, and Feasibility of the Stroke Upper Limb Capacity Scale, SULCS

Archives of Physical Medicine and Rehabilitation, DOI 10.1016/j.apmr.2010.10.034, 2011, Vol 92 (2), pp. 214-227, Cited By ~ 26

Author(s): Leo D. Roorda, Annemieke Houwink, Wendy Smits, Ivo W. Molenaar, Alexander C. Geurts

Keyword(s): Differential Item Functioning, Internal Consistency, Upper Limb, Item Functioning, Double Monotonicity

Determining differential item functioning and its effect on the test scores of selected pib indexes, using item response theory techniques

SA Journal of Industrial Psychology, DOI 10.4102/sajip.v27i2.783, 2001, Vol 27 (2)

Author(s): Pieter Schaap

Keyword(s): Item Response Theory, Differential Item Functioning, Item Response, Test Scores, Response Theory, South Africans, Test Characteristics, Potential Index, Item Functioning

The objective of this article is to present the results of an investigation into the item and test characteristics of two tests of the Potential Index Batteries (PIB) in terms of differential item functioning (DIF) and the effect thereof on test scores of different race groups. The English Vocabulary (Index 12) and Spelling Tests (Index 22) of the PIB were analysed for white, black and coloured South Africans. Item response theory (IRT) methods were used to identify items which function differentially for white, black and coloured race groups.

The Modified Interpersonal Relationship Scale: Reliability and Validity

Psychological Reports, DOI 10.1177/00332941930733pt141, 1993, Vol 73 (3_part_1), pp. 995-1004

Author(s): Jane L. Garthoeffner, Carolyn S. Henry, Linda C. Robinson

Keyword(s): Interpersonal Relationships, Internal Consistency, Concurrent Validity, Interpersonal Relationship, Reliability And Validity, Internal Consistency Reliability, Self Report, Varimax Rotation, Scale Reliability

This study was designed to evaluate a modification of the Interpersonal Relationship Scale and to establish subscales representing dimensions of intimacy (N = 356). The initial self-report scale was tested for internal consistency reliability. Next, subscales were identified using principal components factoring with varimax rotation. Internal consistency reliability and concurrent validity of the modified over-all scale and subscales were examined. The modified scale and subscales provided reliable and valid measures of the quality of interpersonal relationships in young adults.

The Kansas Marital Satisfaction Scale: A Further Brief Report

Psychological Reports, DOI 10.2466/pr0.1984.54.2.629, 1984, Vol 54 (2), pp. 629-630, Cited By ~ 21

Author(s): Kelly J. Grover, Lois A. Paff-Bergen, Candyce S. Russell, Walter R. Schumm

Keyword(s): Marital Satisfaction, Internal Consistency, Concurrent Validity, Research Evidence, Internal Consistency Reliability, Dyadic Adjustment, Satisfaction Scale, Dyadic Adjustment Scale, Satisfaction Subscale

The Kansas Marital Satisfaction Scale was administered by survey to 51 wives between the ages of 32 and 71 yr. Further support for the internal consistency reliability of the scale (α = 0.92) was obtained, and patterns of differences between the item means paralleled previous research. Evidence was found for the concurrent validity of the scale, which correlated significantly with six of seven items from the satisfaction subscale of Spanier's Dyadic Adjustment Scale.

Development and validation of a multi-lingual online questionnaire for surveying the COVID-19 prevention and control measures used in global workplaces

BMC Public Health, DOI 10.1186/s12889-022-12500-w, 2022, Vol 22 (1)

Author(s): Carolyn Ingram, Yanbing Chen, Conor Buggy, Vicky Downey, Mary Archibald, ...

Keyword(s): Construct Validity, Internal Consistency, Content Validity, Prevention And Control, Online Survey, Discriminant Validity, Internal Consistency Reliability, Criterion Validity, And Control, Occupational Settings

Abstract Background: Despite widespread COVID-19 vaccination programs, there is an ongoing need for targeted disease prevention and control efforts in high-risk occupational settings. This study aimed to develop, pilot, and validate an instrument for surveying occupational COVID-19 infection prevention and control (IPC) measures available to workers in diverse geographic and occupational settings. Methods: A 44-item online survey was developed in English and validated for face and content validity according to literature review, expert consultation, and pre-testing. The survey was translated and piloted with 890 workers from diverse industries in Canada, Ireland, Argentina, Poland, Nigeria, China, the US, and the UK. Odds ratios generated from univariable and multivariable logistic regression assessed differences in ‘feeling protected at work’ according to gender, age, occupation, country of residence, professional role, and vaccination status. Exploratory factor analysis (EFA) was conducted, and internal consistency reliability was verified with Cronbach’s alpha. Hypothesis testing using two-sample t-tests verified construct validity (i.e., discriminant validity, known-groups technique) and criterion validity. Results: After adjustment for occupational sector, characteristics associated with feeling protected at work included being male (AOR = 1.88; 95% CI = 1.18, 2.99), being over 55 (AOR = 2.17; 95% CI = 1.25, 3.77), and working in a managerial position (AOR = 3.1; 95% CI = 1.99, 4.83). EFA revealed nine key IPC domains relating to: environmental adjustments, testing and surveillance, education, costs incurred, restricted movements, physical distancing, masking, isolation strategies, and areas for improvement. Each domain showed sufficient internal consistency reliability (Cronbach’s alpha ≥ 0.60). Hypothesis testing revealed differences in survey responses by country and occupational sector, confirming construct validity (p < 0.001), criterion validity (p = 0.04), and discriminant validity (p < 0.001). Conclusions: The online survey, developed in English to identify the COVID-19 protective measures used in diverse workplace settings, showed strong face validity, content validity, internal consistency, criterion validity, and construct validity. Translations into Chinese, Spanish, French, Polish, and Hindi demonstrated adaptability of the survey for use in international working environments. The multi-lingual tool can be used by decision makers in the distribution of IPC resources, and to guide occupational safety and health (OSH) recommendations for preventing COVID-19 and future infectious disease outbreaks.
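
The adjusted odds ratios (AOR) above come from multivariable logistic regression, which needs a fitted model. As a simpler point of reference, the sketch below computes an unadjusted odds ratio with a Wald 95% confidence interval from a single 2x2 table. This is a swapped-in simplification, not the study's method; the function name and the counts are invented.

```python
import math


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald confidence interval for a 2x2 table.

    a, b: group 1 counts with / without the outcome
    c, d: group 2 counts with / without the outcome
    z:    normal quantile (1.96 for a 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of the log odds ratio.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Invented counts: managers vs. non-managers who do / do not feel protected.
estimate, lower, upper = odds_ratio_with_ci(30, 20, 20, 30)
print(round(estimate, 2), round(lower, 2), round(upper, 2))
```

An interval that excludes 1 corresponds to a statistically significant association at the chosen level, which is how the reported intervals can be read.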

Reliability, validity and invariance of the Simplified Medication Adherence Questionnaire (SMAQ) among HIV-positive women in Ethiopia: a quasi-experimental study

DOI 10.21203/rs.2.12427/v1, 2019

Author(s): Chris B. Agala, Bruce J. Fried, James C. Thomas, Heidi W. Reynolds, Kristen Hassmiller Lich, ...

Keyword(s): Medication Adherence, Measurement Invariance, Internal Consistency, Concurrent Validity, Internal Consistency Reliability, Measurement Properties, Hiv Positive, Hiv Positive Women, Persons Living With Hiv, Validity Indices

Abstract Background: Adherence to antiretroviral therapy is critical to the achievement of the third target of the UNAIDS Fast-Track Initiative goals of 2020-2030. Reliable, valid and accurate measurement of adherence is important for correct assessment of adherence and in predicting the efficacy of ART. The Simplified Medication Adherence Questionnaire is a six-item scale which assesses the perception of persons living with HIV about their adherence to ART. Despite recent widespread use, its measurement properties have yet to be carefully documented beyond the original study in Spain. The objective of this paper was to conduct internal consistency reliability, concurrent validity and measurement invariance tests for the SMAQ. Methods: HIV-positive women who were receiving ART services from 51 service providers in two sub-cities of Addis Ababa, Ethiopia completed the SMAQ in an HIV treatment referral network study between 2011 and 2012. Two cross-sections of 402 and 524 female patients of reproductive age, respectively, from the two sub-cities were randomly selected and interviewed at baseline and follow-up. We used Cronbach’s coefficient alpha (α) to assess internal consistency reliability, Pearson product-moment correlation (r) to assess concurrent validity and multiple-group confirmatory factor analysis to analyze factorial structure and measurement invariance of the SMAQ. Results: All participants were female with a mean age of 33 (33.06-33.74; median: 34 years; range: 18-45 years). Cronbach’s alphas for the six items of the SMAQ were 0.66, 0.68, 0.75 and 0.75 for T1 control, T1 intervention, T2 control, and T2 intervention groups, respectively. Pearson correlation coefficients between T1 and T2 were 0.78, 0.49, 0.52, 0.48, 0.76 and 0.80 for items 1 to 6, respectively. We found invariance for factor loadings, observed item intercepts and factor variances, also known as strong measurement invariance, when we compared latent adherence levels between and across patient groups. Conclusions: Our results show that the six-item SMAQ scale has adequate reliability and validity indices for this sample, in addition to being invariant across comparison groups. The findings of this study strengthen the evidence in support of the increasing use of SMAQ by interventionists and researchers to examine, pool and compare adherence scores across groups and time periods.
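
The concurrent validity coefficients reported here are Pearson product-moment correlations. A minimal sketch of that statistic follows, written out from its definition; the function name and the paired scores are invented and do not reproduce the study's data.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two paired sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares around the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Invented paired scores on one item at two time points.
scores_t1 = [1, 0, 1, 1, 0, 1, 0, 1]
scores_t2 = [1, 0, 1, 0, 0, 1, 1, 1]
print(round(pearson_r(scores_t1, scores_t2), 3))
```

On Python 3.10 or later, `statistics.correlation` computes the same coefficient.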

Correlation Between the Ratio PIPP-R and NIPS and Inter-Rater Reliability to Evaluate Pain During the Procedure Suction in Newborn Premature

DOI 10.21203/rs.3.rs-956847/v1, 2021

Author(s): NAYARA RODRIGUES GOMES OLIVEIRA, CIBELLE MARTINS ROBERTO FORMIGA, BRUNA ABREU RAMOS, RAFAELA NOLETO DOS SANTOS, NAYARA NUBIA DE SOUSA MOREIRA, ...

Keyword(s): Internal Consistency, Concurrent Validity, Internal Consistency Reliability, Pain Scale, Good Choice, Test Results, Good Reliability, Infant Pain, Spearman Test, Premature Infant Pain Profile

Abstract Objectives: To verify the correlation between the Neonatal Infant Pain Scale (NIPS) and the Premature Infant Pain Profile – Revised (PIPP-R), determine the internal consistency of each scale, and assess inter-rater reliability in the assessment of pain during the aspiration procedure in premature newborns. Methods: An observational, prospective study. Fifty infants participated who met the following inclusion criteria: preterm newborn (GA > 26 weeks and < 36 weeks and five days) with low birth weight (< 2500 g), hemodynamically stable, with minimal or no sedation, on mechanical ventilation, CPAP, a nasal O2 catheter, or room air, and requiring the aspiration procedure during hospitalization. The newborns were evaluated during three different aspiration procedures: aspiration 1 (no intervention), aspiration 2 (use of gentle touch), and aspiration 3 (use of sucrose). Two evaluation instruments, the NIPS and the PIPP-R, were applied. Internal consistency was determined by Cronbach's alpha, inter-rater reliability by the intraclass correlation coefficient, and concurrent validity by the Spearman test. Results: Internal consistency was high for the NIPS (r = 0.824) and moderate for the PIPP-R (0.655). Inter-rater reliability was high in the three conditions, respectively: 0.991, 0.987, and 0.993 on the NIPS scale and 0.997, 0.986, and 0.977 on the PIPP-R scale. Concurrent validity was observed only in the first aspiration. Conclusion: The NIPS seems to have better clinical utility than the PIPP-R; however, both scales showed good inter-rater reliability and internal consistency, making them a good choice for evaluating pain during the aspiration procedure.
