A psychometric evaluation of the 12-item EPQ-R neuroticism scale in 502,591 UK Biobank participants using item response theory (IRT)

2020 ◽  
Author(s):  
Sarah Bauermeister ◽  
John Gallacher

Background: Neuroticism has been described as a broad and pervasive personality dimension or ‘heterogeneous’ trait measuring components of mood instability such as worry, anxiety, irritability, moodiness, self-consciousness, and sadness. Consistent with depression and anxiety-related disorders, increased neuroticism leaves an individual vulnerable to other unipolar and bipolar mood disorders. However, the measurement of neuroticism remains a challenge. Our aim was to identify psychometrically efficient items and inform decisions on whether to retain redundant items across the 12-item EPQ-R Neuroticism scale using Item Response Theory (IRT). Methods: The 12-item binary EPQ-R Neuroticism scale was evaluated by estimating a two-parameter logistic (2-PL) IRT model on data from 502,591 UK Biobank participants aged 37 to 73 years (M = 56.53 years; SD = 8.05), 54% female. Models were run listwise (n = 401,648), and the model's mathematical assumptions were checked post-estimation. All analyses were conducted in STATA 16 SE on the Dementias Platform UK (DPUK) Data Portal. Results: A plot of θ values (Item Information functions) showed that most items clustered around the mid-range, where discrimination values ranged from 1.34 to 2.28. Difficulty values for individual item θ scores ranged from −0.13 to 1.41. A Mokken analysis suggested a weak to medium level of monotonicity between the items; no item reached strong scalability (H = 0.35-0.47). Systematic item deletions and rescaling found that a 7-item scale is more efficient, with information (discrimination) ranging from 1.56 to 2.57 and a stronger range of scalability (H = 0.47-0.52). A 3-item scale is highly discriminatory but offers a narrow range of person ability (difficulty). A logistic regression differential item functioning (DIF) analysis revealed significant gender item bias, operating uniformly across all versions of the scale.
Conclusions: Across 401,648 UK Biobank participants, the 12-item EPQ-R neuroticism scale exhibited psychometric inefficiency, with poor discrimination at the extremes of the scale range. High and low scores are relatively poorly represented and uninformative, suggesting that high neuroticism scores derived from the EPQ-R are a function of cumulative mid-range values. The scale also shows evidence of gender item bias, and future scale development should take this bias into account along with item deletions.
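The claim that information is concentrated in the mid-range follows from the shape of the 2-PL model. A minimal Python sketch of the item response function and item information is given below. The two (a, b) pairs are only the endpoints of the discrimination and difficulty ranges reported in the abstract, paired arbitrarily for illustration; they are not the fitted parameters of any particular EPQ-R item, and the function names are hypothetical.

```python
import math

def p_2pl(theta, a, b):
    """Probability of endorsing a binary item under the 2-PL model.

    a is the discrimination, b the difficulty, theta the latent trait.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2-PL item at trait level theta."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

# Illustrative (a, b) pairs only: endpoints of the reported ranges,
# not estimates for any specific item.
items = [(1.34, -0.13), (2.28, 1.41)]

for theta in (-3.0, 0.0, 3.0):
    total = sum(item_information(theta, a, b) for a, b in items)
    print(f"theta={theta:+.1f}  summed information={total:.3f}")
```

Each item's information peaks at theta = b with height a²/4 and decays on either side, so a scale whose difficulties all sit between roughly 0 and 1.5 carries little information about respondents far below that band.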

2019 ◽  
Vol 40 (4) ◽  
pp. 422-429 ◽  
Author(s):  
Guiping Liu ◽  
Alexander C. Peterson ◽  
Kevin Wing ◽  
Trafford Crump ◽  
Alastair Younger ◽  
...  

Background: Significant ankle arthritis results in functional limitations and patient morbidity. There is a need to measure symptoms and the impact of interventions on patients’ quality of life using valid and reliable patient-reported measurement instruments. The objective of this research was to validate the Ankle Osteoarthritis Scale instrument in the preoperative setting using factor analysis, item response theory, and differential item functioning methods. Methods: This research is based on secondary analysis of patients scheduled for ankle arthrodesis or total ankle replacement in Vancouver, Canada. Participants completed the instrument between September 2014 and August 2017. Item response theory was used to estimate item difficulty and discrimination parameters, controlling for study participants’ underlying level of ankle function. Differential item functioning was examined for sex, age group, and surgery. There were 88 participants. Results: Modification indices suggested that item 10, “walking around the house,” would fit the pain domain better than the disability domain. Items in the pain domain displayed a range of discrimination and difficulty. Items in the disability domain exhibited a range of discrimination, though the disability domain had low difficulty. Differential item functioning for sex, age group, and ankle arthrodesis or total ankle replacement appeared to be ignorable. Conclusion: This evaluation of the Ankle Osteoarthritis Scale found the instrument to be a strong measure of the effect of pain and dysfunction among patients with end-stage ankle arthritis, even when removing items 7 and 8, supporting its prior use in numerous clinical studies. Level of Evidence: Level II, prospective comparative study.
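The abstract does not say which DIF procedure was used. As one common option, a uniform-DIF check can be sketched with logistic regression: regress the item response on a matching variable plus a group indicator and inspect the group coefficient. The Python sketch below runs on simulated data, not the study data. All names and parameter values are hypothetical, and the simulated latent trait stands in for the matching variable (in practice usually a total or rest score).

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def fit_logistic(X, y, iters=25):
    """Newton-Raphson fit of a logistic regression; rows of X include the intercept."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for row, yi in zip(X, y):
            z = sum(b * x for b, x in zip(beta, row))
            p = 1.0 / (1.0 + math.exp(-z))
            w = p * (1.0 - p)
            for i in range(k):
                grad[i] += (yi - p) * row[i]
                for j in range(k):
                    hess[i][j] += w * row[i] * row[j]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    return beta

def group_effect(dif, n=4000, seed=0):
    """Simulate one binary item with a group shift `dif` (log-odds) and
    return the fitted group coefficient. Zero means no uniform DIF."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        match = rng.gauss(0.0, 1.0)   # stand-in for the matching score
        group = rng.randint(0, 1)     # hypothetical two-group indicator
        z = -0.2 + 1.5 * match + dif * group
        p = 1.0 / (1.0 + math.exp(-z))
        y.append(1 if rng.random() < p else 0)
        X.append([1.0, match, float(group)])
    return fit_logistic(X, y)[2]

print("group coefficient, no DIF simulated:  ", round(group_effect(0.0), 3))
print("group coefficient, DIF of 0.8 simulated:", round(group_effect(0.8), 3))
```

A group coefficient near zero means that respondents matched on the trait endorse the item at similar rates in both groups. A full analysis would add a likelihood-ratio or Wald test, and an interaction term to check for non-uniform DIF.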


Author(s):  
Anju Devianee Keetharuth ◽  
Jakob Bue Bjorner ◽  
Michael Barkham ◽  
John Browne ◽  
Tim Croudace ◽  
...  

Purpose: ReQoL-10 and ReQoL-20 have been developed for use as outcome measures with individuals aged 16 and over who are experiencing mental health difficulties. This paper reports modelling results from the item response theory (IRT) analyses that were used for item reduction. Methods: A pool of items was developed through several stages of preparatory work, including focus groups and a previous psychometric survey. After confirming that the ReQoL item pool was sufficiently unidimensional for scoring, IRT model parameters were estimated using Samejima’s Graded Response Model (GRM). All 39 mental health items were evaluated with respect to item fit and differential item functioning by age, gender, ethnicity, and diagnosis. Scales were evaluated regarding overall measurement precision and known-groups validity (by care setting type and self-rating of overall mental health). Results: The study recruited 4266 participants with a wide range of mental health diagnoses from multiple settings. The IRT parameters demonstrated excellent coverage of the latent construct, with the centres of item information functions ranging from −0.98 to 0.21 and discrimination slope parameters from 1.4 to 3.6. We identified only two poorly fitting items and no evidence of differential item functioning of concern. Scales showed excellent measurement precision and known-groups validity. Conclusion: The results from the IRT analyses confirm the robust structural properties and internal construct validity of the ReQoL instruments. The strong psychometric evidence generated guided item selection for the final versions of the ReQoL measures.
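For readers unfamiliar with Samejima's GRM, the sketch below shows how category probabilities for one ordered-response item are obtained from cumulative 2-PL-style curves. The slope and thresholds are hypothetical values chosen for illustration (the slope merely falls inside the 1.4 to 3.6 range reported above). They are not ReQoL estimates.

```python
import math

def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one item under Samejima's graded response model.

    thresholds must be increasing (b_1 < ... < b_{K-1}); the result has K entries,
    one per ordered response category.
    """
    # Cumulative probabilities P(X >= k), padded with P(X >= 0) = 1 and P(X >= K) = 0.
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cum += [0.0]
    # Each category probability is the difference of adjacent cumulative curves.
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]

# Hypothetical item: slope 2.0, four thresholds, hence five response categories.
a = 2.0
thresholds = [-1.5, -0.5, 0.5, 1.5]

for theta in (-2.0, 0.0, 2.0):
    probs = grm_category_probs(theta, a, thresholds)
    print(theta, [round(p, 3) for p in probs])
```

As theta increases, probability mass moves from the lowest to the highest category. A larger slope makes that transition sharper, which is what a high discrimination parameter means in this model.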


2017 ◽  
Vol 8 (1) ◽  
pp. 14
Author(s):  
Rana Th. Momani

Item Response Theory has become one of the most popular methods for instrument development and evaluation. This baseline study used a 40-item self-directed learning readiness (SDLR) scale, with data from 648 randomly selected female undergraduate psychology students at Qassim University in Saudi Arabia, to evaluate the scale at the item and scale levels using the graded response model (GRM). The results provide more detailed diagnostic information for modifying the scale. The GRM analysis detected two locally dependent items, one item with a low discrimination parameter, and 15 items that misfit the model. The scale tends mostly to measure low and moderate levels of SDLR. Further psychometric evaluation is needed, and the SDLR scale should be reviewed on the basis of quantitative and qualitative analysis.


2011 ◽  
Vol 19 (3) ◽  
pp. 239-248 ◽  
Author(s):  
Thelma J. Mielenz ◽  
Michael C. Edwards ◽  
Leigh F. Callahan

Benefits of physical activity for those with arthritis are clear, yet physical activity is difficult to initiate and maintain. Self-efficacy is a key modifiable psychosocial determinant of physical activity. This study examined two scales for self-efficacy for exercise behavior (SEEB) to identify their strengths and weaknesses, applying item response theory (IRT) to data from community-based randomized controlled trials of physical activity programs in adults with arthritis. The two SEEB scales were the 9-item scale by Resnick, developed with older adults, and the 5-item scale by Marcus, developed with employed adults. All IRT analyses were conducted using the graded-response model. IRT assumptions were assessed using both exploratory and confirmatory factor analysis. The IRT analyses indicated that these scales are precise and reliable measures for identifying people with arthritis and low SEEB. The Resnick SEEB scale is slightly more precise at lower levels of self-efficacy in older adults with arthritis.


2018 ◽  
Vol 66 ◽  
pp. 179-186 ◽  
Author(s):  
Yue Zhao ◽  
Hoi Kei Kuan ◽  
Joyce O.K. Chung ◽  
Cecilia K.Y. Chan ◽  
William H.C. Li
