Using Item Response Theory to Evaluate Self-directed Learning Readiness Scale

2017 ◽  
Vol 8 (1) ◽  
pp. 14
Author(s):  
Rana Th. Momani

Item response theory (IRT) has become one of the most popular methods for instrument development and evaluation. This baseline study evaluated a 40-item self-directed learning readiness (SDLR) scale at the item and scale levels using the graded response model (GRM), with data from 648 randomly selected female undergraduate psychology students attending Qassim University in Saudi Arabia. The results provide more detailed diagnostic information for modifying the scale. The GRM analysis detected two locally dependent items, one item with a low discrimination parameter, and 15 items showing model misfit. The scale tends to measure low and moderate levels of SDLR. More advanced psychometric evaluations should be carried out, and the SDLR scale should be reviewed on the basis of both quantitative and qualitative analysis.
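As an illustration of the graded response model named in this abstract, the following is a minimal Python sketch of how category probabilities are obtained for one polytomous item. It assumes the standard logistic form of Samejima's model without a scaling constant; the function name `grm_category_probs` and the parameter values are hypothetical and are not taken from the study.

```python
import math

def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one item under the graded response model.

    theta: latent trait level; a: item discrimination;
    thresholds: ordered category boundaries b_1 < ... < b_{K-1}.
    All values used below are illustrative assumptions.
    """
    # Cumulative curves P(X >= k), padded with 1 (lowest) and 0 (beyond highest).
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cum += [0.0]
    # Each category probability is the difference of adjacent cumulative curves.
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]

# A hypothetical 4-category item evaluated at an average trait level.
probs = grm_category_probs(theta=0.0, a=1.5, thresholds=[-1.0, 0.0, 1.0])
```

A low discrimination parameter `a` flattens the cumulative curves, so the category probabilities change little across trait levels, which is why such an item is flagged in an item-level evaluation.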

2011 ◽  
Vol 19 (3) ◽  
pp. 239-248 ◽  
Author(s):  
Thelma J. Mielenz ◽  
Michael C. Edwards ◽  
Leigh F. Callahan

Benefits of physical activity for those with arthritis are clear, yet physical activity is difficult to initiate and maintain. Self-efficacy is a key modifiable psychosocial determinant of physical activity. This study examined two scales for self-efficacy for exercise behavior (SEEB) to identify their strengths and weaknesses using item response theory (IRT) from community-based randomized controlled trials of physical activity programs in adults with arthritis. The 2 SEEB scales included the 9-item scale by Resnick developed with older adults and the 5-item scale by Marcus developed with employed adults. All IRT analyses were conducted using the graded-response model. IRT assumptions were assessed using both exploratory and confirmatory factor analysis. The IRT analyses indicated that these scales are precise and reliable measures for identifying people with arthritis and low SEEB. The Resnick SEEB scale is slightly more precise at lower levels of self-efficacy in older adults with arthritis.


2021 ◽  
pp. 019394592110159
Author(s):  
Wen Liu ◽  
Lilian Dindo ◽  
Katherine Hadlandsmyth ◽  
George Jay Unick ◽  
M. Bridget Zimmerman ◽  
...  

Little research has compared item functioning of the Patient-Reported Outcomes Measurement Information System (PROMIS®) anxiety short form 6a and the generalized anxiety disorder 7-item scale using item response theory models. This was a secondary analysis of self-reported assessments from 67 at-risk U.S. military veterans. The two measures performed comparably well: the data fit the models adequately, item discriminations were acceptable, and item and test information curves were unimodal and symmetric. The PROMIS® anxiety short form 6a performed better in that its item difficulty estimates had a wider range and were more evenly distributed, and all response categories showed less floor effect, whereas the third category in most items of the generalized anxiety disorder 7-item scale was rarely used. While both measures may be appropriate, the findings provide preliminary information supporting use of the PROMIS® anxiety short form 6a as potentially preferable, especially for veterans with low-to-moderate anxiety. Further testing is needed in larger, more diverse samples.


2020 ◽  
Author(s):  
Sarah Bauermeister ◽  
John Gallacher

Abstract Background Neuroticism has been described as a broad and pervasive personality dimension or ‘heterogeneous’ trait measuring components of mood instability such as worry; anxiety; irritability; moodiness; self-consciousness and sadness. Consistent with depression and anxiety-related disorders, increased neuroticism leaves an individual vulnerable to other unipolar and bipolar mood disorders. However, the measurement of neuroticism remains a challenge. Our aim was to identify psychometrically efficient items and inform the inclusion of redundant items across the 12-item EPQ-R Neuroticism scale using Item Response Theory (IRT). Methods The 12-item binary EPQ-R Neuroticism scale was evaluated by estimating a two-parameter (2-PL) IRT model on data from 502,591 UK Biobank participants aged 37 to 73 years (M = 56.53 years; SD = 8.05), 54% female. Models were run listwise (n = 401,648) and post-estimation mathematical assumptions were computed. All analyses were conducted in STATA 16 SE on the Dementias Platform UK (DPUK) Data Portal. Results A plot of θ values (Item Information functions) showed that most items clustered around the mid-range, where discrimination values ranged from 1.34 to 2.28. Difficulty values for individual item θ scores ranged from -0.13 to 1.41. A Mokken analysis suggested a weak to medium level of monotonicity between the items, with no item reaching strong scalability (H=0.35-0.47). Systematic item deletions and rescaling found that a 7-item scale is more efficient, with information (discrimination) ranging from 1.56 to 2.57 and a stronger range of scalability (H=0.47-0.52). A 3-item scale is highly discriminatory but offers a narrow range of person ability (difficulty). A logistic regression differential item function (DIF) analysis exposed significant gender item bias functioning uniformly across all versions of the scale.
Conclusions Across 401,648 UK Biobank participants, the 12-item EPQ-R neuroticism scale exhibited psychometric inefficiency with poor discrimination at the extremes of the scale range. High and low scores are relatively poorly represented and uninformative, suggesting that high neuroticism scores derived from the EPQ-R are a function of cumulative mid-range values. The scale also shows evidence of gender item bias, and future scale development should take this bias into account along with item deletions.


2020 ◽  
Author(s):  
Alissa Walsh ◽  
Rena Cao ◽  
Darren Wong ◽  
Ramona Kantschuster ◽  
Lawrence Matini ◽  
...  

Abstract Background The SCCAI was designed to facilitate assessment of disease activity in ulcerative colitis (UC). We aimed to interrogate the metric properties of individual items of the SCCAI using item response theory (IRT) analysis, to simplify and improve its performance. Methods The original 9-item SCCAI was collected through TrueColours, a real-time software platform which allows remote entry and monitoring of patients with UC. Data were securely uploaded onto Dementias Platform UK Data Portal, where they were analysed in Stata 16.1 SE. A 2-parameter (2-PL) logistic IRT model was estimated to evaluate each item of the SCCAI for its informativeness (discrimination). A revised scale was generated and re-assessed following systematic removal of items. Results SCCAI data for 516 patients (41 years, SD=15) were examined. After initial item deletion (Erythema nodosum, Pyoderma gangrenosum), a 7-item scale was estimated. Discrimination values (information) ranged from 0.41 to 2.52 suggesting selected item inefficiency. Systematic item deletions found that a 4-item scale (bowel frequency day; bowel frequency nocturnal; urgency; rectal bleeding) was more informative and discriminatory of trait severity (discrimination values of 1.50 to 2.78). The 4-item scale possesses higher scalability and unidimensionality, suggesting that the responses to items are either direct endorsement or non-endorsement of the trait (disease activity). Conclusion Reduction of the SCCAI from the original 9-item scale to a 4-item scale provides optimum trait information that will minimise response burden. This new 4-item scale needs validation against other measures of disease activity such as faecal calprotectin, endoscopy and histopathology.


2020 ◽  
Author(s):  
Alissa Walsh ◽  
Rena Cao ◽  
Darren Wong ◽  
Ramona Kantschuster ◽  
Lawrence Matini ◽  
...  

Abstract Background The SCCAI was designed to facilitate assessment of disease activity in ulcerative colitis (UC). We aimed to interrogate the metric properties of individual items of the SCCAI using item response theory (IRT) analysis, to simplify and improve its performance. Methods The original 9-item SCCAI was collected through TrueColours, a real-time software platform which allows remote entry and monitoring of patients with UC. Data were securely uploaded onto Dementias Platform UK Data Portal, where they were analysed in Stata 16.1 SE. A 2-parameter (2-PL) logistic IRT model was estimated to evaluate each item of the SCCAI for its informativeness (discrimination). A revised scale was generated and re-assessed following systematic removal of items. Results SCCAI data for 516 UC patients (41 years, SD=15) treated in Oxford were examined. After initial item deletion (Erythema nodosum, Pyoderma gangrenosum), a 7-item scale was estimated. Discrimination values (information) ranged from 0.41 to 2.52 indicating selected item inefficiency with three items <1.70 which is a suggested discriminatory value for optimal efficiency. Systematic item deletions found that a 4-item scale (bowel frequency day; bowel frequency nocturnal; urgency to defaecation; rectal bleeding) was more informative and discriminatory of trait severity (discrimination values of 1.50 to 2.78). The 4-item scale possesses higher scalability and unidimensionality, suggesting that the responses to items are either direct endorsement (patient selection by symptom) or non-endorsement of the trait (disease activity). Conclusion Reduction of the SCCAI from the original 9-item scale to a 4-item scale provides optimum trait information that will minimise response burden. This new 4-item scale needs validation against other measures of disease activity such as faecal calprotectin, endoscopy and histopathology.


2019 ◽  
Author(s):  
Sarah Bauermeister ◽  
John Gallacher

Abstract Background Neuroticism has been described as a broad and pervasive personality dimension or ‘heterogeneous’ trait measuring components of mood instability such as worry; anxiety; irritability; moodiness; self-consciousness and sadness. Consistent with depression and anxiety-related disorders, increased neuroticism leaves an individual vulnerable to other unipolar and bipolar mood disorders. However, the measurement of neuroticism remains a challenge. Our aim was to identify psychometrically efficient items and inform the inclusion of redundant items across the 12-item EPQ-R Neuroticism scale using Item Response Theory (IRT). Methods The 12-item binary EPQ-R Neuroticism scale was evaluated by estimating a two-parameter (2-PL) IRT model on data from 502,591 UK Biobank participants aged 37 to 73 years (M = 56.53 years; SD = 8.05), 54% female. Models were run listwise (n = 401,648) and post-estimation mathematical assumptions were computed. All analyses were conducted in STATA 16 SE on the Dementias Platform UK (DPUK) Data Portal. Results A plot of θ values (Item Information functions) showed that most items clustered around the mid-range, where discrimination values ranged from 1.34 to 2.28. Difficulty values for individual item θ scores ranged from −0.13 to 1.41. A Mokken analysis suggested a weak to medium level of monotonicity between the items, with no item reaching strong scalability (H=0.35-0.47). Systematic item deletions and rescaling found that a 7-item scale is more efficient, with information (discrimination) ranging from 1.56 to 2.57 and a stronger range of scalability (H=0.47-0.52). A 3-item scale is highly discriminatory but offers a narrow range of person ability (difficulty).
A logistic regression differential item function (DIF) analysis exposed significant gender item bias functioning uniformly across all versions of the scale. Conclusions Across 401,648 UK Biobank participants, the 12-item EPQ-R neuroticism scale exhibited psychometric inefficiency with poor discrimination at the extremes of the scale range. High and low scores are relatively poorly represented and uninformative, suggesting that high neuroticism scores derived from the EPQ-R are a function of cumulative mid-range values. The scale also shows evidence of gender item bias, and future scale development should take this bias into account along with item deletions.


2014 ◽  
Vol 15 (1) ◽  
pp. 31-41
Author(s):  
Agus Santoso

One of the most popular item selection methods in adaptive test design is the Maximum Information method. This method selects, and administers to the examinee, the item that provides the maximum information at a given ability level. Its weaknesses are that it is less accurate in estimating examinees' ability at the beginning of the test and that it tends to select items with high discrimination parameters over items with low discrimination parameters, which creates problems in maintaining the item bank. An alternative method is therefore needed. The objective of this study was to determine the estimation performance of the Efficiency Balanced Information (EBI) method in an adaptive test design. The research was carried out as a simulation study in the setting of the end-of-semester examinations of Universitas Terbuka (UT, the Open University). An item bank of 900 items was generated for the simulation under the three-parameter Item Response Theory model, using ideal item parameter specifications. Two item selection criteria were simulated, Maximum Information and Maximum EBI, and both were designed to satisfy content balancing. This ensures that the resulting algorithm fits the modular learning applied at UT, meaning that the items of each module are proportionally represented and conform to the test blueprint. The test stops when the standard error of estimation (SEE) reaches 0.3. The study concluded that the adaptive test algorithm with the EBI criterion estimated examinee ability more accurately than the Maximum Information criterion, as indicated by its smaller bias and standard deviation. Another advantage of the Maximum EBI criterion is more optimal use of the item bank, because items with low discrimination are also selected, particularly at the beginning of the test. The Maximum Information criterion, although more efficient in terms of test length, makes less optimal use of the item bank.
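The Maximum Information criterion and the SEE stopping rule described in this abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the study's algorithm: the item parameter ranges, the EAP ability estimator with a standard normal prior, the use of the posterior standard deviation as the SEE, and the 60-item cap are all assumptions introduced here. Content balancing and the EBI criterion are omitted because the abstract does not give their formulas.

```python
import math
import random

def p_3pl(theta, a, b, c):
    """Probability of a correct response under the 3-parameter logistic model."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def info_3pl(theta, a, b, c):
    """Fisher information of one 3PL item at ability theta."""
    p = p_3pl(theta, a, b, c)
    return a * a * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2

GRID = [g / 10 for g in range(-40, 41)]  # quadrature points for EAP

def eap(responses):
    """EAP ability estimate and posterior SD (standard normal prior assumed)."""
    weights = []
    for t in GRID:
        w = math.exp(-0.5 * t * t)
        for (a, b, c), correct in responses:
            p = p_3pl(t, a, b, c)
            w *= p if correct else 1.0 - p
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)

def run_cat(bank, true_theta, rng, see_target=0.3, max_items=60):
    """Administer items by maximum information until SEE <= see_target."""
    remaining = list(bank)
    responses = []
    theta_hat, see = 0.0, 1.0  # start at the prior mean and prior SD
    while remaining and len(responses) < max_items and see > see_target:
        # Pick the unused item that is most informative at the current estimate.
        item = max(remaining, key=lambda it: info_3pl(theta_hat, *it))
        remaining.remove(item)
        # Simulate the examinee's response from the true ability.
        correct = rng.random() < p_3pl(true_theta, *item)
        responses.append((item, correct))
        theta_hat, see = eap(responses)
    return theta_hat, see, len(responses)

rng = random.Random(0)
# Hypothetical 900-item bank of (a, b, c) triples; the ranges are illustrative.
bank = [(rng.uniform(0.8, 2.0), rng.uniform(-3.0, 3.0), rng.uniform(0.1, 0.25))
        for _ in range(900)]
theta_hat, see, n_used = run_cat(bank, true_theta=0.5, rng=rng)
```

Because information grows with the square of the discrimination parameter, the `max` step almost always prefers high-discrimination items, which is the uneven item bank usage that motivates an alternative criterion such as EBI.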


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Alissa Walsh ◽  
Rena Cao ◽  
Darren Wong ◽  
Ramona Kantschuster ◽  
Lawrence Matini ◽  
...  

Abstract Background The SCCAI was designed to facilitate assessment of disease activity in ulcerative colitis (UC). We aimed to interrogate the metric properties of individual items of the SCCAI using item response theory (IRT) analysis, to simplify and improve its performance. Methods The original 9-item SCCAI was collected through TrueColours, a real-time software platform which allows remote entry and monitoring of patients with UC. Data were securely uploaded onto Dementias Platform UK Data Portal, where they were analysed in Stata 16.1 SE. A 2-parameter (2-PL) logistic IRT model was estimated to evaluate each item of the SCCAI for its informativeness (discrimination). A revised scale was generated and re-assessed following systematic removal of items. Results SCCAI data for 516 UC patients (41 years, SD = 15) treated in Oxford were examined. After initial item deletion (Erythema nodosum, Pyoderma gangrenosum), a 7-item scale was estimated. Discrimination values (information) ranged from 0.41 to 2.52 indicating selected item inefficiency with three items < 1.70 which is a suggested discriminatory value for optimal efficiency. Systematic item deletions found that a 4-item scale (bowel frequency day; bowel frequency nocturnal; urgency to defaecation; rectal bleeding) was more informative and discriminatory of trait severity (discrimination values of 1.50 to 2.78). The 4-item scale possesses higher scalability and unidimensionality, suggesting that the responses to items are either direct endorsement (patient selection by symptom) or non-endorsement of the trait (disease activity). Conclusion Reduction of the SCCAI from the original 9-item scale to a 4-item scale provides optimum trait information that will minimise response burden. This new 4-item scale needs validation against other measures of disease activity such as faecal calprotectin, endoscopy and histopathology.

