The transition to digital presentation of the diagnostic imaging domain of the Part IV examination of the National Board of Chiropractic Examiners

2020 ◽  
Vol 34 (1) ◽  
pp. 52-67 ◽  
Author(s):  
Igor Himelfarb ◽  
Margaret A. Seron ◽  
John K. Hyland ◽  
Andrew R. Gow ◽  
Nai-En Tang ◽  
...  

Objective: This article introduces changes made to the diagnostic imaging (DIM) domain of Part IV of the National Board of Chiropractic Examiners examination and evaluates the effects of these changes in terms of item functioning and examinee performance. Methods: To evaluate item function, classical test theory and item response theory (IRT) methods were employed. Classical statistics were used to assess item difficulty and the relation of each item to the total test score. Item difficulty and item discrimination were also calculated using IRT. We further studied the decision accuracy of the redesigned DIM domain. Results: The diagnostic item analysis revealed similar item function across test forms and across administrations. The IRT models fit the data reasonably well. The averages of the IRT parameters were similar across test forms and across administrations. The classification of test takers into ability (theta) categories was consistent across groups (both norming and all examinees), across all test forms, and across administrations. Conclusion: This research represents a first step in the evaluation of the transition to digital DIM high-stakes assessments. We hope that this study will spur further research into evaluations of the ability to interpret radiographic images. In addition, we hope that the results prove useful for chiropractic faculty, chiropractic students, and the users of Part IV scores.
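For readers unfamiliar with the models referenced above, a minimal sketch of the two-parameter logistic (2PL) IRT model and of binning examinees into theta categories follows; the parameter values and cut scores are illustrative, not taken from the study.

```python
import numpy as np

def two_pl_probability(theta, a, b):
    """2PL IRT: probability of a correct response given ability theta,
    item discrimination a, and item difficulty b."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

# Illustrative parameters for three items (not the study's values).
a = np.array([1.2, 0.8, 1.5])   # discrimination
b = np.array([-0.5, 0.0, 1.0])  # difficulty
print(two_pl_probability(0.3, a, b))  # one examinee at theta = 0.3

# Classifying test takers into theta categories amounts to binning
# ability estimates against fixed cut scores (hypothetical here).
cuts = np.array([-1.0, 0.0, 1.0])
thetas = np.array([-1.4, -0.2, 0.3, 1.7])
print(np.digitize(thetas, cuts))  # category index per examinee
```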

2020 ◽  
Vol 12 (2-2) ◽  
Author(s):  
Nor Aisyah Saat

Item analysis is the process of examining student responses to individual test items in order to get a clear picture of the quality of each item and of the overall test. Teachers are encouraged to perform item analysis for each administered test to determine which items should be retained, modified, or discarded. This study aims to analyse the items in two summative examination question papers using classical test theory (CTT). The instruments were the 2019 SPM Mathematics Trial Examination Paper 1, administered to 50 Form 5 students, and the 2019 SPM Mathematics Trial Examination Paper 2, administered to 20 students. Paper 1 contains 40 objective questions, while Paper 2 contains 25 subjective questions. The data were analysed in Microsoft Excel using the item difficulty index and discrimination index formulas. This analysis can help teachers better understand the difficulty level of the items used. Finally, based on the analysis, the items were classified as good, good but needing improvement, marginal, or weak.
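The difficulty and discrimination formulas the study applies in Excel translate directly into code. A minimal sketch, assuming a 0/1-scored response matrix and the common upper/lower 27% split for the discrimination index (the simulated data are illustrative):

```python
import numpy as np

def item_difficulty(scores):
    """Difficulty index p: proportion of examinees answering each item correctly."""
    return scores.mean(axis=0)

def item_discrimination(scores, frac=0.27):
    """Discrimination index D = p_upper - p_lower, comparing the top and
    bottom `frac` of examinees ranked by total score."""
    n = scores.shape[0]
    k = max(1, int(round(frac * n)))
    order = np.argsort(scores.sum(axis=1))
    lower, upper = scores[order[:k]], scores[order[-k:]]
    return upper.mean(axis=0) - lower.mean(axis=0)

rng = np.random.default_rng(0)
scores = (rng.random((50, 40)) > 0.4).astype(int)  # 50 examinees, 40 items
print(item_difficulty(scores)[:5])      # p per item
print(item_discrimination(scores)[:5])  # D per item
```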


2020 ◽  
Vol 17 (Number 2) ◽  
pp. 63-101
Author(s):  
S. Kanageswari Suppiah Shanmugam ◽  
Vincent Wong ◽  
Murugan Rajoo

Purpose - This study examined the quality of English test items using psychometric and linguistic characteristics among Grade Six pupils. Method - In contrast to the conventional approach of relying only on statistics when investigating item quality, this study adopted a mixed-method approach employing both psychometric analysis and cognitive interviews. The former was conducted on 30 Grade Six pupils, with each item representing a different construct commonly found in English test papers. Qualitative input was obtained through cognitive interviews with five Grade Six pupils and expert judgements from three teachers. Findings - None of the items was too easy or too difficult, and all items had positive discrimination indices. The item on idioms was the most ideal in terms of difficulty and discrimination. Difficult items were found to be vocabulary-based. Surprisingly, the higher-order-thinking subjective items proved to be excellent in difficulty, although improvements could be made to their ability to discriminate. The qualitative expert judgements agreed with the quantitative psychometric analysis. Certain results from the item analysis, however, contradicted past findings that items with an ideal item difficulty value between 0.4 and 0.6 would also have an ideal item discrimination index. Significance - The findings of the study serve as a reminder of the value of Classical Test Theory, a non-complex psychometric approach, in assisting classroom teacher practitioners during the meticulous process of test design and in ensuring test item quality.
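A common way to quantify how well an item discriminates, alongside the upper/lower-group index, is the corrected point-biserial correlation between the item score and the total score; the abstract does not say which statistic the study used, so this sketch is only a plausible reading:

```python
import numpy as np

def point_biserial(item, total):
    """Corrected point-biserial: correlation between a 0/1 item score and
    the total score with the item's own contribution removed."""
    return np.corrcoef(item, total - item)[0, 1]

rng = np.random.default_rng(1)
scores = (rng.random((30, 10)) > 0.5).astype(int)  # 30 pupils, 10 items
total = scores.sum(axis=1)
print([round(point_biserial(scores[:, j], total), 2) for j in range(10)])
```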


Assessment of learning involves deciding whether the content and objectives of instruction have been mastered by administering quality tests. This study assesses the quality of a Chemistry achievement test and compares the item statistics generated using CTT and IRT methods. A descriptive survey design was adopted with a sample of N = 530 students. The specialised XCALIBRE 4 and ITEMAN 4 software packages were used to conduct the item analysis. Results indicate that the two methods jointly identified 13 (32.5%) items as problematic and 27 (67.5%) as good. Similarly, significantly high correlations exist between the item statistics derived from the CTT and IRT models (r = -0.985 and r = 0.801, p < 0.05, for item difficulty and discrimination, respectively). The study concludes, first, that the Chemistry achievement test used did not pass through the processes of standardisation. Secondly, the CTT and IRT frameworks appeared to be effective and reliable in assessing test items, as the two frameworks provide similar and comparable results. The study recommends that teacher-made Chemistry tests used in measuring students' achievement be made to pass through all the processes of standardisation, and that CTT and IRT approaches to item analysis be integrated into item development and analysis because of their strength in investigating reliability and minimising measurement error.
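The sign of the reported difficulty correlation is worth a note: CTT difficulty is a proportion correct (higher means easier), while the IRT b parameter grows with hardness, so a strong negative r is expected. A sketch with hypothetical item statistics, not the study's data:

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical item statistics: CTT difficulty (proportion correct) runs
# opposite to the IRT b parameter, hence the strongly negative correlation.
ctt_p = np.array([0.85, 0.72, 0.61, 0.48, 0.33, 0.21])  # proportion correct
irt_b = np.array([-1.6, -0.8, -0.2, 0.3, 1.1, 1.9])     # IRT difficulty

r, p_value = pearsonr(ctt_p, irt_b)
print(f"r = {r:.3f}, p = {p_value:.4f}")
```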


2018 ◽  
Vol 7 (1) ◽  
pp. 29
Author(s):  
Ari Arifin Danuwijaya

Developing a test is a complex and iterative process, and items are subject to revision even if they were developed by skilful item writers. Many commercial test publishers therefore conduct test analysis rather than relying on the item writers' judgement and skills alone: improvements in item quality need to be demonstrated statistically after a tryout is performed. This study is part of a test development process and aims to analyse reading comprehension test items. One hundred multiple-choice questions were pilot tested on 50 postgraduate students at one university. The pilot testing investigated item quality so that the items could be developed further. The responses were analysed under Classical Test Theory using the psychometric software Lertap. The results showed that item difficulty was mostly average. In terms of item discrimination, more than half of the items were categorized as marginal and required further modification. The study offers recommendations for improving the quality of the developed items. Keywords: reading comprehension; item analysis; classical test theory; item difficulty; test development.
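The "marginal" label suggests a conventional banding of discrimination indices. A sketch using the commonly cited Ebel and Frisbie cut points, which may differ from the labels Lertap or the study actually used:

```python
def classify_discrimination(d):
    """Band a discrimination index D using the commonly cited
    Ebel & Frisbie guidelines (the study's own labels may differ)."""
    if d >= 0.40:
        return "very good"
    if d >= 0.30:
        return "reasonably good"
    if d >= 0.20:
        return "marginal - usually needs revision"
    return "poor - revise or discard"

for d in (0.55, 0.34, 0.25, 0.08):  # illustrative index values
    print(d, "->", classify_discrimination(d))
```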


2021 ◽  
Vol 11 (13) ◽  
pp. 6048
Author(s):  
Jaroslav Melesko ◽  
Simona Ramanauskaite

Feedback is a crucial component of effective, personalized learning and is usually provided through formative assessment. Introducing formative assessment into a classroom can be challenging because of the complexity of test creation and the need to set aside time for assessment. The newly proposed formative assessment algorithm uses multivariate Elo rating and multi-armed bandit approaches to address these challenges. In a case study involving 106 students of a Cloud Computing course, the algorithm showed twice the learning-path recommendation precision of assessment methods based on classical test theory. Its precision approaches the item response theory benchmark with a greatly reduced quiz length and without the need for item difficulty calibration.
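The paper's algorithm is multivariate, keeping one rating per skill, but the core Elo mechanism can be sketched for a single skill as follows; the K factor and response sequence are illustrative:

```python
import math

def elo_update(theta, beta, correct, k=0.3):
    """One Elo step: expected correctness is the logistic of (ability -
    item difficulty); both ratings then move toward the observed outcome."""
    expected = 1.0 / (1.0 + math.exp(-(theta - beta)))
    theta_new = theta + k * (correct - expected)  # student ability
    beta_new = beta - k * (correct - expected)    # item difficulty
    return theta_new, beta_new

theta, beta = 0.0, 0.0
for outcome in (1, 1, 0, 1):  # hypothetical response sequence
    theta, beta = elo_update(theta, beta, outcome)
    print(f"theta={theta:+.3f}  beta={beta:+.3f}")
```

Each response nudges the ability estimate by the surprise (observed minus expected) while the item's difficulty moves the opposite way, which is why no separate calibration phase is needed.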


Author(s):  
Lusine Vaganian ◽  
Sonja Bussmann ◽  
Maren Boecker ◽  
Michael Kusch ◽  
Hildegard Labouvie ◽  
...  

Abstract Purpose The World Health Organization Disability Assessment Schedule 2.0 (WHODAS 2.0) assesses disability in individuals irrespective of their health condition. Previous studies validated the usefulness of the WHODAS 2.0 using classical test theory. This study is the first to investigate the psychometric properties of the 12-item WHODAS 2.0 in patients with cancer using item analysis according to the Rasch model. Methods In total, 350 cancer patients participated in the study. Rasch analysis of the 12-item version of the WHODAS 2.0 was conducted, including tests of unidimensionality and local independence and tests for differential item functioning (DIF) with regard to age, gender, type of cancer, presence of metastases, psycho-oncological support, and duration of disease. Results After accounting for local dependence, which was mainly found across items of the same WHODAS domain, satisfactory overall fit to the Rasch model was established (χ2 = 36.14, p = 0.07) with good reliability (PSI = 0.82) and unidimensionality of the scale. DIF was found for gender (testlet ‘Life activities’) and age (testlet ‘Getting around/Self-care’), but the size of the DIF was not substantial. Conclusion Overall, the results of the analysis according to the Rasch model support the use of the 12-item version of the WHODAS 2.0 as a measure of disability in cancer patients.
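For reference, the Rasch model underlying this analysis has a single item parameter, which is what makes fit testing and DIF comparisons across groups straightforward. A minimal sketch with illustrative difficulties:

```python
import numpy as np

def rasch_probability(theta, b):
    """Rasch (1PL): P(correct) depends only on the gap between person
    ability theta and item difficulty b."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

b = np.array([-1.0, 0.0, 1.5])  # illustrative item difficulties
for theta in (-1.0, 0.0, 1.0):
    # DIF testing asks whether an item's b differs across groups
    # (e.g., gender) for persons at the same theta.
    print(theta, rasch_probability(theta, b).round(2))
```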


Author(s):  
Geum-Hee Jeong ◽  
Mi Kyoung Yim

To test the applicability of item response theory (IRT) to the Korean Nurses' Licensing Examination (KNLE), item analysis was performed after testing unidimensionality and goodness-of-fit. The results were compared with those based on classical test theory. The results of the 330-item KNLE administered to 12,024 examinees in January 2004 were analyzed. Unidimensionality was tested using DETECT, and goodness-of-fit was tested using WINSTEPS for the Rasch model and Bilog-MG for the two-parameter logistic model. Item analysis and ability estimation were done using WINSTEPS. Using DETECT, Dmax ranged from 0.1 to 0.23 for each subject. The mean square infit and outfit values of all items using WINSTEPS ranged from 0.1 to 1.5, except for one item in pediatric nursing, which scored 1.53. Of the 330 items, 218 (42.7%) misfit the two-parameter logistic model of Bilog-MG. The correlation coefficients between the difficulty parameter from the Rasch model and the difficulty index from classical test theory ranged from 0.9039 to 0.9699. The correlation between the ability parameter from the Rasch model and the total score from classical test theory ranged from 0.9776 to 0.9984. Therefore, the KNLE results satisfy unidimensionality and show goodness-of-fit for the Rasch model. The KNLE should thus be a good candidate for analysis under the IRT Rasch model, and further research using IRT is possible.
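The infit and outfit mean squares cited above can be computed from standardized residuals under the Rasch model. A sketch on simulated data (their expected value is 1; values well above it flag misfit):

```python
import numpy as np

def rasch_p(theta, b):
    """Rasch success probabilities for every person-item pair."""
    return 1.0 / (1.0 + np.exp(-np.subtract.outer(theta, b)))

def fit_statistics(x, theta, b):
    """Outfit: mean squared standardized residual per item.
    Infit: information-weighted mean square, less sensitive to outliers."""
    p = rasch_p(theta, b)
    w = p * (1 - p)                      # binomial variance per response
    resid2 = (x - p) ** 2
    outfit = (resid2 / w).mean(axis=0)
    infit = resid2.sum(axis=0) / w.sum(axis=0)
    return infit, outfit

rng = np.random.default_rng(2)
theta = rng.normal(size=200)             # hypothetical abilities
b = np.array([-1.0, -0.3, 0.4, 1.2])     # hypothetical difficulties
x = (rng.random((200, 4)) < rasch_p(theta, b)).astype(int)
infit, outfit = fit_statistics(x, theta, b)
print(infit.round(2), outfit.round(2))
```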


2019 ◽  
Vol 23 (1) ◽  
pp. 124-153 ◽  
Author(s):  
Daniel R. Smith ◽  
Michael E. Hoffman ◽  
James M. LeBreton

This article provides a review of the approach that James used when conducting item analyses on his conditional reasoning test items. That approach was anchored in classical test theory. Our article extends this work in two important ways. First, we offer a set of test development protocols that are tailored to the unique nature of conditional reasoning tests. Second, we further extend James’s approach by integrating his early test validation protocols (based on classical test theory) with more recent protocols (based on item response theory). We then apply our integrated item analytic framework to data collected on James’s first test, the conditional reasoning test for relative motive strength. We illustrate how this integrated approach furnishes additional diagnostic information that may allow researchers to make more informed and targeted revisions to an initial set of items.


Author(s):  
Mehmet Barış Horzum ◽  
Gülden Kaya Uyanik

The aim of this study is to examine the validity and reliability of the Community of Inquiry Scale, commonly used in online learning, by means of Item Response Theory. For this purpose, version 14 of the Community of Inquiry Scale was administered over the internet to 1,499 students in the online learning programs of a distance education center at a Turkish state university. The collected data were analyzed using a statistical software package in three steps: checking model assumptions, checking model-data fit, and item analysis. The item and test features of the scale were examined by means of the graded response model (GRM). In order to use this IRT model, the assumptions were first tested on the data gathered from the 1,499 participants and data-model fit was examined; following affirmative results, all data were analyzed using the GRM. As a result of the study, the Community of Inquiry Scale adapted to Turkish by Horzum (in press) was found to be reliable and valid by means of both Classical Test Theory and Item Response Theory.
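Under the graded response model used here, each polytomous item has one discrimination and an ordered set of boundary difficulties; category probabilities fall out as differences of cumulative logistic curves. A sketch with illustrative parameters for a 5-category item:

```python
import numpy as np

def grm_category_probs(theta, a, b):
    """Samejima's graded response model: cumulative boundary curves
    P*_k = logistic(a * (theta - b_k)); category probabilities are
    differences of adjacent boundaries."""
    b = np.asarray(b)
    p_star = 1.0 / (1.0 + np.exp(-a * (theta - b)))  # boundary curves
    cum = np.concatenate(([1.0], p_star, [0.0]))
    return cum[:-1] - cum[1:]                        # one prob per category

# Illustrative 5-category Likert item (a and b are not the study's values).
probs = grm_category_probs(theta=0.5, a=1.3, b=[-1.5, -0.5, 0.4, 1.6])
print(probs.round(3), probs.sum())  # probabilities sum to 1
```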


2016 ◽  
Vol 33 (1) ◽  
pp. 74 ◽  
Author(s):  
Alejandro Veas ◽  
Juan Luis Castejón ◽  
Raquel Gilar ◽  
Pablo Miñano

The School Attitude Assessment Survey-Revised (SAAS-R) was developed by McCoach & Siegle (2003b) and validated in Spain by Author (2014) using Classical Test Theory. The objective of the current research is to validate the SAAS-R using multidimensional Rasch analysis. Data were collected from 1,398 students attending different high schools. Principal Component Analysis supported the multidimensionality of the SAAS-R. Item difficulty and person ability were calibrated along the same latent trait scale. Ten items were removed from the scale due to misfit with the Rasch model. Differential Item Functioning analysis revealed no significant differences across gender for the remaining 25 items. The 7-category rating scale structure did not function well, and the goal valuation subscale obtained low reliability values. The multidimensional Rasch model supported the 25-item SAAS-R measuring five latent factors. The study thus demonstrates the advantages of multidimensional Rasch analysis.
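The paper tests DIF within the Rasch framework itself; a common non-IRT alternative for the same gender-DIF question is the Mantel-Haenszel procedure, sketched below on simulated data (all values illustrative):

```python
import numpy as np

def mantel_haenszel_dif(item, total, group):
    """Mantel-Haenszel common odds ratio for one item: stratify examinees
    by total score and compare the item's odds of success across groups.
    Values near 1 indicate little DIF."""
    num, den = 0.0, 0.0
    for s in np.unique(total):
        m = total == s
        ref, foc = item[m & (group == 0)], item[m & (group == 1)]
        if ref.size == 0 or foc.size == 0:
            continue
        n = ref.size + foc.size
        a, b = ref.sum(), ref.size - ref.sum()  # reference: right, wrong
        c, d = foc.sum(), foc.size - foc.sum()  # focal: right, wrong
        num += a * d / n
        den += b * c / n
    return num / den if den else float("nan")

rng = np.random.default_rng(3)
scores = (rng.random((300, 25)) > 0.5).astype(int)  # hypothetical responses
group = rng.integers(0, 2, size=300)                # e.g., gender indicator
total = scores.sum(axis=1)
print(round(mantel_haenszel_dif(scores[:, 0], total, group), 2))
```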

