Evaluating Instrument Quality: Rasch Model – Analyses of Post Test of Curriculum 2013 Training

2018 ◽  
Vol 9 (1) ◽  
pp. 67-86
Author(s):  
Komalasari

The main purpose of this study was to evaluate the quality of the post test used by LPMP Central Kalimantan, Indonesia, in Curriculum 2013 training for grade X teachers. Rasch analysis was used to explore item fit, reliability (item and person), item difficulty, and the Wright map of the post test. The study also applied Classical Test Theory (CTT) to examine item discrimination and distracters. Following a series of iterative Rasch analyses that adopted the “data should fit the model” approach, the 30-item post test of Curriculum 2013 training was analyzed using ACER ConQuest 4, software based on the Rasch measurement model. All items of the post test fit the Rasch model sufficiently. The difficulty levels (i.e., item measures) for the 30 items range from –1.746 logits to +1.861 logits. The item separation reliability is acceptable at 0.990, while the person separation reliability is low at 0.485. The Wright map indicates that the test is difficult for the teachers, or that the teachers have low knowledge of Curriculum 2013, and that the post test items do not cover the full range of the teachers’ ability levels. Item discrimination is grouped into fair discrimination (items 2, 4, 5, 8, 11, 18) and poor discrimination (items 1, 3, 6, 7, 9, 10, 12, 13, 14, 15, 16, 17, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30). Some distracters from items 1, 2, 6, 7, 8, 9, 11, 13, 16, 17, 18, 19, 20, 22, 24, 25, 27, 28, 29, and 30 are problematic and require further investigation or revision. Key words: Rasch analysis, training, curriculum 2013, post test
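
To make the logit metric above concrete, here is a minimal sketch of the dichotomous Rasch model evaluated at the two reported difficulty extremes. The person ability of 0 logits is an assumed value for illustration, not an estimate from the study, and the sketch is not the authors' ACER ConQuest analysis.

```python
import math

def rasch_probability(theta: float, b: float) -> float:
    """Dichotomous Rasch model: P(correct) = exp(theta - b) / (1 + exp(theta - b))."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# Item measures reported in the abstract span -1.746 to +1.861 logits.
easiest, hardest = -1.746, 1.861

# Illustrative person ability of 0 logits (not a value from the study).
theta = 0.0
print(f"P(correct | easiest item) = {rasch_probability(theta, easiest):.3f}")  # ~0.85
print(f"P(correct | hardest item) = {rasch_probability(theta, hardest):.3f}")  # ~0.13
```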

2021 ◽  
Vol 9 (3) ◽  
pp. 329-345
Author(s):  
Ayi Darmana ◽  
Ani Sutiani ◽  
Haqqi Annazili Nasution ◽  
Ismanisa Ismanisa* ◽  
Nurhaswinda Nurhaswinda

The score obtained from a test is often interpreted as an indicator of a student's ability level. This is one of the weaknesses of classical analysis, which is unable to provide meaningful and fair information: the same raw score, if it comes from test items with different levels of difficulty, should indicate different abilities. Rasch model analysis overcomes this weakness. The purpose of this study was to analyze item quality by validating the national chemistry exam instrument using the Rasch model. The research sample consisted of 212 new students of the Department of Chemistry at the State University of Medan. The data, collected through the documentation method, were respondents' answers to the 2013 chemistry national exam (UN) questions, which comprised 40 multiple-choice items. Data were analyzed with the Rasch model using Ministep software. The results show that the quality of the chemistry national exam (UN) questions is very good in terms of unidimensionality, item fit, the person-item map, item difficulty level, and person and item reliability. One item was found to be gender biased, favoring men over women. The average chemistry ability of the respondents is above the average difficulty level of the test items.
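
The point that equal raw scores on item sets of different difficulty imply different abilities can be shown with a small maximum-likelihood sketch of the Rasch model. The two item-difficulty sets below are hypothetical, not parameters from the UN chemistry data.

```python
import math

def rasch_ability(raw_score: int, difficulties: list[float]) -> float:
    """Maximum-likelihood Rasch ability (in logits) for a raw score on a given item set."""
    theta = 0.0
    for _ in range(50):  # Newton-Raphson iterations
        p = [1.0 / (1.0 + math.exp(-(theta - b))) for b in difficulties]
        expected = sum(p)                      # expected raw score at current theta
        info = sum(q * (1.0 - q) for q in p)   # test information (derivative of expected score)
        step = (raw_score - expected) / info
        theta += step
        if abs(step) < 1e-6:
            break
    return theta

# Hypothetical item difficulties (logits) for an easy set and a hard set.
easy_items = [-1.5, -1.0, -0.5, 0.0, 0.5]
hard_items = [0.0, 0.5, 1.0, 1.5, 2.0]

theta_easy = rasch_ability(3, easy_items)
theta_hard = rasch_ability(3, hard_items)
print(f"3/5 on the easy set: ability ≈ {theta_easy:.2f} logits")
print(f"3/5 on the hard set: ability ≈ {theta_hard:.2f} logits")
# The two estimates differ by exactly the 1.5-logit shift between the item sets,
# because the Rasch model depends only on (ability - difficulty).
```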


2016 ◽  
Vol 33 (1) ◽  
pp. 74 ◽  
Author(s):  
Alejandro Veas ◽  
Juan Luis Castejón ◽  
Raquel Gilar ◽  
Pablo Miñano

<p>The School Attitude Assessment Survey-Revised (SAAS-R) was developed by McCoach &amp; Siegle (2003b) and validated in Spain by Author (2014) using Classical Test Theory. The objective of the current research is to validate SAAS-R using multidimensional Rasch analysis. Data were collected from 1398 students attending different high schools. Principal Component Analysis supported the multidimensional SAAS-R. The item difficulty and person ability were calibrated along the same latent trait scale. 10 items were removed from the scale due to misfit with the Rasch model. Differential Item Functioning revealed no significant differences across gender for the remaining 25 items. The 7-category rating scale structure did not function well, and the subscale goal valuation obtained low reliability values. The multidimensional Rasch model supported 25 item-scale SAAS-R measures from five latent factors. Therefore, the advantages of multidimensional Rasch analysis are demonstrated in this study.</p>


2013 ◽  
Vol 41 (1) ◽  
pp. 159-164 ◽  
Author(s):  
Ying-Ying Leung ◽  
May-Ee Png ◽  
Philip Conaghan ◽  
Alan Tennant

Objective. The Rasch measurement model provides robust analysis of the internal construct validity of outcome measures. We reviewed the application of Rasch analysis in musculoskeletal medicine as part of the work leading to discussion in a Special Interest Group in Rasch Analysis at Outcome Measures in Rheumatology 11. Methods. A systematic literature review of SCOPUS and MEDLINE was performed (January 1, 1985, to February 29, 2012). Original research reports in English using “Rasch” or “Item Response Theory” in musculoskeletal diseases were assessed by 2 independent reviewers. The topics of focus and analysis methodology details were recorded. Results. Of 212 articles reviewed, 114 were included. The number of publications rose from 1 in 1991–1992 to 23 in 2011–February 2012. Disease areas included rheumatoid arthritis (28%), osteoarthritis (16.6%), and general musculoskeletal disorders (43%). Sixty-six reports (57.9%) evaluated psychometric properties of existing scales and 35 (30.7%) involved development of new scales. Nine articles (7.9%) were on methodology illustration. Four articles were on item banking and computer adaptive testing. A majority of the articles reported fit statistics, while the basic Rasch model assumption (i.e., unidimensionality) was examined in only 57.2% of the articles. An improvement in the reporting quality of Rasch articles was noted over time. In addition, only 11.4% of the articles provided a transformation table for interval scale measurement in clinical practice. Conclusion. The Rasch model has been increasingly used in rheumatology over the last 2 decades in a wide range of applications. The majority of the articles demonstrated reasonable quality of reporting, and improvements in quality of reporting over time were revealed.


2018 ◽  
Vol 3 (1) ◽  
pp. 73
Author(s):  
Yulinda Erma Suryani

<p class="IABSTRAK"><strong>Abstract:</strong> The concept of objective measurement in the social sciences and educational assessment must have five criteria: 1) Gives a linear measure with the same interval; 2) Conduct a proper estimation process; 3) Finding unfeasible items (misfits) or outliers; 4) Overcoming the lost data; 5) Generate replicable measurements (independent of the parameters studied). These five conditions of measurement, so far only Rasch model that can fulfill it. The quality of intelligence measurements made with the Rasch model will have the same quality as the measurements made in the physical dimension in the field of physics. The logit scale (log odds unit) generated in the Rasch model is the scale of the same interval and is linear from the data ratio (odds ratio).  Based on the results of the analysis that has been done on the IST test instrument can be seen that in general the quality of IST test included in either category. Of the 176 IST test items there is only 1 item that is not good, ie aitem 155 (WU19) so that aitem 155 should be discarded. Based on the DIF analysis it can be seen that there are 28 items in favor of one gender only, so the twenty-eight items should be revised.</p><strong>Abstrak: </strong>Konsep pengukuran objektif dalam ilmu sosial dan penilaian pendidikan harus memiliki lima kriteria: 1) Memberikan ukuran yang linier dengan interval yang sama; 2) Melakukan proses estimasi yang tepat; 3) Menemukan item yang tidak tepat (misfits) atau tidak umum (outlier); 4) Mengatasi data yang hilang; 5) Hasilkan pengukuran yang <em>replicable </em>(independen dari parameter yang diteliti). Kelima kondisi pengukuran ini, sejauh ini hanya model Rasch yang bisa memenuhinya. Kualitas pengukuran kecerdasan yang dibuat dengan model Rasch akan memiliki kualitas yang sama dengan pengukuran yang dibuat dalam dimensi fisik di bidang fisika. Skala logit (<em>log odds unit</em>) yang dihasilkan dalam Rasch model adalah skala interval yang sama dan linear dari rasio data (<em>odds ratio</em>). Berdasarkan hasil analisis yang telah dilakukan pada instrumen tes IST dapat diketahui bahwa secara umum kualitas tes IST termasuk dalam kategori baik. Dari 176 item tes IST hanya ada 1 item yang tidak bagus, yaitu aitem 155 (WU19) sehingga aitem 155 harus dibuang. Berdasarkan analisis DIF dapat dilihat bahwa ada 28 item yang mendukung satu jenis kelamin saja, sehingga dua puluh delapan item harus direvisi.


2021 ◽  
pp. JNM-D-19-00101
Author(s):  
Barbara Resnick ◽  
Elizabeth Galik ◽  
Anju Paudel ◽  
Rachel McPherson ◽  
Kimberly Van Haitsma ◽  
...  

Background and Purpose. The purpose of this study was to test the reliability and validity of the Quality of Interaction Survey (QuIS) using a quantification scoring approach. Methods. Baseline data from the Evidence Integration Triangle for Behavioral and Psychological Symptoms of Dementia (EIT-4-BPSD) study was used. Results. A total of 553 residents participated. There was evidence of inter-rater reliability with Kappa scores of .86 to 1.00 and internal consistency based on the Rasch analysis (item reliability of .98). There was some support for validity based on item fit and hypothesis testing, as resistiveness to care was significantly associated with total QuIS scores. Conclusion. This study supports the use of the quantified QuIS to evaluate the quality of interactions over time and to test interventions to improve interactions.
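
The inter-rater figures above are Cohen's kappa values; a minimal sketch of the statistic for two raters coding the same interactions is shown below. The category labels and ratings are made up for illustration, not actual QuIS data.

```python
from collections import Counter

def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Cohen's kappa: chance-corrected agreement between two raters."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in set(rater_a) | set(rater_b)) / (n * n)
    return (observed - expected) / (1.0 - expected)

# Hypothetical interaction codings (positive / neutral / negative) from two observers.
a = ["pos", "pos", "neu", "neg", "pos", "neu", "pos", "neg"]
b = ["pos", "pos", "neu", "neg", "neu", "neu", "pos", "neg"]
print(round(cohens_kappa(a, b), 2))  # 0.81 for this toy example
```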


2018 ◽  
Vol 122 (2) ◽  
pp. 748-772 ◽  
Author(s):  
Wen-Ta Tseng ◽  
Tzi-Ying Su ◽  
John-Michael L. Nix

This study applied the many-facet Rasch model to assess learners' translation ability in an English as a foreign language context. Few attempts have been made in extant research to detect and calibrate rater severity in the domain of translation testing. To fill this research gap, this study documented the process of validating a test of Chinese-to-English sentence translation and modeled raters' scoring propensity, defined by harshness or leniency, expert/novice effects on severity, and concomitant effects on item difficulty. Two hundred twenty-five third-year Taiwanese senior high school students and six educators from tertiary and secondary educational institutions served as participants. The students' mean age was 17.80 years (SD = 1.20, range 17–19). The exam consisted of 10 translation items adapted from two entrance exams. The results showed that this subjectively scored performance assessment exhibited robust unidimensionality, thus reliably measuring translation ability free from unmodeled disturbances. Furthermore, discrepancies in ratings between novice and expert raters were also identified and modeled by the many-facet Rasch model. The implications of applying the many-facet Rasch model in translation tests at the tertiary level are discussed.
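
In the many-facet Rasch model, the log-odds of success are modeled as person ability minus item difficulty minus rater severity. The sketch below evaluates that dichotomous form with hypothetical facet measures; the study itself rated polytomous translation performances, so this is only an illustration of how rater severity shifts the expected outcome.

```python
import math

def mfrm_probability(ability: float, item_difficulty: float, rater_severity: float) -> float:
    """Many-facet Rasch model (dichotomous form):
    log-odds of success = ability - item difficulty - rater severity."""
    logit = ability - item_difficulty - rater_severity
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical facet measures in logits (not estimates from the study).
student_ability = 0.5
item_difficulty = 0.2
lenient_rater, harsh_rater = -0.4, 0.6

print(f"lenient rater: {mfrm_probability(student_ability, item_difficulty, lenient_rater):.2f}")  # ~0.67
print(f"harsh rater:   {mfrm_probability(student_ability, item_difficulty, harsh_rater):.2f}")    # ~0.43
```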


2019 ◽  
Vol 17 (1) ◽  
Author(s):  
A. A. Jolijn Hendriks ◽  
Sarah C. Smith ◽  
Nick Black

Background. In previous work we concluded that DEMQOL and DEMQOL-Proxy can provide robust measurement of HRQL in dementia when scores are derived from analysis using the Rasch model. As the study sample included people with mild cognitive impairment, we undertook a replication study in the subsample with a diagnosis of dementia (PWD). PWD constitute the population for whom DEMQOL and DEMQOL-Proxy were originally developed. Methods. We conducted a Rasch model analysis using the RUMM2030 software to re-evaluate DEMQOL (441 PWD) and DEMQOL-Proxy (342 family carers). We evaluated scale to sample targeting, ordering of item thresholds, item fit to the model, differential item functioning (sex, age, severity, relationship), local independence, unidimensionality and reliability. Results. For both DEMQOL and DEMQOL-Proxy, results were highly similar to the results in the original sample. We found the same problems with content and response options. Conclusions. DEMQOL and DEMQOL-Proxy can provide robust measurement of HRQL in people with a diagnosis of dementia when scores are derived from analysis using the Rasch model. As in the wider sample, the problems identified with content and response options require qualitative investigation in order to improve the scoring of DEMQOL and DEMQOL-Proxy.


FENOMENA ◽  
2018 ◽  
Vol 10 (2) ◽  
pp. 117-134
Author(s):  
Sari Agung Sucahyo ◽  
Widya Noviana Noor

An achievement test, like any test, has to be of good quality. A good test can give correct information about teaching, whereas a poor achievement test gives poor information about students' success in achieving the instructional objectives; in other words, the test has to meet the characteristics of a good test. So far, no effort had been made to identify the quality of the achievement test used in the Intensive English program, so information about its quality was not available. The researchers were therefore interested in analyzing the quality of the achievement test for students in the Intensive English program of IAIN Samarinda. The design of this research is content analysis. The subjects were the English achievement tests, and 28 to 30 students were involved in the try-out process. Data were collected in three steps and analyzed in terms of validity, reliability, and item quality. The findings reveal that 60% of the tests have good construct validity justified by related theories and 55% have good content validity. The reliability coefficient of the first test format is 0.65, and that of the second test format is 0.52. The calculation of item difficulty shows that 68% of the test items fall between 0.20 and 0.80, and the estimation of item discrimination shows that 73% of the items fall between 0.20 and 0.50, while the calculation of distracter efficiency shows that 65% of the distracters were effective in distracting the test takers.
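
For readers unfamiliar with the indices reported above, the sketch below computes classical item difficulty and a simple discrimination index (upper-group minus lower-group proportion correct, using top and bottom halves rather than the 27% groups often used). The response matrix is invented, not data from the IAIN Samarinda tests.

```python
# Minimal classical item analysis on a 0/1 response matrix (rows = students, columns = items).

def item_difficulty(responses, item):
    """Proportion of students answering the item correctly (0.20-0.80 is usually acceptable)."""
    return sum(row[item] for row in responses) / len(responses)

def item_discrimination(responses, item):
    """Upper-group minus lower-group proportion correct, using halves split by total score."""
    ranked = sorted(responses, key=sum, reverse=True)
    half = len(ranked) // 2
    upper, lower = ranked[:half], ranked[-half:]
    p_upper = sum(row[item] for row in upper) / half
    p_lower = sum(row[item] for row in lower) / half
    return p_upper - p_lower

responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
]
for item in range(4):
    print(f"item {item + 1}: p = {item_difficulty(responses, item):.2f}, "
          f"D = {item_discrimination(responses, item):.2f}")
```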


2020 ◽  
Vol 3 (2) ◽  
pp. 133
Author(s):  
Thresia Trivict Semiun ◽  
Fransiska Densiana Luruk

This study aimed at examining the quality of an English summative test for grade VII in a public school located in Kupang. In particular, it examined content validity and reliability and conducted item analysis covering item validity, item difficulty, item discrimination, and distracter effectiveness. This was descriptive evaluative research using documentation to collect data. The data were analyzed quantitatively, except for content validity, which was analyzed qualitatively by matching the test items with the materials stated in the curriculum. The findings revealed that the English summative test had high content validity. Reliability was estimated with the Kuder-Richardson formula (KR-20); the result showed that the test was reliable and very good for a classroom test. The item analysis, conducted using ITEMAN 3.0, revealed that the test consisted mostly of easy items, that most of the items could discriminate among the students, that most distracters performed well, and that most of the items were valid.
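
The KR-20 coefficient mentioned above can be computed directly from a scored response matrix, as in the sketch below. The matrix is hypothetical, not data from the Kupang summative test.

```python
import statistics

def kr20(responses):
    """Kuder-Richardson 20 reliability for a 0/1 response matrix (rows = students, columns = items)."""
    k = len(responses[0])                      # number of items
    totals = [sum(row) for row in responses]   # each student's total score
    var_total = statistics.pvariance(totals)   # population variance of total scores
    pq = 0.0
    for item in range(k):
        p = sum(row[item] for row in responses) / len(responses)
        pq += p * (1.0 - p)
    return (k / (k - 1)) * (1.0 - pq / var_total)

# Hypothetical scored answers.
responses = [
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
print(f"KR-20 = {kr20(responses):.3f}")  # 0.625 for this toy matrix
```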


Jurnal Elemen ◽  
2022 ◽  
Vol 8 (1) ◽  
pp. 66-76
Author(s):  
Karlimah Karlimah

This article explains how to analyze test items on arithmetic operations with fractions to obtain the items' levels of difficulty and fit. Data were collected using multiple-choice questions given to 50 fourth-grade students of an elementary school in Tasikmalaya city. The answers were then analyzed using the Rasch model with the Winsteps 3.75 application, based on a combination of the standard deviation (SD) and the mean of the logit values. The score data of each person and question were used to estimate pure scores on the logit scale, indicating the level of difficulty of the test items. The categories were: difficult (logit value above +1 SD); very difficult (logit between 0.0 and +1 SD); easy (logit between 0.0 and −1 SD); very easy (logit value below −1 SD). Three criteria were used to determine the difficulty and fit of the questions: the Outfit Z-Standard (ZSTD) value, the Outfit Mean Square (MNSQ), and the Point Measure Correlation. The analysis produced a collection of test items suitable for use at several difficulty levels, namely difficult, very difficult, easy, and very easy, in place of the previous items, which had only difficult, medium, and easy categories. The Rasch model can help categorize questions and students' ability levels.
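
Below is a minimal sketch of the mean-and-SD categorization described above, keeping the label-to-range pairing as stated in the abstract (0.0 logits as the conventional item mean, ±1 SD as cut-offs). The item measures are invented, not the Tasikmalaya results.

```python
import statistics

def categorize(measures):
    """Group item difficulty measures (logits) into the four categories in the abstract,
    using 0.0 logits and ±1 SD of the measures as cut-offs."""
    sd = statistics.pstdev(measures)
    labels = {}
    for item, b in enumerate(measures, start=1):
        if b > sd:
            labels[item] = "difficult"
        elif b > 0.0:
            labels[item] = "very difficult"
        elif b > -sd:
            labels[item] = "easy"
        else:
            labels[item] = "very easy"
    return labels

# Hypothetical item measures in logits.
measures = [-1.8, -0.9, -0.3, 0.2, 0.7, 1.5]
print(categorize(measures))
```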

