Item Response Theory in Cross-Cultural Measurement

Author(s):  
Thanh V. Tran ◽  
Tam Nguyen ◽  
Keith Chan

Item response theory (IRT) is a modern measurement theory that, as its name implies, focuses mainly on the item level rather than the test level. The underlying principle of IRT is that a relationship exists between an individual’s ability and how the individual responds to items on a test. IRT offers item-level details not provided through classical approaches. The aims of this chapter are to (1) provide a brief overview of IRT, (2) demonstrate the basic features of IRT using existing data, and (3) walk the reader through the key steps in conducting IRT analysis using IRTPRO®. IRT has also increasingly been used to develop, shorten, and refine psychosocial instruments.
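The relationship between ability and item response described above can be sketched in a few lines. The abstract does not name a specific model, and the chapter works in IRTPRO rather than Python, so the two-parameter logistic (2PL) form below is an assumption chosen for illustration. The function names and the parameters `a` (discrimination) and `b` (difficulty) are likewise illustrative.

```python
import math


def p_correct_2pl(theta, a, b):
    """Probability of a correct (or endorsed) response under a 2PL model.

    theta: latent ability of the respondent
    a:     item discrimination (assumed parameter name)
    b:     item difficulty (assumed parameter name)
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information_2pl(theta, a, b):
    """Fisher information the item contributes at ability level theta."""
    p = p_correct_2pl(theta, a, b)
    return a * a * p * (1.0 - p)
```

Under this sketch, a respondent whose ability equals the item difficulty has a 0.5 probability of a correct response, and that is also the point where the item is most informative.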

2021 ◽  
Vol 19 (4) ◽  
pp. 147470492110441
Author(s):  
Patrick J. Nebl ◽  
Mark G. McCoy ◽  
Garett C. Foster ◽  
Michael J. Zickar

The mate retention inventory (MRI) has been a valuable tool in the field of evolutionary psychology for the past 30 years. The goal of the current research is to subject the MRI to rigorous psychometric analysis using item response theory to answer three broad questions. Do the individual items of the MRI fit the scale well? Does the overall function of the MRI match what is predicted? Finally, do men and women respond similarly to the MRI? Using a graded response model, it was found that all but two of the items fit acceptable model patterns. Test information function analysis found that the scale acceptably captures individual differences for participants with a high degree of mate retention but captures little information from participants with a low degree of mate retention. Finally, differential item functioning analysis reveals that the MRI is better at assessing male than female participants, indicating that the scale may not be the best indicator of female behavior in a relationship. Overall, we conclude that the MRI is a good scale, especially for assessing male behavior, but it could be improved for assessing female behavior and individuals lower on overall mate retention behavior. It is suggested that this paper be used as a framework for how the newest psychometric techniques can be applied in order to create more robust and valid measures in the field of evolutionary psychology.


Politics ◽  
2019 ◽  
Vol 40 (1) ◽  
pp. 3-21 ◽  
Author(s):  
Steven M Van Hauwaert ◽  
Christian H Schimpf ◽  
Flavio Azevedo

Recent research in the populism literature has devoted considerable effort to the conceptualisation and examination of populism on the individual level, that is, populist attitudes. Despite rapid progress in the field, studies of adequate measurement and empirical evaluation of measures of populist attitudes remain scarce. Seeking to remedy these shortcomings, we apply a cross-national measurement model, using item response theory, to six established and two new populist indicators. Drawing on a cross-national survey (nine European countries, n = 18,368), we engage in a four-fold analysis. First, we examine the commonly used 6-item populism scale. Second, we expand the measurement with two novel items. Third, we use the improved 8-item populism scale to further refine equally comprehensive but more concise and parsimonious populist measurements. Finally, we externally validate these sub-scales and find that some of the proposed sub-scales outperform the initial 6- and 8-item scales. We conclude that existing measures of populism capture moderate populist attitudes, but face difficulties measuring more extreme levels, while the individual information of some of the populist items remains limited. Altogether, this provides several interesting routes for future research, both within and between countries.


2020 ◽  
Vol 35 (7) ◽  
pp. 1094-1108
Author(s):  
Morgan E Nitta ◽  
Brooke E Magnus ◽  
Paul S Marshall ◽  
James B Hoelzle

There are many challenges associated with assessment and diagnosis of ADHD in adulthood. Utilizing the graded response model (GRM) from item response theory (IRT), a comprehensive item-level analysis of adult ADHD rating scales in a clinical population was conducted with Barkley's Adult ADHD Rating Scale-IV, Self-Report of Current Symptoms (CSS), a self-report diagnostic checklist, and a similar self-report measure quantifying retrospective report of childhood symptoms, Barkley's Adult ADHD Rating Scale-IV, Self-Report of Childhood Symptoms (BAARS-C). Differences in item functioning were also considered after identifying and excluding individuals with suspect effort. Items associated with symptoms of inattention (IA) and hyperactivity/impulsivity (H/I) are endorsed differently across the lifespan, and these data suggest that they vary in their relationship to the theoretical constructs of IA and H/I. Screening for sufficient effort did not meaningfully change item-level functioning. The application of IRT to direct item-to-symptom measures allows for a unique psychometric assessment of how the current DSM-5 symptoms represent latent traits of IA and H/I. Meeting a symptom threshold of five or more symptoms may be misleading. Closer attention given to specific symptoms in the context of the clinical interview and reported difficulties across domains may lead to more informed diagnosis.


2020 ◽  
Author(s):  
E. Damiano D'Urso ◽  
Kim De Roover ◽  
Jeroen K. Vermunt ◽  
Jesper Tijmstra

In the social sciences, the study of group differences concerning latent constructs is ubiquitous. These constructs are generally measured by means of scales composed of ordinal items. In order to compare these constructs across groups, one crucial requirement is that they are measured equivalently or, in technical jargon, that measurement invariance (MI) holds across the groups. This study compared the performance of multiple group categorical confirmatory factor analysis (MG-CCFA) and multiple group item response theory (MG-IRT) in testing measurement invariance with ordinal data. A simulation study was conducted to compare the true positive rate (TPR) and false positive rate (FPR) both at the scale and at the item level for these two approaches under an invariance and a non-invariance scenario. The results of the simulation studies showed that the performance, in terms of the TPR, of MG-CCFA- and MG-IRT-based approaches mostly depends on the scale length. In fact, for long scales, the likelihood ratio test (LRT) approach, for MG-IRT, outperformed the other approaches, while, for short scales, MG-CCFA seemed to be generally preferable. In addition, the performance of MG-CCFA's fit measures, such as RMSEA and CFI, seemed to depend largely on the length of the scale, especially when MI was tested at the item level. General caution is recommended when using these measures, especially when MI is tested for each item individually. A decision flowchart, based on the results of the simulation studies, is provided to help summarize the results and indicate which approach performed best in which setting.
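The two evaluation criteria used in this simulation study can be written out as a short sketch. It assumes each simulated item carries a known truth label (non-invariant or invariant) and a flag from the detection method. The function name and the boolean-list representation are illustrative and not taken from the study.

```python
def tpr_fpr(truly_noninvariant, flagged):
    """Return (TPR, FPR) for item-level non-invariance detection.

    truly_noninvariant: list of bools, True if the item was simulated
                        as non-invariant (known only in a simulation)
    flagged:            list of bools, True if the method flagged the item
    """
    if len(truly_noninvariant) != len(flagged):
        raise ValueError("both lists must have one entry per item")
    pairs = list(zip(truly_noninvariant, flagged))
    true_pos = sum(1 for truth, flag in pairs if truth and flag)
    false_pos = sum(1 for truth, flag in pairs if not truth and flag)
    n_pos = sum(1 for truth in truly_noninvariant if truth)
    n_neg = len(truly_noninvariant) - n_pos
    # A rate is undefined when its denominator is empty, for example the
    # TPR in a fully invariant scenario.
    tpr = true_pos / n_pos if n_pos else float("nan")
    fpr = false_pos / n_neg if n_neg else float("nan")
    return tpr, fpr
```

In a full invariance scenario no item is truly non-invariant, so only the FPR is defined, which is why the sketch returns `nan` instead of raising an error.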


2020 ◽  
Vol 18 (2) ◽  
pp. 2-43
Author(s):  
William R. Dardick ◽  
Brandi A. Weiss

New variants of entropy as measures of item-fit in item response theory are investigated. Monte Carlo simulation(s) examine aberrant conditions of item-level misfit to evaluate relative (compare EMRj, X2, G2, S-X2, and PV-Q1) and absolute (Type I error and empirical power) performance. EMRj has utility in discovering misfit.


2021 ◽  
Vol 5 (1) ◽  
pp. 63
Author(s):  
Medianta Tarigan ◽  
Fadillah Fadillah

Intelligence, an individual ability that plays a broad role in everyday life, has been extensively studied and measured using psychological instruments. One of these is the Intelligenz Struktur Test (IST). However, IST content has leaked through discussions published by many parties. Moreover, the adaptation of the IST into Indonesian, which tended to translate each word literally, is suspected of introducing a bias of meaning that can affect the validity of the instrument. Therefore, this study aims to evaluate the current quality of the IST by testing the feasibility of the Indonesian-version IST items for verbal ability, namely SE (Satzergaenzung), WA (Wortauswahl), and AN (Analogien). Item response theory (IRT) was used as the research method. The data were collected from 2,064 participants living in Bandung. The analysis showed that the SE, WA, and AN subtests are still valid. Of the 60 items analyzed, 71.67% (43 items) had an acceptable estimate of the discrimination (a) parameter. In addition, the item-fit statistics showed that 78.33% of the items fit the IRT model. This shows that the Indonesian version of the IST is still valid for use, particularly in measuring verbal comprehension (V) through the three subtests (SE, WA, and AN). However, the items affected by DIF need to be revised: 25% of the items were found to show gender bias.

