Methods for Estimating Item-Score Reliability

2018 ◽  
Vol 42 (7) ◽  
pp. 553-570 ◽  
Author(s):  
Eva A. O. Zijlmans ◽  
L. Andries van der Ark ◽  
Jesper Tijmstra ◽  
Klaas Sijtsma

Reliability is usually estimated for a test score, but it can also be estimated for item scores. Item-score reliability can be useful for assessing an item's contribution to the reliability of the test score, for identifying unreliable scores in aberrant item-score patterns in person-fit analysis, and for selecting the most reliable item from a test to use as a single-item measure. Four methods for estimating item-score reliability were discussed: the Molenaar–Sijtsma method (method MS), Guttman's method λ6, the latent class reliability coefficient (method LCRC), and the correction for attenuation (method CA). A simulation study compared the methods with respect to median bias, variability (interquartile range [IQR]), and percentage of outliers. The simulation study consisted of six conditions: standard, polytomous items, unequal discrimination (α) parameters, two-dimensional data, long test, and small sample size. Methods MS and CA were the most accurate. Method LCRC showed almost unbiased results but large variability. Method λ6 consistently underestimated item-score reliability but showed a smaller IQR than the other methods.
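For orientation, method CA builds on Spearman's classical correction for attenuation, which rescales an observed correlation by the reliabilities of the two measures; the paper adapts this idea to item scores, so the general formula is shown here only as background:

\[
\rho_{T_X T_Y} \;=\; \frac{\rho_{XY}}{\sqrt{\rho_{XX'}\,\rho_{YY'}}}
\]

where \(\rho_{XY}\) is the observed correlation between scores X and Y, and \(\rho_{XX'}\) and \(\rho_{YY'}\) are their reliabilities.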

Author(s):  
Pavel Mozgunov ◽  
Rochelle Knight ◽  
Helen Barnett ◽  
Thomas Jaki

There is growing interest in Phase I dose-finding studies that investigate several doses of more than one agent simultaneously. A number of combination dose-finding designs have recently been proposed to guide escalation/de-escalation decisions during such trials. The majority of these proposals are model-based: a parametric combination-toxicity relationship is fitted as data accumulate. Various parametric forms have been considered, but the unifying theme for many of them is that typically between 4 and 6 parameters are to be estimated. While more parameters allow for more flexible modelling of the combination-toxicity relationship, this is a challenging estimation problem given the typically small sample size of between 20 and 60 patients in Phase I trials. These concerns gave rise to an ongoing debate about whether including more parameters in the combination-toxicity model leads to more accurate combination selection. In this work, we extensively study two variants of a 4-parameter logistic model with a reduced number of parameters to investigate the effect of modelling assumptions. A framework to calibrate the prior distributions for a given parametric model is proposed to allow for fair comparisons. Via a comprehensive simulation study, we found that including the interaction parameter between the two compounds provides no benefit, on average, in the accuracy of selection, but results in fewer patients being allocated to the target combination during the trial.
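The modelling question can be made concrete with a toy logistic combination-toxicity model. The sketch below is illustrative only, with made-up coefficients and standardized doses; it is not the authors' calibrated model, but it shows how fixing the interaction parameter to zero yields a reduced variant of the kind under study.

```python
import numpy as np

def tox_prob(d1, d2, beta):
    """Toxicity probability under a simple 4-parameter logistic
    combination model: logit p = b0 + b1*d1 + b2*d2 + b3*d1*d2.
    Setting b3 = 0 drops the drug-drug interaction term."""
    b0, b1, b2, b3 = beta
    eta = b0 + b1 * d1 + b2 * d2 + b3 * d1 * d2
    return 1.0 / (1.0 + np.exp(-eta))

# Compare full and reduced models on a 3 x 3 grid of standardized doses
doses = np.linspace(0.2, 1.0, 3)
g1, g2 = np.meshgrid(doses, doses)
full = tox_prob(g1, g2, (-3.0, 1.5, 1.5, 0.8))     # with interaction
reduced = tox_prob(g1, g2, (-3.0, 1.5, 1.5, 0.0))  # interaction removed
print(np.round(full - reduced, 3))  # where the extra parameter matters
```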


2012 ◽  
Author(s):  
Nor Haniza Sarmin ◽  
Md Hanafiah Md Zin ◽  
Rasidah Hussin

A transformation of the mean was carried out using a bias-correction estimator to produce a statistic for testing hypotheses about the mean of skewed distributions. The resulting statistic involves a modification of the variable. A simulation study of the Type I error probability for several skewed distributions (exponential, chi-square, and Weibull) shows that the t3 statistic is suitable for a left-tailed test with a small sample size (n = 5). Key words: Mean; statistic; skewed distribution; bias correction estimator; Type I Error
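The t3 statistic itself is not reproduced in this abstract, but the simulation design is easy to illustrate: the sketch below estimates the Type I error probability of the ordinary (uncorrected) one-sample t statistic for a left-tailed test under an exponential parent with n = 5, the setting in which a skewness correction is needed.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n, reps, alpha = 5, 100_000, 0.05
mu0 = 1.0  # true mean of Exponential(1), so H0 holds

rejections = 0
for _ in range(reps):
    x = rng.exponential(scale=1.0, size=n)
    t = (x.mean() - mu0) / (x.std(ddof=1) / np.sqrt(n))
    if t < stats.t.ppf(alpha, df=n - 1):  # left-tailed rejection region
        rejections += 1

# Without a skewness correction the empirical rate drifts from 0.05
print(f"Empirical Type I error: {rejections / reps:.4f} (nominal {alpha})")
```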


2012 ◽  
Vol 9 (1) ◽  
pp. 13-18 ◽  
Author(s):  
AK Kumar Hemanth ◽  
V Sudha ◽  
G Ramachandran

Introduction: Treatment of tuberculosis (TB) requires a combination of drugs. Isoniazid (INH) and pyrazinamide (PZA) are key components of the first-line regimen used in the treatment of TB, and monitoring these drug levels in plasma would help in better patient care. The objective of the study was to develop and validate a simple and rapid high performance liquid chromatographic method for simultaneous determination of INH and PZA in human plasma. Methodology: The method involved deproteinisation of plasma with para-hydroxybenzaldehyde and trifluoroacetic acid and analysis using a reversed-phase C8 column and UV detection at 267 nm. The flow rate was set at 1.5 ml/min at ambient temperature. The accuracy, linearity, precision, specificity, stability and recovery of the method were evaluated. The method was applied to estimate plasma INH and PZA collected from six children with TB. Results: Well resolved peaks of PZA and INH at retention times of 3.2 and 6.1 minutes, respectively, were obtained. The assay was linear from 0.25–10.0 µg/ml for INH and 1.25–50.0 µg/ml for PZA. The within-day and between-day relative standard deviations for standards were below 10%. The average recoveries of INH and PZA from plasma were 104% and 102%, respectively. Conclusions: A rapid and accurate method for simultaneous determination of INH and PZA in plasma was validated. The assay spans the concentration range of clinical interest. The easy sample preparation and small sample size make this assay highly suitable for pharmacokinetic studies of INH and PZA in TB patients. SAARC Journal of Tuberculosis, Lung Diseases & HIV/AIDS 2012; IX (1) 13-18 DOI: http://dx.doi.org/10.3126/saarctb.v9i1.6960
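The validation metrics named in the abstract (linearity, precision, recovery) reduce to simple calculations on calibration and replicate data. The sketch below uses entirely hypothetical peak areas over the stated INH calibration range; it illustrates the arithmetic only, not the study's data.

```python
import numpy as np

# Hypothetical calibration data: concentrations (ug/ml) vs peak areas
conc = np.array([0.25, 0.5, 1.0, 2.5, 5.0, 10.0])  # INH range from the abstract
area = np.array([11.8, 24.1, 47.9, 121.3, 243.0, 488.5])

# Linearity: least-squares calibration line and r^2
slope, intercept = np.polyfit(conc, area, 1)
pred = slope * conc + intercept
r2 = 1 - np.sum((area - pred) ** 2) / np.sum((area - area.mean()) ** 2)

# Precision: relative standard deviation (RSD) of replicate injections
replicates = np.array([47.2, 48.1, 47.7, 48.4, 47.5])  # hypothetical
rsd = 100 * replicates.std(ddof=1) / replicates.mean()

# Recovery: back-calculated spiked concentration vs nominal 1.0 ug/ml
recovery = 100 * ((48.0 - intercept) / slope) / 1.0

print(f"r^2 = {r2:.4f}, RSD = {rsd:.2f}%, recovery = {recovery:.1f}%")
```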


2020 ◽  
Vol 29 (11) ◽  
pp. 3166-3178 ◽  
Author(s):  
Ben Van Calster ◽  
Maarten van Smeden ◽  
Bavo De Cock ◽  
Ewout W Steyerberg

When developing risk prediction models on datasets with limited sample size, shrinkage methods are recommended. Earlier studies showed that shrinkage results in better predictive performance on average. This simulation study aimed to investigate the variability of regression shrinkage effects on predictive performance for a binary outcome. We compared standard maximum likelihood with the following shrinkage methods: uniform shrinkage (likelihood-based and bootstrap-based), penalized maximum likelihood (ridge), LASSO logistic regression, adaptive LASSO, and Firth's correction. In the simulation study, we varied the number of predictors and their strength, the correlation between predictors, the event rate of the outcome, and the number of events per variable. We focused on the calibration slope, which indicates whether risk predictions are too extreme (slope < 1) or not extreme enough (slope > 1). The results can be summarized in three main findings. First, shrinkage improved calibration slopes on average. Second, the between-sample variability of calibration slopes was often increased relative to maximum likelihood. In contrast to the other shrinkage approaches, Firth's correction had a small shrinkage effect but showed low variability. Third, the correlation between the estimated shrinkage and the optimal shrinkage needed to remove overfitting was typically negative, with Firth's correction as the exception. We conclude that, despite improved performance on average, shrinkage often worked poorly in individual datasets, particularly when it was most needed. The results imply that shrinkage methods do not solve the problems associated with small sample size or a low number of events per variable.
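The calibration slope that the findings hinge on has a standard computation: regress the observed binary outcome on the linear predictor (the logit of the predicted risk) and read off the slope. A minimal sketch on simulated, perfectly calibrated data (all names and data are illustrative):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Hypothetical validation set: predicted risks p_hat and observed outcomes y
p_hat = rng.uniform(0.05, 0.95, size=500)
y = rng.binomial(1, p_hat)  # perfectly calibrated by construction

# Logistic regression of y on the linear predictor logit(p_hat);
# slope < 1 flags predictions that are too extreme, slope > 1 too modest.
lp = np.log(p_hat / (1 - p_hat))
fit = sm.GLM(y, sm.add_constant(lp), family=sm.families.Binomial()).fit()
print(f"calibration slope = {fit.params[1]:.3f}")  # close to 1.0 here
```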


2007 ◽  
Vol 28 (8) ◽  
pp. 921-926 ◽  
Author(s):  
Robin S. Hanna ◽  
Steven L. Haddad ◽  
Martin L. Lazarus

Background: Osteolysis after total ankle arthroplasty (TAA) has become a major concern for long-term implant survival. The primary goal of this study was to determine whether CT is more sensitive than plain radiographs in detecting the presence and extent of periprosthetic lucency. A secondary goal was to determine whether lack of syndesmotic fusion was associated with more extensive lucency. Methods: Seventeen patients (19 ankles) who had TAA between 2001 and 2003 were consecutively recruited and evaluated as part of a prospective study. Plain radiographs and helical CT with metal-artifact minimization were obtained. Evidence of lucent lesions and syndesmotic fusion was compared between the two imaging techniques. Results: Of the 19 ankles imaged, a total of 29 lesions were detected by CT, whereas plain radiographs detected 18 lesions. CT detected 21 lesions smaller than 200 mm², of which plain radiographs detected only 11. The mean size of the lesions detected on CT was over three times larger than their size on plain radiographs. With the small sample size used, there were no statistically significant differences in the extent (p = 0.84) or location (p = 0.377) of lucency between ankles with and without fusion of the syndesmosis. Conclusion: CT is a more accurate method than plain radiographs for early detection and quantification of periprosthetic lucency. Accurate evaluation of lucent lesions may identify patients at high risk for lack of syndesmotic fusion with subsequent loosening and implant failure.


2017 ◽  
Vol 35 (5_suppl) ◽  
pp. 157-157
Author(s):  
Grace Mausisa ◽  
Judy Mastick ◽  
Melissa Mazor ◽  
Steven M. Paul ◽  
Bruce A. Cooper ◽  
...  

157 Background: Chemotherapy-induced neuropathy (CIN) is the most prevalent neurologic complication of cancer treatment. Inter-individual variability exists in survivors' reports of the factors that aggravate CIN in their hands. The purpose of this study was to identify groups of survivors with CIN in their hands based on distinct aggravating factors and to evaluate for differences in demographic, clinical, and symptom characteristics and quality of life (QOL) based on group membership. Methods: Cancer survivors (n = 307) who received a platinum and/or a taxane and rated their altered sensation/pain in their hands at > 3 on a 0–10 scale were enrolled and completed study questionnaires, including a list of 22 factors that could make pain worse. Medical records were reviewed and sensory and motor tests were done. Latent class analysis was used to identify groups of survivors based on the occurrence rates for aggravating factors. Differences among the groups were evaluated using parametric and nonparametric statistics. Results: Three groups were identified based on occurrence rates for aggravating factors in the hands (i.e., Activity and Temperature (41.0%), Activity (8.7%), and Few Factors (52.2%)). No differences were found among the groups in demographic characteristics or in sensory (light touch, temperature, pain, vibration) and motor (grip strength, pegboard) tests. Compared to the Few Factors group, the Activity and Temperature group had more comorbidities, poorer sleep, greater fatigue, and more anxious and depressive symptoms. Survivors who received a platinum compound were more likely to be in the Activity and Temperature group. Those who received a taxane compound were more likely to be in the Few Factors or Activity groups. Conclusions: Survivors who reported a higher occurrence of aggravating factors had a higher symptom burden and poorer QOL. Most differences were found between the Few Factors group and the Activity and Temperature group, which may be due to the small sample size of the Activity group. Objective measures did not differ among the groups. Findings suggest that subgroups of survivors can be identified based on their reports of CIN aggravating factors.
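For readers unfamiliar with the method, latent class analysis for binary indicators (such as the 22 aggravating factors here) can be fitted with a short EM algorithm. The sketch below is a generic illustration on simulated data, not the study's analysis or software.

```python
import numpy as np

def lca_em(X, n_classes=3, n_iter=200, seed=0):
    """EM for a latent class model with binary indicators.
    X: (n_subjects, n_items) 0/1 matrix of factor reports.
    Returns class prevalences pi and per-class item probabilities theta."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    pi = np.full(n_classes, 1.0 / n_classes)
    theta = rng.uniform(0.25, 0.75, size=(n_classes, m))
    for _ in range(n_iter):
        # E-step: class responsibilities for each subject (log scale)
        log_lik = X @ np.log(theta).T + (1 - X) @ np.log(1 - theta).T
        log_post = np.log(pi) + log_lik
        log_post -= log_post.max(axis=1, keepdims=True)
        resp = np.exp(log_post)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: update prevalences and item-endorsement probabilities
        pi = resp.mean(axis=0)
        theta = (resp.T @ X) / resp.sum(axis=0)[:, None]
        theta = theta.clip(1e-6, 1 - 1e-6)
    return pi, theta

# Toy usage: 300 subjects, 22 binary aggravating-factor indicators
rng = np.random.default_rng(1)
X = rng.binomial(1, 0.4, size=(300, 22))
pi, theta = lca_em(X)
print(np.round(pi, 3))  # estimated class prevalences
```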


2020 ◽  
pp. 001316442095806
Author(s):  
Shiyang Su ◽  
Chun Wang ◽  
David J. Weiss

S-X² is a popular item fit index that is available in commercial software packages such as flexMIRT. However, no research has systematically examined the performance of S-X² for detecting item misfit within the context of the multidimensional graded response model (MGRM). The primary goal of this study was to evaluate the performance of S-X² under two practical misfit scenarios: first, all items are misfitting due to model misspecification, and second, a small subset of items violate the underlying assumptions of the MGRM. Simulation studies showed that caution should be exercised when reporting item fit results of polytomous items using S-X² within the context of the MGRM, because of its inflated false positive rates (FPRs), especially with a small sample size and a long test. S-X² performed well when detecting overall model misfit as well as item misfit for a small subset of items when the ordinality assumption was violated. However, under a number of conditions of model misspecification or items violating the homogeneous discrimination assumption, even though true positive rates (TPRs) of S-X² were high when a small sample size was coupled with a long test, the inflated FPRs were generally directly related to the increasing TPRs. There was also a suggestion that the performance of S-X² was affected by the magnitude of misfit within an item. There was no evidence that FPRs for fitting items were exacerbated by the presence of a small percentage of misfitting items among them.
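For orientation, the dichotomous form of this Orlando–Thissen statistic compares observed and model-expected proportions correct within summed-score groups; the study uses its generalization to graded responses under the MGRM, so the formula below is background only:

\[
S\text{-}X^2_i \;=\; \sum_{k} N_k \,\frac{\left(O_{ik} - E_{ik}\right)^2}{E_{ik}\left(1 - E_{ik}\right)}
\]

where \(N_k\) is the number of examinees in summed-score group k, and \(O_{ik}\) and \(E_{ik}\) are the observed and model-predicted proportions answering item i correctly in that group.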


2020 ◽  
Vol 40 (2) ◽  
pp. 183-197
Author(s):  
Nicholas Mitsakakis ◽  
Karen E. Bremner ◽  
George Tomlinson ◽  
Murray Krahn

Background. Quality-of-life research and cost-effectiveness analyses frequently require data on health utility, a global measure of health-related quality of life. When utilities are unavailable, researchers have "mapped" descriptive instruments to utility instruments, using samples of responses to both instruments. Health utilities have an idiosyncratic distribution, with an upper bound and probability mass at 1, left skewness, and kurtosis. Estimating mean utility values conditional on covariates is of interest, particularly in health utility mapping applications. Traditional linear regression may be unsuitable because its fundamental assumptions are violated, while more complex statistical methods come with deficiencies that may outweigh their benefits. Aim. To investigate the benefits of transforming the health utility response variable before fitting a linear regression model. Methods. We compared log, logit, arcsin, and Box-Cox transformations with an untransformed model, using several measures of model accuracy. We evaluated them by designing and conducting a simulation study and by reanalyzing data from two published studies that "mapped" a psychometric descriptive instrument to a utility instrument. Results. In the simulation study, the log transformation with smearing estimator had in most cases the lowest bias but one of the highest variances, especially for estimating low utility values under a small sample size. The untransformed model was outperformed by the transformed models. Findings were inconclusive for the analysis of real data, where arcsin gave the lowest error for one of the data sets, while the untransformed model had the best performance for the other. Conclusions. We identified the benefits of transformations and offer suggestions for future modeling of health utilities. However, the benefits were moderate, and no single transformation appeared to be universally optimal, suggesting that selection requires examination on a case-by-case basis.
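Of the transformations compared, the log transformation with smearing estimator is the least self-explanatory: naively exponentiating predictions from a model fitted on the log scale biases the conditional mean, and Duan's smearing factor corrects this. A minimal sketch under an assumed disutility formulation (all data and the 1 - u transformation are illustrative, not the study's models):

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical mapping data: descriptive score x and utility u in (0, 1]
x = rng.uniform(0, 1, 400)
u = np.clip(1 - np.exp(-(1.5 * x + rng.normal(0, 0.3, 400))), 0.01, 1.0)

# Fit OLS on the log of disutility (1 - u)
y = np.log(1 - u + 1e-6)
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# Duan's smearing estimator: E[1 - u | x] ~= exp(x'beta) * mean(exp(resid))
smear = np.exp(resid).mean()
u_pred = 1 - np.exp(X @ beta) * smear
print(f"mean predicted utility: {u_pred.mean():.3f} vs observed {u.mean():.3f}")
```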


1974 ◽  
Vol 57 (1) ◽  
pp. 130-133 ◽  
Author(s):  
Henry P Fleming ◽  
Roger L Thompson ◽  
John L Etchells

Abstract A simple, accurate method for determining carbon dioxide in fermenting cucumber brines is described. The method involves distillation of carbon dioxide from the acidified brine into standardized sodium hydroxide inside a closed jar. The sample is injected by a syringe and needle through a rubber serum stopper placed in the jar cap, into an acid solution. A small vial of sodium hydroxide placed inside the jar traps the carbon dioxide as it distills from the acidified solution. After being held in the jar 24 hr at 37°C, the vial is removed; the remaining base is titrated to the phenolphthalein end point with standardized hydrochloric acid. Advantages of the method include a limited working time, minimized loss of carbon dioxide during analysis, and a relatively small sample size.
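A worked sketch of the back-titration arithmetic implied by the procedure, assuming the usual carbonate chemistry in the trapping vial: absorbed carbon dioxide consumes two equivalents of base,

\[
\mathrm{CO_2} + 2\,\mathrm{NaOH} \longrightarrow \mathrm{Na_2CO_3} + \mathrm{H_2O},
\]

and titration to the phenolphthalein endpoint neutralizes the residual NaOH while converting carbonate only as far as bicarbonate, so

\[
n_{\mathrm{HCl}} = \left(n_{\mathrm{NaOH}} - 2\,n_{\mathrm{CO_2}}\right) + n_{\mathrm{CO_2}}
\quad\Longrightarrow\quad
n_{\mathrm{CO_2}} = n_{\mathrm{NaOH}} - n_{\mathrm{HCl}},
\]

with amounts in millimoles of the standardized solutions.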

