The Two Errors of Using the Within-Subject Standard Deviation (WSD) as the Standard Error of a Reliable Change Index

2010 ◽  
Vol 25 (5) ◽  
pp. 451-456 ◽  
Author(s):  
G. H. Maassen
2000 ◽  
Vol 6 (3) ◽  
pp. 364-364 ◽  
Author(s):  
NANCY R. TEMKIN ◽  
ROBERT K. HEATON ◽  
IGOR GRANT ◽  
SUREYYA S. DIKMEN

Hinton-Bayre (2000) raises a point that may occur to many readers who are familiar with the Reliable Change Index (RCI). In our previous paper comparing four models for detecting significant change in neuropsychological performance (Temkin et al., 1999), we used a formula for calculating Sdiff, the measure of variability for the test–retest difference, that differs from the one Hinton-Bayre has seen employed in other studies of the RCI. In fact, there are two ways of calculating Sdiff: a direct method and an approximate method. As stated by Jacobson and Truax (1991, p. 14), the direct method is to compute “the standard error of the difference between the two test scores” or equivalently $\sqrt{s_1^2 + s_2^2 - 2 s_1 s_2 r_{xx'}}$, where $s_i$ is the standard deviation at time $i$ and $r_{xx'}$ is the test–retest correlation or reliability coefficient. Jacobson and Truax also provide a formula for the approximation of Sdiff when one does not have access to retest data on the population of interest, but does have a test–retest reliability coefficient and an estimate of the cross-sectional standard deviation, i.e., the standard deviation at a single point in time. This approximation assumes that the standard deviations at Time 1 and Time 2 are equal, which may be close to true in many cases. Since we had the longitudinal data to directly calculate the standard error of the difference between scores at Time 1 and Time 2, we used the direct method. Which method is preferable? When the needed data are available, it is the one we used.
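The relationship between the two calculations can be illustrated with a minimal sketch. The direct formula is the one quoted above; the approximate form used here, a standard error of measurement s·√(1 − r) combined as √(2·SE²), is the commonly cited Jacobson and Truax version and is stated as an assumption, since the letter itself does not spell it out. The function names and numeric values are hypothetical.

```python
import math

def sdiff_direct(s1, s2, r):
    """SD of the test-retest difference, using the SDs at both time points."""
    return math.sqrt(s1**2 + s2**2 - 2 * s1 * s2 * r)

def sdiff_approx(s, r):
    """Approximation from one cross-sectional SD (assumes s1 == s2 == s)."""
    sem = s * math.sqrt(1 - r)       # standard error of measurement
    return math.sqrt(2 * sem**2)     # two equally noisy measurements

# Hypothetical values, chosen only for illustration.
s1, s2, r = 10.0, 12.0, 0.80
print("direct, unequal SDs:", sdiff_direct(s1, s2, r))
print("approximate, Time 1 SD only:", sdiff_approx(s1, r))
print("direct, equal SDs:", sdiff_direct(s1, s1, r))
```

When the two standard deviations are equal, the direct formula reduces algebraically to the approximate one; the two diverge only as the Time 1 and Time 2 standard deviations differ.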


2021 ◽  
Author(s):  
Ron D. Hays ◽  
Mary E. Slaughter ◽  
Karen L. Spritzer ◽  
Patricia M. Herman

Abstract Background: Identifying how many individuals significantly improve (“responders”) provides important supplementary information beyond group mean change about the effects of treatment options. This supplemental information can enhance interpretation of clinical trials and observational studies. This study provides a comparison of five ways of estimating the significance of individual change. Methods: Secondary analyses were conducted of the Impact Stratification Score (ISS) for chronic low back pain, which was administered at two timepoints in two samples: 1) three months apart in an observational study of 1,680 patients undergoing chiropractic care; and 2) 6 weeks apart in a randomized trial of 720 active-duty military personnel with low back pain. The ISS is the sum of the PROMIS-29 v2.1 physical function, pain interference and pain intensity scores and has a possible range of 8 (least impact) to 50 (greatest impact). The five methods of evaluating individual change compared were: 1) standard deviation index; 2) standard error of measurement (SEM); 3) standard error of estimate; 4) standard error of prediction; and 5) reliable change index. Results: Internal consistency reliability of the ISS at baseline was 0.90 in Sample 1 and 0.92 in Sample 2. Effect size of change on the ISS was -0.16 in Sample 1 and -0.59 in Sample 2. The denominators for the five methods in Sample 1 (Sample 2) were 7.6 (8.4) for the standard deviation index, 2.4 (2.4) for the SEM, 2.3 (2.3) for the standard error of estimate, and 3.3 (3.4) for the standard error of prediction and the reliable change index. The amount of change on the ISS needed for significant individual change in both samples was about 15-16 for the standard deviation index, 5 for the SEM and for the standard error of estimate, and 7 for the standard error of prediction and reliable change index. The percentage of people classified as responders ranged from 1% (standard deviation index in Sample 1) to 57% (SEM and standard error of estimate in Sample 2). Conclusions: The standard error of prediction and reliable change index estimates of significant change are consistent with retrospective ratings of change of at least moderately better in prior research. These two are less likely than other methods to classify people as responders who have not actually gotten better.
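The five denominators can be approximated from the baseline standard deviation and reliability alone. The sketch below uses the textbook forms of each statistic, which are an assumption here because the abstract does not state its formulas; it also assumes the internal consistency reliability is the coefficient used and that "significant" means a two-sided 5% criterion (z = 1.96). The function name and dictionary keys are hypothetical.

```python
import math

def change_denominators(sd, r):
    """Textbook denominators for five individual-change indices (assumed forms)."""
    sem = sd * math.sqrt(1 - r)                    # standard error of measurement
    return {
        "standard deviation index": sd,
        "SEM": sem,
        "standard error of estimate": sd * math.sqrt(r * (1 - r)),
        "standard error of prediction": sd * math.sqrt(1 - r**2),
        "reliable change index": math.sqrt(2) * sem,
    }

Z = 1.96  # assumed two-sided 5% criterion

# Baseline SD and reliability as reported in the abstract.
for label, sd, r in [("Sample 1", 7.6, 0.90), ("Sample 2", 8.4, 0.92)]:
    for name, denom in change_denominators(sd, r).items():
        print(f"{label} | {name}: denominator {denom:.1f}, "
              f"change needed {Z * denom:.1f}")
```

Under these assumptions the computed values land close to the reported denominators, though the last digit can differ from the abstract where it reports a single figure for the standard error of prediction and the reliable change index together.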


2021 ◽  
Author(s):  
Ron D. Hays ◽  
Mary E. Slaughter ◽  
Patricia M. Herman

Abstract Background: Identifying how many individuals significantly improve (“responders”) provides important supplementary information beyond group mean change about the effects of treatment options. This supplemental information can enhance interpretation of clinical trials and observational studies. This study provides a comparison of five ways of estimating the significance of individual change. Methods: Secondary analyses were conducted of the Impact Stratification Scale (ISS) for chronic low back pain, which was administered at two timepoints three months apart in an observational study of 1,680 patients undergoing chiropractic care. The ISS is the sum of the PROMIS-29 v2.1 physical function, pain interference and pain intensity scores and has a possible range of 8 (least impact) to 50 (greatest impact). The five methods of evaluating individual change compared were: 1) standard deviation index; 2) confidence interval around the standard error of measurement (SEM); 3) standard error of estimate; 4) standard error of prediction; and 5) reliable change index. Results: Internal consistency reliability of the ISS at baseline was 0.90. Effect size of change on the ISS was -0.16 using the SD (7.6) at baseline. The denominators for the five methods were 7.6 for the standard deviation index, 2.4 for the confidence interval around the SEM, 2.3 for the standard error of estimate, and 3.3 for the standard error of prediction and the reliable change index. The amount of change on the ISS needed for significant individual change was 15 for the standard deviation index, 5 for the confidence interval around the SEM and for the standard error of estimate, and 7 for the standard error of prediction and reliable change index. The percentage of people classified as responders ranged from 1% (standard deviation index) to 22% (standard error of prediction and reliable change index). Conclusions: The standard error of prediction and reliable change index estimates of significant change are consistent with retrospective ratings of change of at least moderately better in prior research. These two are less likely than other methods to classify people as responders who have not actually gotten better.


2004 ◽  
Vol 10 (6) ◽  
pp. 899-901 ◽  
Author(s):  
NANCY R. TEMKIN

Different authors have used different estimates of variability in the denominator of the Reliable Change Index (RCI). Maassen attempts to clarify some of the differences and the assumptions underlying them. In particular he compares the ‘classical’ approach using an estimate SEd supposedly based on measurement error alone with an estimate SDiff based on the variability of observed differences in a population that should have no true change. Maassen concludes that not only is SEd based on classical theory, but it properly estimates variability due to measurement error and practice effect while SDiff overestimates variability by accounting twice for the variability due to practice. Simulations show Maassen to be wrong on both accounts. With an error rate nominally set to 10%, RCI estimates using SDiff wrongly declare change in 10.4% and 9.4% of simulated cases without true change while estimates using SEd wrongly declare change in 17.5% and 12.3% of the simulated cases (p < .000000001 and p < .008, respectively). In the simulation that separates measurement error and practice effects, SEd estimates the variability of change due to measurement error to be .34, when the true variability due to measurement error was .014. Neuropsychologists should not use SEd in the denominator of the RCI. (JINS, 2004, 10, 899–901.)
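The kind of comparison described can be sketched with a small simulation. This is not a reconstruction of the simulations reported in the paper: the data-generating model (a stable true score, independent measurement error at each occasion, and a practice effect that varies from person to person), all parameter values, and the form SEd = s1·√(2(1 − r)) are assumptions made for illustration only.

```python
import math
import random
import statistics

def simulate(n, rng, sd_error=0.5, practice_mean=0.3, practice_sd=0.5):
    """Return (time1, time2) scores for n people with no true change (assumed model)."""
    t1, t2 = [], []
    for _ in range(n):
        true_score = rng.gauss(0.0, 1.0)
        practice = rng.gauss(practice_mean, practice_sd)  # varies by person
        t1.append(true_score + rng.gauss(0.0, sd_error))
        t2.append(true_score + practice + rng.gauss(0.0, sd_error))
    return t1, t2

def pearson(x, y):
    """Pearson correlation, written out to avoid version-specific helpers."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def false_positive_rates(n=20000, seed=1, z_crit=1.645):
    """Share of no-change cases flagged as changed under each denominator."""
    rng = random.Random(seed)
    # Normative sample: estimate the mean practice effect and both denominators.
    n1, n2 = simulate(n, rng)
    diffs = [b - a for a, b in zip(n1, n2)]
    mean_diff = statistics.fmean(diffs)
    s_diff = statistics.stdev(diffs)                        # SD of observed differences
    se_d = statistics.stdev(n1) * math.sqrt(2 * (1 - pearson(n1, n2)))
    # Fresh sample, again with no true change, scored against those norms.
    x1, x2 = simulate(n, rng)
    centered = [b - a - mean_diff for a, b in zip(x1, x2)]

    def rate(denom):
        return sum(abs(d) / denom > z_crit for d in centered) / n

    return rate(s_diff), rate(se_d)

rate_sdiff, rate_sed = false_positive_rates()
print(f"nominal 10% | SDiff: {rate_sdiff:.3f} | SEd: {rate_sed:.3f}")
```

Under this assumed model, SEd understates the spread of observed differences because it takes no account of person-to-person variation in practice effects, so it flags more unchanged cases than the nominal rate. Different modelling assumptions would change the size of that gap.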


2004 ◽  
Vol 10 (6) ◽  
pp. 888-893 ◽  
Author(s):  
GERARD H. MAASSEN

Researchers and clinicians using Jacobson and Truax's index to assess the reliability of change in patients, or its counterpart by Chelune et al., which takes practice effects into account, are confused by the different ways of calculating the standard error encountered in the literature (see the discussion started in this journal by Hinton-Bayre). This article compares the characteristics of (1) the standard error used by Jacobson and Truax, (2) the standard error of difference scores used by Temkin et al. and (3) an adaptation of Jacobson and Truax's approach that accounts for difference between initial and final variance. It is theoretically demonstrated that the last variant is preferable, which is corroborated by real data. (JINS, 2004, 10, 888–893.)


Author(s):  
Carolin Szász-Janocha ◽  
Eva Vonderlin ◽  
Katajun Lindenberg

Abstract. Objective: The relatively new disorder of computer gaming and internet addiction has attracted increasing research attention in recent years. The inclusion of “Gaming Disorder” in the ICD-11 (International Statistical Classification of Diseases and Related Health Problems) has heightened the need for evidence-based, effective interventions. PROTECT+ is a cognitive-behavioral group therapy program for adolescents with symptoms of computer gaming and internet addiction. The present study evaluates its medium-term effects after 4 months. Method: N = 54 patients aged 9 to 19 years (M = 13.48; SD = 1.72) took part in the early intervention study in Heidelberg between April 2016 and December 2017. Symptom severity was assessed at baseline, at the end of group therapy, and after 4 months using standardized diagnostic instruments. Results: Multilevel analyses showed a significant reduction in symptom severity on the Computerspielabhängigkeitsskala (CSAS) after 4 months. The self-report questionnaire showed a small effect (d = 0.35) and the parent rating a medium effect (d = 0.77). The Reliable Change Index, computed from the Compulsive Internet Use Scale (CIUS), pointed to strong heterogeneity in individual symptom trajectories. Patients rated the program with high satisfaction at both follow-up assessments. Conclusions: This is one of the few studies internationally to demonstrate a reduction in symptoms of computer gaming and internet addiction in adolescence over 4 months.


1. It is widely felt that any method of rejecting observations with large deviations from the mean is open to some suspicion. Suppose that by some criterion, such as Peirce’s and Chauvenet’s, we decide to reject observations with deviations greater than 4 σ, where σ is the standard error, computed from the standard deviation by the usual rule; then we reject an observation deviating by 4·5 σ, and thereby alter the mean by about 4·5 σ/n, where n is the number of observations, and at the same time we reduce the computed standard error. This may lead to the rejection of another observation deviating from the original mean by less than 4 σ, and if the process is repeated the mean may be shifted so much as to lead to doubt as to whether it is really sufficiently representative of the observations. In many cases, where we suspect that some abnormal cause has affected a fraction of the observations, there is a legitimate doubt as to whether it has affected a particular observation. Suppose that we have 50 observations. Then there is an even chance, according to the normal law, of a deviation exceeding 2·33 σ. But a deviation of 3 σ or more is not impossible, and if we make a mistake in rejecting it the mean of the remainder is not the most probable value. On the other hand, an observation deviating by only 2 σ may be affected by an abnormal cause of error, and then we should err in retaining it, even though no existing rule will instruct us to reject such an observation. It seems clear that the probability that a given observation has been affected by an abnormal cause of error is a continuous function of the deviation; it is never certain or impossible that it has been so affected, and a process that completely rejects certain observations, while retaining with full weight others with comparable deviations, possibly in the opposite direction, is unsatisfactory in principle.

