Rating Scales in Accounting Research: The Impact of Scale Points and Labels

2015 ◽  
Vol 27 (2) ◽  
pp. 35-51 ◽  
Author(s):  
Jared Eutsler ◽  
Bradley Lang

Rating scales are one of the most widely used tools in behavioral research. Decisions regarding scale design can have a potentially profound effect on research findings. Despite this importance, an analysis of extant literature in top accounting journals reveals a wide variety of rating scale compositions. The purpose of this paper is to experimentally investigate the impact of scale characteristics on participants' responses. Two experiments are conducted that manipulate the number of scale points and the corresponding labels to study their influence on the statistical properties of the resultant data. Results suggest that scale design impacts the statistical characteristics of response data and emphasize the importance of labeling all scale points. A scale with all points labeled effectively minimizes response bias, maximizes variance, maximizes power, and minimizes error. This analysis also suggests variance may be maximized when the scale length is set at 7 points. Although researchers commonly believe that additional scale points will maximize variance, results indicate that increasing scale points beyond 7 does not increase variance. Taken together, a fully labeled 7-point scale may provide the greatest benefits to researchers. The importance of scale labels is a significant contribution to accounting research, as only 5 percent of the accounting studies reviewed reported scales with all points labeled.
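To make the variance claim concrete, the sketch below simulates ratings on 5-, 7-, and 11-point scales by discretizing a latent attitude and compares the variance of the rescaled responses. It is a minimal illustration with invented data and an assumed latent-variable model, not the authors' experimental procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_responses(n_points, n_respondents=500):
    """Discretize a standard-normal latent attitude into 1..n_points
    categories (illustrative only, not the paper's design)."""
    latent = rng.normal(size=n_respondents)
    edges = np.linspace(-3, 3, n_points - 1)
    return np.digitize(latent, edges) + 1

for k in (5, 7, 11):
    responses = simulate_responses(k)
    # Rescale to 0-1 so variances are comparable across scale lengths.
    rescaled = (responses - 1) / (k - 1)
    print(f"{k}-point scale: variance = {rescaled.var():.4f}")
```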

2006 ◽  
Vol 22 (4) ◽  
pp. 259-267 ◽  
Author(s):  
Eelco Olde ◽  
Rolf J. Kleber ◽  
Onno van der Hart ◽  
Victor J.M. Pop

Childbirth has been identified as a possible traumatic experience, leading to traumatic stress responses and even to the development of posttraumatic stress disorder (PTSD). The current study investigated the psychometric properties of the Dutch version of the Impact of Event Scale-Revised (IES-R) in a group of women who recently gave birth (N = 435). In addition, a comparison was made between the original IES and the IES-R. The scale showed high internal consistency (α = 0.88). Confirmatory factor analysis found no support for a three-factor structure comprising intrusion, avoidance, and hyperarousal factors. Goodness of fit was only reasonable, even after fitting one intrusion item on the hyperarousal scale. The IES-R correlated significantly with scores on depression and anxiety self-rating scales, as well as with scores on a self-rating scale of posttraumatic stress disorder. Although the IES-R can be used for studying posttraumatic stress reactions in women who recently gave birth, the original IES proved to be a better instrument than the IES-R. It is concluded that adding the hyperarousal scale to the IES-R did not make the scale stronger.
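As a point of reference for the α = 0.88 figure, internal consistency of this kind is typically computed as Cronbach's alpha from a respondents-by-items matrix. The sketch below shows a generic implementation with placeholder data (random responses, so the resulting value is meaningless); it is not the study's dataset or analysis code.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a (n_respondents, n_items) score matrix."""
    k = items.shape[1]
    item_var = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)

# Placeholder: 435 respondents x 22 IES-R items scored 0-4 (random values).
rng = np.random.default_rng(1)
demo = rng.integers(0, 5, size=(435, 22)).astype(float)
print(f"alpha = {cronbach_alpha(demo):.2f}")
```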


Author(s):  
Rachel Cooper

Psychiatric research currently faces multiple crises; one is that trust in reported research findings has been eroded. Concerns that much research serves the interests of industry rather than the interests of patients have become mainstream. Such worries are not unique to psychiatry, but extend to many areas of science. One way in which such concerns can be ameliorated is via the development of more amateur, citizen, and user-led research. I argue that promoting research conducted outside of traditional academic settings promises a range of benefits – both to the non-traditional researchers themselves and to others who want truths to be discovered. Having argued that it would be a good idea to have more user-produced research, I discuss how research by users might be facilitated or hindered by changes to the informational infrastructure of science. In particular, I discuss how different styles of classification and rating scale can facilitate the work of some research communities and set back the work of others.


Author(s):  
Geoffrey M. Hudson ◽  
Yao Lu ◽  
Xiaoke Zhang ◽  
James Hahn ◽  
Johannah E. Zabal ◽  
...  

The creation of personalized avatars that may be morphed to simulate realistic changes in body size is useful when studying self-perception of body size. One drawback is that these methods are resource intensive compared to rating scales that rely upon generalized drawings. Little is known about how body perception ratings compare across different methods, particularly across differing levels of personalized detail in visualizations. This knowledge is essential to inform future decisions about the appropriate tradeoff between personalized realism and resource availability. The current study aimed to determine the impact of varying degrees of personalized realism on self-perception of body size. We explored this topic in young adult women, using a generalized line drawing scale as well as several types of personalized avatars, including 3D textured images presented in immersive virtual reality (VR). Body perception ratings using generalized line drawings were often higher than responses using individualized visualization methods. While the personalized details seemed to help with identification, there were few differences among the three conditions containing different amounts of individualized realism (e.g., photo-realistic texture). These results suggest that using scales based on personalized texture and limb dimensions is beneficial, although presentation in immersive VR may not be essential.
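A small illustration of the kind of paired comparison underlying the observation that line-drawing ratings ran higher than ratings made with personalized avatars; the scores are invented and the Wilcoxon signed-rank test is merely one reasonable choice, not necessarily the analysis used in the study.

```python
import numpy as np
from scipy.stats import wilcoxon

# Hypothetical body-size ratings from the same participants under two methods.
line_drawing = np.array([6, 5, 7, 6, 5, 6, 7, 5, 6, 6])
personalized_avatar = np.array([5, 5, 6, 5, 4, 5, 6, 5, 5, 5])

stat, p = wilcoxon(line_drawing, personalized_avatar)
print(f"Wilcoxon statistic = {stat:.1f}, p = {p:.3f}")
```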


2020 ◽  
pp. 089443932090246
Author(s):  
Jan Karem Höhne ◽  
Dagmar Krebs ◽  
Steffen-M. Kühnel

In social science research, unipolar and bipolar scales are commonly used to measure respondents' attitudes and opinions. Compared to other rating scale characteristics, scale polarity (unipolar versus bipolar) and its effects on response behavior have rarely been addressed in previous research. To fill this gap in the literature, we investigate whether and to what extent fully verbalized unipolar and bipolar scales influence response behavior by analyzing observed and latent response distributions and latent thresholds of response categories. For this purpose, we conducted a survey experiment in a probability-based online panel and randomly assigned respondents to a unipolar or bipolar scale condition. The results reveal substantial differences between the two rating scales: significantly different response distributions and measurement non-invariance. In addition, response categories (and latent thresholds) of unipolar and bipolar scales are not equally distributed. The findings show that responses to unipolar and bipolar scales differ not only on the observational level but also on the latent level. Both rating scales vary with respect to their measurement properties, so that the responses obtained using each scale are not easily comparable. We recommend that unipolar and bipolar scales not be treated as interchangeable.
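As a rough, observed-level illustration (the latent-level analysis in the article relies on measurement-invariance modeling not reproduced here), one could test whether category frequencies differ between scale conditions with a chi-square test; the counts below are hypothetical.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical category counts for one fully verbalized 5-category item,
# cross-tabulated by scale condition.
counts = np.array([
    [40, 85, 120, 95, 60],   # unipolar condition
    [25, 60, 140, 110, 65],  # bipolar condition
])

chi2, p, dof, expected = chi2_contingency(counts)
print(f"chi2 = {chi2:.2f}, df = {dof}, p = {p:.4f}")
```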


2020 ◽  
Author(s):  
Alexandria Remus ◽  
Valerie Smith ◽  
Francesca Wuytack

Background: As the development of core outcome sets (COS) increases, guidance for developing and reporting high-quality COS continues to evolve; however, a number of methodological uncertainties still remain. The objectives of this study were: (1) to explore the impact of including patient interviews in developing a COS, (2) to examine the impact of using a 5-point versus a 9-point rating scale during Delphi consensus methods on outcome selection and (3) to inform and contribute to COS development methodology by advancing the evidence base on COS development techniques. Methods: Semi-structured patient interviews and a nested randomised controlled parallel group trial were conducted as part of the Pelvic Girdle Pain Core Outcome Set project (PGP-COS). Patient interviews, as an adjunct to a systematic review of outcomes reported in previous studies, were undertaken to identify preliminary outcomes for inclusion in a Delphi consensus survey. In the Delphi survey, participants were randomised (1:1) to a 5-point or 9-point rating scale for rating the importance of the list of preliminary outcomes. Results: Four of the eight patient-interview-derived outcomes were included in the preliminary COS; however, none of these outcomes were included in the final PGP-COS. The 5-point rating scale resulted in twice as many outcomes reaching consensus after the 3-round Delphi survey compared to the 9-point scale. Consensus on all five outcomes included in the final PGP-COS was achieved by participants allocated to the 5-point rating scale, whereas consensus on four of these was achieved by those using the 9-point scale. Conclusions: Using patient interviews to identify preliminary outcomes as an adjunct to conducting a systematic review of outcomes measured in the literature did not appear to influence outcome selection in developing the COS in this study. The use of different rating scales in a Delphi survey, however, did appear to impact outcome selection. The 5-point scale demonstrated greater congruency than the 9-point scale with the outcomes included in the final PGP-COS. Future research to substantiate our findings and to explore the impact of other rating scales on outcome selection during COS development, however, is warranted.
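For readers unfamiliar with Delphi consensus rules, the sketch below applies one common rule (an outcome reaches consensus when at least 70 percent of participants rate it in the top importance band). The threshold and the way the 5-point and 9-point bands are collapsed are assumptions for illustration, not necessarily the PGP-COS definitions.

```python
from typing import Sequence

def reaches_consensus(ratings: Sequence[int], top_band: set, threshold: float = 0.70) -> bool:
    """True if the share of ratings in the top band meets the threshold
    (illustrative rule only)."""
    share = sum(r in top_band for r in ratings) / len(ratings)
    return share >= threshold

# Hypothetical ratings of one outcome by the two randomised groups.
ratings_5pt = [5, 4, 5, 4, 3, 5, 4, 4, 5, 5]   # 1-5 scale, top band 4-5
ratings_9pt = [7, 9, 6, 8, 5, 9, 7, 6, 8, 7]   # 1-9 scale, top band 7-9

print(reaches_consensus(ratings_5pt, top_band={4, 5}))
print(reaches_consensus(ratings_9pt, top_band={7, 8, 9}))
```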


1986 ◽  
Vol 58 (1) ◽  
pp. 63-66 ◽  
Author(s):  
Alan M. Beck ◽  
Louisa Seraydarian ◽  
G. Frederick Hunter

This study compared the impact of therapy and activity groups on two matched groups of 8 and 9 psychiatric inpatients. Daily sessions were held for 11 wk. in identical rooms, except that one room contained caged animals (four finches). The patients were evaluated before and after the sessions using standard psychiatric rating scales. The group that met in the room with the finches had significantly better attendance and participation and improved significantly on areas assessed by the Brief Psychiatric Rating Scale. Other positive trends indicated that the study should be replicated with larger samples and modified to increase interactions with the animals.


2021 ◽  
Vol 8 ◽  
Author(s):  
Corinna C. A. Clark ◽  
Nicola J. Rooney

Rating scales are widely used to rate working dog behavior and performance. Whilst behavior scales have been extensively validated, instruments used to rate ability have usually been designed by training and practitioner organizations, and often little consideration has been given to how seemingly insignificant aspects of the scale design might alter the validity of the results obtained. Here we illustrate how manipulating one aspect of rating scale design, the provision of verbal benchmarks or labels (as opposed to just a numerical scale), can affect the ability of observers to distinguish between differing levels of search dog performance in an operational environment. Previous studies have found evidence for range restriction (using only part of the scale) in raters' use of the scales and variability between raters in their understanding of the traits used to measure performance. As provision of verbal benchmarks has been shown to help raters in a variety of disciplines to select appropriate scale categories (or scores), it may be predicted that inclusion of verbal benchmarks will bring raters' conceptualization of the traits closer together, increasing agreement between raters, improving the ability of observers to distinguish between differing levels of search dog performance, and reducing range restriction. To test the value of verbal benchmarking we compared inter-rater reliability, raters' ability to discriminate between different levels of search dog performance, and their use of the whole scale before and after being presented with benchmarked scales for the same traits. Raters scored the performance of two separate types of explosives search dog (High Assurance Search (HAS) and Vehicle Search (VS) dogs) from short (~30 s) video clips, using 11 previously validated traits. Taking each trait in turn, for the first five clips raters were asked to give a score from 1, representing the lowest amount of the trait evident, to 5, representing the highest. Raters were then given a list of adjective-based benchmarks (e.g., very low, low, intermediate, high, very high) and scored a further five clips for each trait. For certain traits, the reliability of scoring improved when benchmarks were provided (e.g., Motivation and Independence), indicating that their inclusion may reduce ambivalence in scoring, ambiguity of meanings, and cognitive difficulty for raters. However, this effect was not universal, with the ratings of some traits remaining unchanged (e.g., Control), or even reducing in reliability (e.g., Distraction). There were also some differences between VS and HAS (e.g., Confidence reliability increased for VS raters and decreased for HAS raters). There were few improvements in the spread of scores across the range, but some indication of more favorable scoring. This was a small study of operational handlers and trainers utilizing training video footage from realistic operational environments, and there are potential confounding effects. We discuss possible causal factors, including issues specific to raters and possible deficiencies in the chosen benchmarks, and suggest ways to further improve the effectiveness of rating scales. This study illustrates why it is vitally important to validate all aspects of rating scale design, even if they may seem inconsequential, as relatively small changes to the amount and type of information provided to raters can have both positive and negative impacts on the data obtained.
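One simple way to quantify the inter-rater reliability comparison described above is a mean pairwise rank correlation across raters, computed separately before and after the benchmarks were provided. The data and the agreement index below are illustrative assumptions, not the study's statistics.

```python
import numpy as np
from itertools import combinations
from scipy.stats import spearmanr

def mean_pairwise_agreement(scores: np.ndarray) -> float:
    """scores: (n_raters, n_clips) matrix of 1-5 trait ratings.
    Returns the mean pairwise Spearman correlation across raters."""
    pairs = combinations(range(scores.shape[0]), 2)
    corrs = [spearmanr(scores[i], scores[j]).correlation for i, j in pairs]
    return float(np.mean(corrs))

# Hypothetical ratings of five clips by four raters on one trait.
before = np.array([[3, 4, 2, 5, 1],
                   [2, 5, 3, 4, 2],
                   [4, 3, 2, 5, 3],
                   [3, 4, 1, 4, 2]])
after = np.array([[3, 4, 2, 5, 1],
                  [3, 4, 2, 5, 2],
                  [4, 4, 2, 5, 1],
                  [3, 5, 2, 4, 1]])

print(f"before benchmarks: {mean_pairwise_agreement(before):.2f}")
print(f"after benchmarks:  {mean_pairwise_agreement(after):.2f}")
```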


2020 ◽  
Vol 36 (4) ◽  
Author(s):  
Nguyen Thi Ngoc Quynh ◽  
Nguyen Thi Quynh Yen ◽  
Tran Thi Thu Hien ◽  
Nguyen Thi Phuong Thao ◽  
Bui Thien Sao ◽  
...  

Playing a vital role in assuring the reliability of language performance assessment, rater training has been a topic of interest in research on large-scale testing. Similarly, in the context of VSTEP, the effectiveness of the rater training program has been of great concern. This research was therefore conducted to investigate the impact of the VSTEP speaking rating scale training session in the rater training program provided by the University of Languages and International Studies - Vietnam National University, Hanoi. Data were collected from 37 rater trainees of the program. Their ratings before and after the training session on the VSTEP.3-5 speaking rating scales were then compared. In particular, dimensions of score reliability, criterion difficulty, rater severity, rater fit, rater bias, and score band separation were analyzed. Positive results were detected: the post-training ratings were more reliable, consistent, and distinguishable. Improvements were most noticeable for score band separation and slighter in other aspects. Meaningful implications in terms of both future practices of rater training and rater training research methodology could be drawn from the study.
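The severity, fit, and bias dimensions mentioned here are usually estimated with many-facet Rasch measurement; as a deliberately simplified stand-in, the sketch below approximates rater severity as each rater's mean deviation from the cross-rater mean per examinee. The numbers are invented and the method is only an approximation, not the study's analysis.

```python
import numpy as np

# Hypothetical speaking scores: rows are raters, columns are examinees.
scores = np.array([
    [6.0, 7.5, 5.0, 8.0],
    [5.5, 7.0, 4.5, 7.5],
    [6.5, 8.0, 5.5, 8.5],
])

examinee_means = scores.mean(axis=0)                 # consensus per examinee
severity = (scores - examinee_means).mean(axis=1)    # negative = harsher rater

for i, s in enumerate(severity, start=1):
    print(f"rater {i}: mean deviation = {s:+.2f}")
```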


2020 ◽  
pp. 002076402097973
Author(s):  
Alessandro Gentile ◽  
Julio Torales ◽  
Marcelo O’Higgins ◽  
Pamela Figueredo ◽  
Joao Mauricio Castaldelli-Maia ◽  
...  

Background: The current COVID-19 pandemic is affecting the mental health of the global population and, particularly, of people suffering from preexisting mental disorders. Aims: This study aims to report on findings from a phone-based clinical follow-up conducted in two large catchment areas in Italy and Paraguay during the COVID-19 lockdown, in order to provide psychiatric assessments and measure the level of stress related to the quarantine in a large sample of psychiatric outpatients. Methods: A clinical phone-based follow-up was conducted in two large catchment areas in the province of Chieti (Vasto, Italy) and the City of Asunción (Paraguay) during the COVID-19 national lockdown. The following rating scales were employed: Hamilton Anxiety Rating Scale (HAM-A); Hamilton Depression Rating Scale (HAM-D); 18-item Brief Psychiatric Rating Scale (BPRS-18). The psychological distress related to the outbreak was assessed using the Impact of Event Scale – Revised (IES-R). Results: A total of 110 outpatients who reported a stable phase of illness before the COVID-19 lockdown were consecutively included and followed. Findings confirmed a significant increase in general psychopathology, anxiety, and fear, as well as mild levels of stress related to the quarantine. Significant weight gain during the lockdown was also detected among patients. Conclusions: This study confirmed the impact of the COVID-19 lockdown on the mental health of people suffering from psychiatric disorders and may also add evidence on the use of digital psychiatry in the current pandemic.


2020 ◽  
Vol 9 (3) ◽  
pp. 291-302
Author(s):  
David Isaacs ◽  
Jessie S. Gibson ◽  
Jeffrey Stovall ◽  
Daniel O. Claassen

Background: Psychiatric symptoms are widely prevalent in Huntington’s disease (HD) and exert a greater impact on quality of life than motor manifestations. Despite this, psychiatric symptoms are frequently underrecognized and undertreated. Lack of awareness, or anosognosia, has been observed at all stages of HD and may contribute to diminished patient self-reporting of psychiatric symptoms. Objective: We sought to evaluate the impact of anosognosia on the performance of commonly used clinical rating scales for psychiatric manifestations of HD. Methods: We recruited 50 HD patients to undergo a formal psychiatrist evaluation, the Problem Behavior Assessment-Short Form (PBA-s), and validated self-report rating scales for depression, anxiety, and anger. Motor impairment, cognitive function, and total functional capacity were assessed as part of the clinical exam. Patient awareness of motor, cognitive, emotional, and functional capacities was quantified using the Anosognosia Rating Scale. Convergent validity, discriminant validity, classification accuracy, and the anosognosia effect were determined for each psychiatric symptom rating scale. Results: Anosognosia was identified in one-third of patients, and these patients underrated the severity of depression and anxiety when completing self-report instruments. Anosognosia did not clearly influence self-reported anger, but this result may have been confounded by the sub-optimal discriminant validity of the anger rating scales. Conclusion: Anosognosia undermines the reliability of self-reported depression and anxiety in HD. Self-report rating scales for depression and anxiety may have a role in screening, but results must be corroborated by provider and caregiver input when anosognosia is present. HD clinical trials utilizing patient-reported outcomes as study endpoints should routinely evaluate participants for anosognosia.
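As an illustration of how convergent validity and classification accuracy of a self-report scale can be checked against a clinician reference, the sketch below uses placeholder data and an assumed cut-off; neither the numbers nor the cut-off reflect the study's instruments.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(2)

# Placeholder data for 50 patients: a self-report depression total score and
# a clinician-derived severity rating with a binary "case" label.
self_report = rng.integers(0, 28, size=50)
clinician_severity = self_report + rng.normal(0, 4, size=50)
clinician_case = (clinician_severity > 10).astype(int)  # assumed cut-off

rho, p = spearmanr(self_report, clinician_severity)
auc = roc_auc_score(clinician_case, self_report)
print(f"convergent validity (Spearman rho) = {rho:.2f}, p = {p:.3f}")
print(f"classification accuracy (ROC AUC) = {auc:.2f}")
```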

