The Effect of Rating Scale Design on Extreme Response Tendency in Consumer Product Ratings

Extreme response style is the tendency of individuals to prefer the extreme categories of a rating scale irrespective of item content. It has been shown repeatedly that individual response style differences affect the reliability and validity of item responses and should, therefore, be considered carefully. To account for extreme response style (ERS) in ordered categorical item responses, it has been proposed to model responder-specific sets of category thresholds in connection with established polytomous item response models. An elegant approach to achieve this is to introduce a responder-specific scaling factor that modifies intervals between thresholds. By individually expanding or contracting intervals between thresholds, preferences for selecting either the outer or inner response categories can be modeled. However, for a responder-specific scaling factor to appropriately account for ERS, there are two important aspects that have not been considered previously and which, if ignored, will lead to questionable model properties. Specifically, the centering of threshold parameters and the type of category probability logit need to be considered carefully. In the present article, a scaled threshold model is proposed, which accounts for these considerations. Instructions on model fitting are given together with SAS PROC NLMIXED program code, and the model’s application and interpretation is demonstrated using simulation studies and two empirical examples.

Download Full-text

Item Response Tree Models to Investigate Acquiescence and Extreme Response Styles in Likert-Type Rating Scales

Educational and Psychological Measurement ◽

10.1177/0013164419829855 ◽

2019 ◽

Vol 79 (5) ◽

pp. 911-930 ◽

Cited By ~ 3

Author(s):

Minjeong Park ◽

Amery D. Wu

Keyword(s):

Item Response ◽

Rating Scales ◽

Rating Scale ◽

Explanatory Models ◽

Response Styles ◽

Modeling Framework ◽

Typical Application ◽

Item Response Models ◽

Extreme Response ◽

Tree Models

Item response tree (IRTree) models are recently introduced as an approach to modeling response data from Likert-type rating scales. IRTree models are particularly useful to capture a variety of individuals’ behaviors involving in item responding. This study employed IRTree models to investigate response styles, which are individuals’ tendencies to prefer or avoid certain response categories in a rating scale. Specifically, we introduced two types of IRTree models, descriptive and explanatory models, perceived under a larger modeling framework, called explanatory item response models, proposed by De Boeck and Wilson. This extends the typical application of IRTree models for studying response styles. As a demonstration, we applied the descriptive and explanatory IRTree models to examine acquiescence and extreme response styles in Rosenberg’s Self-Esteem Scale. Our findings suggested the presence of two distinct extreme response styles and acquiescence response style in the scale.

Download Full-text

The Importance of Rating Scale Design in the Measurement of Patient-Reported Outcomes Using Questionnaires or Item Banks

Investigative Opthalmology & Visual Science ◽

10.1167/iovs.12-9728 ◽

2012 ◽

Vol 53 (7) ◽

pp. 4042 ◽

Cited By ~ 20

Author(s):

Jyoti Khadka ◽

Colm McAlinden ◽

Vijaya K. Gothwal ◽

Ecosse L. Lamoureux ◽

Konrad Pesudovs

Keyword(s):

Patient Reported Outcomes ◽

Rating Scale ◽

Item Banks ◽

Patient Reported ◽

Scale Design

Download Full-text

A Cross-National Comparison of Extreme Response Style Measures

International Journal of Market Research ◽

10.2501/ijmr-2014-005 ◽

2014 ◽

Vol 56 (1) ◽

pp. 89-110 ◽

Cited By ~ 9

Author(s):

Robert A. Peterson ◽

Pablo Rhi-Perez ◽

Gerald Albaum

Keyword(s):

Rating Scale ◽

Response Style ◽

Traditional Measure ◽

A Value ◽

Extreme Response Style ◽

Extreme Response ◽

National Comparison ◽

The Individual ◽

Study Participants ◽

Cross National

Five measures of extreme response style were compared across 6,146 study participants from 36 countries: the traditional measure, a modified traditional measure, the individual standard deviation, an index of dispersion and an index of entropy. The traditional measure of extreme response style, whereby the two extreme categories of an item or rating scale are assigned a value of ‘1’, all interior categories are assigned a value of ‘0’ and the sum of the ‘1’ values reflects the extent of extreme responding behaviour, performed slightly better than the other extreme response style measures examined with respect to reliability and ability to discriminate. The traditional measure of extreme response style was positively related to the variance of an attitudinal variable but unrelated to its mean. It was also related to Hofstede's cultural orientation variables of individualism-collectivism and power distance. Future cross-cultural and cross-national empirical research should systematically incorporate measures of extreme responding so that more is learned about the phenomenon and its possible effects.

Download Full-text

The Child Rating Scale and its Use with Middle School-Age Students

Psychological Reports ◽

10.2466/pr0.2000.87.2.381 ◽

2000 ◽

Vol 87 (2) ◽

pp. 381-388

Author(s):

Winston J. Hagborg

Keyword(s):

Middle School ◽

Convergent Validity ◽

Rating Scale ◽

Self Report ◽

School Age ◽

Acting Out ◽

Rule Compliance ◽

Psychological Sense ◽

Scale Design ◽

Report Measure

The Child Rating Scale is a socioemotional self-report rating scale design for elementary school children. This study examined the Child Rating Scale with a middle school-age sample (Grades 5 to 8) of 240 students. The Child Rating Scale's four scales have shown moderate to high coefficients alpha. Factor analysis yielded the 4 underlying factors consistent with the current subscales. Supportive convergent validity was found based on the Child Rating Scale subscales' predicted association with the Self-perception Profile for Children and the Psychological Sense of School Membership–Brief. Consistent with current research, decline over grades in rule compliance/acting out and school interest was documented as well as the expected mean sex differences on these two subscales. Possible areas of study are indicated, and the present study's limitations are described. The Child Rating Scale seems to be a promising self-report measure for middle school-age youth.

Download Full-text

Extreme response tendency as a function of emotional adjustment.

Journal of Abnormal & Social Psychology ◽

10.1037/h0048877 ◽

1964 ◽

Vol 69 (6) ◽

pp. 654-657 ◽

Cited By ~ 17

Author(s):

Melvin Zax ◽

Dwight H. Gardiner ◽

David G. Lowy

Keyword(s):

Emotional Adjustment ◽

Response Tendency ◽

Extreme Response

Download Full-text

Does Benchmarking of Rating Scales Improve Ratings of Search Performance Given by Specialist Search Dog Handlers?

Frontiers in Veterinary Science ◽

10.3389/fvets.2021.545398 ◽

2021 ◽

Vol 8 ◽

Author(s):

Corinna C. A. Clark ◽

Nicola J. Rooney

Keyword(s):

Rating Scales ◽

Rating Scale ◽

Search Performance ◽

Operational Environment ◽

Range Restriction ◽

Video Footage ◽

Before And After ◽

High Assurance ◽

And Performance ◽

Scale Design

Rating scales are widely used to rate working dog behavior and performance. Whilst behaviour scales have been extensively validated, instruments used to rate ability have usually been designed by training and practitioner organizations, and often little consideration has been given to how seemingly insignificant aspects of the scale design might alter the validity of the results obtained. Here we illustrate how manipulating one aspect of rating scale design, the provision of verbal benchmarks or labels (as opposed to just a numerical scale), can affect the ability of observers to distinguish between differing levels of search dog performance in an operational environment. Previous studies have found evidence for range restriction (using only part of the scale) in raters' use of the scales and variability between raters in their understanding of the traits used to measures performance. As provision of verbal benchmarks has been shown to help raters in a variety of disciplines to select appropriate scale categories (or scores), it may be predicted that inclusion of verbal benchmarks will bring raters' conceptualization of the traits closer together, increasing agreement between raters, as well as improving the ability of observers to distinguish between differing levels of search dog performance and reduce range restriction. To test the value of verbal benchmarking we compared inter-rater reliability, raters' ability to discriminate between different levels of search dog performance, and their use of the whole scale before and after being presented with benchmarked scales for the same traits. Raters scored the performance of two separate types of explosives search dog (High Assurance Search (HAS) and Vehicle Search (VS) dogs), from short (~30 s) video clips, using 11 previously validated traits. Taking each trait in turn, for the first five clips raters were asked to give a score from 1, representing the lowest amount of the trait evident to 5, representing the highest. Raters were given a list of adjective-based benchmarks (e.g., very low, low, intermediate, high, very high) and scored a further five clips for each trait. For certain traits, the reliability of scoring improved when benchmarks were provided (e.g., Motivation and Independence), indicating that their inclusion may potentially reduce ambivalence in scoring, ambiguity of meanings, and cognitive difficulty for raters. However, this effect was not universal, with the ratings of some traits remaining unchanged (e.g., Control), or even reducing in reliability (e.g., Distraction). There were also some differences between VS and HAS (e.g., Confidence reliability increased for VS raters and decreased for HAS raters). There were few improvements in the spread of scores across the range, but some indication of more favorable scoring. This was a small study of operational handlers and trainers utilizing training video footage from realistic operational environments, and there are potential cofounding effects. We discuss possible causal factors, including issues specific to raters and possible deficiencies in the chosen benchmarks, and suggest ways to further improve the effectiveness of rating scales. This study illustrates why it is vitally important to validate all aspects of rating scale design, even if they may seem inconsequential, as relatively small changes to the amount and type of information provided to raters can have both positive and negative impacts on the data obtained.

Download Full-text

Rating Scales in Accounting Research: The Impact of Scale Points and Labels

Behavioral Research in Accounting ◽

10.2308/bria-51219 ◽

2015 ◽

Vol 27 (2) ◽

pp. 35-51 ◽

Cited By ~ 19

Author(s):

Jared Eutsler ◽

Bradley Lang

Keyword(s):

Rating Scales ◽

Rating Scale ◽

Statistical Characteristics ◽

Accounting Research ◽

Response Data ◽

Resultant Data ◽

Research Findings ◽

Scale Characteristics ◽

Scale Design ◽

The Impact

ABSTRACT Rating scales are one of the most widely used tools in behavioral research. Decisions regarding scale design can have a potentially profound effect on research findings. Despite this importance, an analysis of extant literature in top accounting journals reveals a wide variety of rating scale compositions. The purpose of this paper is to experimentally investigate the impact of scale characteristics on participants' responses. Two experiments are conducted that manipulate the number of scale points and the corresponding labels to study their influence on the statistical properties of the resultant data. Results suggest that scale design impacts the statistical characteristics of response data and emphasize the importance of labeling all scale points. A scale with all points labeled effectively minimizes response bias, maximizes variance, maximizes power, and minimizes error. This analysis also suggests variance may be maximized when the scale length is set at 7 points. Although researchers commonly believe using additional scale points will maximize variance, results indicate increasing scale points beyond 7 does not increase variance. Taken together, a fully labeled 7-point scale may provide the greatest benefits to researchers. The importance of scale labels provides a significant contribution to accounting research as only 5 percent of the accounting studies reviewed have reported scales with all points labeled.

Download Full-text