Assessing the Validity of Job Ratings: An Empirical Study of False Reporting in Task Inventories

1995
Vol 24 (4)
pp. 451-460
Author(s):  
Douglas E. Pine

Despite the widespread use of task inventories in job analysis, little is known about the validity of the obtained task ratings. One approach for examining the validity of such ratings is the use of a “false reporting” index to identify invalid responding. The purpose of this field experiment was to examine the effects of the type of frequency rating scale and the method of task inventory administration on the degree of false reporting in task inventory ratings. A total of 177 Correctional Officers from a state correctional system responded to a 68-item task inventory using frequency and importance rating scales. Five of the items in the task inventory were bogus tasks not performed by the target job and formed a false reporting index. In a 2 × 2 design, the type of frequency rating scale (Relative-Time-Spent vs. Actual-Time-Spent) and the method of task inventory administration (anonymous vs. identified) were manipulated. Analysis of variance results showed a significantly greater degree of false reporting in Relative-Time-Spent ratings. No significant differences in false reporting were found for method of task inventory administration or for the scale × method interaction. Overall, 45% of respondents indicated that they performed tasks that were not part of the job, which raises concerns about whether job incumbents are capable of providing accurate and complete task rating data.
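As a rough illustration of how a false reporting index of this kind could be scored, the sketch below counts, for each respondent, how many bogus tasks received a nonzero frequency rating, and then reports the share of respondents who endorsed at least one. The abstract does not state the scoring rule, so the item identifiers, the coding of 0 as "not performed", and both function names are assumptions made for the example, and the data are invented.

```python
# Hypothetical sketch: the item IDs, the 0 = "not performed" coding, and the
# scoring rule below are assumptions, not details taken from the study.

def false_reporting_index(ratings, bogus_items):
    """Count the bogus tasks a respondent claims to perform (rating above 0)."""
    return sum(1 for item in bogus_items if ratings.get(item, 0) > 0)


def share_of_false_reporters(respondents, bogus_items):
    """Fraction of respondents who endorse at least one bogus task."""
    if not respondents:
        raise ValueError("need at least one respondent")
    flagged = sum(
        1 for ratings in respondents
        if false_reporting_index(ratings, bogus_items) > 0
    )
    return flagged / len(respondents)


# Five bogus items, mirroring the size of the index described above.
BOGUS_ITEMS = {"B1", "B2", "B3", "B4", "B5"}

# Invented frequency ratings for four respondents (item ID -> rating).
respondents = [
    {"T1": 3, "T2": 1, "B1": 0},
    {"T1": 2, "B2": 1, "B4": 2},
    {"T1": 4, "T2": 2},
    {"T1": 1, "B5": 3},
]

indices = [false_reporting_index(r, BOGUS_ITEMS) for r in respondents]
share = share_of_false_reporters(respondents, BOGUS_ITEMS)
```

Scoring the index as a per-respondent count keeps both summaries available: the counts could feed an analysis of variance across conditions, while the share corresponds to an overall percentage of respondents who reported at least one bogus task.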

2020
Vol 6 (1)
Author(s):  
Michael Seufert

Due to biased assumptions about the underlying ordinal rating scale in subjective Quality of Experience (QoE) studies, Mean Opinion Score (MOS)-based evaluations provide results that are hard to interpret and can be misleading. This paper proposes to consider the full QoE distribution for evaluating, reporting, and modeling QoE results instead of relying on MOS-based metrics derived from results based on ordinal rating scales. The QoE distribution can be represented concisely by the parameters of a multinomial distribution without losing any information about the underlying QoE ratings, and it even keeps backward compatibility with previous, biased MOS-based results. Considering QoE results as a realization of a multinomial distribution makes it possible to rely on a well-established theoretical background, which enables meaningful evaluations for ordinal rating scales as well. Moreover, QoE models based on QoE distributions keep detailed information from the results of a QoE study of a technical system and thus give an unprecedented richness of insights into the end users’ experience with the technical system. In this work, existing and novel statistical methods for QoE distributions are summarized and exemplary evaluations are outlined. Furthermore, using the novel concept of quality steps, simulative and analytical QoE models based on QoE distributions are presented and showcased. The goal is to demonstrate the fundamental advantages of considering QoE distributions over MOS-based evaluations if the underlying rating data is ordinal in nature.
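The contrast between a QoE distribution and a MOS can be made concrete with a small sketch. It estimates the multinomial parameters as the relative frequency of each rating category and then recovers the MOS from those parameters, which is one way to read the backward compatibility mentioned above. The 5-point scale, the function names, and the two rating samples are assumptions chosen for illustration and do not come from the paper.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical sketch: the 1-5 scale and the sample ratings are invented.

def qoe_distribution(ratings, scale=(1, 2, 3, 4, 5)):
    """Estimate multinomial parameters as relative category frequencies."""
    if not ratings:
        raise ValueError("need at least one rating")
    counts = Counter(ratings)
    total = len(ratings)
    return {k: Fraction(counts.get(k, 0), total) for k in scale}


def mos_from_distribution(dist):
    """Recover the Mean Opinion Score as the sum of category * probability."""
    return sum(k * p for k, p in dist.items())


# Two invented samples: one polarized, one concentrated on the midpoint.
polarized = [1, 1, 1, 5, 5, 5]
neutral = [3, 3, 3, 3, 3, 3]

dist_polarized = qoe_distribution(polarized)
dist_neutral = qoe_distribution(neutral)

mos_polarized = mos_from_distribution(dist_polarized)
mos_neutral = mos_from_distribution(dist_neutral)
```

Both samples yield the same MOS although half of the polarized group gave the lowest rating, which is the kind of information a single mean hides and the distribution retains. Exact fractions are used only to keep the arithmetic free of rounding; floating-point frequencies would serve equally well.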


2012
Vol 21 (4)
pp. 136-143
Author(s):  
Lynn E. Fox

The self-anchored rating scale (SARS) is a technique that augments collaboration between Augmentative and Alternative Communication (AAC) interventionists, their clients, and their clients' support networks. SARS is a technique used in Solution-Focused Brief Therapy, a branch of systemic family counseling. It has been applied to treating speech and language disorders across the life span, and recent case studies show it has promise for promoting adoption and long-term use of high- and low-tech AAC. I will describe 2 key principles of solution-focused therapy and present 7 steps in the SARS process that illustrate how clinicians can use the SARS to involve a person with aphasia and his or her family in all aspects of the therapeutic process. I will use a case study to illustrate the SARS process and present outcomes for one individual living with aphasia.


2006
Vol 22 (4)
pp. 259-267
Author(s):
Eelco Olde
Rolf J. Kleber
Onno van der Hart
Victor J.M. Pop

Childbirth has been identified as a possible traumatic experience, leading to traumatic stress responses and even to the development of posttraumatic stress disorder (PTSD). The current study investigated the psychometric properties of the Dutch version of the Impact of Event Scale-Revised (IES-R) in a group of women who recently gave birth (N = 435). In addition, a comparison was made between the original IES and the IES-R. The scale showed high internal consistency (α = 0.88). Using confirmatory factor analysis, no support was found for a three-factor structure of an intrusion, an avoidance, and a hyperarousal factor. Goodness of fit was only reasonable, even after fitting one intrusion item on the hyperarousal scale. The IES-R correlated significantly with scores on depression and anxiety self-rating scales, as well as with scores on a self-rating scale of posttraumatic stress disorder. Although the IES-R can be used for studying posttraumatic stress reactions in women who recently gave birth, the original IES proved to be a better instrument than the IES-R. It is concluded that adding the hyperarousal scale to the IES-R did not make the scale stronger.


Methodology
2011
Vol 7 (3)
pp. 88-95
Author(s):
Jose A. Martínez
Manuel Ruiz Marín

The aim of this study is to improve measurement in marketing research by constructing a new, simple, nonparametric, consistent, and powerful test to study scale invariance. The test is called the D-test. The D-test is constructed using symbolic dynamics and symbolic entropy as a measure of the difference between the response patterns that come from two measurement scales. We also give a standard asymptotic distribution of our statistic. Given that the test is based on entropy measures, it avoids smoothed nonparametric estimation. We applied the D-test to a real marketing research study to examine whether scale invariance holds when measuring service quality in a sports service. We considered a free scale as the reference scale and compared it with three widely used rating scales: Likert-type scales from 1 to 5 and from 1 to 7, and a semantic-differential scale from −3 to +3. Scale invariance holds for the two latter scales. This test overcomes the shortcomings of other procedures for analyzing scale invariance, and it gives researchers a tool for deciding on the appropriate rating scale for specific marketing problems and for judging how the results of prior studies can be questioned.


2021
pp. 001698622098594
Author(s):  
Nielsen Pereira

The purpose of this study was to investigate the validity of the HOPE Scale for identifying gifted English language learners (ELs) and how classroom and English as a second language (ESL) teacher HOPE Scale scores differ. Seventy teachers completed the HOPE Scale on 1,467 students in grades K-5 and four ESL teachers completed the scale on 131 ELs. Measurement invariance tests indicated that the HOPE Scale yields noninvariant latent means across EL and English proficient (EP) samples. However, confirmatory factor analysis results support the use of the scale with ELs or EP students separately. Results also indicate that the rating patterns of classroom and ESL teachers were different and that the HOPE Scale does not yield valid data when used by ESL teachers. Caution is recommended when using the HOPE Scale and other teacher rating scales to compare ELs to EP students. The importance of invariance testing before using an instrument with a population that is different from the one(s) for which the instrument was developed is discussed.


Assessment
2021
pp. 107319112199646
Author(s):
Olivia Gratz
Duncan Vos
Megan Burke
Neelkamal Soares

To date, there is a paucity of research conducting natural language processing (NLP) on the open-ended responses of behavior rating scales. Using three NLP lexicons for sentiment analysis of the open-ended responses of the Behavior Assessment System for Children-Third Edition, the researchers discovered a moderately positive correlation between the human composite rating and the sentiment score using each of the lexicons for strengths comments and a slightly positive correlation for the concerns comments made by guardians and teachers. In addition, the researchers found that as the word count increased for open-ended responses regarding the child’s strengths, there was a greater positive sentiment rating. Conversely, as word count increased for open-ended responses regarding child concerns, the human raters scored comments more negatively. The authors offer a proof-of-concept to use NLP-based sentiment analysis of open-ended comments to complement other data for clinical decision making.


2021
Vol 10 (14)
pp. 3056
Author(s):
Ada Holak
Michał Czapla
Marzena Zielińska

Background: The all-too-frequent failure to rate pain intensity, resulting in the lack of or inadequacy of pain management, has long ceased to be an exclusive problem of the young patient, becoming a major public health concern. This study aimed to evaluate the methods used for reducing post-traumatic pain in children and the frequency of use of such methods. Additionally, the methods of pain assessment and the frequency of their application in this age group were analysed. Methods: A retrospective analysis of 2452 medical records of emergency medical teams dispatched to injured children aged 0–18 years in the area around Warsaw (Poland). Results: Of all injured children, 1% (20 out of 2432) had their pain intensity rated, and the only tool used for this assessment was the numeric rating scale (NRS). Children with burns most frequently received a single analgesic drug or cooling (56.2%), whereas the least frequently used method was multimodal treatment combining pharmacotherapy and cooling (13.5%). Toddlers constituted the largest percentage of patients who were provided with cooling (12%). Immobilisation was most commonly used in adolescents (29%) and school-age children (n = 186; 24%). Conclusions: Low frequency of pain assessment emphasises the need to provide better training in the use of various pain rating scales and protocols. What is more, non-pharmacological methods (cooling and immobilisation) used for reducing pain in injured children still remain underutilized.


2021
Vol 14
pp. 175628482110217
Author(s):
Hang Yang
Honglin Chen
Bing Hu

Background: Centrally mediated abdominal pain syndrome (CAPS) is characterized by continuous or frequently recurring abdominal pain and can result in functional loss across several life domains. The efficacy of the present management methods has not been established yet. We performed a prospective randomized controlled trial to explore the short-term efficacy of local analgesic (lidocaine) and opioid analgesic (sufentanil) in patients with CAPS. Methods: We consecutively enrolled 130 patients who met the Rome IV CAPS criteria and divided them into the sufentanil + lidocaine (S + L) group and sufentanil (S) group. Patients completed the pain rating scales, including the numeric rating scale (NRS) and verbal rating scale (VRS), 60 min before colonoscopy. All the patients were initially administered sufentanil. In the S + L group, we sprayed a 5 ml solution of lidocaine on the surface of the ascending, transverse, descending, and sigmoid colon during colonoscope withdrawal, while 5 ml saline was sprayed in the S group. Follow-up was performed 1 day, 3 days, 1 week, 2 weeks, 1 month, and 3 months after colonoscopy, to complete the pain scaling. Results: A comparison of the NRS and VRS showed that there were no significant differences between the S + L and S groups and within each group (p > 0.05). Conclusions: Local analgesic lidocaine and opioid analgesic sufentanil showed negative efficacy during short-term observation. The opioid analgesic sufentanil did not worsen symptoms in patients with CAPS after colonoscopy under general anesthesia in the short term. [chictr.org.cn, Chinese Clinical Trial Identifier, ChiCTR-IOR-16008187]


2021
Vol 7
pp. 205520762110012
Author(s):
John E Leikauf
Carlos Correa
Andrew N Bueno
Vicente Peris Sempere
Leanne M Williams

Introduction: To address the need for non-pharmacologic, scalable approaches for managing attention-deficit/hyperactivity disorder (ADHD) in young people, we report the results of a study of an application developed for a wearable device (Apple Watch) that was designed to track movement and provide visual and haptic feedback for ADHD. Methods: Six-week, open-label pilot study with structured rating scales for ADHD and a semi-structured qualitative interview. Users were given an Apple Watch software application that uses actigraphy, a graphic interface, and haptic feedback to inform them about their level of movement during periods of intentional focus. Linear mixed models were used to estimate trajectories. Results: Thirty-two participants entered the study. This application was associated with improvement in ADHD symptoms over the 6 weeks of the study. We observed an ADHD-Rating Scale change of β = −1.2 units/week (95% CI = −0.56 to −1.88, F = 13.4, P = .0004). Conclusions: These positive clinical outcomes highlight the promise of such wearable applications for ADHD and the need to pursue their further development.


2020
Vol 16 (1)
pp. 87-121
Author(s):
Bárbara Eizaga-Rebollar
Cristina Heras-Ramírez

The study of pragmatic competence has gained increasing importance within second language assessment over the last three decades. However, its study in L2 language testing is still scarce. The aim of this paper is to research the extent to which pragmatic competence as defined by the Common European Framework of Reference for Languages (CEFR) has been accommodated in the task descriptions and rating scales of two of the most popular Oral Proficiency Interviews (OPIs) at a C1 level: Cambridge’s Certificate in Advanced English (CAE) and Trinity’s Integrated Skills in English (ISE) III. To carry out this research, OPI tests are first defined, highlighting their differences from L2 pragmatic tests. After pragmatic competence in the CEFR is examined, focusing on the updates in the new descriptors, CAE and ISE III formats, structure and task characteristics are compared, showing that, while the formats and some characteristics are found to differ, the structures and task types are comparable. Finally, we systematically analyse CEFR pragmatic competence in the task skills and rating scale descriptors of both OPIs. The findings show that the task descriptions incorporate mostly aspects of discourse and design competence. Additionally, we find that each OPI is seen to prioritise different aspects of pragmatic competence within their rating scale, with CAE focusing mostly on discourse competence and fluency, and ISE III on functional competence. Our study shows that the tests fail to fully accommodate all aspects of pragmatic competence in the task skills and rating scales, although the aspects they do incorporate follow the CEFR descriptors on pragmatic competence. It also reveals a mismatch between the task competences being tested and the rating scale. To conclude, some research lines are proposed.

