Expert ratings of computer capabilities to answer PIAAC numeracy questions, by expert

Abstract: Bayesian inferencing as a machine learning technique was evaluated for identifying pre-crash activity and crash type from accident narratives describing 3,686 motor vehicle crashes. It was hypothesized that a Bayesian model could learn from a computer search for 63 keywords related to accident categories. Learning was described in terms of the ability to accurately classify previously unclassifiable narratives not containing the original keywords. When narratives contained keywords, the results obtained using both the Bayesian model and keyword search corresponded closely to expert ratings (P(detection) ≥ 0.9 and P(false positive) ≤ 0.05). For narratives not containing keywords, when the threshold used by the Bayesian model was varied between p > 0.5 and p > 0.9, the overall probability of detecting a category assigned by the expert varied between 67% and 12%. False positives correspondingly varied between 32% and 3%. These latter results demonstrated that the Bayesian system learned from the results of the keyword searches.
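The threshold trade-off in this abstract (a stricter threshold lowers both detections and false positives) can be illustrated with a minimal Python sketch. The scores, labels, and function name below are hypothetical and are not taken from the study. The sketch also assumes that P(false positive) means the share of narratives without the expert-assigned category that the model still flags; the abstract does not state the exact definition.

```python
def detection_and_false_positive_rates(scores, labels, threshold):
    """Return (P(detection), P(false positive)) at a given threshold.

    scores: model posterior probabilities for one category (hypothetical).
    labels: True where the expert assigned that category.
    """
    predicted = [s > threshold for s in scores]
    positives = sum(labels)
    negatives = len(labels) - positives
    true_pos = sum(p and y for p, y in zip(predicted, labels))
    false_pos = sum(p and not y for p, y in zip(predicted, labels))
    detection = true_pos / positives if positives else 0.0
    false_positive = false_pos / negatives if negatives else 0.0
    return detection, false_positive


# Made-up example data: 3 narratives with the category, 5 without.
scores = [0.95, 0.80, 0.60, 0.55, 0.30, 0.92, 0.70, 0.20]
labels = [True, True, True, False, False, False, False, False]

for threshold in (0.5, 0.7, 0.9):
    d, f = detection_and_false_positive_rates(scores, labels, threshold)
    print(f"threshold > {threshold}: detection={d:.2f}, false positive={f:.2f}")
```

On this toy data both rates fall as the threshold rises, which is the same direction of change the abstract reports.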

A Comparison of Expert Ratings and Marker-Less Hand Tracking Along OSATS-Derived Motion Scales

IEEE Transactions on Human-Machine Systems ◽

10.1109/thms.2020.3035763 ◽

2020 ◽

pp. 1-10

Author(s):

David P. Azari ◽

Brady L. Miller ◽

Brian V. Le ◽

Jacob A. Greenberg ◽

Reginald C. Bruskewitz ◽

...

Keyword(s):

Hand Tracking ◽

Expert Ratings

Erring Experts? A Critique of Wine Ratings as Hedonic Scaling

Journal of Wine Economics ◽

10.1017/jwe.2020.42 ◽

2020 ◽

Vol 15 (4) ◽

pp. 386-393

Author(s):

Denton Marks

Keyword(s):

External Validity ◽

Rating Scales ◽

Food Science ◽

Sensory Sensitivity ◽

Behavioral Sciences ◽

Transaction Prices ◽

Magnitude Scales ◽

Expert Ratings ◽

Jel Classifications

Abstract: Consumers use expert ratings to help choose wine, and economists find correlations between ratings and transaction prices. Rating scales resemble hedonic scales in the behavioral sciences, which suffer from an “intersubjectivity” problem. Taste is a private sensation; people taste differently (an external validity problem), so ratings are often unreliable hedonic markers of enjoyment. But why? Hedonic measurements from food science (“general Labeled Magnitude Scales”) attempt to adjust for differences in perceived sensory sensitivity and offer clues. Resulting insights illustrate wine ratings’ shortcomings as reliable guides to enjoyment. (JEL Classifications: C14, D12, D91, L15, L66)

A New Method for Selecting the Best Design

Ergonomics in Design The Quarterly of Human Factors Applications ◽

10.1177/1064804611428927 ◽

2012 ◽

Vol 20 (1) ◽

pp. 11-17

Author(s):

Mark E. Benden ◽

Kristen Miller ◽

Eric Wilke ◽

Eduardo Ibarra

Keyword(s):

Patient Outcomes ◽

New Method ◽

Driver Safety ◽

Transport Vehicle ◽

Medical Transport ◽

Expert Ratings ◽

Vehicle Seat ◽

Improve Patient ◽

Individual Expert ◽

Selection Of

In this article the authors illustrate how individual expert ratings can be employed to prioritize specifications for use in forced rankings. Those rankings are then used to select a design with the best overall usability. The authors provide an example of this approach in the selection of a medical transport vehicle seat to produce a more ergonomic product that could improve patient outcomes and driver safety.

Non-expert ratings of infant and parent emotion: Concordance with expert coding and relevance to early autism risk

International Journal of Behavioral Development ◽

10.1177/0165025409350365 ◽

2009 ◽

Vol 34 (1) ◽

pp. 88-95 ◽

Cited By ~ 18

Author(s):

Jason K. Baker ◽

John D. Haltigan ◽

Ryan Brewster ◽

James Jaccard ◽

Daniel Messinger

Keyword(s):

Autism Spectrum ◽

Parent Ratings ◽

Face To Face ◽

Coding Systems ◽

Behavioral Coding ◽

Novel Approach ◽

The Face ◽

High Concordance ◽

Spectrum Disorders ◽

Expert Ratings

This study investigated a novel approach to obtaining data on parent and infant emotion during the Face-to-Face/Still-Face paradigm, and examined these data in light of previous findings regarding early autism risk. One hundred and eighty-eight non-expert students rated 38 parents and infant siblings of children who did (20) or did not (18) have autism spectrum disorders. Ratings averaged across 10 non-experts exhibited high concordance with expert facial-action codes for infant emotion, and 20 non-experts were required for reliable parent ratings. Findings replicated the well-established still-face effect and identified subtle risk associations consonant with results from previous investigations. The unique information offered by intuitive non-expert ratings is discussed as an alternative to complex and costly behavioral coding systems.
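The idea of averaging several non-expert ratings and checking their concordance with expert codes can be sketched as follows. This is only an illustration with made-up numbers: the function names are hypothetical, Pearson correlation is an assumed stand-in for the concordance measure (the abstract does not name the statistic used), and the rater and subject counts are far smaller than in the study.

```python
from statistics import mean


def averaged_ratings(ratings_by_rater):
    """Average ratings across raters, giving one value per rated subject.

    ratings_by_rater: list of per-rater lists, aligned by subject.
    """
    return [mean(column) for column in zip(*ratings_by_rater)]


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical data: 3 non-expert raters, 4 rated subjects.
non_expert = [
    [1, 2, 3, 4],
    [2, 2, 4, 4],
    [1, 3, 3, 5],
]
expert_codes = [1, 2, 4, 3]  # made-up expert codes for the same subjects

consensus = averaged_ratings(non_expert)
print(f"concordance r = {pearson_r(consensus, expert_codes):.2f}")
```

Averaging tends to cancel individual raters' idiosyncrasies, which is one plausible reason a pooled non-expert score can track expert coding.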

Nonsuicidal self-injury disorder: Clinician and expert ratings

Psychiatry Research ◽

10.1016/j.psychres.2013.08.047 ◽

2013 ◽

Vol 210 (3) ◽

pp. 940-944 ◽

Cited By ~ 19

Author(s):

Gregory J. Lengel ◽

Stephanie N. Mullins-Sweatt

Keyword(s):

Self Injury ◽

Expert Ratings ◽

Nonsuicidal Self Injury

Creating a Chinese suicide dictionary for identifying suicide risk on social media

PeerJ ◽

10.7717/peerj.1455 ◽

2015 ◽

Vol 3 ◽

pp. e1455 ◽

Cited By ~ 10

Author(s):

Meizhen Lv ◽

Ang Li ◽

Tianli Liu ◽

Tingshao Zhu

Keyword(s):

Social Media ◽

Suicide Risk ◽

Classification Performance ◽

Support Vector ◽

Accurate Identification ◽

Vector Machines ◽

Social Media Service ◽

Linguistic Inquiry ◽

Suicide Prevention Programs ◽

Expert Ratings

Introduction. Suicide has become a serious worldwide epidemic. Early detection of individual suicide risk in the population is important for reducing suicide rates. Traditional methods are ineffective in identifying suicide risk in time, suggesting a need for novel techniques. This paper proposes to detect suicide risk on social media using a Chinese suicide dictionary. Methods. To build the Chinese suicide dictionary, eight researchers were recruited to select initial words from 4,653 posts published on Sina Weibo (the largest social media service provider in China) and two Chinese sentiment dictionaries (HowNet and NTUSD). Then, another three researchers were recruited to filter out irrelevant words. Finally, the remaining words were further expanded using a corpus-based method. After building the Chinese suicide dictionary, we tested its performance in identifying suicide risk on Weibo. First, we compared the dictionary-based identifications with the expert ratings in both detecting suicidal expression in Weibo posts and evaluating individual levels of suicide risk. Second, to differentiate between individuals with high and non-high scores on a self-rating measure of suicide risk (Suicidal Possibility Scale, SPS), we built Support Vector Machines (SVM) models on the Chinese suicide dictionary and the Simplified Chinese Linguistic Inquiry and Word Count (SCLIWC) program, respectively. After that, we compared the classification performance of the two types of SVM models. Results and Discussion. Dictionary-based identifications were significantly correlated with expert ratings in terms of both detecting suicidal expression (r = 0.507) and evaluating individual suicide risk (r = 0.455). For the differentiation between individuals with high and non-high scores on SPS, the Chinese suicide dictionary (t1: F1 = 0.48; t2: F1 = 0.56) produced a more accurate identification than SCLIWC (t1: F1 = 0.41; t2: F1 = 0.48) on different observation windows. Conclusions. This paper confirms that, using social media, it is possible to implement real-time monitoring of individual suicide risk in the population. Results of this study may be useful to improve Chinese suicide prevention programs and may be insightful for other countries.
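The F1 values reported above combine precision and recall into one score. A minimal Python sketch of that calculation follows; the labels are made up and the function name is hypothetical, so this illustrates the metric only and does not reproduce the paper's evaluation.

```python
def f1_score(true_labels, predicted_labels):
    """F1 = harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t and p)        # correctly flagged
    fp = sum(1 for t, p in pairs if not t and p)    # flagged but not positive
    fn = sum(1 for t, p in pairs if t and not p)    # positive but missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: 1 = high score on the risk scale, 0 = non-high.
true_labels = [1, 1, 1, 0, 0, 0, 0, 1]
predicted_labels = [1, 0, 1, 1, 0, 0, 0, 0]

print(f"F1 = {f1_score(true_labels, predicted_labels):.2f}")
```

Because F1 ignores true negatives, it is often preferred over plain accuracy when the positive class is comparatively rare.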

Development of Brazilian prototypes for short-term psychotherapies

Trends in Psychiatry and Psychotherapy ◽

10.1590/2237-6089-2015-0039 ◽

2016 ◽

Vol 38 (2) ◽

pp. 71-79 ◽

Cited By ~ 1

Author(s):

Fernanda Barcellos Serralta ◽

John Stuart Ablon

Keyword(s):

Psychodynamic Psychotherapy ◽

Behavioral Therapy ◽

Psychotherapy Process ◽

Cognitive Behavioral ◽

Factor Scores ◽

Short Term ◽

Initial Support ◽

Type Factor ◽

Expert Ratings

Abstract Introduction: The Psychotherapy Process Q-Set (PQS) prototype method is used to measure the extent to which ideal processes of different psychotherapies are present in real cases, allowing researchers to examine how adherence to these models relates to or predicts change. Results from studies of short-term psychotherapies suggest that the original psychodynamic prototype is more suitable for studying psychoanalysis and long-term psychodynamic psychotherapy than its time-limited counterparts. Furthermore, culture probably influences how therapies are typically conducted in a given country. Therefore, it seems appropriate to develop Brazilian prototypes on which to base studies of short-term psychodynamic and cognitive-behavioral processes in this country. Objective: To develop prototypes for studying processes of short-term psychotherapies and to examine the degree of adherence of two real psychotherapy cases to these models. Methods: Expert clinicians used the PQS to rate a hypothetical ideal session of either short-term psychodynamic psychotherapy (STPP) or cognitive-behavioral therapy (CBT). Ratings were submitted to Q-type factor analysis to confirm the two groups. Regressive factor scores were rank ordered to describe the prototypes. These ideal models were correlated with ratings of actual therapy processes in two complete psychotherapy cases, one STPP and the other CBT. Results: Agreement levels between expert ratings were high and the two ideal models were confirmed. As expected, the PQS ratings for actual STPP and CBT cases had significant correlations with their respective ideal models, but the STPP case also adhered to the CBT prototype. Conclusion: Overall, the findings reveal the adequacy of the prototypes for time-limited therapies, providing initial support of their validity.

Bayesian Crime Investigations: Integrating Actuarial And Expert Models

10.32920/ryerson.14655891.v1 ◽

2021 ◽

Author(s):

Jared C. Allen

Keyword(s):

Regression Analysis ◽

Prediction Accuracy ◽

Linear Trend ◽

Regression Equations ◽

Criminal Offender ◽

Offender Characteristics ◽

Subjective Input ◽

Expert Ratings ◽

Analysis Models ◽

Expert Models

In response to concerns that some of the most methodologically rigorous predictive studies of criminal offender characteristics may yet be less generalizable and applicable than advertised or assumed, this research first tests how well seven regression analysis models (represented by 28 equations) predict characteristics across three conditions: familiar cases (used to create the regressions), less familiar cases (native to the sample used to create the regressions) and foreign cases (from a similar but novel sample). Here a linear trend shows overfitting of the models to their own sample: a drop-off in prediction accuracy relative to simple mean-based prediction as cases become more foreign (ηp² = .646). In response to hopes that subjective input from expert police investigators could be integrated into the models to correct for this overfitting bias, this research also tests an algorithm combining expert ratings with the regression equations. Here moderate and significant improvement in novel-case prediction is observed overall (p = .036, r = .44) and equations for all twelve expert participants are shown to improve prediction to varying degrees. These results suggest that current best methods would perform poorly in the field, but can be improved by expert insight.

Statistical Modeling of Expert Ratings on Medical Treatment Appropriateness

Journal of the American Statistical Association ◽

10.1080/01621459.1993.10476291 ◽

1993 ◽

Vol 88 (422) ◽

pp. 421-427 ◽

Cited By ~ 48

Author(s):

John S. Uebersax

Keyword(s):

Medical Treatment ◽

Statistical Modeling ◽

Expert Ratings
