Expert ratings of computer capabilities to answer PIAAC numeracy questions, by expert

1996 ◽  
Vol 35 (04/05) ◽  
pp. 309-316 ◽  
Author(s):  
M. R. Lehto ◽  
G. S. Sorock

Abstract: Bayesian inferencing as a machine learning technique was evaluated for identifying pre-crash activity and crash type from accident narratives describing 3,686 motor vehicle crashes. It was hypothesized that a Bayesian model could learn from a computer search for 63 keywords related to accident categories. Learning was described in terms of the ability to accurately classify previously unclassifiable narratives not containing the original keywords. When narratives contained keywords, the results obtained using both the Bayesian model and keyword search corresponded closely to expert ratings (P(detection) ≥ 0.9 and P(false positive) ≤ 0.05). For narratives not containing keywords, when the threshold used by the Bayesian model was varied between p > 0.5 and p > 0.9, the overall probability of detecting a category assigned by the expert varied between 67% and 12%. False positives correspondingly varied between 32% and 3%. These latter results demonstrated that the Bayesian system learned from the results of the keyword searches.
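The threshold trade-off described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name, the posterior values, and the expert labels are all hypothetical, and it assumes the false-positive rate is measured over narratives the expert did not assign to the category.

```python
def detection_and_false_positive(posteriors, expert_labels, threshold):
    """Return (P(detection), P(false positive)) for one category.

    posteriors: model probabilities that each narrative belongs to the category
    expert_labels: True where the expert assigned the category
    threshold: a narrative is classified into the category when p > threshold
    """
    predicted = [p > threshold for p in posteriors]
    pairs = list(zip(predicted, expert_labels))
    hits = sum(1 for pred, label in pairs if pred and label)
    false_alarms = sum(1 for pred, label in pairs if pred and not label)
    positives = sum(1 for label in expert_labels if label)
    negatives = len(expert_labels) - positives
    p_detect = hits / positives if positives else 0.0
    p_false = false_alarms / negatives if negatives else 0.0
    return p_detect, p_false


# Hypothetical posteriors and expert labels, for illustration only.
posteriors = [0.95, 0.70, 0.55, 0.40, 0.92, 0.60, 0.20, 0.10]
expert_labels = [True, True, True, True, False, False, False, False]

for threshold in (0.5, 0.9):
    print(threshold, detection_and_false_positive(posteriors, expert_labels, threshold))
```

Raising the threshold lowers both rates in this toy data, which is the same direction of change the abstract reports.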


Author(s):  
David P. Azari ◽  
Brady L. Miller ◽  
Brian V. Le ◽  
Jacob A. Greenberg ◽  
Reginald C. Bruskewitz ◽  
...  

2020 ◽  
Vol 15 (4) ◽  
pp. 386-393
Author(s):  
Denton Marks

Abstract: Consumers use expert ratings to help choose wine, and economists find correlations between ratings and transaction prices. Rating scales resemble hedonic scales in the behavioral sciences, which suffer from an “intersubjectivity” problem. Taste is a private sensation; people taste differently (an external validity problem), so ratings are often unreliable hedonic markers of enjoyment. But why? Hedonic measurements from food science (“general Labeled Magnitude Scales”) attempt to adjust for differences in perceived sensory sensitivity and offer clues. Resulting insights illustrate wine ratings’ shortcomings as reliable guides to enjoyment. (JEL Classifications: C14, D12, D91, L15, L66)


Author(s):  
Mark E. Benden ◽  
Kristen Miller ◽  
Eric Wilke ◽  
Eduardo Ibarra

In this article the authors illustrate how individual expert ratings can be employed to prioritize specifications for use in forced rankings. Those rankings are then used to select a design with the best overall usability. The authors provide an example of this approach in the selection of a medical transport vehicle seat to produce a more ergonomic product that could improve patient outcomes and driver safety.


2009 ◽  
Vol 34 (1) ◽  
pp. 88-95 ◽  
Author(s):  
Jason K. Baker ◽  
John D. Haltigan ◽  
Ryan Brewster ◽  
James Jaccard ◽  
Daniel Messinger

This study investigated a novel approach to obtaining data on parent and infant emotion during the Face-to-Face/Still-Face paradigm, and examined these data in light of previous findings regarding early autism risk. One hundred and eighty-eight non-expert students rated 38 parents and infant siblings of children who did (20) or did not (18) have autism spectrum disorders. Ratings averaged across 10 non-experts exhibited high concordance with expert facial-action codes for infant emotion, and 20 non-experts were required for reliable parent ratings. Findings replicated the well-established still-face effect and identified subtle risk associations consonant with results from previous investigations. The unique information offered by intuitive non-expert ratings is discussed as an alternative to complex and costly behavioral coding systems.


2013 ◽  
Vol 210 (3) ◽  
pp. 940-944 ◽  
Author(s):  
Gregory J. Lengel ◽  
Stephanie N. Mullins-Sweatt

PeerJ ◽  
2015 ◽  
Vol 3 ◽  
pp. e1455 ◽  
Author(s):  
Meizhen Lv ◽  
Ang Li ◽  
Tianli Liu ◽  
Tingshao Zhu

Introduction. Suicide has become a serious worldwide epidemic. Early detection of individual suicide risk in the population is important for reducing suicide rates. Traditional methods are ineffective in identifying suicide risk in time, suggesting a need for novel techniques. This paper proposes to detect suicide risk on social media using a Chinese suicide dictionary. Methods. To build the Chinese suicide dictionary, eight researchers were recruited to select initial words from 4,653 posts published on Sina Weibo (the largest social media service provider in China) and two Chinese sentiment dictionaries (HowNet and NTUSD). Then, another three researchers were recruited to filter out irrelevant words. Finally, the remaining words were further expanded using a corpus-based method. After building the Chinese suicide dictionary, we tested its performance in identifying suicide risk on Weibo. First, we compared the dictionary-based identifications with the expert ratings in both detecting suicidal expression in Weibo posts and evaluating individual levels of suicide risk. Second, to differentiate between individuals with high and non-high scores on a self-rating measure of suicide risk (Suicidal Possibility Scale, SPS), we built Support Vector Machines (SVM) models on the Chinese suicide dictionary and the Simplified Chinese Linguistic Inquiry and Word Count (SCLIWC) program, respectively. We then compared the classification performance of the two types of SVM models. Results and Discussion. Dictionary-based identifications were significantly correlated with expert ratings in terms of both detecting suicidal expression (r = 0.507) and evaluating individual suicide risk (r = 0.455). For the differentiation between individuals with high and non-high scores on SPS, the Chinese suicide dictionary (t1: F1 = 0.48; t2: F1 = 0.56) produced a more accurate identification than SCLIWC (t1: F1 = 0.41; t2: F1 = 0.48) on different observation windows. Conclusions. This paper confirms that, using social media, it is possible to implement real-time monitoring of individual suicide risk in the population. Results of this study may be useful to improve Chinese suicide prevention programs and may be insightful for other countries.
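The F1 values reported in this abstract combine precision and recall into a single score. As a reminder of how that score is computed, here is a minimal sketch; the function name and the example labels are hypothetical and are not taken from the study's data.

```python
def f1_score(predicted, actual):
    """Compute F1 for binary labels (1 = high risk, 0 = non-high risk)."""
    pairs = list(zip(predicted, actual))
    true_pos = sum(1 for p, a in pairs if p == 1 and a == 1)
    false_pos = sum(1 for p, a in pairs if p == 1 and a == 0)
    false_neg = sum(1 for p, a in pairs if p == 0 and a == 1)
    if true_pos == 0:
        # Precision and recall are both zero (or undefined), so report 0.
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical classifier output and self-rating labels, for illustration only.
predicted = [1, 1, 0, 1, 0, 0]
actual = [1, 0, 0, 1, 1, 0]
print(round(f1_score(predicted, actual), 3))
```

Because F1 is a harmonic mean, it stays low unless precision and recall are both reasonably high, which is why it is often preferred over accuracy when high-risk cases are rare.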


2016 ◽  
Vol 38 (2) ◽  
pp. 71-79 ◽  
Author(s):  
Fernanda Barcellos Serralta ◽  
John Stuart Ablon

Abstract Introduction: The Psychotherapy Process Q-Set (PQS) prototype method is used to measure the extent to which ideal processes of different psychotherapies are present in real cases, allowing researchers to examine how adherence to these models relates to or predicts change. Results from studies of short-term psychotherapies suggest that the original psychodynamic prototype is more suitable for studying psychoanalysis and long-term psychodynamic psychotherapy than its time-limited counterparts. Furthermore, culture probably influences how therapies are typically conducted in a given country. Therefore, it seems appropriate to develop Brazilian prototypes on which to base studies of short-term psychodynamic and cognitive-behavioral processes in this country. Objective: To develop prototypes for studying processes of short-term psychotherapies and to examine the degree of adherence of two real psychotherapy cases to these models. Methods: Expert clinicians used the PQS to rate a hypothetical ideal session of either short-term psychodynamic psychotherapy (STPP) or cognitive-behavioral therapy (CBT). Ratings were submitted to Q-type factor analysis to confirm the two groups. Regressive factor scores were rank ordered to describe the prototypes. These ideal models were correlated with ratings of actual therapy processes in two complete psychotherapy cases, one STPP and the other CBT. Results: Agreement levels between expert ratings were high and the two ideal models were confirmed. As expected, the PQS ratings for actual STPP and CBT cases had significant correlations with their respective ideal models, but the STPP case also adhered to the CBT prototype. Conclusion: Overall, the findings reveal the adequacy of the prototypes for time-limited therapies, providing initial support of their validity.


2021 ◽  
Author(s):  
Jared C. Allen

In response to concerns that some of the most methodologically rigorous predictive studies of criminal offender characteristics may yet be less generalizable and applicable than advertised or assumed, this research first tests how well seven regression analysis models (represented by 28 equations) predict characteristics across three conditions: familiar cases (used to create the regressions), less familiar cases (native to the sample used to create the regressions) and foreign cases (from a similar but novel sample). Here a linear trend shows overfitting of the models to their own sample: a drop-off in prediction accuracy relative to simple mean-based prediction as cases become more foreign (ηp² = .646). In response to hopes that subjective input from expert police investigators could be integrated into the models to correct for this overfitting bias, this research also tests an algorithm combining expert ratings with the regression equations. Here moderate and significant improvement in novel-case prediction is observed overall (p = .036, r = .44) and equations for all twelve expert participants are shown to improve prediction to varying degrees. These results suggest that current best methods would perform poorly in the field, but can be improved by expert insight.
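One way to express "prediction accuracy relative to simple mean-based prediction" is a skill score that compares a model's squared error with that of always predicting the training-sample mean. The sketch below assumes that definition, which may differ from the measure used in the study; the function name and all numbers are hypothetical.

```python
def relative_skill(predictions, actuals, training_mean):
    """Return 1 - SSE(model) / SSE(mean baseline).

    Positive values mean the model beats the mean-based prediction;
    negative values mean it does worse.
    """
    model_sse = sum((p - a) ** 2 for p, a in zip(predictions, actuals))
    baseline_sse = sum((training_mean - a) ** 2 for a in actuals)
    return 1 - model_sse / baseline_sse


# Hypothetical outcomes and a hypothetical training-sample mean.
actuals = [2.0, 4.0, 6.0]
training_mean = 4.0

familiar_predictions = [3.0, 4.0, 5.0]  # model tracks the cases well
foreign_predictions = [6.0, 4.0, 2.0]   # model misfires on novel cases

print(relative_skill(familiar_predictions, actuals, training_mean))
print(relative_skill(foreign_predictions, actuals, training_mean))
```

A score that falls from positive to negative as cases become less familiar is the pattern one would expect from a model that has overfit its own sample.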

