Examining the Validity of Adaptive Comparative Judgment for Peer Evaluation in a Design Thinking Course

2021 ◽  
Vol 6 ◽  
Author(s):  
Nathan Mentzer ◽  
Wonki Lee ◽  
Scott Ronald Bartholomew

Adaptive comparative judgment (ACJ) is a holistic judgment approach used to evaluate the quality of something (e.g., student work) in which individuals are presented with pairs of work and select the better item from each pair. This approach has demonstrated high levels of reliability with less bias than other approaches, hence providing accurate values for summative and formative assessment in educational settings. Though ACJ itself has demonstrated high reliability, relatively few studies have investigated the validity of peer-evaluated ACJ in the context of design thinking. This study explored peer evaluation, facilitated through ACJ, in terms of construct validity and criterion validity (concurrent validity and predictive validity) in the context of a design thinking course. Using ACJ, undergraduate students (n = 597) who took a design thinking course during Spring 2019 were invited to evaluate design point-of-view (POV) statements written by their peers. As a result of this ACJ exercise, each POV statement attained a specific parameter value reflecting its quality. To examine construct validity, the researchers conducted a content analysis comparing the contents of the 10 POV statements with the highest scores (parameter values) and the 10 POV statements with the lowest scores, as derived from the ACJ session. For criterion validity, we studied the relationship between peer-evaluated ACJ and graders' rubric-based grading. To study concurrent validity, we investigated the correlation between peer-evaluated ACJ parameter values and grades assigned by course instructors for the same POV writing task. Predictive validity was then studied by exploring whether peer-evaluated ACJ parameter values for POV statements were predictive of students' grades on the final project. Results showed that the contents of the statements with the highest parameter values were of better quality than those of the statements with the lowest parameter values; peer-evaluated ACJ therefore showed construct validity. Also, though peer-evaluated ACJ did not show concurrent validity, it did show moderate predictive validity.
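ACJ engines typically derive such parameter values by fitting a pairwise-comparison model from the Rasch/Bradley-Terry family to the accumulated judgments. The abstract does not specify the estimation method used, so the following is only a minimal illustrative sketch: it recovers log-strength parameters from (winner, loser) pairs via the MM algorithm for the Bradley-Terry model.

```python
import numpy as np

def acj_parameters(n_items, judgments, iters=200):
    """Estimate item quality parameters from pairwise ACJ judgments
    using the MM algorithm for the Bradley-Terry model.

    judgments: iterable of (winner, loser) item-index pairs.
    Returns log-strengths; higher means judged better.
    """
    wins = np.full((n_items, n_items), 0.1)  # small prior so zero-win items stay finite
    np.fill_diagonal(wins, 0.0)
    for winner, loser in judgments:
        wins[winner, loser] += 1.0
    s = np.ones(n_items)
    for _ in range(iters):
        for i in range(n_items):
            total_wins = wins[i].sum()
            denom = sum((wins[i, j] + wins[j, i]) / (s[i] + s[j])
                        for j in range(n_items) if j != i)
            s[i] = total_wins / denom
        s /= np.exp(np.log(s).mean())  # fix the scale (geometric mean = 1)
    return np.log(s)

# Example: five POV statements and a handful of peer judgments
params = acj_parameters(5, [(0, 1), (0, 2), (1, 3), (2, 3), (0, 3), (4, 1)])
print(params)
```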

Author(s):  
Yannik Terhorst ◽  
Paula Philippi ◽  
Lasse Sander ◽  
Dana Schultchen ◽  
Sarah Paganini ◽  
...  

BACKGROUND Mobile health apps (MHAs) have the potential to improve health care. The commercial MHA market is rapidly growing, but the content and quality of available MHAs are unknown. Consequently, instruments of high psychometric quality for assessing the quality and content of MHAs are sorely needed. The Mobile Application Rating Scale (MARS) is one of the most widely used tools for evaluating the quality of MHAs across health domains. Only a few validation studies, based on selected samples of MHAs, have investigated its psychometric quality. No study has evaluated the construct validity of the MARS or its concurrent validity against other instruments. OBJECTIVE This study evaluates the construct validity, concurrent validity, reliability, and objectivity of the MARS. METHODS MARS scoring data were pooled from 15 international app quality reviews to evaluate the psychometric properties of the MARS. The MARS measures app quality across four dimensions: engagement, functionality, aesthetics, and information quality. App quality is determined for each dimension and overall. Construct validity was evaluated by assessing competing confirmatory models explored by confirmatory factor analysis (CFA). A combination of non-centrality (RMSEA), incremental (CFI, TLI), and residual (SRMR) fit indices was used to evaluate goodness of fit. As a measure of concurrent validity, the correlations between the MARS and (1) another quality assessment tool, ENLIGHT, and (2) user star ratings extracted from app stores were investigated. Reliability was determined using omega. Objectivity was assessed in terms of intra-class correlation. RESULTS In total, MARS ratings from 1,299 MHAs covering 15 different health domains were pooled for the analysis. Confirmatory factor analysis confirmed a bifactor model with a general quality factor and an additional factor for each subdimension (RMSEA=0.074, TLI=0.922, CFI=0.940, SRMR=0.059). Reliability was good to excellent (omega 0.79 to 0.93). Objectivity was high (ICC=0.82). The overall MARS rating was positively associated with ENLIGHT (r=0.91, P<0.01) and user ratings (r=0.14, P<0.01). CONCLUSIONS The psychometric evaluation of the MARS demonstrated its suitability for the quality assessment of MHAs. As such, the MARS could be used to make the quality of MHAs transparent to health care stakeholders and patients. Future studies could extend the present findings by investigating the test-retest reliability and predictive validity of the MARS.
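A bifactor model of this shape (one general quality factor plus an orthogonal factor per subdimension) can be specified in lavaan-style syntax. The sketch below uses the open-source semopy package on simulated data; it is not the authors' tooling, and the item names (eng1, fun1, ...) are invented for illustration.

```python
from itertools import combinations

import numpy as np
import pandas as pd
import semopy

# Three invented indicator items per MARS subdimension.
items = {d: [f"{d[:3]}{i}" for i in (1, 2, 3)]
         for d in ("engagement", "functionality", "aesthetics", "information")}

# Simulate apps: every item reflects a general quality factor
# plus its own dimension-specific factor, plus noise.
rng = np.random.default_rng(0)
n = 500
g = rng.normal(size=n)  # latent general quality
data = {}
for d, cols in items.items():
    spec = rng.normal(size=n)  # latent dimension-specific factor
    for col in cols:
        data[col] = g + 0.6 * spec + rng.normal(scale=0.8, size=n)
df = pd.DataFrame(data)

# Bifactor specification: all items load on "general"; each item also
# loads on its subdimension; all factors are constrained orthogonal.
desc_lines = ["general =~ " + " + ".join(sum(items.values(), []))]
desc_lines += [f"{d} =~ " + " + ".join(cols) for d, cols in items.items()]
factors = ["general", *items]
desc_lines += [f"{a} ~~ 0*{b}" for a, b in combinations(factors, 2)]
desc = "\n".join(desc_lines)

model = semopy.Model(desc)
model.fit(df)
print(semopy.calc_stats(model).T)  # fit indices, including RMSEA, CFI, TLI
```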


1985 ◽  
Vol 56 (1) ◽  
pp. 255-259 ◽  
Author(s):  
Virginia P. Richmond ◽  
Fran Dickson-Markman

The predictive validity of the Writing Apprehension Test developed by J. A. Daly and M. D. Miller was examined. Study 1 indicated that the test is a meaningful predictor of students' American College Test scores in general and particularly of English scores. This predictive power remained after removing variance attributable to test anxiety. Study 2 showed that apprehension about writing significantly predicted quality of message while test anxiety did not. Students scoring high in writing apprehension reported more state anxiety about writing than those with moderate or low apprehension. These studies support the predictive validity of the Writing Apprehension Test but raise a question concerning its construct validity.
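"Removing variance attributable to test anxiety" is commonly operationalized as hierarchical regression: enter the covariate first, then check how much additional variance the predictor of interest explains. A minimal sketch with statsmodels on simulated data; the column names and effect sizes are hypothetical, not taken from the studies.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical data: ACT English scores, writing apprehension (WAT),
# and test anxiety for n students.
rng = np.random.default_rng(1)
n = 200
anxiety = rng.normal(size=n)
wat = 0.4 * anxiety + rng.normal(size=n)        # apprehension overlaps anxiety
act_english = -0.5 * wat + rng.normal(size=n)   # apprehension predicts scores
df = pd.DataFrame({"act_english": act_english, "wat": wat, "anxiety": anxiety})

# Step 1: test anxiety alone.
base = sm.OLS(df["act_english"], sm.add_constant(df[["anxiety"]])).fit()
# Step 2: add writing apprehension; the R^2 increment is the predictive
# power that remains after anxiety-related variance is removed.
full = sm.OLS(df["act_english"], sm.add_constant(df[["anxiety", "wat"]])).fit()
print(f"R2 increment due to WAT: {full.rsquared - base.rsquared:.3f}")
```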


1981 ◽  
Vol 41 (4) ◽  
pp. 1215-1222
Author(s):  
Barbara S. Plake ◽  
Elizabeth P. Smith ◽  
Don C. Damsteegt

Since 1960 educational researchers have used the Achievement Anxiety Test (AAT), which presumably measures two independent aspects of achievement anxiety: a facilitating and a debilitating component. In several earlier investigations, the developers reported construct and predictive validity for the subscales based on anxiety measures and performance criteria. However, the hypothesized independence of the subscales and the underlying factor structure of the test have been overlooked. Because the two subscales have been used to compute a difference score, the interrelationship between the subscales is of concern. This study examined the concurrent validity of the AAT and its factor structure to provide empirical evidence about the quality of the AAT.
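The concern is easy to quantify: in classical test theory, the reliability of a difference score drops as the two subscales correlate more strongly, even when each subscale is itself reliable. A small worked example with hypothetical reliability values:

```python
def difference_score_reliability(r_xx, r_yy, r_xy):
    """Classical test theory reliability of the difference X - Y,
    assuming X and Y have equal variances."""
    return (0.5 * (r_xx + r_yy) - r_xy) / (1.0 - r_xy)

# Reliable subscales (0.85) that correlate only 0.20 yield a
# dependable facilitating-minus-debilitating difference score...
print(difference_score_reliability(0.85, 0.85, 0.20))  # ~0.81
# ...but if the subscales correlate 0.70, that reliability collapses.
print(difference_score_reliability(0.85, 0.85, 0.70))  # ~0.50
```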


Animals ◽  
2020 ◽  
Vol 10 (12) ◽  
pp. 2382
Author(s):  
Rebecca L. Hunt ◽  
Gary C. W. England ◽  
Lucy Asher ◽  
Helen Whiteside ◽  
Naomi D. Harvey

Working dog organisations regularly assess the behaviour of puppies to monitor progression. Here, we tested the predictive validity (for predicting success in guide dog training) of a shortened version of a previously developed juvenile dog behaviour questionnaire (the refined puppy walker questionnaire, r-PWQ) and compared it with the Canine Behavioral Assessment and Research Questionnaire (C-BARQ). The r-PWQ is used by Guide Dogs UK, whereas the C-BARQ was designed for pet dogs and is used by some other guide dog schools internationally. A cohort of dogs aged eight months (n = 359) was scored concurrently on the r-PWQ and C-BARQ. Analogous traits between the questionnaires were evaluated for internal consistency and association with training outcome and compared for concurrent validity. The r-PWQ was associated with training outcome on five scales (r-Excitability, Trainability, Animal Chase, r-Attachment and attention seeking, and Distractibility) and the C-BARQ on two scales (Excitability and Separation-related behaviour). There were significant correlations between analogous C-BARQ and r-PWQ trait scores (p < 0.001) except for Separation-related behaviour, and the questionnaire scales had similar internal consistencies. The r-PWQ may therefore be more suitable for use by guide dog schools. However, given the correlations between analogous scales (except for "Distractibility"), some scales could be substituted for one another when reviewing the behaviour of dogs across guide dog schools that use different questionnaires.
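Internal consistency in questionnaire studies like this one is typically quantified with Cronbach's alpha (the abstract does not name the coefficient). A minimal pandas sketch with simulated item scores; the scale and item names are hypothetical.

```python
import numpy as np
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha for a DataFrame with one column per item
    and one row per dog."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

# e.g. four items making up a hypothetical r-PWQ "Distractibility" scale
rng = np.random.default_rng(2)
latent = rng.normal(size=359)
scale_items = pd.DataFrame({f"item{i}": latent + rng.normal(size=359)
                            for i in range(1, 5)})
print(f"alpha = {cronbach_alpha(scale_items):.2f}")
```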


Crisis ◽  
2013 ◽  
Vol 34 (1) ◽  
pp. 13-21 ◽  
Author(s):  
Philip J. Batterham ◽  
Alison L. Calear ◽  
Helen Christensen

Background: There are presently no validated scales to adequately measure the stigma of suicide in the community. The Stigma of Suicide Scale (SOSS) is a new scale containing 58 descriptors of a “typical” person who completes suicide. Aims: To validate the SOSS as a tool for assessing stigma toward suicide, to examine the scale’s factor structure, and to assess correlates of stigmatizing attitudes. Method: In March 2010, 676 staff and students at the Australian National University completed the scale in an online survey. The construct validity of the SOSS was assessed by comparing its factors with factors extracted from the Suicide Opinion Questionnaire (SOQ). Results: Three factors were identified: stigma, isolation/depression, and glorification/normalization. Each factor had high internal consistency and strong concurrent validity with the Suicide Opinion Questionnaire. More than 25% of respondents agreed that people who suicided were “weak,” “reckless,” or “selfish.” Respondents who were female, who had a psychology degree, or who spoke only English at home were less stigmatizing. A 16-item version of the scale also demonstrated robust psychometric properties. Conclusions: The SOSS is the first attitudes scale designed to directly measure the stigma of suicide in the community. Results suggest that psychoeducation may successfully reduce stigma.
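Factor structures like the three-factor SOSS solution are usually obtained by exploratory factor analysis with an oblique rotation. A minimal sketch with the factor_analyzer package on simulated Likert-style responses; the data generation and the choice of oblimin rotation are illustrative assumptions, not details reported in the abstract.

```python
import numpy as np
import pandas as pd
from factor_analyzer import FactorAnalyzer

# Hypothetical SOSS responses: one row per respondent (n = 676),
# one column per descriptor item (58 items), three latent factors.
rng = np.random.default_rng(3)
f = rng.normal(size=(676, 3))
loadings = np.zeros((58, 3))
loadings[np.arange(58), np.arange(58) % 3] = rng.uniform(0.5, 0.9, size=58)
X = pd.DataFrame(f @ loadings.T + rng.normal(scale=0.5, size=(676, 58)))

fa = FactorAnalyzer(n_factors=3, rotation="oblimin")
fa.fit(X)
print(fa.loadings_.shape)  # (58, 3): e.g. stigma, isolation/depression,
                           # glorification/normalization
```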


2018 ◽  
Author(s):  
Camilla Kao ◽  
Che-I Kao ◽  
Russell Furr

In science, safety can seem unfashionable. Satisfying safety requirements can slow the pace of research, make it cumbersome, or cost significant amounts of money. The logic of rules can seem unclear. Compliance can feel like a negative incentive. So besides the obvious benefit that safety keeps one safe, why do some scientists preach "safe science is good science"? Understanding the principles that underlie this maxim might help to create a strong positive incentive to incorporate safety into the pursuit of groundbreaking science.

This essay explains how safety can enhance the quality of an experiment and promote innovation in one's research. Being safe induces a researcher to have greater control over an experiment, which reduces the uncertainty that characterizes the experiment. Less uncertainty increases both safety and the quality of the experiment, the latter including statistical quality (reproducibility, sensitivity, etc.) and countless other properties (yield, purity, cost, etc.). Like prototyping in design thinking and working under the constraint of creative limitation in the arts, considering safety issues is a hands-on activity that involves decision-making. Making decisions leads to new ideas, which spawns innovation.


SLEEP ◽  
2021 ◽  
Vol 44 (Supplement_2) ◽  
pp. A201-A202
Author(s):  
Kristina Puzino ◽  
Susan Calhoun ◽  
Allison Harvey ◽  
Julio Fernandez-Mendoza

Abstract Introduction The Sleep Inertia Questionnaire (SIQ) was developed and validated in patients with mood disorders to evaluate difficulties with becoming fully awake after nighttime sleep or daytime naps in a multidimensional manner. However, few data are available regarding its psychometric properties in clinical samples with sleep disorders. Methods 211 patients (43.0±16.4 years old, 68% female, 17% minority) evaluated at the Behavioral Sleep Medicine (BSM) program of Penn State Health Sleep Research & Treatment Center completed the SIQ. All patients were diagnosed using ICSD-3 criteria, with 111 receiving a diagnosis of chronic insomnia disorder (CID), 48 of a central disorder of hypersomnolence (CDH), and 52 of other sleep disorders (OSD). Structural equation modelling was used to conduct confirmatory factor analysis (CFA) of the SIQ. Results CFA supported four SIQ dimensions of "physiological", "cognitive", "emotional" and "response to" (RSI) sleep inertia with adequate goodness-of-fit (TLI=0.90, CFI=0.91, GFI=0.85, RMSEA=0.08). Internal consistency was high (α=0.94), including that of its dimensions (physiological α=0.89, cognitive α=0.94, emotional α=0.67, RSI α=0.78). Dimension inter-correlations were moderate to high (r=0.42–0.93, p<0.01), indicating good construct validity. Convergent validity showed moderate correlations with Epworth sleepiness scale (ESS) scores (r=0.38) and large correlations with Flinders fatigue scale (FFS) scores (r=0.65). Criterion validity showed significantly (p<0.01) higher scores in subjects with CDH (69.0±16.6) as compared to those with CID (54.4±18.3) or OSD (58.5±20.0). A SIQ cut-off score ≥57.5 provided a sensitivity/specificity of 0.77/0.65, while a cut-off score ≥61.5 provided a sensitivity/specificity of 0.71/0.70 to identify CDH vs. ESS<10 (AUC=0.76). Conclusion The SIQ shows satisfactory indices of reliability and construct validity in a clinically diverse sleep disorders sample. Its criterion validity is supported by its divergent association with hypersomnia vs. insomnia disorders, as well as its adequate sensitivity/specificity to identify patients with CDH. The SIQ can help clinicians easily assess the complex dimensionality of sleep inertia and target behavioral sleep treatments. Future studies should confirm the best SIQ cut-off score by including good sleeping controls, while clinical studies should determine its minimal clinically important difference after pharmacological or behavioral treatments.
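Cut-off scores with paired sensitivity/specificity values like these are typically read off an ROC curve, often by maximizing Youden's J. A minimal scikit-learn sketch on simulated scores; the group means and spreads below are hypothetical stand-ins, not the study's data.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

rng = np.random.default_rng(4)
# Simulated total SIQ scores: hypersomnolence (CDH) vs. other patients.
cdh = rng.normal(69.0, 16.6, size=48)
other = rng.normal(55.0, 19.0, size=163)
scores = np.concatenate([cdh, other])
labels = np.concatenate([np.ones(48), np.zeros(163)])

fpr, tpr, thresholds = roc_curve(labels, scores)
j = tpr - fpr                  # Youden's J statistic at each threshold
best = j.argmax()
print(f"AUC = {roc_auc_score(labels, scores):.2f}")
print(f"cut-off {thresholds[best]:.1f}: "
      f"sensitivity {tpr[best]:.2f}, specificity {1 - fpr[best]:.2f}")
```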


2021 ◽  
Vol 11 (15) ◽  
pp. 6955
Author(s):  
Andrzej Rysak ◽  
Magdalena Gregorczyk

This study investigates the use of the differential transform method (DTM) for integrating the Rössler system of the fractional order. Preliminary studies of the integer-order Rössler system, with reference to other well-established integration methods, made it possible to assess the quality of the method and to determine optimal parameter values that should be used when integrating a system with different dynamic characteristics. Bifurcation diagrams obtained for the Rössler fractional system show that, compared to the RK4 scheme-based integration, the DTM results are more resistant to changes in the fractionality of the system.
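For reference, the RK4 baseline the authors compare against is straightforward to reproduce for the integer-order Rössler system. The sketch below assumes the classic parameter values a = b = 0.2, c = 5.7 and an arbitrary step size; the abstract does not report the settings actually used.

```python
import numpy as np

def rossler(state, a=0.2, b=0.2, c=5.7):
    """Integer-order Rössler vector field."""
    x, y, z = state
    return np.array([-y - z, x + a * y, b + z * (x - c)])

def rk4_step(f, state, h):
    """One classical fourth-order Runge-Kutta step."""
    k1 = f(state)
    k2 = f(state + 0.5 * h * k1)
    k3 = f(state + 0.5 * h * k2)
    k4 = f(state + h * k3)
    return state + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

state = np.array([1.0, 1.0, 1.0])
h = 0.01
trajectory = [state]
for _ in range(100_000):          # integrate for 1000 time units
    state = rk4_step(rossler, state, h)
    trajectory.append(state)
```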


2021 ◽  
Vol 19 (1) ◽  
Author(s):  
Daren K. Heyland ◽  
J. Paige Pope ◽  
Xuran Jiang ◽  
Andrew G. Day

Abstract Background People are living longer than ever before. However, living longer brings increased problems that negatively impact quality of life and quality of death. Tools are needed to help individuals assess whether they are practicing the attitudes and behaviors associated with a future long life, high quality of life, high quality of death, and a satisfying post-death legacy. The purpose of this paper is to describe the process we used to develop a novel questionnaire ("Preparedness for the Future Questionnaire™", or Prep FQ) and to define its psychometric properties. Methods Using a multi-step development procedure, items were generated for the new questionnaire, after which the psychometric properties were tested with a heterogeneous sample of 502 Canadians. Using an online polling panel, respondents were asked to complete demographic questions as well as the Prep-FQ, Global Rating of Life Satisfaction, the Keyes Psychological Well-Being scale, and the Short-Form 12. Results The final version of the questionnaire contains 34 items in 8 distinct domains ("Medico-legal", "Social", "Psychological Well-being", "Planning", "Enrichment", "Positive Health Behaviors", "Negative Health Behaviors", and "Late-life Planning"). We observed minimal missing data and good usage of all response options. The average overall Prep FQ score was 51.2 (SD = 13.3). The Cronbach alphas assessing internal reliability of the Prep FQ domains ranged from 0.33 to 0.88. The intra-class correlation coefficient (ICC) used to assess test–retest reliability had an overall score of 0.87. For the purposes of establishing construct validity, all of the pre-specified relationships between the Prep FQ and the other questionnaires were met. Conclusion Analyses of this novel measure offered support for its face validity, construct validity, test–retest reliability, and internal consistency. With the development of this useful and valid scale, future research can utilize this measure to engage people in the process of comprehensively assessing and improving their state of preparedness for the future, tracking their progress along the way. Ultimately, this program of research aims to improve the quality and quantity of people's lives by helping them 'think ahead' and 'plan ahead' on the aspects of their daily life that matter to their future.
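Test-retest ICCs of the kind reported here are commonly computed from a long-format table of repeated administrations, for example with the pingouin package. The sketch below simulates two Prep FQ administrations per respondent; the column names, sample size, and error variance are hypothetical.

```python
import numpy as np
import pandas as pd
import pingouin as pg

rng = np.random.default_rng(5)
n = 120
true = rng.normal(51.2, 13.3, size=n)   # stable trait level per respondent
df = pd.DataFrame({
    "subject": np.tile(np.arange(n), 2),
    "occasion": np.repeat(["t1", "t2"], n),
    "prep_fq": np.concatenate([true + rng.normal(scale=5, size=n),
                               true + rng.normal(scale=5, size=n)]),
})

# Treat the two occasions as "raters" of each subject (standard
# test-retest setup); pingouin reports the full family of ICC types.
icc = pg.intraclass_corr(data=df, targets="subject",
                         raters="occasion", ratings="prep_fq")
print(icc[["Type", "ICC"]])
```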


2020 ◽  
Vol 30 (Supplement_5) ◽  
Author(s):  
I Houkes ◽  
E Hazelzet ◽  
P Mignon ◽  
A de Rijk

Abstract Background Sustainable employability (SE) is a top priority. However, employers find it difficult to develop SE interventions. Measures based on the employee perspective of SE that would give direction to interventions currently fall short, particularly for the understudied group of employees with lower levels of education (one third of the Dutch labor population). Earlier, the Maastricht Instrument for SE (MAISE-NL) was developed and validated in a sample of highly educated employees. This study aims to adjust and validate the MAISE-NL for use among lower-educated employees (MAISE-LE). Methods By means of focus groups consisting of employees with lower levels of education, the items and response categories of the MAISE were aligned with the perceptions of these employees. Items from other subscales, such as job control, self-efficacy, and lifestyle, were added. The language was checked for clarity and ambiguity. A questionnaire containing these items, as well as proxy variables (health and vitality) and demographics, was answered online by 944 lower-educated employees from five organizations (response rates 44-64%). Construct validity, reliability, and criterion validity were tested through PCA, CFA, Cronbach's alpha, and correlations. Results The MAISE-LE comprises 10 scales divided over four areas: (1) level of SE; (2) factors affecting my SE; (3) overall responsibility for SE; and (4) responsibility for factors affecting my SE. Preliminary results indicate that reliability, construct validity, and criterion validity were adequate to good. Employees' SE was moderate to high and was generally considered a shared responsibility of employee and employer, though this varied per factor. Employees wish to participate more in decisions regarding their work. Conclusions The MAISE-LE appeared to be reliable and valid. We recommend that employers use the MAISE-LE as a needs assessment in order to develop SE interventions that will be readily accepted and effective for employees with lower levels of education. Key messages The MAISE-LE (Maastricht Instrument for Sustainable Employability) is a new instrument for measuring SE and the responsibility for SE from the perspective of employees with lower levels of education. The MAISE-LE will facilitate employers in developing effective SE interventions that align with the needs of this vulnerable group of employees.

