Evaluating a first fully automated interview grounded in Multiple Mini Interview (MMI) methodology: results from a feasibility study

Author(s):  
Alison Callwood ◽  
Lee Gillam ◽  
Angelos Christidis ◽  
Jia Doulton ◽  
Jenny Harris ◽  
...  

Abstract
Objectives: Global, COVID-driven restrictions on face-to-face interviews for healthcare student selection have forced admissions staff to rapidly adopt adapted online systems before supporting evidence is available. We have developed what we believe is the first fully automated interview grounded in Multiple Mini-Interview (MMI) methodology. This study aimed to explore the test-retest reliability, acceptability, and usability of the system.
Design, setting and participants: Mixed-methods feasibility study in Physician Associate (PA) programmes at two UK universities and one US university during 2019-2020.
Primary and secondary outcomes: Feasibility measures (test-retest reliability, acceptability, and usability) were assessed using intraclass correlation (ICC), descriptive statistics, and thematic and content analysis.
Methods: Volunteers took (T1), then repeated (T2), the automated MMI, with a seven-day (±2) interval, and then completed an evaluation questionnaire. Admissions staff participated in focus group discussions.
Results: Sixty-two students and seven admissions staff participated: 34 students and four staff from the UK universities and 28 students and three staff from the US university. Good to excellent test-retest reliability was observed, with T1-T2 ICCs between 0.62 and 0.81 (p<0.001) when assessed by individual total scores (range 80.6-119), station total scores (ICC 0.6-0.91, p<0.005), and individual site (all ICC ≥ 0.76, p<0.001); mean test-retest reliability across sites was 0.82 (p<0.001; 95% CI 0.7-0.9). Admissions staff reported potential to reduce resource costs and bias through a more objective screening tool for pre-selection, or to replace some MMI stations in a 'hybrid model'. Maintaining human interaction through 'touch points' was considered essential. Users evaluated the system positively, stating that it was intuitive with an accessible interface. Concepts chosen for dynamic probing needed to be appropriately tailored.
Conclusion: These preliminary findings suggest that the system is reliable, generating consistent scores for candidates, and is acceptable to end users provided human touch points are maintained. Thus, there is evidence for the potential of such an automated system to augment healthcare student selection processes.
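As a concrete illustration of the kind of test-retest analysis reported above, the sketch below computes an intraclass correlation from paired T1/T2 total scores using the Python pingouin library; the candidate IDs, scores, and choice of ICC form (two-way random effects, absolute agreement) are assumptions for illustration, not details taken from the study.

```python
# Minimal sketch: test-retest ICC for total interview scores.
# Long-format data: one row per candidate per administration (T1/T2).
import pandas as pd
import pingouin as pg

scores = pd.DataFrame({
    "candidate": ["c01", "c01", "c02", "c02", "c03", "c03", "c04", "c04"],
    "session":   ["T1",  "T2",  "T1",  "T2",  "T1",  "T2",  "T1",  "T2"],
    "total":     [96.0, 101.0, 88.5, 85.0, 112.0, 108.5, 90.0, 94.0],
})

# pingouin reports all ICC forms; ICC2 (two-way random effects, absolute
# agreement, single measures) is a common choice for test-retest designs.
icc = pg.intraclass_corr(data=scores, targets="candidate",
                         raters="session", ratings="total")
print(icc.loc[icc["Type"] == "ICC2", ["Type", "ICC", "CI95%", "pval"]])
```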

1997 ◽  
Vol 64 (5) ◽  
pp. 270-276 ◽  
Author(s):  
Johanne Desrosiers ◽  
Annie Rochette ◽  
Réjean Hébert ◽  
Gina Bravo

Several dexterity tests have been developed, including the Minnesota Rate of Manipulation Test (MRMT) and a new version, the Minnesota Manual Dexterity Test (MMDT). The objectives of the study were: a) to verify the test-retest reliability of the MMDT; b) to compare the MRMT and the MMDT; c) to study the concurrent validity of the MMDT; and d) to establish reference values for elderly people with the MMDT. Two hundred and forty-seven community-living healthy elderly people were evaluated with the MMDT and two other dexterity tests, the Box and Block Test (BBT) and the Purdue Pegboard (PP). Thirty-five of them were evaluated twice with the MMDT, and 44 were evaluated with both the MMDT and the MRMT. The results show that the test-retest reliability of the MMDT is acceptable to high (intraclass correlation coefficients of 0.79 to 0.87, depending on the subtest), and the validity of the test is demonstrated by significant correlations between the MMDT, the BBT, and the PP (0.63 to 0.67). There is a high correlation (0.85 to 0.95) between the MMDT and the MRMT, despite differences in their results. The reference values will help occupational therapists to differentiate better between real dexterity difficulties and those that may be attributed to normal aging.
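A minimal sketch of how concurrent validity of this kind is typically quantified, using a Pearson correlation between scores on two dexterity measures; the values below are invented and the pairing of tests is only illustrative.

```python
# Minimal sketch: concurrent validity as a Pearson correlation between two
# dexterity measures (invented scores, not the study data).
from scipy.stats import pearsonr

mmdt_time_s = [142, 155, 138, 160, 149, 171, 133, 158]  # e.g. placing-subtest times
bbt_blocks  = [61, 55, 63, 52, 58, 47, 66, 54]          # blocks moved in one minute

r, p = pearsonr(mmdt_time_s, bbt_blocks)
print(f"Pearson r = {r:.2f}, p = {p:.3f}")
```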


2020 ◽  
Vol 35 (6) ◽  
pp. 848-848
Author(s):  
David C ◽  
Vasserman M ◽  
Brooks B ◽  
Macallister W

Abstract Objective The Grooved Pegboard Test (GPT) is among the most commonly used fine motor tasks, though there is limited data on its basic psychometric properties in children and adolescents with medical conditions. The purpose of this study was to establish test reliability for the GPT within this group. Method Participants (N = 44; 22 males, 22 females) were children and adolescents clinically referred for neuropsychological evaluation. Diagnoses included epilepsy (n = 24), cardiac conditions (n = 13), and other conditions (n = 5). Each completed the GPT twice, once in the morning and once in the afternoon, 64 to 390 minutes apart (M = 263 min, SD = 60 min). Spearman correlations assessed test–retest reliability for speed of completion on both dominant-hand (DH) and non-dominant-hand (NDH) trials and for the number of peg drops. A paired-samples t-test assessed practice effects between administrations. Results Ages ranged from 6.11 to 18.10 years (M = 12.52 yrs, SD = 3.19 yrs). GPT raw scores at first presentation ranged from 25 to 296 seconds (DH M = 80.91, SD = 25.1; NDH M = 95.34, SD = 49.42). The GPT showed high test–retest reliability for DH (ρ = 0.80, p < 0.001) and NDH (ρ = 0.83, p < 0.001). The number of drops showed non-significant correlations across trials (DH ρ = −0.03, p = 0.87; NDH ρ = 0.11, p = 0.49). Practice effects were identified for the DH (t = −3.25, p = 0.002) but not the NDH (t = −1.83, p = 0.074). Conclusion Strong test–retest reliability of GPT speed of completion in this population supports the stability of test results over time, though practice effects are seen at short intervals. The number of pegs dropped, however, lacks sufficient retest reliability and may be of lesser clinical utility. Overall, this study provides increased confidence for continued use of the GPT.
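For readers who want to reproduce this style of analysis, here is a minimal sketch of the two statistics named above (a Spearman correlation for test-retest stability and a paired-samples t-test for practice effects); the completion times are invented.

```python
# Minimal sketch: Spearman correlation for test-retest stability and a
# paired-samples t-test for practice effects (invented completion times).
from scipy.stats import spearmanr, ttest_rel

dh_morning   = [72, 85, 64, 110, 91, 78, 83, 95]   # dominant-hand time (s), session 1
dh_afternoon = [68, 80, 66, 101, 88, 74, 79, 90]   # dominant-hand time (s), session 2

rho, p_rho = spearmanr(dh_morning, dh_afternoon)
t, p_t = ttest_rel(dh_morning, dh_afternoon)

print(f"test-retest: rho = {rho:.2f} (p = {p_rho:.3f})")
print(f"practice effect: t = {t:.2f} (p = {p_t:.3f})")
```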


2021 ◽  
Vol 12 ◽  
Author(s):  
Stefania Franja ◽  
Anna E. McCrae ◽  
Tina Jahnel ◽  
Ashley N. Gearhardt ◽  
Stuart G. Ferguson

Objective: Food-related attentional bias has been defined as the tendency to give preferential attention to food-related stimuli. Attentional bias is of interest because some studies have found that increased attentional bias is associated with obesity, while others have not. A possible reason for these mixed results is that there is no agreed-upon measure of attentional bias: studies differ in both the measurement and the scoring of attentional bias. Additionally, little is known about the stability of attentional bias over time. The present study aims to compare attentional bias measures generated from commonly used attentional bias tasks and scoring protocols, and to assess their test-retest reliability.
Methods: As part of a larger study, 69 participants (67% female) completed two food-related visual probe tasks at baseline: lexical (words as stimuli) and pictorial (pictures as stimuli). Reaction time bias scores (attentional bias scores) for each task were calculated in three different ways, by subtracting the reaction times for trials where probes replaced (1) neutral stimuli from trials where probes replaced any food stimuli, (2) neutral stimuli from trials where probes replaced high-calorie food stimuli, and (3) neutral stimuli from trials where probes replaced low-calorie food stimuli. This resulted in three separate attentional bias scores for each task, which were then correlated. The pictorial visual probe task was administered a second time 14 days later to assess test-retest reliability.
Results: Regardless of the scoring used, lexical attentional bias scores were minimal, suggesting minimal attentional bias. Pictorial task attentional bias scores were larger, suggesting greater attentional bias. The correlations between the various scores were relatively small (r = 0.13–0.20). Similarly, test-retest reliability for the pictorial task was poor regardless of how the test was scored (r = 0.20–0.41).
Conclusion: These results suggest that at least some of the variation in findings across attentional bias studies could be due to differences in the way attentional bias is measured. Future research may benefit from combining eye-tracking measurements with reaction times.
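The subtraction-based scoring described above can be sketched as follows; the trial-level reaction times and column names are illustrative assumptions rather than the study's data.

```python
# Minimal sketch of the three subtraction-based bias scores described above.
import pandas as pd

trials = pd.DataFrame({
    "probe_replaced": ["neutral", "high_cal", "neutral", "low_cal",
                       "high_cal", "neutral", "low_cal", "high_cal"],
    "rt_ms":          [512, 478, 530, 495, 470, 521, 502, 485],
})

mean_rt = trials.groupby("probe_replaced")["rt_ms"].mean()
mean_rt_food = trials.loc[trials["probe_replaced"] != "neutral", "rt_ms"].mean()

# Each score subtracts the neutral-probe mean RT, following the description above.
bias_all_food = mean_rt_food - mean_rt["neutral"]
bias_high_cal = mean_rt["high_cal"] - mean_rt["neutral"]
bias_low_cal  = mean_rt["low_cal"] - mean_rt["neutral"]

print(f"all food: {bias_all_food:.1f} ms, "
      f"high-calorie: {bias_high_cal:.1f} ms, "
      f"low-calorie: {bias_low_cal:.1f} ms")
```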


2009 ◽  
Vol 6 (3) ◽  
pp. 367-373 ◽  
Author(s):  
Gavin R. McCormack ◽  
Alan Shiell ◽  
Patricia K. Doyle-Baker ◽  
Christine Friedenreich ◽  
Bev Sandalack ◽  
...  

Background: Capturing neighborhood-specific physical activity is necessary to advance understanding of the relations between neighborhood walkability and physical activity. This study examined the test–retest reliability of previously developed items (from the Neighborhood Physical Activity Questionnaire) for capturing setting-specific physical activity among Canadian adults.
Methods: Randomly sampled adults (N = 117) participated in 2 telephone interviews 2 to 5 days apart. Respondents were asked a series of items capturing the frequency and duration of transportation-related walking, recreational walking, and moderate- and vigorous-intensity physical activity undertaken inside and outside the neighborhood in a usual week. The test–retest reliability of reported physical activity levels was then examined using intraclass and Spearman's rank correlations, kappa coefficients, and overall agreement.
Results: Participation in, frequency of, and duration of transportation-related walking, recreational walking, and vigorous-intensity physical activity inside and outside the neighborhood showed moderate to excellent test–retest reliability. Moderate reliability was found for moderate-intensity physical activity undertaken inside (k = .48; ICC frequency = .38; ICC duration = .39) and outside (k = .51; ICC frequency = .79; ICC duration = .31) the neighborhood.
Conclusions: Neighborhood-specific physical activity items administered by telephone interview are reliable and are therefore appropriate for use in future studies examining neighborhood walkability and physical activity.
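As an illustration of two of the agreement statistics named above, the sketch below computes Cohen's kappa and overall percent agreement for a hypothetical yes/no walking-participation item asked at both interviews; the responses are invented.

```python
# Minimal sketch: kappa and overall agreement for a binary participation item
# asked at two telephone interviews (invented responses).
from sklearn.metrics import cohen_kappa_score

interview1 = ["yes", "yes", "no", "yes", "no", "no", "yes", "yes", "no", "yes"]
interview2 = ["yes", "no",  "no", "yes", "no", "yes", "yes", "yes", "no", "yes"]

kappa = cohen_kappa_score(interview1, interview2)
agreement = sum(a == b for a, b in zip(interview1, interview2)) / len(interview1)

print(f"kappa = {kappa:.2f}, overall agreement = {agreement:.0%}")
```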


Author(s):  
Jonathan Mak ◽  
Neil Rens ◽  
Dasha Savage ◽  
Helle Nielsen-Bowles ◽  
Doran Triggs ◽  
...  

Abstract Aims The 6-min-walk test (6MWT) is a validated proxy for frailty and a predictor of clinical outcomes, yet it is not widely used due to implementation challenges. This comparative effectiveness study assesses the reliability and repeatability of a home-based 6MWT compared with in-clinic 6MWTs in patients with cardiovascular disease. Methods and results One hundred and ten (110) patients scheduled for cardiac or vascular surgery were enrolled between June 2018 and December 2019 at the Palo Alto VA Hospital. Subjects were provided with an Apple iPhone 7 and an Apple Watch Series 3 loaded with the VascTrac research study application and performed a supervised in-clinic 6MWT at enrolment and at 2 weeks, 1, 3, and 6 months post-operatively. Subjects also received notifications to perform at-home smartphone-based 6MWTs once a week for 6 months. Test–retest reliability of in-clinic and at-home measurements was assessed with Cronbach's alpha. Reliability for in-clinic ground-truth 6MWT steps vs. in-clinic iPhone 6MWT steps was 0.99, indicating high reliability between the two measurements. When comparing in-clinic ground-truth 6MWT steps with neighbouring at-home iPhone 6MWT steps, reliability was 0.74. Conclusion The test–retest analysis of both measurements shows that an iPhone-based 6MWT is reliable compared with an in-clinic ground-truth measurement in patients with cardiovascular disease.
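A minimal sketch of a Cronbach's alpha calculation of the kind used here, with each column holding one measurement occasion of step counts; the data are invented and the study's own pairing of clinic and home walks is not reproduced.

```python
# Minimal sketch: Cronbach's alpha across two step-count measurements, with one
# column per measurement occasion (invented values).
import pandas as pd
import pingouin as pg

steps = pd.DataFrame({
    "walk_1": [610, 545, 702, 488, 655, 590],
    "walk_2": [598, 560, 690, 470, 661, 585],
})

alpha, ci = pg.cronbach_alpha(data=steps)
print(f"Cronbach's alpha = {alpha:.2f}, 95% CI = {ci}")
```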


Perception ◽  
10.1068/p7113 ◽  
2012 ◽  
Vol 41 (2) ◽  
pp. 193-203 ◽  
Author(s):  
Roland Weierstall ◽  
Bettina M Pause

A key function of the olfactory system is the detection of differences in odour quality. Therefore, a test was developed to assess odour discrimination ability in normosmic humans. From six monomolecular substances (capric acid, coumarin, eugenol, geraniol, phenylethyl alcohol, and vanillin), quaternary mixtures were prepared. Within each item, three odour mixtures were presented (triangle forced-choice procedure). The deviant odour contained the same substances as the two remaining odours, but in different proportions. Study 1 (120 participants) aimed to select items that contribute to a high internal consistency. Study 2 (104 participants) assessed test–retest reliability, parallel-test reliability, and test validity. Out of 45 items, a 15-item test (Düsseldorf Odour Discrimination Test, DODT) with an internal consistency of 0.61 and medium item difficulties was constructed. The test–retest reliability of the DODT was 0.66 (test interval = 4 weeks) and the parallel-test reliability 0.42. The DODT correlated significantly with the University of Pennsylvania Smell Identification Test and, to a lesser extent, with the phenylethyl alcohol odour threshold test. As the DODT did not correlate with the odour discrimination test of the Sniffin' Sticks, the two tests seem to measure different aspects of olfactory performance.
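Item selection of the kind described in Study 1 usually rests on item difficulties and item-total correlations; the sketch below computes both for an invented 0/1 response matrix and is not the DODT item analysis itself.

```python
# Minimal sketch of a basic item analysis: item difficulty (proportion correct)
# and corrected item-total correlations on an invented 0/1 response matrix
# (rows = participants, columns = items).
import numpy as np

responses = np.array([
    [1, 0, 1, 1, 0],
    [1, 1, 1, 0, 0],
    [0, 0, 1, 1, 1],
    [1, 1, 0, 1, 0],
    [1, 0, 1, 1, 1],
    [0, 1, 1, 0, 0],
])

difficulty = responses.mean(axis=0)  # proportion of participants solving each item

# Corrected item-total correlation: each item against the total of the other items.
item_total = [
    np.corrcoef(responses[:, j], responses.sum(axis=1) - responses[:, j])[0, 1]
    for j in range(responses.shape[1])
]

for j, (d, r) in enumerate(zip(difficulty, item_total), start=1):
    print(f"item {j}: difficulty = {d:.2f}, corrected item-total r = {r:.2f}")
```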


BJGP Open ◽  
2018 ◽  
Vol 2 (1) ◽  
pp. bjgpopen18X101385 ◽  
Author(s):  
Loes J Meijer ◽  
Esther de Groot ◽  
Maarten van Smeden ◽  
François G Schellevis ◽  
Roger AMJ Damoiseaux

Background: Collaboration between medical professionals from separate organisations is necessary to deliver good patient care. This care is influenced by professionals' perceptions of their collaboration. Until now, no instrument to measure such perceptions was available in the Netherlands. A questionnaire developed and validated in Spain was translated to assess perceptions about clinicians' collaboration in primary and secondary care in the Dutch setting.
Aim: To validate, in the Dutch setting, a Spanish questionnaire that aimed to assess clinicians' perceptions of interorganisational collaboration.
Design & setting: After translation, cultural adaptation, and pre-testing, the questionnaire was sent to GPs and secondary care clinicians (SCCs) in three regions of the Netherlands. The responses of 445 respondents were used to assess the validity and reliability of the questionnaire.
Method: A confirmatory factor analysis (CFA) and an exploratory factor analysis (EFA) were performed to study the construct validity of the hypothesised factor model underlying the questionnaire. Test-retest reliability was evaluated using weighted kappa statistics.
Results: Results of the CFA indicated poor fit of the hypothesised factor structure. The EFA, executed separately for each region, showed a highly unstable factor structure. The test-retest analysis demonstrated low reliability.
Conclusion: The underlying factor structure of the Spanish questionnaire could not be reproduced. The construct validity and reliability of this questionnaire were insufficient to warrant its use in the Dutch setting. This study demonstrates the need to evaluate the validity and reliability of questionnaires in local settings.
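An exploratory factor analysis of the kind reported here can be sketched with the Python factor_analyzer package; the item responses below are randomly generated and the three-factor solution is an illustrative assumption, not the questionnaire's hypothesised structure.

```python
# Minimal sketch: exploratory factor analysis on randomly generated
# Likert-type responses (illustration only, not the study data).
import numpy as np
import pandas as pd
from factor_analyzer import FactorAnalyzer

rng = np.random.default_rng(0)
items = pd.DataFrame(rng.integers(1, 6, size=(200, 10)),
                     columns=[f"item_{i}" for i in range(1, 11)])

fa = FactorAnalyzer(n_factors=3, rotation="oblimin")
fa.fit(items)

loadings = pd.DataFrame(fa.loadings_, index=items.columns,
                        columns=["factor_1", "factor_2", "factor_3"])
print(loadings.round(2))
```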


2020 ◽  
Vol 35 (6) ◽  
pp. 1036-1036
Author(s):  
Chaudhary Z ◽  
Hubley A

Abstract Objective Reliability and validity evidence related to Five Point Test (FPT) scores is severely limited. The primary purpose of this study was to examine psychometric evidence related to two commonly used FPT scores, the number of unique designs (UD) and the percentage of repetitions (PR), using one-week test–retest reliability, correlations with demographic and neuropsychological variables, and convergent validity in line with a regression-based, explanation-focused view of validity. Methods The sample consisted of 86 cognitively intact, non-depressed adult men and women aged 21–82 years (M = 52.7, SD = 17.7) with 7–21 years of education (M = 14.2, SD = 3.13), recruited from the general community and tested individually. Results UD ranged from 8–60 (M = 35.4) and PR ranged from 0–45% (M = 6.9%). Test–retest coefficients were .83 for UD but only .43 for PR. Age was significantly correlated with UD (r = −.59) and PR (r = .23). Education was significantly correlated with UD (r = .26) but not PR (r = −.10). There were no gender differences. UD showed significant bivariate correlations with WAIS-III Block Design, Trail-Making Test (TMT) A, TMT-B, the Bicycle Drawing Test, and FAS Verbal Fluency but, when entered together in a regression, only age and TMT-B remained significant. PR scores did not correlate significantly with any neuropsychological variables. Conclusion UD showed strong test–retest reliability. UD performance tends to be poorer with older age and less education. The meaning and interpretation of UD performance using a regression-based, explanation-focused view of validity will be discussed. PR reliability is poor even over a short interval and attenuates subsequent statistical findings. Use of PR is not recommended in research or practice.
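A regression of the kind summarised above, with UD predicted from age, education, and TMT-B, can be sketched with statsmodels; every value below is invented for illustration.

```python
# Minimal sketch: multiple regression of UD on age, education, and TMT-B
# (invented predictor and outcome values).
import pandas as pd
import statsmodels.api as sm

df = pd.DataFrame({
    "age":       [25, 34, 48, 52, 61, 67, 73, 78, 44, 58],
    "education": [16, 12, 14, 18, 10, 12, 16, 13, 15, 11],
    "tmt_b":     [48, 62, 75, 70, 95, 110, 130, 142, 68, 99],  # seconds
    "ud":        [52, 46, 41, 44, 33, 30, 24, 21, 42, 31],     # unique designs
})

X = sm.add_constant(df[["age", "education", "tmt_b"]])
model = sm.OLS(df["ud"], X).fit()

print(model.params.round(2))
print(model.pvalues.round(4))
```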


2011 ◽  
Vol 91 (1) ◽  
pp. 102-113 ◽  
Author(s):  
Abigail L. Leddy ◽  
Beth E. Crowner ◽  
Gammon M. Earhart

Background Gait impairments, balance impairments, and falls are prevalent in individuals with Parkinson disease (PD). Although the Berg Balance Scale (BBS) can be considered the reference standard for the determination of fall risk, it has a noted ceiling effect. Development of ceiling-free measures that can assess balance and are good at discriminating “fallers” from “nonfallers” is needed. Objective The purpose of this study was to compare the Functional Gait Assessment (FGA) and the Balance Evaluation Systems Test (BESTest) with the BBS among individuals with PD and evaluate the tests' reliability, validity, and discriminatory sensitivity and specificity for fallers versus nonfallers. Design This was an observational study of community-dwelling individuals with idiopathic PD. Methods The BBS, FGA, and BESTest were administered to 80 individuals with PD. Interrater reliability (n=15) was assessed by 3 raters. Test-retest reliability was based on 2 tests of participants (n=24), 2 weeks apart. Intraclass correlation coefficients (2,1) were used to calculate reliability, and Spearman correlation coefficients were used to assess validity. Cutoff points, sensitivity, and specificity were based on receiver operating characteristic plots. Results Test-retest reliability was .80 for the BBS, .91 for the FGA, and .88 for the BESTest. Interrater reliability was greater than .93 for all 3 tests. The FGA and BESTest were correlated with the BBS (r=.78 and r=.87, respectively). Cutoff scores to identify fallers were 47/56 for the BBS, 15/30 for the FGA, and 69% for the BESTest. The overall accuracy (area under the curve) for the BBS, FGA, and BESTest was .79, .80, and .85, respectively. Limitations Fall reports were retrospective. Conclusion Both the FGA and the BESTest have reliability and validity for assessing balance in individuals with PD. The BESTest is most sensitive for identifying fallers.
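The cutoff-score analysis described above follows the usual receiver operating characteristic logic; the sketch below picks a cutoff by maximising Youden's J on invented balance scores and faller labels, and is not the study's analysis.

```python
# Minimal sketch: ROC analysis with a Youden's J cutoff for distinguishing
# fallers from non-fallers (invented balance scores and labels).
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

balance_score = np.array([56, 54, 49, 47, 52, 44, 41, 55, 38, 48, 51, 53])
faller        = np.array([0,  0,  1,  1,  0,  1,  1,  0,  1,  0,  1,  0])

# Lower balance scores indicate worse balance, so the negated score serves as
# the "risk" of being a faller.
fpr, tpr, thresholds = roc_curve(faller, -balance_score)
auc = roc_auc_score(faller, -balance_score)

youden_j = tpr - fpr
best = np.argmax(youden_j)
print(f"AUC = {auc:.2f}; cutoff: score <= {-thresholds[best]:.0f} "
      f"(sensitivity = {tpr[best]:.2f}, specificity = {1 - fpr[best]:.2f})")
```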


Author(s):  
Giulia Gagliardini ◽  
Laura Gatti ◽  
Antonello Colli

The aim of this study was to provide data on the inter-rater reliability (IRR) and the test-retest reliability of the Mentalization Imbalances Scale (MIS) and the Modes of Mentalization Scale (MMS) in two different studies. Three junior raters and two senior raters blindly assessed 15 session transcripts from the psychotherapies of five patients, using both the MIS and the MMS. The same 15 sessions were rated again after the junior raters completed training in the use of the scales, and once more one month after the end of the training, to assess test-retest reliability. Four therapists used the MIS and the MMS to rate 22 patients undergoing psychotherapy in different settings. Intraclass correlation coefficient (ICC) values ranged from sufficient to good and increased after the training; test-retest reliability was sufficient for both scales (Study 1). ICC values ranged from sufficient to good and were globally higher than those found in the first study sample (Study 2). Our results provide support for the inter-rater reliability of the MIS and the MMS.

