Editorial: Research in Real-World Settings: Challenging the Limits of Experimental Design

Author(s):  
Elizabeth McCauley ◽  
Douglas K. Novins
JAMIA Open ◽  
2020 ◽  
Vol 3 (2) ◽  
pp. 243-251
Author(s):  
Vincent J Major ◽  
Neil Jethani ◽  
Yindalon Aphinyanaphongs

Abstract
Objective: One primary consideration when developing predictive models is the downstream effect on future model performance. We conduct experiments to quantify the effects of experimental design choices, namely cohort selection and internal validation methods, on (estimated) real-world model performance.
Materials and Methods: Four years of hospitalizations are used to develop a 1-year mortality prediction model (composite of death or initiation of hospice care). Two common methods to select appropriate patient visits from their encounter history (backwards-from-outcome and forwards-from-admission) are combined with 2 testing cohorts (random and temporal validation). Two models are trained under otherwise identical conditions, and their performances are compared. Operating thresholds are selected in each test set and applied to a “real-world” cohort of labeled admissions from another, unused year.
Results: Backwards-from-outcome cohort selection retains 25% of candidate admissions (n = 23 579), whereas forwards-from-admission selection includes many more (n = 92 148). Both selection methods produce similar performances when applied to a random test set. However, when applied to the temporally defined “real-world” set, forwards-from-admission yields higher areas under the ROC and precision-recall curves (88.3% and 56.5% vs. 83.2% and 41.6%).
Discussion: A backwards-from-outcome experiment manipulates raw training data, simplifying the experiment. This manipulated data no longer resembles real-world data, resulting in optimistic estimates of test set performance, especially at high precision. In contrast, a forwards-from-admission experiment with a temporally separated test set consistently and conservatively estimates real-world performance.
Conclusion: Experimental design choices impose bias upon selected cohorts. A forwards-from-admission experiment, validated temporally, can conservatively estimate real-world performance.
LAY SUMMARY The routine care of patients stands to benefit greatly from assistive technologies, including data-driven risk assessment. Already, many different machine learning and artificial intelligence applications are being developed from complex electronic health record data. To overcome challenges that arise from such data, researchers often start with simple experimental approaches to test their work. One key component is how patients (and their healthcare visits) are selected for the study from the pool of all patients seen. Another is how the group of patients used to create the risk estimator differs from the group used to evaluate how well it works. These choices affect how closely the experimental setting matches the real-world application to patients. For example, selection approaches that depend on each patient’s future outcome can simplify the experiment, but they cannot be used in practice because that outcome is not yet known when the prediction is made. We show that this kind of “backwards” experiment optimistically estimates how well the model performs. Instead, our results favor experiments that select patients in a “forwards” manner, combined with “temporal” validation that mimics training on past data and deploying on future data. More robust results help gauge the clinical utility of recent work and aid decision-making before implementation into practice.
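The two cohort-selection strategies and the temporal split described in this abstract can be sketched in a few lines of Python. This is a minimal illustration under assumed, simplified rules, not the authors' pipeline: the `Admission` record, the helper names, the choice of each patient's last admission as the negative example, and the single cutoff date for the temporal split are all hypothetical. Only the 1-year horizon comes from the abstract. What the sketch shows is structural: the backwards rule needs the outcome date to choose the prediction point and keeps at most one row per patient, whereas the forwards rule labels every admission using only what happens after it.

```python
from dataclasses import dataclass
from datetime import date, timedelta

HORIZON = timedelta(days=365)  # 1-year outcome window


@dataclass(frozen=True)
class Admission:
    """Hypothetical minimal admission record."""
    patient_id: str
    admit_date: date


def forwards_from_admission(admissions, outcome_dates):
    """Label every admission: positive if the outcome falls within one year after it."""
    rows = []
    for adm in admissions:
        outcome = outcome_dates.get(adm.patient_id)
        positive = outcome is not None and adm.admit_date <= outcome <= adm.admit_date + HORIZON
        rows.append((adm, positive))
    return rows


def backwards_from_outcome(admissions, outcome_dates):
    """Keep one admission per patient, chosen by looking back from the (future) outcome."""
    by_patient = {}
    for adm in admissions:
        by_patient.setdefault(adm.patient_id, []).append(adm)
    rows = []
    for pid, adms in by_patient.items():
        outcome = outcome_dates.get(pid)
        if outcome is None:
            # Hypothetical rule for negatives: use the patient's last admission.
            rows.append((max(adms, key=lambda a: a.admit_date), False))
            continue
        # Positives: the latest admission within one year before the outcome.
        eligible = [a for a in adms if outcome - HORIZON <= a.admit_date <= outcome]
        if eligible:
            rows.append((max(eligible, key=lambda a: a.admit_date), True))
    return rows


def temporal_split(rows, cutoff):
    """Train on admissions before the cutoff date, test on admissions on or after it."""
    train = [r for r in rows if r[0].admit_date < cutoff]
    test = [r for r in rows if r[0].admit_date >= cutoff]
    return train, test


# Toy data: patient A has three admissions and an outcome; patient B has none.
admissions = [
    Admission("A", date(2015, 1, 10)),
    Admission("A", date(2015, 6, 1)),
    Admission("A", date(2016, 3, 1)),
    Admission("B", date(2015, 2, 1)),
    Admission("B", date(2016, 5, 1)),
]
outcome_dates = {"A": date(2016, 4, 1)}

forwards = forwards_from_admission(admissions, outcome_dates)
backwards = backwards_from_outcome(admissions, outcome_dates)
train, test = temporal_split(forwards, cutoff=date(2016, 1, 1))
print(len(forwards), len(backwards), len(train), len(test))  # 5 2 3 2
```

On this toy data the forwards rule keeps all five admissions while the backwards rule keeps two, a miniature version of the difference in retained admissions reported in the Results.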


Author(s):  
Martyna Bogacz ◽  
Stephane Hess ◽  
Chiara Calastri ◽  
Charisma F. Choudhury ◽  
Alexander Erath ◽  
...  

The use of virtual reality (VR) in transport research offers the opportunity to collect behavioral data in a controlled, dynamic setting. VR settings are useful for hypothetical situations in which real-world data do not exist, or for situations involving risk and safety issues that make real-world data collection infeasible. Nevertheless, VR studies can contribute to transport-related research only if the behavior elicited in a virtual environment closely resembles real-world behavior. Importantly, because VR is a relatively new research tool, best practice with regard to experimental design has yet to be established. In this paper, we contribute to a better understanding of the implications of the choice of experimental setup by comparing cycling behavior in VR between two groups of participants in similar immersive scenarios: the first group controlled the maneuvers using a keyboard, and the other group rode an instrumented bicycle. We critically compare the speed, acceleration, braking and head movements of the participants in the two experiments. We also collect electroencephalography (EEG) data to compare alpha wave amplitudes and assess the engagement levels of participants in the two settings. The results demonstrate the ability of VR to elicit behavioral patterns in line with those observed in the real world and indicate the importance of experimental design in a VR environment beyond the choice of audio-visual stimuli. The findings will be useful for researchers designing VR experimental setups for behavioral data collection.
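Alpha wave amplitude is conventionally read from the EEG spectrum in a band of roughly 8 to 12 Hz. The sketch below estimates the peak single-sided amplitude in that band with a naive discrete Fourier transform. The band edges, sampling rate, helper name and synthetic signals are assumptions; the abstract does not describe the study's EEG processing, which in practice would also involve artifact rejection, windowing and averaging across epochs.

```python
import math


def band_amplitude(signal, fs, f_lo=8.0, f_hi=12.0):
    """Peak single-sided DFT amplitude among frequency bins inside [f_lo, f_hi] Hz."""
    n = len(signal)
    peak = 0.0
    for k in range(1, n // 2):  # skip the DC and Nyquist bins
        freq = k * fs / n
        if not f_lo <= freq <= f_hi:
            continue
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(signal))
        im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(signal))
        peak = max(peak, 2 * math.hypot(re, im) / n)
    return peak


# Synthetic 2-second epochs sampled at 128 Hz (assumed values, not study data).
FS, N = 128, 256
alpha_like = [2.0 * math.sin(2 * math.pi * 10 * t / FS) for t in range(N)]
beta_like = [2.0 * math.sin(2 * math.pi * 20 * t / FS) for t in range(N)]
print(round(band_amplitude(alpha_like, FS), 3), round(band_amplitude(beta_like, FS), 3))  # 2.0 0.0
```

Because both test signals complete a whole number of cycles in the epoch, the 10 Hz tone lands exactly on one bin inside the band and the 20 Hz tone contributes nothing there.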


2019 ◽  
Vol 41 (4) ◽  
pp. 567-583
Author(s):  
Kim Strandberg ◽  
Janne Berg

This article reports on deliberation between citizens and politicians in a citizens’ forum about closing small schools and building a school center in a Finnish municipality. This real-world policy issue was highly contested at the time of the deliberation. The purpose of the study was to analyze both the magnitude of opinion changes and potential opinion convergence between citizens and politicians. The citizens’ forum used an experimental design whereby half of the groups engaged in discussion under the guidance of a facilitator with discussion rules, and the other half of the groups had no facilitation or rules. The analyses test how potential opinion changes are mediated by how deliberators experience discussion quality and moderated by the type of participant (citizen or politician). The findings show that opinion changes, both regarding magnitude and convergence, are not uniform for politicians and citizens. Moreover, this study shows how different layers of opinions may be affected differently by deliberation.


2019 ◽  
Author(s):  
Vincent J Major ◽  
Neil Jethani ◽  
Yindalon Aphinyanaphongs

Abstract
Objective: The main criterion for choosing how models are built is the subsequent effect on future (estimated) model performance. In this work, we evaluate the effects of experimental design choices on both estimated and actual model performance.
Materials and Methods: Four years of hospital admissions are used to develop a 1-year end-of-life prediction model. Two common methods to select appropriate prediction timepoints (backwards-from-outcome and forwards-from-admission) are introduced and combined with two ways of separating cohorts for training and testing (internal and temporal). Two models are trained under identical conditions, and their performances are compared. Finally, operating thresholds are selected in each test set and applied in a final, ‘real-world’ cohort consisting of one year of admissions.
Results: Backwards-from-outcome cohort selection discards 75% of candidate admissions (retaining n=23,579), whereas forwards-from-admission selection includes many more (n=92,148). Both selection methods produce similar global performances when applied to an internal test set. However, when applied to the temporally defined ‘real-world’ set, forwards-from-admission yields higher areas under the ROC and precision-recall curves (88.3% and 56.5% vs. 83.2% and 41.6%).
Discussion: A backwards-from-outcome experiment effectively transforms the training data such that it no longer resembles real-world data. This results in optimistic estimates of test set performance, especially at high precision. In contrast, a forwards-from-admission experiment with a temporally separated test set consistently and conservatively estimates real-world performance.
Conclusion: Experimental design choices impose bias upon selected cohorts. A forwards-from-admission experiment, validated temporally, can conservatively estimate real-world performance.
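The threshold-transfer step in this design (choose an operating threshold on a test set, then apply it unchanged to a later cohort) can be sketched as follows. The helper names, the toy scores, and the rule "lowest threshold whose test-set precision reaches a target" are illustrative assumptions; the abstract does not say exactly how the authors chose their operating thresholds.

```python
def precision_recall_at(scores, labels, threshold):
    """Precision and recall when every score >= threshold is called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def select_threshold(scores, labels, target_precision):
    """Lowest observed score whose precision meets the target, which maximizes recall at that precision."""
    for t in sorted(set(scores)):
        precision, _ = precision_recall_at(scores, labels, t)
        if precision >= target_precision:
            return t
    return None


# Made-up model scores and labels (1 = outcome within one year).
test_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
test_labels = [1, 1, 0, 1, 0, 0, 0, 0]
later_scores = [0.95, 0.85, 0.65, 0.62, 0.5, 0.3]
later_labels = [1, 0, 0, 1, 1, 0]

t = select_threshold(test_scores, test_labels, target_precision=0.75)
print(t, precision_recall_at(test_scores, test_labels, t))  # 0.6 (0.75, 1.0)
print(precision_recall_at(later_scores, later_labels, t))   # precision 0.5, recall 2/3
```

In this made-up example, the threshold chosen for 75% precision on the test set yields only 50% precision on the later cohort, which is the kind of optimistic gap the abstract attributes to test sets that do not resemble real-world data.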


PLoS ONE ◽  
2021 ◽  
Vol 16 (10) ◽  
pp. e0258442
Author(s):  
Sean C. Epstein ◽  
Timothy J. P. Bray ◽  
Margaret A. Hall-Craggs ◽  
Hui Zhang

This paper proposes a task-driven computational framework for assessing diffusion MRI experimental designs which, rather than relying on parameter-estimation metrics, directly measures quantitative task performance. Traditional computational experimental design (CED) methods may be ill-suited to experimental tasks, such as clinical classification, where outcome does not depend on parameter-estimation accuracy or precision alone. Current assessment metrics evaluate experiments’ ability to faithfully recover microstructural parameters rather than their task performance. The method we propose addresses this shortcoming. For a given MRI experimental design (protocol, parameter-estimation method, model, etc.), experiments are simulated start-to-finish and task performance is computed from receiver operating characteristic (ROC) curves and associated summary metrics (e.g. area under the curve (AUC)). Two experiments were performed: first, a validation of the pipeline’s task performance predictions against clinical results, comparing in-silico predictions to real-world ROC/AUC; and second, a demonstration of the pipeline’s advantages over traditional CED approaches, using two simulated clinical classification tasks. Comparison with clinical datasets validates our method’s predictions of (a) the qualitative form of ROC curves, (b) the relative task performance of different experimental designs, and (c) the absolute performance (AUC) of each experimental design. Furthermore, we show that our method outperforms traditional task-agnostic assessment methods, enabling improved, more useful experimental design. Our pipeline produces accurate, quantitative predictions of real-world task performance. Compared to current approaches, such task-driven assessment is more likely to identify experimental designs that perform well in practice. 
Our method is not limited to diffusion MRI; the pipeline generalises to any task-based quantitative MRI application and provides the foundation for developing future task-driven, end-to-end CED frameworks.
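To make task-driven assessment concrete, the sketch below simulates two hypothetical experimental designs start to finish and scores each by the AUC of the resulting classification task. AUC is computed in its Mann-Whitney form: the probability that a randomly chosen patient scores above a randomly chosen control. The Gaussian measurement model, group means, and noise levels are invented for illustration. They stand in for a real diffusion MRI signal model and fitting procedure and are not the authors' pipeline.

```python
import random


def auc_mann_whitney(neg, pos):
    """Probability that a random positive scores above a random negative (ties count 1/2)."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def simulate_design(noise_sd, n_per_group=200, seed=0):
    """Toy simulation: a biomarker differs between groups; the design adds estimation noise."""
    rng = random.Random(seed)
    healthy_mean, patient_mean = 1.0, 1.3  # hypothetical true parameter values
    biological_sd = 0.15                   # hypothetical between-subject spread

    def measure(mu):
        true_value = rng.gauss(mu, biological_sd)
        return true_value + rng.gauss(0.0, noise_sd)  # design-dependent estimation error

    healthy = [measure(healthy_mean) for _ in range(n_per_group)]
    patients = [measure(patient_mean) for _ in range(n_per_group)]
    return auc_mann_whitney(healthy, patients)


low_noise_auc = simulate_design(noise_sd=0.05)
high_noise_auc = simulate_design(noise_sd=0.40)
print(round(low_noise_auc, 3), round(high_noise_auc, 3))
```

The low-noise design gives the higher AUC here; in a real pipeline each design's scores would come from simulated MRI signals passed through the actual parameter-estimation method.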


1982 ◽  
Vol 76 (3) ◽  
pp. 561-574
Author(s):  
Gary J. Miller ◽  
Joe A. Oppenheimer

Most rational choice theories of committee decision making predict a process of competitive coalition formation leading to a minimum winning coalition. Committee experiments reported to date tend to support these theories. However, both theories and committee experiments are contradicted by the evidence of real-world legislatures making distributive decisions; these decisions are characterized by coalitions of the whole providing virtually all members with a share of distributive benefits. The results in this article help to resolve this contradiction by showing that if the committee experimental design includes a universalistic alternative which provides a high level of expected benefits for committee members, it will be selected. Competitive coalition formation occurs in experimental settings which do not include such an alternative. The results call into question the generality of ordinalist theories of competitive coalition formation.
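The phrase "expected benefits" can be illustrated with a stylized calculation that is not taken from the article. Suppose n members divide a fixed budget B, a minimum winning coalition of k = ⌊n/2⌋ + 1 members splits it equally, and every member is equally likely to be included. A member's expected payoff from competitive coalition formation is then (k/n)·(B/k) = B/n, exactly what an equal universalistic split pays with certainty. Under these assumptions, a universalistic alternative paying at least B/n is at least as attractive in expectation, and strictly preferable for a strictly risk-averse member. The function names below are hypothetical.

```python
from fractions import Fraction


def expected_share_mwc(n_members, budget):
    """Stylized expected payoff from a minimum winning coalition.

    Assumes (hypothetically) that a coalition of floor(n/2) + 1 members splits the
    budget equally and that every member is equally likely to be included.
    """
    k = n_members // 2 + 1
    p_included = Fraction(k, n_members)
    return p_included * Fraction(budget, k)


def share_universalistic(n_members, budget):
    """Certain payoff when the budget is split equally among all members."""
    return Fraction(budget, n_members)


for n in (3, 5, 9):
    print(n, expected_share_mwc(n, 90), share_universalistic(n, 90))  # 3 30 30, 5 18 18, 9 10 10
```

Exact fractions are used so that the equality of the two expected payoffs is not obscured by floating-point rounding.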


Author(s):  
Alex Griffiths ◽  
Oliver Shannon ◽  
Jamie Matu ◽  
Roderick King ◽  
Kevin Deighton ◽  
...  

Abstract
Background: A recent commentary has been published on our meta-analysis, which investigated substrate oxidation during exercise matched for relative intensities in hypoxia compared with normoxia. Within this commentary, the authors proposed that exercise matched for absolute intensities in hypoxia compared with normoxia should have been included within the analysis, as this model provides a more suitable experimental design when considering nutritional interventions in hypoxia.
Main body: Within this response, we provide a rationale for the use of exercise matched for relative intensities in hypoxia compared with normoxia. Specifically, we argue that this model provides a physiological stimulus that replicates real-world situations, by reducing the absolute workload undertaken in hypoxia. Further, the use of exercise matched for relative intensities isolates the metabolic response to hypoxia from the increase in relative exercise intensity that occurs in hypoxia when exercise is matched for absolute intensity. In addition, we report previously unpublished data, analysed at the time of the original meta-analysis, assessing substrate oxidation during exercise matched for absolute intensities in hypoxia compared with normoxia.
Conclusion: An increased reliance on carbohydrate oxidation was observed during exercise matched for absolute intensities in hypoxia compared with normoxia. These data now provide a comparable dataset for researchers and practitioners alike to use in designing nutritional interventions for relevant populations.


2017 ◽  
Vol 40 ◽  
Author(s):  
David Trafimow ◽  
Yogesh J. Raut

Abstract
This commentary on Jussim (2012) makes two points: (1) effect sizes often reflect artifacts of experimental design rather than real-world relevance, and (2) any argument dependent on effect sizes must correct for attenuation due to imperfect instrument reliability. A formula for making this correction is presented, and its ramifications for the debate over accuracy in person perception are discussed.
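The abstract does not reproduce the formula. The classical correction for attenuation, usually attributed to Spearman, divides an observed correlation by the square root of the product of the two measures' reliabilities, r_true = r_xy / √(r_xx · r_yy). The commentary's formula may differ (for example, if it is expressed in terms of effect sizes rather than correlations), so the sketch below illustrates only the general idea; the function name is hypothetical.

```python
import math


def disattenuate(r_xy, r_xx, r_yy):
    """Spearman's correction for attenuation: estimated correlation between true scores."""
    if not (0 < r_xx <= 1 and 0 < r_yy <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_xy / math.sqrt(r_xx * r_yy)


observed = 0.30
corrected = disattenuate(observed, r_xx=0.70, r_yy=0.80)
print(round(corrected, 2))  # 0.4
```

With an observed correlation of 0.30 and reliabilities of 0.70 and 0.80, the corrected estimate is about 0.40. Because the reliabilities are themselves estimates, a corrected value can exceed 1 in small samples.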


Author(s):  
Howard Moskowitz ◽  
Voltiza Prendi ◽  
Attila Gere ◽  
Ariola Harizi ◽  
Petraq Papajorgji

Two groups of 51 US respondents each evaluated combinations of statements about the problems and solutions that a country might face. The two studies were run a year apart, in May 2019 (before the Covid-19 pandemic) and May 2020 (at the then-current height of the pandemic). The problems and solutions were combined by experimental design, creating a unique set of 24 vignettes for each respondent. The responses to the vignettes (negative versus positive outcome, based on the vignette) were deconstructed into the contribution of each of the 16 elements (four problems, 12 solutions). Three mind-sets emerged, based on clustering the pattern of responses to the 16 elements from each of the 100 respondents: MS1–Startups, students; MS2–Change and Investment; MS3–Family social. Each mind-set shows a specific pattern of responses to problems, solutions, and the effect of Covid-19. The granularity afforded by Mind Genomics gives the researcher a new and profoundly deeper understanding of the mind of the citizen, opening a new area of psychological science. The three mind-sets are distributed similarly through the population, so assigning a new person to a mind-set requires a short intervention: the Personal Viewpoint Identifier, a set of six questions whose pattern of responses assigns a new person to one of the three mind-sets.
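A common approach in conjoint-style vignette studies is to deconstruct responses by regressing each respondent's ratings on 0/1 indicators for the elements present in each vignette. The abstract does not name the estimator, so ordinary least squares is an assumption here. The sketch uses a hypothetical three-element full factorial design with noise-free ratings, rather than the study's 16 elements and 24 vignettes per respondent, so the fitted coefficients can be checked against the values used to generate them. All names are illustrative.

```python
from itertools import product


def solve_linear(a, b):
    """Gauss-Jordan elimination with partial pivoting for a small square system."""
    n = len(a)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def element_contributions(vignettes, ratings):
    """OLS via normal equations: intercept plus one coefficient per 0/1 element column."""
    x = [[1.0] + [float(v) for v in row] for row in vignettes]
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(x, ratings)) for i in range(k)]
    return solve_linear(xtx, xty)


# Hypothetical design: every combination of three elements being present (1) or absent (0).
vignettes = list(product((0, 1), repeat=3))
ratings = [10 + 5 * a - 3 * b + 8 * c for a, b, c in vignettes]
coefs = element_contributions(vignettes, ratings)
print([round(v, 6) for v in coefs])  # recovers intercept 10 and contributions 5, -3, 8
```

With binary positive/negative responses, the same regression can be run on 0/1 outcomes, and each coefficient then reads as the change in the share of positive responses when that element is present.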


Author(s):  
Jenni M. Buckley ◽  
Jill S. Higginson ◽  
Amy C. Bucha ◽  
Ashu Khandha ◽  
Dawn Elliott ◽  
...  

Undergraduate laboratory exercises in core engineering courses do not always follow the Problem-Based Learning approaches advocated by the educational community [1–3]. Common shortcomings include “cookie cutter” labs in which students are not engaged in experimental design, as well as a general detachment from the “real world” application of the laboratory exercise. Soft skills such as proper documentation during experiments and scientific writing may also be overlooked in lab curricula. This is unfortunate, as the undergraduate laboratory experience is a perfect opportunity to develop essential research skills as well as to inspire and excite students about their chosen field.

