A Framework for Systematic Assessment of Clinical Trial Population Representativeness Using Electronic Health Records Data

2021 ◽  
Vol 12 (04) ◽  
pp. 816-825
Author(s):  
Yingcheng Sun ◽  
Alex Butler ◽  
Ibrahim Diallo ◽  
Jae Hyun Kim ◽  
Casey Ta ◽  
...  

Abstract Background Clinical trials are the gold standard for generating robust medical evidence, but their results often raise generalizability concerns, which can be attributed to a lack of population representativeness. Electronic health record (EHR) data are useful for estimating the population representativeness of clinical trial study populations. Objectives This research aims to systematically estimate the population representativeness of clinical trials using EHR data during the early design stage. Methods We present an end-to-end analytical framework for transforming free-text clinical trial eligibility criteria into executable database queries conformant with the Observational Medical Outcomes Partnership Common Data Model and for systematically quantifying the population representativeness of each clinical trial. Results Using this framework, we calculated the population representativeness of 782 novel coronavirus disease 2019 (COVID-19) trials and 3,827 type 2 diabetes mellitus (T2DM) trials in the United States. With the use of overly restrictive eligibility criteria, 85.7% of the COVID-19 trials and 30.1% of the T2DM trials had poor population representativeness. Conclusion This research demonstrates the potential of using EHR data to assess the population representativeness of clinical trials, providing data-driven metrics to inform the selection and optimization of eligibility criteria.
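As a rough illustration of the idea in the abstract above, the sketch below scores one trial by the share of an EHR cohort that satisfies its eligibility criteria. It is only a minimal in-memory sketch: the `Patient` fields, the example criteria, and the choice of a plain eligible fraction as the metric are assumptions made here, and the framework itself runs database queries against the OMOP Common Data Model rather than Python predicates.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Sequence


@dataclass(frozen=True)
class Patient:
    # Hypothetical, simplified patient record (not an OMOP CDM schema).
    age: int
    hba1c: float
    pregnant: bool


# A criterion is any predicate over a patient record.
Criterion = Callable[[Patient], bool]


def representativeness(
    patients: Iterable[Patient],
    inclusion: Sequence[Criterion],
    exclusion: Sequence[Criterion],
) -> float:
    """Fraction of the cohort meeting all inclusion and no exclusion criteria.

    Assumption: the eligible fraction stands in for the representativeness
    metric; the abstract does not state the exact formula.
    """
    cohort = list(patients)
    if not cohort:
        raise ValueError("cohort must not be empty")
    eligible = sum(
        1
        for p in cohort
        if all(rule(p) for rule in inclusion)
        and not any(rule(p) for rule in exclusion)
    )
    return eligible / len(cohort)


# Hypothetical criteria for an illustrative T2DM trial.
INCLUSION: Sequence[Criterion] = (
    lambda p: 18 <= p.age <= 75,
    lambda p: p.hba1c >= 7.0,
)
EXCLUSION: Sequence[Criterion] = (lambda p: p.pregnant,)
```

Calling `representativeness(cohort, INCLUSION, EXCLUSION)` on a list of `Patient` records returns a value between 0 and 1; tightening any criterion can only lower it, which is the effect of overly restrictive criteria that the abstract reports.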

2014 ◽  
Vol 05 (02) ◽  
pp. 463-479 ◽  
Author(s):  
P. Ryan ◽  
Y. Zhang ◽  
F. Liu ◽  
J. Gao ◽  
J.T. Bigger ◽  
...  

Summary Objective: To improve the transparency of clinical trial generalizability and to illustrate the method using Type 2 diabetes as an example. Methods: Our data included 1,761 diabetes clinical trials and the electronic health records (EHR) of 26,120 patients with Type 2 diabetes who visited Columbia University Medical Center of New-York Presbyterian Hospital. The two populations were compared using the Generalizability Index for Study Traits (GIST) on the earliest diagnosis age and the mean hemoglobin A1c (HbA1c) values. Results: Greater than 70% of Type 2 diabetes studies allow patients with HbA1c measures between 7 and 10.5, but less than 40% of studies allow HbA1c<7 and fewer than 45% of studies allow HbA1c>10.5. In the real-world population, only 38% of patients had HbA1c between 7 and 10.5, with 12% having values above the range and 52% having HbA1c<7. The GIST for HbA1c was 0.51. Most studies adopted broad age value ranges, with the most common restrictions excluding patients >80 or <18 years. Most of the real-world population fell within this range, but 2% of patients were <18 at time of first diagnosis and 8% were >80. The GIST for age was 0.75. Conclusions: We contribute a scalable method to profile and compare aggregated clinical trial target populations with EHR patient populations. We demonstrate that Type 2 diabetes studies are more generalizable with regard to age than they are with regard to HbA1c. We found that the generalizability of age increased from Phase 1 to Phase 3 while the generalizability of HbA1c decreased during those same phases. This method can generalize to other medical conditions and other continuous or binary variables. We envision the potential use of EHR data for examining the generalizability of clinical trials and for defining population-representative clinical trial eligibility criteria.
Citation: Weng C, Li Y, Ryan P, Zhang Y, Liu F, Gao J, Bigger JT, Hripcsak G. A distribution-based method for assessing the differences between clinical trial target populations and patient populations in electronic health records. Appl Clin Inf 2014; 5: 463–479 http://dx.doi.org/10.4338/ACI-2013-12-RA-0105
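The comparison of trial eligibility ranges with the patient distribution can be pictured with a small interval-weighted score. The sketch below is a simplification written for illustration, not the published GIST definition: it assumes the score is the sum, over value intervals, of the fraction of studies admitting the whole interval multiplied by the fraction of patients falling in it. The function name, the bin edges, and the data are hypothetical.

```python
from typing import Sequence, Tuple


def interval_weighted_score(
    trial_ranges: Sequence[Tuple[float, float]],
    patient_values: Sequence[float],
    bin_edges: Sequence[float],
) -> float:
    """Patient-weighted share of studies admitting each value interval.

    trial_ranges: one (lower, upper) permitted range per study.
    patient_values: one observed value per patient (e.g. mean HbA1c).
    bin_edges: ascending edges; bins are half-open [edge_i, edge_{i+1}).
    Assumption: this is a simplified stand-in for GIST, not its exact formula.
    """
    if not trial_ranges or not patient_values:
        raise ValueError("need at least one study and one patient")
    score = 0.0
    for lo, hi in zip(bin_edges, bin_edges[1:]):
        # Studies whose permitted range covers the entire bin.
        studies_allowing = sum(
            1 for t_lo, t_hi in trial_ranges if t_lo <= lo and hi <= t_hi
        )
        # Patients whose value falls in the bin.
        patients_in_bin = sum(1 for v in patient_values if lo <= v < hi)
        score += (studies_allowing / len(trial_ranges)) * (
            patients_in_bin / len(patient_values)
        )
    return score
```

A score near 1 means most patients sit in intervals that most studies admit; a low score means the studies collectively concentrate on values that few real-world patients have.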


2015 ◽  
Vol 22 (e1) ◽  
pp. e141-e150 ◽  
Author(s):  
Riccardo Miotto ◽  
Chunhua Weng

Abstract Objective To develop a cost-effective, case-based reasoning framework for clinical research eligibility screening by reusing only the electronic health records (EHRs) of a minimal number of enrolled participants to represent the target patient for each trial under consideration. Materials and Methods The EHR data (specifically diagnosis, medications, laboratory results, and clinical notes) of known clinical trial participants were aggregated to profile the “target patient” for a trial, which was used to discover new eligible patients for that trial. The EHR data of unseen patients were matched to this “target patient” to determine their relevance to the trial; the higher the relevance, the more likely the patient was eligible. Relevance scores were a weighted linear combination of cosine similarities computed over individual EHR data types. For evaluation, we identified 262 participants of 13 diversified clinical trials conducted at Columbia University as our gold standard. We ran a 2-fold cross validation with half of the participants used for training and the other half used for testing, along with 30,000 other patients selected at random from our clinical database. We performed binary classification and ranking experiments. Results The overall area under the ROC curve for classification was 0.95, enabling eligible patients to be highlighted with good precision. Ranking showed satisfactory results, especially at the top of the recommended list, with each trial having at least one eligible patient in the top five positions. Conclusions This relevance-based method can potentially be used to identify eligible patients for clinical trials by processing patient EHR data alone without parsing free-text eligibility criteria, and shows promise of efficient “case-based reasoning” modeled on only a minimal number of trial participants.
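The relevance score described above, a weighted linear combination of per-data-type cosine similarities, can be sketched as follows. The sparse-dictionary representation, the feature names, and the weights are assumptions chosen for illustration; the abstract does not specify how the vectors were built or how the weights were set.

```python
import math
from typing import Mapping

# Sparse feature vector: feature name -> weight (e.g. a code count).
Vector = Mapping[str, float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * b.get(key, 0.0) for key, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def relevance(
    patient: Mapping[str, Vector],
    target: Mapping[str, Vector],
    weights: Mapping[str, float],
) -> float:
    """Weighted sum of cosine similarities, one term per EHR data type.

    patient / target: data type (e.g. "diagnosis") -> sparse vector.
    weights: data type -> weight; hypothetical values, not from the paper.
    """
    return sum(
        weight * cosine(patient.get(data_type, {}), target.get(data_type, {}))
        for data_type, weight in weights.items()
    )
```

Ranking unseen patients by `relevance` against a trial's aggregated "target patient" profile then puts the most similar records at the top of the list.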


2020 ◽  
pp. 929-937
Author(s):  
Danielle Potter ◽  
Raven Brothers ◽  
Andrej Kolacevski ◽  
Jacob E. Koskimaki ◽  
Amy McNutt ◽  
...  

PURPOSE ASCO, through its wholly owned subsidiary, CancerLinQ LLC, developed CancerLinQ, a learning health system for oncology. A learning health system is important for oncology patients because fewer than 5% of patients with cancer enroll in clinical trials, leaving evidence gaps for patient populations not enrolled in trials. In addition, clinical trial populations often differ from the overall cancer population with respect to age, race, performance status, and other clinical parameters. MATERIALS AND METHODS Working with subscribing practices, CancerLinQ accepts data from electronic health records and transforms the local representation of a patient’s care into a standardized representation on the basis of the Quality Data Model from the National Quality Forum. CancerLinQ provides this information back to the subscribing practice through a series of tools that support quality improvement. CancerLinQ also creates de-identified data sets for secondary research use. RESULTS As of March 2020, CancerLinQ includes data from 63 organizations across the United States that use nine different electronic health records. The database includes 1,426,015 patients with a primary cancer diagnosis, of which 238,680 have had additional information abstracted from unstructured content. CONCLUSION As CancerLinQ continues to onboard subscribing practices, the breadth of potential applications for a learning health care system widens. Future practice-facing tools could include real-world data visualization, recommendations for treatment of patients with actionable genetic variations, and identification of patients who may be eligible for clinical trials. Feeding these insights back into oncology practice ensures that we learn how to treat patients with cancer not just on the basis of the selective experience of the 5% who enroll in clinical trials, but from the real-world experience of the entire spectrum of patients with cancer in the United States.


2020 ◽  
Vol 15 (1) ◽  
pp. 5-21
Author(s):  
Konstantinos Vezertzis ◽  
George I. Lambrou ◽  
Dimitrios Koutsouris

Background: According to European legislation, a clinical trial is research involving patients that also includes a research end-product. The main objective of a clinical trial is to prove that the research product, i.e. a proposed medication or treatment, is effective and safe for patients. The implementation, development, and operation of a patient database, which will function as a matrix of samples with the appropriate parameterization, may provide appropriate tools to generate samples for clinical trials. Aim: The aim of the present work is to review the literature on up-to-date progress in the development of databases for clinical trials and patient recruitment using free and open-source software in the field of endocrinology. Methods: An electronic literature search was conducted by the authors from 1984 to June 2019. Original articles and systematic reviews were selected, the titles and abstracts of papers were screened to determine whether they met the eligibility criteria, and full texts of the selected articles were retrieved. Results: The present review indicates that electronic health records are related to both patient recruitment and decision support systems in the domain of endocrinology. Free and open-source software provides integrated solutions for electronic health records, patient recruitment, and decision support systems. Conclusions: Patient recruitment relates closely to the electronic health record. There is maturity at the academic and research level, which may lead to good practices for the deployment of the electronic health record in selecting the right patients for clinical trials.


2020 ◽  
Vol 7 (Supplement_1) ◽  
pp. S819-S820
Author(s):  
Jonathan Todd ◽  
Jon Puro ◽  
Matthew Jones ◽  
Jee Oakley ◽  
Laura A Vonnahme ◽  
...  

Abstract Background Over 80% of tuberculosis (TB) cases in the United States are attributed to reactivation of latent TB infection (LTBI). Eliminating TB in the United States requires expanding identification and treatment of LTBI. Centralized electronic health records (EHRs) are an unexplored data source to identify persons with LTBI. We explored EHR data to evaluate TB and LTBI screening and diagnoses within OCHIN, Inc., a U.S. practice-based research network with a high proportion of Federally Qualified Health Centers. Methods From the EHRs of patients who had an encounter at an OCHIN member clinic between January 1, 2012 and December 31, 2016, we extracted demographic variables, TB risk factors, TB screening tests, International Classification of Diseases (ICD) 9 and 10 codes, and treatment regimens. Based on test results, ICD codes, and treatment regimens, we developed a novel algorithm to classify patient records into LTBI categories: definite, probable or possible. We used multivariable logistic regression, with a referent group of all cohort patients not classified as having LTBI or TB, to identify associations between TB risk factors and LTBI. Results Among 2,190,686 patients, 6.9% (n=151,195) had a TB screening test; among those, 8% tested positive. Non-U.S.–born or non-English–speaking persons comprised 24% of our cohort; 11% were tested for TB infection, and 14% had a positive test. Risk factors in the multivariable model significantly associated with being classified as having LTBI included preferring non-English language (adjusted odds ratio [aOR] 4.20, 95% confidence interval [CI] 4.09–4.32); non-Hispanic Asian (aOR 5.17, 95% CI 4.94–5.40), non-Hispanic black (aOR 3.02, 95% CI 2.91–3.13), or Native Hawaiian/other Pacific Islander (aOR 3.35, 95% CI 2.92–3.84) race; and HIV infection (aOR 3.09, 95% CI 2.84–3.35). Conclusion This study demonstrates the utility of EHR data for understanding TB screening practices and as an important data source that can be used to enhance public health surveillance of LTBI prevalence. Increasing screening among high-risk populations remains an important step toward eliminating TB in the United States. These results underscore the importance of offering TB screening in non-U.S.–born populations. Disclosures All Authors: No reported disclosures
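For readers less familiar with how adjusted odds ratios such as those above arise from a logistic regression, the sketch below converts a fitted coefficient and its standard error into an odds ratio with a 95% interval. It assumes a Wald-type interval (coefficient plus or minus 1.96 standard errors on the log-odds scale); the abstract does not state which interval method was used, and the function name is hypothetical.

```python
import math
from typing import Tuple


def odds_ratio_with_ci(
    beta: float, std_err: float, z: float = 1.96
) -> Tuple[float, float, float]:
    """Return (odds ratio, lower bound, upper bound) for one coefficient.

    beta: logistic regression coefficient on the log-odds scale.
    std_err: standard error of beta.
    z: normal quantile; 1.96 gives an approximate 95% Wald interval
       (an assumption, since the abstract does not name the method).
    """
    if std_err < 0:
        raise ValueError("standard error must be non-negative")
    return (
        math.exp(beta),
        math.exp(beta - z * std_err),
        math.exp(beta + z * std_err),
    )
```

Because the interval is symmetric on the log scale, the product of the two bounds equals the square of the odds ratio, which gives a quick consistency check on reported values.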


2018 ◽  
Vol 136 (2) ◽  
pp. 164 ◽  
Author(s):  
Michele C. Lim ◽  
Michael V. Boland ◽  
Colin A. McCannel ◽  
Arvind Saini ◽  
Michael F. Chiang ◽  
...  

2015 ◽  
Vol 22 (6) ◽  
pp. 1220-1230 ◽  
Author(s):  
Huan Mo ◽  
William K Thompson ◽  
Luke V Rasmussen ◽  
Jennifer A Pacheco ◽  
Guoqian Jiang ◽  
...  

Abstract Background Electronic health records (EHRs) are increasingly used for clinical and translational research through the creation of phenotype algorithms. Currently, phenotype algorithms are most commonly represented as noncomputable descriptive documents and knowledge artifacts that detail the protocols for querying diagnoses, symptoms, procedures, medications, and/or text-driven medical concepts, and are primarily meant for human comprehension. We present desiderata for developing a computable phenotype representation model (PheRM). Methods A team of clinicians and informaticians reviewed common features for multisite phenotype algorithms published in PheKB.org and existing phenotype representation platforms. We also evaluated well-known diagnostic criteria and clinical decision-making guidelines to encompass a broader category of algorithms. Results We propose 10 desired characteristics for a flexible, computable PheRM: (1) structure clinical data into queryable forms; (2) recommend use of a common data model, but also support customization for the variability and availability of EHR data among sites; (3) support both human-readable and computable representations of phenotype algorithms; (4) implement set operations and relational algebra for modeling phenotype algorithms; (5) represent phenotype criteria with structured rules; (6) support defining temporal relations between events; (7) use standardized terminologies and ontologies, and facilitate reuse of value sets; (8) define representations for text searching and natural language processing; (9) provide interfaces for external software algorithms; and (10) maintain backward compatibility. Conclusion A computable PheRM is needed for true phenotype portability and reliability across different EHR products and healthcare systems. These desiderata are a guide to inform the establishment and evolution of EHR phenotype algorithm authoring platforms and languages.
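Desideratum (4), modeling phenotype algorithms with set operations, can be illustrated with plain Python sets of patient identifiers. The rule below (a diagnosis code plus either a medication or an abnormal laboratory result, minus an exclusion set) is a hypothetical example made up for this sketch and is not taken from PheKB.org or from the paper.

```python
from typing import AbstractSet, FrozenSet


def phenotype_cases(
    with_diagnosis: AbstractSet[int],
    with_medication: AbstractSet[int],
    with_abnormal_lab: AbstractSet[int],
    excluded: AbstractSet[int],
) -> FrozenSet[int]:
    """Patient IDs matching a hypothetical phenotype rule.

    Rule (illustrative only):
        diagnosis AND (medication OR abnormal lab) AND NOT excluded
    Each argument is the set of patient IDs returned by one structured query.
    """
    # Union models OR, intersection models AND, difference models AND NOT.
    supporting_evidence = with_medication | with_abnormal_lab
    return frozenset((with_diagnosis & supporting_evidence) - excluded)
```

Keeping each criterion as its own query result and composing the results with set algebra keeps the human-readable rule and the computable form close together, in the spirit of desideratum (3).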


BMJ Open ◽  
2019 ◽  
Vol 9 (10) ◽  
pp. e031373 ◽  
Author(s):  
Jennifer Anne Davidson ◽  
Amitava Banerjee ◽  
Rutendo Muzambi ◽  
Liam Smeeth ◽  
Charlotte Warren-Gash

Introduction Cardiovascular diseases (CVDs) are among the leading causes of death globally. Electronic health records (EHRs) provide a rich data source for research on CVD risk factors, treatments and outcomes. Researchers must be confident in the validity of diagnoses in EHRs, particularly when diagnosis definitions and use of EHRs change over time. Our systematic review provides an up-to-date appraisal of the validity of stroke, acute coronary syndrome (ACS) and heart failure (HF) diagnoses in European primary and secondary care EHRs. Methods and analysis We will systematically review the published and grey literature to identify studies validating diagnoses of stroke, ACS and HF in European EHRs. MEDLINE, EMBASE, SCOPUS, Web of Science, Cochrane Library, OpenGrey and EThOS will be searched from the dates of inception to April 2019. A prespecified search strategy of subject headings and free-text terms in the title and abstract will be used. Two reviewers will independently screen titles and abstracts to identify eligible studies, followed by full-text review. We require studies to compare clinical codes with a suitable reference standard. Additionally, at least one validation measure (sensitivity, specificity, positive predictive value or negative predictive value) or raw data, for the calculation of a validation measure, is necessary. We will then extract data from the eligible studies using standardised tables and assess risk of bias in individual studies using the Quality Assessment of Diagnostic Accuracy Studies 2 tool. Data will be synthesised into a narrative format and heterogeneity assessed. Meta-analysis will be considered when a sufficient number of homogeneous studies are available. The overall quality of evidence will be assessed using the Grading of Recommendations, Assessment, Development and Evaluation tool. Ethics and dissemination This is a systematic review, so it does not require ethical approval. Our results will be submitted for peer-review publication. PROSPERO registration number CRD42019123898
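The four validation measures named in the protocol follow directly from a two-by-two table comparing EHR codes with a reference standard. The sketch below shows the standard definitions; the class and function names are hypothetical, and returning `None` for an undefined ratio is a choice made here rather than something the protocol specifies.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # coded as diagnosis, confirmed by reference standard
    fp: int  # coded as diagnosis, not confirmed
    fn: int  # not coded, but diagnosis present per reference standard
    tn: int  # not coded, diagnosis absent


def _ratio(numerator: int, denominator: int) -> Optional[float]:
    # Undefined when the denominator is zero (e.g. no reference positives).
    return numerator / denominator if denominator else None


def validation_measures(c: ConfusionCounts) -> Dict[str, Optional[float]]:
    """Sensitivity, specificity, PPV and NPV from raw two-by-two counts."""
    return {
        "sensitivity": _ratio(c.tp, c.tp + c.fn),
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "ppv": _ratio(c.tp, c.tp + c.fp),
        "npv": _ratio(c.tn, c.tn + c.fn),
    }
```

This is why the protocol accepts raw data in place of a reported measure: with the four counts in hand, each measure can be recomputed.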


2018 ◽  
Author(s):  
Kohei Kajiyama ◽  
Hiromasa Horiguchi ◽  
Takashi Okumura ◽  
Mizuki Morita ◽  
Yoshinobu Kano
