Representing cancer clinical trial criteria and attributes using ontologies: An NLP-assisted approach.

2020 ◽  
Vol 38 (15_suppl) ◽  
pp. e14079-e14079
Author(s):  
Kyeryoung Lee ◽  
Zongzhi Liu ◽  
Meng Ma ◽  
Chris Gilman ◽  
Yun Mai ◽  
...  

e14079 Background: Low patient recruitment is one of the main reasons clinical trials fail. Identifying eligible patients for clinical trials using electronic health records (EHRs) can help reach accrual targets. Ontology reasoning implemented in Trial2Patient, a scalable system we developed for matching patients to clinical trials, forms the basis for generating patient cohorts in our system. For efficient cohort definition, an attribute ontology for eligibility criteria and entity categorization is a necessary first step. To meet this requirement, we constructed an ontology platform for lung cancer trials. Methods: We classified 128 non-small cell lung cancer (NSCLC) and 38 small cell lung cancer trials into different therapy groups. Among the 166 trials we examined, 110 were immuno-oncology therapy-based, 48 were targeted therapy-based, and 8 were chemotherapy or device trials. We manually analyzed the eligibility criteria for each trial to identify entities from all trials as well as indication-specific and, further, therapy group-specific entities. To incorporate semi-automated, natural language processing (NLP)-assisted named entity recognition (NER) into the future cohort definition process, we trained NLP and deep learning models for NER and ontology encoding. Attributes generated from 50 processed NSCLC trials were evaluated against our manually curated attributes. The ontology generated from lung cancer trials was tested in 74 prostate cancer trials for generalizability. Results: An ontology for lung cancer trials, which is generalizable to prostate cancer and other cancer clinical trials, was constructed. A total of 507 attributes were extracted, and entities were categorized into 8 groups. Evaluation of attributes generated by NLP and deep learning models compared with manually extracted attributes showed high consistency and accuracy. 
The average precision, recall, and F1 values for the 15 most commonly appearing entities (disease, histology, targeted therapy, immunotherapy, radiotherapy, neoadjuvant therapy, age, gender, test, vitals, value, drug, gene, mutation, problem) were 0.873, 0.769, and 0.805, respectively. Conclusions: We contribute a clinical trial ontology platform for lung cancer and prostate cancer trial recruitment. This ontology platform can be expanded to other solid tumors or hematologic malignancies for clinical trial analysis, and can also be applied to generate synthetic control arm cohorts. We believe NLP-assisted NER can be successfully incorporated into future large-scale clinical trial cohort definition.
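
The abstract above reports averaged precision, recall, and F1 but does not say how the averaging was done. As a rough illustration only, the sketch below assumes a macro-average: each metric is computed per entity type and then averaged. The reported F1 (0.805) is not the harmonic mean of the reported precision and recall (about 0.818), which is consistent with per-entity averaging but does not confirm it. The `Counts` class, the function names, and the toy counts are all hypothetical and are not taken from the study.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Counts:
    """Hypothetical per-entity tallies: true positives, false positives, false negatives."""
    tp: int
    fp: int
    fn: int


def prf(c: Counts) -> tuple[float, float, float]:
    """Precision, recall, and F1 for one entity type (0.0 when undefined)."""
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def macro_average(per_entity: dict[str, Counts]) -> tuple[float, float, float]:
    """Average each metric over entity types (assumed averaging scheme)."""
    scores = [prf(c) for c in per_entity.values()]
    n = len(scores)
    return (
        sum(s[0] for s in scores) / n,
        sum(s[1] for s in scores) / n,
        sum(s[2] for s in scores) / n,
    )


# Toy counts, invented for illustration only.
toy = {
    "disease": Counts(tp=9, fp=1, fn=1),
    "gene": Counts(tp=6, fp=2, fn=4),
}
macro_p, macro_r, macro_f1 = macro_average(toy)
# Harmonic mean of the averaged precision and recall, for comparison.
f1_of_averages = 2 * macro_p * macro_r / (macro_p + macro_r)
```

In this toy case the macro-averaged F1 differs from the F1 computed from the averaged precision and recall, the same kind of gap seen in the reported figures.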

2018 ◽  
Vol 25 (4) ◽  
Author(s):  
K. Al-Baimani ◽  
H. Jonker ◽  
T. Zhang ◽  
G.D. Goss ◽  
S.A. Laurie ◽  
...  

Background Advanced non-small-cell lung cancer (NSCLC) represents a major health issue globally. Systemic treatment decisions are informed by clinical trials, which, over years, have improved the survival of patients with advanced NSCLC. The applicability of clinical trial results to the broad lung cancer population is unclear because strict eligibility criteria in trials generally select for optimal patients. Methods We performed a retrospective chart review of all consecutive patients with advanced NSCLC seen in outpatient consultation at our academic institution between September 2009 and September 2012, collecting data about patient demographics and cancer characteristics, treatment, and survival from hospital and pharmacy records. Two sets of arbitrary trial eligibility criteria were applied to the cohort. Scenario A stipulated Eastern Cooperative Oncology Group performance status (ECOG PS) 0–1, no brain metastasis, creatinine less than 120 μmol/L, and no second malignancy. Less-strict scenario B stipulated ECOG PS 0–2 and creatinine less than 120 μmol/L. We then used the two scenarios to analyze treatment and survival of patients by trial eligibility status. Results The 528 included patients had a median age of 67 years, with 55% being men and 58% having adenocarcinoma. Of those 528 patients, 291 received at least 1 line of palliative systemic therapy. Using the scenario A eligibility criteria, 73% were trial-ineligible. However, 46% of “ineligible” patients actually received therapy and experienced survival similar to that of the “eligible” treated patients (10.2 months vs. 11.6 months, p = 0.10). Using the scenario B criteria, only 35% were ineligible, but again, the survival of treated patients was similar in the ineligible and eligible groups (10.1 months vs. 10.9 months, p = 0.57). Conclusions Current trial eligibility criteria are often strict and limit the enrolment of patients in clinical trials. 
Our results suggest that, depending on the chosen drug, its toxicities and tolerability, eligibility criteria could be carefully reviewed and relaxed.
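
The two eligibility scenarios described above can be expressed as simple predicates over patient records. The following is a minimal sketch under stated assumptions: the `Patient` fields, function names, and the four toy patients are hypothetical, and only the thresholds (ECOG PS 0–1 or 0–2, creatinine below 120 μmol/L, no brain metastasis, no second malignancy) come from the abstract.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Patient:
    """Hypothetical record holding only the fields the two scenarios use."""
    ecog_ps: int
    brain_metastasis: bool
    creatinine_umol_l: float
    second_malignancy: bool


def eligible_scenario_a(p: Patient) -> bool:
    # Scenario A: ECOG PS 0-1, no brain metastasis,
    # creatinine < 120 umol/L, no second malignancy.
    return (
        p.ecog_ps <= 1
        and not p.brain_metastasis
        and p.creatinine_umol_l < 120
        and not p.second_malignancy
    )


def eligible_scenario_b(p: Patient) -> bool:
    # Scenario B (less strict): ECOG PS 0-2 and creatinine < 120 umol/L.
    return p.ecog_ps <= 2 and p.creatinine_umol_l < 120


def ineligible_fraction(cohort: List[Patient], rule: Callable[[Patient], bool]) -> float:
    """Share of the cohort that fails the given eligibility rule."""
    return sum(not rule(p) for p in cohort) / len(cohort)


# Toy cohort, invented for illustration; not data from the study.
cohort = [
    Patient(ecog_ps=0, brain_metastasis=False, creatinine_umol_l=80, second_malignancy=False),
    Patient(ecog_ps=2, brain_metastasis=False, creatinine_umol_l=90, second_malignancy=False),
    Patient(ecog_ps=1, brain_metastasis=True, creatinine_umol_l=100, second_malignancy=False),
    Patient(ecog_ps=3, brain_metastasis=False, creatinine_umol_l=130, second_malignancy=False),
]
frac_a = ineligible_fraction(cohort, eligible_scenario_a)
frac_b = ineligible_fraction(cohort, eligible_scenario_b)
```

Keeping each scenario as a separate predicate makes it straightforward to compare how many patients a stricter or looser rule set excludes from the same cohort.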


2020 ◽  
Vol 38 (15_suppl) ◽  
pp. e14054-e14054
Author(s):  
Yun Mai ◽  
Kyeryoung Lee ◽  
Zongzhi Liu ◽  
Zhiqiang Li ◽  
Scott Jones ◽  
...  

e14054 Background: Matching clinical attributes (e.g. indications, lab tests, treatment regimens) of clinical trial eligibility criteria with real-world patient data is extremely challenging. Attribute phenotyping is one of the key components of Trial2Patient, a customized system developed by Sema4 to find patients for clinical trials. Transforming treatment regimens to a standard ontology and encoding drugs with standard nomenclatures will facilitate the semantic retrieval of treatments mentioned in clinical trial criteria. This will also enable the interoperation between different data sources that is often required for fast-learning and scalable healthcare information systems. Methods: Free text containing treatment regimen/medication terms was extracted and preprocessed from three sources: 1) clinical trials listed in the commercial database citeline.com, 2) clinical trials listed in clinicaltrials.gov, and 3) National Comprehensive Cancer Network (NCCN) Guidelines. Regimen terms such as neoadjuvant therapy for non-small cell lung cancer, checkpoint inhibitor, EGFR inhibitor, and androgen deprivation therapy (ADT), among many others, were profiled by AI methods (i.e. pattern recognition and rule-based methods) and knowledge engineering via Sema4’s in-house knowledge base (CAV), Pharmaprojects in citeline.com, and NCCN Guidelines. The drugs related to each regimen were identified and mapped to RxCUI via the RxNorm ontology. Results: We identified 76 regimen terms for non-small cell lung cancer (NSCLC), small cell lung cancer (SCLC), and prostate cancer (e.g. PD-L1 ≥1% nonsquamous NSCLC, bone antiresorptive therapy for M1 castration-resistant prostate cancer), and 14,476 drug-category pairs (e.g. pembrolizumab is a PD-1 inhibitor, pembrolizumab is used as the third line and beyond systemic therapy for M1 CRPC). All drugs identified were mapped to RxCUI for real-world patient matching. 
Conclusions: This approach systematically extracted and normalized regimens and medications mentioned in clinical trials in NSCLC, SCLC and prostate cancer to standard codes. These standardized data can be used in mapping treatment histories of a patient to the eligibility criteria for clinical studies or for identifying studies relevant to a patient. The outcome of profiling cancer treatment regimens through standard ontology RxNorm may be particularly valuable in cancer studies based on real-world evidence.
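
To make the idea of drug-category pairs and code normalization concrete, here is a minimal sketch, assuming an in-memory lookup rather than the authors' actual pipeline. The pembrolizumab/PD-1 inhibitor pair comes from the abstract; the other pairs are common pharmacological classifications added for illustration. The codes are deliberately labeled placeholders and are not real RxCUI values, and all function and variable names are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Drug-category pairs. Only the first pair is taken from the abstract.
DRUG_CATEGORY_PAIRS: List[Tuple[str, str]] = [
    ("pembrolizumab", "PD-1 inhibitor"),
    ("nivolumab", "PD-1 inhibitor"),
    ("osimertinib", "EGFR inhibitor"),
]

# Placeholder identifiers standing in for RxCUI codes (NOT real RxCUIs).
DRUG_TO_CODE: Dict[str, str] = {
    "pembrolizumab": "PLACEHOLDER-001",
    "nivolumab": "PLACEHOLDER-002",
    "osimertinib": "PLACEHOLDER-003",
}


def normalize(mention: str) -> str:
    """Lowercase and collapse whitespace in a free-text drug mention."""
    return " ".join(mention.lower().split())


def codes_for_category(category: str) -> Set[str]:
    """All placeholder codes for drugs paired with the given category."""
    return {
        DRUG_TO_CODE[drug]
        for drug, cat in DRUG_CATEGORY_PAIRS
        if cat == category and drug in DRUG_TO_CODE
    }


def patient_matches_category(mentions: Iterable[str], category: str) -> bool:
    """True if any drug in the patient's history maps to a code in the category."""
    patient_codes = {
        DRUG_TO_CODE[normalize(m)] for m in mentions if normalize(m) in DRUG_TO_CODE
    }
    return bool(patient_codes & codes_for_category(category))


# Example: a messy free-text mention still resolves to the right category.
received_pd1 = patient_matches_category(["  Pembrolizumab "], "PD-1 inhibitor")
```

Matching on normalized codes rather than raw strings is what lets a criterion phrased as a drug class be checked against treatment histories written in varying free text.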


2021 ◽  
Vol 39 (15_suppl) ◽  
pp. 6592-6592
Author(s):  
Yun Mai ◽  
Kyeryoung Lee ◽  
Zongzhi Liu ◽  
Meng Ma ◽  
Christopher Gilman ◽  
...  

6592 Background: Clinical trial phenotyping is the process of extracting clinical features and patient characteristics from eligibility criteria. Phenotyping is a crucial step that precedes automated cohort identification from patient electronic health records (EHRs) against trial criteria. We established a clinical trial phenotyping pipeline to transform clinical trial eligibility criteria into computable criteria and enable high-throughput cohort selection in EHRs. Methods: Formalized clinical trial criteria attributes were acquired from a natural language processing (NLP)-assisted approach. We implemented a clinical trial phenotyping pipeline that included three components. First, a rule-based knowledge engineering component was introduced to annotate the trial attributes at a granularity that is computable and customizable from EHRs. The second component involved normalizing annotated attributes using standard terminologies and pre-defined reference tables. Third, a knowledge base of computable criteria attributes was built to match patients to clinical trials. We evaluated the pipeline performance by independent manual review. The inter-rater agreement of the annotation was measured on a random sample of the knowledge base. The accuracy of the pipeline was evaluated on a subset of randomly selected matched patients for a subset of randomly selected attributes. Results: Our pipeline phenotyped 2954 clinical trials across five cancer types: non-small cell lung cancer, small cell lung cancer, prostate cancer, breast cancer, and multiple myeloma. We built a knowledge base of 256 computable attributes that included comorbidities, comorbidity-related treatment, previous lines of therapy, laboratory tests, and performance status measures such as ECOG and Karnofsky scores. Among the 256 attributes, 132 were encoded using standard terminologies and 124 were normalized to customized concepts. 
The inter-rater agreement of the annotation measured by Cohen’s Kappa coefficient was 0.83. We applied the knowledge base to our EHRs and efficiently identified 33258 potential subjects for cancer clinical trials. Our evaluation on the patient matching indicated the F1 score was 0.94. Conclusions: We established a clinical trial phenotyping pipeline and built a knowledge base of computable criteria attributes that enabled efficient screening of EHRs for patients meeting clinical trial eligibility criteria, providing an automated way to efficiently and accurately identify clinical trial cohorts. The application of this knowledge base to patient matching from EHR data across different institutes demonstrates its generalization capability. Taken together, this knowledge base will be particularly valuable in computer-assisted clinical trial subject selection and clinical trial protocol design in cancer studies based on real-world evidence.
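
The inter-rater agreement reported above is Cohen's kappa, which corrects the observed agreement between two annotators for the agreement expected by chance: kappa = (p_o − p_e) / (1 − p_e). A minimal sketch of that calculation follows; the function name, the labels, and the toy annotations are hypothetical and are not drawn from the study.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must label the same non-empty set of items.")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in set(freq_a) | set(freq_b)) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1 - expected)


# Toy annotations, invented for illustration only.
a = ["lab"] * 6 + ["therapy"] * 4
b = ["lab"] * 5 + ["therapy"] * 4 + ["lab"]
kappa = cohens_kappa(a, b)
```

Here the raters agree on 8 of 10 items, but because chance agreement is 0.52, kappa is noticeably lower than the raw 80% agreement.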


2021 ◽  
Vol 39 (28_suppl) ◽  
pp. 59-59
Author(s):  
Woojung Lee ◽  
Scott Spencer ◽  
Josh John Carlson ◽  
Tam Dinh ◽  
Victoria Dayer ◽  
...  

59 Background: The use of comprehensive genomic profiling (CGP) in cancer patients could lead to additional enrollment in clinical trials that study novel genetic biomarkers, potentially reducing treatment costs for payers and improving health outcomes for patients. Our objective was to estimate the number of additional clinical trials in which patients with non-small cell lung cancer (NSCLC) could potentially enroll due to the use of CGP vs. a comparator panel of 50 genes or fewer. Methods: Clinical trials in NSCLC that started between 2015 and 2020 were identified from the Aggregate Analysis of ClinicalTrials.gov (AACT) database. Trials with unknown status or with study sites only outside the United States were excluded. We abstracted information on required genetic alterations based on the study eligibility criteria. We calculated the incremental number of trials available to patients due to results generated by CGP (FoundationOne CDx, 324 genes) vs. a commercially available comparator panel of 50 genes or fewer (Oncomine Dx Target Test, 23 genes) by phase and calendar year. The additional trials were characterized by disease severity, type of therapy, and setting. Results: Enrollment eligibility was dependent on genetic variant status in 35% (250/709) of all identified NSCLC trials. There were 29 (248 vs. 219) additional clinical trials available to patients through the use of CGP, 12% of all gene-specific trials for NSCLC. We identified 45 uses of genetic markers in the 29 additional clinical trials. The most frequent genetic marker in the incremental trials was microsatellite instability, accounting for 44% of all identified markers (20/45). The incremental number of trials available to patients due to the use of CGP did not vary significantly over time but varied by phase: most of the additional clinical trials were in phase 1 or 2 (28/29, 97%). 
Most of the incremental trials were in metastatic disease (22/29, 76%) and were conducted in academic or advanced community settings (18/29, 62%). The most frequently studied type of intervention in these studies was targeted monotherapy (8/29, 28%), followed by immuno-monotherapy (7/29, 24%). Conclusions: Clinical trials in NSCLC initiated over the past 5 years have consistently included CGP-specific genes or markers in eligibility criteria. Patients with NSCLC have the potential to benefit from the use of CGP as compared to smaller gene panels through improved access to clinical trials.[Table: see text]
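
The incremental-trial count above (248 vs. 219, a difference of 29, or 29/250 = 11.6% of gene-specific trials) is in effect a set difference between the trials reachable with each panel. The sketch below illustrates that logic under a stated assumption: a trial counts as accessible if the panel reports at least one of its qualifying markers. The abstract does not spell out its matching rule, and the trial identifiers, marker names, and panel contents here are toy values that do not represent the actual test panels.

```python
from typing import Dict, Set


def accessible_trials(trials: Dict[str, Set[str]], panel: Set[str]) -> Set[str]:
    """Trials with at least one qualifying marker reported by the panel (assumed rule)."""
    return {trial_id for trial_id, markers in trials.items() if markers & panel}


def incremental_trials(
    trials: Dict[str, Set[str]], broad_panel: Set[str], narrow_panel: Set[str]
) -> Set[str]:
    """Trials reachable with the broad panel but not with the narrow one."""
    return accessible_trials(trials, broad_panel) - accessible_trials(trials, narrow_panel)


# Toy data, invented for illustration only.
trials = {
    "T1": {"EGFR"},
    "T2": {"MSI"},
    "T3": {"ALK", "MSI"},
    "T4": {"MARKER_X"},  # covered by neither toy panel
}
narrow = {"EGFR", "ALK"}
broad = {"EGFR", "ALK", "MSI"}

extra = incremental_trials(trials, broad, narrow)
```

In this toy example only `T2` is incremental: `T3` also lists MSI but is already reachable through ALK on the narrower panel.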


2019 ◽  
Vol 3 (2) ◽  
Author(s):  
Jennifer S Davis ◽  
Erin Prophet ◽  
Ho-Lan Peng ◽  
Hwa Young Lee ◽  
Rebecca S S Tidwell ◽  
...  

Abstract Background New, effective treatments have resulted in long-term survival for small subgroups of metastatic non-small cell lung cancer (NSCLC) patients. However, knowledge of long-term survivor frequency and characteristics prior to modern therapies is lacking. Methods Surveillance, Epidemiology, and End Results (SEER) patients with stage IV NSCLC diagnosed from 1991 to 2007 and followed through 2012 were dichotomized by survival time into the 10% who lived 21 months or longer (long-term survivors) vs the remaining 90% and compared with participants in a representative clinical trial of molecular profiling and targeted therapies (CUSTOM). Results Among the 44 387 SEER patients, the 10% identified as long-term survivors were distinguishable from the remaining 90% by younger age, female sex, Asian race, adenocarcinoma histology, tumor grade, tumor site, and surgery. From 1991–1994 to 2003–2007, median survival increased by 6 months from 30 to 36 months among long-term survivors but by only 1 month from 3 to 4 months among the remaining 90%. Among the 165 participants in the CUSTOM trial, 54% met our SEER criterion of long-term survival by living for 21 months or longer. Conclusions Among SEER patients with stage IV NSCLC, long-term survivors had a median survival approximately 10 times that of the remaining 90%. Long-term survivors accounted for more than one-half of the participants in a representative clinical trial. Caution is required when extrapolating the outcomes of participants in clinical trials to patients in routine clinical practice.
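
The comparison above rests on two steps: finding the survival time that separates the top 10% of a reference population, then measuring what share of a second cohort meets that cutoff. A minimal sketch of those two steps follows, assuming complete (uncensored) survival times for simplicity; the function names and the toy values are hypothetical, and the study's own handling of follow-up and censoring is not described in the abstract.

```python
from typing import Sequence


def top_percent_threshold(survival_months: Sequence[float], top_percent: int = 10) -> float:
    """Smallest survival time among the longest-surviving `top_percent` of patients."""
    if not survival_months:
        raise ValueError("Need at least one survival time.")
    n = len(survival_months)
    # Integer ceiling of n * top_percent / 100, with at least one patient.
    k = max(1, (n * top_percent + 99) // 100)
    # With tied survival times, more than `top_percent` may meet the threshold.
    return sorted(survival_months, reverse=True)[k - 1]


def fraction_meeting(survival_months: Sequence[float], threshold: float) -> float:
    """Share of a cohort surviving at least `threshold` months."""
    return sum(t >= threshold for t in survival_months) / len(survival_months)


# Toy data, invented for illustration only.
reference_cohort = list(range(1, 21))   # 20 patients surviving 1..20 months
trial_cohort = [5, 19, 25, 30]

cutoff = top_percent_threshold(reference_cohort)       # top 10% of 20 = 2 patients
trial_share = fraction_meeting(trial_cohort, cutoff)
```

A trial cohort in which far more than 10% of participants meet the reference cutoff, as in this toy case, is the pattern the abstract cautions about when extrapolating trial outcomes to routine practice.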


2008 ◽  
Vol 26 (25) ◽  
pp. 4116-4123 ◽  
Author(s):  
James E. Herndon ◽  
Alice B. Kornblith ◽  
Jimmie C. Holland ◽  
Electra D. Paskett

Purpose To investigate the effect of socioeconomic status, as measured by education, on the survival of 1,577 lung cancer patients treated on 11 studies conducted by the Cancer and Leukemia Group B. Patients and Methods Sociodemographic data, including education, were reported by the patient at the time of clinical trial accrual. A Cox proportional hazards model stratified by treatment arm/study was used to examine the effect of education on survival after adjustment for known prognostic factors. Results The patient population included 1,177 patients diagnosed with non–small-cell lung cancer (NSCLC; stage III or IV) and 400 patients diagnosed with small-cell lung cancer (SCLC; extensive or limited). Patients with less than an eighth grade education (13% of patients) were significantly more likely to be male, nonwhite, and older; have a performance status (PS) of 1 or 2; and have chest pain. Significant predictors of poor survival in the final model included male sex, PS of 1 or 2, dyspnea, weight loss, liver or bone metastases, unmarried status, presence of adrenal metastases and high alkaline phosphatase levels among patients with NSCLC, and high WBC levels among patients with advanced disease. Education was not predictive of survival. Conclusion The physical condition of patients with low education who enroll onto clinical trials is worse than that of patients with higher education. Once enrolled onto a clinical trial, education does not affect the survival of patients with SCLC or stage III or IV NSCLC. The standardization of treatment and follow-up within a clinical trial, regardless of education, is one possible explanation for this lack of effect.

