The National COVID Cohort Collaborative (N3C): Rationale, Design, Infrastructure, and Deployment

Author(s):  
Melissa A Haendel ◽  
Christopher G Chute ◽  
Kenneth Gersing

Abstract

Objective: COVID-19 poses societal challenges that require expeditious data and knowledge sharing. Though organizational clinical data are abundant, these are largely inaccessible to outside researchers. Statistical, machine learning, and causal analyses are most successful with large-scale data beyond what is available in any given organization. Here, we introduce the National COVID Cohort Collaborative (N3C), an open science community focused on analyzing patient-level data from many centers.

Methods: The Clinical and Translational Science Award (CTSA) Program and scientific community created N3C to overcome technical, regulatory, policy, and governance barriers to sharing and harmonizing individual-level clinical data. We developed solutions to extract, aggregate, and harmonize data across organizations and data models, and created a secure data enclave to enable efficient, transparent, and reproducible collaborative analytics. Organized in inclusive workstreams, in two months we created: legal agreements and governance for organizations and researchers; data extraction scripts to identify and ingest positive, negative, and possible COVID-19 cases; a data quality assurance and harmonization pipeline to create a single harmonized dataset; population of the secure data enclave with data, machine learning, and statistical analytics tools; dissemination mechanisms; and a synthetic data pilot to democratize data access.

Discussion: The N3C has demonstrated that a multi-site collaborative learning health network can overcome barriers to rapidly build a scalable infrastructure incorporating multi-organizational clinical data for COVID-19 analytics. We expect this effort to save lives by enabling rapid collaboration among clinicians, researchers, and data scientists to identify treatments and specialized care and thereby reduce the immediate and long-term impacts of COVID-19.

Lay Summary: COVID-19 poses societal challenges that require expeditious data and knowledge sharing. Though medical records are abundant, they are largely inaccessible to outside researchers. Statistical, machine learning, and causal research are most successful with large datasets beyond what is available in any given organization. Here, we introduce the National COVID Cohort Collaborative (N3C), an open science community focused on analyzing patient-level data from many clinical centers to reveal patterns in COVID-19 patients. To create N3C, the community had to overcome technical, regulatory, policy, and governance barriers to sharing patient-level clinical data. In less than 2 months, we developed solutions to acquire and harmonize data across organizations and created a secure data environment to enable transparent and reproducible collaborative research. We expect the N3C to help save lives by enabling collaboration among clinicians, researchers, and data scientists to identify treatments and specialized care needs and thereby reduce the immediate and long-term impacts of COVID-19.
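The cohort-identification step described in the Methods (flagging positive, negative, and possible COVID-19 cases before ingestion) can be pictured with a minimal sketch. This is illustrative only: the column names, the two-code PCR set, and the diagnosis-code flag below are assumptions, whereas the actual N3C extraction scripts run against each site's common data model with a curated phenotype.

```python
import pandas as pd

# Assumed SARS-CoV-2 PCR code set; the real N3C phenotype uses a much
# larger, curated list maintained by the community.
COVID_PCR_CODES = {"94500-6", "94309-2"}

def classify_patient(lab_rows: pd.DataFrame, has_covid_dx: bool) -> str:
    """Return 'positive', 'negative', 'possible', or 'untested' for one
    patient. lab_rows is assumed to have columns: loinc_code, result."""
    pcr = lab_rows[lab_rows["loinc_code"].isin(COVID_PCR_CODES)]
    if pcr["result"].str.lower().eq("positive").any():
        return "positive"                       # any positive PCR result
    if len(pcr) > 0:
        return "negative"                       # tested, never positive
    # No definitive lab evidence: fall back to diagnosis codes.
    return "possible" if has_covid_dx else "untested"
```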

2021 ◽  
Vol 09 (02) ◽  
pp. E233-E238
Author(s):  
Rajesh N. Keswani ◽  
Daniel Byrd ◽  
Florencia Garcia Vicente ◽  
J. Alex Heller ◽  
Matthew Klug ◽  
...  

Abstract

Background and study aims: Storage of full-length endoscopic procedures is becoming increasingly popular. To facilitate large-scale machine learning (ML) focused on clinical outcomes, these videos must be merged with the patient-level data in the electronic health record (EHR). Our aim was to present a method of accurately linking patient-level EHR data with cloud-stored colonoscopy videos.

Methods: This study was conducted at a single academic medical center. Most procedure videos are automatically uploaded to the cloud server but are identified only by procedure time and procedure room. We developed and then tested an algorithm to match recorded videos with corresponding exams in the EHR based upon procedure time and room, and subsequently extracted frames of interest.

Results: Among 28,611 total colonoscopies performed over the study period, 21,170 colonoscopy videos in 20,420 unique patients (54.2 % male, median age 58) were matched to EHR data. Of 100 randomly sampled videos, appropriate matching was manually confirmed in all. In total, these videos represented 489,721 minutes of colonoscopy performed by 50 endoscopists (median 214 colonoscopies per endoscopist). The most common procedure indications were polyp screening (47.3 %), surveillance (28.9 %), and inflammatory bowel disease (9.4 %). From these videos, we extracted procedure highlights (identified by image capture; mean 8.5 per colonoscopy) and surrounding frames.

Conclusions: We report the successful merging of a large database of endoscopy videos, stored with limited identifiers, with rich patient-level data in a highly accurate manner. This technique facilitates the development of ML algorithms based upon relevant patient outcomes.
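The matching algorithm is described only at a high level (procedure time plus procedure room), so the following is a hedged sketch of that idea; the tolerance window, record layout, and tie-breaking rule are assumptions rather than the study's actual implementation.

```python
from datetime import timedelta

TOLERANCE = timedelta(minutes=5)  # assumed matching window, not the study's

def match_video_to_exam(video: dict, exams: list[dict]):
    """Return the EHR exam in the same room whose start time is closest to
    the video's start time, or None if no candidate lies within TOLERANCE.
    Each record is assumed to have 'room' and 'start' (datetime) fields."""
    candidates = [e for e in exams if e["room"] == video["room"]]
    if not candidates:
        return None
    best = min(candidates, key=lambda e: abs(e["start"] - video["start"]))
    return best if abs(best["start"] - video["start"]) <= TOLERANCE else None
```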


2020 ◽  
Author(s):  
Konrad Turek ◽  
Matthijs Kalmijn ◽  
Thomas Leopold

The Comparative Panel File (CPF) harmonises the world's largest and longest-running household panel surveys from seven countries: Australia (HILDA), Germany (SOEP), Great Britain (BHPS and UKHLS), South Korea (KLIPS), Russia (RLMS), Switzerland (SHP), and the United States (PSID). The project aims to support the social science community in the analysis of comparative life course data. The CPF is not a data product but open-source code that integrates individual and household panel data from all seven surveys into a harmonised three-level data structure. In this manual, we present the design and content of the CPF, explain the project's logic, workflow, and technical details, and describe the CPF's open-science platform. More at: www.cpfdata.com
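A harmonised three-level structure of this kind can be sketched in a few lines. The example below is only a guess at the layout, assuming the levels are survey/country, person, and wave and using invented variable names; the CPF's actual variables and coding are defined by its open-source scripts.

```python
import pandas as pd

rows = [
    # survey (country panel), person id, wave, harmonized variables
    {"survey": "SOEP",  "pid": 101, "wave": 2010, "employed": 1, "age": 34},
    {"survey": "SOEP",  "pid": 101, "wave": 2011, "employed": 1, "age": 35},
    {"survey": "HILDA", "pid": 202, "wave": 2010, "employed": 0, "age": 52},
]
# Wave-level observations nested within persons nested within surveys.
cpf_like = pd.DataFrame(rows).set_index(["survey", "pid", "wave"]).sort_index()
print(cpf_like)
```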


2020 ◽  
Author(s):  
Vinay Srinivas Bharadhwaj ◽  
Mehdi Ali ◽  
Colin Birkenbihl ◽  
Sarah Mubeen ◽  
Jens Lehmann ◽  
...  

Abstract

As machine learning and artificial intelligence become more useful in the interpretation of biomedical data, their utility depends on the data used to train them. Due to the complexity and high dimensionality of biomedical data, there is a need for approaches that combine prior knowledge around known biological interactions with patient data. Here, we present CLEP, a novel approach that generates new patient representations by leveraging both prior knowledge and patient-level data. First, given a patient-level dataset and a knowledge graph containing relations across features that can be mapped to the dataset, CLEP incorporates patients into the knowledge graph as new nodes connected to their most characteristic features. Next, CLEP employs knowledge graph embedding models to generate new patient representations that can ultimately be used for a variety of downstream tasks, ranging from clustering to classification. We demonstrate how using new patient representations generated by CLEP significantly improves performance in classifying between patients and healthy controls for a variety of machine learning models, as compared to the use of the original transcriptomics data. Furthermore, we also show how incorporating patients into a knowledge graph can foster the interpretation and identification of biological features characteristic of a specific disease or patient subgroup. Finally, we released CLEP as an open-source Python package together with examples and documentation.
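The core mechanism (inserting patients into the knowledge graph as new nodes tied to their most characteristic features, then embedding the augmented graph) can be sketched as follows. This is a simplified illustration, not CLEP's API: the z-score threshold, the networkx representation, and the feature-selection rule are placeholder assumptions.

```python
import networkx as nx
import numpy as np

def add_patients_to_kg(kg: nx.Graph, X: np.ndarray, patients, features, z=1.5):
    """X[i, j] holds the standardized value of feature j for patient i.
    Each patient becomes a new node linked to the features whose |z-score|
    exceeds the threshold (a stand-in for 'most characteristic')."""
    g = kg.copy()
    for i, pid in enumerate(patients):
        g.add_node(pid, kind="patient")
        for j, feat in enumerate(features):
            if abs(X[i, j]) >= z:
                g.add_edge(pid, feat, sign=float(np.sign(X[i, j])))
    return g

# Downstream (not shown): run any knowledge graph embedding model on the
# augmented graph and feed the resulting patient vectors to a clustering
# or classification model.
```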


2019 ◽  
Vol 35 (S1) ◽  
pp. 46-47
Author(s):  
Noreen Downes ◽  
Jan Jones ◽  
Anne Lee ◽  
Pauline McGuire

Introduction: Medicines regulation has become increasingly adaptive to support earlier patient access, but the resulting immature clinical data are often challenging for health technology assessment decision-makers due to high levels of uncertainty about long-term risks and benefits. The Scottish Medicines Consortium (SMC) is therefore exploring new, more adaptive approaches to help manage this challenge.

Methods: SMC consulted with key stakeholders, including clinicians, the pharmaceutical industry, and patient groups, on a number of options that would allow the committee to make an interim decision to be revisited in light of later evidence. The ability to collect robust patient-level data, given data capabilities in National Health Service Scotland (NHSScotland), was an important consideration.

Results: To ensure that additional evidence will be available to inform a reassessment, the new approach applies to medicines with a Conditional Marketing Authorisation (MA) from the European Medicines Agency (EMA), which obligates the company to provide specified clinical data to the regulator within a pre-set timeframe. For these medicines, the SMC decision-making committee can accept or not recommend the medicine, as at present, but can also accept the medicine on an interim basis if the regulator's mandated Specific Obligations are likely to address the uncertainties in the clinical evidence. When the regulator converts the MA from conditional to standard, the company is required to make a further SMC submission to allow a reassessment and a final decision. The company can also provide additional supplementary post-licensing patient-level evidence at reassessment.

Conclusions: This new decision option allows SMC to test an approach to managing uncertainty, targeted at a small number of promising new medicines where there is unmet patient need, with the reassurance that a final decision will be supported by additional clinical data.
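The decision flow described above amounts to a small state machine. The sketch below paraphrases it under stated assumptions; the state names and predicates are mine, not an official SMC specification.

```python
from enum import Enum

class Decision(Enum):
    ACCEPT = "accept"
    NOT_RECOMMEND = "not recommend"
    INTERIM_ACCEPT = "interim accept"

def interim_option_available(conditional_ma: bool,
                             obligations_likely_to_resolve: bool) -> bool:
    """The interim option exists only for EMA Conditional MA medicines whose
    mandated Specific Obligations are likely to address the uncertainty."""
    return conditional_ma and obligations_likely_to_resolve

def reassessment_required(decision: Decision,
                          ma_converted_to_standard: bool) -> bool:
    """An interim acceptance triggers a further SMC submission and a final
    decision once the MA is converted from conditional to standard."""
    return decision is Decision.INTERIM_ACCEPT and ma_converted_to_standard
```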

