Blockchain-authenticated sharing of cancer patient genomic and clinical outcomes data.

e19358 Background: Efficiently sharing health data produced during standard care could dramatically accelerate progress in cancer treatments but various barriers make this difficult. Not sharing these data to ensure patient privacy is at the cost of little to no learning from real-world data produced during cancer care. Furthermore, recent research has demonstrated a willingness of cancer patients to share their treatment experiences to fuel research, despite potential risks to privacy. The objective of this study was to design, pilot, and release a decentralized, scalable, efficient, economical, and secure strategy for dissemination of de-identified clinical and genomic data with a focus on late stage cancer. Methods: We created and piloted a blockchain-authenticated system to enable securely sharing de-identified patient data derived from standard of care imaging, genomic testing, and electronic health records (EHR), called the Cancer Gene Trust (CGT). We prospectively consented and collected data for a pilot cohort (n = 18), which we uploaded to CGT. EHR data were extracted from both a hospital cancer registry and a common data model format to identify optimal data extraction and dissemination practices. Specifically, we scored and compared the level of completeness between two EHR data extraction formats against the gold-standard source documentation for patients with available data (n = 17). Results: While the total completeness scores were greater for the registry reports than the common data model, this difference was not statistically significant. We did find that some specific data fields, such as histology site, were better captured using the registry reports, which can be used to improve the continually adapting common data model. In terms of the overall pilot study, we found that CGT enables rapid integration of real-world cancer patient data in a more clinically useful timeframe. We also developed an open-source web application to allow users to seamlessly search, browse, explore, and download CGT data. Conclusions: Our pilot demonstrates the willingness of cancer patients to participate in data sharing and how blockchain-enabled structures can maintain relationships between individual data elements while preserving patient privacy, empowering findings by third party researchers and clinicians. We demonstrate the feasibility of CGT as a framework to share health data trapped in silos to further cancer research. Further studies to optimize data representation, stream, and integrity are required.
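
The abstract does not describe how CGT authenticates submissions, so the following is only a minimal sketch of the general idea behind hash-linked records: each block stores a digest of a de-identified record plus the hash of the previous block, so any later change to a record or to the ordering becomes detectable. The function names, the record fields, and the all-zero genesis value are assumptions made for illustration and are not taken from CGT.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder hash for the first block


def digest(payload: dict) -> str:
    """SHA-256 of a canonical JSON encoding (sorted keys, compact separators)."""
    encoded = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def append_block(chain: list, record: dict) -> dict:
    """Append a block that stores only the record's hash and a link to the previous block."""
    prev_hash = chain[-1]["block_hash"] if chain else GENESIS
    body = {"index": len(chain), "prev_hash": prev_hash, "record_hash": digest(record)}
    block = {**body, "block_hash": digest(body)}
    chain.append(block)
    return block


def verify(chain: list, records: list) -> bool:
    """Recompute every block from the records and compare it with the stored chain."""
    if len(chain) != len(records):
        return False
    prev_hash = GENESIS
    for index, (block, record) in enumerate(zip(chain, records)):
        body = {"index": index, "prev_hash": prev_hash, "record_hash": digest(record)}
        if block != {**body, "block_hash": digest(body)}:
            return False
        prev_hash = block["block_hash"]
    return True
```

In this sketch the chain holds hashes only, so the de-identified records themselves can be stored and shared elsewhere while the chain is used to check that they have not been altered.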

Blockchain-Authenticated Sharing of Genomic and Clinical Outcomes Data of Patients With Cancer: A Prospective Cohort Study

Journal of Medical Internet Research ◽

10.2196/16810 ◽

2020 ◽

Vol 22 (3) ◽

pp. e16810 ◽

Cited By ~ 3

Author(s):

Benjamin Scott Glicksberg ◽

Shohei Burns ◽

Rob Currie ◽

Ann Griffin ◽

Zhen Jane Wang ◽

...

Keyword(s):

Real World ◽

Web Application ◽

Data Extraction ◽

Data Representation ◽

Health Data ◽

Common Data Model ◽

Patient Privacy ◽

Real World Data ◽

World Data ◽

Patients With Cancer

Background: Efficiently sharing health data produced during standard care could dramatically accelerate progress in cancer treatments, but various barriers make this difficult. Not sharing these data to ensure patient privacy is at the cost of little to no learning from real-world data produced during cancer care. Furthermore, recent research has demonstrated a willingness of patients with cancer to share their treatment experiences to fuel research, despite potential risks to privacy. Objective: The objective of this study was to design, pilot, and release a decentralized, scalable, efficient, economical, and secure strategy for the dissemination of deidentified clinical and genomic data with a focus on late-stage cancer. Methods: We created and piloted a blockchain-authenticated system to enable secure sharing of deidentified patient data derived from standard of care imaging, genomic testing, and electronic health records (EHRs), called the Cancer Gene Trust (CGT). We prospectively consented and collected data for a pilot cohort (N=18), which we uploaded to the CGT. EHR data were extracted from both a hospital cancer registry and a common data model (CDM) format to identify optimal data extraction and dissemination practices. Specifically, we scored and compared the level of completeness between two EHR data extraction formats against the gold standard source documentation for patients with available data (n=17). Results: Although the total completeness scores were greater for the registry reports than those for the CDM, this difference was not statistically significant. We did find that some specific data fields, such as histology site, were better captured using the registry reports, which can be used to improve the continually adapting CDM. In terms of the overall pilot study, we found that CGT enables rapid integration of real-world data of patients with cancer in a more clinically useful time frame. We also developed an open-source Web application to allow users to seamlessly search, browse, explore, and download CGT data. Conclusions: Our pilot demonstrates the willingness of patients with cancer to participate in data sharing and how blockchain-enabled structures can maintain relationships between individual data elements while preserving patient privacy, empowering findings by third-party researchers and clinicians. We demonstrate the feasibility of CGT as a framework to share health data trapped in silos to further cancer research. Further studies to optimize data representation, stream, and integrity are required.
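
The abstract says completeness was scored for two extraction formats against gold standard documentation but does not give the rubric. As a hedged illustration only, one simple way to score completeness is the fraction of gold standard fields that an extraction reproduces exactly. The field names and the exact-match rule below are assumptions, not the study's scoring method.

```python
def completeness(extracted: dict, gold: dict) -> float:
    """Fraction of gold-standard fields whose value the extraction reproduces exactly."""
    if not gold:
        raise ValueError("gold standard must contain at least one field")
    matched = sum(1 for field, value in gold.items() if extracted.get(field) == value)
    return matched / len(gold)


def missing_fields(extracted: dict, gold: dict) -> list:
    """Gold-standard fields that are absent or different in the extraction, sorted by name."""
    return sorted(field for field, value in gold.items() if extracted.get(field) != value)


# Hypothetical single-patient example comparing two extraction formats.
gold = {"stage": "IV", "histology_site": "colon", "diagnosis_year": 2017, "sex": "F"}
registry = {"stage": "IV", "histology_site": "colon", "diagnosis_year": 2017}
cdm = {"stage": "IV", "diagnosis_year": 2017}

registry_score = completeness(registry, gold)  # 3 of 4 fields match
cdm_score = completeness(cdm, gold)            # 2 of 4 fields match
```

Listing the missing fields per format, as `missing_fields` does, is what would surface a field-level gap such as histology site even when the total scores are close.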

Blockchain-Authenticated Sharing of Genomic and Clinical Outcomes Data of Patients With Cancer: A Prospective Cohort Study (Preprint)

10.2196/preprints.16810 ◽

2019 ◽

Author(s):

Benjamin Scott Glicksberg ◽

Shohei Burns ◽

Rob Currie ◽

Ann Griffin ◽

Zhen Jane Wang ◽

...

Keyword(s):

Real World ◽

Web Application ◽

Data Extraction ◽

Data Representation ◽

Health Data ◽

Common Data Model ◽

Patient Privacy ◽

Real World Data ◽

World Data ◽

Patients With Cancer

BACKGROUND: Efficiently sharing health data produced during standard care could dramatically accelerate progress in cancer treatments, but various barriers make this difficult. Not sharing these data to ensure patient privacy is at the cost of little to no learning from real-world data produced during cancer care. Furthermore, recent research has demonstrated a willingness of patients with cancer to share their treatment experiences to fuel research, despite potential risks to privacy. OBJECTIVE: The objective of this study was to design, pilot, and release a decentralized, scalable, efficient, economical, and secure strategy for the dissemination of deidentified clinical and genomic data with a focus on late-stage cancer. METHODS: We created and piloted a blockchain-authenticated system to enable secure sharing of deidentified patient data derived from standard of care imaging, genomic testing, and electronic health records (EHRs), called the Cancer Gene Trust (CGT). We prospectively consented and collected data for a pilot cohort (N=18), which we uploaded to the CGT. EHR data were extracted from both a hospital cancer registry and a common data model (CDM) format to identify optimal data extraction and dissemination practices. Specifically, we scored and compared the level of completeness between two EHR data extraction formats against the gold standard source documentation for patients with available data (n=17). RESULTS: Although the total completeness scores were greater for the registry reports than those for the CDM, this difference was not statistically significant. We did find that some specific data fields, such as histology site, were better captured using the registry reports, which can be used to improve the continually adapting CDM. In terms of the overall pilot study, we found that CGT enables rapid integration of real-world data of patients with cancer in a more clinically useful time frame. We also developed an open-source Web application to allow users to seamlessly search, browse, explore, and download CGT data. CONCLUSIONS: Our pilot demonstrates the willingness of patients with cancer to participate in data sharing and how blockchain-enabled structures can maintain relationships between individual data elements while preserving patient privacy, empowering findings by third-party researchers and clinicians. We demonstrate the feasibility of CGT as a framework to share health data trapped in silos to further cancer research. Further studies to optimize data representation, stream, and integrity are required.

Proposal and Assessment of a De-Identification Strategy to Enhance Anonymity of the Observational Medical Outcomes Partnership Common Data Model (OMOP-CDM) in a Public Cloud-Computing Environment: Anonymization of Medical Data Using Privacy Models (Preprint)

10.2196/preprints.19597 ◽

2020 ◽

Cited By ~ 1

Author(s):

Seungho Jeon ◽

Jeongeun Seo ◽

Sukyoung Kim ◽

Jeongmoon Lee ◽

Jong-Ho Kim ◽

...

Keyword(s):

Cloud Computing ◽

Data Model ◽

Personal Information ◽

Health Data ◽

Common Data Model ◽

Patient Privacy ◽

Medical Outcomes ◽

Cloud Computing System ◽

Privacy Models ◽

Identification Strategy

BACKGROUND: De-identifying personal information is critical when using personal health data for secondary research. The Observational Medical Outcomes Partnership Common Data Model (CDM), defined by the nonprofit organization Observational Health Data Sciences and Informatics, has been gaining attention for its use in the analysis of patient-level clinical data obtained from various medical institutions. When analyzing such data in a public environment such as a cloud-computing system, an appropriate de-identification strategy is required to protect patient privacy. OBJECTIVE: This study proposes and evaluates a de-identification strategy that is comprised of several rules along with privacy models such as k-anonymity, l-diversity, and t-closeness. The proposed strategy was evaluated using the actual CDM database. METHODS: The CDM database used in this study was constructed by the Anam Hospital of Korea University. Analysis and evaluation were performed using the ARX anonymizing framework in combination with the k-anonymity, l-diversity, and t-closeness privacy models. RESULTS: The CDM database, which was constructed according to the rules established by Observational Health Data Sciences and Informatics, exhibited a low risk of re-identification: The highest re-identifiable record rate (11.3%) in the dataset was exhibited by the DRUG_EXPOSURE table, with a re-identification success rate of 0.03%. However, because all tables include at least one “highest risk” value of 100%, suitable anonymizing techniques are required; moreover, the CDM database preserves the “source values” (raw data), a combination of which could increase the risk of re-identification. Therefore, this study proposes an enhanced strategy to de-identify the source values to significantly reduce not only the highest risk in the k-anonymity, l-diversity, and t-closeness privacy models but also the overall possibility of re-identification. CONCLUSIONS: Our proposed de-identification strategy effectively enhanced the privacy of the CDM database, thereby encouraging clinical research involving multiple centers.
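
To make the privacy models named above concrete, here is a small sketch that measures k-anonymity and distinct l-diversity on a toy table and shows how generalizing a quasi-identifier changes them. It does not use ARX and is not the authors' strategy. The column names, the decade-level generalization, and the reading of "highest risk" as 1 divided by the smallest equivalence-class size are assumptions for illustration.

```python
from collections import defaultdict


def equivalence_classes(rows, quasi_identifiers):
    """Group rows that share the same values on every quasi-identifier."""
    classes = defaultdict(list)
    for row in rows:
        classes[tuple(row[q] for q in quasi_identifiers)].append(row)
    return classes


def k_anonymity(rows, quasi_identifiers):
    """Size of the smallest equivalence class; the table is k-anonymous for this k."""
    classes = equivalence_classes(rows, quasi_identifiers)
    if not classes:
        raise ValueError("rows must not be empty")
    return min(len(members) for members in classes.values())


def l_diversity(rows, quasi_identifiers, sensitive):
    """Smallest number of distinct sensitive values found in any equivalence class."""
    classes = equivalence_classes(rows, quasi_identifiers)
    if not classes:
        raise ValueError("rows must not be empty")
    return min(len({r[sensitive] for r in members}) for members in classes.values())


def highest_risk(rows, quasi_identifiers):
    """Assumed risk reading: 1 / (smallest class size); 1.0 means some record is unique."""
    return 1.0 / k_anonymity(rows, quasi_identifiers)


def generalize_birth_year(rows):
    """Return copies of the rows with birth_year floored to its decade."""
    return [{**row, "birth_year": row["birth_year"] // 10 * 10} for row in rows]


# Hypothetical toy table; not data from the study.
rows = [
    {"birth_year": 1961, "sex": "F", "drug": "A"},
    {"birth_year": 1964, "sex": "F", "drug": "B"},
    {"birth_year": 1972, "sex": "M", "drug": "A"},
    {"birth_year": 1978, "sex": "M", "drug": "A"},
]
quasi = ["birth_year", "sex"]
generalized = generalize_birth_year(rows)
```

On the toy table every record is unique before generalization, which corresponds to a highest risk of 100%. Flooring the birth year to the decade pairs the records up and halves that risk, although one class still holds a single drug value, so l-diversity stays at 1.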

Proposal and Assessment of a De-Identification Strategy to Enhance Anonymity of the Observational Medical Outcomes Partnership Common Data Model (OMOP-CDM) in a Public Cloud-Computing Environment: Anonymization of Medical Data Using Privacy Models

Journal of Medical Internet Research ◽

10.2196/19597 ◽

2020 ◽

Vol 22 (11) ◽

pp. e19597

Author(s):

Seungho Jeon ◽

Jeongeun Seo ◽

Sukyoung Kim ◽

Jeongmoon Lee ◽

Jong-Ho Kim ◽

...

Keyword(s):

Cloud Computing ◽

Data Model ◽

Personal Information ◽

Health Data ◽

Common Data Model ◽

Patient Privacy ◽

Medical Outcomes ◽

Cloud Computing System ◽

Privacy Models ◽

Identification Strategy

Background: De-identifying personal information is critical when using personal health data for secondary research. The Observational Medical Outcomes Partnership Common Data Model (CDM), defined by the nonprofit organization Observational Health Data Sciences and Informatics, has been gaining attention for its use in the analysis of patient-level clinical data obtained from various medical institutions. When analyzing such data in a public environment such as a cloud-computing system, an appropriate de-identification strategy is required to protect patient privacy. Objective: This study proposes and evaluates a de-identification strategy that is comprised of several rules along with privacy models such as k-anonymity, l-diversity, and t-closeness. The proposed strategy was evaluated using the actual CDM database. Methods: The CDM database used in this study was constructed by the Anam Hospital of Korea University. Analysis and evaluation were performed using the ARX anonymizing framework in combination with the k-anonymity, l-diversity, and t-closeness privacy models. Results: The CDM database, which was constructed according to the rules established by Observational Health Data Sciences and Informatics, exhibited a low risk of re-identification: The highest re-identifiable record rate (11.3%) in the dataset was exhibited by the DRUG_EXPOSURE table, with a re-identification success rate of 0.03%. However, because all tables include at least one “highest risk” value of 100%, suitable anonymizing techniques are required; moreover, the CDM database preserves the “source values” (raw data), a combination of which could increase the risk of re-identification. Therefore, this study proposes an enhanced strategy to de-identify the source values to significantly reduce not only the highest risk in the k-anonymity, l-diversity, and t-closeness privacy models but also the overall possibility of re-identification. Conclusions: Our proposed de-identification strategy effectively enhanced the privacy of the CDM database, thereby encouraging clinical research involving multiple centers.

Assessment of inter-institutional post-operative hypoparathyroidism status using a common data model (Preprint)

10.2196/preprints.30635 ◽

2021 ◽

Author(s):

Joon-Hyop Lee ◽

Suhyun Kim ◽

Kwangsoo Kim ◽

Young Jun Chai ◽

Hyeong Won Yu ◽

...

Keyword(s):

Data Model ◽

Tumor Resection ◽

Health Data ◽

Common Data Model ◽

Tertiary Institutions ◽

Logistic Analysis ◽

Large Databases ◽

Tertiary Hospitals ◽

Tingling Sensation ◽

Transient Hypoparathyroidism

BACKGROUND: Post-thyroidectomy hypoparathyroidism may result in various transient or permanent symptoms, ranging from tingling sensation to severe breathing difficulties. Its incidence varies among surgeons and institutions, making it difficult to determine its actual incidence and associated factors. OBJECTIVE: This study attempted to estimate the incidence of post-operative hypoparathyroidism in patients at two tertiary institutions that share a common data model, the Observational Health Data Sciences and Informatics. METHODS: This study used the Common Data Model to extract explicitly specified encoding and relationships among concepts using standardized vocabularies. The EDI-codes of various thyroid disorders and thyroid operations were extracted from two separate tertiary hospitals between January 2013 and December 2018. Patients were grouped into no evidence of/transient/permanent hypoparathyroidism groups to analyze the likelihood of hypoparathyroidism occurrence related to operation types and diagnosis. RESULTS: Of the 4848 eligible patients at the two institutions who underwent thyroidectomy, 1370 (28.26%) experienced transient hypoparathyroidism and 251 (5.18%) experienced persistent hypoparathyroidism. Univariate logistic regression analysis predicted that, relative to total bilateral thyroidectomy, radical tumor resection was associated with a 48% greater likelihood of transient hypoparathyroidism and a 102% greater likelihood of persistent hypoparathyroidism. Moreover, multivariate logistic analysis found that radical tumor resection was associated with a 50% greater likelihood of transient hypoparathyroidism and a 97% greater likelihood of persistent hypoparathyroidism than total bilateral thyroidectomy. CONCLUSIONS: These findings, by integrating and analyzing two databases, suggest that this analysis could be expanded to include other large databases that share the same Observational Health Data Sciences and Informatics protocol.
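
The incidence figures quoted above can be reproduced from the reported counts. The short sketch below recomputes the two percentages. The size of the "no evidence" group is not stated in the abstract and is derived here under the assumption that the three groups are mutually exclusive and together cover all 4848 patients.

```python
def percentage(count: int, total: int) -> float:
    """Share of the cohort as a percentage rounded to two decimals."""
    return round(count / total * 100, 2)


TOTAL = 4848       # eligible thyroidectomy patients (from the abstract)
TRANSIENT = 1370   # transient hypoparathyroidism (from the abstract)
PERSISTENT = 251   # persistent hypoparathyroidism (from the abstract)

# Assumption: the remaining patients form the "no evidence" group.
NO_EVIDENCE = TOTAL - TRANSIENT - PERSISTENT

transient_pct = percentage(TRANSIENT, TOTAL)
persistent_pct = percentage(PERSISTENT, TOTAL)
no_evidence_pct = percentage(NO_EVIDENCE, TOTAL)
```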

Study of Data Integration Model Based on Network Technology

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.268-270.1868 ◽

2011 ◽

Vol 268-270 ◽

pp. 1868-1873

Author(s):

Li Jun Yang

Keyword(s):

Data Model ◽

Data Representation ◽

Heterogeneous Data ◽

Structured Data ◽

Common Data Model ◽

Network Technology ◽

Interaction Technique ◽

Integration Model ◽

Heterogeneous Data Sources ◽

Meaningful Research

Heterogeneous data sources make it difficult for different information systems to access each other's data. Enabling convenient and flexible data exchange between such systems is therefore a meaningful research topic. This paper combines XML, the data representation format widely used on current networks, with Web Service interaction techniques to construct a UDM data model that can represent relational structured data as well as unstructured data and self-describing semi-structured data. The UDM data model can thus serve as a common data model for integrating heterogeneous data.
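
The abstract does not give the UDM schema, so the following is only a sketch of the underlying idea: a relational row can be wrapped in a self-describing XML element and recovered on the receiving side. The element and attribute names (`record`, `field`, `source`, `name`) are hypothetical and not taken from the paper.

```python
import xml.etree.ElementTree as ET


def row_to_xml(table: str, row: dict) -> ET.Element:
    """Wrap one relational row in a self-describing <record> element."""
    record = ET.Element("record", {"source": table})
    for column, value in row.items():
        field = ET.SubElement(record, "field", {"name": column})
        field.text = str(value)
    return record


def xml_to_row(record: ET.Element) -> dict:
    """Recover column/value pairs; values come back as strings."""
    return {field.get("name"): field.text for field in record.findall("field")}


# Hypothetical row from a table named "patients".
element = row_to_xml("patients", {"id": 7, "name": "Ann"})
serialized = ET.tostring(element, encoding="unicode")
restored = xml_to_row(ET.fromstring(serialized))
```

Because every value is serialized as text, type information is lost in this round trip. A fuller model would need to carry types alongside the values, which this sketch leaves out.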

Pharmacovigilance and Clinical Environment: Utilizing OMOP-CDM and OHDSI Software Stack to Integrate EHR Data

Studies in Health Technology and Informatics - Public Health and Informatics ◽

10.3233/shti210232 ◽

2021 ◽

Author(s):

Vlasios K. Dimitriadis ◽

George I. Gavriilidis ◽

Pantelis Natsiavas

Keyword(s):

Information Technology ◽

Electronic Health Records ◽

Drug Safety ◽

Data Model ◽

Health Data ◽

Common Data Model ◽

Clinical Environment ◽

Drug Reactions ◽

Medical Outcomes ◽

Integrate Data

Information Technology (IT) and specialized systems could play a prominent role in supporting drug safety processes, both within the clinical context and beyond it. The PVClinical project aims to build an IT platform enabling the investigation of potential Adverse Drug Reactions (ADRs). In this paper, we outline the use of the Observational Medical Outcomes Partnership – Common Data Model (OMOP-CDM) and the openly available Observational Health Data Sciences and Informatics (OHDSI) software stack as part of the PVClinical platform. OMOP-CDM offers the capacity to integrate data from Electronic Health Records (EHRs) (e.g., encounters, patients, providers, diagnoses, drugs, measurements and procedures) via an accepted data model. Furthermore, the OHDSI software stack provides valuable analytics tools that could be used to address important drug safety questions quickly and efficiently, enabling the investigation of potential ADRs in the clinical environment.

Gastrointestinal and Nongastrointestinal Complications of Esophagogastroduodenoscopy and Colonoscopy in the Real World: A Nationwide Standard Cohort Using the Common Data Model Database

Gut and Liver ◽

10.5009/gnl20222 ◽

2021 ◽

Author(s):

Ha Il Kim ◽

Jin Young Yoon ◽

Min Seob Kwak ◽

Jae Myung Cha

Keyword(s):

Real World ◽

Data Model ◽

Common Data Model ◽

The Real ◽

The Common ◽

Model Database

OEN: Multi-center, international, real-world evidence studies performed using health records without data pooling—The use of a common data model and shared analytical methods.

Journal of Clinical Oncology ◽

10.1200/jco.2021.39.15_suppl.e13554 ◽

2021 ◽

Vol 39 (15_suppl) ◽

pp. e13554-e13554

Author(s):

Bethany Levick ◽

Sue Cheeseman ◽

Eun Ji Nam ◽

Haewon Doh ◽

Subin Lim ◽

...

Keyword(s):

Real World ◽

Data Model ◽

Clinical Care ◽

Common Data Model ◽

Local Data ◽

Large Hospital ◽

Patient Level Data ◽

Patient Level ◽

Real World Evidence ◽

Level Data

e13554 Background: The value of real-world evidence derived from the care of patients managed outside the context of clinical trials is well recognised. However, the ability to link data from multiple centres, especially those from different countries, is complicated by complex legal and information governance differences. The Oncology Evidence Network is a collaboration of large hospital centres, with strong clinical informatics capabilities in six countries in Europe and Asia working with the support of an industrial partner to provide high quality, real world data reflecting routine clinical care. We have developed an efficient workflow based on a study-specific common data model (CDM) clinically validated at each site and analysed with a single analysis script, which embeds a set of data quality rules. Local implementation allows each centre to generate analytical outputs aligned across the different sites without the need for any patient level data to leave the participating site. This approach has been designed and tested in Epithelial Ovarian Cancer (EOC) patients. Methods: A CDM was agreed using expert advisors from each centre. Clinical alignment was achieved through iterative assessment of clinical vignettes, to ensure common definitions of clinical assessment, prognosis, and treatment algorithms in EOC patients. A data guide detailing variable level derivations and validation rules, general data coding principles, and conversions/codes from international coding systems was developed. The analysis scripts were implemented as a bespoke package (OpenOvary) in R. The package includes functions to validate the data against the CDM, and generate a standard output including tables, numerical summaries and Kaplan-Meier analysis of progression and overall survival. Results: 2,925 patient records from 6 centres across 6 countries were included in the study with 27 key data items curated by each centre. Treatment data is available detailing relevant surgical procedures and their outcomes, and regimens of SACT throughout patients’ care from diagnosis to death. Data completeness was generally high for key data items, with missing data ranging from 0-16% for FIGO stage at diagnosis and 0-14% for tumour morphology. The CDM and R script will be made publicly available for other centres to adopt and facilitate analysis of their local data. Conclusions: This collaboration has brought together a substantial body of data describing the care and outcomes for EOC patients. A CDM and flexible shared analysis approach enabled unified analysis and reporting whilst avoiding the transfer of patient level data and its pooling into a common database. The process of clinical and data alignment has generated a replicable model for rapid extension to other study centres to join the EOC study, or application to other disease areas.
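
The abstract mentions Kaplan-Meier analysis as part of the standard output. The sketch below is a plain Python version of the product-limit estimator, included only to illustrate the calculation. It is not the OpenOvary R package, and the function name and input layout are assumptions.

```python
def kaplan_meier(durations, events):
    """Product-limit estimate: list of (time, survival) at each distinct event time.

    durations: follow-up time for each patient
    events:    True/1 if the event was observed at that time, False/0 if censored
    """
    data = sorted(zip(durations, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        time = data[i][0]
        observed = 0  # events at this time
        leaving = 0   # events plus censored subjects leaving the risk set
        while i < len(data) and data[i][0] == time:
            observed += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if observed:
            survival *= 1 - observed / at_risk
            curve.append((time, survival))
        at_risk -= leaving
    return curve


# Hypothetical follow-up times; the second patient is censored at time 2.
curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
```

Because the estimator only needs each site's own durations and event flags, every centre can run it locally and share the resulting curve instead of patient-level rows, which matches the no-pooling approach described in the abstract.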

Real-World Use of Colonoscopy in an Older Population: A Nationwide Standard Cohort Study Using a Common Data Model

Digestive Diseases and Sciences ◽

10.1007/s10620-020-06494-x ◽

2020 ◽

Author(s):

Ha Il Kim ◽

Jin Young Yoon ◽

Min Seob Kwak ◽

Jae Myung Cha

Keyword(s):

Cohort Study ◽

Real World ◽

Data Model ◽

Common Data Model ◽

Older Population
