Designing a standard protocol for manually reviewing patient data demographics for record linkage

2019 ◽  
Vol 2 (1) ◽  
Author(s):  
Sen Xiong ◽  
Shaun Grannis, MD, MS, FAAP

Background and Hypothesis: Accurate record linkage is essential to address fragmentation of patient data across independent healthcare organizations. To accurately evaluate record linkage methods, so-called “gold standard” data sets with labeled true matches and non-matches are needed. Human review, the process of manually assessing potentially linked patient demographic records and determining whether the record pair belongs to the same individual, is needed to create these data sets. However, the human review process is susceptible to bias and human error. Consequently, record linkage accuracy evaluations are prone to be biased by inaccurate gold standards. Consistent and scientifically rigorous methods for creating gold standard record linkage data sets must be developed, as none have yet been described. In this study, we describe a repeatable process for developing consistent manually reviewed data sets and analyze the results obtained from 15 human reviews of 200 record pairs following our protocol. Experimental Design/Methods: We obtained patient records from the Indiana Network for Patient Care and Marion County Health Department. We created record pairs for manual review by probabilistically linking data sets using multiple blocking schemes. Two hundred record pairs were then manually reviewed by 15 different individuals and the results were analyzed. Results: Across the 200 record pairs reviewed by 15 reviewers, 155 were nondiscordant pairs whereas 45 were discordant, 40 of which were the result of outliers. Conclusion and Potential Impact: From the record pair evaluation results, some empirical rules can be established for the process of manual review, though the nuances of evaluation reasoning require more discussion and a larger sample size. Nonetheless, establishing a standard for manual reviewing is a step towards better health care and complete patient records.
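The discordance tally reported above can be illustrated with a minimal Python sketch. The abstract does not define "outlier", so the `outlier_max` threshold (a discordant pair counts as outlier-driven when at most one reviewer dissents from the majority) is a hypothetical assumption, as are the function names and the toy data.

```python
from collections import Counter


def classify_pair(labels, outlier_max=1):
    """Classify one record pair from its reviewers' match/non-match labels.

    outlier_max is a hypothetical threshold, not taken from the study:
    a discordant pair is treated as outlier-driven when at most this many
    reviewers disagree with the majority.
    """
    counts = Counter(labels)
    if len(counts) == 1:
        # Every reviewer gave the same label.
        return "nondiscordant"
    minority = min(counts.values())
    if minority <= outlier_max:
        return "discordant_outlier"
    return "discordant_split"


def summarize(reviews):
    """Tally categories; reviews maps a pair id to its list of labels."""
    return Counter(classify_pair(labels) for labels in reviews.values())


# Toy data: three record pairs, each reviewed by 15 reviewers.
reviews = {
    "pair-1": [True] * 15,               # unanimous match
    "pair-2": [True] * 14 + [False],     # one dissenting reviewer
    "pair-3": [True] * 8 + [False] * 7,  # genuinely split
}
summary = summarize(reviews)
print(dict(summary))
```

Separating single-dissent pairs from genuinely split pairs is one way to distinguish reviewer error from record pairs that are ambiguous in themselves.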

2018 ◽  
Vol 11 (4) ◽  
pp. 87-98
Author(s):  
Abdullah Alamri

Healthcare systems have evolved to become more patient-centric. Many efforts have been made to transform paper-based patient data into automated medical information by developing electronic healthcare records (EHRs). Several international EHR standards enable healthcare interoperability and communication among a wide variety of medical centres. They follow a dual-model methodology that comprises a reference information model and an archetype model. The archetype model defines clinical concepts but has limitations in supporting complex reasoning and knowledge discovery requirements. The objective of this article is to propose a semantic-mediation architecture to support semantic interoperability among healthcare organizations. It provides an intermediate semantic layer to exploit clinical information based on richer ontological representations to create a “model of meaning” for enabling semantic mediation. The proposed model also provides secure mechanisms to allow interoperable sharing of patient data between healthcare organizations.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Yue Jiao ◽  
Fabienne Lesueur ◽  
Chloé-Agathe Azencott ◽  
Maïté Laurent ◽  
Noura Mebirouk ◽  
...  

Abstract Background Linking independent sources of data describing the same individuals enables innovative epidemiological and health studies but requires a robust record linkage approach. We describe a hybrid record linkage process to link databases from two independent ongoing French national studies, GEMO (Genetic Modifiers of BRCA1 and BRCA2), which focuses on the identification of genetic factors modifying cancer risk of BRCA1 and BRCA2 mutation carriers, and GENEPSO (prospective cohort of BRCAx mutation carriers), which focuses on environmental and lifestyle risk factors. Methods To identify as many as possible of the individuals participating in the two studies but not registered by a shared identifier, we combined probabilistic record linkage (PRL) and supervised machine learning (ML). This approach (named “PRL + ML”) combined the candidate matches identified by both approaches. We built the ML model using the gold standard on a first version of the two databases as a training dataset. This gold standard was obtained from PRL-derived matches verified by an exhaustive manual review. Results The Random Forest (RF) algorithm showed the highest recall (0.985) among six widely used ML algorithms: RF, Bagged trees, AdaBoost, Support Vector Machine, Neural Network. Therefore, RF was selected to build the ML model since our goal was to identify the maximum number of true matches. Our combined linkage PRL + ML showed a higher recall (range 0.988–0.992) than either PRL (range 0.916–0.991) or ML (0.981) alone. It identified 1995 individuals participating in both GEMO (6375 participants) and GENEPSO (4925 participants). Conclusions Our hybrid linkage process represents an efficient tool for linking GEMO and GENEPSO. It may be generalizable to other epidemiological studies involving other databases and registries.
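Why combining the two candidate sets raises recall can be seen in a small Python sketch. It assumes that "combined" means the set union of the PRL and ML candidate matches; the identifiers and the toy gold standard are invented for illustration and are not taken from GEMO or GENEPSO.

```python
def recall(predicted, gold):
    """Fraction of gold-standard true matches found in the predicted set."""
    if not gold:
        raise ValueError("gold standard must contain at least one match")
    return len(predicted & gold) / len(gold)


# Each match is a (study A id, study B id) pair; all ids are hypothetical.
gold = {("g1", "p1"), ("g2", "p2"), ("g3", "p3"), ("g4", "p4")}
prl_matches = {("g1", "p1"), ("g2", "p2")}
ml_matches = {("g2", "p2"), ("g3", "p3")}

# Assumed combination rule: keep every pair proposed by either method.
combined = prl_matches | ml_matches

print(recall(prl_matches, gold), recall(ml_matches, gold), recall(combined, gold))
```

Because a union never discards a candidate, its recall is at least as high as that of either method alone. Precision can fall, however, which is why a verification step such as manual review still matters.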


Blood ◽  
2021 ◽  
Author(s):  
Alexandra Sipol ◽  
Erik Hameister ◽  
Busheng Xue ◽  
Julia Hofstetter ◽  
Maxim Barenboim ◽  
...  

Cancer cells are in most instances characterized by rapid proliferation and uncontrolled cell division. Hence, they must adapt to proliferation-induced metabolic stress through intrinsic or acquired anti-metabolic stress responses to maintain homeostasis and survival. One mechanism to achieve this is to reprogram gene expression in a metabolism-dependent manner. MondoA (also known as MLXIP), a member of the MYC interactome, has been described as an example of such a metabolic sensor. However, the role of MondoA in malignancy is not fully understood and the underlying mechanism in metabolic responses remains elusive. By assessing patient data sets we found that MondoA overexpression is associated with a worse survival in pediatric common acute lymphoblastic leukemia (B-ALL). Using CRISPR/Cas9 and RNA interference approaches, we observed that MondoA depletion reduces transformational capacity of B-ALL cells in vitro and dramatically inhibits malignant potential in an in vivo mouse model. Interestingly, reduced expression of MondoA in patient data sets correlated with enrichment in metabolic pathways. The loss of MondoA correlated with increased tricarboxylic acid (TCA) cycle activity. Mechanistically, MondoA senses metabolic stress in B-ALL cells by restricting oxidative phosphorylation through reduced PDH activity. Glutamine starvation conditions greatly enhance this effect and highlight the inability to mitigate metabolic stress upon loss of MondoA in B-ALL. Our findings give a novel insight into the function of MondoA in pediatric B-ALL and support the notion that MondoA inhibition in this entity offers a therapeutic opportunity and should be further explored.


Author(s):  
Reima Suomi

Healthcare is one of the industries that is currently adopting information technology (IT) rapidly. Electronic patient records (EPRs) are at the heart of healthcare information technology applications. However, patient data is seldom efficiently organized even within one organization, and when patient data is needed in applications covering several organizations, the situation becomes even more complicated. We draw some lessons on what EPR systems should look like from the point of view of the customer relationship management literature: after all, patients are the customers of healthcare institutions. As a guiding framework for this analysis we use the concepts developed by Winter, Ammenwerth, et al. (2001). Then we proceed to discuss how EPR systems diffuse in the healthcare industry and use the Internet standards adoption (ISA) model presented by Hovav, Patnayakuni, et al. (2004) as a starting point. We apply this model to the diffusion of EPR systems in the healthcare industry. We found big differences between customer relationship management and EPR management. Customer relationship management aims at long-term relationships and customer profitability, which are not strong goals for EPR systems. Our analysis also led to the conclusion that the practical innovation adoption path for EPRs over paper-based patient records is that of adoption through coexistence.


Author(s):  
Fernando Enrique Lopez Martinez ◽  
Edward Rolando Núñez-Valdez

IoT, big data, and artificial intelligence are currently three of the most relevant and trending pieces for innovation and predictive analysis in healthcare. Many healthcare organizations are already working on developing their own home-centric data collection networks and intelligent big data analytics systems based on machine-learning principles. The benefit of using IoT, big data, and artificial intelligence for community and population health is better health outcomes for the population and communities. The new generation of machine-learning algorithms can use large standardized data sets generated in healthcare to improve the effectiveness of public health interventions. A lot of these data come from sensors, devices, electronic health records (EHR), data generated by public health nurses, mobile data, social media, and the internet. This chapter shows a high-level implementation of a complete solution of IoT, big data, and machine learning implemented in the city of Cartagena, Colombia for hypertensive patients by using an eHealth sensor and Amazon Web Services components.


BMJ Open ◽  
2020 ◽  
Vol 10 (8) ◽  
pp. e037719
Author(s):  
Helen Strongman ◽  
Rachael Williams ◽  
Krishnan Bhaskaran

Objectives To describe the benefits and limitations of using individual and combinations of linked English electronic health data to identify incident cancers. Design and setting Our descriptive study uses linked English Clinical Practice Research Datalink primary care; cancer registration; hospitalisation and death registration data. Participants and measures We implemented case definitions to identify first site-specific cancers at the 20 most common sites, based on the first ever cancer diagnosis recorded in each individual or commonly used combination of data sources between 2000 and 2014. We calculated positive predictive values and sensitivities of each definition, compared with a gold standard algorithm that used information from all linked data sets to identify first cancers. We described completeness of grade and stage information in the cancer registration data set. Results 165 953 gold standard cancers were identified. Positive predictive values of all case definitions were ≥80% and ≥94% for the four most common cancers (breast, lung, colorectal and prostate). Sensitivity for case definitions that used cancer registration alone or in combination was ≥92% for the four most common cancers and ≥80% across all cancer sites except bladder cancer (65% using cancer registration alone). For case definitions using linked primary care, hospitalisation and death registration data, sensitivity was ≥89% for the four most common cancers, and ≥80% for all cancer sites except kidney (69%), oral cavity (76%) and ovarian cancer (78%). When primary care or hospitalisation data were used alone, sensitivities were generally lower and diagnosis dates were delayed. Completeness of staging data in cancer registration data was high from 2012 (minimum 76.0% in 2012 and 86.4% in 2014 for the four most common cancers). Conclusions Ascertainment of incident cancers was good when using cancer registration data alone or in combination with other data sets, and for the majority of cancers when using a combination of primary care, hospitalisation and death registration data.
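The two accuracy measures used above can be written out in a short Python sketch. It is a simplified illustration that treats each case definition and the gold standard as sets of (patient, cancer site) pairs; the study's actual matching rules, for example on diagnosis dates, are not reproduced, and all identifiers are invented.

```python
def ppv_and_sensitivity(identified, gold):
    """Return (positive predictive value, sensitivity) against a gold standard.

    PPV = true positives / all cases the definition identified.
    Sensitivity = true positives / all gold-standard cases.
    """
    true_positives = len(identified & gold)
    ppv = true_positives / len(identified) if identified else float("nan")
    sensitivity = true_positives / len(gold) if gold else float("nan")
    return ppv, sensitivity


# Hypothetical (patient id, cancer site) pairs.
gold = {(1, "breast"), (2, "lung"), (3, "prostate"), (4, "bladder"), (5, "kidney")}
identified = {(1, "breast"), (2, "lung"), (3, "prostate"), (4, "bladder"), (9, "lung")}

ppv, sensitivity = ppv_and_sensitivity(identified, gold)
print(f"PPV={ppv:.2f}, sensitivity={sensitivity:.2f}")
```

Here the definition misses one gold-standard cancer and reports one that the gold standard does not contain, so both measures come out below 1.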


JAMIA Open ◽  
2019 ◽  
Vol 2 (4) ◽  
pp. 562-569 ◽  
Author(s):  
Jiang Bian ◽  
Alexander Loiacono ◽  
Andrei Sura ◽  
Tonatiuh Mendoza Viramontes ◽  
Gloria Lipori ◽  
...  

Abstract Objective To implement an open-source tool that performs deterministic privacy-preserving record linkage (RL) in a real-world setting within a large research network. Materials and Methods We learned 2 efficient deterministic linkage rules using publicly available voter registration data. We then validated the 2 rules’ performance with 2 manually curated gold-standard datasets linking electronic health records and claims data from 2 sources. We developed an open-source Python-based tool, OneFL Deduper, that (1) creates seeded hash codes of combinations of patients’ quasi-identifiers using a cryptographic one-way hash function to achieve privacy protection and (2) links and deduplicates patient records using a central broker through matching of hash codes with a high precision and reasonable recall. Results We deployed the OneFL Deduper (https://github.com/ufbmi/onefl-deduper) in OneFlorida, a state-based clinical research network that is part of the national Patient-Centered Clinical Research Network (PCORnet). Using the gold-standard datasets, we achieved a precision of 97.25–99.7% and a recall of 75.5%. With the tool, we deduplicated ∼3.5 million (out of ∼15 million) records down to 1.7 million unique patients across 6 health care partners and the Florida Medicaid program. We demonstrated the benefits of RL through examining different disease profiles of the linked cohorts. Conclusions Many factors, including privacy risk considerations, policies and regulations, data availability and quality, and computing resources, can impact how an RL solution is constructed in a real-world setting. Nevertheless, RL is a significant task in improving the data quality in a network so that we can draw reliable scientific discoveries from these massive data resources.
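The general idea of linking on seeded hashes of quasi-identifiers can be sketched in Python as follows. This is not the OneFL Deduper code or its actual linkage rules: the chosen fields (first name, last name, date of birth), the normalization, the seed handling, and all function names are assumptions made for illustration.

```python
import hashlib
from collections import defaultdict


def linkage_hash(record, fields, seed):
    """Seeded SHA-256 hash of normalized quasi-identifiers.

    Each site would compute this locally with a shared secret seed and send
    only the hash, not the identifiers, to the broker (assumed workflow).
    """
    normalized = "|".join(str(record[f]).strip().lower() for f in fields)
    return hashlib.sha256((seed + "|" + normalized).encode("utf-8")).hexdigest()


def deduplicate(records, fields, seed):
    """Group source record ids whose hashes are identical (deterministic match)."""
    groups = defaultdict(list)
    for record in records:
        groups[linkage_hash(record, fields, seed)].append(record["source_id"])
    return list(groups.values())


# Hypothetical records from two partners.
records = [
    {"source_id": "A-1", "first": "Ana", "last": "Diaz", "dob": "1980-02-03"},
    {"source_id": "B-7", "first": " ana ", "last": "DIAZ", "dob": "1980-02-03"},
    {"source_id": "B-9", "first": "Ana", "last": "Dias", "dob": "1980-02-03"},
]
clusters = deduplicate(records, ("first", "last", "dob"), seed="shared-secret")
print(clusters)
```

Exact hash matching tolerates only the variation removed by normalization, so a one-letter spelling difference yields a separate cluster. That behavior is consistent with the high precision and more modest recall the abstract reports for deterministic rules.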


2005 ◽  
Vol 44 (05) ◽  
pp. 631-638 ◽  
Author(s):  
J. Roukema ◽  
A. M. van Ginneken ◽  
M. de Wilde ◽  
J. van der Lei ◽  
R. K. Los

Summary Objective: OpenSDE is an application that supports structured recording of narrative patient data to enable use of the data in both clinical practice and clinical research. Reliability and accuracy of collected data are essential for subsequent data use. In this study we analyze the uniformity of data entered with OpenSDE. Our objective is to obtain insight into the consensus and differences of recorded data. Methods: Three pediatricians transcribed 20 paper patient records using OpenSDE. The transcribed records were compared and all recorded findings were classified into one of six categories of difference. Results: Of all findings, 22% were recorded identically; 17% were recorded differently (predominantly as free text); 61% were omitted, inferred, or in conflict with the paper record. Conclusion: The results of this study show that recording patient data using structured data entry does not necessarily lead to uniformly structured data.

