Amalgamation of cloud-based colonoscopy videos with patient-level metadata to facilitate large-scale machine learning

2021 ◽  
Vol 09 (02) ◽  
pp. E233-E238
Author(s):  
Rajesh N. Keswani ◽  
Daniel Byrd ◽  
Florencia Garcia Vicente ◽  
J. Alex Heller ◽  
Matthew Klug ◽  
...  

Abstract. Background and study aims: Storage of full-length endoscopic procedures is becoming increasingly popular. To facilitate large-scale machine learning (ML) focused on clinical outcomes, these videos must be merged with the patient-level data in the electronic health record (EHR). Our aim was to present a method of accurately linking patient-level EHR data with cloud-stored colonoscopy videos. Methods: This study was conducted at a single academic medical center. Most procedure videos are automatically uploaded to the cloud server but are identified only by procedure time and procedure room. We developed and then tested an algorithm to match recorded videos with corresponding exams in the EHR based upon procedure time and room and subsequently extract frames of interest. Results: Among 28,611 total colonoscopies performed over the study period, 21,170 colonoscopy videos in 20,420 unique patients (54.2 % male, median age 58) were matched to EHR data. Of 100 randomly sampled videos, appropriate matching was manually confirmed in all. In total, these videos represented 489,721 minutes of colonoscopy performed by 50 endoscopists (median 214 colonoscopies per endoscopist). The most common procedure indications were polyp screening (47.3 %), surveillance (28.9 %) and inflammatory bowel disease (9.4 %). From these videos, we extracted procedure highlights (identified by image capture; mean 8.5 per colonoscopy) and surrounding frames. Conclusions: We report the successful merging of a large database of endoscopy videos stored with limited identifiers to rich patient-level data in a highly accurate manner. This technique facilitates the development of ML algorithms based upon relevant patient outcomes.
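The abstract states that videos are matched to EHR exams by procedure time and room but does not give the matching rule. The following is a minimal sketch of one way such a match could work, not the authors' algorithm: the `Video` and `Exam` records, the `match_videos` function, the 15-minute tolerance, and the rule of accepting only an unambiguous candidate are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Video:
    video_id: str      # hypothetical identifier for a stored video
    room: str          # procedure room recorded with the video
    start: datetime    # recording start time


@dataclass
class Exam:
    exam_id: str       # hypothetical identifier for an EHR exam
    room: str          # procedure room documented in the EHR
    start: datetime    # documented procedure start time


def match_videos(videos, exams, tolerance=timedelta(minutes=15)):
    """Map each video_id to an exam_id, or to None when no unique match exists.

    Assumed rule: a video matches an exam when both share a room and their
    start times differ by at most `tolerance`. Videos with zero or several
    candidate exams are left unmatched rather than guessed.
    """
    exams_by_room = {}
    for exam in exams:
        exams_by_room.setdefault(exam.room, []).append(exam)

    matches = {}
    for video in videos:
        candidates = [
            exam
            for exam in exams_by_room.get(video.room, [])
            if abs(exam.start - video.start) <= tolerance
        ]
        matches[video.video_id] = (
            candidates[0].exam_id if len(candidates) == 1 else None
        )
    return matches
```

Leaving ambiguous videos unmatched trades coverage for accuracy, which is in the spirit of the manual check of 100 sampled matches that the abstract reports.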

2021 ◽  
Vol 4 (1) ◽  
Author(s):  
Chuan Hong ◽  
Everett Rush ◽  
Molei Liu ◽  
Doudou Zhou ◽  
Jiehuan Sun ◽  
...  

Abstract: The increasing availability of electronic health record (EHR) systems has created enormous potential for translational research. However, it is difficult to know all the relevant codes related to a phenotype due to the large number of codes available. Traditional data mining approaches often require the use of patient-level data, which hinders the ability to share data across institutions. In this project, we demonstrate that multi-center large-scale code embeddings can be used to efficiently identify relevant features related to a disease of interest. We constructed large-scale code embeddings for a wide range of codified concepts from EHRs from two large medical centers. We developed knowledge extraction via sparse embedding regression (KESER) for feature selection and integrative network analysis. We evaluated the quality of the code embeddings and assessed the performance of KESER in feature selection for eight diseases. In addition, we developed an integrated clinical knowledge map combining embedding data from both institutions. The features selected by KESER were comprehensive compared to lists of codified data generated by domain experts. Features identified via KESER resulted in comparable performance to those built upon features selected manually or with patient-level data. The knowledge map created using an integrative analysis identified disease-disease and disease-drug pairs more accurately compared to those identified using single institution data. Analysis of code embeddings via KESER can effectively reveal clinical knowledge and infer relatedness among codified concepts. KESER bypasses the need for patient-level data in individual analyses, providing a significant advance in enabling multi-center studies using EHR data.
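The general idea of a sparse regression on embeddings can be illustrated as follows. The abstract does not specify the penalty or the fitting procedure, so this sketch assumes a plain lasso fitted by coordinate descent; the function names, the penalty value, and the tiny example embeddings are hypothetical and are not taken from KESER itself.

```python
def soft_threshold(value, penalty):
    """Shrink `value` toward zero by `penalty` (the lasso proximal step)."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def sparse_embedding_regression(target, candidates, penalty=0.1, sweeps=200):
    """Regress a target code's embedding on candidate codes' embeddings.

    `target` is a list of floats; `candidates` maps a code name to an
    embedding of the same length. Minimizes
    (1 / (2 * dim)) * ||target - fit||^2 + penalty * sum(|coefficients|)
    by cyclic coordinate descent and returns only the nonzero coefficients,
    i.e. the codes selected as related to the target.
    """
    dim = len(target)
    coef = {name: 0.0 for name in candidates}
    residual = list(target)  # target minus current fit; the fit starts at zero

    for _ in range(sweeps):
        for name, vec in candidates.items():
            scale = sum(v * v for v in vec) / dim
            if scale == 0.0:
                continue  # an all-zero embedding carries no information
            # Correlation with the residual, adding back this code's own part.
            rho = sum(v * (r + v * coef[name]) for v, r in zip(vec, residual)) / dim
            updated = soft_threshold(rho, penalty) / scale
            delta = updated - coef[name]
            if delta != 0.0:
                residual = [r - delta * v for r, v in zip(residual, vec)]
                coef[name] = updated

    return {name: value for name, value in coef.items() if value != 0.0}
```

Because the inputs are embedding vectors rather than patient records, a selection step of this kind needs no patient-level data, which is the property the abstract emphasizes.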


2017 ◽  
Vol 1 (S1) ◽  
pp. 14-14
Author(s):  
William G. Adams ◽  
Michael Mendis ◽  
Shiby Thomas ◽  
David Center ◽  
Sara Curran

OBJECTIVES/SPECIFIC AIMS: The primary objective of this effort is to develop and distribute an easy-to-use i2b2 component that is capable of evaluating diverse complex relationships for a wide variety of exposures and outcomes over time. In this manner, we are able to leverage the unique design of the i2b2 database to support health services research, comparative effectiveness, and quality improvement using a single tool. Furthermore, our novel database redesign has the potential to provide user-friendly access to individual and group CHC data for CER. METHODS/STUDY POPULATION: For this project, we used software experts, clinical informatics specialists, and the existing i2b2 open-source software to convert our legacy HOME Cell into a web-client version. The tool will be used to study health outcomes within a network of Boston-based Community Health Centers and the largest safety-net hospital in New England, Boston Medical Center. RESULTS/ANTICIPATED RESULTS: The new web-client HOME Cell will allow i2b2 users to model virtually any exposure (including therapeutic interventions such as medications or tests) in i2b2 against any outcome, accounting for complex temporal relationships and other factors. In addition, we plan to use our new Community Health Center views to enhance our community engagement activities by allowing direct access to their data for our partners. DISCUSSION/SIGNIFICANCE OF IMPACT: Our project addresses multiple national priorities related to data sharing, clinical research informatics, and comparative effectiveness. The web-client version of the HOME Cell substantially improves our community’s access to HOME Cell functionality and is a novel, sharable resource for use within the CTSA/NCATS community. Our approach provides a new way to perform large-scale collaborative research without the need to actually move patient-level data and has demonstrated that CER, health services research, and quality measurement can share a common framework. 
In addition, and as demonstrated in our earlier pilot work, the HOME Cell also has the potential to support large-scale multivariate analyses in a distributed manner that does not require sharing of patient-level data. We believe our approach has great promise for supporting the reuse of clinical data for rapid, transparent, health outcome assessments on a national scale. Our efforts support multiple strategic goals including: (1) support for building national clinical and translational research capacity by enhancing a broadly adopted informatics tool (i2b2); (2) enhanced consortium-wide collaborations by offering a tool that can be easily shared within the CTSA network to support multi-institutional collaboration; and (3) improving the health of our communities by offering a tool that has the potential to provide new insights into health care processes and outcomes that could drive innovation and improvement activities.


2021 ◽  
Author(s):  
Chuan Hong ◽  
Everett Rush ◽  
Molei Liu ◽  
Doudou Zhou ◽  
Jiehuan Sun ◽  
...  

ABSTRACT. Objective: The increasing availability of Electronic Health Record (EHR) systems has created enormous potential for translational research. Even with a working knowledge of EHR, it is difficult to know all the relevant codes related to a phenotype due to the large number of codes available. Traditional data mining approaches often require the use of patient-level data, which hinders the ability to share data across institutions to establish a cooperative and integrated knowledge network. In this project, we demonstrate that multi-center large-scale code embeddings can be used to efficiently identify relevant features related to a disease or condition of interest. Method: We constructed large-scale code embeddings for a wide range of codified concepts, including diagnosis codes, medications, procedures, and laboratory tests from EHRs from two large medical centers. We developed knowledge extraction via sparse embedding regression (KESER) for feature selection and integrative network analysis based on the trained code embeddings. We evaluated the quality of the code embeddings and assessed the performance of KESER in feature selection for eight diseases. In addition, we developed an integrated clinical knowledge map combining embedding data from both institutions. Results: The features selected by KESER were comprehensive compared to lists of codified data generated by domain experts. Additionally, features identified automatically via KESER used in the development of phenotype algorithms resulted in comparable performance to those built upon features selected manually or identified via existing feature selection methods with patient-level data. The knowledge map created using an integrative analysis identified disease-disease and disease-drug pairs more accurately compared to those identified using single institution data. Conclusion: Analysis of code embeddings via KESER can effectively reveal clinical knowledge and infer relatedness among diseases, treatments, procedures, and laboratory measurements. This approach automates the grouping of clinical features, facilitating studies of the condition. KESER bypasses the need for patient-level data in individual analyses, providing a significant advance in enabling multi-center studies using EHR data.


2020 ◽  
Vol 41 (S1) ◽  
pp. s168-s169
Author(s):  
Rebecca Choudhury ◽  
Ronald Beaulieu ◽  
Thomas Talbot ◽  
George Nelson

Background: As more US hospitals report antibiotic utilization to the CDC, standardized antimicrobial administration ratios (SAARs) derived from patient care unit-based antibiotic utilization data will increasingly be used to guide local antibiotic stewardship interventions. Location-based antibiotic utilization surveillance data are often utilized given the relative ease of ascertainment. However, aggregating antibiotic use data on a unit basis may have variable effects depending on the number of clinical teams providing care. In this study, we examined antibiotic utilization from units at a tertiary-care hospital to illustrate the potential challenges of using unit-based antibiotic utilization to change individual prescribing. Methods: We used inpatient pharmacy antibiotic use administration records at an adult tertiary-care academic medical center over a 6-month period from January 2019 through June 2019 to describe the geographic footprints and AU of medical, surgical, and critical care teams. All teams accounting for at least 1 patient day present on each unit during the study period were included in the analysis, as were all teams prescribing at least 1 antibiotic day of therapy (DOT). Results: The study population consisted of 24 units: 6 ICUs (25%) and 18 non-ICUs (75%). Over the study period, the average numbers of teams caring for patients in ICU and non-ICU wards were 10.2 (range, 3.2–16.9) and 13.7 (range, 10.4–18.9), respectively. Units were divided into 3 categories by the number of teams accounting for ≥70% of total patient days present (Fig. 1): “homogeneous” (≤3 teams), “pauciteam” (4–7 teams), and “heterogeneous” (>7 teams). In total, 12 (50%) units were “pauciteam”; 7 (29%) were “homogeneous”; and 5 (21%) were “heterogeneous.” Units could also be classified as “homogeneous,” “pauciteam,” or “heterogeneous” based on team-level antibiotic utilization or DOT for specific antibiotics. Different patterns emerged based on antibiotic restriction status. Classifying units based on vancomycin DOT (unrestricted) yielded fewer “heterogeneous” units, whereas using meropenem DOT (restricted) revealed no “heterogeneous” units. Furthermore, the average number of units where individual clinical teams prescribed an antibiotic varied widely (range, 1.4–12.3 units per team). Conclusions: Unit-based antibiotic utilization data may encounter limitations in affecting prescriber behavior, particularly on units where a large number of clinical teams contribute to antibiotic utilization. Additionally, some services prescribing antibiotics across many hospital units may be minimally influenced by unit-level data. Team-based antibiotic utilization may allow for a more targeted metric to drive individual team prescribing. Funding: None. Disclosures: None.
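The unit categories described in the Results can be sketched in code. This is an illustration under one stated assumption: "the number of teams accounting for ≥70% of total patient days" is read as the smallest number of teams, taken from largest to smallest contribution, whose combined patient days reach 70% of the unit total. The function names and the example team labels are hypothetical.

```python
def teams_to_reach_share(patient_days_by_team, share=0.70):
    """Return the smallest number of teams whose patient days reach `share`.

    Teams are counted from the largest contributor downward (an assumed
    reading of the abstract's definition).
    """
    total = sum(patient_days_by_team.values())
    if total <= 0:
        raise ValueError("unit has no patient days")

    running = 0.0
    ranked = sorted(patient_days_by_team.values(), reverse=True)
    for count, days in enumerate(ranked, start=1):
        running += days
        if running >= share * total:
            return count
    return len(ranked)


def classify_unit(patient_days_by_team):
    """Label a unit using the abstract's cutoffs: <=3, 4-7, or >7 teams."""
    count = teams_to_reach_share(patient_days_by_team)
    if count <= 3:
        return "homogeneous"
    if count <= 7:
        return "pauciteam"
    return "heterogeneous"
```

The same functions could be applied to days of therapy for a specific antibiotic instead of patient days, which corresponds to the alternative classification the abstract mentions for vancomycin and meropenem.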

