Implementing Structured Clinical Templates at a Single Tertiary Hospital: Survey Study (Preprint)

2019 ◽  
Author(s):  
Ji Eun Hwang ◽  
Byung Ook Seoung ◽  
Sang-Oh Lee ◽  
Soo-Yong Shin

BACKGROUND Electronic health record (EHR) systems have been widely adopted in hospitals. However, because current EHRs focus mainly on reducing the number of paper documents, they suffer from poor search and data-reuse capabilities. Structured clinical templates have been proposed to overcome these drawbacks; however, they are not widely used owing to the inconvenience of data entry. OBJECTIVE This study aims to verify the usability of structured templates by comparing data entry times. METHODS A Korean tertiary hospital has implemented structured clinical templates through clinical content modeling over the last 6 years. As a result, 1238 clinical content models (ie, body measurements, vital signs, and allergies) have been developed, and 492 models for 13 clinical templates, including pathology reports, were applied in the EHR for clinical practice. To verify the usability of the structured templates, data entry times for free text and for four structured pathology report templates were compared using 4391 entries from structured data entry (SDE) log data and 4265 entries from free-text log data. In addition, a paper-based survey and a focus group interview were conducted with 23 participants from three groups: EHR developers, pathology transcriptionists, and clinical data extraction team members. RESULTS Based on the analysis of data entry times, beginner users of the structured clinical templates required, in most cases, up to 70.18% more time for data entry. However, as users became accustomed to the templates, they entered data more quickly than with free text, saving from 1 minute 23 seconds (16.8%) to 5 minutes 42 seconds (27.6%). Interestingly, the well-designed thyroid cancer pathology report template required 14.54% less data entry time from the beginning of the SDE implementation.
In the interviews and survey, we confirmed that most interviewees agreed on the need for structured templates; however, they were skeptical about structuring every item included in the templates. CONCLUSIONS The increase in initial elapsed time led users to hold a negative opinion of SDE, despite its benefits. To overcome this obstacle, clinical templates must be structured for optimal use, and ease of data entry must be considered an essential aspect of user experience in the development of structured clinical templates.

10.2196/13836 ◽  
2020 ◽  
Vol 8 (4) ◽  
pp. e13836


2006 ◽  
Vol 130 (12) ◽  
pp. 1825-1829 ◽  
Author(s):  
Manjula Murari ◽  
Rakesh Pandey

Abstract Context.—Advances in information technology have made electronic systems productive tools for pathology report generation. Structured data formats are recommended both for clinicians' better understanding of pathology reports and for report retrieval. Suitable formats need to be developed to include structured data elements for report generation in electronic systems. Objective.—To conform to the requirements of protocol-based reporting and to provide uniform, standardized data entry and retrieval, we developed a synoptic reporting system for generating bone marrow cytology and histology reports, for incorporation into our hospital information system. Design.—A combination of macro text, short preformatted templates of tabular data entry sheets, and canned files was developed using a text editor, enabling protocol-based input. The system is flexible and includes a facility for appending free-text entry. It also incorporates SNOMED coding and codes for teaching, research, and internal auditing. Results.—This synoptic reporting system is easy to use and adaptable. Features and advantages include pick-up text with defined choices, flexibility to append free text, data entry for protocol-based reports for research use, a standardized and uniform reporting format, comparable follow-up reports, minimized typographical and transcription errors, and savings in reporting time that help shorten turnaround time. Conclusions.—Simple structured pathology report templates are a powerful means of supporting uniformity in reporting as well as subsequent data viewing and extraction, and are particularly suitable for computerized reporting.
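The template mechanics described above (defined pick-up choices plus an optional appended free-text comment) can be sketched in a few lines. The field names and choice lists below are illustrative, not taken from the authors' system.

```python
# Hypothetical synoptic template: each field has a fixed set of pick-up
# choices, with an optional free-text comment appended at the end.
TEMPLATE = {
    "cellularity": ("hypocellular", "normocellular", "hypercellular"),
    "m_e_ratio": ("normal", "increased", "decreased"),
    "iron_stores": ("absent", "reduced", "normal", "increased"),
}

def build_report(choices, free_text=""):
    lines = []
    for field, allowed in TEMPLATE.items():
        value = choices[field]
        if value not in allowed:  # data entry is restricted to defined choices
            raise ValueError(f"{value!r} is not a defined choice for {field}")
        lines.append(f"{field.replace('_', ' ').title()}: {value}")
    if free_text:  # flexibility for appending free text
        lines.append(f"Comment: {free_text}")
    return "\n".join(lines)

report = build_report(
    {"cellularity": "normocellular", "m_e_ratio": "normal", "iron_stores": "reduced"},
    free_text="Mild dyserythropoiesis noted.",
)
print(report)
```

Restricting entry to defined choices is what yields the uniform, comparable reports the abstract describes, while the appended comment preserves the flexibility of free text.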


2019 ◽  
pp. 1-8 ◽  
Author(s):  
Anobel Y. Odisho ◽  
Mark Bridge ◽  
Mitchell Webb ◽  
Niloufar Ameli ◽  
Renu S. Eapen ◽  
...  

Purpose Cancer pathology findings are critical for many aspects of care but are often locked away as unstructured free text. Our objectives were to develop a natural language processing (NLP) system to extract prostate pathology details from postoperative pathology reports, together with a parallel structured data entry process for use by urologists during routine clinical documentation, and to compare the accuracy of each approach with manual abstraction and the concordance between the NLP and clinician-entered approaches. Materials and Methods Beginning in February 2016, clinicians used note templates with custom structured data elements (SDEs) during routine clinical care for men with prostate cancer. We also developed an NLP algorithm to parse radical prostatectomy pathology reports and extract structured data. We compared the accuracy of clinician-entered SDEs and NLP-parsed data against manual abstraction as a gold standard and compared concordance (Cohen’s κ) between the two approaches assuming no gold standard. Results There were 523 patients with NLP-extracted data, 319 with SDE data, and 555 with manually abstracted data. For Gleason scores, NLP and clinician SDE accuracy was 95.6% and 95.8%, respectively, compared with manual abstraction, with a concordance of 0.93 (95% CI, 0.89 to 0.98). For margin status, extracapsular extension, seminal vesicle invasion, stage, and lymph node status, NLP accuracy was 94.8% to 100%, SDE accuracy was 87.7% to 100%, and concordance between NLP and SDE ranged from 0.92 to 1.0. Conclusion We show that a real-world deployment of an NLP algorithm to extract pathology data, together with structured data entry by clinicians during routine care in a busy clinical practice, can generate accurate data, when compared with manual abstraction, for some, but not all, components of a prostate pathology report.
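The two measurements involved here can be sketched minimally: a regular expression pulls a Gleason score out of free-text snippets, and Cohen's κ is computed between two extraction approaches' label sequences. This is illustrative only; neither the regex nor the toy κ function reflects the authors' actual pipeline.

```python
import re
from collections import Counter

# Hypothetical Gleason pattern, e.g. "Gleason score 3+4=7" -> "3+4".
GLEASON_RE = re.compile(r"gleason\s*(?:score\s*)?(\d)\s*\+\s*(\d)", re.IGNORECASE)

def extract_gleason(text):
    m = GLEASON_RE.search(text)
    return f"{m.group(1)}+{m.group(2)}" if m else None

def cohens_kappa(a, b):
    """Cohen's kappa between two equal-length label sequences."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    counts_a, counts_b = Counter(a), Counter(b)
    expected = sum(counts_a[k] * counts_b.get(k, 0) for k in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

print(extract_gleason("GLEASON SCORE 3 + 4 = 7"))  # 3+4
```

Comparing κ between the NLP and SDE outputs, as the authors do, avoids privileging either approach as ground truth.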


Blood ◽  
2008 ◽  
Vol 112 (11) ◽  
pp. 2394-2394
Author(s):  
Bill Long ◽  
Aliyah Rahemtullah ◽  
Christiana E. Toomey ◽  
Adam Ackerman ◽  
Jeremy S. Abramson ◽  
...  

Abstract Introduction: Epidemiologic research in hematologic malignancies has depended on three sources of data: SEER data, patients accrued to large clinical trials, and databases generated by individual providers or departments. Each of these data sources has major limitations. In theory, the adoption of computerized databases by pathology departments and of electronic medical records should have greatly expanded the ability to identify patients with particular types of malignancies. In practice, the need to manually classify tens of thousands of free-text pathology reports has made this resource unavailable. We have developed a computer program to extract and codify the final diagnoses described in free-text pathology reports according to the World Health Organization (WHO) classification of hematologic malignancies. This enables us to identify sets of cases of desired pathologies for further study. Methods: A medical records review protocol was approved by the Dana-Farber/Harvard Cancer Center Institutional Review Board. Using our clinical lymphoma database, we collected records of patients at the Massachusetts General Hospital (MGH) Cancer Center who carried a diagnosis of follicular lymphoma (FL) or diffuse large B-cell lymphoma (DLBCL). The program is modified from one developed to extract diagnoses from discharge summaries in a different context [Long, AMIA 2007]. The approach is to use punctuation and a few words (conjunctions and some common verbs) to divide the text into phrases and then use a search procedure to find the most specific matching concepts in the UMLS (Unified Medical Language System). The search uses all of the alternate phrases for each concept included in the UMLS normalized string table, matched against normalized subphrases from the text. We mapped UMLS concepts to the desired WHO concepts.
To ensure a complete list of UMLS concepts, we used the hierarchical relations in the UMLS and examined both more specific and more general concepts for possible inclusion in the search. Since not all diseases mentioned in a report are part of the diagnosis, we developed pattern matching procedures to identify the parts of the pathology report containing the final diagnosis (as opposed to “note” or “clinical data” sections, which may contain patient-specific clinical information not diagnosed in the current pathology report). The program also uses a strategy for identifying modifiers that change the sense of the diagnosis (“suggestive of”, “rule out”, “no”, etc.) to exclude diseases that are absent or only possible. Results: We used the system to identify cases of FL and DLBCL and compared the results to lists of cases generated manually. The program was 90% accurate in automatically classifying pathology reports as describing follicular lymphoma: of 150 cases of FL, it found 133 (eg, MALIGNANT LYMPHOMA, FOLLICLE CENTER), and 3 were in reports not available to the program (133/147, 90%). Of 100 DLBCL cases, 76 were available and 63 were found (83%) (eg, HISTIOCYTE-RICH LARGE B-CELL LYMPHOMA). There were several reasons for the missed cases. Most commonly (13 cases), the diagnosis was not in the identified diagnosis section, either because the section was divided by a note or because the diagnosis was in an addendum. Only 3 cases had phrases not found in the UMLS. For 2 others, the program missed because the phrase was not contiguous (eg, B-CELL LYMPHOMA, CONSISTENT WITH FOLLICLE CENTER CELL TYPE). In 5 of the DLBCL cases, the stated final diagnosis was more general than the desired diagnoses, and 2 of the FL cases were listed as “strongly suggestive,” which the program concluded was not a definite diagnosis.
Discussion: This program is useful for identifying desired cohorts of cases and can be improved with better identification of the sections containing the diagnosis, the addition of a few missing phrases as they are discovered, and the addition of techniques for handling common discontinuities in disease descriptions (eg, allowing “consistent with” in the phrase). We have shown that very simple natural language techniques are sufficient to extract most of the desired disease descriptions from free text reports, enabling automatic selection of cases and greatly enhancing the usefulness of large repositories of pathology reports.
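The phrase-splitting and modifier-handling strategy described above can be illustrated with a toy stand-in: a tiny hard-coded concept dictionary replaces the UMLS search, and the phrase separators and modifier list are illustrative, not the authors' actual rules.

```python
import re

# Toy concept dictionary standing in for the UMLS normalized string table.
CONCEPTS = {
    "follicular lymphoma": "FL",
    "diffuse large b-cell lymphoma": "DLBCL",
}
# Modifiers that negate or hedge a diagnosis, so the phrase is excluded.
MODIFIERS = ("no ", "rule out ", "suggestive of ", "strongly suggestive")

def extract_diagnoses(text):
    # split on punctuation and a few conjunction-like words
    phrases = re.split(r"[;,.]| and | with ", text.lower())
    found = []
    for phrase in phrases:
        phrase = phrase.strip()
        if any(mod in phrase for mod in MODIFIERS):
            continue  # absent or only possible diseases are skipped
        for name, code in CONCEPTS.items():
            if name in phrase:
                found.append(code)
    return found

print(extract_diagnoses("Final diagnosis: follicular lymphoma."))  # ['FL']
```

Note how the non-contiguous-phrase failure mode from the Results falls out naturally: "B-CELL LYMPHOMA, CONSISTENT WITH FOLLICLE CENTER CELL TYPE" splits at the comma, so neither fragment matches a full concept name.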


2012 ◽  
Vol 51 (03) ◽  
pp. 242-251 ◽  
Author(s):  
G. Defossez ◽  
A. Burgun ◽  
P. le Beux ◽  
P. Levillain ◽  
P. Ingrand ◽  
...  

Summary Objective: Our study aimed to construct and evaluate functions called “classifiers”, produced by supervised machine learning techniques, in order to automatically categorize pathology reports using solely their content. Methods: Patients from the Poitou-Charentes Cancer Registry having at least one pathology report and a single non-metastatic invasive neoplasm were included. A descriptor weighting function accounting for the distribution of terms among targeted classes was developed and compared to classic methods based on inverse document frequencies. The classification was performed with support vector machine (SVM) and Naive Bayes classifiers. Two levels of granularity were tested for both the topographical and the morphological axes of the ICD-O3 code. The ability to correctly attribute a precise ICD-O3 code and the ability to attribute the broad category defined by the International Agency for Research on Cancer (IARC) for the multiple primary cancer registration rules were evaluated using F1-measures. Results: 5121 pathology reports produced by 35 pathologists were selected. The best performance was achieved by our class-weighted descriptor, associated with an SVM classifier. Using this method, the pathology reports were properly classified in the IARC categories with F1-measures of 0.967 for both topography and morphology. The ICD-O3 code attribution had lower performance, with a 0.715 F1-measure for topography and 0.854 for morphology. Conclusion: These results suggest that free-text pathology reports could be useful as a data source for automated systems to identify and notify new cases of cancer. Future work is needed to evaluate the performance improvement obtained from the use of natural language processing, including the case of multiple tumor descriptions and the possible incorporation of other medical documents such as surgical reports.


2021 ◽  
Vol 4 (1) ◽  
pp. 23
Author(s):  
Usman Naseem ◽  
Matloob Khushi ◽  
Shah Khalid Khan ◽  
Kamran Shaukat ◽  
Mohammad Ali Moni

An enormous amount of clinical free-text information, such as pathology reports, progress reports, clinical notes and discharge summaries, has been collected at hospitals and medical care clinics. These data provide an opportunity to develop many useful machine learning applications, if the data can be transferred into a learnable structure with appropriate labels for supervised learning. The annotation of these data has to be performed by qualified clinical experts, which limits the use of the data owing to the high cost of annotation. Active learning (AL), an underutilised machine learning technique for labelling new data, is a promising candidate for addressing this high labelling cost. AL has been successfully applied to speech recognition and text classification; however, there is a lack of literature investigating its use for clinical purposes. We performed a comparative investigation of various AL techniques using machine learning (ML)- and deep learning (DL)-based strategies on three unique biomedical datasets. We investigated the random sampling (RS), least confidence (LC), informative diversity and density (IDD), margin, and maximum representativeness-diversity (MRD) AL query strategies. Our experiments show that AL has the potential to significantly reduce the cost of manual labelling. Furthermore, pre-labelling performed using AL expedites the labelling process by reducing the time required for labelling.
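The least confidence (LC) strategy mentioned above is simple to state: request labels for the pool examples whose top predicted class probability is lowest. A minimal sketch with made-up probabilities:

```python
# Least-confidence active learning query selection. pool_probs holds
# (doc_id, per-class probability list) pairs from some already-trained model;
# the ids and probabilities here are illustrative.
def least_confidence_query(pool_probs, k):
    """Return the k doc_ids whose top-class probability is lowest."""
    ranked = sorted(pool_probs, key=lambda item: max(item[1]))
    return [doc_id for doc_id, _ in ranked[:k]]

pool = [
    ("r1", [0.95, 0.05]),  # confident -> queried last
    ("r2", [0.55, 0.45]),  # most uncertain -> queried first
    ("r3", [0.70, 0.30]),
]
print(least_confidence_query(pool, 2))  # ['r2', 'r3']
```

In a full AL loop, the clinical expert labels only the returned examples, the model is retrained, and the pool probabilities are recomputed, which is where the labelling-cost savings come from.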


2006 ◽  
Vol 24 (18_suppl) ◽  
pp. 6080-6080
Author(s):  
D. A. Hanauer ◽  
A. M. Chinnaiyan ◽  
G. Miela ◽  
D. W. Blayney

6080 Background: A vital component of maintaining an accurate cancer registry is the identification of patients with cancer. The University of Michigan Cancer Registry identifies more than 90% of all registry patients by manually reading free-text pathology reports and their associated SNOMED codes. This method is labor and time intensive and is subject to errors of omission. Methods: We created an application, the Registry CaFE, to scan free-text pathology reports and identify cases of interest to the registry. It uses a custom-made list of approximately 3,300 words, phrases, and SNOMED codes to positively identify relevant cases and to eliminate non-relevant cases, including those that may mention cancer-related terms. Experienced registrars reviewed 2,451 pathology reports and marked cases of interest to the registry; this served as the gold standard. These reports were also analyzed by the Registry CaFE. The time required for case identification was recorded for both processes. Results: Experienced registrars marked 795 (32.4%) cases as being of interest, compared with the CaFE, which marked 1,009 (41.1%). The sensitivity of the CaFE was 100%, whereas the specificity was 87.1%. An analysis of the 214 errors made by the CaFE revealed that 30 cases (14%) were due to incorrect SNOMED codes assigned by our auto-coding system (Cerner Corporation, Kansas City, MO) and 89 (41.6%) were squamous or basal cell carcinomas of the skin (most non-melanomatous skin cancers are not tracked in the registry). Registrars required an average of 21 seconds per pathology report, whereas the Registry CaFE processed each report in less than a second. Conclusions: The Registry CaFE identified all relevant cases and correctly eliminated most cases that were not relevant; it is both effective and time-saving. Future efforts directed at improving the CaFE's handling of squamous and basal cell carcinomas would yield the largest improvement in accuracy. No significant financial relationships to disclose.
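A toy version of this term-list screening is easy to sketch. The include/exclude terms below are made up (the actual ~3,300-term list is not given here), and the sensitivity/specificity arithmetic reproduces the figures reported in the abstract.

```python
# Illustrative include/exclude term screening of free-text reports.
INCLUDE_TERMS = ("carcinoma", "lymphoma", "melanoma", "sarcoma")
EXCLUDE_TERMS = ("no evidence of malignancy", "basal cell carcinoma")

def flag_report(text):
    text = text.lower()
    if any(term in text for term in EXCLUDE_TERMS):
        return False  # eliminate non-relevant cases that mention cancer terms
    return any(term in text for term in INCLUDE_TERMS)

def sensitivity_specificity(tp, fp, tn, fn):
    return tp / (tp + fn), tn / (tn + fp)

# From the abstract: all 795 true cases were flagged (TP=795, FN=0) and there
# were 214 false positives among the 2,451 - 795 = 1,656 non-cases.
sens, spec = sensitivity_specificity(795, 214, 1656 - 214, 0)
print(round(sens, 3), round(spec, 3))  # 1.0 0.871
```

The tuning asymmetry is deliberate: for a registry screening tool, false negatives (missed cases) are far more costly than false positives, which registrars can quickly discard.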


1979 ◽  
Vol 18 (04) ◽  
pp. 228-234 ◽  
Author(s):  
D. M. Joseph ◽  
Ruth L. Wong

The errors studied are misspellings and typographical errors made by the physician house staff, surgical pathologists, and secretary/typists of a large teaching hospital. The 6,019 errors studied were encountered in the compilation of a LEXICON now containing 24,135 medical and non-medical terms (including errors) from Tissue Examination Request Forms and Surgical Pathology Reports. An automated error correction algorithm was sought to reduce the tedious task of manually encoding errors and to eliminate the need to store errors, which occupied 24.9% of the LEXICON storage space. The errors were classified into 23 types, and it was found that 84.2% of the errors fell into the 11 first-order categories. Existing error correction algorithms were analyzed with respect to possible application to our medical sample. Two were selected for experimentation: the Baskin-Selfridge algorithm and SOUNDEX. Results showed that Baskin-Selfridge worked quite well but was too slow to be applied on its own. SOUNDEX was reasonably fast but produced too many mismatches to be applied on its own in a non-interactive application. SOUNDEX was modified phonologically and with respect to code length in various ways, and some experimental data showed improvements. The optimal design for the medical LEXICON sample appears to be a two-step process: the modified version of SOUNDEX quickly selects the most likely corrections for an error (experimental average: 2.38 choices/error), and the Baskin-Selfridge algorithm then decides which, if any, is the actual correct form of the error. Because only a very small number of choices is considered, the time required for the Baskin-Selfridge algorithm becomes trivial. On the basis of experimental results, it is estimated that this combination will reduce the manual encoding of errors by 60% to 70% and reduce the storage required for the LEXICON by approximately 15%.
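For reference, here is a standard American Soundex implementation paired with `difflib` as a stand-in for the Baskin-Selfridge second pass; the paper's phonological modifications to SOUNDEX are not reproduced, and the lexicon is illustrative.

```python
import difflib

def soundex(word):
    """Classic American Soundex: first letter plus up to three digit codes."""
    codes = {}
    for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6")):
        for ch in letters:
            codes[ch] = digit
    word = word.lower()
    result = word[0].upper()
    prev = codes.get(word[0], "")
    for ch in word[1:]:
        digit = codes.get(ch, "")
        if digit and digit != prev:
            result += digit
        if ch not in "hw":  # h and w do not separate letters with equal codes
            prev = digit
    return (result + "000")[:4]

LEXICON = ["carcinoma", "melanoma", "hematoma", "granuloma"]

def correct(misspelling):
    # step 1: SOUNDEX quickly narrows the lexicon to a few candidates
    candidates = [w for w in LEXICON if soundex(w) == soundex(misspelling)]
    # step 2: a finer (slower) matcher picks the actual correction;
    # difflib stands in here for the Baskin-Selfridge algorithm
    best = difflib.get_close_matches(misspelling, candidates or LEXICON, n=1)
    return best[0] if best else None

print(soundex("Robert"), correct("carcenoma"))  # R163 carcinoma
```

This mirrors the paper's two-step design: the cheap phonetic pass keeps the candidate set small so the expensive pass runs in trivial time.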


2011 ◽  
Vol 29 (27_suppl) ◽  
pp. 204-204
Author(s):  
C. S. Kaufman ◽  
L. D. Shockney ◽  
J. Landercasper ◽  
B. Rabinowitz ◽  
J. B. Askew ◽  
...  

204 Background: The National Quality Measures for Breast Centers (NQMBC) program was started in 2005 to provide quality metrics for breast centers. Each center enters confidential aggregate results for questions regarding the process of breast care, and each data entry can be immediately compared with those of all other centers. We reviewed data on the timeliness of pathology reports after tissue sampling to look for variation among different types of breast centers. Methods: We reviewed the following measures: 1) time between needle/core biopsy and availability of the pathology report; 2) time between surgical biopsy and availability of the pathology report; and 3) time between initial cancer surgery and availability of the pathology report. We looked for variation in results according to structural differences among breast centers. Results: There were 391 center entries covering over 21,000 patient encounters between 2005 and 2011. Results included the 10th, 25th, 50th, 75th, and 90th percentiles and the mean. Overall, pathology reports for needle biopsies were available much sooner (mean, 1.4 days) than for surgical biopsies or excisions (2.1 and 2.4 days, respectively). Results did not vary with demographic characteristics (type of center, cancer volume, financial structure, city size, or geographic location). Conclusions: The timeliness of pathology reporting of breast specimens is very consistent across the participating NQMBC centers, regardless of the type and size of breast center. In view of the consistency of these results, a threshold level of care (the 10th percentile) may be identified above which all centers should be expected to perform. This minimum level of performance is called the quality threshold. Centers that do not perform above this quality threshold should be ready to explain why their performance falls below it; centers that surpass it should strive for a quality goal (the 50th percentile).
We expect that setting both the quality threshold and the quality goal for each measure will provide ongoing encouragement for achievable improvement.
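The percentile summaries described above amount to simple order statistics. A sketch with made-up per-center turnaround times (not NQMBC data); note that for turnaround times lower is better, so which tail serves as the threshold depends on how a measure is oriented.

```python
import statistics

def percentile(values, p):
    """Linear-interpolation percentile, p in [0, 100]."""
    values = sorted(values)
    k = (len(values) - 1) * p / 100
    lo = int(k)
    hi = min(lo + 1, len(values) - 1)
    return values[lo] + (values[hi] - values[lo]) * (k - lo)

# Illustrative per-center mean days from biopsy to pathology report.
days_to_report = [0.8, 1.0, 1.2, 1.4, 1.5, 1.7, 2.0, 2.4, 3.0, 4.1]

quality_threshold = percentile(days_to_report, 10)  # 10th-percentile summary
quality_goal = percentile(days_to_report, 50)       # median as the goal
mean_days = statistics.fmean(days_to_report)
print(round(quality_threshold, 2), round(quality_goal, 2), round(mean_days, 2))  # 0.98 1.6 1.91
```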


2021 ◽  
Author(s):  
Joshua Levy ◽  
Nishitha Vattikonda ◽  
Christian Haudenschild ◽  
Brock Christensen ◽  
Louis Vaickus

Abstract Background: Pathology reports serve as an auditable trail of a patient’s clinical narrative, containing important free text pertaining to diagnosis, prognosis and specimen processing. Recent works have utilized sophisticated natural language processing (NLP) pipelines, which include rule-based or machine learning analytics, to uncover patterns from text to inform clinical endpoints and biomarker information. While deep learning methods have come to the forefront of NLP, there have been limited comparisons with the performance of other machine learning methods in extracting key insights for the prediction of medical procedure information (Current Procedural Terminology; CPT codes), which informs insurance claims, medical research, and healthcare policy and utilization. Additionally, the utility of combining and ranking information from multiple report subfields, as compared to exclusively using the diagnostic field, for the prediction of CPT codes and signing pathologist remains unclear. Methods: After passing pathology reports through a preprocessing pipeline, we utilized advanced topic modeling techniques such as UMAP and LDA to identify topics with diagnostic relevance in order to characterize a cohort of 93,039 pathology reports at the Dartmouth-Hitchcock Department of Pathology and Laboratory Medicine (DPLM). We separately compared XGBoost, SVM, and BERT methodologies for the prediction of 38 different CPT codes using 5-fold cross-validation, using both the diagnostic text alone and text from all subfields. We performed similar analyses to characterize text from the twenty pathologists with the most pathology report sign-outs. Finally, we interpreted report- and cohort-level important words using TF-IDF, Shapley Additive Explanations (SHAP), attention, and integrated gradients. Results: We identified 10 topics for both the diagnostic-only and all-fields text, which pertained to diagnostic and procedural information, respectively.
The topics were associated with select CPT codes, pathologists, and report clusters. Operating on the diagnostic text alone, XGBoost performed similarly to BERT for the prediction of CPT codes. When utilizing all report subfields, XGBoost outperformed BERT for the prediction of CPT codes, though the two performed similarly for the prediction of signing pathologist. Both XGBoost and BERT outperformed SVM. Utilizing additional subfields of the pathology report increased prediction accuracy for both the CPT code and pathologist classification tasks. Misclassification of pathologist was largely subspecialty related. We identified text that is CPT and pathologist specific. Conclusions: Our approach generated CPT code predictions with an accuracy higher than that reported in previous literature. While diagnostic text is an important information source for NLP pipelines in pathology, additional insights may be extracted from other report subfields. Although the deep learning approaches did not outperform XGBoost, they may lend valuable information to pipelines that combine image, text, and -omics information. Future resource-saving opportunities exist in utilizing pathology reports to help hospitals detect mis-billing and estimate productivity metrics that pertain to pathologist compensation (RVUs).
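As an illustration of the featurization step, the sketch below computes TF-IDF weights over toy report text spanning two subfields (diagnosis plus specimen), of the kind that could feed an XGBoost or SVM classifier. The reports are made up and the model training itself is omitted.

```python
import math
from collections import Counter

# Toy reports combining a diagnostic field with a second subfield.
reports = [
    "diagnosis: squamous cell carcinoma. specimen: skin shave biopsy",
    "diagnosis: benign nevus. specimen: skin punch biopsy",
    "diagnosis: adenocarcinoma. specimen: colon polypectomy",
]

def tfidf(docs):
    tokenized = [d.replace(":", " ").replace(".", " ").split() for d in docs]
    df = Counter(w for doc in tokenized for w in set(doc))  # document frequency
    n = len(docs)
    vectors = []
    for doc in tokenized:
        tf = Counter(doc)
        vectors.append({w: (c / len(doc)) * math.log(n / df[w]) for w, c in tf.items()})
    return vectors

vecs = tfidf(reports)
# "diagnosis" appears in every report, so its IDF (and weight) is zero,
# while a discriminative token like "carcinoma" keeps a positive weight.
print(vecs[0]["diagnosis"], round(vecs[0]["carcinoma"], 3))  # 0.0 0.137
```

Sparse weighted vectors like these are a natural input for tree-based models such as XGBoost, whereas BERT consumes the raw token sequence directly.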

