Machine Learning increases efficiency and timeliness of National Procedural Clinical Standards KPI reporting (Preprint)
BACKGROUND
Quality assurance (QA) activities frequently depend on manual assessment of text-based records. Increasingly, these records have digital structures that may be amenable to computer analysis. We used the reporting requirements of the Australian Commission on Safety and Quality in Health Care (ACSQHC) Colonoscopy Clinical Care Standard as a proof of concept for an analytics process that streamlines reporting and reduces manual overheads. Our endoscopy unit performs approximately 4,500 colonoscopies (mainly outpatient) per year. Quarterly reporting of colonoscopy outcomes requires approximately 30 hours of manual data abstraction, collation and combination from a variety of electronic databases. The most time-consuming step is the manual retrieval and abstraction of histopathology records from the electronic medical record (EMR).

OBJECTIVE
1. To reduce the manual overheads of quarterly National Standards KPI reporting for colonoscopy compliance using an automated data pipeline and artificial intelligence (AI) tools.
2. To minimise the risk of failure to follow up new cancer diagnoses after outpatient colonoscopy.
3. To develop a data and analytic pipeline that could easily be repurposed for additional standards, audit and research projects.

METHODS
A data pipeline and analysis environment were established in the hospital's secure Microsoft Azure Databricks resource. A training data set of 1,000 colonoscopies was extracted from the Provation procedural database using the ProvationMD® reporting tool and linked to the relevant histopathology reports from the Clinical Research Data Warehouse (CRDW).
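The linkage of colonoscopy records to histopathology reports can be illustrated with a minimal sketch. It assumes, hypothetically, that Provation procedure records and CRDW histopathology reports share a patient identifier, and that a report belongs to a colonoscopy if it was issued on or within a fixed window after the procedure date. The class names, the field names (`patient_id`, `reported_date`) and the 30-day window are illustrative assumptions, not the study's actual linkage rules. In a Databricks environment this would more likely be a Spark join, but plain Python keeps the logic visible.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Procedure:
    procedure_id: str
    patient_id: str        # hypothetical shared linkage key
    procedure_date: date


@dataclass
class HistologyReport:
    report_id: str
    patient_id: str
    reported_date: date    # hypothetical field name
    text: str


def link_histology(procedures, reports, window_days=30):
    """Map each procedure_id to the texts of same-patient reports issued
    on or within `window_days` after the procedure date (an assumed rule)."""
    # Index reports by patient so each procedure only scans its own patient's reports.
    by_patient = {}
    for r in reports:
        by_patient.setdefault(r.patient_id, []).append(r)
    window = timedelta(days=window_days)
    linked = {}
    for p in procedures:
        linked[p.procedure_id] = [
            r.text
            for r in by_patient.get(p.patient_id, [])
            if p.procedure_date <= r.reported_date <= p.procedure_date + window
        ]
    return linked
```

Procedures that link to no report come back with an empty list, which makes them straightforward to surface for manual review.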
The machine learning (ML) training data set was created by Gastroenterology Registrars and nurses, who manually coded the histopathology reports into the following categories: Adenoma; Clinically Significant Sessile Serrated Adenoma; Cancer; Adequate Bowel Preparation; and Complete Examination. A variety of natural language processing (NLP) and ML models were assessed and refined to minimise the error rate, with sensitivity prioritised for the diagnosis of Cancer to minimise missed cases. Reporting to clinicians and quality co-ordinators was established using Microsoft Power BI.

RESULTS
The multinomial Naïve Bayes model achieved high accuracy, but at the cost of lower recall. Sensitivity improved with a virtual ensemble approach that layered models within the processing pipeline, and was maximised by combining the Microsoft® Text Analytics – Healthcare NLP model with our custom Naïve Bayes model. F1 scores between 0.89 and 0.93 were achieved. The algorithm checks for new data daily and performs the analysis automatically. Quarterly analysis and reporting time decreased from approximately 30 hours to less than 5 minutes, and reports can now be updated continuously in the Microsoft Power BI reporting portal.

CONCLUSIONS
Advanced analytic techniques can be deployed for mandatory quality reporting in a secure, cloud-based hospital data domain. The cost was far lower than that of the manual process it replaces, and reporting is more timely because it is automated. The potential to train similar algorithms for other QA reporting is high. Text-based research and audit of free-text clinical documentation in the EMR also become possible.

CLINICALTRIAL
Not applicable
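The sensitivity-first layering described in RESULTS can be sketched as follows, under stated assumptions. The sketch trains a multinomial Naïve Bayes classifier with Laplace smoothing for a single binary target (cancer versus no cancer), then flags a report as cancer if either the classifier's posterior reaches a deliberately low threshold or a second model says so. The second model here is a hypothetical keyword rule standing in for the Microsoft Text Analytics – Healthcare output; it does not call that service. The 0.3 threshold, the toy training texts and all function names are illustrative assumptions, not the study's configuration. A helper computes precision, recall and F1, the metrics reported in the abstract.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lower-case alphabetic tokens only; no negation handling (a simplification).
    return re.findall(r"[a-z]+", text.lower())


class MultinomialNB:
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        doc_counts = Counter(labels)
        n = len(labels)
        self.log_prior = {c: math.log(doc_counts[c] / n) for c in self.classes}
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict_proba(self, text):
        v = len(self.vocab)
        log_scores = {}
        for c in self.classes:
            score = self.log_prior[c]
            for w in tokenize(text):
                if w in self.vocab:  # words unseen in training are ignored
                    score += math.log((self.word_counts[c][w] + self.alpha)
                                      / (self.totals[c] + self.alpha * v))
            log_scores[c] = score
        # Normalise in log space (subtract the max) to avoid underflow.
        top = max(log_scores.values())
        exp_scores = {c: math.exp(s - top) for c, s in log_scores.items()}
        z = sum(exp_scores.values())
        return {c: e / z for c, e in exp_scores.items()}


def keyword_rule(text):
    # Hypothetical stand-in for an external clinical NLP model's cancer output.
    return re.search(r"\b(adenocarcinoma|carcinoma|malignan)", text.lower()) is not None


def flag_cancer(text, nb, second_model=keyword_rule, threshold=0.3):
    # Sensitivity-first layering: positive if EITHER model flags cancer.
    return nb.predict_proba(text)["cancer"] >= threshold or second_model(text)


def precision_recall_f1(y_true, y_pred):
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy usage with invented report snippets (not study data).
texts = ["invasive adenocarcinoma of colon", "moderately differentiated carcinoma",
         "tubular adenoma low grade dysplasia", "hyperplastic polyp",
         "sessile serrated lesion", "normal colonic mucosa"]
labels = ["cancer", "cancer", "no_cancer", "no_cancer", "no_cancer", "no_cancer"]
nb = MultinomialNB().fit(texts, labels)
print(flag_cancer("malignant cells in a polyp", nb))  # True: caught by the rule, not by NB
```

Combining models with a logical OR can only add positive predictions, so recall for the flagged category cannot fall while precision may; this is the trade-off implied by prioritising sensitivity for Cancer.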