Abstract
Background
The District Health Information Software 2 (DHIS2) is widely used by countries for national-level aggregate reporting of health data. To best leverage DHIS2 data for decision-making, countries need to ensure that data within their systems are of the highest quality. Comprehensive, systematic and transparent data cleaning approaches form a core component of preparing DHIS2 data for use. Unfortunately, there is a paucity of exhaustive and systematic descriptions of the data cleaning processes employed on DHIS2-based data. In this paper, we describe the results of a systematic data cleaning approach applied to a national-level DHIS2 instance, using Kenya as the case example.
Methods
Broeck et al.'s framework, involving repeated cycles of a three-phase process (data screening, data diagnosis and data treatment), was applied to six HIV indicator reports collected monthly from all care facilities in Kenya from 2011 to 2018. Because each facility reports repeatedly over time, the unit of analysis was the facility reporting instance. Quality dimensions evaluated included reporting rate, reporting timeliness, and indicator completeness of submitted reports, each assessed per facility per year. The various error types were categorized, and Friedman analyses of variance were conducted to examine differences in the distribution of facilities by error type. Data cleaning was done during the treatment phases.
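As an illustration only, the screening phase for a single facility reporting instance could be sketched as follows. The record layout, the field names, the twelve-reports-per-year expectation and the exact rate definitions are assumptions made for this sketch; they are not taken from DHIS2 or from the study's actual procedure.

```python
from dataclasses import dataclass

# Assumption: one report is expected per month for each report type.
EXPECTED_REPORTS_PER_YEAR = 12


@dataclass
class ReportingInstance:
    """Hypothetical record: one facility, one report type, one year."""
    facility: str
    year: int
    report_type: str
    submitted: int  # reports received during the year
    on_time: int    # reports received by the reporting deadline
    empty: int      # submitted reports containing no indicator values


def reporting_rate(inst: ReportingInstance) -> float:
    """Submitted reports as a percentage of expected reports."""
    return 100.0 * inst.submitted / EXPECTED_REPORTS_PER_YEAR


def timeliness_rate(inst: ReportingInstance) -> float:
    """On-time reports as a percentage of expected reports (assumed definition)."""
    return 100.0 * inst.on_time / EXPECTED_REPORTS_PER_YEAR


def screen(inst: ReportingInstance) -> str:
    """Assign one screening label; checks run in a fixed, assumed order."""
    if inst.submitted == 0:
        return "no_reports"
    if inst.submitted > EXPECTED_REPORTS_PER_YEAR:
        return "over_reporting"
    if inst.empty == inst.submitted:
        return "all_empty"
    if inst.on_time == 0:
        return "never_on_time"
    return "ok"


def drop_non_reporting(instances):
    """Treatment step: remove instances that submitted no reports."""
    return [i for i in instances if screen(i) != "no_reports"]
```

In this sketch each instance receives exactly one label, so the order of the checks determines which issue is reported when several apply; a fuller implementation might record all applicable labels instead.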
Results
A generic five-step data cleaning sequence was developed and applied in cleaning HIV indicator data reports extracted from DHIS2. Initially, 93,179 facility reporting instances were extracted for the years 2011 to 2018. Of these, 50.23% submitted no reports and were removed. Of the remaining reporting instances, 0.03% showed over-reporting. Quality issues related to timeliness included scenarios where reports were empty, or contained data but were never submitted on time. The percentage of reporting instances in these scenarios varied by report type. Among submitted reports, the proportion of empty reports also varied by report type, ranging from 1.32% to 18.04%. Report quality varied significantly by facility distribution (p = 0.00) and report type.
Conclusions
The case of Kenya reveals significant data quality issues in reported HIV data that were not detected by the inbuilt error detection procedures within DHIS2. More robust and systematic data cleaning processes should be integrated into current DHIS2 implementations to ensure the highest-quality data.