Data Cleaning Process for HIV Indicator Data Extracted from DHIS2 National Reporting System: Case Example of Kenya

2020 ◽  
Author(s):  
Milka Gesicho ◽  
Ankica Babic ◽  
Martin Were

Abstract Background The District Health Information Software 2 (DHIS2) is widely used by countries for national-level aggregate reporting of health data. To best leverage DHIS2 data for decision-making, countries need to ensure that data within their systems are of the highest quality. Comprehensive, systematic and transparent data cleaning approaches form a core component of preparing DHIS2 data for use. Unfortunately, there is a paucity of exhaustive and systematic descriptions of data cleaning processes employed on DHIS2-based data. In this paper, we describe the results of a systematic data cleaning approach applied to a national-level DHIS2 instance, using Kenya as the case example. Methods Van den Broeck et al.'s framework, involving repeated cycles of a three-phase process (data screening, data diagnosis and data treatment), was applied to six HIV indicator reports collected monthly from all care facilities in Kenya from 2011 to 2018, yielding repeated facility reporting instances. Quality dimensions evaluated included reporting rate, reporting timeliness, and indicator completeness of submitted reports, each assessed per facility per year. The various error types were categorized, and Friedman analyses of variance were conducted to examine differences in the distribution of facilities by error type. Data cleaning was done during the treatment phases. Results A generic five-step data cleaning sequence was developed and applied in cleaning HIV indicator data reports extracted from DHIS2. Initially, 93,179 facility reporting instances were extracted for the years 2011 to 2018. Of these, 50.23% submitted no reports and were removed. Of the remaining reporting instances, 0.03% showed over-reporting. Quality issues related to timeliness included scenarios where reports were empty or contained data but were never submitted on time; the percentage of reporting instances in these scenarios varied by report type.
Among submitted reports, the proportion of empty reports also varied by report type, ranging from 1.32% to 18.04%. Report quality varied significantly by facility distribution (p = 0.00) and report type. Conclusions The case instance of Kenya reveals significant data quality issues in reported HIV data that were not detected by the inbuilt error detection procedures within DHIS2. More robust and systematic data cleaning processes should be integrated into current DHIS2 implementations to ensure the highest quality data.
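The screening/diagnosis/treatment cycle described above can be illustrated with a minimal sketch. This is not the authors' actual pipeline; the record layout, field names, and treatment rules (dropping facility-years with no reports, capping over-reported rates) are hypothetical stand-ins for the kinds of checks the abstract reports.

```python
# Hypothetical facility reporting instances (one dict per facility-year).
records = [
    {"facility": "F001", "year": 2015, "reporting_rate": 100.0, "reports": 12},
    {"facility": "F002", "year": 2015, "reporting_rate": None, "reports": 0},
    {"facility": "F003", "year": 2015, "reporting_rate": 108.3, "reports": 13},
]

# Screening phase: flag instances with no submitted reports or rates over 100%.
def screen(record):
    issues = []
    if record["reports"] == 0:
        issues.append("no_reports")
    if record["reporting_rate"] is not None and record["reporting_rate"] > 100:
        issues.append("over_reporting")
    return issues

# Diagnosis + treatment phases: remove empty instances; cap and flag
# over-reported rates so they can be reviewed rather than silently kept.
cleaned = []
for rec in records:
    issues = screen(rec)
    if "no_reports" in issues:
        continue  # treatment: drop facility-years that submitted nothing
    if "over_reporting" in issues:
        rec = {**rec, "reporting_rate": 100.0, "flag": "over_reporting"}
    cleaned.append(rec)
```

In the real study these cycles were repeated per report type and per year; the sketch shows only a single pass over one quality dimension.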

Author(s):  
Milka Bochere Gesicho ◽  
Martin Chieng Were ◽  
Ankica Babic

Abstract Background The District Health Information Software-2 (DHIS2) is widely used by countries for national-level aggregate reporting of health data. To best leverage DHIS2 data for decision-making, countries need to ensure that data within their systems are of the highest quality. Comprehensive, systematic, and transparent data cleaning approaches form a core component of preparing DHIS2 data for analyses. Unfortunately, there is a paucity of exhaustive and systematic descriptions of data cleaning processes employed on DHIS2-based data. The aim of this study was to report on the methods and results of a systematic and replicable data cleaning approach applied to HIV data gathered within DHIS2 from 2011 to 2018 in Kenya, for secondary analyses. Methods Six programmatic area reports containing HIV indicators were extracted from DHIS2 for all care facilities in all counties in Kenya from 2011 to 2018. Data variables extracted included reporting rate, reporting timeliness, and HIV-indicator data elements per facility per year. In total, 93,179 facility-records from 11,446 health facilities were extracted for the years 2011 to 2018. Van den Broeck et al.'s framework, involving repeated cycles of a three-phase process (data screening, data diagnosis and data treatment), was employed semi-automatically within a generic five-step data-cleaning sequence, which was developed and applied in cleaning the extracted data. Various quality issues were identified, and Friedman analysis of variance was conducted to examine differences in the distribution of records with selected issues across the eight years. Results Facility-records with no data accounted for 50.23% and were removed. Of the remaining records, 0.03% had reporting rates above 100%. Of facility-records with reporting data, 0.66% and 0.46% were retained for the voluntary medical male circumcision and blood safety programmatic area reports respectively, given that few facilities submitted data or offered these services.
The distribution of facility-records with selected quality issues varied significantly by programmatic area (p < 0.001). The final clean dataset was suitable for subsequent secondary analyses. Conclusions Comprehensive, systematic, and transparent reporting of the cleaning process is important for the validity of research studies as well as for data utilization. The semi-automatic procedures used resulted in improved data quality for use in secondary analyses, which could not have been achieved by automated procedures alone.
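A Friedman analysis of variance like the one reported above can be sketched as follows. The counts are synthetic and the block/treatment layout (programmatic areas as blocks, years as repeated measures) is an assumption for illustration, not the authors' actual data.

```python
import numpy as np
from scipy.stats import friedmanchisquare

# Rows: six hypothetical programmatic-area reports (blocks);
# columns: counts of facility-records with a quality issue in each of 8 years.
rng = np.random.default_rng(42)
counts = rng.integers(50, 500, size=(6, 8))

# The Friedman test takes one sample per repeated measure (here, per year)
# and tests whether the year-wise distributions differ across blocks.
stat, p = friedmanchisquare(*[counts[:, year] for year in range(8)])
```

The Friedman test is a non-parametric alternative to repeated-measures ANOVA, appropriate here because report counts per facility are unlikely to be normally distributed.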


2017 ◽  
Vol 871 ◽  
pp. 52-59
Author(s):  
Christian Sand ◽  
Stephanie Kawan ◽  
Tobias Lechler ◽  
Manuel Neher ◽  
Daniel Schweigert ◽  
...  

Conventional serial and workshop production uses specific parameter ranges to evaluate the quality of a process. Our research showed that parameters within tolerances do not ensure good quality of the final product, owing to adverse parameter combinations along the assembly line. Therefore, in this novel approach, data sets from assembly processes, such as force-displacement or force-time curves, are evaluated together with quality measurements. Using the Fourier transform, k-means clustering, decision trees and a dynamic envelope curve, classification and process monitoring are performed in both the time and frequency domains. This opens new possibilities for characterizing quality and process data, for advanced error detection, and for simpler tracing of faults. Here, holistic optimization and monitoring follow two strategies. First, a simplified fault-tracing approach considers quality results from test benches; assembly processes are thus monitored and characterized by quality data. Second, defective influences, such as tool breakage or calibration errors, are linked to deviations from the usual process behavior; here, the error detection approach focuses on process data from single assembly stations. The approach uses three methods. First, the Fourier transform extracts additional information from process, energy and quality data. Second, the k-means algorithm is used to cluster quality data and extend the database. Third, a decision tree classifies the quality of the final product and characterizes assembly processes. Finally, the results of k-means clustering and selected classification methods are compared. This combination makes it possible to increase process quality, improve product quality and reduce failure costs.
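The Fourier-transform / k-means / decision-tree chain described above can be sketched minimally. The synthetic "force-time curves" below, and the choice to use the magnitude spectrum directly as the feature vector, are illustrative assumptions; the paper's actual envelope-curve monitoring is not reproduced here.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic force-time curves: "good" cycles are smooth low-frequency signals,
# "faulty" cycles carry an added high-frequency vibration component
# (a stand-in for, e.g., tool wear on a single assembly station).
def make_curve(faulty, n=256):
    t = np.linspace(0, 1, n)
    curve = np.sin(2 * np.pi * 2 * t) + 0.05 * rng.normal(size=n)
    if faulty:
        curve += 0.4 * np.sin(2 * np.pi * 40 * t)
    return curve

labels = np.array([0] * 50 + [1] * 50)
curves = np.array([make_curve(bool(y)) for y in labels])

# Method 1: Fourier transform -- magnitude spectra as frequency-domain features.
spectra = np.abs(np.fft.rfft(curves, axis=1))

# Method 2: k-means clusters the spectra (unsupervised screening of the database).
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(spectra)

# Method 3: a decision tree classifies final quality from the same features.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(spectra, labels)
accuracy = tree.score(spectra, labels)
```

The comparison step from the abstract corresponds to checking how well the unsupervised cluster assignments agree with the supervised tree's classifications.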


Energies ◽  
2021 ◽  
Vol 14 (2) ◽  
pp. 304
Author(s):  
Sakthivel Ganesan ◽  
Prince Winston David ◽  
Praveen Kumar Balachandran ◽  
Devakirubakaran Samithas

Since most industries use induction motors, it is essential to develop condition monitoring systems for them. Nowadays, industries face power quality issues such as sag, swell, harmonics, and transients. Thus, a condition monitoring system should be able to detect various faults even in the presence of power quality issues. Most fault diagnosis and condition monitoring methods proposed earlier misidentified faults, causing the condition monitoring system to fail through misclassification driven by power quality disturbances. The proposed method uses power quality data along with starting-current data to identify broken rotor bar and bearing faults in induction motors. The discrete wavelet transform (DWT) is used to decompose the current waveform, and different features such as mean, standard deviation, entropy, and norm are then calculated. A neural network (NN) classifier is used for classifying the faults and for analyzing the classification accuracy in various cases. The classification accuracy is 96.7% when power quality issues are considered, whereas in the typical case it is 93.3%. The proposed methodology, which merges the mean, standard deviation, entropy, and norm features while accounting for power quality issues, is suitable for hardware implementation, and the trained NN proves stable in detecting rotor and bearing faults.
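The feature-extraction stage described above can be sketched with a single level of the Haar wavelet (the simplest DWT), implemented directly in NumPy to keep the example self-contained. The mother wavelet, decomposition depth, and the synthetic starting-current signal are assumptions; the paper does not specify them here.

```python
import numpy as np

def haar_dwt(x):
    """One level of the Haar discrete wavelet transform:
    split a signal into approximation (low-pass) and detail (high-pass) bands."""
    x = np.asarray(x, dtype=float)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2)
    return approx, detail

def band_features(coeffs):
    """Mean, standard deviation, Shannon entropy, and norm of one band --
    the four features named in the abstract."""
    energy = coeffs ** 2
    p = energy / energy.sum()
    entropy = -np.sum(p * np.log2(p + 1e-12))
    return [coeffs.mean(), coeffs.std(), entropy, np.linalg.norm(coeffs)]

# Hypothetical starting-current segment: a 50 Hz draw plus measurement noise.
t = np.linspace(0, 0.2, 512, endpoint=False)
current = np.sin(2 * np.pi * 50 * t) + 0.02 * np.random.default_rng(1).normal(size=t.size)

approx, detail = haar_dwt(current)
features = band_features(approx) + band_features(detail)  # 8-element feature vector
```

Feature vectors of this form, computed per decomposition band, would then be fed to the NN classifier together with the power quality data.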


2018 ◽  
Vol 34 (3) ◽  
pp. 581-597 ◽  
Author(s):  
Asaph Young Chun ◽  
Steven G. Heeringa ◽  
Barry Schouten

Abstract We discuss an evidence-based approach to guiding real-time design decisions during the course of survey data collection. We call it responsive and adaptive design (RAD), a scientific framework driven by cost-quality tradeoff analysis and optimization that enables the most efficient production of high-quality data. The notion of RAD is not new; nor is it a silver bullet that resolves all the difficulties and challenges of complex survey design. RAD embraces precedents and variants of responsive design and adaptive design that survey designers and researchers have practiced over decades. In this paper, we present the four pillars of RAD: survey process data and auxiliary information, design features and interventions, explicit quality and cost metrics, and quality-cost optimization tailored to survey strata. We discuss how these building blocks of RAD are addressed in articles published in the 2017 JOS special issue and in this special section. It is a tale of three perspectives complementing one another. We carry over each of these three perspectives to articulate the remaining challenges and opportunities for the advancement of RAD. We recommend several RAD ideas for future research, including survey-assisted population modeling, rigorous optimization strategies, and total survey cost modeling.


2021 ◽  
Author(s):  
Patrick Sullivan ◽  
Cory R Woodyatt ◽  
Oskian Kouzouian ◽  
Kristen Parrish ◽  
Jennifer Taussig ◽  
...  

Objectives: America's HIV Epidemic Analysis Dashboard (AHEAD) is a data visualization tool that displays relevant data on the six HIV indicators provided by CDC, which can be used to monitor progress towards ending the HIV epidemic in local communities across the U.S. The objective of AHEAD is to make data available to stakeholders that can be used to measure national and local progress towards the 2025 and 2030 Ending the HIV Epidemic in the U.S. (EHE) goals and to help jurisdictions make local decisions grounded in high-quality data. Methods: AHEAD displays data from public health data systems (e.g., surveillance systems, Census data), organized around the six EHE indicators (incidence, knowledge of status, diagnoses, linkage to HIV medical care, viral suppression, and PrEP coverage). Data are displayed for each of the EHE priority areas (48 counties; Washington, D.C.; and San Juan, PR), which accounted for more than 50% of all U.S. HIV diagnoses in 2016 and 2017, and for seven primarily Southern states with high rates of HIV in rural communities. AHEAD also displays data for the 43 remaining states for which data are available. Data features prioritize interactive data-visualization tools that allow users to compare indicator data stratified by sex at birth, race, age, and transmission category within a jurisdiction (when available) or to compare data on EHE indicators between jurisdictions. Results: AHEAD was launched on August 14, 2020. In the 11 months since its launch, the Dashboard was visited 26,591 times by 17,600 unique users. About a third of all users returned to the Dashboard at least once. On average, users engaged with 2.4 pages during their visit, indicating that the average user goes beyond the informational landing page to engage with one or more pages of data and content. The most frequently visited content pages are the Jurisdictions webpages.
Conclusions: The Ending the HIV Epidemic plan is described as a “whole of society” effort. Societal public health initiatives require objective indicators and require that all societal stakeholders have transparent access to indicator data at the level of the health jurisdictions responsible for meeting the goals of the plan. Data transparency empowers local stakeholders to track movement towards EHE goals, identify areas with needs for improvement, make data-informed adjustments to deploy the expertise and resources required to locally tailor and implement strategies to end the HIV epidemic in their jurisdiction.


2021 ◽  
Vol 3 (1) ◽  
pp. 23-34
Author(s):  
Surna Lastri ◽  
Fitri Yunina ◽  
Masriani Masriani

This study aims to analyse the factors that influence financial report quality. Data were collected using a questionnaire at the Regional Financial Management Agency of Aceh Barat District; the population comprised employees of the finance department. The data were analysed using multiple logistic regression. The results show that the application of financial accounting standards (FAS) and government employee competency both partially and simultaneously influence the quality of the financial report.
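A multiple logistic regression of the kind used in this study can be sketched as follows. The questionnaire scales, sample size, and the synthetic relationship between predictors and outcome are entirely hypothetical; only the modeling technique matches the abstract.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical questionnaire scores (1-5 Likert-style scales): FAS application
# and employee competency as predictors of a binary "high-quality report" outcome.
rng = np.random.default_rng(7)
fas = rng.uniform(1, 5, 80)
competency = rng.uniform(1, 5, 80)
quality = ((fas + competency + rng.normal(0, 0.5, 80)) > 6).astype(int)

# Multiple logistic regression: two predictors, one binary outcome.
X = np.column_stack([fas, competency])
model = LogisticRegression().fit(X, quality)
coefficients = model.coef_[0]   # one log-odds coefficient per predictor
accuracy = model.score(X, quality)
```

Each coefficient gives the change in log-odds of a high-quality report per unit increase in that predictor, which is how "partial" influence would be read off in a study like this one.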

