The Use of Density-Based Spatial Clustering of Application With Noise (DBSCAN) for Record Linkage in An Observational HIV Cohort

Author(s):  
Victor Olago ◽  
Lina Bartels ◽  
Tafadzwa Dhokotera ◽  
Julia Bohlius ◽  
...  

Introduction: The South African HIV Cancer Match (SAM) study is a probabilistic record linkage study that created an HIV cohort from laboratory records of the National Health Laboratory Service (NHLS). This cohort was linked to the pathology-based South African National Cancer Registry to establish cancer incidence among the HIV-positive population in South Africa. As the number of HIV records increases, there is a need for more efficient ways of deduplicating these large datasets. In this work, we used clustering to perform big-data deduplication.

Objectives and Approach: Our objective was to use DBSCAN as a clustering algorithm, together with a bi-gram word analyser, to perform big-data deduplication in resource-limited settings. We used HIV-related laboratory records from the whole of South Africa, collated in the NHLS Corporate Data Warehouse for the period 2004-2014. The workflow involved data pre-processing, deterministic deduplication, n-gram generation, feature generation using a Term Frequency-Inverse Document Frequency (TF-IDF) vectorizer, clustering using DBSCAN, and assigning cluster labels to records that potentially belonged to the same person. We used records with national identification numbers to assess the quality of deduplication by calculating precision, recall and F-measure.

Results: We had 51,563,127 HIV-related laboratory records. Deterministic deduplication yielded 20,387,819 deduplicated patient records. With DBSCAN clustering we further reduced this to 14,849,524 patient record clusters. In this final dataset, 3,355,544 (22.60%) patients had a negative HIV test, 11,316,937 (76.21%) had evidence of HIV infection, and for 177,043 (1.19%) the HIV status could not be determined. The precision, recall and F-measure based on 1,865,445 records with national identification numbers were 0.96, 0.94 and 0.95, respectively.

Conclusion / Implications: Our study demonstrated that DBSCAN clustering is an effective way of deduplicating big datasets in resource-limited settings. This enabled refinement of an observational HIV database by accurately linking test records that potentially belonged to the same person. The methodology creates opportunities for easy data profiling to inform public health decision making.
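The workflow described in this abstract (bigram generation, TF-IDF features, DBSCAN clustering, then evaluation against known identifiers) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The character-bigram tokenisation, the cosine distance, the `eps` and `min_samples` values, the pairwise definition of precision and recall, and the sample records are all assumptions made for illustration, and the quadratic neighbour search would not scale to millions of records.

```python
import math
from collections import Counter
from itertools import combinations


def bigrams(text):
    """Character bigrams of a lower-cased string (assumed tokenisation)."""
    s = text.lower()
    return [s[i:i + 2] for i in range(len(s) - 1)]


def tfidf_vectors(docs):
    """L2-normalised TF-IDF vectors, one sparse dict per document."""
    counts = [Counter(bigrams(d)) for d in docs]
    n = len(docs)
    df = Counter(g for c in counts for g in c)  # document frequency per bigram
    vectors = []
    for c in counts:
        # Smoothed IDF: log((1 + n) / (1 + df)) + 1
        vec = {g: tf * (math.log((1 + n) / (1 + df[g])) + 1) for g, tf in c.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({g: w / norm for g, w in vec.items()})
    return vectors


def cosine_distance(a, b):
    """Cosine distance between two L2-normalised sparse vectors."""
    return 1.0 - sum(w * b.get(g, 0.0) for g, w in a.items())


def dbscan(vectors, eps, min_samples):
    """Plain DBSCAN; returns one cluster label per point, -1 for noise."""
    n = len(vectors)
    neighbours = [
        [j for j in range(n) if cosine_distance(vectors[i], vectors[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_samples:
            labels[i] = -1  # provisionally noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_samples:
                queue.extend(neighbours[j])  # expand only from core points
    return labels


def pairwise_scores(labels, truth):
    """Pairwise precision, recall and F-measure against known identifiers."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(labels)), 2):
        same_cluster = labels[i] == labels[j] and labels[i] != -1
        same_person = truth[i] == truth[j]
        tp += same_cluster and same_person
        fp += same_cluster and not same_person
        fn += same_person and not same_cluster
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


# Fictitious records (name plus date of birth), invented for illustration only.
records = [
    "thandi mokoena 1985-03-12",
    "thandy mokoena 1985-03-12",
    "sipho dlamini 1990-07-01",
    "sipo dlamini 1990-07-01",
    "johan van wyk 1978-11-23",
]
national_ids = ["A", "A", "B", "B", "C"]  # hypothetical ground truth

labels = dbscan(tfidf_vectors(records), eps=0.3, min_samples=1)
precision, recall, f_measure = pairwise_scores(labels, national_ids)
print(labels, precision, recall, f_measure)
```

Setting `min_samples=1` means every record forms at least a singleton cluster, which suits deduplication because no record is discarded as noise. As a consistency check on the reported figures, the harmonic mean of 0.96 and 0.94 is 2 × 0.96 × 0.94 / (0.96 + 0.94) ≈ 0.95, matching the stated F-measure.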

Vaccine ◽  
2019 ◽  
Vol 37 (1) ◽  
pp. 25-33 ◽  
Author(s):  
Meredith L. McMorrow ◽  
Stefano Tempia ◽  
Sibongile Walaza ◽  
Florette K. Treurnicht ◽  
Wayne Ramkrishna ◽  
...  

2020 ◽  
Author(s):  
Youngji Jo ◽  
Lise Jamieson ◽  
Ijeoma Edoka ◽  
Lawrence Long ◽  
Sheetal Silal ◽  
...  

Background: South Africa recently experienced a first peak in COVID-19 cases and mortality. Dexamethasone and remdesivir both have the potential to reduce COVID-related mortality, but their cost-effectiveness in a resource-limited setting with scant intensive care resources is unknown.

Methods: We projected intensive care unit (ICU) needs and capacity from August 2020 to January 2021 using the South African National COVID-19 Epi Model. We assessed the cost-effectiveness of 1) administration of dexamethasone to ventilated patients and remdesivir to non-ventilated patients, 2) dexamethasone alone to both non-ventilated and ventilated patients, 3) remdesivir to non-ventilated patients only, and 4) dexamethasone to ventilated patients only; all relative to a scenario of standard care. We estimated costs from the healthcare system perspective in 2020 USD, deaths averted, and the incremental cost-effectiveness ratios of each scenario.

Results: Remdesivir for non-ventilated patients and dexamethasone for ventilated patients was estimated to result in 1,111 deaths averted (assuming a 0-30% efficacy of remdesivir) compared to standard care, and to save $11.5 million. The result was driven by the efficacy of the drugs and the reduction in ICU time required for patients treated with remdesivir. The scenario of dexamethasone alone for ventilated and non-ventilated patients requires an additional $159,000 and averts 1,146 deaths, resulting in $139 per death averted, relative to standard care.

Conclusions: The use of dexamethasone for ventilated patients and remdesivir for non-ventilated patients is likely to be cost-saving compared to standard care. Given the economic and health benefits of both drugs, efforts to ensure access to these medications are paramount.
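The cost-per-death-averted figure in this abstract follows from dividing incremental cost by deaths averted. A minimal sketch of that arithmetic, using only the numbers quoted in the abstract, is given here. The function name `icer` and the choice to return `None` for a cost-saving scenario are illustrative assumptions, not the authors' model.

```python
def icer(incremental_cost, deaths_averted):
    """Incremental cost per death averted, relative to standard care.

    Returns None when the scenario is cost-saving, since a scenario that is
    both cheaper and more effective has no positive ratio to report.
    """
    if deaths_averted <= 0:
        raise ValueError("deaths_averted must be positive")
    if incremental_cost <= 0:
        return None  # cheaper and more effective than the comparator
    return incremental_cost / deaths_averted


# Dexamethasone alone for all patients: additional $159,000, 1,146 deaths averted.
dexamethasone_all = icer(159_000, 1_146)

# Remdesivir plus dexamethasone: saves $11.5 million, 1,111 deaths averted.
combination = icer(-11_500_000, 1_111)

print(round(dexamethasone_all), combination)  # prints: 139 None
```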


2010 ◽  
Vol 57 (2) ◽  
pp. 109-119 ◽  
Author(s):  
K. K. Venkatesh ◽  
G. de Bruyn ◽  
E. Marinda ◽  
K. Otwombe ◽  
R. van Niekerk ◽  
...  

Author(s):  
Mazvita Sengayi ◽  
Adrian Spörri ◽  
Eliane Rohner ◽  
Michael Vinikoor ◽  
Hans Prozesky ◽  
...  

Background: Sub-Saharan Africa is the region most heavily affected by the HIV/AIDS epidemic. HIV increases the risk of developing cancer, but the ascertainment of cancers in patients attending antiretroviral therapy (ART) treatment programs might be incomplete. To estimate the under-ascertainment of cancer we compared incidence rates of AIDS-defining cancers in South African HIV cohorts with and without cancer case ascertainment through record linkage with the National Cancer Registry.

Methods: We used the data of adult (≥16 years) HIV-positive persons receiving care between 2004 and 2011 at one of four ART programs in South Africa. These programs collaborate with the International Epidemiologic Databases to Evaluate AIDS Southern Africa (www.iedea-sa.org) and collected data for AIDS-defining cancers but not for other cancers. To improve cancer ascertainment we probabilistically linked patient records (using first name, surname, age, and gender) from two HIV cohorts with the cancer records of the South African National Cancer Registry. We calculated incidence rates per 100,000 person-years after starting ART for the AIDS-defining cancers, i.e. Kaposi sarcoma (KS), invasive cervical cancer (ICC) and non-Hodgkin lymphoma (NHL). We compared incidence rates before and after inclusion of cancer cases identified by record linkage, using the attributable fraction of cancers identified with 95% confidence intervals (CI).

Results: A total of 49,207 adults starting ART in South Africa were included. 65% of patients were female; the median age at starting ART was 35 years (interquartile range 30-41 years). We identified a total of 471 incident cancer cases. With record linkage the incidence per 100,000 person-years increased from 81 to 292 for KS, from 1 to 119 for NHL and from 12 to 497 for ICC. The attributable fraction of cancers identified was 72% (95% CI 63-79%) for KS, 98% (95% CI 94-99%) for NHL and 98% (95% CI 95-99%) for ICC.

Conclusion: Ascertainment of cancer in HIV program data in African settings is incomplete. This case study has shown that probabilistic record linkage to cancer registries is both feasible and essential for cancer ascertainment in HIV cohorts in South Africa.
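The attributable fraction reported in this abstract can be approximated from the quoted incidence rates. The sketch here assumes the common definition (rate with linkage minus rate without linkage, divided by rate with linkage); the abstract does not state the exact formula used. Because the inputs are rounded rates, the computed values may differ by about a percentage point from the published fractions, and no confidence intervals are derived.

```python
def attributable_fraction(rate_without_linkage, rate_with_linkage):
    """Share of the linked incidence rate that linkage added (assumed definition)."""
    return (rate_with_linkage - rate_without_linkage) / rate_with_linkage


# Incidence per 100,000 person-years, (without linkage, with linkage),
# taken from the rounded figures quoted in the abstract.
rates = {"KS": (81, 292), "NHL": (1, 119), "ICC": (12, 497)}

fractions = {cancer: attributable_fraction(*pair) for cancer, pair in rates.items()}
for cancer, af in fractions.items():
    print(f"{cancer}: {af:.1%}")
```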

