A General Framework for Multiple-Recapture Estimation that Incorporates Linkage Error Correction

2021 ◽  
Vol 37 (3) ◽  
pp. 699-718
Author(s):  
Daan Zult ◽  
Peter-Paul de Wolf ◽  
Bart F. M. Bakker ◽  
Peter van der Heijden

Abstract The size of a partly observed population is often estimated with the capture-recapture model. An important assumption of this model is that sources can be perfectly linked. This assumption matters when records are identified not by a perfect identifier (such as an id code) but by indirect identifiers (such as name and address). In that case, the perfect linkage assumption is often violated, which in general leads to biased population size estimates. Initial suggestions to solve this use record linkage probabilities to correct the capture-recapture model. In this article we provide a general framework, based on the standard log-linear modelling approach, that generalises this work towards the inclusion of additional sources and covariates. We show that the method performs well in a simulation study.

2019 ◽  
Author(s):  
Abu Abdul-Quader

BACKGROUND Population size estimation of people who inject drugs (PWID) in Ho Chi Minh City (HCMC), Vietnam relied on the UNAIDS Estimation and Projection Package and reports from the city police department. The two estimates vary widely. OBJECTIVE To estimate the population size of people who inject drugs in Ho Chi Minh City, Vietnam. METHODS Using respondent-driven sampling (RDS), we implemented a two-source capture-recapture method to estimate the population size of PWID in HCMC in 2017 in 7 out of 24 districts. The study included men or women aged at least 18 years who reported injecting illicit drugs in the last 90 days and who had lived in the city for the past six months. We calculated two sets of size estimates: the first assumed that all participants in each survey round resided in the district where the survey was conducted; the second used the district of residence as reported by the participant. District estimates were summed to obtain an aggregate estimate for the seven districts. To calculate the city total, we weighted the population size estimates for each district by the inverse of the stratum-specific sampling probabilities. RESULTS The first estimate resulted in a population size of 19,155 (95% CI: 17,006–25,039). The second one generated a smaller population size estimate of 12,867 (95% CI: 11,312–17,393). CONCLUSIONS The two-survey capture-recapture exercise provided two disparate estimates of PWID in HCMC. For planning HIV prevention and care service needs among PWID in HCMC, both estimates may need to be taken into consideration together with size estimates from other sources.


2015 ◽  
Vol 31 (3) ◽  
pp. 415-429 ◽  
Author(s):  
Loredana Di Consiglio ◽  
Tiziana Tuoto

Abstract The capture-recapture method is a well-known solution for evaluating the unknown size of a population. Administrative data represent sources of independent counts of a population and can be jointly exploited for applying the capture-recapture method. Of course, administrative sources are affected by over- or undercoverage when considered separately. The standard Petersen approach is based on strong assumptions, including perfect record linkage between lists. In reality, record linkage results can be affected by errors. A simple method for achieving linkage error-unbiased population total estimates is proposed in Ding and Fienberg (1994). In this article, an extension of the Ding and Fienberg model that relaxes their conditions is proposed. The procedures are illustrated for estimating the total number of road casualties, on the basis of a probabilistic record linkage between two administrative data sources. Moreover, a simulation study is developed, providing evidence that the adjusted estimator always performs better than the Petersen estimator.


2013 ◽  
Vol 142 (1) ◽  
pp. 200-207 ◽  
Author(s):  
S. A. McDONALD ◽  
S. J. HUTCHINSON ◽  
C. SCHNIER ◽  
A. McLEOD ◽  
D. J. GOLDBERG

SUMMARY In countries maintaining national hepatitis C virus (HCV) surveillance systems, a substantial proportion of individuals report no risk factors for infection. Our goal was to estimate the proportion of diagnosed HCV antibody-positive persons in Scotland (1991–2010) who probably acquired infection through injecting drug use (IDU), by combining data on IDU risk from four linked data sources using log-linear capture–recapture methods. Of 25 521 HCV-diagnosed individuals, 14 836 (58%) reported IDU risk with their HCV diagnosis. Log-linear modelling estimated a further 2484 HCV-diagnosed individuals with IDU risk, giving an estimated prevalence of 83%. Stratified analyses indicated variation across birth cohort, with estimated prevalence as low as 49% in persons born before 1960 and greater than 90% for those born since 1960. These findings provide public-health professionals with a more complete profile of Scotland's HCV-infected population in terms of transmission route, which is essential for targeting educational, prevention and treatment interventions.


2013 ◽  
Vol 37 (2) ◽  
pp. 205
Author(s):  
Richard C. Turner ◽  
Katina D'Onise ◽  
Yan Wang

Objective. Capture-recapture analysis was used to more accurately quantify the admission rate for acute pancreatitis in a regional hospital setting, in comparison to the usual method of case ascertainment. Reasons for differences in capture for the various methods were also sought. Methods. Admissions for acute pancreatitis were enumerated over a 40-month period using three data sources: hospital classification of admission diagnoses, prospective case identification, and receipt of diagnosis-specific pathology specimens. Capture-recapture analysis was applied with log-linear modelling to account for likely dependency between data sources. Covariates were noted to explain capture probability by the various data sources and for eventual stratification in the analysis process. Results. For the census period, there were 304 admissions after merging of data sources, giving a crude admission rate of 7.6 per month. Crude ascertainment rates for discharge records and prospective identification were 44% and 52% respectively. Following log-linear modelling, total admissions more than doubled to 644 (adjusted admission rate 16.1 per month). Of the covariates considered, admissions of less than three days’ duration and those occurring in December and January were significantly associated with increased capture by the hospital discharge records data source. Conclusions. In this clinical setting, admissions for acute pancreatitis are grossly underestimated by the standard case ascertainment method. The reasons for this are not clear. Hospital discharge records are nevertheless more effective than prospective case ascertainment for certain cases, such as brief admissions and those in holiday periods. What is known about the topic? Capture–recapture analysis was originally developed in animal ecology, but has since been used to estimate both prevalent and incident cases of human disease. What does this paper add? This study exposes possible deficiencies in the single-source case ascertainment methods used by most hospitals to enumerate incident cases. It is the first time that capture–recapture techniques have been used to estimate acute pancreatitis admissions. What are the implications for practitioners? To obtain accurate admissions estimates for diseases such as acute pancreatitis, capture–recapture analysis with multiple data sources is advisable. One possible solution may be to conduct intermittent prospective censuses to complement existing retrospective ascertainment methods. On a more general level, clinical staff should be better trained to provide more accurate and detailed information in case records.
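
The monthly rates quoted in the pancreatitis abstract follow directly from the admission counts and the 40-month census period. The short Python sketch below reproduces that arithmetic; the function name `monthly_rate` is introduced here purely for illustration and does not come from the paper.

```python
def monthly_rate(admissions: int, months: int) -> float:
    """Average number of admissions per month over the census period."""
    if months <= 0:
        raise ValueError("months must be positive")
    return admissions / months


CENSUS_MONTHS = 40        # census period stated in the abstract
MERGED_ADMISSIONS = 304   # admissions after merging the data sources
MODELLED_ADMISSIONS = 644 # total after log-linear modelling

crude = monthly_rate(MERGED_ADMISSIONS, CENSUS_MONTHS)
adjusted = monthly_rate(MODELLED_ADMISSIONS, CENSUS_MONTHS)
ratio = MODELLED_ADMISSIONS / MERGED_ADMISSIONS  # "more than doubled"

print(round(crude, 1), round(adjusted, 1), round(ratio, 2))
```

The log-linear model itself is not reproduced here; the sketch only checks that the reported totals and rates are mutually consistent.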


2019 ◽  
Vol 17 (4) ◽  
pp. 277-289
Author(s):  
Larissa Hermes Thomas Tombini ◽  
Emil Kupek

Objective: To estimate the number of 15-79-year-old individuals infected with HIV in the Santa Catarina state, Brazil, during the period 2008-2017. Methods: Three official registers of the HIV-infected individuals were compiled: SINAN for the HIV/AIDS epidemiological surveillance, SIM for mortality and SISCEL for the HIV viral load and CD4/CD8 cell count. Their records were linked by a unique personal identifier. Capture-recapture estimates were obtained by log-linear modelling with both the main effects and interaction between the registers, adjusted for age, sex and period. An adjustment for underreporting of AIDS-related deaths used published data on ill-defined causes of death and AIDS mortality. Results: After data sorting, 67340 HIV/AIDS records were identified: 29734 (44.2%) by SINAN, 5540 (8.2%) by SIM and 32066 (47.6%) by SISCEL. After record linkage, the HIV population size was estimated at 45707, whereas the capture-recapture method added 44 individuals. The number of new HIV/AIDS notifications per year increased significantly in 2014-2017 compared to the period 2011-2013 among 15-34-year-old men and less so for older men and women. Including 1512 unreported AIDS-related deaths gave an estimated 47263 HIV-infected individuals with 95% confidence interval (CI) of 47245-47282 and corresponding incidence of 93 (95% CI 91-96) p/100000. Case ascertainment of 62.9%, 78.5% and 67.8% was estimated for SINAN, SIM and SISCEL, respectively. Conclusion: Three major HIV/AIDS registers in Brazil showed significant under-notification of the HIV/AIDS epidemiological surveillance amenable to significant improvement by routine record linkage.


2018 ◽  
Author(s):  
Reena H Doshi ◽  
Kevin Apodaca ◽  
Moses Ogwal ◽  
Rommel Bain ◽  
Ermias Amene ◽  
...  

BACKGROUND Key populations, including people who inject drugs (PWID), men who have sex with men (MSM), and female sex workers (FSW), are disproportionately affected by the HIV epidemic. Understanding the magnitude of, and informing the public health response to, the HIV epidemic among these populations requires accurate size estimates. However, low social visibility poses challenges to these efforts. OBJECTIVE The objective of this study was to derive population size estimates of PWID, MSM, and FSW in Kampala using capture-recapture. METHODS Between June and October 2017, unique objects were distributed to the PWID, MSM, and FSW populations in Kampala. PWID, MSM, and FSW were each sampled during 3 independent captures; unique objects were offered in captures 1 and 2. PWID, MSM, and FSW sampled during captures 2 and 3 were asked if they had received either or both of the distributed objects. All captures were completed 1 week apart. The numbers of PWID, MSM, and FSW receiving one or both objects were determined. Population size estimates were derived using the Lincoln-Petersen method for 2-source capture-recapture (PWID) and Bayesian nonparametric latent-class model for 3-source capture-recapture (MSM and FSW). RESULTS We sampled 467 PWID in capture 1 and 450 in capture 2; a total of 54 PWID were captured in both. We sampled 542, 574, and 598 MSM in captures 1, 2, and 3, respectively. There were 70 recaptures between captures 1 and 2, 103 recaptures between captures 2 and 3, and 155 recaptures between captures 1 and 3. There were 57 MSM captured in all 3 captures. We sampled 962, 965, and 1417 FSW in captures 1, 2, and 3, respectively. There were 316 recaptures between captures 1 and 2, 214 recaptures between captures 2 and 3, and 235 recaptures between captures 1 and 3. There were 109 FSW captured in all 3 rounds. 
The estimated number of PWID was 3892 (3090-5126), the estimated number of MSM was 14,019 (95% credible interval (CI) 4995-40,949), and the estimated number of FSW was 8848 (95% CI 6337-17,470). CONCLUSIONS Our population size estimates for PWID, MSM, and FSW in Kampala provide critical population denominator data to inform HIV prevention and treatment programs. The 3-source capture-recapture is a feasible method to advance key population size estimation.
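
As a rough check on the PWID figure, the two-source Lincoln-Petersen point estimate can be computed from the capture counts given in the abstract (467 in capture 1, 450 in capture 2, 54 in both). The Python sketch below uses the textbook form N = n1 × n2 / m; the function name is hypothetical, and the interval reported in the abstract is not reproduced because the abstract does not say how it was obtained.

```python
def lincoln_petersen(n1: int, n2: int, m: int) -> float:
    """Textbook two-source estimate: N = n1 * n2 / m.

    n1: individuals sampled in the first capture
    n2: individuals sampled in the second capture
    m:  individuals found in both captures (recaptures)
    """
    if m <= 0:
        raise ValueError("at least one recapture is needed")
    return n1 * n2 / m


# PWID counts as reported in the abstract.
pwid_estimate = lincoln_petersen(n1=467, n2=450, m=54)
print(round(pwid_estimate))  # 3892, matching the reported point estimate
```

The MSM and FSW estimates come from a Bayesian nonparametric latent-class model over three captures, which this simple formula does not cover.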


2008 ◽  
Vol 136 (12) ◽  
pp. 1606-1616 ◽  
Author(s):  
N. A. H. VAN HEST ◽  
A. STORY ◽  
A. D. GRANT ◽  
D. ANTOINE ◽  
J. P. CROFTS ◽  
...  

SUMMARY In 1999 the Enhanced Tuberculosis Surveillance (ETS) system was introduced in the United Kingdom to strengthen surveillance of tuberculosis (TB). The aim of this study was to assess the use of record-linkage and capture–recapture methodology for estimating the completeness of TB reporting in England between 1999 and 2002. Due to the size of the TB data sources sophisticated record-linkage software was required and the proportion of false-positive cases among unlinked hospital-derived TB records was estimated through a population mixture model. This study showed that record-linkage of TB data sources and cross-validation with additional TB-related datasets improved data quality as well as case ascertainment. Since the introduction of ETS observed completeness of notification in England has increased and the results were consistent with expected levels of under-notification. Completeness of notification estimated by a log-linear capture–recapture model was highly inconsistent with prior estimates and the validity of this methodology was further examined.


10.2196/12118 ◽  
2019 ◽  
Vol 5 (3) ◽  
pp. e12118 ◽  
Author(s):  
Reena H Doshi ◽  
Kevin Apodaca ◽  
Moses Ogwal ◽  
Rommel Bain ◽  
Ermias Amene ◽  
...  

Background Key populations, including people who inject drugs (PWID), men who have sex with men (MSM), and female sex workers (FSW), are disproportionately affected by the HIV epidemic. Understanding the magnitude of, and informing the public health response to, the HIV epidemic among these populations requires accurate size estimates. However, low social visibility poses challenges to these efforts. Objective The objective of this study was to derive population size estimates of PWID, MSM, and FSW in Kampala using capture-recapture. Methods Between June and October 2017, unique objects were distributed to the PWID, MSM, and FSW populations in Kampala. PWID, MSM, and FSW were each sampled during 3 independent captures; unique objects were offered in captures 1 and 2. PWID, MSM, and FSW sampled during captures 2 and 3 were asked if they had received either or both of the distributed objects. All captures were completed 1 week apart. The numbers of PWID, MSM, and FSW receiving one or both objects were determined. Population size estimates were derived using the Lincoln-Petersen method for 2-source capture-recapture (PWID) and Bayesian nonparametric latent-class model for 3-source capture-recapture (MSM and FSW). Results We sampled 467 PWID in capture 1 and 450 in capture 2; a total of 54 PWID were captured in both. We sampled 542, 574, and 598 MSM in captures 1, 2, and 3, respectively. There were 70 recaptures between captures 1 and 2, 103 recaptures between captures 2 and 3, and 155 recaptures between captures 1 and 3. There were 57 MSM captured in all 3 captures. We sampled 962, 965, and 1417 FSW in captures 1, 2, and 3, respectively. There were 316 recaptures between captures 1 and 2, 214 recaptures between captures 2 and 3, and 235 recaptures between captures 1 and 3. There were 109 FSW captured in all 3 rounds. 
The estimated number of PWID was 3892 (3090-5126), the estimated number of MSM was 14,019 (95% credible interval (CI) 4995-40,949), and the estimated number of FSW was 8848 (95% CI 6337-17,470). Conclusions Our population size estimates for PWID, MSM, and FSW in Kampala provide critical population denominator data to inform HIV prevention and treatment programs. The 3-source capture-recapture is a feasible method to advance key population size estimation.


2006 ◽  
Vol 135 (6) ◽  
pp. 1021-1029 ◽  
Author(s):  
N. A. H. van HEST ◽  
F. SMIT ◽  
H. W. M. BAARS ◽  
G. De VRIES ◽  
P. E. W. De HAAS ◽  
...  

SUMMARY The aim of this study was to describe a systematic process of record-linkage, cross-validation, case-ascertainment and capture–recapture analysis to assess the quality of tuberculosis registers and to estimate the completeness of notification of incident tuberculosis cases in The Netherlands in 1998. After record-linkage and cross-validation 1499 tuberculosis patients were identified, of whom 1298 were notified, resulting in an observed under-notification of 13·4%. After adjustment for possible imperfect record-linkage and remaining false-positive hospital cases observed under-notification was 7·3%. Log-linear capture–recapture analysis initially estimated a total number of 2053 (95% CI 1871–2443) tuberculosis cases, resulting in an estimated under-notification of 36·8%. After adjustment for possible imperfect record-linkage and remaining false-positive hospital cases various capture–recapture models estimated under-notification at 13·6%. One of the reasons for the higher than expected estimated under-notification in a country with a well-organized system of tuberculosis control might be that some tuberculosis cases, e.g. extrapulmonary tuberculosis, are managed by clinicians less familiar with notification of infectious diseases. This study demonstrates the possible impact of violation of assumptions underlying capture–recapture analysis, especially the perfect record-linkage, perfect positive predictive value and absent three-way interaction assumptions.
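
The unadjusted under-notification percentages in the Dutch tuberculosis abstract can be recovered from the counts it reports, taking under-notification as the share of total cases that were not notified. The Python sketch below assumes that definition; the function name is illustrative, and the adjusted figures (7·3% and 13·6%) are not reproduced because the adjusted counts are not given in the abstract.

```python
def under_notification_pct(notified: int, total: float) -> float:
    """Percentage of cases not notified: 100 * (total - notified) / total."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * (total - notified) / total


NOTIFIED = 1298          # notified patients
OBSERVED_TOTAL = 1499    # patients identified after linkage and cross-validation
ESTIMATED_TOTAL = 2053   # initial log-linear capture-recapture estimate

observed = under_notification_pct(NOTIFIED, OBSERVED_TOTAL)
estimated = under_notification_pct(NOTIFIED, ESTIMATED_TOTAL)
print(round(observed, 1), round(estimated, 1))  # 13.4 36.8
```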

