Deriving household composition using population-scale electronic health record data—A reproducible methodology

PLoS ONE ◽  
2021 ◽  
Vol 16 (3) ◽  
pp. e0248195
Author(s):  
Rhodri D. Johnson ◽  
Lucy J. Griffiths ◽  
Joe P. Hollinghurst ◽  
Ashley Akbari ◽  
Alexandra Lee ◽  
...  

Background Physical housing and household composition play an important role in the lives of individuals and drive health and social outcomes and inequalities. Most methods to understand housing composition are based on survey or census data, and there is currently no reproducible methodology for creating population-level household composition measures using linked administrative data. Methods Using existing and more recent enhancements to the address-data linkage methods in the SAIL Databank, which rely on Residential Anonymised Linking Fields, we linked individuals to properties using the anonymised Welsh Demographic Service data in the SAIL Databank. We defined households, household size, and household composition measures based on adult-to-child relationships and on age differences between residents, which were used to create relative age measures. Results Two relative age-based algorithms were developed and returned similar results when applied to population and household-level data, describing household composition for 3.1 million individuals within 1.2 million households in Wales. The developed methods describe binary and count-level generational household composition measures. Conclusions Improved residential anonymised linkage field methods in SAIL have led to improved property-level data linkage, allowing the design and application of household composition measures that assign individuals to shared residences and allow the description of household composition across Wales. The reproducible methods create longitudinal, household-level composition measures at a population level using linked administrative data. Such measures are important for understanding an individual’s home and area environment in more detail, and how that environment may affect the health and wellbeing of the individual, other residents, and potentially the wider community.
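As a rough illustration of the Methods described above, the sketch below groups residents by a property identifier and derives household size, an adult/child flag, and a relative age-based generation count. It is not the authors' algorithm: the record layout, the adult threshold of 18 years, and the 16-year generation gap are all assumptions chosen only to make the example concrete.

```python
from collections import defaultdict
from datetime import date

# Hypothetical residency records: (person_id, property identifier, date of birth).
RESIDENTS = [
    ("p1", "ralf_A", date(1950, 5, 1)),
    ("p2", "ralf_A", date(1978, 3, 12)),
    ("p3", "ralf_A", date(2010, 9, 30)),
    ("p4", "ralf_B", date(1985, 1, 20)),
    ("p5", "ralf_B", date(1987, 7, 4)),
]

ADULT_AGE = 18       # assumed adult threshold, not taken from the paper
GENERATION_GAP = 16  # assumed minimum age gap (years) between generations


def age_on(dob, on):
    """Age in completed years on a reference date."""
    return on.year - dob.year - ((on.month, on.day) < (dob.month, dob.day))


def household_measures(residents, on):
    """Group residents by property and derive size and composition measures."""
    ages_by_property = defaultdict(list)
    for _person_id, ralf, dob in residents:
        ages_by_property[ralf].append(age_on(dob, on))

    measures = {}
    for ralf, ages in ages_by_property.items():
        ages.sort()
        # Start a new generation whenever a resident is at least
        # GENERATION_GAP years older than the current generation's youngest member.
        generations = 1
        anchor = ages[0]
        for age in ages[1:]:
            if age - anchor >= GENERATION_GAP:
                generations += 1
                anchor = age
        measures[ralf] = {
            "size": len(ages),
            "adults": sum(a >= ADULT_AGE for a in ages),
            "has_child": any(a < ADULT_AGE for a in ages),  # binary measure
            "generations": generations,                      # count measure
        }
    return measures


MEASURES = household_measures(RESIDENTS, date(2020, 1, 1))
```

Taking the reference date as a parameter means the same function can be re-run at successive dates, which is one simple way to obtain longitudinal household measures.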

2020 ◽  
Author(s):  
Rhodri David Johnson ◽  
Lucy J. Griffiths ◽  
Joe Hollinghurst ◽  
Ashley Akbari ◽  
Alexandra Lee ◽  
...  

Background Physical housing and household composition play an important role in the lives of individuals and drive health and social outcomes and inequalities. Most methods to understand housing composition are based on survey or census data, and there is currently no reproducible methodology for creating population-level household composition measures using linked administrative data. Methods Using existing and more recent enhancements to the address-data linkage methods in the SAIL Databank, which rely on Residential Anonymised Linking Fields, we linked individuals to properties using the anonymised Welsh Demographic Service data in the SAIL Databank. We defined households, household size, and household composition measures based on adult-to-child relationships and on age differences between residents, which were used to create relative age measures. Results Two relative age-based algorithms were developed and returned similar results when applied to population and household-level data, describing household composition for 3.1 million individuals within 1.2 million households in Wales. The developed methods describe binary and count-level generational household composition measures. Conclusions Improved residential anonymised linkage field methods in SAIL have led to improved property-level data linkage, allowing the design and application of household composition measures that assign individuals to shared residences and allow the description of household composition across Wales. The reproducible methods create longitudinal, household-level composition measures at a population level using linked administrative data. Such measures are important for understanding an individual’s home and area environment in more detail, and how that environment may affect the health and wellbeing of the individual, other residents, and potentially the wider community.


Author(s):  
Rhodri David Johnson ◽  
Liz Trinder ◽  
Simon Thompson ◽  
Jon Smart ◽  
Alexandra Lee ◽  
...  

Introduction Better use of administrative data is essential to enhance understanding of the family justice system, and of the characteristics and outcomes of children and families. The Nuffield Family Justice Observatory Data Partnership supports this aim through analyses of core family justice datasets. When a child is involved in family court proceedings in Wales, Cafcass Cymru are employed to represent the child’s best interests. This paper provides an overview of the Cafcass Cymru data, and of its linkage to population-level health and other administrative datasets held within the Secure Anonymised Information Linkage (SAIL) Databank. Two data linkage example analyses are described. Further research opportunities are outlined. Methods Cafcass Cymru data were transferred to SAIL using a standardised approach to provide de-identified data with Anonymised Linking Fields (ALF) for successfully matched records. Three cohorts were created: all individuals involved in family court applications; all individuals with an ALF allowing subsequent health data linkage; and all individuals with a Residential Anonymised Linking Field (RALF) and Lower Super Output Area (LSOA) enabling area-level deprivation analysis. Results Cafcass Cymru data are available containing 12,745 public law applications between 2011 and 2019, and 52,023 private law applications from 2005 to 2019. The overall match rate was 80%, with variations observed by time, law type, roles, gender and age. Forty per cent had a hospital inpatient admission in the 2 years before or after application receipt at Cafcass Cymru, of which 27% were emergency admissions; 54% had an emergency department attendance and 61% an outpatient appointment during the same period. Individuals involved in public or private law applications were more likely to reside in deprived areas.
Conclusion The Nuffield Family Justice Observatory Data Partnership will enhance research opportunities to better understand the family justice system and outcomes for children and families. Population level Cafcass Cymru data can be accessed through the SAIL Databank. Forthcoming data acquisition will also facilitate further analyses and insight.
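The three cohorts and the match rate described in the Methods and Results above can be pictured as simple filters over de-identified records. The following is a minimal sketch only: the field names, the example records, and the use of `None` for an unmatched field are assumptions, not the SAIL data model.

```python
# Hypothetical de-identified records; None marks a field that could not be assigned.
RECORDS = [
    {"person": 1, "alf": "a1", "ralf": "r1", "lsoa": "W01000001"},
    {"person": 2, "alf": "a2", "ralf": None, "lsoa": None},
    {"person": 3, "alf": None, "ralf": None, "lsoa": None},
    {"person": 4, "alf": "a4", "ralf": "r4", "lsoa": "W01000002"},
    {"person": 5, "alf": "a5", "ralf": "r5", "lsoa": None},
]


def build_cohorts(records):
    """Return the three cohorts: everyone, those with an ALF, those with RALF and LSOA."""
    all_individuals = list(records)
    with_alf = [r for r in records if r["alf"] is not None]
    with_ralf_and_lsoa = [
        r for r in records if r["ralf"] is not None and r["lsoa"] is not None
    ]
    return all_individuals, with_alf, with_ralf_and_lsoa


def match_rate(matched, total):
    """Share of individuals who received a linking field."""
    return len(matched) / len(total) if total else 0.0


ALL, WITH_ALF, WITH_RALF_LSOA = build_cohorts(RECORDS)
RATE = match_rate(WITH_ALF, ALL)
```

In this toy data four of five individuals have an ALF; the figure is illustrative and unrelated to the reported 80% match rate.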


2021 ◽  
Vol 50 (Supplement_1) ◽  
Author(s):  
Nadia Khan ◽  
Liane Ioannou ◽  
Charles Pilgrim ◽  
Arul Earnest ◽  
Ashika Maharaj ◽  
...  

Abstract Background Linked, population-level data is valuable for mapping patterns of care and evaluating health service utilisation, particularly in difficult-to-reach populations. Upper gastrointestinal (UGI) cancers have a dismal prognosis, creating difficulties engaging patients in research. A linked dataset is therefore of high value in this population. Methods Key objectives included identifying the operational and feasibility issues associated with linking Australian state-based administrative and registry data for understanding health service utilisation in UGI cancers. Datasets pertained to hospital admissions, radiotherapy, community health, primary care, palliative care, the Medicare and Pharmaceutical Benefits Schedules, and UGI cancers. Results From a logistical perspective, data access request approval processes varied, with some requiring consent to be sought from individual services contributing data. The availability of unique person-level identifying information varied widely. Additionally, the time period of data capture differed between and within datasets, limiting the quality of the linked data. Significant costs were associated with linking to primary care data and to the Medicare and Pharmaceutical Benefits Schedules. Federal dataset linkage required at least a one-year waiting period. Conclusions Whilst in theory data linkage is a powerful mechanism for obtaining population-level data, in reality there are many logistical and financial barriers to linking multiple datasets. Consequently, critical data, which has the potential to inform policy and improve patient outcomes, cannot be procured. Key messages Logistical and financial challenges are associated with linking administrative and registry datasets for research, limiting the potential of data linkage.


Author(s):  
Katie Irvine ◽  
Vivienna Ong ◽  
Simon Cooper ◽  
Sarah Thackway

Introduction Many population data linkage centres have been established to provide a mechanism for making linked administrative data available to approved third parties within robust governance frameworks. While current models support a wide variety of research, modifications are required for linked administrative data to better support biobanking research infrastructure. Objectives and Approach We have sought to reconfigure population data linkage services to enhance the value of a newly established state-of-the-art population and disease biobank embedded within a state-based pathology network, equipped with robotic technology, with the capacity to store and process more than 3 million samples from participants consenting to data linkage and future unspecified research. Results Three data service streams have been developed: longitudinal data linkage, cohort management and targeted recruitment. Traditional infrastructure for population data linkage will support the longitudinal data linkage stream, making data and biospecimens available for research without direct patient identifiers. Technical and governance changes are necessary to enable the rapid release of contemporaneous patient and health system data for cohort management and recruitment purposes. The cohort management stream seeks to significantly reduce the manual follow-up of administrative data. The newly developed targeted recruitment service will leverage the jurisdictional data holdings and the structure of the health system and pathology network to identify optimal sites and service providers for patient recruitment at scale, in an expedited manner. Conclusion/Implications Modest changes to population data infrastructure have significant potential to enhance biobank research infrastructure.
By fast-tracking biospecimen accrual for diseases of population subgroups of strategic importance, this new service is intended to promote biobank viability, accelerate the pace of clinical trials recruitment and improve patient access to trials.


2020 ◽  
Vol 27 (3) ◽  
pp. e100161
Author(s):  
Stephanie Garies ◽  
Erik Youngson ◽  
Boglarka Soos ◽  
Brian Forst ◽  
Kimberley Duerksen ◽  
...  

Objective To describe the process for linking electronic medical record (EMR) and administrative data in Alberta and examine the advantages and limitations of utilising linked data for hypertension surveillance. Methods De-identified EMR data from 323 primary care providers contributing to the Canadian Primary Care Sentinel Surveillance Network (CPCSSN) in Alberta were used. Mapping files from each contributing provider were generated from their EMR to facilitate linkage to administrative data within the provincial health data warehouse. Deterministic linkage was conducted using valid personal healthcare number (PHN) with age and/or sex. Characteristics of patients and providers in the linked cohort were compared with population-level sources. Criteria used to define hypertension in both sources were examined. Results Data were successfully linked for 6307 hypertensive patients (96.2% of eligible patients) from 49 contributing providers. Non-linkages from invalid PHN (n=246) occurred more for deceased patients and those with fewer primary care encounters, with differences due to type of EMR and patient EMR status. The linked cohort had more patients who were female, >60 years and residing in rural areas compared to the provincial healthcare registry. Family physicians were more often female and medically trained in Canada compared to all physicians in Alberta. Most patients (>97%) had ≥1 record in the registry, pharmacy, emergency/ambulatory care and claims databases; 44.3% had ≥1 record in the hospital discharge database. Conclusion EMR-administrative data linkage has the potential to enhance hypertension surveillance. The current linkage process in Alberta is limited and subject to selection bias. Processes to address these deficiencies are under way.
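The deterministic linkage step described above (a valid PHN plus agreement on age and/or sex) can be sketched as an exact-key join followed by a confirmation check. This is an illustrative sketch, not the Alberta process: the nine-digit validity rule, the use of birth year as a stand-in for age, and all field names and records are assumptions.

```python
def is_valid_phn(phn):
    """Placeholder validity rule (assumption): a string of exactly nine digits."""
    return isinstance(phn, str) and len(phn) == 9 and phn.isdigit()


def deterministic_link(emr_rows, admin_rows):
    """Link on exact PHN, confirmed by agreement on birth year and/or sex."""
    admin_by_phn = {row["phn"]: row for row in admin_rows}
    linked, unlinked = [], []
    for emr in emr_rows:
        admin = admin_by_phn.get(emr["phn"]) if is_valid_phn(emr["phn"]) else None
        confirmed = admin is not None and (
            emr["birth_year"] == admin["birth_year"] or emr["sex"] == admin["sex"]
        )
        if confirmed:
            linked.append((emr["emr_id"], admin["admin_id"]))
        else:
            unlinked.append(emr["emr_id"])
    return linked, unlinked


# Hypothetical records for illustration only.
EMR = [
    {"emr_id": "e1", "phn": "123456789", "birth_year": 1950, "sex": "F"},
    {"emr_id": "e2", "phn": "12345", "birth_year": 1960, "sex": "M"},      # invalid PHN
    {"emr_id": "e3", "phn": "987654321", "birth_year": 1971, "sex": "M"},  # no agreement
]
ADMIN = [
    {"admin_id": "a1", "phn": "123456789", "birth_year": 1950, "sex": "F"},
    {"admin_id": "a3", "phn": "987654321", "birth_year": 1940, "sex": "F"},
]

LINKED, UNLINKED = deterministic_link(EMR, ADMIN)
```

Returning the unlinked identifiers alongside the links makes it straightforward to profile non-linkages, for example by comparing them with the linked cohort on encounter counts.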


Author(s):  
Heidi J Welberry ◽  
Henry Brodaty ◽  
Benjumin Hsu ◽  
Sebastiano Barbieri ◽  
Louisa R Jorm

Introduction There is no gold standard method for monitoring dementia incidence in Australia. Routinely collected linked administrative data are increasingly being used to monitor endpoints in observational studies and clinical trials and could benefit dementia research. Objectives and Approach This study examines dementia incidence within different Australian administrative datasets and how characteristics vary across datasets for groups detected as having dementia. This was an observational data linkage study based on a prospective cohort of 267,153 people in New South Wales, Australia from the 45 and Up Study. Participants completed a survey in 2006-2009 and dementia was identified using linked pharmaceutical claims (provided by Services Australia), hospitalisations, assessments of aged care eligibility, care needs at entry to residential aged care and death certificates. Data linkage was undertaken by the Centre for Health Record Linkage (CHeReL) and the Australian Institute of Health and Welfare. Age-specific and age-standardised incidence rates, incidence rate ratios and survival from first dementia diagnosis were calculated. Results Age-standardised dementia incidence was 16.9 cases per 1000 person years (PY) for people aged 65 years and over. Estimates for those aged 80-89 years were closest to published incidence rates (91% of rates for high-income countries). Relationships with dementia incidence were inconsistent across datasets for characteristics including sex, relative socio-economic disadvantage, support network size, marital status, functional limitations and diabetes. Median survival from first pharmaceutical claim for an anti-dementia medicine was 3.7 years compared to 3.0 years from first aged care eligibility assessment, 2.0 years from a dementia-related hospitalisation and 1.8 years from first residential aged care needs assessment.
Conclusion / Implications People identified with dementia in different administrative datasets have different characteristics, reflecting the factors that drive interaction with specific services. Bias may be introduced if single data sources are used to identify dementia as an outcome in observational studies.
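For readers unfamiliar with the age-standardised rates reported above, the sketch below shows direct standardisation: age-specific rates are weighted by the age distribution of a standard population. The abstract does not state which standardisation method or standard population was used, so this is an assumption, and the strata, counts and weights are invented purely to illustrate the arithmetic.

```python
# Hypothetical age strata: (label, incident cases, person-years, standard weight).
# Weights are an assumed standard population distribution and sum to 1.
STRATA = [
    ("65-74", 50, 10_000, 0.50),
    ("75-84", 120, 6_000, 0.35),
    ("85+", 90, 2_000, 0.15),
]


def crude_rate_per_1000(strata):
    """Total cases divided by total person-years, per 1000 PY."""
    cases = sum(c for _label, c, _py, _w in strata)
    person_years = sum(py for _label, _c, py, _w in strata)
    return 1000 * cases / person_years


def age_standardised_rate_per_1000(strata):
    """Directly standardised rate: weighted sum of age-specific rates, per 1000 PY."""
    return 1000 * sum(w * c / py for _label, c, py, w in strata)


CRUDE = crude_rate_per_1000(STRATA)
STANDARDISED = age_standardised_rate_per_1000(STRATA)
```

With these invented inputs the age-specific rates are 5, 20 and 45 per 1000 PY, so the standardised rate is 0.50 × 5 + 0.35 × 20 + 0.15 × 45 = 16.25 per 1000 PY, while the crude rate is 260 / 18,000 × 1000 ≈ 14.4 per 1000 PY.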


Author(s):  
Daniel A Thompson ◽  
Mark Nieuwenhuijsen ◽  
James White ◽  
Rebecca Lovell ◽  
Mathew White ◽  
...  

Introduction A growing evidence base indicates health benefits are associated with access to green-blue spaces (GBS), such as beaches and parks. However, few studies have examined associations with changes in access to GBS over time. Objectives and Approach We have linked cross-sector data collected within Wales, United Kingdom, quarterly from 2008 to 2019, to examine the impact of GBS access on individual-level well-being and common mental health disorders (CMD). We created a longitudinal dataset of GBS access metrics, derived from satellite and administrative data sources, for 1.4 million homes in Wales. These household-level metrics were linked to individuals using the Welsh Demographic Service Dataset within the Secure Anonymised Information Linkage (SAIL) Databank. Linkage to Welsh Longitudinal General Practice data within SAIL enabled us to identify individual-level CMD over time. We also linked individual-level self-reported GBS use and well-being data from the National Survey for Wales (NSW) to routine data for cross-sectional survey participants. Results We created a longitudinal cohort panel capturing all 2.84 million adults aged 16+ living in Wales between 2008 and 2019 and with a general practitioner (GP) registration. Individual-level health data and household-level environmental metrics were linked for each quarter an individual is in the study. Household addresses were linked to 97% of the cohort, creating 110+ million rows of anonymously linked cross-sector data. The cohort provides an average follow-up period of 8 years, during which 565,168 (20%) adults received at least one CMD diagnosis or symptom. Conclusion / Implications This example of multi-sectoral data linkage across multiple environmental and administrative data sources has created a rich data source, which we will use to quantify the impact of changes in GBS access on individual-level CMD and well-being. This evidence will inform policy in the areas of health, planning and the environment.
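The quarterly panel described above, with one row per individual per quarter carrying that quarter's household-level metric, can be sketched as follows. The data structures, the metric values, and the rule of assigning residence on the first day of each quarter are assumptions for illustration and are not taken from the study.

```python
from datetime import date

# Hypothetical residence spells: (person_id, household_id, start, end).
SPELLS = [
    ("p1", "h1", date(2008, 1, 1), date(2008, 8, 15)),
    ("p1", "h2", date(2008, 8, 16), date(2009, 3, 31)),
]

# Hypothetical household-level GBS metric keyed by (household_id, quarter label).
GBS_METRIC = {
    ("h1", "2008Q1"): 0.42,
    ("h1", "2008Q2"): 0.40,
    ("h1", "2008Q3"): 0.41,
    ("h2", "2008Q4"): 0.10,
    ("h2", "2009Q1"): 0.12,
}


def quarter_starts(first_year, last_year):
    """Yield (label, first day) for every quarter in the inclusive year range."""
    for year in range(first_year, last_year + 1):
        for quarter, month in enumerate((1, 4, 7, 10), start=1):
            yield f"{year}Q{quarter}", date(year, month, 1)


def build_panel(spells, metric, first_year, last_year):
    """One row per person-quarter, using the household occupied on the quarter's first day."""
    rows = []
    for label, start in quarter_starts(first_year, last_year):
        for person, household, spell_start, spell_end in spells:
            if spell_start <= start <= spell_end:
                rows.append((person, label, household, metric.get((household, label))))
    return rows


PANEL = build_panel(SPELLS, GBS_METRIC, 2008, 2009)
```

A person who moves home mid-quarter is attributed to the new household from the next quarter onwards under this rule; other assignment rules (for example, majority of days in the quarter) would be equally plausible.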


Author(s):  
Joel Stafford

Background with rationale An overarching concern influencing models of data linkage for public good is the maintenance of personal privacy. This concern is at times so strong that it prevents or slows progress towards worthwhile linked administrative datasets across allied government departments, and even between distinct units within a single department. Where linkage has succeeded it has generally produced datasets that, by design, are difficult or impossible to re-identify, therefore meeting the requirement to guard privacy at the cost of the resulting data’s value to government decision makers. Main Aim The main aim of this paper is to convey criteria to inform data linkage policy and practice in government that maintain a central role for privacy, but which can better deliver on the promise of high-value data for policy. Methods/Approach This paper is informed by the Tassie Kids project, a longitudinal linked administrative data study using an embedded researcher model underway in Tasmania, Australia. Among other outcomes, the project was designed to assist allied government agencies to identify key policy leverage points across multiple services. Using the Tassie Kids project as a case study, this paper asks why allied departments don’t routinely link administrative data. Several important linked administrative data design principles are drawn from discussion of this question. Results The paper explains the practice implications of these design principles relevant to policy analysis and information management units in government. Conclusion The paper concludes with the suggestion that high-value linked administrative data is data that maximises its representation of the dynamic mechanisms that affect the outcomes desired by government, while simultaneously minimising the data’s distance from its point of origin.

