Deriving household composition using population-scale electronic health record data—A reproducible methodology

PLoS ONE ◽

10.1371/journal.pone.0248195 ◽

2021 ◽

Vol 16 (3) ◽

pp. e0248195

Author(s):

Rhodri D. Johnson ◽

Lucy J. Griffiths ◽

Joe P. Hollinghurst ◽

Ashley Akbari ◽

Alexandra Lee ◽

...

Keyword(s):

Administrative Data ◽

Data Linkage ◽

Population Level ◽

Household Composition ◽

Electronic Health Record Data ◽

Household Level ◽

Relative Age ◽

Level Data ◽

Address Data ◽

Linked Administrative Data

Background: Physical housing and household composition have an important role in the lives of individuals and drive health and social outcomes, and inequalities. Most methods to understand housing composition are based on survey or census data, and there is currently no reproducible methodology for creating population-level household composition measures using linked administrative data. Methods: Using existing, and more recent enhancements to the address-data linkage methods in the SAIL Databank using Residential Anonymised Linking Fields, we linked individuals to properties using the anonymised Welsh Demographic Service data in the SAIL Databank. We defined households, household size, and household composition measures based on adult to child relationships, and age differences between residents to create relative age measures. Results: Two relative age-based algorithms were developed and returned similar results when applied to population and household-level data, describing household composition for 3.1 million individuals within 1.2 million households in Wales. Developed methods describe binary, and count level generational household composition measures. Conclusions: Improved residential anonymised linkage field methods in SAIL have led to improved property-level data linkage, allowing the design and application of household composition measures that assign individuals to shared residences and allow the description of household composition across Wales. The reproducible methods create longitudinal, household-level composition measures at a population-level using linked administrative data. Such measures are important to help understand more detail about an individual’s home and area environment and how that may affect the health and wellbeing of the individual, other residents, and potentially into the wider community.
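The abstract above does not spell out the two relative age-based algorithms, so the following is only a minimal sketch of the general idea: group individuals by a shared residence identifier, then derive household size, adult and child counts, and binary and count-level generational measures from age differences. The record layout, the adult threshold of 18, and the 16-year generation gap are all assumptions made for illustration, not values taken from the paper.

```python
from collections import defaultdict

ADULT_AGE = 18        # assumption: threshold not stated in the abstract
GENERATION_GAP = 16   # assumption: age gap treated as a new generation


def household_measures(records):
    """Derive per-household measures from (person_id, residence_id, age) records.

    The record layout is hypothetical; real linked data would use an
    anonymised residence key rather than a plain string.
    """
    households = defaultdict(list)
    for _person_id, residence_id, age in records:
        households[residence_id].append(age)

    measures = {}
    for residence_id, ages in households.items():
        ages.sort()
        # Count generations by relative age: a resident at least
        # GENERATION_GAP years older than the youngest member of the
        # current generation starts a new generation.
        generations = 1
        start = ages[0]
        for age in ages[1:]:
            if age - start >= GENERATION_GAP:
                generations += 1
                start = age
        adults = sum(1 for a in ages if a >= ADULT_AGE)
        measures[residence_id] = {
            "size": len(ages),
            "adults": adults,
            "children": len(ages) - adults,
            "generations": generations,              # count-level measure
            "multigenerational": generations >= 2,   # binary measure
        }
    return measures


example = [
    (1, "A", 5), (2, "A", 8), (3, "A", 35), (4, "A", 37), (5, "A", 70),
    (6, "B", 40), (7, "B", 42),
]
result = household_measures(example)
```

In this toy input, residence "A" holds two children, two adults of similar age, and one much older adult, so the sketch reports three generations; residence "B" holds two adults of similar age and is reported as a single generation.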

How population-level data linkage might impact on dental research

Community Dentistry And Oral Epidemiology ◽

10.1111/j.1600-0528.2012.00726.x ◽

2012 ◽

Vol 40 ◽

pp. 90-94 ◽

Cited By ~ 6

Author(s):

Linda Slack-Smith

Keyword(s):

Data Linkage ◽

Population Level ◽

Dental Research ◽

Level Data

517 Challenges in data linkage – experiences from an upper gastrointestinal cancer data linkage study

International Journal of Epidemiology ◽

10.1093/ije/dyab168.342 ◽

2021 ◽

Vol 50 (Supplement_1) ◽

Author(s):

Nadia Khan ◽

Liane Ioannou ◽

Charles Pilgrim ◽

Arul Earnest ◽

Ashika Maharaj ◽

...

Keyword(s):

Primary Care ◽

Health Service ◽

Data Linkage ◽

Population Level ◽

Data Access ◽

Health Service Utilisation ◽

Service Utilisation ◽

Financial Barriers ◽

Level Data ◽

Upper Gastrointestinal

Background: Linked, population-level data is valuable for mapping patterns of care and evaluating health service utilisation, particularly in difficult-to-reach populations. Upper gastrointestinal (UGI) cancers have a dismal prognosis, creating difficulties engaging patients in research. The utility of a linked dataset in this population is of high value. Methods: Key objectives included identifying the operational and feasibility issues associated with linking Australian state-based administrative and registry data for understanding health service utilisation in UGI cancers. Datasets pertained to hospital admissions, radiotherapy, community health, primary care, palliative care, Medicare and Pharmaceutical Benefits Schedules and UGI cancers. Results: From a logistical perspective, data access request approval processes varied, with some requiring consent to be sought from individual services contributing data. The availability of unique person-level identifying information varied widely. Additionally, the time period of data capture differed between and within datasets, limiting the quality of the linked data. Significant costs were associated with linking with primary care and Medicare and Pharmaceutical Benefits Schedules. Federal dataset linkage required at least a one-year waiting period. Conclusions: Whilst in theory data linkage is a powerful mechanism for obtaining population-level data, in reality, there are many logistical and financial barriers to linking multiple datasets. Consequently, critical data, which has the potential to inform policy and improve patient outcomes, cannot be procured. Key messages: Logistical and financial challenges are associated with linking administrative and registry datasets for research, limiting the potential of data linkage.

Multimethod, multidataset analysis reveals paradoxical relationships between sociodemographic factors, Hispanic ethnicity and diabetes

BMJ Open Diabetes Research & Care ◽

10.1136/bmjdrc-2020-001725 ◽

2020 ◽

Vol 8 (2) ◽

pp. e001725

Author(s):

Gabriel M Knight ◽

Gabriela Spencer-Bonilla ◽

David M Maahs ◽

Manuel R Blum ◽

Areli Valencia ◽

...

Keyword(s):

Census Data ◽

Characteristic Curve ◽

Population Level ◽

Sociodemographic Factors ◽

Robert Wood Johnson Foundation ◽

County Level ◽

Ethnic Density ◽

Hispanic Ethnicity ◽

Individual Level ◽

Level Data

Introduction: Population-level and individual-level analyses have strengths and limitations as do ‘blackbox’ machine learning (ML) and traditional, interpretable models. Diabetes mellitus (DM) is a leading cause of morbidity and mortality with complex sociodemographic dynamics that have not been analyzed in a way that leverages population-level and individual-level data as well as traditional epidemiological and ML models. We analyzed complementary individual-level and county-level datasets with both regression and ML methods to study the association between sociodemographic factors and DM. Research design and methods: County-level DM prevalence, demographics, and socioeconomic status (SES) factors were extracted from the 2018 Robert Wood Johnson Foundation County Health Rankings and merged with US Census data. Analogous individual-level data were extracted from 2007 to 2016 National Health and Nutrition Examination Survey studies and corrected for oversampling with survey weights. We used multivariate linear (logistic) regression and ML regression (classification) models for county (individual) data. Regression and ML models were compared using measures of explained variation (area under the receiver operating characteristic curve (AUC) and R²). Results: Among the 3138 counties assessed, the mean DM prevalence was 11.4% (range: 3.0%–21.1%). Among the 12 824 individuals assessed, 1688 met DM criteria (13.2% unweighted; 10.2% weighted). Age, gender, race/ethnicity, income, and education were associated with DM at the county and individual levels. Higher county Hispanic ethnic density was negatively associated with county DM prevalence, while Hispanic ethnicity was positively associated with individual DM. ML outperformed regression in both datasets (mean R² of 0.679 vs 0.610, respectively (p<0.001) for county-level data; mean AUC of 0.737 vs 0.727 (p<0.0427) for individual-level data). Conclusions: Hispanic individuals are at higher risk of DM, while counties with larger Hispanic populations have lower DM prevalence. Analyses of population-level and individual-level data with multiple methods may afford more confidence in results and identify areas for further study.
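The two comparison metrics named in this abstract can be illustrated with a minimal, dependency-free sketch. It computes R² for continuous predictions (the county-level setting) and AUC for binary outcomes (the individual-level setting) on made-up toy values. It ignores survey weights, which the study applied, and it is not the authors' code.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case scores
    higher than a randomly chosen negative case (ties count as half).
    """
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy values, invented purely for illustration.
toy_r2 = r_squared([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
toy_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The pairwise form of AUC is quadratic in the number of cases, which is fine for a demonstration; a rank-based computation would be the usual choice at the scale of the datasets described here.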

Likelihood of knee replacement surgery up to 15 years after sports injury: A population-level data linkage study

Journal of Science and Medicine in Sport ◽

10.1016/j.jsams.2018.12.010 ◽

2019 ◽

Vol 22 (6) ◽

pp. 629-634 ◽

Cited By ~ 2

Author(s):

Ilana N. Ackerman ◽

Megan A. Bohensky ◽

Joanne L. Kemp ◽

Richard de Steiger

Keyword(s):

Knee Replacement ◽

Data Linkage ◽

Population Level ◽

Sports Injury ◽

Linkage Study ◽

Knee Replacement Surgery ◽

Replacement Surgery ◽

Level Data

P2Y12 Inhibitor Therapy is Underutilised in Patients Hospitalised for Acute Myocardial Infarction: Results From a New Population-Level Data Linkage in Australia

Heart Lung and Circulation ◽

10.1016/j.hlc.2018.06.647 ◽

2018 ◽

Vol 27 ◽

pp. S333

Author(s):

D. Brieger ◽

M. Falster ◽

A. Schaffer ◽

L. Jorm ◽

S. Pearson ◽

...

Keyword(s):

Myocardial Infarction ◽

Acute Myocardial Infarction ◽

Data Linkage ◽

Population Level ◽

Inhibitor Therapy ◽

P2Y12 Inhibitor ◽

Level Data

EPS5.5 How does having cystic fibrosis impact on birth weight? A population level data linkage study in Wales

Journal of Cystic Fibrosis ◽

10.1016/s1569-1993(17)30316-8 ◽

2017 ◽

Vol 16 ◽

pp. S48

Author(s):

D.K. Schlüter ◽

R. Griffiths ◽

A. Akbari ◽

M. Heaven ◽

P.J. Diggle ◽

...

Keyword(s):

Cystic Fibrosis ◽

Birth Weight ◽

Data Linkage ◽

Population Level ◽

Linkage Study ◽

Level Data

Evidence‐practice gaps in P2Y12 inhibitor use after hospitalisation for acute myocardial infarction: findings from a new population‐level data linkage in Australia

Internal Medicine Journal ◽

10.1111/imj.15036 ◽

2020 ◽

Author(s):

Michael O Falster ◽

Andrea L Schaffer ◽

Andrew Wilson ◽

Arthur Nasis ◽

Louisa R Jorm ◽

...

Keyword(s):

Myocardial Infarction ◽

Acute Myocardial Infarction ◽

Data Linkage ◽

Population Level ◽

Level Data ◽

Practice Gaps

A New Methodological Approach to Study Household Structure From Census and Survey Data

Sociological Methods & Research ◽

10.1177/0049124120986192 ◽

2021 ◽

pp. 004912412098619

Author(s):

Simona Bignami-Van Assche ◽

Virginie Boulet ◽

Charles-Olivier Simard

Keyword(s):

Sequence Analysis ◽

Survey Data ◽

Indigenous Peoples ◽

Census Data ◽

Methodological Approach ◽

Household Structure ◽

Mathematical Representation ◽

Household Level ◽

Level Data ◽

Household Members

How household-level data from censuses and surveys are analyzed to study household structure is an issue that has received little attention. The present study proposes a new methodological approach to address this gap. Specifically, we introduce the idea of the household configuration as a mathematical representation of observations from the household roster that uses the tools of sequence analysis to study relationships between household members. This “household configuration approach” is statistically efficient, captures the heterogeneity of family forms in a population, and is computationally simple. An application to Canadian census data for Indigenous and non-Indigenous peoples shows that our approach can yield interesting insights into household structure, otherwise not readily obtained.
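The abstract describes the household configuration only in general terms, so the sketch below is one hypothetical reading of it: each household roster is encoded as a canonical sequence of relationship-to-head codes, and identical sequences are tallied across households. The relationship labels and their ordering are assumptions chosen for illustration; the paper's actual coding scheme and sequence-analysis tools are not given here.

```python
from collections import Counter

# Hypothetical relationship codes and canonical ordering (not from the paper).
ORDER = {"head": 0, "spouse": 1, "child": 2, "parent": 3, "other": 4}


def configuration(roster):
    """Return a canonical sequence of relationship codes for one household."""
    return tuple(sorted(roster, key=ORDER.__getitem__))


def configuration_counts(rosters):
    """Tally how often each household configuration occurs."""
    return Counter(configuration(roster) for roster in rosters)


# Toy rosters, invented for illustration.
rosters = [
    ["child", "head", "spouse"],
    ["head", "spouse", "child"],
    ["head"],
    ["head", "parent", "child", "child"],
]
counts = configuration_counts(rosters)
```

Because the sequence is put into a canonical order first, rosters that list the same members in a different order map to the same configuration, which is what makes the tallies comparable across households.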

Hasty Pudding Versus Tasty Bread: Regional Variations in Diet and Nutrition during the Industrial Revolution

Local Population Studies ◽

10.35488/lps89.2012.9 ◽

2012 ◽

pp. 9-30 ◽

Cited By ~ 1

Author(s):

Sara Horrell ◽

Deborah Oxley

Keyword(s):

Dairy Products ◽

Industrial Revolution ◽

Local Economy ◽

Household Level ◽

Level Data ◽

Other Information ◽

Level Information ◽

Local Supply ◽

Key Aspects ◽

Nutritional Score

Using parish-level information from Sir F.M. Eden's The state of the poor (1797) we can identify typical diets for the counties of England. These diets varied considerably and afforded very different standards of nutrition. We compute a nutritional score for this diet, paying attention to the presence of vitamins, minerals and micronutrients shown to be essential for health and growth in constructing this measure. Other information in the reports allows us to relate county-level nutrition to factors in the local economy. In particular we find nutrition was positively related to the availability of common land in the area and to women's remunerated work if conducted from home. Lack of common land and little local supply of dairy products also pushed households into buying white wheaten bread rather than baking their own wholemeal loaf. Replicating some of this analysis with household-level data confirms these results. Diet also maps onto stature: male convicts to Australia were significantly taller if they originated in a county with a more nutritious diet. This verifies the important impact of nutrition on stature and demonstrates the sensitivity of height as a measure of key aspects of welfare.
