Deriving household composition using population-scale Electronic Health Record data – a reproducible methodology

2020 ◽  
Author(s):  
Rhodri David Johnson ◽  
Lucy J. Griffiths ◽  
Joe Hollinghurst ◽  
Ashley Akbari ◽  
Alexandra Lee ◽  
...  

Background Physical housing and household composition have an important role in the lives of individuals and drive health and social outcomes, and inequalities. Most methods to understand housing composition are based on survey or census data, and there is currently no reproducible methodology for creating population-level household composition measures using linked administrative data. Methods Using existing, and more recent enhancements to the address-data linkage methods in the SAIL Databank using Residential Anonymised Linking Fields we linked individuals to properties using the anonymised Welsh Demographic Service data in the SAIL Databank. We defined households, household size, and household composition measures based on adult to child relationships, and age differences between residents to create relative age measures. Results Two relative age-based algorithms were developed and returned similar results when applied to population and household-level data, describing household composition for 3.1 million individuals within 1.2 million households in Wales. Developed methods describe binary, and count level generational household composition measures. Conclusions Improved residential anonymised linkage field methods in SAIL have led to improved property-level data linkage, allowing the design and application of household composition measures that assign individuals to shared residences and allow the description of household composition across Wales. The reproducible methods create longitudinal, household-level composition measures at a population-level using linked administrative data. Such measures are important to help understand more detail about an individual’s home and area environment and how that may affect the health and wellbeing of the individual, other residents, and potentially into the wider community.
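As a rough illustration of how relative age measures could yield household size plus binary and count-level generational measures, the following is a minimal sketch and not the authors' algorithm. The function name `household_measures`, the 16-year generation gap, and the adult cut-off of 18 are all assumptions made for this example; the abstract does not state the thresholds used.

```python
from collections import defaultdict

# Assumed values for illustration only; the abstract does not give thresholds.
GENERATION_GAP_YEARS = 16  # minimum age gap treated as a new generation
ADULT_AGE = 18             # assumed adult/child cut-off


def household_measures(residents):
    """Summarise households from (household_id, age) pairs.

    Returns a dict keyed by household_id with size, adult and child counts,
    a generation count, and a binary multigenerational flag.
    """
    ages_by_household = defaultdict(list)
    for household_id, age in residents:
        ages_by_household[household_id].append(age)

    measures = {}
    for household_id, ages in ages_by_household.items():
        ages.sort()
        # Walk the sorted ages and start a new generation whenever two
        # neighbouring residents differ by at least the assumed gap.
        generations = 1
        for younger, older in zip(ages, ages[1:]):
            if older - younger >= GENERATION_GAP_YEARS:
                generations += 1
        measures[household_id] = {
            "size": len(ages),
            "adults": sum(a >= ADULT_AGE for a in ages),
            "children": sum(a < ADULT_AGE for a in ages),
            "generations": generations,             # count-level measure
            "multigenerational": generations > 1,   # binary measure
        }
    return measures


# Hypothetical records: two adults with two children, a single older adult,
# and a three-generation household.
records = [
    ("h1", 40), ("h1", 38), ("h1", 9), ("h1", 6),
    ("h2", 71),
    ("h3", 45), ("h3", 20), ("h3", 70),
]
result = household_measures(records)
```

Grouping by an anonymised residence key before comparing ages mirrors the abstract's order of operations (link individuals to properties, then derive measures), but real data would also need residency dates to make the measures longitudinal.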

PLoS ONE ◽  
2021 ◽  
Vol 16 (3) ◽  
pp. e0248195
Author(s):  
Rhodri D. Johnson ◽  
Lucy J. Griffiths ◽  
Joe P. Hollinghurst ◽  
Ashley Akbari ◽  
Alexandra Lee ◽  
...  

Background Physical housing and household composition have an important role in the lives of individuals and drive health and social outcomes, and inequalities. Most methods to understand housing composition are based on survey or census data, and there is currently no reproducible methodology for creating population-level household composition measures using linked administrative data. Methods Using existing, and more recent enhancements to the address-data linkage methods in the SAIL Databank using Residential Anonymised Linking Fields we linked individuals to properties using the anonymised Welsh Demographic Service data in the SAIL Databank. We defined households, household size, and household composition measures based on adult to child relationships, and age differences between residents to create relative age measures. Results Two relative age-based algorithms were developed and returned similar results when applied to population and household-level data, describing household composition for 3.1 million individuals within 1.2 million households in Wales. Developed methods describe binary, and count level generational household composition measures. Conclusions Improved residential anonymised linkage field methods in SAIL have led to improved property-level data linkage, allowing the design and application of household composition measures that assign individuals to shared residences and allow the description of household composition across Wales. The reproducible methods create longitudinal, household-level composition measures at a population-level using linked administrative data. Such measures are important to help understand more detail about an individual’s home and area environment and how that may affect the health and wellbeing of the individual, other residents, and potentially into the wider community.


2021 ◽  
Vol 50 (Supplement_1) ◽  
Author(s):  
Nadia Khan ◽  
Liane Ioannou ◽  
Charles Pilgrim ◽  
Arul Earnest ◽  
Ashika Maharaj ◽  
...  

Abstract Background Linked, population-level data is valuable for mapping patterns of care and evaluating health service utilisation, particularly in difficult-to-reach populations. Upper gastrointestinal (UGI) cancers have a dismal prognosis, creating difficulties engaging patients in research. The utility of a linked dataset in this population is of high value. Methods Key objectives included identifying the operational and feasibility issues associated with linking Australian state-based administrative and registry data for understanding health service utilisation in UGI cancers. Datasets pertained to hospital admissions, radiotherapy, community health, primary care, palliative care, Medicare and Pharmaceutical Benefits Schedule’s and UGI cancers. Results From a logistical perspective, data access request approval processes varied, with some requiring consent to be sought from individual services contributing data. The availability of unique person-level identifying information varied widely. Additionally, the time period of data capture differed between and within datasets, limiting the quality of the linked data. Significant costs were associated with linking with primary care and Medicare and Pharmaceutical Benefits Schedule’s. Federal dataset linkage required at least a one-year waiting period. Conclusions Whilst in theory data linkage is a powerful mechanism for obtaining population-level data, in reality, there are many logistical and financial barriers to linking multiple datasets. Consequently, critical data, which has the potential to inform policy and improve patient outcomes, cannot be procured. Key messages Logistical and financial challenges are associated with linking administrative and registry datasets for research, limiting the potential of data linkage.


2020 ◽  
Vol 8 (2) ◽  
pp. e001725
Author(s):  
Gabriel M Knight ◽  
Gabriela Spencer-Bonilla ◽  
David M Maahs ◽  
Manuel R Blum ◽  
Areli Valencia ◽  
...  

Introduction Population-level and individual-level analyses have strengths and limitations as do ‘blackbox’ machine learning (ML) and traditional, interpretable models. Diabetes mellitus (DM) is a leading cause of morbidity and mortality with complex sociodemographic dynamics that have not been analyzed in a way that leverages population-level and individual-level data as well as traditional epidemiological and ML models. We analyzed complementary individual-level and county-level datasets with both regression and ML methods to study the association between sociodemographic factors and DM. Research design and methods County-level DM prevalence, demographics, and socioeconomic status (SES) factors were extracted from the 2018 Robert Wood Johnson Foundation County Health Rankings and merged with US Census data. Analogous individual-level data were extracted from 2007 to 2016 National Health and Nutrition Examination Survey studies and corrected for oversampling with survey weights. We used multivariate linear (logistic) regression and ML regression (classification) models for county (individual) data. Regression and ML models were compared using measures of explained variation (area under the receiver operating characteristic curve (AUC) and R²). Results Among the 3138 counties assessed, the mean DM prevalence was 11.4% (range: 3.0%–21.1%). Among the 12 824 individuals assessed, 1688 met DM criteria (13.2% unweighted; 10.2% weighted). Age, gender, race/ethnicity, income, and education were associated with DM at the county and individual levels. Higher county Hispanic ethnic density was negatively associated with county DM prevalence, while Hispanic ethnicity was positively associated with individual DM. ML outperformed regression in both datasets (mean R² of 0.679 vs 0.610, respectively (p<0.001) for county-level data; mean AUC of 0.737 vs 0.727 (p<0.0427) for individual-level data). Conclusions Hispanic individuals are at higher risk of DM, while counties with larger Hispanic populations have lower DM prevalence. Analyses of population-level and individual-level data with multiple methods may afford more confidence in results and identify areas for further study.
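The gap between the unweighted (13.2%) and weighted (10.2%) prevalence reflects the correction for oversampling. The sketch below shows the arithmetic under simple assumptions: the `prevalence` helper and the toy weights are hypothetical, and a real NHANES analysis would use the survey's published weights and design variables rather than this plain weighted mean.

```python
def prevalence(cases, weights=None):
    """Share of cases (1 = meets criteria, 0 = does not).

    With no weights this is the plain proportion; with survey weights each
    respondent counts in proportion to the population they represent.
    """
    if weights is None:
        weights = [1.0] * len(cases)
    total_weight = sum(weights)
    return sum(c * w for c, w in zip(cases, weights)) / total_weight


# Reproduce the unweighted figure from the abstract: 1688 of 12 824.
reported = prevalence([1] * 1688 + [0] * (12824 - 1688))
reported_percent = round(100 * reported, 1)  # 13.2

# Toy example with made-up weights: cases come from an oversampled group,
# so each carries a smaller weight and the weighted prevalence is lower.
toy_cases = [1, 1, 0, 0]
toy_weights = [1.0, 1.0, 3.0, 3.0]
toy_unweighted = prevalence(toy_cases)             # 0.5
toy_weighted = prevalence(toy_cases, toy_weights)  # 0.25
```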


2017 ◽  
Vol 16 ◽  
pp. S48
Author(s):  
D.K. Schlüter ◽  
R. Griffiths ◽  
A. Akbari ◽  
M. Heaven ◽  
P.J. Diggle ◽  
...  

2021 ◽  
pp. 004912412098619
Author(s):  
Simona Bignami-Van Assche ◽  
Virginie Boulet ◽  
Charles-Olivier Simard

How household-level data from censuses and surveys are analyzed to study household structure is an issue that has received little attention. The present study proposes a new methodological approach to address this gap. Specifically, we introduce the idea of the household configuration as a mathematical representation of observations from the household roster that uses the tools of sequence analysis to study relationships between household members. This “household configuration approach” is statistically efficient, captures the heterogeneity of family forms in a population, and is computationally simple. An application to Canadian census data for Indigenous and non-Indigenous peoples shows that our approach can yield interesting insights into household structure, otherwise not readily obtained.
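To make the idea of a household configuration concrete, here is a minimal sketch that encodes each household roster as a sequence of relationship codes and tallies how often each configuration occurs. The relationship codes, the canonical ordering by sorting, and the `configuration` helper are assumptions for illustration; the abstract does not describe the authors' actual representation.

```python
from collections import Counter

# Hypothetical relationship-to-reference-person codes:
# "H" = reference person, "S" = spouse/partner, "C" = child, "P" = parent.


def configuration(roster):
    """Return a canonical, hashable sequence for one household roster.

    Sorting the codes makes the result independent of the order in which
    members were listed, so equivalent households compare equal.
    """
    return tuple(sorted(roster))


# Made-up rosters keyed by household identifier.
rosters = {
    1: ["H", "S", "C", "C"],
    2: ["H", "C", "S", "C"],  # same members as household 1, listed differently
    3: ["H", "P"],
}

# Frequency of each distinct configuration in the sample.
configuration_counts = Counter(configuration(r) for r in rosters.values())
```

Treating the configuration as a single categorical value is what lets heterogeneous family forms be counted and compared without fixing a list of household types in advance.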


2012 ◽  
pp. 9-30 ◽  
Author(s):  
Sara Horrell ◽  
Deborah Oxley

Using parish-level information from Sir F.M. Eden's The state of the poor (1797) we can identify typical diets for the counties of England. These diets varied considerably and afforded very different standards of nutrition. We compute a nutritional score for this diet, paying attention to the presence of vitamins, minerals and micronutrients shown to be essential for health and growth in constructing this measure. Other information in the reports allows us to relate county-level nutrition to factors in the local economy. In particular we find nutrition was positively related to the availability of common land in the area and to women's remunerated work if conducted from home. Lack of common land and little local supply of dairy products also pushed households into buying white wheaten bread rather than baking their own wholemeal loaf. Replicating some of this analysis with household-level data confirms these results. Diet also maps onto stature: male convicts to Australia were significantly taller if they originated in a county with a more nutritious diet. This verifies the important impact of nutrition on stature and demonstrates the sensitivity of height as a measure of key aspects of welfare.

