When the Census Comes Marching In: Challenges and Successes in Linking Individual-Level Census Records to the Utah Population Database

Author(s):  
Ken Smith ◽  
Alison Fraser

Introduction: The availability of historical, individual-level census records in the United States has grown in recent years. With access to identifiers, it is possible to link these records to existing databases. The performance of, and strategy for, these linking efforts are not well characterized. Objectives and Approach: The Utah Population Database (UPDB), launched in 1975, is a population registry comprising comprehensive data from genealogies, medical/vital records, and numerous administrative and demographic records spanning the past two centuries. Until now, the UPDB did not hold individual-level US Census records. The UPDB has massive volumes of cleaned identifiers and therefore serves as a “gold standard” representation of Utah’s population. The objective here is to describe the methods used, and the record-linking performance achieved, in linking census records to the UPDB for persons appearing in the 1880, 1900, 1910, 1920, 1930 and 1940 censuses. Results: We collaborated with FamilyTree, Ancestry, and IPUMS (University of Minnesota) for keying and preparing data from the 1880-1940 censuses. We then linked these records to the UPDB using probabilistic record linking methods and manual review. Linking rates by census year varied with the quality of the records and of electronic data capture, and with the specific Census fields available for a given census. Data quality was somewhat lower for the 1910 and 1940 censuses, and hence they had lower linking rates (66.9% and 70.4%, respectively). Household heads had higher linking rates (72% was the lowest, in 1940). We used household heads to help guide links to offspring and spouses, whose linking rates generally exceeded 75%. Non-family members and single men linked at much lower rates (<50%). Conclusion/Implications: This study found that linking census records to an existing population registry is feasible and relatively successful. Using the household/genealogy structure of the census is useful when linking to the genealogies in the UPDB. These links allow studies of the effects of early life conditions on later life outcomes.
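
The abstract above names probabilistic record linking followed by manual review but gives no implementation details. As a rough illustration only, the sketch below scores a candidate record pair with Fellegi-Sunter style agreement weights and sorts it into link, manual review, or non-link. The field names, the m/u probabilities, and the two thresholds are all hypothetical placeholders, not values from the UPDB linkage.

```python
import math

# Hypothetical per-field probabilities (illustrative only, not UPDB estimates):
#   m = P(field agrees | the two records are a true match)
#   u = P(field agrees | the two records are not a match)
FIELD_PARAMS = {
    "surname": (0.95, 0.01),
    "given_name": (0.90, 0.05),
    "birth_year": (0.85, 0.02),
    "birthplace": (0.80, 0.20),
}


def match_weight(agreement):
    """Sum the log2 likelihood ratios over the compared fields.

    `agreement` maps a field name to True (agrees) or False (disagrees).
    """
    total = 0.0
    for field, agrees in agreement.items():
        m, u = FIELD_PARAMS[field]
        if agrees:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


def classify(weight, lower=0.0, upper=8.0):
    """Three-way decision; the thresholds here are assumed, not published."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "manual review"


# A pair that agrees on surname and birth year but not on the other fields
# falls between the two thresholds and is routed to manual review.
pair = {"surname": True, "given_name": False, "birth_year": True, "birthplace": False}
decision = classify(match_weight(pair))
```

In practice the m and u probabilities would be estimated from the data, and household structure (for example, a linked household head) could be used to adjust the weights for spouses and offspring, as the abstract suggests.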

2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Nathan Maassel ◽  
Abbie Saccary ◽  
Daniel Solomon ◽  
David Stitelman ◽  
Yunshan Xu ◽  
...  

Abstract Background Despite a national decrease in emergency department visits in the United States during the first 10 months of the pandemic, preliminary Consumer Product Safety Commission data indicate increased firework-related injuries. We hypothesized an increase in firework-related injuries during 2020 compared to prior years, related to a corresponding increase in consumer firework sales. Methods The National Electronic Injury Surveillance System (NEISS) was queried from 2018 to 2020 for cases with product code 1313 (firework injury) and narratives containing “fireworks”. Population-based national estimates were calculated using US Census data, then compared across the three years of study inclusion. Patient demographic and available injury information was also tracked and compared across the three years. Firework sales data obtained from the American Pyrotechnics Association were determined for the same time period to examine trends in consumption. Results There were 935 firework-related injuries reported to the NEISS from 2018 to 2020, 47% of which occurred during 2020. National estimates for monthly injuries per million were 1.6 times greater in 2020 compared to 2019 (p < 0.0001) with no difference between 2018 and 2019 (p = 0.38). The same results were found when the month of July was excluded. Firework consumption in 2020 was 1.5 times greater than in 2019 or 2018, with a 55% increase in consumer firework sales and a 22% decrease in professional firework sales. Conclusions Firework-related injuries saw a substantial increase in 2020 compared to the two years prior, corroborated by a proportional increase in consumer firework sales. Increased incidence of firework-related injuries was detected even with the exclusion of the month of July, suggesting that the COVID-19 pandemic may have impacted firework epidemiology more broadly than US Independence Day celebrations.


1991 ◽  
Vol 11 (4) ◽  
pp. 357-398 ◽  
Author(s):  
Michael L. Cohen

Abstract: The census is a social fact, the outcome of a process that involves the interaction of public laws and institutions and citizens' responses to an official inquiry. However, it is not a ‘hard’ fact. Reasons for inevitable defects in the census count are listed in the first section; the second section reports efforts by the US Census Bureau to identify sources of error in census coverage, and make estimates of the size of the errors. The use of census data for policy purposes, such as political representation and allocating funds, makes these defects controversial. Errors may be removed by making adjustments to the initial census count. However, because adjustment reallocates resources between groups, it has become the subject of political conflict. The paper describes the conflict between statistical practices, laws and public policy about census adjustment in the United States, and concludes by considering the extent to which causes in America are likely to be found in other countries.


2019 ◽  
Vol 9 (5) ◽  
pp. 587-595 ◽  
Author(s):  
Carmen S Arriola ◽  
Lindsay Kim ◽  
Gayle Langley ◽  
Evan J Anderson ◽  
Kyle Openo ◽  
...  

Abstract Background Respiratory syncytial virus (RSV) is a major cause of hospitalizations in young children. We estimated the burden of community-onset RSV-associated hospitalizations among US children aged <2 years by extrapolating rates of RSV-confirmed hospitalizations in 4 surveillance states and using probabilistic multipliers to adjust for ascertainment biases. Methods From October 2014 through April 2015, clinician-ordered RSV tests identified laboratory-confirmed RSV hospitalizations among children aged <2 years at 4 influenza hospitalization surveillance network sites. Surveillance populations were used to estimate age-specific rates of RSV-associated hospitalization, after adjusting for detection probabilities. We extrapolated these rates using US census data. Results We identified 1554 RSV-associated hospitalizations in children aged <2 years. Of these, 27% were admitted to an intensive care unit, 6% needed mechanical ventilation, and 5 died. Most cases (1047/1554; 67%) had no underlying condition. Adjusted age-specific RSV hospitalization rates per 100 000 population were 1970 (95% confidence interval [CI], 1787 to 2177), 897 (95% CI, 761 to 1073), 531 (95% CI, 459 to 624), and 358 (95% CI, 317 to 405) for ages 0–2, 3–5, 6–11, and 12–23 months, respectively. Extrapolating to the US population, an estimated 49 509–59 867 community-onset RSV-associated hospitalizations among children aged <2 years occurred during the 2014–2015 season. Conclusions Our findings highlight the importance of RSV as a cause of hospitalization, especially among children aged <2 months. Our approach to estimating RSV-related hospitalizations could be used to provide a US baseline for assessing the impact of future interventions.
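
The extrapolation step described above (applying age-specific rates to census population counts) can be sketched as simple arithmetic. This is a simplified illustration, not the authors' procedure: it omits the probabilistic multipliers and confidence intervals, the function name is an assumption, and the population counts are made-up round numbers rather than census figures. Only the four rates come from the abstract.

```python
# Adjusted age-specific rates per 100,000 population, as quoted in the abstract.
RATES_PER_100K = {
    "0-2 months": 1970,
    "3-5 months": 897,
    "6-11 months": 531,
    "12-23 months": 358,
}

# Hypothetical population counts per age band. These are NOT census figures;
# they are round numbers chosen only to make the arithmetic easy to follow.
EXAMPLE_POPULATION = {
    "0-2 months": 100_000,
    "3-5 months": 100_000,
    "6-11 months": 200_000,
    "12-23 months": 400_000,
}


def extrapolate(rates_per_100k, population):
    """Return expected hospitalizations per age band and their total."""
    by_band = {
        band: rates_per_100k[band] * population[band] / 100_000
        for band in rates_per_100k
    }
    return by_band, sum(by_band.values())


by_band, total = extrapolate(RATES_PER_100K, EXAMPLE_POPULATION)
```

Substituting real census counts for each age band, and repeating the calculation at the lower and upper bounds of the adjusted rates, would give a range analogous in form to the one the abstract reports.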


Neurology ◽  
2019 ◽  
Vol 92 (10) ◽  
pp. e1029-e1040 ◽  
Author(s):  
Mitchell T. Wallin ◽  
William J. Culpepper ◽  
Jonathan D. Campbell ◽  
Lorene M. Nelson ◽  
Annette Langer-Gould ◽  
...  

Objective: To generate a national multiple sclerosis (MS) prevalence estimate for the United States by applying a validated algorithm to multiple administrative health claims (AHC) datasets. Methods: A validated algorithm was applied to private, military, and public AHC datasets to identify adult cases of MS between 2008 and 2010. In each dataset, we determined the 3-year cumulative prevalence overall and stratified by age, sex, and census region. We applied insurance-specific and stratum-specific estimates to the 2010 US Census data and pooled the findings to calculate the 2010 prevalence of MS in the United States cumulated over 3 years. We also estimated the 2010 prevalence cumulated over 10 years using 2 models and extrapolated our estimate to 2017. Results: The estimated 2010 prevalence of MS in the US adult population cumulated over 10 years was 309.2 per 100,000 (95% confidence interval [CI] 308.1–310.1), representing 727,344 cases. During the same time period, the MS prevalence was 450.1 per 100,000 (95% CI 448.1–451.6) for women and 159.7 (95% CI 158.7–160.6) for men (female:male ratio 2.8). The estimated 2010 prevalence of MS was highest in the 55- to 64-year age group. A US north-south decreasing prevalence gradient was identified. The estimated MS prevalence is also presented for 2017. Conclusion: The estimated US national MS prevalence for 2010 is the highest reported to date and provides evidence that the north-south gradient persists. Our rigorous algorithm-based approach to estimating prevalence is efficient and has the potential to be used for other chronic neurologic conditions.


2021 ◽  
Author(s):  
Robert L. Stout ◽  
Steven J. Rigatti

Abstract: As the COVID-19 pandemic continues to ravage the world, there is a great need to understand the dynamics of spread. Currently the seroprevalence of asymptomatic COVID-19 doubles every 3 months; this silent epidemic of new infections may be the main driving force behind the rapid increase in SARS-CoV-2 cases. Public health officials quickly recognized that clinical cases were just the tip of the iceberg. In fact, a great deal of the spread was being driven by the asymptomatically infected, who continued to go out, socialize and go to work. While seropositivity is an insensitive marker for acute infection, it does tell us about the prevalence of COVID-19 in the population. Objective: Describe the seroprevalence of SARS-CoV-2 infection in the United States over time. Methodology: Repeated convenience samples from a commercial laboratory dedicated to the assessment of life insurance applicants were tested for the presence of antibodies to SARS-CoV-2 in several time periods between May and December of 2020. US census data were used to estimate the population prevalence of seropositivity. Results: The raw seroprevalence in the May-June, September, and December timeframes was 3.0%, 6.6% and 10.4%, respectively. Higher rates were noted in younger vs. older age groups. Total seroprevalence in the US is estimated at 25.7 million cases. Conclusions: The seroprevalence of SARS-CoV-2 demonstrates a significantly larger pool of individuals who have contracted COVID-19 and recovered, implying a lower case rate of hospitalizations and deaths than has been reported so far.


2020 ◽  
Author(s):  
Monica E Ellwood-Lowe ◽  
Ruthe Foushee ◽  
Mahesh Srinivasan

Parents with fewer educational and economic resources (low socioeconomic-status, SES) tend to speak less to their children, with consequences for children’s later life outcomes. Despite this well-established and highly popularized link, surprisingly little research addresses why the SES “word gap” exists. Moreover, existing research focuses on individual-level explanations with little attention to structural constraints with which parents must contend. In two pre-registered studies, we test whether experiencing financial scarcity itself can suppress caregivers’ speech to their children. Study 1 suggests that caregivers who are prompted to reflect on scarcity—particularly those who reflect on financial scarcity—speak to their 3-year-olds less than a control group in a subsequent play session. Study 2 finds that caregivers speak less to their children at the end of the month—when they are more likely to be experiencing financial hardship—than the rest of the month. Thus, above and beyond the individual characteristics of parents, structural constraints may affect how much parents speak to their children.


2015 ◽  
Vol 36 (7) ◽  
pp. 1034-1057 ◽  
Author(s):  
Miao Chi

Purpose – The purpose of this paper is to investigate whether immigrants in the USA receive an earnings premium associated with marrying a native. Design/methodology/approach – The raw premium revealed by the 2000 US Census data is suspect due to possible endogeneity and selection bias. Instrumental variables estimation, a sample selection model, and a counterfactual construction method are used to address these issues. Findings – Results suggest a positive and modest intermarriage premium, although the magnitude varies with the estimation technique. The evidence is particularly strong for immigrants with high English proficiency, college graduates, and immigrants older than 12 upon arrival in the USA. Originality/value – It is shown that the size of intermarriage premiums varies significantly across different immigrant groups. The empirical results provide insights into the economic assimilation process and mechanisms through which intermarriage influences the labor market outcomes of immigrants.


Author(s):  
Janet L. Smith ◽  
Zafer Sonmez ◽  
Nicholas Zettel

Abstract: Income inequality in the United States has been growing since the 1980s and is particularly noticeable in large urban areas like the Chicago metro region. While not as high as that of New York or Los Angeles, the Gini coefficient for the Chicago metro area (0.48) was the same as that of the United States in 2015 but was rising at a faster rate, suggesting it will surpass the US national level in 2020. This chapter examines the Chicago region’s growing income inequality since 1980 using US Census data collected in 1990, 2000, 2010, and 2015, focusing on where people live based on occupation as well as income. When mapped out, the data show a city and region that is becoming more segregated by occupation and income as it becomes both richer and poorer. One result is a shrinking number of middle-class and mixed neighbourhoods. The resulting patterns of socioeconomic spatial segregation also align with patterns of racial/ethnic segregation attributed to historical housing development and market segmentation, as well as recent efforts to advance Chicago as a global city through tourism and real estate development.
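
For readers unfamiliar with the Gini coefficient quoted above, the sketch below computes it from a list of individual incomes using the standard rank-weighted formula. It is an illustration only: the chapter works from US Census data, which are typically published as grouped income bands, and the function name and toy inputs here are assumptions.

```python
def gini(incomes):
    """Gini coefficient of non-negative incomes.

    0 means everyone has the same income; values approaching 1 mean income
    is concentrated in very few hands.
    """
    values = sorted(incomes)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        raise ValueError("need at least one income and a positive total")
    # Rank-weighted sum over incomes sorted in ascending order (ranks 1..n).
    weighted = sum(rank * value for rank, value in enumerate(values, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Toy inputs, not real data.
equal_case = gini([5, 5, 5, 5])       # perfectly equal incomes
concentrated = gini([0, 0, 0, 1])     # one person holds all income
```

With grouped census data, the same quantity is usually approximated from the Lorenz curve built from cumulative population and income shares rather than from individual records.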


2021 ◽  
Author(s):  
Xiaolin Huang ◽  
Xiaojian Shao ◽  
Li Xing ◽  
Yushan Hu ◽  
Don Sin ◽  
...  

Background: COVID-19 is a highly transmissible infectious disease that has infected over 122 million individuals worldwide. To combat this pandemic, governments around the world have imposed lockdowns. However, the impact of these lockdowns on the rates of COVID-19 transmission in communities is not well known. Here, we used COVID-19 case counts from 3,000+ counties in the United States (US) to determine the relationship between lockdown, as well as other county factors, and the rate of COVID-19 spread in these communities. Methods: We merged county-specific COVID-19 case counts with US census data and the date of lockdown for each of the counties. We then applied a Functional Principal Component (FPC) analysis to this dataset to generate scores that described the trajectory of COVID-19 spread across the counties. We used machine learning methods to identify important county factors, including the date of lockdown, that significantly influenced the FPC scores. Findings: We found that the first FPC score accounted for up to 92.81% of the variation in the absolute rates of COVID-19 as well as the topology of COVID-19 spread over time at a county level. The relation between incidence of COVID-19 and time at a county level demonstrated a hockey-stick appearance with an inflection point approximately 7 days prior to the county reporting at least 5 new cases of COVID-19; beyond this inflection point, there was an exponential increase in incidence. Among the risk factors, lockdown and total population were the two most significant county features that influenced the rate of COVID-19 infection, while median family income, median age and within-county moves also substantially affected COVID-19 spread. Interpretation: Lockdowns are an effective way of controlling the spread of COVID-19 in communities. However, significant delays in lockdown cause a dramatic increase in case counts. Thus, the timing of the lockdown relative to the case count is an important consideration in controlling the pandemic in communities.

