Differential privacy in the 2020 Census will distort COVID-19 rates

Mapping Intimacies ◽

10.31235/osf.io/mvh5b ◽

2020 ◽

Author(s):

Mathew Hauer ◽

Alexis R Santos-Lozada

Keyword(s):

Differential Privacy ◽

The United States ◽

Mortality Rates ◽

Census Bureau ◽

Mortality Data ◽

Policy Makers ◽

Specific Mortality ◽

Population Sizes ◽

Us Census ◽

Population Counts

Scientists and policy makers rely on accurate population and mortality data to inform efforts regarding the coronavirus disease 2019 (COVID-19) pandemic, with age-specific mortality rates of high importance due to the concentration of COVID-19 deaths at older ages. Population counts, the principal denominators for calculating age-specific mortality rates, will be subject to noise infusion in the United States with the 2020 Census via a disclosure avoidance system based on differential privacy. Using COVID-19 mortality curves from the CDC, we show that differential privacy will introduce substantial distortion in COVID-19 mortality rates, sometimes causing mortality rates to exceed 100%, hindering our ability to understand the pandemic. This distortion is particularly large for population groupings with fewer than 1000 persons: 40% of all county-level age-sex groupings and 60% of race groupings. The US Census Bureau should consider a larger privacy budget and data users should consider pooling data to increase population sizes to minimize differential privacy’s distortion.
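
To illustrate the mechanism this abstract describes, the following minimal sketch assumes plain Laplace noise added to a population denominator. This is a simplification and not the Census Bureau's actual disclosure avoidance system; the function names, the noise scale, and the floor at 1 are hypothetical choices made for illustration only.

```python
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) variate as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count, scale, rng):
    """Return a noise-infused count, rounded and floored at 1.

    The floor at 1 is a hypothetical post-processing step that keeps the
    denominator usable; it is not taken from the abstract.
    """
    return max(1, round(true_count + laplace_noise(scale, rng)))


def mortality_rate(deaths, population):
    """Crude rate: deaths divided by the population denominator."""
    return deaths / population


def relative_distortion(deaths, true_pop, noisy_pop):
    """Relative error in the rate caused by replacing true_pop with noisy_pop."""
    true_rate = mortality_rate(deaths, true_pop)
    noisy_rate = mortality_rate(deaths, noisy_pop)
    return abs(noisy_rate - true_rate) / true_rate


# Hypothetical example: one noisy draw for a small grouping.
rng = random.Random(0)
example_denominator = noisy_count(50, scale=10.0, rng=rng)
```

Under these assumptions, a fixed shift of 20 persons changes the rate by 25% when the true population is 100 but by only about 0.02% when it is 100,000, and a rate exceeds 100% whenever the noisy denominator falls below the death count (for example, 6 deaths over a noisy count of 4).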


Differential Privacy in the 2020 Census Will Distort COVID-19 Rates

Socius Sociological Research for a Dynamic World ◽

10.1177/2378023121994014 ◽

2021 ◽

Vol 7 ◽

pp. 237802312199401

Author(s):

Mathew E. Hauer ◽

Alexis R. Santos-Lozada

Keyword(s):

Differential Privacy ◽

The United States ◽

Mortality Rates ◽

Census Bureau ◽

Mortality Data ◽

County Level ◽

Specific Mortality ◽

Population Counts ◽

Privacy Budget ◽

The U.S

Scholars rely on accurate population and mortality data to inform efforts regarding the coronavirus disease 2019 (COVID-19) pandemic, with age-specific mortality rates of high importance because of the concentration of COVID-19 deaths at older ages. Population counts, the principal denominators for calculating age-specific mortality rates, will be subject to noise infusion in the United States with the 2020 census through a disclosure avoidance system based on differential privacy. Using empirical COVID-19 mortality curves, the authors show that differential privacy will introduce substantial distortion in COVID-19 mortality rates, sometimes causing mortality rates to exceed 100 percent, hindering our ability to understand the pandemic. This distortion is particularly large for population groupings with fewer than 1,000 persons: 40 percent of all county-level age-sex groupings and 60 percent of race groupings. The U.S. Census Bureau should consider a larger privacy budget, and data users should consider pooling data to minimize differential privacy’s distortion.


Changes in Census Data Will Affect Our Understanding of Infant Health

Socius Sociological Research for a Dynamic World ◽

10.1177/23780231211023642 ◽

2021 ◽

Vol 7 ◽

pp. 237802312110236

Author(s):

Alexis R. Santos-Lozada

Keyword(s):

Infant Mortality ◽

Infant Health ◽

Census Data ◽

Differential Privacy ◽

The United States ◽

Mortality Rates ◽

Census Bureau ◽

Health Statistics ◽

Health Dynamics ◽

The Impact

Descriptions of the effect of the implementation of a new disclosure avoidance system (DAS), which relies on differential privacy, emphasize the impact on our understanding of contemporary social and health dynamics. However, focusing on the overall population may obscure important changes in subpopulation indicators, such as age-specific rates, resulting from this implementation. The author provides a visualization that compares infant mortality rates calculated using 2009–2011 county-level average death counts and denominators derived from the traditional and proposed DASs. Death counts come from the National Center for Health Statistics and denominators come from the first U.S. Census Bureau demonstration products. These visualizations indicate that infant mortality rates produced using the proposed DAS are different from those produced using the traditional methods, with higher variation observed for nonmetropolitan counties and areas with smaller populations. These findings suggest that the proposed DAS will hinder our ability to understand contemporary health dynamics in the United States.


How differential privacy will affect our understanding of population growth in the United States

10.31235/osf.io/pmux7 ◽

2020 ◽

Author(s):

Alexis R Santos-Lozada ◽

Danilo T Perez-Rivera ◽

Aarti C. Bhat

Keyword(s):

Population Growth ◽

Census Data ◽

Differential Privacy ◽

The United States ◽

The Us ◽

Potential Impact ◽

Us Census ◽

Population Counts ◽

Metropolitan Counties ◽

Racial Ethnic

The implementation of a proposed differential privacy algorithm in 2020 US Census data releases and other census products has prompted discussion about the consistency and reliability of the data produced under the proposed disclosure avoidance system. We test the potential impact of this change in disclosure avoidance systems on the tracking of population growth and distribution using county-level population counts. We ask how population counts produced under the differential privacy algorithm might lead to different conclusions regarding population growth for the total population and three major racial/ethnic groups in comparison to counts produced using the traditional methods. Our results suggest that the implementation of differential privacy, as proposed, will impact our understanding of population changes in the US. We find potential for overstating and understating growth and decline, with these effects being more pronounced for non-Hispanic blacks and Hispanics, as well as for non-metropolitan counties. These findings draw attention to the potential local consequences of the implementation of differential privacy for tracking demographic changes of the US population, which is bound to have implications for our understanding of the transformations the nation is going through.


Outcomes for Children and Adolescents With Cancer: Challenges for the Twenty-First Century

Journal of Clinical Oncology ◽

10.1200/jco.2009.27.0421 ◽

2010 ◽

Vol 28 (15) ◽

pp. 2625-2634 ◽

Cited By ~ 593

Author(s):

Malcolm A. Smith ◽

Nita L. Seibel ◽

Sean F. Altekruse ◽

Lynn A.G. Ries ◽

Danielle L. Melbert ◽

...

Keyword(s):

United States ◽

Childhood Cancer ◽

Cancer Mortality ◽

Lymphoblastic Leukemia ◽

The United States ◽

Mortality Rates ◽

Mortality Data ◽

Childhood Cancers ◽

Essential Information ◽

The Impact

Purpose. This report provides an overview of current childhood cancer statistics to facilitate analysis of the impact of past research discoveries on outcome and provide essential information for prioritizing future research directions. Methods. Incidence and survival data for childhood cancers came from the Surveillance, Epidemiology, and End Results 9 (SEER 9) registries, and mortality data were based on deaths in the United States that were reported by states to the Centers for Disease Control and Prevention by underlying cause. Results. Childhood cancer incidence rates increased significantly from 1975 through 2006, with increasing rates for acute lymphoblastic leukemia being most notable. Childhood cancer mortality rates declined by more than 50% between 1975 and 2006. For leukemias and lymphomas, significantly decreasing mortality rates were observed throughout the 32-year period, though the rate of decline slowed somewhat after 1998. For remaining childhood cancers, significantly decreasing mortality rates were observed from 1975 to 1996, with stable rates from 1996 through 2006. Increased survival rates were observed for all categories of childhood cancers studied, with the extent and temporal pace of the increases varying by diagnosis. Conclusion. When 1975 age-specific death rates for children are used as a baseline, approximately 38,000 childhood malignant cancer deaths were averted in the United States from 1975 through 2006 as a result of more effective treatments identified and applied during this period. Continued success in reducing childhood cancer mortality will require new treatment paradigms building on an increased understanding of the molecular processes that promote growth and survival of specific childhood cancers.


Choice and Conflict about Census Data Adjusting the American Census Count

Journal of Public Policy ◽

10.1017/s0143814x00006322 ◽

1991 ◽

Vol 11 (4) ◽

pp. 357-398 ◽

Cited By ~ 2

Author(s):

Michael L. Cohen

Keyword(s):

Census Data ◽

Political Representation ◽

The United States ◽

Political Conflict ◽

Census Bureau ◽

Census Count ◽

Social Fact ◽

The Us ◽

Census Adjustment ◽

Us Census

The census is a social fact, the outcome of a process that involves the interaction of public laws and institutions and citizens' responses to an official inquiry. However, it is not a ‘hard’ fact. Reasons for inevitable defects in the census count are listed in the first section; the second section reports efforts by the US Census Bureau to identify sources of error in census coverage, and make estimates of the size of the errors. The use of census data for policy purposes, such as political representation and allocating funds, makes these defects controversial. Errors may be removed by making adjustments to the initial census count. However, because adjustment reallocates resources between groups, it has become the subject of political conflict. The paper describes the conflict between statistical practices, laws and public policy about census adjustment in the United States, and concludes by considering the extent to which causes in America are likely to be found in other countries.


Stroke Mortality Trends in the Population of Klaipėda From 1994 to 2008

Medicina ◽

10.3390/medicina47090071 ◽

2011 ◽

Vol 47 (9) ◽

pp. 512 ◽

Cited By ~ 3

Author(s):

Henrikas Kazlauskas ◽

Nijolė Raškauskienė ◽

Rima Radžiuvienė ◽

Vinsas Janušonis

Keyword(s):

Mortality Rate ◽

Mortality Rates ◽

Mortality Data ◽

World Population ◽

Stroke Mortality ◽

Middle Aged ◽

Elderly Men ◽

Men And Women ◽

Specific Mortality ◽

Permanent Residents

The objective of the study was to evaluate the trends in stroke mortality in the population of Klaipėda aged 35–79 years from 1994 to 2008. Material and Methods. Mortality data on all permanent residents of Klaipėda aged 35–79 years who died from stroke in 1994–2008 were gathered for the study. All death certificates of permanent residents of Klaipėda aged 35–79 years who died during 1994–2008 were examined in this study. The International Classification of Diseases (ICD-9 codes 430–436, and ICD-10 codes I60–I64) was used. Sex-specific mortality rates were standardized according to Segi’s world population; all the mortality rates were calculated per 100 000 population per year. Trends in stroke mortality were estimated using log-linear regression models. Sex-specific mortality rates and trends were calculated for 3 age groups (35–79, 35–64, and 65–79 years). Results. During the entire study period (1994–2008), a marked decline in stroke mortality with a clear slowdown after 2002 was observed. The average annual percent changes in mortality rates for men and women aged 35–79 years were –4.6% (P=0.041) and –6.5% (P=0.002), respectively. From 1994 to 2002, the stroke mortality rate decreased consistently among both Klaipėda men and women aged 35–64 years (20.4% per year, P=0.002, and 14.7% per year, P=0.006, respectively) and in the elderly population aged 65–79 years (13.8% per year, P=0.005; and 12% per year, P=0.019). During 2003–2008, stroke mortality increased by 16.3% per year in middle-aged men (35–64 years), whereas among women (aged 35–64 and 65–79 years) and elderly men (aged 65–79 years), the age-adjusted mortality rate remained relatively unchanged. Conclusions. Among both men and women, the mortality rates from stroke sharply declined between 1994 and 2008 with a clear slowdown in the decline after 2002. Stroke mortality increased significantly among middle-aged men from 2003, while it remained without significant changes among women of the same age and both elderly men and women.
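
The average annual percent change reported in this abstract comes from log-linear regression. As a minimal sketch of that general technique (not the authors' code), the snippet below fits ln(rate) = a + b × year by ordinary least squares and converts the slope to a percent change. The function name and the example series are hypothetical.

```python
import math


def annual_percent_change(years, rates):
    """Fit ln(rate) = a + b * year by least squares; return (exp(b) - 1) * 100."""
    n = len(years)
    logs = [math.log(r) for r in rates]
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    sxx = sum((x - mean_x) ** 2 for x in years)
    slope = sxy / sxx
    return (math.exp(slope) - 1.0) * 100.0


# Hypothetical series: a rate of 100 per 100 000 falling by 5% each year.
years = list(range(1994, 2004))
rates = [100.0 * 0.95 ** i for i in range(10)]
apc = annual_percent_change(years, rates)
```

Because the hypothetical series declines by exactly 5% per year, the fitted annual percent change is approximately –5%.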


Budget sharing for multi-analyst differential privacy

Proceedings of the VLDB Endowment ◽

10.14778/3467861.3467870 ◽

2021 ◽

Vol 14 (10) ◽

pp. 1805-1817

Author(s):

David Pujol ◽

Yikai Wu ◽

Brandon Fain ◽

Ashwin Machanavajjhala

Keyword(s):

Optimization Problem ◽

Differential Privacy ◽

Census Bureau ◽

Query Answering ◽

Multiple Stakeholders ◽

The Us ◽

Us Census ◽

Privacy Budget ◽

Summary Data ◽

Single Set

Large organizations that collect data about populations (like the US Census Bureau) release summary statistics that are used by multiple stakeholders for resource allocation and policy making problems. These organizations are also legally required to protect the privacy of individuals from whom they collect data. Differential Privacy (DP) provides a solution to release useful summary data while preserving privacy. Most DP mechanisms are designed to answer a single set of queries. In reality, there are often multiple stakeholders that use a given data release and have overlapping but not-identical queries. This introduces a novel joint optimization problem in DP where the privacy budget must be shared among different analysts. We initiate study into the problem of DP query answering across multiple analysts. To capture the competing goals and priorities of multiple analysts, we formulate three desiderata that any mechanism should satisfy in this setting - The Sharing Incentive, Non-interference, and Adaptivity - while still optimizing for overall error. We demonstrate how existing DP query answering mechanisms in the multi-analyst settings fail to satisfy at least one of the desiderata. We present novel DP algorithms that provably satisfy all our desiderata and empirically show that they incur low error on realistic tasks.


Persistent and extreme outliers in causes of death by state, 1999-2013

10.7287/peerj.preprints.1268v2 ◽

2015 ◽

Author(s):

Francis P Boscoe

Keyword(s):

United States ◽

Heart Disease ◽

Causes Of Death ◽

The United States ◽

Mortality Rates ◽

National Rate ◽

Common Causes ◽

Specific Mortality

In the United States, state-specific mortality rates that are high relative to national rates can result from legitimate reasons or from variability in coding practices. This paper identifies instances of state-specific mortality rates that were at least twice the national rate in each of three consecutive five-year periods (termed persistent outliers), along with rates that were at least five times the national rate in at least one five-year period (termed extreme outliers). The resulting set of 71 outliers, 12 of which appeared on both lists, illuminates mortality variations within the country, including some that are amenable to improvement either because they represent preventable causes of death or highlight weaknesses in coding techniques. Because the approach used here is based on relative rather than absolute mortality, it is not dominated by the most common causes of death such as heart disease and cancer.
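
The two outlier definitions in this abstract can be expressed as a short rule. The sketch below is an illustration under stated assumptions, not the paper's implementation: the function name and inputs are hypothetical, and each list is assumed to hold one rate per five-year period.

```python
def classify_outlier(state_rates, national_rates,
                     persistent_ratio=2.0, extreme_ratio=5.0):
    """Return (persistent, extreme) flags for one state and cause of death.

    persistent: the state rate is at least twice the national rate in each
                of three consecutive five-year periods.
    extreme:    the state rate is at least five times the national rate in
                at least one five-year period.
    """
    ratios = [s / n for s, n in zip(state_rates, national_rates)]
    persistent = len(ratios) == 3 and all(r >= persistent_ratio for r in ratios)
    extreme = any(r >= extreme_ratio for r in ratios)
    return persistent, extreme
```

Because both tests use the ratio of state to national rates, a rare cause of death can be flagged as readily as a common one, which matches the abstract's point about relative rather than absolute mortality.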


Racial disparities in COVID-19 mortality are driven by unequal infection risks.

10.1101/2020.09.10.20192369 ◽

2020 ◽

Cited By ~ 3

Author(s):

Jon Zelner ◽

Rob Trangucci ◽

Ramya Naraharisetti ◽

Alex Cao ◽

Ryan Malosh ◽

...

Keyword(s):

Racial Disparities ◽

Data Science ◽

Age Groups ◽

Case Fatality ◽

The United States ◽

Mortality Rates ◽

Bayesian Regression ◽

Mortality Data ◽

Incidence And Mortality ◽

The U.S

Background. As of August 5, 2020, there were more than 4.8M confirmed and probable cases and 159K deaths attributable to SARS-CoV-2 in the United States, with these numbers undoubtedly reflecting a significant underestimate of the true toll. Geographic, racial-ethnic, age and socioeconomic disparities in exposure and mortality are key features of the first and second waves of the U.S. COVID-19 epidemic. Methods. We used individual-level COVID-19 incidence and mortality data from the U.S. state of Michigan to estimate age-specific incidence and mortality rates by race/ethnic group. Data were analyzed using hierarchical Bayesian regression models, and model results were validated using posterior predictive checks. Findings. In crude and age-standardized analyses we found rates of incidence and mortality more than twice as high as those of Whites for all groups other than Native Americans. Of these, Blacks experienced the greatest burden of confirmed and probable COVID-19 infection (age-standardized incidence = 1,644/100,000 population) and mortality (age-standardized mortality rate 251/100,000). These rates reflect large disparities, as Blacks experienced age-standardized incidence and mortality rates 5.6 (95% CI = 5.5, 5.7) and 6.9 (6.5, 7.3) times higher than Whites, respectively. We also found that the bulk of the disparity in mortality between Blacks and Whites is driven by dramatically higher rates of COVID-19 infection across all age groups, particularly among older adults, rather than age-specific variation in case-fatality rates. Interpretation. This work suggests that well-documented racial disparities in COVID-19 mortality in hard-hit settings, such as the U.S. state of Michigan, are driven primarily by variation in household, community and workplace exposure rather than case-fatality rates. Funding. This work was supported by a COVID-PODS grant from the Michigan Institute for Data Science (MIDAS) at the University of Michigan. The funding source had no role in the preparation of this manuscript.


Credible Regression Approaches to Forecast Mortality for Populations with Limited Data

Risks ◽

10.3390/risks7010027 ◽

2019 ◽

Vol 7 (1) ◽

pp. 27 ◽

Cited By ~ 1

Author(s):

Apostolos Bozikas ◽

Georgios Pitselis

Keyword(s):

Mortality Rates ◽

Random Coefficients ◽

Mortality Data ◽

Extrapolation Methods ◽

Limited Data ◽

Regression Approach ◽

Specific Mortality

In this paper, we propose a credible regression approach with random coefficients to model and forecast the mortality dynamics of a given population with limited data. Age-specific mortality rates are modelled and extrapolation methods are utilized to estimate future mortality rates. The results on Greek mortality data indicate that credibility regression contributed to more accurate forecasts than those produced from the Lee–Carter and Cairns–Blake–Dowd models. An application on pricing insurance-related products is also provided.
