Creating Unbiased Machine Learning Models by Design

2021 ◽  
Vol 14 (11) ◽  
pp. 565
Author(s):  
Joseph L. Breeden ◽  
Eugenia Leonova

Unintended bias against protected groups has become a key obstacle to the widespread adoption of machine learning methods. This work presents a modeling procedure that builds models around protected class information to ensure that the final machine learning model is independent of protected class status, even in a nonlinear sense. The procedure works for any machine learning method. It was tested on subprime credit card data combined with demographic data by zip code from the US Census. The census data are an imperfect proxy for borrower demographics but suffice to illustrate the procedure.

1991 ◽  
Vol 11 (4) ◽  
pp. 357-398 ◽  
Author(s):  
Michael L. Cohen

Abstract: The census is a social fact, the outcome of a process that involves the interaction of public laws and institutions and citizens' responses to an official inquiry. However, it is not a ‘hard’ fact. Reasons for inevitable defects in the census count are listed in the first section; the second section reports efforts by the US Census Bureau to identify sources of error in census coverage, and make estimates of the size of the errors. The use of census data for policy purposes, such as political representation and allocating funds, makes these defects controversial. Errors may be removed by making adjustments to the initial census count. However, because adjustment reallocates resources between groups, it has become the subject of political conflict. The paper describes the conflict between statistical practices, laws and public policy about census adjustment in the United States, and concludes by considering the extent to which causes in America are likely to be found in other countries.


Circulation ◽  
2014 ◽  
Vol 129 (suppl_1) ◽  
Author(s):  
Heidi Mochari-Greenberger ◽  
Amytis Towfighi ◽  
Lori Mosca

Background: Early treatment is associated with better clinical outcomes in stroke, but women must recognize the warning signs of a stroke to reduce delays in treatment. The purpose of this study was to evaluate contemporary knowledge of stroke warning signs and intent to call 9-1-1 first if warning signs occur, among a nationally representative sample of women, overall and by race/ethnic group. Methods: A study of cardiovascular disease awareness and knowledge was conducted by the American Heart Association in 2012 among English speaking US women > 25 years identified through random digit dialing (N=1,205; 54% white, 17% black, 17% Hispanic, 12% other). Demographic data, including race/ethnic group, were evaluated using standardized categorical questions. Knowledge about warning signs of stroke, and what to do first if experiencing signs of a stroke, was assessed by standardized unaided questions. Data were weighted to reflect the US population of women based on the US Census Bureau’s March 2011 Current Population Survey, overall and within ethnic strata. Results: In 2012, half of women surveyed (51%) identified sudden weakness/numbness of face/limb on one side as a stroke warning sign; this did not vary by race/ethnic group. Loss of/trouble talking/understanding speech was identified by 44% of women, and more frequently among white versus Hispanic women (48% vs. 36%; p<.05). Fewer than one in four women identified sudden severe headache (23%), unexplained dizziness (20%), or sudden dizziness/loss of vision (18%) as warning signs, and one in five (20%) did not know one stroke warning sign; these results did not vary by race/ethnicity. The majority of women said that they would call 9-1-1 first if they thought they were experiencing signs of a stroke (84%), and this did not vary among black (86%), Hispanic (79%), or white/other (85%) women. 
Conclusions: Knowledge of stroke warning signs was low among a nationally representative sample of women, especially among Hispanics. In contrast, knowledge to call 9-1-1 when experiencing signs of stroke was high. These data suggest that efforts to improve recognition of the warning signs of stroke have the potential to reduce treatment delay and improve outcomes among women.


Neurology ◽  
2019 ◽  
Vol 92 (10) ◽  
pp. e1029-e1040 ◽  
Author(s):  
Mitchell T. Wallin ◽  
William J. Culpepper ◽  
Jonathan D. Campbell ◽  
Lorene M. Nelson ◽  
Annette Langer-Gould ◽  
...  

Objective: To generate a national multiple sclerosis (MS) prevalence estimate for the United States by applying a validated algorithm to multiple administrative health claims (AHC) datasets. Methods: A validated algorithm was applied to private, military, and public AHC datasets to identify adult cases of MS between 2008 and 2010. In each dataset, we determined the 3-year cumulative prevalence overall and stratified by age, sex, and census region. We applied insurance-specific and stratum-specific estimates to the 2010 US Census data and pooled the findings to calculate the 2010 prevalence of MS in the United States cumulated over 3 years. We also estimated the 2010 prevalence cumulated over 10 years using 2 models and extrapolated our estimate to 2017. Results: The estimated 2010 prevalence of MS in the US adult population cumulated over 10 years was 309.2 per 100,000 (95% confidence interval [CI] 308.1–310.1), representing 727,344 cases. During the same time period, the MS prevalence was 450.1 per 100,000 (95% CI 448.1–451.6) for women and 159.7 (95% CI 158.7–160.6) for men (female:male ratio 2.8). The estimated 2010 prevalence of MS was highest in the 55- to 64-year age group. A US north-south decreasing prevalence gradient was identified. The estimated MS prevalence is also presented for 2017. Conclusion: The estimated US national MS prevalence for 2010 is the highest reported to date and provides evidence that the north-south gradient persists. Our rigorous algorithm-based approach to estimating prevalence is efficient and has the potential to be used for other chronic neurologic conditions.
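The headline figures in this abstract are tied together by simple rate arithmetic, which the following Python sketch works through. It uses only numbers quoted in the abstract; the function names are illustrative and not taken from the paper, and the adult population it back-calculates is an implied value, not a figure the abstract reports.

```python
def prevalence_per_100k(cases: float, population: float) -> float:
    """Prevalence expressed as cases per 100,000 people."""
    return cases / population * 100_000


def implied_population(cases: float, rate_per_100k: float) -> float:
    """Population size implied by a case count and a per-100,000 rate."""
    return cases / rate_per_100k * 100_000


# Figures quoted in the abstract (2010, cumulated over 10 years).
cases = 727_344          # estimated adult MS cases
overall_rate = 309.2     # per 100,000 adults
female_rate = 450.1      # per 100,000 women
male_rate = 159.7        # per 100,000 men

# Back-calculated adult population (roughly 235 million); this value is
# implied by the two figures above, not stated in the abstract.
adult_population = implied_population(cases, overall_rate)

# Ratio of the sex-specific rates, reported in the abstract as 2.8.
female_to_male = female_rate / male_rate

print(f"Implied adult population: {adult_population:,.0f}")
print(f"Female:male prevalence ratio: {female_to_male:.1f}")
```

The ratio of the two sex-specific rates rounds to the 2.8 reported in the abstract, which is a quick consistency check on the quoted numbers.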


2015 ◽  
Vol 16 (4) ◽  
pp. 553-573 ◽  
Author(s):  
GAKU ITO ◽  
SUSUMU YAMAKAGE

Abstract: The ‘keep it simple, stupid’ slogan, or the KISS principle, has been the basic guideline in agent-based modeling (ABM). While the KISS principle or parsimony is vital in modeling attempts, conventional agent-based models remain abstract and are rarely incorporated or validated with empirical data, leaving the links between theoretical models and empirical phenomena rather loose. This article reexamines the KISS principle and discusses the recent modeling attempts that incorporate and validate agent-based models with spatial (geo-referenced) data, moving beyond the KISS principle. This article also provides a working example of such a time and space specified (TASS) agent-based model, one that combines Schelling's (1971) classic model of residential segregation with detailed geo-referenced demographic data on the city of Chicago derived from the US Census 2010.
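For readers unfamiliar with the Schelling model named in this abstract, here is a minimal Python sketch of the classic abstract version on a square grid. It is not the article's TASS model: it uses no geo-referenced or census data, and the grid size, vacancy fraction, tolerance threshold, and all function names are assumptions chosen for illustration.

```python
import random

EMPTY = 0  # cell codes: 0 = vacant, 1 and 2 = the two agent groups


def make_grid(size, empty_frac, rng):
    """Random square grid with two equally sized groups and some vacancies."""
    n_cells = size * size
    n_empty = int(n_cells * empty_frac)
    n_agents = n_cells - n_empty
    cells = [EMPTY] * n_empty + [1] * (n_agents // 2) + [2] * (n_agents - n_agents // 2)
    rng.shuffle(cells)
    return [cells[r * size:(r + 1) * size] for r in range(size)]


def like_fraction(grid, r, c):
    """Share of occupied neighbours (8-cell neighbourhood) in the agent's own group."""
    size, me = len(grid), grid[r][c]
    same = total = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            if (dr or dc) and 0 <= rr < size and 0 <= cc < size and grid[rr][cc] != EMPTY:
                total += 1
                same += grid[rr][cc] == me
    return 1.0 if total == 0 else same / total


def step(grid, threshold, rng):
    """Move each agent that was unhappy at the start of the step to a random vacancy."""
    size = len(grid)
    coords = [(r, c) for r in range(size) for c in range(size)]
    unhappy = [(r, c) for r, c in coords
               if grid[r][c] != EMPTY and like_fraction(grid, r, c) < threshold]
    empties = [(r, c) for r, c in coords if grid[r][c] == EMPTY]
    rng.shuffle(unhappy)
    moved = 0
    for r, c in unhappy:
        if not empties:
            break
        i = rng.randrange(len(empties))
        er, ec = empties[i]
        grid[er][ec], grid[r][c] = grid[r][c], EMPTY
        empties[i] = (r, c)  # the vacated cell becomes available
        moved += 1
    return moved


def mean_like_fraction(grid):
    """Average like-neighbour share over all agents, a simple segregation index."""
    size = len(grid)
    vals = [like_fraction(grid, r, c)
            for r in range(size) for c in range(size) if grid[r][c] != EMPTY]
    return sum(vals) / len(vals)


rng = random.Random(0)
grid = make_grid(size=20, empty_frac=0.1, rng=rng)
agents_before = sum(cell != EMPTY for row in grid for cell in row)
before = mean_like_fraction(grid)
for _ in range(50):
    if step(grid, threshold=0.5, rng=rng) == 0:
        break  # every agent is satisfied
after = mean_like_fraction(grid)
agents_after = sum(cell != EMPTY for row in grid for cell in row)
print(f"mean like-neighbour share: {before:.2f} -> {after:.2f}")
```

An empirically grounded variant of the kind the abstract describes would replace the random grid with real spatial units and observed population counts; the update rule above is only the abstract core.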


2014 ◽  
Vol 38 (1-2) ◽  
pp. 251-271 ◽  
Author(s):  
Ann L. Magennis ◽  
Michael G. Lacy

This paper analyzes admissions to the Colorado Insane Asylum from 1879 to 1900. We estimate and compare admission rates across sex, age, marital, occupation, and immigration status using original admission records in combination with US census data from 1870 to 1900. We show the extent to which persons in various status groups, who varied in power and social advantage, differed in their risk of being institutionalized in the context of nineteenth-century Colorado. Our analysis showed that admission or commitment to the Asylum did not entail permanent incarceration, as more than half of those admitted were discharged within six months. Men were admitted at higher rates than women, even after adjusting for age. Marital status also affected the risk of admission; single and divorced persons were admitted at about 1.5 times the rate of their married counterparts. Widowed persons of either sex were even more likely to be admitted to the Asylum, and the risk increased with age. Persons in lower income/lower prestige occupations were more likely to be institutionalized. This included occupations in the domestic and personal service category in the US census, and this was evident for both males and females. Foreign-born men and women were admitted at, respectively, twice and three times the rate of their native counterparts, with particularly elevated rates observed among the Irish. In general, admission to the Colorado Insane Asylum appears to differ only in a slightly greater admission of males when compared to similar contemporaneous institutions in the East, despite the obvious differences in the Colorado population size and urban concentration.


2019 ◽  
Vol 37 (15_suppl) ◽  
pp. e18144-e18144
Author(s):  
Laura L Fernandes ◽  
Zhantao Lin ◽  
Lola A. Fashoyin-Aje ◽  
Shenghui Tang ◽  
Rajeshwari Sridhara ◽  
...  

e18144 Background: Many publications report underrepresentation of minorities in certain subgroups, which may limit the generalizability of clinical trial (CT) results. This analysis investigates and reports enrollment trends in CTs submitted between 2006 and 2017 in support of marketing applications for drugs indicated for the treatment of urothelial carcinoma (UC) and renal cell carcinoma (RCC), and compares them to incidence rates of these diseases reported by the Surveillance, Epidemiology, and End Results (SEER) registry and the US Census Bureau. Methods: We identified all marketing applications for the treatment of UC and RCC that provided the primary evidence of safety and efficacy and aggregated the demographic data across trials and disease. Using these two pooled datasets, we compared the patient proportions enrolled in each of the race, sex, and age categories to the corresponding rates in the US cancer population, estimated from the incidence rates reported by SEER and the US Census Bureau, using a chi-squared test. Results: The pooled seven UC and 14 RCC CTs provided 2035 and 6757 patients, respectively. The results are summarized below for the 939 (46%) UC and 1489 (22%) RCC patients enrolled in the US. Conclusions: Our findings indicate that the majority of patients were enrolled outside of the US. Lower proportions of Black patients (4% vs 8%), older patients aged ≥ 75 years (30% vs 48%), and males (74% vs 80%) were enrolled in the UC population in the US. Higher proportions were observed in both White (89% vs 85%) and Asian (4% vs 2%) patients in UC and in White (90% vs 79%) patients in RCC. [Table: see text]
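The comparison described in the Methods, observed enrollment proportions against expected population rates, can be sketched as a chi-squared goodness-of-fit test. The Python example below is an assumption about the general form of such a test, not the authors' actual analysis; the counts are hypothetical round numbers, not trial data, and the function names are illustrative. It handles the two-category case only, where the p-value has a closed form.

```python
import math


def chi_squared_two_category(observed_in_group, n_total, expected_proportion):
    """Goodness-of-fit statistic for one group versus everyone else (1 degree of freedom)."""
    observed = [observed_in_group, n_total - observed_in_group]
    expected = [n_total * expected_proportion, n_total * (1 - expected_proportion)]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def p_value_df1(statistic):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom.

    For 1 degree of freedom the statistic is a squared standard normal,
    so P(X > x) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(statistic / 2))


# Hypothetical example: 1,000 enrolled patients, 40 of them in a subgroup
# that makes up 8% of the reference population.
stat = chi_squared_two_category(observed_in_group=40, n_total=1000,
                                expected_proportion=0.08)
p = p_value_df1(stat)
print(f"chi-squared = {stat:.2f}, p = {p:.2g}")
```

With more than two categories (for example several race groups at once), the same statistic applies but the p-value needs the general chi-squared distribution, which a statistics library would normally supply.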


2008 ◽  
Vol 2 (4) ◽  
pp. 215-223 ◽  
Author(s):  
Joan Brunkard ◽  
Gonza Namulanda ◽  
Raoult Ratard

Objective: Hurricane Katrina struck the US Gulf Coast on August 29, 2005, causing unprecedented damage to numerous communities in Louisiana and Mississippi. Our objectives were to verify, document, and characterize Katrina-related mortality in Louisiana and help identify strategies to reduce mortality in future disasters. Methods: We assessed Hurricane Katrina mortality data sources received in 2007, including Louisiana and out-of-state death certificates for deaths occurring from August 27 to October 31, 2005, and the Disaster Mortuary Operational Response Team's confirmed victims' database. We calculated age-, race-, and sex-specific mortality rates for Orleans, St Bernard, and Jefferson Parishes, where 95% of Katrina victims resided, and conducted stratified analyses by parish of residence to compare differences between observed proportions of victim demographic characteristics and expected values based on 2000 US Census data, using Pearson chi square and Fisher exact tests. Results: We identified 971 Katrina-related deaths in Louisiana and 15 deaths among Katrina evacuees in other states. Drowning (40%), injury and trauma (25%), and heart conditions (11%) were the major causes of death among Louisiana victims. Forty-nine percent of victims were people 75 years old and older. Fifty-three percent of victims were men; 51% were black; and 42% were white. In Orleans Parish, the mortality rate among blacks was 1.7 to 4 times higher than that among whites for all people 18 years old and older. People 75 years old and older were significantly more likely to be storm victims (P < .0001). Conclusions: Hurricane Katrina was the deadliest hurricane to strike the US Gulf Coast since 1928. Drowning was the major cause of death and people 75 years old and older were the most affected population cohort.
Future disaster preparedness efforts must focus on evacuating and caring for vulnerable populations, including those in hospitals, long-term care facilities, and personal residences. Improving mortality reporting timeliness will enable response teams to provide appropriate interventions to these populations and to prepare and implement preventive measures before the next disaster. (Disaster Med Public Health Preparedness. 2008;2:215–223)


2004 ◽  
Vol 3 (3) ◽  
pp. 507-534 ◽  
Author(s):  
Jennifer Leeman

This article builds on research on institutional language policies and practices, and on studies of the legitimization of racial categories in census data collection, in an exploration of language ideologies in the US Census. It traces the changes in language-related questions in the two centuries of decennial surveys, contextualizing them within a discussion of changing policies and patterns of immigration and nativism, as well as evolving hegemonic notions of race. It is argued that the US Census has historically used language as an index of race and as a means to racialize speakers of languages other than English, constructing them as essentially different and threatening to US cultural and national identity.

