What Personalisation Can Do for You! Or, How to Do Racial Discrimination without Race

Author(s): Scott Wark, Thao Phan

Between 2016 and 2020, Facebook allowed advertisers in the United States to target their advertisements using three broad “ethnic affinity” categories: “African American,” “U.S.-Hispanic,” and “Asian American.” This article draws on the life and death of these “ethnic affinity” categories to argue that they exemplify a novel mode of racialisation made possible by machine learning techniques. These categories worked by analysing users’ preferences and behaviour: they were supposed to capture an “affinity” for a broad demographic group, rather than to register membership of that group. That is, they were supposed to allow advertisers to “personalise” content for users according to behaviourally determined affinities. We argue that, in effect, Facebook’s ethnic affinity categories were supposed to operationalise a “post-racial” mode of categorising users. But the paradox of personalisation is that, in order to apprehend users as individuals, platforms must first assemble them into groups based on their likenesses with other individuals. This article uses an analysis of these categories to argue that even in the absence of data on a user’s race, and even after the demise of the categories themselves, users can still be subject to techniques of inclusion or exclusion for discriminatory ends. The inductive machine learning techniques that platforms like Facebook employ to classify users generate “proxies,” such as racialised preferences or language use, that act as racialising substitutes. This article concludes by arguing that Facebook’s ethnic affinity categories in fact typify novel modes of racialisation today.
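The proxy mechanism described above can be illustrated with a toy calculation. This is a hypothetical sketch, not drawn from the article or from Facebook's systems: the population, the `likes_page_x` signal, and the group labels are invented. The targeting rule never reads the group field, yet its selection rates still differ by group because the behavioural signal is correlated with group membership.

```python
# Hypothetical toy population: (group, likes_page_x).
# The group label is kept only so outcomes can be audited afterwards;
# the targeting rule below never reads it.
USERS = [
    ("A", True), ("A", True), ("A", True), ("A", False),
    ("B", True), ("B", False), ("B", False), ("B", False),
]


def target(user):
    """Select a user purely on a behavioural 'affinity' signal."""
    _group, likes_page_x = user
    return likes_page_x


def selection_rates(users, rule):
    """Fraction of each group that the rule selects."""
    rates = {}
    for group in sorted({g for g, _ in users}):
        members = [u for u in users if u[0] == group]
        rates[group] = sum(rule(u) for u in members) / len(members)
    return rates


rates = selection_rates(USERS, target)
print(rates)  # → {'A': 0.75, 'B': 0.25}
print(rates["B"] / rates["A"])  # ratio of selection rates between groups
```

Dropping the group field from the rule's input does not remove the disparity; only auditing the outcomes by group reveals it.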

2021
Author(s): Serkan Varol, Serkan Catma, Diana Reindl, Elizabeth Serieux

BACKGROUND: Vaccine refusal still poses a risk to reaching herd immunity in the United States. The existing literature focuses on using survey data to identify the predictors that affect willingness to accept (WTA) vaccines. These variables range from participants’ socio-demographic characteristics to their perceptions of and attitudes towards the vaccines, so that each variable’s statistical relationship with WTA a vaccine can be investigated. However, while these studies may have important implications for understanding vaccine hesitancy by interpreting those statistical relationships, the prediction of vaccine decision-making has rarely been investigated.
OBJECTIVE: We aimed to identify the factors that contribute to the prediction of COVID-19 vaccine acceptors and refusers using machine learning.
METHODS: A nationwide survey was administered online in November 2020 to assess American public perceptions of and attitudes towards COVID-19 vaccines. Seven machine learning techniques were compared to identify the model with the highest predictive power. In addition, the set of variables that contribute most to the prediction of vaccine acceptors and refusers was identified using Gini importance derived from the Random Forest model.
RESULTS: The resulting machine learning algorithm predicts willingness to accept (82%) better than willingness to reject (51%) a COVID-19 vaccine. In terms of predictive success, the Random Forest model outperformed the other machine learning techniques, with an accuracy of 69.52%. Worrying about (re)contracting COVID-19 and opinions regarding mandatory face covering were identified as the most important predictors of vaccine decision-making.
CONCLUSIONS: The complexity of vaccine hesitancy needs to be investigated thoroughly before the threshold needed to reach population immunity can be achieved. Predictive analytics can help public health officials design and deliver individually tailored vaccination programs that could increase overall vaccine uptake.
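Gini importance (mean decrease in impurity) credits each feature with the weighted drop in Gini impurity produced by the splits that use it. In a Random Forest that drop is summed over every split in every tree and averaged across trees; the sketch below simplifies this to a single one-split stump per binary feature, so it shows the quantity being accumulated rather than the authors' model. The survey rows and feature names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum_k p_k^2 over the class proportions p_k."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(labels, goes_left):
    """Weighted Gini decrease from splitting `labels` by a boolean mask."""
    left = [y for y, m in zip(labels, goes_left) if m]
    right = [y for y, m in zip(labels, goes_left) if not m]
    n = len(labels)
    return (gini(labels)
            - len(left) / n * gini(left)
            - len(right) / n * gini(right))


def stump_importances(rows, labels):
    """Normalised impurity decrease of a one-split stump on each binary feature."""
    raw = {f: impurity_decrease(labels, [row[f] for row in rows]) for f in rows[0]}
    total = sum(raw.values())
    return {f: (v / total if total else 0.0) for f, v in raw.items()}


# Hypothetical survey rows; feature names are illustrative only.
rows = [
    {"worried_about_infection": 1, "owns_pet": 1},
    {"worried_about_infection": 1, "owns_pet": 0},
    {"worried_about_infection": 0, "owns_pet": 1},
    {"worried_about_infection": 0, "owns_pet": 0},
]
labels = ["accept", "accept", "refuse", "refuse"]
print(stump_importances(rows, labels))
# → {'worried_about_infection': 1.0, 'owns_pet': 0.0}
```

A feature that separates acceptors from refusers perfectly absorbs all of the importance, while one that leaves both children equally mixed earns none.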


Author(s): Mercedes Barrachina, Laura Valenzuela López

Sleep disorders are related to many different diseases; they can have a significant impact on patients' health and an economic impact on society and on national health systems. In the United States, according to the Centers for Disease Control and Prevention, these disorders affect 50-70 million adults. Sleep disorders cause around 40,000 deaths annually due to cardiovascular problems, and they cost the health system more than $16 billion. In other countries, such as Spain, these disorders affect up to 48% of the adult population. The main objective of this chapter is to review and evaluate the machine learning techniques used by researchers and medical professionals to identify, assess, and characterize sleep disorders. Some future research directions in this area are also proposed.


2019, Vol 19 (11), pp. 2541-2549
Author(s): Chris Houser, Jacob Lehner, Nathan Cherry, Phil Wernette

Abstract. Rip currents and other surf hazards are an emerging public health issue globally. Lifeguards, warning flags, and signs are important and, to varying degrees, effective strategies for minimizing risk to beach users. In the United States and other jurisdictions around the world, lifeguards use coloured flags (green, yellow, and red) to indicate whether the danger posed by the surf and rip hazard is low, moderate, or high, respectively. The choice of flag depends on the lifeguard(s) monitoring the changing surf conditions along the beach and over the course of the day, using both regional surf forecasts and careful observation. The chosen flag may not be consistent with beach users' perception of the risk, which may increase the potential for rescues or drownings. In this study, machine learning is used to determine the potential for error in the flags used at Pensacola Beach and the impact of that error on the number of rescues. Results of a decision tree analysis indicate that the colour of the flag chosen by the lifeguards differed from the model prediction on 35 % of days between 2004 and 2008 (n=396/1125). Days with a difference between the predicted and posted flag colour represent only 17 % of all rescue days, but those days are associated with ∼60 % of all rescues between 2004 and 2008. Further analysis reveals that the largest numbers of rescue days and of total rescues are associated with days when the flag deployed over-estimated the surf and hazard risk, such as a red or yellow flag flying when the model predicted that a green flag would be more appropriate based on the wind and wave forcing alone. While it is possible that the lifeguards were overly cautious, it is argued that they most likely identified a rip forced by the transverse-bar and rip morphology common at the study site.
Regardless, the results suggest that beach users may discount lifeguard warnings if the flag colour is not consistent with how they perceive the surf hazard or the regional forecast. The results also suggest that machine learning techniques have the potential to support lifeguards and thereby reduce the number of rescues and drownings.
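The comparison between posted and predicted flags reduces to a few ratios; for instance, the reported mismatch rate is 396 / 1125 = 0.352, or about 35 %. The sketch below assumes the model's predicted flag for each day is already available (the decision tree itself is not reproduced) and uses a hypothetical four-day record. It computes the same kinds of summaries as the abstract: the share of mismatched days, the share of rescue days that were mismatched, the share of rescues falling on mismatched days, and how many mismatches over-estimated the hazard.

```python
# Ordinal severity of the three flag colours.
FLAG_LEVEL = {"green": 0, "yellow": 1, "red": 2}


def flag_mismatch_summary(days):
    """Summarise disagreement between posted and model-predicted flags.

    `days` is a list of (posted_flag, predicted_flag, rescues) tuples.
    """
    mismatched = [d for d in days if d[0] != d[1]]
    rescue_days = [d for d in days if d[2] > 0]
    total_rescues = sum(d[2] for d in days)
    return {
        "mismatch_rate": len(mismatched) / len(days),
        "rescue_days_mismatched": (
            sum(1 for d in rescue_days if d[0] != d[1]) / len(rescue_days)
            if rescue_days else 0.0
        ),
        "rescues_on_mismatch_days": (
            sum(d[2] for d in mismatched) / total_rescues if total_rescues else 0.0
        ),
        # Posted flag more severe than the model's flag (over-estimation).
        "over_estimated_days": sum(
            1 for d in mismatched if FLAG_LEVEL[d[0]] > FLAG_LEVEL[d[1]]
        ),
    }


# Hypothetical four-day record, for illustration only.
days = [
    ("green", "green", 0),
    ("red", "green", 3),
    ("yellow", "yellow", 1),
    ("green", "yellow", 0),
]
print(flag_mismatch_summary(days))
print(396 / 1125)  # → 0.352, the mismatch rate reported in the abstract
```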


2021, Vol 8
Author(s): Keith Carlson, Faraz Dadgostari, Michael A. Livermore, Daniel N. Rockmore

This paper introduces a novel linked structure-content representation of federal statutory law in the United States and analyzes and quantifies its structure using tools and concepts drawn from network analysis and complexity studies. The organizational component of our representation is based on the explicit hierarchical organization within the United States Code (USC) as well as an embedded cross-reference citation network. We couple this structure with a layer of content-based similarity derived from the application of a “topic model” to the USC. The resulting representation is the first that explicitly models the USC as a “multinetwork” or “multilayered network” incorporating hierarchical structure, cross-references, and content. We report several novel descriptive statistics of this multinetwork. These include the results of this first application of the machine learning technique of topic modeling to the USC, as well as multiple measures articulating the relationships between the organizational and content network layers. We find a high degree of assortativity of “titles” (the highest level of the hierarchy within the USC) with related topics. We also present a link prediction task and show that machine learning techniques are able to recover information about structure from content. Success in this prediction task has a natural interpretation as indicating a form of mutual information. We connect the relational findings between organization and content to a measure of “ease of search” in this large hyperlinked document that has implications for the ways in which the structure of the USC supports (or does not support) broad, useful access to the law. The measures developed in this paper have the potential to enable comparative work on statutory networks across time and geography.
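The abstract does not specify the link-prediction model, so the following is only a generic sketch of how "recovering structure from content" can be scored. It assumes each section is represented by a topic-model distribution, ranks section pairs by cosine similarity, and uses AUC to measure how well that ranking separates cross-referenced pairs from the rest. The section ids and topic vectors are hypothetical.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two topic-proportion vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def link_prediction_auc(topics, links):
    """AUC of ranking section pairs by topic similarity.

    `topics` maps a section id to its topic distribution; `links` is a set of
    frozenset pairs that are actually cross-referenced. AUC is the probability
    that a random linked pair scores higher than a random unlinked pair
    (ties count as one half).
    """
    linked, unlinked = [], []
    for a, b in combinations(sorted(topics), 2):
        score = cosine(topics[a], topics[b])
        (linked if frozenset((a, b)) in links else unlinked).append(score)
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in linked for q in unlinked
    )
    return wins / (len(linked) * len(unlinked))


# Hypothetical two-topic distributions for four sections.
topics = {
    "s1": [0.9, 0.1], "s2": [0.8, 0.2],
    "s3": [0.1, 0.9], "s4": [0.2, 0.8],
}
links = {frozenset(("s1", "s2")), frozenset(("s3", "s4"))}
print(link_prediction_auc(topics, links))  # → 1.0
```

An AUC of 0.5 would mean content similarity says nothing about the citation structure; values well above 0.5 indicate that content carries information about structure.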


2019, Vol 63 (3), pp. 435-447
Author(s): Mohsen Salehi, Jafar Razmara, Shahriar Lotfi

Abstract. Breast cancer survivability has always been an important and challenging issue for researchers. Various methods, mostly based on machine learning techniques, have been used to predict survivability among cancer patients. The most comprehensive available database of cancer incidence is the SEER database in the United States, which has been used frequently for different research purposes. In this paper, a new data-mining analysis has been performed on the SEER database to investigate the ability of machine learning techniques to predict the survivability of breast cancer patients. To this end, the breast cancer incidence data were preprocessed to remove unusable records from the dataset. Subsequently, two machine learning techniques based on the multilayer perceptron (MLP), namely MLP stacked generalization and a mixture of MLP experts, were developed to make predictions over the database. The models were evaluated using k-fold cross-validation. The evaluation revealed accuracies of 84.32% and 83.86% for the mixture of MLP experts and MLP stacked generalization methods, respectively. This indicates that the predictors can be used effectively for survivability prediction and may help suggest time- and cost-effective treatment for breast cancer patients.
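k-fold cross-validation holds out each of k folds once for testing, trains on the rest, and averages the resulting accuracies. The study evaluated MLP stacked generalization and a mixture of MLP experts; to keep this sketch dependency-free, a nearest-centroid classifier is swapped in as a stand-in, so only the cross-validation loop reflects the described procedure. The records are hypothetical, not SEER data.

```python
import random


def kfold_indices(n, k, seed=None):
    """Split range(n) into k folds whose sizes differ by at most one.

    If `seed` is given the indices are shuffled first, which avoids folds
    that contain only one class when the data are sorted by label.
    """
    idx = list(range(n))
    if seed is not None:
        random.Random(seed).shuffle(idx)
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(idx[start:start + size])
        start += size
    return folds


def fit_centroids(X, y):
    """Stand-in classifier: mean feature vector per class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, row):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(row, centroids[label])),
    )


def cross_val_accuracy(X, y, k=5, seed=None):
    """Mean accuracy over k folds, each fold held out once for testing."""
    scores = []
    for test_idx in kfold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train = [i for i in range(len(X)) if i not in held_out]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


# Hypothetical one-feature records with two outcome classes.
X = [[0.0], [10.0], [0.1], [10.1], [0.2], [10.2], [0.3], [10.3], [0.4], [10.4]]
y = ["survived", "not_survived"] * 5
print(cross_val_accuracy(X, y, k=5))  # → 1.0
```

Any classifier exposing a fit and a predict step could replace the nearest-centroid stand-in without changing the cross-validation loop.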


2002, Vol 11 (02), pp. 267-282
Author(s): Agapito Ledezma, Ricardo Aler, Daniel Borrajo

Nowadays, there is no doubt that machine learning techniques can be successfully applied to data mining tasks. Currently, the combination of several classifiers is one of the most active fields within inductive machine learning. Examples of such techniques are boosting, bagging, and stacking. Of these three techniques, stacking is perhaps the least used. One of the main reasons for this is the difficulty of defining and parameterizing its components: selecting which combination of base classifiers to use, and which classifier to use as the meta-classifier. For that purpose, one could use simple search methods (e.g. hill climbing) or more complex ones (e.g. genetic algorithms). But before search is attempted, it is important to know the properties of the search space itself. In this paper, we exhaustively study the space of stacking systems that can be built using four base learning systems: C4.5, IB1, Naive Bayes, and PART. We also used Multiple Linear Response (MLR) as the meta-classifier. The properties of this state space obtained in this paper will be useful for designing new stacking-based algorithms and tools.
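Suppose a stacking configuration is simply a non-empty subset of the four base learners paired with the fixed MLR meta-classifier. Under that simplifying assumption the space has 2^4 - 1 = 15 members; the paper's exact definition of its space may differ (for example, if base learners can repeat). The sketch enumerates this assumed space and runs the kind of simple hill-climbing search the abstract mentions. The `score` function is a hypothetical placeholder for the cross-validated accuracy of a configuration.

```python
from itertools import combinations

BASE_LEARNERS = ("C4.5", "IB1", "NaiveBayes", "PART")


def stacking_configurations(learners, meta="MLR"):
    """All non-empty subsets of base learners, each paired with one meta-classifier."""
    return [
        (subset, meta)
        for k in range(1, len(learners) + 1)
        for subset in combinations(learners, k)
    ]


def neighbours(subset, learners):
    """Configurations reachable by adding or removing exactly one base learner."""
    current = set(subset)
    for learner in learners:
        candidate = current ^ {learner}  # toggle membership of this learner
        if candidate:  # the set of base learners must stay non-empty
            yield tuple(name for name in learners if name in candidate)


def hill_climb(start, learners, score):
    """Greedy ascent: move to the best-scoring neighbour until none improves."""
    current, current_score = start, score(start)
    while True:
        best, best_score = current, current_score
        for candidate in neighbours(current, learners):
            candidate_score = score(candidate)
            if candidate_score > best_score:
                best, best_score = candidate, candidate_score
        if best == current:
            return current, current_score
        current, current_score = best, best_score


print(len(stacking_configurations(BASE_LEARNERS)))  # → 15
```

In practice `score` would train and cross-validate each stacked configuration. Hill climbing can stop at a local optimum, which is one reason to study the shape of the search space before choosing a search method.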


2019
Author(s): Seyyed Ali Davari, Anthony S. Wexler

Abstract. The United States Environmental Protection Agency (US EPA) list of Hazardous Air Pollutants (HAPs) includes metal elements that are associated, or suspected of being associated, with the development of cancer. Traditional techniques for detecting and quantifying toxic metallic elements in the atmosphere are either not real time, hindering identification of sources, or limited by instrument costs. Spark emission spectroscopy is a promising and cost-effective technique that can be used to analyze toxic metallic elements in real time. Here, we have developed a cost-effective spark emission spectroscopy system to quantify the concentration of toxic metallic elements targeted by the US EPA. Specifically, Cr, Cu, Ni, and Pb solutions were diluted and deposited on the ground electrode of the spark emission system. The Least Absolute Shrinkage and Selection Operator (LASSO) was optimized and employed to detect useful features from the spark-generated plasma emissions. The optimized model was able to detect atomic emission lines along with other features to build a regression model that predicts the concentration of toxic metallic elements from the observed spectra. The limits of detection (LODs) were estimated using the detected features and compared to the traditional single-feature approach. LASSO is capable of detecting highly sensitive features in the input spectrum; however, for some elements the single-feature LOD marginally outperforms the LASSO LOD. The combination of low-cost instruments with advanced machine learning techniques for data analysis could pave the way for data-driven solutions to costly measurements.
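LASSO selects features by adding an L1 penalty that drives the coefficients of uninformative inputs exactly to zero, which is how it can pick out emission lines from a spectrum. The sketch below is a generic, dependency-free coordinate-descent LASSO, not the authors' optimized model; the two-channel "spectrum" and the penalty `lam=0.1` are hypothetical. In practice spectral channels are usually standardized and the penalty is tuned by cross-validation.

```python
def soft_threshold(z, gamma):
    """Proximal operator of the L1 penalty."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso(X, y, lam, n_iter=1000, tol=1e-10):
    """Cyclic coordinate descent for
    min_b (1/(2n)) * ||y - b0 - X b||^2 + lam * ||b||_1.

    Returns (intercept, coefficients). Features are centred but not scaled.
    """
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    resid = [v - y_mean for v in y]  # residual with all coefficients at zero
    col_sq = [sum(r[j] ** 2 for r in Xc) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(n_iter):
        max_step = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # a constant feature carries no information
            # Correlation of feature j with the partial residual (excluding j).
            rho = sum(Xc[i][j] * (resid[i] + Xc[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / col_sq[j]
            step = new - beta[j]
            if step != 0.0:
                for i in range(n):
                    resid[i] -= Xc[i][j] * step
                beta[j] = new
                max_step = max(max_step, abs(step))
        if max_step < tol:
            break
    intercept = y_mean - sum(b * m for b, m in zip(beta, x_mean))
    return intercept, beta


# Hypothetical "spectrum": channel 0 tracks concentration, channel 1 does not.
X = [[1, 0], [2, 1], [3, 0], [4, 1], [5, 0], [6, 1]]
y = [2, 4, 6, 8, 10, 12]
intercept, beta = lasso(X, y, lam=0.1)
print(beta)  # channel 1 is shrunk exactly to zero
```

The informative channel keeps a slightly shrunken coefficient, while the uninformative one is dropped from the model entirely.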


2020, Vol 13 (10), pp. 5369-5377
Author(s): Seyyed Ali Davari, Anthony S. Wexler

Abstract. The United States Environmental Protection Agency (US EPA) list of hazardous air pollutants (HAPs) includes toxic metals that are associated, or suspected of being associated, with the development of cancer. Traditional techniques for detecting and quantifying toxic metals in the atmosphere are either not real time, hindering identification of sources, or limited by instrument costs. Spark emission spectroscopy is a promising and cost-effective technique that can be used for analyzing toxic metals in real time. Here, we have developed a cost-effective spark emission spectroscopy system to quantify the concentration of toxic metals targeted by the US EPA. Specifically, Cr, Cu, Ni, and Pb solutions were diluted and deposited on the ground electrode of the spark emission system. The least absolute shrinkage and selection operator (LASSO) was optimized and employed to detect useful features from the spark-generated plasma emissions. The optimized model was able to detect atomic emission lines along with other features to build a regression model that predicts the concentration of toxic metals from the observed spectra. The limits of detection (LODs) were estimated using the detected features and compared to the traditional single-feature approach. LASSO is capable of detecting highly sensitive features in the input spectrum; however, for some toxic metals the single-feature LOD marginally outperforms the LASSO LOD. The combination of low-cost instruments with advanced machine learning techniques for data analysis could pave the way for data-driven solutions to costly measurements.
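The abstract does not state how the LODs were computed. A common single-feature convention, assumed here, is LOD = 3σ_blank / m, where σ_blank is the standard deviation of repeated blank measurements and m is the slope of the calibration line of signal against concentration. The calibration and blank values below are hypothetical, and the LASSO-based (multi-feature) LOD the authors compare against is not reproduced.

```python
from statistics import mean, stdev


def calibration_slope(concentrations, signals):
    """Ordinary least-squares slope of signal versus concentration."""
    cx, cy = mean(concentrations), mean(signals)
    sxx = sum((x - cx) ** 2 for x in concentrations)
    sxy = sum((x - cx) * (s - cy) for x, s in zip(concentrations, signals))
    return sxy / sxx


def limit_of_detection(blank_signals, concentrations, signals, k=3.0):
    """Single-feature LOD: k times the blank standard deviation over the slope."""
    return k * stdev(blank_signals) / calibration_slope(concentrations, signals)


# Hypothetical emission-line intensities (arbitrary units).
blanks = [1.0, 1.2, 0.8, 1.0]
conc = [0.0, 1.0, 2.0, 3.0]
signal = [0.5, 2.5, 4.5, 6.5]
print(limit_of_detection(blanks, conc, signal))  # ≈ 0.245, in concentration units
```

A steeper calibration slope or quieter blanks both lower the LOD, which is why a more sensitive feature can improve detection limits.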


2019
Author(s): Sing-Chun Wang, Yuxuan Wang

Abstract. Occurrences of devastating wildfires have been on the rise in the United States over the past decades. While environmental controls, including weather, climate, and fuels, are known to play important roles in controlling wildfires, the interrelationships between fires and these controls are highly complex and may not be well represented by traditional parametric regressions. Here we develop a model integrating multiple machine learning algorithms to predict gridded monthly wildfire burned area during 2002–2015 over the South Central United States and to identify the relative importance of the environmental drivers of burned area for both the winter-spring and summer fire seasons of that region. The developed model is able to alleviate the issue of unevenly distributed burned area data and achieves cross-validation (CV) R² values of 0.42 and 0.40 for the two fire seasons, respectively. For the total burned area over the study domain, the model can explain 50 % and 79 % of the interannual variability in total burned area for the winter-spring and summer fire seasons, respectively. The prediction model ranks relative humidity (RH) anomalies and the preceding months’ drought severity as the two most important predictors of gridded burned area for both fire seasons. Sensitivity experiments with the model show that the effect of climate change, represented by a group of climate-anomaly variables, contributes the most to burned area for both fire seasons. Antecedent fuel amount and conditions are found to outweigh weather effects on burned area in the winter-spring fire season, while current-month fire weather is more important in the summer fire season, likely due to the controlling effect of weather on fuel moisture in this season. The developed model allows us to predict gridded burned area and to assess specific fire management strategies for the different fire mechanisms in the two seasons.
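A cross-validated R² such as the 0.42 and 0.40 reported above is commonly computed as R² = 1 - SS_res/SS_tot on out-of-fold predictions, and the interannual figures compare yearly totals of the gridded values. The sketch assumes that definition (some studies instead report the squared correlation) and uses hypothetical records of the form (year, month, cell, burned_area); the authors' multi-algorithm model is not reproduced.

```python
from collections import defaultdict


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def annual_totals(records):
    """Sum gridded monthly burned area into one total per year."""
    totals = defaultdict(float)
    for year, _month, _cell, area in records:
        totals[year] += area
    return dict(totals)


# Hypothetical gridded monthly values: observed and out-of-fold predictions.
observed = [(2002, 1, "c1", 1.0), (2002, 2, "c2", 2.0),
            (2003, 1, "c1", 4.0), (2004, 3, "c2", 3.0)]
predicted = [(2002, 1, "c1", 1.5), (2002, 2, "c2", 1.5),
             (2003, 1, "c1", 3.5), (2004, 3, "c2", 3.0)]

obs_year = annual_totals(observed)
pred_year = annual_totals(predicted)
years = sorted(obs_year)
print(r_squared([r[3] for r in observed], [r[3] for r in predicted]))  # gridded
print(r_squared([obs_year[t] for t in years], [pred_year[t] for t in years]))  # interannual
```

Scoring gridded values and yearly totals separately mirrors the distinction the abstract draws between the gridded CV R² and the share of interannual total burned area explained.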

