WHAT PERSONALISATION CAN DO FOR YOU! OR, HOW TO DO RACIAL DISCRIMINATION WITHOUT RACE

AoIR Selected Papers of Internet Research ◽

10.5210/spir.v2021i0.12261 ◽

2021 ◽

Author(s):

Scott Wark ◽

Thao Phan

Keyword(s):

United States ◽

Machine Learning ◽

Asian American ◽

Language Use ◽

The United States ◽

Machine Learning Techniques ◽

Life And Death ◽

Demographic Group ◽

Learning Techniques ◽

Inductive Machine Learning

Between 2016 and 2020, Facebook allowed advertisers in the United States to target their advertisements using three broad “ethnic affinity” categories: “African American,” “U.S.-Hispanic,” and “Asian American.” This paper uses the life and death of these “ethnic affinity” categories to argue that they exemplify a novel mode of racialisation made possible by machine learning techniques. These categories worked by analysing users’ preferences and behaviour: they were supposed to capture an “affinity” for a broad demographic group, rather than registering membership of that group. That is, they were supposed to allow advertisers to “personalise” content for users depending on behaviourally determined affinities. We argue that, in effect, Facebook’s ethnic affinity categories were supposed to operationalise a “post-racial” mode of categorising users. But the paradox of personalisation is that in order to apprehend users as individuals, platforms must first assemble them into groups based on their likenesses with other individuals. This article uses an analysis of these categories to argue that even in the absence of data on a user’s race—even after the demise of the categories themselves—users can still be subject to techniques of inclusion or exclusion for discriminatory ends. The inductive machine learning techniques that platforms like Facebook employ to classify users generate “proxies,” like racialised preferences or language use, as racialising substitutes. This article concludes by arguing that Facebook’s ethnic affinity categories in fact typify novel modes of racialisation today.


Primary Factors Influencing the Decision to Vaccinate Against COVID-19 in the United States: A Predictive Analytics Approach (Preprint)

10.2196/preprints.34210 ◽

2021 ◽

Author(s):

Serkan Varol ◽

Serkan Catma ◽

Diana Reindl ◽

Elizabeth Serieux

Keyword(s):

Machine Learning ◽

Decision Making ◽

Predictive Analytics ◽

The United States ◽

Machine Learning Techniques ◽

Vaccine Hesitancy ◽

Willingness To Accept ◽

Perceptions And Attitudes ◽

Learning Techniques ◽

Vaccine Decision Making

BACKGROUND: Vaccine refusal still poses a risk to reaching herd immunity in the United States. The existing literature focuses on identifying the predictors that affect the willingness to accept (WTA) vaccines using survey data. These variables range from the socio-demographic characteristics of the participants to their perceptions of and attitudes towards the vaccines, so that each variable’s statistical relationship with WTA can be investigated. However, while the results of these studies may have important implications for understanding vaccine hesitancy by offering interpretation of the statistical relationships, the prediction of vaccine decision-making has rarely been investigated. OBJECTIVE: We aimed to identify the factors that contribute to the prediction of COVID-19 vaccine acceptors and refusers using machine learning. METHODS: A nationwide survey was administered online in November 2020 to assess American public perceptions of and attitudes towards COVID-19 vaccines. Seven machine learning techniques were evaluated to identify the model with the highest predictive power. Moreover, the set of variables that contribute the most to the predictions of vaccine acceptors and refusers was identified using Gini importance based on the Random Forest structure. RESULTS: The resulting machine learning algorithm has better prediction ability for willingness to accept (82%) than to reject (51%) a COVID-19 vaccine. In terms of predictive success, the Random Forest model outperformed the other machine learning techniques with a 69.52% accuracy rate. Worry about (re)contracting COVID-19 and opinions regarding mandatory face covering were identified as the most important predictors of vaccine decision-making. CONCLUSIONS: The complexity of vaccine hesitancy needs to be investigated thoroughly before the threshold needed to reach population immunity can be achieved. Predictive analytics can help public health officials design and deliver individually tailored vaccination programs that would increase overall vaccine uptake.
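
The gap between the reported per-class figures (82% for acceptors, 51% for refusers) and the overall accuracy is easier to read with the definitions written out. The following is a minimal sketch, assuming the per-class figures are recalls; the abstract does not say so explicitly. The function names and the toy labels are hypothetical and are not the survey data.

```python
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Return {label: recall}; recall = correct predictions / actual members of the class."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


def accuracy(y_true, y_pred):
    """Share of all samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy labels (assumption for illustration only).
y_true = ["accept"] * 6 + ["refuse"] * 4
y_pred = ["accept"] * 5 + ["refuse"] + ["refuse"] * 2 + ["accept"] * 2

recalls = per_class_recall(y_true, y_pred)
overall = accuracy(y_true, y_pred)
```

Overall accuracy is a class-size-weighted average of the per-class recalls, so a model can look acceptable overall while still missing about half of the smaller class.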


Machine Learning Techniques to Identify and Characterize Sleep Disorders Using Biosignals

Advances in Medical Technologies and Clinical Practice - Advancing the Investigation and Treatment of Sleep Disorders Using AI ◽

10.4018/978-1-7998-8018-9.ch008 ◽

2021 ◽

pp. 136-160

Author(s):

Mercedes Barrachina ◽

Laura Valenzuela López

Keyword(s):

Machine Learning ◽

Sleep Disorders ◽

Adult Population ◽

The United States ◽

Machine Learning Techniques ◽

Future Research ◽

Research Directions ◽

Learning Techniques ◽

Control And Prevention ◽

Future Research Directions

Sleep disorders are related to many different diseases and can have a significant impact on patients' health, imposing an economic burden on society and on national health systems. In the United States, according to information from the Centers for Disease Control and Prevention, these disorders affect 50-70 million adults. Sleep disorders cause around 40,000 deaths annually due to cardiovascular problems, and they cost the health system more than 16 billion. In other countries, such as Spain, these disorders affect up to 48% of the adult population. The main objective of this chapter is to review and evaluate the different machine learning techniques used by researchers and medical professionals to identify, assess, and characterize sleep disorders. Moreover, some future research directions are proposed for the area reviewed.


Machine learning analysis of lifeguard flag decisions and recorded rescues

Natural Hazards and Earth System Science ◽

10.5194/nhess-19-2541-2019 ◽

2019 ◽

Vol 19 (11) ◽

pp. 2541-2549

Author(s):

Chris Houser ◽

Jacob Lehner ◽

Nathan Cherry ◽

Phil Wernette

Keyword(s):

Machine Learning ◽

The United States ◽

Machine Learning Techniques ◽

Public Health Issue ◽

Rip Currents ◽

Effective Strategies ◽

Wave Forcing ◽

Learning Techniques ◽

Yellow Flag ◽

The Impact

Abstract. Rip currents and other surf hazards are an emerging public health issue globally. Lifeguards, warning flags, and signs are important, and to varying degrees they are effective strategies to minimize risk to beach users. In the United States and other jurisdictions around the world, lifeguards use coloured flags (green, yellow, and red) to indicate whether the danger posed by the surf and rip hazard is low, moderate, or high respectively. The choice of flag depends on the lifeguard(s) monitoring the changing surf conditions along the beach and over the course of the day using both regional surf forecasts and careful observation. There is a potential that the chosen flag is not consistent with the beach user perception of the risk, which may increase the potential for rescues or drownings. In this study, machine learning is used to determine the potential for error in the flags used at Pensacola Beach and the impact of that error on the number of rescues. Results of a decision tree analysis indicate that the colour flag chosen by the lifeguards was different from what the model predicted for 35 % of days between 2004 and 2008 (n=396/1125). Days when there is a difference between the predicted and posted flag colour represent only 17 % of all rescue days, but those days are associated with ∼60 % of all rescues between 2004 and 2008. Further analysis reveals that the largest number of rescue days and total number of rescues are associated with days where the flag deployed over-estimated the surf and hazard risk, such as a red or yellow flag flying when the model predicted a green flag would be more appropriate based on the wind and wave forcing alone. While it is possible that the lifeguards were overly cautious, it is argued that they most likely identified a rip forced by a transverse-bar and rip morphology common at the study site. 
Regardless, the results suggest that beach users may be discounting lifeguard warnings if the flag colour is not consistent with how they perceive the surf hazard or the regional forecast. Results suggest that machine learning techniques have the potential to support lifeguards and thereby reduce the number of rescues and drownings.
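
The 35 % figure can be checked directly from the counts given in the abstract (396 of 1125 days). The following is a small sketch of how such a mismatch rate might be computed from paired flag records. The function name and the four-day toy series are hypothetical; only the two counts come from the abstract.

```python
def mismatch_rate(posted, predicted):
    """Fraction of days on which the posted flag differs from the model's flag."""
    if len(posted) != len(predicted):
        raise ValueError("both series must cover the same days")
    return sum(a != b for a, b in zip(posted, predicted)) / len(posted)


# Counts reported in the abstract: 396 mismatch days out of 1125.
reported_share = 396 / 1125

# Hypothetical four-day record (assumption for illustration only).
posted = ["green", "yellow", "red", "yellow"]
predicted = ["green", "green", "red", "green"]
toy_share = mismatch_rate(posted, predicted)
```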


A Multinetwork and Machine Learning Examination of Structure and Content in the United States Code

Frontiers in Physics ◽

10.3389/fphy.2020.625241 ◽

2021 ◽

Vol 8 ◽

Author(s):

Keith Carlson ◽

Faraz Dadgostari ◽

Michael A. Livermore ◽

Daniel N. Rockmore

Keyword(s):

United States ◽

Machine Learning ◽

Topic Model ◽

Citation Network ◽

The United States ◽

Hierarchical Organization ◽

Machine Learning Techniques ◽

Prediction Task ◽

Network Layers ◽

United States Code

This paper introduces a novel linked structure-content representation of federal statutory law in the United States and analyzes and quantifies its structure using tools and concepts drawn from network analysis and complexity studies. The organizational component of our representation is based on the explicit hierarchical organization within the United States Code (USC) as well as an embedded cross-reference citation network. We couple this structure with a layer of content-based similarity derived from the application of a “topic model” to the USC. The resulting representation is the first that explicitly models the USC as a “multinetwork” or “multilayered network” incorporating hierarchical structure, cross-references, and content. We report several novel descriptive statistics of this multinetwork. These include the results of this first application of the machine learning technique of topic modeling to the USC as well as multiple measures articulating the relationships between the organizational and content network layers. We find a high degree of assortativity of “titles” (the highest-level hierarchy within the USC) with related topics. We also present a link prediction task and show that machine learning techniques are able to recover information about structure from content. Success in this prediction task has a natural interpretation as indicating a form of mutual information. We connect the relational findings between organization and content to a measure of “ease of search” in this large hyperlinked document that has implications for the ways in which the structure of the USC supports (or doesn’t support) broad useful access to the law. The measures developed in this paper have the potential to enable comparative work in the study of statutory networks that ranges across time and geography.
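
One simple way to picture "recovering structure from content" is to rank candidate cross-references by the similarity of topic vectors. The sketch below uses cosine similarity as an illustrative stand-in; the abstract does not say which features or model the authors used for link prediction. The section names and topic weights are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidate_links(topics, source):
    """Rank every other section by topic similarity to `source`, most similar first."""
    scored = [
        (name, cosine_similarity(topics[source], vec))
        for name, vec in topics.items()
        if name != source
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical topic mixtures for three sections (assumption for illustration only).
topics = {
    "sec_a": [0.9, 0.1, 0.0],
    "sec_b": [0.8, 0.2, 0.0],
    "sec_c": [0.0, 0.1, 0.9],
}
ranked = rank_candidate_links(topics, "sec_a")
```

If sections that cite each other tend to rank highly under such a score, content carries information about structure, which is the intuition behind the mutual-information reading in the abstract.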


A Novel Data Mining on Breast Cancer Survivability Using MLP Ensemble Learners

The Computer Journal ◽

10.1093/comjnl/bxz051 ◽

2019 ◽

Vol 63 (3) ◽

pp. 435-447

Author(s):

Mohsen Salehi ◽

Jafar Razmara ◽

Shahriar Lotfi

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Data Mining ◽

Cancer Patients ◽

Cancer Incidence ◽

The United States ◽

Machine Learning Techniques ◽

Breast Cancer Patients ◽

Stacked Generalization ◽

Learning Techniques

Abstract. Breast cancer survivability has always been an important and challenging issue for researchers. Various methods, mostly based on machine learning techniques, have been used to predict survivability among cancer patients. The most comprehensive available database of cancer incidence is SEER in the United States, which has been frequently used for different research purposes. In this paper, a new data mining study has been performed on the SEER database in order to investigate the ability of machine learning techniques to predict the survivability of breast cancer patients. To this end, the data related to breast cancer incidence have been preprocessed to remove unusable records from the dataset. Subsequently, two machine learning techniques based on the Multi-Layer Perceptron (MLP) learner were developed, namely MLP stacked generalization and a mixture of MLP-experts, to make predictions over the database. The models have been evaluated using the K-fold cross-validation technique. The evaluation of the predictors revealed an accuracy of 84.32% and 83.86% for the mixture of MLP-experts and MLP stacked generalization methods, respectively. This indicates that the predictors can be effectively used for survivability prediction, suggesting time- and cost-effective treatment for breast cancer patients.
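
For readers unfamiliar with K-fold cross-validation, the following is a minimal sketch of how the train/test index splits can be generated. It is an illustration only: the abstract does not state the value of K, whether records were shuffled or stratified, or how the folds were built. The function name is hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs so that each sample is held out exactly once."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


# Ten samples split into three contiguous folds (no shuffling in this sketch).
folds = list(k_fold_indices(10, 3))
```

A model is trained on each `train_idx` and scored on the matching `test_idx`; the K scores are then averaged to give a single accuracy estimate.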


Building intelligent alarm systems by combining mathematical models and inductive machine learning techniques

International Journal of Bio-Medical Computing ◽

10.1016/0020-7101(95)01165-x ◽

1996 ◽

Vol 41 (2) ◽

pp. 107-124 ◽

Cited By ~ 10

Author(s):

Bert Müller ◽

A. Hasman ◽

J.A. Blom

Keyword(s):

Machine Learning ◽

Mathematical Models ◽

Machine Learning Techniques ◽

Alarm Systems ◽

Learning Techniques ◽

Inductive Machine Learning


EXPLORING THE STACKING STATE-SPACE

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213002000897 ◽

2002 ◽

Vol 11 (02) ◽

pp. 267-282 ◽

Cited By ~ 1

Author(s):

AGAPITO LEDEZMA ◽

RICARDO ALER ◽

DANIEL BORRAJO

Keyword(s):

Machine Learning ◽

State Space ◽

Linear Response ◽

Search Space ◽

Learning Systems ◽

Hill Climbing ◽

Machine Learning Techniques ◽

Learning Techniques ◽

Inductive Machine Learning ◽

Simple Search

Nowadays, there is no doubt that machine learning techniques can be successfully applied to data mining tasks. Currently, the combination of several classifiers is one of the most active fields within inductive machine learning. Examples of such techniques are boosting, bagging and stacking. Of these three techniques, stacking is perhaps the least used. One of the main reasons for this is the difficulty of defining and parameterizing its components: selecting which combination of base classifiers to use, and which classifier to use as the meta-classifier. One could use simple search methods for that purpose (e.g. hill climbing), or more complex ones (e.g. genetic algorithms). But before search is attempted, it is important to know the properties of the search space itself. In this paper we exhaustively study the space of Stacking systems that can be built by using four base learning systems: C4.5, IB1, Naive Bayes, and PART. We have also used Multiple Linear Response (MLR) as the meta-classifier. The properties of this state space obtained in this paper will be useful for designing new Stacking-based algorithms and tools.
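
To give a feel for the size of such a search space, the sketch below enumerates every non-empty subset of the four named base learners and runs a basic hill climb over them. This is one plausible reading of the space, assuming a configuration is just a set of base learners with a fixed meta-classifier; the paper's actual state definition may differ. `toy_score` is a hypothetical stub standing in for the cross-validated accuracy of a stacked system.

```python
from itertools import combinations

BASE_LEARNERS = ("C4.5", "IB1", "Naive Bayes", "PART")


def all_configurations(learners):
    """Every non-empty subset of base learners (2**n - 1 of them)."""
    return [
        frozenset(combo)
        for size in range(1, len(learners) + 1)
        for combo in combinations(learners, size)
    ]


def neighbours(config, learners):
    """Configurations reachable by adding or removing exactly one learner."""
    for name in learners:
        candidate = config ^ {name}
        if candidate:  # skip the empty configuration
            yield frozenset(candidate)


def hill_climb(start, score, learners):
    """Move to the best neighbour until no neighbour improves the score."""
    current = frozenset(start)
    while True:
        best = max(neighbours(current, learners), key=score)
        if score(best) <= score(current):
            return current
        current = best


def toy_score(config):
    """Hypothetical score: rewards two particular learners, penalises stack size."""
    return len(config & {"C4.5", "PART"}) - 0.1 * len(config)


configs = all_configurations(BASE_LEARNERS)
best = hill_climb({"IB1"}, toy_score, BASE_LEARNERS)
```

With only 15 configurations, exhaustive evaluation is cheap, which is what makes it possible to compare the result of a local search against the true optimum of the space.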


Quantification of toxic metallic elements using machine learning techniques and spark emission spectroscopy

10.5194/amt-2019-377 ◽

2019 ◽

Author(s):

Seyyed Ali Davari ◽

Anthony S. Wexler

Keyword(s):

Machine Learning ◽

Real Time ◽

Emission Spectroscopy ◽

Cost Effective ◽

The United States ◽

Machine Learning Techniques ◽

Metallic Elements ◽

Learning Techniques ◽

Single Feature ◽

US EPA

Abstract. The United States Environmental Protection Agency (US EPA) list of Hazardous Air Pollutants (HAPs) includes metal elements suspected of, or associated with, the development of cancer. Traditional techniques for detecting and quantifying toxic metallic elements in the atmosphere are either not real time, hindering identification of sources, or limited by instrument costs. Spark emission spectroscopy is a promising and cost-effective technique that can be used for analyzing toxic metallic elements in real time. Here, we have developed a cost-effective spark emission spectroscopy system to quantify the concentration of toxic metallic elements targeted by the US EPA. Specifically, Cr, Cu, Ni, and Pb solutions were diluted and deposited on the ground electrode of the spark emission system. The Least Absolute Shrinkage and Selection Operator (LASSO) was optimized and employed to detect useful features from the spark-generated plasma emissions. The optimized model was able to detect atomic emission lines along with other features to build a regression model that predicts the concentration of toxic metallic elements from the observed spectra. The limits of detection (LODs) were estimated using the detected features and compared to the traditional single-feature approach. LASSO is capable of detecting highly sensitive features in the input spectrum; however, for some elements the single-feature LOD marginally outperforms the LASSO LOD. The combination of low-cost instruments with advanced machine learning techniques for data analysis could pave the path forward for data-driven solutions to costly measurements.


Quantification of toxic metals using machine learning techniques and spark emission spectroscopy

Atmospheric Measurement Techniques ◽

10.5194/amt-13-5369-2020 ◽

2020 ◽

Vol 13 (10) ◽

pp. 5369-5377

Author(s):

Seyyed Ali Davari ◽

Anthony S. Wexler

Keyword(s):

Machine Learning ◽

Real Time ◽

Toxic Metals ◽

Emission Spectroscopy ◽

Cost Effective ◽

The United States ◽

Machine Learning Techniques ◽

Learning Techniques ◽

Single Feature ◽

US EPA

Abstract. The United States Environmental Protection Agency (US EPA) list of hazardous air pollutants (HAPs) includes toxic metals suspected of, or associated with, the development of cancer. Traditional techniques for detecting and quantifying toxic metals in the atmosphere are either not real time, hindering identification of sources, or limited by instrument costs. Spark emission spectroscopy is a promising and cost-effective technique that can be used for analyzing toxic metals in real time. Here, we have developed a cost-effective spark emission spectroscopy system to quantify the concentration of toxic metals targeted by the US EPA. Specifically, Cr, Cu, Ni, and Pb solutions were diluted and deposited on the ground electrode of the spark emission system. The least absolute shrinkage and selection operator (LASSO) was optimized and employed to detect useful features from the spark-generated plasma emissions. The optimized model was able to detect atomic emission lines along with other features to build a regression model that predicts the concentration of toxic metals from the observed spectra. The limits of detection (LODs) were estimated using the detected features and compared to the traditional single-feature approach. LASSO is capable of detecting highly sensitive features in the input spectrum; however, for some toxic metals the single-feature LOD marginally outperforms the LASSO LOD. The combination of low-cost instruments with advanced machine learning techniques for data analysis could pave the path forward for data-driven solutions to costly measurements.
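
LASSO selects features because its L1 penalty drives weak coefficients exactly to zero. The sketch below shows this with a plain coordinate-descent solver for the objective (1/(2n))·||y − Xw||² + α·||w||₁, with no intercept. It is a textbook illustration under stated assumptions, not the authors' pipeline: the solver, the value of `alpha`, and the tiny design matrix (standing in for spectral channels and concentrations) are all hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink `value` towards zero by `threshold`, returning 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimise (1/(2n))*||y - Xw||^2 + alpha*||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual that ignores feature j.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * w[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, alpha) / z if z > 0 else 0.0
    return w


# Hypothetical toy data: column 0 carries the signal, column 1 a weak effect,
# column 2 is a constant channel. y = 2*col0 + 0.05*col1.
X = [[1, 1, 1], [1, -1, 1], [-1, 1, 1], [-1, -1, 1]]
y = [2.05, 1.95, -1.95, -2.05]
weights = lasso_coordinate_descent(X, y, alpha=0.1)
```

In this toy case the weak and constant channels receive a weight of exactly zero while the informative channel is kept with a slightly shrunken coefficient, which is the behaviour that lets LASSO act as a feature detector on a full spectrum.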


Predicting wildfire burned area in South Central US using integrated machine learning techniques

10.5194/acp-2019-885 ◽

2019 ◽

Author(s):

Sing-Chun Wang ◽

Yuxuan Wang

Keyword(s):

United States ◽

Machine Learning ◽

The United States ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Environmental Drivers ◽

Fire Season ◽

Burned Area ◽

Environmental Controls ◽

South Central

Abstract. Occurrences of devastating wildfires have been on the rise in the United States for the past decades. While the environmental controls, including weather, climate, and fuels, are known to play important roles in controlling wildfires, the interrelationships between fires and the environmental controls are highly complex and may not be well represented by traditional parametric regressions. Here we develop a model integrating multiple machine learning algorithms to predict gridded monthly wildfire burned area during 2002–2015 over the South Central United States and identify the relative importance of the environmental drivers on the burned area for both the winter-spring and summer fire seasons of that region. The developed model is able to alleviate the issue of unevenly distributed burned area data and achieves cross-validation (CV) R² values of 0.42 and 0.40 for the two fire seasons. For the total burned area over the study domain, the model can explain 50 % and 79 % of interannual total burned area for the winter-spring and summer fire seasons, respectively. The prediction model ranks relative humidity (RH) anomalies and preceding months’ drought severity as the top two most important predictors of the gridded burned area for both fire seasons. Sensitivity experiments with the model show that the effect of climate change, represented by a group of climate-anomaly variables, contributes the most to the burned area for both fire seasons. Antecedent fuel amount and conditions are found to outweigh weather effects for the burned area in the winter-spring fire season, while the current-month fire weather is more important for the summer fire season, likely due to the controlling effect of weather on fuel moisture in this season. This developed model allows us to predict gridded burned area and to assess specific fire management strategies for different fire mechanisms in the two seasons.
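
The reported skill scores are easier to interpret with the definition of the coefficient of determination at hand. The sketch below computes it as one minus the ratio of residual to total sum of squares. This is an assumption: the abstract does not say whether its cross-validation score uses this form or a squared correlation. The observed and predicted values are hypothetical.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical held-out values (assumption for illustration only).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 1.5, 3.5, 3.5]
score = r_squared(observed, predicted)
```

Under this definition a score of about 0.4 means the model removes roughly 40 % of the variance that a constant mean prediction would leave unexplained on held-out data.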
