Mitigating Gender Bias in Machine Learning Data Sets



Gender bias in machine learning for sentiment analysis

Online Information Review, 2018, Vol 42 (3), pp. 343-354

DOI: 10.1108/oir-05-2017-0153

Cited By ~ 3

Author(s): Mike Thelwall

Keyword(s): Machine Learning, Sentiment Analysis, Gender Bias, Training Data, Data Sets, Content Type, Female Authors, Gender Biases, Mixed Gender, Gender Specific

Purpose: The purpose of this paper is to investigate whether machine learning induces gender biases, in the sense of results that are more accurate for male authors or for female authors. It also investigates whether training separate male and female variants could improve the accuracy of machine learning for sentiment analysis.

Design/methodology/approach: This paper uses ratings-balanced sets of reviews of restaurants and hotels (3 sets) to train algorithms with and without gender selection.

Findings: Accuracy is higher on female-authored reviews than on male-authored reviews for all data sets, so applications of sentiment analysis using mixed-gender data sets will over-represent the opinions of women. Training on same-gender data improves performance less than having additional data from both genders.

Practical implications: End users of sentiment analysis should be aware that its small gender biases can affect the conclusions drawn from it, and should apply correction factors when necessary. Users of systems that incorporate sentiment analysis should be aware that performance will vary by author gender. Developers do not need to create gender-specific algorithms unless they have more training data than their system can cope with.

Originality/value: This is the first demonstration of gender bias in machine learning sentiment analysis.
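The kind of bias described in this abstract, accuracy that differs by author gender, can be checked by scoring a classifier separately for each group. The following is a minimal sketch only, not the paper's code: the function name `accuracy_by_group`, the record layout, and the toy predictions are all hypothetical and chosen purely for illustration.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Return {group: accuracy} for (group, true_label, predicted_label) records."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, truth, predicted in records:
        total[group] += 1
        if truth == predicted:
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}


# Hypothetical toy predictions (NOT data from the paper).
toy_predictions = [
    ("female", "pos", "pos"),
    ("female", "neg", "neg"),
    ("female", "pos", "neg"),
    ("female", "neg", "neg"),
    ("male", "pos", "pos"),
    ("male", "neg", "pos"),
    ("male", "pos", "neg"),
    ("male", "neg", "neg"),
]

per_group = accuracy_by_group(toy_predictions)
# A nonzero gap means the classifier is more reliable for one group.
gap = per_group["female"] - per_group["male"]
print(per_group, gap)
```

A gap of this kind is what a correction factor would need to account for when opinions from a mixed-gender data set are aggregated.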


A monitoring system to prepare machine learning data sets for earthquake prediction based on seismic-acoustic signals

2015 9th International Conference on Application of Information and Communication Technologies (AICT), 2015

DOI: 10.1109/icaict.2015.7338513

Author(s): Alper Vahaplar, Baris Tekin Tezel, Resmiye Nasiboglu, Efendi Nasibov

Keyword(s): Machine Learning, Monitoring System, Earthquake Prediction, Acoustic Signals, Data Sets, Learning Data


Bayesian Modelling for Machine Learning

Intelligent Information Technologies, 2011, pp. 421-429

DOI: 10.4018/978-1-59904-941-0.ch024

Author(s): Paul Rippon, Kerrie Mengersen

Keyword(s): Machine Learning, Process Model, Bayesian Learning, Data Sets, Complex Data, Sources Of Information, Multiple Sources, Complex Data Sets, Complex Phenomena, Learning Data

Learning algorithms are central to pattern recognition, artificial intelligence, machine learning, data mining, and statistical learning. The term often implies analysis of large and complex data sets with minimal human intervention. Bayesian learning has been variously described as a method of updating opinion based on new experience, updating parameters of a process model based on data, modelling and analysis of complex phenomena using multiple sources of information, posterior probabilistic expectation, and so on. In all of these guises, it has exploded in popularity over recent years.
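One of the descriptions above, updating the parameters of a model based on data, can be illustrated with the textbook Beta-Binomial conjugate update. This is a generic sketch under stated assumptions rather than anything taken from the chapter: the uniform Beta(1, 1) prior, the observation counts, and the helper names `update_beta` and `beta_mean` are hypothetical.

```python
def update_beta(alpha, beta, successes, failures):
    """Posterior Beta parameters after observing binomial data.

    With a Beta(alpha, beta) prior on a success probability, the posterior
    is Beta(alpha + successes, beta + failures).
    """
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Assumed uniform prior and made-up counts, for illustration only.
prior = (1.0, 1.0)
posterior = update_beta(*prior, successes=7, failures=3)

# Updating in two batches gives the same posterior as one combined update.
step_one = update_beta(*prior, successes=4, failures=1)
step_two = update_beta(*step_one, successes=3, failures=2)

print(posterior, beta_mean(*posterior), step_two)
```

The two-step update matching the single combined update is the sense in which opinion is revised incrementally as new experience arrives.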


A TOPSIS Data Mining Demonstration and Application to Credit Scoring

Data Warehousing and Mining, 2008, pp. 1877-1887

DOI: 10.4018/978-1-59904-951-9.ch112

Author(s): Desheng Wu, David L. Olson

Keyword(s): Machine Learning, Data Mining, Credit Scoring, Multiple Criteria Decision Analysis, Data Sets, Ideal Solution, Explanatory Variables, Large Sets, Order Preference, Learning Data

The technique for order preference by similarity to ideal solution (TOPSIS) can consider any number of measures, seeking to identify solutions that are close to an ideal solution and far from a nadir solution. TOPSIS has traditionally been applied in multiple criteria decision analysis. In this paper we propose an approach to develop a TOPSIS classifier. We demonstrate its use in credit scoring, providing a way to deal with large sets of data using machine learning. Data sets often contain many potential explanatory variables, some preferably minimized, some preferably maximized. Results compare favorably with the traditional data mining technique of decision trees. Proposed models are validated using Monte Carlo simulation.
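The scoring step that TOPSIS relies on can be sketched as follows. This is the standard textbook formulation (vector normalization, weighting, Euclidean distances to the ideal and nadir points), not the classifier proposed in the paper; the function name `topsis_scores`, the equal weights, and the toy applicant data are assumptions made for illustration.

```python
import math


def topsis_scores(matrix, weights, benefit):
    """Closeness of each alternative to the ideal solution, in [0, 1].

    matrix:  one row per alternative, one column per criterion
    weights: criterion weights
    benefit: True where larger is better, False where smaller is better
    Assumes no column is all zeros and the alternatives are not all identical.
    """
    n_criteria = len(weights)
    # Vector-normalize each column, then apply the criterion weight.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_criteria)]
    weighted = [
        [weights[j] * row[j] / norms[j] for j in range(n_criteria)] for row in matrix
    ]
    columns = list(zip(*weighted))
    # Ideal takes the best value per criterion, nadir the worst.
    ideal = [max(col) if benefit[j] else min(col) for j, col in enumerate(columns)]
    nadir = [min(col) if benefit[j] else max(col) for j, col in enumerate(columns)]
    scores = []
    for row in weighted:
        d_ideal = math.dist(row, ideal)
        d_nadir = math.dist(row, nadir)
        scores.append(d_nadir / (d_ideal + d_nadir))
    return scores


# Hypothetical applicants: [income (maximize), debt (minimize)].
applicants = [[60, 10], [40, 30], [50, 20]]
scores = topsis_scores(applicants, weights=[0.5, 0.5], benefit=[True, False])
print(scores)
```

A score near 1 means the alternative sits close to the ideal and far from the nadir. Turning the score into a class label, for example by thresholding it, would be one possible design and is not confirmed by the abstract.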
