A World Full of Stereotypes? Further Investigation on Origin and Gender Bias in Multi-Lingual Word Embeddings

Publicly available off-the-shelf word embeddings that are often used in productive applications for natural language processing have been proven to be biased. We have previously shown that this bias can come in different forms, depending on the language and the cultural context. In this work, we extend our previous work and further investigate how bias varies in different languages. We examine Italian and Swedish word embeddings for gender and origin bias, and demonstrate how an origin bias concerning local migration groups in Switzerland is included in German word embeddings. We propose BiasWords, a method to automatically detect new forms of bias. Finally, we discuss how cultural and language aspects are relevant to the impact of bias on the application and to potential mitigation measures.

Download Full-text

Reducing Racial and Gender Bias in Machine Learning and Natural Language Processing Tasks Using a GAN Approach

International Journal of High School Research ◽

10.36838/v3i6.5 ◽

2021 ◽

Vol 3 (6) ◽

pp. 17-24

Author(s):

Isabella S. Mandis

Keyword(s):

Machine Learning ◽

Natural Language Processing ◽

Natural Language ◽

Gender Bias ◽

Language Processing ◽

And Gender

Download Full-text

Quantification of Gender Bias and Sentiment Toward Political Leaders Over 20 Years of Kenyan News Using Natural Language Processing

Frontiers in Psychology ◽

10.3389/fpsyg.2021.712646 ◽

2021 ◽

Vol 12 ◽

Author(s):

Emma Pair ◽

Nikitha Vicas ◽

Ann M. Weber ◽

Valerie Meausoone ◽

James Zou ◽

...

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Sentiment Analysis ◽

Gender Equality ◽

Gender Bias ◽

Language Processing ◽

Temporal Trends ◽

Constitutional Amendment ◽

Word Embeddings ◽

Political Leaders

Background: Despite a 2010 Kenyan constitutional amendment limiting members of elected public bodies to < two-thirds of the same gender, only 22 percent of the 12th Parliament members inaugurated in 2017 were women. Investigating gender bias in the media is a useful tool for understanding socio-cultural barriers to implementing legislation for gender equality. Natural language processing (NLP) methods, such as word embedding and sentiment analysis, can efficiently quantify media biases at a scope previously unavailable in the social sciences.Methods: We trained GloVe and word2vec word embeddings on text from 1998 to 2019 from Kenya’s Daily Nation newspaper. We measured gender bias in these embeddings and used sentiment analysis to predict quantitative sentiment scores for sentences surrounding female leader names compared to male leader names.Results: Bias in leadership words for men and women measured from Daily Nation word embeddings corresponded to temporal trends in men and women’s participation in political leadership (i.e., parliamentary seats) using GloVe (correlation 0.8936, p = 0.0067, r2 = 0.799) and word2vec (correlation 0.844, p = 0.0169, r2 = 0.712) algorithms. Women continue to be associated with domestic terms while men continue to be associated with influence terms, for both regular gender words and female and male political leaders’ names. Male words (e.g., he, him, man) were mentioned 1.84 million more times than female words from 1998 to 2019. Sentiment analysis showed an increase in relative negative sentiment associated with female leaders (p = 0.0152) and an increase in positive sentiment associated with male leaders over time (p = 0.0216).Conclusion: Natural language processing is a powerful method for gaining insights into and quantifying trends in gender biases and sentiment in news media. We found evidence of improvement in gender equality but also a backlash from increased female representation in high-level governmental leadership.

Download Full-text

A Natural Language Processing Approach to Measuring Treatment Adherence and Consistency Using Semantic Similarity

AERA Open ◽

10.1177/23328584211028615 ◽

2021 ◽

Vol 7 ◽

pp. 233285842110286

Author(s):

Kylie L. Anglin ◽

Vivian C. Wong ◽

Arielle Boguslav

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Semantic Similarity ◽

Language Processing ◽

Intervention Implementation ◽

Proof Of Concept ◽

Coaching Intervention ◽

Processing Techniques ◽

Teacher Coaching ◽

The Impact

Though there is widespread recognition of the importance of implementation research, evaluators often face intense logistical, budgetary, and methodological challenges in their efforts to assess intervention implementation in the field. This article proposes a set of natural language processing techniques called semantic similarity as an innovative and scalable method of measuring implementation constructs. Semantic similarity methods are an automated approach to quantifying the similarity between texts. By applying semantic similarity to transcripts of intervention sessions, researchers can use the method to determine whether an intervention was delivered with adherence to a structured protocol, and the extent to which an intervention was replicated with consistency across sessions, sites, and studies. This article provides an overview of semantic similarity methods, describes their application within the context of educational evaluations, and provides a proof of concept using an experimental study of the impact of a standardized teacher coaching intervention.

Download Full-text

Application of natural language processing methods to extract coded data from administrative data held in the Scottish Prescribing Information System

International Journal for Population Data Science ◽

10.23889/ijpds.v1i1.263 ◽

2017 ◽

Vol 1 (1) ◽

Author(s):

Clifford Nangle ◽

Stuart McTaggart ◽

Margaret MacLeod ◽

Jackie Caldwell ◽

Marion Bennie

Keyword(s):

Information System ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Drug Exposure ◽

Drug Dose ◽

Free Text ◽

Wide Range ◽

The Impact ◽

Prescribing Information

ABSTRACT ObjectivesThe Prescribing Information System (PIS) datamart, hosted by NHS National Services Scotland receives around 90 million electronic prescription messages per year from GP practices across Scotland. Prescription messages contain information including drug name, quantity and strength stored as coded, machine readable, data while prescription dose instructions are unstructured free text and difficult to interpret and analyse in volume. The aim, using Natural Language Processing (NLP), was to extract drug dose amount, unit and frequency metadata from freely typed text in dose instructions to support calculating the intended number of days’ treatment. This then allows comparison with actual prescription frequency, treatment adherence and the impact upon prescribing safety and effectiveness. ApproachAn NLP algorithm was developed using the Ciao implementation of Prolog to extract dose amount, unit and frequency metadata from dose instructions held in the PIS datamart for drugs used in the treatment of gastrointestinal, cardiovascular and respiratory disease. Accuracy estimates were obtained by randomly sampling 0.1% of the distinct dose instructions from source records, comparing these with metadata extracted by the algorithm and an iterative approach was used to modify the algorithm to increase accuracy and coverage. ResultsThe NLP algorithm was applied to 39,943,465 prescription instructions issued in 2014, consisting of 575,340 distinct dose instructions. For drugs used in the gastrointestinal, cardiovascular and respiratory systems (i.e. chapters 1, 2 and 3 of the British National Formulary (BNF)) the NLP algorithm successfully extracted drug dose amount, unit and frequency metadata from 95.1%, 98.5% and 97.4% of prescriptions respectively. However, instructions containing terms such as ‘as directed’ or ‘as required’ reduce the usability of the metadata by making it difficult to calculate the total dose intended for a specific time period as 7.9%, 0.9% and 27.9% of dose instructions contained terms meaning ‘as required’ while 3.2%, 3.7% and 4.0% contained terms meaning ‘as directed’, for drugs used in BNF chapters 1, 2 and 3 respectively. ConclusionThe NLP algorithm developed can extract dose, unit and frequency metadata from text found in prescriptions issued to treat a wide range of conditions and this information may be used to support calculating treatment durations, medicines adherence and cumulative drug exposure. The presence of terms such as ‘as required’ and ‘as directed’ has a negative impact on the usability of the metadata and further work is required to determine the level of impact this has on calculating treatment durations and cumulative drug exposure.

Download Full-text

Gender Bias in Natural Language Processing Across Human Languages

10.18653/v1/2021.trustnlp-1.6 ◽

2021 ◽

Author(s):

Abigail Matthews ◽

Isabella Grasso ◽

Christopher Mahoney ◽

Yan Chen ◽

Esma Wali ◽

...

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Gender Bias ◽

Language Processing

Download Full-text

Money Makes the World Go Frowned. Analyzing the Impact of Chinese Foreign Aid on States’ Sentiment Using Natural Language Processing

Chinas Rolle in einer neuen Weltordnung ◽

10.5771/9783828876361-241 ◽

2021 ◽

pp. 241-264

Author(s):

Dennis Hammerschmidt ◽

Cosima Meyer

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Foreign Aid ◽

Language Processing ◽

The World ◽

Chinese Foreign Aid ◽

The Impact

Download Full-text

Analysis of the Impact of the US Presidential Election on the US Economy Based on Natural Language Processing and Big Data

Computational and Experimental Simulations in Engineering - Mechanisms and Machine Science ◽

10.1007/978-3-030-67090-0_39 ◽

2021 ◽

pp. 483-494

Author(s):

Mingzhen Li ◽

Xiangdong Liu

Keyword(s):

Big Data ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Presidential Election ◽

Us Economy ◽

The Us ◽

Us Presidential Election ◽

The Impact

Download Full-text

A Natural Language Processing Approach to Automated Highlighting of New Information in Clinical Notes

Applied Sciences ◽

10.3390/app10082824 ◽

2020 ◽

Vol 10 (8) ◽

pp. 2824

Author(s):

Yu-Hsiang Su ◽

Ching-Ping Chao ◽

Ling-Chien Hung ◽

Sheng-Feng Sung ◽

Pei-Ju Lee

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Task Performance ◽

Language Processing ◽

Automated Identification ◽

Clinical Notes ◽

New Information ◽

Perceived Workload ◽

The Impact ◽

User Experiment

Electronic medical records (EMRs) have been used extensively in most medical institutions for more than a decade in Taiwan. However, information overload associated with rapid accumulation of large amounts of clinical narratives has threatened the effective use of EMRs. This situation is further worsened by the use of “copying and pasting”, leading to lots of redundant information in clinical notes. This study aimed to apply natural language processing techniques to address this problem. New information in longitudinal clinical notes was identified based on a bigram language model. The accuracy of automated identification of new information was evaluated using expert annotations as the reference standard. A two-stage cross-over user experiment was conducted to evaluate the impact of highlighting of new information on task demands, task performance, and perceived workload. The automated method identified new information with an F1 score of 0.833. The user experiment found a significant decrease in perceived workload associated with a significantly higher task performance. In conclusion, automated identification of new information in clinical notes is feasible and practical. Highlighting of new information enables healthcare professionals to grasp key information from clinical notes with less perceived workload.

Download Full-text

Multi-Sense Embeddings per Word

10.31219/osf.io/udfhn ◽

2020 ◽

Author(s):

Masashi Sugiyama

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Research Area ◽

Word Embedding ◽

The Other ◽

Word Embeddings ◽

Word Similarity ◽

Better Than ◽

Non Parametric

Recently, word embeddings have been used in many natural language processing problems successfully and how to train a robust and accurate word embedding system efficiently is a popular research area. Since many, if not all, words have more than one sense, it is necessary to learn vectors for all senses of word separately. Therefore, in this project, we have explored two multi-sense word embedding models, including Multi-Sense Skip-gram (MSSG) model and Non-parametric Multi-sense Skip Gram model (NP-MSSG). Furthermore, we propose an extension of the Multi-Sense Skip-gram model called Incremental Multi-Sense Skip-gram (IMSSG) model which could learn the vectors of all senses per word incrementally. We evaluate all the systems on word similarity task and show that IMSSG is better than the other models.

Download Full-text

Computationally Efficient Learning of Quality Controlled Word Embeddings for Natural Language Processing

2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI) ◽

10.1109/isvlsi.2019.00033 ◽

2019 ◽

Author(s):

Mohammed Alawad ◽

Georgia Tourassi

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Word Embeddings ◽

Computationally Efficient ◽

Efficient Learning

Download Full-text