Effectiveness of Word Embeddings on Classifiers: A Case Study with Tweets

WEFE: The Word Embeddings Fairness Evaluation Framework

Word embeddings are known to exhibit stereotypical biases with respect to gender, race, and religion, among other criteria. Several fairness metrics have been proposed to quantify these biases automatically. Although all metrics have a similar objective, the relationship between them is by no means clear. Two issues prevent a clean comparison: the metrics operate on different inputs, and their outputs are incompatible with each other. In this paper we propose WEFE, the word embeddings fairness evaluation framework, to encapsulate, evaluate and compare fairness metrics. Our framework takes a list of pre-trained embeddings and a set of fairness criteria, and it is based on checking correlations between the fairness rankings induced by these criteria. We conduct a case study showing that rankings produced by existing fairness methods tend to correlate when measuring gender bias. This correlation is considerably weaker for other biases such as race or religion. We also compare the fairness rankings with an embedding benchmark, showing that there is no clear correlation between fairness and good performance in downstream tasks.
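As a rough illustration of what "checking correlations between fairness rankings" can look like, the sketch below ranks four embedding models under two metrics and compares the rankings with Spearman's rank correlation. The choice of Spearman's coefficient, the function names, and the scores are assumptions made for this example; they are not taken from the abstract.

```python
def ranks(scores):
    """Assign rank 1 to the lowest score, 2 to the next, and so on (no ties assumed)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    result = [0] * len(scores)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman(a, b):
    """Spearman's rho for two equally long score lists without ties."""
    ra, rb = ranks(a), ranks(b)
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(ra, rb))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical bias scores for the same four embedding models under two metrics.
metric_a = [0.10, 0.40, 0.25, 0.90]
metric_b = [1.2, 3.1, 2.0, 2.5]

rho = spearman(metric_a, metric_b)
print(round(rho, 3))
```

A value near 1 would indicate that the two metrics order the models similarly even though their raw scores live on different scales.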

Better Word Representation Vectors Using Syllabic Alphabet: A Case Study of Swahili

Applied Sciences ◽

10.3390/app9183648 ◽

2019 ◽

Vol 9 (18) ◽

pp. 3648

Author(s):

Casper S. Shikali ◽

Zhou Sijie ◽

Liu Qihe ◽

Refuoe Mokhosi

Keyword(s):

Language Processing ◽

Critical Role ◽

Language Model ◽

Central Africa ◽

Spoken Language ◽

Language Models ◽

Word Embeddings ◽

Word Representation

Deep learning has been used extensively in natural language processing, with sub-word representation vectors playing a critical role. However, this cannot be said of Swahili, which is a low-resource and widely spoken language in East and Central Africa. This study proposed novel word embeddings from syllable embeddings (WEFSE) for Swahili to address the concern of word representation for agglutinative and syllabic-based languages. Inspired by the learning methodology of Swahili in beginner classes, we encoded the respective syllables instead of characters, character n-grams or morphemes of words, and generated quality word embeddings using a convolutional neural network. The quality of WEFSE was demonstrated by the state-of-the-art results of the syllable-aware language model on both the small dataset (31.229 perplexity value) and the medium dataset (45.859 perplexity value), outperforming character-aware language models. We further evaluated the word embeddings using a word analogy task. To the best of our knowledge, syllabic alphabets have not been used to compose word representation vectors. Therefore, the main contributions of the study are a syllabic alphabet, WEFSE, a syllable-aware language model and a word analogy dataset for Swahili.
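The perplexity values quoted above follow the usual language-model definition: the exponential of the average negative log-probability the model assigns to each token. The sketch below shows that calculation on made-up probabilities; the function name and the numbers are illustrative assumptions and do not reproduce the paper's evaluation.

```python
import math


def perplexity(token_probs):
    """Perplexity = exp(mean negative log-probability per token)."""
    mean_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(mean_nll)


# Hypothetical probabilities a model assigns to each token of a held-out text.
probs = [0.5, 0.125]
print(perplexity(probs))
```

Lower is better: a model that spread its probability evenly over four choices at every step would score a perplexity of 4.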

Debiasing Multilingual Word Embeddings: A Case Study of Three Indian Languages

10.1145/3465336.3475118 ◽

2021 ◽

Author(s):

Srijan Bansal ◽

Vishal Garimella ◽

Ayush Suhane ◽

Animesh Mukherjee

Keyword(s):

Word Embeddings ◽

Indian Languages

Tackling neural machine translation in low-resource settings: a Portuguese case study

10.5753/stil.2021.17807 ◽

2021 ◽

Author(s):

Arthur T. Estrella ◽

João B. O. Souza Filho

Keyword(s):

Machine Translation ◽

Data Augmentation ◽

Word Embeddings ◽

Effective Solution ◽

Computational Power ◽

Limited Data ◽

Neural Machine Translation ◽

Low Resource

Neural machine translation (NMT) nowadays requires an increasing amount of data and computational power, so succeeding in this task with limited data and a single GPU can be challenging. Strategies such as pre-trained word embeddings, subword embeddings, and data augmentation can potentially address some of the issues faced in low-resource experimental settings, but their impact on translation quality is unclear. This work evaluates some of these strategies in two low-resource experiments and goes beyond reporting BLEU: errors are categorized on the Portuguese-English pair with the help of a translator, considering semantic and syntactic aspects. The BPE subword approach proved to be the most effective solution, allowing a BLEU increase of 59% p.p. compared to the standard Transformer.
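For readers unfamiliar with BPE (byte-pair encoding), the sketch below shows the core merge-learning loop on a toy vocabulary: count adjacent symbol pairs weighted by word frequency, merge the most frequent pair, and repeat. It is a simplified illustration under stated assumptions (no end-of-word marker, a hypothetical tie-breaking rule, invented word counts) and is not the configuration used in the study.

```python
from collections import Counter


def learn_bpe(word_freqs, num_merges):
    """Learn up to num_merges BPE merges from a {word: frequency} mapping."""
    # Each word starts as a tuple of single characters.
    vocab = {tuple(word): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        # Most frequent pair; ties broken by the pair itself (an assumption made
        # here so that the result is deterministic).
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        new_vocab = {}
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            key = tuple(merged)
            new_vocab[key] = new_vocab.get(key, 0) + freq
        vocab = new_vocab
    return merges, vocab


# Toy word counts, invented for illustration.
toy_counts = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
merges, segmented = learn_bpe(toy_counts, num_merges=2)
print(merges)
print(segmented)
```

Frequent character sequences end up as single symbols, so rare words can still be represented as a short sequence of known subword units.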

Code-Switching Language Modeling with Bilingual Word Embeddings: A Case Study for Egyptian Arabic-English

Speech and Computer - Lecture Notes in Computer Science ◽

10.1007/978-3-030-26061-3_17 ◽

2019 ◽

pp. 160-170

Author(s):

Injy Hamed ◽

Moritz Zhu ◽

Mohamed Elmahdy ◽

Slim Abdennadher ◽

Ngoc Thang Vu

Keyword(s):

Language Modeling ◽

Code Switching ◽

Word Embeddings ◽

Egyptian Arabic

Efficient Estimate of Low-Frequency Words’ Embeddings Based on the Dictionary: A Case Study on Chinese

Applied Sciences ◽

10.3390/app112211018 ◽

2021 ◽

Vol 11 (22) ◽

pp. 11018

Author(s):

Xianwen Liao ◽

Yongzhong Huang ◽

Changfu Wei ◽

Chenhao Zhang ◽

Yongqing Deng ◽

...

Keyword(s):

Natural Language Processing ◽

Language Processing ◽

Low Frequency ◽

Word Embeddings ◽

High Quality ◽

Representation Model ◽

Explanatory Note ◽

Efficient Estimate

Obtaining high-quality embeddings of out-of-vocabulary words (OOVs) and low-frequency words is a challenge in natural language processing (NLP). To estimate these embeddings efficiently, we propose a new method that uses a dictionary. More specifically, the explanatory note of a dictionary entry accurately describes the semantics of the corresponding word. We therefore adopt a sentence representation model to extract the semantics of the explanatory note and regard those semantics as the embedding of the corresponding word. We design a new sentence representation model that encodes sentences so as to extract the semantics from the explanatory notes of entries more efficiently. Based on the assumption that higher-quality word embeddings lead to better performance, we design an extrinsic experiment to evaluate the quality of low-frequency words' embeddings. The experimental results show that the embeddings of low-frequency words estimated by our proposed method are of higher quality. In addition, both intrinsic and extrinsic experiments show that our proposed sentence representation model can represent the semantics of sentences well.
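The general idea of estimating a word's embedding from its dictionary definition can be sketched with a much simpler stand-in for the sentence representation model: averaging the vectors of the definition's known words. This averaging baseline is an assumption made for illustration and is not the model proposed in the paper; the function name, the toy vectors, and the English definition are likewise hypothetical.

```python
def estimate_from_definition(definition, embeddings):
    """Average the vectors of the definition words that have embeddings.

    Returns None when no definition word is in the embedding table.
    """
    vectors = [embeddings[w] for w in definition.lower().split() if w in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


# Toy two-dimensional embedding table, invented for illustration.
embeddings = {"large": [1.0, 0.0], "grey": [0.0, 1.0], "animal": [1.0, 1.0]}

# "a" has no vector and is skipped; the other three words are averaged.
oov_vector = estimate_from_definition("a large grey animal", embeddings)
print(oov_vector)
```

A learned sentence encoder would replace the plain average, but the interface stays the same: definition text in, one vector out.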

Does the Geometry of Word Embeddings Help Document Classification? A Case Study on Persistent Homology-Based Representations

10.18653/v1/w17-2628 ◽

2017 ◽

Cited By ~ 1

Author(s):

Paul Michel ◽

Abhilasha Ravichander ◽

Shruti Rijhwani

Keyword(s):

Persistent Homology ◽

Document Classification ◽

Word Embeddings

The Genesis and Internal Dynamics of El Salvador's People's Revolutionary Army, 1970–1976

Behavioral and Brain Sciences ◽

10.1017/s0140525x13009850 ◽

2014 ◽

Vol 38 (01) ◽

pp. 102-129

Author(s):

ALBERTO MARTÍN ÁLVAREZ ◽

EUDALD CORTINA ORERO

Keyword(s):

Latin America ◽

El Salvador ◽

Early Years ◽

Left Wing ◽

Internal Dynamics ◽

Internal Coherence ◽

The 1960s

Using interviews with former militants and previously unpublished documents, this article traces the genesis and internal dynamics of the Ejército Revolucionario del Pueblo (People's Revolutionary Army, ERP) in El Salvador during the early years of its existence (1970–6). This period was marked by the inability of the ERP to maintain internal coherence or any consensus on revolutionary strategy, which led to a series of splits and internal fights over control of the organisation. The evidence marshalled in this case study sheds new light on the origins of the armed Salvadorean Left and thus contributes to a wider understanding of the processes of formation and internal dynamics of armed left-wing groups that emerged from the 1960s onwards in Latin America.

Culture and the plasticity of perception

Behavioral and Brain Sciences ◽

10.1017/s0140525x19002887 ◽

2020 ◽

Vol 43 ◽

Author(s):

Michael Lifshitz ◽

T. M. Luhrmann

Keyword(s):

Cultural Context ◽

Sensory Experience ◽

Cultural Learning ◽

The World

Culture shapes our basic sensory experience of the world. This is particularly striking in the study of religion and psychosis, where we and others have shown that cultural context determines both the structure and content of hallucination-like events. The cultural shaping of hallucinations may provide a rich case study for linking cultural learning with emerging prediction-based models of perception.

Lexical and Morpho-syntactic Features in Word Embeddings - A Case Study of Nouns in Swedish

WEFE: The Word Embeddings Fairness Evaluation Framework
