Effectiveness of Word Embeddings on Classifiers: A Case Study with Tweets

Author(s):  
Sukanya Manna ◽  
Haruto Nakai
Author(s):  
Pablo Badilla ◽  
Felipe Bravo-Marquez ◽  
Jorge Pérez

Word embeddings are known to exhibit stereotypical biases with respect to gender, race, and religion, among other criteria. Several fairness metrics have been proposed to quantify these biases automatically. Although all metrics have a similar objective, the relationship between them is by no means clear. Two issues prevent a clean comparison: the metrics operate on different inputs, and their outputs are incompatible with each other. In this paper we propose WEFE, the word embeddings fairness evaluation framework, to encapsulate, evaluate and compare fairness metrics. Our framework needs a list of pre-trained embeddings and a set of fairness criteria, and it is based on checking correlations between fairness rankings induced by these criteria. We conduct a case study showing that rankings produced by existing fairness methods tend to correlate when measuring gender bias. This correlation is considerably weaker for other biases such as race or religion. We also compare the fairness rankings with an embedding benchmark, showing that there is no clear correlation between fairness and good performance in downstream tasks.
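The idea of checking correlations between fairness rankings can be illustrated with a small sketch. The abstract does not say which correlation statistic is used, so Spearman rank correlation is an assumption here, and the model scores and function names are hypothetical; this is not the WEFE library's API.

```python
def rank(scores):
    """Return 1-based ranks (1 = smallest score); tied scores share the average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of scores tied with position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences with non-zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(scores_a, scores_b):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(rank(scores_a), rank(scores_b))


# Hypothetical bias scores for four embedding models under two fairness metrics.
metric_a = [0.9, 0.4, 0.6, 0.1]
metric_b = [0.8, 0.3, 0.7, 0.2]
print(spearman(metric_a, metric_b))
```

A value near 1 would mean the two metrics order the embedding models in the same way, and a value near 0 would mean their rankings are unrelated.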


2019 ◽  
Vol 9 (18) ◽  
pp. 3648
Author(s):  
Casper S. Shikali ◽  
Zhou Sijie ◽  
Liu Qihe ◽  
Refuoe Mokhosi

Deep learning has been used extensively in natural language processing, with sub-word representation vectors playing a critical role. However, this cannot be said of Swahili, which is a low-resource yet widely spoken language in East and Central Africa. This study proposed novel word embeddings from syllable embeddings (WEFSE) for Swahili to address the concern of word representation for agglutinative and syllabic-based languages. Inspired by the way Swahili is taught in beginner classes, we encoded syllables instead of characters, character n-grams or morphemes of words and generated quality word embeddings using a convolutional neural network. The quality of WEFSE was demonstrated by state-of-the-art results in the syllable-aware language model on both the small dataset (31.229 perplexity value) and the medium dataset (45.859 perplexity value), outperforming character-aware language models. We further evaluated the word embeddings using a word analogy task. To the best of our knowledge, syllabic alphabets have not previously been used to compose word representation vectors. Therefore, the main contributions of the study are a syllabic alphabet, WEFSE, a syllable-aware language model and a word analogy dataset for Swahili.


2021 ◽  
Author(s):  
Srijan Bansal ◽  
Vishal Garimella ◽  
Ayush Suhane ◽  
Animesh Mukherjee

2021 ◽  
Author(s):  
Arthur T. Estrella ◽  
João B. O. Souza Filho

Neural machine translation (NMT) requires an increasing amount of data and computational power, so succeeding in this task with limited data and a single GPU can be challenging. Strategies such as pre-trained word embeddings, subword embeddings, and data augmentation can potentially address some issues faced in low-resource experimental settings, but their impact on the quality of translations is unclear. This work evaluates some of these strategies in two low-resource experiments and goes beyond reporting BLEU: errors are categorized on the Portuguese-English pair with the help of a translator, considering semantic and syntactic aspects. The BPE subword approach proved to be the most effective solution, allowing a BLEU increase of 59% p.p. compared to the standard Transformer.
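For readers unfamiliar with BPE (byte-pair encoding), the sketch below shows how subword merges are commonly learned: the most frequent adjacent symbol pair is merged repeatedly. The abstract does not describe the implementation the authors used, so this is only an illustration; the function name, the `</w>` end-of-word marker, the tie-breaking rule, and the toy word counts are all assumptions.

```python
from collections import Counter


def learn_bpe(word_freqs, num_merges):
    """Learn up to num_merges BPE merges from a {word: frequency} dictionary."""
    # Each word starts as a tuple of characters plus an end-of-word marker.
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for left, right in zip(symbols, symbols[1:]):
                pairs[(left, right)] += freq
        if not pairs:
            break
        # Most frequent pair; ties are broken by the pair itself for determinism.
        best = max(pairs, key=lambda pair: (pairs[pair], pair))
        merges.append(best)
        # Rewrite every word, fusing each occurrence of the chosen pair.
        new_vocab = {}
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            key = tuple(merged)
            new_vocab[key] = new_vocab.get(key, 0) + freq
        vocab = new_vocab
    return merges, vocab


# Hypothetical toy corpus counts.
toy_counts = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
merges, segmented = learn_bpe(toy_counts, num_merges=3)
print(merges)
print(segmented)
```

With few merges, rare words stay split into small reusable units, which is why subword vocabularies can help when training data is limited.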


2021 ◽  
Vol 11 (22) ◽  
pp. 11018
Author(s):  
Xianwen Liao ◽  
Yongzhong Huang ◽  
Changfu Wei ◽  
Chenhao Zhang ◽  
Yongqing Deng ◽  
...  

Obtaining high-quality embeddings of out-of-vocabulary (OOV) and low-frequency words is a challenge in natural language processing (NLP). To estimate these embeddings efficiently, we propose a new method that draws on a dictionary. More specifically, the explanatory note of a dictionary entry accurately describes the semantics of the corresponding word. We therefore use a sentence representation model to extract the semantics of the explanatory note and treat the result as the embedding of the corresponding word. We design a new sentence representation model that encodes sentences so as to extract the semantics from the explanatory notes of entries more efficiently. Based on the assumption that higher-quality word embeddings lead to better downstream performance, we design an extrinsic experiment to evaluate the quality of the embeddings of low-frequency words. The experimental results show that the embeddings of low-frequency words estimated by our proposed method have higher quality. In addition, both intrinsic and extrinsic experiments show that our proposed sentence representation model can represent the semantics of sentences well.
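The dictionary-based idea can be illustrated with a deliberately simplified stand-in. The authors design their own sentence representation model, which the abstract does not specify; the sketch below instead averages the vectors of known words in the definition, which is a much cruder substitute. The function name and the toy vectors are hypothetical.

```python
def embed_from_definition(definition, vectors):
    """Estimate a word embedding as the mean vector of the known words in its definition.

    Returns None when no word of the definition has a known vector.
    """
    tokens = [token for token in definition.lower().split() if token in vectors]
    if not tokens:
        return None
    dim = len(vectors[tokens[0]])
    return [sum(vectors[token][d] for token in tokens) / len(tokens) for d in range(dim)]


# Hypothetical two-dimensional vectors for words that already have embeddings.
known_vectors = {"small": [1.0, 0.0], "dog": [0.0, 1.0]}

# Estimate a vector for an out-of-vocabulary word from its dictionary definition.
print(embed_from_definition("a small dog", known_vectors))
```

Averaging ignores word order and function words, which is one reason a trained sentence encoder, as the abstract proposes, could be expected to do better.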


2014 ◽  
Vol 38 (01) ◽  
pp. 102-129
Author(s):  
ALBERTO MARTÍN ÁLVAREZ ◽  
EUDALD CORTINA ORERO

Abstract Using interviews with former militants and previously unpublished documents, this article traces the genesis and internal dynamics of the Ejército Revolucionario del Pueblo (People's Revolutionary Army, ERP) in El Salvador during the early years of its existence (1970–6). This period was marked by the inability of the ERP to maintain internal coherence or any consensus on revolutionary strategy, which led to a series of splits and internal fights over control of the organisation. The evidence marshalled in this case study sheds new light on the origins of the armed Salvadorean Left and thus contributes to a wider understanding of the processes of formation and internal dynamics of armed left-wing groups that emerged from the 1960s onwards in Latin America.


2020 ◽  
Vol 43 ◽  
Author(s):  
Michael Lifshitz ◽  
T. M. Luhrmann

Abstract Culture shapes our basic sensory experience of the world. This is particularly striking in the study of religion and psychosis, where we and others have shown that cultural context determines both the structure and content of hallucination-like events. The cultural shaping of hallucinations may provide a rich case-study for linking cultural learning with emerging prediction-based models of perception.

