Lexical and Morpho-syntactic Features in Word Embeddings - A Case Study of Nouns in Swedish

WEFE: The Word Embeddings Fairness Evaluation Framework

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence

10.24963/ijcai.2020/60

2020

Author(s): Pablo Badilla, Felipe Bravo-Marquez, Jorge Pérez

Keyword(s): Gender Bias, Evaluation Framework, Clear Correlation, Word Embeddings, The Relationship

Word embeddings are known to exhibit stereotypical biases with respect to gender, race and religion, among other criteria. Several fairness metrics have been proposed to quantify these biases automatically. Although all metrics share a similar objective, the relationship between them is by no means clear. Two issues prevent a clean comparison: the metrics operate on different inputs, and their outputs are incompatible with each other. In this paper we propose WEFE, the Word Embeddings Fairness Evaluation framework, to encapsulate, evaluate and compare fairness metrics. Our framework takes a list of pre-trained embeddings and a set of fairness criteria, and it is based on checking correlations between the fairness rankings induced by these criteria. We conduct a case study showing that rankings produced by existing fairness methods tend to correlate when measuring gender bias, whereas the correlation is considerably weaker for other biases such as race or religion. We also compare the fairness rankings with an embedding benchmark, showing that there is no clear correlation between fairness and good performance in downstream tasks.
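The comparison step described above can be illustrated with a short Python sketch: each fairness metric assigns a bias score to every embedding, the scores induce a ranking, and two metrics are compared by correlating their rankings. The abstract does not name the correlation measure, so Spearman's rank correlation is an assumption here; the function names and the toy scores are hypothetical, and lower scores are treated as less biased.

```python
from typing import Dict, List


def rank(values: List[float]) -> List[float]:
    """1-based ranks where the lowest score gets rank 1; tied scores share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of items tied with order[i].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: List[float], y: List[float]) -> float:
    """Spearman's rho computed as the Pearson correlation of the two rank vectors."""
    rx, ry = rank(x), rank(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant ranking")
    return cov / (vx * vy) ** 0.5


def ranking_correlation(scores_a: Dict[str, float], scores_b: Dict[str, float]) -> float:
    """Correlate the embedding rankings induced by two (hypothetical) fairness metrics."""
    names = sorted(scores_a)  # align both metrics on the same embedding order
    return spearman([scores_a[n] for n in names], [scores_b[n] for n in names])


# Hypothetical bias scores for three embeddings under two metrics.
metric_a = {"emb1": 0.1, "emb2": 0.5, "emb3": 0.3}
metric_b = {"emb1": 0.2, "emb2": 0.9, "emb3": 0.4}
print(ranking_correlation(metric_a, metric_b))  # 1.0: both metrics order the embeddings identically
```

A value near 1 means two metrics agree on which embeddings are more biased; a value near 0 means their rankings are essentially unrelated, which is the situation the abstract reports for race and religion.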


Better Word Representation Vectors Using Syllabic Alphabet: A Case Study of Swahili

Applied Sciences

10.3390/app9183648

2019

Vol 9 (18)

pp. 3648

Author(s): Casper S. Shikali, Zhou Sijie, Liu Qihe, Refuoe Mokhosi

Keyword(s): Language Processing, Critical Role, Language Model, Central Africa, Spoken Language, Language Models, Word Embeddings, Word Representation

Deep learning has been used extensively in natural language processing, with sub-word representation vectors playing a critical role. The same cannot be said of Swahili, a low-resource yet widely spoken language in East and Central Africa. This study proposed novel word embeddings from syllable embeddings (WEFSE) for Swahili to address word representation for agglutinative, syllable-based languages. Inspired by the way Swahili is taught in beginner classes, we encoded words by their syllables instead of characters, character n-grams or morphemes, and generated high-quality word embeddings using a convolutional neural network. The quality of WEFSE was demonstrated by state-of-the-art results of the syllable-aware language model on both the small dataset (perplexity 31.229) and the medium dataset (perplexity 45.859), outperforming character-aware language models. We further evaluated the word embeddings on a word analogy task. To the best of our knowledge, syllabic alphabets have not previously been used to compose word representation vectors. The main contributions of the study are therefore a syllabic alphabet, WEFSE, a syllable-aware language model and a word analogy dataset for Swahili.
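The first step of a syllable-based approach such as WEFSE is to turn each word into a sequence of syllable ids that a syllable-level network can embed. The Python sketch below assumes a deliberately naive rule (close a syllable after every vowel). That rule handles common open-syllable Swahili words such as kitabu but ignores syllabic nasals (e.g. the m in mtu), and it is not the syllabic alphabet defined in the paper; syllabify and encode are hypothetical names. The CNN that composes syllable vectors into word vectors is not shown.

```python
from typing import Dict, List

VOWELS = set("aeiou")


def syllabify(word: str) -> List[str]:
    """Naive open-syllable split: a syllable ends at every vowel.

    Simplification: syllabic nasals (the m in "mtu") and other special cases are not handled.
    """
    syllables: List[str] = []
    current = ""
    for ch in word.lower():
        current += ch
        if ch in VOWELS:
            syllables.append(current)
            current = ""
    if current:
        # Leftover consonants attach to the previous syllable (or form one if there is none).
        if syllables:
            syllables[-1] += current
        else:
            syllables.append(current)
    return syllables


def encode(words: List[str], vocab: Dict[str, int]) -> List[List[int]]:
    """Map each word to a list of syllable ids, adding unseen syllables to vocab."""
    encoded = []
    for word in words:
        ids = []
        for syl in syllabify(word):
            if syl not in vocab:
                vocab[syl] = len(vocab)
            ids.append(vocab[syl])
        encoded.append(ids)
    return encoded


print(syllabify("Kiswahili"))  # ['ki', 'swa', 'hi', 'li']
vocab: Dict[str, int] = {}
print(encode(["watoto", "kitabu"], vocab))  # [[0, 1, 1], [2, 3, 4]]
```

Because repeated syllables share an id (to in watoto), the syllable inventory stays far smaller than a word vocabulary, which is part of what makes this representation attractive for an agglutinative language.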


English Idioms Containing Human-Body Parts and Their Vietnamese Equivalents: A Case Study of Two English Novels and Their Vietnamese Translation Versions

VNU Journal of Foreign Studies

10.25073/2525-2445/vnufs.4370

2019

Vol 35 (3)

Author(s): Nguyen Thu Hanh, Nguyen Tien Long

Keyword(s): Human Body, Teaching And Learning, Body Part, Current Paper, Body Parts, English Teaching, Contrast Method, Syntactic Features, Human Body Part

The current paper investigates the semantic and syntactic features of idioms, including idioms containing human-body parts, in two English novels, “The Godfather” and “To Kill a Mockingbird”, and in their Vietnamese translations. Using the comparison and contrast method, the paper attempts to identify equivalent and non-equivalent renderings of the human-body-part idioms found in the two novels and their Vietnamese translations. The results should be useful for improving the teaching and learning of English, especially English idioms, as well as the English-Vietnamese translation of idioms.


Basil Bunting's Briggflatts: A Case Study in Intonational Prosody

Language and Literature

10.1177/096394709800700102

1998

Vol 7 (1)

pp. 21-38

Author(s): Ian Pople

Keyword(s): Syntactic Features, Modification Effect, Basil Bunting, Post Modification, The Way

In this article I wish to contribute to the analysis of prosody in poetry by looking at free or 'unmetred' verse. In particular, I focus on how the tools for analysing intonation may be used to analyse the performance of poetry, in order to examine how lineation is formed in unmetred verse. I look at how tone-unit boundaries are often co-extensive with line endings in unmetred verse. The article follows work in this kind of analysis by Crystal (1975) and concentrates on the poem Briggflatts by Basil Bunting. Syntactic features such as ellipsis and pre- and post-modification affect the placement of tone-unit boundaries. Line endings may also be affected by the poet's use of marked emphases, and by the influence of other prosodies which a sophisticated poet may bring to bear on his material.


Debiasing Multilingual Word Embeddings: A Case Study of Three Indian Languages

10.1145/3465336.3475118

2021

Author(s): Srijan Bansal, Vishal Garimella, Ayush Suhane, Animesh Mukherjee

Keyword(s): Word Embeddings, Indian Languages


Tackling neural machine translation in low-resource settings: a Portuguese case study

10.5753/stil.2021.17807

2021

Author(s): Arthur T. Estrella, João B. O. Souza Filho

Keyword(s): Machine Translation, Data Augmentation, Word Embeddings, Effective Solution, Computational Power, Limited Data, Neural Machine Translation, Low Resource

Neural machine translation (NMT) now requires ever-increasing amounts of data and computational power, so succeeding at this task with limited data and a single GPU can be challenging. Strategies such as pre-trained word embeddings, subword embeddings and data augmentation can potentially address some of the issues faced in low-resource settings, but their impact on translation quality is unclear. This work evaluates some of these strategies in two low-resource experiments and goes beyond reporting BLEU alone: with the help of a translator, errors on the Portuguese-English pair are categorized according to semantic and syntactic aspects. The BPE subword approach proved to be the most effective solution, yielding a BLEU increase of 59% p.p. over the standard Transformer.
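Byte-pair encoding (BPE), the subword approach reported as most effective here, builds a subword vocabulary by repeatedly merging the most frequent adjacent symbol pair. The following is a minimal Python sketch of that merge-learning loop in the style of the original BPE-for-NMT recipe, not the configuration used in this study. Ties are broken lexicographically (an assumption made for determinism), words end with a </w> marker, and the toy word counts are illustrative only.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def pair_counts(vocab: Dict[str, int]) -> Counter:
    """Count adjacent symbol pairs, weighted by word frequency."""
    counts: Counter = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            counts[(a, b)] += freq
    return counts


def merge_pair(pair: Pair, vocab: Dict[str, int]) -> Dict[str, int]:
    """Replace every standalone occurrence of 'a b' with 'ab' in each word."""
    pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(pair)) + r"(?!\S)")
    merged = "".join(pair)
    return {pattern.sub(lambda _m: merged, word): freq for word, freq in vocab.items()}


def learn_bpe(word_freqs: Dict[str, int], num_merges: int) -> List[Pair]:
    """Learn up to num_merges BPE merge operations from word frequencies."""
    # Start from characters plus an end-of-word marker so merges respect word ends.
    vocab = {" ".join(word) + " </w>": freq for word, freq in word_freqs.items()}
    merges: List[Pair] = []
    for _ in range(num_merges):
        counts = pair_counts(vocab)
        if not counts:
            break
        # Most frequent pair; ties go to the lexicographically smallest pair.
        best = min(counts, key=lambda p: (-counts[p], p))
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges


print(learn_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3}, 3))
# [('e', 's'), ('es', 't'), ('est', '</w>')]
```

The learned merges are then applied in order to segment training and test text, so a rare Portuguese or English word that never appeared whole can still be represented by frequent subword units.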


Globalization of African American Vernacular English in popular culture

English World-Wide

10.1075/eww.32.1.01lee

2011

Vol 32 (1)

pp. 1-23

Cited by ~22

Author(s): Jamie Shinhee Lee

Keyword(s): African American, Popular Culture, Hip Hop, African American Vernacular English, Vernacular English, Syntactic Features, Lexical Items, American Vernacular

This study examines crossing (Bucholtz 1999; Cutler 1999; Rampton 1995) in Korean hip hop Blinglish as a case study of globalization of African American Vernacular English (AAVE) in popular culture. Blinglish in Korean hip hop can be understood as a prime example of “English from below” (Preisler 1999) to informally express subcultural identity and style. The findings of the study suggest that AAVE features appear at different linguistic levels including lexis, phonology, and morpho-syntax in Korean hip hop Blinglish but do not demonstrate the same degree of AAVE penetration, with a frequency-related hierarchy emerging among these linguistic components. The area of Korean hip hop Blinglish with the heaviest crossing influence from AAVE is found to be lexis followed by phonology. The presence of AAVE syntactic features is somewhat restricted in type and occurrence, indicating that the verbal markers in AAVE are considerably varied and intricate, and syntactic elements are not as easily crossed by non-AAVE speakers as lexical items.


Code-Switching Language Modeling with Bilingual Word Embeddings: A Case Study for Egyptian Arabic-English

Speech and Computer - Lecture Notes in Computer Science

10.1007/978-3-030-26061-3_17

2019

pp. 160-170

Author(s): Injy Hamed, Moritz Zhu, Mohamed Elmahdy, Slim Abdennadher, Ngoc Thang Vu

Keyword(s): Language Modeling, Code Switching, Word Embeddings, Egyptian Arabic


Efficient Estimate of Low-Frequency Words’ Embeddings Based on the Dictionary: A Case Study on Chinese

Applied Sciences

10.3390/app112211018

2021

Vol 11 (22)

pp. 11018

Author(s): Xianwen Liao, Yongzhong Huang, Changfu Wei, Chenhao Zhang, Yongqing Deng, ...

Keyword(s): Natural Language Processing, Language Processing, Low Frequency, Word Embeddings, High Quality, Representation Model, Explanatory Note, Efficient Estimate

Obtaining high-quality embeddings for out-of-vocabulary (OOV) and low-frequency words is a challenge in natural language processing (NLP). To estimate these embeddings efficiently, we propose a new method that uses a dictionary. Specifically, the explanatory note of a dictionary entry accurately describes the semantics of the corresponding word, so we adopt a sentence representation model to extract the semantics of the explanatory note and take the result as the embedding of that word. We design a new sentence representation model that extracts the semantics of entries' explanatory notes more efficiently. Based on the assumption that higher-quality word embeddings lead to better performance, we design an extrinsic experiment to evaluate the quality of low-frequency words' embeddings. The experimental results show that the embeddings of low-frequency words estimated by our method are of higher quality. In addition, both intrinsic and extrinsic experiments show that our sentence representation model represents the semantics of sentences well.
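The dictionary-based idea can be illustrated by building a vector for a rare word from the vectors of the words in its definition. The Python sketch below swaps in plain averaging of known gloss-word vectors as a stand-in for the paper's purpose-built sentence representation model, which the abstract does not describe in detail. It also assumes whitespace-tokenized glosses (Chinese text would need a word segmenter first); estimate_from_gloss and the toy vectors are hypothetical.

```python
from typing import Dict, List, Optional

Vector = List[float]


def estimate_from_gloss(gloss: str, embeddings: Dict[str, Vector]) -> Optional[Vector]:
    """Average the vectors of the gloss words that already have embeddings.

    Returns None when no gloss word is covered, since nothing can be estimated.
    """
    tokens = [t for t in gloss.lower().split() if t in embeddings]
    if not tokens:
        return None
    dim = len(embeddings[tokens[0]])
    total = [0.0] * dim
    for t in tokens:
        for i, value in enumerate(embeddings[t]):
            total[i] += value
    return [x / len(tokens) for x in total]


# Hypothetical 2-d vectors for frequent words.
known = {"small": [1.0, 0.0], "dog": [0.0, 1.0]}
print(estimate_from_gloss("A small dog", known))  # [0.5, 0.5]
```

Averaging ignores word order and weights every gloss word equally, which is exactly the kind of limitation a dedicated sentence encoder is meant to overcome; it is useful mainly as a baseline against which such a model can be compared.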


Effectiveness of Word Embeddings on Classifiers: A Case Study with Tweets

2019 IEEE 13th International Conference on Semantic Computing (ICSC)

10.1109/icosc.2019.8665538

2019

Cited by ~2

Author(s): Sukanya Manna, Haruto Nakai

Keyword(s): Word Embeddings
