Lexical and Morpho-syntactic Features in Word Embeddings - A Case Study of Nouns in Swedish

Author(s):  
Ali Basirat ◽  
Marc Tang
Author(s):  
Pablo Badilla ◽  
Felipe Bravo-Marquez ◽  
Jorge Pérez

Word embeddings are known to exhibit stereotypical biases towards gender, race, and religion, among other criteria. Several fairness metrics have been proposed in order to automatically quantify these biases. Although all metrics have a similar objective, the relationship between them is by no means clear. Two issues that prevent a clean comparison are that they operate with different inputs, and that their outputs are incompatible with each other. In this paper we propose WEFE, the word embeddings fairness evaluation framework, to encapsulate, evaluate and compare fairness metrics. Our framework needs a list of pre-trained embeddings and a set of fairness criteria, and it is based on checking correlations between fairness rankings induced by these criteria. We conduct a case study showing that rankings produced by existing fairness methods tend to correlate when measuring gender bias. This correlation is considerably weaker for other biases like race or religion. We also compare the fairness rankings with an embedding benchmark, showing that there is no clear correlation between fairness and good performance in downstream tasks.
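The idea of comparing metrics through the rankings they induce can be illustrated with a small sketch. The embedding names, metric names, and bias scores below are hypothetical, and Spearman's rank correlation is used only as one common choice of coefficient; the abstract does not say which one the framework relies on. Each metric ranks the embedding models from least to most biased, and the rankings are then compared pairwise.

```python
from itertools import combinations


def to_ranks(scores):
    """Rank models from least biased (rank 1) to most biased; assumes no ties."""
    ordered = sorted(scores, key=scores.get)
    return {name: position + 1 for position, name in enumerate(ordered)}


def spearman(rank_a, rank_b):
    """Spearman's rho for two tie-free rankings of the same items."""
    n = len(rank_a)
    squared_diffs = sum((rank_a[m] - rank_b[m]) ** 2 for m in rank_a)
    return 1 - 6 * squared_diffs / (n * (n * n - 1))


# Hypothetical bias scores: one dict per fairness metric, one entry per embedding model.
metric_scores = {
    "metric_A": {"emb1": 0.10, "emb2": 0.40, "emb3": 0.25, "emb4": 0.80},
    "metric_B": {"emb1": 0.05, "emb2": 0.30, "emb3": 0.35, "emb4": 0.90},
    "metric_C": {"emb1": 0.70, "emb2": 0.20, "emb3": 0.60, "emb4": 0.10},
}

# Turn incompatible raw scores into comparable rankings, then correlate each pair.
rankings = {name: to_ranks(scores) for name, scores in metric_scores.items()}
correlations = {
    (a, b): spearman(rankings[a], rankings[b])
    for a, b in combinations(rankings, 2)
}

for pair, rho in correlations.items():
    print(pair, round(rho, 2))
```

Working on ranks rather than raw scores sidesteps the problem that the metrics' outputs live on different scales.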


2019 ◽  
Vol 9 (18) ◽  
pp. 3648
Author(s):  
Casper S. Shikali ◽  
Zhou Sijie ◽  
Liu Qihe ◽  
Refuoe Mokhosi

Deep learning has been used extensively in natural language processing, with sub-word representation vectors playing a critical role. However, this cannot be said of Swahili, which is a low-resource and widely spoken language in East and Central Africa. This study proposed novel word embeddings from syllable embeddings (WEFSE) for Swahili to address the concern of word representation for agglutinative and syllabic-based languages. Inspired by the learning methodology of Swahili in beginner classes, we encoded respective syllables instead of characters, character n-grams or morphemes of words and generated quality word embeddings using a convolutional neural network. The quality of WEFSE was demonstrated by the state-of-the-art results in the syllable-aware language model on both the small dataset (31.229 perplexity value) and the medium dataset (45.859 perplexity value), outperforming character-aware language models. We further evaluated the word embeddings using a word analogy task. To the best of our knowledge, syllabic alphabets have not been used to compose the word representation vectors. Therefore, the main contributions of the study are a syllabic alphabet, WEFSE, a syllabic-aware language model and a word analogy dataset for Swahili.
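For readers unfamiliar with the perplexity figures quoted above, the following minimal sketch shows how perplexity is conventionally computed from the probabilities a language model assigns to the tokens of a held-out text. The probability values are made up for illustration and are not taken from the study.

```python
import math


def perplexity(token_probs):
    """Exponential of the average negative log-probability per token."""
    avg_neg_log_prob = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log_prob)


# A model that spreads probability uniformly over 4 choices at every step.
uniform_model = [0.25] * 8

# A hypothetical model that is more confident about the correct tokens.
confident_model = [0.5, 0.6, 0.4, 0.7, 0.5, 0.6, 0.5, 0.4]

print(perplexity(uniform_model))
print(perplexity(confident_model))
```

Lower perplexity means the model is, on average, less surprised by the text, which is why the syllable-aware model's lower values are reported as an improvement.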


2019 ◽  
Vol 35 (3) ◽  
Author(s):  
Nguyen Thu Hanh ◽  
Nguyen Tien Long

The current paper investigates the semantic and syntactic features of idioms, including idioms containing human-body parts, in the two English novels “The Godfather” and “To Kill A Mockingbird” and their Vietnamese translations. Using a comparison-and-contrast method, the paper attempts to point out the equivalent and non-equivalent references of human-body-part idioms found in the two English novels and their Vietnamese translations. The research results will be useful for improving English teaching and learning, especially of English idioms, as well as English-Vietnamese translation of idioms.


1998 ◽  
Vol 7 (1) ◽  
pp. 21-38
Author(s):  
Ian Pople

In this article I wish to contribute to the analysis of prosody in poetry by looking at free or 'unmetred' verse. In particular I focus on the way in which the tools for analysing intonation may be used for analysing the performance of poetry in order to examine the way lineation is formed in unmetred verse. I look at the way tone-unit boundaries are often co-extensive with line endings in unmetred verse. The article follows work in this kind of analysis by Crystal (1975) and concentrates on the poem Briggflatts by Basil Bunting. Syntactic features such as ellipsis and pre- and post-modification affect the placement of tone-unit boundaries. Line endings may also be affected by the poet's use of marked emphases, and by the influence of other prosodies which a sophisticated poet may bring to bear on his material.


2021 ◽  
Author(s):  
Srijan Bansal ◽  
Vishal Garimella ◽  
Ayush Suhane ◽  
Animesh Mukherjee

2021 ◽  
Author(s):  
Arthur T. Estrella ◽  
João B. O. Souza Filho

Neural machine translation (NMT) nowadays requires an increasing amount of data and computational power, so succeeding in this task with limited data and a single GPU can be challenging. Strategies such as the use of pre-trained word embeddings, subword embeddings, and data augmentation can potentially address some issues faced in low-resource experimental settings, but their impact on the quality of translations is unclear. This work evaluates some of these strategies on two low-resource experiments and goes beyond just reporting BLEU: errors are categorized on the Portuguese-English pair with the help of a translator, considering semantic and syntactic aspects. The BPE subword approach proved to be the most effective solution, allowing a BLEU increase of 59% p.p. compared to the standard Transformer.
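As background on the BPE subword approach mentioned above, here is a minimal sketch of how byte-pair-encoding merges are typically learned: the most frequent adjacent symbol pair is merged repeatedly. The toy word frequencies and the function name `learn_bpe` are illustrative assumptions; this is not the tokenizer configuration used in the study.

```python
from collections import Counter


def learn_bpe(word_freqs, num_merges):
    """Learn BPE merge rules from a {word: frequency} dictionary."""
    # Each word starts as a tuple of characters plus an end-of-word marker.
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # ties go to the pair seen first
        merges.append(best)
        # Rewrite every word, fusing each occurrence of the best pair.
        new_vocab = {}
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            key = tuple(merged)
            new_vocab[key] = new_vocab.get(key, 0) + freq
        vocab = new_vocab
    return merges, vocab


# Hypothetical toy corpus statistics.
toy_freqs = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
merges, vocab = learn_bpe(toy_freqs, num_merges=3)
print(merges)
```

Frequent endings such as "est" become single symbols after a few merges, which is how BPE lets a model share statistics across rare word forms in low-resource settings.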


2011 ◽  
Vol 32 (1) ◽  
pp. 1-23 ◽  
Author(s):  
Jamie Shinhee Lee

This study examines crossing (Bucholtz 1999; Cutler 1999; Rampton 1995) in Korean hip hop Blinglish as a case study of globalization of African American Vernacular English (AAVE) in popular culture. Blinglish in Korean hip hop can be understood as a prime example of “English from below” (Preisler 1999) to informally express subcultural identity and style. The findings of the study suggest that AAVE features appear at different linguistic levels including lexis, phonology, and morpho-syntax in Korean hip hop Blinglish but do not demonstrate the same degree of AAVE penetration, with a frequency-related hierarchy emerging among these linguistic components. The area of Korean hip hop Blinglish with the heaviest crossing influence from AAVE is found to be lexis followed by phonology. The presence of AAVE syntactic features is somewhat restricted in type and occurrence, indicating that the verbal markers in AAVE are considerably varied and intricate, and syntactic elements are not as easily crossed by non-AAVE speakers as lexical items.


2021 ◽  
Vol 11 (22) ◽  
pp. 11018
Author(s):  
Xianwen Liao ◽  
Yongzhong Huang ◽  
Changfu Wei ◽  
Chenhao Zhang ◽  
Yongqing Deng ◽  
...  

Obtaining high-quality embeddings of out-of-vocabulary words (OOVs) and low-frequency words is a challenge in natural language processing (NLP). To estimate these embeddings efficiently, we propose a new method that uses a dictionary. More specifically, the explanatory note of a dictionary entry accurately describes the semantics of the corresponding word. We therefore adopt a sentence representation model to extract the semantics of the explanatory note and regard that representation as the embedding of the corresponding word. We design a new sentence representation model that encodes the explanatory notes of entries more efficiently. Based on the assumption that higher-quality word embeddings lead to better downstream performance, we design an extrinsic experiment to evaluate the quality of the low-frequency words' embeddings. The experimental results show that the embeddings of low-frequency words estimated by our proposed method have higher quality. In addition, both intrinsic and extrinsic experiments show that our proposed sentence representation model can represent the semantics of sentences well.
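The dictionary-based idea can be sketched in a few lines. Note the substitution: the paper designs its own sentence representation model, whereas the sketch below uses plain averaging of the known word vectors in the definition as a stand-in. The function name, the two-dimensional vectors, and the example definition are all hypothetical.

```python
def embed_from_definition(definition, embeddings):
    """Estimate a word vector by averaging the vectors of its definition's known words.

    Mean pooling is a simple stand-in for a learned sentence encoder.
    Returns None when no definition word has a known embedding.
    """
    tokens = [t for t in definition.lower().split() if t in embeddings]
    if not tokens:
        return None
    dim = len(embeddings[tokens[0]])
    return [sum(embeddings[t][k] for t in tokens) / len(tokens) for k in range(dim)]


# Hypothetical pre-trained vectors for in-vocabulary words.
embeddings = {
    "small": [1.0, 0.0],
    "domestic": [0.0, 1.0],
    "cat": [1.0, 1.0],
}

# Hypothetical dictionary entry for an out-of-vocabulary word.
oov_vector = embed_from_definition("a small domestic cat", embeddings)
print(oov_vector)
```

A learned encoder can weight informative definition words more heavily than this uniform average does, which is the gap the proposed sentence representation model is meant to address.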

