An Analysis of Euclidean vs. Graph-Based Framing for Bilingual Lexicon Induction from Word Embedding Spaces

Author(s):  
Kelly Marchisio ◽  
Youngser Park ◽  
Ali Saad-Eldin ◽  
Anton Alyakin ◽  
Kevin Duh ◽  
...  
2018 ◽  
Author(s):  
Ruochen Xu ◽  
Yiming Yang ◽  
Naoki Otani ◽  
Yuexin Wu

2020 ◽  
Vol 8 ◽  
pp. 361-376
Author(s):  
Ion Madrazo Azpiazu ◽  
Maria Soledad Pera

The alignment of word embedding spaces in different languages into a common crosslingual space has recently been in vogue. Strategies that do so compute pairwise alignments and then map multiple languages to a single pivot language (most often English). These strategies, however, are biased by the choice of pivot language, since language proximity and the linguistic characteristics of the target language can strongly shape the resulting crosslingual space, to the detriment of typologically distant languages. We present a strategy that eliminates the need for a pivot language by learning the mappings across languages hierarchically. Experiments demonstrate that our strategy significantly improves vocabulary induction scores on all existing benchmarks, as well as on a new non-English-centered benchmark we built, which we make publicly available.
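For context, a minimal Python sketch (not the paper's hierarchical method) of the pivot-based setup this abstract critiques: each language is aligned pairwise to a single pivot space, typically English, with orthogonal Procrustes over a seed dictionary. All matrix names and shapes here are illustrative assumptions.

```python
# Sketch of pivot-based pairwise alignment (the baseline being critiqued),
# not the hierarchical method proposed in the abstract.
import numpy as np

def procrustes_map(X_lang: np.ndarray, Y_pivot: np.ndarray) -> np.ndarray:
    """Orthogonal W minimizing ||X_lang @ W - Y_pivot||_F (Procrustes)."""
    U, _, Vt = np.linalg.svd(X_lang.T @ Y_pivot)
    return U @ Vt

# Every non-pivot language gets its own map into the pivot space, so the
# shared space inherits the pivot language's geometry.
rng = np.random.default_rng(0)
X_de = rng.normal(size=(5000, 300))   # hypothetical German seed-pair vectors
Y_en = rng.normal(size=(5000, 300))   # corresponding English vectors
W_de = procrustes_map(X_de, Y_en)
aligned_de = X_de @ W_de              # German vectors mapped into English space
```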


Author(s):  
Yuting Song ◽  
Biligsaikhan Batjargal ◽  
Akira Maeda

Cross-lingual word embeddings have been gaining attention because they capture the semantic meaning of words across languages and can be applied to cross-lingual tasks. Most methods learn a single mapping (e.g., a linear mapping) to transform a word embedding space from one language into another. To improve bilingual word embeddings, we propose an advanced method that adds a language-specific mapping. We focus on learning a Japanese-English bilingual word embedding mapping that accounts for the specificity of the Japanese language. We evaluated our method against single-mapping models on bilingual lexicon induction between Japanese and English, and found it more effective, with significant improvements on words of Japanese origin.
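For reference, a minimal sketch of the single-mapping baseline the abstract compares against: a linear Japanese-to-English map fit by least squares on a seed dictionary, then used for lexicon induction via cosine nearest neighbour. Variable names and shapes are assumptions, not details from the paper.

```python
# Single linear mapping baseline for bilingual lexicon induction (illustrative).
import numpy as np

def learn_linear_map(X_ja: np.ndarray, Y_en: np.ndarray) -> np.ndarray:
    """Least-squares W such that X_ja @ W approximates Y_en (seed pairs as rows)."""
    W, *_ = np.linalg.lstsq(X_ja, Y_en, rcond=None)
    return W

def induce_translation(x_ja: np.ndarray, W: np.ndarray,
                       en_vocab_vecs: np.ndarray) -> int:
    """Index of the English vocabulary vector closest (cosine) to the mapped query."""
    q = x_ja @ W
    sims = en_vocab_vecs @ q / (
        np.linalg.norm(en_vocab_vecs, axis=1) * np.linalg.norm(q) + 1e-9)
    return int(np.argmax(sims))
```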


2021 ◽  
Vol 15 (02) ◽  
pp. 263-290
Author(s):  
Renjith P. Ravindran ◽  
Kavi Narayana Murthy

Word embeddings have recently become a vital part of many Natural Language Processing (NLP) systems. Word embedding methods represent the words of a language as vectors in an n-dimensional real space, and these vectors have been shown to encode a significant amount of syntactic and semantic information. When used in NLP systems, these representations have improved performance across a wide range of tasks. However, it is not clear how syntactic properties interact with the more widely studied semantic properties of words, or which factors in the modeling formulation encourage embedding spaces to pick up more of the syntactic rather than the semantic behavior of words. We investigate several aspects of word embedding spaces and modeling assumptions that maximize syntactic coherence — the degree to which words with similar syntactic properties form distinct neighborhoods in the embedding space. We do so in order to understand which of the existing models maximize syntactic coherence, making them a more reliable source of syntactic category (POS) information. Our analysis shows that the syntactic coherence of S-CODE is superior to that of more popular and more recent embedding techniques such as Word2vec, fastText, GloVe and LexVec when measured under compatible parameter settings. Our investigation also gives deeper insight into the geometry of the embedding space with respect to syntactic coherence, and how it is influenced by context size, word frequency, and the dimensionality of the embedding space.
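One plausible way to operationalize "syntactic coherence" as described here (the paper's exact metric may differ): for each word, take the fraction of its k nearest embedding-space neighbours that share its POS tag, and average over the vocabulary. The function below is a hypothetical illustration under that assumption.

```python
# Illustrative syntactic-coherence score: mean POS agreement among
# each word's k nearest neighbours by cosine similarity.
import numpy as np

def syntactic_coherence(emb: np.ndarray, pos_tags: list, k: int = 10) -> float:
    normed = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sims = normed @ normed.T
    np.fill_diagonal(sims, -np.inf)           # exclude each word itself
    scores = []
    for i, tag in enumerate(pos_tags):
        nbrs = np.argpartition(-sims[i], k)[:k]   # indices of k most similar words
        scores.append(np.mean([pos_tags[j] == tag for j in nbrs]))
    return float(np.mean(scores))
```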


2019 ◽  
Author(s):  
Barun Patra ◽  
Joel Ruben Antony Moniz ◽  
Sarthak Garg ◽  
Matthew R. Gormley ◽  
Graham Neubig

2021 ◽  
Author(s):  
Niklas Friedrich ◽  
Anne Lauscher ◽  
Simone Paolo Ponzetto ◽  
Goran Glavaš
