Morphological Segmentation to Improve Crosslingual Word Embeddings for Low Resource Languages

ACM Transactions on Asian and Low-Resource Language Information Processing ◽

10.1145/3390298 ◽

2020 ◽

Vol 19 (5) ◽

pp. 1-15

Author(s):

Santwana Chimalamarri ◽

Dinkar Sitaram ◽

Ashritha Jain

Keyword(s):

Word Embeddings ◽

Low Resource ◽

Morphological Segmentation

Incorporating word embeddings in unsupervised morphological segmentation

Natural Language Engineering ◽

10.1017/s1351324920000406 ◽

2020 ◽

pp. 1-21

Author(s):

Ahmet Üstün ◽

Burcu Can

Keyword(s):

Semantic Information ◽

Maximum A Posteriori ◽

Word Embeddings ◽

A Posteriori ◽

Low Resource ◽

A Posteriori Estimate ◽

Morphological Segmentation ◽

Vector Representations ◽

Turkish Language ◽

Maximum A Posteriori Estimate

Abstract We investigate the use of semantic information for morphological segmentation, since words that are derived from each other remain semantically related. We use mathematical models such as maximum likelihood estimation (MLE) and maximum a posteriori (MAP) estimation, incorporating semantic information obtained from dense word vector representations. Our approach does not require any annotated data, which makes it fully unsupervised; it requires only a small amount of raw data together with pretrained word embeddings for training. The results show that using dense vector representations helps morphological segmentation, especially for low-resource languages. We present results for Turkish, English, and German. Our semantic MLE model outperforms other unsupervised models for Turkish. Our proposed models could also be used for any other low-resource language with concatenative morphology.
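
The intuition stated in the abstract, that a word and the stem it derives from stay close in embedding space, can be illustrated with a minimal sketch. This is not the authors' MLE or MAP model: it is a greedy threshold heuristic, and the names `segment` and `toy_vectors`, the threshold of 0.5, and the two-dimensional vectors are all hypothetical values chosen for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def segment(word, vectors, threshold=0.5, min_stem=2):
    """Greedily strip suffixes while the remaining stem stays semantically close.

    Assumption: a split is accepted when the candidate stem has an embedding
    and its cosine similarity to the full word is at least `threshold`.
    """
    if word not in vectors:
        return [word]
    # Try the longest candidate stem first, then progressively shorter ones.
    for cut in range(len(word) - 1, min_stem - 1, -1):
        stem = word[:cut]
        if stem in vectors and cosine(vectors[word], vectors[stem]) >= threshold:
            return segment(stem, vectors, threshold, min_stem) + [word[cut:]]
    return [word]

# Made-up 2-d vectors; real pretrained embeddings would be used in practice.
toy_vectors = {
    "ev": [1.0, 0.1],
    "evler": [0.9, 0.2],
    "evlerde": [0.8, 0.3],
    "evle": [-0.2, 1.0],  # deliberately dissimilar distractor prefix
}

print(segment("evlerde", toy_vectors))  # ['ev', 'ler', 'de']
```

With these toy vectors the distractor prefix "evle" is rejected because its similarity to "evler" falls below the threshold, so the split falls back to the shorter stem "ev".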

Anchor-based Bilingual Word Embeddings for Low-Resource Languages

10.18653/v1/2021.acl-short.30 ◽

2021

Author(s):

Tobias Eder ◽

Viktor Hangya ◽

Alexander Fraser

Keyword(s):

Word Embeddings ◽

Unsupervised Morphological Segmentation for Low-Resource Polysynthetic Languages

10.18653/v1/w19-4222 ◽

2019

Author(s):

Ramy Eskander ◽

Judith Klavans ◽

Smaranda Muresan

Keyword(s):

Low Resource ◽

Morphological Segmentation

Morphological Word Embeddings for Arabic Neural Machine Translation in Low-Resource Settings

10.18653/v1/w18-1201 ◽

2018

Author(s):

Pamela Shapiro ◽

Kevin Duh

Keyword(s):

Machine Translation ◽

Word Embeddings ◽

Neural Machine Translation ◽

Low Resource Settings

Word Embeddings in Low Resource Gujarati Language

2019 International Conference on Document Analysis and Recognition Workshops (ICDARW) ◽

10.1109/icdarw.2019.40090 ◽

2019

Author(s):

Ishani Joshi ◽

Purvi Koringa ◽

Suman Mitra

Keyword(s):

Word Embeddings ◽

Low Resource ◽

Gujarati Language

Co-occurrence Weight Selection in Generation of Word Embeddings for Low Resource Languages

ACM Transactions on Asian and Low-Resource Language Information Processing ◽

10.1145/3282443 ◽

2019 ◽

Vol 18 (3) ◽

pp. 1-18

Author(s):

Veysel Yücesoy ◽

Aykut Koç

Keyword(s):

Word Embeddings ◽

Learning Contextualised Cross-lingual Word Embeddings and Alignments for Extremely Low-Resource Languages Using Parallel Corpora

10.18653/v1/2021.mrl-1.2 ◽

2021

Author(s):

Takashi Wada ◽

Tomoharu Iwata ◽

Yuji Matsumoto ◽

Timothy Baldwin ◽

Jey Han Lau

Keyword(s):

Word Embeddings ◽

Parallel Corpora ◽

Low Resource

Cross-Lingual Word Embeddings for Low-Resource Language Modeling

10.18653/v1/e17-1088 ◽

2017

Author(s):

Oliver Adams ◽

Adam Makarucha ◽

Graham Neubig ◽

Steven Bird ◽

Trevor Cohn

Keyword(s):

Language Modeling ◽

Word Embeddings ◽

Low Resource

Named Entity Recognition with Word Embeddings and Wikipedia Categories for a Low-Resource Language

ACM Transactions on Asian and Low-Resource Language Information Processing ◽

10.1145/3015467 ◽

2017 ◽

Vol 16 (3) ◽

pp. 1-19

Author(s):

Arjun Das ◽

Debasis Ganguly ◽

Utpal Garain

Keyword(s):

Named Entity Recognition ◽

Entity Recognition ◽

Word Embeddings ◽

Low Resource

Linguistically-Informed Training of Acoustic Word Embeddings for Low-Resource Languages

10.21437/interspeech.2019-3119 ◽

2019

Author(s):

Zixiaofan Yang ◽

Julia Hirschberg

Keyword(s):

Word Embeddings ◽
