Morphological Segmentation to Improve Crosslingual Word Embeddings for Low Resource Languages

Author(s):  
Santwana Chimalamarri ◽  
Dinkar Sitaram ◽  
Ashritha Jain
2020 ◽  
pp. 1-21
Author(s):  
Ahmet Üstün ◽  
Burcu Can

Abstract We investigate the use of semantic information for morphological segmentation, since words that are derived from one another remain semantically related. We use maximum likelihood estimation (MLE) and maximum a posteriori (MAP) estimation models that incorporate semantic information obtained from dense word vector representations. Our approach does not require any annotated data, which makes it fully unsupervised, and it requires only a small amount of raw data together with pretrained word embeddings for training. The results show that using dense vector representations helps morphological segmentation, especially for low-resource languages. We present results for Turkish, English, and German. Our semantic MLE model outperforms other unsupervised models for Turkish. Our proposed models could also be used for any other low-resource language with concatenative morphology.
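The core intuition, that a valid stem should stay semantically close to the full word, can be illustrated with a toy scorer. This is a minimal sketch under stated assumptions, not the authors' MLE or MAP model: it assumes a single stem+suffix split, hypothetical hand-written 3-dimensional "embeddings", hypothetical suffix counts turned into relative-frequency (MLE) probabilities, and a hypothetical similarity threshold of 0.5. The score simply multiplies the suffix probability by the cosine similarity between word and stem.

```python
import math

# Hypothetical toy embeddings; real pretrained vectors have hundreds of dimensions.
EMBEDDINGS = {
    "walking": [0.9, 0.1, 0.2],
    "walked": [0.88, 0.12, 0.22],
    "walk": [0.85, 0.15, 0.25],
    "news": [0.0, 0.2, 0.95],
    "new": [0.9, 0.3, 0.0],
}

# Hypothetical suffix counts from a small raw corpus.
SUFFIX_COUNTS = {"ing": 3, "ed": 2, "s": 5}


def suffix_mle(counts):
    """Maximum likelihood estimate of P(suffix): relative frequency of each suffix."""
    total = sum(counts.values())
    return {s: c / total for s, c in counts.items()}


SUFFIX_PROBS = suffix_mle(SUFFIX_COUNTS)


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def best_split(word, embeddings, suffix_probs, threshold=0.5):
    """Pick the stem+suffix split maximizing P(suffix) * cos(word, stem).

    Splits whose stem is semantically unrelated to the word (cosine below
    `threshold`) are rejected; if nothing qualifies, the word is left unsegmented.
    """
    best, best_score = (word, ""), 0.0
    if word not in embeddings:
        return best
    for i in range(1, len(word)):
        stem, suffix = word[:i], word[i:]
        if stem not in embeddings or suffix not in suffix_probs:
            continue
        sim = cosine(embeddings[word], embeddings[stem])
        if sim < threshold:
            continue  # orthographic split without semantic support
        score = suffix_probs[suffix] * sim
        if score > best_score:
            best, best_score = (stem, suffix), score
    return best


if __name__ == "__main__":
    for w in ["walking", "walked", "news"]:
        print(w, best_split(w, EMBEDDINGS, SUFFIX_PROBS))
```

In this toy setup "walking" splits as walk+ing, while "news" is left whole: "new"+"s" is a valid string split, but the two toy vectors point in very different directions, so the semantic check rejects it.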


2021 ◽  
Author(s):  
Tobias Eder ◽  
Viktor Hangya ◽  
Alexander Fraser

2019 ◽  
Author(s):  
Ramy Eskander ◽  
Judith Klavans ◽  
Smaranda Muresan

2021 ◽  
Author(s):  
Takashi Wada ◽  
Tomoharu Iwata ◽  
Yuji Matsumoto ◽  
Timothy Baldwin ◽  
Jey Han Lau

2017 ◽  
Author(s):  
Oliver Adams ◽  
Adam Makarucha ◽  
Graham Neubig ◽  
Steven Bird ◽  
Trevor Cohn
