Explore different methods to build Multi-Sense Embeddings

2020 ◽  
Author(s):  
Masashi Sugiyama

In our previous work, we conducted an extensive survey of methods for building multi-sense embeddings. In this work, we extend that survey and try to improve on the current methods. Recently, word embeddings have been used successfully in many natural language processing problems, and how to train a robust and accurate word embedding system efficiently is a popular research area. Since many, if not all, words have more than one sense, it is necessary to learn vectors for all senses of a word separately. Therefore, in this project, we explore two multi-sense word embedding models: the Multi-Sense Skip-gram (MSSG) model and the Non-Parametric Multi-Sense Skip-gram (NP-MSSG) model. Furthermore, we propose an extension of the Multi-Sense Skip-gram model called the Incremental Multi-Sense Skip-gram (IMSSG) model, which can learn the vectors of all senses of a word incrementally. We evaluate all the systems on a word similarity task and show that IMSSG outperforms the other models.
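The abstract does not spell out MSSG's mechanics, but the core of multi-sense skip-gram models is well documented: keep several sense vectors per word and pick the sense whose context cluster best matches the current context (Neelakantan et al., 2014). Below is a minimal sketch of that sense-selection step; class and parameter names are illustrative, not the authors' code.

```python
# A minimal sketch of MSSG-style sense selection, assuming K fixed senses per
# word; hyper-parameters and names are illustrative, not the paper's code.
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)

class MSSGWord:
    def __init__(self, dim, k=3, rng=None):
        rng = rng or np.random.default_rng(0)
        self.senses = rng.normal(scale=0.1, size=(k, dim))  # per-sense embeddings
        self.centers = np.zeros((k, dim))                   # context-cluster centers
        self.counts = np.zeros(k)                           # contexts seen per sense

    def select_sense(self, context_vec):
        # Pick the sense whose context-cluster center is closest to this context.
        sims = [cosine(c, context_vec) if n > 0 else 0.0
                for c, n in zip(self.centers, self.counts)]
        return int(np.argmax(sims))

    def update_center(self, sense, context_vec):
        # Online mean update of the chosen sense's context cluster.
        self.counts[sense] += 1
        self.centers[sense] += (context_vec - self.centers[sense]) / self.counts[sense]

# Usage: the context vector is the average of the global vectors of surrounding words.
dim = 50
rng = np.random.default_rng(1)
global_vecs = {w: rng.normal(size=dim) for w in ["river", "money", "deposit"]}
bank = MSSGWord(dim)
ctx = np.mean([global_vecs["river"], global_vecs["deposit"]], axis=0)
s = bank.select_sense(ctx)
bank.update_center(s, ctx)  # the sense embedding itself is then trained as in skip-gram
```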


2019 ◽  
Author(s):  
William Jin

Recently, word embeddings have been used successfully in many natural language processing problems, and how to train a robust and accurate word embedding system efficiently is a popular research area. Since many, if not all, words have more than one sense, it is necessary to learn vectors for all senses of a word separately. Therefore, in this project, we explore two multi-sense word embedding models: the Multi-Sense Skip-gram (MSSG) model and the Non-Parametric Multi-Sense Skip-gram (NP-MSSG) model. Furthermore, we propose an extension of the Multi-Sense Skip-gram model called the Incremental Multi-Sense Skip-gram (IMSSG) model, which can learn the vectors of all senses of a word incrementally. We evaluate all the systems on a word similarity task and show that IMSSG outperforms the other models.
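The non-parametric variant (NP-MSSG) differs in that the number of senses per word is not fixed in advance: a new sense is created whenever the current context is dissimilar from all existing sense clusters. A minimal sketch of that decision rule follows, with an illustrative threshold; the incremental behaviour of IMSSG is not specified in the abstract and is not modelled here.

```python
# A minimal sketch of NP-MSSG's non-parametric sense creation: spawn a new
# sense when the context is too far from every existing cluster. The threshold
# lambda_ is illustrative, not a value from the paper.
import numpy as np

def nonparametric_select(centers, counts, context_vec, lambda_=0.5):
    """Return the index of the sense to use, creating a new one if needed."""
    if not centers:
        centers.append(context_vec.copy()); counts.append(1)
        return 0
    sims = [c @ context_vec / (np.linalg.norm(c) * np.linalg.norm(context_vec) + 1e-8)
            for c in centers]
    best = int(np.argmax(sims))
    if sims[best] < lambda_:                     # context unlike all known senses
        centers.append(context_vec.copy()); counts.append(1)
        return len(centers) - 1
    counts[best] += 1                            # online mean update of the cluster
    centers[best] += (context_vec - centers[best]) / counts[best]
    return best

# Usage on random contexts: the sense inventory grows as needed.
centers, counts = [], []
rng = np.random.default_rng(0)
for _ in range(5):
    nonparametric_select(centers, counts, rng.normal(size=50))
print(len(centers))   # number of senses discovered so far
```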


Author(s):  
Tianyuan Zhou ◽  
João Sedoc ◽  
Jordan Rodu

Many tasks in natural language processing require the alignment of word embeddings. Embedding alignment relies on the geometric properties of the manifold of word vectors. This paper focuses on supervised linear alignment and studies the relationship between the shape of the target embedding space and the quality of the alignment. We assess the performance of aligned word vectors on semantic similarity tasks and find that the isotropy of the target embedding is critical to the alignment. Furthermore, aligning to isotropic noise can deliver satisfactory results. We provide a theoretical framework and guarantees that aid in the understanding of the empirical results.
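Supervised linear alignment is commonly solved as an orthogonal Procrustes problem, a natural concrete instance of the setting studied here (the paper's exact formulation may differ). A minimal sketch, assuming paired rows from a seed dictionary:

```python
# Orthogonal Procrustes alignment: given paired source/target matrices X and Y
# (rows are word vectors for dictionary pairs), the orthogonal map W minimizing
# ||XW - Y||_F is U @ Vt, where U, S, Vt = svd(X.T @ Y).
import numpy as np

def procrustes_align(X, Y):
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt                      # orthogonal: W.T @ W = I

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 300))                    # source embeddings
W_true = np.linalg.qr(rng.normal(size=(300, 300)))[0]
Y = X @ W_true + 0.01 * rng.normal(size=X.shape)    # noisy rotated target
W = procrustes_align(X, Y)
print(np.linalg.norm(X @ W - Y))   # small residual when spaces are near-isometric
```

Because the map is constrained to be orthogonal, how well this fit can succeed depends on the geometry of the target space, which is one way to see why target isotropy matters in the abstract's finding.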


2019 ◽  
Vol 65 ◽  
pp. 569-631 ◽  
Author(s):  
Sebastian Ruder ◽  
Ivan Vulić ◽  
Anders Søgaard

Cross-lingual representations of words enable us to reason about word meaning in multilingual contexts and are a key facilitator of cross-lingual transfer when developing natural language processing models for low-resource languages. In this survey, we provide a comprehensive typology of cross-lingual word embedding models. We compare their data requirements and objective functions. The recurring theme of the survey is that many of the models presented in the literature optimize for the same objectives, and that seemingly different models are often equivalent, modulo optimization strategies, hyper-parameters, and such. We also discuss the different ways cross-lingual word embeddings are evaluated, as well as future challenges and research horizons.
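The survey's recurring equivalence claim can be summarized schematically (notation here is illustrative): each model optimizes a joint objective combining monolingual losses for the two languages with a cross-lingual regularizer, and seemingly different models mostly vary the regularizer and the optimization strategy.

```latex
% Schematic joint objective for cross-lingual word embedding models:
% monolingual losses plus a term coupling the two embedding spaces.
J = \mathcal{L}_1\!\left(\mathbf{X}^{\ell_1}\right)
  + \mathcal{L}_2\!\left(\mathbf{X}^{\ell_2}\right)
  + \Omega\!\left(\mathbf{X}^{\ell_1}, \mathbf{X}^{\ell_2}\right)
```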


2021 ◽  
pp. 233-252
Author(s):  
Upendar Rao Rayala ◽  
Karthick Seshadri

Sentiment analysis is perceived to be a multi-disciplinary research domain spanning machine learning, artificial intelligence, deep learning, image processing, and social networks. Sentiment analysis can be used to determine the public's opinions about products and to gauge customers' interests and feedback through social networks. To perform any natural language processing task, the input text/comments must be represented in numerical form. Word embeddings represent the given text/sentences/words as vectors that can be employed in subsequent natural language processing tasks. In this chapter, the authors discuss different techniques that can improve the performance of sentiment analysis using concepts and techniques like traditional word embeddings, sentiment embeddings, emoticons, lexicons, and neural networks. The chapter also traces the evolution of word embedding techniques with a chronological discussion of recent research advancements.
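A hedged sketch of the kind of pipeline the chapter surveys: averaged word embeddings concatenated with simple lexicon/emoticon counts, fed to a linear classifier. The lexicon, vectors, and data below are toy stand-ins, not material from the chapter.

```python
# Toy sentiment pipeline: embedding centroid + lexicon/emoticon count features
# feeding a logistic-regression classifier. All data here is illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

POS_LEX = {"good", "great", ":)"}          # illustrative sentiment lexicon
NEG_LEX = {"bad", "awful", ":("}

def featurize(tokens, vecs, dim=50):
    emb = [vecs[t] for t in tokens if t in vecs]
    centroid = np.mean(emb, axis=0) if emb else np.zeros(dim)
    lex = [sum(t in POS_LEX for t in tokens), sum(t in NEG_LEX for t in tokens)]
    return np.concatenate([centroid, lex])

rng = np.random.default_rng(0)
vecs = {w: rng.normal(size=50) for w in ["good", "bad", "movie", "great", "awful"]}
docs = [(["good", "movie", ":)"], 1), (["awful", "movie", ":("], 0),
        (["great", "movie"], 1), (["bad", "movie"], 0)]
X = np.array([featurize(t, vecs) for t, _ in docs])
y = np.array([label for _, label in docs])
clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict(X))
```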


Digital ◽  
2021 ◽  
Vol 1 (3) ◽  
pp. 145-161
Author(s):  
Kowshik Bhowmik ◽  
Anca Ralescu

This article presents a systematic literature review on quantifying the proximity between independently trained monolingual word embedding spaces. The search was carried out in the broader context of inducing bilingual lexicons from cross-lingual word embeddings, especially for low-resource languages, and the returned articles were then classified. Cross-lingual word embeddings have drawn the attention of researchers in the field of natural language processing (NLP). Although existing methods have yielded satisfactory results for resource-rich languages and languages related to them, some researchers have pointed out that the same is not true for low-resource and distant languages. In this paper, we review the methods proposed to provide better representations for low-resource and distant languages in the cross-lingual word embedding space.
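One family of measures used in this literature to quantify how close to isomorphic two independently trained spaces are compares the spectra of their nearest-neighbour graphs. The sketch below illustrates that idea under simplifying assumptions; it is not a specific method from the reviewed articles.

```python
# Illustrative isomorphism measure: compare Laplacian eigenvalue spectra of
# k-NN graphs built over each embedding space. Lower distance = more isomorphic.
import numpy as np

def laplacian_spectrum(X, k=10):
    # Cosine-similarity k-NN graph over the rows of X, then Laplacian eigenvalues.
    norms = np.linalg.norm(X, axis=1)
    sims = X @ X.T / (norms[:, None] * norms[None, :])
    A = np.zeros_like(sims)
    for i, row in enumerate(sims):
        for j in np.argsort(row)[-k - 1:]:   # k nearest neighbours (plus self)
            if j != i:
                A[i, j] = A[j, i] = 1.0
    L = np.diag(A.sum(1)) - A
    return np.sort(np.linalg.eigvalsh(L))

def spectral_distance(X, Y, k=10):
    ex, ey = laplacian_spectrum(X, k), laplacian_spectrum(Y, k)
    m = min(len(ex), len(ey))
    return float(np.sum((ex[:m] - ey[:m]) ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
Q = np.linalg.qr(rng.normal(size=(50, 50)))[0]
print(spectral_distance(X, X @ Q))   # ~0: a rotation preserves the geometry
```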


2021 ◽  
Vol 10 (2) ◽  
pp. 95
Author(s):  
Md. Kowsher ◽  
Md. Jashim Uddin ◽  
Anik Tahabilder ◽  
Nusrat Jahan Prottasha ◽  
Mahid Ahmed ◽  
...  

Progress in machine learning and statistical inference is facilitating the advancement of domains like computer vision, natural language processing (NLP), automation & robotics, and so on. Among the many influential advances in NLP, word embedding is one of the most widely used and revolutionary techniques. In this paper, we present BnVec, an open-source library for Bangla word embedding that aims to serve the Bangla NLP research community with several powerful word embedding techniques. BnVec is split into two parts: the first is a class tailored to Bangla that embeds words with access to the six most popular word embedding schemes (CountVectorizer, TF-IDF, Hash Vectorizer, Word2vec, fastText, and GloVe); the other provides pre-trained distributed word embeddings from Word2vec, fastText, and GloVe. The pre-trained models were built by collecting content from newspapers, social media, and Bangla wiki articles. The total number of tokens used to build the models exceeds 395,289,960. The paper additionally reports the performance of these models under various hyper-parameter settings and analyzes the results.
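The abstract does not show BnVec's own API, so the sketch below reproduces the same six schemes with scikit-learn and gensim on a toy Bangla corpus (sentences are illustrative). GloVe has no in-process trainer in these libraries and is typically consumed as pre-trained vectors, as BnVec's second part suggests.

```python
# Sketch of the six embedding schemes named in the abstract, via scikit-learn
# and gensim rather than BnVec itself. Whitespace tokenization (token_pattern
# r"\S+") keeps Bangla grapheme clusters intact.
from sklearn.feature_extraction.text import (CountVectorizer, TfidfVectorizer,
                                             HashingVectorizer)
from gensim.models import Word2Vec, FastText

corpus = ["আমি বাংলায় গান গাই", "আমি ভাত খাই"]     # toy sentences
tokens = [s.split() for s in corpus]

bow = CountVectorizer(token_pattern=r"\S+").fit_transform(corpus)       # counts
tfidf = TfidfVectorizer(token_pattern=r"\S+").fit_transform(corpus)     # tf-idf
hashed = HashingVectorizer(n_features=2**10,
                           token_pattern=r"\S+").fit_transform(corpus)  # hashing

w2v = Word2Vec(tokens, vector_size=100, window=2, min_count=1)   # distributed
ftt = FastText(tokens, vector_size=100, window=2, min_count=1)   # subword-aware
print(w2v.wv["আমি"].shape, ftt.wv["গান"].shape)
# GloVe: load pre-trained vectors (e.g., via gensim's KeyedVectors) instead.
```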


2020 ◽  
Vol 44 (2) ◽  
pp. 231-246
Author(s):  
Karlo Babić ◽  
Francesco Guerra ◽  
Sanda Martinčić-Ipšić ◽  
Ana Meštrović

Measuring the semantic similarity of texts has a vital role in various tasks in natural language processing. In this paper, we describe a set of experiments we carried out to evaluate and compare the performance of different approaches for measuring the semantic similarity of short texts. We compare four models based on word embeddings: two variants of Word2Vec (one trained on a specific dataset and a second that extends it with embeddings of word senses), FastText, and TF-IDF. Since these models provide word vectors, we experiment with various methods that calculate the semantic similarity of short texts from those word vectors. More precisely, for each of these models, we test five methods for aggregating word embeddings into text embeddings. We introduce three methods as variations of two commonly used similarity measures: one method extends centroid-based cosine similarity, and the other two are variations of the Okapi BM25 function. We evaluate all approaches on two publicly available datasets, SICK and Lee, in terms of Pearson and Spearman correlation. The results indicate that the extended methods perform better than the originals in most cases.
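The simplest aggregation baseline in this family, the (optionally weighted) centroid of word vectors scored with cosine similarity, can be sketched as follows. Vectors and weights are toy stand-ins; the paper's extended variants (centroid-corrected cosine, BM25-style weighting) share this skeleton.

```python
# Centroid-based short-text similarity: each text is the weighted mean of its
# word vectors, and texts are compared by cosine. All data is illustrative.
import numpy as np

def text_vector(tokens, vecs, idf=None):
    ws = [(idf or {}).get(t, 1.0) for t in tokens if t in vecs]
    vs = [vecs[t] for t in tokens if t in vecs]
    if not vs:
        return np.zeros(next(iter(vecs.values())).shape)
    return np.average(vs, axis=0, weights=ws)   # idf-weighted centroid

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

rng = np.random.default_rng(0)
vecs = {w: rng.normal(size=50) for w in "a dog bit the man cat scratched".split()}
s1, s2 = "a dog bit the man".split(), "a cat scratched the man".split()
print(cosine(text_vector(s1, vecs), text_vector(s2, vecs)))
```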


2020 ◽  
pp. 3-17
Author(s):  
Peter Nabende

Natural Language Processing for under-resourced languages is now a mainstream research area. However, there are limited studies on Natural Language Processing applications for many indigenous East African languages. As a contribution towards covering this gap, this paper focuses on evaluating the application of well-established machine translation methods for one heavily under-resourced indigenous East African language, Lumasaaba. Specifically, we review the most common machine translation methods in the context of Lumasaaba, including both rule-based and data-driven methods. Then we apply a state-of-the-art data-driven machine translation method to learn models for automating translation between Lumasaaba and English using a very limited dataset of parallel sentences. Automatic evaluation results show that a transformer-based Neural Machine Translation model architecture leads to consistently better BLEU scores than the recurrent neural network-based models. Moreover, the automatically generated translations can be comprehended to a reasonable extent and are usually associated with the source language input.
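The automatic evaluation step can be reproduced with sacrebleu, whose corpus-level BLEU API is shown below. The reference and hypothesis sentences are placeholders, not data from the paper.

```python
# Corpus BLEU with sacrebleu, e.g., to compare transformer vs. recurrent NMT
# outputs against the same references. Sentences below are placeholders.
import sacrebleu

refs = [["the children are going to the garden"]]          # one reference stream
transformer_hyp = ["the children are going to the garden"]
rnn_hyp = ["children going the garden"]

print(sacrebleu.corpus_bleu(transformer_hyp, refs).score)  # higher BLEU
print(sacrebleu.corpus_bleu(rnn_hyp, refs).score)          # lower BLEU
```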

