Convolution–deconvolution word embedding: An end-to-end multi-prototype fusion embedding method for natural language processing

Multi-Sense Embeddings per Word

10.31219/osf.io/udfhn ◽

2020 ◽

Author(s):

Masashi Sugiyama

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Research Area ◽

Word Embedding ◽

The Other ◽

Word Embeddings ◽

Word Similarity ◽

Better Than ◽

Non Parametric

Recently, word embeddings have been used in many natural language processing problems successfully and how to train a robust and accurate word embedding system efficiently is a popular research area. Since many, if not all, words have more than one sense, it is necessary to learn vectors for all senses of word separately. Therefore, in this project, we have explored two multi-sense word embedding models, including Multi-Sense Skip-gram (MSSG) model and Non-parametric Multi-sense Skip Gram model (NP-MSSG). Furthermore, we propose an extension of the Multi-Sense Skip-gram model called Incremental Multi-Sense Skip-gram (IMSSG) model which could learn the vectors of all senses per word incrementally. We evaluate all the systems on word similarity task and show that IMSSG is better than the other models.

Download Full-text

Concept of TF-IDF, Common Bag of Word and Word Embedding for Effective Sentiment Classification

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.f4582.049620 ◽

2020 ◽

Vol 9 (4) ◽

pp. 2198-2201

Keyword(s):

Machine Learning ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Sentiment Classification ◽

Word Embedding ◽

Text Representation ◽

Human Beings ◽

Text Data

Sentiment Classification is one of the well-known and most popular domain of machine learning and natural language processing. An algorithm is developed to understand the opinion of an entity similar to human beings. This research fining article presents a similar to the mention above. Concept of natural language processing is considered for text representation. Later novel word embedding model is proposed for effective classification of the data. Tf-IDF and Common BoW representation models were considered for representation of text data. Importance of these models are discussed in the respective sections. The proposed is testing using IMDB datasets. 50% training and 50% testing with three random shuffling of the datasets are used for evaluation of the model.

Download Full-text

DAWE: A Double Attention-Based Word Embedding Model with Sememe Structure Information

Applied Sciences ◽

10.3390/app10175804 ◽

2020 ◽

Vol 10 (17) ◽

pp. 5804

Author(s):

Shengwen Li ◽

Renyao Chen ◽

Bo Wan ◽

Junfang Gong ◽

Lin Yang ◽

...

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Structural Information ◽

Word Embedding ◽

Semantic Matching ◽

Word Sense ◽

Reasoning Task ◽

Structure Information ◽

Matching Techniques

Word embedding is an important reference for natural language processing tasks, which can generate distribution presentations of words based on many text data. Recent evidence demonstrates that introducing sememe knowledge is a promising strategy to improve the performance of word embedding. However, previous works ignored the structure information of sememe knowledges. To fill the gap, this study implicitly synthesized the structural feature of sememes into word embedding models based on an attention mechanism. Specifically, we propose a novel double attention word-based embedding (DAWE) model that encodes the characteristics of sememes into words by a “double attention” strategy. DAWE is integrated with two specific word training models through context-aware semantic matching techniques. The experimental results show that, in word similarity task and word analogy reasoning task, the performance of word embedding can be effectively improved by synthesizing the structural information of sememe knowledge. The case study also verifies the power of DAWE model in word sense disambiguation task. Furthermore, the DAWE model is a general framework for encoding sememes into words, which can be integrated into other existing word embedding models to provide more options for various natural language processing downstream tasks.

Download Full-text

NMT Multi-Sense Embeddings per Word

10.31219/osf.io/k623t ◽

2019 ◽

Author(s):

William Jin

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Research Area ◽

Word Embedding ◽

The Other ◽

Word Embeddings ◽

Word Similarity ◽

Better Than ◽

Non Parametric

Recently, word embeddings have been used in many natural language processing problems successfully and how to train a robust and accurate word embedding system efficiently is a popular research area. Since many, if not all, words have more than one sense, it is necessary to learn vectors for all senses of word separately. Therefore, in this project, we have explored two multi-sense word embedding models, including Multi-Sense Skip-gram (MSSG) model and Non-parametric Multi-sense Skip Gram model (NP-MSSG). Furthermore, we propose an extension of the Multi-Sense Skip-gram model called Incremental Multi-Sense Skip-gram (IMSSG) model which could learn the vectors of all senses per word incrementally. We evaluate all the systems on word similarity task and show that IMSSG is better than the other models.

Download Full-text

Getting in Shape: Word Embedding SubSpaces

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/761 ◽

2019 ◽

Author(s):

Tianyuan Zhou ◽

João Sedoc ◽

Jordan Rodu

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Semantic Similarity ◽

Language Processing ◽

Theoretical Framework ◽

Word Embedding ◽

Word Embeddings ◽

Empirical Results ◽

Linear Alignment ◽

The Relationship

Many tasks in natural language processing require the alignment of word embeddings. Embedding alignment relies on the geometric properties of the manifold of word vectors. This paper focuses on supervised linear alignment and studies the relationship between the shape of the target embedding. We assess the performance of aligned word vectors on semantic similarity tasks and find that the isotropy of the target embedding is critical to the alignment. Furthermore, aligning with an isotropic noise can deliver satisfactory results. We provide a theoretical framework and guarantees which aid in the understanding of empirical results.

Download Full-text

Generating Senses and RoLes: An End-to-End Model for Dependency- and Span-based Semantic Role Labeling

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2021/521 ◽

2021 ◽

Author(s):

Rexhina Blloshmi ◽

Simone Conia ◽

Rocco Tripodi ◽

Roberto Navigli

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

State Of The Art ◽

Great Success ◽

Semantic Role ◽

Semantic Role Labeling ◽

Complex Predicate ◽

Input Sentence ◽

End To End

Despite the recent great success of the sequence-to-sequence paradigm in Natural Language Processing, the majority of current studies in Semantic Role Labeling (SRL) still frame the problem as a sequence labeling task. In this paper we go against the flow and propose GSRL (Generating Senses and RoLes), the first sequence-to-sequence model for end-to-end SRL. Our approach benefits from recently-proposed decoder-side pretraining techniques to generate both sense and role labels for all the predicates in an input sentence at once, in an end-to-end fashion. Evaluated on standard gold benchmarks, GSRL achieves state-of-the-art results in both dependency- and span-based English SRL, proving empirically that our simple generation-based model can learn to produce complex predicate-argument structures. Finally, we propose a framework for evaluating the robustness of an SRL model in a variety of synthetic low-resource scenarios which can aid human annotators in the creation of better, more diverse, and more challenging gold datasets. We release GSRL at github.com/SapienzaNLP/gsrl.

Download Full-text

An interactive dashboard to track themes, development maturity, and global equity in clinical artificial intelligence research

10.1101/2021.11.23.21266758 ◽

2021 ◽

Author(s):

Joe Zhang ◽

Stephen Whebell ◽

Jack Gallifant ◽

Sanjay Budhdeo ◽

Heather Mattie ◽

...

Keyword(s):

Artificial Intelligence ◽

Natural Language Processing ◽

Natural Language ◽

Real Time ◽

Language Processing ◽

Artificial Intelligence Research ◽

End To End ◽

Monitoring Progress

The global clinical artificial intelligence (AI) research landscape is constantly evolving, with heterogeneity across specialties, disease areas, geographical representation, and development maturity. Continual assessment of this landscape is important for monitoring progress. Taking advantage of developments in natural language processing (NLP), we produce an end-to-end NLP pipeline to automate classification and characterization of all original clinical AI research on MEDLINE, outputting real-time results to a public, interactive dashboard (https://aiforhealth.app/).

Download Full-text

A Survey of Cross-lingual Word Embedding Models

Journal of Artificial Intelligence Research ◽

10.1613/jair.1.11640 ◽

2019 ◽

Vol 65 ◽

pp. 569-631 ◽

Cited By ~ 19

Author(s):

Sebastian Ruder ◽

Ivan Vulić ◽

Anders Søgaard

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Word Meaning ◽

Word Embedding ◽

Word Embeddings ◽

Objective Functions ◽

Future Challenges ◽

Cross Lingual ◽

Data Requirements

Cross-lingual representations of words enable us to reason about word meaning in multilingual contexts and are a key facilitator of cross-lingual transfer when developing natural language processing models for low-resource languages. In this survey, we provide a comprehensive typology of cross-lingual word embedding models. We compare their data requirements and objective functions. The recurring theme of the survey is that many of the models presented in the literature optimize for the same objectives, and that seemingly different models are often equivalent, modulo optimization strategies, hyper-parameters, and such. We also discuss the different ways cross-lingual word embeddings are evaluated, as well as future challenges and research horizons.

Download Full-text

Word Embedding Techniques for Sentiment Analyzers

10.4018/978-1-7998-8061-5.ch013 ◽

2021 ◽

pp. 233-252

Author(s):

Upendar Rao Rayala ◽

Karthick Seshadri

Keyword(s):

Social Networks ◽

Natural Language Processing ◽

Natural Language ◽

Sentiment Analysis ◽

Language Processing ◽

Word Embedding ◽

Word Embeddings ◽

The Public ◽

The Given ◽

Research Domain

Sentiment analysis is perceived to be a multi-disciplinary research domain composed of machine learning, artificial intelligence, deep learning, image processing, and social networks. Sentiment analysis can be used to determine opinions of the public about products and to find the customers' interest and their feedback through social networks. To perform any natural language processing task, the input text/comments should be represented in a numerical form. Word embeddings represent the given text/sentences/words as a vector that can be employed in performing subsequent natural language processing tasks. In this chapter, the authors discuss different techniques that can improve the performance of sentiment analysis using concepts and techniques like traditional word embeddings, sentiment embeddings, emoticons, lexicons, and neural networks. This chapter also traces the evolution of word embedding techniques with a chronological discussion of the recent research advancements in word embedding techniques.

Download Full-text

Evaluating word embedding models: methods and experimental results

APSIPA Transactions on Signal and Information Processing ◽

10.1017/atsip.2019.12 ◽

2019 ◽

Vol 8 ◽

Cited By ~ 12

Author(s):

Bin Wang ◽

Angela Wang ◽

Fenxiao Chen ◽

Yuncheng Wang ◽

C.-C. Jay Kuo

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Performance Metrics ◽

Word Embedding ◽

Experimental Results ◽

Extensive Evaluation ◽

Work First ◽

Study Performance

AbstractExtensive evaluation on a large number of word embedding models for language processing applications is conducted in this work. First, we introduce popular word embedding models and discuss desired properties of word models and evaluation methods (or evaluators). Then, we categorize evaluators into intrinsic and extrinsic two types. Intrinsic evaluators test the quality of a representation independent of specific natural language processing tasks while extrinsic evaluators use word embeddings as input features to a downstream task and measure changes in performance metrics specific to that task. We report experimental results of intrinsic and extrinsic evaluators on six word embedding models. It is shown that different evaluators focus on different aspects of word models, and some are more correlated with natural language processing tasks. Finally, we adopt correlation analysis to study performance consistency of extrinsic and intrinsic evaluators.

Download Full-text