scholarly journals DAWE: A Double Attention-Based Word Embedding Model with Sememe Structure Information

2020 ◽  
Vol 10 (17) ◽  
pp. 5804
Author(s):  
Shengwen Li ◽  
Renyao Chen ◽  
Bo Wan ◽  
Junfang Gong ◽  
Lin Yang ◽  
...  

Word embedding is an important reference for natural language processing tasks, which can generate distribution presentations of words based on many text data. Recent evidence demonstrates that introducing sememe knowledge is a promising strategy to improve the performance of word embedding. However, previous works ignored the structure information of sememe knowledges. To fill the gap, this study implicitly synthesized the structural feature of sememes into word embedding models based on an attention mechanism. Specifically, we propose a novel double attention word-based embedding (DAWE) model that encodes the characteristics of sememes into words by a “double attention” strategy. DAWE is integrated with two specific word training models through context-aware semantic matching techniques. The experimental results show that, in word similarity task and word analogy reasoning task, the performance of word embedding can be effectively improved by synthesizing the structural information of sememe knowledge. The case study also verifies the power of DAWE model in word sense disambiguation task. Furthermore, the DAWE model is a general framework for encoding sememes into words, which can be integrated into other existing word embedding models to provide more options for various natural language processing downstream tasks.


2018 ◽  
Vol 10 (10) ◽  
pp. 3729 ◽  
Author(s):  
Hei Wang ◽  
Yung Chi ◽  
Ping Hsin

With the advent of the knowledge economy, firms often compete for intellectual property rights. Being the first to acquire high-potential patents can assist firms in achieving future competitive advantages. To identify patents capable of being developed, firms often search for a focus by using existing patent documents. Because of the rapid development of technology, the number of patent documents is immense. A prominent topic among current firms is how to use this large number of patent documents to discover new business opportunities while avoiding conflicts with existing patents. In the search for technological opportunities, a crucial task is to present results in the form of an easily understood visualization. Currently, natural language processing can help in achieving this goal. In natural language processing, word sense disambiguation (WSD) is the problem of determining which “sense” (meaning) of a word is activated in a given context. Given a word and its possible senses, as defined by a dictionary, we classify the occurrence of a word in context into one or more of its sense classes. The features of the context (such as neighboring words) provide evidence for these classifications. The current method for patent document analysis warrants improvement in areas, such as the analysis of many dimensions and the development of recommendation methods. This study proposes a visualization method that supports semantics, reduces the number of dimensions formed by terms, and can easily be understood by users. Since polysemous words occur frequently in patent documents, we also propose a WSD method to decrease the calculated degrees of distortion between terms. An analysis of outlier distributions is used to construct a patent map capable of distinguishing similar patents. During the development of new strategies, the constructed patent map can assist firms in understanding patent distributions in commercial areas, thereby preventing patent infringement caused by the development of similar technologies. Subsequently, technological opportunities can be recommended according to the patent map, aiding firms in assessing relevant patents in commercial areas early and sustainably achieving future competitive advantages.



Author(s):  
Marina Sokolova ◽  
Stan Szpakowicz

This chapter presents applications of machine learning techniques to traditional problems in natural language processing, including part-of-speech tagging, entity recognition and word-sense disambiguation. People usually solve such problems without difficulty or at least do a very good job. Linguistics may suggest labour-intensive ways of manually constructing rule-based systems. It is, however, the easy availability of large collections of texts that has made machine learning a method of choice for processing volumes of data well above the human capacity. One of the main purposes of text processing is all manner of information extraction and knowledge extraction from such large text. Machine learning methods discussed in this chapter have stimulated wide-ranging research in natural language processing and helped build applications with serious deployment potential.





2020 ◽  
Author(s):  
Masashi Sugiyama

Recently, word embeddings have been used in many natural language processing problems successfully and how to train a robust and accurate word embedding system efficiently is a popular research area. Since many, if not all, words have more than one sense, it is necessary to learn vectors for all senses of word separately. Therefore, in this project, we have explored two multi-sense word embedding models, including Multi-Sense Skip-gram (MSSG) model and Non-parametric Multi-sense Skip Gram model (NP-MSSG). Furthermore, we propose an extension of the Multi-Sense Skip-gram model called Incremental Multi-Sense Skip-gram (IMSSG) model which could learn the vectors of all senses per word incrementally. We evaluate all the systems on word similarity task and show that IMSSG is better than the other models.



Sentiment Classification is one of the well-known and most popular domain of machine learning and natural language processing. An algorithm is developed to understand the opinion of an entity similar to human beings. This research fining article presents a similar to the mention above. Concept of natural language processing is considered for text representation. Later novel word embedding model is proposed for effective classification of the data. Tf-IDF and Common BoW representation models were considered for representation of text data. Importance of these models are discussed in the respective sections. The proposed is testing using IMDB datasets. 50% training and 50% testing with three random shuffling of the datasets are used for evaluation of the model.



2019 ◽  
Author(s):  
William Jin

Recently, word embeddings have been used in many natural language processing problems successfully and how to train a robust and accurate word embedding system efficiently is a popular research area. Since many, if not all, words have more than one sense, it is necessary to learn vectors for all senses of word separately. Therefore, in this project, we have explored two multi-sense word embedding models, including Multi-Sense Skip-gram (MSSG) model and Non-parametric Multi-sense Skip Gram model (NP-MSSG). Furthermore, we propose an extension of the Multi-Sense Skip-gram model called Incremental Multi-Sense Skip-gram (IMSSG) model which could learn the vectors of all senses per word incrementally. We evaluate all the systems on word similarity task and show that IMSSG is better than the other models.



Author(s):  
Tianyuan Zhou ◽  
João Sedoc ◽  
Jordan Rodu

Many tasks in natural language processing require the alignment of word embeddings. Embedding alignment relies on the geometric properties of the manifold of word vectors. This paper focuses on supervised linear alignment and studies the relationship between the shape of the target embedding. We assess the performance of aligned word vectors on semantic similarity tasks and find that the isotropy of the target embedding is critical to the alignment. Furthermore, aligning with an isotropic noise can deliver satisfactory results. We provide a theoretical framework and guarantees which aid in the understanding of empirical results.



2015 ◽  
Vol 77 (18) ◽  
Author(s):  
Wai-Howe Khong ◽  
Lay-Ki Soon ◽  
Hui-Ngo Goh

Sentiment analysis has emerged as one of the most powerful tools in business intelligence. With the aim of proposing an effective sentiment analysis technique, we have performed experiments on analyzing the sentiments of 3,424 tweets using both statistical and natural language processing (NLP) techniques as part of our background study.  For statistical technique, machine learning algorithms such as Support Vector Machines (SVMs), decision trees and Naïve Bayes have been explored. The results show that SVM consistently outperformed the rest in both classifications. As for sentiment analysis using NLP techniques, we used two different tagging methods for part-of-speech (POS) tagging.  Subsequently, the output is used for word sense disambiguation (WSD) using WordNet, followed by sentiment identification using SentiWordNet.  Our experimental results indicate that adjectives and adverbs are sufficient to infer the sentiment of tweets compared to other combinations. Comparatively, the statistical approach records higher accuracy than the NLP approach by approximately 17%.



2015 ◽  
Vol 2015 ◽  
pp. 1-6 ◽  
Author(s):  
Yuntong Liu ◽  
Hua Sun

In order to use semantics more effectively in natural language processing, a word sense disambiguation method for Chinese based on semantics calculation was proposed. The word sense disambiguation for a Chinese clause could be achieved by solving the semantic model of the natural language; each step of the word sense disambiguation process was discussed in detail; and the computational complexity of the word sense disambiguation process was analyzed. Finally, some experiments were finished to verify the effectiveness of the method.



Sign in / Sign up

Export Citation Format

Share Document