A Word Selection Method for Producing Interpretable Distributional Semantic Word Vectors

2021 ◽  
Vol 72 ◽  
pp. 1281-1305
Author(s):  
Atefe Pakzad ◽  
Morteza Analoui

Distributional semantic models represent the meaning of words as vectors. We introduce a selection method to learn a vector space in which each dimension is a natural word. The selection method starts from the most frequent words and selects the subset that gives the best performance. Because each dimension of the resulting space is a word, the method has a clear advantage over fusion methods such as NMF and over neural embedding models. We apply the method to the ukWaC corpus and train a vector space of N=1500 basis words. We report test results on word similarity tasks for the MEN, RG-65, SimLex-999, and WordSim353 gold datasets. The results also show that reducing the number of basis vectors from 5000 to 1500 reduces accuracy by only about 1.5-2%, so we achieve good interpretability without a large penalty. Interpretability evaluation results indicate that the word vectors obtained by the proposed method with N=1500 are more interpretable than those of word embedding models and the baseline method. We report the top 15 of the 1500 selected basis words in this paper.
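
The abstract does not spell out the selection procedure, so the following is only a minimal sketch of one way a frequency-ordered selection could work: a greedy forward pass that keeps a candidate word only if it improves a score. The function name `select_basis_words`, the `score` callback, and the toy scoring rule are all hypothetical; in practice the score might be a correlation on a word similarity dataset, but that is an assumption, not the authors' stated method.

```python
from typing import Callable, Iterable, List


def select_basis_words(
    candidates: Iterable[str],
    score: Callable[[List[str]], float],
    max_basis: int,
) -> List[str]:
    """Greedy forward selection (illustrative, not the paper's algorithm).

    `candidates` is assumed to be sorted by descending corpus frequency.
    A word is kept only if adding it strictly improves `score`.
    """
    selected: List[str] = []
    best = float("-inf")
    for word in candidates:
        if len(selected) >= max_basis:
            break
        trial = selected + [word]
        trial_score = score(trial)
        if trial_score > best:
            selected, best = trial, trial_score
    return selected


# Toy score (hypothetical): number of distinct initial letters in the basis.
def toy_score(basis: List[str]) -> float:
    return float(len({w[0] for w in basis}))


basis = select_basis_words(
    ["the", "of", "to", "and", "time", "people"], toy_score, max_basis=3
)
print(basis)  # ['the', 'of', 'and']
```

Here "to" is skipped because it adds no new initial letter, which mimics how a candidate that does not improve task performance would be left out of the basis.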

2016 ◽  
Author(s):  
Miroslav Batchkarov ◽  
Thomas Kober ◽  
Jeremy Reffin ◽  
Julie Weeds ◽  
David Weir

2014 ◽  
Author(s):  
Masoud Rouhizadeh ◽  
Emily Prud'hommeaux ◽  
Jan van Santen ◽  
Richard Sproat

2019 ◽  
Vol 45 (1) ◽  
pp. 1-57 ◽  
Author(s):  
Silvio Cordeiro ◽  
Aline Villavicencio ◽  
Marco Idiart ◽  
Carlos Ramisch

Nominal compounds such as red wine and nut case display a continuum of compositionality, with varying contributions from the components of the compound to its semantics. This article proposes a framework for compound compositionality prediction using distributional semantic models, evaluating to what extent they capture idiomaticity compared to human judgments. For evaluation, we introduce data sets containing human judgments in three languages: English, French, and Portuguese. The results obtained reveal a high agreement between the models and human predictions, suggesting that they are able to incorporate information about idiomaticity. We also present an in-depth evaluation of various factors that can affect prediction, such as model and corpus parameters and compositionality operations. General crosslingual analyses reveal the impact of morphological variation and corpus size in the ability of the model to predict compositionality, and of a uniform combination of the components for best results.
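
As a rough illustration of compositionality prediction with a distributional model, the sketch below scores a compound by the cosine similarity between its own vector and an equal-weight sum of its unit-normalized component vectors. The function names and the toy vectors are hypothetical, and the equal-weight sum is just one reading of the "uniform combination" mentioned in the abstract.

```python
import math
from typing import List, Sequence


def unit(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity, defined as 0.0 when either vector is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def compositionality(
    compound: Sequence[float],
    first: Sequence[float],
    second: Sequence[float],
) -> float:
    """Similarity between the compound vector and the uniform (equal-weight)
    sum of its normalized component vectors."""
    composed = [a + b for a, b in zip(unit(first), unit(second))]
    return cosine(compound, composed)


# Toy vectors (made up): a compound that behaves like the sum of its parts,
# and an idiomatic one that is orthogonal to both parts.
literal = compositionality([1.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
idiomatic = compositionality([0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
```

A score near 1 suggests a compositional compound, and a score near 0 suggests an idiomatic one; these scores would then be compared against human judgments.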


2020 ◽  
Vol 8 ◽  
pp. 311-329
Author(s):  
Kushal Arora ◽  
Aishik Chakraborty ◽  
Jackie C. K. Cheung

In this paper, we propose LexSub, a novel approach towards unifying lexical and distributional semantics. We inject knowledge about lexical-semantic relations into distributional word embeddings by defining subspaces of the distributional vector space in which a lexical relation should hold. Our framework can handle symmetric attract and repel relations (e.g., synonymy and antonymy, respectively), as well as asymmetric relations (e.g., hypernymy and meronomy). In a suite of intrinsic benchmarks, we show that our model outperforms previous approaches on relatedness tasks and on hypernymy classification and detection, while being competitive on word similarity tasks. It also outperforms previous systems on extrinsic classification tasks that benefit from exploiting lexical relational cues. We perform a series of analyses to understand the behaviors of our model. Code available at https://github.com/aishikchakraborty/LexSub.


2015 ◽  
Vol 55 (2) ◽  
Author(s):  
Adolfas Dargys

To have a closed system, the Maxwell electromagnetic equations should be supplemented by constitutive relations which describe medium properties and connect primary fields (E, B) with secondary ones (D, H). J.W. Gibbs and O. Heaviside introduced the basis vectors {i, j, k} to represent the fields and constitutive relations in three-dimensional vector space. In this paper the constitutive relations are presented in the form of the Cl3,0 algebra, which describes the vector space by three basis vectors {σ1, σ2, σ3} that satisfy Pauli commutation relations. It is shown that the classification of electromagnetic wave propagation phenomena with the help of constitutive relations in this case comes from the structure of Cl3,0 itself. Concrete expressions for classical constitutive relations are presented, including electromagnetic wave propagation in a moving dielectric.


Languages ◽  
2019 ◽  
Vol 4 (3) ◽  
pp. 46
Author(s):  
Juan ◽  
Faber

EcoLexicon is a terminological knowledge base on environmental science, whose design permits the geographic contextualization of data. For the geographic contextualization of landform concepts, this paper presents a semi-automatic method for extracting terms associated with named rivers (e.g., Mississippi River). Terms were extracted from a specialized corpus, where named rivers were automatically identified. Statistical procedures were applied for selecting both terms and rivers in distributional semantic models to construct the conceptual structures underlying the usage of named rivers. The rivers sharing associated terms were also clustered and represented in the same conceptual network. The results showed that the method successfully described the semantic frames of named rivers with explanatory adequacy, according to the premises of Frame-Based Terminology.


Author(s):  
S Hasanzadeh ◽  
S M Fakhrahmad ◽  
M Taheri

Recommender systems nowadays play an important role in providing helpful information for users, especially in ecommerce applications. Many of the proposed models use rating histories of the users in order to predict unknown ratings. Recently, users’ reviews as a valuable source of knowledge have attracted the attention of researchers in this field and a new category denoted as review-based recommender systems has emerged. In this study, we make use of the information included in user reviews as well as available rating scores to develop a review-based rating prediction system. The proposed scheme attempts to handle the uncertainty problem of the rating histories by fuzzifying the given ratings. Another advantage of the proposed system is the use of a word embedding representation model for textual reviews, instead of using traditional models such as binary bag of words and the TFIDF vector space. It also makes use of the helpfulness voting scores in order to prune data and achieve better results. The effectiveness of the rating prediction scheme as well as the final recommender system was evaluated against the Amazon dataset. Experimental results revealed that the proposed recommender system outperforms its counterparts and can be used as a suitable tool in ecommerce environments.
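
The abstract does not say which membership functions are used to fuzzify ratings. As a hedged sketch of the general idea, the code below maps a crisp rating on a 1-5 scale to membership degrees in triangular fuzzy sets centred on each rating level. The function name `fuzzify_rating`, the triangular shape, and the spread of 2 are all assumptions made for illustration.

```python
from typing import Dict, Sequence


def fuzzify_rating(
    rating: float,
    levels: Sequence[int] = (1, 2, 3, 4, 5),
    spread: float = 2.0,
) -> Dict[int, float]:
    """Membership of `rating` in a triangular fuzzy set centred on each level.

    Membership is 1 at the level itself and falls linearly to 0 at a
    distance of `spread` (an assumed parameter, not taken from the paper).
    """
    return {
        level: max(0.0, 1.0 - abs(rating - level) / spread)
        for level in levels
    }


memberships = fuzzify_rating(4)
print(memberships)  # {1: 0.0, 2: 0.0, 3: 0.5, 4: 1.0, 5: 0.5}
```

With this choice a rating of 4 also counts partly as a 3 and as a 5, which is one simple way to express that a user's stated score is uncertain.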

