Comparison of the Best Parameter Settings in the Creation and Comparison of Feature Vectors in Distributional Semantic Models Across Multiple Languages

Author(s):
András Dobó,
János Csirik
2019
Author(s):
András Dobó

Measuring the semantic similarity and relatedness of words is important for many natural language processing tasks. Distributional semantic models designed for this task have many different parameters, such as vector similarity measures, weighting schemes and dimensionality reduction techniques. However, there is no truly comprehensive study that evaluates these parameters simultaneously while also analysing how the findings differ across languages. We address this gap with a systematic study that searches for the best configuration for creating and comparing feature vectors in distributional semantic models for English, Spanish and Hungarian separately, and then compares the findings across these languages. In our extensive analysis we test a large number of possible settings for every parameter, including more than a thousand novel variants for some of them. As a result, we found configurations that significantly outperform conventional configurations and achieve state-of-the-art results.
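To make the parameters named above concrete, here is a minimal sketch of one conventional configuration: raw co-occurrence counts within a symmetric window, positive pointwise mutual information (PPMI) as the weighting scheme, and cosine as the vector similarity measure. The window size of 2, the toy corpus and all function names are illustrative assumptions. No dimensionality reduction is applied, and this is not the best configuration reported in the study.

```python
import math
from collections import Counter, defaultdict

def cooccurrence_counts(sentences, window=2):
    """Count (target, context) pairs within a symmetric window (assumed size 2)."""
    counts = defaultdict(Counter)
    for tokens in sentences:
        for i, target in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[target][tokens[j]] += 1
    return counts

def ppmi(counts):
    """Weighting scheme: keep only positive PMI, log(n * total / (row * col))."""
    total = sum(sum(ctx.values()) for ctx in counts.values())
    row_sums = {w: sum(ctx.values()) for w, ctx in counts.items()}
    col_sums = Counter()
    for ctx in counts.values():
        col_sums.update(ctx)
    weighted = {}
    for w, ctx in counts.items():
        vec = {}
        for c, n in ctx.items():
            pmi = math.log((n * total) / (row_sums[w] * col_sums[c]))
            if pmi > 0:
                vec[c] = pmi
        weighted[w] = vec
    return weighted

def cosine(u, v):
    """Similarity measure on sparse vectors stored as {context: weight} dicts."""
    dot = sum(x * v.get(k, 0.0) for k, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Hypothetical toy corpus, for illustration only.
corpus = [
    "the red wine was served with dinner".split(),
    "the white wine was served chilled".split(),
    "the river flows to the sea".split(),
]
vectors = ppmi(cooccurrence_counts(corpus))
print(round(cosine(vectors["red"], vectors["white"]), 3))  # → 1.0 (identical contexts)
print(cosine(vectors["red"], vectors["river"]))            # lower: only "the" is shared
```

Swapping `cosine` for another similarity measure, or `ppmi` for another weighting scheme, changes one parameter at a time, which is the kind of variation the study explores at a much larger scale.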


2014
Author(s):
Masoud Rouhizadeh,
Emily Prud'hommeaux,
Jan van Santen,
Richard Sproat

2019
Vol 45 (1)
pp. 1-57
Author(s):
Silvio Cordeiro,
Aline Villavicencio,
Marco Idiart,
Carlos Ramisch

Nominal compounds such as red wine and nut case display a continuum of compositionality, with the components contributing to the meaning of the compound to varying degrees. This article proposes a framework for predicting compound compositionality using distributional semantic models, and evaluates to what extent these models capture idiomaticity compared with human judgments. For evaluation, we introduce data sets containing human judgments in three languages: English, French, and Portuguese. The results reveal high agreement between the models' predictions and human judgments, suggesting that the models are able to incorporate information about idiomaticity. We also present an in-depth evaluation of various factors that can affect prediction, such as model and corpus parameters and compositionality operations. General crosslingual analyses reveal the impact of morphological variation and corpus size on the models' ability to predict compositionality, and show that a uniform combination of the components gives the best results.
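A minimal sketch of how a compositionality score can be derived from a distributional model, assuming an additive formulation: the compound's own vector is compared by cosine with a weighted sum of its normalized component vectors, where `beta = 0.5` corresponds to a uniform combination of the components. The vectors below are hypothetical three-dimensional toy values, not outputs of any model used in the article, and the function names are illustrative.

```python
import math

def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def compositionality(compound_vec, first_vec, second_vec, beta=0.5):
    """Cosine between the compound vector and beta*u1 + (1-beta)*u2,
    where u1, u2 are the normalized component vectors (assumed formulation)."""
    u1, u2 = normalize(first_vec), normalize(second_vec)
    combined = [beta * a + (1 - beta) * b for a, b in zip(u1, u2)]
    return cosine(compound_vec, combined)

# Hypothetical toy vectors: "red wine" lies between its parts,
# while "nut case" points away from both "nut" and "case".
red, wine, red_wine = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.6, 0.8, 0.0]
nut, case, nut_case = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]

print(round(compositionality(red_wine, red, wine), 2))  # → 0.99
print(compositionality(nut_case, nut, case))            # → 0.0
```

A high score suggests a compositional compound and a low score an idiomatic one; such scores can then be correlated with human judgments.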


Languages
2019
Vol 4 (3)
pp. 46
Author(s):
Juan,
Faber

EcoLexicon is a terminological knowledge base on environmental science whose design permits the geographic contextualization of data. For the geographic contextualization of landform concepts, this paper presents a semi-automatic method for extracting terms associated with named rivers (e.g., Mississippi River). Terms were extracted from a specialized corpus in which named rivers had been automatically identified. Statistical procedures were then applied in distributional semantic models to select both terms and rivers, in order to construct the conceptual structures underlying the usage of named rivers. Rivers sharing associated terms were also clustered and represented in the same conceptual network. The results showed that the method successfully described the semantic frames of named rivers with explanatory adequacy, according to the premises of Frame-Based Terminology.
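The idea of grouping rivers that share associated terms can be illustrated with a simple stand-in. The abstract does not specify the statistical procedures used, so this sketch assumes its own: two rivers are linked when the Jaccard overlap of their associated term sets reaches a threshold, and clusters are the connected components of the resulting graph (single linkage). The 0.3 threshold, the term sets and the function names are all hypothetical.

```python
def jaccard(a, b):
    """Overlap of two term sets: |a & b| / |a | b|."""
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_rivers(river_terms, threshold=0.3):
    """Single-linkage grouping via union-find over pairwise Jaccard overlaps."""
    rivers = sorted(river_terms)
    parent = {r: r for r in rivers}

    def find(r):
        # Follow parent links to the cluster root, compressing the path.
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r

    for i, a in enumerate(rivers):
        for b in rivers[i + 1:]:
            if jaccard(river_terms[a], river_terms[b]) >= threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for r in rivers:
        clusters.setdefault(find(r), []).append(r)
    return sorted(clusters.values())

# Hypothetical associated terms, invented for illustration.
river_terms = {
    "Mississippi River": {"delta", "levee", "floodplain", "sediment"},
    "Nile River": {"delta", "floodplain", "sediment", "irrigation"},
    "Colorado River": {"canyon", "erosion", "dam"},
}
print(cluster_rivers(river_terms))
# → [['Colorado River'], ['Mississippi River', 'Nile River']]
```

Single linkage is used here only because it needs nothing beyond pairwise overlaps; other clustering methods would serve the same illustrative purpose.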


Author(s):
Piero Molino,
Pierpaolo Basile,
Annalina Caputo,
Pasquale Lops,
Giovanni Semeraro

2016
Author(s):
Miroslav Batchkarov,
Thomas Kober,
Jeremy Reffin,
Julie Weeds,
David Weir
