Visual Exploration of Semantic Relationships in Neural Word Embeddings

Word embeddings are numeric representations of text that capture meaning and semantic relationships. Embeddings can be constructed using different methods, such as one-hot encoding, frequency-based approaches, or prediction-based approaches. Prediction-based approaches such as Word2Vec can generate word embeddings that capture the underlying semantics and word relationships in a corpus. Studies have shown that Word2Vec embeddings generated from a domain-specific corpus can both predict relationships and augment word vectors to improve classification. We describe results from two experiments that use Earth science word embeddings constructed with Word2Vec from a corpus of over 20,000 journal papers. The first experiment explores the analogy prediction performance of word embeddings built from the Earth science journal corpus and trained using a domain-specific vocabulary. Our results demonstrate that domain-specific word embeddings predict Earth science analogy questions more accurately than general-corpus embeddings predict general analogy questions. While this result was anticipated, the substantial increase in accuracy, particularly in the lexicographical domain, was encouraging. The results point to the need for a comprehensive Earth science analogy test set that covers the full breadth of lexicographical and encyclopedic categories for validating word embeddings.

The second experiment uses the word embeddings to augment metadata keyword classification. Metadata describing NASA datasets include science keywords that are assigned manually, which can lead to errors and inconsistencies. These science keywords come from a controlled vocabulary and are used to aid data discovery via faceted search and relevancy ranking. Because only a small number of metadata records have a proper description and keywords, word embeddings were used for augmentation.
A fully connected neural network was trained to suggest keywords given a description text. This approach achieved the best accuracy of the methods tested, at approximately 76%.
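
Analogy questions of the form "a is to b as c is to ?" are commonly answered with the vector-offset rule (3CosAdd): return the word whose vector is most cosine-similar to b - a + c. The abstract does not state which scoring rule was used, so the sketch below assumes 3CosAdd; the vocabulary and 3-dimensional vectors are hypothetical toy values, not vectors from the Earth science corpus.

```python
import numpy as np

def normalize(m):
    # Scale each row to unit length so dot products become cosine similarities.
    return m / np.linalg.norm(m, axis=1, keepdims=True)

def analogy(vocab, vectors, a, b, c):
    """Answer 'a is to b as c is to ?' with the vector-offset (3CosAdd) rule."""
    index = {w: i for i, w in enumerate(vocab)}
    unit = normalize(vectors)
    target = unit[index[b]] - unit[index[a]] + unit[index[c]]
    scores = unit @ (target / np.linalg.norm(target))
    # Exclude the three query words, as analogy benchmarks conventionally do.
    for w in (a, b, c):
        scores[index[w]] = -np.inf
    return vocab[int(np.argmax(scores))]

# Hypothetical 3-d vectors; real Word2Vec vectors have hundreds of dimensions.
vocab = ["ocean", "salinity", "atmosphere", "humidity", "glacier"]
vectors = np.array([
    [1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
    [0.0, 1.0, 1.0],
    [1.0, 0.0, 0.2],
])
print(analogy(vocab, vectors, "ocean", "salinity", "atmosphere"))  # humidity
```

Analogy accuracy on a test set is then the fraction of questions whose returned word matches the expected answer. With trained gensim vectors, `KeyedVectors.most_similar(positive=[b, c], negative=[a])` applies the same 3CosAdd scoring.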

Studying Ideational Change in Russian Politics with Topic Models and Word Embeddings

The Palgrave Handbook of Digital Russia Studies, 2020, pp. 443-464
DOI: 10.1007/978-3-030-42855-6_25

Author(s): Andrey Indukaev

Keyword(s): Economic Development, The Political, Word Embeddings, Semantic Change, Semantic Relationships, Political Role, Ideational Change, Russian Politics, Large Corpus

This chapter applies computational methods of textual analysis to a large corpus of media texts to study ideational change. The empirical focus of the chapter is on the ideas of the political role of innovation, technology, and economic development that were introduced into Russian politics during Medvedev’s presidency. The chapter uses topic modeling, shows the limitations of the method, and provides a more nuanced analysis with the help of word embeddings. The latter method is used to analyze semantic change and to capture complex semantic relationships between the studied concepts.
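
The abstract does not detail how semantic change is measured. One widely used approach is to train a separate embedding model per time period, rotate one space onto the other with orthogonal Procrustes (the axes of independently trained models are arbitrary), and take each word's cosine distance across periods as its degree of change. The sketch below assumes that approach, not necessarily the chapter's; both matrices must list the same vocabulary in the same row order, and the data is synthetic.

```python
import numpy as np

def procrustes_align(base, other):
    """Rotate `other` onto `base`; rows of both matrices index the same words."""
    # Orthogonal Procrustes: with other.T @ base = U S V^T, the rotation
    # R = U V^T minimizes the Frobenius norm of other @ R - base.
    u, _, vt = np.linalg.svd(other.T @ base)
    return other @ (u @ vt)

def semantic_change(vocab, emb_early, emb_late):
    """Cosine distance between each word's early vector and aligned late vector."""
    aligned = procrustes_align(emb_early, emb_late)
    a = emb_early / np.linalg.norm(emb_early, axis=1, keepdims=True)
    b = aligned / np.linalg.norm(aligned, axis=1, keepdims=True)
    return {w: float(1.0 - a[i] @ b[i]) for i, w in enumerate(vocab)}

# Synthetic check: the "late" space is a rotated copy of the "early" space,
# except that word w0 has had its meaning reversed.
rng = np.random.default_rng(0)
vocab = [f"w{i}" for i in range(200)]
early = rng.normal(size=(200, 10))
rotation, _ = np.linalg.qr(rng.normal(size=(10, 10)))
late = early @ rotation
late[0] = -late[0]
change = semantic_change(vocab, early, late)
print(max(change, key=change.get))  # w0
```

Words with the largest distances are candidates for closer qualitative reading; in practice, very rare words are usually excluded because their vectors are noisy.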

Visual exploration and comparison of word embeddings

Journal of Visual Languages & Computing, 2018, Vol 48, pp. 178-186
DOI: 10.1016/j.jvlc.2018.08.008
Cited By ~ 2

Author(s): Juntian Chen, Yubo Tao, Hai Lin

Keyword(s): Visual Exploration, Word Embeddings

Mapping Nominal Values to Numbers for Effective Visualization

Information Visualization, 2004, Vol 3 (2), pp. 80-95
DOI: 10.1057/palgrave.ivs.9500072
Cited By ~ 38

Author(s): Geraldine E Rosario, Elke A Rundensteiner, David C Brown, Matthew O Ward, Shiping Huang

Keyword(s): General Purpose, Visual Exploration, Data Sets, Distance Information, Semantic Relationships, Large Numbers, Exploration Tool, A New Technique, Effective Visualization, Nominal Variables

Data sets with large numbers of nominal variables, including some with a large number of distinct values, are becoming increasingly common and need to be explored. Unfortunately, most existing visual exploration tools are designed to handle numeric variables only. When data sets with nominal values are imported into such visualization tools, most solutions to date are rather simplistic. Often, techniques that map nominal values to numbers do not assign order or spacing among the values in a manner that conveys semantic relationships. Moreover, displays designed for nominal variables usually cannot handle high-cardinality variables well. This paper addresses the problem of how to display nominal variables in general-purpose visual exploration tools designed for numeric variables. Specifically, we investigate (1) how to assign order and spacing among the nominal values, and (2) how to reduce the number of distinct values to display. We propose a new technique, called the Distance-Quantification-Classing (DQC) approach, to preprocess nominal variables before they are imported into a visual exploration tool. In the Distance Step, we identify a set of independent dimensions that can be used to calculate the distance between nominal values. In the Quantification Step, we use the independent dimensions and the distance information to assign order and spacing among the nominal values. In the Classing Step, we use results from the previous steps to determine which values within the domain of a variable are similar to each other and can therefore be grouped together. Each step in the DQC approach can be accomplished by a variety of techniques. We extended the XmdvTool package to incorporate this approach. We evaluated our approach on several data sets using a variety of measures.
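
Because each DQC step can be carried out by a variety of techniques, the sketch below shows only one simple, hypothetical instantiation for a single nominal column, not the paper's exact method. Distance: each value is described by how often it co-occurs with the values of a second nominal column, and Euclidean distance between these profiles is the distance used implicitly. Quantification: the profiles are projected onto their first principal axis (equivalent to one-dimensional classical scaling of those distances), giving order and spacing. Classing: values closer than a threshold along that axis are grouped. The function name `dqc`, the `gap` parameter, and the toy records are illustrative assumptions.

```python
from collections import Counter, defaultdict
import numpy as np

def dqc(records, target, reference, gap=0.2):
    """Hypothetical Distance-Quantification-Classing pass over one nominal column."""
    # Distance step: profile each target value by the relative frequencies of
    # the reference values it co-occurs with (Euclidean distance between profiles).
    counts = defaultdict(Counter)
    for r in records:
        counts[r[target]][r[reference]] += 1
    values = sorted(counts)
    ref_values = sorted({r[reference] for r in records})
    profiles = np.array([[counts[v][f] for f in ref_values] for v in values], dtype=float)
    profiles /= profiles.sum(axis=1, keepdims=True)

    # Quantification step: project centered profiles onto their first principal
    # axis and rescale to [0, 1], which yields both an order and a spacing.
    centered = profiles - profiles.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    coords = centered @ vt[0]
    span = coords.max() - coords.min()
    coords = (coords - coords.min()) / span if span > 0 else np.zeros_like(coords)
    positions = {v: float(x) for v, x in zip(values, coords)}

    # Classing step: walk the values in order and start a new class whenever
    # the gap to the previous value exceeds `gap`.
    classes, current, prev = [], [], None
    for v in sorted(values, key=positions.get):
        if prev is not None and positions[v] - positions[prev] > gap:
            classes.append(current)
            current = []
        current.append(v)
        prev = v
    classes.append(current)
    return positions, classes

# Toy data: four color names, profiled by a "warm"/"cool" tone label.
records = (
    [{"color": "red", "tone": "warm"}] * 9 + [{"color": "red", "tone": "cool"}] * 1
    + [{"color": "crimson", "tone": "warm"}] * 8 + [{"color": "crimson", "tone": "cool"}] * 2
    + [{"color": "blue", "tone": "cool"}] * 9 + [{"color": "blue", "tone": "warm"}] * 1
    + [{"color": "navy", "tone": "cool"}] * 8 + [{"color": "navy", "tone": "warm"}] * 2
)
positions, classes = dqc(records, "color", "tone")
print(classes)
```

Here red and crimson land close together at one end of the axis and blue and navy at the other, so the four values collapse into two classes; the direction of the axis (which end is 0) is arbitrary.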

Creating Welsh Language Word Embeddings

Applied Sciences, 2021, Vol 11 (15), pp. 6896
DOI: 10.3390/app11156896

Author(s): Padraig Corcoran, Geraint Palmer, Laura Arman, Dawn Knight, Irena Spasić

Keyword(s): Vector Space, Quantitative Evaluation, Training Phase, Distributional Semantics, Word Embeddings, Semantic Relationships, Qualitative And Quantitative, Welsh Language, Language Data, Large Corpus

Word embeddings are representations of words in a vector space that model semantic relationships between words by means of distance and direction. In this study, we adapted two existing methods, word2vec and fastText, to automatically learn Welsh word embeddings, taking into account the syntactic and morphological idiosyncrasies of this language. These methods exploit the principles of distributional semantics and therefore require a large corpus to be trained on. However, Welsh is a minoritised language, so significantly less Welsh language data is publicly available than English. Consequently, assembling a sufficiently large text corpus is not a straightforward endeavour. Nonetheless, we compiled a corpus of 92,963,671 words from 11 sources, which represents the largest corpus of Welsh. The complexity of Welsh punctuation made tokenisation of this corpus challenging, as punctuation could not be used for boundary detection. We considered several tokenisation methods, including one designed specifically for Welsh. To account for rich inflection, we used a method for learning word embeddings that is based on subwords and can therefore more effectively relate different surface forms during the training phase. We conducted both qualitative and quantitative evaluation of the resulting word embeddings, which outperformed the Welsh word embeddings previously described as part of a larger study covering 157 languages. Our study was the first to focus specifically on Welsh word embeddings.
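
The subword idea can be illustrated with character n-grams in the style of fastText: each word is wrapped in the boundary markers `<` and `>` and split into n-grams of length 3 to 6 (fastText's default `minn`/`maxn`), and a word's vector is built by averaging the vectors of its n-grams. This is a minimal sketch: the n-gram table is a hypothetical random stand-in for trained vectors, and fastText itself hashes n-grams into buckets and also includes a vector for the whole word. The Welsh soft-mutation pair cath/gath ("cat") shows how two surface forms share n-grams.

```python
import numpy as np

def char_ngrams(word, minn=3, maxn=6):
    """Character n-grams of `word` with fastText-style boundary markers."""
    padded = f"<{word}>"
    return [padded[i:i + n]
            for n in range(minn, maxn + 1)
            for i in range(len(padded) - n + 1)]

def word_vector(word, table, dim):
    """Average the vectors of the word's known n-grams (zeros if none are known)."""
    known = [table[g] for g in char_ngrams(word) if g in table]
    return np.mean(known, axis=0) if known else np.zeros(dim)

# Shared subwords let the mutated form "gath" borrow information from "cath".
shared = sorted(set(char_ngrams("cath")) & set(char_ngrams("gath")))
print(shared)  # ['ath', 'ath>', 'th>']

# Hypothetical table: random vectors for the n-grams of "cath" only.
rng = np.random.default_rng(0)
table = {g: rng.normal(size=8) for g in char_ngrams("cath")}
v_gath = word_vector("gath", table, 8)  # built from the shared n-grams alone
```

Even though "gath" never appears in this toy table, it still receives a non-zero vector from the n-grams it shares with "cath", which is how subword models relate inflected or mutated forms.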

The Elicited Language Analysis Procedure

Language, Speech, and Hearing Services in Schools, 1982, Vol 13 (1), pp. 37-41
DOI: 10.1044/0161-1461.1301.37

Author(s): Larry J. Mattes

Keyword(s): Diagnostic Tool, Descriptive Analysis, Quality Measures, Analysis Procedure, Elicited Imitation, Response Quality, Semantic Relationships, Language Analysis

Elicited imitation tasks are frequently used as a diagnostic tool in evaluating children with communication handicaps. This article presents a scoring procedure that can be used to obtain an in-depth descriptive analysis of responses produced on elicited imitation tasks. The Elicited Language Analysis Procedure makes it possible to systematically evaluate responses in terms of both their syntactic and semantic relationships to the stimulus sentences presented by the examiner. Response quality measures are also included in the analysis procedure.

The Gender Typicality of Faces and Its Impact on Visual Processing and on Hiring Decisions

Experimental Psychology (formerly Zeitschrift für Experimentelle Psychologie), 2013, Vol 60 (6), pp. 444-452
DOI: 10.1027/1618-3169/a000217
Cited By ~ 8

Author(s): Lisa von Stockhausen, Sara Koeser, Sabine Sczesny

Keyword(s): Visual Processing, Past Research, Visual Exploration, Facial Appearance, Leadership Roles, Hiring Decisions, Leadership Selection, Gender Typicality, Second Stage, The Feminine

Past research has shown that the gender typicality of applicants’ faces affects leadership selection irrespective of a candidate’s gender: a masculine facial appearance is congruent with masculine-typed leadership roles, so masculine-looking applicants are more likely to be hired than feminine-looking ones. In the present study, we extended this line of research by investigating hiring decisions for both masculine- and feminine-typed professional roles. Furthermore, we used eye tracking to examine the visual exploration of applicants’ portraits. Our results indicate that masculine-looking applicants were favored for the masculine-typed role (leader) and feminine-looking applicants for the feminine-typed role (team member). Eye movement patterns showed that information about gender category and facial appearance was integrated during the first fixations on the portraits. Hiring decisions, however, were not based on this initial analysis but occurred at a second stage, when the portrait was viewed in the context of considering the applicant for a specific job.

Why Semantic Relationships Are More Reliable Than Associations: Human and Computational Results

PsycEXTRA Dataset, 2000
DOI: 10.1037/e501882009-063
Cited By ~ 1

Author(s): Steve Bueno, Cheryl Frenck-Mestre, Curt Burgess, Kevin Lund

Keyword(s): Computational Results, Semantic Relationships

Visual exploration of dynamic real-world scenes in patients with hemispatial neglect

Klinische Neurophysiologie, 2011, Vol 42 (01)
DOI: 10.1055/s-0031-1272664

Author(s): J. von der Gablentz, A. Sprenger, M. Dorr, E. Barth, W. Heide, ...

Keyword(s): Real World, Visual Exploration, Hemispatial Neglect

Predicting the citation and impact factor of terms for scientific publications using machine learning algorithms

CPT2020 The 8th International Scientific Conference on Computing in Physics and Technology Proceedings, 2020
DOI: 10.30987/conferencearticle_5fd755c0ea6458.82600196

Author(s): Aleksey Klokov, Evgenii Slobodyuk, Michael Charnine

Keyword(s): Machine Learning, Semantic Processing, The Body, Machine Learning Algorithms, Scientific Publications, Text Data, Semantic Relationships, Subject Areas, The Subject, Scientific Environment

The object of this research was a body of text data, collected together with the scientific advisor, and the natural language processing algorithms used to analyze it. A series of hypotheses was tested on computer science publications through the simulation experiments described in this dissertation. The subject of the research is the algorithms, and their results, aimed at predicting promising topics and terms that emerge over time in the scientific community. The result of this work is a set of machine learning models that were used in experiments to identify promising terms and semantic relationships in the text corpus. The resulting models can be applied to semantic processing and analysis in other subject areas.