Visual Exploration of Semantic Relationships in Neural Word Embeddings

2018 ◽  
Vol 24 (1) ◽  
pp. 553-562 ◽  
Author(s):  
Shusen Liu ◽  
Peer-Timo Bremer ◽  
Jayaraman J. Thiagarajan ◽  
Vivek Srikumar ◽  
Bei Wang ◽  
...  
2020 ◽  
Author(s):  
Derek Koehl ◽  
Carson Davis ◽  
Rahul Ramachandran ◽  
Udaysankar Nair ◽  
Manil Maskey

Word embeddings are numeric representations of text that capture meanings and semantic relationships. Embeddings can be constructed using different methods, such as one-hot encoding, frequency-based approaches, or prediction-based approaches. Prediction-based approaches such as Word2Vec can be used to generate word embeddings that capture the underlying semantics and word relationships in a corpus. Studies have shown that Word2Vec embeddings generated from a domain-specific corpus can both predict relationships and augment word vectors to improve classification. We describe results from two experiments using word embeddings for Earth science, constructed with Word2Vec from a corpus of over 20,000 journal papers.

The first experiment explores the analogy prediction performance of word embeddings built from the Earth science journal corpus and trained using domain-specific vocabulary. Our results demonstrate that domain-specific word embeddings predict Earth science analogy questions more accurately than general-corpus embeddings predict general analogy questions. While the results are as anticipated, the substantial increase in accuracy, particularly in the lexicographical domain, was encouraging. The results point to the need for a comprehensive Earth science analogy test set that covers the full breadth of lexicographical and encyclopedic categories for validating word embeddings.

The second experiment uses the word embeddings to augment metadata keyword classification. Metadata describing NASA datasets have science keywords that are assigned manually, which can lead to errors and inconsistencies. These science keywords form a controlled vocabulary and are used to aid data discovery via faceted search and relevancy ranking. Given the small number of metadata records with proper descriptions and keywords, word embeddings were used for augmentation. A fully connected neural network was trained to suggest keywords given a description text. This approach provided the best accuracy, at ~76%, compared with the other methods tested.
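The analogy prediction task mentioned in the first experiment is commonly framed as vector arithmetic over embeddings: for "a is to b as c is to ?", the answer is the word whose vector lies closest to b - a + c. The following is a minimal sketch of that idea, not the authors' evaluation code. The three-dimensional `toy_vectors` and the function names `cosine` and `solve_analogy` are invented for illustration; real Word2Vec vectors would be learned from the corpus and have far more dimensions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def solve_analogy(vectors, a, b, c):
    """Answer 'a is to b as c is to ?' by the vector-offset method."""
    # Target point: b - a + c, computed component by component.
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    # The three query words are excluded from the candidate answers.
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(vectors[w], target))

# Hypothetical toy embeddings (assumed values, not taken from the study).
toy_vectors = {
    "rain":       [1.0, 0.0, 0.2],
    "hydrology":  [1.0, 1.0, 0.2],
    "earthquake": [0.0, 0.0, 1.0],
    "seismology": [0.0, 1.0, 1.0],
    "cloud":      [0.9, 0.1, 0.0],
}

answer = solve_analogy(toy_vectors, "rain", "hydrology", "earthquake")
print(answer)
```

With these assumed vectors the offset lands exactly on "seismology". An analogy test set of the kind the abstract calls for would score an embedding by the fraction of such questions it answers correctly.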


Author(s):  
Andrey Indukaev

This chapter applies computational methods of textual analysis to a large corpus of media texts to study ideational change. The empirical focus of the chapter is on the ideas of the political role of innovation, technology, and economic development that were introduced into Russian politics during Medvedev’s presidency. The chapter uses topic modeling, shows the limitations of the method, and provides a more nuanced analysis with the help of word embeddings. The latter method is used to analyze semantic change and to capture complex semantic relationships between the studied concepts.


2018 ◽  
Vol 48 ◽  
pp. 178-186 ◽  
Author(s):  
Juntian Chen ◽  
Yubo Tao ◽  
Hai Lin

2004 ◽  
Vol 3 (2) ◽  
pp. 80-95 ◽  
Author(s):  
Geraldine E Rosario ◽  
Elke A Rundensteiner ◽  
David C Brown ◽  
Matthew O Ward ◽  
Shiping Huang

Data sets with large numbers of nominal variables, including some with a large number of distinct values, are becoming increasingly common and need to be explored. Unfortunately, most existing visual exploration tools are designed to handle numeric variables only. When importing data sets with nominal values into such visualization tools, most solutions to date are rather simplistic. Often, techniques that map nominal values to numbers do not assign order or spacing among the values in a manner that conveys semantic relationships. Moreover, displays designed for nominal variables usually cannot handle high-cardinality variables well. This paper addresses the problem of how to display nominal variables in general-purpose visual exploration tools designed for numeric variables. Specifically, we investigate (1) how to assign order and spacing among the nominal values, and (2) how to reduce the number of distinct values to display. We propose a new technique, called the Distance-Quantification-Classing (DQC) approach, to preprocess nominal variables before they are imported into a visual exploration tool. In the Distance Step, we identify a set of independent dimensions that can be used to calculate the distance between nominal values. In the Quantification Step, we use the independent dimensions and the distance information to assign order and spacing among the nominal values. In the Classing Step, we use results from the previous steps to determine which values within the domain of a variable are similar to each other and thus can be grouped together. Each step in the DQC approach can be accomplished by a variety of techniques. We extended the XmdvTool package to incorporate this approach. We evaluated our approach on several data sets using a variety of measures.
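The shape of the three-step pipeline can be illustrated with deliberately simple stand-ins for each step. The abstract does not say which techniques the authors used, so everything below is an assumption made for illustration: the function names, the toy records, the use of a conditional distribution as the "profile" of a nominal value, the single reference category used for quantification, and the gap threshold used for classing.

```python
from collections import Counter, defaultdict

def profiles(rows, target, other):
    """Distance step (stand-in): describe each value of `target` by its
    conditional distribution over the values of another variable."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[target]][row[other]] += 1
    return {value: {k: n / sum(c.values()) for k, n in c.items()}
            for value, c in counts.items()}

def quantify(profs, reference):
    """Quantification step (stand-in): place each nominal value on a line
    at the share of one reference category in its profile."""
    return {value: p.get(reference, 0.0) for value, p in profs.items()}

def classing(positions, gap):
    """Classing step (stand-in): merge neighbouring values whose
    positions differ by at most `gap`."""
    ordered = sorted(positions, key=positions.get)
    if not ordered:
        return []
    groups = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if positions[cur] - positions[prev] <= gap:
            groups[-1].append(cur)
        else:
            groups.append([cur])
    return groups

# Hypothetical records: a nominal "body" variable and a "price" band.
rows = (
    [{"body": "wagon", "price": "low"}] * 3
    + [{"body": "wagon", "price": "high"}]
    + [{"body": "sedan", "price": "low"}] * 2
    + [{"body": "sedan", "price": "high"}]
    + [{"body": "sports", "price": "high"}] * 2
)

positions = quantify(profiles(rows, "body", "price"), reference="low")
groups = classing(positions, gap=0.1)
print(positions)
print(groups)
```

In this toy data "sedan" and "wagon" have similar price profiles, so they receive nearby numeric positions and fall into one class, while "sports" stays separate. A numeric-only visualization tool could then plot the assigned positions directly.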


2021 ◽  
Vol 11 (15) ◽  
pp. 6896
Author(s):  
Padraig Corcoran ◽  
Geraint Palmer ◽  
Laura Arman ◽  
Dawn Knight ◽  
Irena Spasić

Word embeddings are representations of words in a vector space that models semantic relationships between words by means of distance and direction. In this study, we adapted two existing methods, word2vec and fastText, to automatically learn Welsh word embeddings taking into account syntactic and morphological idiosyncrasies of this language. These methods exploit the principles of distributional semantics and, therefore, require a large corpus to be trained on. However, Welsh is a minoritised language, hence significantly less Welsh language data are publicly available in comparison to English. Consequently, assembling a sufficiently large text corpus is not a straightforward endeavour. Nonetheless, we compiled a corpus of 92,963,671 words from 11 sources, which represents the largest corpus of Welsh. The relative complexity of Welsh punctuation made the tokenisation of this corpus challenging, as punctuation could not be used for boundary detection. We considered several tokenisation methods, including one designed specifically for Welsh. To account for rich inflection, we used a method for learning word embeddings that is based on subwords and, therefore, can more effectively relate different surface forms during the training phase. We conducted both qualitative and quantitative evaluation of the resulting word embeddings, which outperformed previously described word embeddings in Welsh that were produced as part of a larger study including 157 languages. Our study was the first to focus specifically on Welsh word embeddings.
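The reason a subword-based method can relate different surface forms is that related forms share character n-grams. The sketch below extracts n-grams with the `<` and `>` word-boundary markers that fastText uses. It illustrates the principle only and is not the study's pipeline: the helper names, the n-gram range, and the example words (the Welsh noun "cath" and its mutated form "gath") are chosen here for illustration.

```python
def char_ngrams(word, n_min=3, n_max=4):
    """Return the set of character n-grams of a word, using '<' and '>'
    as word-boundary markers."""
    marked = f"<{word}>"
    return {marked[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(marked) - n + 1)}

def shared_subwords(a, b, n_min=3, n_max=4):
    """N-grams that two surface forms have in common."""
    return char_ngrams(a, n_min, n_max) & char_ngrams(b, n_min, n_max)

# Trigrams only, to keep the output small.
print(sorted(char_ngrams("cath", 3, 3)))
print(sorted(shared_subwords("cath", "gath", 3, 3)))
```

Because a subword model builds a word's vector from the vectors of its n-grams, the two forms end up with partly shared representations even if one of them is rare in the corpus.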


1982 ◽  
Vol 13 (1) ◽  
pp. 37-41
Author(s):  
Larry J. Mattes

Elicited imitation tasks are frequently used as a diagnostic tool in evaluating children with communication handicaps. This article presents a scoring procedure that can be used to obtain an in-depth descriptive analysis of responses produced on elicited imitation tasks. The Elicited Language Analysis Procedure makes it possible to systematically evaluate responses in terms of both their syntactic and semantic relationships to the stimulus sentences presented by the examiner. Response quality measures are also included in the analysis procedure.


Author(s):  
Lisa von Stockhausen ◽  
Sara Koeser ◽  
Sabine Sczesny

Past research has shown that the gender typicality of applicants’ faces affects leadership selection irrespective of a candidate’s gender: A masculine facial appearance is congruent with masculine-typed leadership roles, thus masculine-looking applicants are hired more certainly than feminine-looking ones. In the present study, we extended this line of research by investigating hiring decisions for both masculine- and feminine-typed professional roles. Furthermore, we used eye tracking to examine the visual exploration of applicants’ portraits. Our results indicate that masculine-looking applicants were favored for the masculine-typed role (leader) and feminine-looking applicants for the feminine-typed role (team member). Eye movement patterns showed that information about gender category and facial appearance was integrated during first fixations of the portraits. Hiring decisions, however, were not based on this initial analysis, but occurred at a second stage, when the portrait was viewed in the context of considering the applicant for a specific job.


2011 ◽  
Vol 42 (01) ◽  
Author(s):  
J. von der Gablentz ◽  
A. Sprenger ◽  
M. Dorr ◽  
E. Barth ◽  
W. Heide ◽  
...  

Author(s):  
Aleksey Klokov ◽  
Evgenii Slobodyuk ◽  
Michael Charnine

The object of this research is the corpus of text data collected together with the scientific advisor, along with the natural language processing algorithms used to analyze it. A series of hypotheses was tested on computer science publications through the simulation experiments described in this dissertation. The subject of the research is the algorithms, and their results, aimed at predicting promising topics and terms that emerge over time in the scientific community. The result of this work is a set of machine learning models that were used in experiments to identify promising terms and semantic relationships in the text corpus. The resulting models can be used for semantic processing and analysis in other subject areas.

