vector space models
Recently Published Documents


Total documents: 93 (five years: 22)

H-index: 11 (five years: 2)

Author(s):  
Roman Shaptala ◽  
Gennadiy Kyselov

In this study, we explore and compare two ways of creating vector space models for Kyiv city petitions. Both models are built on top of word vectors based on the distributional hypothesis, namely Word2Vec and FastText. We train word vectors on the dataset of Kyiv city petitions, preprocess the documents, and average the word vectors to create petition vectors. Visualizations of the vector spaces after dimensionality reduction via UMAP are presented to show their overall structure. We show that the resulting models can be used to effectively query semantically related petitions as well as to search for clusters of related petitions. The advantages and disadvantages of both models are analyzed.
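The averaging step and the similarity query described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the three-dimensional word vectors, the petition texts, and the function names `document_vector` and `most_similar` are all hypothetical, and cosine similarity is assumed as the ranking measure because the abstract does not name one.

```python
import math

# Toy word vectors standing in for trained Word2Vec/FastText embeddings.
# The values are hypothetical and use 3 dimensions for readability.
WORD_VECTORS = {
    "road":   [0.9, 0.1, 0.0],
    "repair": [0.8, 0.2, 0.1],
    "street": [0.85, 0.15, 0.05],
    "park":   [0.1, 0.9, 0.1],
    "trees":  [0.0, 0.95, 0.2],
    "plant":  [0.1, 0.8, 0.3],
}

def document_vector(tokens, word_vectors):
    """Average the vectors of known tokens; return None if none are known."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]

def cosine_similarity(a, b):
    """Cosine of the angle between a and b; 0.0 if either is a zero vector."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def most_similar(query_tokens, documents, word_vectors):
    """Rank documents (name -> tokens) by cosine similarity to the query."""
    q = document_vector(query_tokens, word_vectors)
    scored = []
    for name, tokens in documents.items():
        d = document_vector(tokens, word_vectors)
        if q is not None and d is not None:
            scored.append((cosine_similarity(q, d), name))
    return [name for _, name in sorted(scored, reverse=True)]

# Hypothetical, already preprocessed petitions.
PETITIONS = {
    "p1": ["repair", "the", "road"],
    "p2": ["plant", "trees", "in", "the", "park"],
    "p3": ["street", "repair"],
}

ranking = most_similar(["road"], PETITIONS, WORD_VECTORS)
```

Tokens without a vector (here "the" and "in") are simply skipped during averaging, which is one common convention; a real pipeline would depend on the preprocessing the authors applied.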


2021 ◽  
Author(s):  
Mateus Alex dos Santos Luna ◽  
André Paulino Lima ◽  
Thaís Rodrigues Neubauer ◽  
Marcelo Fantinato ◽  
Sarajane Marques Peres

Process mining explores event logs to offer valuable insights to business process managers. Some types of business processes are hard to mine, including unstructured and knowledge-intensive processes. In such cases, trace clustering is usually applied to break the event log into sublogs, each more amenable to typical process mining tasks. However, applying clustering algorithms involves decisions, such as how traces are represented, that affect the quality of the results. In this paper, we compare four vector space models for trace clustering, using them with an agglomerative clustering algorithm on synthetic and real-world event logs. Our analyses suggest that the embeddings-based vector space model can properly handle trace clustering in unstructured processes.
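To make the idea of representing traces as vectors and clustering them concrete, here is a small sketch. It is an assumption-laden illustration rather than the paper's setup: traces are encoded as activity-count vectors (one simple vector space model among many), the linkage is average linkage with Euclidean distance, and the event log and all names are hypothetical.

```python
import math

def activity_vocabulary(traces):
    """Sorted list of all distinct activities in the log."""
    return sorted({a for trace in traces for a in trace})

def count_vector(trace, vocabulary):
    """Bag-of-activities vector: how often each activity occurs in the trace."""
    return [trace.count(a) for a in vocabulary]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def agglomerative(vectors, n_clusters):
    """Average-linkage agglomerative clustering; returns lists of indices."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > n_clusters:
        best = None
        # Find the pair of clusters with the smallest average pairwise distance.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [(p, q) for p in clusters[i] for q in clusters[j]]
                dist = sum(euclidean(vectors[p], vectors[q])
                           for p, q in pairs) / len(pairs)
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

# Hypothetical event log: each trace is a sequence of activity names.
TRACES = [
    ["register", "check", "approve"],
    ["register", "check", "check", "approve"],
    ["register", "reject"],
    ["register", "review", "reject"],
]

vocab = activity_vocabulary(TRACES)
vectors = [count_vector(t, vocab) for t in TRACES]
sublogs = agglomerative(vectors, n_clusters=2)
```

Here the two approval traces and the two rejection traces end up in separate sublogs. Count vectors ignore activity order, which is one reason the choice of representation matters for the clustering result.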


Author(s):  
James E Dobson

Scholars working in computational literary studies are increasingly making use of text-derived vector space models, by which I mean numerical models of texts that represent the distribution or modeled relations among the vocabulary extracted from these texts. These models, as this essay will argue, call for modes of humanistic interpretation and explication that are related to but distinct from those that may have been used on the original source texts. While vector space models are analyzed using increasingly complicated quantitative methods and the explanation of their operation requires statistical sophistication, my emphasis on humanistic interpretation is quite intentional. This essay theorizes two major categories of vector space models, the document-term matrix and neural language models, to position these models as not merely descriptions of texts but inscriptive representational objects that perform interpretive work of their own in order to demonstrate the need for a multi-level hermeneutics in computational literary studies.
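For readers unfamiliar with the first of the two categories, a document-term matrix can be sketched in a few lines. This is a generic illustration, not drawn from the essay: the whitespace tokenization, the raw counts, the example texts, and the function name `document_term_matrix` are all assumptions.

```python
from collections import Counter

def document_term_matrix(documents):
    """Return (vocabulary, matrix); matrix[i][j] counts term j in document i."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({t for tokens in tokenized for t in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        matrix.append([counts[t] for t in vocabulary])
    return vocabulary, matrix

# Hypothetical miniature "texts".
TEXTS = ["the whale the sea", "the sea the ship"]
vocabulary, matrix = document_term_matrix(TEXTS)
```

Each row records how often each vocabulary term occurs in one text, and word order is discarded, so the matrix is already a selective representation of the texts rather than a copy of them.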


2021 ◽  
Author(s):  
Marcos Garcia ◽  
Tiago Kramer Vieira ◽  
Carolina Scarton ◽  
Marco Idiart ◽  
Aline Villavicencio

Cognition ◽  
2020 ◽  
Vol 205 ◽  
pp. 104440
Author(s):  
Joshua C. Peterson ◽  
Dawn Chen ◽  
Thomas L. Griffiths

Energies ◽  
2020 ◽  
Vol 13 (22) ◽  
pp. 5948
Author(s):  
Renxi Gong ◽  
Siqiang Li ◽  
Weiyu Peng

Decision-making for the condition-based maintenance (CBM) of power transformers is critical to their sustainable operation. Existing research exhibits significant shortcomings: neither group decision-making nor maintenance intention is considered, which does not satisfy the needs of smart grids. Thus, a multivariate assessment system, which includes the consideration of technology, cost-effectiveness, and security, should be created, taking into account current research findings. To address the uncertainty of maintenance strategy selection, this paper proposes a maintenance decision-making model composed of cloud and vector space models. The optimal maintenance strategy is selected in a multivariate assessment system. Cloud models allow for the expression of natural language evaluation information and are used to transform qualitative concepts into quantitative expressions. The subjective and objective weights of the evaluation index are derived from the analytic hierarchy process and the grey relational analysis method, respectively. The kernel vector space model is then used to select the best maintenance strategy through the close degree calculation. A comparison and analysis of three representative maintenance strategies resulted in the following findings: the proposed model is effective; it provides a new method for power transformer maintenance decision-making; and it is simple, practical, and easy to combine with the traditional state assessment method, and thus should play a role in transformer fault diagnosis.


2020 ◽  
pp. 016555152096805
Author(s):  
Mete Eminagaoglu

There are various models, methodologies and algorithms that can be used today for document classification, information retrieval and other text mining applications and systems. Among them are vector space–based models, where distance metrics or similarity measures lie at the core of such models. The vector space–based model is a fast and simple alternative for the processing of textual data; however, its accuracy, precision and reliability still need significant improvements. In this study, a new similarity measure is proposed, which can be effectively used for vector space models and related algorithms such as k-nearest neighbours (k-NN) and Rocchio as well as some clustering algorithms such as K-means. The proposed similarity measure is tested with some universal benchmark data sets in Turkish and English, and the results are compared with some other standard metrics such as Euclidean distance, Manhattan distance, Chebyshev distance, Canberra distance, Bray–Curtis dissimilarity, Pearson correlation coefficient and Cosine similarity. Some successful and promising results have been obtained, which show that this proposed similarity measure could be used as an alternative within all suitable algorithms and models for information retrieval, document clustering and text classification.
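The standard metrics used as baselines in this abstract have well-known textbook definitions, sketched below for two equal-length numeric vectors. The proposed similarity measure itself is not described in the abstract and is not reproduced here. The function names and the zero-denominator conventions (returning 0.0, or skipping 0/0 terms in the Canberra distance) are assumptions made for this sketch.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

def canberra(a, b):
    # Terms where both coordinates are zero are skipped (0/0 treated as 0).
    return sum(abs(x - y) / (abs(x) + abs(y))
               for x, y in zip(a, b) if abs(x) + abs(y) > 0)

def bray_curtis(a, b):
    denom = sum(abs(x + y) for x, y in zip(a, b))
    if denom == 0:
        return 0.0  # convention assumed here for all-zero input
    return sum(abs(x - y) for x, y in zip(a, b)) / denom

def pearson(a, b):
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # correlation is undefined for constant vectors
    return cov / math.sqrt(var_a * var_b)

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical term-weight vectors for two documents.
doc_a = [1.0, 2.0, 3.0]
doc_b = [2.0, 4.0, 6.0]
```

For `doc_a` and `doc_b`, which point in the same direction but differ in length, cosine similarity and Pearson correlation are both 1 while the distance measures are nonzero. This illustrates how the choice of measure changes which documents a k-NN or K-means procedure treats as close.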

