Matrix-Based Method for Inferring Elements in Data Attributes Using a Vector Space Model

Information ◽  
2019 ◽  
Vol 10 (3) ◽  
pp. 107 ◽  
Author(s):  
Teruaki Hayashi ◽  
Yukio Ohsawa

This article addresses the task of inferring elements in the attributes of data. Extracting data related to our interests is a challenging task. Although data on the web can be accessed through free text queries, it is difficult to obtain results that accurately correspond to user intentions because users might not express their objects of interest using the exact terms (variables, outlines of data, etc.) found in the data. In other words, users do not always have sufficient knowledge of the data to formulate an effective query. Hence, we propose a method that enables the type, format, and variable elements to be inferred as attributes of data when a natural language summary of the data is provided as a free text query. To evaluate the proposed method, we used the Data Jacket’s datasets, whose metadata is written in natural language. The experimental results indicate that our method outperforms approaches based on string matching and word embedding. Applications based on this study can support users who wish to retrieve or acquire new data.

Author(s):  
Budi Yulianto ◽  
Widodo Budiharto ◽  
Iman Herwidiana Kartowisastro

Boolean Retrieval (BR) and the Vector Space Model (VSM) are very popular methods in information retrieval for creating an inverted index and querying terms. The BR method returns exact matches in textual information retrieval without ranking the results. The VSM method searches and ranks the results. This study empirically compares the two methods. The research uses a sample of corpus data obtained from Reuters. The experimental results show that the times required to produce an inverted index by the two methods are nearly the same. However, a difference exists in querying the index. The results also show that the number of generated indexes, the sizes of the generated files, and the duration of reading and searching an index are proportional to the number of files in the corpus and the file size.
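The contrast between the two methods can be illustrated with a minimal sketch. It is not the implementation evaluated in the study: the toy corpus, the function names (`build_index`, `boolean_and`, `vsm_rank`), and the choice of TF-IDF weighting with cosine similarity are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus (not the Reuters sample used in the study).
DOCS = {
    1: "oil prices rise as supply falls",
    2: "oil supply rises",
    3: "stock prices fall",
}

def build_index(docs):
    """Inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(text.split()).items():
            index[term][doc_id] = tf
    return index

def boolean_and(index, query):
    """Boolean retrieval: unranked set of docs containing every query term."""
    postings = [set(index.get(term, {})) for term in query.split()]
    return set.intersection(*postings) if postings else set()

def vsm_rank(index, n_docs, query):
    """Vector space retrieval: rank docs by cosine similarity of TF-IDF vectors."""
    idf = {t: math.log(n_docs / len(p)) for t, p in index.items()}
    # Squared norm of each document vector.
    doc_sq = defaultdict(float)
    for t, postings in index.items():
        for d, tf in postings.items():
            doc_sq[d] += (tf * idf[t]) ** 2
    # Query vector restricted to terms known to the index.
    q_vec = {t: tf * idf[t] for t, tf in Counter(query.split()).items() if t in idf}
    q_norm = math.sqrt(sum(w * w for w in q_vec.values()))
    dots = defaultdict(float)
    for t, qw in q_vec.items():
        for d, tf in index[t].items():
            dots[d] += qw * tf * idf[t]
    ranked = []
    for d, dot in dots.items():
        denom = q_norm * math.sqrt(doc_sq[d])
        if denom > 0:  # skip degenerate zero-length vectors
            ranked.append((d, dot / denom))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)

index = build_index(DOCS)
exact_matches = boolean_and(index, "oil prices")       # only doc 1 has both terms
ranking = vsm_rank(index, len(DOCS), "oil prices")     # docs 1, 2, 3 in that order
```

Both functions read the same inverted index, which is consistent with the reported finding that index construction costs are similar while querying differs: the Boolean query only intersects posting lists, whereas the ranked query also computes weights and sorts.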


2011 ◽  
Vol 3 (2) ◽  
pp. 1-17
Author(s):  
Rajiv Kadaba ◽  
Suratna Budalakoti ◽  
David DeAngelis ◽  
K. Suzanne Barber

Entities interacting on the web establish their identity by creating virtual personas. These entities, or agents, can be human users or software-based. This research models identity using the Entity-Persona Model, a semantically annotated social network inferred from the persistent traces of interaction between personas on the web. A Persona Mapping Algorithm is proposed which compares the local views of personas in their social network, referred to as their Virtual Signatures, for structural and semantic similarity. The semantics of the Entity-Persona Model are modeled by a vector space model of the text associated with the personas in the network, which allows comparison of their Virtual Signatures. This enables all the publicly accessible personas of an entity to be identified on the scale of the web. This research enables an agent to identify a single entity using multiple personas on different networks, provided that the multiple personas exhibit characteristic behavior. The agent is able to increase the trustworthiness of on-line interactions by establishing the identity of entities operating under multiple personas. Consequently, reputation measures based on on-line interactions with multiple personas can be aggregated and resolved to the true singular identity.


2013 ◽  
Vol 411-414 ◽  
pp. 106-109 ◽  
Author(s):  
Ya Heng Ren

A vertical search engine provides a specialized search compared with a traditional search engine. All of the data searched by a vertical search engine relates to a single theme, which is decided by users. Usually, the Vector Space Model is used to judge the relevance of web data to the chosen theme. However, when elements of the theme appear repeatedly, the Vector Space Model does not consider their order. By adding a new element, the Evolved Vector Space Model is proposed. The experiments show that the new model fixes this problem and performs better in judging relevance.
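The order-insensitivity of the plain Vector Space Model can be seen in a small sketch. The abstract does not say what the "new element" of the Evolved Vector Space Model is, so the word-bigram features below are only a stand-in chosen for illustration and should not be read as the paper's method; the function names are likewise hypothetical.

```python
import math
from collections import Counter

def cosine(vec_a, vec_b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a)
    norm = math.sqrt(sum(v * v for v in vec_a.values()) *
                     sum(v * v for v in vec_b.values()))
    return dot / norm if norm else 0.0

def unigram_vector(text):
    """Plain bag-of-words vector: word order is discarded."""
    return Counter(text.split())

def bigram_vector(text):
    """Illustrative order-aware features (an assumption, not the paper's element)."""
    words = text.split()
    return Counter(zip(words, words[1:]))

a = "dog bites man"
b = "man bites dog"

# Identical under bag-of-words, because both texts contain the same words.
same_words = cosine(unigram_vector(a), unigram_vector(b))
# No shared bigrams, so an order-aware representation tells them apart.
order_aware = cosine(bigram_vector(a), bigram_vector(b))
```

Here `same_words` is 1.0 even though the two texts mean different things, while `order_aware` is 0.0 because no adjacent word pair is shared.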


2019 ◽  
pp. 016555151986004
Author(s):  
Jayant Gadge ◽  
Sunil Bhirud

The World Wide Web (WWW) is the largest available repository of information. This huge amount of information poses the challenge of retrieving trustworthy information from the WWW. It confronts researchers with new issues of diversity and complexity when retrieving web information. Information retrieval from the web demands approaches that go beyond conventional information retrieval. The heterogeneity, complexity and huge volume of web information require a unique approach to retrieval. Besides, end-users introduce some difficulties in the retrieval process: queries submitted by users are sometimes subtle and ambiguous. The primary concern in information retrieval is predicting the relevance of documents. In this article, a new approach is proposed that logically separates a web document into five layers, namely, the title, header, hyperlink, meta tag and body layers. The proposed method effectively combines the textual information and structural evidence of a web document for retrieving information from the web. In the proposed layered vector space model, each layer has an allocated priority, which is used to compute a weight factor for that layer. The proposed method derives an equation that combines the priority of the layer and the length of the layer to calculate the weight of the layer.
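A rough sketch of a layered score is given below. The abstract does not state the weighting equation, so the priority values in `PRIORITY`, the length damping `priority / (1 + log(1 + length))`, and the function names are all assumptions made purely for illustration, not the equation derived in the article.

```python
import math
from collections import Counter

# Hypothetical priorities for the five layers named in the abstract.
PRIORITY = {"title": 5, "header": 4, "hyperlink": 3, "meta": 2, "body": 1}

def cosine(text_a, text_b):
    """Cosine similarity between bag-of-words vectors of two strings."""
    va, vb = Counter(text_a.split()), Counter(text_b.split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(v * v for v in va.values()) *
                     sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0

def layer_weights(doc):
    """Assumed weighting: priority damped by layer length, normalized to sum to 1."""
    raw = {}
    for layer, priority in PRIORITY.items():
        length = len(doc.get(layer, "").split())
        raw[layer] = priority / (1.0 + math.log(1 + length)) if length else 0.0
    total = sum(raw.values())
    return {layer: (w / total if total else 0.0) for layer, w in raw.items()}

def layered_score(query, doc):
    """Weighted sum of per-layer similarities between the query and the document."""
    weights = layer_weights(doc)
    return sum(w * cosine(query, doc.get(layer, "")) for layer, w in weights.items())

# Two hypothetical pages: the query terms appear in the title of one
# and only in the body of the other.
page_a = {"title": "vector space model", "body": "an introduction to retrieval on the web"}
page_b = {"title": "an introduction to retrieval", "body": "vector space model on the web"}
score_a = layered_score("vector space", page_a)
score_b = layered_score("vector space", page_b)
```

Under these assumed weights, a match in the title layer contributes more than the same match in the body layer, so `score_a` exceeds `score_b`.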


2014 ◽  
Vol 651-653 ◽  
pp. 2252-2257
Author(s):  
Zhi Qiang Li ◽  
Yuan Tan ◽  
Hong Chen Guo ◽  
Chong Feng

In recent years, the prevailing topic crawler algorithms have concentrated on the contents of topical words. These existing approaches neglect the semantic relationships among textual concepts, which leads to low correlation between crawled webpages. To address the issue, this paper presents a deep analysis of the Shark Search algorithm and optimizes it by incorporating the characteristics of semi-structured webpages. Furthermore, we enhance the performance of the vector space model used in the Shark Search algorithm by means of a domain ontology, and propose a standardized method based on the vector space of the ontology model to improve the TF-IDF evaluation metric. The experimental results demonstrate the effectiveness of our algorithm, which significantly outperforms the state of the art in precision and recall.

