LES³

Set similarity search is a problem of central interest to a wide variety of applications such as data cleaning and web search. Past approaches to set similarity search rely either on heavy indexing structures that incur large search costs or on indexes that produce large candidate sets. In this paper, we design a learning-based exact set similarity search approach, LES³. Our approach first partitions sets into groups, and then uses a lightweight bitmap-like indexing structure, called the token-group matrix (TGM), to organize groups and prune candidates given a query set. To optimize pruning with the TGM, we analytically investigate the optimal partitioning strategy under certain distributional assumptions. Using these results, we then design a learning-based partitioning approach called L2P and an associated data representation encoding, PTR, to identify the partitions. We conduct extensive experiments on real and synthetic datasets to study LES³, establishing its effectiveness and superiority over other applicable approaches.
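The group-level pruning idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes Jaccard similarity, a positive threshold, a hand-made partition instead of the learned L2P partitioning, and a token-to-groups dictionary standing in for the bitmap-like TGM. The names `build_tgm` and `search` are hypothetical. A group is skipped when the fraction of query tokens that occur anywhere in the group is below the threshold, because that fraction is an upper bound on the Jaccard similarity between the query and any set in the group.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity of two sets."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def build_tgm(groups):
    """Token-group matrix stored sparsely: token -> ids of groups containing it."""
    tgm = defaultdict(set)
    for gid, group in enumerate(groups):
        for s in group:
            for token in s:
                tgm[token].add(gid)
    return tgm


def search(query, groups, tgm, threshold):
    """Return (matching sets, number of sets actually verified).

    Assumes threshold > 0, so groups sharing no token with the query
    can never contain a match and are never visited.
    """
    # Count, per group, how many query tokens appear somewhere in that group.
    hits = defaultdict(int)
    for token in query:
        for gid in tgm.get(token, ()):
            hits[gid] += 1

    results, verified = [], 0
    for gid, count in hits.items():
        # |Q ∩ S| <= count and |Q ∪ S| >= |Q|, so Jaccard(Q, S) <= count / |Q|.
        if count / len(query) < threshold:
            continue  # the whole group is pruned without touching its sets
        for s in groups[gid]:
            verified += 1
            if jaccard(query, s) >= threshold:
                results.append(s)
    return results, verified


# Hand-made partition into two groups (assumption: stands in for L2P output).
groups = [
    [frozenset("abc"), frozenset("abd")],
    [frozenset("xyz"), frozenset("xyw")],
]
tgm = build_tgm(groups)
matches, verified = search(frozenset("abcd"), groups, tgm, threshold=0.7)
```

In this toy run only the first group survives the bound, so two of the four sets are verified. The search stays exact because the bound never discards a group that contains a qualifying set.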

Memory versus logic: two models of organizing information and their influences on web retrieval strategies

tripleC Communication Capitalism & Critique Open Access Journal for a Global Sustainable Information Society ◽

10.31269/triplec.v4i2.34 ◽

2006 ◽

Vol 4 (2) ◽

pp. 178-186

Author(s):

Teresa Numerico

Keyword(s):

Information Retrieval ◽

Web Search ◽

Data Representation ◽

Philosophical Tradition ◽

Formal Representation ◽

Mathematical Functions ◽

Von Neumann ◽

Vannevar Bush ◽

Social Topology ◽

Information Retrieval Methods

We can find the first anticipation of the hypertextual structure of the World Wide Web in Bush's paper of 1945, where he described a “selection” and storage machine called the Memex, capable of keeping the useful information of a user and connecting it to other relevant material present in the machine or added by other users. We will argue that Vannevar Bush conceived this type of machine because of his involvement with analogue devices. During the 1930s, in fact, he invented and built the Differential Analyzer, a powerful analogue machine used to calculate various relevant mathematical functions. The model of the Memex is not the digital one, because it relies on another form of data representation, one that emulates the procedures of memory more than the logic used by the intellect. Memory seems to select and arrange information according to association strategies, i.e., using analogies and connections that are very often arbitrary, sometimes even chaotic and completely subjective. The organization of information and the knowledge creation process suggested by logic and the symbolic formal representation of data are deeply different from the former, though the logical approach is at the core of the birth of computer science (i.e., the Turing Machine and the Von Neumann Machine). We will discuss the issues raised by these two “visions” of information management and the influences of the philosophical tradition of the theory of knowledge on the hypertextual organization of content. We will also analyze the consequences of these different attitudes for information retrieval techniques in a hypertextual environment such as the web. Our position is that it is necessary to take into account the nature and the dynamic social topology of the network when we choose information retrieval methods for it; otherwise, we risk creating a misleading service for the end user of web search tools (i.e., search engines).

A Novel Similarity Search Approach for Streaming Time Series

Journal of Physics Conference Series ◽

10.1088/1742-6596/1302/2/022084 ◽

2019 ◽

Vol 1302 ◽

pp. 022084

Author(s):

Yiming Ding ◽

Wei Luo ◽

Yufei Zhao ◽

Zhen Li ◽

Peng Zhan ◽

...

Keyword(s):

Time Series ◽

Similarity Search ◽

Search Approach

A Similarity Search Approach to Solving the Multi-query Problems

2012 IEEE/ACIS 11th International Conference on Computer and Information Science ◽

10.1109/icis.2012.17 ◽

2012 ◽

Author(s):

Yong Shi ◽

B. Graham

Keyword(s):

Similarity Search ◽

Search Approach

Hierarchical Indexing Structure for Efficient Similarity Search in Video Retrieval

IEEE Transactions on Knowledge and Data Engineering ◽

10.1109/tkde.2006.174 ◽

2006 ◽

Vol 18 (11) ◽

pp. 1544-1559 ◽

Cited By ~ 30

Author(s):

Hong Lu ◽

Beng Chin Ooi ◽

Heng Tao Shen ◽

Xiangyang Xue

Keyword(s):

Similarity Search ◽

Video Retrieval ◽

Indexing Structure

LwVLC: Light-weight Variable-length Chunking Scheme for File Similarity Search in Digital Forensics

10.14257/astl.2013.28.16 ◽

2013 ◽

Cited By ~ 1

Author(s):

Min Ja Kim ◽

Chuck Yoo ◽

Wan Yeon Lee ◽

Young Woong Ko

Keyword(s):

Similarity Search ◽

Digital Forensics ◽

Variable Length ◽

Light Weight

Clustering of the Web Search Results in Educational Recommender Systems

Educational Recommender Systems and Technologies ◽

10.4018/978-1-61350-489-5.ch007 ◽

2012 ◽

pp. 154-181 ◽

Cited By ~ 12

Author(s):

Constanta-Nicoleta Bodea ◽

Maria-Iuliana Dascalu ◽

Adina Lipai

Keyword(s):

Recommender Systems ◽

Clustering Algorithm ◽

Web Search ◽

Web Pages ◽

Lexical Database ◽

Assessment Task ◽

Search Results ◽

Meta Search ◽

Search Approach ◽

The Web

This chapter presents a meta-search approach meant to deliver bibliography from the internet according to the results trainees obtain in an e-assessment task. The bibliography consists of web pages related to the knowledge gaps of the trainees. The meta-search engine is part of an educational recommender system attached to an e-assessment application for project management knowledge. Meta-search means that, for a specific query (or mistake made by the trainee), several search mechanisms for suitable bibliography (further reading) can be applied. The lists of results delivered by the standard search mechanisms are used to build thematically homogeneous groups using an ontology-based clustering algorithm. The clustering process uses an educational ontology and the WordNet lexical database to create its categories. The research is presented in the context of recommender systems and their various applications to the education domain.
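The pipeline described in this abstract (merge the result lists of several search mechanisms, then group the pages thematically) can be sketched roughly as follows. This is an illustration under loud assumptions, not the chapter's algorithm: plain keyword overlap with a toy, hypothetical `ONTOLOGY` dictionary stands in for the educational ontology and WordNet, and the function names, page identifiers, and titles are invented for the example.

```python
from collections import defaultdict

# Hypothetical toy ontology: concept -> keywords (assumption, not from the chapter).
ONTOLOGY = {
    "risk management": {"risk", "mitigation"},
    "scheduling": {"schedule", "gantt", "milestone"},
}


def merge_results(result_lists):
    """Merge result lists from several search mechanisms, dropping duplicate pages."""
    seen, merged = set(), []
    for results in result_lists:
        for page_id, title in results:
            if page_id not in seen:
                seen.add(page_id)
                merged.append((page_id, title))
    return merged


def cluster_by_concept(results, ontology):
    """Assign each page to the concept whose keywords overlap its title the most."""
    clusters = defaultdict(list)
    for page_id, title in results:
        words = set(title.lower().split())
        best = max(ontology, key=lambda concept: len(words & ontology[concept]))
        if words & ontology[best]:
            clusters[best].append(page_id)
        else:
            # No keyword matched any concept.
            clusters["uncategorized"].append(page_id)
    return dict(clusters)


# Two made-up result lists, as if returned by two different search mechanisms.
engine_a = [("page-1", "Risk mitigation plan")]
engine_b = [
    ("page-1", "Risk mitigation plan"),
    ("page-2", "Gantt schedule basics"),
    ("page-3", "Cooking tips"),
]
merged = merge_results([engine_a, engine_b])
clusters = cluster_by_concept(merged, ONTOLOGY)
```

A real system would replace the keyword overlap with a semantic match against the ontology and the lexical database, but the two-stage shape (merge, then cluster) is the part this sketch is meant to show.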

Advances in Search Strategy Using the Set of Brand Considerations in the Web Ecosystem

Applied Sciences ◽

10.3390/app11083514 ◽

2021 ◽

Vol 11 (8) ◽

pp. 3514

Author(s):

Sungeun Kwon ◽

Jonghyuk Kim ◽

Zoonky Lee

Keyword(s):

Digital Media ◽

Information Search ◽

Web Search ◽

Search Strategy ◽

Search Costs ◽

Third Party ◽

Online Information ◽

Consumer Journeys ◽

Information Search Behavior ◽

The Postponement

This study explores changes in a set of brand considerations as a result of web search strategies. Survey and personal computer log data of car buyers were used to identify online information search behavior for brands and products. Through this study, we found that higher frequencies of brand searching are associated with how much consumer-initiated sites and third-party-initiated sites are used, while lower frequencies of brand searching are related only to how much brand-initiated websites are used. We also concluded that ambivalent messages on consumer-initiated sites lead to the postponement of a decision and a continued search for another brand. In addition, third-party-initiated information sources lower search costs, which leads to longer consumer journeys and expands the set of brands considered and searched. The results of this study can help marketers understand the importance of their own media and aid in the development of a digital media strategy.
