Relevance ranking for proximity full-text search based on additional indexes with multi-component keys

Author(s):  
A.B. Veretennikov

The problem of proximity full-text search is considered. If a search query contains frequently occurring words, multi-component key indexes improve the search speed in comparison with ordinary inverted indexes. It was shown that the search speed can be increased by up to 130 times when queries consist of frequently occurring words. In this paper, we investigate how the multi-component key index architecture affects the quality of the search. We consider several well-known relevance ranking methods proposed by different authors. Using these methods, we perform the search first in the ordinary inverted index and then in the index enhanced with multi-component key indexes. The results show that, with multi-component key indexes, we obtain search results that are very close in terms of relevance ranking to those obtained with ordinary inverted indexes.
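The contrast between an ordinary inverted index and an index with multi-component keys can be illustrated with a minimal sketch. It is a simplification and not the index structure from the paper: the function names, the two-word keys, the ordered-pair convention (the second word follows the first), and the `max_distance` parameter are all assumptions made for illustration.

```python
from collections import defaultdict


def build_indexes(docs, max_distance=3):
    """Build an ordinary inverted index and a simplified two-component key index.

    Assumption: a two-component key (w1, w2) records every place where w2
    follows w1 within max_distance words.
    """
    inverted = defaultdict(list)    # word -> [(doc_id, position)]
    pair_index = defaultdict(list)  # (w1, w2) -> [(doc_id, pos1, pos2)]
    for doc_id, text in enumerate(docs):
        words = text.lower().split()
        for i, word in enumerate(words):
            inverted[word].append((doc_id, i))
            last = min(i + max_distance + 1, len(words))
            for j in range(i + 1, last):
                pair_index[(word, words[j])].append((doc_id, i, j))
    return inverted, pair_index


def proximity_search_inverted(inverted, w1, w2, max_distance=3):
    """Answer the query by scanning both posting lists (costly for frequent words)."""
    hits = set()
    for doc1, pos1 in inverted.get(w1, []):
        for doc2, pos2 in inverted.get(w2, []):
            if doc1 == doc2 and 0 < pos2 - pos1 <= max_distance:
                hits.add(doc1)
    return hits


def proximity_search_pairs(pair_index, w1, w2):
    """Answer the same query with a single lookup of the two-component key."""
    return {doc_id for doc_id, _, _ in pair_index.get((w1, w2), [])}
```

In this sketch the pair index trades extra storage for a single key lookup, which is where a speed-up for queries made of frequently occurring words would come from; both functions return the same set of documents.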

2013
Vol 284-287
pp. 3428-3432
Author(s):
Yu Hsiu Huang
Richard Chun Hung Lin
Ying Chih Lin
Cheng Yi Lin

Most applications of traditional full-text search, e.g., webpage search, are offline: they use a text search engine to scan the texts in advance and build the related index. However, applications of online real-time full-text search, e.g., network intrusion detection and prevention systems (IDPS), are hard to implement on commodity hardware. They are expensive and inflexible as more and more new virus patterns appear; moreover, the text cannot be scanned in advance, and the search must be completed online in real time. Additionally, IDPS needs multi-pattern matching so that malicious packets can be removed immediately from normal ones without degrading network performance. To address the problem of real-time multi-pattern matching, we implement two sequential algorithms, Wu-Manber and Aho-Corasick, on a GPU parallel computation platform. Both pattern matching algorithms are well suited to cases with a large number of patterns. In addition, they are easily extended on a GPU parallel computation platform to satisfy the real-time requirement. Our experimental results show that the throughput of the GPU implementation is about five to seven times that of the CPU. Therefore, pattern matching on the GPU offers an attractive solution for IDPS to speed up the detection of malicious packets among normal traffic, given its lower cost, easy expansion, and better performance.
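For readers unfamiliar with multi-pattern matching, the following is a minimal sequential sketch of the Aho-Corasick algorithm on the CPU. It is not the GPU implementation described in the abstract; the function names and the list-based automaton layout are assumptions chosen for brevity.

```python
from collections import deque


def build_automaton(patterns):
    """Build the Aho-Corasick trie, failure links, and output lists."""
    goto = [{}]    # goto[state][char] -> next state
    fail = [0]     # fail[state] -> state for the longest proper suffix in the trie
    output = [[]]  # output[state] -> patterns that end at this state
    for pattern in patterns:
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                output.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].append(pattern)

    # Breadth-first traversal: depth-1 states keep fail = 0.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit patterns that end at the failure target.
            output[nxt] = output[nxt] + output[fail[nxt]]
    return goto, fail, output


def search(text, automaton):
    """Return (start_index, pattern) for every occurrence, in one pass over text."""
    goto, fail, output = automaton
    state = 0
    matches = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pattern in output[state]:
            matches.append((i - len(pattern) + 1, pattern))
    return matches
```

The text is scanned once regardless of how many patterns are loaded, which is why this family of algorithms suits workloads with a large number of patterns.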


Author(s):  
Namik Delilovic

Searching for content in today's digital libraries is still very primitive; most websites provide a search field where users can enter information such as book title, author name, or terms they expect to be found in the book. Some platforms provide advanced search options, which allow users to narrow the search results by specific parameters such as year, author name, publisher, and similar. Currently, when users find a book that might be of interest to them, the search process ends; only a full-text search or references at the end of the book may provide some additional pointers. In this chapter, the author gives an example of how a user could continuously receive recommendations for additional content even while reading the article, using current machine learning and artificial intelligence techniques.


Author(s):  
Imam Farisi ◽  
Mukhaimy Gazali ◽  
Rudy Anshari

A business institution can promote its products in its online shop. Through the online store, consumers can find out which products the store sells. Consumers who are looking for a product can search based on various attributes of goods that are scattered across various fields in a database table, or even spread across different tables. All attributes that point to an item are collected in one document, which is then searched using a full-text search system. Two alternative full-text search systems were selected: Lucene.Net and SQLite Full Text Search. Before being put to use, the two alternatives were tested. In document storage size, Lucene.Net is superior by 6.78 times. In the speed of writing search documents, SQLite is superior by between 1,875 times and 5,197 times. In terms of key search speed, Lucene.Net is superior by between 1,169 and 1,698 times. Considering speed and the fact that Lucene.Net Core is still in the beta stage of development, SQLite Full Text Search is suitable for the product search process in the online store.
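The idea of collecting all attributes of an item into one searchable document can be sketched with SQLite's FTS5 extension through Python's standard `sqlite3` module. This is an illustration only: the table name, column names, and sample products are hypothetical, the abstract does not say which FTS version was used, and FTS5 must be compiled into the SQLite build in use (it usually is, but that is an assumption).

```python
import sqlite3


def build_product_index(products):
    """Create an in-memory FTS5 table with one document per product.

    Each product is a (product_id, name, category, brand) tuple; the schema
    is hypothetical and stands in for attributes spread over several tables.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE VIRTUAL TABLE product_docs USING fts5(product_id UNINDEXED, body)"
    )
    for product_id, name, category, brand in products:
        # Concatenate all attributes into a single searchable document.
        document = f"{name} {category} {brand}"
        conn.execute(
            "INSERT INTO product_docs (product_id, body) VALUES (?, ?)",
            (product_id, document),
        )
    conn.commit()
    return conn


def search_products(conn, query):
    """Return matching product ids, best match first (FTS5 built-in rank)."""
    rows = conn.execute(
        "SELECT product_id FROM product_docs WHERE product_docs MATCH ? ORDER BY rank",
        (query,),
    )
    return [int(row[0]) for row in rows]
```

A query such as `search_products(conn, "shirt globex")` matches documents that contain both terms, no matter which original attribute each term came from.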


Author(s):  
Mary Holstege

To a search engine, indexes are specified by the content: the words, phrases, and characters that are actually present tell the search engine what inverted indexes to create. Other external knowledge can be applied to add to this inventory of indexes. For example, knowledge of the document language can lead to indexes for word stems or decompounding. These can unify different content into the same index or split the same content into multiple indexes. That is, different words manifest in the content can be unified under a single search key, and the same word can have multiple manifestations under different search keys. Turning this around, the indexes represent the retrievable information content in the document. Full text search is not an either/or yes/no system, but one of relative fit (scoring). Precision balances against recall, mediated by scoring. The search engine perspective offers a different way to think about markup: As a specification of the retrievable information content of the document. As something that can, with additional information, unify different markup or provide multiple distinct views of the same markup. As something that can be present to greater or lesser degrees, with a goodness of match (scoring). As a specification that can be adjusted to balance precision and recall. What does this search engine perspective on markup mean, concretely? Can we use it to reframe some persistent conundrums, such as vocabulary resolution and overlap? Let's see.
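A small sketch can make the point about search keys concrete: the same word is indexed under an exact-word key and under a stem key, and different words are unified under one stem key. The suffix-stripping rule below is a deliberately naive stand-in for a real language-aware stemmer, and all names are hypothetical.

```python
from collections import defaultdict


def naive_stem(word):
    """Strip a few English suffixes; a toy rule, not a real stemmer."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_index(docs):
    """Index each word under two kinds of search key: exact word and stem."""
    index = defaultdict(set)  # (key_kind, key) -> set of doc ids
    for doc_id, text in enumerate(docs):
        for word in text.lower().split():
            index[("word", word)].add(doc_id)
            index[("stem", naive_stem(word))].add(doc_id)
    return index
```

Looking up `("word", "indexes")` retrieves only documents containing that exact form, while `("stem", "index")` also retrieves documents containing "index" or "indexing", which trades precision for recall.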


Author(s):  
Pat Case

The Web changed the paradigm for full-text search. Searching Google for "search engines" returns 57,300,000 results at this writing, an impressive result set. Web search engines favor simple searches, speed, and relevance ranking. The end user most often finds a wanted result or two within the first page of search results. This new paradigm is less useful for searching collections of homogeneous data and documents than it is for searching the Web. When searching collections, end users may need to review everything in the collection on a topic, or may want a clean result set of only those 6 high-quality results, or may need to confirm that there are no wanted results, because finding no results within a collection sometimes answers a question about a topic or collection. To accomplish these tasks, end users may need more functionality to return small, manageable result sets. The W3C XQuery and XPath Full Text Recommendation (XQFT) offers extensive end user functionality, restoring the end user control that librarians and expert searchers enjoyed before the Web. XQFT offers more end user functionality and control than any other full-text search standard to date: more match options, more logical operators, more proximity operators, more ways to return a manageable result set. XQFT searches are also completely composable with XQuery string, number, date, and node queries, bringing the power of full-text search and database querying together for the first time. XQFT searches run directly against XML, enabling searches on any elements or attributes. XQFT implementations are standards-driven, based on shared semantics and syntax. A search written for one implementation is portable and may be used in other implementations.


2021
Vol 50 (2)
pp. 375-389
Author(s):
Waheed Iqbal
Waqas Ilyas Malik
Faisal Bukhari
Khaled Mohamad Almustafa
Zubiar Nawaz

An efficient full-text search is achieved by indexing the raw data with an additional 20 to 30 percent storage cost. In the context of Big Data, this additional storage space is huge and introduces challenges to entertain full-text search queries with good performance. It also incurs overhead to store, manage, and update the large size index. In this paper, we propose and evaluate a method to minimize the index size to offer full-text search over Big Data using an automatic extractive-based text summarization method. To evaluate the effectiveness of the proposed approach, we used two real-world datasets. We indexed actual and summarized datasets using Apache Lucene and studied average simple overlapping, Spearman’s rho correlation, and average ranking score measures of search results obtained using different search queries. Our experimental evaluation shows that automatic text summarization is an effective method to reduce the index size significantly. We obtained a maximum of 82% reduction in index size with 42% higher relevance of the search results using the proposed solution to minimize the full-text index size.
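Two of the measures named above, simple overlap and Spearman's rho, can be sketched for a pair of ranked result lists. The exact definitions used in the paper are not given in the abstract, so the choices below are assumptions: overlap is the share of the first list that also appears in the second, rho is computed only over documents present in both lists, and neither list contains duplicates.

```python
def overlap(results_a, results_b):
    """Fraction of documents in results_a that also appear in results_b."""
    set_a, set_b = set(results_a), set(results_b)
    if not set_a:
        return 0.0
    return len(set_a & set_b) / len(set_a)


def spearman_rho(results_a, results_b):
    """Spearman's rank correlation over the documents common to both lists.

    Uses the tie-free formula rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)),
    with ranks recomputed inside the common subset. Returns None when
    fewer than two documents are shared.
    """
    in_b = set(results_b)
    common = [doc for doc in results_a if doc in in_b]
    n = len(common)
    if n < 2:
        return None
    rank_a = {doc: rank for rank, doc in enumerate(common)}
    common_in_b_order = [doc for doc in results_b if doc in rank_a]
    rank_b = {doc: rank for rank, doc in enumerate(common_in_b_order)}
    d_squared = sum((rank_a[doc] - rank_b[doc]) ** 2 for doc in common)
    return 1 - 6 * d_squared / (n * (n * n - 1))
```

Averaging these values over a set of queries would give per-dataset figures of the kind the abstract describes, with results from the full index as one list and results from the summarized index as the other.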


2012
Vol 02 (04)
pp. 106-109
Author(s):
Rujia Gao
Danying Li
Wanlong Li
Yaze Dong
