Relevance ranking for proximity full-text search based on additional indexes with multi-component keys

The problem of proximity full-text search is considered. If a search query contains frequently occurring words, multi-component key indexes improve search speed compared with ordinary inverted indexes. It has been shown that search speed can be increased by up to 130 times when queries consist of frequently occurring words. In this paper, we investigate how the multi-component key index architecture affects search quality. We consider several well-known relevance ranking methods proposed by different authors. Using these methods, we perform the search first in an ordinary inverted index and then in an index enhanced with multi-component key indexes. The results show that with multi-component key indexes we obtain search results whose relevance ranking is very close to that of the results obtained with ordinary inverted indexes.
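
To make the idea concrete, here is a minimal Python sketch assuming a simplified two-component key: a pair of frequently occurring words that appear within `max_distance` positions of each other. The published multi-component key indexes may use more components and a different layout; the function names, the frequent-word set, and `max_distance` are hypothetical. The intuition it shows is that a proximity query on two frequent words can read one posting list of nearby co-occurrences instead of matching two long per-word posting lists against each other.

```python
from collections import defaultdict


def build_indexes(docs, frequent, max_distance=3):
    """Build an ordinary positional inverted index and a two-component key index.

    docs: list of token lists; frequent: set of frequently occurring words (hypothetical).
    """
    inverted = defaultdict(list)    # word -> [(doc_id, position)]
    pair_index = defaultdict(list)  # (word_a, word_b) -> [(doc_id, earlier_pos, later_pos)]
    for doc_id, tokens in enumerate(docs):
        for pos, word in enumerate(tokens):
            inverted[word].append((doc_id, pos))
            if word not in frequent:
                continue
            # Pair this frequent word with every frequent word up to max_distance positions ahead.
            for other in range(pos + 1, min(pos + 1 + max_distance, len(tokens))):
                if tokens[other] in frequent:
                    key = tuple(sorted((word, tokens[other])))  # unordered key
                    pair_index[key].append((doc_id, pos, other))
    return inverted, pair_index


def near_docs_inverted(inverted, w1, w2, max_distance=3):
    """Proximity query over the ordinary index: walk both (long) posting lists."""
    positions_w2 = defaultdict(list)
    for doc_id, pos in inverted.get(w2, []):
        positions_w2[doc_id].append(pos)
    hits = set()
    for doc_id, pos in inverted.get(w1, []):
        if any(0 < abs(p - pos) <= max_distance for p in positions_w2.get(doc_id, [])):
            hits.add(doc_id)
    return sorted(hits)


def near_docs_pair(pair_index, w1, w2):
    """Same query over the two-component key index: read a single posting list."""
    key = tuple(sorted((w1, w2)))
    return sorted({doc_id for doc_id, _, _ in pair_index.get(key, [])})


docs = [["to", "be", "or", "not", "to", "be"],
        ["the", "cat", "sat"],
        ["be", "quick", "and", "to", "the", "point"]]
inverted, pairs = build_indexes(docs, {"to", "be", "or", "not", "the", "and"})
print(near_docs_inverted(inverted, "to", "be"))  # [0, 2]
print(near_docs_pair(pairs, "to", "be"))         # [0, 2]
```

Both queries return the same documents; the difference lies in how much posting data each one reads, which is what a relevance comparison between the two index types has to account for.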

GPU Computation for Online Realtime Multi-Pattern Matching

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.284-287.3428 ◽

2013 ◽

Vol 284-287 ◽

pp. 3428-3432 ◽

Cited By ~ 2

Author(s):

Yu Hsiu Huang ◽

Richard Chun Hung Lin ◽

Ying Chih Lin ◽

Cheng Yi Lin

Keyword(s):

Parallel Computation ◽

Pattern Matching ◽

Full Text ◽

Network Performance ◽

Text Search ◽

Full Text Search ◽

Network Intrusion ◽

Speed Up ◽

Set Up ◽

GPU Implementation

Most applications of traditional full-text search, e.g., webpage search, are offline: a text search engine scans the texts in advance and builds the related index. However, online real-time full-text search applications, e.g., network intrusion detection and prevention systems (IDPS), are hard to implement on commodity hardware. Such systems are expensive and inflexible given the growing number of new virus patterns; moreover, the text cannot be scanned in advance, and the search must be completed online in real time. Additionally, an IDPS needs multi-pattern matching so that malicious packets can be separated immediately from normal ones without degrading network performance. To address real-time multi-pattern matching, we implement two sequential algorithms, Wu-Manber and Aho-Corasick, on a GPU parallel computation platform. Both pattern matching algorithms are well suited to cases with a large number of patterns, and both are easy to extend to a GPU parallel computation platform to meet real-time requirements. Our experimental results show that the throughput of the GPU implementation is about five to seven times that of the CPU implementation. Therefore, pattern matching on a GPU offers an attractive IDPS solution for speeding up the detection of malicious packets in normal traffic, with lower cost, easy expansion, and better performance.
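
The core of the Aho-Corasick algorithm named above can be sketched sequentially in Python: build a trie of all patterns, add failure links by breadth-first search, then scan the text once, reporting every pattern that ends at each position. This is a CPU-side illustration only, not the authors' GPU code, and the function names are hypothetical. A GPU version would plausibly share the automaton tables among threads and give each thread its own slice of the input, scanning up to the longest pattern length minus one characters past the slice end so that no match straddling a boundary is lost; that partitioning scheme is an assumption here.

```python
from collections import deque


def build_automaton(patterns):
    """Build Aho-Corasick goto, failure, and output tables for a set of patterns."""
    goto = [{}]       # goto[state][char] -> next state (trie edges)
    fail = [0]        # failure link of each state
    output = [set()]  # patterns recognized when entering each state
    for pattern in patterns:
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                output.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].add(pattern)
    # Breadth-first traversal: depth-1 states keep failure link 0 (the root).
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit matches from the longest proper suffix that is also a trie state.
            output[nxt] |= output[fail[nxt]]
    return goto, fail, output


def find_all(text, goto, fail, output):
    """Scan text once; return (start_index, pattern) for every occurrence."""
    state = 0
    matches = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pattern in output[state]:
            matches.append((i - len(pattern) + 1, pattern))
    return sorted(matches)


tables = build_automaton(["he", "she", "his", "hers"])
print(find_all("ushers", *tables))  # [(1, 'she'), (2, 'he'), (2, 'hers')]
```

The text is read exactly once regardless of how many patterns there are, which is why the algorithm suits the large pattern sets of an IDPS.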

Recommender Systems in Digital Libraries Using Artificial Intelligence and Machine Learning

Advances in Systems Analysis, Software Engineering, and High Performance Computing - Handbook of Research on Methodologies and Applications of Supercomputing ◽

10.4018/978-1-7998-7156-9.ch012 ◽

2021 ◽

pp. 162-178

Author(s):

Namik Delilovic

Keyword(s):

Artificial Intelligence ◽

Machine Learning ◽

Digital Libraries ◽

Full Text ◽

Text Search ◽

Full Text Search ◽

Artificial Intelligence Techniques ◽

Search Results ◽

Advanced Search ◽

Search Field

Searching for content in today's digital libraries is still very primitive; most websites provide a search field where users can enter information such as the book title, the author name, or terms they expect to find in the book. Some platforms provide advanced search options that let users narrow the search results by specific parameters such as year, author name, publisher, and similar. Currently, when users find a book that might interest them, the search process ends; only a full-text search or the references at the end of the book may provide some additional pointers. In this chapter, the author gives an example of how a user could continuously receive recommendations for additional content, even while reading an article, using current machine learning and artificial intelligence techniques.
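
The chapter summary does not say which techniques are used, so the following is only one common content-based approach, assumed for illustration: represent the passage the user is currently reading and each catalog item as bag-of-words vectors, then recommend the items with the highest cosine similarity. The catalog, titles, and function names are hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts (no stemming or stop-word removal)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def recommend(current_passage, catalog, top_n=2):
    """Return up to top_n catalog titles most similar to the passage being read."""
    query = vectorize(current_passage)
    scored = [(cosine(query, vectorize(text)), title) for title, text in catalog.items()]
    scored.sort(reverse=True)
    return [title for score, title in scored[:top_n] if score > 0]


catalog = {
    "Inverted Indexes": "inverted index posting lists for full text search",
    "Neural Networks": "deep learning neural network training",
    "Search Ranking": "relevance ranking for full text search results",
}
print(recommend("full text search with an inverted index", catalog))
# ['Inverted Indexes', 'Search Ranking']
```

Re-running `recommend` as the reader moves to a new passage is one simple way to keep recommendations flowing during reading rather than only at search time.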

SQLite Sebagai Pengganti Lucene.Net pada Pencarian Produk Toko Online (SQLite as a Replacement for Lucene.Net in Online Store Product Search)

Jurnal CoSciTech (Computer Science and Information Technology) ◽

10.37859/coscitech.v1i2.2204 ◽

2020 ◽

Vol 1 (2) ◽

pp. 36-43

Author(s):

Imam Farisi ◽

Mukhaimy Gazali ◽

Rudy Anshari

Keyword(s):

Full Text ◽

Search Process ◽

Text Search ◽

Search System ◽

Full Text Search ◽

Database Table ◽

Search Speed ◽

Product Search ◽

Online Stores ◽

Two Alternatives

A business can promote its products in its online shop, where consumers can find out which products the store sells. Consumers looking for a product can search on various attributes of the goods, which are scattered across various fields of a database table and may even be spread across different tables. All attributes that describe an item are collected into one document, which is searched with a full-text search system. Two full-text search alternatives were selected: Lucene.Net and SQLite Full Text Search. Before being put into use, both alternatives were tested. In document storage size, Lucene.Net is superior by a factor of 6.78. In document writing speed, SQLite is superior by a factor of between 1.875 and 5.197. In key search speed, Lucene.Net is superior by a factor of between 1.169 and 1.698. Considering these speeds and the fact that Lucene.Net Core is still in the beta stage, SQLite Full Text Search is suitable for the product search process in the online store.
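
The approach of collecting all of a product's attributes into one searchable document can be sketched with Python's built-in `sqlite3` module and SQLite's FTS5 full-text extension. This assumes the SQLite library linked into Python was compiled with FTS5, which is common but not guaranteed; the paper may have used a different FTS module, schema, or host language, and the table name, column names, and sample products are hypothetical.

```python
import sqlite3


def build_product_index(products):
    """Collect each product's attributes into one FTS5 document (assumes FTS5 is available)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE product_docs USING fts5(product_id UNINDEXED, body)")
    for product_id, attributes in products.items():
        # In a real shop these values might come from joins across several tables.
        body = " ".join(str(value) for value in attributes.values())
        conn.execute(
            "INSERT INTO product_docs (product_id, body) VALUES (?, ?)",
            (product_id, body),
        )
    conn.commit()
    return conn


def search_products(conn, query):
    """Return matching product ids, best first (FTS5's hidden `rank` column defaults to BM25)."""
    rows = conn.execute(
        "SELECT product_id FROM product_docs WHERE product_docs MATCH ? ORDER BY rank",
        (query,),
    )
    return [row[0] for row in rows]


products = {
    "P1": {"name": "Cotton Shirt", "color": "Blue", "category": "Clothing"},
    "P2": {"name": "Denim Jeans", "color": "Blue", "category": "Clothing"},
    "P3": {"name": "Linen Shirt", "color": "White", "category": "Clothing"},
}
conn = build_product_index(products)
print(search_products(conn, "blue shirt"))  # ['P1']
```

In FTS5 query syntax, whitespace-separated terms are combined with an implicit AND, so `blue shirt` only matches documents containing both words, in any attribute.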

Markup as Index Interface

Proceedings of Balisage: The Markup Conference 2015 ◽

10.4242/balisagevol15.holstege01 ◽

2015 ◽

Author(s):

Mary Holstege

Keyword(s):

Information Content ◽

Search Engine ◽

Full Text ◽

List Type ◽

Text Search ◽

Full Text Search ◽

External Knowledge ◽

Additional Information ◽

Inverted Indexes ◽

Different Content

To a search engine, indexes are specified by the content: the words, phrases, and characters that are actually present tell the search engine which inverted indexes to create. Other external knowledge can be applied to add to this inventory of indexes. For example, knowledge of the document language can lead to indexes for word stems or decompounding. These can unify different content into the same index or split the same content into multiple indexes. That is, different words manifest in the content can be unified under a single search key, and the same word can have multiple manifestations under different search keys. Turning this around, the indexes represent the retrievable information content of the document. Full-text search is not a binary yes/no system, but one of relative fit (scoring). Precision balances against recall, mediated by scoring. The search engine perspective offers a different way to think about markup: As a specification of the retrievable information content of the document. As something that can, with additional information, unify different markup or provide multiple distinct views of the same markup. As something that can be present to greater or lesser degrees, with a goodness of match (scoring). As a specification that can be adjusted to balance precision and recall. What does this search engine perspective on markup mean, concretely? Can we use it to reframe some persistent conundrums, such as vocabulary resolution and overlap? Let's see.
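
The point that different words can be unified under one search key while the same word appears under several keys can be shown with a tiny Python sketch. The suffix-stripping rule below is a hypothetical toy for illustration, not a real stemmer such as Porter's, and the key names and documents are made up.

```python
from collections import defaultdict


def toy_stem(word):
    # Hypothetical suffix-stripping rule for illustration only; not a real stemmer.
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def index_keys(word):
    """Each word manifests under two search keys: its exact (lowercased) form and its stem."""
    lower = word.lower()
    return {("word", lower), ("stem", toy_stem(lower))}


def build_index(docs):
    index = defaultdict(set)  # (key_type, key) -> set of doc ids
    for doc_id, text in enumerate(docs):
        for word in text.split():
            for key in index_keys(word):
                index[key].add(doc_id)
    return index


index = build_index(["Index the markup", "indexing markups", "indexed content"])
print(sorted(index[("stem", "index")]))  # [0, 1, 2]: different words, one key
print(sorted(index[("word", "index")]))  # [0]: the same content, a stricter key
```

Querying the stem key favors recall and querying the exact-form key favors precision, which is the trade-off the passage describes.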

When 57,300,000 Full Text Search Results Are Just Too Many

Proceedings of Balisage: The Markup Conference 2014 ◽

10.4242/balisagevol13.case01 ◽

2014 ◽

Cited By ~ 1

Author(s):

Pat Case

Keyword(s):

Full Text ◽

Search Engines ◽

End Users ◽

End User ◽

New Paradigm ◽

Text Search ◽

Full Text Search ◽

Search Results ◽

Impressive Result ◽

The Web

The Web changed the paradigm for full-text search. Searching Google for search engines returns 57,300,000 results at this writing, an impressive result set. Web search engines favor simple searches, speed, and relevance ranking. The end user most often finds a wanted result or two within the first page of search results. This new paradigm is less useful for searching collections of homogeneous data and documents than for searching the web. When searching collections, end users may need to review everything in the collection on a topic, may want a clean result set of only those 6 high-quality results, or may need to confirm that there are no wanted results, because finding no results within a collection sometimes answers a question about a topic or collection. To accomplish these tasks, end users may need more end-user functionality to return small, manageable result sets. The W3C XQuery and XPath Full Text Recommendation (XQFT) offers extensive end-user functionality, restoring the control that librarians and expert searchers enjoyed before the Web. XQFT offers more end-user functionality and control than any other full-text search standard ever: more match options, more logical operators, more proximity operators, more ways to return a manageable result set. XQFT searches are also completely composable with XQuery string, number, date, and node queries, bringing the power of full-text search and database querying together for the first time. XQFT searches run directly against XML, enabling searches on any elements or attributes. XQFT implementations are standards-driven, based on shared semantics and syntax, so a search written for one implementation is portable to others.
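
To show how a proximity constraint shrinks a result set, here is a small Python sketch loosely modeled on the idea behind XQFT's ordered and distance positional filters. It is not XQFT syntax and not any implementation's API; it assumes whitespace tokenization, lowercase matching, and a gap measured as the number of intervening words, and the function names and sample collection are hypothetical.

```python
def ordered_within(tokens, first, second, max_gap):
    """True if `first` occurs before `second` with at most `max_gap` words in between."""
    first_positions = [i for i, t in enumerate(tokens) if t == first]
    second_positions = [j for j, t in enumerate(tokens) if t == second]
    return any(0 <= j - i - 1 <= max_gap for i in first_positions for j in second_positions)


def search_ordered(collection, first, second, max_gap):
    """Return ids of documents satisfying the ordered proximity constraint."""
    return [doc_id for doc_id, text in collection.items()
            if ordered_within(text.lower().split(), first, second, max_gap)]


collection = {
    "a": "full text search engines",
    "b": "search the full text",
    "c": "full and complete text",
}
print(search_ordered(collection, "full", "text", 0))  # ['a', 'b']
print(search_ordered(collection, "full", "text", 2))  # ['a', 'b', 'c']
print(search_ordered(collection, "text", "full", 5))  # []
```

Tightening the gap or enforcing word order removes documents that merely mention both words, which is the kind of control that yields small, manageable result sets.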

Big Data Full-Text Search Index Minimization Using Text Summarization

Information Technology And Control ◽

10.5755/j01.itc.50.2.25470 ◽

2021 ◽

Vol 50 (2) ◽

pp. 375-389

Author(s):

Waheed Iqbal ◽

Waqas Ilyas Malik ◽

Faisal Bukhari ◽

Khaled Mohamad Almustafa ◽

Zubiar Nawaz

Keyword(s):

Big Data ◽

Full Text ◽

Text Summarization ◽

Text Search ◽

Full Text Search ◽

Search Queries ◽

Search Results ◽

Real World Datasets ◽

Search Index ◽

Index Size

An efficient full-text search is achieved by indexing the raw data at an additional storage cost of 20 to 30 percent. In the context of Big Data, this additional storage space is huge and makes it challenging to serve full-text search queries with good performance. It also incurs overhead to store, manage, and update the large index. In this paper, we propose and evaluate a method that minimizes the index size for full-text search over Big Data using automatic extractive text summarization. To evaluate the effectiveness of the proposed approach, we used two real-world datasets. We indexed the actual and summarized datasets using Apache Lucene and studied the average simple overlap, Spearman's rho correlation, and average ranking score of the search results obtained with different search queries. Our experimental evaluation shows that automatic text summarization is an effective method to reduce the index size significantly. Using the proposed solution, we obtained a maximum reduction of 82% in index size with 42% higher relevance of the search results.
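
Two of the measures named above can be sketched in Python for comparing the result list from the full index with the one from the summarized index. The abstract does not give the paper's exact definitions, so these are standard textbook forms under stated assumptions: overlap is the fraction of shared documents in the top k, and Spearman's rho is computed only over documents returned by both lists (re-ranked within each list) using $\rho = 1 - \frac{6\sum d_i^2}{n(n^2-1)}$, which assumes no ties and no duplicate entries. Function names are hypothetical.

```python
from typing import Optional, Sequence


def overlap_at_k(original: Sequence[str], summarized: Sequence[str], k: int) -> float:
    """Fraction of the top-k results shared by the two result lists."""
    return len(set(original[:k]) & set(summarized[:k])) / k


def spearman_rho(original: Sequence[str], summarized: Sequence[str]) -> Optional[float]:
    """Spearman's rho over documents present in both lists (assumption: others are ignored)."""
    common = [doc for doc in original if doc in summarized]
    n = len(common)
    if n < 2:
        return None  # rho is undefined for fewer than two shared documents
    rank_original = {doc: i for i, doc in enumerate(common)}
    shared_in_summarized = [doc for doc in summarized if doc in rank_original]
    rank_summarized = {doc: i for i, doc in enumerate(shared_in_summarized)}
    d_squared = sum((rank_original[doc] - rank_summarized[doc]) ** 2 for doc in common)
    return 1 - 6 * d_squared / (n * (n * n - 1))


full_index_results = ["d1", "d2", "d3", "d4"]
summary_index_results = ["d2", "d1", "d3", "d5"]
print(overlap_at_k(full_index_results, summary_index_results, 4))  # 0.75
print(spearman_rho(full_index_results, summary_index_results))     # 0.5
```

Here three of the four top results are shared, and swapping `d1` and `d2` gives $\sum d_i^2 = 2$, so $\rho = 1 - 12/24 = 0.5$.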