Big Data Full-Text Search Index Minimization Using Text Summarization

An efficient full-text search is achieved by indexing the raw data with an additional 20 to 30 percent storage cost. In the context of Big Data, this additional storage space is huge and introduces challenges to entertain full-text search queries with good performance. It also incurs overhead to store, manage, and update the large size index. In this paper, we propose and evaluate a method to minimize the index size to offer full-text search over Big Data using an automatic extractive-based text summarization method. To evaluate the effectiveness of the proposed approach, we used two real-world datasets. We indexed actual and summarized datasets using Apache Lucene and studied average simple overlapping, Spearman’s rho correlation, and average ranking score measures of search results obtained using different search queries. Our experimental evaluation shows that automatic text summarization is an effective method to reduce the index size significantly. We obtained a maximum of 82% reduction in index size with 42% higher relevance of the search results using the proposed solution to minimize the full-text index size.
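To make the evaluation measures concrete, here is a minimal sketch of how simple overlap and Spearman's rho could be computed between the ranked results of a full index and a summarized index. The abstract does not give the paper's exact definitions, so the function names, the top-k cutoff, and the choice to rank only documents common to both lists are assumptions made for illustration.

```python
def simple_overlap(full_results, summary_results, k):
    """Fraction of the top-k documents shared by both ranked lists (assumed definition)."""
    top_full = set(full_results[:k])
    top_summary = set(summary_results[:k])
    return len(top_full & top_summary) / k


def spearman_rho(full_results, summary_results):
    """Spearman's rho over the documents that appear in both ranked lists.

    Ranks are re-assigned among the common documents only (an assumption),
    so there are no ties and the classic formula
    1 - 6 * sum(d^2) / (n * (n^2 - 1)) applies.
    """
    in_summary = set(summary_results)
    common_full = [d for d in full_results if d in in_summary]
    n = len(common_full)
    if n < 2:
        return None  # correlation is undefined for fewer than two items
    rank_full = {doc: i for i, doc in enumerate(common_full)}
    common_summary = [d for d in summary_results if d in rank_full]
    rank_summary = {doc: i for i, doc in enumerate(common_summary)}
    d_squared = sum((rank_full[d] - rank_summary[d]) ** 2 for d in common_full)
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical result lists for one query against each index.
full = ["d1", "d2", "d3", "d4"]
summarized = ["d2", "d1", "d3", "d5"]
print(simple_overlap(full, summarized, k=4))
print(spearman_rho(full, summarized))
```

Averaging these two values over a set of queries would give per-dataset figures comparable in spirit to the "average simple overlapping" and correlation measures named in the abstract.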

Score-consistent algebraic optimization of full-text search queries with GRAFT

Proceedings of the 2011 international conference on Management of data - SIGMOD '11 ◽

10.1145/1989323.1989404 ◽

2011 ◽

Cited By ~ 1

Author(s):

Nathan Bales ◽

Alin Deutsch ◽

Vasilis Vassalos

Keyword(s):

Full Text ◽

Text Search ◽

Full Text Search ◽

Search Queries ◽

Algebraic Optimization

Recommender Systems in Digital Libraries Using Artificial Intelligence and Machine Learning

Advances in Systems Analysis, Software Engineering, and High Performance Computing - Handbook of Research on Methodologies and Applications of Supercomputing ◽

10.4018/978-1-7998-7156-9.ch012 ◽

2021 ◽

pp. 162-178

Author(s):

Namik Delilovic

Keyword(s):

Artificial Intelligence ◽

Machine Learning ◽

Digital Libraries ◽

Full Text ◽

Text Search ◽

Full Text Search ◽

Artificial Intelligence Techniques ◽

Search Results ◽

Advanced Search ◽

Search Field

Searching for content in today's digital libraries is still very primitive; most websites provide a search field where users can enter information such as a book title, an author name, or terms they expect to find in the book. Some platforms provide advanced search options, which allow users to narrow the search results by specific parameters such as year, author name, and publisher. Currently, when users find a book that might be of interest to them, the search process ends; only a full-text search or the references at the end of the book may provide some additional pointers. In this chapter, the author gives an example of how a user could continuously receive recommendations for additional content even while reading, using current machine learning and artificial intelligence techniques.

SMART SEARCH IN THE DATABASE OF PHYSIC-TECHNICAL INFORMATION

ITNOU: Information technologies in education, science and management ◽

10.47501/itnou.2020.2.28-32 ◽

2020 ◽

pp. 28-32

Author(s):

Dmitry Mikhailovich Korobkin ◽

Stanislav Alekseevich Avdosev ◽

Sergei Alekseevich Fomenkov ◽

Sergei Grigorievich Kolesnikov

Keyword(s):

Full Text ◽

Technical Information ◽

Text Search ◽

Search System ◽

Full Text Search ◽

Search Queries ◽

Intelligent Search ◽

Physical Effects

The article describes the process of developing an automated intelligent search system based on physical effects. The developed system performs descriptor and full-text search, logs search queries, displays physical effects, and provides other functions.

Big Data Processing for Full-Text Search and Visualization with Elasticsearch

International Journal of Advanced Computer Science and Applications ◽

10.14569/ijacsa.2017.081211 ◽

2017 ◽

Vol 8 (12) ◽

Cited By ~ 2

Author(s):

Aleksei Voit ◽

Aleksei Stankus ◽

Shamil Magomedov ◽

Irina Ivanova

Keyword(s):

Big Data ◽

Data Processing ◽

Full Text ◽

Text Search ◽

Big Data Processing ◽

Full Text Search

The architecture of a system for full-text search by speech data based on a global search index

Scientific and technical journal of information technologies mechanics and optics ◽

10.17586/2226-1494-2021-21-5-791-794 ◽

2021 ◽

Vol 21 (5) ◽

pp. 791-794

Author(s):

O.E. Petrov

Keyword(s):

Full Text ◽

Global Search ◽

Text Search ◽

Full Text Search ◽

Speech Data ◽

Search Index

Relevance ranking for proximity full-text search based on additional indexes with multi-component keys

Vestnik Udmurtskogo Universiteta Matematika Mekhanika Komp'yuternye Nauki ◽

10.35634/vm210110 ◽

2021 ◽

Vol 31 (1) ◽

pp. 132-148

Author(s):

A.B. Veretennikov

Keyword(s):

Full Text ◽

Search Query ◽

Text Search ◽

Relevance Ranking ◽

Full Text Search ◽

Search Results ◽

Search Speed ◽

Speed Up ◽

Inverted Indexes

The problem of proximity full-text search is considered. If a search query contains frequently occurring words, then multi-component key indexes improve the search speed in comparison with ordinary inverted indexes. It was shown that the search speed can be increased by up to 130 times when queries consist of frequently occurring words. In this paper, we investigate how the multi-component key index architecture affects the quality of the search. We consider several well-known relevance ranking methods proposed by different authors. Using these methods, we perform the search in the ordinary inverted index and then in the index that is enhanced with multi-component key indexes. The results show that with multi-component key indexes we obtain search results that are very close, in terms of relevance ranking, to the search results obtained by means of ordinary inverted indexes.
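The contrast between an ordinary inverted index and an index keyed by several words can be illustrated with a toy sketch. This is not the author's data structure: the two-word key, the `max_dist` window, and all function names below are assumptions chosen to show why a multi-word key can avoid scanning long posting lists for frequent words.

```python
from collections import defaultdict


def build_inverted(docs):
    """Ordinary inverted index: word -> list of (doc_id, position)."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            index[word].append((doc_id, pos))
    return index


def build_pair_index(docs, max_dist):
    """Two-component key index (illustrative): (w1, w2) -> (doc_id, pos1, pos2)
    for every pair of words at most max_dist positions apart."""
    pairs = defaultdict(list)
    for doc_id, text in docs.items():
        words = text.lower().split()
        for i, w1 in enumerate(words):
            last = min(i + max_dist, len(words) - 1)
            for j in range(i + 1, last + 1):
                key = tuple(sorted((w1, words[j])))
                pairs[key].append((doc_id, i, j))
    return pairs


def proximity_inverted(index, w1, w2, max_dist):
    """Proximity query on the ordinary index: compares both full posting lists."""
    hits = set()
    for doc1, pos1 in index.get(w1, []):
        for doc2, pos2 in index.get(w2, []):
            if doc1 == doc2 and 0 < abs(pos1 - pos2) <= max_dist:
                hits.add(doc1)
    return hits


def proximity_pairs(pairs, w1, w2):
    """Proximity query on the pair index: a single key lookup."""
    key = tuple(sorted((w1, w2)))
    return {doc_id for doc_id, _, _ in pairs.get(key, [])}


docs = {
    1: "to be or not to be",
    2: "be quick",
    3: "to the store and back we will be",
}
inverted = build_inverted(docs)
pair_index = build_pair_index(docs, max_dist=2)
print(proximity_inverted(inverted, "to", "be", max_dist=2))
print(proximity_pairs(pair_index, "to", "be"))
```

The pair index trades extra storage for query time: the cost of the nested comparison is paid once at indexing time, which matters most when both query words have very long posting lists.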

DIFTSAS: A DIstributed Full Text Search and Analysis System for Big Data

2013 IEEE 16th International Conference on Computational Science and Engineering ◽

10.1109/cse.2013.193 ◽

2013 ◽

Author(s):

Bo Li ◽

Jingjie Zhang ◽

Mingyu Chen ◽

Jinchao Zhang ◽

Kunpeng Wang ◽

...

Keyword(s):

Big Data ◽

Full Text ◽

Text Search ◽

Full Text Search ◽

Analysis System

When 57,300,000 Full Text Search Results Are Just Too Many

Proceedings of Balisage: The Markup Conference 2014 ◽

10.4242/balisagevol13.case01 ◽

2014 ◽

Cited By ~ 1

Author(s):

Pat Case

Keyword(s):

Full Text ◽

Search Engines ◽

End Users ◽

End User ◽

New Paradigm ◽

Text Search ◽

Full Text Search ◽

Search Results ◽

Impressive Result ◽

The Web

The Web changed the paradigm for full-text search. Searching Google for "search engines" returns 57,300,000 results at this writing, an impressive result set. Web search engines favor simple searches, speed, and relevance ranking. The end user most often finds a wanted result or two within the first page of search results. This new paradigm is less useful in searching collections of homogeneous data and documents than it is for searching the web. When searching collections, end users may need to review everything in the collection on a topic, or may want a clean result set of only those 6 high-quality results, or may need to confirm that there are no wanted results, because finding no results within a collection sometimes answers a question about a topic or collection. To accomplish these tasks, end users may need more end user functionality to return small, manageable result sets. The W3C XQuery and XPath Full Text Recommendation (XQFT) offers extensive end user functionality, restoring the end user control that librarians and expert searchers enjoyed before the Web. XQFT offers more end user functionality and control than any other full-text search standard ever: more match options, more logical operators, more proximity operators, more ways to return a manageable result set. XQFT searches are also completely composable with XQuery string, number, date, and node queries, bringing the power of full-text search and database querying together for the first time. XQFT searches run directly against XML, enabling searches on any elements or attributes. XQFT implementations are standards-driven, based on shared semantics and syntax. A search in any implementation is portable and may be used in other implementations.

Improving full text search performance through textual analysis

Information Processing & Management ◽

10.1016/0306-4573(93)90083-p ◽

1993 ◽

Vol 29 (5) ◽

pp. 615-632 ◽

Cited By ~ 2

Author(s):

Mavis Molto

Keyword(s):

Full Text ◽

Textual Analysis ◽

Search Performance ◽

Text Search ◽

Full Text Search

GPU Computation for Online Realtime Multi-Pattern Matching

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.284-287.3428 ◽

2013 ◽

Vol 284-287 ◽

pp. 3428-3432 ◽

Cited By ~ 2

Author(s):

Yu Hsiu Huang ◽

Richard Chun Hung Lin ◽

Ying Chih Lin ◽

Cheng Yi Lin

Keyword(s):

Parallel Computation ◽

Pattern Matching ◽

Full Text ◽

Network Performance ◽

Text Search ◽

Full Text Search ◽

Network Intrusion ◽

Speed Up ◽

Set Up ◽

Gpu Implementation

Most applications of traditional full-text search, e.g., webpage search, are offline: they use a text search engine to preview the texts and build the related index. However, applications of online real-time full-text search, e.g., network intrusion detection and prevention systems (IDPS), are hard to implement using commodity hardware. Such systems are expensive and inflexible in the face of a growing number of new virus patterns; moreover, the text cannot be previewed, and the search must complete online in real time. Additionally, IDPS needs multi-pattern matching so that malicious packets can be separated immediately from normal ones without degrading network performance. Considering the problem of real-time multi-pattern matching, we implement two sequential algorithms, Wu-Manber and Aho-Corasick, on a GPU parallel computation platform. Both pattern matching algorithms are well suited to cases with a large number of patterns. In addition, they are easily extended to a GPU parallel computation platform to satisfy the real-time requirement. Our experimental results show that the throughput of the GPU implementation is about five to seven times that of the CPU. Therefore, pattern matching on the GPU offers an attractive solution for IDPS to speed up the detection of malicious packets among normal traffic, given its lower cost, easy expansion, and better performance.
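For readers unfamiliar with multi-pattern matching, the following is a minimal sequential sketch of the Aho-Corasick algorithm named in the abstract. It is a plain CPU version written for illustration, not the authors' GPU implementation; the function names and the sample patterns are assumptions.

```python
from collections import deque


def build_automaton(patterns):
    """Build the Aho-Corasick trie with failure links and output lists."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # fail[state] -> state for the longest proper suffix in the trie
    out = [[]]    # out[state] -> patterns that end at this state
    for pattern in patterns:
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pattern)

    # Breadth-first pass: depth-1 states keep fail = 0, deeper states
    # derive their failure link from their parent's failure link.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def search(text, automaton):
    """Scan text once and return (start_index, pattern) for every match."""
    goto, fail, out = automaton
    state = 0
    matches = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pattern in out[state]:
            matches.append((i - len(pattern) + 1, pattern))
    return matches


automaton = build_automaton(["he", "she", "his", "hers"])
print(sorted(search("ushers", automaton)))
```

The text is scanned in a single pass regardless of how many patterns are loaded, which is why the algorithm suits large pattern sets. A GPU version would typically split the input into chunks handled by separate threads, a detail the abstract does not spell out.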
