Full Text Search Setup on a Website

2021
pp. 55-60
Author(s):
Halyna V. Khodiakova
Nataliia V. Khodiakova
Valery A. Pozdeev
...

Introduction. When implementing search for text fragments on a site, approaches of differing complexity and performance are used. A sequence of related tasks also arises: choosing a text indexing option, sending text for indexing, selecting texts for indexing specifically from the CMS database, choosing a search engine, and others. These approaches do not always provide satisfactory search results. Purpose. The purpose of the article is to describe existing solutions for full-text search on a website, with their advantages and disadvantages, and to develop a full-text search algorithm using the Elasticsearch system. Methods. Analysis of approaches to the implementation of full-text search on a website, varying in complexity and performance. Identification of flaws and vulnerabilities in the more primitive approaches and development of more advanced algorithms that eliminate the identified deficiencies. Step-by-step implementation of full-text search using third-party systems. Results. A method for implementing full-text search using Elasticsearch is described. The advantage of the new approach is that the page content and its address are sent asynchronously to a dedicated service responsible for communication with Elasticsearch. This avoids blocking normal work with the CMS and removes the dependence on the availability of the indexing service. The approach described in the article is flexible and adaptable to various website architectures. Asynchronous processing of indexing requests ensures high query execution speed and system fault tolerance. Conclusions. The article discusses various approaches to implementing full-text search on a website, with their advantages and disadvantages. Based on the analysis, a more flexible and universal approach to implementing a full-text search system has been developed. A solution is proposed with step-by-step implementation and setup of advanced full-text search using Elasticsearch.
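
The asynchronous hand-off described in the Results can be illustrated with a minimal Python sketch. It is not the authors' implementation: `InMemoryIndexer` is a hypothetical stand-in for the service that communicates with Elasticsearch, and `AsyncIndexQueue`, `submit`, and `wait_until_idle` are names invented here. The sketch only shows that the CMS side returns immediately and that a failing indexer does not propagate errors back to the CMS.

```python
import queue
import threading


class InMemoryIndexer:
    """Hypothetical stand-in for the service that talks to Elasticsearch."""

    def __init__(self):
        self.documents = {}

    def index(self, url, content):
        self.documents[url] = content


class AsyncIndexQueue:
    """Accepts (url, content) pairs and indexes them on a background thread."""

    def __init__(self, indexer):
        self._indexer = indexer
        self._queue = queue.Queue()
        worker = threading.Thread(target=self._run, daemon=True)
        worker.start()

    def submit(self, url, content):
        # Called by the CMS when a page is saved; returns without waiting.
        self._queue.put((url, content))

    def _run(self):
        while True:
            url, content = self._queue.get()
            try:
                self._indexer.index(url, content)
            except Exception:
                # Assumption: an unavailable indexer must not affect the CMS.
                # A real system would log the failure and retry later.
                pass
            finally:
                self._queue.task_done()

    def wait_until_idle(self):
        # Only needed in tests and demos; the CMS itself never waits.
        self._queue.join()


indexer = InMemoryIndexer()
index_queue = AsyncIndexQueue(indexer)
index_queue.submit("/news/1", "Full-text search setup")
index_queue.wait_until_idle()
```

In a real deployment the in-process queue would more likely be a message broker or an HTTP call to the separate service, so that queued pages survive a CMS restart.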

2011
Vol 135-136
pp. 369-374
Author(s):
Yang Sen Zhang
Gai Juan Huang

In this paper, we design and implement an efficient full-text retrieval system for the basic annotation People's Daily Corpus based on inverted index technology. Based on the characteristics of the corpus data, we analyze the methods and strategies for implementing the system in detail. After comparing various schemes, we propose a three-level index structure of Chinese character, word, and address set, and give the design of the index dictionary structure at each level. After converting the unstructured People's Daily Corpus into index-structured data, we implement the full-text search algorithm corresponding to the proposed index structure. Experimental results show that the proposed search algorithm achieves the target of "ten million Chinese characters, response within a second" and improves the speed of full-text search over the People's Daily Corpus.
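
As a rough illustration of a three-level index (character, word, address set), the following Python sketch keys the first level on the leading character of each word, the second on the word itself, and stores a list of token positions as the address set. The abstract does not give the paper's dictionary layouts, so the use of the leading character, the position-based addresses, and the names `build_index` and `lookup` are assumptions made here.

```python
from collections import defaultdict


def build_index(tokens):
    """Build character -> word -> positions from an already segmented corpus.

    `tokens` is a list of words in corpus order; the position of a word
    in that list serves as its address (an assumption of this sketch).
    """
    index = defaultdict(lambda: defaultdict(list))
    for position, word in enumerate(tokens):
        if not word:
            continue
        index[word[0]][word].append(position)
    return index


def lookup(index, word):
    """Return the address set of `word`, or an empty list if it is absent."""
    if not word:
        return []
    return index.get(word[0], {}).get(word, [])


tokens = ["人民", "日报", "人口", "人民"]
index = build_index(tokens)
addresses = lookup(index, "人民")
```

Grouping words under their first character keeps each second-level dictionary small, which is one plausible reason for such a layout.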


2013
Vol 284-287
pp. 3428-3432
Author(s):
Yu Hsiu Huang
Richard Chun Hung Lin
Ying Chih Lin
Cheng Yi Lin

Most applications of traditional full-text search, e.g., webpage search, are offline: they use a text search engine to preview the texts and build the related index. However, applications of online real-time full-text search, e.g., network intrusion detection and prevention systems (IDPS), are hard to implement on commodity hardware. They are expensive and inflexible as more and more new virus patterns appear, the text cannot be previewed, and the search must be completed online in real time. Additionally, an IDPS needs multi-pattern matching so that malicious packets can be removed immediately from normal ones without degrading network performance. To address the problem of real-time multi-pattern matching, we implement two sequential algorithms, Wu-Manber and Aho-Corasick, on a GPU parallel computing platform. Both pattern matching algorithms are well suited to cases with a large number of patterns. In addition, they are easily extended on a GPU parallel computing platform to satisfy the real-time requirement. Our experimental results show that the throughput of the GPU implementation is about five to seven times that of the CPU. Therefore, pattern matching on a GPU offers an attractive solution for IDPS to speed up the detection of malicious packets among normal traffic, given its lower cost, easy expansion, and better performance.
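
For readers unfamiliar with multi-pattern matching, the sketch below shows a plain sequential Aho-Corasick matcher in Python. It runs on the CPU and is not the GPU implementation evaluated in the paper; the function names `build_automaton` and `find_all` and the sample patterns are chosen here for illustration only.

```python
from collections import deque


def build_automaton(patterns):
    """Build the goto, failure, and output tables for the given patterns."""
    goto = [{}]      # goto[state][char] -> next state
    fail = [0]       # fail[state] -> state of the longest proper suffix
    output = [[]]    # output[state] -> patterns that end at this state
    for pattern in patterns:
        if not pattern:
            continue
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                output.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].append(pattern)

    # Breadth-first pass: depth-1 states keep fail = 0; deeper states
    # derive their failure link from their parent's failure link.
    pending = deque(goto[0].values())
    while pending:
        state = pending.popleft()
        for ch, nxt in goto[state].items():
            pending.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            output[nxt] = output[nxt] + output[fail[nxt]]
    return goto, fail, output


def find_all(text, automaton):
    """Return (start_index, pattern) for every match in one pass over text."""
    goto, fail, output = automaton
    state = 0
    matches = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pattern in output[state]:
            matches.append((i - len(pattern) + 1, pattern))
    return matches


automaton = build_automaton(["he", "she", "his", "hers"])
matches = find_all("ushers", automaton)
```

The text is scanned once regardless of how many patterns are loaded, which is why this family of algorithms suits packet inspection with large signature sets.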


2012
Vol 02 (04)
pp. 106-109
Author(s):
Rujia Gao
Danying Li
Wanlong Li
Yaze Dong
