A Detailed Study of Distributed Indexed Search Techniques using SOLR

For any web application backed by an RDBMS, performance can suffer badly when a search must run over a table with millions of rows or when a query joins multiple tables. Backend operations of this kind generally make a website extremely slow. Document-based inverted indexing can be a useful solution in these cases. SOLR is a standalone enterprise search server with a REST-like API. Its major features include powerful full-text search, hit highlighting, faceted search, near real-time indexing, dynamic clustering, database integration, NoSQL features, rich document parsing (e.g., Word and PDF), geospatial search, and built-in security. Databases and SOLR have complementary strengths and weaknesses. SQL supports only simple wildcard-based text search with minimal normalization, such as matching uppercase to lowercase. The problem is that such searches are full table scans. In SOLR, all searchable words are stored in an inverted index, which can be searched orders of magnitude faster. However, designing such a framework is quite challenging. This paper discusses highly reliable, scalable, and fault-tolerant techniques for setting up distributed indexing, replication, and load-balanced querying with a centralized configuration.
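The contrast between a full table scan and an inverted index, and the idea of splitting an index across several nodes, can be illustrated with a small in-memory sketch. It is not SOLR code: it assumes a toy analyzer (lowercasing and splitting on non-word characters), AND semantics for multi-term queries, and a hypothetical CRC32-modulo rule for routing documents to shards (SOLR's own analyzers and document router work differently). Replication and load balancing are not modeled. The names `tokenize`, `Shard`, and `ShardedIndex` are illustrative only.

```python
import re
import zlib
from collections import defaultdict


def tokenize(text):
    """Toy analyzer: lowercase, split on non-word characters, drop empty tokens."""
    return [t for t in re.split(r"\W+", text.lower()) if t]


class Shard:
    """One shard: an inverted index from each term to the ids of documents containing it."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of doc ids (posting set)
        self.docs = {}                    # doc id -> stored text

    def add(self, doc_id, text):
        self.docs[doc_id] = text
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        """Return ids of documents containing every query term (AND semantics)."""
        terms = tokenize(query)
        if not terms:
            return set()
        # Intersect posting sets instead of scanning every stored document.
        result = set(self.postings.get(terms[0], ()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


class ShardedIndex:
    """Documents are split across shards; queries fan out to all shards and merge."""

    def __init__(self, num_shards):
        self.shards = [Shard() for _ in range(num_shards)]

    def _route(self, doc_id):
        # Hypothetical routing rule: CRC32 of the id modulo the shard count.
        return zlib.crc32(doc_id.encode("utf-8")) % len(self.shards)

    def add(self, doc_id, text):
        self.shards[self._route(doc_id)].add(doc_id, text)

    def search(self, query):
        hits = set()
        for shard in self.shards:        # scatter the query to every shard...
            hits |= shard.search(query)  # ...and gather the partial results
        return sorted(hits)
```

For example, after adding a few documents to `ShardedIndex(3)`, `search("solr search")` returns the sorted ids of every document containing both terms, regardless of which shard each document was routed to. Each lookup touches only the posting sets of the query terms, which is why an inverted index avoids the full scan that a wildcard SQL query needs.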

2013, Vol 284-287, pp. 3428-3432
Author(s): Yu Hsiu Huang, Richard Chun Hung Lin, Ying Chih Lin, Cheng Yi Lin

Most traditional full-text search applications, e.g., web page search, work offline: a text search engine scans the texts in advance and builds the related index. Online real-time full-text search applications, e.g., network intrusion detection and prevention systems (IDPS), are much harder to implement on commodity hardware. Such systems are expensive and inflexible in the face of ever more new virus patterns, the text cannot be scanned in advance, and the search must be carried out entirely online in real time. In addition, an IDPS needs multi-pattern matching so that malicious packets can be separated from normal ones immediately without degrading network performance. To address the problem of real-time multi-pattern matching, we implement two sequential algorithms, Wu-Manber and Aho-Corasick, on a GPU parallel computing platform. Both pattern matching algorithms are well suited to cases with a large number of patterns, and both are easy to extend on a GPU parallel computing platform to meet real-time requirements. Our experimental results show that the throughput of the GPU implementation is about five to seven times that of the CPU implementation. Pattern matching on GPUs therefore offers an attractive IDPS solution for speeding up the detection of malicious packets within normal traffic, given its lower cost, easy expansion, and better performance.
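To make the multi-pattern matching idea concrete, here is a minimal sequential (CPU) sketch of Aho-Corasick in Python. It builds a trie of all patterns, adds failure links by breadth-first search, and then finds every pattern occurrence in a single pass over the input. It is not the authors' implementation: their GPU port and the Wu-Manber algorithm are not shown, and the function names `build_automaton` and `find_all` are hypothetical. Patterns are assumed to be non-empty.

```python
from collections import deque


def build_automaton(patterns):
    """Build the Aho-Corasick automaton: trie edges, failure links, output sets."""
    goto = [{}]    # goto[state][char] -> next state (trie edges)
    fail = [0]     # fail[state] -> longest proper suffix state that is also in the trie
    out = [set()]  # out[state] -> patterns that end at this state
    for pattern in patterns:
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(pattern)

    # Breadth-first traversal so every failure target is finished before it is used.
    queue = deque(goto[0].values())  # depth-1 states keep fail = 0 (the root)
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] |= out[fail[nxt]]  # inherit matches that are suffixes
    return goto, fail, out


def find_all(text, automaton):
    """Return sorted (start_index, pattern) pairs for every occurrence in text."""
    goto, fail, out = automaton
    state = 0
    matches = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]          # follow failure links on a mismatch
        state = goto[state].get(ch, 0)
        for pattern in out[state]:
            matches.append((i - len(pattern) + 1, pattern))
    return sorted(matches)
```

For instance, with the patterns `["he", "she", "his", "hers"]`, `find_all("ushers", ...)` reports `she` at index 1 and both `he` and `hers` at index 2 in one left-to-right scan. Because the scan cost does not grow with the number of patterns, the algorithm suits large signature sets; one common way to parallelize it is to give each GPU thread its own slice of the input, though the abstract does not say how the authors partitioned the work.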


2012, Vol 02 (04), pp. 106-109
Author(s): Rujia Gao, Danying Li, Wanlong Li, Yaze Dong

Author(s): Namik Delilovic

Searching for content in today's digital libraries is still very primitive: most websites provide a search field where users can enter information such as a book title, an author name, or terms they expect to find in the book. Some platforms provide advanced search options that let users narrow the results by specific parameters such as year, author name, or publisher. Currently, once users find a book that might interest them, the search process ends; only a full-text search or the references at the end of the book may provide additional pointers. In this chapter, the author gives an example of how a user could continuously receive recommendations for additional content, even while reading an article, using current machine learning and artificial intelligence techniques.


Author(s): Imam Farisi, Mukhaimy Gazali, Rudy Anshari

A business can promote its products in its online shop, and through the online store consumers can find out which products are sold there. Consumers looking for a product can search on various attributes of the goods, which are scattered across different fields of a database table and may even be spread across different tables. All attributes that describe an item are collected into one document, which is then searched using a full-text search system. Two full-text search systems were selected as alternatives: Lucene.Net and SQLite Full Text Search. Before being put into use, both alternatives were tested. In document storage size, Lucene.Net is superior by a factor of 6.78. In document writing speed, SQLite is superior by a factor of between 1,875 and 5,197. In key search speed, Lucene.Net is superior by a factor of between 1,169 and 1,698. Considering these speed results, and given that Lucene.Net Core is still in beta, SQLite Full Text Search is suitable for the product search process in the online store.
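The approach described above, collecting attributes spread over several tables into one searchable document per product, can be sketched with SQLite's FTS5 extension through Python's built-in `sqlite3` module. This is an illustration under stated assumptions, not the authors' setup: the abstract does not say which SQLite full-text module they used, the sketch assumes an SQLite build with FTS5 enabled, and the tables `brands`, `products`, and `product_search`, the helper functions, and the sample rows are all hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE brands   (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, color TEXT,
                       brand_id INTEGER REFERENCES brands(id));
-- One FTS5 row per product; the rowid is set to the product id.
CREATE VIRTUAL TABLE product_search USING fts5(document);
"""


def build_search_index(conn):
    """Join attributes scattered across tables into one text document per product."""
    conn.execute("DELETE FROM product_search")
    conn.execute("""
        INSERT INTO product_search (rowid, document)
        SELECT p.id,
               p.name || ' ' || coalesce(p.color, '') || ' ' || coalesce(b.name, '')
        FROM products AS p LEFT JOIN brands AS b ON b.id = p.brand_id
    """)


def search_products(conn, query):
    """Return product ids matching an FTS5 query, best bm25 rank first."""
    rows = conn.execute(
        "SELECT rowid FROM product_search WHERE product_search MATCH ? ORDER BY rank",
        (query,),
    )
    return [row[0] for row in rows]


def make_demo_db():
    """Create an in-memory store with a few hypothetical products and index them."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO brands (id, name) VALUES (?, ?)",
                     [(1, "Acme"), (2, "Globex")])
    conn.executemany(
        "INSERT INTO products (id, name, color, brand_id) VALUES (?, ?, ?, ?)",
        [(1, "running shoes", "red", 1),
         (2, "leather boots", "brown", 2),
         (3, "running shirt", "blue", 2)],
    )
    build_search_index(conn)
    return conn
```

With this data, a query such as `running globex` matches only product 3, because FTS5 treats space-separated terms as an implicit AND and the brand name from the joined table is part of the same document. Rebuilding the whole index in `build_search_index` keeps the sketch short; a real store would more likely update the FTS rows incrementally, for example with triggers.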

