For a web application backed by a relational database (RDBMS),
searching a table with millions of rows or executing a query
that joins multiple tables can severely degrade performance. In
general, such backend operations make the website extremely
slow. Document-based inverted indexing can be a useful solution in
these cases. SOLR is a standalone enterprise search server with a
REST-like API. Its major features include powerful
full-text search, hit highlighting, faceted search, near real-time
indexing, dynamic clustering, database integration, NoSQL
features, rich document parsing (e.g., Word and PDF),
geospatial search, and built-in security. Databases and SOLR have
complementary strengths and weaknesses. SQL supports only
simple wildcard-based text search with basic normalization,
such as case-insensitive matching. The problem is that these
searches typically require full table scans. In SOLR, all
searchable words are stored in an inverted index, which can be
searched orders of magnitude faster.
However, designing such a framework is quite challenging. This
paper discusses highly reliable, scalable, and fault-tolerant
techniques that help in setting up distributed indexing,
replication, and load-balanced querying with a
centralized configuration.
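
To make the contrast between a full table scan and an index lookup concrete, the following minimal Python sketch builds a toy in-memory index that maps each word to the documents containing it, then compares a lookup against a linear scan. It is only an illustration under simplifying assumptions: tokenization is a naive lowercase split on whitespace, and the names `tokenize`, `build_inverted_index`, `search_index`, and `search_scan` are hypothetical. It does not reflect how SOLR actually stores or queries its index.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive analyzer)."""
    return text.lower().split()


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search_index(index, term):
    """Look the term up directly instead of visiting every document."""
    return sorted(index.get(term.lower(), set()))


def search_scan(docs, term):
    """Emulate a full table scan: inspect every document for the term."""
    term = term.lower()
    return sorted(doc_id for doc_id, text in docs.items()
                  if term in tokenize(text))


# Hypothetical sample documents, used only for illustration.
docs = {
    1: "Solr is an enterprise search server",
    2: "Relational databases support SQL joins",
    3: "Full-text search with faceted navigation",
}
index = build_inverted_index(docs)

print(search_index(index, "Search"))  # [1, 3]
print(search_scan(docs, "search"))    # [1, 3]
```

Both functions return the same matches, but the scan touches every document on every query, whereas the index pays that cost once at build time and answers each query with a single dictionary lookup.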