Fast Filtering of Search Results Sorted by Attribute
Modern search services often provide multiple options to rank the search results, e.g., sort “by relevance”, “by price” or “by discount” in e-commerce. While the traditional rank by relevance effectively places the relevant results in the top positions of the results list, the rank by attribute could place many marginally relevant results in the head of the results list leading to poor user experience. In the past, this issue has been addressed by investigating the relevance-aware filtering problem, which asks to select the subset of results maximizing the relevance of the attribute-sorted list. Recently, an exact algorithm has been proposed to solve this problem optimally. However, the high computational cost of the algorithm makes it impractical for the Web search scenario, which is characterized by huge lists of results and strict time constraints. For this reason, the problem is often solved using efficient yet inaccurate heuristic algorithms. In this article, we first prove the performance bounds of the existing heuristics. We then propose two efficient and effective algorithms to solve the relevance-aware filtering problem. First, we propose OPT-Filtering, a novel exact algorithm that is faster than the existing state-of-the-art optimal algorithm. Second, we propose an approximate and even more efficient algorithm, ϵ-Filtering, which, given an allowed approximation error ϵ, finds a (1-ϵ)–optimal filtering, i.e., the relevance of its solution is at least (1-ϵ) times the optimum. We conduct a comprehensive evaluation of the two proposed algorithms against state-of-the-art competitors on two real-world public datasets. Experimental results show that OPT-Filtering achieves a significant speedup of up to two orders of magnitude with respect to the existing optimal solution, while ϵ-Filtering further improves this result by trading effectiveness for efficiency. In particular, experiments show that ϵ-Filtering can achieve quasi-optimal solutions while being faster than all state-of-the-art competitors in most of the tested configurations.