Ad hoc Retrieval models

Author(s):  
Souria Ortiga

During the 1980s, despite its maturity, information retrieval (IR) was intended only for librarians and information professionals. This narrow view prevailed for many years. Since the mid-1990s, the web has become an increasingly crucial source of information, which has renewed interest in IR. In the last decade, the popularization of computers, the explosion in the amount of unstructured data, internal documents, and corporate collections, and the huge and growing number of internet document sources have profoundly changed the relationship between people and information. Today, IR is used by billions of people around the world. As a result, automated methods for efficient access to this huge amount of digital information have become a necessity.


2022 ◽  
Vol 40 (3) ◽  
pp. 1-37
Author(s):  
Edward Kai Fung Dang ◽  
Robert Wing Pong Luk ◽  
James Allan

In Information Retrieval, numerous retrieval models or document ranking functions have been developed in the quest for better retrieval effectiveness. Apart from some formal retrieval models formulated on a theoretical basis, various recent works have applied heuristic constraints to guide the derivation of document ranking functions. While many recent methods are shown to improve over established and successful models, comparison among these new methods under a common environment is often missing. To address this issue, we perform an extensive and up-to-date comparison of leading term-independence retrieval models implemented in our own retrieval system. Our study focuses on the following questions: (RQ1) Is there a retrieval model that consistently outperforms all other models across multiple collections? (RQ2) What are the important features of an effective document ranking function? Our retrieval experiments, performed on several TREC test collections of a wide range of sizes (up to the terabyte-sized ClueWeb09 Category B), enable us to answer these research questions. This work also serves as a reproducibility study for leading retrieval models. While our experiments show that no single retrieval model outperforms all others across all tested collections, some recent retrieval models, such as MATF and MVD, consistently perform better than the common baselines.
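
To make the notion of a term-independence document ranking function concrete, here is a minimal sketch in Python. It uses BM25 purely as a well-known illustration; the abstract does not name its baselines, so the choice of BM25, the function name `bm25_scores`, the parameter defaults `k1=1.2` and `b=0.75`, and the toy documents are all assumptions, not the authors' system or their MATF and MVD models.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against a tokenized query with BM25.

    Hypothetical helper for illustration; k1 and b are commonly used
    defaults, not values taken from the paper.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        # Term independence: the score is a sum of per-term contributions.
        for term in query:
            if term not in tf:
                continue
            # Smoothed IDF; the "1 +" inside the log keeps it non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Toy collection (made up for this example).
docs = [
    "ad hoc retrieval models rank documents".split(),
    "satellite retrievals of carbon dioxide".split(),
    "ranking functions for document retrieval".split(),
]
scores = bm25_scores("retrieval models".split(), docs)
# Document indices ordered from highest to lowest score.
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

The sketch shows three ingredients that such functions typically combine: a term-frequency component, an inverse-document-frequency component, and document-length normalization. Whether these are the "important features" the study identifies is not stated in the abstract.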


2015 ◽  
Vol 15 (19) ◽  
pp. 11133-11145
Author(s):  
F. Chevallier

Abstract. The growing archive of Greenhouse Gases Observing Satellite (GOSAT) measurements (now covering about 6 years) allows increasingly robust statistics to be computed that document the performance of the corresponding retrievals of the column-average dry-air mole fraction of CO2 (XCO2). Here, we demonstrate that atmospheric inversions cannot be rigorously optimal when assimilating current XCO2 retrievals, even with averaging kernels, in particular because retrievals and inversions use different assumptions about prior uncertainty. We look for practical evidence of this sub-optimality from the viewpoint of atmospheric inversion by comparing a model simulation constrained by surface air-sample measurements with one of the GOSAT retrieval products (NASA's ACOS). The retrieval-minus-model differences result from various error sources, both in the retrievals and in the simulation: we discuss the plausibility of the origin of the major patterns. We find systematic retrieval errors over the dark surfaces of high-latitude lands and over African savannahs. More importantly, we also find a systematic over-fit of the GOSAT radiances by the retrievals over land for the high-gain detector mode, which is the usual observation mode. The over-fit is partially compensated by the retrieval bias-correction. These issues are likely common to other retrieval products and may explain some of the surprising and inconsistent CO2 atmospheric inversion results obtained with the existing GOSAT retrieval products. We suggest that reducing the observation weight in the retrieval schemes (for instance, so that retrieval increments to the retrieval prior values are halved for the studied retrieval product) would significantly improve the retrieval quality and reduce the need for (or at least reduce the complexity of) ad-hoc retrieval bias correction.
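
The link between observation weight and retrieval increment can be illustrated with a scalar, linear toy estimator. This is a sketch under strong assumptions, not the ACOS retrieval scheme: a single state variable, a direct observation of it, Gaussian errors, and hypothetical function names and numbers throughout. In that setting the increment is the gain times the observation-minus-prior difference, and halving the increment amounts to inflating the observation error variance from `var_obs` to `var_prior + 2 * var_obs`.

```python
def scalar_update(prior, obs, var_prior, var_obs):
    """Scalar linear estimate: prior plus gain times (obs - prior)."""
    gain = var_prior / (var_prior + var_obs)
    return prior + gain * (obs - prior)


def var_obs_for_half_increment(var_prior, var_obs):
    """Observation error variance that halves the gain in the scalar case.

    Follows from var_prior / (var_prior + v) = 0.5 * var_prior / (var_prior + var_obs),
    which gives v = var_prior + 2 * var_obs.
    """
    return var_prior + 2.0 * var_obs


# Illustrative numbers only (not taken from the paper).
prior, obs = 400.0, 404.0
var_prior, var_obs = 4.0, 4.0

# Increment with the original observation weight.
full_increment = scalar_update(prior, obs, var_prior, var_obs) - prior

# Increment after down-weighting the observation.
inflated_var_obs = var_obs_for_half_increment(var_prior, var_obs)
half_increment = scalar_update(prior, obs, var_prior, inflated_var_obs) - prior
```

A real retrieval is multivariate and nonlinear, so the required change in observation weight would differ from this scalar relation; the sketch only shows the direction of the effect.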

