Benchmarking a transformer-FREE model for ad-hoc retrieval

Abstract. The growing archive of Greenhouse Gases Observing Satellite (GOSAT) measurements (now covering about 6 years) allows increasingly robust statistics to be computed that document the performance of the corresponding retrievals of the column-average dry-air mole fraction of CO2 (XCO2). Here, we demonstrate that atmospheric inversions cannot be rigorously optimal when assimilating current XCO2 retrievals, even with averaging kernels, in particular because retrievals and inversions use different assumptions about prior uncertainty. We look for practical evidence of this sub-optimality from the viewpoint of atmospheric inversion by comparing a model simulation constrained by surface air-sample measurements with one of the GOSAT retrieval products (NASA's ACOS). The retrieval-minus-model differences result from various error sources, both in the retrievals and in the simulation: we discuss the plausibility of the origin of the major patterns. We find systematic retrieval errors over the dark surfaces of high-latitude lands and over African savannahs. More importantly, we also find a systematic over-fit of the GOSAT radiances by the retrievals over land for the high-gain detector mode, which is the usual observation mode. The over-fit is partially compensated by the retrieval bias correction. These issues are likely common to other retrieval products and may explain some of the surprising and inconsistent CO2 atmospheric inversion results obtained with the existing GOSAT retrieval products. We suggest that reducing the observation weight in the retrieval schemes (for instance, so that retrieval increments to the retrieval prior values are halved for the studied retrieval product) would significantly improve the retrieval quality and reduce the need for (or at least reduce the complexity of) ad-hoc retrieval bias correction.
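The closing suggestion, halving the retrieval increments relative to the retrieval prior, can be illustrated with a small sketch. This is not the ACOS algorithm: the function name, the `weight` parameter, and the ppm values are hypothetical, and scaling an increment after the fact only approximates what a retrieval re-run with lower observation weight would give.

```python
def damp_retrieval(xco2_prior, xco2_retrieved, weight=0.5):
    """Shrink the retrieval increment (retrieved minus prior) by `weight`.

    weight=0.5 corresponds to the "halved increments" case in the abstract;
    weight=1.0 returns the retrieval unchanged.
    """
    increment = xco2_retrieved - xco2_prior
    return xco2_prior + weight * increment


# Hypothetical values in ppm, for illustration only.
prior_ppm = 400.0
retrieved_ppm = 402.0
damped_ppm = damp_retrieval(prior_ppm, retrieved_ppm)
```

With these made-up numbers the 2 ppm increment becomes 1 ppm, so the damped value sits halfway between the prior and the original retrieval.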

Multi-Perspective Relevance Matching with Hierarchical ConvNets for Social Media Search

Proceedings of the AAAI Conference on Artificial Intelligence, Vol 33, pp. 232-240, 2019. DOI: 10.1609/aaai.v33i01.3301232. Cited By ~ 6

Author(s): Jinfeng Rao, Wei Yang, Yuhao Zhang, Ferhan Ture, Jimmy Lin

Keyword(s): Social Media, Ad Hoc, Similarity Measurement, Web Pages, Ranking Models, Twitter Data, The Social, Ad Hoc Retrieval, Feature Based, Substantial Interest

Despite substantial interest in applications of neural networks to information retrieval, neural ranking models have mostly been applied to “standard” ad hoc retrieval tasks over web pages and newswire articles. This paper proposes MP-HCNN (Multi-Perspective Hierarchical Convolutional Neural Network), a novel neural ranking model specifically designed for ranking short social media posts. We identify document length, informal language, and heterogeneous relevance signals as features that distinguish documents in our domain, and present a model designed with these characteristics in mind. Our model uses hierarchical convolutional layers to learn latent semantic soft-match relevance signals at the character, word, and phrase levels. A pooling-based similarity measurement layer integrates evidence from multiple types of matches among the query, the social media post, and URLs contained in the post. Extensive experiments using Twitter data from the TREC Microblog Tracks 2011–2014 show that our model significantly outperforms prior feature-based as well as existing neural ranking models. To the best of our knowledge, this paper presents the first substantial work tackling search over social media posts using neural ranking models. Our code and data are publicly available.
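As a rough illustration of what a pooling-based similarity measurement can look like, the sketch below compares each query term vector with every post term vector and keeps the maximum and mean similarity per query term. It is not the authors' MP-HCNN layer: the use of cosine similarity, the max/mean pooling choice, the function names, and the toy 2-d embeddings are all assumptions made for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def pooled_similarity(query_vecs, post_vecs):
    """For each query vector, emit [max, mean] of its similarities to all post vectors."""
    features = []
    for q in query_vecs:
        sims = [cosine(q, p) for p in post_vecs]
        features.append(max(sims))              # strongest single match
        features.append(sum(sims) / len(sims))  # average match strength
    return features


# Toy 2-d embeddings, hypothetical and for illustration only.
query_vecs = [[1.0, 0.0], [0.0, 1.0]]
post_vecs = [[1.0, 0.0], [1.0, 1.0]]
features = pooled_similarity(query_vecs, post_vecs)
```

The resulting fixed-length feature vector (two values per query term) is the kind of input a downstream scoring layer could consume. In a multi-perspective setup the same pooling would be repeated for each match type, for example query against post text and query against URL.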
