DeepTileBars: Visualizing Term Distribution for Neural Information Retrieval

Author(s):  
Zhiwen Tang ◽  
Grace Hui Yang

Most neural Information Retrieval (Neu-IR) models derive query-to-document ranking scores from term-level matching. Inspired by TileBars, a classical term distribution visualization method, we propose in this paper a novel Neu-IR model that handles query-to-document matching at the subtopic level and above. Our system first splits each document into topical segments, “visualizes” the matches between the query and the segments, and then feeds the resulting interaction matrix into a Neu-IR model, DeepTileBars, to obtain the final ranking scores. DeepTileBars models the relevance signals that occur at different granularities in a document’s topic hierarchy, so it better captures the document’s discourse structure and, in turn, its matching patterns. Although its design and implementation are lightweight, DeepTileBars outperforms other state-of-the-art Neu-IR models on benchmark datasets including the Text REtrieval Conference (TREC) 2010-2012 Web Tracks and LETOR 4.0.
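
The first two stages of that pipeline (segment the document, then "draw" where each query term lands) can be illustrated with a small sketch. It assumes fixed-length token windows in place of the topical segmentation the paper uses and raw term counts as the only matching signal; `segment_fixed`, `tilebar_matrix`, and `render` are hypothetical helpers, and the DeepTileBars network that consumes the matrix is not shown.

```python
from collections import Counter

def segment_fixed(tokens, seg_len):
    """Split tokens into consecutive windows of seg_len tokens.
    Assumption: fixed windows stand in for real topical segmentation."""
    return [tokens[i:i + seg_len] for i in range(0, len(tokens), seg_len)]

def tilebar_matrix(query_terms, doc_tokens, seg_len=4):
    """Return a |query| x |segments| matrix of raw term counts
    (one row per query term, one column per segment)."""
    counts = [Counter(seg) for seg in segment_fixed(doc_tokens, seg_len)]
    return [[c[q] for c in counts] for q in query_terms]

def render(matrix, query_terms):
    """Print a text TileBar: denser glyphs mean more occurrences in a segment."""
    shades = " .:#"
    for term, row in zip(query_terms, matrix):
        bar = "".join(shades[min(v, len(shades) - 1)] for v in row)
        print(f"{term:>8} |{bar}|")

if __name__ == "__main__":
    doc = ("neural ranking uses neural nets tilebars draw term "
           "hits ranking needs term hits").split()
    query = ["neural", "term"]
    render(tilebar_matrix(query, doc), query)
```

Each row of the matrix is one query term and each column one segment, which is the grid shape that a convolutional ranker can scan at several widths.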

Author(s):  
Furkan Goz ◽  
Alev Mutlu

Keyword indexing is the problem of assigning keywords to text documents. It is an important task because keywords play crucial roles in several information retrieval tasks. The problem is also challenging, as the number of text documents keeps increasing and such documents come in different forms (e.g., scientific papers, online news articles, and microblog posts). This chapter provides an overview of keyword indexing and elaborates on keyword extraction techniques. The authors present the general motivations behind supervised and unsupervised keyword extraction and enumerate several pioneering and state-of-the-art techniques. Feature engineering, evaluation metrics, and the benchmark datasets used to evaluate keyword extraction systems are also discussed.
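
As a concrete point of reference for the unsupervised family, a classic baseline ranks a document's words by TF-IDF and keeps the top k, and precision@k against gold keywords is one simple evaluation metric. The sketch below assumes single-word candidates and a tiny hypothetical stopword list; it is a generic baseline, not a technique attributed to this chapter.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (assumption; real systems use larger lists).
STOPWORDS = {"the", "a", "of", "and", "in", "is", "to", "for", "on", "are"}

def tokenize(text):
    """Lowercase, keep alphabetic runs, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def tfidf_keywords(docs, doc_index, k=3):
    """Rank the words of docs[doc_index] by TF-IDF and return the top k."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    df = Counter(w for toks in tokenized for w in set(toks))  # document frequency
    tf = Counter(tokenized[doc_index])                        # term frequency
    scores = {w: c * math.log(n_docs / df[w]) for w, c in tf.items()}
    # Descending score; ties broken alphabetically so the output is deterministic.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:k]

def precision_at_k(predicted, gold):
    """Fraction of predicted keywords that appear in the gold set."""
    return sum(1 for w in predicted if w in gold) / len(predicted) if predicted else 0.0

if __name__ == "__main__":
    docs = [
        "keyword extraction assigns keywords to documents keywords help retrieval",
        "neural ranking models rank documents for retrieval",
        "microblog posts are short documents",
    ]
    top = tfidf_keywords(docs, 0, k=3)
    print(top, precision_at_k(top, {"keywords", "keyword", "extraction"}))
```

Words that appear in every document (here "documents") get an IDF of zero, which is why TF-IDF favors terms that are distinctive for one document.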


Author(s):  
Lin Zhu ◽  
Yihong Chen ◽  
Bowen He

As one of the most popular techniques for solving the ranking problem in information retrieval, learning to rank (LETOR) has received considerable attention in both academia and industry owing to its importance in a wide variety of data mining applications. However, most existing LETOR approaches learn a single global ranking function to handle all queries and ignore the substantial differences between queries. In this paper, we propose a domain generalization strategy to tackle this problem. We propose Query-Invariant Listwise Context Modeling (QILCM), a novel neural architecture that eliminates the detrimental influence of inter-query variability by learning query-invariant latent representations, so that the ranking system generalizes better to unseen queries. We evaluate our techniques on benchmark datasets and show that QILCM outperforms previous state-of-the-art approaches by a substantial margin.
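
In the listwise setting that QILCM's name refers to, all candidate documents for one query are scored jointly and the loss compares whole lists. For orientation, here is a minimal sketch of one well-known listwise objective, the top-1 ListNet loss. It is swapped in purely for illustration: it is not QILCM and contains none of its query-invariant representation learning.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def listnet_loss(scores, labels):
    """Top-1 ListNet loss for one query: cross-entropy between the
    softmax of the relevance labels and the softmax of the model scores."""
    target = softmax(labels)
    pred = softmax(scores)
    return -sum(t * math.log(p) for t, p in zip(target, pred))

def mean_listwise_loss(batch):
    """Average per-query losses so each query contributes equally,
    regardless of how many documents it has."""
    return sum(listnet_loss(s, y) for s, y in batch) / len(batch)

if __name__ == "__main__":
    labels = [2.0, 1.0, 0.0]
    print(listnet_loss([2.0, 1.0, 0.0], labels))  # scores agree with labels
    print(listnet_loss([0.0, 1.0, 2.0], labels))  # scores invert the labels
```

The loss is smallest when the score distribution matches the label distribution, so reversing the order of the scores always costs more.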


2018 ◽  
Author(s):  
Bhaskar Mitra ◽  
Nick Craswell

2021 ◽  
Vol 16 (1) ◽  
pp. 1-23
Author(s):  
Min-Ling Zhang ◽  
Jun-Peng Fang ◽  
Yi-Bo Wang

In multi-label classification, the task is to induce predictive models that can assign a set of relevant labels to an unseen instance. The strategy of label-specific features has been widely employed in learning from multi-label examples: the classification model that predicts the relevancy of each class label is induced from features tailored to that label rather than from the original features. Existing approaches generate a group of tailored features for each class label independently, so label correlations are not fully considered when the label-specific features are generated. In this article, we extend the existing strategy with a simple yet effective approach based on BiLabel-specific features. Specifically, a group of tailored features is generated for a pair of class labels through heuristic prototype selection and embedding. The predictions of the classifiers induced from BiLabel-specific features are then ensembled to determine the relevancy of each class label for an unseen instance. To evaluate the BiLabel-specific features strategy thoroughly, extensive experiments are conducted on a total of 35 benchmark datasets. Comparative studies against state-of-the-art label-specific features techniques show that utilizing BiLabel-specific features yields stronger generalization performance for multi-label classification.
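
The abstract does not say how prototypes are selected or embedded, so the following is only a hypothetical sketch of the general idea of features tailored to a label pair. It assumes that training instances are grouped by their joint assignment to labels j and k, that one centroid per non-empty group serves as a prototype, and that an instance is embedded as its distances to those prototypes. The functions `bilabel_prototypes` and `embed` are illustrative names, and training and ensembling the per-pair classifiers is not shown.

```python
import math

def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length vectors."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]

def bilabel_prototypes(X, Y, j, k):
    """Group instances by their joint assignment to labels j and k, i.e.
    (1,1), (1,0), (0,1) or (0,0), and keep one centroid per non-empty group.
    Assumption: centroids stand in for the paper's heuristic prototype selection."""
    groups = {}
    for x, y in zip(X, Y):
        groups.setdefault((y[j], y[k]), []).append(x)
    return [centroid(groups[key]) for key in sorted(groups)]

def embed(x, prototypes):
    """Tailored feature vector for one instance: its Euclidean distance
    to each prototype of the label pair."""
    return [math.dist(x, p) for p in prototypes]

if __name__ == "__main__":
    X = [[0, 0], [0, 1], [4, 4], [4, 5]]
    Y = [[1, 0, 0], [1, 0, 1], [0, 1, 0], [0, 1, 1]]  # 3 labels per instance
    protos = bilabel_prototypes(X, Y, 0, 1)
    print(protos, embed([0, 0], protos))
```

Because the groups depend on both labels at once, the resulting features encode how the pair co-occurs, which is the correlation information that per-label features generated independently leave out.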


2019 ◽  
Vol 53 (2) ◽  
pp. 3-10
Author(s):  
Muthu Kumar Chandrasekaran ◽  
Philipp Mayr

The 4th joint BIRNDL (Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries) workshop was held at the 42nd ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2019) in Paris, France. BIRNDL 2019 aimed to stimulate IR researchers and digital library professionals to elaborate on new approaches in natural language processing, information retrieval, scientometrics, and recommendation techniques that can advance the state of the art in scholarly document understanding, analysis, and retrieval at scale. The workshop incorporated several paper sessions and the 5th edition of the CL-SciSumm Shared Task.


2021 ◽  
Vol 11 (4) ◽  
pp. 1728
Author(s):  
Hua Zhong ◽  
Li Xu

Prediction intervals (PIs) are an important research topic in reliability analysis and decision support systems. Data size and computational cost are two issues that may hamper the construction of PIs. This paper proposes an all-batch (AB) loss function for constructing high-quality PIs. Taking full advantage of the likelihood principle, the proposed loss makes it possible to train PI generation models with gradient descent (GD) for both small and large batches of samples. Using a dual feedforward neural network (FNN) structure, a high-quality PI generation framework is introduced that can be adapted to a variety of problems, including regression analysis. Numerical experiments were conducted on benchmark datasets, and the results show that the proposed scheme achieves higher-quality PIs. Its reliability and stability were also verified in comparison with various state-of-the-art PI construction methods.
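
PI quality is commonly judged by two competing quantities: the prediction interval coverage probability (PICP), which is the fraction of targets that fall inside their intervals, and the mean prediction interval width (MPIW). The abstract does not state which metrics the paper reports, and the AB loss itself is not reproduced here. This is a minimal sketch of these two standard metrics only.

```python
def picp(y_true, lower, upper):
    """Prediction interval coverage probability: share of targets
    that lie inside their interval [lower, upper]."""
    hits = sum(1 for y, lo, hi in zip(y_true, lower, upper) if lo <= y <= hi)
    return hits / len(y_true)

def mpiw(lower, upper):
    """Mean prediction interval width."""
    return sum(hi - lo for lo, hi in zip(lower, upper)) / len(lower)

if __name__ == "__main__":
    y = [1.0, 2.0, 3.0, 4.0]
    lower = [0.5, 2.5, 2.0, 3.0]
    upper = [1.5, 3.5, 4.0, 5.0]
    print(picp(y, lower, upper), mpiw(lower, upper))
```

A high-quality PI reaches at least the nominal coverage (for example 0.95) while keeping MPIW as small as possible. Widening every interval trivially raises PICP, so neither number is meaningful alone.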


2021 ◽  
pp. 1-12
Author(s):  
Yingwen Fu ◽  
Nankai Lin ◽  
Xiaotian Lin ◽  
Shengyi Jiang

Named entity recognition (NER) is fundamental to natural language processing (NLP). Most state-of-the-art research on NER is based on pre-trained language models (PLMs) or classic neural models. However, this research is mainly oriented toward high-resource languages such as English. For Indonesian, by contrast, related resources (both datasets and technology) are not yet well developed. Moreover, affixes are an important component of word formation in Indonesian, which makes character- and token-level features essential for token-wise Indonesian NLP tasks. Yet the features extracted by current top-performing models remain insufficient. Targeting the Indonesian NER task, in this paper we build an Indonesian NER dataset (IDNER) comprising over 50 thousand sentences (over 670 thousand tokens) to alleviate the shortage of labeled resources in Indonesian. Furthermore, we construct a hierarchical structured-attention-based model (HSA) for Indonesian NER that extracts sequence features from different perspectives. Specifically, we use an enhanced convolutional structure and an enhanced attention structure to extract deeper features from characters and tokens. Experimental results show that HSA achieves competitive performance on IDNER and three benchmark datasets.
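
To see why affixes make character-level information useful, consider a hand-written token feature extractor. The sketch below is hypothetical: the affix lists are a small illustrative sample rather than a complete account of Indonesian morphology, and the matching is a crude heuristic that will produce false positives. HSA learns character features with convolutional and attention structures instead of rules like these.

```python
# Illustrative, incomplete affix lists (assumption), ordered longest first
# so that e.g. "meng" is tried before "me".
PREFIXES = ("meng", "mem", "men", "ber", "ter", "me", "di", "ke", "pe", "se")
SUFFIXES = ("kan", "nya", "an", "i")

def affix_features(token):
    """Return simple character-level features for one token.
    Heuristic: an affix only counts if at least 3 characters remain."""
    word = token.lower()
    prefix = next((p for p in PREFIXES
                   if word.startswith(p) and len(word) > len(p) + 2), "")
    suffix = next((s for s in SUFFIXES
                   if word.endswith(s) and len(word) > len(s) + 2), "")
    return {
        "prefix": prefix,
        "suffix": suffix,
        "is_capitalized": token[:1].isupper(),
        "char_trigrams": [word[i:i + 3] for i in range(len(word) - 2)],
    }

if __name__ == "__main__":
    for tok in ["Memberikan", "bukunya", "Jakarta"]:
        print(tok, affix_features(tok))
```

Features like these could be concatenated with a token embedding. A learned character encoder plays the same role without requiring the affix inventory to be written by hand.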


Sensors ◽  
2021 ◽  
Vol 21 (14) ◽  
pp. 4666
Author(s):  
Zhiqiang Pan ◽  
Honghui Chen

Collaborative filtering (CF) aims to make recommendations for users by detecting their preferences from historical user–item interactions. Existing graph neural network (GNN)-based methods achieve satisfactory performance by exploiting the high-order connectivity between users and items; however, they suffer from poor training efficiency and can easily introduce bias during information propagation. Moreover, the widely applied Bayesian personalized ranking (BPR) loss provides insufficient supervision signals for training because the observed interactions are extremely sparse. To address these issues, we propose the Efficient Graph Collaborative Filtering (EGCF) method. Specifically, EGCF adopts only a single graph convolution layer to model the collaborative signal for users and items from their first-order neighbors in the user–item interactions. We further introduce contrastive learning to enhance the representation learning of users and items by deriving self-supervision signals, which are trained jointly with the supervised objective. Extensive experiments on two benchmark datasets, Yelp2018 and Amazon-book, demonstrate that EGCF achieves state-of-the-art performance in terms of Recall and normalized discounted cumulative gain (NDCG), especially in ranking target items at the right positions. In addition, EGCF shows clear advantages in training efficiency over competitive baselines, making it practical for potential applications.
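
Two of the ingredients named above can be sketched in plain Python: a single propagation step over first-order neighbors in the user–item graph, and the BPR loss for one (user, observed item, unobserved item) triple. Several assumptions apply. The 1/sqrt(deg(u)·deg(i)) normalization follows LightGCN-style propagation because the abstract does not specify EGCF's. Embeddings are plain dictionaries. There is no training loop, and the contrastive self-supervised term is omitted. This is not EGCF's implementation.

```python
import math

def propagate(interactions, user_emb, item_emb):
    """One graph-convolution step: each node's new vector is the sum of its
    first-order neighbors' vectors, each scaled by 1 / sqrt(deg(u) * deg(i))."""
    user_deg = {u: 0 for u in user_emb}
    item_deg = {i: 0 for i in item_emb}
    for u, i in interactions:
        user_deg[u] += 1
        item_deg[i] += 1
    dim = len(next(iter(user_emb.values())))
    new_user = {u: [0.0] * dim for u in user_emb}
    new_item = {i: [0.0] * dim for i in item_emb}
    for u, i in interactions:
        w = 1.0 / math.sqrt(user_deg[u] * item_deg[i])
        for d in range(dim):
            new_user[u][d] += w * item_emb[i][d]
            new_item[i][d] += w * user_emb[u][d]
    return new_user, new_item

def score(u_vec, i_vec):
    """Predicted preference as an inner product."""
    return sum(a * b for a, b in zip(u_vec, i_vec))

def bpr_loss(u_vec, pos_vec, neg_vec):
    """BPR loss -log(sigmoid(x)) with x = score(u, pos) - score(u, neg),
    computed stably as softplus(-x)."""
    x = score(u_vec, pos_vec) - score(u_vec, neg_vec)
    return math.log1p(math.exp(-x)) if x >= 0 else -x + math.log1p(math.exp(x))

if __name__ == "__main__":
    interactions = [("u1", "i1"), ("u1", "i2"), ("u2", "i1")]
    users = {"u1": [1.0, 0.0], "u2": [0.0, 1.0]}
    items = {"i1": [1.0, 1.0], "i2": [0.0, 2.0], "i3": [3.0, 0.0]}
    nu, ni = propagate(interactions, users, items)
    print(nu, ni)
    print(bpr_loss(nu["u1"], ni["i1"], ni["i3"]))  # i3 has no interactions
```

Stopping at one layer keeps propagation to a single pass over the interaction list, which is the source of the efficiency argument. The trade-off is that higher-order connectivity is no longer modeled explicitly.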

