Visual graph query formulation and exploration: a new perspective on information retrieval at the edge

Abstract

Research on question answering dates back to the 1960s but has more recently been revisited as part of TREC's evaluation campaigns, where question answering is addressed as a subarea of information retrieval that focuses on specific answers to a user's information need. Whereas document retrieval systems aim to return the documents that are most relevant to a user's query, question answering systems aim to return actual answers to a user's question. Despite this difference, question answering systems rely on information retrieval components to identify documents that contain an answer to a user's question. The computationally more expensive answer extraction methods are then applied only to this subset of documents that are likely to contain an answer. Because information retrieval methods filter the documents in the collection, the performance of this component is critical: documents that are not retrieved are never analyzed by the answer extraction component. The formulation of the queries used to retrieve those documents has a strong impact on the effectiveness of the retrieval component. In this paper, we focus on predicting the importance of terms from the original question. We use model tree machine learning techniques to assign weights to query terms according to their usefulness for identifying documents that contain an answer. Term weights are learned by inspecting a large number of query formulation variations and their respective accuracy in identifying documents containing an answer. Several linguistic features are used for building the models, including part-of-speech tags, degree of connectivity in the dependency parse tree of the question, and ontological information. All of these features are extracted automatically using several natural language processing tools. Incorporating the learned weights into a state-of-the-art retrieval system results in statistically significant improvements in identifying answer-bearing documents.
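To illustrate how per-term weights can change which documents are retrieved, here is a minimal sketch. It is not the paper's model: the weights, the example documents, and the scoring rule (weight times term frequency) are all assumptions chosen for illustration, and the names `weighted_score` and `rank` are hypothetical.

```python
from collections import Counter


def weighted_score(doc_tokens, term_weights):
    """Sum of (term weight * term frequency) over the weighted query terms."""
    counts = Counter(doc_tokens)
    return sum(weight * counts[term] for term, weight in term_weights.items())


def rank(docs, term_weights):
    """Return document ids ordered by descending score, ties broken by id."""
    scores = {doc_id: weighted_score(tokens, term_weights)
              for doc_id, tokens in docs.items()}
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical weights for the question "Who invented the telephone?".
# In the paper the weights are learned; here they are hand-picked.
TERM_WEIGHTS = {"invented": 0.6, "telephone": 0.9, "who": 0.05}

# Toy collection, already tokenized and lowercased.
DOCS = {
    "d1": "bell invented the telephone in 1876".split(),
    "d2": "who is on the telephone".split(),
    "d3": "the printing press was invented by gutenberg".split(),
}

if __name__ == "__main__":
    print(rank(DOCS, TERM_WEIGHTS))
```

With these assumed weights, a document matching the two content words outranks one that matches only a low-weight question word plus one content word.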


Towards plug-and-play visual graph query interfaces

Proceedings of the VLDB Endowment, Vol 14 (11), pp. 1979-1991, 2021. DOI: 10.14778/3476249.3476256

Author(s): Zifeng Yuan, Huey Eng Chua, Sourav S Bhowmick, Zekun Ye, Wook-Shin Han, ...

Keyword(s): Real World, Domain Knowledge, Experimental Studies, Plug And Play, Large Networks, Graph Query, Query Interfaces, Wide Range, Real World Datasets, Visual Graph

Canned patterns (i.e., small subgraph patterns) in visual graph query interfaces (a.k.a. GUIs) facilitate efficient query formulation by enabling a pattern-at-a-time construction mode. However, existing GUIs for querying large networks either do not expose any canned patterns or, if they do, the patterns are typically selected manually based on domain knowledge. Unfortunately, manual generation of canned patterns is not only labor intensive but may also lack the diversity needed to support efficient visual formulation of a wide range of subgraph queries. In this paper, we present a novel, generic, and extensible framework called TATTOO that takes a data-driven approach to automatically select canned patterns for a GUI from large networks. Specifically, it first decomposes the underlying network into truss-infested and truss-oblivious regions. Then candidate canned patterns capturing different real-world query topologies are generated from these regions. Canned patterns based on a user-specified plug are then selected for the GUI from these candidates by maximizing coverage and diversity, and by minimizing the cognitive load of the pattern set. Experimental studies with real-world datasets demonstrate the benefits of TATTOO. Importantly, this work takes a concrete step towards realizing plug-and-play visual graph query interfaces for large networks.
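The selection step described in the abstract (maximize coverage and diversity, minimize cognitive load) can be pictured with a simple greedy heuristic. The sketch below is an assumption-laden illustration, not TATTOO's actual algorithm: the scoring formula, the Jaccard-based redundancy term, the use of pattern size as a proxy for cognitive load, and all names and data are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def greedy_select(candidates, budget, load_penalty=0.1):
    """Greedily pick up to `budget` patterns.

    candidates maps a pattern name to (covered_elements, pattern_size).
    Each round picks the pattern with the best assumed score:
    newly covered elements, discounted by similarity to patterns already
    chosen (diversity), minus a size-based penalty (cognitive load proxy).
    """
    selected = []
    covered = set()
    remaining = dict(candidates)

    def gain(name):
        cov, size = remaining[name]
        new_cov = len(cov - covered)
        redundancy = max(
            (jaccard(cov, candidates[s][0]) for s in selected), default=0.0
        )
        return new_cov * (1.0 - redundancy) - load_penalty * size

    while remaining and len(selected) < budget:
        # sorted() makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(remaining), key=gain)
        if gain(best) <= 0:
            break  # nothing left that is worth its cognitive load
        selected.append(best)
        covered |= remaining.pop(best)[0]
    return selected


# Hypothetical candidates: each covers some element ids of the network.
CANDIDATES = {
    "triangle": (frozenset({1, 2, 3, 4}), 3),
    "square": (frozenset({3, 4, 5}), 4),
    "path3": (frozenset({6, 7}), 2),
    "triangle_copy": (frozenset({1, 2, 3, 4}), 3),
}

if __name__ == "__main__":
    print(greedy_select(CANDIDATES, budget=3))
```

In this toy run the duplicate pattern is never chosen, because it adds no new coverage and still costs cognitive load.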


An Efficient Topic Modeling Approach for Text Mining and Information Retrieval through K-means Clustering

Mehran University Research Journal of Engineering and Technology, Vol 39 (1), pp. 213-222, 2020. DOI: 10.22581/muet1982.2001.20

Author(s): Junaid Rashid, Syed Muhammad Adnan Shah, Aun Irtaza

Keyword(s): Information Retrieval, Text Mining, Topic Modeling, Clustering Algorithm, Latent Dirichlet Allocation, Semantic Analysis, State Of The Art, Text Documents, New Perspective, Better Than

Topic modeling is an effective text mining and information retrieval approach for organizing knowledge with varied content under specific topics. Text documents in the form of news articles are growing rapidly on the web, and analyzing them is important in both text mining and information retrieval. Extracting meaningful information from these documents is a challenging task. Topic modeling is one approach to discovering themes in text documents, but it still needs a new perspective to improve its performance. In topic modeling, documents have topics, and topics are collections of words. In this paper, we propose a new k-means topic modeling (KTM) approach based on the k-means clustering algorithm. KTM discovers better semantic topics from a collection of documents. Experiments on two real-world datasets, Reuters-21578 and BBC News, show that KTM performs better than state-of-the-art topic models such as LDA (Latent Dirichlet Allocation) and LSA (Latent Semantic Analysis). KTM is also applicable to classification and clustering tasks in text mining, where it achieves higher performance than its competitors LDA and LSA.
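The general idea of deriving topics from k-means clusters can be sketched as follows. This is not the KTM algorithm itself, whose details the abstract does not give; it is a plain k-means over raw term-count vectors, with each cluster's highest-weight centroid terms read as a "topic". The toy documents, the fixed initial centroids, and all function names are assumptions for illustration.

```python
from collections import Counter


def vectorize(docs):
    """Turn whitespace-tokenized documents into term-count vectors."""
    vocab = sorted({word for doc in docs for word in doc.split()})
    vectors = []
    for doc in docs:
        counts = Counter(doc.split())
        vectors.append([counts[word] for word in vocab])
    return vocab, vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, init_indices, iters=10):
    """Plain k-means; initial centroids are the vectors at init_indices."""
    centroids = [list(vectors[i]) for i in init_indices]
    labels = []
    for _ in range(iters):
        # Assignment step: nearest centroid for every vector.
        labels = [
            min(range(len(centroids)), key=lambda k: sq_dist(v, centroids[k]))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for k in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def top_terms(vocab, centroid, n=3):
    """The n highest-weight terms of a centroid, ties broken alphabetically."""
    order = sorted(range(len(vocab)), key=lambda i: (-centroid[i], vocab[i]))
    return [vocab[i] for i in order[:n]]


# Toy corpus (hypothetical), two obvious themes.
DOCS = [
    "stocks market shares rise",
    "market shares fall stocks",
    "football match goal team",
    "team wins football match",
]

VOCAB, VECTORS = vectorize(DOCS)
LABELS, CENTROIDS = kmeans(VECTORS, init_indices=[0, 2])

if __name__ == "__main__":
    for k, centroid in enumerate(CENTROIDS):
        print(k, top_terms(VOCAB, centroid))
```

Fixing the initial centroids keeps the example deterministic; a practical implementation would typically use TF-IDF weighting and a randomized or k-means++ initialization.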


Bipolar queries in textual information retrieval: A new perspective

Information Processing & Management, Vol 48 (3), pp. 390-398, 2012. DOI: 10.1016/j.ipm.2011.05.001

Cited By ~ 10

Author(s): Sławomir Zadrożny, Janusz Kacprzyk, Guy De Tré

Keyword(s): Information Retrieval, Textual Information, New Perspective


AURORA: Data-driven Construction of Visual Graph Query Interfaces for Graph Databases

Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, 2020. DOI: 10.1145/3318464.3384681

Author(s): Sourav S. Bhowmick, Kai Huang, Huey Eng Chua, Zifeng Yuan, Byron Choi, ...

Keyword(s): Data Driven, Graph Databases, Graph Query, Query Interfaces, Visual Graph


Toward a fairer information retrieval system

ACM SIGIR Forum, Vol 55 (1), pp. 1-2, 2021. DOI: 10.1145/3476415.3476429

Author(s): Ruoyuan Gao

Keyword(s): Information Retrieval, Optimization Problems, Solution Space, Post Processing, Unified Framework, Real World Datasets, New Perspective, Evaluation Metric, Google Search, The Relationship

With the increasing popularity and social influence of information retrieval (IR) systems, various studies have raised concerns about the presence of bias in IR and the social responsibilities of IR systems. Techniques for addressing these issues can be classified into pre-processing, in-processing, and post-processing. Pre-processing reduces bias in the data that is fed into machine learning models. In-processing encodes fairness constraints as part of the objective function or learning process. Post-processing operates as a top layer over the trained model to reduce the presentation bias exposed to users. This dissertation explored ways to bring the pre-processing and post-processing approaches, together with fairness-aware evaluation metrics, into a unified framework in an attempt to break the vicious cycle of bias and improve fairness in IR. We first investigated the existing bias present in search engine results. Specifically, we focused on top-k fairness ranking in terms of the statistical parity and disparate impact fairness definitions. With Google search and a general-purpose text cluster as a lens, we explored several topical diversity fairness ranking strategies to understand the relationship between relevance and fairness in search results. Our experimental results showed that different fairness ranking strategies resulted in distinct utility scores and performed differently on distinct datasets. Second, to further investigate the relationship between data and fairness algorithms, we developed a statistical framework that was able to facilitate various analyses and decision making. Our framework could effectively and efficiently estimate the domain of data and solution space. We derived theoretical expressions to identify the fairness and relevance bounds for data of different distributions, and applied them to both synthetic and real-world datasets. We presented a series of use cases to demonstrate how our framework was applied to associate data and provide insights into fairness optimization problems. Third, we proposed an evaluation metric, FAIR, for ranking results that encoded fairness, diversity, novelty, and relevance. This metric offered a new perspective on evaluating fairness-aware ranking results. Based on this metric, we developed an effective ranking algorithm that jointly optimized for fairness and utility. Our experiments showed that our new metric was able to highlight results that achieved good user utility and fair information exposure at the same time. We showed how the FAIR metric related to existing metrics through correlation analysis and case studies, and demonstrated the effectiveness of our FAIR-based algorithm.
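The two fairness notions named in the abstract, statistical parity and disparate impact, are commonly computed for a top-k ranking from per-group selection rates. The sketch below uses those common formulations; the dissertation's exact definitions may differ, and the group labels, ranking, and function names are hypothetical. It is unrelated to the FAIR metric, which the abstract does not define.

```python
def selection_rates(ranking, groups, k):
    """Fraction of each group's items that appear in the top-k.

    Assumes every item in `ranking` is unique and has a group label.
    """
    top = set(ranking[:k])
    totals, hits = {}, {}
    for item in ranking:
        group = groups[item]
        totals[group] = totals.get(group, 0) + 1
        hits[group] = hits.get(group, 0) + (1 if item in top else 0)
    return {group: hits[group] / totals[group] for group in totals}


def statistical_parity_difference(rates, protected, other):
    """Selection rate of the protected group minus that of the other group.

    Zero means both groups reach the top-k at the same rate.
    """
    return rates[protected] - rates[other]


def disparate_impact_ratio(rates, protected, other):
    """Ratio of selection rates; 1.0 means parity, lower means under-exposure."""
    if rates[other] == 0:
        return float("inf")
    return rates[protected] / rates[other]


# Hypothetical ranking of six results with two groups, X and Y.
RANKING = ["a", "b", "c", "d", "e", "f"]
GROUPS = {"a": "X", "b": "X", "c": "Y", "d": "X", "e": "Y", "f": "Y"}

RATES = selection_rates(RANKING, GROUPS, k=3)

if __name__ == "__main__":
    print(RATES)
    print(statistical_parity_difference(RATES, protected="Y", other="X"))
    print(disparate_impact_ratio(RATES, protected="Y", other="X"))
```

In this toy ranking, two of three X items but only one of three Y items reach the top 3, so group Y is selected at half the rate of group X.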
