Semantic Table Retrieval Using Keyword and Table Queries

Tables on the Web contain a vast amount of knowledge in a structured form. To tap into this valuable resource, we address the problem of table retrieval: answering an information need with a ranked list of tables. We investigate this problem in two different variants, based on how the information need is expressed: as a keyword query or as an existing table (“query-by-table”). The main novel contribution of this work is a semantic table retrieval framework for matching information needs (keyword or table queries) against tables. Specifically, we (i) represent queries and tables in multiple semantic spaces (both discrete sparse and continuous dense vector representations) and (ii) introduce various similarity measures for matching those semantic representations. We consider all possible combinations of semantic representations and similarity measures and use these as features in a supervised learning model. Using two purpose-built test collections based on Wikipedia tables, we demonstrate significant and substantial improvements over state-of-the-art baselines.
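As a rough illustration of the idea of matching a query and a table in more than one semantic space, the sketch below computes one similarity in a dense space and one in a sparse space and collects them as features. The specific measures (cosine of embedding centroids, Jaccard overlap of entity sets), the feature names, and the toy data are all assumptions made for illustration; the abstract does not state which measures the framework uses.

```python
import math

def centroid(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cosine(u, v):
    """Cosine similarity of two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def jaccard(a, b):
    """Jaccard overlap of two discrete (sparse) representations."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical toy representations of one query and one table.
query_vecs = [[1.0, 0.0], [0.0, 1.0]]          # e.g. one embedding per query term
table_vecs = [[1.0, 1.0]]                      # e.g. one embedding per table term
query_entities = {"Paris", "France"}
table_entities = {"France", "Germany", "Paris"}

# Each (representation, measure) pair yields one feature for a learned ranker.
features = {
    "dense_centroid_cosine": cosine(centroid(query_vecs), centroid(table_vecs)),
    "sparse_entity_jaccard": jaccard(query_entities, table_entities),
}
```

A supervised ranking model would then take one such feature vector per query-table pair as input.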

TIPS: Time-aware Personalised Semantic-based query auto-completion

Journal of Information Science ◽

10.1177/0165551520968690 ◽

2020 ◽

pp. 016555152096869

Author(s):

Saedeh Tahery ◽

Saeed Farzi

Keyword(s):

Search Engines ◽

Information Needs ◽

Semantic Information ◽

State Of The Art ◽

Language Model ◽

Experimental Studies ◽

Short Length ◽

Context Aware ◽

Ranked List ◽

Time Aware

With the rapid growth of the Internet, search engines play a vital role in meeting users’ information needs. However, formulating an information need as a simple query remains a problem for ordinary users. Therefore, query auto-completion, one of the most important features of search engines, is leveraged to provide a ranked list of queries matching the prefix the user has entered. Although query auto-completion utilises useful information provided by search engine logs, time-, semantic- and context-aware features are still important sources of extra knowledge. Specifically, in this study, a hybrid query auto-completion system called TIPS (Time-aware Personalised Semantic-based query auto-completion) is introduced to combine well-known systems based on popularity and on a neural language model. Furthermore, this system is supplemented by time-aware features that blend both context and semantic information in a collaborative manner. Experimental studies on the standard AOL dataset are conducted to compare the proposed system with state-of-the-art methods, namely FactorCell, ConcatCell and Unadapted. The results illustrate the significant superiority of TIPS in terms of mean reciprocal rank (MRR), especially for short prefixes.
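For readers unfamiliar with the metric, mean reciprocal rank can be sketched as follows. For each prefix, the score is 1 divided by the rank at which the query the user actually submitted appears in the suggestion list, or 0 if it does not appear, and these scores are averaged over all prefixes. The function name and the toy suggestion lists are hypothetical and are not taken from the paper or the AOL dataset.

```python
def mean_reciprocal_rank(ranked_lists, targets):
    """Average of 1/rank of each target in its ranked list (0 if absent)."""
    total = 0.0
    for candidates, target in zip(ranked_lists, targets):
        if target in candidates:
            total += 1.0 / (candidates.index(target) + 1)
    return total / len(targets) if targets else 0.0

# Hypothetical completions for three prefixes and the queries users submitted.
suggestions = [
    ["facebook", "fandango"],   # target at rank 2 -> 1/2
    ["youtube", "yahoo"],       # target at rank 1 -> 1
    ["amazon"],                 # target missing   -> 0
]
submitted = ["fandango", "youtube", "ebay"]

mrr = mean_reciprocal_rank(suggestions, submitted)  # (0.5 + 1 + 0) / 3 = 0.5
```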

Cheap IR evaluation

ACM SIGIR Forum ◽

10.1145/3483382.3483400 ◽

2020 ◽

Vol 54 (2) ◽

pp. 1-2

Author(s):

Kevin Roitero

Keyword(s):

Statistical Power ◽

Information Needs ◽

Web Search ◽

State Of The Art ◽

Extensive Study ◽

Test Collection ◽

Test Collections ◽

Fine Grained ◽

Retrieval Systems ◽

Ranked List

To evaluate Information Retrieval (IR) effectiveness, a possible approach is to use test collections, which are composed of a collection of documents, a set of descriptions of information needs (called topics), and a set of relevant documents for each topic. Test collections are modelled in a competition scenario: for example, in the well-known TREC initiative, participants run their own retrieval systems over a set of topics and provide a ranked list of retrieved documents; some of the retrieved documents (usually the top-ranked ones) constitute the so-called pool, and their relevance is evaluated by human assessors; the document list is then used to compute effectiveness metrics and rank the participant systems. Private Web Search companies also run their in-house evaluation exercises; although the details are mostly unknown, and the aims are somewhat different, the overall approach shares several issues with the test collection approach. The aim of this work is to: (i) develop and improve some state-of-the-art work on the evaluation of IR effectiveness while saving resources, and (ii) propose a novel, more principled and engineered, overall approach to test collection based effectiveness evaluation. In this thesis we focus on three main directions. The first part details the use of few topics (i.e., information needs) in retrieval evaluation and presents an extensive study of the effect of using fewer topics in terms of number of topics, topic subsets, and statistical power. The second part discusses evaluation without relevance judgements, reproducing, extending, and generalizing state-of-the-art methods and investigating their combinations by means of data fusion techniques and machine learning. Finally, the third part uses crowdsourcing to gather relevance labels and, in particular, shows the effect of using fine-grained judgement scales; furthermore, it explores methods to transform judgements between different relevance scales. Awarded by: University of Udine, Udine, Italy on 19 March 2020. Supervised by: Professor Stefano Mizzaro. Available at: https://kevinroitero.com/resources/kr-phd-thesis.pdf.

Knowledge Transfer for Entity Resolution with Siamese Neural Networks

Journal of Data and Information Quality ◽

10.1145/3410157 ◽

2021 ◽

Vol 13 (1) ◽

pp. 1-25

Author(s):

Michael Loster ◽

Ioannis Koumarelas ◽

Felix Naumann

Keyword(s):

Knowledge Transfer ◽

Similarity Measure ◽

State Of The Art ◽

Similarity Measures ◽

Engineering Process ◽

Domain Experts ◽

Multiple Datasets ◽

Multiple Data ◽

Domain Expertise ◽

F Measure

The integration of multiple data sources is a common problem in a large variety of applications. Traditionally, handcrafted similarity measures are used to discover, merge, and integrate multiple representations of the same entity (duplicates) into a large homogeneous collection of data. Often, these similarity measures do not cope well with the heterogeneity of the underlying dataset. In addition, domain experts are needed to manually design and configure such measures, which is both time-consuming and requires extensive domain expertise. We propose a deep Siamese neural network, capable of learning a similarity measure that is tailored to the characteristics of a particular dataset. With the properties of deep learning methods, we are able to eliminate the manual feature engineering process and thus considerably reduce the effort required for model construction. In addition, we show that it is possible to transfer knowledge acquired during the deduplication of one dataset to another, and thus significantly reduce the amount of data required to train a similarity measure. We evaluate our method on multiple datasets and compare our approach to state-of-the-art deduplication methods. Our approach outperforms competitors by up to +26 percent F-measure, depending on task and dataset. In addition, we show that knowledge transfer is not only feasible, but in our experiments led to an improvement in F-measure of up to +4.7 percent.

An interactive query-based approach for summarizing scientific documents

Information Discovery and Delivery ◽

10.1108/idd-10-2020-0124 ◽

2021 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Farnoush Bayatmakou ◽

Azadeh Mohebi ◽

Abbas Ahmadi

Keyword(s):

User Satisfaction ◽

Information Needs ◽

Specific Information ◽

Information Need ◽

Science Data ◽

Data Set ◽

Content Type ◽

Interactive Query ◽

Summarization System ◽

Clear Idea

Purpose: Query-based summarization approaches might not be able to provide summaries compatible with the user’s information need, as they mostly rely on a limited source of information, usually a single query from the user. This issue becomes even more challenging when dealing with scientific documents, as they contain more specific subject-related terms, while the user may not be able to express his/her specific information need in a query with limited terms. This study proposes an interactive multi-document text summarization approach that generates a summary more compatible with the user’s information need. The approach allows the user to interactively specify the composition of a multi-document summary. Design/methodology/approach: The approach exploits the user’s opinion in two stages. The initial query is refined by user-selected keywords/keyphrases and complete sentences extracted from the set of retrieved documents. This is followed by a novel method for sentence expansion using a genetic algorithm, and the final set of sentences is ranked using the maximal marginal relevance method. For implementation, the Web of Science data set in the artificial intelligence (AI) category is used. Findings: The proposed approach receives feedback from the user in the form of favorable keywords and sentences, and this feedback ultimately improves the final summary. To assess the performance of the proposed system, 45 users, all graduate students in the field of AI, were asked to fill out a questionnaire. The quality of the final summary was also evaluated from the user’s perspective and in terms of information redundancy. The proposed approach was found to lead to higher degrees of user satisfaction than approaches with no interaction or only one step of interaction.
Originality/value: The interactive summarization approach goes beyond the user’s initial query by including the user’s preferred keywords/keyphrases and sentences through a systematic interaction. Through these interactions, the system gives the user a clearer idea of the information he/she is looking for and consequently adjusts the final result to the ultimate information need. Such interaction allows the summarization system to achieve a comprehensive understanding of the user’s information needs while expanding context-based knowledge and guiding the user through his/her information journey.
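The maximal marginal relevance (MMR) ranking step mentioned in the abstract can be sketched as a greedy loop that trades relevance to the query against redundancy with sentences already chosen. This is a minimal sketch of the general technique, not the authors' implementation: the word-overlap similarity, the `lam` trade-off parameter, and the example sentences are assumptions for illustration.

```python
def jaccard(a, b):
    """Word-overlap similarity between two strings (an assumed stand-in)."""
    a, b = set(a.lower().split()), set(b.lower().split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def mmr_rank(query, sentences, k, lam=0.7, sim=jaccard):
    """Greedily pick up to k sentences by maximal marginal relevance."""
    selected = []
    remaining = list(sentences)
    while remaining and len(selected) < k:
        def score(s):
            # Redundancy is the highest similarity to anything already selected.
            redundancy = max((sim(s, t) for t in selected), default=0.0)
            return lam * sim(s, query) - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical candidate sentences; the second is an exact duplicate of the first.
candidates = [
    "neural summarization of scientific documents",
    "neural summarization of scientific documents",
    "genetic algorithm for sentence expansion",
]
summary = mmr_rank("neural summarization scientific documents", candidates, k=2, lam=0.5)
```

With `lam=0.5` the duplicate is penalised enough that the second pick is the less relevant but non-redundant sentence; a larger `lam` weights relevance more heavily and can let near-duplicates through.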

Information Needs of Women Regarding Health and Hygiene Practices

International Journal of TROPICAL DISEASE & Health ◽

10.9734/ijtdh/2020/v41i330257 ◽

2020 ◽

pp. 1-7

Author(s):

Loveleen Kaur ◽

Sukhjeet Kaur ◽

Preeti Sharma

Keyword(s):

Information Needs ◽

Low Cost ◽

Active Role ◽

Individual Growth ◽

Information Need ◽

Growth And Survival ◽

Hygiene Practices ◽

Rural And Urban ◽

Mass Media Exposure ◽

Communication Methods

Information is a source of power and is important for individual growth and survival. Information about health and hygiene is crucial because it influences an individual’s quality of life. Women play an active role in obtaining information about health and hygiene practices; hence there is a need to study their information needs in this area. Once these needs are identified, information can be made accessible to women accordingly. With this in mind, the present study was conducted in the Ludhiana district of Punjab. Data were collected from 200 rural and urban women aged 25-50 years with the help of an interview schedule. Health and hygiene practices were studied under three categories: personal, food-related and household health and hygiene practices. Information needs were studied on a three-point continuum, i.e. highly needed, somewhat needed and not needed. The results showed that, under personal health and hygiene practices, information on hair care and obesity was most needed. The most needed information regarding food-related health and hygiene was on low-cost nutritious recipes. For household health and hygiene practices, the major information need reported by the respondents related to the control of insects and pests. The majority of respondents had a low level of information need across all health and hygiene practices. The information needs of the women were positively related to their education and mass media exposure, whereas age was negatively correlated with their information needs. Consequently, there is a need to educate women regarding health and hygiene practices through effective communication methods, so that they can realize the importance of, and need for, information on these topics.

The Influence of TikTok Social Media on the Sex Education Information Needs of Generation Z

JISIP (Jurnal Ilmu Sosial dan Pendidikan) ◽

10.36312/jisip.v6i1.2849 ◽

2022 ◽

Vol 6 (1) ◽

Author(s):

Ahmad Fahri Ramadhan ◽

Muhammad Ramdhani ◽

Wahyu Utamidewi

Keyword(s):

Sex Education ◽

Information Needs ◽

Media Content ◽

Message Content ◽

Information Need ◽

Literature Study ◽

Analysis Technique ◽

The World ◽

Effect Theory ◽

Data Collection Technique

Sex education is still considered a taboo topic in Indonesia. TikTok, a popular application worldwide in 2020, is used by the @tabu.id account as a medium to meet this information need. The purpose of this study was to determine the effect of the intensity, media content and attractiveness of social media use on the TikTok @tabu.id account on the fulfillment of sex education information needs. This study uses a quantitative approach with an explanatory survey. The theory used is the Uses Effect Theory. The data collection techniques used are a questionnaire and a literature study, and the data are analysed using a Likert scale. The results show that intensity, message content and attractiveness affect the need for information about sex education. While the magnitude of the influence of sex education information is 6.75%, the magnitude of the influence of infographic messages on sex education information is 33.26% and the magnitude of the influence of sex education information is 15.02%.

Relevance-guided Supervision for OpenQA with ColBERT

Transactions of the Association for Computational Linguistics ◽

10.1162/tacl_a_00405 ◽

2021 ◽

Vol 9 ◽

pp. 929-944

Author(s):

Omar Khattab ◽

Christopher Potts ◽

Matei Zaharia

Keyword(s):

Question Answering ◽

State Of The Art ◽

Training Data ◽

Coarse Grained ◽

Retrieval Model ◽

Open Domain ◽

Weak Supervision ◽

Fine Grained ◽

Vector Representations ◽

Large Corpus

Systems for Open-Domain Question Answering (OpenQA) generally depend on a retriever for finding candidate passages in a large corpus and a reader for extracting answers from those passages. In much recent work, the retriever is a learned component that uses coarse-grained vector representations of questions and passages. We argue that this modeling choice is insufficiently expressive for dealing with the complexity of natural language questions. To address this, we define ColBERT-QA, which adapts the scalable neural retrieval model ColBERT to OpenQA. ColBERT creates fine-grained interactions between questions and passages. We propose an efficient weak supervision strategy that iteratively uses ColBERT to create its own training data. This greatly improves OpenQA retrieval on Natural Questions, SQuAD, and TriviaQA, and the resulting system attains state-of-the-art extractive OpenQA performance on all three datasets.
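The "fine-grained interactions" refer to ColBERT's late-interaction scoring, usually called MaxSim: every question token embedding is matched to its most similar passage token embedding, and these maxima are summed. The sketch below shows only that scoring rule on hand-made two-dimensional vectors; the embeddings are hypothetical (assumed to be unit-normalised, so the dot product acts as cosine similarity), and no encoder, index, or training step from the paper is reproduced.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def maxsim_score(question_vecs, passage_vecs):
    """Sum, over question tokens, of the best-matching passage token similarity."""
    return sum(max(dot(q, p) for p in passage_vecs) for q in question_vecs)

# Hypothetical per-token embeddings (unit-normalised).
question = [[1.0, 0.0], [0.0, 1.0]]
passage_a = [[1.0, 0.0], [0.0, 1.0]]   # has a close match for both question tokens
passage_b = [[1.0, 0.0]]               # matches only the first question token

score_a = maxsim_score(question, passage_a)  # 1.0 + 1.0
score_b = maxsim_score(question, passage_b)  # 1.0 + 0.0
ranked = sorted([("a", score_a), ("b", score_b)], key=lambda x: x[1], reverse=True)
```

Unlike a single-vector retriever, which would compare one pooled question vector with one pooled passage vector, this score keeps a separate match for each question token.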

User-Centric Similarity and Proximity Measures for Spatial Personalization

International Journal of Data Warehousing and Mining ◽

10.4018/jdwm.2010040104 ◽

2010 ◽

Vol 6 (2) ◽

pp. 59-78 ◽

Cited By ~ 5

Author(s):

Yanwu Yang ◽

Christophe Claramunt ◽

Marie-Aude Aufaure ◽

Wensheng Zhang

Keyword(s):

Information Needs ◽

Spatial Information ◽

Personal Information ◽

Similarity Measures ◽

Spatial Proximity ◽

Mobile Environments ◽

Conceptual Approach ◽

Semantic Domain ◽

User Centric ◽

Proximity Measures

Spatial personalization can be defined as a novel way to fulfill user information needs when accessing spatial information services either on the web or in mobile environments. The research presented in this paper introduces a conceptual approach that models the spatial information offered to a given user as a user-centered conceptual map, together with spatial proximity and similarity measures that consider her/his location, interests and preferences. This approach is based on the concepts of similarity in the semantic domain and proximity in the spatial domain, while taking into account the user’s personal information. Accordingly, these spatial proximity and similarity measures could directly support the derivation of personalization services and the refinement of the way spatial information is made accessible to the user in spatially related applications. These modeling approaches are illustrated by some experimental case studies.

Information Needs of Cancer Patients are Influenced by Time Since Diagnosis, Stage of Cancer, Patients’ Age, and Preferred Role in Treatment-related Decisions

Evidence Based Library and Information Practice ◽

10.18438/b8g01g ◽

2006 ◽

Vol 1 (3) ◽

pp. 80 ◽

Cited By ~ 1

Author(s):

John Loy

Keyword(s):

Cancer Patients ◽

Information Seeking ◽

Information Needs ◽

English Language ◽

Meta Analysis ◽

Active Role ◽

Post Treatment ◽

Inclusion Criteria ◽

Information Need ◽

Need For Information

A review of: Kalyani, Ankem. “Factors Influencing Information Needs Among Cancer Patients: A Meta-Analysis.” Library & Information Science Research; 28.1 (2006) 7-23. Objective – The author aims to study the aggregate influence of demographic and situational variables on the information needs of cancer patients, in order to inform the provision of information to those patients. Design – Meta-analysis. Setting – Research articles published in the MEDLINE and CINAHL databases. Subjects – English language studies published between 1993 and 2003. An initial search set of 196 studies from MEDLINE and 283 studies from CINAHL were identified. Following rigorous assessment, 12 studies met the inclusion criteria. Methods – A comprehensive search of the databases was conducted, initially combining “neoplasm” with “cancer patients” using the Boolean “or”. These results were then combined with five separate searches using the following terms; information need(s), information seeking, information seeking behaviour, information source(s) and information resource(s). This identified in total 479 English language articles. Based on a review of titles and abstracts, 110 articles were found covering information resources or the information needs of cancer patients. These articles were then subjected to the further inclusion criteria and limited to studies which included: analysis of information needs and/or information sources of cancer patients; adults as subjects of the research; and application of quantitative research methods and relevant statistics. This eliminated a further 35 papers. Twelve of the remaining 75 studies were selected for meta-analysis based on their use of the same variables measured consistently in comparable units. The final 12 studies included various forms of cancer, and no distinction was made among them. All 12 studies appeared in peer-reviewed journals. 
Main results – The meta-analysis found there was consistently no difference between the information needs of men and women. Five subsets were identified within the meta-analysis, and findings for each can be stated as follows: The younger the age of the patient, the greater their overall need for information was likely to be. During treatment, the time elapsed from the diagnosis to the information need was not significant. Once identified, the information need remained constant. During treatment and post-treatment phases, the time elapsed from the diagnosis to the information need made no significant difference, with the information need remaining constant and continuing into the post-treatment phase. The stage of cancer made no difference to the need for information. Those patients in the advanced stages of cancer required an equal amount of information to those in the early stages of cancer. The individual patient’s preferred role in treatment-related decisions made a difference to the information need. Patients who took an active role in treatment-related decisions had a greater need for information than those who did not take an active role. Conclusion – Findings from this meta-analysis can be used to guide information provision to cancer patients, specifically taking patient age and preferred role in treatment decision-making into consideration. Further research into the reasons behind the lower information needs among older patients is called for by the author.

Combining Axiom Injection and Knowledge Base Completion for Efficient Natural Language Inference

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33017410 ◽

2019 ◽

Vol 33 ◽

pp. 7410-7417 ◽

Cited By ~ 1

Author(s):

Masashi Yoshikawa ◽

Koji Mineshima ◽

Hiroshi Noji ◽

Daisuke Bekki

Keyword(s):

Natural Language ◽

Knowledge Base ◽

Processing Speed ◽

Processing Time ◽

State Of The Art ◽

Proof Automation ◽

New Knowledge ◽

Textual Entailment ◽

Amount Of Knowledge ◽

Recognizing Textual Entailment

In logic-based approaches to reasoning tasks such as Recognizing Textual Entailment (RTE), it is important for a system to have a large amount of knowledge data. However, there is a tradeoff between adding more knowledge data for improved RTE performance and maintaining an efficient RTE system, as such a big database is problematic in terms of the memory usage and computational complexity. In this work, we show the processing time of a state-of-the-art logic-based RTE system can be significantly reduced by replacing its search-based axiom injection (abduction) mechanism by that based on Knowledge Base Completion (KBC). We integrate this mechanism in a Coq plugin that provides a proof automation tactic for natural language inference. Additionally, we show empirically that adding new knowledge data contributes to better RTE performance while not harming the processing speed in this framework.
