A Relation Pattern-Driven Probability Model for Related Entity Retrieval

As the Web is becoming the largest knowledge repository which contains various entities and their relations, the task of related entity retrieval excites interest in the field of information retrieval. This challenging task is introduced in TREC 2009 Entity Track. In this task, given an entity and the type of the target entity, a retrieval system is required to return a ranked list of related entities extracted from a given large corpus. It means that entity ranking goes beyond entity relevance and integrates the judgment of relation into the evaluation of the retrieved entities. This paper proposes a probability model using relation patterns to address the task of related entity retrieval. This model takes into account both relevance and relation between entities. The authors focus on using relation patterns to measure the level of relations matching between entities, and then to estimate the probability of occurrence of relation between two entities. In addition, the authors represent entity by its context language model and measure the relevance between two entities by a language model. Experimental results on TREC Entity Track dataset show that the proposed model significantly improves retrieval performances over baseline. The comparison with other approaches also reveals the effectiveness of the model.

Download Full-text

SIREn: Entity Retrieval System for the Web of Data

10.14236/ewic/fdia2009.6 ◽

2009 ◽

Cited By ~ 2

Author(s):

Renaud Delbru

Keyword(s):

Retrieval System ◽

Entity Retrieval ◽

Web Of Data ◽

The Web

Download Full-text

Improving Document Retrieval with Spelling Correction for Weak and Fabricated Indonesian-Translated Hadith

Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) ◽

10.29207/resti.v4i3.1913 ◽

2020 ◽

Vol 4 (3) ◽

pp. 551-557

Author(s):

Muhammad zaky ramadhan ◽

Kemas Muslim Lhaksmana

Keyword(s):

Retrieval System ◽

Islamic Law ◽

Vector Space Model ◽

Document Retrieval ◽

The Internet ◽

Average Precision ◽

Spelling Correction ◽

Space Model ◽

The Mean ◽

The Web

Hadith has several levels of authenticity, among which are weak (dhaif), and fabricated (maudhu) hadith that may not originate from the prophet Muhammad PBUH, and thus should not be considered in concluding an Islamic law (sharia). However, many such hadiths have been commonly confused as authentic hadiths among ordinary Muslims. To easily distinguish such hadiths, this paper proposes a method to check the authenticity of a hadith by comparing them with a collection of fabricated hadiths in Indonesian. The proposed method applies the vector space model and also performs spelling correction using symspell to check whether the use of spelling check can improve the accuracy of hadith retrieval, because it has never been done in previous works and typos are common on Indonesian-translated hadiths on the Web and social media raw text. The experiment result shows that the use of spell checking improves the mean average precision and recall to become 81% (from 73%) and 89% (from 80%), respectively. Therefore, the improvement in accuracy by implementing spelling correction make the hadith retrieval system more feasible and encouraged to be implemented in future works because it can correct typos that are common in the raw text on the Internet.

Download Full-text

Hierarchical Concept-Driven Language Model

ACM Transactions on Knowledge Discovery from Data ◽

10.1145/3451167 ◽

2021 ◽

Vol 15 (6) ◽

pp. 1-22

Author(s):

Yashen Wang ◽

Huanhuan Zhang ◽

Zhirun Liu ◽

Qiang Zhou

Keyword(s):

Language Model ◽

Generation Process ◽

Data Generation ◽

Modeling Framework ◽

Long Distance ◽

Short Text ◽

Proposed Model ◽

Scalable Inference ◽

End To End ◽

Hidden Layer

For guiding natural language generation, many semantic-driven methods have been proposed. While clearly improving the performance of the end-to-end training task, these existing semantic-driven methods still have clear limitations: for example, (i) they only utilize shallow semantic signals (e.g., from topic models) with only a single stochastic hidden layer in their data generation process, which suffer easily from noise (especially adapted for short-text etc.) and lack of interpretation; (ii) they ignore the sentence order and document context, as they treat each document as a bag of sentences, and fail to capture the long-distance dependencies and global semantic meaning of a document. To overcome these problems, we propose a novel semantic-driven language modeling framework, which is a method to learn a Hierarchical Language Model and a Recurrent Conceptualization-enhanced Gamma Belief Network, simultaneously. For scalable inference, we develop the auto-encoding Variational Recurrent Inference, allowing efficient end-to-end training and simultaneously capturing global semantics from a text corpus. Especially, this article introduces concept information derived from high-quality lexical knowledge graph Probase, which leverages strong interpretability and anti-nose capability for the proposed model. Moreover, the proposed model captures not only intra-sentence word dependencies, but also temporal transitions between sentences and inter-sentence concept dependence. Experiments conducted on several NLP tasks validate the superiority of the proposed approach, which could effectively infer meaningful hierarchical concept structure of document and hierarchical multi-scale structures of sequences, even compared with latest state-of-the-art Transformer-based models.

Download Full-text

Fielded Sequential Dependence Model for Ad-Hoc Entity Retrieval in the Web of Data

Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval - SIGIR '15 ◽

10.1145/2766462.2767756 ◽

2015 ◽

Cited By ~ 37

Author(s):

Nikita Zhiltsov ◽

Alexander Kotov ◽

Fedor Nikolaev

Keyword(s):

Ad Hoc ◽

Entity Retrieval ◽

Web Of Data ◽

The Web

Download Full-text

Review of Andersen (2012): Exploring Newspaper Language: Using the web to create and investigate a large corpus of modern Norwegian

Terminology ◽

10.1075/term.19.1.07ber ◽

2013 ◽

Vol 19 (1) ◽

pp. 143-148

Author(s):

Gabriel Bernier-Colborne

Keyword(s):

Large Corpus ◽

The Web

Download Full-text

Development of parser agents for bibliographic data retrieval

Bhartiya Krishi Anusandhan Patrika ◽

10.18805/bkap141 ◽

2018 ◽

Vol 33 (4) ◽

Author(s):

Murari Kumar ◽

Samir Farooqi ◽

K. K. Chaturvedi ◽

Chandan Kumar Deb ◽

Pankaj Das

Keyword(s):

Retrieval System ◽

Data Retrieval ◽

Web Pages ◽

Bibliographic Database ◽

Bibliographic Information ◽

Open Access Journals ◽

Bibliographic Records ◽

Bibliographic Data ◽

Structured Information ◽

The Web

Bibliographic data contains necessary information about literature to help users to recognize and retrieve that resource. These data are used quantitatively by a “Bibliometrician” for analysis and dissemination purpose but with the increasing rate of literature publication in open access journals such as Nucleic Acids Research (NAR), Springer, Oxford Journals etc., it has become difficult to retrieve structured bibliographic information in desired format. A digital bibliographic database contains necessary and structured information about published literature. Bibliographic records of different articles are scattered and resides on different web pages. This thesis presents the retrieval system for bibliographic data of NAR at a single place. For this purpose, parser agents have been developed which access the web pages of NAR and parse the scattered bibliographic data and finally store it into a local bibliographic database. Based on the bibliographic database, “three-tier architecture” has been utilized to display the bibliographic information in systematized format. Using this system, it would be possible to build the network between different authors and affiliations and also other analytical reports can be generated.

Download Full-text

Analytical failure probability model for generic gravity dam classes

Proceedings of the Institution of Mechanical Engineers Part O Journal of Risk and Reliability ◽

10.1177/1748006x17712663 ◽

2017 ◽

Vol 231 (5) ◽

pp. 546-557 ◽

Cited By ~ 2

Author(s):

Mohammad Amin Hariri-Ardebili

Keyword(s):

Failure Probability ◽

Safety Assessment ◽

Statistical Approach ◽

Probability Model ◽

Gravity Dam ◽

Preliminary Design ◽

Failure Model ◽

Proposed Model ◽

Dimensional Gravity ◽

Demand And Capacity

Risk analysis of concrete dams and quantification of the failure probability are important tasks in dam safety assessment. The conditional probability of demand and capacity is usually estimated by numerical simulation and Monte Carlo technique. However, the estimated failure probability (or the reliability index) is dam-dependent which makes its application limited to some case studies. This article proposes an analytical failure model for generic gravity dam classes which is optimized based on large number of nonlinear finite element analyses. A hybrid parametric–probabilistic–statistical approach is used to estimate the failure probability as a function of dam size, material distributional models and external hydrological hazard. The proposed model can be used for preliminary design and evaluation of two-dimensional gravity dam models.

Download Full-text

Web Text Mining

The Oxford Handbook of Computational Linguistics 2nd edition ◽

10.1093/oxfordhb/9780199573691.013.27 ◽

2016 ◽

Author(s):

Ricardo Baeza-Yates ◽

Roi Blanco ◽

Malú Castellanos

Keyword(s):

Social Media ◽

Text Mining ◽

Sentiment Analysis ◽

Web Search ◽

Internet Users ◽

Entity Retrieval ◽

Web Text Mining ◽

Text Content ◽

The Web

Web search has become a ubiquitous commodity for Internet users. This fact puts a large number of documents with plenty of text content at our fingertips. To make good use of this data, we need to mine web text. This triggers the two problems covered here: sentiment analysis and entity retrieval in the context of the Web. The first problem answers the question of what people think about a given product or a topic, in particular sentiment analysis in social media. The second problem addresses the issue of solving certain enquiries precisely by returning a particular object: for instance, where the next concert of my favourite band will be or who the best cooks are in a particular region. Where to find these objects and how to retrieve, rank, and display them are tasks related to the entity retrieval problem.

Download Full-text

WikiServe: Using Wikipedia to Match IoT based Services for Situation Response

10.20944/preprints202108.0163.v2 ◽

2021 ◽

Author(s):

Sazid Zaman Khan ◽

Alan Colman ◽

Iqbal H. Sarker

Keyword(s):

Internet Of Things ◽

Natural Disaster ◽

Smart Devices ◽

Baseline Method ◽

Service Matching ◽

Ontological Model ◽

Ranked List ◽

F Measure ◽

The Web ◽

Better Than

A large number of smart devices (things) are being deployed with the swift development of Internet of Things (IOT). These devices, owned by different organizations, have a wide variety of services to offer over the web. During a natural disaster or emergency (i.e., a situation), for example, relevant IOT services can be found and put to use. However, appropriate service matching methods are required to find the relevant services. Organizations that manage situation responses and organizations that provide IOT services are likely to be independent of each other, and therefore it is difficult for them to adopt a common ontological model to facilitate the service matching. Moreover, there exists a large conceptual gap between the domain of discourse for situations and the domain of discourse for services, which cannot be adequately bridged by existing techniques. In this paper, we address these issues and propose a new method, WikiServe, to identify IOT services that are functionally relevant to a given situation. Using concepts (terms) from situation and service descriptions, WikiServe employs Wikipedia as a knowledge source to bridge the conceptual gap between situation and service descriptions and match functionally relevant IOT services for a situation. It uses situation terms to retrieve situation related articles from Wikipedia. Then it creates a ranked list of services for the situation using the weighted occurrences of service terms in weighted situation articles. WikiServe performs better than a commonly used baseline method in terms of Precision, Recall and F measure for service matching.

Download Full-text

Automatic Title Generation for Learning Resources and Pathways with Pre-trained Transformer Models

International Journal of Semantic Computing ◽

10.1142/s1793351x21400134 ◽

2021 ◽

Vol 15 (04) ◽

pp. 487-510

Author(s):

Prakhar Mishra ◽

Chaitali Diwan ◽

Srinath Srinivasa ◽

G. Srinivasaraghavan

Keyword(s):

Online Learning ◽

Language Model ◽

Research Paper ◽

Learning Resources ◽

Research Papers ◽

Medium Length ◽

Proposed Model ◽

Learning Pathways ◽

Test Sets ◽

Transformer Model

To create curiosity and interest for a topic in online learning is a challenging task. A good preview that outlines the contents of a learning pathway could help learners know the topic and get interested in it. Towards this end, we propose a hierarchical title generation approach to generate semantically relevant titles for the learning resources in a learning pathway and a title for the pathway itself. Our approach to Automatic Title Generation for a given text is based on pre-trained Transformer Language Model GPT-2. A pool of candidate titles are generated and an appropriate title is selected among them which is then refined or de-noised to get the final title. The model is trained on research paper abstracts from arXiv and evaluated on three different test sets. We show that it generates semantically and syntactically relevant titles as reflected in ROUGE, BLEU scores and human evaluations. We propose an optional abstractive Summarizer module based on pre-trained Transformer model T5 to shorten medium length documents. This module is also trained and evaluated on research papers from arXiv dataset. Finally, we show that the proposed model of hierarchical title generation for learning pathways has promising results.

Download Full-text