Hypergraph-of-entity

Modern search is heavily powered by knowledge bases, but users still query using keywords or natural language. As search becomes increasingly dependent on the integration of text and knowledge, novel approaches for a unified representation of combined data present the opportunity to unlock new ranking strategies. We have previously proposed the graph-of-entity as a purely graph-based representation and retrieval model; however, this model would scale poorly. We tackle the scalability issue by adapting the model so that it can be represented as a hypergraph. This enables a significant reduction in the number of (hyper)edges, relative to the number of nodes, while capturing nearly the same amount of information. Moreover, such a higher-order data structure can capture richer types of relations, including n-ary connections such as synonymy or subsumption. We present the hypergraph-of-entity as the next step in the graph-of-entity model, where we explore a ranking approach based on biased random walks. We evaluate the approaches using a subset of the INEX 2009 Wikipedia Collection. While performance is still below the state of the art, we were, in part, able to achieve a MAP score similar to TF-IDF and greatly improve indexing efficiency over the graph-of-entity.
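
The edge reduction mentioned in the abstract can be illustrated with a toy comparison. The sketch below is not the paper's construction: the function names and the synonym groups are hypothetical, and it only assumes that an n-ary relation is stored either as a clique of binary edges or as a single hyperedge.

```python
from itertools import combinations


def pairwise_edges(groups):
    """Encode each n-ary group as a clique of undirected binary edges."""
    edges = set()
    for members in groups:
        # Sorting makes (a, b) canonical so each undirected edge is stored once.
        for a, b in combinations(sorted(members), 2):
            edges.add((a, b))
    return edges


def hyperedges(groups):
    """Encode each n-ary group as one hyperedge (a frozenset of nodes)."""
    return {frozenset(members) for members in groups}


# Hypothetical synonym groups, used only for illustration.
synonyms = [
    {"car", "automobile", "auto", "motorcar"},
    {"film", "movie", "picture"},
]

graph = pairwise_edges(synonyms)
hyper = hyperedges(synonyms)
print(len(graph), "binary edges vs", len(hyper), "hyperedges")
```

A group of k nodes needs k(k-1)/2 binary edges but only one hyperedge, which is the kind of saving the abstract refers to.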

Drop Redundant, Shrink Irrelevant: Selective Knowledge Injection for Language Pretraining

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2021/552 ◽

2021 ◽

Author(s):

Ningyu Zhang ◽

Shumin Deng ◽

Xu Cheng ◽

Xi Chen ◽

Yichi Zhang ◽

...

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Prior Knowledge ◽

Language Processing ◽

Empirical Investigation ◽

State Of The Art ◽

Knowledge Bases ◽

Experimental Results ◽

Benchmark Datasets ◽

Injection Methods

Previous research has demonstrated the power of leveraging prior knowledge to improve the performance of deep models in natural language processing. However, traditional methods neglect the fact that redundant and irrelevant knowledge exists in external knowledge bases. In this study, we launched an in-depth empirical investigation into downstream tasks and found that knowledge-enhanced approaches do not always exhibit satisfactory improvements. To this end, we investigate the fundamental reasons for ineffective knowledge infusion and present selective injection for language pretraining, which constitutes a model-agnostic method and is readily pluggable into previous approaches. Experimental results on benchmark datasets demonstrate that our approach can enhance state-of-the-art knowledge injection methods.

RWNE: A Scalable Random-Walk based Network Embedding Framework with Personalized Higher-order Proximity Preserved

Journal of Artificial Intelligence Research ◽

10.1613/jair.1.12567 ◽

2021 ◽

Vol 71 ◽

pp. 237-263

Author(s):

Jianxin Li ◽

Cheng Ji ◽

Hao Peng ◽

Yu He ◽

Yangqiu Song ◽

...

Keyword(s):

Random Walk ◽

Random Walks ◽

Real World ◽

State Of The Art ◽

Higher Order ◽

The State ◽

Random Walk With Restart ◽

Network Embedding ◽

Network Proximity ◽

Embedding Methods

Higher-order proximity preserved network embedding has attracted increasing attention. In particular, due to the superior scalability, random-walk-based network embedding has also been well developed, which could efficiently explore higher-order neighborhoods via multi-hop random walks. However, despite the success of current random-walk-based methods, most of them are usually not expressive enough to preserve the personalized higher-order proximity and lack a straightforward objective to theoretically articulate what and how network proximity is preserved. In this paper, to address the above issues, we present a general scalable random-walk-based network embedding framework, in which random walk is explicitly incorporated into a sound objective designed theoretically to preserve arbitrary higher-order proximity. Further, we introduce the random walk with restart process into the framework to naturally and effectively achieve personalized-weighted preservation of proximities of different orders. We conduct extensive experiments on several real-world networks and demonstrate that our proposed method consistently and substantially outperforms the state-of-the-art network embedding methods.
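
The random walk with restart process named in the abstract can be sketched as a simple power iteration. This is only the generic restart process, not the RWNE objective or embedding method; the function name `random_walk_with_restart`, the `restart` value, and the example graph are assumptions made for illustration.

```python
def random_walk_with_restart(adj, seed, restart=0.15, iters=100):
    """Approximate visiting probabilities of a walk that jumps back to `seed`.

    `adj` maps every node to a non-empty list of its neighbours.
    At each step the walker restarts at `seed` with probability `restart`,
    otherwise it moves to a uniformly chosen neighbour.
    """
    nodes = sorted(adj)
    prob = {v: 1.0 if v == seed else 0.0 for v in nodes}
    for _ in range(iters):
        nxt = {v: 0.0 for v in nodes}
        for u in nodes:
            # Spread the non-restarting mass of u evenly over its neighbours.
            share = (1.0 - restart) * prob[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        # Total probability is 1, so the restarting mass equals `restart`.
        nxt[seed] += restart
        prob = nxt
    return prob


# Hypothetical path graph a - b - c - d.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
scores = random_walk_with_restart(path, seed="a")
```

The restart probability controls how strongly the scores concentrate around the seed, which is one way to read the "personalized" weighting of proximities of different orders described above.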

Improving Natural Language Inference Using External Knowledge in the Science Questions Domain

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33017208 ◽

2019 ◽

Vol 33 ◽

pp. 7208-7215 ◽

Cited By ~ 3

Author(s):

Xiaoyan Wang ◽

Pavan Kapanipathi ◽

Ryan Musa ◽

Mo Yu ◽

Kartik Talamadupula ◽

...

Keyword(s):

Natural Language ◽

Language Processing ◽

Large Scale ◽

Question Answering ◽

State Of The Art ◽

Knowledge Bases ◽

Improve Performance ◽

External Knowledge ◽

Structured Knowledge ◽

Significant Attention

Natural Language Inference (NLI) is fundamental to many Natural Language Processing (NLP) applications including semantic search and question answering. The NLI problem has gained significant attention due to the release of large scale, challenging datasets. Present approaches to the problem largely focus on learning-based methods that use only textual information in order to classify whether a given premise entails, contradicts, or is neutral with respect to a given hypothesis. Surprisingly, the use of methods based on structured knowledge – a central topic in artificial intelligence – has not received much attention vis-a-vis the NLI problem. While there are many open knowledge bases that contain various types of reasoning information, their use for NLI has not been well explored. To address this, we present a combination of techniques that harness external knowledge to improve performance on the NLI problem in the science questions domain. We present the results of applying our techniques on text, graph, and text-and-graph based models; and discuss the implications of using external knowledge to solve the NLI problem. Our model achieves close to state-of-the-art performance for NLI on the SciTail science questions dataset.

The Distinction between Linguistic and Conceptual Semantics in Medical Terminology and its Implication for NLP-Based Knowledge Acquisition

Methods of Information in Medicine ◽

10.1055/s-0038-1634568 ◽

1998 ◽

Vol 37 (04/05) ◽

pp. 327-333 ◽

Cited By ~ 3

Author(s):

F. Buekens ◽

G. De Moor ◽

A. Waagmeester ◽

W. Ceusters

Keyword(s):

Natural Language ◽

Knowledge Acquisition ◽

Natural Language Understanding ◽

Knowledge Bases ◽

Linguistic Knowledge ◽

Medical Terminology ◽

Language Understanding ◽

Conceptual Semantics

Natural language understanding systems have to exploit various kinds of knowledge in order to represent the meaning behind texts. Getting this knowledge in place is often such a huge enterprise that it is tempting to look for systems that can discover such knowledge automatically. We describe how the distinction between conceptual and linguistic semantics may assist in reaching this objective, provided that distinguishing between them is not done too rigorously. We present several examples to support this view and argue that in a multilingual environment, linguistic ontologies should be designed as interfaces between domain conceptualizations and linguistic knowledge bases.

Report on the 4th Joint Workshop on Bibliometric-Enhanced Information Retrieval and Natural Language Processing for Digital Libraries at SIGIR 2019

ACM SIGIR Forum ◽

10.1145/3458553.3458554 ◽

2019 ◽

Vol 53 (2) ◽

pp. 3-10

Author(s):

Muthu Kumar Chandrasekaran ◽

Philipp Mayr

Keyword(s):

Information Retrieval ◽

Natural Language Processing ◽

Natural Language ◽

Research And Development ◽

Language Processing ◽

Digital Libraries ◽

State Of The Art ◽

Shared Task ◽

Processing Information ◽

Joint Workshop

The 4th joint BIRNDL workshop was held at the 42nd ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2019) in Paris, France. BIRNDL 2019 intended to stimulate IR researchers and digital library professionals to elaborate on new approaches in natural language processing, information retrieval, scientometrics, and recommendation techniques that can advance the state-of-the-art in scholarly document understanding, analysis, and retrieval at scale. The workshop incorporated different paper sessions and the 5th edition of the CL-SciSumm Shared Task.

Improved analysis of higher order random walks and applications

Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing ◽

10.1145/3357713.3384317 ◽

2020 ◽

Author(s):

Vedat Levi Alev ◽

Lap Chi Lau

Keyword(s):

Random Walks ◽

Higher Order

Suffix array for multi-pattern matching with variable length wildcards

Intelligent Data Analysis ◽

10.3233/ida-205087 ◽

2021 ◽

Vol 25 (2) ◽

pp. 283-303

Author(s):

Na Liu ◽

Fei Xie ◽

Xindong Wu

Keyword(s):

Dynamic Programming ◽

Data Structure ◽

Pattern Matching ◽

Edit Distance ◽

State Of The Art ◽

Suffix Array ◽

Variable Length ◽

Distance Method ◽

Efficient Data ◽

Comparison Algorithms

Approximate multi-pattern matching is an important and widely used operation, particularly when the patterns contain variable-length wildcards. In this paper, two suffix array-based algorithms are proposed to solve this problem. In existing studies, the suffix array is an efficient data structure for exact string matching, as well as for approximate pattern matching and multi-pattern matching. One algorithm, MMSA-S, handles short exact character runs in a pattern by dynamic programming, while the other, MMSA-L, handles long exact character runs by the edit distance method. Experimental results on the Pizza & Chili corpus demonstrate that these two newly proposed algorithms are, in most cases, more time-efficient than the state-of-the-art comparison algorithms.
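
For readers unfamiliar with the underlying data structure, the sketch below shows plain exact matching on a suffix array. It is not MMSA-S or MMSA-L and handles neither wildcards nor multiple patterns; the function names are hypothetical, and the construction is the naive sort-based one, chosen for brevity rather than efficiency.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of `text` in sorted order."""
    # Naive construction: fine for short inputs, too slow for large corpora.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, suffix_array, pattern):
    """Return all start positions of `pattern` in `text`, in increasing order."""
    m = len(pattern)

    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(suffix_array[start:lo])


text = "banana"
sa = build_suffix_array(text)
hits = find_occurrences(text, sa, "ana")
```

Because the suffixes are sorted, every occurrence of a pattern sits in one contiguous range of the array, so two binary searches are enough to locate all of them.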

Cache-efficient sweeping-based interval joins for extended Allen relation predicates

The VLDB Journal ◽

10.1007/s00778-020-00650-5 ◽

2021 ◽

Author(s):

Danila Piatov ◽

Sven Helmer ◽

Anton Dignös ◽

Fabio Persia

Keyword(s):

Data Structure ◽

Experimental Evaluation ◽

State Of The Art ◽

Temporal Databases ◽

Access Method ◽

Wide Range ◽

Interval Relation ◽

Cache Efficient ◽

Join Algorithms ◽

Better Than

We develop a family of efficient plane-sweeping interval join algorithms for evaluating a wide range of interval predicates such as Allen’s relationships and parameterized relationships. Our technique is based on a framework, components of which can be flexibly combined in different manners to support the required interval relation. In temporal databases, our algorithms can exploit a well-known and flexible access method, the Timeline Index, thus expanding the set of operations it supports even further. Additionally, employing a compact data structure, the gapless hash map, we utilize the CPU cache efficiently. In an experimental evaluation, we show that our approach is several times faster and scales better than state-of-the-art techniques, while being much better suited for real-time event processing.
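
As a rough illustration of the plane-sweeping idea, the sketch below joins two lists of closed intervals on a simple overlap predicate. It is a textbook endpoint sweep, not the authors' framework: it uses neither the Timeline Index nor the gapless hash map, and the function name `overlap_join` and the sample intervals are assumptions.

```python
def overlap_join(left, right):
    """Return sorted index pairs (i, j) where left[i] and right[j] overlap.

    Intervals are closed (start, end) tuples; touching endpoints count
    as overlapping.
    """
    START, END = 0, 1
    events = []
    for i, (start, end) in enumerate(left):
        events.append((start, START, "L", i))
        events.append((end, END, "L", i))
    for j, (start, end) in enumerate(right):
        events.append((start, START, "R", j))
        events.append((end, END, "R", j))
    # At equal timestamps START sorts before END, so touching intervals match.
    events.sort()

    active_left, active_right = set(), set()
    result = []
    for _, kind, side, idx in events:
        if kind == START:
            # A newly opened interval overlaps everything active on the other side.
            if side == "L":
                result.extend((idx, j) for j in active_right)
                active_left.add(idx)
            else:
                result.extend((i, idx) for i in active_left)
                active_right.add(idx)
        else:
            (active_left if side == "L" else active_right).discard(idx)
    return sorted(result)


pairs = overlap_join([(1, 5), (6, 8)], [(4, 7), (9, 10)])
```

Other Allen relations would need different event ordering and different checks when an endpoint is reached; this sketch covers only the generic overlap case.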

Large-scale Semantic Parsing without Question-Answer Pairs

Transactions of the Association for Computational Linguistics ◽

10.1162/tacl_a_00190 ◽

2014 ◽

Vol 2 ◽

pp. 377-392 ◽

Cited By ~ 40

Author(s):

Siva Reddy ◽

Mirella Lapata ◽

Mark Steedman

Keyword(s):

Natural Language ◽

Large Scale ◽

Graph Matching ◽

State Of The Art ◽

The State ◽

Semantic Parsing ◽

Matching Problem ◽

Weak Supervision ◽

Benchmark Datasets

In this paper we introduce a novel semantic parsing approach to query Freebase in natural language without requiring manual annotations or question-answer pairs. Our key insight is to represent natural language via semantic graphs whose topology shares many commonalities with Freebase. Given this representation, we conceptualize semantic parsing as a graph matching problem. Our model converts sentences to semantic graphs using CCG and subsequently grounds them to Freebase guided by denotations as a form of weak supervision. Evaluation experiments on a subset of the Free917 and WebQuestions benchmark datasets show our semantic parser improves over the state of the art.

The Riddle of Vagueness

10.1093/oso/9780199277339.001.0001 ◽

2021 ◽

Author(s):

Crispin Wright

Keyword(s):

Natural Language ◽

Borderline Case ◽

Higher Order ◽

Cutting Edge ◽

Sorites Paradox ◽

Systematic Research ◽

Pure Mathematics ◽

Key Issues ◽

Critical Overview ◽

Original Treatment

This anthology includes fourteen of Crispin Wright’s highly influential essays on the phenomenon of vagueness in natural language, collectively representing almost half a century of cutting-edge systematic research. Key issues addressed include whether or under what assumptions vague expressions’ apparent tolerance of marginal changes in things to which they apply indicates that they are governed by inconsistent semantic rules, the varieties of Sorites paradox and the roots of the plausibility of their respective major premises, what it is for something to be a borderline case of a vague expression, whether vagueness should be viewed as fundamentally a semantic or an epistemic phenomenon, whether there is ‘higher-order’ vagueness, and what should be the appropriate logic for vague statements. The essays reprinted here jointly document the development of a distinctively original treatment of the philosophy and logic of vagueness, broadly analogous to the intuitionistic philosophy and logic for pure mathematics. Richard Kimberly Heck contributes an extended introductory essay, providing both an insightful critical overview of the development of the distinctive elements of Wright’s thought about vagueness, and indeed an invaluable advanced introduction to the topic.
