Approximate entity extraction in temporal databases

2011 ◽  
Vol 14 (2) ◽  
pp. 157-186 ◽  
Author(s):  
Wei Lu ◽  
Gabriel Pui Cheong Fung ◽  
Xiaoyong Du ◽  
Xiaofang Zhou ◽  
Lijiang Chen ◽  
...  
2018 ◽  
Vol 110 (1) ◽  
pp. 85-101 ◽  
Author(s):  
Ronald Cardenas ◽  
Kevin Bello ◽  
Alberto Coronado ◽  
Elizabeth Villota

Abstract Managing large collections of documents is an important problem for many areas of science, industry, and culture. Probabilistic topic modeling offers a promising solution. Topic modeling is an unsupervised machine learning method, and evaluating such models is an interesting problem in its own right. Topic interpretability measures have been developed in recent years as a more natural option for topic quality evaluation, emulating human perception of coherence through correlation scores over word sets. In this paper, we show experimental evidence that topic coherence scores improve when the training corpus is restricted to the relevant information in each document, as identified by entity recognition. We experiment with job advertisement data and find that with this approach topic models improve interpretability by about 40 percentage points on average. Our analysis also reveals that, when the extracted text chunks are used, some redundant topics are merged while others are split into more skill-specific topics. Fine-grained topics observed in models using the whole text are preserved.
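The abstract does not say which coherence measure was used. As an illustration only, here is a minimal sketch of one widely used word-set correlation score, normalized pointwise mutual information (NPMI), averaged over all pairs of a topic's top words. It assumes document-level co-occurrence counts, documents given as sets of tokens, and the common conventions of scoring never-co-occurring pairs as -1 and always-co-occurring pairs as 1. The function names `doc_prob` and `npmi_coherence` are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Sequence, Set


def doc_prob(documents: List[Set[str]], *words: str) -> float:
    """Fraction of documents that contain every word in `words`."""
    hits = sum(1 for doc in documents if all(w in doc for w in words))
    return hits / len(documents)


def npmi_coherence(top_words: Sequence[str], documents: List[Set[str]]) -> float:
    """Mean NPMI over all pairs of a topic's top words (document co-occurrence).

    Illustrative sketch, not necessarily the measure used in the paper.
    """
    if len(top_words) < 2:
        raise ValueError("need at least two top words")
    scores = []
    for wi, wj in combinations(top_words, 2):
        p_ij = doc_prob(documents, wi, wj)
        if p_ij == 0.0:
            scores.append(-1.0)  # pair never co-occurs: NPMI lower bound (convention)
        elif p_ij == 1.0:
            scores.append(1.0)   # pair co-occurs in every document: upper bound (convention)
        else:
            # p_ij > 0 implies both marginals are > 0, so no division by zero here.
            pmi = math.log(p_ij / (doc_prob(documents, wi) * doc_prob(documents, wj)))
            scores.append(pmi / -math.log(p_ij))
    return sum(scores) / len(scores)


docs = [{"python", "java"}, {"python", "java"}, {"cooking"}, {"cooking", "python"}]
print(npmi_coherence(["python", "java"], docs))  # about 0.415
```

Higher averages indicate top words that tend to appear together, which is the intuition behind using such scores as a proxy for human judgments of coherence.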


IEEE Access ◽  
2021 ◽  
Vol 9 ◽  
pp. 40216-40226 ◽  
Author(s):  
Zhe Kong ◽  
Changxi Yue ◽  
Ying Shi ◽  
Jicheng Yu ◽  
Changjun Xie ◽  
...  

2021 ◽  
Author(s):  
Danila Piatov ◽  
Sven Helmer ◽  
Anton Dignös ◽  
Fabio Persia

Abstract We develop a family of efficient plane-sweeping interval join algorithms for evaluating a wide range of interval predicates, such as Allen’s relationships and parameterized relationships. Our technique is based on a framework whose components can be flexibly combined to support the required interval relation. In temporal databases, our algorithms can exploit a well-known and flexible access method, the Timeline Index, thus expanding the set of operations it supports even further. Additionally, by employing a compact data structure, the gapless hash map, we use the CPU cache efficiently. In an experimental evaluation, we show that our approach is several times faster and scales better than state-of-the-art techniques, while being much better suited for real-time event processing.
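To illustrate the basic plane-sweep idea only, here is a minimal textbook sketch of an interval join for the general overlap predicate, which is a union of several of Allen's relations. It is not the authors' framework: it uses neither the Timeline Index nor the gapless hash map. A plain Python set stands in for the active-interval structure. It assumes half-open integer intervals `[start, end)` and identifiers that are unique within each relation. The names `Interval` and `overlap_join` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical interval record: (start, end, id), half-open [start, end).
Interval = Tuple[int, int, str]


def overlap_join(r: List[Interval], s: List[Interval]) -> List[Tuple[str, str]]:
    """Report every (r_id, s_id) pair whose intervals overlap, using one plane sweep."""
    events = []
    for rel, intervals in (("r", r), ("s", s)):
        for start, end, ident in intervals:
            if start >= end:
                raise ValueError(f"empty interval {ident!r}")
            events.append((start, 1, rel, ident))  # 1 = start event
            events.append((end, 0, rel, ident))    # 0 = end event
    # Sorting by (time, kind) handles end events before start events at the
    # same time point, so intervals that merely touch ([1, 5) and [5, 9))
    # are not reported as overlapping.
    events.sort()
    active = {"r": set(), "s": set()}  # intervals currently crossed by the sweep line
    result = []
    for _time, kind, rel, ident in events:
        if kind == 0:
            active[rel].remove(ident)  # assumes ids are unique within a relation
            continue
        other = "s" if rel == "r" else "r"
        # Every active interval of the other relation contains the current
        # point, so it overlaps the interval that starts here.
        for o in active[other]:
            result.append((ident, o) if rel == "r" else (o, ident))
        active[rel].add(ident)
    return sorted(result)


r = [(1, 5, "a"), (6, 9, "b")]
s = [(4, 7, "x"), (9, 12, "y")]
print(overlap_join(r, s))  # [('a', 'x'), ('b', 'x')]
```

Each start event emits only pairs that really overlap, so after the initial sort the sweep costs time proportional to the number of events plus the output size. Other predicates would need different event handling or filtering. The abstract describes a framework for combining such components.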


2004 ◽  
Vol 16 (2) ◽  
pp. 123-163 ◽  
Author(s):  
Dengfeng Gao ◽  
Jose Alvin G. Gendrano ◽  
Bongki Moon ◽  
Richard T. Snodgrass ◽  
Minseok Park ◽  
...  
