Enhancing collective entity resolution utilizing Quasi-Clique similarity measure

Author(s):  
Zhang Yongxin ◽  
Li Qingzhong ◽  
Bian Ji
2021 ◽  
pp. 127-140
Author(s):  
Xinming Li ◽  
John R. Talburt ◽  
Ting Li ◽  
Xiangwen Liu

Errors in names occur frequently. “California” and “CA” refer to the same US state, yet both may appear as records in the same database. Techniques are needed to resolve such problems. In this chapter, the authors introduce methods for entity resolution on names and present three of them. Similarity measures between names are a fundamental technique and contribute significantly to assessing textual similarity. String transformations can handle some cases that lie beyond textual similarity. More recently, learning algorithms for string transformations have been proposed to make matching robust to such variations. Examples illustrate the benefits of each approach.
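The contrast between textual similarity and string transformations can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the chapter's method: the `TRANSFORMATIONS` table, the function names, and the 0.85 threshold are all hypothetical, and `difflib.SequenceMatcher` stands in for whatever similarity measure the authors use.

```python
from difflib import SequenceMatcher

# Hypothetical transformation rules (assumption); a real system would
# curate such a table or learn it from labeled matches.
TRANSFORMATIONS = {"ca": "california", "ny": "new york"}


def normalize(name: str) -> str:
    """Lowercase, collapse whitespace, and expand known abbreviations."""
    key = " ".join(name.lower().split())
    return TRANSFORMATIONS.get(key, key)


def textual_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1], with no transformations applied."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def names_match(a: str, b: str, threshold: float = 0.85) -> bool:
    """Apply transformations first, then compare by textual similarity."""
    return textual_similarity(normalize(a), normalize(b)) >= threshold
```

Under these assumptions, textual similarity alone tolerates a misspelling such as "Californa", while the pair "California" / "CA" only matches once the abbreviation rule has been applied.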


Author(s):  
Mohana Priya K ◽  
Pooja Ragavi S ◽  
Krishna Priya G

Clustering is the process of grouping objects into subsets that are meaningful in the context of a particular problem. It does not rely on predefined classes. It is referred to as an unsupervised learning method because no information is provided about the "right answer" for any of the objects. Many clustering algorithms have been proposed and are used in different applications. Sentence clustering is one of the best clustering techniques. A hierarchical clustering algorithm is applied at multiple levels to improve accuracy. For tagging, a POS tagger and the Porter stemmer are used. The WordNet dictionary is used to determine similarity through the Jiang-Conrath and cosine similarity measures. Grouping is performed with respect to the highest similarity value, using a mean threshold. This paper incorporates many parameters for finding similarity between words. To disambiguate words, sense identification is performed for adjectives and the results are compared. The SemCor and machine learning datasets are employed. Compared with previous results for WSD, our work shows a considerable improvement, reaching 91.2%.
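The idea of grouping sentences by a similarity measure with a mean threshold can be sketched as follows. This is a simplified illustration and not the paper's pipeline: it uses bag-of-words cosine similarity only, omits WordNet, the Jiang-Conrath measure, POS tagging, and stemming, and assumes that the "mean threshold" is the mean of all pairwise similarities. Linking every pair above the threshold and taking connected components corresponds to cutting a single-linkage hierarchy at that level. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster_sentences(sentences):
    """Group sentences whose pairwise similarity exceeds the mean similarity."""
    vectors = [Counter(s.lower().split()) for s in sentences]
    sims = {
        (i, j): cosine(vectors[i], vectors[j])
        for i, j in combinations(range(len(sentences)), 2)
    }
    # Assumed interpretation of the "mean threshold": mean pairwise similarity.
    threshold = sum(sims.values()) / len(sims) if sims else 0.0

    # Union-find over sentence indices to merge linked sentences.
    parent = list(range(len(sentences)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for (i, j), s in sims.items():
        if s > threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i, sentence in enumerate(sentences):
        groups.setdefault(find(i), []).append(sentence)
    return list(groups.values())
```

For example, calling `cluster_sentences` on "the cat sat", "the cat ran", "stocks fell sharply", and "stocks rose sharply" separates the two cat sentences from the two stock sentences, because only the within-topic pairs share words and exceed the mean similarity.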


Informatica ◽  
2018 ◽  
Vol 29 (3) ◽  
pp. 399-420
Author(s):  
Alessia Amelio ◽  
Darko Brodić ◽  
Radmila Janković

2012 ◽  
Vol 38 (2) ◽  
pp. 229-235
Author(s):  
Wen-Qing LI ◽  
Xin SUN ◽  
Chang-You ZHANG ◽  
Ye FENG

2011 ◽  
Vol 34 (11) ◽  
pp. 2131-2141
Author(s):  
Ya-Kun LI ◽  
Hong-Zhi WANG ◽  
Hong GAO ◽  
Jian-Zhong LI
