An unsupervised heuristic-based hierarchical method for name disambiguation in bibliographic citations

2010 ◽  
Vol 61 (9) ◽  
pp. 1853-1870 ◽  
Author(s):  
Ricardo G. Cota ◽  
Anderson A. Ferreira ◽  
Cristiano Nascimento ◽  
Marcos André Gonçalves ◽  
Alberto H. F. Laender

2013 ◽  
Vol 32 (9) ◽  
pp. 2488-2490
Author(s):  
Xin-xin YANG ◽  
Pei-feng LI ◽  
Qiao-ming ZHU

2021 ◽  
Vol 210 ◽  
pp. 104253
Author(s):  
José F.Q. Pereira ◽  
Maria Fernanda Pimentel ◽  
Ricardo S. Honorato ◽  
Rasmus Bro

Author(s):  
Reinald Kim Amplayo ◽  
Seung-won Hwang ◽  
Min Song

Word sense induction (WSI), the task of automatically discovering the multiple senses or meanings of a word, has three main challenges: domain adaptability, novel sense detection, and sense granularity flexibility. Current latent variable models are known to address the first two challenges, but they do not adapt to different word sense granularities, which vary widely among words, from "aardvark" with one sense to "play" with over 50 senses. Current models require either hyperparameter tuning or nonparametric induction of the number of senses, both of which we find to be ineffective. We therefore aim to eliminate these requirements and solve the sense granularity problem by proposing AutoSense, a latent variable model based on two observations: (1) senses are represented as a distribution over topics, and (2) senses generate pairings between the target word and its neighboring word. These observations alleviate the problem by (a) discarding garbage senses and (b) additionally inducing fine-grained word senses. Results show great improvements over state-of-the-art models on popular WSI datasets. We also show that AutoSense is able to learn the appropriate sense granularity of a word. Finally, we apply AutoSense to the unsupervised author name disambiguation task, where the sense granularity problem is more evident, and show that AutoSense is evidently better than competing models. We share our data and code here: https://github.com/rktamplayo/AutoSense.
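To make observation (2) concrete, the sketch below builds (target word, neighboring word) pairings from a tokenized context. It is only an illustration of what such pairings could look like, not the AutoSense model or its released code: the function name `target_neighbor_pairs`, the fixed window size, and the example sentence are all assumptions introduced here.

```python
def target_neighbor_pairs(tokens, target, window=2):
    """Return (target, neighbor) pairs for every occurrence of `target`.

    Assumption: a neighbor is any token within `window` positions of the
    target. The abstract does not specify how neighbors are chosen.
    """
    pairs = []
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the target position itself
                pairs.append((target, tokens[j]))
    return pairs


# Hypothetical context sentence, used only for illustration.
tokens = "children play football in the park".split()
pairs = target_neighbor_pairs(tokens, "play", window=2)
print(pairs)
```

In a latent variable model of the kind the abstract describes, pairings like these would be the observed data that each sense is assumed to generate; the inference procedure itself is outside the scope of this sketch.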


Author(s):  
Bo Chen ◽  
Jing Zhang ◽  
Jie Tang ◽  
Lingfan Cai ◽  
Zhaoyu Wang ◽  
...  

2021 ◽  
pp. 016555152110181
Author(s):  
Jinseok Kim ◽  
Jenna Kim ◽  
Jinmo Kim

Chinese author names are known to be more difficult to disambiguate than other ethnic names because they tend to share surnames and forenames, thus creating many homonyms. In this study, we demonstrate how using Chinese characters can affect machine learning for author name disambiguation. For analysis, 15K author names recorded in Chinese are transliterated into English and simplified by initialising their forenames to create counterfactual scenarios, reflecting real-world indexing practices in which Chinese characters are usually unavailable. The results show that Chinese author names that are highly ambiguous in English or with initialised forenames tend to become less confusing if their Chinese characters are included in the processing. Our findings indicate that recording Chinese author names in native script can help researchers and digital libraries enhance authority control of Chinese author names that continue to increase in size in bibliographic data.
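The counterfactual setup described above can be illustrated with a small sketch that groups the same authors under three name representations: native script, romanised full name, and romanised name with an initialised forename. The four records, the `author_id` values, and the helper names are hypothetical and are not taken from the study's 15K-name dataset; the grouping by exact string match is also a simplifying assumption, not the paper's machine learning method.

```python
from collections import defaultdict


def initialise_forename(full_name):
    """Reduce the forename to its first letter: 'Wei Zhang' -> 'W Zhang'."""
    forename, surname = full_name.rsplit(" ", 1)
    return f"{forename[0]} {surname}"


def group_by_key(records, key_fn):
    """Map each name string to the set of distinct authors sharing it."""
    groups = defaultdict(set)
    for rec in records:
        groups[key_fn(rec)].add(rec["author_id"])
    return groups


# Hypothetical records: four distinct authors.
records = [
    {"author_id": 1, "native": "张伟", "romanised": "Wei Zhang"},
    {"author_id": 2, "native": "张薇", "romanised": "Wei Zhang"},
    {"author_id": 3, "native": "张文", "romanised": "Wen Zhang"},
    {"author_id": 4, "native": "章伟", "romanised": "Wei Zhang"},
]

by_native = group_by_key(records, lambda r: r["native"])
by_full = group_by_key(records, lambda r: r["romanised"])
by_initial = group_by_key(records, lambda r: initialise_forename(r["romanised"]))

for label, groups in [("native", by_native), ("full", by_full), ("initial", by_initial)]:
    print(label, {name: sorted(ids) for name, ids in groups.items()})
```

In this toy data the native-script strings keep all four authors apart, the romanised full names collapse three of them into one string, and the initialised form collapses all four, which mirrors the direction of the effect the abstract reports.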


2021 ◽  
Vol 80 (8) ◽  
Author(s):  
Wangsheng Pan ◽  
Liangtong Fu ◽  
Hanli Xiao ◽  
Xiulian Yu ◽  
Xin Li ◽  
...  
