name disambiguation Latest Research Papers

Abstract: Several scholarly knowledge graphs have been proposed to model and analyze the academic landscape. However, although the number of data sets has increased remarkably in recent years, these knowledge graphs do not primarily focus on data sets but rather on associated entities such as publications. Moreover, publicly available data set knowledge graphs do not systematically contain links to the publications in which the data sets are mentioned. In this paper, we present an approach for constructing an RDF knowledge graph that fulfills these criteria. Our data set knowledge graph, DSKG, is publicly available at http://dskg.org and contains metadata of data sets for all scientific disciplines. To ensure high data quality of the DSKG, we first identify suitable raw data set collections for creating the DSKG. We then establish links between the data sets and publications modeled in the Microsoft Academic Knowledge Graph that mention these data sets. As the author names of data sets can be ambiguous, we develop and evaluate a method for author name disambiguation and enrich the knowledge graph with links to ORCID. Overall, our knowledge graph contains more than 2,000 data sets with associated properties, as well as 814,000 links to 635,000 scientific publications. It can be used for a variety of scenarios, facilitating advanced data set search systems and new ways of measuring and awarding the provisioning of data sets.


Applying Data Augmentation for Disambiguating Author Names

10.5753/sbbd.2021.17870 ◽

2021 ◽

Author(s):

Luciano V. B. Espiridião ◽

Laura L. Dias ◽

Anderson A. Ferreira

Keyword(s):

Machine Learning ◽

Digital Library ◽

Information Quality ◽

Data Augmentation ◽

Experimental Results ◽

Machine Learning Techniques ◽

Name Disambiguation ◽

Author Name Disambiguation ◽

Learning Techniques ◽

The Many

Author name ambiguity is one of the most challenging issues that can compromise the information quality in a scholarly digital library. For years, researchers have searched for solutions to this problem. Despite the many methods already proposed, the question remains open. In this study, we address the issue of producing a more accurate disambiguation function by applying data augmentation to the training data set. We also propose a SyGAR-based data augmentation approach and evaluate our proposal on three collections commonly used in work on the author name disambiguation task. The experimental results show scenarios where improvements are possible in the author name disambiguation task. The proposed data augmentation outperforms another data augmentation approach and also improves some machine learning techniques that were not specifically designed for the author name disambiguation task.


Dual-Channel Heterogeneous Graph Network for Author Name Disambiguation

Information ◽

10.3390/info12090383 ◽

2021 ◽

Vol 12 (9) ◽

pp. 383

Author(s):

Xin Zheng ◽

Pengyu Zhang ◽

Yanjie Cui ◽

Rong Du ◽

Yong Zhang

Keyword(s):

Semantic Information ◽

High Accuracy ◽

Social Analysis ◽

Accurate Data ◽

Significant Issue ◽

Name Disambiguation ◽

Clustering Method ◽

Structure Information ◽

Dual Channel ◽

Author Name Disambiguation

Name disambiguation has long been a significant issue in many fields, such as literature management and social analysis. In recent years, methods based on graph networks have performed well in name disambiguation, but these works have rarely used heterogeneous graphs to capture relationships between nodes. Heterogeneous graphs can extract more comprehensive relationship information so that more accurate node embedding can be learned. Therefore, a Dual-Channel Heterogeneous Graph Network is proposed to solve the name disambiguation problem. We use the heterogeneous graph network to capture various node information to ensure that our method can learn more accurate data structure information. In addition, we use fastText to extract the semantic information of the data. Then, a clustering method based on DBSCAN is used to classify academic papers by different authors into different clusters. In many experiments based on real datasets, our method achieved high accuracy, which proves its effectiveness.
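The final clustering step mentioned in this abstract can be illustrated with a small sketch. It is a minimal pure-Python DBSCAN written under stated assumptions, not the authors' implementation: the 2-D vectors stand in for learned paper embeddings, and the `eps` and `min_pts` values, the Euclidean distance, and all function names are illustrative choices.

```python
import math

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; NOISE marks outliers."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # noise reached from a core point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point, so keep expanding
    return labels


# Hypothetical embeddings of seven papers that share one ambiguous author name.
papers = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),  # papers assumed to belong to one author
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),  # papers assumed to belong to another author
    (9.0, 0.0),                          # an isolated paper
]
labels = dbscan(papers, eps=0.5, min_pts=2)
print(labels)
```

Each resulting label corresponds to one presumed author, and the isolated paper is left unassigned. A density-based method suits this task because the number of distinct authors behind a name is not known in advance.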


S2AND: A Benchmark and Evaluation System for Author Name Disambiguation

10.1109/jcdl52503.2021.00029 ◽

2021 ◽

Author(s):

Shivashankar Subramanian ◽

Daniel King ◽

Doug Downey ◽

Sergey Feldman

Keyword(s):

Evaluation System ◽

Name Disambiguation ◽

Author Name Disambiguation


Multiple Features Driven Author Name Disambiguation

10.1109/icws53863.2021.00071 ◽

2021 ◽

Author(s):

Qian Zhou ◽

Wei Chen ◽

Weiqing Wang ◽

Jiajie Xu ◽

Lei Zhao

Keyword(s):

Name Disambiguation ◽

Multiple Features ◽

Author Name Disambiguation


Multilayer heuristics based clustering framework (MHCF) for author name disambiguation

Scientometrics ◽

10.1007/s11192-021-04087-7 ◽

2021 ◽

Author(s):

Humaira Waqas ◽

Muhammad Abdul Qadir

Keyword(s):

Name Disambiguation ◽

Author Name Disambiguation


Author Name Disambiguation Using Multiple Graph Attention Networks

10.1109/ijcnn52387.2021.9534125 ◽

2021 ◽

Author(s):

Zhiqiang Zhang ◽

Chunqi Wu ◽

Zhao Li ◽

Juanjuan Peng ◽

Haiyan Wu ◽

...

Keyword(s):

Name Disambiguation ◽

Attention Networks ◽

Author Name Disambiguation ◽

Multiple Graph


Exploiting similarities across multiple dimensions for author name disambiguation

Scientometrics ◽

10.1007/s11192-021-04101-y ◽

2021 ◽

Author(s):

KM. Pooja ◽

Samrat Mondal ◽

Joydeep Chandra

Keyword(s):

Name Disambiguation ◽

Multiple Dimensions ◽

Author Name Disambiguation


Effect of Chinese characters on machine learning for Chinese author name disambiguation: A counterfactual evaluation

Journal of Information Science ◽

10.1177/01655515211018171 ◽

2021 ◽

pp. 016555152110181

Author(s):

Jinseok Kim ◽

Jenna Kim ◽

Jinmo Kim

Keyword(s):

Machine Learning ◽

Real World ◽

Digital Libraries ◽

Chinese Characters ◽

Name Disambiguation ◽

Authority Control ◽

Author Name Disambiguation ◽

Bibliographic Data ◽

Chinese Author

Chinese author names are known to be more difficult to disambiguate than other ethnic names because they tend to share surnames and forenames, thus creating many homonyms. In this study, we demonstrate how using Chinese characters can affect machine learning for author name disambiguation. For analysis, 15K author names recorded in Chinese are transliterated into English and simplified by initialising their forenames to create counterfactual scenarios, reflecting real-world indexing practices in which Chinese characters are usually unavailable. The results show that Chinese author names that are highly ambiguous in English or with initialised forenames tend to become less confusing if their Chinese characters are included in the processing. Our findings indicate that recording Chinese author names in native script can help researchers and digital libraries enhance authority control of Chinese author names that continue to increase in size in bibliographic data.
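The counterfactual idea in this abstract, that names become more ambiguous as they move from native script to romanised form to initialised forenames, can be sketched in a few lines. The three records below are hand-written, hypothetical examples rather than data from the study, and the helper names are assumptions made for illustration.

```python
# Hypothetical records: (name in Chinese characters, hand-written romanisation).
# They are illustrative only and are not taken from the study's data.
records = [
    ("张伟", "Wei Zhang"),
    ("张薇", "Wei Zhang"),
    ("张文", "Wen Zhang"),
]


def initialise_forenames(romanised):
    """Reduce each forename to its initial and keep the surname."""
    *forenames, surname = romanised.split()
    initials = " ".join(part[0].upper() for part in forenames)
    return f"{initials} {surname}".strip()


def distinct_forms(strings):
    """Number of distinct name strings; fewer forms means more homonyms."""
    return len(set(strings))


native = distinct_forms(chinese for chinese, _ in records)
romanised = distinct_forms(roman for _, roman in records)
initialised = distinct_forms(initialise_forenames(roman) for _, roman in records)

print(native, romanised, initialised)
```

In this toy example the three people are distinguishable in Chinese characters, two of them collapse into one string after romanisation, and all three collapse once forenames are initialised.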


name disambiguation
Recently Published Documents


Completing features for author name disambiguation (AND): an empirical analysis

The Data Set Knowledge Graph: Creating a Linked Open Data Source for Data Sets

Applying Data Augmentation for Disambiguating Author Names

Dual-Channel Heterogeneous Graph Network for Author Name Disambiguation

S2AND: A Benchmark and Evaluation System for Author Name Disambiguation

Multiple Features Driven Author Name Disambiguation

Multilayer heuristics based clustering framework (MHCF) for author name disambiguation

Author Name Disambiguation Using Multiple Graph Attention Networks

Exploiting similarities across multiple dimensions for author name disambiguation

Effect of Chinese characters on machine learning for Chinese author name disambiguation: A counterfactual evaluation
