Data Cleansing

2010 ◽  
Vol 1 (2) ◽  
pp. 213-221
Author(s):  
Jiyi Chen ◽  
Wenyuan Li ◽  
Adriel Lau ◽  
Jiguo Cao ◽  
Ke Wang

2021 ◽  
Vol 20 (01) ◽  
pp. 2150011
Author(s):  
Worapan Kusakunniran ◽  
Thearith Ponn ◽  
Nuttapol Boonsom ◽  
Suwimol Wahakit ◽  
Kittikhun Thongkanchorn

This paper develops Scopus H5-Index rankings, using the field of computer science as a case study. The first challenge is the inconsistency of conference names. A rule-based approach is devised to automatically merge duplicate conferences and assign a unique pseudo ID to each conference. This data cleansing process is applied to conference names retrieved from both Scopus and ERA/CORE, so that the two sources share common pseudo IDs for correlation analysis. The process is validated against ERA 2010 and CORE 2018 as references, with error rates of only 0.6% and 0.4%, respectively. The Scopus H5-Index 2006–2010 and Scopus H5-Index 2014–2018 rankings are then constructed and compared with the existing ERA 2010 and CORE 2018 rankings, respectively. The results show that the correlation within the Scopus H5-Index rankings (i.e. Scopus H5-Index 2006–2010 vs. Scopus H5-Index 2014–2018) is at the top of the moderate correlation band, whereas the correlation within the ERA/CORE rankings (ERA 2010 vs. CORE 2018) is at the top of the strong correlation band. The correlations across ranking systems (i.e. Scopus H5-Index 2006–2010 vs. ERA 2010, and Scopus H5-Index 2014–2018 vs. CORE 2018) are at the bottom and middle of the moderate correlation band. This suggests that quality assessment based on the Scopus H5-Index ranking is more dynamic and is updated more quickly than the ERA/CORE ranking. The two ranking systems are also moderately correlated with each other in both periods (2010 and 2018).
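The abstract does not list the cleansing rules themselves, so the following is only a minimal sketch of what rule-based conference-name normalization with pseudo-ID assignment might look like. The rule set (dropping parenthesised acronyms, years, ordinals, and a small stopword list) and the function names `normalize_name` and `assign_pseudo_ids` are assumptions made for illustration, not the authors' method.

```python
import re

# Hypothetical stopword list; the paper's actual rules are not given in the abstract.
STOPWORDS = {"the", "of", "on", "in", "and", "for",
             "proceedings", "annual", "international"}


def normalize_name(name: str) -> str:
    """Reduce a raw conference name to a canonical comparison key."""
    text = name.lower()
    text = re.sub(r"\([^)]*\)", " ", text)             # drop parenthesised acronyms
    text = re.sub(r"\b(19|20)\d{2}\b", " ", text)      # drop four-digit years
    text = re.sub(r"\b\d+(st|nd|rd|th)\b", " ", text)  # drop ordinals such as "12th"
    text = re.sub(r"[^a-z\s]", " ", text)              # drop punctuation and digits
    tokens = [t for t in text.split() if t not in STOPWORDS]
    return " ".join(tokens)


def assign_pseudo_ids(names):
    """Map each raw name to an integer ID shared by all names with the same key."""
    key_to_id = {}
    name_to_id = {}
    for name in names:
        key = normalize_name(name)
        if key not in key_to_id:
            key_to_id[key] = len(key_to_id) + 1  # IDs start at 1, in first-seen order
        name_to_id[name] = key_to_id[key]
    return name_to_id


if __name__ == "__main__":
    raw = [
        "Proceedings of the 12th International Conference on Data Mining (ICDM 2012)",
        "International Conference on Data Mining",
        "ACM Symposium on Theory of Computing",
    ]
    for name, pid in assign_pseudo_ids(raw).items():
        print(pid, name)
```

Running the same function over names from two sources (for example Scopus and ERA/CORE) with a shared `key_to_id` table would give both sources common IDs. Aggressive rules such as the stopword list above can also merge conferences that are genuinely distinct, which is why a validation step against a reference list, as described in the abstract, matters.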


Author(s):  
Baumgart Matthias ◽  
Romer Lisa ◽  
Luhr Matthias ◽  
Roschke Christian ◽  
Ritter Marc ◽  
...  

2020 ◽  
pp. 211-229
Author(s):  
Mitchell Pearson ◽  
Brian Knight ◽  
Devin Knight ◽  
Manuel Quintana

2013 ◽  
Vol 4 (4) ◽  
pp. 2347-2355
Author(s):  
Gonzalo Mateos ◽  
Georgios B. Giannakis
Keyword(s):  
Low Rank ◽  
