Schema Matching
Recently Published Documents

TOTAL DOCUMENTS: 361 (last five years: 42)
H-INDEX: 27 (last five years: 2)

2021
Author(s): Evan Shieh, Saul Simhon, Geetha Aluri, Giorgos Papachristoudis, Doa Yakut, et al.
Keyword(s):

2021, Vol 11 (3), pp. 119-129
Author(s): Rifqi Hammad, Azriel Christian Nurcahyo, Ahmad Zuli Amrullah, Pahrul Irfan, et al.

Universities require the integration of data between their information systems, because the same data must currently be entered into several different systems. Data integration generally faces several obstacles, one of which is the diversity of databases used by each information system. Schema matching is one method for overcoming integration problems caused by this diversity. The schema matching methods used in this research are linguistic matching and constraint matching. The matching results are used to optimize data integration at the database level. The optimization reduced the database by 13 tables and 492 attributes; these changes resulted from some tables and attributes being removed or normalized. This research shows that, after optimization, data integration improved: the amount of data connected to and used by other systems increased by 46.67% over the previous amount. As a result, duplicate data entry across different systems is reduced, and data inconsistencies caused by duplicating data across systems are minimized.
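The combination of linguistic and constraint-based matching that this abstract describes can be illustrated with a small sketch. The scoring functions, weights, and attribute records below are illustrative assumptions, not the paper's actual implementation: linguistic similarity is approximated with a string-similarity ratio, and constraint compatibility with simple type/key checks.

```python
from difflib import SequenceMatcher

def linguistic_score(a: str, b: str) -> float:
    """Name similarity between two attribute names (0..1)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def constraint_score(a: dict, b: dict) -> float:
    """Compatibility of data type and key constraints (0..1)."""
    score = 0.0
    if a["type"] == b["type"]:
        score += 0.5
    if a.get("primary_key") == b.get("primary_key"):
        score += 0.5
    return score

def match_score(a: dict, b: dict, w_ling: float = 0.6, w_cons: float = 0.4) -> float:
    """Weighted combination of linguistic and constraint evidence."""
    return w_ling * linguistic_score(a["name"], b["name"]) + \
           w_cons * constraint_score(a, b)

# Hypothetical attributes from two schemas that denote the same concept
stud_id = {"name": "student_id", "type": "int", "primary_key": True}
id_mhs  = {"name": "id_student", "type": "int", "primary_key": True}
address = {"name": "address", "type": "varchar", "primary_key": False}

print(match_score(stud_id, id_mhs) > match_score(stud_id, address))  # True
```

A matcher of this shape would pair each attribute with its highest-scoring counterpart above some threshold; the paper then uses such correspondences to merge and normalize redundant tables.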




2021, Vol 27 (1)
Author(s): Diego Rodrigues, Altigran da Silva

Abstract: Schema matching is the problem of finding semantic correspondences between elements from different schemas. This is a challenging problem, since disparate elements in the schemas often represent the same concept. Traditional instances of this problem involve a pair of schemas. Recently, however, there has been increasing interest in matching several related schemas at once, a problem known as schema matching networks. The goal is to identify elements from several schemas that correspond to a single concept. We propose a family of methods for schema matching networks based on machine learning, which has proved to be a competitive alternative to traditional matching in several domains. To overcome the need for a large amount of training data, we also propose a bootstrapping procedure that generates training data automatically. In addition, we leverage constraints that arise in network scenarios to improve the quality of this data. We also study a strategy for soliciting user feedback to verify some of the generated matches and, relying on this feedback, improve the quality of the final result. Our experiments show that our methods can outperform the baselines, reaching an F1-score of up to 0.83.
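One network-level constraint that such bootstrapping can exploit is (approximate) transitivity of correspondences: if attribute a matches b and b matches c, then a~c can be added as a positive training example. The sketch below is a hypothetical illustration of that idea, not the authors' actual procedure; the schema and attribute names are invented.

```python
from itertools import combinations

# Hypothetical seed correspondences produced by a simple base matcher:
# each pair of (schema, attribute) identifiers denotes the same concept.
seed_matches = [
    (("crm", "cust_name"), ("billing", "customer_name")),
    (("billing", "customer_name"), ("shipping", "client_name")),
]

def bootstrap_by_transitivity(matches):
    """Group attributes into concept clusters by merging overlapping
    correspondences; every pair inside a cluster then becomes a
    positive training example for the learned matcher."""
    groups = []
    for a, b in matches:
        ga = next((g for g in groups if a in g), None)
        gb = next((g for g in groups if b in g), None)
        if ga and gb and ga is not gb:
            ga |= gb              # merge two existing clusters
            groups.remove(gb)
        elif ga:
            ga.add(b)
        elif gb:
            gb.add(a)
        else:
            groups.append({a, b})
    return {frozenset(p) for g in groups for p in combinations(sorted(g), 2)}

positives = bootstrap_by_transitivity(seed_matches)
print(len(positives))  # 3 -- includes the inferred pair crm.cust_name ~ shipping.client_name
```

In a real network scenario the inferred pairs would be noisy, which is why the paper combines such constraints with user feedback on a subset of the generated matches.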


Author(s): Archana Patel, Narayan C. Debnath, Ambrish Kumar Mishra, Sarika Jain
Keyword(s):

2021, Vol 14 (8), pp. 1254-1261
Author(s): Nan Tang, Ju Fan, Fangyi Li, Jianhong Tu, Xiaoyong Du, et al.

Can AI help automate human-easy but computer-hard data preparation tasks that burden data scientists, practitioners, and crowd workers? We answer this question by presenting RPT, a denoising autoencoder for tuple-to-X models ("X" could be a tuple, a token, a label, JSON, and so on). RPT is pre-trained as a tuple-to-tuple model by corrupting the input tuple and then learning to reconstruct the original tuple. It adopts a Transformer-based neural translation architecture consisting of a bidirectional encoder (similar to BERT) and a left-to-right autoregressive decoder (similar to GPT), yielding a generalization of both BERT and GPT. The pre-trained RPT can already support several common data preparation tasks such as data cleaning, auto-completion, and schema matching. Better still, RPT can be fine-tuned on a wide range of data preparation tasks, such as value normalization, data transformation, and data annotation. To complement RPT, we also discuss several appealing techniques, such as collaborative training and few-shot learning for entity resolution, and few-shot learning and NLP question answering for information extraction. In addition, we identify a series of research opportunities to advance the field of data preparation.

