Clio: Schema Mapping Creation and Data Exchange

Author(s):  
Ronald Fagin ◽  
Laura M. Haas ◽  
Mauricio Hernández ◽  
Renée J. Miller ◽  
Lucian Popa ◽  
...  
Keyword(s):  
2018 ◽  
Vol 3 (2) ◽  
pp. 162
Author(s):  
Slamet Sudaryanto Nurhendratno ◽  
Sudaryanto Sudaryanto

 Data integration is an important step in integrating information from multiple sources. The problem is how to find and combine data from scattered data sources that are heterogeneous and have semantically informant interconnections optimally. The heterogeneity of data sources is the result of a number of factors, including storing databases in different formats, using different software and hardware for database storage systems, designing in different data semantic models (Katsis & Papakonstantiou, 2009, Ziegler & Dittrich , 2004). Nowadays there are two approaches in doing data integration that is Global as View (GAV) and Local as View (LAV), but both have different advantages and limitations so that proper analysis is needed in its application. Some of the major factors to be considered in making efficient and effective data integration of heterogeneous data sources are the understanding of the type and structure of the source data (source schema). Another factor to consider is also the view type of integration result (target schema). The results of the integration can be displayed into one type of global view or a variety of other views. So in integrating data whose source is structured the approach will be different from the integration of the data if the data source is not structured or semi-structured. Scheme mapping is a specific declaration that describes the relationship between the source scheme and the target scheme. In the scheme mapping is expressed in in some logical formulas that can help applications in data interoperability, data exchange and data integration. In this paper, in the case of establishing a patient referral center data center, it requires integration of data whose source is derived from a number of different health facilities, it is necessary to design a schema mapping system (to support optimization). Data Center as the target orientation schema (target schema) from various reference service units as a source schema (source schema) has the characterization and nature of data that is structured and independence. So that the source of data can be integrated tersetruktur of the data source into an integrated view (as a data center) with an equivalent query rewriting (equivalent). The data center as a global schema serves as a schema target requires a "mediator" that serves "guides" to maintain global schemes and map (mapping) between global and local schemes. Data center as from Global As View (GAV) here tends to be single and unified view so to be effective in its integration process with various sources of schema which is needed integration facilities "integration". The "Pemadu" facility is a declarative mapping language that allows to specifically link each of the various schema sources to the data center. So that type of query rewriting equivalent is suitable to be applied in the context of query optimization and maintenance of physical data independence.Keywords: Global as View (GAV), Local as View (LAV), source schema ,mapping schema


2012 ◽  
Vol 54 (3) ◽  
pp. 105-113 ◽  
Author(s):  
Giansalvatore Mecca ◽  
Paolo Papotti

Author(s):  
Huiping Cao ◽  
Yan Qi ◽  
K. Selçuk Candan ◽  
Maria Luisa Sapino

Many applications require exchange and integration of data from multiple, heterogeneous sources. eXtensible Markup Language (XML) is a standard developed to satisfy the convenient data exchange needs of these applications. However, XML by itself does not address the data integration requirements. This chapter discusses the challenges and techniques in XML Data Integration. It first presents a four step outline, illustrating the steps involved in the integration of XML data. This chapter, then, focuses on the first two of these steps: schema extraction and data/schema mapping. More specifically, schema extraction presents techniques to extract tree summaries, DTDs, or XML Schemas from XML documents. The discussion on data/schema mapping focuses on techniques for aligning XML data and schemas.


2020 ◽  
Vol 51 (2) ◽  
pp. 479-493
Author(s):  
Jenny A. Roberts ◽  
Evelyn P. Altenberg ◽  
Madison Hunter

Purpose The results of automatic machine scoring of the Index of Productive Syntax from the Computerized Language ANalysis (CLAN) tools of the Child Language Data Exchange System of TalkBank (MacWhinney, 2000) were compared to manual scoring to determine the accuracy of the machine-scored method. Method Twenty transcripts of 10 children from archival data of the Weismer Corpus from the Child Language Data Exchange System at 30 and 42 months were examined. Measures of absolute point difference and point-to-point accuracy were compared, as well as points erroneously given and missed. Two new measures for evaluating automatic scoring of the Index of Productive Syntax were introduced: Machine Item Accuracy (MIA) and Cascade Failure Rate— these measures further analyze points erroneously given and missed. Differences in total scores, subscale scores, and individual structures were also reported. Results Mean absolute point difference between machine and hand scoring was 3.65, point-to-point agreement was 72.6%, and MIA was 74.9%. There were large differences in subscales, with Noun Phrase and Verb Phrase subscales generally providing greater accuracy and agreement than Question/Negation and Sentence Structures subscales. There were significantly more erroneous than missed items in machine scoring, attributed to problems of mistagging of elements, imprecise search patterns, and other errors. Cascade failure resulted in an average of 4.65 points lost per transcript. Conclusions The CLAN program showed relatively inaccurate outcomes in comparison to manual scoring on both traditional and new measures of accuracy. Recommendations for improvement of the program include accounting for second exemplar violations and applying cascaded credit, among other suggestions. It was proposed that research on machine-scored syntax routinely report accuracy measures detailing erroneous and missed scores, including MIA, so that researchers and clinicians are aware of the limitations of a machine-scoring program. Supplemental Material https://doi.org/10.23641/asha.11984364


Author(s):  
Scot D. Weaver ◽  
Thomas E. Lefchik ◽  
Marc I. Hoit ◽  
Kirk Beach

Sign in / Sign up

Export Citation Format

Share Document