Study of Data Integration Model Based on Network Technology

2011 ◽  
Vol 268-270 ◽  
pp. 1868-1873
Author(s):  
Li Jun Yang

Heterogeneous data sources make it difficult for different information systems to access one another's data. Making such mutual access convenient and flexible is therefore a meaningful research topic. This paper combines XML, the data representation format widely used on current networks, with Web service interaction technology to construct a UDM data model, which can represent structured relational data as well as unstructured data and self-describing semi-structured data. The UDM data model can therefore serve as a common data model for integrating heterogeneous data.
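
As a rough illustration of how one XML envelope can carry both relational rows and unstructured text, here is a minimal Python sketch. The abstract does not give the UDM schema, so the element and attribute names (`record`, `field`, `content`, `source`) and both helper functions are hypothetical and should not be read as the paper's actual model.

```python
import xml.etree.ElementTree as ET


def row_to_xml(table, row):
    """Wrap one relational row (a dict of column -> value) in an XML element."""
    record = ET.Element("record", {"source": table})  # hypothetical envelope
    for column, value in row.items():
        field = ET.SubElement(record, "field", {"name": column})
        field.text = str(value)
    return record


def document_to_xml(source, text):
    """Wrap an unstructured text blob in the same kind of envelope."""
    record = ET.Element("record", {"source": source})
    body = ET.SubElement(record, "content")
    body.text = text
    return record


# Made-up sample data: one structured row and one free-text note.
row_xml = row_to_xml("customers", {"id": 7, "name": "Ada"})
doc_xml = document_to_xml("notes", "Free-form text about a customer.")

# Both kinds of data can now be serialized and exchanged the same way.
serialized = ET.tostring(row_xml, encoding="unicode")
```

Because both helpers return the same kind of element, a consumer can treat structured and unstructured sources uniformly and inspect the children to see which kind of payload it received.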

2020 ◽  
Vol 10 (1) ◽  
pp. 7
Author(s):  
Miguel R. Luaces ◽  
Jesús A. Fisteus ◽  
Luis Sánchez-Fernández ◽  
Mario Munoz-Organero ◽  
Jesús Balado ◽  
...  

Providing citizens with the ability to move around in an accessible way is a requirement for all cities today. However, modeling city infrastructures so that accessible routes can be computed is a challenge because it involves collecting information from multiple, large-scale and heterogeneous data sources. In this paper, we propose and validate the architecture of an information system that creates an accessibility data model for cities by ingesting data from different types of sources and provides an application that can be used by people with different abilities to compute accessible routes. The article describes the processes that allow building a network of pedestrian infrastructures from the OpenStreetMap information (i.e., sidewalks and pedestrian crossings), improving the network with information extracted from mobile-sensed LiDAR data (i.e., ramps, steps, and pedestrian crossings), detecting obstacles using volunteered information collected from the hardware sensors of the mobile devices of the citizens (i.e., ramps and steps), and detecting accessibility problems with software sensors in social networks (i.e., Twitter). The information system is validated through its application in a case study in the city of Vigo (Spain).


Author(s):  
Sijia Liu ◽  
Yanshan Wang ◽  
Andrew Wen ◽  
Liwei Wang ◽  
Na Hong ◽  
...  

BACKGROUND Widespread adoption of electronic health records has enabled the secondary use of electronic health record data for clinical research and health care delivery. Natural language processing techniques have shown promise in their capability to extract the information embedded in unstructured clinical data, and information retrieval techniques provide flexible and scalable solutions that can augment natural language processing systems for retrieving and ranking relevant records. OBJECTIVE In this paper, we present the implementation of a cohort retrieval system that can execute textual cohort selection queries on both structured data and unstructured text—Cohort Retrieval Enhanced by Analysis of Text from Electronic Health Records (CREATE). METHODS CREATE is a proof-of-concept system that leverages a combination of structured queries and information retrieval techniques on natural language processing results to improve cohort retrieval performance using the Observational Medical Outcomes Partnership Common Data Model to enhance model portability. The natural language processing component was used to extract common data model concepts from textual queries. We designed a hierarchical index to support the common data model concept search utilizing information retrieval techniques and frameworks. RESULTS Our case study on 5 cohort identification queries, evaluated using the precision at 5 information retrieval metric at both the patient-level and document-level, demonstrates that CREATE achieves a mean precision at 5 of 0.90, which outperforms systems using only structured data or only unstructured text with mean precision at 5 values of 0.54 and 0.74, respectively. CONCLUSIONS The implementation and evaluation of Mayo Clinic Biobank data demonstrated that CREATE outperforms cohort retrieval systems that only use one of either structured data or unstructured text in complex textual cohort queries.
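
The precision at 5 metric used in this evaluation can be sketched in a few lines of Python. This is a generic illustration of the metric, not the CREATE evaluation code; the function names and the toy rankings are assumptions made up for the example.

```python
def precision_at_k(ranked_ids, relevant_ids, k=5):
    """Fraction of the top-k ranked results that are relevant."""
    top_k = ranked_ids[:k]
    hits = sum(1 for item in top_k if item in relevant_ids)
    return hits / k


def mean_precision_at_k(queries, k=5):
    """Average precision at k over (ranking, relevant set) pairs."""
    scores = [precision_at_k(ranked, relevant, k) for ranked, relevant in queries]
    return sum(scores) / len(scores)


# Toy data (made up): two queries with ranked patient IDs and relevance sets.
queries = [
    (["p1", "p2", "p3", "p4", "p5", "p6"], {"p1", "p3", "p4", "p5", "p9"}),
    (["p7", "p8", "p9", "p10", "p11"], {"p7", "p9", "p11"}),
]

first_score = precision_at_k(*queries[0])   # 4 of the top 5 are relevant
second_score = precision_at_k(*queries[1])  # 3 of the top 5 are relevant
mean_score = mean_precision_at_k(queries)
```

Only the first five positions of each ranking count, so a relevant record ranked sixth or lower does not improve the score.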


Author(s):  
Barbara Catania ◽  
Elena Ferrari

The Web is characterized by a huge number of very heterogeneous data sources, which differ in both media support and format representation. In this scenario, an integrating approach is needed for querying heterogeneous Web documents. To this end, XML can play an important role, since it is becoming a standard for data representation and exchange over the Web. Due to its flexibility, XML is currently being used as an interface language over the Web, by which (part of) document sources are represented and exported. Under this assumption, the problem of querying heterogeneous sources can be reduced to the problem of querying XML data sources. In this chapter, we first survey the most relevant query languages for XML data proposed both by the scientific community and by standardization committees, e.g., W3C, mainly focusing on their expressive power. Then, we investigate how typical Information Retrieval concepts, such as ranking, similarity-based search, and profile-based search, can be applied to XML query languages. Commercial products based on the considered approaches are then briefly surveyed. Finally, we conclude the chapter by providing an overview of the most promising research trends in the field.


2014 ◽  
Vol 912-914 ◽  
pp. 1201-1204
Author(s):  
Gang Huang ◽  
Xiu Ying Wu ◽  
Man Yuan

This paper presents an ontology-based distributed heterogeneous data integration framework (ODHDIF). The framework addresses semantic interoperability between heterogeneous data sources at the semantic level. Metadata specifies the distributed, heterogeneous data and describes the semantic information of each data source. With an ontology as the common semantic model, semantic matches between heterogeneous data sources are established through ontology mapping, and the semantic differences between institutions are shielded, so that the semantic heterogeneity of the data sources can be effectively resolved. The framework provides an effective technical means for sharing an enterprise's internal information accurately and in a timely manner.
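
The idea of mapping source-specific fields onto shared ontology concepts can be sketched as follows. This is a minimal Python illustration under stated assumptions: the source names, field names, and concept labels (such as `Person.name`) are hypothetical, and the abstract does not describe ODHDIF's actual mapping format.

```python
# Hypothetical mappings from each source's local field names to ontology concepts.
ONTOLOGY_MAPPINGS = {
    "hr_db": {"emp_name": "Person.name", "dept": "Organization.unit"},
    "crm_xml": {"fullName": "Person.name", "division": "Organization.unit"},
}


def to_common_model(source, record):
    """Rename a record's fields to the shared ontology concepts.

    Fields without a mapping are dropped in this sketch.
    """
    mapping = ONTOLOGY_MAPPINGS[source]
    return {mapping[name]: value for name, value in record.items() if name in mapping}


# Two records that describe the same facts with different local vocabularies.
hr_record = to_common_model("hr_db", {"emp_name": "Ada", "dept": "R&D"})
crm_record = to_common_model("crm_xml", {"fullName": "Ada", "division": "R&D"})
```

After mapping, both records use the same keys, so a query written against the ontology concepts does not need to know which source a record came from.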


Author(s):  
Ivan Bojicic ◽  
Zoran Marjanovic ◽  
Nina Turajlic ◽  
Marko Petrovic ◽  
Milica Vuckovic ◽  
...  

In order for a data warehouse (DW) to adequately fulfill its integrative and historical purpose, its data model must enable the appropriate and consistent representation of the different states of a system. In effect, a DW data model, representing the physical structure of the DW, must be general enough to consume data from heterogeneous data sources and reconcile the semantic differences of the data source models and, at the same time, be resilient to the constant changes in the structure of the data sources. One of the main problems related to DW development is the absence of a standardized DW data model. In this paper, a comparative analysis of the four most prominent DW data models (namely the relational/normalized model, data vault model, anchor model and dimensional model) is given. On the basis of the results of this analysis, a new DW data model (the Domain/Mapping model, DMM), which would more adequately fulfill the posed requirements, is presented.


2018 ◽  
Author(s):  
Michele Donini ◽  
Joao M. Monteiro ◽  
Massimiliano Pontil ◽  
Tim Hahn ◽  
Andreas J. Fallgatter ◽  
...  

Combining neuroimaging and clinical information for diagnosis, for example behavioral tasks and genetic characteristics, is potentially beneficial but presents challenges in terms of finding the best data representation for the different sources of information. Simply combining the sources usually does not improve on using the best source alone. In this paper, we propose a framework based on a recent multiple kernel learning algorithm called EasyMKL and investigate the benefits of this approach for diagnosing two different mental health diseases. The first task uses the well-known Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset to classify Alzheimer's disease (AD) patients versus healthy controls, and the second uses another dataset to classify a heterogeneous group of depressed patients versus healthy controls. We used EasyMKL to combine a very large number of basic kernels alongside a feature selection methodology, pursuing an optimal and sparse solution to facilitate interpretability. Our results show that the proposed approach, called EasyMKLFS, outperforms baselines (e.g. SVM and SimpleMKL), state-of-the-art random forests (RF) and feature selection (FS) methods.
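
The basic mechanism behind multiple kernel learning, a weighted sum of one kernel per information source, can be sketched in plain Python. This is not EasyMKL or EasyMKLFS: those algorithms learn the weights from data, whereas the weights below are fixed by hand, and the feature values and function names are made up for illustration.

```python
def linear_kernel(samples):
    """Gram matrix of dot products between all pairs of samples."""
    return [
        [sum(a * b for a, b in zip(x_i, x_j)) for x_j in samples]
        for x_i in samples
    ]


def combine_kernels(kernels, weights):
    """Weighted sum of same-sized kernel matrices."""
    n = len(kernels[0])
    return [
        [sum(w * K[i][j] for K, w in zip(kernels, weights)) for j in range(n)]
        for i in range(n)
    ]


# Made-up features for two subjects from two sources of information.
imaging_features = [[1.0, 0.0], [0.0, 2.0]]
clinical_features = [[1.0], [3.0]]

k_imaging = linear_kernel(imaging_features)
k_clinical = linear_kernel(clinical_features)

# Hand-picked weights; an MKL algorithm would learn these instead.
combined = combine_kernels([k_imaging, k_clinical], [0.5, 0.5])
```

The combined matrix can be passed to any kernel method that accepts a precomputed kernel, which is what lets each source keep its own representation.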


2014 ◽  
Vol 687-691 ◽  
pp. 2776-2779
Author(s):  
Zhong Kan Xiong ◽  
Pei Zhen Wan ◽  
Jiu Ping Cai

Big data is one of the important development directions of modern information technology. Sharing and analyzing big data can bring immeasurable economic value and also plays a significant role in promoting social development. In the age of big data, unified data representation and big data processing, querying, analysis, and visualization are key problems that urgently need to be solved. To provide a standardized framework for building big data service platforms, this paper first designs a big data service architecture oriented toward user experience. Second, regarding the data model, it designs an unstructured data model based on subject behavior in order to provide high-quality data services for unstructured data. For the big data service model, an algebraic model of big data services and their composition is established using process algebra. For big data service applications, the paper details retrieval, process analysis, and visualization services, and optimizes service quality through measures that improve both the accuracy and the efficiency of retrieval.


2012 ◽  
Vol 241-244 ◽  
pp. 2665-2668
Author(s):  
Bo Zhou ◽  
Wen Liang Liu

This paper analyzes approaches to information query under the SaaS (Software-as-a-Service) model in a cloud computing framework, in order to establish a common data model using XML. It proposes a framework for integrated querying of multi-source information delivered as a SaaS application, which connects queries over heterogeneous data through middleware and combines the interface-software method with the method of gathering database metadata during cross-database search. The model helps users query effectively and make full use of rich information resources, reduces user costs by removing the need for maintenance and management, promotes the integration of social resources, and yields economies of scale.


Author(s):  
Mariagrazia Fugini ◽  
Mirko Cesarini ◽  
Mario Mezzanzanica

This chapter presents a case study concerning the development of a Statistical Information System (SIS) out of data coming from administrative archives of the PAs. Such archives are a rich source of up-to-date information, but an attempt to use them as sources for statistical analysis reveals errors and mutual incompatibilities that do not permit their usage as a statistical and decision support basis. These errors and incompatibilities usually go undetected during administrative use, since they do not affect the archives' day-to-day use in the PAs; however, they need to be fixed before performing any further aggregate analysis. The reader is introduced to the basic aspects involved in building a SIS out of administrative data, such as the design of an integration model for different and heterogeneous data sources, improvement of the overall data quality, removal of errors that might affect the correctness of statistical analysis, design of a data warehouse for statistical analysis, and design of a multidimensional database to develop indicators for decision support. Finally, some examples are presented concerning the information that can be obtained by making use of a SIS constructed out of Registry and Income Office archives.

