A real-world service mashup platform based on data integration, information synthesis, and knowledge fusion

2020 ◽  
pp. 1-19
Author(s):  
Lei Yu ◽  
Yucong Duan ◽  
Kuan-Ching Li


2021 ◽
pp. 193229682110075
Author(s):  
Rebecca A. Harvey Towers ◽  
Xiaohe Zhang ◽  
Rasoul Yousefi ◽  
Ghazaleh Esmaili ◽  
Liang Wang ◽  
...  

The algorithm for the Dexcom G6 continuous glucose monitoring (CGM) System was enhanced to retain accuracy while reducing the frequency and duration of sensor error. The new algorithm was evaluated by post-processing raw signals collected from G6 pivotal trials (NCT02880267) and by assessing the difference in data availability after a limited real-world launch. Accuracy was comparable with the new algorithm: the overall %20/20 agreement was 91.7% before and 91.8% after the algorithm modification, and the mean absolute relative difference (MARD) was unchanged. The mean data gap due to sensor error nearly halved, and total time spent in sensor error decreased by 59%. A limited field launch showed similar results, with a 43% decrease in total time spent in sensor error. Increased data availability may improve patient experience and the integration of CGM data into insulin delivery systems.
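
MARD and %20/20 are standard CGM accuracy measures rather than anything specific to the G6 algorithm. For readers unfamiliar with them, below is a minimal Python sketch of how they are computed from matched sensor/reference glucose pairs; the absolute/relative cutoff convention and the sample values are illustrative assumptions, not Dexcom's implementation.

```python
# Sketch: standard CGM accuracy metrics computed from matched sensor/reference
# glucose pairs (mg/dL). Illustrative only; not Dexcom's algorithm, and the
# sample values below are invented.

def mard(cgm, ref):
    """Mean absolute relative difference, in percent."""
    return 100 * sum(abs(c - r) / r for c, r in zip(cgm, ref)) / len(ref)

def pct_20_20(cgm, ref):
    """%20/20: share of readings within 20 mg/dL of reference at low glucose,
    or within 20% of reference otherwise. The cutoff between the absolute and
    relative bands (100 mg/dL here) varies across studies."""
    hits = sum(
        abs(c - r) <= 20 if r <= 100 else abs(c - r) <= 0.2 * r
        for c, r in zip(cgm, ref)
    )
    return 100 * hits / len(ref)

cgm = [100, 155, 62, 210]   # sensor readings, invented
ref = [110, 150, 70, 190]   # reference (e.g. YSI) values, invented
print(f"MARD = {mard(cgm, ref):.1f}%, %20/20 = {pct_20_20(cgm, ref):.1f}%")
```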


Author(s):  
James Boyd ◽  
Anna Ferrante ◽  
Adrian Brown ◽  
Sean Randall ◽  
James Semmens

Objectives: While record linkage has become a strategic research priority within Australia and internationally, legal and administrative issues prevent data linkage in some situations due to privacy concerns. Even current best practices in record linkage carry some privacy risk, as they require the release of personally identifying information to trusted third parties. Record linkage systems that do not require the release of personal information can overcome the legal and privacy issues surrounding data integration. Current conceptual and experimental privacy-preserving record linkage (PPRL) models show promise in addressing data integration challenges but do not yet address all of the requirements for real-world operations. This paper aims to identify and address some of the challenges of operationalising PPRL frameworks. Approach: Traditional linkage processes involve comparing personally identifying information (name, address, date of birth) on pairs of records to determine whether the records belong to the same person. Designing appropriate linkage strategies is an important part of the process. These are typically based on the analysis of data attributes (metadata) such as data completeness, consistency, constancy and field discriminating power. Under a PPRL model, however, these factors cannot be discerned from the encrypted data, so an alternative approach is required. This paper explores methods for data profiling, blocking, weight/threshold estimation and error detection within a PPRL framework. Results: Probabilistic record linkage typically involves the estimation of weights and thresholds to optimise the linkage and ensure highly accurate results. The paper outlines the metadata requirements and automated methods necessary to collect such data without compromising privacy. We present work undertaken to develop parameter estimation methods which can help optimise a linkage strategy without the release of personally identifiable information. These are required in all parts of the privacy-preserving record linkage process (pre-processing and standardising activities, linkage, grouping and extracting). Conclusions: PPRL techniques that operate on encrypted data have the potential for large-scale record linkage, performing both accurately and efficiently under experimental conditions. Our research has advanced the current state of PPRL with a framework for secure record linkage that can be implemented to improve and expand linkage service delivery while protecting an individual’s privacy. However, more research is required to supplement this technique with additional elements to ensure the end-to-end method is practical and can be incorporated into real-world models.
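
The weight/threshold estimation referred to above is classically the Fellegi-Sunter model. Below is a minimal sketch of its field weights with invented m/u probabilities; under PPRL these parameters must be estimated from the encrypted data itself, which is precisely the challenge the paper addresses.

```python
# Sketch: Fellegi-Sunter field weights for probabilistic record linkage.
#   m = P(field agrees | records are a true match)
#   u = P(field agrees | records are a non-match)
# The m/u values below are illustrative, not estimates from any real study.
from math import log2

fields = {
    # field: (m, u)
    "surname":       (0.95, 0.01),
    "date_of_birth": (0.97, 0.003),
    "postcode":      (0.90, 0.05),
}

def pair_score(agreements):
    """Sum agreement/disagreement weights for one candidate record pair.
    `agreements` maps field name -> True (agree) / False (disagree)."""
    score = 0.0
    for field, agrees in agreements.items():
        m, u = fields[field]
        score += log2(m / u) if agrees else log2((1 - m) / (1 - u))
    return score

# Pairs scoring above an upper threshold are declared matches, pairs below a
# lower threshold non-matches; the band in between goes to clerical review.
print(pair_score({"surname": True, "date_of_birth": True, "postcode": False}))
```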


2019 ◽  
Vol 15 (1) ◽  
pp. 47-70
Author(s):  
Fernando R.S. Serrano ◽  
Alvaro A.A. Fernandes ◽  
Klitos Christodoulou

Purpose The pay-as-you-go approach to data integration aims to reduce the upfront time and effort required for integration by proposing a bootstrap phase in which algorithms, rather than experts, identify semantic correspondences and generate the mappings. This highly automated bootstrap phase is likely to be of low quality, so pay-as-you-go approaches postulate a subsequent continuous-improvement phase in which user feedback is assimilated to improve the quality of the integration. The purpose of this paper is to quantify the quality of a speculative integration using one particular type of feedback (mapping results) whilst taking into account the uncertainty of the user feedback provided. Design/methodology/approach The authors propose a systematic approach that quantifies the quality of an integration as a conditional probability given the trustworthiness of the workers. Given a set of mappings and a set of workers of unknown trustworthiness, feedback instances are collected in the extents of the mappings that characterize the integration. Taking into account the available evidence obtained from worker feedback, the technique provides a quality quantification of the speculative integration. Findings Experimental results on both synthetic and real-world scenarios provide valuable empirical evidence that the technique produces a cost-effective quantification of integration quality that faithfully reflects the judgement of the workers whilst taking into account the inherent uncertainty of user feedback. Originality/value Current pay-as-you-go techniques provide a limited view of the integration quality that results from feedback assimilation. To the best of the authors’ knowledge, this is the first proposal for quantifying integration quality in a systematic and principled manner using mapping results as a piece of evidence while at the same time considering the uncertainty inherent in user feedback.
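
As a rough illustration of quantifying quality as "a conditional probability given the trustworthiness of the workers", the sketch below fuses worker votes on a single mapping result with a naive Bayes update. This is a simplified stand-in, not the authors' actual model, and all trust values and votes are invented.

```python
# Sketch: fusing uncertain worker feedback on one mapping result into a
# quality estimate. A naive Bayes update over invented inputs; not the
# paper's exact estimation technique.

def p_correct(prior, votes):
    """votes: list of (trust, said_correct) pairs, where `trust` is the
    probability that the worker labels a result correctly."""
    p_good, p_bad = prior, 1 - prior
    for trust, said_correct in votes:
        # Likelihood of this vote if the mapping result is correct/incorrect.
        p_good *= trust if said_correct else (1 - trust)
        p_bad *= (1 - trust) if said_correct else trust
    return p_good / (p_good + p_bad)

votes = [(0.9, True), (0.7, True), (0.6, False)]
print(f"P(result correct | feedback) = {p_correct(0.5, votes):.3f}")
```

Averaging such posteriors over sampled tuples in a mapping's extent would then give a quality score per mapping, which is the flavour of quantification the abstract describes.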


2017 ◽  
Vol 2017 ◽  
pp. 1-14
Author(s):  
Suphanut Jamonnak ◽  
En Cheng

Mobile devices are rapidly becoming the new medium of educational and social life for young people; hence, mobile educational games have become an important mechanism for learning. To help school-aged children learn about the fascinating world of plants, we present a mobile educational game called Little Botany, in which players can create their own virtual gardens in any location on Earth. One unique feature of Little Botany is that the game is built upon real-world data through a data integration mechanism: the gardens created in Little Botany are augmented with real-world location data and real-time weather data. More specifically, Little Botany uses real-time weather data for the garden location to simulate how the weather affects plant growth. Little Botany players can learn to select which crops to plant, maintain their own garden, watch crops grow, tend the crops on a daily basis, and harvest them. With this game, users can also learn about plant structure and three chemical reactions.
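
The abstract does not specify the weather service or the growth model, so the following hypothetical sketch only illustrates the kind of weather-driven simulation described; the weather lookup, parameters, and growth rule are placeholders, not Little Botany's implementation.

```python
# Sketch: weather-driven crop growth of the kind the abstract describes.
# The growth model and thresholds are invented placeholders.

def daily_growth(base_rate, temp_c, rain_mm, ideal_temp=(18, 27)):
    """Scale a crop's base daily growth rate by today's weather."""
    lo, hi = ideal_temp
    temp_factor = 1.0 if lo <= temp_c <= hi else 0.5   # stress outside range
    water_factor = min(rain_mm / 10.0, 1.0)            # capped water benefit
    return base_rate * temp_factor * water_factor

# In the game these values would come from a real-time weather API
# keyed to the garden's location.
weather = {"temp_c": 22.0, "rain_mm": 6.0}
tomato_height = 4.0  # cm
tomato_height += daily_growth(0.8, weather["temp_c"], weather["rain_mm"])
print(f"height after today: {tomato_height:.2f} cm")
```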


2021 ◽  
Vol 16 (2) ◽  
pp. 1-17
Author(s):  
Kevin O’Hare ◽  
Anna Jurek-Loughrey ◽  
Cassio De Campos

Data integration is an important component of Big Data analytics. One of the key challenges in data integration is record linkage, that is, matching records that represent the same real-world entity. Because of computational costs, methods referred to as blocking are employed as part of the record linkage pipeline to reduce the number of comparisons between records. In the past decade, a range of blocking techniques have been proposed. Real-world applications require approaches that can handle heterogeneous data sources and do not rely on labelled data. We propose high-value token-blocking (HVTB), a simple and efficient unsupervised, schema-agnostic approach to blocking based on a crafted use of Term Frequency-Inverse Document Frequency (TF-IDF). We compare HVTB with multiple methods over a range of datasets, including a novel unstructured dataset composed of titles and abstracts of scientific papers. We thoroughly discuss the results in terms of accuracy, use of computational resources, and different characteristics of datasets and records. The simplicity of HVTB yields fast computations and does not harm its accuracy when compared with existing approaches. It is shown to be significantly superior to other methods, suggesting that simpler methods for blocking should be considered before resorting to more sophisticated ones.
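
To illustrate the general idea (not the paper's exact selection rule), the sketch below turns a token into a blocking key only when its TF-IDF score in some record exceeds a threshold, so that only discriminative tokens generate candidate comparisons.

```python
# Sketch: schema-agnostic token blocking where only "high-value" tokens by
# TF-IDF become blocking keys, in the spirit of HVTB. The scoring rule and
# threshold are simplified stand-ins for the paper's method.
from collections import defaultdict
from math import log

records = {
    1: "hybrid data integration platform smith",
    2: "data integration survey smith",
    3: "deep learning image segmentation",
}

def tokenize(text):
    return text.lower().split()

# Document frequency of each token across all records.
df = defaultdict(int)
for text in records.values():
    for tok in set(tokenize(text)):
        df[tok] += 1

n = len(records)
blocks = defaultdict(set)
for rid, text in records.items():
    toks = tokenize(text)
    for tok in set(toks):
        tfidf = (toks.count(tok) / len(toks)) * log(n / df[tok])
        if tfidf > 0.05:                # keep only discriminative tokens
            blocks[tok].add(rid)

# Candidate pairs come only from blocks holding two or more records.
candidates = {frozenset({a, b}) for ids in blocks.values() if len(ids) > 1
              for a in ids for b in ids if a < b}
print(candidates)   # {frozenset({1, 2})}: records 3 is never compared
```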


Author(s):  
Jeanette Samuelsen ◽  
Weiqin Chen ◽  
Barbara Wasson

Learning analytics (LA) is a field that examines data about learners and their context in order to understand and optimize learning and the environments in which it occurs. Integration of multiple data sources, an important dimension of scalability, has the potential to provide rich insights within LA. Using a common standard such as the Experience API (xAPI) to describe learning activity data across multiple sources can alleviate obstacles to data integration. Despite their potential, however, research indicates that standards are seldom used for the integration of multiple sources in LA. Our research aims to understand and address the challenges of using current learning activity data standards to describe learning context, with regard to interoperability and data integration. In this paper, we present the results of an exploratory case study involving in-depth interviews with stakeholders who have used xAPI in a real-world project. Based on a thematic analysis of the interviews and an examination of xAPI, we identified challenges and limitations in describing learning context data, and developed recommendations (provided in this paper in summarized form) for enriching context descriptions and enhancing the expressibility of xAPI. By situating the research in a real-world setting, our work also contributes to bridging the gap between the academic community and practitioners in learning activity data standards and scalability, focusing on the description of learning context.
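
For context, an xAPI statement describes a learning event as actor/verb/object, with an optional context field, which is the part of the standard whose expressibility the study examines. A minimal example statement assembled in Python follows; all IRIs and names are invented, not taken from the project studied.

```python
# Sketch: a minimal xAPI statement. The `context` block is where learning
# context is described. All identifiers below are invented examples.
import json

statement = {
    "actor": {"name": "Learner A", "mbox": "mailto:learner.a@example.org"},
    "verb": {
        "id": "http://adlnet.gov/expapi/verbs/completed",
        "display": {"en-US": "completed"},
    },
    "object": {
        "id": "https://example.org/activities/quiz-42",
        "definition": {"name": {"en-US": "Quiz 42"}},
    },
    "context": {
        "contextActivities": {
            "parent": [{"id": "https://example.org/courses/statistics-101"}]
        },
        # Extensions are the main escape hatch for richer context, which is
        # one reason context semantics can drift between data sources.
        "extensions": {"https://example.org/ext/device": "mobile"},
    },
}
print(json.dumps(statement, indent=2))
```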


JMIR Cancer ◽  
10.2196/23161 ◽  
2021 ◽  
Vol 7 (2) ◽  
pp. e23161
Author(s):  
Michael Grabner ◽  
Cliff Molife ◽  
Liya Wang ◽  
Katherine B Winfree ◽  
Zhanglin Lin Cui ◽  
...  

Background The integration of data from disparate sources could help alleviate data insufficiency in real-world studies and compensate for the inadequacies of single data sources and short-duration, small sample size studies while improving the utility of data for research. Objective This study aims to describe and evaluate a process of integrating data from several complementary sources to conduct health outcomes research in patients with non–small cell lung cancer (NSCLC). The integrated data set is also used to describe patient demographics, clinical characteristics, treatment patterns, and mortality rates. Methods This retrospective cohort study integrated data from 4 sources: administrative claims from the HealthCore Integrated Research Database, clinical data from a Cancer Care Quality Program (CCQP), clinical data from abstracted medical records (MRs), and mortality data from the US Social Security Administration. Patients with lung cancer who initiated second-line (2L) therapy between November 01, 2015, and April 13, 2018, were identified in the claims and CCQP data. Eligible patients were 18 years or older and received atezolizumab, docetaxel, erlotinib, nivolumab, pembrolizumab, pemetrexed, or ramucirumab in the 2L setting. The main analysis cohort included patients with claims data and data from at least one additional data source (CCQP or MR). Patients without integrated data (claims only) were reported separately. Descriptive and univariate statistics were reported. Results Data integration resulted in a main analysis cohort of 2195 patients with NSCLC; 2106 patients had CCQP and 407 patients had MR data. The claims-only cohort included 931 eligible patients. For the main analysis cohort, the mean age was 62.1 (SD 9.27) years, 48.56% (1066/2195) were female, the median length of follow-up was 6.8 months, and death was observed for 37.77% (829/2195) of patients. For the claims-only cohort, the mean age was 66.6 (SD 12.69) years, 52.1% (485/931) were female, the median length of follow-up was 8.6 months, and death was observed for 29.3% (273/931) of patients. The most frequent 2L treatment was immunotherapy (1094/2195, 49.84%), followed by platinum-based regimens (472/2195, 21.50%) and single-agent chemotherapy (441/2195, 20.09%); mean duration of 2L therapy was 5.6 (SD 4.9, median 4) months. We describe challenges and learnings from the data integration process, and the benefits of the integrated data set, which includes a richer set of clinical and outcome data to supplement the utilization metrics available in administrative claims. Conclusions The management of patients with NSCLC requires care from a multidisciplinary team, leading to a lack of a single aggregated data source in real-world settings. The availability of integrated clinical data from MRs, health plan claims, and other sources of clinical care may improve the ability to assess emerging treatments.
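
The cohort construction described above can be pictured as a sequence of keyed joins: claims records joined with CCQP and MR data, keeping patients who appear in claims plus at least one other source. A schematic pandas sketch follows, with invented table and column names rather than the study's actual data model.

```python
# Sketch: building main and claims-only cohorts by joining claims with two
# clinical sources on a patient identifier. All data below is invented.
import pandas as pd

claims = pd.DataFrame({"patient_id": [1, 2, 3, 4],
                       "line2_drug": ["nivolumab", "docetaxel",
                                      "pembrolizumab", "pemetrexed"]})
ccqp = pd.DataFrame({"patient_id": [1, 2], "stage": ["IV", "IIIB"]})
mr = pd.DataFrame({"patient_id": [2, 3], "histology": ["adeno", "squamous"]})

cohort = (claims
          .merge(ccqp, on="patient_id", how="left", indicator="in_ccqp")
          .merge(mr, on="patient_id", how="left", indicator="in_mr"))
cohort["in_ccqp"] = cohort["in_ccqp"].eq("both")   # True if CCQP data exists
cohort["in_mr"] = cohort["in_mr"].eq("both")       # True if MR data exists

main = cohort[cohort.in_ccqp | cohort.in_mr]       # claims + >=1 other source
claims_only = cohort[~(cohort.in_ccqp | cohort.in_mr)]
print(len(main), "patients in main cohort;", len(claims_only), "claims-only")
```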


Author(s):  
Sujitha Ratnasingham ◽  
Ximena Camacho ◽  
Ted McDonald ◽  
Nicole Yada ◽  
Brent Diverty ◽  
...  

Introduction: The importance of the determinants of health to health outcomes has long been established. Historically, data from each of these sectors have been captured in disparate, often siloed, sources. Attempts to integrate these data have faced a number of challenges, including technical, legislative and interpretative barriers, creating inefficiencies and inhibiting knowledge sharing. Despite this, there have been notable successes where intersectoral data and health data have been brought together in a meaningful way. The establishment of strong partnerships across sectors, with academia, governments, the privacy and legal sectors, and other bodies, has been key to this success. These partnerships ensure data are integrated, analyzed, and interpreted accurately and appropriately, while also leveraging existing investments and expertise. Objectives and Approach: The objective of this session is to explore the role of partnerships throughout the data integration life cycle, from initial discussions, to data integration, through to connecting research output to policy impact. Each of the presenters will discuss the successes, barriers and mitigation strategies they have experienced across different jurisdictions, using real-world examples. Results: Health research institutes globally are increasingly able to access routinely collected intersectoral data from non-health sectors. In each institute, data are unique, complex and collected in a manner consistent with the needs of the sector. As health research institutes work to understand the data structures and determine the best way to link, use and interpret the information according to national and international best-practice guidelines, it has become clear that it is critical to undertake this in partnership with experts from each sector, who understand how the data were collected and can guide appropriate interpretation. In addition, these partnerships have enabled the connection of policy priorities in other sectors with research done in the health sector using intersectoral data. For example, in addition to supporting government health departments, health research institutes have collaborated with other government ministries, including immigration, social services, and education. This session will present real-world examples from local (provincial), national and international contexts, and highlight a novel data platform being developed to minimize barriers to data access and use across sectors and jurisdictions. Conclusion/Implications: The participants on this panel will demonstrate the importance of partnership throughout the data integration life cycle when working with intersectoral data, using real-world examples. Collaboration increases the value of integrated data to both health and non-health sectors, through the connection of policy priorities and support of research across the determinants of health.

