Using Grid for Data Sharing to Support Intelligence in Decision Making

Author(s):  
N. Bessis ◽  
T. French ◽  
M. Burakova-Lorgnier ◽  
W. Huang

This chapter conceptualizes the applicability of grid-related technologies for supporting intelligence in decision-making. It discusses how the Open Grid Services Architecture – Data Access and Integration (OGSA-DAI) can facilitate the discovery of, and controlled access to, vast data-sets to assist intelligence in decision-making. Trust is also identified as one of the main challenges for intelligence in decision-making. On this basis, the implications and challenges of using grid technologies for this purpose are also discussed. To further explain the concepts and practices associated with intelligence in decision-making using grid technologies, a minicase incorporating a scenario, “Synergy Financial Solutions Ltd”, is presented to provide the reader with a central and continuous point of reference.

Author(s):  
Sree Nilakanta ◽  
L. L. Miller ◽  
Dan Zhu



2010 ◽  
Vol 143-144 ◽  
pp. 462-466
Author(s):  
Shu Yan

A campus grid uses grid technology to virtually integrate the existing campus network and its distributed, heterogeneous computing and information resources into a pool of available computing resources, providing an information platform on which users can query information. This paper presents a job scheduling model based on the Open Grid Services Architecture, developed by analyzing the current status of campus networks and grid technologies. The simulation results show that the model is feasible and valid.
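The abstract does not specify the scheduling policy, so the following is a minimal illustrative sketch of one common approach to job scheduling over a heterogeneous resource pool: greedily assigning each job to whichever resource would complete it earliest. The `Resource` class and job representation here are hypothetical, not the paper's model.

```python
from dataclasses import dataclass

@dataclass
class Resource:
    name: str
    speed: float              # relative compute speed of this node
    finish_time: float = 0.0  # time at which the node becomes free

def schedule(jobs, resources):
    """Greedy minimum-completion-time scheduling over a heterogeneous pool.

    jobs: list of job lengths (work units); longer jobs are placed first.
    Returns the assignment plan and the overall makespan.
    """
    plan = []
    for length in sorted(jobs, reverse=True):
        # Pick the resource that would finish this job earliest.
        best = min(resources, key=lambda r: r.finish_time + length / r.speed)
        best.finish_time += length / best.speed
        plan.append((length, best.name))
    return plan, max(r.finish_time for r in resources)
```

For example, with a fast node (speed 2.0) and a slow node (speed 1.0), jobs of length 4, 2, and 2 are balanced so that neither node sits idle while the other is overloaded.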


Author(s):  
Longzhi Yang ◽  
Jie Li ◽  
Noe Elisa ◽  
Tom Prickett ◽  
Fei Chao

Big data refers to large complex structured or unstructured data sets. Big data technologies enable organisations to generate, collect, manage, analyse, and visualise big data sets, and provide insights to inform diagnosis, prediction, or other decision-making tasks. One of the critical concerns in handling big data is the adoption of appropriate big data governance frameworks to (1) curate big data in a required manner to support quality data access for effective machine learning and (2) ensure the framework regulates the storage and processing of the data from providers and users in a trustworthy way within the related regulatory frameworks (both legally and ethically). This paper proposes a framework of big data governance that guides organisations to make better data-informed business decisions within the related regulatory framework, with close attention paid to data security, privacy, and accessibility. In order to demonstrate this process, the work also presents an example implementation of the framework based on the case study of big data governance in cybersecurity. This framework has the potential to guide the management of big data in different organisations for information sharing and cooperative decision-making.
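The governance framework itself is not specified in the abstract, but its core requirement (regulating who may access which data, for what purpose) can be sketched as a minimal policy check. The roles, data categories, and purposes below are hypothetical illustrations in the spirit of the cybersecurity case study, not the paper's actual framework.

```python
# Hypothetical governance policy: (role, data category) -> permitted purposes.
POLICY = {
    ("analyst", "network-logs"): {"threat-detection"},
    ("researcher", "anonymised-incidents"): {"research", "threat-detection"},
}

def access_allowed(role, category, purpose):
    """Gate every data request against the governance policy before release.

    A request is granted only if the stated purpose is explicitly permitted
    for that role and data category; anything unlisted is denied by default.
    """
    return purpose in POLICY.get((role, category), set())
```

The deny-by-default lookup reflects the trustworthy-processing goal: a request with no matching policy entry is refused rather than silently granted.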


2021 ◽  
Author(s):  
Lili Zhang ◽  
Himanshu Vashisht ◽  
Andrey Totev ◽  
Nam Trinh ◽  
Tomas Ward

Deep learning models, especially RNN models, are potentially powerful tools for representing the complex learning processes and decision-making strategies used by humans. Such neural network models make fewer assumptions about the underlying mechanisms, thus providing experimental flexibility in terms of applicability. However, this comes at the cost of a larger number of tunable parameters, which require significantly more training and representative data for effective learning. This presents practical challenges given that most computational modelling experiments involve relatively small numbers of subjects, which, while adequate for conventional modelling using low-dimensional parameter spaces, leads to sub-optimal model training when adopting deeper neural network approaches. Laboratory collaboration is a natural way of increasing data availability; however, data sharing barriers among laboratories, as necessitated by data protection regulations, encourage us to seek alternative methods for collaborative data science. Distributed learning, especially federated learning, which supports the preservation of data privacy, is a promising method for addressing this issue. To verify the reliability and feasibility of applying federated learning to train neural network models used in the characterisation of human decision making, we conducted experiments on a real-world, many-labs data pool including experimentally significant data-sets from ten independent studies. The performance of single models trained on single laboratory data-sets was poor, especially those with small numbers of subjects. This unsurprising finding supports the need for larger and more diverse data-sets to train more generalised and reliable models. To that end, we evaluated four collaborative approaches for comparison purposes.
The first approach represents conventional centralized data sharing (CL-based) and is the optimal approach, but it requires complete sharing of data, which we wish to avoid. Its results, however, establish a benchmark for the other three distributed approaches: federated learning (FL-based), incremental learning (IL-based), and cyclic incremental learning (CIL-based). We evaluate these approaches in terms of prediction accuracy and capacity to characterise human decision-making strategies in the context of the computational modelling experiments considered here. The results demonstrate that the FL-based model achieves performance most comparable to that of the centralized data sharing approach. This demonstrates that federated learning has value in scaling data science methods to data collected in computational modelling contexts in circumstances where data sharing is not convenient, practical, or permissible.
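The FL-based approach above can be illustrated with a minimal federated-averaging sketch: each laboratory refines the global model on its private data, and only parameter updates (never raw subject data) are aggregated, weighted by data-set size. The `local_train` callable and flat parameter lists are hypothetical stand-ins, not the authors' RNN training code.

```python
def federated_average(local_params, sizes):
    """Aggregate per-lab parameter vectors, weighting each lab by its data-set size."""
    total = sum(sizes)
    n = len(local_params[0])
    return [
        sum(p[i] * s for p, s in zip(local_params, sizes)) / total
        for i in range(n)
    ]

def federated_round(global_params, labs, local_train):
    """One federated learning round.

    Each lab trains a copy of the global model on its own private data set;
    the server then averages the resulting parameters. Raw data never leaves
    a laboratory, which is what preserves privacy relative to central pooling.
    """
    updates, sizes = [], []
    for data in labs:
        updates.append(local_train(global_params, data))
        sizes.append(len(data))
    return federated_average(updates, sizes)
```

The size-weighted average means a lab with more subjects pulls the global model further, mirroring how a centrally pooled data set would weight its examples.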




Author(s):  
Ahmet Sayar ◽  
Geoffrey C. Fox ◽  
Marlon E. Pierce

Geographic information is critical for building disaster planning, crisis management, and early-warning systems. Decision making in geographic information systems (GIS) increasingly relies on analyses of spatial data in map-based formats. Maps are complex structures composed of layers created from distributed heterogeneous data belonging to separate organizations. This chapter presents a distributed service architecture for managing the production of knowledge from distributed collections of observations and simulation data through integrated data-views. Integrated views are defined by a federation service (“federator”) located on top of the standard service components. Common GIS standards enable the construction of this system. However, compliance requirements for interoperability, such as XML-encoded data and domain-specific data characteristics, have costs and performance overhead. The authors investigate issues of combining standards compliance with performance. Although their framework is designed for GIS, they extend the principles and requirements to general science domains and discuss how these may be applied.
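The federator pattern described above can be sketched in miniature: an integrated view is produced by querying each distributed layer service for the same spatial extent and stacking the results in drawing order. The `fetch` callables stand in for the remote standard-service requests (e.g. OGC WMS/WFS calls) and are hypothetical, as are the layer names.

```python
def federate_view(bbox, layer_services):
    """Build an integrated map view from distributed layer services.

    bbox: the shared bounding box (min_x, min_y, max_x, max_y) sent to every
          service so the layers align spatially.
    layer_services: ordered (name, fetch) pairs; fetch(bbox) stands in for a
          remote call to the organization that owns that layer's data.
    Returns the layers in drawing order, ready to be rendered as one map.
    """
    view = []
    for name, fetch in layer_services:
        features = fetch(bbox)  # each organization serves only its own layer
        view.append({"layer": name, "features": features})
    return view
```

The federator never owns the underlying data; it only composes per-organization responses for a common extent, which is why interoperability standards (and their encoding overhead) sit on the critical path.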


10.2196/18087 ◽  
2020 ◽  
Vol 22 (7) ◽  
pp. e18087
Author(s):  
Christine Suver ◽  
Adrian Thorogood ◽  
Megan Doerr ◽  
John Wilbanks ◽  
Bartha Knoppers

Developing or independently evaluating algorithms in biomedical research is difficult because of restrictions on access to clinical data. Access is restricted because of privacy concerns, the proprietary treatment of data by institutions (fueled in part by the cost of data hosting, curation, and distribution), concerns over misuse, and the complexities of applicable regulatory frameworks. The use of cloud technology and services can address many of the barriers to data sharing. For example, researchers can access data in high performance, secure, and auditable cloud computing environments without the need for copying or downloading. An alternative path to accessing data sets requiring additional protection is the model-to-data approach. In model-to-data, researchers submit algorithms to run on secure data sets that remain hidden. Model-to-data is designed to enhance security and local control while enabling communities of researchers to generate new knowledge from sequestered data. Model-to-data has not yet been widely implemented, but pilots have demonstrated its utility when technical or legal constraints preclude other methods of sharing. We argue that model-to-data can make a valuable addition to our data sharing arsenal, with 2 caveats. First, model-to-data should only be adopted where necessary to supplement rather than replace existing data-sharing approaches given that it requires significant resource commitments from data stewards and limits scientific freedom, reproducibility, and scalability. Second, although model-to-data reduces concerns over data privacy and loss of local control when sharing clinical data, it is not an ethical panacea. Data stewards will remain hesitant to adopt model-to-data approaches without guidance on how to do so responsibly. To address this gap, we explored how commitments to open science, reproducibility, security, respect for data subjects, and research ethics oversight must be re-evaluated in a model-to-data context.
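The model-to-data workflow described above can be sketched from the data steward's side: the researcher's submitted algorithm is executed against the sequestered data set, and only aggregate results, never the records themselves, leave the secure environment. The record layout, metric, and return shape below are hypothetical illustrations of the pattern, not any specific pilot's implementation.

```python
def model_to_data_run(submitted_model, hidden_dataset, metric):
    """Steward-side execution of a submitted algorithm.

    The researcher supplies submitted_model; the steward runs it inside the
    secure environment against hidden_dataset, which the researcher never
    sees. Only the aggregate metric and the data-set size are returned.
    """
    predictions = [submitted_model(rec["features"]) for rec in hidden_dataset]
    labels = [rec["label"] for rec in hidden_dataset]
    score = metric(predictions, labels)
    return {"score": score, "n": len(hidden_dataset)}  # aggregates only
```

Returning only aggregates is what shifts the burden to the steward: they must vet submitted code and audit runs, which is the resource commitment (and the reproducibility and scientific-freedom cost) the authors caution about.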



