XRootd, disk-based, caching proxy for optimization of data access, data placement and data replication

2014 ◽  
Vol 513 (4) ◽  
pp. 042044 ◽  
Author(s):  
L A T Bauerdick ◽  
K Bloom ◽  
B Bockelman ◽  
D C Bradley ◽  
S Dasu ◽  
...  
1998 ◽  
Vol 14 (suppl 3) ◽  
pp. S117-S123 ◽  
Author(s):  
Anaclaudia Gastal Fassa ◽  
Luiz Augusto Facchini ◽  
Marinel Mór Dall'Agnol

The International Agency for Research on Cancer (IARC) proposed this international historical cohort study to resolve the controversy over the increased risk of cancer among workers in the pulp and paper industry. One of the most important aspects of the Brazilian arm of this study was the set of strategies used to overcome methodological challenges such as data access, data accuracy, data availability, multiple data sources, and the long follow-up period. Through multiple strategies it was possible to build a Brazilian cohort of 3,622 workers, to follow them with a 93 percent success rate, and to identify the cause of death in 99 percent of the cases. This paper evaluates data access and data accuracy, as well as the effectiveness of the strategies used and of the different sources of data.


Author(s):  
John Jenkins ◽  
Xiaocheng Zou ◽  
Houjun Tang ◽  
Dries Kimpe ◽  
Robert Ross ◽  
...  

Author(s):  
Eddy L. Borges-Rey

This chapter explores the challenges that emerge from a narrow understanding of the principles underpinning Big data, framed in the context of the teaching and learning of Science and Mathematics. The study considers the materiality of computerised data and examines how notions of data access, data sampling, data sense-making and data collection are nowadays contested by datafied public and private bodies, hindering citizens' capacity to understand and make effective use of the data they generate or engage with. The study offers insights from secondary and documentary research. Its results suggest that understanding data in less constraining terms, namely: a) as capable of secondary agency; b) as the vital fluid of societal institutions; c) as gathered or accessed by new data brokers through new technologies and techniques; and d) as mediated by the constant interplay between public and corporate spheres and philosophies, could greatly enhance the teaching and learning of Science and Mathematics within current efforts to advance data literacy.


2013 ◽  
Vol 5 (1) ◽  
pp. 53-69
Author(s):  
Jacques Jorda ◽  
Aurélien Ortiz ◽  
Abdelaziz M’zoughi ◽  
Salam Traboulsi

Grid computing is commonly used for large-scale applications requiring huge computation capabilities. In such distributed architectures, data storage on the distributed storage resources must be handled by a dedicated storage system to ensure the required quality of service. In order to simplify data placement on nodes and to increase application performance, a storage virtualization layer can be used. This layer can be a single parallel file system (like GPFS) or a more complex middleware; the latter is preferred, as it allows data placement on the nodes to be tuned to increase both the reliability and the performance of data access. In such a middleware, a dedicated monitoring system must be used to ensure optimal performance. In this paper, the authors briefly introduce the Visage middleware, a middleware for storage virtualization. They present the most broadly used grid monitoring systems and explain why they are not adequate for virtualized storage monitoring. The authors then present the architecture of their monitoring system dedicated to storage virtualization, introduce the workload prediction model used to select the best node for data placement, and show its accuracy on a simple experiment. A minimal sketch of this kind of prediction-driven placement follows.
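The abstract does not detail Visage's actual prediction model, so the sketch below stands in with a simple exponentially weighted moving average over recent load samples; the Node fields, the alpha parameter and the capacity filter are all assumptions made for illustration.

```python
# Hypothetical sketch: choosing a placement node from predicted workload.
# The EWMA predictor and the Node fields are assumptions, not the actual
# Visage model, which the abstract does not describe.
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    free_gb: float                                     # remaining capacity
    load_samples: list = field(default_factory=list)   # recent I/O load, 0..1

def predict_load(samples, alpha=0.5):
    """Exponentially weighted moving average over recent load samples."""
    pred = 0.0
    for s in samples:
        pred = alpha * s + (1 - alpha) * pred
    return pred

def best_node(nodes, replica_size_gb):
    """Pick the node with the lowest predicted load that can hold the data."""
    candidates = [n for n in nodes if n.free_gb >= replica_size_gb]
    return min(candidates, key=lambda n: predict_load(n.load_samples))

nodes = [
    Node("node-a", free_gb=500, load_samples=[0.2, 0.3, 0.8]),
    Node("node-b", free_gb=120, load_samples=[0.4, 0.3, 0.2]),
]
print(best_node(nodes, replica_size_gb=50).name)  # node-b: lower predicted load
```

The design point illustrated: placement consults predicted load rather than free capacity alone, so a node that is momentarily busy (node-a here) is passed over even though it has more space.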


Author(s):  
Gianni Pucciani ◽  
Flavia Donno ◽  
Andrea Domenici ◽  
Heinz Stockinger

Data replication is a well-known technique used in distributed systems to improve fault tolerance and make data access faster. Several copies of a dataset are created and placed at different nodes, so that users can access the replica closest to them while the data access load is distributed among the replicas. In today's Grid middleware solutions, data management services allow users to replicate datasets (i.e., flat files or databases) among storage elements within a Grid, but replicas are often considered read-only because of the absence of mechanisms able to propagate updates and enforce replica consistency. This entry analyzes the replica consistency problem and provides hints for the development of a Replica Consistency Service, highlighting the main issues and the pros and cons of several approaches. One classical approach, primary-copy update propagation, is sketched below.
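As a hedged illustration of that classical approach (not necessarily the service the entry proposes), the following Python sketch serializes writes through a master replica and lazily propagates versions to stale secondaries; the class names and site names are invented.

```python
# Hypothetical sketch of primary-copy replica consistency: all writes go to
# a master replica, which bumps a version number; secondaries pull the new
# version when they are stale. Names are illustrative only.
class Replica:
    def __init__(self, name):
        self.name = name
        self.version = 0
        self.data = None

class PrimaryCopyReplication:
    def __init__(self, master, secondaries):
        self.master = master
        self.secondaries = secondaries

    def write(self, data):
        """Updates are serialized through the master replica."""
        self.master.data = data
        self.master.version += 1

    def synchronize(self):
        """Lazily propagate the newest version to stale secondaries."""
        for replica in self.secondaries:
            if replica.version < self.master.version:
                replica.data = self.master.data
                replica.version = self.master.version

group = PrimaryCopyReplication(Replica("cern"), [Replica("fnal"), Replica("in2p3")])
group.write("dataset-v2")
group.synchronize()
assert all(r.version == 1 for r in group.secondaries)
```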


2019 ◽  
Vol 15 (3/4) ◽  
pp. 174-198
Author(s):  
A. Abdollahi Nami ◽  
L. Rajabion

Purpose
A mobile ad hoc network (MANET) enables providers and customers to communicate without a fixed infrastructure. Databases are deployed on MANETs to provide easy data access and update. As the energy and mobility limitations of both servers and clients affect the availability of data in MANETs, these data are replicated. The purpose of this paper is to provide a literature review of data replication issues and to classify the available strategies based on the issues they address.

Design/methodology/approach
The selected articles are reviewed based on the defined criteria, and the differences, advantages and disadvantages of these techniques are described. The methods in the literature can be categorized into three groups: cluster-based, location-based and group-based mechanisms.

Findings
High flexibility and data consistency are the features of cluster-based mechanisms. Location-based mechanisms are also appropriate for replica allocation, and they mostly have low network traffic and delay. Group-based mechanisms have high data accessibility compared to the other mechanisms. Data accessibility and access time have received the most attention in data replication techniques; scalability is an important parameter that must be considered more in the future. The reduction of storage cost in MANETs is the main goal of data replication, so researchers have to keep the cost parameter in view whenever another parameter is being improved at its expense.

Research limitations/implications
Data replication in MANETs has been covered in different available sources such as Web pages, technical reports, academic publications and editorial notes. Articles published in national journals and conferences are ignored in this study, which draws only on the main international academic journals.

Originality/value
The paper reviews past and state-of-the-art mechanisms for data replication in MANETs. Specifically, data replication's main goal, existing challenges, research terminologies and mechanisms in MANETs are summarized using the answers to the research questions. This will help researchers to develop more effective data replication methods for MANETs in the future.


2011 ◽  
Vol 135-136 ◽  
pp. 43-49
Author(s):  
Han Ning Wang ◽  
Wei Xiang Xu ◽  
Chao Long Jia

The application of high-speed railway data, an important component of China's transportation science data sharing, embodies the typical characteristics of data-intensive computing. A reasonable and effective data placement strategy is needed to deploy and execute data-intensive applications in the cloud computing environment. Study results of current data placement approaches are analyzed and compared in this paper. Combining the semi-definite programming algorithm with the dynamic interval mapping algorithm, a hierarchical data placement strategy is proposed. The semi-definite programming algorithm is suitable for the placement of files with multiple replicas, ensuring that different replicas of a file are placed on different storage devices, while the dynamic interval mapping algorithm guarantees better self-adaptability of the data storage system. Both theoretical analysis and experiments demonstrate that the hierarchical data placement strategy can guarantee self-adaptability, data reliability and high-speed data access for large-scale networks. A toy version of interval-based placement follows.
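The abstract names dynamic interval mapping without defining it, so the following Python sketch shows one plausible reading, assuming each device owns a sub-interval of [0, 1) sized by its weight and a file is placed by hashing into that range; the device names and weights are invented.

```python
# Hypothetical sketch of dynamic interval mapping: each storage device owns
# a sub-interval of [0, 1) proportional to its weight, and a file lands on
# the device whose interval contains the file's hash.
import hashlib

def build_intervals(weights):
    """Map each device to a half-open sub-interval of [0, 1)."""
    total = sum(weights.values())
    intervals, start = {}, 0.0
    for device, w in sorted(weights.items()):
        end = start + w / total
        intervals[device] = (start, end)
        start = end
    return intervals

def place(filename, intervals):
    """Hash the file name into [0, 1) and find the owning device."""
    h = int(hashlib.sha256(filename.encode()).hexdigest(), 16)
    x = h / 2**256
    for device, (lo, hi) in intervals.items():
        if lo <= x < hi:
            return device

# Re-weighting a device (e.g. after adding capacity) only shifts interval
# boundaries, so most files keep their placement: the "dynamic" part.
intervals = build_intervals({"dev-a": 2, "dev-b": 1, "dev-c": 1})
print(place("train_telemetry_2011.dat", intervals))
```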


Author(s):  
Xianfei Zhou ◽  
Hongfang Cheng ◽  
Fulong Chen

Cross-border payment optimization based on blockchain has become a hot topic in the industry. Traditional methods mainly include block feature detection, fuzzy access and adaptive scheduling; they perform feature extraction and quantitative regression analysis on the collected distributed network connection access data and combine fuzzy clustering to optimize the data access design, realizing group detection and identification of data in the blockchain. However, these traditional methods incur a large computational overhead for distributed network connection access, and their packet detection capability is poor. This paper constructs a statistical sequence model of adaptive connection access data to extract descriptive statistical features of the similarity of the distributed network blockchain's adaptive connection access data. The retrieval efficiency of the strategy is tested experimentally based on the strategy management method. The experiment performs matching query tests on test sets of different query sizes, and different parameters for error rate and search delay are set to evaluate their impact on retrieval performance. The single-match delay is calculated as the total delay divided by the total number of matches. The optimization effect is mainly measured by the retrieval delay of the strategy in the strategy management contract: the smaller the delay, the higher the execution efficiency and the better the retrieval optimization effect. A worked example of the per-match delay metric follows.
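Read as total delay over total matches, the metric is simple arithmetic; the following Python lines illustrate it with made-up numbers.

```python
# Minimal illustration of the per-match delay metric described above:
# per-match delay = total retrieval delay / total number of matches.
# All numbers are invented for the example.
delays_ms = [12.4, 9.8, 15.1, 11.0]   # per-query retrieval delays
matches = [3, 2, 4, 3]                # matches returned per query

total_delay = sum(delays_ms)
total_matches = sum(matches)
per_match_delay = total_delay / total_matches
print(f"{per_match_delay:.2f} ms per match")  # lower is better
```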


2012 ◽  
Vol 532-533 ◽  
pp. 677-681
Author(s):  
Li Qun Luo ◽  
Si Jin He

The advent of the cloud is drastically changing High Performance Computing (HPC) application scenarios, yet current virtual machine-based IaaS architectures are not designed for HPC applications. This paper presents a new cloud-oriented storage system that constructs a large-scale memory grid in a distributed environment in order to support low-latency data access for HPC applications. This Cloud Memory model is built by implementing a private virtual file system (PVFS) on top of the virtual operating system (OS), allowing HPC applications to access data in Cloud Memory in the same fashion as they would access local disks. A minimal sketch of such a disk-like interface over a memory grid follows.
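The abstract does not describe the PVFS implementation, so the sketch below only illustrates the stated idea, assuming a dictionary-backed stand-in for the memory grid and an invented open()-style facade.

```python
# Hypothetical sketch of a file-system-style facade over a distributed
# memory grid, so applications read/write by path as if on a local disk.
# MemoryGrid and CloudMemoryFS are illustrative assumptions.
import io

class MemoryGrid:
    """Stand-in for a distributed in-memory object store (single-node here)."""
    def __init__(self):
        self._blocks = {}   # path -> bytes; would be sharded across nodes

    def put(self, path, data):
        self._blocks[path] = data

    def get(self, path):
        return self._blocks[path]

class _GridWriter(io.BytesIO):
    """Write buffer that flushes its contents into the grid on close."""
    def __init__(self, grid, path):
        super().__init__()
        self._grid, self._path = grid, path

    def close(self):
        self._grid.put(self._path, self.getvalue())
        super().close()

class CloudMemoryFS:
    """open()-style facade so code reads and writes paths like local files."""
    def __init__(self, grid):
        self._grid = grid

    def open(self, path, mode="r"):
        if "w" in mode:
            return _GridWriter(self._grid, path)
        return io.BytesIO(self._grid.get(path))

fs = CloudMemoryFS(MemoryGrid())
with fs.open("/scratch/result.bin", "w") as f:
    f.write(b"checkpoint-0")
print(fs.open("/scratch/result.bin").read())   # b'checkpoint-0'
```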


GigaScience ◽  
2020 ◽  
Vol 9 (2) ◽  
Author(s):  
George Alter ◽  
Alejandra Gonzalez-Beltran ◽  
Lucila Ohno-Machado ◽  
Philippe Rocca-Serra

Abstract

Background
Data reuse is often controlled to protect the privacy of subjects and patients. Data discovery tools need ways to inform researchers about restrictions on data access and re-use.

Results
We present elements in the Data Tags Suite (DATS) metadata schema describing data access, data use conditions, and consent information. DATS metadata are explained in terms of the administrative, legal, and technical systems used to protect confidential data.

Conclusions
The access and use metadata items in DATS are designed from the perspective of a researcher who wants to find and re-use existing data. We call for standard ways of describing informed consent and data use agreements that will enable automated systems for managing research data.
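As a rough illustration of the kind of machine-actionable access and use metadata the authors call for, the Python sketch below encodes one dataset's conditions as a plain dictionary; every field name here is hypothetical and is not an actual DATS element name.

```python
# Illustrative only: a dataset's access/use conditions as a dict.
# Field names are hypothetical, not the actual DATS schema elements.
dataset_access = {
    "title": "Regional health cohort, wave 3",
    "access": {
        "landing_page": "https://example.org/dataset/123",
        "authorization_required": True,    # e.g., a data use agreement
        "embargo_until": "2026-01-01",
    },
    "use_conditions": [
        {"rule": "no-reidentification", "type": "prohibition"},
        {"rule": "research-use-only", "type": "permission"},
    ],
    "consent": {"informed_consent": True, "scope": "disease-specific"},
}

def requires_agreement(meta):
    """A discovery tool could filter datasets by their access conditions."""
    return meta["access"]["authorization_required"]

print(requires_agreement(dataset_access))   # True
```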

