Data Allocation Service ADAS for the Data Rebalancing of ATLAS

2019, Vol 214, pp. 06012
Author(s): Ralf Vamosi, Mario Lassnig, Erich Schikuta

The distributed data management system Rucio manages all data of the ATLAS collaboration across the grid. Automation, such as data replication and data rebalancing, is important to ensure proper operation and execution of the scientific workflow. In these proceedings, a new data allocation grid service based on machine learning is proposed. This learning agent takes subsets of the global datasets and proposes a better allocation based on an imposed cost metric, such as waiting time in the workflow. As a service, it can be modularized and run independently of the existing rebalancing and replication mechanisms. Furthermore, it collects data from other services and learns better allocations while running in the background. Besides users selecting datasets, other data services may consult this meta-heuristic service for improved data placement. Network and storage utilization are also taken into account.
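As a rough illustration of the kind of cost-driven proposal such an agent could make, the following Python sketch ranks candidate sites by a simple waiting-time cost. The site and dataset structures and the cost metric are assumptions for illustration only, not the ADAS or Rucio interfaces.

```python
# Minimal illustrative sketch of a cost-driven data allocation proposal.
# All structures (sites, datasets, the waiting-time cost) are hypothetical
# simplifications, not the actual ADAS/Rucio interfaces.

def propose_allocation(sites, datasets):
    """Propose dataset moves that reduce the expected waiting-time cost.

    sites:    {site: {"wait_s": mean queue waiting time, "free_tb": free storage}}
    datasets: {name: {"site": current site, "size_tb": size, "accesses": popularity}}
    """
    proposals = []
    # Consider the most accessed datasets first: they dominate the cost.
    for name, ds in sorted(datasets.items(), key=lambda kv: -kv[1]["accesses"]):
        current_cost = ds["accesses"] * sites[ds["site"]]["wait_s"]
        # Candidate sites with enough free storage for a new replica.
        candidates = [(s, info) for s, info in sites.items()
                      if s != ds["site"] and info["free_tb"] >= ds["size_tb"]]
        if not candidates:
            continue
        # Best candidate = lowest expected waiting time.
        best_site, best = min(candidates, key=lambda kv: kv[1]["wait_s"])
        new_cost = ds["accesses"] * best["wait_s"]
        if new_cost < current_cost:                 # only propose real improvements
            proposals.append({"dataset": name, "from": ds["site"],
                              "to": best_site, "gain": current_cost - new_cost})
            best["free_tb"] -= ds["size_tb"]        # reserve space for the proposed move
    return proposals


if __name__ == "__main__":
    sites = {"SITE_A": {"wait_s": 120.0, "free_tb": 5.0},
             "SITE_B": {"wait_s": 15.0, "free_tb": 50.0}}
    datasets = {"data18.raw": {"site": "SITE_A", "size_tb": 2.0, "accesses": 400}}
    print(propose_allocation(sites, datasets))
```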

2005, Vol 4 (2), pp. 393-400
Author(s): Pallavali Radha, G. Sireesha

The data distributor's task is to give sensitive data to a set of presumably trusted third-party agents. Due to data leakage, the data sent to these third parties can turn up in unauthorized places, such as the web or someone's personal system. The distributor must be able to establish that the data was leaked by one or more agents, as opposed to having been independently gathered by other means. Our new proposal on data allocation strategies improves the probability of identifying leakages. Security attacks typically result from unintended behaviors or invalid inputs, and because real-world programs must cope with so many invalid inputs, security testing is labor intensive; the most desirable approach is to automate, or at least partially automate, the security-testing process. In this paper we present a Predicate/Transition nets approach for the automated generation of security tests from formal threat models, and use allocation strategies to detect the agents without modifying the original data. The guilty agent is the one who leaks the distributed data. To detect guilty agents more effectively, the idea is to distribute the data intelligently to agents based on sample data requests and explicit data requests. Fake-object implementation algorithms further improve the distributor's chance of detecting guilty agents.
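The fake-object idea can be illustrated with a minimal sketch. All names, the random sampling, and the scoring below are hypothetical simplifications, not the paper's Predicate/Transition-net machinery: each agent receives unique fake records, so their appearance in a leak points back at that agent.

```python
import random

# Illustrative sketch of fake-object allocation and guilty-agent detection.
# The data model and scoring are hypothetical simplifications of the general
# idea: give each agent a distinguishable set, then see whose set best
# explains a leak.

def distribute(objects, agents, fakes_per_agent=2):
    """Give every agent a sample of real objects plus unique fake records."""
    allocation = {}
    for agent in agents:
        fakes = {f"FAKE-{agent}-{i}" for i in range(fakes_per_agent)}
        allocation[agent] = set(random.sample(objects, k=3)) | fakes
    return allocation

def suspicion_scores(allocation, leaked):
    """Score each agent by how much of the leaked set it could explain.

    Fake objects are unique to one agent, so their presence in a leak is
    strong evidence against that agent and is weighted heavily.
    """
    scores = {}
    for agent, given in allocation.items():
        overlap = given & leaked
        fake_hits = {o for o in overlap if o.startswith("FAKE-")}
        scores[agent] = len(overlap) / len(leaked) + len(fake_hits)
    return scores

if __name__ == "__main__":
    objects = [f"record-{i}" for i in range(10)]
    allocation = distribute(objects, agents=["A1", "A2", "A3"])
    leaked = set(list(allocation["A2"])[:4])   # pretend agent A2 leaked part of its data
    scores = suspicion_scores(allocation, leaked)
    print("most suspicious agent:", max(scores, key=scores.get))
```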


2014, Vol 513 (3), pp. 032095
Author(s): Wataru Takase, Yoshimi Matsumoto, Adil Hasan, Francesca Di Lodovico, Yoshiyuki Watase, ...

Author(s): V.G. Belenkov, V.I. Korolev, V.I. Budzko, D.A. Melnikov

The article discusses the features of the use of cryptographic information protection means (CIPM) in the environment of distributed processing and storage of data of large information and telecommunication systems (LITS). A brief characterization is given of the properties of the cryptographic protection control subsystem, the key system (KS). A description is given of the symmetric and asymmetric cryptographic systems required to describe the problem of using the KS in LITS. Functional and structural models of the use of the KS and CIPM in LITS are described. Generalized information about the features of using the KS in LITS is given. The obtained results form the basis for further work on the development of the architecture and principles of KS construction in LITS that implement distributed data processing and storage technologies. They can be used both as a methodological guide and when carrying out specific work on the creation and development of systems that implement these technologies, as well as when forming technical specifications for the creation of such systems.
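To make the symmetric/asymmetric distinction concrete, here is a minimal hybrid-encryption sketch using the Python cryptography package; it illustrates the general scheme only and is not the CIPM or KS architecture discussed in the article.

```python
# Minimal hybrid-encryption sketch: a symmetric key protects the stored data,
# and an asymmetric (RSA) key pair protects the symmetric key itself.
# This illustrates the general symmetric/asymmetric split only; it is not the
# CIPM/KS design discussed in the article.
from cryptography.fernet import Fernet
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

# Asymmetric key pair: the private key stays with the data owner, while the
# public key can be handed to any storage node in the distributed system.
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

# Symmetric data key: fast bulk encryption of the stored payload.
data_key = Fernet.generate_key()
ciphertext = Fernet(data_key).encrypt(b"distributed record payload")

# Key wrapping: only the short symmetric key travels under asymmetric protection.
oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)
wrapped_key = public_key.encrypt(data_key, oaep)

# Recovery path: unwrap the data key with the private key, then decrypt.
recovered_key = private_key.decrypt(wrapped_key, oaep)
assert Fernet(recovered_key).decrypt(ciphertext) == b"distributed record payload"
```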


Fuzzy Systems, 2017, pp. 516-539
Author(s): Nazanin Saadat, Amir Masoud Rahmani

One of the challenges of the data grid is to access widely distributed data quickly and efficiently, providing maximum data availability with minimum latency. Data replication is an efficient way to address this challenge: by creating and storing replicas, it makes the same data available at different locations of the data grid and can shorten the time needed to get the files. However, as the number of grid sites and their storage capacity are limited, an optimized and effective replacement algorithm is needed to improve the efficiency of replication. In this paper, the authors propose a novel two-level replacement algorithm which uses a Fuzzy Replica Preserving Value Evaluator System (FRPVES) for evaluating the value of each replica. The algorithm was tested using OptorSim, a grid simulator developed by the European Data Grid project. Results from the simulation show that the authors' proposed algorithm outperforms other algorithms in terms of job execution time, total number of replications, and effective network usage.
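A minimal sketch of a fuzzy preserving-value evaluation is given below; the membership functions, weights, and thresholds are illustrative assumptions, not the paper's FRPVES rule base.

```python
# Illustrative sketch of a fuzzy "replica preserving value": combine how
# recently and how often a replica was used into a score in [0, 1], and evict
# the replica with the lowest score when space is needed.

def tri(x, a, b, c):
    """Triangular membership function with peak at b on support [a, c]."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def preserving_value(age_hours, accesses):
    """Fuzzy score: recent (low age) and popular (high access count) replicas score high."""
    recent = tri(age_hours, -1, 0, 24)        # "recently used" peaks at age 0, fades by 24 h
    popular = min(accesses / 50.0, 1.0)       # saturating "frequently used" degree
    # Weighted aggregation of the two fuzzy degrees (assumed weights).
    return 0.6 * recent + 0.4 * popular

def choose_victim(replicas):
    """Pick the replica to replace: the one with the lowest preserving value."""
    return min(replicas, key=lambda r: preserving_value(r["age_hours"], r["accesses"]))

if __name__ == "__main__":
    replicas = [{"name": "fileA", "age_hours": 2, "accesses": 40},
                {"name": "fileB", "age_hours": 30, "accesses": 3}]
    print(choose_victim(replicas)["name"])    # fileB: old and rarely accessed
```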


2012, Vol 23 (4), pp. 17-51
Author(s): Ladjel Bellatreche, Alfredo Cuzzocrea, Soumia Benkrid

In this paper, a comprehensive methodology for designing and querying Parallel Relational Data Warehouses (PRDW) over database clusters, called Fragmentation & Allocation (F&A), is proposed. F&A assumes that cluster nodes are heterogeneous in processing power and storage capacity, contrary to traditional design approaches that assume homogeneous nodes, and it performs the fragmentation and allocation phases simultaneously. In classical approaches, two different cost models are used to perform fragmentation and allocation separately, whereas F&A makes use of one cost model that considers fragmentation and allocation parameters simultaneously; therefore, according to the proposed F&A methodology, the allocation decision is made at fragmentation time. At the fragmentation phase, F&A uses two well-known algorithms, namely Hill Climbing (HC) and Genetic Algorithm (GA), which the authors adapt to the main PRDW design problem over heterogeneous database clusters, as these algorithms are capable of taking into account the heterogeneous characteristics of the reference application scenario. At the allocation phase, F&A introduces an innovative matrix-based formalism capable of capturing the interactions among fragments, input queries, and cluster node characteristics, driving the data allocation task accordingly, and a related affinity-based algorithm, called F&A-ALLOC. Finally, the proposal is experimentally assessed and validated against the widely known data warehouse benchmark APB-1 release II.
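The matrix-based formalism can be illustrated with a small sketch: a query-fragment usage matrix and per-node power weights drive a greedy placement. This is a simplified illustration under assumed inputs, not the F&A-ALLOC algorithm itself.

```python
import numpy as np

# Illustrative sketch of a matrix-driven allocation step: a usage matrix says
# which queries touch which fragments, node weights capture heterogeneous
# processing power, and fragments are placed greedily so that the load each
# node receives is proportional to its power.

# use[q, f] = 1 if query q accesses fragment f (assumed workload).
use = np.array([[1, 1, 0, 0],
                [0, 1, 1, 0],
                [0, 0, 1, 1]])
query_freq = np.array([100, 40, 60])          # how often each query runs
node_power = np.array([1.0, 2.0, 4.0])        # heterogeneous node capacities

# Load contributed by each fragment = sum of frequencies of the queries using it.
fragment_load = query_freq @ use

# Greedy placement: assign heavy fragments first to the node whose current
# relative load (load / power) is smallest.
placement = {}
node_load = np.zeros(len(node_power))
for frag in np.argsort(fragment_load)[::-1]:
    node = int(np.argmin(node_load / node_power))
    placement[int(frag)] = node
    node_load[node] += fragment_load[frag]

print(placement)   # fragment index -> node index
```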


2019, Vol 214, pp. 04010
Author(s): Álvaro Fernández Casaní, Dario Barberis, Javier Sánchez, Carlos García Montoro, Santiago González de la Hoz, ...

The ATLAS EventIndex currently runs in production in order to build a complete catalogue of events for experiments with large amounts of data. The current approach is to index all final produced data files at the CERN Tier-0 and at hundreds of grid sites, with a distributed data collection architecture that uses Object Stores to temporarily maintain the conveyed information and sends references to them via a messaging system. The final backend of all the indexed data is a central Hadoop infrastructure at CERN; an Oracle relational database is used for faster access to a subset of this information. In the future of ATLAS, the event, instead of the file, should be the atomic information unit for metadata, in order to accommodate future data processing and storage technologies. Files will no longer be static quantities and may aggregate data dynamically, allowing event-level granularity processing in heavily parallel computing environments; this also simplifies the handling of loss and/or extension of data. In this sense the EventIndex may evolve towards a generalized whiteboard, with the ability to build collections and virtual datasets for end users. These proceedings describe the current distributed data collection architecture of the ATLAS EventIndex project, with details of the Producer, Consumer and Supervisor entities, and of the protocol and information temporarily stored in the Object Store. They also show the data flow rates and performance achieved since the new Object-Store-as-temporary-store approach was put in production in July 2017. We review the challenges imposed by the expected increasing rates, which will reach 35 billion new real events per year in Run 3 and 100 billion new real events per year in Run 4. For simulated events the numbers are even higher, with 100 billion events per year in Run 3 and 300 billion events per year in Run 4. We also outline the challenges we face in order to accommodate future use cases in the EventIndex.
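The Producer/Object Store/Messaging/Consumer pattern can be sketched with in-memory stand-ins as follows; all classes, keys, and record fields are illustrative, not the actual EventIndex implementation.

```python
import json
import queue
import uuid

# Toy illustration of the pattern: producers write a batch of event-index
# records to an object store and publish only a small reference message; the
# consumer dereferences it and loads the records into the permanent backend.
# Everything here is an in-memory stand-in, not the ATLAS EventIndex code.

object_store = {}            # stand-in for the temporary Object Store
message_bus = queue.Queue()  # stand-in for the messaging system
backend = []                 # stand-in for the central Hadoop/Oracle backend

def produce(events, site):
    """Index a batch of events at a grid site and announce it via the bus."""
    key = f"{site}/{uuid.uuid4()}"
    object_store[key] = json.dumps(events)                   # payload stored temporarily
    message_bus.put({"ref": key, "n_events": len(events)})   # only the reference travels

def consume():
    """Drain references, fetch payloads from the object store, fill the backend."""
    while not message_bus.empty():
        msg = message_bus.get()
        events = json.loads(object_store.pop(msg["ref"]))    # payload removed once ingested
        backend.extend(events)

if __name__ == "__main__":
    produce([{"run": 1, "event": 17}, {"run": 1, "event": 42}], site="TIER0")
    consume()
    print(len(backend), "records in backend")
```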


2019, Vol 214, pp. 04031
Author(s): Malachi Schram

The Belle II experiment at the SuperKEKB collider in Tsukuba, Japan, started taking physics data in early 2018 and plans to accumulate 50 ab-1, which is approximately 50 times more data than the Belle experiment. The collaboration expects it will require managing and processing approximately 200 PB of data. Computing at this scale requires efficient and coordinated use of the geographically distributed compute resources in North America, Asia and Europe, and will take advantage of high-speed global networks. We present the general Belle II distributed data management system and computing results from the first phase of data taking.

