Data Allocation Service ADAS for the Data Rebalancing of ATLAS

2019, Vol 214, pp. 06012
Author(s): Ralf Vamosi, Mario Lassnig, Erich Schikuta

The distributed data management system Rucio manages all data of the ATLAS collaboration across the grid. Automation, such as data replication and data rebalancing, is important to ensure proper operation and execution of the scientific workflow. In these proceedings, a new data allocation grid service based on machine learning is proposed. This learning agent takes subsets of the global datasets and proposes a better allocation based on an imposed cost metric, such as waiting time in the workflow. As a service, it can be modularized and run independently of the existing rebalancing and replication mechanisms. Furthermore, it collects data from other services and learns better allocations while running in the background. Besides users selecting datasets, other data services may consult this meta-heuristic service for improved data placement. Network and storage utilization are also taken into account.
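As a rough illustration of the kind of cost-driven proposal such an agent could make, the following Python sketch ranks candidate sites by a simple waiting-time cost. The site and dataset structures and the cost metric are assumptions for illustration only, not the ADAS or Rucio interfaces.

```python
# Minimal illustrative sketch of a cost-driven data allocation proposal.
# All structures (sites, datasets, the waiting-time cost) are hypothetical
# simplifications, not the actual ADAS/Rucio interfaces.

def propose_allocation(sites, datasets):
    """Propose dataset moves that reduce the expected waiting-time cost.

    sites:    {site: {"wait_s": mean queue waiting time, "free_tb": free storage}}
    datasets: {name: {"site": current site, "size_tb": size, "accesses": popularity}}
    """
    proposals = []
    # Consider the most accessed datasets first: they dominate the cost.
    for name, ds in sorted(datasets.items(), key=lambda kv: -kv[1]["accesses"]):
        current_cost = ds["accesses"] * sites[ds["site"]]["wait_s"]
        # Candidate sites with enough free storage for a new replica.
        candidates = [(s, info) for s, info in sites.items()
                      if s != ds["site"] and info["free_tb"] >= ds["size_tb"]]
        if not candidates:
            continue
        # Best candidate = lowest expected waiting time.
        best_site, best = min(candidates, key=lambda kv: kv[1]["wait_s"])
        new_cost = ds["accesses"] * best["wait_s"]
        if new_cost < current_cost:                 # only propose real improvements
            proposals.append({"dataset": name, "from": ds["site"],
                              "to": best_site, "gain": current_cost - new_cost})
            best["free_tb"] -= ds["size_tb"]        # reserve space for the proposed move
    return proposals


if __name__ == "__main__":
    sites = {"SITE_A": {"wait_s": 120.0, "free_tb": 5.0},
             "SITE_B": {"wait_s": 15.0, "free_tb": 50.0}}
    datasets = {"data18.raw": {"site": "SITE_A", "size_tb": 2.0, "accesses": 400}}
    print(propose_allocation(sites, datasets))
```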

2005, Vol 4 (2), pp. 393-400
Author(s): Pallavali Radha, G. Sireesha

The data distributor's task is to give sensitive data to a set of presumably trusted third-party agents. Due to data leakage, the data sent to these third parties can turn up in unauthorized places, such as the web or someone's personal system. The distributor must be able to establish that the data was leaked by one or more agents, as opposed to having been independently gathered by other means. Our new proposal on data allocation strategies improves the probability of identifying leakages. Security attacks typically result from unintended behaviors or invalid inputs, and because real-world programs must cope with so many invalid inputs, security testing is labor intensive; the most desirable approach is to automate, or at least partially automate, the security-testing process. In this paper we present a Predicate/Transition nets approach for the automated generation of security tests from formal threat models, and use allocation strategies to detect the agents without modifying the original data. The guilty agent is the one who leaks the distributed data. To detect guilty agents more effectively, the idea is to distribute the data intelligently to agents based on sample data requests and explicit data requests. Fake-object implementation algorithms further improve the distributor's chance of detecting guilty agents.
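The fake-object idea can be illustrated with a minimal sketch. All names, the random sampling, and the scoring below are hypothetical simplifications, not the paper's Predicate/Transition-net machinery: each agent receives unique fake records, so their appearance in a leak points back at that agent.

```python
import random

# Illustrative sketch of fake-object allocation and guilty-agent detection.
# The data model and scoring are hypothetical simplifications of the general
# idea: give each agent a distinguishable set, then see whose set best
# explains a leak.

def distribute(objects, agents, fakes_per_agent=2):
    """Give every agent a sample of real objects plus unique fake records."""
    allocation = {}
    for agent in agents:
        fakes = {f"FAKE-{agent}-{i}" for i in range(fakes_per_agent)}
        allocation[agent] = set(random.sample(objects, k=3)) | fakes
    return allocation

def suspicion_scores(allocation, leaked):
    """Score each agent by how much of the leaked set it could explain.

    Fake objects are unique to one agent, so their presence in a leak is
    strong evidence against that agent and is weighted heavily.
    """
    scores = {}
    for agent, given in allocation.items():
        overlap = given & leaked
        fake_hits = {o for o in overlap if o.startswith("FAKE-")}
        scores[agent] = len(overlap) / len(leaked) + len(fake_hits)
    return scores

if __name__ == "__main__":
    objects = [f"record-{i}" for i in range(10)]
    allocation = distribute(objects, agents=["A1", "A2", "A3"])
    leaked = set(list(allocation["A2"])[:4])   # pretend agent A2 leaked part of its data
    scores = suspicion_scores(allocation, leaked)
    print("most suspicious agent:", max(scores, key=scores.get))
```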


2014, Vol 513 (3), pp. 032095
Author(s): Wataru Takase, Yoshimi Matsumoto, Adil Hasan, Francesca Di Lodovico, Yoshiyuki Watase, ...

Author(s): V.G. Belenkov, V.I. Korolev, V.I. Budzko, D.A. Melnikov

The article discusses the features of the use of cryptographic information protection means (CIPM) in the environment of distributed processing and storage of data of large information and telecommunication systems (LITS). A brief characterization is given of the properties of the cryptographic protection control subsystem, the key system (KS). A description is given of the symmetric and asymmetric cryptographic systems required to describe the problem of using the KS in LITS. Functional and structural models of the use of the KS and CIPM in LITS are described. Generalized information about the features of using the KS in LITS is given. The obtained results form the basis for further work on the development of the architecture and principles of KS construction in LITS that implement distributed data processing and storage technologies. They can be used both as a methodological guide and when carrying out specific work on the creation and development of systems that implement these technologies, as well as when forming technical specifications for the creation of such systems.
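To make the symmetric/asymmetric distinction concrete, here is a minimal hybrid-encryption sketch using the Python cryptography package; it illustrates the general scheme only and is not the CIPM or KS architecture discussed in the article.

```python
# Minimal hybrid-encryption sketch: a symmetric key protects the stored data,
# and an asymmetric (RSA) key pair protects the symmetric key itself.
# This illustrates the general symmetric/asymmetric split only; it is not the
# CIPM/KS design discussed in the article.
from cryptography.fernet import Fernet
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

# Asymmetric key pair: the private key stays with the data owner, while the
# public key can be handed to any storage node in the distributed system.
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

# Symmetric data key: fast bulk encryption of the stored payload.
data_key = Fernet.generate_key()
ciphertext = Fernet(data_key).encrypt(b"distributed record payload")

# Key wrapping: only the short symmetric key travels under asymmetric protection.
oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)
wrapped_key = public_key.encrypt(data_key, oaep)

# Recovery path: unwrap the data key with the private key, then decrypt.
recovered_key = private_key.decrypt(wrapped_key, oaep)
assert Fernet(recovered_key).decrypt(ciphertext) == b"distributed record payload"
```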


Fuzzy Systems, 2017, pp. 516-539
Author(s): Nazanin Saadat, Amir Masoud Rahmani

One of the challenges of the data grid is to access widely distributed data quickly and efficiently, providing maximum data availability with minimum latency. Data replication is an efficient way to address this challenge: by creating and storing replicas, it makes the same data available at different locations of the data grid and can shorten the time needed to get the files. However, as the number of grid sites and their storage capacity are limited, an optimized and effective replacement algorithm is needed to improve the efficiency of replication. In this paper, the authors propose a novel two-level replacement algorithm which uses a Fuzzy Replica Preserving Value Evaluator System (FRPVES) for evaluating the value of each replica. The algorithm was tested using OptorSim, a grid simulator developed by the European Data Grid project. Results from the simulation show that the authors' proposed algorithm outperforms other algorithms in terms of job execution time, total number of replications, and effective network usage.
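A minimal sketch of a fuzzy preserving-value evaluation is given below; the membership functions, weights, and thresholds are illustrative assumptions, not the paper's FRPVES rule base.

```python
# Illustrative sketch of a fuzzy "replica preserving value": combine how
# recently and how often a replica was used into a score in [0, 1], and evict
# the replica with the lowest score when space is needed.

def tri(x, a, b, c):
    """Triangular membership function with peak at b on support [a, c]."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def preserving_value(age_hours, accesses):
    """Fuzzy score: recent (low age) and popular (high access count) replicas score high."""
    recent = tri(age_hours, -1, 0, 24)        # "recently used" peaks at age 0, fades by 24 h
    popular = min(accesses / 50.0, 1.0)       # saturating "frequently used" degree
    # Weighted aggregation of the two fuzzy degrees (assumed weights).
    return 0.6 * recent + 0.4 * popular

def choose_victim(replicas):
    """Pick the replica to replace: the one with the lowest preserving value."""
    return min(replicas, key=lambda r: preserving_value(r["age_hours"], r["accesses"]))

if __name__ == "__main__":
    replicas = [{"name": "fileA", "age_hours": 2, "accesses": 40},
                {"name": "fileB", "age_hours": 30, "accesses": 3}]
    print(choose_victim(replicas)["name"])    # fileB: old and rarely accessed
```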


2012, Vol 23 (4), pp. 17-51
Author(s): Ladjel Bellatreche, Alfredo Cuzzocrea, Soumia Benkrid

In this paper, a comprehensive methodology for designing and querying Parallel Relational Data Warehouses (PRDW) over database clusters, called Fragmentation & Allocation (F&A), is proposed. F&A assumes that cluster nodes are heterogeneous in processing power and storage capacity, contrary to traditional design approaches that assume homogeneous nodes, and it performs the fragmentation and allocation phases simultaneously. In classical approaches, two different cost models are used to perform fragmentation and allocation separately, whereas F&A makes use of one cost model that considers fragmentation and allocation parameters simultaneously; therefore, according to the proposed F&A methodology, the allocation decision is made at fragmentation time. At the fragmentation phase, F&A uses two well-known algorithms, namely Hill Climbing (HC) and Genetic Algorithm (GA), which the authors adapt to the main PRDW design problem over heterogeneous database clusters, as these algorithms are capable of taking into account the heterogeneous characteristics of the reference application scenario. At the allocation phase, F&A introduces an innovative matrix-based formalism capable of capturing the interactions among fragments, input queries, and cluster node characteristics, driving the data allocation task accordingly, and a related affinity-based algorithm, called F&A-ALLOC. Finally, the proposal is experimentally assessed and validated against the widely known data warehouse benchmark APB-1 release II.
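The matrix-based formalism can be illustrated with a small sketch: a query-fragment usage matrix and per-node power weights drive a greedy placement. This is a simplified illustration under assumed inputs, not the F&A-ALLOC algorithm itself.

```python
import numpy as np

# Illustrative sketch of a matrix-driven allocation step: a usage matrix says
# which queries touch which fragments, node weights capture heterogeneous
# processing power, and fragments are placed greedily so that the load each
# node receives is proportional to its power.

# use[q, f] = 1 if query q accesses fragment f (assumed workload).
use = np.array([[1, 1, 0, 0],
                [0, 1, 1, 0],
                [0, 0, 1, 1]])
query_freq = np.array([100, 40, 60])          # how often each query runs
node_power = np.array([1.0, 2.0, 4.0])        # heterogeneous node capacities

# Load contributed by each fragment = sum of frequencies of the queries using it.
fragment_load = query_freq @ use

# Greedy placement: assign heavy fragments first to the node whose current
# relative load (load / power) is smallest.
placement = {}
node_load = np.zeros(len(node_power))
for frag in np.argsort(fragment_load)[::-1]:
    node = int(np.argmin(node_load / node_power))
    placement[int(frag)] = node
    node_load[node] += fragment_load[frag]

print(placement)   # fragment index -> node index
```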


2019, Vol 214, pp. 04010
Author(s): Álvaro Fernández Casaní, Dario Barberis, Javier Sánchez, Carlos García Montoro, Santiago González de la Hoz, ...

The ATLAS EventIndex currently runs in production in order to build a complete catalogue of events for experiments with large amounts of data. The current approach is to index all final produced data files at the CERN Tier-0 and at hundreds of grid sites, with a distributed data collection architecture that uses Object Stores to temporarily maintain the conveyed information and sends references to them via a messaging system. The final backend of all the indexed data is a central Hadoop infrastructure at CERN; an Oracle relational database is used for faster access to a subset of this information. In the future of ATLAS, the event, instead of the file, should be the atomic information unit for metadata, in order to accommodate future data processing and storage technologies. Files will no longer be static quantities and may aggregate data dynamically, allowing event-level granularity processing in heavily parallel computing environments; this also simplifies the handling of loss and/or extension of data. In this sense the EventIndex may evolve towards a generalized whiteboard, with the ability to build collections and virtual datasets for end users. These proceedings describe the current distributed data collection architecture of the ATLAS EventIndex project, with details of the Producer, Consumer and Supervisor entities, and of the protocol and information temporarily stored in the Object Store. They also show the data flow rates and performance achieved since the new Object-Store-as-temporary-store approach was put in production in July 2017. We review the challenges imposed by the expected increasing rates, which will reach 35 billion new real events per year in Run 3 and 100 billion new real events per year in Run 4. For simulated events the numbers are even higher, with 100 billion events per year in Run 3 and 300 billion events per year in Run 4. We also outline the challenges we face in order to accommodate future use cases in the EventIndex.
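The Producer/Object Store/Messaging/Consumer pattern can be sketched with in-memory stand-ins as follows; all classes, keys, and record fields are illustrative, not the actual EventIndex implementation.

```python
import json
import queue
import uuid

# Toy illustration of the pattern: producers write a batch of event-index
# records to an object store and publish only a small reference message; the
# consumer dereferences it and loads the records into the permanent backend.
# Everything here is an in-memory stand-in, not the ATLAS EventIndex code.

object_store = {}            # stand-in for the temporary Object Store
message_bus = queue.Queue()  # stand-in for the messaging system
backend = []                 # stand-in for the central Hadoop/Oracle backend

def produce(events, site):
    """Index a batch of events at a grid site and announce it via the bus."""
    key = f"{site}/{uuid.uuid4()}"
    object_store[key] = json.dumps(events)                   # payload stored temporarily
    message_bus.put({"ref": key, "n_events": len(events)})   # only the reference travels

def consume():
    """Drain references, fetch payloads from the object store, fill the backend."""
    while not message_bus.empty():
        msg = message_bus.get()
        events = json.loads(object_store.pop(msg["ref"]))    # payload removed once ingested
        backend.extend(events)

if __name__ == "__main__":
    produce([{"run": 1, "event": 17}, {"run": 1, "event": 42}], site="TIER0")
    consume()
    print(len(backend), "records in backend")
```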


2019, Vol 214, pp. 04031
Author(s): Malachi Schram

The Belle II experiment at the SuperKEKB collider in Tsukuba, Japan, started taking physics data in early 2018 and plans to accumulate 50 ab-1, which is approximately 50 times more data than the Belle experiment. The collaboration expects it will require managing and processing approximately 200 PB of data. Computing at this scale requires efficient and coordinated use of the geographically distributed compute resources in North America, Asia and Europe, and will take advantage of high-speed global networks. We present the general Belle II distributed data management system and computing results from the first phase of data taking.

