Distributed Data Collection for the Next Generation ATLAS EventIndex Project

2019 ◽  
Vol 214 ◽  
pp. 04010
Author(s):  
Álvaro Fernández Casaní ◽  
Dario Barberis ◽  
Javier Sánchez ◽  
Carlos García Montoro ◽  
Santiago González de la Hoz ◽  
...  

The ATLAS EventIndex currently runs in production, building a complete catalogue of events for experiments with large amounts of data. The current approach is to index all final produced data files at CERN Tier-0 and at hundreds of grid sites, with a distributed data collection architecture that uses Object Stores to temporarily maintain the conveyed information and a Messaging System to send references to it. The final backend for all indexed data is a central Hadoop infrastructure at CERN; an Oracle relational database provides faster access to a subset of this information. In the future of ATLAS, the event, rather than the file, should be the atomic unit of metadata, in order to accommodate future data processing and storage technologies: files will no longer be static quantities but may dynamically aggregate data, allowing event-level granularity of processing in heavily parallel computing environments and simplifying the handling of data loss and/or extension. In this sense the EventIndex may evolve towards a generalized whiteboard, with the ability to build collections and virtual datasets for end users. These proceedings describe the current distributed data collection architecture of the ATLAS EventIndex project, with details of the Producer, Consumer and Supervisor entities, the protocol, and the information temporarily stored in the Object Store. They also show the data flow rates and performance achieved since the Object-Store-based temporary staging approach was put into production in July 2017. We review the challenges imposed by the expected increase in rates, which will reach 35 billion new real events per year in Run 3 and 100 billion new real events per year in Run 4; for simulated events the numbers are even higher, with 100 billion events per year in Run 3 and 300 billion events per year in Run 4. We also outline the challenges we face in accommodating future use cases in the EventIndex.
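
The producer/consumer exchange described above can be made concrete with a small sketch. The following is a minimal, self-contained illustration of the pattern, not the actual ATLAS implementation: a Python dict stands in for the Object Store, a queue.Queue for the Messaging System, and all record fields are hypothetical.

```python
# Minimal sketch of the staged data-collection pattern: the producer writes
# an indexed batch to a temporary object store and publishes only a small
# reference message; the consumer dereferences it and forwards the batch to
# the permanent backend. All components here are stand-ins.
import json
import queue
import uuid

object_store = {}            # stand-in for the temporary Object Store
message_bus = queue.Queue()  # stand-in for the Messaging System


def produce(site: str, events: list[dict]) -> None:
    """Producer: index a batch of events, stage it in the object store,
    and publish a lightweight reference on the message bus."""
    key = f"{site}/{uuid.uuid4()}"
    object_store[key] = json.dumps(events)  # temporary staging
    message_bus.put({"site": site, "ref": key, "n": len(events)})


def consume() -> list[dict]:
    """Consumer: receive a reference, fetch the staged batch, and hand it
    over for permanent storage (the central Hadoop catalogue in the real
    system), freeing the temporary copy."""
    msg = message_bus.get()
    return json.loads(object_store.pop(msg["ref"]))


produce("TIER0", [{"run": 358031, "event": 1}, {"run": 358031, "event": 2}])
print(consume())
```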

Author(s):  
V.G. Belenkov ◽  
V.I. Korolev ◽  
V.I. Budzko ◽  
D.A. Melnikov

The article discusses the features of using cryptographic information protection means (CIPM) in an environment of distributed processing and storage of data in large information and telecommunication systems (LITS). A brief characterization is given of the properties of the cryptographic protection control subsystem, the key system (KS). Symmetric and asymmetric cryptographic systems are described to the extent required to state the problem of using a KS in LITS. Functional and structural models of the use of the KS and CIPM in LITS are described, and generalized information about the features of using a KS in LITS is given. The results obtained form the basis for further work on the development of the architecture and principles of KS construction in LITS that implement distributed data processing and storage technologies. They can be used both as a methodological guide and when carrying out specific work on the creation and development of systems that implement these technologies, as well as when drawing up technical specifications for such work.
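
As a concrete illustration of how symmetric and asymmetric cryptosystems combine in such an environment, the sketch below shows a generic hybrid scheme using the Python `cryptography` package: bulk data is protected with a fast symmetric cipher, while the session key is distributed under an asymmetric wrap, which is the key-distribution task a KS must manage. This is illustrative only and is not tied to any particular CIPM product or to the KS architecture discussed in the article.

```python
# Generic hybrid encryption: symmetric cipher for the payload, public-key
# wrap for the session key. Key lifecycle (generation, distribution,
# revocation) is what a key system (KS) manages in a distributed setting.
from cryptography.fernet import Fernet
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

# Receiving node's long-term asymmetric key pair (held by the KS).
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

# Sender: encrypt the payload symmetrically, then wrap the session key
# with the receiver's public key.
session_key = Fernet.generate_key()
ciphertext = Fernet(session_key).encrypt(b"record stored at a remote node")
oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)
wrapped_key = public_key.encrypt(session_key, oaep)

# Receiver: unwrap the session key with the private key, then decrypt.
recovered_key = private_key.decrypt(wrapped_key, oaep)
assert Fernet(recovered_key).decrypt(ciphertext) == b"record stored at a remote node"
```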


2012 ◽  
Vol 49 (No. 10) ◽  
pp. 389-399 ◽  
Author(s):  
M. Trckova ◽  
L. Matlova ◽  
L. Dvorska ◽  
I. Pavlik

Feeding kaolin as a supplement to pigs for the prevention of diarrheal diseases has been introduced on some farms in the Czech Republic. Peat was used in the 1990s for a similar purpose; however, most farmers ceased feeding peat as a supplement because of its frequent contamination with conditionally pathogenic mycobacteria, especially Mycobacterium avium subsp. hominissuis. The aim of the present paper is to review the available literature on the advantages and disadvantages of feeding kaolin as a supplement to animals. Its positive dietary effects consist primarily in its adsorbent capability, which may be useful for detoxification of the organism and for the prevention of diarrheal diseases in pigs. Because the mechanism of action of kaolin fed as a supplement is unknown, there is a risk of potential interactions with other nutrient compounds of the diet. Therefore, it is necessary to investigate the effectiveness and safety of feeding kaolin in detail with regard to the health status and performance of each farm animal species. A disadvantage of kaolin is its potential toxicity if it has been mined from an environment with a natural or anthropogenic occurrence of toxic compounds. Another risk factor is potential contamination of originally sterile kaolin with conditionally pathogenic mycobacteria from surface water, dust, soil, and other constituents of the mine environment during kaolin extraction, processing and storage.


2020 ◽  
Vol 14 (4) ◽  
pp. 507-520
Author(s):  
Adriane Chapman ◽  
Paolo Missier ◽  
Giulia Simonelli ◽  
Riccardo Torlone

Data processing pipelines that are designed to clean, transform and alter data in preparation for learning predictive models have an impact on those models' accuracy and performance, as well as on other properties, such as model fairness. It is therefore important to provide developers with the means to gain an in-depth understanding of how the pipeline steps affect the data, from the raw input to training sets ready to be used for learning. While other efforts track the creation and changes of pipelines of relational operators, in this work we analyze the typical operations of data preparation within a machine learning process and provide infrastructure for generating very granular provenance records from it, at the level of individual elements within a dataset. Our contributions include: (i) the formal definition of a core set of preprocessing operators and the definition of provenance patterns for each of them, and (ii) a prototype implementation of an application-level provenance capture library that works alongside Python. We report on provenance processing and storage overhead and on scalability experiments, carried out over both real ML benchmark pipelines and TPC-DI, and show how the resulting provenance can be used to answer a suite of provenance benchmark queries that underpin some of the developers' debugging questions, as expressed on the Data Science Stack Exchange.
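
To make element-level provenance concrete, here is a minimal sketch (not the authors' library) of provenance capture for a single preprocessing operator, mean imputation over a pandas DataFrame. Each changed cell yields one record linking the derived element to the operation that produced it and the input elements it used; the record fields shown are illustrative, not the paper's schema.

```python
# Element-level provenance for one preprocessing operator: every imputed
# cell is recorded together with the operator name and the cells whose
# values contributed to the imputation.
import pandas as pd

provenance = []  # fine-grained provenance records, one per derived element


def impute_mean(df: pd.DataFrame, column: str) -> pd.DataFrame:
    out = df.copy()
    mean = df[column].mean()  # pandas skips NaN values by default
    for idx in df.index[df[column].isna()]:
        out.at[idx, column] = mean
        provenance.append({
            "op": "impute_mean",                          # operator applied
            "entity": (idx, column),                      # derived element
            "used": [(i, column) for i in df.index[df[column].notna()]],
            "value": mean,
        })
    return out


raw = pd.DataFrame({"age": [23.0, None, 31.0]})
clean = impute_mean(raw, "age")
print(provenance)  # one record, for the imputed cell (1, 'age')
```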


Author(s):  
Frank Theo Moerman ◽  
Kostadin Fikiin

Proper control and performance of evaporators in food refrigeration facilities are vital to provide a suitable temperature regime and to ensure the safety, quality and wholesomeness of refrigerated products at minimum electricity costs. When humid air passes along the surfaces of a low-temperature evaporator, frost usually forms, which decreases the heat transfer efficiency. Frosting and defrosting phenomena have been extensively investigated for different industrial scenarios, and an extensive literature exists on the matter. However, no studies have been published so far that comprehensively address the methods and patterns of evaporator defrosting as affected by hygienic design implications and criteria. This book chapter is intended to fill this gap by enforcing hygienic imperatives in evaporator design. Various design solutions and operating conditions are considered decisive in determining the amount, thickness and structure of the frost build-up. The advantages and drawbacks of diverse defrost methods are outlined with regard to contamination risks in refrigeration facilities.


Author(s):  
Zachary Nixon ◽  
Carl Childs ◽  
John Tarpley ◽  
Ben Shorr

ABSTRACT To address the growing detail, complexity, and volume of data collected and developed during oil spill response, and to facilitate data sharing and conversion between the data collection, storage, and management systems of the diverse parties to a response, the National Oceanic and Atmospheric Administration (NOAA) Office of Response and Restoration (ORR) has developed and published a data management standard for observational Shoreline Cleanup Assessment Technique (SCAT) data. The standard was cooperatively developed by NOAA and others in the response community over the past three years through a series of workshops and meetings. The standard is agnostic about the physical spill environment, data collection methods, algorithms, software and computing environment, and requires only the most basic structured data, in order to preserve maximum flexibility for spill-specific conditions and the unanticipated needs of future data collection. NOAA is also expanding the role of the DIVER (Data Integration Visualization Exploration and Reporting) centralized data warehouse and query tools used to house, query and visualize analytical results, field observations, photos and other information. As part of this effort, DIVER is being extended to ingest and store SCAT data compliant with the standard. We anticipate that use of the standard will be mandated as part of data sharing agreements put in place for future spill incidents involving NOAA or other federal agencies. As such, we seek to widely disseminate information about the standard to the spill response community. Here, we discuss the components of the standard in detail and provide information on available documentation, example data, file interchange formats, and methods to provide feedback to NOAA.
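
To illustrate what "only the most basic structured data" might look like in an interchange file, the sketch below builds a hypothetical minimal SCAT-style observation record. The field names and values are invented for illustration and are not taken from the published NOAA standard; consult the standard's documentation for the actual schema.

```python
# Hypothetical minimal SCAT-style observation record, serialized to JSON
# as a plausible interchange format. Field names are illustrative only.
import json

observation = {
    "segment_id": "SEG-0042",                # shoreline segment surveyed
    "survey_datetime": "2019-06-14T09:30:00Z",
    "team": "SCAT-A",
    "oiling": {
        "distribution_pct": 10,              # observed surface oil coverage
        "thickness": "coat",
        "character": "fresh",
    },
    "geometry": {"type": "Point", "coordinates": [-122.41, 37.77]},
    "photos": ["IMG_0101.jpg"],
}

print(json.dumps(observation, indent=2))
```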


MRS Advances ◽  
2016 ◽  
Vol 1 (42) ◽  
pp. 2839-2855 ◽  
Author(s):  
Eric L Miller ◽  
Dimitrios Papageorgopoulos ◽  
Ned Stetson ◽  
Katie Randolph ◽  
David Peterson ◽  
...  

ABSTRACT This paper provides an overview of the U.S. Department of Energy's (DOE) hydrogen and fuel cell activities within the Office of Energy Efficiency and Renewable Energy (EERE), focusing on key targets, progress towards meeting those targets, and materials-related issues that need to be addressed. The most recent, state-of-the-art data on metrics such as cost, durability, and performance of fuel cell and hydrogen technologies are presented. Key technical accomplishments to date include a 50% reduction in the modeled high-volume cost of fuel cells since 2006 and an 80% cost reduction for electrolyzers since 2002. The status of various hydrogen production, delivery, and storage technologies is also presented, along with a summary of materials-related challenges for hydrogen infrastructure technologies such as compression, dispensing, seals, pipeline materials/embrittlement, and storage materials. Specific examples and areas requiring more research are discussed. Finally, future plans, including EERE's lab consortium approach with the HyMARC (Hydrogen Storage Materials Advanced Research Consortium) and FC-PAD (Fuel Cell Performance and Durability) consortia, are summarized.

