Heat Prediction of High Energy Physical Data Based on LSTM Recurrent Neural Network

2020 · Vol 245 · pp. 04002
Author(s): Zhenjing Cheng, Lu Wang, Yaodong Cheng, Gang Chen

High-energy physics computing is a typical data-intensive workload: petabytes of data need to be analyzed every year, and the demands on data access performance keep growing. Tiered storage systems that present a unified namespace have therefore been widely adopted. In such systems, data are placed on storage devices of different performance and cost according to how frequently they are accessed, and when the heat (access popularity) of the data changes, the data are migrated to the appropriate storage tier. At present, heuristic algorithms based on human experience are widely used for data heat prediction, but because different users have different computing models, their prediction accuracy is low. We propose a method that predicts future access popularity from file access characteristics using an LSTM deep learning model and use it as the basis for data migration in hierarchical storage. This paper uses real data from the high-energy physics experiment LHAASO for comparative testing. The results show that, under the same test conditions, the proposed model achieves higher prediction accuracy and broader applicability than existing prediction models.
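As an illustration of the kind of model the abstract describes, the sketch below shows a minimal LSTM regressor that maps a file's recent access history to a predicted future access heat. This is not the authors' code; the feature layout (daily access count, unique users, bytes read, file age) and all names are illustrative assumptions, and PyTorch is used for convenience.

```python
# Minimal sketch: predict a file's future "heat" from its recent daily access history.
import torch
import torch.nn as nn

class AccessHeatLSTM(nn.Module):
    def __init__(self, n_features: int = 4, hidden: int = 32):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)   # predicted heat for the next time window

    def forward(self, x):                  # x: (batch, days, n_features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :])    # regress from the last hidden state

# Toy usage: 128 files, 30 days of history, 4 hypothetical per-day features
# (access count, unique users, bytes read, days since creation).
model = AccessHeatLSTM()
history = torch.randn(128, 30, 4)
predicted_heat = model(history)            # shape (128, 1)
loss = nn.MSELoss()(predicted_heat, torch.randn(128, 1))
loss.backward()
```

In a hierarchical storage setting, the predicted heat would then be thresholded to decide which tier each file should be migrated to.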

2020 · Vol 245 · pp. 06042
Author(s): Oliver Gutsche, Igor Mandrichenko

A columnar data representation is known to be an efficient way to store data, particularly when an analysis typically uses only a small fragment of the available data structures. A data representation like Apache Parquet goes a step further than a plain columnar layout by also splitting data horizontally, which makes parallelization of data analysis easy. Based on the general idea of columnar data storage, and working on the [LDRD Project], we have developed a striped data representation which, we believe, is better suited to the needs of High Energy Physics data analysis. A traditional columnar approach allows for efficient analysis of complex data structures; while keeping all the benefits of columnar data representations, the striped mechanism goes further by enabling easy parallelization of computations without requiring special hardware. We present an implementation and some performance characteristics of such a data representation mechanism using a distributed NoSQL database or a local file system, unified under the same API and data representation model. The representation is efficient and at the same time simple enough to allow a common data model and API for a wide range of underlying storage mechanisms, such as distributed NoSQL databases and local file systems. Striped storage adopts NumPy arrays as its basic data representation format, which makes it easy and efficient to use from Python applications. The Striped Data Server is a web service that hides the server implementation details from the end user, exposes data easily to WAN users, and allows well-established data caching solutions to be used to further increase data access efficiency. We consider the Striped Data Server to be the core of an enterprise-scale data analysis platform for High Energy Physics and similar areas of data processing. We have been testing this architecture with a 2 TB dataset from a CMS dark matter search and plan to expand it to the multi-100 TB or even PB scale. We present the striped format, the Striped Data Server architecture, and performance test results.
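The sketch below illustrates the striping idea in plain NumPy: each column of an event record is stored as an array, and the arrays are cut into fixed-size stripes that can be analysed independently and in parallel. The function and parameter names (make_stripes, stripe_size) and the toy columns are illustrative assumptions, not the Striped Data Server API.

```python
# Minimal sketch of columnar data split into independent "stripes".
import numpy as np

def make_stripes(columns: dict, stripe_size: int):
    """Split a columnar dataset {name: 1-D array} into a list of stripes."""
    n_events = len(next(iter(columns.values())))
    stripes = []
    for start in range(0, n_events, stripe_size):
        stripes.append({name: arr[start:start + stripe_size]
                        for name, arr in columns.items()})
    return stripes

# Toy dataset: per-event missing ET and jet multiplicity stored as columns.
data = {
    "met":    np.random.exponential(50.0, size=1_000_000).astype(np.float32),
    "n_jets": np.random.poisson(4, size=1_000_000).astype(np.int16),
}
stripes = make_stripes(data, stripe_size=100_000)

# Each stripe can now be analysed independently, e.g. in a worker process:
partial_sums = [s["met"][s["n_jets"] >= 4].sum() for s in stripes]
print(sum(partial_sums) / 1_000_000)
```

Because each stripe holds plain contiguous NumPy arrays, a worker only needs the columns it actually uses, which is the property the abstract attributes to columnar and striped layouts.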


2017 · Vol 898 · pp. 062003
Author(s): Qiulan Huang, Ran Du, YaoDong Cheng, Jingyan Shi, Gang Chen, ...

Author(s): José Manuel Clavijo Columbié, Paul Glaysher, Jenia Jitsev, Judith Maria Katzy

We apply adversarial domain adaptation to reduce sample bias in a classification machine learning algorithm. We add a gradient reversal layer to a neural network so that it simultaneously classifies signal versus background events while minimising the difference in the classifier response between the nominal background sample and one generated with an alternative MC model. We demonstrate this on simulated LHC events, classifying a $t\bar{t}H$ signal against a $t\bar{t}b\bar{b}$ background.
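A minimal sketch of a gradient reversal layer of the kind described above is shown below. PyTorch, the network sizes, and the binary "MC model" domain label are assumptions for illustration, not the paper's setup: the forward pass is the identity, while the backward pass flips the sign of the gradient so that the shared features become insensitive to which MC model generated the background.

```python
# Minimal sketch of adversarial domain adaptation via gradient reversal.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)            # identity on the forward pass

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None   # reversed gradient on the backward pass

class AdversarialClassifier(nn.Module):
    def __init__(self, n_inputs: int, lam: float = 1.0):
        super().__init__()
        self.lam = lam
        self.features = nn.Sequential(nn.Linear(n_inputs, 64), nn.ReLU(),
                                      nn.Linear(64, 64), nn.ReLU())
        self.classifier = nn.Linear(64, 1)   # signal vs background logit
        self.adversary = nn.Linear(64, 1)    # which MC model produced the event

    def forward(self, x):
        f = self.features(x)
        class_logit = self.classifier(f)
        domain_logit = self.adversary(GradReverse.apply(f, self.lam))
        return class_logit, domain_logit
```

The total loss would combine the classification loss on labelled events with the domain loss on background events from the two generators; the reversed gradient pushes the shared features towards MC-model independence.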


1995 · Vol 06 (04) · pp. 579-584
Author(s): Clark S. Lindsey, Thomas Lindblad, Givi Sekhniaidze, G. Székely, M. Minerskjöld

The new IBM Zero Instruction Set Computer (ZISC) provides a radial basis function (RBF) neural network. The first-generation chip (ZISC036) allows for 64 8-bit inputs, 36 RBF neurons in the middle layer, and up to 16383 possible output categories. Forward processing takes 4 μs with a 20 MHz clock. Cascading multiple chips increases the number of available RBF neurons with no increase in processing time. The chip also executes a learning algorithm. We report on tests of the ZISC on a task related to high energy physics.
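For readers unfamiliar with this style of network, the sketch below emulates a ZISC-like prototype classifier in NumPy: stored prototypes with an influence radius and a category, L1 (Manhattan) distance matching, and a simplified learning rule that shrinks the radius of conflicting prototypes. Class and parameter names are illustrative assumptions; the chip's actual learning algorithm is more involved than this sketch.

```python
# Minimal sketch of RBF/prototype classification in the spirit of the ZISC036.
import numpy as np

class RBFPrototypeClassifier:
    def __init__(self, max_radius: int = 4096):
        self.centres, self.radii, self.categories = [], [], []
        self.max_radius = max_radius

    def learn(self, pattern: np.ndarray, category: int):
        """Store a prototype; shrink radii of conflicting neighbours (simplified)."""
        pattern = pattern.astype(np.int32)
        for i, (c, cat) in enumerate(zip(self.centres, self.categories)):
            d = int(np.abs(c - pattern).sum())       # L1 (Manhattan) distance
            if cat != category and d < self.radii[i]:
                self.radii[i] = d                    # avoid overlap with the other class
        self.centres.append(pattern)
        self.radii.append(self.max_radius)
        self.categories.append(category)

    def classify(self, pattern: np.ndarray):
        """Return the category of the closest firing prototype, or None."""
        pattern = pattern.astype(np.int32)
        best, best_d = None, None
        for c, r, cat in zip(self.centres, self.radii, self.categories):
            d = int(np.abs(c - pattern).sum())
            if d < r and (best_d is None or d < best_d):
                best, best_d = cat, d
        return best

# Toy usage with 64 8-bit inputs, as on the ZISC036:
clf = RBFPrototypeClassifier()
clf.learn(np.random.randint(0, 256, 64), category=1)
clf.learn(np.random.randint(0, 256, 64), category=2)
print(clf.classify(np.random.randint(0, 256, 64)))
```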


Author(s): Preeti Kumari, Kavita Lalwani, Ranjit Dalal, Ashutosh Bhardwaj, ...
