Data Preparation for NA62

2019 ◽  
Vol 214 ◽  
pp. 02017
Author(s):  
Paul Laycock

In 2017, NA62 recorded over a petabyte of raw data, collecting around a billion events per day of running. Data are collected in bursts of 3-5 seconds, producing output files of a few gigabytes. A typical run, a sequence of bursts with the same detector configuration and similar experimental conditions, contains 1500 bursts and constitutes the basic unit for offline data processing. A sample of 100 random bursts is used to derive timing calibrations for all detectors, after which every burst in the run is reconstructed. Finally, the reconstructed events are filtered by physics channel, with an average reduction factor of 20, and data quality metrics are calculated. Initially, a bespoke data-processing solution was implemented using a simple finite state machine with limited production-system functionality. In 2017, the ATLAS Tier-0 team offered the use of their production system, together with the necessary support. Data-processing workflows were rewritten with better error handling, I/O operations were minimised, the reconstruction software was improved, and conditions data handling was changed to follow the best practices suggested by the HEP Software Foundation conditions database working group. This contribution describes the experience gained in using these tools and methods for data processing in a petabyte-scale experiment.
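The per-run workflow described above lends itself to a compact sketch. The Python below mirrors the three stages (calibration on a random sample of bursts, full reconstruction, channel filtering); every helper function is a hypothetical placeholder standing in for NA62's actual software, with the quoted numbers (1500 bursts per run, 100-burst sample, factor-20 reduction) as parameters.

```python
import random

BURSTS_PER_RUN = 1500     # typical run size quoted above
CALIB_SAMPLE_SIZE = 100   # random bursts used for timing calibration
FILTER_REDUCTION = 20     # average event-reduction factor of the physics filter

def derive_timing_calibrations(bursts):
    # Placeholder: a real implementation fits per-detector time offsets.
    return {"n_bursts_used": len(bursts)}

def reconstruct(burst, calibrations):
    # Placeholder: apply calibrations and build physics events.
    return {"burst": burst, "events": 700_000}

def filter_by_channel(reco):
    # Placeholder: keep roughly 1/20 of the events, split by physics channel.
    return {**reco, "events": reco["events"] // FILTER_REDUCTION}

def process_run(burst_files):
    """Per-run offline workflow: calibrate on a sample, reconstruct all, filter."""
    sample = random.sample(burst_files, min(CALIB_SAMPLE_SIZE, len(burst_files)))
    calibrations = derive_timing_calibrations(sample)
    reconstructed = [reconstruct(b, calibrations) for b in burst_files]
    return [filter_by_channel(r) for r in reconstructed]

filtered = process_run([f"burst_{i:04d}.dat" for i in range(BURSTS_PER_RUN)])
```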

2017 ◽  
Vol 898 ◽  
pp. 052016 ◽  
Author(s):  
F H Barreiro ◽  
M Borodin ◽  
K De ◽  
D Golubkov ◽  
A Klimentov ◽  
...  

F1000Research ◽  
2017 ◽  
Vol 6 ◽  
pp. 967 ◽  
Author(s):  
Ting-Li Han ◽  
Yang Yang ◽  
Hua Zhang ◽  
Kai P. Law

Background: A challenge of metabolomics is processing the enormous amount of information generated by sophisticated analytical techniques. The raw data of an untargeted metabolomic experiment contain unwanted biological and technical variations that confound the biological variations of interest. The art of data normalisation to offset these variations and/or eliminate experimental or biological biases has made significant progress recently. However, published comparative studies are often biased or have omissions. Methods: We investigated these issues with our own data set, using five representative methods drawn from internal standard-based, model-based, and pooled quality control-based approaches, and compared the performance of these methods against each other in an epidemiological study of gestational diabetes using plasma. Results: Our results demonstrated that the quality control-based approaches gave the highest data precision of all the methods tested and would be the methods of choice for controlled experimental conditions. For our epidemiological study, however, the model-based approaches classified the clinical groups more effectively than the quality control-based approaches, owing to their ability to minimise not only technical variations but also biological biases in the raw data. Conclusions: We suggest that metabolomic researchers should optimise and justify the method they have chosen for their experimental conditions in order to obtain an optimal biological outcome.
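The pooled quality control idea can be pictured with a minimal sketch: fit a drift trend through the pooled-QC injections and divide it out of every sample. This generic polynomial variant is only illustrative, not the authors' exact method; published QC-based corrections typically use LOWESS or cubic splines instead.

```python
import numpy as np

def qc_normalise(intensities, injection_order, is_qc, degree=2):
    """Pooled-QC drift correction for a single metabolite feature.

    Fits a low-order polynomial to the pooled-QC intensities as a function
    of injection order, then divides every sample by the fitted trend.
    """
    coeffs = np.polyfit(injection_order[is_qc], intensities[is_qc], deg=degree)
    trend = np.polyval(coeffs, injection_order)   # predicted drift for all runs
    return intensities / trend * np.median(intensities[is_qc])

# Toy example: 30 injections with a pooled QC every 5th run.
order = np.arange(30)
is_qc = (order % 5 == 0)
raw = 1000 + 8 * order + np.random.normal(0, 20, size=30)  # drifting signal
corrected = qc_normalise(raw, order, is_qc)
```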


2006 ◽  
Vol 81 (15-17) ◽  
pp. 1863-1867
Author(s):  
E. Barrera ◽  
M. Ruiz ◽  
S. López ◽  
D. Machón ◽  
J. Vega ◽  
...  

2021 ◽  
Vol 251 ◽  
pp. 02029
Author(s):  
Luisa Arrabito ◽  
Johan Bregeon ◽  
Patrick Maeght ◽  
Michèle Sanguillon ◽  
Andrei Tsaregorodtsev ◽  
...  

The Cherenkov Telescope Array (CTA) is the next-generation instrument in the very-high-energy gamma-ray astronomy domain. It will consist of tens of Cherenkov telescopes deployed in two arrays, at La Palma (Spain) and Paranal (ESO, Chile) respectively. Currently under construction, CTA will start operations around 2023 for a duration of about 30 years. During operations CTA is expected to produce about 2 PB of raw data per year, plus 5-20 PB of Monte Carlo data. The global data volume to be managed by the CTA archive, including all versions and copies, is of the order of 100 PB, with a smooth growth profile. The associated processing needs are also very high, of the order of hundreds of millions of HS06 CPU hours per year. In order to optimize the instrument design and study its performance, the CTA consortium has run massive Monte Carlo productions on the EGI grid infrastructure during the preparatory phase (2010-2017) and the current construction phase. To handle these productions and the future data processing, we have developed a production system based on the DIRAC framework. The current system is the result of several years of hardware infrastructure upgrades, software development, and integration of different services such as CVMFS and FTS. In this paper we present the current status of the CTA production system and its exploitation during the latest large-scale Monte Carlo campaigns.
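For context, a DIRAC-based production system submits work through DIRAC's Python job API. The sketch below shows the generic submission pattern only; the job name, executable, arguments, and CPU-time value are hypothetical placeholders, and CTA's actual system wraps this API in higher-level transformation workflows.

```python
# Generic DIRAC job submission (sketch). Requires a configured DIRAC
# client environment with a valid grid proxy.
from DIRAC.Core.Base import Script
Script.parseCommandLine()  # initialise the DIRAC runtime before using the APIs

from DIRAC.Interfaces.API.Dirac import Dirac
from DIRAC.Interfaces.API.Job import Job

job = Job()
job.setName("mc_air_shower_sketch")                            # placeholder name
job.setExecutable("run_simulation.sh", arguments="--seed 42")  # placeholder script
job.setCPUTime(86400)                                          # requested CPU time, in seconds

result = Dirac().submitJob(job)
print(result)  # S_OK structure: {'OK': True, 'Value': <job ID>} on success
```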


Author(s):  
K. N. Rozhdestvenskaya

Introduction: A control system for a data processing network interacts with the network by sending commands and receiving responses. Such a control system is responsible for the network's viability and should therefore be analyzed, in particular in terms of its behavior over time, without an exhaustive search of the possible control options. Purpose: To study and analyze the behavior of a control system in a data processing network using mathematical modeling based on finite automata theory, and to verify the theoretical results by computer simulation. Results: A finite state machine, presented as a transition graph, is constructed to reflect the temporal behavior of part of a specific control system in the data processing network: the Plug-and-Play (PnP) manager. Its behavior rules are specified, and the problem of analyzing the manager's FSM is formulated. As a result, the types of control vectors that lead the PnP manager to correct temporal behavior have been obtained. Computer simulation was performed with a script in the MATLAB mathematical package. The simulation results are presented as timing diagrams of the finite state machine's transitions; the machine's behavior varies depending on the incoming signals and its starting state. From the timing diagrams one can trace the behavior and the transitions between states, estimate how often a particular state is entered, or traverse the machine's states. Practical relevance: The control vector types found for the PnP manager without an exhaustive search do not lead to incorrect situations in network data processing.
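The analysis style described here is easy to picture with a toy transition-table simulation: drive an FSM with a sequence of input signals and record the state trajectory, which is what the timing diagrams visualize. The states and signals below are invented for illustration and are not the PnP manager's actual model.

```python
# Toy FSM simulation: a transition table driven by a signal sequence,
# recording the state trajectory over time (the "timing diagram").
TRANSITIONS = {
    ("idle",        "plug"):   "configuring",
    ("configuring", "ack"):    "running",
    ("configuring", "error"):  "idle",
    ("running",     "unplug"): "idle",
}

def simulate(start, signals):
    """Return the state trajectory for a sequence of input signals."""
    state, trajectory = start, [start]
    for sig in signals:
        # Undefined (state, signal) pairs leave the machine in place.
        state = TRANSITIONS.get((state, sig), state)
        trajectory.append(state)
    return trajectory

print(simulate("idle", ["plug", "ack", "unplug", "plug", "error"]))
# ['idle', 'configuring', 'running', 'idle', 'configuring', 'idle']
```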


2021 ◽  
pp. 089443932110060
Author(s):  
Carina Cornesse ◽  
Barbara Felderer ◽  
Marina Fikel ◽  
Ulrich Krieger ◽  
Annelies G. Blom

Once recruited, probability-based online panels have proven to enable high-quality and high-frequency data collection. In ever faster-paced societies and, recently, in times of pandemic lockdowns, such online survey infrastructures are invaluable to social research. In the absence of email sampling frames, one way of recruiting such a panel is via postal mail. However, few studies have examined how best to approach sample members and then transition them from the initial postal contact to online panel registration. To fill this gap, we implemented a large-scale experiment in the recruitment of the 2018 sample of the German Internet Panel (GIP), varying the panel recruitment design across four experimental conditions: online-only, concurrent mode, online-first, and paper-first. Our results show that the online-only design delivers higher online panel registration rates than the other recruitment designs. In addition, all experimental conditions led to similarly representative samples on key socio-demographic characteristics.


1983 ◽  
Vol 16 (2) ◽  
pp. 242-250 ◽  
Author(s):  
T. J. Greenhough ◽  
J. R. Helliwell ◽  
S. A. Rule

The size and shape of diffraction spots produced by monochromated synchrotron X-radiation from a singly bent triangular monochromator are derived, following the simpler, more comparative treatment given in earlier work on reflecting range and prediction of partiality for oscillation camera data [Greenhough & Helliwell (1982). J. Appl. Cryst. 15, 493–508]. In that treatment the possibility of a polychromatic experiment was identified [see Arndt (1978). Nucl. Instrum. Methods, 152, 307–311, for an earlier suggestion]; here the energy profile within each diffraction spot is derived, along with the spectral resolution δE/E at any point in the profile due to experimental conditions and sample characteristics. A firm theoretical basis is established so that experimental problems and procedures can be discussed with a view to optimizing the method in the light of the first reported polychromatic experiment [Arndt, Greenhough, Helliwell, Howard, Rule & Thompson (1982). Nature (London), 298, 835–838].
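For orientation, the spectral resolution δE/E in a Bragg-reflection geometry follows from differentiating Bragg's law; this is the standard textbook relation, not the paper's specific derivation:

```latex
\lambda = 2d\sin\theta, \qquad E = \frac{hc}{\lambda}
\quad\Longrightarrow\quad
\left|\frac{\delta E}{E}\right| = \left|\frac{\delta\lambda}{\lambda}\right| = \cot\theta\,\delta\theta
```

so the energy spread at a point in the spot profile scales with the local angular spread δθ and decreases as the Bragg angle θ increases.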

