RDataFrame: Easy Parallel ROOT Analysis at 100 Threads

2019 ◽  
Vol 214 ◽  
pp. 06029
Author(s):  
Danilo Piparo ◽  
Philippe Canal ◽  
Enrico Guiraud ◽  
Xavier Valls Pla ◽  
Gerardo Ganis ◽  
...  

The physics programmes of LHC Run III and the HL-LHC challenge the HEP community. The volume of data to be handled is unprecedented at every step of the data processing chain: analysis is no exception. Physicists must be provided with first-class analysis tools which are easy to use, exploit bleeding-edge hardware technologies and allow parallelism to be expressed seamlessly. This document discusses the declarative analysis engine of ROOT, RDataFrame, and details how it makes it possible to profitably exploit commodity hardware as well as high-end servers and manycore accelerators, thanks to its synergy with the existing parallelised ROOT components. Real-life analyses of LHC experiments’ data expressed in terms of RDataFrame are presented, highlighting the programming model provided to express them in a concise and powerful way. Recent developments which make RDataFrame a lightweight data processing framework, such as callbacks and I/O capabilities, are described. Finally, the flexibility of RDataFrame and its ability to read data formats other than ROOT’s are characterised; as an example, it is discussed how RDataFrame can directly read and analyse LHCb’s raw data format, MDF.
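By way of illustration, here is a minimal PyROOT sketch of the declarative RDataFrame model described above; the file name, tree name and branch names (events.root, Events, pt, eta) are hypothetical stand-ins, not taken from the paper.

```python
# A minimal sketch of the RDataFrame declarative model in PyROOT.
# Assumes a ROOT file "events.root" with a TTree "Events" holding
# branches "pt" and "eta" (hypothetical names for illustration).
import ROOT

ROOT.EnableImplicitMT()  # run the event loop on all available cores

df = ROOT.RDataFrame("Events", "events.root")
h = (df.Filter("fabs(eta) < 2.4", "central tracks")
       .Define("pt_gev", "pt / 1000.0")
       .Histo1D(("pt_gev", "p_{T};p_{T} [GeV];entries", 100, 0.0, 100.0),
                "pt_gev"))

h.Draw()  # results are lazy; the parallel event loop runs on first access
```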

2016 ◽  
Vol 12 (1) ◽  
pp. 49-68 ◽  
Author(s):  
Christian Esposito ◽  
Massimo Ficco

The demand to access large volumes of data, distributed across hundreds or thousands of machines, has opened new opportunities in commerce, science, and computing applications. MapReduce is a paradigm that offers a programming model and an associated implementation for processing massive datasets in a parallel fashion, using non-dedicated distributed computing hardware. It has been successfully adopted in several academic and industrial projects for Big Data analytics. However, since such analytics is increasingly demanded within the context of mission-critical applications, security and reliability in MapReduce frameworks are strongly required in order to manage sensitive information and to obtain the right answer at the right time. In this paper, the authors present the main implementation of the MapReduce programming paradigm, provided by Apache under the name Hadoop. They illustrate the security and reliability concerns in the context of a large-scale data processing infrastructure. They review the available solutions, and their limitations, to support security and reliability within the context of MapReduce frameworks. The authors conclude by describing the ongoing evolution of such solutions and the open issues for improvement, which could be challenging research opportunities for academic researchers.
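As an illustration of the paradigm itself (not of the Hadoop API), a single-process Python sketch of the map, shuffle and reduce phases of the canonical word count:

```python
# A minimal, single-process sketch of the MapReduce word-count pattern,
# showing the map, shuffle and reduce phases that Hadoop distributes
# across a cluster (no Hadoop API is used here).
from collections import defaultdict

def map_phase(documents):
    """Emit (word, 1) pairs, as a Hadoop mapper would."""
    for doc in documents:
        for word in doc.split():
            yield word.lower(), 1

def shuffle_phase(pairs):
    """Group values by key, as the framework's shuffle/sort step does."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups.items()

def reduce_phase(groups):
    """Sum the counts per word, as a Hadoop reducer would."""
    return {word: sum(counts) for word, counts in groups}

docs = ["the right answer at the right time", "the right data"]
print(reduce_phase(shuffle_phase(map_phase(docs))))
# {'the': 3, 'right': 3, 'answer': 1, 'at': 1, 'time': 1, 'data': 1}
```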


2011 ◽  
Vol 314-316 ◽  
pp. 2253-2258
Author(s):  
Dong Gen Cai ◽  
Tian Rui Zhou

Data processing and conversion play an important role in rapid prototyping (RP) processes, in which the choice of data format determines the data processing procedure and method. In this paper, the formats and features of commonly used interface standards such as STL, IGES and STEP are introduced, and data conversion experiments on CAD models are carried out in the Pro/E system, in which the conversion effects of the different data formats are compared and analyzed, and the most reasonable data conversion format is proposed.
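As a hedged aside, the sketch below shows the simplicity of the ASCII STL structure mentioned above, which stores only triangle normals and vertices (one reason richer standards such as STEP preserve more of the original CAD model); the writer is illustrative and not part of the paper's Pro/E workflow.

```python
# A minimal sketch of the ASCII STL facet structure: a solid is just a
# flat list of triangles, each with a normal and three vertices.
def write_ascii_stl(path, triangles):
    """triangles: list of (normal, (v1, v2, v3)) tuples of 3-D coordinates."""
    with open(path, "w") as f:
        f.write("solid model\n")
        for normal, verts in triangles:
            f.write("  facet normal %g %g %g\n" % normal)
            f.write("    outer loop\n")
            for v in verts:
                f.write("      vertex %g %g %g\n" % v)
            f.write("    endloop\n")
            f.write("  endfacet\n")
        f.write("endsolid model\n")

# One facet: a unit right triangle in the z = 0 plane.
write_ascii_stl("tri.stl", [((0, 0, 1), ((0, 0, 0), (1, 0, 0), (0, 1, 0)))])
```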


2020 ◽  
Author(s):  
Christian Zeeden ◽  
Christian Laag ◽  
Pierre Camps ◽  
Yohan Guyodo ◽  
Ulrich Hambach ◽  
...  

Paleomagnetic data are used in different data formats, adapted to the data output of a variety of devices and specific analysis software. This includes widely used, openly available software, e.g. PMag.py/MagIC, AGICO/.jr6 & .ged, and PuffinPlot/.ppl. Besides these, individual data formats and software have been established by individual laboratories.

Here we compare different data formats, identify similarities and create a common and interchangeable data basis. We introduce the idea of a paleomagnetic object (pmob), a simple data table that can include any and all data relevant to the user. We propose a basic nomenclature of abbreviations for the most common paleomagnetic data to merge different data formats. For this purpose, we introduce a set of automation routines for paleomagnetic data conversion. Our routines bring several data formats into a common data format (pmob) and also allow reversion into selected formats. We propose creating similar routines for all existing paleomagnetic data formats; our suite of computation tools will provide the basis to facilitate the inclusion of further formats. Furthermore, automated data processing allows quality assessment of the data.
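A hedged sketch of the pmob idea follows: the column nomenclature and the input layout below are illustrative assumptions, not the authors' actual specification.

```python
# A sketch of the "paleomagnetic object" (pmob): a plain table with a
# common column nomenclature into which format-specific readers converge.
# Column names and the input record layout are illustrative assumptions.
import pandas as pd

PMOB_COLUMNS = ["specimen", "step", "dec", "inc", "intensity"]

def jr6_like_to_pmob(rows):
    """Convert records from a hypothetical .jr6-style export
    (specimen, demag step, declination, inclination, moment)
    into the common pmob table."""
    return pd.DataFrame(rows, columns=PMOB_COLUMNS)

def pmob_to_csv(pmob, path):
    """Reversion of a pmob into one selected output format."""
    pmob.to_csv(path, index=False)

pmob = jr6_like_to_pmob([("S1", 0, 351.2, 63.4, 2.1e-5),
                         ("S1", 10, 349.8, 62.9, 1.8e-5)])
pmob_to_csv(pmob, "s1_pmob.csv")
```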


2017 ◽  
Vol 9 (1) ◽  
pp. 64-73 ◽  
Author(s):  
Sławomir Biruk ◽  
Piotr Jaśkowski ◽  
Agata Czarnigowska

The authors aim to provide a set of tools to facilitate the main stages of the competitive bidding process for construction contractors. These involve 1) deciding whether to bid, 2) calculating the total price, and 3) breaking down the total price into the items of the bill of quantities or the schedule of payments to optimise contractor cash flows. To define factors that affect the decision to bid, the authors rely upon the literature on the subject and propose that multi-criteria methods be applied to calculate a single measure of contract attractiveness (utility value). An attractive contract implies that the contractor is likely to offer a lower price to increase the chances of winning the competition. The total bid price is thus interpolated between the lowest acceptable and the highest justifiable price based on the contract attractiveness. With the total bid price established, the next step is to split it between the items of the schedule of payments; a linear programming model is proposed for this purpose. The application of the models is illustrated with a numerical example.

The model produces an economically justified bid price together with its breakdown, maintaining the logical proportion between unit prices of particular items of the schedule of payments. Contrary to most methods presented in the literature, the method does not focus on the trade-off between the probability of winning and the price, but is solely devoted to defining the most reasonable price under project-specific circumstances.

The approach proposed in the paper promotes a systematic approach to real-life bidding problems. It integrates practices observed in the operation of construction enterprises and uses directly available input. It may facilitate establishing the contractor’s in-house procedures and managerial decision support systems for the pricing process.
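A sketch of the two pricing steps under assumed numbers: the bid is interpolated from the contract's utility value, and a small linear program (SciPy's linprog) splits it across payment items; the utility scale, price bounds and item data are all illustrative, not the paper's model.

```python
# Illustrative two-step pricing sketch; all numbers are assumptions.
from scipy.optimize import linprog

def total_bid_price(utility, lowest_acceptable, highest_justifiable):
    """Interpolate the bid: the more attractive the contract
    (utility -> 1), the closer the price to the lowest acceptable."""
    return highest_justifiable - utility * (highest_justifiable - lowest_acceptable)

total = total_bid_price(0.7, lowest_acceptable=900_000,
                        highest_justifiable=1_100_000)

# Split `total` across three payment items so as to favour early cash
# inflows: maximize item 1's price (minimize its negative) subject to
# the items summing to the total and staying within justified bounds.
c = [-1.0, 0.0, 0.0]                      # objective: front-load item 1
A_eq = [[1.0, 1.0, 1.0]]                  # prices must sum to the total
b_eq = [total]
bounds = [(250_000, 450_000)] * 3         # per-item unit-price limits
res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print(total, res.x)  # e.g. 960000.0 and one feasible breakdown
```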


2014 ◽  
Vol 2014 ◽  
pp. 1-12 ◽  
Author(s):  
Jianxun Cui ◽  
Shi An ◽  
Meng Zhao

During real-life disasters such as earthquakes, floods, terrorist attacks, and other unexpected events, emergency evacuation and rescue are two primary operations that can save the lives and property of the affected population. It is unavoidable that evacuation flow and rescue flow will conflict with each other on the same spatial road network and within the same time window. Therefore, we propose a novel generalized minimum cost flow model to optimize the distribution pattern of these two types of flow on the same network by introducing a conflict cost. The travel time on each link is assumed to follow a Bureau of Public Roads (BPR) function rather than being a fixed cost. Additionally, we integrate contraflow operations into this model to redesign the network shared by the two types of flow. A nonconvex mixed-integer nonlinear programming model with bilinear, fractional, and power components is constructed, and GAMS/BARON is used to solve it. A case study is conducted in the downtown area of Harbin, China, to verify the efficiency of the proposed model, and several helpful findings and managerial insights are also presented.
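For reference, the BPR travel-time function has the form t = t0 · (1 + α(v/c)^β); the sketch below uses the customary defaults α = 0.15 and β = 4, and the conflict-cost term is an illustrative assumption, not the paper's exact formulation.

```python
# A minimal sketch of a BPR-based link cost with a conflict penalty.
# alpha = 0.15 and beta = 4 are the customary BPR defaults; the
# conflict-cost term is an illustrative assumption.
def bpr_travel_time(free_flow_time, volume, capacity, alpha=0.15, beta=4.0):
    """Travel time grows with the volume/capacity ratio."""
    return free_flow_time * (1.0 + alpha * (volume / capacity) ** beta)

def link_cost(evac_flow, rescue_flow, free_flow_time, capacity,
              conflict_penalty):
    """Congested travel time for the combined flows, plus a penalty
    where evacuation and rescue flows share the same link."""
    volume = evac_flow + rescue_flow
    t = bpr_travel_time(free_flow_time, volume, capacity)
    return t * volume + conflict_penalty * min(evac_flow, rescue_flow)

# A link loaded exactly to capacity takes 15% longer than free flow:
print(bpr_travel_time(10.0, 1800, 1800))  # 11.5 minutes
```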


2016 ◽  
Vol 181 ◽  
pp. 139-146 ◽  
Author(s):  
Yingjie Xia ◽  
Jinlong Chen ◽  
Xindai Lu ◽  
Chunhui Wang ◽  
Chao Xu

2019 ◽  
Vol 8 (4) ◽  
pp. 8593-8596

The evolution of Internet of Things (IoT) technologies in real life has scaled data generation to huge volumes arriving at high velocity, and a new issue has thus come into the picture: the management and analytics of this big IoT stream data. In order to optimize the performance of IoT machines and the services provided by vendors, industry gives high priority to analysing this big IoT stream data so as to survive in the competitive global environment. Such analyses are performed by a number of applications built on various data analytics frameworks, which must intelligently obtain valuable information from large amounts of data produced in real time. This paper discusses the challenges and issues faced by distributed stream analytics frameworks at the data processing level and aims to recommend a scalable framework that can cope with the volume and velocity of big IoT stream data. The experiments evaluate the performance of three distributed stream analytics frameworks, namely Apache Spark, Splunk and Apache Storm, over large IoT data streams, using latency and throughput as parameters with respect to concurrency. The outcome of the paper is to identify the best of the existing frameworks and to recommend a possible scalable framework.
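A framework-agnostic sketch of how the two metrics can be measured: each event carries its creation timestamp, so a consumer can report end-to-end latency and throughput. The simulated source and per-event work are stand-ins for the benchmarked frameworks, not their APIs.

```python
# Measuring the two benchmark metrics on a simulated event stream.
import time

def produce(n_events):
    """Simulated event source: each event carries its creation time."""
    return [{"id": i, "created": time.perf_counter()} for i in range(n_events)]

def consume(events):
    """Report mean end-to-end latency and overall throughput."""
    latencies = []
    start = time.perf_counter()
    for event in events:
        _ = event["id"] * 2  # stand-in for the framework's per-event work
        latencies.append(time.perf_counter() - event["created"])
    elapsed = time.perf_counter() - start
    return sum(latencies) / len(events), len(events) / elapsed

avg_latency, throughput = consume(produce(100_000))
print(f"mean latency: {avg_latency * 1e3:.2f} ms, "
      f"throughput: {throughput:,.0f} events/s")
```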


2018 ◽  
Vol 30 (4) ◽  
pp. 367-386 ◽  
Author(s):  
Liyang Xiao ◽  
Mahjoub Dridi ◽  
Amir Hajjam El Hassani ◽  
Wanlong Lin ◽  
Hongying Fei

In this study, we aim to minimize the total waiting time between successive treatments for inpatients in rehabilitation hospitals (departments) during a working day. Firstly, the daily treatment scheduling problem is formulated as a mixed-integer linear programming model, taking real-life requirements into consideration, and is solved by Gurobi, a commercial solver. Then, an improved cuckoo search algorithm is developed to obtain good-quality solutions quickly for large-sized problems. Our methods are demonstrated with data collected from a medium-sized rehabilitation hospital in China. The numerical results indicate that the improved cuckoo search algorithm outperforms the real schedules applied in the targeted hospital with regard to the total waiting time of inpatients. Gurobi can construct schedules without waiting times for all the tested datasets, though its efficiency is quite low. Three sets of numerical experiments are executed to compare the improved cuckoo search algorithm with Gurobi in terms of solution quality, effectiveness, and capability to solve large instances.
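For orientation, a minimal generic cuckoo-search sketch (not the paper's improved variant): candidate solutions ("nests") are perturbed by Lévy flights and a fraction pa of the worst nests is abandoned each generation; the toy objective below merely stands in for total waiting time.

```python
# A generic cuckoo-search sketch; parameters and objective are toys.
import math
import random

def levy_step(beta=1.5):
    """Mantegna's algorithm for a Levy-distributed step length."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2) /
             (math.gamma((1 + beta) / 2) * beta *
              2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = random.gauss(0, sigma)
    v = abs(random.gauss(0, 1)) or 1e-12
    return u / v ** (1 / beta)

def cuckoo_search(objective, dim, n_nests=15, pa=0.25, iters=500,
                  lo=-5.0, hi=5.0):
    clip = lambda x: max(lo, min(hi, x))
    nests = [[random.uniform(lo, hi) for _ in range(dim)]
             for _ in range(n_nests)]
    for _ in range(iters):
        for i in range(n_nests):
            # Generate a new solution by a Levy flight around nest i
            # and let it replace a randomly chosen nest if better.
            new = [clip(x + 0.01 * levy_step()) for x in nests[i]]
            j = random.randrange(n_nests)
            if objective(new) < objective(nests[j]):
                nests[j] = new
        nests.sort(key=objective)  # abandon the worst pa fraction
        for k in range(int((1 - pa) * n_nests), n_nests):
            nests[k] = [random.uniform(lo, hi) for _ in range(dim)]
    return min(nests, key=objective)

best = cuckoo_search(lambda x: sum(v * v for v in x), dim=4)
print(best)  # close to the optimum [0, 0, 0, 0]
```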

