RDataFrame: Easy Parallel ROOT Analysis at 100 Threads

2019 ◽  
Vol 214 ◽  
pp. 06029
Author(s):  
Danilo Piparo ◽  
Philippe Canal ◽  
Enrico Guiraud ◽  
Xavier Valls Pla ◽  
Gerardo Ganis ◽  
...  

The physics programmes of LHC Run III and the HL-LHC challenge the HEP community. The volume of data to be handled is unprecedented at every step of the data processing chain: analysis is no exception. Physicists must be provided with first-class analysis tools which are easy to use, exploit bleeding-edge hardware technologies and allow parallelism to be expressed seamlessly. This document discusses the declarative analysis engine of ROOT, RDataFrame, and details how it makes it possible to profitably exploit commodity hardware as well as high-end servers and manycore accelerators, thanks to its synergy with the existing parallelised ROOT components. Real-life analyses of LHC experiments’ data expressed in terms of RDataFrame are presented, highlighting the programming model provided to express them in a concise and powerful way. Recent developments which make RDataFrame a lightweight data processing framework, such as callbacks and I/O capabilities, are described. Finally, the flexibility of RDataFrame and its ability to read data formats other than ROOT’s are characterised; as an example, it is discussed how RDataFrame can directly read and analyse LHCb’s raw data format, MDF.
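By way of illustration, here is a minimal PyROOT sketch of the declarative RDataFrame model described above; the file name, tree name and branch names (events.root, Events, pt, eta) are hypothetical stand-ins, not taken from the paper.

```python
# A minimal sketch of the RDataFrame declarative model in PyROOT.
# Assumes a ROOT file "events.root" with a TTree "Events" holding
# branches "pt" and "eta" (hypothetical names for illustration).
import ROOT

ROOT.EnableImplicitMT()  # run the event loop on all available cores

df = ROOT.RDataFrame("Events", "events.root")
h = (df.Filter("fabs(eta) < 2.4", "central tracks")
       .Define("pt_gev", "pt / 1000.0")
       .Histo1D(("pt_gev", "p_{T};p_{T} [GeV];entries", 100, 0.0, 100.0),
                "pt_gev"))

h.Draw()  # results are lazy; the parallel event loop runs on first access
```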

2016 ◽  
Vol 12 (1) ◽  
pp. 49-68 ◽  
Author(s):  
Christian Esposito ◽  
Massimo Ficco

The demand to access large volumes of data, distributed across hundreds or thousands of machines, has opened new opportunities in commerce, science, and computing applications. MapReduce is a paradigm that offers a programming model and an associated implementation for processing massive datasets in a parallel fashion, using non-dedicated distributed computing hardware. It has been successfully adopted in several academic and industrial projects for Big Data analytics. However, since such analytics is increasingly demanded within the context of mission-critical applications, security and reliability in MapReduce frameworks are strongly required in order to manage sensitive information and to obtain the right answer at the right time. In this paper, the authors present the main implementation of the MapReduce programming paradigm, provided by Apache under the name Hadoop. They illustrate the security and reliability concerns in the context of a large-scale data processing infrastructure. They review the available solutions, and their limitations, to support security and reliability within the context of MapReduce frameworks. The authors conclude by describing the ongoing evolution of such solutions and the open issues for improvement, which could be challenging research opportunities for academic researchers.
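As an illustration of the paradigm itself (not of the Hadoop API), a single-process Python sketch of the map, shuffle and reduce phases of the canonical word count:

```python
# A minimal, single-process sketch of the MapReduce word-count pattern,
# showing the map, shuffle and reduce phases that Hadoop distributes
# across a cluster (no Hadoop API is used here).
from collections import defaultdict

def map_phase(documents):
    """Emit (word, 1) pairs, as a Hadoop mapper would."""
    for doc in documents:
        for word in doc.split():
            yield word.lower(), 1

def shuffle_phase(pairs):
    """Group values by key, as the framework's shuffle/sort step does."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups.items()

def reduce_phase(groups):
    """Sum the counts per word, as a Hadoop reducer would."""
    return {word: sum(counts) for word, counts in groups}

docs = ["the right answer at the right time", "the right data"]
print(reduce_phase(shuffle_phase(map_phase(docs))))
# {'the': 3, 'right': 3, 'answer': 1, 'at': 1, 'time': 1, 'data': 1}
```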


2011 ◽  
Vol 314-316 ◽  
pp. 2253-2258
Author(s):  
Dong Gen Cai ◽  
Tian Rui Zhou

Data processing and conversion play an important role in rapid prototyping (RP) processes, in which the choice of data format determines the data processing procedure and method. In this paper, the formats and features of commonly used interface standards such as STL, IGES and STEP are introduced, and data conversion experiments on CAD models are carried out in the Pro/E system, in which the conversion effects of the different data formats are compared and analyzed, and the most reasonable data conversion format is proposed.
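As a hedged aside, the sketch below shows the simplicity of the ASCII STL structure mentioned above, which stores only triangle normals and vertices (one reason richer standards such as STEP preserve more of the original CAD model); the writer is illustrative and not part of the paper's Pro/E workflow.

```python
# A minimal sketch of the ASCII STL facet structure: a solid is just a
# flat list of triangles, each with a normal and three vertices.
def write_ascii_stl(path, triangles):
    """triangles: list of (normal, (v1, v2, v3)) tuples of 3-D coordinates."""
    with open(path, "w") as f:
        f.write("solid model\n")
        for normal, verts in triangles:
            f.write("  facet normal %g %g %g\n" % normal)
            f.write("    outer loop\n")
            for v in verts:
                f.write("      vertex %g %g %g\n" % v)
            f.write("    endloop\n")
            f.write("  endfacet\n")
        f.write("endsolid model\n")

# One facet: a unit right triangle in the z = 0 plane.
write_ascii_stl("tri.stl", [((0, 0, 1), ((0, 0, 0), (1, 0, 0), (0, 1, 0)))])
```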


2020 ◽  
Author(s):  
Christian Zeeden ◽  
Christian Laag ◽  
Pierre Camps ◽  
Yohan Guyodo ◽  
Ulrich Hambach ◽  
...  

Paleomagnetic data are used in different data formats, adapted to the data output of a variety of devices and specific analysis software. This includes widely used, openly available software, e.g. PMag.py/MagIC, AGICO/.jr6 & .ged, and PuffinPlot/.ppl. Besides these, individual data formats and software have been established by individual laboratories.

Here we compare different data formats, identify similarities and create a common and interchangeable data basis. We introduce the idea of a paleomagnetic object (pmob), a simple data table that can include any and all data relevant to the user. We propose a basic nomenclature of abbreviations for the most common paleomagnetic data to merge different data formats. For this purpose, we introduce a set of automation routines for paleomagnetic data conversion. Our routines bring several data formats into a common data format (pmob) and also allow reversion into selected formats. We propose creating similar routines for all existing paleomagnetic data formats; our suite of computation tools will provide the basis to facilitate the inclusion of further formats. Furthermore, automated data processing allows quality assessment of the data.
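A hedged sketch of the pmob idea follows: the column nomenclature and the input layout below are illustrative assumptions, not the authors' actual specification.

```python
# A sketch of the "paleomagnetic object" (pmob): a plain table with a
# common column nomenclature into which format-specific readers converge.
# Column names and the input record layout are illustrative assumptions.
import pandas as pd

PMOB_COLUMNS = ["specimen", "step", "dec", "inc", "intensity"]

def jr6_like_to_pmob(rows):
    """Convert records from a hypothetical .jr6-style export
    (specimen, demag step, declination, inclination, moment)
    into the common pmob table."""
    return pd.DataFrame(rows, columns=PMOB_COLUMNS)

def pmob_to_csv(pmob, path):
    """Reversion of a pmob into one selected output format."""
    pmob.to_csv(path, index=False)

pmob = jr6_like_to_pmob([("S1", 0, 351.2, 63.4, 2.1e-5),
                         ("S1", 10, 349.8, 62.9, 1.8e-5)])
pmob_to_csv(pmob, "s1_pmob.csv")
```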


2017 ◽  
Vol 9 (1) ◽  
pp. 64-73 ◽  
Author(s):  
Sławomir Biruk ◽  
Piotr Jaśkowski ◽  
Agata Czarnigowska

The authors aim to provide a set of tools to facilitate the main stages of the competitive bidding process for construction contractors. These involve 1) deciding whether to bid, 2) calculating the total price, and 3) breaking down the total price into the items of the bill of quantities or the schedule of payments to optimise contractor cash flows. To define factors that affect the decision to bid, the authors rely upon the literature on the subject and propose that multi-criteria methods be applied to calculate a single measure of contract attractiveness (utility value). An attractive contract implies that the contractor is likely to offer a lower price to increase the chances of winning the competition. The total bid price is thus interpolated between the lowest acceptable and the highest justifiable price based on the contract attractiveness. With the total bid price established, the next step is to split it between the items of the schedule of payments; a linear programming model is proposed for this purpose. The application of the models is illustrated with a numerical example.

The model produces an economically justified bid price together with its breakdown, maintaining the logical proportion between unit prices of particular items of the schedule of payments. Contrary to most methods presented in the literature, the method does not focus on the trade-off between the probability of winning and the price, but is solely devoted to defining the most reasonable price under project-specific circumstances.

The approach proposed in the paper promotes a systematic approach to real-life bidding problems. It integrates practices observed in the operation of construction enterprises and uses directly available input. It may facilitate establishing the contractor’s in-house procedures and managerial decision support systems for the pricing process.
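A sketch of the two pricing steps under assumed numbers: the bid is interpolated from the contract's utility value, and a small linear program (SciPy's linprog) splits it across payment items; the utility scale, price bounds and item data are all illustrative, not the paper's model.

```python
# Illustrative two-step pricing sketch; all numbers are assumptions.
from scipy.optimize import linprog

def total_bid_price(utility, lowest_acceptable, highest_justifiable):
    """Interpolate the bid: the more attractive the contract
    (utility -> 1), the closer the price to the lowest acceptable."""
    return highest_justifiable - utility * (highest_justifiable - lowest_acceptable)

total = total_bid_price(0.7, lowest_acceptable=900_000,
                        highest_justifiable=1_100_000)

# Split `total` across three payment items so as to favour early cash
# inflows: maximize item 1's price (minimize its negative) subject to
# the items summing to the total and staying within justified bounds.
c = [-1.0, 0.0, 0.0]                      # objective: front-load item 1
A_eq = [[1.0, 1.0, 1.0]]                  # prices must sum to the total
b_eq = [total]
bounds = [(250_000, 450_000)] * 3         # per-item unit-price limits
res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print(total, res.x)  # e.g. 960000.0 and one feasible breakdown
```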


2014 ◽  
Vol 2014 ◽  
pp. 1-12 ◽  
Author(s):  
Jianxun Cui ◽  
Shi An ◽  
Meng Zhao

During real-life disasters such as earthquakes, floods, terrorist attacks, and other unexpected events, emergency evacuation and rescue are two primary operations that can save the lives and property of the affected population. It is unavoidable that evacuation flow and rescue flow will conflict with each other on the same spatial road network and within the same time window. Therefore, we propose a novel generalized minimum cost flow model to optimize the distribution pattern of these two types of flow on the same network by introducing a conflict cost. The travel time on each link is assumed to follow a Bureau of Public Roads (BPR) function rather than being a fixed cost. Additionally, we integrate contraflow operations into this model to redesign the network shared by the two types of flow. A nonconvex mixed-integer nonlinear programming model with bilinear, fractional, and power components is constructed, and GAMS/BARON is used to solve it. A case study is conducted in the downtown area of Harbin, China, to verify the efficiency of the proposed model, and several helpful findings and managerial insights are also presented.
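For reference, the BPR travel-time function has the form t = t0 · (1 + α(v/c)^β); the sketch below uses the customary defaults α = 0.15 and β = 4, and the conflict-cost term is an illustrative assumption, not the paper's exact formulation.

```python
# A minimal sketch of a BPR-based link cost with a conflict penalty.
# alpha = 0.15 and beta = 4 are the customary BPR defaults; the
# conflict-cost term is an illustrative assumption.
def bpr_travel_time(free_flow_time, volume, capacity, alpha=0.15, beta=4.0):
    """Travel time grows with the volume/capacity ratio."""
    return free_flow_time * (1.0 + alpha * (volume / capacity) ** beta)

def link_cost(evac_flow, rescue_flow, free_flow_time, capacity,
              conflict_penalty):
    """Congested travel time for the combined flows, plus a penalty
    where evacuation and rescue flows share the same link."""
    volume = evac_flow + rescue_flow
    t = bpr_travel_time(free_flow_time, volume, capacity)
    return t * volume + conflict_penalty * min(evac_flow, rescue_flow)

# A link loaded exactly to capacity takes 15% longer than free flow:
print(bpr_travel_time(10.0, 1800, 1800))  # 11.5 minutes
```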


2016 ◽  
Vol 181 ◽  
pp. 139-146 ◽  
Author(s):  
Yingjie Xia ◽  
Jinlong Chen ◽  
Xindai Lu ◽  
Chunhui Wang ◽  
Chao Xu

2019 ◽  
Vol 8 (4) ◽  
pp. 8593-8596

The evolution of Internet of Things (IoT) technologies in real life has scaled data generation to huge volumes arriving at high velocity, and a new issue has thus come into the picture: the management and analytics of this big IoT stream data. In order to optimize the performance of IoT machines and the services provided by vendors, industry gives high priority to analysing this big IoT stream data so as to survive in the competitive global environment. Such analyses are performed by a number of applications built on various data analytics frameworks, which must intelligently obtain valuable information from large amounts of data produced in real time. This paper discusses the challenges and issues faced by distributed stream analytics frameworks at the data processing level and aims to recommend a scalable framework that can cope with the volume and velocity of big IoT stream data. The experiments evaluate the performance of three distributed stream analytics frameworks, namely Apache Spark, Splunk and Apache Storm, over large IoT data streams, using latency and throughput as parameters with respect to concurrency. The outcome of the paper is to identify the best of the existing frameworks and to recommend a possible scalable framework.
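A framework-agnostic sketch of how the two metrics can be measured: each event carries its creation timestamp, so a consumer can report end-to-end latency and throughput. The simulated source and per-event work are stand-ins for the benchmarked frameworks, not their APIs.

```python
# Measuring the two benchmark metrics on a simulated event stream.
import time

def produce(n_events):
    """Simulated event source: each event carries its creation time."""
    return [{"id": i, "created": time.perf_counter()} for i in range(n_events)]

def consume(events):
    """Report mean end-to-end latency and overall throughput."""
    latencies = []
    start = time.perf_counter()
    for event in events:
        _ = event["id"] * 2  # stand-in for the framework's per-event work
        latencies.append(time.perf_counter() - event["created"])
    elapsed = time.perf_counter() - start
    return sum(latencies) / len(events), len(events) / elapsed

avg_latency, throughput = consume(produce(100_000))
print(f"mean latency: {avg_latency * 1e3:.2f} ms, "
      f"throughput: {throughput:,.0f} events/s")
```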


2018 ◽  
Vol 30 (4) ◽  
pp. 367-386 ◽  
Author(s):  
Liyang Xiao ◽  
Mahjoub Dridi ◽  
Amir Hajjam El Hassani ◽  
Wanlong Lin ◽  
Hongying Fei

In this study, we aim to minimize the total waiting time between successive treatments for inpatients in rehabilitation hospitals (departments) during a working day. Firstly, the daily treatment scheduling problem is formulated as a mixed-integer linear programming model, taking real-life requirements into consideration, and is solved by Gurobi, a commercial solver. Then, an improved cuckoo search algorithm is developed to obtain good-quality solutions quickly for large-sized problems. Our methods are demonstrated with data collected from a medium-sized rehabilitation hospital in China. The numerical results indicate that the improved cuckoo search algorithm outperforms the real schedules applied in the targeted hospital with regard to the total waiting time of inpatients. Gurobi can construct schedules without waiting times for all the tested datasets, though its efficiency is quite low. Three sets of numerical experiments are executed to compare the improved cuckoo search algorithm with Gurobi in terms of solution quality, effectiveness, and capability to solve large instances.
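For orientation, a minimal generic cuckoo-search sketch (not the paper's improved variant): candidate solutions ("nests") are perturbed by Lévy flights and a fraction pa of the worst nests is abandoned each generation; the toy objective below merely stands in for total waiting time.

```python
# A generic cuckoo-search sketch; parameters and objective are toys.
import math
import random

def levy_step(beta=1.5):
    """Mantegna's algorithm for a Levy-distributed step length."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2) /
             (math.gamma((1 + beta) / 2) * beta *
              2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = random.gauss(0, sigma)
    v = abs(random.gauss(0, 1)) or 1e-12
    return u / v ** (1 / beta)

def cuckoo_search(objective, dim, n_nests=15, pa=0.25, iters=500,
                  lo=-5.0, hi=5.0):
    clip = lambda x: max(lo, min(hi, x))
    nests = [[random.uniform(lo, hi) for _ in range(dim)]
             for _ in range(n_nests)]
    for _ in range(iters):
        for i in range(n_nests):
            # Generate a new solution by a Levy flight around nest i
            # and let it replace a randomly chosen nest if better.
            new = [clip(x + 0.01 * levy_step()) for x in nests[i]]
            j = random.randrange(n_nests)
            if objective(new) < objective(nests[j]):
                nests[j] = new
        nests.sort(key=objective)  # abandon the worst pa fraction
        for k in range(int((1 - pa) * n_nests), n_nests):
            nests[k] = [random.uniform(lo, hi) for _ in range(dim)]
    return min(nests, key=objective)

best = cuckoo_search(lambda x: sum(v * v for v in x), dim=4)
print(best)  # close to the optimum [0, 0, 0, 0]
```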

