hep_tables: Heterogeneous Array Programming for HEP
Array operations are one of the most concise ways of expressing the common filtering and simple aggregation operations that are the hallmark of a particle physics analysis: selection, filtering, basic vector operations, and filling histograms. The High Luminosity run of the Large Hadron Collider (HL-LHC), scheduled to start in 2026, will require physicists to regularly skim datasets that are over a PB in size, and to repeatedly run over datasets that are hundreds of TB in size, too big to fit in memory. Declarative programming techniques separate the intent of the physicist from the mechanics of locating the data and using distributed computing to process it and make histograms. This paper describes a prototype library that implements a declarative, distributed framework based on array programming. The library provides a framework in which different sub-systems cooperate via plug-ins to produce plots. In the prototype, a ServiceX data-delivery sub-system and an awkward array sub-system cooperate to generate the requested data or plots. The ServiceX sub-system runs against ATLAS xAOD data and flat ROOT TTrees, and the awkward array sub-system runs on the columnar data produced by ServiceX.