In the land of data streams where synopses are missing, one framework to bring them all

2021 ◽  
Vol 14 (10) ◽  
pp. 1818-1831
Author(s):  
Rudi Poepsel-Lemaitre ◽  
Martin Kiefer ◽  
Joscha von Hein ◽  
Jorge-Arnulfo Quiané-Ruiz ◽  
Volker Markl

In pursuit of real-time data analysis, approximate summarization structures, i.e., synopses, have gained importance over the years. However, existing stream processing systems, such as Flink, Spark, and Storm, do not support synopses as first-class citizens, i.e., as pipeline operators. Implementing synopses is therefore left to users. This is mainly because of the diversity of synopses, which makes a unified implementation difficult. We present Condor, a framework that supports synopses as first-class citizens. Condor facilitates the specification and processing of synopsis-based streaming jobs while hiding all internal processing details. Condor's key component is its model, which represents synopses as a particular case of windowed aggregate functions. An inherent divide-and-conquer strategy allows Condor to distribute the computation efficiently, enabling high performance and linear scalability. Our evaluation shows that Condor outperforms existing approaches by up to 75x and that it scales linearly with the number of cores.
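
To make the idea of a synopsis as a windowed aggregate function concrete, the following is a minimal sketch and not Condor's API. It assumes a Count-Min sketch as the example synopsis, and every name in it (`CountMinSketch`, `summarize_window`, the `partitions` parameter) is hypothetical. The events of one window are split across partitions, each partition builds a partial synopsis, and the partials are merged into a single result, which is the divide-and-conquer pattern the abstract describes.

```python
import hashlib
from typing import Iterable, List


class CountMinSketch:
    """Illustrative mergeable synopsis (assumed example, not Condor's implementation)."""

    def __init__(self, width: int = 256, depth: int = 4) -> None:
        self.width = width
        self.depth = depth
        self.table: List[List[int]] = [[0] * width for _ in range(depth)]

    def _index(self, item: str, row: int) -> int:
        # Derive one hash function per row by prefixing the row number.
        digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "little") % self.width

    def update(self, item: str, count: int = 1) -> None:
        # Aggregate step: fold one stream element into the synopsis.
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item: str) -> int:
        # Count-Min never underestimates: take the minimum over all rows.
        return min(self.table[row][self._index(item, row)]
                   for row in range(self.depth))

    def merge(self, other: "CountMinSketch") -> "CountMinSketch":
        # Combine step: cell-wise addition of two partial synopses.
        if (self.width, self.depth) != (other.width, other.depth):
            raise ValueError("sketch dimensions must match to merge")
        merged = CountMinSketch(self.width, self.depth)
        for row in range(self.depth):
            merged.table[row] = [a + b for a, b in
                                 zip(self.table[row], other.table[row])]
        return merged


def summarize_window(events: Iterable[str], partitions: int = 4) -> CountMinSketch:
    """Build partial sketches per partition, then merge them (divide and conquer)."""
    partials = [CountMinSketch() for _ in range(partitions)]
    for i, event in enumerate(events):
        # Round-robin assignment stands in for a real partitioner.
        partials[i % partitions].update(event)
    result = partials[0]
    for partial in partials[1:]:
        result = result.merge(partial)
    return result


if __name__ == "__main__":
    window = ["a"] * 5 + ["b"] * 3 + ["c"]
    synopsis = summarize_window(window, partitions=3)
    print(synopsis.estimate("a"), synopsis.estimate("b"), synopsis.estimate("c"))
```

Because `merge` is a cell-wise sum, the merged result is identical to a sketch built sequentially over the whole window, so the partial sketches can be computed on separate cores without changing the answer. A synopsis without such a merge operation would need a different distribution strategy.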

2018 ◽  
Vol 19 (S18) ◽  
Author(s):  
Ahmed Sanaullah ◽  
Chen Yang ◽  
Yuri Alexeev ◽  
Kazutomo Yoshii ◽  
Martin C. Herbordt
