Octopus-DF: Unified DataFrame-based cross-platform data analytic system

2022 ◽  
pp. 102879
Author(s):  
Rong Gu ◽  
Jun Shi ◽  
Xiaofei Chen ◽  
Zhaokang Wang ◽  
Yang Che ◽  
...  
2020 ◽  
Vol 29 (6) ◽  
pp. 1287-1310
Author(s):  
Sebastian Kruse ◽  
Zoi Kaoudi ◽  
Bertty Contreras-Rojas ◽  
Sanjay Chawla ◽  
Felix Naumann ◽  
...  

Abstract. Data analytics are moving beyond the limits of a single platform. In this paper, we present the cost-based optimizer of Rheem, an open-source cross-platform system that copes with these new requirements. The optimizer allocates the subtasks of data analytic tasks to the most suitable platforms. Our main contributions are: (i) a mechanism based on graph transformations to explore alternative execution strategies; (ii) a novel graph-based approach to determine efficient data movement plans among subtasks and platforms; and (iii) an efficient plan enumeration algorithm, based on a novel enumeration algebra. We extensively evaluate our optimizer under diverse real tasks. We show that our optimizer can perform tasks more than one order of magnitude faster when using multiple platforms than when using a single platform.
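
The idea of allocating subtasks to platforms under a cost model can be illustrated with a small sketch. This is not Rheem's optimizer: it does not use graph transformations or an enumeration algebra, and it handles only a linear pipeline of subtasks. The function name `assign_platforms`, the platform names, and all cost figures are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple


def assign_platforms(
    exec_cost: List[Dict[str, float]],
    move_cost: Dict[Tuple[str, str], float],
) -> Tuple[float, List[str]]:
    """Pick one platform per subtask in a linear pipeline, minimizing total cost.

    exec_cost[i][p] is the (hypothetical) cost of running subtask i on platform p.
    move_cost[(p, q)] is the (hypothetical) cost of moving data from p to q;
    staying on the same platform is assumed to cost nothing.
    """
    if not exec_cost:
        return 0.0, []
    # best[p] = (cheapest total cost of a plan ending on platform p, that plan)
    best = {p: (cost, [p]) for p, cost in exec_cost[0].items()}
    for costs in exec_cost[1:]:
        nxt = {}
        for q, run in costs.items():
            candidates = []
            for p, (total, plan) in best.items():
                hop = 0.0 if p == q else move_cost[(p, q)]
                candidates.append((total + hop + run, plan + [q]))
            nxt[q] = min(candidates, key=lambda c: c[0])
        best = nxt
    return min(best.values(), key=lambda c: c[0])


# Hypothetical three-step pipeline on two made-up platforms.
exec_cost = [
    {"local": 1.0, "cluster": 5.0},
    {"local": 20.0, "cluster": 4.0},
    {"local": 2.0, "cluster": 3.0},
]
move_cost = {("local", "cluster"): 2.0, ("cluster", "local"): 2.0}
total, plan = assign_platforms(exec_cost, move_cost)
print(total, plan)
```

With these made-up numbers, the mixed plan is cheaper than running every subtask on either platform alone, because the saving on the middle step outweighs the data movement cost.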


2019 ◽  
Vol 227 (1) ◽  
pp. 64-82
Author(s):  
Martin Voracek ◽  
Michael Kossmeier ◽  
Ulrich S. Tran

Abstract. Which data to analyze, and how, are fundamental questions of all empirical research. As there are always numerous flexibilities in data-analytic decisions (a “garden of forking paths”), this poses perennial problems to all empirical research. Specification-curve analysis and multiverse analysis have recently been proposed as solutions to these issues. Building on the structural analogies between primary data analysis and meta-analysis, we transform and adapt these approaches to the meta-analytic level, in tandem with combinatorial meta-analysis. We explain the rationale of this idea, suggest descriptive and inferential statistical procedures, as well as graphical displays, provide code for meta-analytic practitioners to generate and use these, and present a fully worked real example from digit ratio (2D:4D) research, totaling 1,592 meta-analytic specifications. Specification-curve and multiverse meta-analysis holds promise to resolve conflicting meta-analyses, contested evidence, controversial empirical literatures, and polarized research, and to mitigate the associated detrimental effects of these phenomena on research progress.


Author(s):  
Thomas W. Kamarck ◽  
Saul S. Shiffman ◽  
Leslie Smithline ◽  
Hayley Thompson ◽  
Jeff Goodie ◽  
...  

Author(s):  
Manbir Sandhu ◽  
Purnima, Anuradha Saini

Big Data is a fast-growing technology with the scope to mine huge amounts of data for use in various analytic applications. With large amounts of data streaming in from a myriad of sources (social media, online transactions, and ubiquitous smart devices), Big Data is garnering attention from stakeholders across academia, banking, government, health care, manufacturing, and retail. Big Data refers to an enormous amount of data generated from disparate sources, along with the data analytic techniques used to examine this voluminous data for predictive trends and patterns, to exploit new growth opportunities, to gain insight, to make informed decisions, and to optimize processes. Data-driven decision making is the essence of business establishments. The explosive growth of data is steering business units to tap the potential of Big Data to fuel growth and gain a competitive edge. The overwhelming generation of data brings its own share of concerns. This paper discusses the concept of Big Data, its characteristics, the tools and techniques deployed by organizations to harness the power of Big Data, and the daunting issues that hinder the adoption of Business Intelligence in Big Data strategies in organizations.


Author(s):  
Ivan Batrak ◽  

The design of cross-platform software for implementing IRBIS LAS on the PHP platform is discussed. A new print format language interpreter for IRBIS LAS, based on the features and capabilities of the J-ISIS and CISIS formatting languages, has also been developed.


2019 ◽  
Vol 2 (4) ◽  
pp. 260-266
Author(s):  
Haru Purnomo Ipung ◽  
Amin Soetomo

This research proposes a model to assist the design of the data architecture and data analytics needed to support talent forecasting amid accelerating changes in the economy, industry, and business driven by the accelerating pace of technological change. Emerging and re-emerging economic models, such as Industrial Revolution 4.0, the platform economy, the sharing economy, and the token economy, are driven by new business models and technology innovation. The increasing capability of technology to automate more jobs will cause a shift in the talent pool and workforce. New business models emerge as cost-effective emerging technologies become available and as a result of emerging or re-emerging economic models. Both new business models and technology innovation create jobs and work that did not exist decades ago. Future workers will face jobs that may not exist today. A dynamic model of the inter-correlation of economy, industry, business model, and talent forecast is proposed. A literature review was conducted to initially validate the model.

