Cost-Effective, Workload-Adaptive Migration of Big Data Applications to the Cloud

Many big data applications today depend on massively parallel tasks to carry out complex mathematical operations. Platforms such as CUDA (Compute Unified Device Architecture) and OpenCL (Open Computing Language) are widely used, and continue to be developed, to raise the throughput of such tasks. There is also a need for high-level abstractions and platform independence on top of these platforms. The Khronos Group recently announced SYCL (C++ Single-source Heterogeneous Programming for OpenCL), a cross-platform abstraction layer that supports single-source heterogeneous computing through C++ template-level abstractions. However, because there is no official implementation of SYCL, several implementations from different vendors currently coexist. In this paper, we analyse the characteristics of these SYCL implementations and report performance measurements for them on well-known massively parallel tasks. We show that each implementation has its own strengths, depending on the type of mathematical operation and the size of the data. Our analysis provides baseline measurements for the cost-effective use of massively parallel computation at the abstraction level, especially for big data applications.
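The comparison described in this abstract, the same operation timed across several implementations and several data sizes, can be sketched with a small harness. This is only an illustration of the measurement pattern, not the authors' benchmark: the two dot-product functions are plain-Python stand-ins for competing implementations (they are not SYCL code), and the names `dot_loop`, `dot_builtin`, and `benchmark` are hypothetical.

```python
import time


def dot_loop(xs, ys):
    """Stand-in implementation A: explicit loop."""
    total = 0.0
    for x, y in zip(xs, ys):
        total += x * y
    return total


def dot_builtin(xs, ys):
    """Stand-in implementation B: generator expression fed to sum()."""
    return sum(x * y for x, y in zip(xs, ys))


def benchmark(impls, sizes, repeats=3):
    """Return {(name, size): best wall-clock seconds over `repeats` runs}."""
    results = {}
    for n in sizes:
        # Same input data for every implementation at a given size.
        xs = [float(i) for i in range(n)]
        ys = [2.0] * n
        for name, fn in impls.items():
            best = float("inf")
            for _ in range(repeats):
                start = time.perf_counter()
                fn(xs, ys)
                best = min(best, time.perf_counter() - start)
            results[(name, n)] = best
    return results


if __name__ == "__main__":
    impls = {"loop": dot_loop, "builtin": dot_builtin}
    timings = benchmark(impls, [1_000, 100_000])
    for (name, n), secs in sorted(timings.items()):
        print(f"{name:8s} n={n:7d} best={secs:.6f}s")
```

Taking the best of several repeats reduces noise from other processes. A real study of this kind would also separate device transfer time from kernel time, which this sketch does not attempt.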

Guest Editorial Special Issue on Big Data Applications and Techniques in Cyber Threat Intelligence

Intelligent Automation & Soft Computing ◽

10.31209/2020.100000198 ◽

2020

Author(s):

Zheng Xu ◽

Qingyuan Zhou

Keyword(s):

Big Data ◽

Guest Editorial ◽

Special Issue ◽

Big Data Applications ◽

Threat Intelligence ◽

Editorial Special Issue ◽

Cyber Threat ◽

Cyber Threat Intelligence

Poster: Cascaded TCP: BIG Throughput for BIG DATA Applications in Distributed HPC

2012 SC Companion: High Performance Computing, Networking Storage and Analysis ◽

10.1109/sc.companion.2012.230 ◽

2012 ◽

Author(s):

Umar Kalim ◽

Mark Gardner ◽

Eric Brown ◽

Wu-chun Feng

Keyword(s):

Big Data ◽

Big Data Applications

Power Budgeting of Big Data Applications in Container-based Clusters

2020 IEEE International Conference on Cluster Computing (CLUSTER) ◽

10.1109/cluster49012.2020.00038 ◽

2020 ◽

Author(s):

Jonatan Enes ◽

Guillaume Fieni ◽

Roberto R. Exposito ◽

Romain Rouvoy ◽

Juan Tourino

Keyword(s):

Big Data ◽

Big Data Applications

Introduction to the Special Issue on New Longitudinal Data for Retirement Analysis and Policy

Journal of Pension Economics and Finance ◽

10.1017/s1474747221000044 ◽

2021 ◽

pp. 1-5

Author(s):

Marco Angrisani ◽

Anya Samek ◽

Arie Kapteyn

Keyword(s):

Big Data ◽

Academic Research ◽

Cost Effective ◽

Research Data ◽

Data Sources ◽

Special Issue ◽

Administrative Records ◽

The Past ◽

Survey Questionnaires ◽

Internet Panels

The number of data sources available for academic research on retirement economics and policy has increased rapidly in the past two decades. Data quality and comparability across studies have also improved considerably, with survey questionnaires progressively converging towards common ways of eliciting the same measurable concepts. Probability-based Internet panels have become a more accepted and recognized tool to obtain research data, allowing for fast, flexible, and cost-effective data collection compared to more traditional modes such as in-person and phone interviews. In an era of big data, academic research has also increasingly been able to access administrative records (e.g., Kostøl and Mogstad, 2014; Cesarini et al., 2016), private-sector financial records (e.g., Gelman et al., 2014), and administrative data married with surveys (Ameriks et al., 2020), to answer questions that could not be successfully tackled otherwise.

Computational storage: an efficient and scalable platform for big data and HPC applications

Journal Of Big Data ◽

10.1186/s40537-019-0265-5 ◽

2019 ◽

Vol 6 (1) ◽

Cited By ~ 2

Author(s):

Mahdi Torabzadehkashi ◽

Siavash Rezaei ◽

Ali HeydariGorji ◽

Hosein Bobarshad ◽

Vladimir Alves ◽

...

Keyword(s):

Big Data ◽

High Performance ◽

Distributed Processing ◽

Data Access ◽

Distributed Applications ◽

Process Data ◽

Storage Devices ◽

Hadoop Mapreduce ◽

Big Data Applications ◽

Application Processor

In the era of big data applications, the demand for more sophisticated data centers and high-performance data processing mechanisms is increasing drastically. Data are originally stored in storage systems. To process data, application servers need to fetch them from storage devices, which imposes the cost of moving data to the system. This cost is directly related to the distance between the processing engines and the data. This is the key motivation for the emergence of distributed processing platforms such as Hadoop, which move processing closer to the data. Computational storage devices (CSDs) push the “move process to data” paradigm to its limit by deploying embedded processing engines inside storage devices to process data. In this paper, we introduce Catalina, an efficient and flexible computational storage platform that provides a seamless environment for processing data in place. Catalina is the first CSD equipped with a dedicated application processor running a full-fledged operating system that provides filesystem-level data access for applications. Thus, a vast spectrum of applications can be ported to run on Catalina CSDs. Owing to these unique features, to the best of our knowledge, Catalina CSD is the only in-storage processing platform that can be seamlessly deployed in clusters to run distributed applications such as Hadoop MapReduce and HPC applications in place, without any modifications to the underlying distributed processing framework. As a proof of concept, we build a fully functional Catalina prototype and a CSD-equipped platform using 16 Catalina CSDs, and run Intel HiBench Hadoop and HPC benchmarks to investigate the benefits of deploying Catalina CSDs in distributed processing environments. The experimental results show up to a 2.2× improvement in performance and up to a 4.3× reduction in energy consumption for Hadoop MapReduce benchmarks. Additionally, thanks to the Neon SIMD engines, the performance and energy efficiency of DFT algorithms are improved by up to 5.4× and 8.9×, respectively.
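The data-movement argument behind the “move process to data” paradigm can be illustrated with a toy cost model. The sketch below is an assumption-laden illustration, not Catalina's design or the paper's evaluation method: it counts only the bytes that cross the storage link, and the function names, the `selectivity` parameter (the fraction of the data that survives in-storage processing), and the link bandwidth are all hypothetical.

```python
def bytes_moved_host_side(total_bytes):
    """Host-side processing: every stored byte crosses the storage link."""
    return total_bytes


def bytes_moved_in_storage(total_bytes, selectivity):
    """In-storage processing: only the result fraction leaves the device.

    `selectivity` is a hypothetical parameter in [0, 1] giving the share
    of the input that remains after the in-storage computation.
    """
    if not 0.0 <= selectivity <= 1.0:
        raise ValueError("selectivity must be in [0, 1]")
    return total_bytes * selectivity


def transfer_seconds(num_bytes, link_bytes_per_sec):
    """Time to move `num_bytes` over a link of the given bandwidth."""
    if link_bytes_per_sec <= 0:
        raise ValueError("link bandwidth must be positive")
    return num_bytes / link_bytes_per_sec


if __name__ == "__main__":
    total = 4 * 2**30   # 4 GiB stored on the device (illustrative)
    link = 2**30        # 1 GiB/s link (illustrative)
    host = transfer_seconds(bytes_moved_host_side(total), link)
    csd = transfer_seconds(bytes_moved_in_storage(total, 0.25), link)
    print(f"host-side transfer: {host:.2f}s, in-storage transfer: {csd:.2f}s")
```

The model ignores compute time on the embedded processor and energy entirely, so it says nothing about the reported speedups. It only shows why the transfer cost shrinks as more of the filtering or reduction happens inside the device.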

Performance analysis model for big data applications in cloud computing

Journal of Cloud Computing Advances Systems and Applications ◽

10.1186/s13677-014-0019-z ◽

2014 ◽

Vol 3 (1) ◽

Cited By ~ 8

Author(s):

Luis Eduardo Bautista Villalpando ◽

Alain April ◽

Alain Abran

Keyword(s):

Cloud Computing ◽

Big Data ◽

Performance Analysis ◽

Analysis Model ◽

Big Data Applications

Preliminary Benefits of Big Data in the Construction Industry: A Case Study

Proceedings of the Institution of Civil Engineers - Management Procurement and Law ◽

10.1680/jmapl.21.00027 ◽

2022 ◽

pp. 1-11

Author(s):

Bernard Tuffour Atuahene ◽

Sittimont Kanjanabootra ◽

Thayaparan Gajendran

Keyword(s):

Big Data ◽

Construction Industry ◽

Construction Projects ◽

Big Data Applications ◽

Data Application ◽

Construction Firm ◽

Big Data Application ◽

Tangible Benefit ◽

Design Construction

Big data applications consist of i) collecting data from big data sources, ii) storing and processing the data, and iii) analysing the data to gain insights that create organisational benefit. Among the digital technologies now entering the construction process, big data is a newly emerging one adopted by the construction industry. Big data application is at a nascent stage in construction, and there is a need to understand the tangible benefit(s) that big data can offer the industry. This study explores the benefits of big data in the construction industry. Using a qualitative case study design, construction professionals in an Australian construction firm were interviewed. The research highlights that the benefits of big data include reduced litigation among project stakeholders, near-real-time communication, and more effective subcontractor selection. By implication, on a broader scale, these benefits can improve contract management, procurement, and the management of construction projects. This study contributes to the ongoing discourse on big data application and, more generally, on digitization in the construction industry.
