Power Consumption Due to Data Movement in Distributed Programming Models

Author(s):  
Siddhartha Jana ◽  
Oscar Hernandez ◽  
Stephen Poole ◽  
Barbara Chapman

2020 ◽  
Vol 12 (2) ◽  
pp. 116-121
Author(s):  
Rastislav Struharik ◽  
Vuk Vranjković

Data movement between Convolutional Neural Network (CNN) accelerators and off-chip memory is a critical contributor to overall power consumption. Minimizing power consumption is particularly important for low-power embedded applications. Specific CNN compute patterns offer a possibility of significant data reuse, leading to the idea of using specialized on-chip cache memories that enable a significant improvement in power consumption. However, due to the unique caching pattern present within CNNs, standard cache memories would not be efficient. In this paper, a novel on-chip cache memory architecture, based on the idea of input feature map striping, is proposed, which requires significantly fewer on-chip memory resources compared to previously proposed solutions. Experimental results show that the proposed cache architecture can reduce on-chip memory size by a factor of 16 or more, while increasing power consumption by no more than 15%, compared to some of the previously proposed solutions.
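A first-order way to see why caching the input feature map pays off is to count off-chip reads with and without an on-chip stripe buffer. The sketch below is an illustrative model, not the paper's architecture: the layer dimensions, the assumption of one cached stripe of k input rows, and both functions are ours.

```python
# Hypothetical first-order traffic model for one convolution layer.
# Without a cache, every k x k x c input window is re-fetched from
# off-chip memory for each output pixel; with a cached stripe of k
# input rows, each input element is fetched from DRAM only once.

def offchip_reads_no_cache(h, w, c, k):
    # one k*k*c window fetch per output pixel (same-padding assumed)
    return h * w * c * k * k

def offchip_reads_striped(h, w, c, k):
    # each input feature map element read from DRAM exactly once
    return h * w * c

if __name__ == "__main__":
    h, w, c, k = 56, 56, 64, 3  # a mid-network VGG-like layer
    naive = offchip_reads_no_cache(h, w, c, k)
    striped = offchip_reads_striped(h, w, c, k)
    print(f"naive: {naive:,} reads, striped: {striped:,} reads "
          f"({naive // striped}x fewer)")
```

Under this model the reduction is exactly k², i.e. 9x for 3x3 kernels; the paper's contribution is achieving such reuse with far less on-chip memory than a full feature-map buffer.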


Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-17
Author(s):  
Saeed Alshahrani ◽  
Waleed Al Shehri ◽  
Jameel Almalki ◽  
Ahmed M. Alghamdi ◽  
Abdullah M. Alammari

The amount of data produced in scientific and commercial fields is growing dramatically. Correspondingly, big data technologies, such as Hadoop and Spark, have emerged to tackle the challenges of collecting, processing, and storing such large-scale data. Unfortunately, big data applications usually have performance issues and do not fully exploit the hardware infrastructure. One reason is that these applications are developed in high-level programming languages that do not provide the low-level system control found in highly parallel programming models such as the Message Passing Interface (MPI). Moreover, big data frameworks are often seen as a barrier to adopting parallel programming models or accelerators (e.g., CUDA and OpenCL). Therefore, the aim of this study is to investigate how the performance of big data applications can be enhanced without sacrificing the power consumption of the hardware infrastructure. A Hybrid Spark MPI OpenACC (HSMO) system is proposed that integrates Spark as a big data programming model with MPI and OpenACC as parallel programming models. Such integration brings together the advantages of each programming model and provides greater effectiveness. To enhance performance without sacrificing power consumption, the integration approach needs to exploit the hardware infrastructure in an intelligent manner. To achieve this, a mapping technique is proposed that builds on the application's virtual topology as well as the physical topology of the underlying resources. To the best of our knowledge, no existing big data method exploits graphics processing units (GPUs), which are now an essential part of high-performance computing (HPC) as a powerful resource for fast computation.
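To make the topology-based mapping idea concrete, here is a minimal sketch of one plausible policy: place tasks with a GPU offload path (e.g., an OpenACC kernel) on nodes that host a GPU before falling back to CPU-only nodes. The Node and Task records and the greedy rule are our assumptions for illustration; the paper's actual mapping technique may differ.

```python
# Illustrative greedy mapping of a virtual topology (tasks) onto a
# physical topology (nodes), preferring GPU nodes for GPU-capable tasks.
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    has_gpu: bool
    free_slots: int

@dataclass
class Task:
    tid: int
    gpu_capable: bool  # task has an accelerator offload path

def map_tasks(tasks, nodes):
    placement = {}
    # schedule GPU-capable tasks first so GPU slots are not wasted
    for task in sorted(tasks, key=lambda t: not t.gpu_capable):
        candidates = [n for n in nodes if n.free_slots > 0]
        if not candidates:
            raise RuntimeError("no free slots left")
        if task.gpu_capable:
            candidates = [n for n in candidates if n.has_gpu] or candidates
        node = candidates[0]
        node.free_slots -= 1
        placement[task.tid] = node.name
    return placement

nodes = [Node("gpu-node-0", True, 2), Node("cpu-node-0", False, 4)]
tasks = [Task(0, True), Task(1, False), Task(2, True), Task(3, False)]
print(map_tasks(tasks, nodes))
# {0: 'gpu-node-0', 2: 'gpu-node-0', 1: 'cpu-node-0', 3: 'cpu-node-0'}
```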


2020 ◽  
pp. 1-9 ◽  
Author(s):  
Alejandro Corbellini ◽  
Daniela Godoy ◽  
Cristian Mateos ◽  
Silvia Schiaffino ◽  
Alejandro Zunino

Author(s):  
Nur Rokhman ◽  
Amelia Nursanti

The implementation of parallel algorithms has recently attracted considerable research interest, since parallelism is well suited to large-scale data processing. MapReduce is one of the parallel and distributed programming models, but implementing parallel programs poses many difficulties. Cascading provides an easy scheme over the Hadoop system, which implements the MapReduce model. Frequent itemsets are the objects that appear most often in a dataset, and Frequent Itemset Mining (FIM) requires complex computation; FIM becomes a complicated problem when applied to large-scale data. This paper discusses the implementation of the MapReduce model on Cascading for FIM. The experiment uses the Amazon product co-purchasing network metadata dataset. The results show that the simple mechanism of Cascading can be used to solve the FIM problem. It achieves time complexity O(n), which is more efficient than the non-parallel approach with complexity O(n²/m).
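Since the paper's pipeline is written against the Cascading Java API, the sketch below instead reproduces the core map/reduce counting step in plain Python, just to show why one scan over the transactions suffices for small candidate itemsets. The dataset format (one transaction of space-separated product ids per line) and the restriction to 1- and 2-itemsets are our simplifying assumptions.

```python
# Map/reduce-style frequent itemset counting in a single pass.
from collections import Counter
from itertools import combinations

def map_phase(transactions):
    # emit (itemset, 1) for every single item and candidate pair
    for txn in transactions:
        items = sorted(set(txn.split()))
        for item in items:
            yield (item,), 1
        for pair in combinations(items, 2):
            yield pair, 1

def reduce_phase(key_value_pairs, min_support):
    # sum counts per itemset, keep those meeting the support threshold
    counts = Counter()
    for key, value in key_value_pairs:
        counts[key] += value
    return {k: v for k, v in counts.items() if v >= min_support}

transactions = ["a b c", "a c", "a d", "b c e"]
print(reduce_phase(map_phase(transactions), min_support=2))
# {('a',): 3, ('b',): 2, ('c',): 3, ('a', 'c'): 2, ('b', 'c'): 2}
```

Each of m mappers scans its n/m share of the transactions exactly once, which is the source of the linear-time behavior the abstract reports.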


Micromachines ◽  
2019 ◽  
Vol 10 (6) ◽  
pp. 368 ◽  
Author(s):  
Giulia Santoro ◽  
Giovanna Turvani ◽  
Mariagrazia Graziano

Processing systems are in continuous evolution thanks to constant technological advancement and architectural progress. Over the years, computing systems have become more and more powerful, providing support for applications, such as Machine Learning, that require high computational power. However, the growing complexity of modern computing units and applications has had a strong impact on power consumption. In addition, the memory plays a key role in the overall power consumption of the system, especially for data-intensive applications, which require a lot of data movement between the memory and the computing unit. The consequence is twofold: memory accesses are expensive in terms of energy, and a lot of time is wasted in accessing the memory, rather than processing, because of the performance gap that exists between memories and processing units. This gap is known as the memory wall or the von Neumann bottleneck and is due to the different rates of progress of complementary metal–oxide–semiconductor (CMOS) technology and memories. However, CMOS scaling is also reaching a limit beyond which further progress will not be possible. This work addresses all these problems from an architectural and technological point of view by: (1) proposing a novel Configurable Logic-in-Memory Architecture that exploits the in-memory computing paradigm to reduce the memory wall problem while also providing high performance thanks to its flexibility and parallelism; (2) exploring a non-CMOS technology as a possible candidate technology for the Logic-in-Memory paradigm.
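The energy asymmetry behind the memory-wall argument can be made concrete with ballpark per-operation energies often quoted from Horowitz's ISSCC 2014 keynote (45 nm figures). The numbers below are approximations used only for the comparison, and the multiply-accumulate is crudely priced as two adds; none of this comes from the paper itself.

```python
# Rough energy comparison for a 1M-element dot product: operands
# streamed from off-chip DRAM vs. staged in on-chip memory, as a
# logic-in-memory / near-memory design would do.
E_ADD_32  = 0.1e-12   # J, 32-bit integer add (~0.1 pJ)
E_SRAM_32 = 5e-12     # J, 32-bit read from small on-chip SRAM (~5 pJ)
E_DRAM_32 = 640e-12   # J, 32-bit read from off-chip DRAM (~640 pJ)

n = 1_000_000
e_dram = n * (2 * E_DRAM_32 + 2 * E_ADD_32)  # 2 loads + ~2 add-cost ops
e_sram = n * (2 * E_SRAM_32 + 2 * E_ADD_32)

print(f"DRAM-bound: {e_dram * 1e6:.1f} uJ, "
      f"on-chip: {e_sram * 1e6:.1f} uJ ({e_dram / e_sram:.0f}x)")
```

Even with generous assumptions, data movement dominates compute energy by roughly two orders of magnitude, which is exactly the gap an in-memory computing architecture targets.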

