Adaptive Granularity Based Last-Level Cache Prefetching Method with eDRAM Prefetch Buffer for Graph Processing Applications

2021 ◽  
Vol 11 (3) ◽  
pp. 991
Author(s):  
Sae-Gyeol Choi ◽  
Jeong-Geun Kim ◽  
Shin-Dug Kim

The emergence of big data processing and machine learning has triggered exponential growth in the working-set sizes of applications. In addition, many modern applications are memory intensive, with irregular memory access patterns. Therefore, we propose the concept of adaptive granularities to develop a prefetching methodology that analyzes memory access patterns at a wider granularity spanning both cache lines and pages. The proposed prefetching module resides in the last-level cache (LLC) to handle the large working sets of memory-intensive workloads. Additionally, to support memory access streams with variable intervals, we introduce an embedded-DRAM-based LLC prefetch buffer that consists of three granularity-based prefetch engines and an access history table. By adaptively changing the granularity window used to analyze memory streams, the proposed model can swiftly determine the stride of memory addresses and recover hidden delta chains from irregular memory access patterns. The proposed model achieves 18% and 15% improvements in energy consumption and execution time over the global history buffer and best-offset prefetchers, respectively. It also reduces total execution time and energy consumption by approximately 6% and 2.3% compared to the Markov prefetcher and the variable-length delta prefetcher.
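As a rough illustration of the delta-chain idea in this abstract, the sketch below (not the authors' implementation; the address stream and pattern are invented) records which address delta tends to follow which, then predicts the next address from the most frequent follower of the last observed delta:

```python
from collections import defaultdict

def build_delta_chains(addresses):
    """Record which address delta tends to follow which (a 'delta chain')."""
    deltas = [b - a for a, b in zip(addresses, addresses[1:])]
    table = defaultdict(lambda: defaultdict(int))
    for d1, d2 in zip(deltas, deltas[1:]):
        table[d1][d2] += 1
    return deltas, table

def predict_next(addresses, table, deltas):
    """Prefetch candidate: last address plus the most frequent follower delta."""
    followers = table[deltas[-1]]
    if not followers:
        return None
    return addresses[-1] + max(followers, key=followers.get)

# an irregular-looking stream hiding a repeating delta pattern: +8, +24, +8, ...
stream = [0, 8, 32, 40, 64, 72, 96]
deltas, table = build_delta_chains(stream)
print(predict_next(stream, table, deltas))  # 104 (= 96 + 8)
```

A real prefetcher would additionally vary the granularity window (cache line vs. page) when grouping the stream, which this sketch omits.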

Author(s):  
Ahmad Reza Jafarian-Moghaddam

Speed is one of the most influential variables in both energy consumption and train scheduling problems. Increasing speed ensures punctuality, thereby improving railroad capacity and railway stakeholders' satisfaction and revenues. However, higher speed leads to more energy consumption, higher costs, and thus more pollutant emissions. Therefore, determining an economical speed, which requires a trade-off between users' expectations and the railway system's ability to provide tractive force to overcome the running resistance imposed by the route and moving conditions, is a critical challenge in railway studies. This paper proposes a new fuzzy multi-objective model that, by integrating the micro and macro levels and determining the economical speed for trains in block sections, can optimize train travel time and energy consumption. Implementing the proposed model in a real case with different train-scheduling scenarios shows that it can reduce total travel time by 19% without changing the energy consumption ratio. The proposed model needs little input from expert opinion to determine its rates and parameters.
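The core trade-off behind an "economical speed" can be sketched numerically. This is illustrative only: the Davis-type resistance coefficients, candidate speed range, and objective weights below are assumptions, not the paper's fuzzy multi-objective model:

```python
def travel_time(v, length_km):
    """Block-section travel time in hours at constant speed v (km/h)."""
    return length_km / v

def energy(v, length_km, a=2.0, b=0.03, c=0.0005):
    """Energy proxy: Davis-type running resistance R(v) = a + b*v + c*v^2
    (illustrative coefficients) integrated over the section length."""
    return (a + b * v + c * v * v) * length_km

def economical_speed(length_km, w_time=0.5, w_energy=0.5,
                     speeds=range(40, 161, 5)):
    """Normalize both objectives over candidate speeds, minimize a weighted sum."""
    ts = {v: travel_time(v, length_km) for v in speeds}
    es = {v: energy(v, length_km) for v in speeds}
    tmax, emax = max(ts.values()), max(es.values())
    score = lambda v: w_time * ts[v] / tmax + w_energy * es[v] / emax
    return min(speeds, key=score)
```

Weighting time more heavily pushes the chosen speed up, weighting energy more heavily pushes it down, which is the trade-off the abstract describes.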


Author(s):  
Qingzhu Wang ◽  
Xiaoyun Cui

As mobile devices become more powerful, applications generate a large number of computing tasks that mobile devices alone cannot handle. This article proposes a computation offloading model whose execution units include mobile devices, an edge server, and a cloud server. Previous studies on joint optimization considered only task execution time and the energy consumption of mobile devices, ignoring the energy consumption of the edge and cloud servers. However, edge- and cloud-server energy consumption has a significant impact on the final offloading decision. This paper considers the execution time and energy consumption of all three execution units and formulates the task offloading decision as a single-objective optimization problem. A genetic algorithm with elitism preservation and a random strategy is adopted to obtain the optimal solution. Finally, simulation experiments show that the proposed computation offloading model achieves a lower fitness value than other computation offloading models.
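The offloading decision can be sketched as a small genetic algorithm with elitism and random mutation. All per-task time/energy costs below are invented for illustration, and the single-objective fitness is a simple weighted sum; the paper's actual cost model is richer:

```python
import random

# Illustrative per-task (time, energy) costs on each execution unit:
# index 0 = mobile device, 1 = edge server, 2 = cloud server.
TIME   = [(9.0, 4.0, 2.5), (6.0, 3.0, 2.0), (12.0, 5.0, 3.0)]
ENERGY = [(5.0, 3.0, 4.0), (4.0, 2.5, 3.5), (7.0, 4.0, 5.0)]
N_TASKS = len(TIME)

def fitness(chrom, w_time=0.5, w_energy=0.5):
    """Weighted sum of total execution time and total (device+edge+cloud) energy."""
    t = sum(TIME[i][u] for i, u in enumerate(chrom))
    e = sum(ENERGY[i][u] for i, u in enumerate(chrom))
    return w_time * t + w_energy * e

def evolve(pop_size=20, generations=50, elite=2, p_mut=0.1, seed=1):
    rng = random.Random(seed)
    pop = [[rng.randrange(3) for _ in range(N_TASKS)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        nxt = pop[:elite]                      # elitism: best survive unchanged
        while len(nxt) < pop_size:
            a, b = rng.sample(pop[:pop_size // 2], 2)
            cut = rng.randrange(1, N_TASKS)
            child = a[:cut] + b[cut:]          # one-point crossover
            if rng.random() < p_mut:           # random strategy: mutate one gene
                child[rng.randrange(N_TASKS)] = rng.randrange(3)
            nxt.append(child)
        pop = nxt
    return min(pop, key=fitness)

best = evolve()
```

Each chromosome assigns every task to one of the three execution units, so the GA searches the joint placement space rather than deciding tasks in isolation.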


2020 ◽  
Author(s):  
Caio Vieira ◽  
Arthur Lorenzon ◽  
Lucas Schnorr ◽  
Philippe Navaux ◽  
Antonio Carlos Beck

Convolutional Neural Network (CNN) algorithms have become a recurrent solution to Computer Vision problems. These networks employ convolutions as their main building block, which greatly impacts performance, since convolution is a costly operation. Due to its importance in CNN algorithms, this work evaluates convolution performance on the Gemmini accelerator and compares it to a conventional lightly- and heavily-loaded desktop CPU in terms of execution time and energy consumption. We show that Gemmini achieves lower execution time and energy consumption than the CPU even for small convolutions, and that this performance gap grows with convolution size. Furthermore, we analyze the minimum Gemmini frequency required to match the CPU's execution time and show that Gemmini can achieve the same runtime while running at much lower frequencies.


2019 ◽  
Vol 11 (2) ◽  
pp. 38-41 ◽  
Author(s):  
Volkmar Sieh ◽  
Robert Burlacu ◽  
Timo Honig ◽  
Heiko Janker ◽  
Phillip Raffeck ◽  
...  

2005 ◽  
Vol 14 (03) ◽  
pp. 605-617 ◽  
Author(s):  
SUNG WOO CHUNG ◽  
HYONG-SHIK KIM ◽  
CHU SHIK JHON

In scalable CC-NUMA multiprocessors, it is crucial to reduce the average memory access time. For applications where the second-level (L2) cache is large enough, we propose a split L2 cache to utilize the surplus space. The split L2 cache is composed of a traditional LRU cache and an RVC (Remote Victim Cache), which stores only data in the remote memory address range. It thus reduces the average L2 miss time by keeping remote blocks that would otherwise be discarded. Though the split cache does not reduce miss rates, it is observed to reduce total execution time by up to 27%. It even outperforms an LRU cache of double the size.
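A toy model of the split L2 idea: an LRU cache whose remote-range victims are parked in a small RVC instead of being discarded. The sizes, the address-range predicate, and the access trace are invented, and real L2 caches are set-associative rather than fully associative:

```python
from collections import OrderedDict

class SplitL2:
    """Toy split L2: an LRU cache plus a Remote Victim Cache (RVC) that
    holds only blocks evicted from the remote memory address range."""
    def __init__(self, l2_size, rvc_size, is_remote):
        self.l2 = OrderedDict()
        self.rvc = OrderedDict()
        self.l2_size, self.rvc_size = l2_size, rvc_size
        self.is_remote = is_remote
        self.hits = self.misses = 0

    def access(self, block):
        if block in self.l2:
            self.l2.move_to_end(block)       # LRU update
            self.hits += 1
            return
        if block in self.rvc:                # remote block rescued from the RVC
            del self.rvc[block]
            self.hits += 1
        else:
            self.misses += 1
        self.l2[block] = True
        if len(self.l2) > self.l2_size:
            victim, _ = self.l2.popitem(last=False)
            if self.is_remote(victim):       # only remote victims enter the RVC
                self.rvc[victim] = True
                if len(self.rvc) > self.rvc_size:
                    self.rvc.popitem(last=False)

cache = SplitL2(l2_size=2, rvc_size=2, is_remote=lambda b: b >= 100)
for b in [100, 101, 1, 100]:   # the second access to 100 hits in the RVC
    cache.access(b)
```

In the trace above, block 100 is evicted by the access to block 1 but survives in the RVC, so re-accessing it avoids the expensive remote memory fetch.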


2009 ◽  
Vol 18 (04) ◽  
pp. 697-711
Author(s):  
XUEXIANG WANG ◽  
HANLAI PU ◽  
JUN YANG ◽  
LONGXING SHI

A Scratch-Pad Memory (SPM) allocation method that improves the performance of a given application while reducing its energy consumption is presented in this paper. The design builds an extended control flow graph (ECFG) directly from the application's instruction flow, transforming the application into a directed graph of nodes and relationships. To reduce the overhead of moving nodes into SPM, the design adds a refined greedy algorithm based on the ECFG. Experiments confirm the feasibility and efficiency of the method: compared with previous research based on the control flow graph (CFG), which disregards the relationships between nodes, it improves performance by an average of 11% and reduces energy consumption by an average of 28%. Compared to a non-SPM environment, the application's execution time and energy consumption are reduced by up to 56% and 69% on average, respectively.
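The greedy allocation step can be sketched as picking ECFG nodes by gain density under the SPM capacity. The node names, sizes, and gain values below are hypothetical, and the gain is assumed to already account for the overhead of moving the node into SPM:

```python
def greedy_spm_allocation(nodes, spm_capacity):
    """nodes: list of (name, size_bytes, gain), where gain is the estimated
    cycles/energy saved by placing the node in SPM, net of move overhead.
    Greedily pick nodes by gain density (gain per byte) until SPM is full."""
    chosen, used = [], 0
    for name, size, gain in sorted(nodes, key=lambda n: n[2] / n[1], reverse=True):
        if gain > 0 and used + size <= spm_capacity:
            chosen.append(name)
            used += size
    return chosen

# hypothetical hot basic blocks extracted from an ECFG
nodes = [("loop_a", 256, 5000), ("loop_b", 512, 6000), ("init", 128, 100)]
print(greedy_spm_allocation(nodes, 640))  # ['loop_a', 'init']
```

Here `loop_b` has the largest absolute gain but a worse gain-per-byte than `loop_a` and no longer fits once `loop_a` is placed, which is exactly the kind of trade-off a density-based greedy resolves.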

