Adaptive Granularity Based Last-Level Cache Prefetching Method with eDRAM Prefetch Buffer for Graph Processing Applications

2021 ◽  
Vol 11 (3) ◽  
pp. 991
Author(s):  
Sae-Gyeol Choi ◽  
Jeong-Geun Kim ◽  
Shin-Dug Kim

The emergence of big data processing and machine learning has triggered exponential growth in the working-set sizes of applications. In addition, many modern applications are memory intensive, with irregular memory access patterns. Therefore, we propose the concept of adaptive granularities to develop a prefetching methodology that analyzes memory access patterns at a wider granularity spanning both cache lines and pages. The proposed prefetching module resides in the last-level cache (LLC) to handle the large working sets of memory-intensive workloads. Additionally, to support memory access streams with variable intervals, we introduce an embedded-DRAM-based LLC prefetch buffer that consists of three granularity-based prefetch engines and an access history table. By adaptively changing the granularity window used to analyze memory streams, the proposed model can swiftly determine the stride of memory addresses and recover hidden delta chains from irregular memory access patterns. The proposed model achieves 18% and 15% improvements in energy consumption and execution time over the global history buffer and best-offset prefetchers, respectively. It also reduces total execution time and energy consumption by approximately 6% and 2.3% compared to the Markov prefetcher and the variable-length delta prefetcher.
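As a rough illustration of the delta-chain idea in this abstract, the sketch below (not the authors' implementation; the address stream and pattern are invented) records which address delta tends to follow which, then predicts the next address from the most frequent follower of the last observed delta:

```python
from collections import defaultdict

def build_delta_chains(addresses):
    """Record which address delta tends to follow which (a 'delta chain')."""
    deltas = [b - a for a, b in zip(addresses, addresses[1:])]
    table = defaultdict(lambda: defaultdict(int))
    for d1, d2 in zip(deltas, deltas[1:]):
        table[d1][d2] += 1
    return deltas, table

def predict_next(addresses, table, deltas):
    """Prefetch candidate: last address plus the most frequent follower delta."""
    followers = table[deltas[-1]]
    if not followers:
        return None
    return addresses[-1] + max(followers, key=followers.get)

# an irregular-looking stream hiding a repeating delta pattern: +8, +24, +8, ...
stream = [0, 8, 32, 40, 64, 72, 96]
deltas, table = build_delta_chains(stream)
print(predict_next(stream, table, deltas))  # 104 (= 96 + 8)
```

A real prefetcher would additionally vary the granularity window (cache line vs. page) when grouping the stream, which this sketch omits.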

Author(s):  
Ahmad Reza Jafarian-Moghaddam

Speed is one of the most influential variables in both energy consumption and train scheduling problems. Increasing speed ensures punctuality, thereby improving railroad capacity and railway stakeholders' satisfaction and revenues. However, higher speed leads to more energy consumption, higher costs, and thus more pollutant emissions. Therefore, determining an economical speed, which requires a trade-off between users' expectations and the railway system's ability to provide tractive force to overcome the running resistance imposed by the route and moving conditions, is a critical challenge in railway studies. This paper proposes a new fuzzy multi-objective model that, by integrating the micro and macro levels and determining the economical speed for trains in block sections, can optimize train travel time and energy consumption. Implementing the proposed model in a real case with different train-scheduling scenarios shows that it can reduce total travel time by 19% without changing the energy consumption ratio. The proposed model needs little input from expert opinion to determine its rates and parameters.
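The core trade-off behind an "economical speed" can be sketched numerically. This is illustrative only: the Davis-type resistance coefficients, candidate speed range, and objective weights below are assumptions, not the paper's fuzzy multi-objective model:

```python
def travel_time(v, length_km):
    """Block-section travel time in hours at constant speed v (km/h)."""
    return length_km / v

def energy(v, length_km, a=2.0, b=0.03, c=0.0005):
    """Energy proxy: Davis-type running resistance R(v) = a + b*v + c*v^2
    (illustrative coefficients) integrated over the section length."""
    return (a + b * v + c * v * v) * length_km

def economical_speed(length_km, w_time=0.5, w_energy=0.5,
                     speeds=range(40, 161, 5)):
    """Normalize both objectives over candidate speeds, minimize a weighted sum."""
    ts = {v: travel_time(v, length_km) for v in speeds}
    es = {v: energy(v, length_km) for v in speeds}
    tmax, emax = max(ts.values()), max(es.values())
    score = lambda v: w_time * ts[v] / tmax + w_energy * es[v] / emax
    return min(speeds, key=score)
```

Weighting time more heavily pushes the chosen speed up, weighting energy more heavily pushes it down, which is the trade-off the abstract describes.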


Author(s):  
Qingzhu Wang ◽  
Xiaoyun Cui

As mobile devices become more powerful, applications generate a large number of computing tasks that mobile devices alone cannot handle. This article proposes a computation offloading model whose execution units include mobile devices, an edge server, and a cloud server. Previous studies on joint optimization considered only task execution time and the energy consumption of mobile devices, ignoring the energy consumption of the edge and cloud servers. However, edge- and cloud-server energy consumption has a significant impact on the final offloading decision. This paper considers the execution time and energy consumption of all three execution units and formulates the task offloading decision as a single-objective optimization problem. A genetic algorithm with elitism preservation and a random strategy is adopted to obtain the optimal solution. Finally, simulation experiments show that the proposed computation offloading model achieves a lower fitness value than other computation offloading models.
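The offloading decision can be sketched as a small genetic algorithm with elitism and random mutation. All per-task time/energy costs below are invented for illustration, and the single-objective fitness is a simple weighted sum; the paper's actual cost model is richer:

```python
import random

# Illustrative per-task (time, energy) costs on each execution unit:
# index 0 = mobile device, 1 = edge server, 2 = cloud server.
TIME   = [(9.0, 4.0, 2.5), (6.0, 3.0, 2.0), (12.0, 5.0, 3.0)]
ENERGY = [(5.0, 3.0, 4.0), (4.0, 2.5, 3.5), (7.0, 4.0, 5.0)]
N_TASKS = len(TIME)

def fitness(chrom, w_time=0.5, w_energy=0.5):
    """Weighted sum of total execution time and total (device+edge+cloud) energy."""
    t = sum(TIME[i][u] for i, u in enumerate(chrom))
    e = sum(ENERGY[i][u] for i, u in enumerate(chrom))
    return w_time * t + w_energy * e

def evolve(pop_size=20, generations=50, elite=2, p_mut=0.1, seed=1):
    rng = random.Random(seed)
    pop = [[rng.randrange(3) for _ in range(N_TASKS)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        nxt = pop[:elite]                      # elitism: best survive unchanged
        while len(nxt) < pop_size:
            a, b = rng.sample(pop[:pop_size // 2], 2)
            cut = rng.randrange(1, N_TASKS)
            child = a[:cut] + b[cut:]          # one-point crossover
            if rng.random() < p_mut:           # random strategy: mutate one gene
                child[rng.randrange(N_TASKS)] = rng.randrange(3)
            nxt.append(child)
        pop = nxt
    return min(pop, key=fitness)

best = evolve()
```

Each chromosome assigns every task to one of the three execution units, so the GA searches the joint placement space rather than deciding tasks in isolation.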


2020 ◽  
Author(s):  
Caio Vieira ◽  
Arthur Lorenzon ◽  
Lucas Schnorr ◽  
Philippe Navaux ◽  
Antonio Carlos Beck

Convolutional Neural Network (CNN) algorithms have become a recurrent solution to Computer Vision problems. These networks employ convolutions as their main building block, which greatly impacts performance, since convolution is a costly operation. Due to its importance in CNN algorithms, this work evaluates convolution performance on the Gemmini accelerator and compares it to a conventional lightly- and heavily-loaded desktop CPU in terms of execution time and energy consumption. We show that Gemmini achieves lower execution time and energy consumption than the CPU even for small convolutions, and that this performance gap grows with convolution size. Furthermore, we analyze the minimum Gemmini frequency required to match the CPU's execution time and show that Gemmini can achieve the same runtime while running at much lower frequencies.


2019 ◽  
Vol 11 (2) ◽  
pp. 38-41 ◽  
Author(s):  
Volkmar Sieh ◽  
Robert Burlacu ◽  
Timo Honig ◽  
Heiko Janker ◽  
Phillip Raffeck ◽  
...  

2005 ◽  
Vol 14 (03) ◽  
pp. 605-617 ◽  
Author(s):  
SUNG WOO CHUNG ◽  
HYONG-SHIK KIM ◽  
CHU SHIK JHON

In scalable CC-NUMA multiprocessors, it is crucial to reduce the average memory access time. For applications where the second-level (L2) cache is large enough, we propose a split L2 cache to utilize the surplus space. The split L2 cache is composed of a traditional LRU cache and an RVC (Remote Victim Cache), which stores only data in the remote memory address range. It thus reduces the average L2 miss time by keeping remote blocks that would otherwise be discarded. Though the split cache does not reduce miss rates, it is observed to reduce total execution time by up to 27%. It even outperforms an LRU cache of double the size.
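A toy model of the split L2 idea: an LRU cache whose remote-range victims are parked in a small RVC instead of being discarded. The sizes, the address-range predicate, and the access trace are invented, and real L2 caches are set-associative rather than fully associative:

```python
from collections import OrderedDict

class SplitL2:
    """Toy split L2: an LRU cache plus a Remote Victim Cache (RVC) that
    holds only blocks evicted from the remote memory address range."""
    def __init__(self, l2_size, rvc_size, is_remote):
        self.l2 = OrderedDict()
        self.rvc = OrderedDict()
        self.l2_size, self.rvc_size = l2_size, rvc_size
        self.is_remote = is_remote
        self.hits = self.misses = 0

    def access(self, block):
        if block in self.l2:
            self.l2.move_to_end(block)       # LRU update
            self.hits += 1
            return
        if block in self.rvc:                # remote block rescued from the RVC
            del self.rvc[block]
            self.hits += 1
        else:
            self.misses += 1
        self.l2[block] = True
        if len(self.l2) > self.l2_size:
            victim, _ = self.l2.popitem(last=False)
            if self.is_remote(victim):       # only remote victims enter the RVC
                self.rvc[victim] = True
                if len(self.rvc) > self.rvc_size:
                    self.rvc.popitem(last=False)

cache = SplitL2(l2_size=2, rvc_size=2, is_remote=lambda b: b >= 100)
for b in [100, 101, 1, 100]:   # the second access to 100 hits in the RVC
    cache.access(b)
```

In the trace above, block 100 is evicted by the access to block 1 but survives in the RVC, so re-accessing it avoids the expensive remote memory fetch.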


2009 ◽  
Vol 18 (04) ◽  
pp. 697-711
Author(s):  
XUEXIANG WANG ◽  
HANLAI PU ◽  
JUN YANG ◽  
LONGXING SHI

A Scratch-Pad Memory (SPM) allocation method that improves the performance of a given application while reducing its energy consumption is presented in this paper. The design builds an extended control flow graph (ECFG) directly from the application's instruction flow, transforming the application into a directed graph of nodes and relationships. To reduce the overhead of moving nodes into SPM, the design adds a refined greedy algorithm based on the ECFG. Experiments confirm the feasibility and efficiency of the method: compared with previous research based on the control flow graph (CFG), which disregards the relationships between nodes, it improves performance by an average of 11% and reduces energy consumption by an average of 28%. Compared to a non-SPM environment, the application's execution time and energy consumption are reduced by up to 56% and 69% on average, respectively.
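The greedy allocation step can be sketched as picking ECFG nodes by gain density under the SPM capacity. The node names, sizes, and gain values below are hypothetical, and the gain is assumed to already account for the overhead of moving the node into SPM:

```python
def greedy_spm_allocation(nodes, spm_capacity):
    """nodes: list of (name, size_bytes, gain), where gain is the estimated
    cycles/energy saved by placing the node in SPM, net of move overhead.
    Greedily pick nodes by gain density (gain per byte) until SPM is full."""
    chosen, used = [], 0
    for name, size, gain in sorted(nodes, key=lambda n: n[2] / n[1], reverse=True):
        if gain > 0 and used + size <= spm_capacity:
            chosen.append(name)
            used += size
    return chosen

# hypothetical hot basic blocks extracted from an ECFG
nodes = [("loop_a", 256, 5000), ("loop_b", 512, 6000), ("init", 128, 100)]
print(greedy_spm_allocation(nodes, 640))  # ['loop_a', 'init']
```

Here `loop_b` has the largest absolute gain but a worse gain-per-byte than `loop_a` and no longer fits once `loop_a` is placed, which is exactly the kind of trade-off a density-based greedy resolves.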

