Limiting CPU Frequency Scaling Considering Main Memory Accesses

2014 ◽  
Vol 20 (9) ◽  
pp. 483-491 ◽  
Author(s):  
Moonju Park


2021 ◽  
Vol 20 (5s) ◽  
pp. 1-20
Author(s):  
Qingfeng Zhuge ◽  
Hao Zhang ◽  
Edwin Hsing-Mean Sha ◽  
Rui Xu ◽  
Jun Liu ◽  
...  

Efficiently accessing remote file data remains a challenging problem for data processing systems. Developments in non-volatile dual in-line memory modules (NVDIMMs), in-memory file systems, and RDMA networks provide new opportunities for solving the problem of remote data access. The general understanding of NVDIMMs such as Intel Optane DC Persistent Memory (DCPM) is that they expand main memory capacity at the cost of performance several times lower than DRAM's. With the in-depth exploration presented in this paper, however, we show an interesting finding: the potential of NVDIMMs for high-performance, remote in-memory access can be revealed through careful design. We explore multiple architectural structures for accessing remote NVDIMMs in a real system using Optane DCPM and compare their performance. Experiments show significant performance gaps among different ways of exposing NVDIMMs as a memory address space accessible through an RDMA interface. Furthermore, we design and implement a prototype user-level, in-memory file system, RIMFS, in device DAX mode on Optane DCPM. Comparing against the DAX-supported Linux file system Ext4-DAX, we show that remote reads on RIMFS over RDMA are on average 11.44× faster than on remote Ext4-DAX. The experimental results also show that the performance of remote accesses on RIMFS is maintained on a heavily loaded data server with CPU utilization as high as 90%, while the performance of remote reads on Ext4-DAX drops significantly, by 49.3%, and that of local reads on Ext4-DAX drops even more, by 90.1%. The performance comparisons for writes exhibit the same trends.
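A minimal sketch of the device DAX access mode the paper builds on (this is not the authors' RIMFS code): device DAX exposes the persistent-memory region as a character device that user-level software can map directly, bypassing the kernel page cache, so the same virtual range can later be registered with an RDMA NIC. The device path and region size below are assumptions.

```python
# Minimal sketch, assuming a device-DAX node backed by Optane DCPM; an
# illustration of direct user-level persistent-memory mapping, not RIMFS.
import mmap, os

DEVDAX_PATH = "/dev/dax0.0"   # hypothetical device-DAX character device
REGION_SIZE = 1 << 30         # map 1 GiB of the persistent-memory region

fd = os.open(DEVDAX_PATH, os.O_RDWR)
# MAP_SHARED gives load/store access with no page cache in the path; the
# same virtual range could then be registered as an RDMA memory region.
pmem = mmap.mmap(fd, REGION_SIZE, flags=mmap.MAP_SHARED,
                 prot=mmap.PROT_READ | mmap.PROT_WRITE)
pmem[0:5] = b"hello"          # a direct store into persistent memory
pmem.close()
os.close(fd)
```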


2021 ◽  
Vol 11 (1) ◽  
pp. 47-53
Author(s):  
Carlos A. Fajardo ◽  
Fabián Sánchez ◽  
Ana B. Ramirez

Currently, the amount of data recorded in a seismic survey is on the order of hundreds of terabytes. Processing such an amount of data poses significant computational challenges. One of them is the I/O bottleneck between disk storage and node memory, which arises because disk access is thousands of times slower than the processing speed of the co-processors (e.g., GPUs). We propose a special Kirchhoff migration that performs the migration process over compressed data. The seismic data is compressed using three well-known Matching Pursuit algorithms. Our approach seeks to reduce the number of disk accesses required by the Kirchhoff operator while adding more mathematical operations to the traditional Kirchhoff migration; in effect, we trade slow operations (memory accesses) for fast ones (arithmetic). Experimental results show that the proposed method preserves, to a large extent, the seismic attributes of the image for compression ratios up to 20:1.
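Because the compression step rests on Matching Pursuit, a minimal generic sketch of the greedy algorithm over a unit-norm dictionary follows; it is not one of the three specific variants the authors evaluate, and the random dictionary is a placeholder.

```python
import numpy as np

def matching_pursuit(signal, dictionary, n_atoms):
    """Greedy sparse coding: repeatedly pick the dictionary atom (column)
    most correlated with the residual and remove its contribution."""
    residual = signal.astype(float).copy()
    coeffs = np.zeros(dictionary.shape[1])
    for _ in range(n_atoms):
        corr = dictionary.T @ residual          # correlations with residual
        k = int(np.argmax(np.abs(corr)))        # best-matching atom
        coeffs[k] += corr[k]
        residual -= corr[k] * dictionary[:, k]
    return coeffs, residual

# Toy usage: keeping ~1/20 as many coefficients as samples mirrors a 20:1
# compression ratio in spirit (atom counts here are illustrative).
rng = np.random.default_rng(0)
D = rng.normal(size=(400, 1000))
D /= np.linalg.norm(D, axis=0)                  # unit-norm atoms
x = rng.normal(size=400)
coeffs, res = matching_pursuit(x, D, n_atoms=20)
print(f"relative residual energy: {np.linalg.norm(res) / np.linalg.norm(x):.2f}")
```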


2019 ◽  
Vol 28 (07) ◽  
pp. 1950113
Author(s):  
Juan Fang ◽  
Jiajia Lu ◽  
Mengxuan Wang ◽  
Hui Zhao

With more cores integrated into a single chip and the fast growth of main memory capacity, DRAM memory design faces ever-increasing challenges. Previous studies have shown that DRAM can consume up to 40% of system power, which makes DRAM a major factor constraining whole-system performance growth. Moreover, memory accesses from different applications are usually interleaved and interfere with each other, which further complicates memory system management. Reducing memory power consumption has therefore become an urgent problem in both academia and industry. In this paper, we first propose a novel strategy called Dynamic Bank Partitioning (DBP), which allocates banks to applications based on their memory access characteristics. DBP not only effectively eliminates interference among applications but also fully exploits bank-level parallelism. Second, to further reduce power consumption, we propose an adaptive method that dynamically selects an optimal page policy for each bank according to the characteristics of the memory accesses it receives. Our experimental results show that our strategy improves system performance and reduces memory power consumption at the same time. The proposed scheme reduces memory power consumption by up to 21.2% (10% on average across all workloads) while also improving performance. For workloads built from mixed applications, our scheme reduces power consumption by 14% on average and improves performance by up to 12.5% (3% on average).
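To make the adaptive page-policy idea concrete, the sketch below measures each bank's would-be row-buffer hit rate over a fixed epoch and picks open-page or closed-page accordingly; the epoch length and threshold are illustrative assumptions, not values from the paper.

```python
class BankPagePolicy:
    """Per-bank policy selection sketch: measure the would-be row-buffer
    hit rate (same row as the previous access) each epoch, then choose
    between keeping rows open and precharging early."""
    OPEN, CLOSED = "open-page", "closed-page"

    def __init__(self, epoch=10_000, threshold=0.5):
        self.epoch, self.threshold = epoch, threshold
        self.policy = self.OPEN
        self.last_row = None
        self.hits = self.accesses = 0

    def access(self, row):
        self.accesses += 1
        if row == self.last_row:      # would hit an open row buffer
            self.hits += 1
        self.last_row = row
        if self.accesses == self.epoch:
            # High row locality favors open-page; low locality favors
            # closed-page, which saves activate latency and the standby
            # power of rows held open needlessly.
            hit_rate = self.hits / self.accesses
            self.policy = self.OPEN if hit_rate >= self.threshold else self.CLOSED
            self.hits = self.accesses = 0
```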


2021 ◽  
Vol 18 (3) ◽  
pp. 1-23
Author(s):  
Wim Heirman ◽  
Stijn Eyerman ◽  
Kristof Du Bois ◽  
Ibrahim Hur

Sparse memory accesses, which are scattered accesses to single elements of a large data structure, are a challenge for current processor architectures. Their lack of spatial and temporal locality and their irregularity make caches and traditional stream prefetchers useless. Furthermore, applying standard caching and prefetching to sparse accesses wastes precious memory bandwidth and thrashes caches, deteriorating performance for regular accesses. Bypassing prefetchers and caches for sparse accesses, and fetching only a single element (e.g., 8 B) from main memory (a subline access), can solve these issues. Deciding which accesses to handle as sparse and which as regular cached accesses is a challenging task with a large potential impact on performance: treating sparse accesses as regular ones reduces performance, but not caching accesses that do have locality also hurts performance by significantly increasing their latency and bandwidth consumption. Furthermore, this decision depends on the dynamic environment, such as input set characteristics and system load, making a static decision by the programmer or compiler suboptimal. We propose the Instruction Spatial Locality Estimator (ISLE), a hardware detector that finds instructions that access isolated words in a sea of unused data. These sparse accesses are dynamically converted into uncached subline accesses, while regular accesses stay cached. ISLE requires no modification of source code or binaries, and adapts automatically to a changing environment (input data, available bandwidth, etc.). We apply ISLE to a graph analytics processor running sparse graph workloads, and show that ISLE outperforms no subline accesses, manual sublining, and prior work on detecting sparse accesses.
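A conceptual software model of the detector's decision (the real ISLE is a hardware structure; the counters, granularities, and threshold below are assumptions for illustration): track, per load instruction, what fraction of each fetched 64 B line is actually touched before eviction, and classify the instruction as sparse when that fraction stays low.

```python
from collections import defaultdict

LINE = 64                 # cache-line size in bytes
WORD = 8                  # subline fetch granularity in bytes
SPARSE_THRESHOLD = 0.2    # mean fraction of the line used (assumed)

class LocalityEstimator:
    """Per-PC spatial locality tracking, modeled in software."""
    def __init__(self):
        self.usage = defaultdict(list)   # load PC -> per-line usage samples

    def record_line_eviction(self, pc, used_word_mask):
        """used_word_mask: 8-bit mask of which 8 B words were touched
        while the line fetched by this PC was resident."""
        used = bin(used_word_mask).count("1") / (LINE // WORD)
        self.usage[pc].append(used)

    def is_sparse(self, pc):
        # PCs whose fetched lines stay mostly unused get uncached 8 B
        # subline accesses; all other loads remain regular cached loads.
        samples = self.usage[pc]
        return bool(samples) and sum(samples) / len(samples) < SPARSE_THRESHOLD
```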


2020 ◽  
Author(s):  
Prachi Sharma ◽  
Arkid Bera ◽  
Anu Gupta

To curb redundant power consumption in portable embedded and real-time applications, processors are equipped with various Dynamic Voltage and Frequency Scaling (DVFS) techniques. How accurately such a technique predicts the operating frequency determines how power-efficient it makes a processor across a variety of programs and users. Recent techniques, however, have focused so heavily on saving power that they neglect the user-satisfaction metric, i.e., performance. The DVFS technique used to save power in turn introduces unwanted latency due to the high complexity of the algorithm. Also, many modern DVFS techniques rely on feedback manually triggered by the user to change the frequency, further increasing the reaction time. In this paper, we implement a novel Artificial Neural Network-driven frequency scaling methodology that saves power and boosts performance at the same time, implicitly, i.e., without any feedback from the user. To make the system more inclusive with respect to the kinds of processes run on it, we trained the ANN not only on CPU-intensive programs but also on memory-bound ones, i.e., programs with frequent memory accesses per CPU cycle. The proposed technique has been evaluated on an Intel i7-4720HQ Haswell processor and shows a performance boost of up to 20%, SoC power savings of up to 16%, and a Performance-per-Watt improvement of up to 30% compared with the existing DVFS technique. The open-source memory-intensive benchmark suite MiBench was used to verify the utility of the suggested technique.
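A minimal sketch of the core idea, assuming a tiny two-input network with placeholder random weights (the paper's actual ANN topology, inputs, and training are not reproduced here): sampled performance counters map to a target frequency, so memory-bound phases can be held at lower clocks without any user feedback.

```python
# Hedged sketch: a small MLP maps performance-counter features to a CPU
# frequency. Inputs (IPC, memory-bound stall share), layer sizes, and the
# frequency range are illustrative assumptions; weights are untrained.
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)   # hidden layer (assumed size)
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)   # output: normalized frequency

def predict_frequency(ipc, mem_bound, f_min=0.8e9, f_max=3.6e9):
    x = np.array([ipc, mem_bound])
    h = np.maximum(0.0, x @ W1 + b1)                    # ReLU activations
    f_norm = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)[0]))    # sigmoid to [0, 1]
    return f_min + f_norm * (f_max - f_min)

# After training, a memory-bound phase (low IPC, high stall share) should
# map to a lower clock than a compute-bound one.
print(f"{predict_frequency(0.4, 0.7) / 1e9:.2f} GHz")
```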


