LPE: Locality-based Dead Prediction in Exclusive TLB for Large Coverage

Author(s):  
Jing Yan ◽  
Yujuan Tan ◽  
Zhulin Ma ◽  
Jingcheng Liu ◽  
Xianzhang Chen ◽  
...  

The translation lookaside buffer (TLB) is critical to the performance of modern multi-level memory systems. However, because the TLB itself is small, its address coverage is limited. Adopting a two-level exclusive TLB hierarchy can increase this coverage [M. Swanson, L. Stoller and J. Carter, Increasing TLB reach using superpages backed by shadow memory, 25th Annual Int. Symp. Computer Architecture (1998); H. P. Chang, T. Heo, J. Jeong and J. Huh, Hybrid TLB coalescing: Improving TLB translation coverage under diverse fragmented memory allocations, ACM SIGARCH Comput. Arch. News 45 (2017) 444–456] and thereby improve memory performance. After analyzing existing two-level exclusive TLBs, however, we find that a large number of “dead” entries (entries that will have no further use) linger in the last-level TLB (LLT) for a long time, occupying space and lowering the TLB hit rate. Based on this observation, we propose exploiting temporal and spatial locality to predict and identify dead entries in the exclusive LLT and to remove them as soon as possible, leaving room for more useful entries and increasing the TLB hit rate. Extensive experiments show that our method increases the average hit rate by 8.67% (up to 19.95%) and reduces total latency by an average of 9.82% (up to 24.41%).
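
To make the idea concrete, here is a minimal Python sketch of locality-based dead-entry prediction in a toy exclusive last-level TLB. It is not the paper's LPE design; the class, thresholds and eviction policy below are illustrative assumptions only: an entry is predicted dead when neither it nor its spatial neighbours has been reused within a recent window, and such entries are evicted first.

```python
# Hedged sketch: a toy last-level TLB (LLT) that predicts an entry "dead" when
# neither the entry nor its spatially adjacent pages have been touched within a
# fixed window of recent accesses. All names and thresholds are assumptions.

from collections import OrderedDict

class ToyExclusiveLLT:
    def __init__(self, capacity=64, dead_window=256, neighbor_span=1):
        self.capacity = capacity            # number of LLT entries
        self.dead_window = dead_window      # accesses without reuse => predicted dead
        self.neighbor_span = neighbor_span  # +/- pages counted as spatial neighbours
        self.entries = OrderedDict()        # vpn -> last-access timestamp
        self.clock = 0

    def _is_dead(self, vpn):
        """Temporal + spatial test: stale entry with no recently used neighbours."""
        stale = self.clock - self.entries[vpn] > self.dead_window
        neighbors_idle = all(
            self.clock - self.entries.get(vpn + d, -self.dead_window) > self.dead_window
            for d in range(-self.neighbor_span, self.neighbor_span + 1) if d != 0
        )
        return stale and neighbors_idle

    def access(self, vpn):
        """Look up a virtual page number; returns True on hit."""
        self.clock += 1
        if vpn in self.entries:
            self.entries[vpn] = self.clock
            self.entries.move_to_end(vpn)
            return True
        # Miss: prefer reclaiming a predicted-dead entry over plain LRU eviction.
        if len(self.entries) >= self.capacity:
            dead = next((v for v in self.entries if self._is_dead(v)), None)
            if dead is not None:
                del self.entries[dead]            # reclaim a predicted-dead entry
            else:
                self.entries.popitem(last=False)  # otherwise fall back to LRU
        self.entries[vpn] = self.clock
        return False
```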

2020 ◽  
Author(s):  
Sofia Persson ◽  
Alan Yates ◽  
Klaus Kessler ◽  
Ben Harkin

Even though memory performance is a commonly researched aspect of Obsessive-Compulsive Disorder (OCD), a coherent and unified explanation of the role of specific cognitive factors has remained elusive. To address this, the present meta-analysis examined the predictive validity of Harkin and Kessler’s (2011) Executive Function (E), Binding Complexity (B) and Memory Load (L) classification system with regard to affected vs. unaffected memory performance in OCD. We employed a multi-level meta-analytic approach (Viechtbauer, 2010) to accommodate the interdependent nature of the EBL model and the interdependency of effect sizes (305 effect sizes from 144 studies, including 4424 OCD patients). Results revealed that the EBL model predicted memory performance: as EBL demand increased, those with OCD performed progressively worse on memory tasks. Executive function was the driving mechanism behind the EBL’s impact on OCD memory performance and negated effect-size differences between visual and verbal tasks in those with OCD. Comparisons of sub-task effect sizes were also generally in accord with the cognitive parameters of the EBL taxonomy. We conclude that standardised coding of tasks along individual cognitive dimensions, combined with multi-level meta-analysis, provides a new approach for examining multi-dimensional models of memory and cognitive performance in OCD and other disorders.


2013 ◽  
Vol 730 ◽  
pp. 593-606 ◽  
Author(s):  
L. Djenidi ◽  
S. F. Tardu ◽  
R. A. Antonia

A long-time direct numerical simulation (DNS) based on the lattice Boltzmann method is carried out for grid turbulence with a view to comparing spatially averaged statistical properties in planes perpendicular to the mean flow with their temporal counterparts. The results show that the two averages become equal a short distance downstream of the grid. This equality indicates that the flow has become homogeneous in a plane perpendicular to the mean flow. This is an important result, since it confirms that hot-wire measurements are appropriate for testing theoretical results based on spatially averaged statistics. It is equally important in the context of DNS of grid turbulence, since it justifies the use of spatial averaging along a lateral direction and over several realizations for determining various statistical properties. Finally, the very good agreement between temporal and spatial averages validates the comparison between temporal (experimental) and spatial (DNS) statistical properties. The results are also interesting because, since the flow is stationary in time and spatially homogeneous along lateral directions, the equality between the two types of averaging provides strong support for the ergodic hypothesis in grid turbulence in planes perpendicular to the mean flow.
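
As a purely illustrative aside, the comparison between the two averaging operations can be mimicked on synthetic data. The short Python sketch below is not the DNS itself; the field is just independent Gaussian noise, an assumed stand-in for a stationary, homogeneous flow. It computes a temporal average at a single probe and a spatial average over a cross-stream plane and shows that they agree.

```python
# Hedged illustration: compare a temporal average at one probe point with a
# spatial average over a cross-stream plane for a synthetic, statistically
# stationary and homogeneous random field (a toy stand-in for DNS data).

import numpy as np

rng = np.random.default_rng(0)
n_time, ny, nz = 20000, 64, 64

# Synthetic fluctuating velocity: the same distribution at every (y, z) point
# and every time step, so temporal and spatial statistics should agree.
u = rng.normal(loc=0.0, scale=1.0, size=(n_time, ny, nz))

temporal_mean = u[:, ny // 2, nz // 2].mean()   # average over time at one probe
spatial_mean = u[-1].mean()                     # average over the plane at one instant
temporal_var = u[:, ny // 2, nz // 2].var()
spatial_var = u[-1].var()

print(f"temporal mean {temporal_mean:+.4f}  spatial mean {spatial_mean:+.4f}")
print(f"temporal var  {temporal_var:.4f}   spatial var  {spatial_var:.4f}")
```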


2012 ◽  
Vol 2012 ◽  
pp. 1-12 ◽  
Author(s):  
Shaily Mittal ◽  
Nitin

Nowadays, manufacturers of embedded systems focus mainly on Multiprocessor System-on-Chip (MPSoC) architectures to provide increased concurrency rather than increased clock speed. However, managing concurrency is a tough task; one major issue is synchronizing concurrent accesses to shared memory. An important characteristic of any system design process is memory configuration and data-flow management. Although it is very important to select a correct memory configuration, it may be equally imperative to choreograph the data flow between the various levels of memory in an optimal manner. Memory Map is a multiprocessor simulator for choreographing data flow in the individual caches of multiple processors and in shared memory systems. The simulator allows the user to specify cache reconfigurations and the number of processors within the application program, and it evaluates the cache miss and hit rates for each configuration phase, taking reconfiguration costs into account. The code is open source and written in Java.
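
A hedged sketch of the kind of bookkeeping such a simulator performs is given below. It is not the Memory Map code (which is written in Java); the Python toy models a direct-mapped cache whose size can be reconfigured between phases, counts hits and misses per phase, and charges an assumed fixed cost per reconfiguration.

```python
# Hedged toy model of per-phase cache accounting with reconfiguration costs.
# Not the Memory Map simulator itself; sizes and costs are assumptions.

class PhaseCacheSim:
    def __init__(self, line_size=64, reconfig_cost_cycles=1000):
        self.line_size = line_size
        self.reconfig_cost = reconfig_cost_cycles
        self.lines = {}      # set index -> tag (direct-mapped)
        self.num_sets = 0
        self.stats = []      # one record per configuration phase

    def reconfigure(self, cache_size_bytes):
        """Start a new phase with a new cache size; contents are discarded."""
        self.num_sets = cache_size_bytes // self.line_size
        self.lines = {}
        cost = self.reconfig_cost if self.stats else 0
        self.stats.append({"hits": 0, "misses": 0, "reconfig_cycles": cost})

    def access(self, addr):
        block = addr // self.line_size
        idx, tag = block % self.num_sets, block // self.num_sets
        phase = self.stats[-1]
        if self.lines.get(idx) == tag:
            phase["hits"] += 1
        else:
            phase["misses"] += 1
            self.lines[idx] = tag

# Example: two phases with different cache sizes over the same toy trace.
sim = PhaseCacheSim()
trace = [i * 64 for i in range(512)] * 2
sim.reconfigure(16 * 1024)
for a in trace:
    sim.access(a)
sim.reconfigure(64 * 1024)
for a in trace:
    sim.access(a)
for i, s in enumerate(sim.stats):
    total = s["hits"] + s["misses"]
    print(f"phase {i}: hit rate {s['hits'] / total:.2%}, "
          f"reconfig cost {s['reconfig_cycles']} cycles")
```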


2014 ◽  
Vol 571-572 ◽  
pp. 381-388
Author(s):  
Xian Tuo Tang ◽  
Guang Fu Zeng ◽  
Feng Wang ◽  
Zuo Cheng Xing ◽  
Chao Chao Feng

By exploiting the communication temporal and spatial locality exhibited by real applications, this paper proposes a locality-route pre-configuration mechanism (LRPC) on top of the Pseudo-Circuit scheme to further accelerate network performance. Under the original Pseudo-Circuit scheme, LRPC attempts to pre-configure another sharable crossbar connection at each input port within a single router whenever the current pseudo-circuit is invalid, so as to provide more sharable routes for packet transfer and hence enhance both the reusability of the sharable routes and communication performance. Our evaluation using a cycle-accurate network simulator with traces from the Splash-2 benchmark shows 5.4% and 31.6% improvements in overall network performance compared with the Pseudo-Circuit and BASE_LR_SPC routers, respectively. Evaluated with synthetic workloads, the LRPC router achieves up to 10.91% and 33.72% performance improvement under Uniform-random, Bit-complement and Transpose traffic compared with the Pseudo-Circuit and BASE_LR_SPC routers.
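
The following Python toy is only a loose analogy of route reuse at a single input port, with all names and behaviour assumed rather than taken from the paper: a "pseudo-circuit" remembers the last crossbar connection used, and an LRPC-like extension keeps one additional pre-configured connection to try when the primary is invalid.

```python
# Hedged analogy only: route reuse at one router input port. The real LRPC
# mechanism is far more detailed; this just shows how a second pre-configured
# connection can raise the fraction of transfers that skip switch allocation.

class InputPort:
    def __init__(self):
        self.primary = None     # last-used output port (pseudo-circuit)
        self.secondary = None   # speculatively pre-configured alternative

    def route(self, requested_output):
        """Return True if a pre-configured connection could be reused."""
        if requested_output == self.primary:
            return True                          # pseudo-circuit hit
        if requested_output == self.secondary:
            self.primary, self.secondary = self.secondary, self.primary
            return True                          # pre-configured route hit
        # Miss: pay switch allocation, then remember this route for reuse.
        self.secondary = self.primary
        self.primary = requested_output
        return False

port = InputPort()
requests = [1, 1, 3, 1, 3, 3, 2, 1]
reused = sum(port.route(r) for r in requests)
print(f"{reused}/{len(requests)} transfers reused a pre-configured route")
```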


1974 ◽  
Vol 3 (32) ◽  
Author(s):  
L. Phillip Caillouet ◽  
Bruce D. Shriver

This paper offers an introduction to a research effort in fault-tolerant computer architecture organized at the University of Southwestern Louisiana (USL). It is intended as an overview of several topics that have been isolated for study, and as an indication of preliminary undertakings with regard to one particular topic. This first area of concentration involves the systematic design of fault-tolerant computing systems via a multi-level approach. Efforts are also being initiated in the areas of diagnosis of microprogrammable processors via firmware, fault-data management across levels of virtual machines, development of a methodology for realizing a firmware hardcore on a variety of hosts, and delineation of a minimal set of resources for the design of a practical host for a multi-level fault-tolerant computing system. The research is being conducted under the auspices of Project Beta at USL.


2021 ◽  
Author(s):  
Timothy F. Brady ◽  
Maria Martinovna Robinson ◽  
Jamal Rodgers Williams ◽  
John Wixted

There is a crisis of measurement in memory research, with major implications for theory and practice. This crisis arises because of a critical complication present when measuring memory using the recognition memory task that dominates the study of working memory and long-term memory (“did you see this item? yes/no” or “did this item change? yes/no”). Such tasks give two measures of performance: the “hit rate” (how often you say you previously saw an item you actually did previously see) and the “false alarm rate” (how often you say you saw something you never saw). Yet what researchers want is a single, integrated measure of memory performance. Integrating the hit and false alarm rates into a single measure, however, requires a complex problem of counterfactual reasoning that depends on the (unknowable) distribution of underlying memory signals: when faced with two people differing in both hit rate and false alarm rate, the question of who has the better memory is really “who would have had more hits if they each had the same number of false alarms?”. As a result of this difficulty, different literatures in memory research (e.g., visual working memory, eyewitness identification, picture memory, etc.) have settled on a variety of distinct metrics to combine hit rates and false alarm rates (e.g., A’, corrected hit rate, percent correct, d’, diagnosticity ratios, K values, etc.). These metrics make different, contradictory assumptions about the distribution of latent memory signals, and all of their assumptions are frequently incorrect. Despite a large literature, spanning decades, on how to properly measure memory performance, real-life decisions are often made using these metrics, even when they subsequently turn out to be wrong once memory is studied with better measures. We suggest that for the psychology and neuroscience of memory to become a cumulative, theory-driven science, more attention must be given to measurement issues. We make a concrete suggestion: the default memory task should change from old/new (“did you see this item?”) to forced-choice (“which of these two items did you see?”). In situations where old/new variants are preferred (e.g., eyewitness identification; theoretical investigations of the nature of memory decisions), receiver operating characteristic (ROC) analysis should always be performed.
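
To see why the choice of metric matters, the short Python sketch below combines the same two hypothetical (hit rate, false-alarm rate) pairs with three of the metrics named above. The observers and numbers are invented for illustration, and the resulting rankings disagree.

```python
# Hedged illustration of the measurement problem: the same two (hit rate,
# false-alarm rate) pairs are combined with three common metrics, and the
# metrics disagree about which observer has the better memory.

from statistics import NormalDist

def d_prime(hit, fa):
    """Equal-variance Gaussian signal-detection sensitivity: z(H) - z(F)."""
    z = NormalDist().inv_cdf
    return z(hit) - z(fa)

def corrected_hit_rate(hit, fa):
    return hit - fa

def percent_correct(hit, fa):
    """Assumes equal numbers of old and new trials."""
    return (hit + (1.0 - fa)) / 2.0

observers = {"A": (0.95, 0.35), "B": (0.70, 0.08)}
for name, (h, f) in observers.items():
    print(f"{name}: H-F = {corrected_hit_rate(h, f):.2f}, "
          f"PC = {percent_correct(h, f):.2f}, d' = {d_prime(h, f):.2f}")
# Corrected hit rate and percent correct favour observer B, while d' favours
# observer A: the ranking depends on the latent-distribution assumption that
# the chosen metric encodes.
```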


The power consumption of commercial processors and application-specific integrated circuits increases with decreasing technology nodes, so power-saving techniques have become a first-class design point for current and future VLSI systems. These systems employ large on-chip SRAM memories, and reducing memory leakage power while maintaining data integrity is a key criterion for modern systems. Unfortunately, state-of-the-art techniques such as power gating can only be applied to logic, as they would destroy the contents of the memory if applied to an SRAM system. Fortunately, previous work has noted large temporal and spatial locality in the data patterns of commercial processors as well as of application-specific ICs that work on image, audio and video data. This paper presents a novel column-based energy compression technique that saves SRAM power by selectively turning off cells based on a data pattern. The technique is applied to study the power savings in application-specific integrated circuit SRAM memories and can also be applied to commercial processors. The paper also evaluates the effects of processing images before storage and of data cluster patterns for optimizing power savings.
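
A hedged toy model of the general idea (not the paper's actual column-based scheme) is sketched below in Python: it assumes a column's cells can be gated off whenever every cell in that column stores the same common value (zero here) and that value can be regenerated on read, then counts how many columns of a synthetic image-like data block qualify.

```python
# Hedged toy model: estimate how many SRAM bit columns could be gated off if
# every cell in a column stores the same common value (all zeros) and that
# value is regenerated on read. Data with long zero runs makes many columns
# qualify; the policy and numbers here are assumptions, not the paper's scheme.

import numpy as np

rng = np.random.default_rng(1)

# Synthetic "image-like" contents: mostly small values, so many high-order
# bit columns end up all zero.
words = rng.integers(0, 16, size=1024, dtype=np.uint16)          # 16-bit words, values < 16
bits = ((words[:, None] >> np.arange(16)) & 1).astype(np.uint8)  # rows x 16 bit columns

gatable = np.all(bits == 0, axis=0)   # columns whose cells all hold 0
savings = gatable.mean()
print(f"{gatable.sum()} of 16 bit columns could be gated "
      f"(~{savings:.0%} of cell leakage in this toy model)")
```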

