Speculative Execution: Recently Published Documents

TOTAL DOCUMENTS: 198 (five years: 55)
H-INDEX: 19 (five years: 5)

2022 · Vol 355 · pp. 03054
Author(s): Dehua Wu, Wan’ang Xiao, Shan Gao, Wanlin Gao

Spectre attacks exploit speculative-execution vulnerabilities to exfiltrate private information by building a leakage channel. Creating a leakage channel is the basic element of a Spectre attack, and among such channels the cache-tag side channel is considered the most serious. To block these leakage channels, this paper presents a novel cache, named DmCache, that applies Dynamic Mapping technology. DmCache uses a dynamic mapping mechanism to temporarily store all cache lines polluted by speculative execution and keep them invisible to accesses. It then monitors the head of the reorder buffer to determine when a polluted cache line may become visible. Based on an analysis of the circuit behaviour of a processor equipped with DmCache and subjected to Spectre attacks, we demonstrate that the attacks have no impact on the system.
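A rough sketch of the mechanism the abstract describes (not the paper's actual hardware design; all class and method names are illustrative): speculatively filled lines are parked in a hidden buffer, invisible to lookups, until the fetching instruction reaches the head of the reorder buffer.

```python
# Toy model: lines filled under speculation stay in a hidden buffer and
# only become architecturally visible once the instruction that fetched
# them retires from the head of the reorder buffer (ROB).

class DmCacheSketch:
    def __init__(self):
        self.visible = {}   # architecturally visible cache lines
        self.hidden = {}    # speculatively filled lines, invisible to probes

    def spec_fill(self, addr, data, rob_id):
        # A speculative load brings a line in, tagged with its ROB entry.
        self.hidden[addr] = (data, rob_id)

    def lookup(self, addr):
        # An attacker's probe only ever observes the visible mapping.
        return self.visible.get(addr)

    def retire(self, rob_id):
        # The tagged instruction reaches the ROB head and commits:
        # its lines may now become architecturally visible.
        for addr in [a for a, (_, r) in self.hidden.items() if r == rob_id]:
            self.visible[addr] = self.hidden.pop(addr)[0]

    def squash(self, rob_id):
        # Misspeculation: polluted lines are discarded, never exposed.
        for addr in [a for a, (_, r) in self.hidden.items() if r == rob_id]:
            del self.hidden[addr]

cache = DmCacheSketch()
cache.spec_fill(0x40, "secret-dependent line", rob_id=7)
print(cache.lookup(0x40))   # None: the polluted line is invisible
cache.squash(7)
print(cache.lookup(0x40))   # None: squashed lines never surface
```

Keeping speculative fills out of the probe-visible state is what removes the cache-tag channel: a mispredicted path leaves no observable footprint.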


Author(s): I. Yu. Sesin, R. G. Bolbakov

General-Purpose computing on Graphics Processing Units (GPGPU) is a powerful technology for offloading parallel data-processing tasks to GPUs, and it finds use in a variety of domains, from science and commerce to hobbyist projects. GPU-run general-purpose programs inevitably run into performance issues stemming from branch predication. Predication is a GPU feature that executes both sides of a conditional branch and masks out the results of the incorrect one. This leads to considerable performance losses for GPU programs that hide large amounts of code behind conditional operators. This paper analyses existing approaches to improving software performance in the context of relieving this loss. We describe each approach, its advantages and disadvantages, the extent of its applicability, and whether it addresses the outlined problem. The covered approaches are optimizing compilers, JIT compilation, branch prediction, speculative execution, adaptive optimization, run-time algorithm specialization, and profile-guided optimization. We show that these methods mostly target CPU-specific issues and are generally not applicable to predication-related performance loss. Lastly, we outline the need for a separate performance-improvement approach that addresses the specifics of branch predication and the GPGPU workflow.
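The predication cost described above can be illustrated with a small model (hypothetical function names; a real GPU applies per-lane masks in hardware rather than calling both arms explicitly):

```python
# Illustrative model of branch predication. A predicating GPU evaluates
# BOTH arms of a conditional for every lane and uses a mask to discard
# the wrong result, so each lane pays the cost of both arms rather than
# only the taken one.

calls = {"then": 0, "else": 0}  # count how often each arm actually runs

def then_arm(x):
    calls["then"] += 1
    return x * x

def else_arm(x):
    calls["else"] += 1
    return x + 1

def branchy(cond, x):
    # What the source code expresses: only the taken arm executes.
    return then_arm(x) if cond else else_arm(x)

def predicated(cond, x):
    # What predicated execution effectively does: both arms execute,
    # and the condition mask selects the live result.
    a = then_arm(x)   # runs even when cond is False
    b = else_arm(x)   # runs even when cond is True
    return a if cond else b

assert branchy(True, 3) == predicated(True, 3) == 9

calls["then"] = calls["else"] = 0
predicated(False, 3)
print(calls)   # {'then': 1, 'else': 1}: the masked-off arm still ran
```

The results agree in both schemes; the difference is purely in cost, which is why code-heavy conditionals hurt GPU programs disproportionately.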


2021
Author(s): Alban Gruin, Thomas Carle, Hugues Casse, Christine Rochange

2021 · Vol 64 (12) · pp. 105-112
Author(s): Jiyong Yu, Mengjia Yan, Artem Khyzha, Adam Morrison, Josep Torrellas, ...

Speculative execution attacks present an enormous security threat, capable of reading arbitrary program data under malicious speculation and later exfiltrating that data over microarchitectural covert channels. This paper proposes speculative taint tracking (STT), a high-security, high-performance hardware mechanism to block these attacks. The main idea is that it is safe to execute, and selectively forward the results of, speculative instructions that read secrets, as long as we can prove that the forwarded results do not reach potential covert channels. The technical core of the paper is a new abstraction that helps identify all microarchitectural covert channels, and an architecture that quickly identifies when a covert channel is no longer a threat. We further conduct a detailed formal analysis of the scheme in a companion document. When evaluated on SPEC06 workloads, STT incurs 8.5% or 14.5% performance overhead relative to an insecure machine.
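A minimal sketch of the taint-tracking idea under a toy model (the real STT tracks taint through the full hardware dataflow; the class and method names here are illustrative only):

```python
# Sketch: the result of a speculative load is tainted, and a
# "transmitter" (an instruction whose operand could modulate a covert
# channel, e.g., a dependent load address) may not execute on a tainted
# operand until the producing load becomes non-speculative.

class STTSketch:
    def __init__(self):
        self.tainted = set()   # value ids currently tainted

    def spec_load(self, dest):
        # A load executing speculatively taints its output value.
        self.tainted.add(dest)
        return dest

    def reach_visibility_point(self, dest):
        # The load is no longer speculative: its output is untainted.
        self.tainted.discard(dest)

    def can_execute_transmitter(self, operand):
        # Transmitters stall while their operand is tainted.
        return operand not in self.tainted

stt = STTSketch()
r1 = stt.spec_load("r1")
print(stt.can_execute_transmitter(r1))   # False: forwarding could leak
stt.reach_visibility_point(r1)
print(stt.can_execute_transmitter(r1))   # True: safe to forward now
```

Note that non-transmitting instructions are never stalled in this scheme, which is where the performance advantage over blanket speculation barriers comes from.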


2021 · Vol 5 (OOPSLA) · pp. 1-28
Author(s): Robert Brotzman, Danfeng Zhang, Mahmut Taylan Kandemir, Gang Tan

The high-profile Spectre attack and its variants have revealed that speculative execution may leave secret-dependent footprints in the cache, allowing an attacker to learn confidential data. However, existing static side-channel detectors either ignore speculative execution, leading to false negatives, or lack a precise cache model, leading to false positives. In this paper, somewhat surprisingly, we show that it is challenging to develop a speculation-aware static analysis with precise cache models: a combination of existing works does not necessarily catch all cache side channels. Motivated by this observation, we present a new semantic definition of security against cache-based side-channel attacks, called Speculative-Aware noninterference (SANI), which is applicable to a variety of attacks and cache models. We also develop SpecSafe to detect the violations of SANI. Unlike other speculation-aware symbolic executors, SpecSafe employs a novel program transformation so that SANI can be soundly checked by speculation-unaware side-channel detectors. SpecSafe is shown to be both scalable and accurate on a set of moderately sized benchmarks, including commonly used cryptography libraries.
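The kind of secret-dependent speculative footprint such an analysis must catch can be modeled in a few lines. The following is an illustrative two-run comparison in the spirit of a noninterference check, not SpecSafe's actual transformation; the gadget and the one-line cache model are toy constructions:

```python
# Execute the same mis-speculated gadget with two different secrets and
# compare the cache lines it touches. Differing traces mean a
# secret-dependent speculative footprint, i.e., a potential cache side
# channel that a speculation-unaware analysis would miss.

LINE = 64  # bytes per cache line in this toy model

def gadget_trace(secret, idx, bound):
    """Bounds-check-bypass pattern; the guard is modeled as mis-predicted."""
    trace = []
    mispredicted = True               # predictor asserts "in bounds"
    if idx < bound or mispredicted:   # body runs transiently anyway
        leaked = secret               # stands in for the out-of-bounds load
        trace.append(leaked * LINE)   # probe address depends on the secret
    return trace

t_a = gadget_trace(secret=7, idx=100, bound=4)
t_b = gadget_trace(secret=9, idx=100, bound=4)
print(t_a != t_b)   # True: the speculative footprint reveals the secret
```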


2021
Author(s): Rutvik Choudhary, Jiyong Yu, Christopher Fletcher, Adam Morrison

2021
Author(s): Guangyuan Hu, Zecheng He, Ruby B. Lee

2021 · Vol 23 (08) · pp. 931-935
Author(s): Ajay Kumar Bansal, Manmohan Sharma, Ashu Gupta, ...

Modern computing systems are generally enormous in scale, consisting of hundreds to thousands of heterogeneous machine nodes, to meet rising demand for Cloud services. MapReduce and other parallel computing frameworks are frequently used on such cluster architectures to offer consumers dependable and timely services. However, complex features of Cloud workloads, such as multi-dimensional resource requirements, and dynamically changing system settings, such as varying node performance, pose new difficulties for providers in terms of both customer experience and system efficiency. The straggler problem occurs when a small subset of parallelized tasks takes excessively long to execute compared to their siblings, resulting in delayed job responses and the possibility of late-timing failure. Speculative execution is the state-of-the-art method for straggler mitigation. It has been used in numerous real-world systems with a variety of implementation refinements, but the research in this thesis demonstrates that it is typically inefficient: according to production trace logs from several data centers, the failure rate of speculative execution can be as high as 71 percent. Straggler mitigation is a difficult task in and of itself: 1) stragglers may vary in severity within a parallel job execution; 2) whether a task should be considered a straggler is highly subjective, depending on various application and system conditions; 3) the efficiency of speculative execution would improve if dynamic node quality could be adequately modeled and predicted; and 4) other sorts of stragglers, such as those caused by data skew, are beyond speculative execution's capabilities.
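A naive sketch of the speculative-execution strategy the abstract describes, under simplified assumptions: a single detection point at the median runtime, a fixed backup runtime, and an illustrative 1.5x threshold. Real schedulers (e.g., Hadoop's speculation mechanism) use richer progress signals:

```python
# Any task whose runtime exceeds `threshold` times the median is
# duplicated at the detection point, and the job takes whichever copy
# of the task finishes first.

def mitigate(runtimes, backup_runtime, threshold=1.5):
    """Return per-task completion times under naive speculation."""
    median = sorted(runtimes)[len(runtimes) // 2]
    completed = []
    for t in runtimes:
        if t > threshold * median:
            # Straggler: a backup launches once the median runtime has
            # elapsed; completion is the earlier of the two copies.
            completed.append(min(t, median + backup_runtime))
        else:
            completed.append(t)
    return completed

tasks = [10, 11, 9, 10, 60]   # one 60s straggler among ~10s siblings
print(max(mitigate(tasks, backup_runtime=10)))   # 20: job no longer waits 60s
```

The sketch also shows why speculation can be wasteful: if the backup is no faster than the original (e.g., the slowdown comes from data skew rather than a slow node), the duplicate consumes resources without shortening the job.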


2021 · Vol 17 (3) · pp. 1-32
Author(s): Xingda Wei, Rong Chen, Haibo Chen, Binyu Zang

RDMA (Remote Direct Memory Access) has gained considerable interest for network-attached in-memory key-value stores. However, traversing the remote tree-based index of an ordered key-value store with RDMA becomes a critical obstacle, causing an order-of-magnitude slowdown and limited scalability due to multiple round trips. Using an index cache in the conventional way (caching partial data and traversing it locally) usually has limited effect because of unavoidable capacity misses, massive random accesses, and costly cache invalidations. We argue that a machine learning (ML) model is a perfect cache structure for a tree-based index, termed a learned cache. Based on it, we design and implement XStore, an RDMA-based ordered key-value store with a new hybrid architecture that retains a tree-based index at the server to handle dynamic workloads (e.g., inserts) and leverages a learned cache at the client to handle static workloads (e.g., gets and scans). The key idea is to decouple ML-model retraining from index updating by maintaining a layer of indirection from logical to actual positions of key-value pairs, which allows a stale learned cache to continue predicting a correct position for a lookup key. XStore ensures correctness using a validation mechanism with a fallback path, and further uses speculative execution to minimize the cost of cache misses. Evaluations with YCSB benchmarks and production workloads show that a single XStore server can achieve over 80 million read-only requests per second, outperforming state-of-the-art RDMA-based ordered key-value stores (namely DrTM-Tree, Cell, and eRPC+Masstree) by up to 5.9x (from 3.7x). For workloads with inserts, XStore still provides up to 3.5x (from 2.7x) throughput speedup, achieving 53M reqs/s. The learned cache also reduces client-side memory usage and provides an efficient memory-performance tradeoff, e.g., saving 99% of memory at the cost of 20% of peak throughput.
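A toy sketch of the learned-cache idea, assuming a simple linear model over a sorted key array and an in-process "remote" array standing in for RDMA reads (XStore's actual design trains ML models and fetches over the network; all names here are illustrative):

```python
# A linear model predicts a key's position within a bounded error, a
# lookup fetches only that small window (one modeled RDMA read), and a
# full search models the fallback path taken when validation fails.

import bisect
import math

class LearnedCacheSketch:
    def __init__(self, keys):
        self.keys = sorted(keys)
        n = len(self.keys)
        # Fit: position ~= slope * key + intercept over the sorted keys.
        self.slope = (n - 1) / (self.keys[-1] - self.keys[0])
        self.intercept = -self.slope * self.keys[0]
        # Worst-case prediction error bounds the window to fetch
        # (+1 absorbs rounding of the predicted position).
        self.err = math.ceil(max(abs(i - self._predict(k))
                                 for i, k in enumerate(self.keys))) + 1

    def _predict(self, key):
        return self.slope * key + self.intercept

    def lookup(self, key):
        pos = round(self._predict(key))
        lo = max(0, pos - self.err)
        hi = min(len(self.keys), pos + self.err + 1)
        window = self.keys[lo:hi]          # one small bounded read
        i = bisect.bisect_left(window, key)
        if i < len(window) and window[i] == key:
            return lo + i                  # validated hit
        return self.keys.index(key)        # fallback path (stale model)

idx = LearnedCacheSketch(range(0, 1000, 7))
print(idx.lookup(700))   # 100: key 700 sits at position 100 of the array
```

The appeal of the hybrid design is visible even in this sketch: the model is a few numbers rather than a tree of cached nodes, and a slightly stale model still lands within the error window, so only the window bounds, not the whole cache, need refreshing.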

