cache efficient
Recently Published Documents


Total documents: 66 (five years: 17)
H-index: 16 (five years: 2)

Author(s): Shengliang Lu, Shixuan Sun, Johns Paul, Yuchen Li, Bingsheng He

2021
Author(s): Danila Piatov, Sven Helmer, Anton Dignös, Fabio Persia

Abstract We develop a family of efficient plane-sweeping interval join algorithms for evaluating a wide range of interval predicates, such as Allen's relationships and parameterized relationships. Our technique is based on a framework whose components can be flexibly combined to support the required interval relation. In temporal databases, our algorithms can exploit a well-known and flexible access method, the Timeline Index, thus further expanding the set of operations it supports. Additionally, by employing a compact data structure, the gapless hash map, we utilize the CPU cache efficiently. In an experimental evaluation, we show that our approach is several times faster and scales better than state-of-the-art techniques, while being much better suited for real-time event processing.
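To make the plane-sweeping idea in the abstract above concrete, here is a minimal sketch of a generic sweep for a single predicate (overlap). It is not the authors' framework: the function name `overlap_join`, the half-open interval convention, and the use of plain Python sets for the active lists (where the paper uses a gapless hash map) are all assumptions made for illustration.

```python
def overlap_join(r, s):
    """Return index pairs (i, j) where r[i] overlaps s[j].

    Intervals are (start, end) tuples, treated as half-open [start, end)
    with start < end (an assumption of this sketch).
    """
    events = []
    for i, (start, end) in enumerate(r):
        events.append((start, 1, 0, i))  # 1 = start event, relation 0 = r
        events.append((end, 0, 0, i))    # 0 = end event
    for j, (start, end) in enumerate(s):
        events.append((start, 1, 1, j))  # relation 1 = s
        events.append((end, 0, 1, j))

    # At equal timestamps, end events (0) sort before start events (1),
    # so intervals that merely touch are not reported as overlapping.
    events.sort()

    active = (set(), set())  # currently open intervals of r and of s
    result = []
    for _, is_start, side, idx in events:
        if is_start:
            active[side].add(idx)
            # Every open interval of the other relation overlaps this one.
            for other in active[1 - side]:
                result.append((idx, other) if side == 0 else (other, idx))
        else:
            active[side].discard(idx)
    return result


pairs = overlap_join([(1, 5), (6, 8)], [(4, 7), (8, 9)])
print(sorted(pairs))
```

Each output pair is produced exactly once, when the later-starting interval of the pair is opened. Other predicates would need different handling of the start and end events, which this sketch does not attempt.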


Author(s): Ying Wan, Haoyu Song, Yang Xu, Yilun Wang, Tian Pan, ...

2020, Vol 21 (S1)
Author(s): Chunchun Zhao, Sartaj Sahni

Abstract

Background: The Damerau-Levenshtein (DL) distance metric is widely used in the biological sciences. It identifies similar regions of DNA, RNA, and protein sequences by transforming one sequence into another using substitution, insertion, deletion, and transposition operations. Lowrance and Wagner developed an O(mn)-time, O(mn)-space algorithm to find the minimum-cost edit sequence between strings of length m and n, respectively. In our previous research, we developed algorithms that run in O(mn) time using only O(s * min{m, n} + m + n) space, where s is the size of the alphabet comprising the strings, to compute the DL distance as well as the corresponding edit sequence. These are so far the fastest and most space-efficient algorithms. In this paper, we focus on the development of algorithms whose asymptotic space complexity is linear.

Results: We develop linear-space algorithms to compute the Damerau-Levenshtein (DL) distance between two strings and to determine the optimal trace (the corresponding edit operations). Extensive experiments conducted on three computational platforms (Xeon E5 2603, I7-x980, and Xeon E5 2695) show that our algorithms, in addition to using less space, are much faster than earlier algorithms.

Conclusion: Besides using less space than the previously known algorithms, our new algorithms showed significant run-time improvements on all three of our experimental platforms. On all platforms, our linear-space cache-efficient algorithms reduced run time by as much as 56.4% for computing the DL distance and 57.4% for computing an optimal edit sequence, compared to previous algorithms. Our multi-core algorithms reduced the run time by up to 59.3% compared to the best previously known multi-core algorithms.
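For orientation, the following is a minimal sketch of the classic O(mn)-time, O(mn)-space dynamic program that the abstract attributes to Lowrance and Wagner, restricted to unit costs. It is the quadratic-space baseline, not the linear-space algorithm the paper contributes. The function name `dl_distance` and the variable names are assumptions made for illustration.

```python
def dl_distance(a, b):
    """Damerau-Levenshtein distance with unit costs (quadratic-space baseline)."""
    m, n = len(a), len(b)
    inf = m + n  # sentinel: never smaller than any real distance

    # d[i + 1][j + 1] holds the distance between a[:i] and b[:j];
    # row 0 and column 0 are sentinel borders filled with inf.
    d = [[0] * (n + 2) for _ in range(m + 2)]
    d[0][0] = inf
    for i in range(m + 1):
        d[i + 1][0] = inf
        d[i + 1][1] = i
    for j in range(n + 1):
        d[0][j + 1] = inf
        d[1][j + 1] = j

    last_row = {}  # last row (1-based) in which each character of a occurred
    for i in range(1, m + 1):
        last_match_col = 0  # last column in this row where a[i-1] matched b
        for j in range(1, n + 1):
            k = last_row.get(b[j - 1], 0)
            l = last_match_col
            if a[i - 1] == b[j - 1]:
                cost = 0
                last_match_col = j
            else:
                cost = 1
            d[i + 1][j + 1] = min(
                d[i][j] + cost,                          # substitution or match
                d[i + 1][j] + 1,                         # insertion
                d[i][j + 1] + 1,                         # deletion
                d[k][l] + (i - k - 1) + 1 + (j - l - 1)  # transposition
            )
        last_row[a[i - 1]] = i
    return d[m + 1][n + 1]


print(dl_distance("ca", "abc"))
```

The `last_row` dictionary holds one entry per distinct character, which is where the alphabet size s enters the space bounds quoted in the abstract.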


2020
Author(s): Grant P. Strimel, Ariya Rastrow, Gautam Tiwari, Adrien Piérard, Jon Webb

Author(s): Sanaz Gheibi, Tania Banerjee, Sanjay Ranka, Sartaj Sahni

2020, Vol 159, pp. 186-197
Author(s): Anuj Jain, Sartaj Sahni

2019, Vol 5 (1)
Author(s): Benjamin Villalonga, Sergio Boixo, Bron Nelson, Christopher Henze, Eleanor Rieffel, ...

Abstract Here we present qFlex, a flexible tensor network-based quantum circuit simulator. qFlex can compute both exact amplitudes, essential for the verification of quantum hardware, and low-fidelity amplitudes, to mimic sampling from Noisy Intermediate-Scale Quantum (NISQ) devices. In this work, we focus on random quantum circuits (RQCs) in the range of sizes expected for supremacy experiments. Fidelity f simulations are performed at a cost that is 1/f lower than perfect fidelity ones. We also present a technique to eliminate the overhead introduced by rejection sampling in most tensor network approaches. We benchmark the simulation of square lattices and Google's Bristlecone QPU. Our analysis is supported by extensive simulations on NASA HPC clusters Pleiades and Electra. For our most computationally demanding simulation, the two clusters combined reached a peak of 20 Peta Floating Point Operations per Second (PFLOPS) (single precision), i.e., 64% of their maximum achievable performance, which represents the largest numerical computation, in terms of sustained FLOPs and number of nodes utilized, ever run on NASA HPC clusters. Finally, we introduce a novel multithreaded, cache-efficient tensor index permutation algorithm of general application.
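The abstract does not describe the permutation algorithm itself. As a loosely related illustration only, the sketch below shows loop tiling applied to the simplest index permutation, a 2-D transpose of a row-major array. It is single-threaded and is not the qFlex algorithm; the function name `blocked_transpose` and the default block size are assumptions. Pure Python will not show the cache benefit, so treat it as a description of the access pattern rather than a benchmark.

```python
def blocked_transpose(a, rows, cols, block=32):
    """Transpose a row-major rows x cols matrix stored in a flat list.

    The matrix is visited tile by tile, so that in a compiled language
    the reads and writes of one tile stay within a small memory region.
    """
    out = [None] * (rows * cols)
    for i0 in range(0, rows, block):
        for j0 in range(0, cols, block):
            # Process one tile of at most block x block elements.
            for i in range(i0, min(i0 + block, rows)):
                for j in range(j0, min(j0 + block, cols)):
                    out[j * rows + i] = a[i * cols + j]
    return out


# 2 x 3 matrix [[0, 1, 2], [3, 4, 5]] becomes 3 x 2 [[0, 3], [1, 4], [2, 5]]
print(blocked_transpose(list(range(6)), rows=2, cols=3, block=2))
```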

