cache efficient Latest Research Papers

Abstract We develop a family of efficient plane-sweeping interval join algorithms for evaluating a wide range of interval predicates such as Allen’s relationships and parameterized relationships. Our technique is based on a framework, components of which can be flexibly combined in different manners to support the required interval relation. In temporal databases, our algorithms can exploit a well-known and flexible access method, the Timeline Index, thus expanding the set of operations it supports even further. Additionally, employing a compact data structure, the gapless hash map, we utilize the CPU cache efficiently. In an experimental evaluation, we show that our approach is several times faster and scales better than state-of-the-art techniques, while being much better suited for real-time event processing.
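
The plane-sweeping idea in the abstract above can be illustrated with a minimal sketch for a single predicate, plain overlap. This is not the authors' framework: the function name `overlap_join`, the half-open interval convention, and the use of ordinary Python sets for the active intervals (where the paper uses a gapless hash map) are all assumptions made for illustration. It also assumes every interval satisfies start < end.

```python
def overlap_join(r, s):
    """Return index pairs (i, j) such that r[i] overlaps s[j].

    Intervals are (start, end) tuples, treated as half-open [start, end),
    with start < end assumed. Hypothetical helper, for illustration only.
    """
    events = []
    for i, (start, end) in enumerate(r):
        events.append((start, 1, 0, i))  # (time, is_start, side, index)
        events.append((end, 0, 0, i))
    for j, (start, end) in enumerate(s):
        events.append((start, 1, 1, j))
        events.append((end, 0, 1, j))

    # Tuple ordering puts end events (0) before start events (1) at the
    # same timestamp, so [1, 3) and [3, 5) are not reported as overlapping.
    events.sort()

    active = (set(), set())  # currently open intervals of r and of s
    result = []
    for _, is_start, side, idx in events:
        if is_start:
            # A newly opened interval overlaps every open interval
            # from the other relation.
            for other in active[1 - side]:
                result.append((idx, other) if side == 0 else (other, idx))
            active[side].add(idx)
        else:
            active[side].discard(idx)
    return result


if __name__ == "__main__":
    r = [(1, 5), (6, 8)]
    s = [(4, 7), (8, 9)]
    print(sorted(overlap_join(r, s)))
```

Other Allen relations would need different conditions on when pairs are emitted; the sketch only shows the shared sweep structure of sorting endpoints and maintaining the set of open intervals.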

T-Cache: Efficient Policy-Based Forwarding Using Small TCAM

IEEE/ACM Transactions on Networking ◽

10.1109/tnet.2021.3098320 ◽

2021 ◽

pp. 1-16

Author(s):

Ying Wan ◽

Haoyu Song ◽

Yang Xu ◽

Yilun Wang ◽

Tian Pan ◽

...

Keyword(s):

Cache Efficient

Linear space string correction algorithm using the Damerau-Levenshtein distance

BMC Bioinformatics ◽

10.1186/s12859-019-3184-8 ◽

2020 ◽

Vol 21 (S1) ◽

Author(s):

Chunchun Zhao ◽

Sartaj Sahni

Keyword(s):

Linear Space ◽

Minimum Cost ◽

Efficient Algorithms ◽

Levenshtein Distance ◽

Correction Algorithm ◽

Similar Region ◽

String Correction ◽

Cache Efficient ◽

Run Time ◽

New Algorithms

Abstract Background: The Damerau-Levenshtein (DL) distance metric has been widely used in the biological sciences. It identifies similar regions of DNA, RNA and protein sequences by transforming one sequence into another using substitution, insertion, deletion and transposition operations. Lowrance and Wagner developed an O(mn) time, O(mn) space algorithm to find the minimum cost edit sequence between strings of length m and n, respectively. In our previous research, we developed algorithms that run in O(mn) time using only O(s∗min{m,n}+m+n) space, where s is the size of the alphabet comprising the strings, to compute the DL distance as well as the corresponding edit sequence. These are so far the fastest and most space-efficient algorithms. In this paper, we focus on the development of algorithms whose asymptotic space complexity is linear. Results: We develop linear space algorithms to compute the Damerau-Levenshtein (DL) distance between two strings and determine the optimal trace (the corresponding edit operations). Extensive experiments conducted on three computational platforms (Xeon E5 2603, I7-x980 and Xeon E5 2695) show that our algorithms, in addition to using less space, are much faster than earlier algorithms. Conclusion: Besides using less space than the previously known algorithms, our new algorithms showed significant run-time improvement on all three of our experimental platforms. Across these platforms, our linear-space cache-efficient algorithms reduced run time by as much as 56.4% for computing the DL distance and 57.4% for computing an optimal edit sequence, compared to previous algorithms. Our multi-core algorithms reduced the run time by up to 59.3% compared to the best previously known multi-core algorithms.
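
To give a feel for how an edit distance with transpositions can be computed in linear space, here is a minimal sketch of a simpler relative of the DL distance: the optimal string alignment (restricted Damerau-Levenshtein) distance, which keeps only three rows of the dynamic programming table. This is a swapped-in technique, not the algorithm from the paper: it computes only the distance (no edit sequence), it can differ from the unrestricted DL distance on some inputs, and the function name `osa_distance` is hypothetical.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance using O(min(len(a), len(b))) space.

    Unit costs for substitution, insertion, deletion and adjacent
    transposition. Illustrative sketch only, not the paper's algorithm.
    """
    if len(a) < len(b):
        a, b = b, a  # rows have the length of the shorter string

    prev2 = None                     # row i - 2 (needed for transpositions)
    prev = list(range(len(b) + 1))   # row i - 1, initially row 0
    for i in range(1, len(a) + 1):
        cur = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(
                prev[j] + 1,         # deletion
                cur[j - 1] + 1,      # insertion
                prev[j - 1] + cost,  # match or substitution
            )
            # Adjacent transposition: a[i-2..i-1] equals b[j-2..j-1] swapped.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                cur[j] = min(cur[j], prev2[j - 2] + 1)
        prev2, prev = prev, cur
    return prev[len(b)]


if __name__ == "__main__":
    print(osa_distance("ab", "ba"))
    print(osa_distance("kitten", "sitting"))
```

The pair "ca" and "abc" shows the difference between the two metrics: this restricted variant yields 3, whereas the unrestricted DL distance is 2, because the unrestricted form allows further edits between the characters of a transposed pair.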

Rescore in a Flash: Compact, Cache Efficient Hashing Data Structures for n-Gram Language Models

10.21437/interspeech.2020-1939 ◽

2020 ◽

Author(s):

Grant P. Strimel ◽

Ariya Rastrow ◽

Gautam Tiwari ◽

Adrien Piérard ◽

Jon Webb

Keyword(s):

Data Structures ◽

Language Models ◽

Cache Efficient ◽

N Gram

Cache-Efficient Parallel-Partition Algorithms using Exclusive-Read-and-Write Memory

Proceedings of the 32nd ACM Symposium on Parallelism in Algorithms and Architectures ◽

10.1145/3350755.3400234 ◽

2020 ◽

Author(s):

William Kuszmaul ◽

Alek Westover

Keyword(s):

Cache Efficient

Cache Efficient Louvain with Local RCM

2020 IEEE Symposium on Computers and Communications (ISCC) ◽

10.1109/iscc50000.2020.9219604 ◽

2020 ◽

Author(s):

Sanaz Gheibi ◽

Tania Banerjee ◽

Sanjay Ranka ◽

Sartaj Sahni

Keyword(s):

Cache Efficient

Cache efficient Value Iteration using clustering and annealing

Computer Communications ◽

10.1016/j.comcom.2020.04.058 ◽

2020 ◽

Vol 159 ◽

pp. 186-197

Author(s):

Anuj Jain ◽

Sartaj Sahni

Keyword(s):

Value Iteration ◽

Cache Efficient

Automatic Generation of Parallel Cache-Efficient Code Implementing Zuker’s RNA Folding

Artificial Intelligence and Soft Computing - Lecture Notes in Computer Science ◽

10.1007/978-3-030-61401-0_60 ◽

2020 ◽

pp. 646-654

Author(s):

Marek Palkowski ◽

Wlodzimierz Bielecki ◽

Mateusz Gruzewski

Keyword(s):

Rna Folding ◽

Automatic Generation ◽

Cache Efficient ◽

Efficient Code

A flexible high-performance simulator for verifying and benchmarking quantum circuits implemented on real hardware

npj Quantum Information ◽

10.1038/s41534-019-0196-1 ◽

2019 ◽

Vol 5 (1) ◽

Cited By ~ 13

Author(s):

Benjamin Villalonga ◽

Sergio Boixo ◽

Bron Nelson ◽

Christopher Henze ◽

Eleanor Rieffel ◽

...

Keyword(s):

High Performance ◽

Quantum Circuit ◽

Quantum Circuits ◽

Rejection Sampling ◽

General Application ◽

Circuit Simulator ◽

Tensor Network ◽

Cache Efficient ◽

Real Hardware ◽

Network Approaches

Abstract Here we present qFlex, a flexible tensor network-based quantum circuit simulator. qFlex can compute both the exact amplitudes, essential for the verification of the quantum hardware, as well as low-fidelity amplitudes, to mimic sampling from Noisy Intermediate-Scale Quantum (NISQ) devices. In this work, we focus on random quantum circuits (RQCs) in the range of sizes expected for supremacy experiments. Fidelity f simulations are performed at a cost that is 1/f lower than perfect fidelity ones. We also present a technique to eliminate the overhead introduced by rejection sampling in most tensor network approaches. We benchmark the simulation of square lattices and Google’s Bristlecone QPU. Our analysis is supported by extensive simulations on NASA HPC clusters Pleiades and Electra. For our most computationally demanding simulation, the two clusters combined reached a peak of 20 Peta Floating Point Operations per Second (PFLOPS) (single precision), i.e., 64% of their maximum achievable performance, which represents the largest numerical computation in terms of sustained FLOPs and the number of nodes utilized ever run on NASA HPC clusters. Finally, we introduce a novel multithreaded, cache-efficient tensor index permutation algorithm of general application.

cache efficient
Recently Published Documents

Cache-Efficient Fork-Processing Patterns on Large Graphs

Cache-efficient sweeping-based interval joins for extended Allen relation predicates

T-Cache: Efficient Policy-Based Forwarding Using Small TCAM

Linear space string correction algorithm using the Damerau-Levenshtein distance

Rescore in a Flash: Compact, Cache Efficient Hashing Data Structures for n-Gram Language Models

Cache-Efficient Parallel-Partition Algorithms using Exclusive-Read-and-Write Memory

Cache Efficient Louvain with Local RCM

Cache efficient Value Iteration using clustering and annealing

Automatic Generation of Parallel Cache-Efficient Code Implementing Zuker’s RNA Folding

A flexible high-performance simulator for verifying and benchmarking quantum circuits implemented on real hardware
