Underclocked Software Prefetching: More Cores, Less Energy

Md Kamruzzaman; Steven Swanson; Dean M. Tullsen

doi:10.1109/mm.2012.54

Analysing software prefetching opportunities in hardware transactional memory

The Journal of Supercomputing ◽

10.1007/s11227-021-03897-z ◽

2021 ◽

Author(s):

Marina Shimchenko ◽

Rubén Titos-Gil ◽

Ricardo Fernández-Pascual ◽

Manuel E. Acacio ◽

Stefanos Kaxiras ◽

...

Keyword(s):

Transactional Memory ◽

Hardware Transactional Memory ◽

Software Prefetching

Fast lightweight accurate xenograft sorting

Algorithms for Molecular Biology ◽

10.1186/s13015-021-00181-w ◽

2021 ◽

Vol 16 (1) ◽

Author(s):

Jens Zentgraf ◽

Sven Rahmann

Keyword(s):

State Of The Art ◽

Hash Table ◽

Human Tumor ◽

Surrounding Tissue ◽

Cpu Time ◽

Alignment Free ◽

Time Usage ◽

The One ◽

Similar Accuracy ◽

Software Prefetching

Abstract Motivation With an increasing number of patient-derived xenograft (PDX) models being created and subsequently sequenced to study tumor heterogeneity and to guide therapy decisions, there is a similarly increasing need for methods to separate reads originating from the graft (human) tumor and reads originating from the host species’ (mouse) surrounding tissue. Two kinds of methods are in use: On the one hand, alignment-based tools require that reads are mapped and aligned (by an external mapper/aligner) to the host and graft genomes separately first; the tool itself then processes the resulting alignments and quality metrics (typically BAM files) to assign each read or read pair. On the other hand, alignment-free tools work directly on the raw read data (typically FASTQ files). Recent studies compare different approaches and tools, with varying results. Results We show that alignment-free methods for xenograft sorting are superior concerning CPU time usage and equivalent in accuracy. We improve upon the state of the art sorting by presenting a fast lightweight approach based on three-way bucketed quotiented Cuckoo hashing. Our hash table requires memory comparable to an FM index typically used for read alignment and less than other alignment-free approaches. It allows extremely fast lookups and uses less CPU time than other alignment-free methods and alignment-based methods at similar accuracy. Several engineering steps (e.g., shortcuts for unsuccessful lookups, software prefetching) improve the performance even further. Availability Our software xengsort is available under the MIT license at http://gitlab.com/genomeinformatics/xengsort. It is written in numba-compiled Python and comes with sample Snakemake workflows for hash table construction and dataset processing.

Adaptive software prefetching in scalable multiprocessors using cache information

Parallel Computing ◽

10.1016/s0167-8191(01)00085-0 ◽

2001 ◽

Vol 27 (9) ◽

pp. 1173-1195

Author(s):

Daeyeon Park ◽

Byeong Hag Seong ◽

Rafael H Saavedra

Keyword(s):

Adaptive Software ◽

Scalable Multiprocessors ◽

Software Prefetching

A Performance Study of Software Prefetching for Tracing Garbage Collectors

Lecture Notes in Computer Science - Advanced Parallel Processing Technologies ◽

10.1007/978-3-642-45293-2_12 ◽

2013 ◽

pp. 160-169

Author(s):

Hao Wu ◽

Zhenzhou Ji ◽

Suxia Zhu ◽

Zhigang Chen

Keyword(s):

Performance Study ◽

Software Prefetching ◽

A Performance

LOW POWER SYSTEM DESIGN BY COMBINING SOFTWARE PREFETCHING AND DYNAMIC VOLTAGE SCALING

Journal of Circuits System and Computers ◽

10.1142/s0218126607003964 ◽

2007 ◽

Vol 16 (05) ◽

pp. 745-767

Author(s):

SUMITKUMAR N. PAMNANI ◽

DEEPAK N. AGARWAL ◽

GANG QU ◽

DONALD YEUNG

Keyword(s):

Energy Consumption ◽

Low Power ◽

Energy Saving ◽

System Performance ◽

Performance Enhancement ◽

Voltage Scaling ◽

Dynamic Voltage Scaling ◽

Dynamic Voltage ◽

Software Prefetching ◽

Online Profiling

Performance-enhancement techniques improve CPU speed at the cost of other valuable system resources such as power and energy. Software prefetching is one such technique, tolerating memory latency for high performance. In this article, we quantitatively study this technique's impact on system performance and power/energy consumption. First, we demonstrate that software prefetching achieves an average of 36% performance improvement with 8% additional energy consumption and 69% higher power consumption on six memory-intensive benchmarks. Then we combine software prefetching with a (unrealistic) static voltage scaling technique to show that this performance gain can be converted to an average of 48% energy saving. This suggests that it is promising to build low power systems with techniques traditionally known for performance enhancement. We thus propose a practical online profiling based dynamic voltage scaling (DVS) algorithm. The algorithm monitors system's performance and adapts the voltage level accordingly to save energy while maintaining the observed system performance. Our proposed online profiling DVS algorithm achieves 38% energy saving without any significant performance loss.

Data flow analysis for software prefetching linked data structures in Java

Proceedings 2001 International Conference on Parallel Architectures and Compilation Techniques ◽

10.1109/pact.2001.953309 ◽

2002 ◽

Cited By ~ 28

Author(s):

B. Cahoon ◽

K.S. McKinley

Keyword(s):

Data Structures ◽

Linked Data ◽

Data Flow ◽

Flow Analysis ◽

Data Flow Analysis ◽

Software Prefetching

Software Prefetching for Unstructured Mesh Applications

2018 IEEE/ACM 8th Workshop on Irregular Applications: Architectures and Algorithms (IA3) ◽

10.1109/ia3.2018.00009 ◽

2018 ◽

Cited By ~ 4

Author(s):

Ioan Hadade ◽

Timothy M. Jones ◽

Feng Wang ◽

Luca di Mare

Keyword(s):

Unstructured Mesh ◽

Software Prefetching

Reducing the impact of software prefetching on register pressure

Proceedings of the 2000 ACM symposium on Applied computing - SAC '00 ◽

10.1145/338407.338561 ◽

2000 ◽

Author(s):

David W. Shrewsbury ◽

Cindy Norris

Keyword(s):

Register Pressure ◽

Software Prefetching ◽

The Impact

Improving the effectiveness of software prefetching with adaptive executions

Proceedings of the 1996 Conference on Parallel Architectures and Compilation Technique ◽

10.1109/pact.1996.552556 ◽

2002 ◽

Cited By ~ 12

Author(s):

R.H. Saavedra ◽

Daeyeon Park

Keyword(s):

Software Prefetching

Three-level performance optimization for heterogeneous systems based on software prefetching under power constraints

Future Generation Computer Systems ◽

10.1016/j.future.2018.03.009 ◽

2018 ◽

Vol 86 ◽

pp. 51-58 ◽

Cited By ~ 4

Author(s):

Zhuowei Wang ◽

Wuqing Zhao ◽

Hao Wang ◽

Lianglun Cheng

Keyword(s):

Performance Optimization ◽

Heterogeneous Systems ◽

Power Constraints ◽

Software Prefetching ◽

Level Performance