Exploiting Multi-level Parallelism for Homology Search using General Purpose Processors

As a video coding standard, H.264 achieves a high compression rate while maintaining good fidelity, but it requires more intensive computation than earlier standards to reach that coding performance. A Hierarchical Multi-level Parallelisms (HMLP) framework for the H.264 encoder is proposed that integrates four levels of parallelism (frame-level, slice-level, macroblock-level and data-level) into one implementation. Each level of parallelism is designed within a hierarchical parallel framework and mapped onto the cores and SIMD units of a multi-core architecture. Based on an analysis of coding performance at each level, we propose a method to combine different parallel levels to attain a good compromise between high speedup and low bit-rate. The experimental results show that, for CIF-format video, our method achieves a speedup of 33.57x-42.3x with a 1.04x-1.08x increase in bit-rate on an 8-core Intel Xeon processor with SIMD technology.


Facts and myths about media processing on general-purpose processors

International Conference on Information Technology: Research and Education, 2003. Proceedings. ITRE2003.

10.1109/itre.2003.1270569

2003

Author(s): D. Talla, L.K. John

Keyword(s): General Purpose, Media Processing, General Purpose Processors


Mapping Streaming Languages to General Purpose Processors through Vectorization

Languages and Compilers for Parallel Computing - Lecture Notes in Computer Science

10.1007/978-3-642-13374-9_7

2010

pp. 95-110

Author(s): Raymond Manley, David Gregg

Keyword(s): General Purpose, General Purpose Processors


Adjusting Thread Parallelism Dynamically to Accelerate Dynamic Programming with Irregular Workload Distribution on GPGPUs

International Journal of Grid and High Performance Computing

10.4018/ijghpc.2014010101

2014

Vol 6 (1)

pp. 1-20

Cited By ~ 8

Author(s): Chao-Chin Wu, Jenn-Yang Ke, Heshan Lin, Syun-Sheng Jhan

Keyword(s): Dynamic Programming, Inventory Management, Optimization Problems, Optimization Technique, General Purpose, Processing Unit, Computing Power, Workload Distribution, The Difference, Level Parallelism

Dynamic Programming (DP) is an important and popular method for solving a wide variety of discrete optimization problems such as scheduling, string-editing, packaging, and inventory management. DP breaks a problem into simpler subproblems and combines their solutions into a solution to the original problem. This paper focuses on one type of dynamic programming called Nonserial Polyadic Dynamic Programming (NPDP). To run NPDP applications efficiently on an emerging General-Purpose Graphic Processing Unit (GPGPU), the authors have to exploit more parallelism to fully utilize the computing power of its hundreds of processing units. However, the degree of parallelism varies significantly across the phases of an NPDP application. To address this problem, the authors propose a method that adjusts the thread-level parallelism to provide a sufficient and steadier degree of parallelism in each phase. If a phase has insufficient parallelism, the authors split threads into subthreads. Conversely, the authors can limit the total number of threads in a phase by merging threads. The authors also examine the difference between the conventional problem of finding the minimum on a GPU and the NPDP-specific problem of finding the minimums of many independent sets on a GPU. Finally, the authors examine how to design an appropriate data structure so that the memory coalescing optimization technique can be applied. The experimental results demonstrate that the method achieves a best speedup of 13.40 over the previously published algorithm.
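To make the varying degree of parallelism concrete, here is a minimal sketch that uses matrix-chain multiplication, a textbook NPDP problem. It is an illustrative assumption, not necessarily the benchmark used in the paper, and the function name `matrix_chain_phases` is hypothetical. Each diagonal of the table is one phase: the cells on a diagonal are independent of one another, but their number shrinks while the number of candidates each cell must minimize over grows.

```python
def matrix_chain_phases(dims):
    """Solve matrix-chain multiplication and record the shape of each phase.

    dims[i] x dims[i + 1] is the shape of matrix i (illustrative input format).
    Returns (minimum scalar multiplications, [(cells, candidates_per_cell), ...]).
    """
    n = len(dims) - 1  # number of matrices in the chain
    cost = [[0] * n for _ in range(n)]
    phases = []

    # Each diagonal (fixed span) is one phase of the computation.
    for span in range(1, n):
        cells = n - span
        # Cells on the same diagonal depend only on earlier diagonals,
        # so they could be computed in parallel (sequential here).
        for i in range(cells):
            j = i + span
            # Each cell is the minimum over an independent set of split points.
            cost[i][j] = min(
                cost[i][k] + cost[k + 1][j] + dims[i] * dims[k + 1] * dims[j + 1]
                for k in range(i, j)
            )
        phases.append((cells, span))

    return cost[0][n - 1], phases


best, phases = matrix_chain_phases([10, 30, 5, 60])
print(best)    # 4500
print(phases)  # [(2, 1), (1, 2)]
```

Early phases have many cells with little work each, and late phases have few cells with many candidates each. In a one-thread-per-cell mapping, the late phases would leave most processing units idle, which is the situation that splitting threads into subthreads is meant to address; merging threads covers the opposite case.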


FuMicro: A Fused Microarchitecture Design Integrating In-Order Superscalar and VLIW

VLSI Design

10.1155/2016/8787919

2016

Vol 2016

pp. 1-12

Cited By ~ 3

Author(s): Yumin Hou, Hu He, Xu Yang, Deyuan Guo, Xu Wang, ...

Keyword(s): Digital Signal, General Purpose, Instruction Level Parallelism, Instruction Set, Mode Switch, Development Environment, General Purpose Processor, Improve Instruction, Library Function, Level Parallelism

This paper proposes FuMicro, a fused microarchitecture integrating both in-order superscalar and Very Long Instruction Word (VLIW) execution in a single core. A processor with the FuMicro microarchitecture can alternate between in-order superscalar and VLIW modes, using the same pipeline and the same Instruction Set Architecture (ISA). A small modification is made to the compiler to expand the register file in VLIW mode. The mode-switch decision is made by software and needs no extra hardware. VLIW code can be provided in the form of library functions, so that users are exposed only to superscalar mode; in this way, users are given a convenient development environment. FuMicro could serve as a universal microarchitecture because it can be applied to different ISAs. In this paper, we focus on the implementation of FuMicro with the ARM ISA. The architecture is evaluated on gem5, a cycle-accurate microarchitecture simulation platform. Adopting the FuMicro microarchitecture improves performance by 10% on average, and by 47.3% in the best case, compared with pure in-order superscalar mode. The results show that the FuMicro microarchitecture can improve Instruction Level Parallelism (ILP) significantly, making it a promising way to expand digital signal processing capability on a General Purpose Processor.
