Exploiting Multi-level Parallelism for Homology Search using General Purpose Processors

Author(s):  
Xiandong Meng ◽  
V. Chaudhary
2008 ◽  
Vol 10 (1) ◽  
pp. 43-51 ◽  
Author(s):  
Asadollah Shahbahrami ◽  
Ben Juurlink ◽  
Stamatis Vassiliadis

2010 ◽  
Vol 7 (1) ◽  
pp. 189-200 ◽  
Author(s):  
Haitao Wei ◽  
Yu Junqing ◽  
Li Jiang

As a video coding standard, H.264 achieves high compress rate while keeping good fidelity. But it requires more intensive computation than before to get such high coding performance. A Hierarchical Multi-level Parallelisms (HMLP) framework for H.264 encoder is proposed which integrates four level parallelisms - frame-level, slice-level, macroblock-level and data-level into one implementation. Each level parallelism is designed in a hierarchical parallel framework and mapped onto the multi-cores and SIMD units on multi-core architecture. According to the analysis of coding performance on each level parallelism, we propose a method to combine different parallel levels to attain a good compromise between high speedup and low bit-rate. The experimental results show that for CIF format video, our method achieves the speedup of 33.57x-42.3x with 1.04x-1.08x bit-rate increasing on 8-core Intel Xeon processor with SIMD Technology.


Author(s):  
Chao-Chin Wu ◽  
Jenn-Yang Ke ◽  
Heshan Lin ◽  
Syun-Sheng Jhan

Dynamic Programming (DP) is an important and popular method for solving a wide variety of discrete optimization problems such as scheduling, string-editing, packaging, and inventory management. DP breaks problems into simpler subproblems and combines their solutions into solutions to original ones. This paper focuses on one type of dynamic programming called Nonserial Polyadic Dynamic Programming (NPDP). To run NPDP applications efficiently on an emerging General-Purpose Graphic Processing Unit (GPGPU), the authors have to exploit more parallelism to fully utilize the computing power of the hundreds of processing units in it. However, the parallelism degree varies significantly in different phases of the NPDP applications. To address the problem, the authors propose a method that can adjust the thread-level parallelism to provide a sufficient and steadier parallelism degree for different phases. If a phase has insufficient parallelism, the authors split threads into subthreads. On the other hand, the authors can limit the total number of threads in a phase by merging threads. The authors also examine the difference between the conventional problem of finding the minimum on a GPU and the NPDP-featured problem of finding the minimums of many independent sets on a GPU. Finally, the authors examine how to design an appropriate data structure to apply the memory coalescing optimization technique. The experimental results demonstrate our method can obtain the best speedup of 13.40 over the algorithm published previously.


VLSI Design ◽  
2016 ◽  
Vol 2016 ◽  
pp. 1-12 ◽  
Author(s):  
Yumin Hou ◽  
Hu He ◽  
Xu Yang ◽  
Deyuan Guo ◽  
Xu Wang ◽  
...  

This paper proposes FuMicro, a fused microarchitecture integrating both in-order superscalar and Very Long Instruction Word (VLIW) in a single core. A processor with FuMicro microarchitecture can work under alternative in-order superscalar and VLIW mode, using the same pipeline and the same Instruction Set Architecture (ISA). Small modification to the compiler is made to expand the register file in VLIW mode. The decision of mode switch is made by software, and this does not need extra hardware. VLIW code can be exploited in the form of library function and the users will be exposed under only superscalar mode; by this means, we can provide the users with a convenient development environment. FuMicro could serve as a universal microarchitecture for it can be applied to different ISAs. In this paper, we focus on the implementation of FuMicro with ARM ISA. This architecture is evaluated on gem5, which is a cycle accurate microarchitecture simulation platform. By adopting FuMicro microarchitecture, the performance can be improved on an average of 10%, with the best performance improvement being 47.3%, compared with that under pure in-order superscalar mode. The result shows that FuMicro microarchitecture can improve Instruction Level Parallelism (ILP) significantly, making it promising to expand digital signal processing capability on a General Purpose Processor.


Sign in / Sign up

Export Citation Format

Share Document