scholarly journals On SPARC LEON-2 ISA Extensions Experiments for MPEG Encoding Acceleration

VLSI Design ◽  
2007 ◽  
Vol 2007 ◽  
pp. 1-10 ◽  
Author(s):  
P. Guironnet de Massas ◽  
P. Amblard ◽  
F. Pétrot

This paper presents the necessary steps to modify the implementation of the SPARCV8 architecture to enhance it with multimedia-oriented instructions. The purpose is improving video compression performance without designing dedicated coprocessors. We investigate the complexity of modifying a standard processor instruction set and show that, although not trivial, this is feasible in a few weeks. We implemented 12 new instructions and use some of them to optimize the computation of a demanding step of the MPEG encoding. The result is a performance increase of 67% in the execution of a part of this algorithm, allowing us to expect a 30% speedup in the execution of an MPEG video compression. The area increase of the integer unit is about 18% and the clock frequency is not significantly modified in an LEON-2 implementing 6 among 12 of the new instructions.

2020 ◽  
Vol 2020 (10) ◽  
pp. 136-1-136-7
Author(s):  
Daniel J Ringis ◽  
François Pitié ◽  
Anil Kokaram

The majority of internet traffic is video content. This drives the demand for video compression in order to deliver high quality video at low target bitrates. This paper investigates the impact of adjusting the rate distortion equation on compression performance. An constant of proportionality, k, is used to modify the Lagrange multiplier used in H.265 (HEVC). Direct optimisation methods are deployed to maximise BD-Rate improvement for a particular clip. This leads to up to 21% BD-Rate improvement for an individual clip. Furthermore we use a more realistic corpus of material provided by YouTube. The results show that direct optimisation using BD-rate as the objective function can lead to further gains in bitrate savings that are not available with previous approaches.


Author(s):  
Mostafa Rizk ◽  
Amer Baghdadi ◽  
Michel Jézéquel ◽  
Youssef Atat ◽  
Yasser Mohanna

Several application-specific processor design approaches have been proposed and investigated to cope with the emerging flexibility requirements jointly associated with the maximum performance efficiency and minimum implementation area and power consumption. Dynamic scheduling of a set of instructions generally leads to an overhead related to instruction decoding. To mitigate this overhead, other approaches have been proposed using static scheduling of datapath control signals. In this context, No-Instruction-Set-Computer (NISC) concept have been introduced considering that a dedicated processor to a specific application does not need an instruction set especially when it is programmed by its designers and not by its users. In this paper, the hardware architecture design of flexible NISC-based architecture design dedicated for minimum mean-squared error (MMSE) linear detection is presented. The devised design, which is used in iterative turbo-receiver, fulfills the performance requirements of emergent wireless communication standards with throughput reaching that of LTE-Advanced. FPGA hardware implementation of the detector architecture achieves a maximum throughput of 115.8 Mega symbols per second for [Formula: see text] and 6.4 Mega symbols per second for [Formula: see text] MIMO systems for an operating clock frequency of 202.67[Formula: see text]MHz.


10.14311/668 ◽  
2005 ◽  
Vol 45 (1) ◽  
Author(s):  
S. Usama ◽  
M. Montaser ◽  
O. Ahmed

Motion estimation is a method, by which temporal redundancies are reduced, which is an important aspect of video compression algorithms. In this paper we present a comparison among some of the well-known block based motion estimation algorithms. A performance evaluation of these algorithms is proposed to decide the best algorithm from the point of view of complexity and quality for noise-free video sequences and also for noisy video sequences. 


With the rising advancement of the multimedia technology, video compression is becoming a challenging problem. Although, there is availability of various standard compression algorithms, yet robust compression performance is yet to be seen in existing compression techniques. This paper also highlights that machine learning plays a significant contributory role in improving the performance of the video compression. Therefore, this manuscript offers a technical insight about the performance of existing video compression technique using machine learning approach. The contribution of this paper is its findings which states that machine learning approach do have significant advantage but the advantageous features are limited by the inherent and unsolved research problem. The core findings of this paper are basically to highlight the strength and limitations of existing methods as well as to highlight the research gap in terms of open-end research problems which requires immediate attention.


Author(s):  
Sujoy Sinha Roy ◽  
Andrea Basso

In this paper, we present an instruction set coprocessor architecture for lattice-based cryptography and implement the module lattice-based post-quantum key encapsulation mechanism (KEM) Saber as a case study. To achieve fast computation time, the architecture is fully implemented in hardware, including CCA transformations. Since polynomial multiplication plays a performance-critical role in the module and ideal lattice-based public-key cryptography, a parallel polynomial multiplier architecture is proposed that overcomes memory access bottlenecks and results in a highly parallel yet simple and easy-to-scale design. Such multipliers can compute a full multiplication in 256 cycles, but are designed to target any area/performance trade-offs. Besides optimizing polynomial multiplication, we make important design decisions and perform architectural optimizations to reduce the overall cycle counts as well as improve resource utilization. For the module dimension 3 (security comparable to AES-192), the coprocessor computes CCA key generation, encapsulation, and decapsulation in only 5,453, 6,618 and 8,034 cycles respectively, making it the fastest hardware implementation of Saber to our knowledge. On a Xilinx UltraScale+ XCZU9EG-2FFVB1156 FPGA, the entire instruction set coprocessor architecture runs at 250 MHz clock frequency and consumes 23,686 LUTs, 9,805 FFs, and 2 BRAM tiles (including 5,113 LUTs and 3,068 FFs for the Keccak core).


Author(s):  
Prasanga Dhungel ◽  
Prashant Tandan ◽  
Sandesh Bhusal ◽  
Sobit Neupane ◽  
Subarna Shakya

We present a new approach to video compression for video surveillance by refining the shortcomings of conventional approach and substitute each traditional component with their neural network counterpart. Our proposed work consists of motion estimation, compression and compensation and residue compression, learned end-to-end to minimize the rate-distortion trade off. The whole model is jointly optimized using a single loss function. Our work is based on a standard method to exploit the spatio-temporal redundancy in video frames to reduce the bit rate along with the minimization of distortions in decoded frames. We implement a neural network version of conventional video compression approach and encode the redundant frames with lower number of bits. Although, our approach is more concerned toward surveillance, it can be extended easily to general purpose videos too. Experiments show that our technique is efficient and outperforms standard MPEG encoding at comparable bitrates while preserving the visual quality.


Sign in / Sign up

Export Citation Format

Share Document