Beyond VVC: Towards Perceptual Quality Optimized Video Compression Using Multi-Scale Hybrid Approaches

Author(s):  
Zhimeng Huang ◽  
Kai Lin ◽  
Chuanmin Jia ◽  
Shanshe Wang ◽  
Siwei Ma
Author(s):  
Abderrahim Bajit

Region of interest (ROI) image and video compression techniques are widely used in visual communication applications to deliver good-quality images and videos at limited bandwidths. Foveated imaging exploits the fact that the spatial resolution of the human visual system (HVS) is highest around the point of fixation (the foveation point) and decreases dramatically with increasing eccentricity. Building on this, the authors have developed a metric for assessing ROI-coded images, adapted to foveated image coding and based on psycho-visual quality optimization tools, which objectively measures visual quality with respect to the human observer's region of interest. The proposed metric yields a quality factor called the foveation probability score (FPS) that correlates well with visual error perception and demonstrates very good perceptual quality evaluation.
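The idea of weighting errors by distance from the fixation point can be illustrated with a minimal sketch. This is not the authors' FPS metric: the weight function, the `half_res_ecc` parameter, and the function names below are hypothetical choices made only to show how errors near the foveation point can count more than errors in the periphery.

```python
import math

def foveation_weight(x, y, fx, fy, half_res_ecc=50.0):
    """Weight in (0, 1]: 1 at the fixation point (fx, fy), decaying with distance.

    half_res_ecc is a hypothetical parameter: the pixel distance at which
    the weight drops to one half. It is an assumption, not a measured value.
    """
    ecc = math.hypot(x - fx, y - fy)
    return half_res_ecc / (half_res_ecc + ecc)

def foveated_mse(ref, test, fx, fy, half_res_ecc=50.0):
    """Mean squared error with each pixel weighted by its foveation weight."""
    num = 0.0
    den = 0.0
    for y, (row_r, row_t) in enumerate(zip(ref, test)):
        for x, (r, t) in enumerate(zip(row_r, row_t)):
            w = foveation_weight(x, y, fx, fy, half_res_ecc)
            num += w * (r - t) ** 2
            den += w
    return num / den

# Same-size error placed at the fixation point versus far from it.
ref = [[0] * 101]
near = [row[:] for row in ref]
near[0][0] = 10
far = [row[:] for row in ref]
far[0][100] = 10
score_near = foveated_mse(ref, near, fx=0, fy=0)
score_far = foveated_mse(ref, far, fx=0, fy=0)
```

With the fixation point at the left edge, the error at the fixation point yields a larger weighted score than the same error 100 pixels away.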


Sensors ◽  
2021 ◽  
Vol 21 (10) ◽  
pp. 3351
Author(s):  
Yooho Lee ◽  
Dongsan Jun ◽  
Byung-Gyu Kim ◽  
Hunjoo Lee

Super resolution (SR) generates a high-resolution (HR) image from one or more low-resolution (LR) images. Since a variety of CNN models have recently been studied in computer vision, these approaches have been combined with SR to improve image restoration quality. In this paper, we propose a lightweight CNN-based SR method, named multi-scale channel dense network (MCDN). To design the proposed network, we extracted the training images from the DIVerse 2K (DIV2K) dataset and investigated the trade-off between SR accuracy and network complexity. The experimental results show that the proposed method can significantly reduce network complexity, measured by the number of network parameters and total memory capacity, while maintaining slightly better or similar perceptual quality compared to previous methods.


Author(s):  
Y. Zang ◽  
B. Yang

3D laser technology is widely used to collect the surface information of objects. For various applications, we need to extract a point cloud of good perceptual quality from the scanned points. To solve this problem, most existing methods extract important points based on a fixed scale. However, the geometric features of a 3D object come from various geometric scales. We propose a multi-scale construction method based on radial basis functions. For each scale, important points are extracted from the point cloud based on their importance. We apply a perceptual metric, Just-Noticeable-Difference, to measure the degradation at each geometric scale. Finally, scale-adaptive optimal information extraction is realized. Experiments are undertaken to evaluate the effectiveness of the proposed method, suggesting a reliable solution for optimal information extraction from objects.


Author(s):  
G. Megala et al.

Video compression plays a vital role in modern social media networking, with its plethora of multimedia applications. It enables the transmission medium to transfer videos competently and storage resources to hold them efficiently. Nowadays, high-resolution video data are transferred through communication channels with high bit rates in order to send multiple compressed videos. There have been many advances in transmission capability and in the efficient storage of compressed video, with compression being the primary task in multimedia services. This paper summarizes the compression standards and describes the main concepts involved in video coding. Video compression converts the large raw bitstream of a video sequence into a small, compact one, achieving a high compression ratio with good perceptual video quality. Removing redundant information is the main task in video sequence compression. The survey focuses on various block matching algorithms, quantization, and entropy coding. It is found that many of the methods have high computational complexity and need improvement through optimization.
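As a minimal sketch of what a block matching algorithm does, the following implements exhaustive (full) search with the sum of absolute differences (SAD) as the matching cost. The survey covers several algorithms; full search is used here only because it is the simplest baseline, and the function names, block size, and search range are illustrative assumptions.

```python
def sad(cur, ref, cx, cy, rx, ry, n):
    """Sum of absolute differences between an n x n block in cur and one in ref."""
    return sum(
        abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
        for j in range(n)
        for i in range(n)
    )

def full_search(cur, ref, cx, cy, n=4, search=2):
    """Return ((dx, dy), cost) of the best match for the block at (cx, cy) in cur.

    Every displacement within +/- search pixels is tested; candidates that
    fall outside the reference frame are skipped.
    """
    h, w = len(ref), len(ref[0])
    best_mv, best_cost = None, float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            if rx < 0 or ry < 0 or rx + n > w or ry + n > h:
                continue
            cost = sad(cur, ref, cx, cy, rx, ry, n)
            if cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost

# Synthetic frames: the current frame is the reference shifted by (2, 1).
ref = [[x * x + 10 * y * y for x in range(8)] for y in range(8)]
cur = [[ref[min(y + 1, 7)][min(x + 2, 7)] for x in range(8)] for y in range(8)]
mv, cost = full_search(cur, ref, cx=2, cy=2)
```

Full search tests (2 * search + 1)^2 candidates per block, which is the kind of computational cost that faster search patterns aim to reduce.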


Author(s):  
Menglu Wang ◽  
Xueyang Fu ◽  
Zepei Sun ◽  
Zheng-Jun Zha

Existing deep learning-based image de-blocking methods use only pixel-level loss functions to guide network training. The JPEG compression factor, which reflects the degree of degradation, has not been fully utilized. However, because it is non-differentiable, the compression factor cannot be used directly to train deep networks. To solve this problem, we propose compression quality ranker-guided networks for JPEG artifact removal. We first design a quality ranker to measure the compression degree, which is highly correlated with the JPEG quality. Based on this differentiable ranker, we then propose one quality-related loss and one feature matching loss to guide de-blocking and perceptual quality optimization. In addition, we utilize dilated convolutions to extract multi-scale features, which enables our single model to handle multiple compression quality factors. Our method can implicitly use the information contained in the compression factors to produce better results. Experiments demonstrate that our model can achieve comparable or even better performance in both quantitative and qualitative measurements.
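To show why dilated convolutions give multi-scale features, here is a minimal one-dimensional sketch in plain Python. It is not the network from the paper: the function names are hypothetical, and the operation is the "valid" cross-correlation form used in deep learning, reduced to 1D for readability.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid 1D cross-correlation whose kernel taps are spaced dilation samples apart."""
    span = (len(kernel) - 1) * dilation
    return [
        sum(k * signal[start + i * dilation] for i, k in enumerate(kernel))
        for start in range(len(signal) - span)
    ]

def receptive_field(kernel_size, dilation):
    """Number of input samples spanned by one output of a single dilated layer."""
    return (kernel_size - 1) * dilation + 1

signal = [1, 2, 3, 4, 5, 6, 7]
dense = dilated_conv1d(signal, [1, 1, 1], dilation=1)   # taps at offsets 0, 1, 2
sparse = dilated_conv1d(signal, [1, 1, 1], dilation=2)  # taps at offsets 0, 2, 4
```

The same three-tap kernel covers 3 input samples at dilation 1 and 5 at dilation 2, so branches with different dilation rates see the input at different scales without adding parameters.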


2008 ◽  
pp. 1441-1455
Author(s):  
Andrea Cavallaro ◽  
Stefan Winkler

The design of image and video compression or transmission systems is driven by the need for reducing the bandwidth and storage requirements of the content while maintaining its visual quality. Therefore, the objective is to define codecs that maximize perceived quality as well as automated metrics that reliably measure perceived quality. One of the common shortcomings of traditional video coders and quality metrics is the fact that they treat the entire scene uniformly, assuming that people look at every pixel of the image or video. In reality, we focus only on particular areas of the scene. In this chapter, we prioritize the visual data accordingly in order to improve the compression performance of video coders and the prediction performance of perceptual quality metrics. The proposed encoder and quality metric incorporate visual attention and use a semantic segmentation stage, which takes into account certain aspects of the cognitive behavior of people when watching a video. This semantic model corresponds to a specific human abstraction, which need not necessarily be characterized by perceptual uniformity. In particular, we concentrate on segmenting moving objects and faces, and we evaluate the perceptual impact on video coding and on quality evaluation.


Author(s):  
Rajib Kumar Jha ◽  
Rajlaxmi Chouhan ◽  
Kiyoharu Aizawa ◽  
Prabir Kumar Biswas

A novel technique based on dynamic stochastic resonance (DSR) in discrete cosine transform (DCT) domain has been proposed in this paper for the enhancement of dark as well as low-contrast images. In conventional DSR-based techniques, the performance of a system can be improved by addition of external noise. However, in the proposed DSR-based work, the intrinsic noise of an image has been utilized to create a noise-induced transition of a dark image to a state of good contrast. The proposed technique significantly enhances the image contrast and color information without losing any image or color data by optimization of bistable system parameters. The performance of the proposed methodology has been measured in terms of relative contrast enhancement factor, perceptual quality measure, and color enhancement factor. When compared with the existing enhancement techniques, such as adaptive histogram equalization, gamma correction, single-scale retinex, multi-scale retinex, modified high-pass filtering, multicontrast enhancement with dynamic range compression, color enhancement by scaling, edge-preserving multi-scale decomposition, automatic control of imaging tool, and various spatial/frequency-domain SR-based techniques, the proposed technique gives remarkable performance in terms of contrast and color enhancement while ascertaining good perceptual quality.
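The bistable system mentioned above can be sketched with the standard double-well model from the stochastic resonance literature, dx/dt = a*x - b*x^3 + input, discretized with a forward Euler step. Whether the paper uses exactly this form, and how it maps to DCT coefficients, is an assumption here; the parameter values and the function name are illustrative only, and the sketch acts on a single scalar rather than on an image.

```python
def bistable_iterate(value, a=1.0, b=1.0, dt=0.1, steps=200):
    """Iterate x <- x + dt * (a*x - b*x**3 + value), starting from x = 0.

    value plays the role of the input (for example one transform
    coefficient). a, b, dt and steps are hypothetical tuning parameters.
    """
    x = 0.0
    for _ in range(steps):
        x = x + dt * (a * x - b * x ** 3 + value)
    return x

weak_positive = bistable_iterate(0.1)
weak_negative = bistable_iterate(-0.1)
```

A weak input of magnitude 0.1 is driven into one of the two wells (magnitude slightly above 1, with the sign of the input), which illustrates how a bistable system can turn a low-amplitude input into a higher-amplitude state; tuning a and b changes the well positions.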

