A Backward Compatible Multichannel Audio Compression Method

2013 ◽  
Vol 756-759 ◽  
pp. 977-981
Author(s):  
Xue Fei Gao ◽  
Guo Yang ◽  
Jing Wang ◽  
Xiang Xie ◽  
Jing Ming Kuang

This paper proposes a backward-compatible multichannel audio codec based on downmix and upmix operations. The codec represents a multichannel audio input signal with a downmixed mono signal and spatial parametric data. The encoding method consists of three parts: spatial-temporal analysis of the audio signal, compression of the multichannel audio into mono audio, and encoding of the mono signal. The proposed codec combines high audio quality with a low parameter coding rate, and the method is simpler and more effective than conventional methods. With this method, it is possible to transmit or store multichannel audio signals as mono audio signals.
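The downmix/upmix idea can be illustrated with a minimal sketch. It is not the codec from the paper: it assumes a two-channel input, a plain average as the downmix, and a single per-frame energy fraction as the only spatial parameter. The names `encode`, `decode`, and `frame` are hypothetical.

```python
import math

def encode(left, right, frame=4):
    """Downmix stereo to mono and keep one level parameter per frame."""
    mono = [(l + r) / 2.0 for l, r in zip(left, right)]
    params = []
    for start in range(0, len(left), frame):
        e_l = sum(v * v for v in left[start:start + frame])
        e_r = sum(v * v for v in right[start:start + frame])
        total = e_l + e_r
        # Fraction of the frame energy carried by the left channel (0.5 if silent).
        params.append(e_l / total if total > 0 else 0.5)
    return mono, params

def decode(mono, params, frame=4):
    """Upmix: redistribute the mono signal using the per-frame energy fractions."""
    left, right = [], []
    for i, m in enumerate(mono):
        p = params[i // frame]
        g_l, g_r = math.sqrt(p), math.sqrt(1.0 - p)
        # Normalise so that the decoded pair averages back to the mono signal.
        norm = (g_l + g_r) / 2.0
        left.append(m * g_l / norm)
        right.append(m * g_r / norm)
    return left, right
```

A mono-only receiver can play `mono` directly and ignore `params`, which is the sense in which such a scheme is backward compatible. With only a level parameter, the sketch recovers channels that differ by a gain; phase and correlation differences between channels would need further parameters.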

Author(s):  
Kazuhiro Kondo

This chapter proposes two data-hiding algorithms for stereo audio signals. The first algorithm embeds data into a stereo audio signal by adding data-dependent mutual delays to the host stereo audio signal. The second algorithm adds fixed-delay echoes with polarities that are data dependent and amplitudes that are adjusted such that the interchannel correlation matches the original signal. The robustness and the quality of the data-embedded audio are reported and compared for both algorithms. Both algorithms were shown to be fairly robust against common distortions, such as added noise, audio coding, and sample rate conversion. The embedded audio quality was shown to be “fair” to “good” for the first algorithm and “good” to “excellent” for the second algorithm, depending on the input source.
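The polarity-coded echo idea behind the second algorithm can be sketched for a single channel. This is a simplified illustration, not the chapter's method: the echo amplitude is fixed rather than adjusted to preserve interchannel correlation, and detection by the sign of the autocorrelation at the echo delay is an assumption made here. The names `embed_bit`, `detect_bit`, `delay`, and `amp` are hypothetical.

```python
import random

def embed_bit(host, bit, delay=50, amp=0.3):
    """Add a fixed-delay echo whose polarity encodes one bit (1 -> +, 0 -> -)."""
    sign = 1.0 if bit else -1.0
    return [x + (sign * amp * host[n - delay] if n >= delay else 0.0)
            for n, x in enumerate(host)]

def detect_bit(marked, delay=50):
    """Recover the bit from the sign of the autocorrelation at the echo delay."""
    corr = sum(marked[n] * marked[n - delay] for n in range(delay, len(marked)))
    return 1 if corr > 0 else 0

# Noise-like stand-in for a host signal; real audio is far more correlated.
random.seed(0)
host = [random.gauss(0.0, 1.0) for _ in range(4000)]
```

Calling `detect_bit(embed_bit(host, 1))` on this noise-like host recovers the embedded bit. Real audio has strong autocorrelation of its own, so a practical detector would likely need a more careful statistic than this one.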


2018 ◽  
Vol 2018 ◽  
pp. 1-8
Author(s):  
S. E. Tsai ◽  
S. M. Yang

Methods based on discrete cosine transform (DCT) have been proposed for digital watermarking of audio signals; however, the watermark is often vulnerable to data compression and signal processing. This paper presents an effective audio watermarking method by energy averaging of DCT coefficients such that an audio signal with watermark is robust to data processing. The method is to divide an audio signal into segments by three parameters defining the segment length, the segment sequence of watermark location, and the frequency range of DCT coefficients for watermark location. An error correcting code is also integrated to improve audio signal quality after watermarking. Experimental results show that the method is robust to data compression and many other kinds of signal processing. No original signal is required for decoding the watermark. Comparison of watermarking performance with a recent work validates that the watermarking method has better audio quality and higher robustness.


Author(s):  
Valmir Dos Santos Nogueira Junior ◽  
Michel Pompeu Tcheou ◽  
Flávio Rainho Ávila

The atomic decomposition of signals by algorithms of the Matching Pursuit (MP) class has been applied in audio compression. A literature review suggests that the use of psychoacoustic criteria allows a more compact representation of the signal without loss of perceived quality. This work presents the implementation of an analysis-by-synthesis system for audio signals using MP together with a psychoacoustic global masking threshold, inspired by MPEG Layer I, as well as Complex Exponential Dictionaries (DEC). For signal compression, rate-distortion optimization by operational curves was used, adjusting the Lagrange multiplier. The performance of the compression method for different types of signals is evaluated with PEAQ (Perceptual Evaluation of Audio Quality), an objective measure standardized by the International Telecommunication Union (ITU), based on the bit rate per sample, obtaining satisfactory results.
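The greedy selection at the core of Matching Pursuit can be sketched as follows. This is a generic, real-valued version under simplifying assumptions: the dictionary atoms are unit-norm lists, and neither the psychoacoustic masking threshold nor the rate-distortion optimization from the work is modeled. The function name `matching_pursuit` and its arguments are hypothetical.

```python
def matching_pursuit(signal, atoms, n_iter):
    """Greedy MP: atoms are assumed to be unit-norm lists as long as signal."""
    residual = list(signal)
    picks = []  # (atom index, coefficient) pairs in selection order
    for _ in range(n_iter):
        # Correlate the residual with every atom and keep the strongest match.
        scores = [sum(r * a for r, a in zip(residual, atom)) for atom in atoms]
        best = max(range(len(atoms)), key=lambda k: abs(scores[k]))
        coef = scores[best]
        picks.append((best, coef))
        # Remove the selected atom's contribution from the residual.
        residual = [r - coef * a for r, a in zip(residual, atoms[best])]
    return picks, residual
```

Each iteration stores one index and one coefficient, so stopping early yields a shorter description of the signal at the cost of a larger residual.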


Author(s):  
Teddy Surya Gunawan ◽  
Mira Kartiwi

In recent years, multichannel audio systems have been widely used in modern sound devices because they can provide a more realistic and engaging experience to the listener. This paper focuses on the performance evaluation of three lossy codecs (AAC, Ogg Vorbis, and Opus) and three lossless codecs (FLAC, TrueAudio, and WavPack) for multichannel audio signals, including stereo, 5.1, and 7.1 channels. Experiments were conducted on the same three audio files but with different channel configurations. The performance of each encoder was evaluated based on its encoding time (averaged over 100 runs), data reduction, and audio quality. There is usually a trade-off between the three metrics. To simplify the evaluation, a new integrated performance metric was proposed that combines all three. Using the new measure, FLAC was found to be the best lossless codec, while Ogg Vorbis and Opus were found to be the best lossy codecs, depending on the channel configuration. This result could be used in determining the proper audio format for multichannel audio systems.


2021 ◽  
Vol 2021 ◽  
pp. 1-8
Author(s):  
Yingjun Dong ◽  
Neil G. MacLaren ◽  
Yiding Cao ◽  
Francis J. Yammarino ◽  
Shelley D. Dionne ◽  
...  

Utterance clustering is one of the actively researched topics in audio signal processing and machine learning. This study aims to improve the performance of utterance clustering by processing multichannel (stereo) audio signals. Processed audio signals were generated by combining left- and right-channel audio signals in a few different ways and then by extracting the embedded features (also called d-vectors) from those processed audio signals. This study applied the Gaussian mixture model for supervised utterance clustering. In the training phase, a parameter-sharing Gaussian mixture model was obtained to train the model for each speaker. In the testing phase, the speaker with the maximum likelihood was selected as the detected speaker. Results of experiments with real audio recordings of multiperson discussion sessions showed that the proposed method that used multichannel audio signals achieved significantly better performance than a conventional method with mono-audio signals in more complicated conditions.
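The testing-phase decision, selecting the speaker whose model gives the highest likelihood, can be sketched as follows. This is an illustration only: it assumes diagonal-covariance mixtures with already-trained parameters, it does not model the parameter sharing or the d-vector extraction described in the study, and the two toy models are made-up values. The names `gmm_log_likelihood`, `detect_speaker`, and `models` are hypothetical.

```python
import math

def gmm_log_likelihood(x, gmm):
    """Log-likelihood of vector x under a diagonal-covariance Gaussian mixture.

    gmm is a list of (weight, means, variances) components.
    """
    logs = []
    for weight, means, variances in gmm:
        log_p = math.log(weight)
        for xi, mu, var in zip(x, means, variances):
            log_p += -0.5 * (math.log(2.0 * math.pi * var) + (xi - mu) ** 2 / var)
        logs.append(log_p)
    top = max(logs)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(v - top) for v in logs))

def detect_speaker(x, models):
    """Return the speaker whose model gives x the highest log-likelihood."""
    return max(models, key=lambda name: gmm_log_likelihood(x, models[name]))

# Hypothetical two-speaker models over 2-D embeddings (values are illustrative).
models = {
    "speaker_a": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "speaker_b": [(1.0, [5.0, 5.0], [1.0, 1.0])],
}
```

In practice the input vectors would be embeddings extracted from the processed audio, and the mixture parameters would come from the training phase.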


Sensors ◽  
2021 ◽  
Vol 21 (3) ◽  
pp. 676
Author(s):  
Andrej Zgank

Animal activity acoustic monitoring is becoming one of the necessary tools in agriculture, including beekeeping. It can assist in the control of beehives in remote locations. It is possible to classify bee swarm activity from audio signals using such approaches. An IoT-based acoustic swarm classification approach using deep neural networks is proposed in this paper. Audio recordings were obtained from the Open Source Beehive project. Mel-frequency cepstral coefficient features were extracted from the audio signal. The lossless WAV and lossy MP3 audio formats were compared for IoT-based solutions. An analysis was made of the impact of the deep neural network parameters on the classification results. The best overall classification accuracy with uncompressed audio was 94.09%, but MP3 compression degraded the DNN accuracy by over 10%. The evaluation of the proposed IoT-based bee activity acoustic classification with deep neural networks showed improved results compared to the previous hidden Markov model system.


Electronics ◽  
2021 ◽  
Vol 10 (11) ◽  
pp. 1349
Author(s):  
Stefan Lattner ◽  
Javier Nistal

Lossy audio codecs compress (and decompress) digital audio streams by removing information that tends to be inaudible in human perception. Under high compression rates, such codecs may introduce a variety of impairments in the audio signal. Many works have tackled the problem of audio enhancement and compression artifact removal using deep-learning techniques. However, only a few works tackle the restoration of heavily compressed audio signals in the musical domain. In such a scenario, there is no unique solution for the restoration of the original signal. Therefore, in this study, we test a stochastic generator of a Generative Adversarial Network (GAN) architecture for this task. Such a stochastic generator, conditioned on highly compressed musical audio signals, could one day generate outputs indistinguishable from high-quality releases. Therefore, the present study may yield insights into more efficient musical data storage and transmission. We train stochastic and deterministic generators on MP3-compressed audio signals with 16, 32, and 64 kbit/s. We perform an extensive evaluation of the different experiments utilizing objective metrics and listening tests. We find that the models can improve the quality of the audio signals over the MP3 versions for 16 and 32 kbit/s and that the stochastic generators are capable of generating outputs that are closer to the original signals than those of the deterministic generators.

