Perceptual evaluation of audio quality under lossy networks

Voice over Internet Protocol (VoIP) systems have spread widely in recent years. However, the technology still faces many challenges, among them the lossy behavior and uncontrolled network impairments of the Internet. In this chapter, the authors design and implement a VoIP test-bed based on the Adobe Real-Time Media Flow Protocol (RTMFP) that can be used for many interactive voice applications. The test-bed was used to study the effect of varying voice parameters, mainly the encoding rate and the number of frames per packet, as a function of network packet loss. Experiments were conducted on several voice files at different packet loss rates, and they identify the best combination of parameters under low, moderate, and high packet loss conditions, with voice quality measured by Perceptual Evaluation of Speech Quality (PESQ) scores.
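
The interaction between frames per packet and packet loss can be illustrated with a small simulation. This is a minimal sketch, not the authors' test-bed: it assumes independent (Bernoulli) packet loss, and the function names `simulate_frame_loss` and `longest_loss_burst` are hypothetical. It shows that packing more frames into a packet leaves the average frame loss unchanged but makes each loss wipe out a longer run of consecutive frames.

```python
import random


def simulate_frame_loss(num_frames, frames_per_packet, loss_prob, seed=0):
    """Return one boolean per voice frame: True if the frame was received.

    Assumption: each packet is dropped independently with probability
    loss_prob, and a dropped packet loses every frame it carries.
    """
    rng = random.Random(seed)
    received = []
    for start in range(0, num_frames, frames_per_packet):
        # The last packet may carry fewer frames than the others.
        packet_size = min(frames_per_packet, num_frames - start)
        delivered = rng.random() >= loss_prob
        received.extend([delivered] * packet_size)
    return received


def longest_loss_burst(received):
    """Length of the longest run of consecutive lost frames."""
    longest = current = 0
    for ok in received:
        current = 0 if ok else current + 1
        longest = max(longest, current)
    return longest


if __name__ == "__main__":
    for n in (1, 2, 4):
        frames = simulate_frame_loss(1200, n, loss_prob=0.1, seed=42)
        lost = frames.count(False) / len(frames)
        print(n, round(lost, 3), longest_loss_burst(frames))
```

Under this assumption, any lost run of frames is a whole number of packets long, so a codec's concealment has to bridge gaps that grow with the number of frames per packet.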

A robust audio watermarking technique based on the perceptual evaluation of audio quality algorithm in the multiresolution domain

The 10th IEEE International Symposium on Signal Processing and Information Technology ◽

10.1109/isspit.2010.5711803 ◽

2010 ◽

Cited By ~ 3

Author(s):

Masmoudi Salma ◽

Charfeddine Maha ◽

Ben Amar Chokri

Keyword(s):

Audio Watermarking ◽

Audio Quality ◽

Perceptual Evaluation

Audio quality in lossy networks for media-specific forward error correction schemes

International Journal of Communication Systems ◽

10.1002/dac.2361 ◽

2012 ◽

Vol 27 (2) ◽

pp. 289-302 ◽

Cited By ~ 5

Author(s):

A. Inoie

Keyword(s):

Error Correction ◽

Forward Error Correction ◽

Lossy Networks ◽

Audio Quality ◽

Forward Error

Codificação perceptiva de áudio por meio de decomposições atômicas em exponenciais complexas [Perceptual audio coding by means of atomic decompositions into complex exponentials]

Revista Principia - Divulgação Científica e Tecnológica do IFPB ◽

10.18265/1517-03062015v1n46p196-212 ◽

2019 ◽

Vol 1 (46) ◽

pp. 196

Author(s):

Valmir Dos Santos Nogueira Junior ◽

Michel Pompeu Tcheou ◽

Flávio Rainho Ávila

Keyword(s):

Matching Pursuit ◽

Rate Distortion ◽

Compact Representation ◽

Audio Signals ◽

Audio Compression ◽

Audio Quality ◽

Perceptual Evaluation ◽

Layer I ◽

Analysis System ◽

Complex Exponential

Atomic decomposition of signals using algorithms of the Matching Pursuit (MP) class has been applied to audio compression. The literature suggests that psychoacoustic criteria allow a more compact representation of the signal without loss of perceived quality. This work presents an analysis-by-synthesis system for audio signals that combines MP with a global psychoacoustic masking threshold, inspired by MPEG Layer I, and Complex Exponential Dictionaries (DEC). The signal is compressed through rate-distortion optimization over operational curves, adjusting the Lagrange multiplier. The performance of the compression method on different types of signals is evaluated as a function of the bit rate per sample using PEAQ (Perceptual Evaluation of Audio Quality), an objective measure standardized by the International Telecommunication Union (ITU), with satisfactory results.
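
The greedy selection at the core of Matching Pursuit can be sketched in a few lines. This is a generic textbook version under simplifying assumptions, not the system described in the abstract: it uses real-valued, unit-norm atoms rather than complex exponentials, selects atoms by plain correlation with no psychoacoustic masking threshold, and the function name `matching_pursuit` is hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length real vectors."""
    return sum(x * y for x, y in zip(a, b))


def matching_pursuit(signal, atoms, n_iter):
    """Greedy Matching Pursuit over a dictionary of unit-norm atoms.

    Assumption: every atom has Euclidean norm 1, so the projection
    coefficient onto an atom is simply its inner product with the residual.
    Returns (coeffs, residual), where coeffs[k] is the accumulated weight
    of atoms[k] and residual is what remains unexplained.
    """
    residual = list(signal)
    coeffs = [0.0] * len(atoms)
    for _ in range(n_iter):
        # Correlate the residual with every atom in the dictionary.
        corr = [dot(residual, atom) for atom in atoms]
        # Pick the atom that explains the most residual energy.
        k = max(range(len(atoms)), key=lambda i: abs(corr[i]))
        coeffs[k] += corr[k]
        # Subtract that atom's contribution from the residual.
        residual = [r - corr[k] * a for r, a in zip(residual, atoms[k])]
    return coeffs, residual


if __name__ == "__main__":
    atoms = [[1.0, 0.0], [0.6, 0.8]]  # unit-norm but not orthogonal
    coeffs, residual = matching_pursuit([1.0, 1.0], atoms, n_iter=5)
    print(coeffs, residual)
```

In a rate-distortion setting, the number of iterations (and hence the number of coded atoms) is what trades bit rate against distortion; the Lagrangian cost is commonly written as J = D + λR, with the multiplier λ setting the operating point.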

A Dynamic Spectrum Access Method for IBOC Broadcasting Based on the Ear Perception

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.816-817.839 ◽

2013 ◽

Vol 816-817 ◽

pp. 839-842 ◽

Cited By ~ 1

Author(s):

Wei Jiao ◽

Gang Yang ◽

Wei Wei Fang

Keyword(s):

Dynamic Spectrum Access ◽

Digital Signal ◽

Dynamic Spectrum ◽

Spectrum Access ◽

Time Dimension ◽

Access Method ◽

Audio Quality ◽

Perceptual Evaluation ◽

Digital Broadcasting ◽

HD Radio

In-Band On-Channel (IBOC), a digital broadcasting technology for AM and FM, is used in the HD Radio standard. In the hybrid mode of HD Radio, the spectra of the analog FM signal and the digital signal are combined in a fixed way, which does not take full advantage of the spectrum. To solve this problem, this paper presents a Dynamic Spectrum Access (DSA) method that uses the spectrum left idle in the time dimension. The improved method introduces a Perceptual Evaluation of Audio Quality (PEAQ) module, based on ear perception, as the criterion for measuring audio quality. The simulation results show that, while ensuring audio quality, this method saves considerable spectral bandwidth and so increases the available spectrum resources.

Impulsive spike enhancement on gamelan audio using harmonic percussive separation

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v9i3.pp1700-1710 ◽

2019 ◽

Vol 9 (3) ◽

pp. 1700

Author(s):

Solekhan Solekhan ◽

Yoyon K. Suprapto ◽

Wirawan Wirawan

Keyword(s):

Source Separation ◽

New Method ◽

Average Error ◽

Distance Method ◽

Audio Quality ◽

Average Value ◽

Perceptual Evaluation ◽

Harmonic Components ◽

Cosine Distance ◽

Average Cosine

Impulsive spikes often occur in audio recordings of gamelan, and most existing methods only reduce them. This research proposes a new method for enhancing impulsive spikes in gamelan music that can reduce, eliminate, or even strengthen the spikes. The process separates the audio into harmonic and percussive components. The percussive component is raised or lowered, and the result is then recombined with the harmonic component. In a similarity test using the Cosine Distance method, spike enhancement through Harmonic Percussive Source Separation (HPSS) yields an average Cosine Distance of 0.0004, that is, very close to the original, while the Mean Square Error (MSE) also averages 0.0004, a very small error. In Perceptual Evaluation of Audio Quality (PEAQ) testing, HPSS gives better quality, with an average Objective Difference Grade (ODG) of -0.24, or Imperceptible.
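
The two similarity measures named in this abstract can be sketched as follows. The abstract does not define them, so this sketch assumes the common conventions: cosine distance as one minus cosine similarity, and MSE as the mean of squared sample differences. The function names are hypothetical.

```python
import math


def cosine_distance(x, y):
    """Cosine distance, assumed here to be 1 - cosine similarity.

    0 means the two signals point in the same direction (identical up to
    a positive scale factor); 1 means they are orthogonal.
    """
    if len(x) != len(y):
        raise ValueError("signals must have the same length")
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        raise ValueError("cosine distance is undefined for an all-zero signal")
    similarity = sum(a * b for a, b in zip(x, y)) / (norm_x * norm_y)
    return 1.0 - similarity


def mean_square_error(x, y):
    """Mean of the squared sample-by-sample differences."""
    if len(x) != len(y) or not x:
        raise ValueError("signals must be non-empty and of equal length")
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)
```

Note that cosine distance ignores overall gain while MSE does not, so the two measures are complementary when comparing a processed signal with its original.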

Effects of a 6-Week Straw Phonation in Water Exercise Program on the Aging Voice

Journal of Speech Language and Hearing Research ◽

10.1044/2020_jslhr-19-00124 ◽

2020 ◽

Vol 63 (4) ◽

pp. 1018-1032

Author(s):

Chia-Hsin Wu ◽

Roger W. Chan

Keyword(s):

Acoustic Analysis ◽

Vocal Tract ◽

Exercise Program ◽

Analysis Of Covariance ◽

Elderly Subjects ◽

Control Group ◽

Perceptual Evaluation ◽

Positive Effects ◽

Aging Voice ◽

Before And After

Purpose: Semi-occluded vocal tract (SOVT) exercises with tubes or straws have been widely used for a variety of voice disorders. Yet, the effects of longer periods of SOVT exercises (lasting for weeks) on the aging voice are not well understood. This study investigated the effects of a 6-week straw phonation in water (SPW) exercise program. Method: Thirty-seven elderly subjects with self-perceived voice problems were assigned into two groups: (a) SPW exercises with six weekly sessions and home practice (experimental group) and (b) vocal hygiene education (control group). Before and after intervention (2 weeks after the completion of the exercise program), acoustic analysis, auditory–perceptual evaluation, and self-assessment of vocal impairment were conducted. Results: Analysis of covariance revealed significant differences between the two groups in smoothed cepstral peak prominence measures, harmonics-to-noise ratio, the auditory–perceptual parameter of breathiness, and Voice Handicap Index-10 scores postintervention. No significant differences between the two groups were found for other measures. Conclusions: Our results supported the positive effects of SOVT exercises for the aging voice, with a 6-week SPW exercise program being a clinical option. Future studies should involve long-term follow-up and additional outcome measures to better understand the efficacy of SOVT exercises, particularly SPW exercises, for the aging voice.

Perceptual Evaluation of Speech Naturalness in Speakers of Varying Gender Identities

Journal of Speech Language and Hearing Research ◽

10.1044/2020_jslhr-19-00337 ◽

2020 ◽

Vol 63 (7) ◽

pp. 2054-2069

Author(s):

Brandon Merritt ◽

Tessa Bent

Keyword(s):

Spontaneous Speech ◽

Identification Accuracy ◽

Rating Task ◽

Gender Identification ◽

Identification Task ◽

Male And Female ◽

Perceptual Evaluation ◽

Speech Training ◽

And Gender ◽

Speech Naturalness

Purpose: The purpose of this study was to investigate how speech naturalness relates to masculinity–femininity and gender identification (accuracy and reaction time) for cisgender male and female speakers as well as transmasculine and transfeminine speakers. Method: Stimuli included spontaneous speech samples from 20 speakers who are transgender (10 transmasculine and 10 transfeminine) and 20 speakers who are cisgender (10 male and 10 female). Fifty-two listeners completed three tasks: a two-alternative forced-choice gender identification task, a speech naturalness rating task, and a masculinity/femininity rating task. Results: Transfeminine and transmasculine speakers were rated as significantly less natural sounding than cisgender speakers. Speakers rated as less natural took longer to identify and were identified less accurately in the gender identification task; furthermore, they were rated as less prototypically masculine/feminine. Conclusions: Perceptual speech naturalness for both transfeminine and transmasculine speakers is strongly associated with gender cues in spontaneous speech. Training to align a speaker's voice with their gender identity may concurrently improve perceptual speech naturalness. Supplemental Material: https://doi.org/10.23641/asha.12543158

Cultural and Linguistic Adaptation of the Consensus Auditory-Perceptual Evaluation of Voice (CAPE-V) Into Hindi

Journal of Speech Language and Hearing Research ◽

10.1044/2020_jslhr-20-00348 ◽

2020 ◽

Vol 63 (12) ◽

pp. 3974-3981

Author(s):

Ashwini Joshi ◽

Isha Baheti ◽

Vrushali Angadi

Keyword(s):

Strong Correlation ◽

Concurrent Validity ◽

Interrater Reliability ◽

Voice Quality ◽

Weak Correlation ◽

Voice Assessment ◽

Perceptual Evaluation ◽

Severity Grade ◽

Normal Voice ◽

Group A

Aim: The purpose of this study was to develop and assess the reliability of a Hindi version of the Consensus Auditory-Perceptual Evaluation of Voice (CAPE-V). Reliability was assessed by comparing Hindi CAPE-V ratings with English CAPE-V ratings and by the Grade, Roughness, Breathiness, Asthenia and Strain (GRBAS) scale. Method: Hindi sentences were created to match the phonemic load of the corresponding English CAPE-V sentences. The Hindi sentences were adapted for linguistic content. The original English and adapted Hindi CAPE-V and GRBAS were completed for 33 bilingual individuals with normal voice quality. Additionally, the Hindi CAPE-V and GRBAS were completed for 13 Hindi speakers with disordered voice quality. The agreement of CAPE-V ratings was assessed between language versions, GRBAS ratings, and two rater pairs (three raters in total). Pearson product–moment correlation was completed for all comparisons. Results: A strong correlation (r > .8, p < .01) was found between the Hindi CAPE-V scores and the English CAPE-V scores for most variables in normal voice participants. A weak correlation was found for the variable of strain (r < .2, p = .400) in the normative group. A strong correlation (r > .6, p < .01) was found between the overall severity/grade, roughness, and breathiness scores in the GRBAS scale and the CAPE-V scale in normal and disordered voice samples. Significant interrater reliability (r > .75) was present in overall severity and breathiness. Conclusions: The Hindi version of the CAPE-V demonstrates good interrater reliability and concurrent validity with the English CAPE-V and the GRBAS. The Hindi CAPE-V can be used for the auditory-perceptual voice assessment of Hindi speakers.
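
The Pearson product-moment correlation used for these comparisons can be computed as in the sketch below. It is a plain implementation of the standard formula, offered only to make the statistic concrete; the function name `pearson_r` is hypothetical and the sketch does not reproduce the study's data or its significance testing.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two paired rating lists.

    Returns a value in [-1, 1]: +1 for a perfect increasing linear
    relationship, -1 for a perfect decreasing one, 0 for none.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired lists with at least two values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of products of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation is undefined when one list is constant")
    return sxy / math.sqrt(sxx * syy)
```

One design point worth noting: the statistic is undefined when every rating in one list is identical, which can happen with normal-voice samples rated at the floor of a scale, so the sketch raises an error rather than returning a misleading zero.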
