Perceptual evaluation of audio quality under lossy networks

Author(s):  
Ala F. Khalifeh ◽  
Abdel-Karim Al-Tamimi ◽  
Khalid A. Darabkh
Author(s):  
Maha Z. Mouasher ◽  
Ala' F. Khalifeh

Voice over Internet Protocol (VoIP) systems have spread rapidly in recent years. However, the technology still faces many challenges, among them the lossy behavior and uncontrolled network impairments of the Internet. In this chapter, the authors design and implement a VoIP test-bed based on the Adobe Real-Time Media Flow Protocol (RTMFP) that can be used for many interactive voice applications. The test-bed was used to study the effect of varying voice parameters, mainly the encoding rate and the number of frames per packet, as a function of network packet loss. Experiments were conducted on several voice files at different packet loss rates, identifying the best combination of parameters under low, moderate, and high packet loss conditions, with voice quality measured by Perceptual Evaluation of Speech Quality (PESQ) scores.
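The trade-off around frames per packet can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the chapter's test-bed: packets are dropped independently with a fixed probability (a Bernoulli loss model), and the function names `simulate_frame_loss` and `longest_gap` are hypothetical. It shows only that packing more frames into a packet lengthens the run of consecutive frames lost when that packet is dropped.

```python
import random

def simulate_frame_loss(num_frames, frames_per_packet, loss_rate, seed=0):
    """Return a list of booleans, True where a voice frame was lost.

    Assumption: each packet carries `frames_per_packet` consecutive frames
    and is dropped independently with probability `loss_rate`.
    """
    rng = random.Random(seed)
    lost = []
    for start in range(0, num_frames, frames_per_packet):
        dropped = rng.random() < loss_rate
        frames_in_packet = min(frames_per_packet, num_frames - start)
        # Every frame in a dropped packet is lost together.
        lost.extend([dropped] * frames_in_packet)
    return lost

def longest_gap(lost):
    """Length of the longest run of consecutive lost frames."""
    best = run = 0
    for flag in lost:
        run = run + 1 if flag else 0
        best = max(best, run)
    return best

# Compare packetization choices at an assumed 10% packet loss rate.
for k in (1, 2, 4):
    lost = simulate_frame_loss(1000, k, loss_rate=0.10)
    print(k, sum(lost) / len(lost), longest_gap(lost))
```

The sketch says nothing about perceived quality. Mapping loss patterns to PESQ scores requires the actual codec and a PESQ implementation, which are outside its scope.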


Author(s):  
Valmir Dos Santos Nogueira Junior ◽  
Michel Pompeu Tcheou ◽  
Flávio Rainho Ávila

The atomic decomposition of signals by algorithms of the Matching Pursuit (MP) class has been applied to audio compression. The literature suggests that the use of psychoacoustic criteria allows a more compact representation of the signal without loss of perceived quality. This work presents the implementation of an analysis-by-synthesis system for audio signals using MP together with a global psychoacoustic masking threshold, inspired by MPEG Layer I, as well as Complex Exponential Dictionaries (DEC). For signal compression, we used rate-distortion optimization by operational curves, adjusting the Lagrange multiplier. The performance of the compression method for different types of signals is evaluated with PEAQ (Perceptual Evaluation of Audio Quality), an objective measure standardized by the International Telecommunication Union (ITU), as a function of the bit rate per sample, obtaining satisfactory results.
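The greedy selection at the core of Matching Pursuit can be sketched as follows. This is a simplified illustration, not the authors' system: it uses a small dictionary of real, unit-norm cosine atoms instead of complex exponentials, and it omits the psychoacoustic masking threshold and the rate-distortion optimization. The helper names `cosine_atom` and `matching_pursuit` are hypothetical.

```python
import math

def dot(u, v):
    """Inner product of two equal-length real vectors."""
    return sum(a * b for a, b in zip(u, v))

def cosine_atom(freq_bin, length):
    """Unit-norm cosine atom (a real stand-in for a complex exponential)."""
    raw = [math.cos(2 * math.pi * freq_bin * n / length) for n in range(length)]
    norm = math.sqrt(dot(raw, raw))
    return [x / norm for x in raw]

def matching_pursuit(signal, dictionary, num_atoms):
    """Greedy MP over unit-norm atoms.

    Returns (picks, residual), where picks is a list of
    (atom_index, coefficient) pairs in selection order.
    """
    residual = list(signal)
    picks = []
    for _ in range(num_atoms):
        # Select the atom most correlated with the current residual.
        scores = [dot(residual, atom) for atom in dictionary]
        best = max(range(len(dictionary)), key=lambda i: abs(scores[i]))
        coeff = scores[best]
        picks.append((best, coeff))
        # Subtract the selected atom's contribution from the residual.
        residual = [r - coeff * a for r, a in zip(residual, dictionary[best])]
    return picks, residual

# Toy signal built from two atoms of the dictionary.
N = 32
atoms = [cosine_atom(k, N) for k in range(1, 9)]
toy = [3.0 * a + 1.0 * b for a, b in zip(atoms[1], atoms[4])]
picks, residual = matching_pursuit(toy, atoms, num_atoms=2)
print(picks)
```

Because the toy atoms are mutually orthogonal, two iterations recover the signal almost exactly. With redundant dictionaries, as used in practice, the residual typically decays more gradually.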


2013 ◽  
Vol 816-817 ◽  
pp. 839-842 ◽  
Author(s):  
Wei Jiao ◽  
Gang Yang ◽  
Wei Wei Fang

In-Band On-Channel (IBOC), an AM and FM digital broadcasting technology, is used in the HD Radio standard. In the hybrid mode of HD Radio, the spectra of the analog FM signal and the digital signal are combined in a fixed way that does not take full advantage of the spectrum. To solve this problem, this paper presents a Dynamic Spectrum Access (DSA) method that uses spectrum left idle in the time dimension. The improved method introduces a Perceptual Evaluation of Audio Quality (PEAQ) module, based on human auditory perception, as the criterion for measuring audio quality. Simulation results show that, while ensuring audio quality, the method saves considerable spectral bandwidth and thereby increases the available spectrum resources.


Author(s):  
Solekhan Solekhan ◽  
Yoyon K. Suprapto ◽  
Wirawan Wirawan

Impulsive spikes often occur in audio recordings of gamelan, and most existing methods only reduce them. This research offers a new method for processing impulsive spikes in gamelan music that can reduce, eliminate, or even strengthen them. The process separates the audio into harmonic and percussive components. The percussive component is raised or lowered, and the result is then recombined with the harmonic component. In a similarity test using the Cosine Distance method, spike enhancement through Harmonic Percussive Source Separation (HPSS) has an average Cosine Distance of 0.0004, meaning the output is similar to the original, while the Mean Square Error (MSE) has an average value of 0.0004, a very small average error that likewise indicates high similarity. In Perceptual Evaluation of Audio Quality (PEAQ) testing, the HPSS-based method achieves an average Objective Difference Grade (ODG) of -0.24, rated Imperceptible.
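The two similarity measures named above can be computed as in the sketch below. It assumes that cosine distance means one minus cosine similarity, which the abstract does not state, and the sample values are made up for illustration; they are not the study's data.

```python
import math

def cosine_distance(x, y):
    """1 - cosine similarity (assumed definition); 0 means same direction."""
    if len(x) != len(y):
        raise ValueError("signals must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine distance is undefined for an all-zero signal")
    return 1.0 - dot / (norm_x * norm_y)

def mean_squared_error(x, y):
    """Average squared sample-wise difference between two signals."""
    if len(x) != len(y):
        raise ValueError("signals must have the same length")
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)

# Hypothetical samples: an original signal and a lightly processed copy.
original = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5]
processed = [0.0, 0.5, 0.9, 0.5, 0.0, -0.5]
print(cosine_distance(original, processed))
print(mean_squared_error(original, processed))
```

Cosine distance ignores overall gain, while MSE does not, so reporting both gives complementary views of how closely the processed signal tracks the original.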


2020 ◽  
Vol 63 (4) ◽  
pp. 1018-1032
Author(s):  
Chia-Hsin Wu ◽  
Roger W. Chan

Purpose: Semi-occluded vocal tract (SOVT) exercises with tubes or straws have been widely used for a variety of voice disorders. Yet, the effects of longer periods of SOVT exercises (lasting for weeks) on the aging voice are not well understood. This study investigated the effects of a 6-week straw phonation in water (SPW) exercise program. Method: Thirty-seven elderly subjects with self-perceived voice problems were assigned to two groups: (a) SPW exercises with six weekly sessions and home practice (experimental group) and (b) vocal hygiene education (control group). Before and after intervention (2 weeks after the completion of the exercise program), acoustic analysis, auditory–perceptual evaluation, and self-assessment of vocal impairment were conducted. Results: Analysis of covariance revealed significant differences between the two groups in smoothed cepstral peak prominence measures, harmonics-to-noise ratio, the auditory–perceptual parameter of breathiness, and Voice Handicap Index-10 scores postintervention. No significant differences between the two groups were found for other measures. Conclusions: Our results supported the positive effects of SOVT exercises for the aging voice, with a 6-week SPW exercise program being a clinical option. Future studies should involve long-term follow-up and additional outcome measures to better understand the efficacy of SOVT exercises, particularly SPW exercises, for the aging voice.


2020 ◽  
Vol 63 (7) ◽  
pp. 2054-2069
Author(s):  
Brandon Merritt ◽  
Tessa Bent

Purpose: The purpose of this study was to investigate how speech naturalness relates to masculinity–femininity and gender identification (accuracy and reaction time) for cisgender male and female speakers as well as transmasculine and transfeminine speakers. Method: Stimuli included spontaneous speech samples from 20 speakers who are transgender (10 transmasculine and 10 transfeminine) and 20 speakers who are cisgender (10 male and 10 female). Fifty-two listeners completed three tasks: a two-alternative forced-choice gender identification task, a speech naturalness rating task, and a masculinity/femininity rating task. Results: Transfeminine and transmasculine speakers were rated as significantly less natural sounding than cisgender speakers. Speakers rated as less natural took longer to identify and were identified less accurately in the gender identification task; furthermore, they were rated as less prototypically masculine/feminine. Conclusions: Perceptual speech naturalness for both transfeminine and transmasculine speakers is strongly associated with gender cues in spontaneous speech. Training to align a speaker's voice with their gender identity may concurrently improve perceptual speech naturalness. Supplemental Material: https://doi.org/10.23641/asha.12543158


2020 ◽  
Vol 63 (12) ◽  
pp. 3974-3981
Author(s):  
Ashwini Joshi ◽  
Isha Baheti ◽  
Vrushali Angadi

Aim: The purpose of this study was to develop and assess the reliability of a Hindi version of the Consensus Auditory-Perceptual Evaluation of Voice (CAPE-V). Reliability was assessed by comparing Hindi CAPE-V ratings with English CAPE-V ratings and with ratings on the Grade, Roughness, Breathiness, Asthenia and Strain (GRBAS) scale. Method: Hindi sentences were created to match the phonemic load of the corresponding English CAPE-V sentences. The Hindi sentences were adapted for linguistic content. The original English and adapted Hindi CAPE-V and GRBAS were completed for 33 bilingual individuals with normal voice quality. Additionally, the Hindi CAPE-V and GRBAS were completed for 13 Hindi speakers with disordered voice quality. The agreement of CAPE-V ratings was assessed between language versions, GRBAS ratings, and two rater pairs (three raters in total). Pearson product–moment correlation was completed for all comparisons. Results: A strong correlation (r > .8, p < .01) was found between the Hindi CAPE-V scores and the English CAPE-V scores for most variables in normal voice participants. A weak correlation was found for the variable of strain (r < .2, p = .400) in the normative group. A strong correlation (r > .6, p < .01) was found between the overall severity/grade, roughness, and breathiness scores in the GRBAS scale and the CAPE-V scale in normal and disordered voice samples. Significant interrater reliability (r > .75) was present in overall severity and breathiness. Conclusions: The Hindi version of the CAPE-V demonstrates good interrater reliability and concurrent validity with the English CAPE-V and the GRBAS. The Hindi CAPE-V can be used for the auditory-perceptual voice assessment of Hindi speakers.
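For readers unfamiliar with the statistic, the Pearson product–moment correlation used for these comparisons can be computed as in the sketch below. The ratings shown are hypothetical values for illustration only, not data from the study, and the function name `pearson_r` is an assumption.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two paired samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired samples of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    covariance_sum = sum(a * b for a, b in zip(dx, dy))
    var_x_sum = sum(a * a for a in dx)
    var_y_sum = sum(b * b for b in dy)
    if var_x_sum == 0 or var_y_sum == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return covariance_sum / math.sqrt(var_x_sum * var_y_sum)

# Hypothetical severity ratings of the same voices in two language versions.
hindi_ratings = [12, 30, 45, 8, 60]
english_ratings = [10, 28, 50, 9, 55]
print(pearson_r(hindi_ratings, english_ratings))
```

The coefficient alone does not give the p values reported above; those require a significance test on r for the given sample size.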

