On the uniqueness of determination of a vocal source from a speech signal and formant frequencies

2012 ◽  
Vol 85 (3) ◽  
pp. 432-435 ◽  
Author(s):  
A. S. Leonov ◽  
V. N. Sorokin
2003 ◽  
Vol 60 (2) ◽  
pp. 155-159 ◽  
Author(s):  
Jovisa Obrenovic ◽  
Milkica Nesic ◽  
Vladimir Nesic ◽  
Snezana Cekic

This study investigated the influence of intensive acute hypoxia on the frequency-amplitude formant characteristics of the vowel O. Examinees were exposed to simulated altitudes of 5 500 m and 6 700 m in a climabaro chamber and completed Lotig's test in the conditions of normoxia, i.e. pronounced three-digit numbers beginning from 900, but in reverse order. Frequency and intensity values of the formants of the vowel O (F1, F2, F3 and F4), extracted from the pronunciation of the word eight (osam in Serbian), were measured by spectral analysis of the speech signal. Changes in the frequency values and the intensities of the formants were examined. The results showed no significant changes in formant frequencies under hypoxia compared to normoxia. However, significant changes in formant intensities compared to normoxia were found at the cited altitudes. A rise in formant intensities was found at the altitude of 5 500 m. Hypoxia at the altitude of 6 700 m caused a significant fall in the intensities in the initial period, compared to normoxia. Prolonged hypoxia exposure caused a rise in the formant intensities compared to the altitude of 5 500 m. It may be concluded that, depending on the altitude, hypoxia has different effects on formant structure compared to normoxia.
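
The abstract does not say how the spectral analysis was carried out. As a rough illustration only, the sketch below reads a peak frequency and its level in dB from the magnitude spectrum of one frame, searching inside a frequency band. The function name `band_peak`, the sampling rate, the band limits and the synthetic two-sinusoid frame are all assumptions made for this example and are not taken from the study. Real formants are resonances rather than pure tones, so this is a simplification.

```python
import cmath
import math


def band_peak(frame, sample_rate, f_lo, f_hi):
    """Return (frequency_hz, level_db) of the strongest DFT bin in [f_lo, f_hi].

    Hypothetical helper: a direct DFT is slow but needs no third-party library.
    """
    n = len(frame)
    best_freq, best_mag = None, 0.0
    for k in range(1, n // 2):
        freq = k * sample_rate / n
        if not (f_lo <= freq <= f_hi):
            continue
        # DFT coefficient of bin k, normalised by the frame length.
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(frame))
        mag = abs(acc) / n
        if mag > best_mag:
            best_freq, best_mag = freq, mag
    level_db = 20 * math.log10(best_mag) if best_mag > 0 else float("-inf")
    return best_freq, level_db


# Synthetic frame (assumed values): two sinusoids standing in for two peaks.
SAMPLE_RATE = 8000
N = 400  # gives 20 Hz bin spacing
frame = [1.0 * math.sin(2 * math.pi * 500 * i / SAMPLE_RATE)
         + 0.5 * math.sin(2 * math.pi * 900 * i / SAMPLE_RATE)
         for i in range(N)]

f1, level1 = band_peak(frame, SAMPLE_RATE, 300, 700)
f2, level2 = band_peak(frame, SAMPLE_RATE, 700, 1200)
print(f1, f2, round(level1 - level2, 2))
```

Comparing such levels between a normoxia recording and a hypoxia recording would give the kind of intensity difference the abstract reports, although the study's actual measurement procedure may differ.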


2002 ◽  
Author(s):  
Κωνσταντίνος Παστιάδης

The present doctoral thesis aims at developing new analysis techniques for highly disturbed voices such as pathological ones. Most of the currently available analysis schemes incorporate methods developed for normal or healthy voicing conditions. However, the existence of severe jitter, shimmer, noise and other forms of disturbance renders these methods inappropriate for the analysis of pathological voices. The thesis outline is as follows. In Chapter 1, the most important physiological and morphological elements of speech production are analyzed. Functional alterations of these elements due to pathological origins lead to the introduction of severe disturbances of glottal induced noise, jitter and shimmer in the radiated speech signal. Under this view, analysis of disordered speech consists of: 1) estimation of the parameters of the Vocal Tract and speech signal inversion into the glottal excitation, 2) derivation of speech and glottal disturbances, and 3) selection of important measures for the description of pathologic voices. In Chapter 2, we address the problem of Vocal Tract filter parameter estimation during noisy glottal function. Exponentially Damped Sinusoids (EDS) analysis is incorporated for this purpose, together with a new hybrid technique (MHOS-DEPE) of Higher-Order Statistics and an improved classical EDS analysis method. The examined methods are tested in parallel on synthetic signals. The MHOS-DEPE method is more efficient for lower SNRs and longer data records. The EDS analysis techniques perform considerably better than classical Linear Prediction. Based on the above framework, a new method for the exact estimation of the Inverse Vocal Tract Filter and Inverse Filtering is introduced (EDS-IF). The new method performs an exhaustive search for the parameters of the Vocal Tract in regions of previously detected formant frequencies.
The sub-band analysis and the lower order estimation, incorporated in the proposed method, greatly improve the efficiency of the Inverse Filter construction and the glottal excitation estimates. Chapter 3 deals with the estimation of disturbances in the radiated speech signal. A new method (EDS-SNRio) for the separation of the noise component is proposed. The method exploits the previous EDS analysis schemes, and the resulting SNR estimates avoid artifacts of classical methods due to high values of inharmonicity resulting from jitter/shimmer disturbances. The Waveform Matching (WM) technique is adopted for the estimation of objective measures of pitch and amplitude perturbation. Accordingly, Chapter 4 covers the problem of disturbance estimation in the glottal excitation. A denoising algorithm based on the Discrete Cosine Transform and EDS analysis is employed on a per-period basis. The proposed algorithm overcomes SNR estimation problems of classical methods and performs better than wavelet-based and low-pass filtering denoising. Again, the WM technique is adopted. Additionally, a new family of jitter measures is introduced. These are based on correlation estimation of the fundamental period from energy-thresholded portions of consecutive glottal cycles. The new indices may prove useful in the determination of possible non-linear behavior of the disturbance phenomena. Finally, a new spectrographic representation is introduced for the glottal disturbances under the term "Disturbogram". The Disturbogram and its underlying analytical method separate jitter, shimmer and noise in the glottal signal and offer quantitative and qualitative information about them. Its usefulness in clinical voice evaluation is demonstrated with both synthetic and real voice signals. In Chapter 5, the initial development of the first Greek Voice Pathology database is described. A complete protocol for recording and voice screening procedures is introduced.
An initial sample of 50 clinical patients of the University ORL Clinic of the AHEPA Hospital, Thessalonica, is collected and used for further voice analysis. Chapter 6 covers the acoustic analysis of the recorded pathological voices and the selection of important descriptive features. The inclusion of both speech and glottal disturbance indices confirms previously published findings about the range of vocal dysfunction. Rank correlation analysis, Principal Component Analysis and Mutual Information are employed for the selection of appropriate indices and the determination of independent measures for the description of pathologic voices. The selected disturbance indices may be grouped into independent axes in a way that reflects their functional origin (e.g. glottal vs. speech signals) and not their quantitative distinction (e.g. jitter, shimmer, noise, etc.). The Voice Component Profile (VCP) is a new graphic representation of the derived grouping of acoustic measures. For a normalized Euclidean distance measure, VCP proves 15% more efficient than the Hoarseness Diagram in the discrimination of voice polyps from the rest of the recorded pathologies. Similar findings are obtained for the discrimination of normal and pathologic voices. Finally, Chapter 7 reviews the objectives and findings of the thesis and comments on future research directions.
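
The pitch and amplitude perturbation measures mentioned in the abstract can be made concrete with the conventional "local" jitter and shimmer ratios: the mean absolute difference between consecutive cycles divided by the mean value. This is a minimal sketch of those textbook ratios only. It is not the Waveform Matching technique or the correlation-based jitter family proposed in the thesis, and the function name `local_perturbation` and the sample period and amplitude values are made up for illustration.

```python
def local_perturbation(values):
    """Mean absolute difference of consecutive values, divided by their mean.

    Applied to cycle periods this is the usual local jitter ratio;
    applied to cycle peak amplitudes it is the usual local shimmer ratio.
    """
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return mean_diff / mean_value


# Hypothetical per-cycle measurements (not data from the thesis).
periods_ms = [8.0, 8.2, 7.9, 8.1, 8.0]
peak_amps = [1.00, 0.95, 1.02, 0.98, 1.00]

jitter = local_perturbation(periods_ms)
shimmer = local_perturbation(peak_amps)
print(f"jitter = {100 * jitter:.2f} %, shimmer = {100 * shimmer:.2f} %")
```

Both ratios assume the glottal cycles have already been segmented reliably, which is the step the abstract identifies as difficult for severely disturbed voices.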

