Some comments on the reliability of three-index factor analysis models in speech research

2005 ◽  
Vol 42 ◽  
pp. 219-239
Author(s):  
Christian Geng ◽  
Phil Hoole

Low-dimensional and speaker-independent linear vocal tract parametrizations can be obtained using the 3-mode PARAFAC factor analysis procedure first introduced by Harshman et al. (1977) and discussed in a series of subsequent papers in the Journal of the Acoustical Society of America (Jackson (1988), Nix et al. (1996), Hoole (1999), Zheng et al. (2003)). Nevertheless, some important questions have been left unanswered; for example, none of the papers using this method has provided a consistent interpretation of the terms usually referred to as "speaker weights". This study explores what influences their reliability as a first step towards their consistent interpretation. With this in mind, we undertook a systematic comparison of the classical PARAFAC1 algorithm with a relaxed version of it, PARAFAC2. This comparison was carried out on two different corpora acquired by the articulograph, which varied in vowel qualities, consonantal contexts, and the paralinguistic features accent and speech rate. The difference between these statistical approaches can roughly be described as follows: In PARAFAC1, observation units pertain to the same set of variables and the observation units are comparable. In PARAFAC2, observations pertain to the same set of variables, but observation units are not comparable. Such a case arises readily in the present setting: The operationalization we took relies on the comparability of fleshpoint data acquired from different speakers, which need not be a good assumption due to influences like sensor placement and morphological conditions. In particular, the comparison between the two approaches is carried out by means of so-called "leverages" on the different component matrices, a concept originating in regression analysis, calculated as $v = \operatorname{diag}(A(A^{\top}A)^{-1}A^{\top})$ and delivering information on how "influential" a particular loading matrix is for the model.
This analysis could potentially be carried out component by component, but we confined ourselves to effects on the global factor structure. For vowels, the most influential loadings are those for the tense cognates of non-palatal vowels. For speakers, the most prominent result is the relative absence of effects of the paralinguistic variables. Results generally indicate that the model specification (i.e. PARAFAC1 or PARAFAC2) has rather little influence on vowel and subject components. The patterns for the articulators indicate that there are strong differences between speakers with respect to the most influential measurement as revealed by PARAFAC2: In particular, the most influential y-contribution is the tongue-back for some speakers and the tongue-dorsum for others. With respect to the speaker weights, again, the leverage patterns are very similar for both PARAFAC versions. These patterns converge with the results of the loading plots, where the articulator profiles seem to be most altered by the use of PARAFAC2. These findings, in general, are interpreted as evidence for the reliability of the PARAFAC1 speaker weights.
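The leverage formula quoted in the abstract, v = diag(A(AᵀA)⁻¹Aᵀ), can be illustrated with a minimal sketch. It assumes A is a loading matrix with full column rank and uses the identity A(AᵀA)⁻¹Aᵀ = QQᵀ, where Q holds an orthonormal basis of the columns of A. The function name `leverages` and the example matrix are hypothetical and are not taken from the paper.

```python
import math

def leverages(A):
    """Diagonal of A (A^T A)^{-1} A^T for a full-column-rank matrix A.

    A is a list of rows. The columns are orthonormalised with modified
    Gram-Schmidt; leverage i is then the squared norm of row i of Q.
    """
    n_rows, n_cols = len(A), len(A[0])
    columns = [[A[i][j] for i in range(n_rows)] for j in range(n_cols)]
    q_columns = []
    for col in columns:
        w = list(col)
        for q in q_columns:
            proj = sum(wi * qi for wi, qi in zip(w, q))
            w = [wi - proj * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm < 1e-12:
            raise ValueError("columns of A are linearly dependent")
        q_columns.append([wi / norm for wi in w])
    return [sum(q[i] ** 2 for q in q_columns) for i in range(n_rows)]

# Hypothetical 4 x 2 loading matrix (rows: levels of one mode, columns: factors).
A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
v = leverages(A)
```

Each leverage lies between 0 and 1, and the values sum to the number of factors, so rows with values near 1 dominate the fit of that mode.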

Author(s):  
Marianne Pouplier

One of the most fundamental problems in research on spoken language is to understand how the categorical, systemic knowledge that speakers have in the form of a phonological grammar maps onto the continuous, high-dimensional physical speech act that transmits the linguistic message. The invariant units of phonological analysis have no invariant analogue in the signal: any given phoneme can manifest itself in many possible variants, depending on context, speech rate, utterance position and the like, and the acoustic cues for a given phoneme are spread out over time across multiple linguistic units. Speakers and listeners are highly knowledgeable about the lawfully structured variation in the signal and they skillfully exploit articulatory and acoustic trading relations when speaking and perceiving. For the scientific description of spoken language understanding, this association between abstract, discrete categories and continuous speech dynamics remains a formidable challenge. Articulatory Phonology and the associated Task Dynamic model present one particular proposal on how to step up to this challenge using the mathematics of dynamical systems, with the central insight being that spoken language is fundamentally based on the production and perception of linguistically defined patterns of motion. In Articulatory Phonology, primitive units of phonological representation are called gestures. Gestures are defined based on linear second-order differential equations, giving them inherent spatial and temporal specifications. Gestures control the vocal tract at a macroscopic level, harnessing the many degrees of freedom in the vocal tract into low-dimensional control units. Phonology, in this model, thus directly governs the spatial and temporal orchestration of vocal tract actions.
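To make the idea of a gesture as a linear second-order system concrete, the following minimal sketch integrates m·x'' + b·x' + k·(x − target) = 0 with explicit Euler steps. Critical damping (b = 2·sqrt(m·k)), the parameter values, and the function name `simulate_gesture` are assumptions chosen for illustration; the abstract itself specifies only that the equations are linear and second order.

```python
import math

def simulate_gesture(x_init, target, stiffness, mass=1.0, dt=0.001, duration=2.0):
    """Trajectory of a point attractor m*x'' + b*x' + k*(x - target) = 0.

    Damping is set to the critical value (an assumption of this sketch),
    so the state approaches the target without oscillating.
    """
    damping = 2.0 * math.sqrt(mass * stiffness)
    x, v = x_init, 0.0
    trajectory = [x]
    for _ in range(int(round(duration / dt))):
        accel = (-damping * v - stiffness * (x - target)) / mass
        # Explicit Euler: both updates use the state from the previous step.
        x, v = x + dt * v, v + dt * accel
        trajectory.append(x)
    return trajectory

# Hypothetical constriction variable moving from 0.0 towards a target of 1.0.
traj = simulate_gesture(x_init=0.0, target=1.0, stiffness=100.0)
```

The target sets the spatial specification and the stiffness sets the time scale, which is one way to read the statement that gestures carry inherent spatial and temporal properties.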


2021 ◽  
Vol 19 (1) ◽  
Author(s):  
Susanne A. Elsner ◽  
Sam S. Salek ◽  
Andrew Y. Finlay ◽  
Anna Hagemeier ◽  
Catherine J. Bottomley ◽  
...  

Abstract. Background: The Family Reported Outcome Measure (FROM-16) assesses the impact of a patient’s chronic illness on the quality of life (QoL) of the patient’s partner or family members. The aim of the study was to translate the FROM-16, explore its structure, and validate it. Methods: The questionnaire was translated from English into German (forward, backward, four independent translators). Six interviews with family members were conducted to confirm the questionnaire's linguistic, conceptual, semantic and experiential equivalence and its practicability. The final German translation was tested for internal consistency, reproducibility and test validity. Criterion validity was tested by correlating the scores of the FROM-16 and the Global Health Scale (GHS). Principal component analysis, factor analysis, and confirmatory factor analysis were used to assess the questionnaire’s structure and its domains. Reliability and reproducibility were tested by computing the intraclass correlation coefficient (ICC) and by using a one-sample t-test of the hypothesis that the difference between the scores was not different from zero. Results: Overall, 83 family members (61% female, median age: 61 years) completed the questionnaire at two different times (mean interval: 22 days). Internal consistency was good for the FROM-16 scores (Cronbach’s α for total score = 0.86). In those with stable GHS, the ICC for the total score was 0.87 and the difference was not different from zero (p = 0.262), indicating reproducible results. A bi-factor model with a general factor including all items and two sub-factors comprising the items from the original 2-factor construct had the best fit. Conclusions: The German FROM-16 has good reliability, test validity and practicability. It can be considered an appropriate and generic tool to measure the QoL of a patient’s partner or family member.
Due to the presence of several cross-loadings we do not recommend the reporting of the scores of the two domains proposed for the original version of FROM-16 when using the German version. Thus, in reporting the results emphasis should be put on the total score. Trial registration: Retrospectively registered: DRKS00021070.
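As an illustration of the internal-consistency statistic reported above, the sketch below computes Cronbach's α from item scores using the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of total scores). The function name `cronbach_alpha` and the three-item data set are hypothetical; they are not FROM-16 data.

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha for a list of items.

    `items` holds one list of scores per questionnaire item; every list
    has one entry per respondent, in the same respondent order.
    """
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # total score per respondent
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))

# Hypothetical scores of four respondents on three items.
items = [[2, 4, 3, 5], [3, 5, 3, 4], [2, 5, 4, 5]]
alpha = cronbach_alpha(items)
```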


2015 ◽  
Vol 37 (5) ◽  
pp. 1201-1220 ◽  
Author(s):  
BENJAMIN G. SCHULTZ ◽  
IRENA O’BRIEN ◽  
NATALIE PHILLIPS ◽  
DAVID H. McFARLAND ◽  
DEBRA TITONE ◽  
...  

ABSTRACT When speakers engage in conversation, acoustic features of their utterances sometimes converge. We examined how the speech rate of participants changed when a confederate spoke at fast or slow rates during readings of scripted dialogues. A beat-tracking algorithm extracted the periodic relations between stressed syllables (beats) from acoustic recordings. The mean interbeat interval (IBI) between successive stressed syllables was compared across speech rates. Participants’ IBIs were smaller in the fast condition than in the slow condition; the difference between participants’ and the confederate's IBIs decreased across utterances. Cross-correlational analyses demonstrated mutual influences between speakers, with greater impact of the confederate on participants’ beat rates than vice versa. Beat rates converged in scripted conversations, suggesting speakers mutually entrain to one another's beat.
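The two quantities named in this abstract, the mean interbeat interval and a lagged cross-correlation between speakers, can be sketched as follows. This is only an illustration under simple assumptions (beat onsets already extracted, equal-length IBI series, Pearson correlation at a single non-negative lag); the function names and numbers are hypothetical and do not reproduce the authors' beat-tracking pipeline.

```python
import math

def mean_ibi(beat_times):
    """Mean interval (in seconds) between successive beat onsets."""
    intervals = [b - a for a, b in zip(beat_times, beat_times[1:])]
    return sum(intervals) / len(intervals)

def lagged_correlation(x, y, lag):
    """Pearson correlation between x[t] and y[t + lag] for lag >= 0."""
    n = min(len(x), len(y)) - lag
    xs, ys = x[:n], y[lag:lag + n]
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical beat onsets (seconds) and per-utterance IBI series.
ibi = mean_ibi([0.0, 0.5, 1.0, 1.5])
confederate = [0.5, 0.6, 0.4, 0.5, 0.7]
participant = [0.55, 0.5, 0.6, 0.4, 0.5]  # follows the confederate one utterance later
r_lag1 = lagged_correlation(confederate, participant, lag=1)
```

Comparing the correlation with the confederate leading against the correlation with the participant leading is one way to express the asymmetry of influence described above.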


2018 ◽  
Vol 8 (1) ◽  
pp. 69-85 ◽  
Author(s):  
Hemanta Kumar Palo ◽  
Mihir Narayan Mohanty ◽  
Mahesh Chandra

The shape, length, and size of the vocal tract and vocal folds vary with a person's age. Such variation may be due to age, sickness, or other conditions. Arguably, the features extracted from utterances for a recognition task may differ across age groups, and different emotions complicate the task further. A recognition system therefore demands suitable feature extraction and clustering techniques that can separate emotional utterances. Psychologists, criminal investigators, professional counselors, law enforcement agencies and a host of other such entities may find such analysis useful. In this article, emotion has been studied for three different age groups using basic age-dependent features such as pitch, speech rate, and log energy. The feature sets have been clustered for the different age groups using the K-means and Fuzzy c-means (FCM) algorithms for the boredom, sadness, and anger states. The authors' results suggest that the K-means algorithm outperformed the FCM algorithm in terms of clustering quality and computation time.
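For readers unfamiliar with the clustering step, here is a minimal sketch of K-means (Lloyd's algorithm) on two-dimensional feature vectors. The feature values, the fixed initial centroids, and the function name `kmeans` are hypothetical. FCM is not shown; it differs in assigning each point a graded membership in every cluster instead of a single label.

```python
def kmeans(points, initial_centroids, max_iter=100):
    """Lloyd's algorithm: returns (centroids, labels) for the given points."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = []
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            labels.append(dists.index(min(dists)))
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                updated.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster's centroid in place
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels

# Hypothetical, already normalised (pitch, speech rate) features for six utterances.
features = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, labels = kmeans(features, initial_centroids=[(0.0, 0.0), (10.0, 10.0)])
```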


Author(s):  
Jay Ryan U. Roldan ◽  
Dejan Milutinović ◽  
Zhi Li ◽  
Jacob Rosen

In this paper, we propose a quantitative approach based on identifying hand trajectory dissimilarities through the use of a multidimensional scaling (MDS) analysis. A high-rate motion capture system is used to gather three-dimensional (3D) trajectory data of healthy and stroke-impacted hemiparetic subjects. The mutual dissimilarity between any two trajectories is measured by the area between them. This area is used as a dissimilarity variable to create an MDS map. The map reveals a structure for measuring the difference and variability of individual trajectories and their groups. The results suggest that the recovery of hemiparetic subjects can be quantified by comparing the difference and variability of their individual MDS map points to the points from the cluster of healthy subject trajectories. Within the MDS map, we can identify fully recovered patients, those who are only functionally recovered, and those who are either in an early phase of the therapy or nonresponsive to it.
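The dissimilarity step described here can be sketched as follows. The abstract does not say how the area between two 3D trajectories is computed, so this sketch makes an assumption: both trajectories have the same number of samples, and the surface between them is approximated by two triangles per pair of consecutive samples. All names and data are hypothetical, and the MDS embedding itself is not shown; the resulting matrix would be passed to an MDS routine.

```python
import math

def triangle_area(p, q, r):
    """Area of the 3D triangle pqr via the cross product of two edges."""
    u = [b - a for a, b in zip(p, q)]
    w = [b - a for a, b in zip(p, r)]
    cx = u[1] * w[2] - u[2] * w[1]
    cy = u[2] * w[0] - u[0] * w[2]
    cz = u[0] * w[1] - u[1] * w[0]
    return 0.5 * math.sqrt(cx * cx + cy * cy + cz * cz)

def area_between(traj_a, traj_b):
    """Approximate area of the surface spanned between two sampled trajectories."""
    total = 0.0
    for i in range(len(traj_a) - 1):
        a0, a1 = traj_a[i], traj_a[i + 1]
        b0, b1 = traj_b[i], traj_b[i + 1]
        # Split the quadrilateral a0-a1-b1-b0 into two triangles.
        total += triangle_area(a0, a1, b0) + triangle_area(a1, b1, b0)
    return total

def dissimilarity_matrix(trajectories):
    """Pairwise areas, usable as input to an MDS routine."""
    n = len(trajectories)
    return [[area_between(trajectories[i], trajectories[j]) for j in range(n)]
            for i in range(n)]

# Two hypothetical straight reaches, one unit apart and two units long.
reach_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
reach_b = [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0), (2.0, 1.0, 0.0)]
D = dissimilarity_matrix([reach_a, reach_b])
```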


2021 ◽  
Vol 10 (10(6)) ◽  
pp. 1741-1757
Author(s):  
Nkululeko Funyane

This study sought to assess whether the importance attached by customers to airline service attributes differed across low-cost and full-service airline models. A Mann-Whitney U Test was used to assess the difference between the two models. However, before subjecting the data to differential tests, an exploratory factor analysis (maximum likelihood) was performed on the fifty-five items of service attributes, reducing them to forty-two items retained in ten latent factors (airline service attributes). The results of the test revealed a significant difference in the importance attached to staff competence, courtesy and responsiveness only. Such findings suggest that the positioning of airlines into binary (FSC - LCC) models could be a waste of effort and resources since airlines seem to be converging.
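The statistic behind the Mann-Whitney U test can be illustrated with a short sketch that counts, over all cross-group pairs, how often a rating from one group exceeds a rating from the other, with ties counted as one half. The ratings and the function name `mann_whitney_u` are hypothetical, and no p-value is computed here; a statistics library would normally be used for the significance test.

```python
def mann_whitney_u(x, y):
    """Return (U_x, U_y) for two independent samples.

    U_x counts the pairs in which a value from x exceeds a value from y,
    with ties counted as 0.5; U_x + U_y always equals len(x) * len(y).
    """
    u_x = 0.0
    for a in x:
        for b in y:
            if a > b:
                u_x += 1.0
            elif a == b:
                u_x += 0.5
    return u_x, len(x) * len(y) - u_x

# Hypothetical importance ratings for one service attribute.
low_cost = [3, 4, 2, 5]
full_service = [5, 6, 4, 7]
u_low_cost, u_full_service = mann_whitney_u(low_cost, full_service)
```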

