On cross-language experiments and data-driven units for ALISP (Automatic Language Independent Speech Processing)

Author(s):  
A. Constantinescu ◽  
G. Chollet
2018 ◽  
Author(s):  
Jonathan Henry Venezia ◽  
Steven Matthew Thurman ◽  
Virginia Richards ◽  
Gregory Hickok

Existing data indicate that cortical speech processing is hierarchically organized. Numerous studies have shown that early auditory areas encode fine acoustic details while later areas encode abstracted speech patterns. However, it remains unclear precisely what speech information is encoded across these hierarchical levels. Estimation of speech-driven spectrotemporal receptive fields (STRFs) provides a means to explore cortical speech processing in terms of acoustic or linguistic information associated with characteristic spectrotemporal patterns. Here, we estimate STRFs from cortical responses to continuous speech in fMRI. Using a novel approach based on filtering randomly-selected spectrotemporal modulations (STMs) from aurally-presented sentences, STRFs were estimated for a group of listeners and categorized using a data-driven clustering algorithm. ‘Behavioral STRFs’ highlighting STMs crucial for speech recognition were derived from intelligibility judgments. Clustering revealed that STRFs in the supratemporal plane represented a broad range of STMs, while STRFs in the lateral temporal lobe represented circumscribed STM patterns important to intelligibility. Detailed analysis recovered a bilateral organization with posterior-lateral regions preferentially processing STMs associated with phonological information and anterior-lateral regions preferentially processing STMs associated with word- and phrase-level information. Regions in lateral Heschl’s gyrus preferentially processed STMs associated with vocalic information (pitch).


2019 ◽  
Vol 63 (2) ◽  
pp. 242-263
Author(s):  
Leona Polyanskaya ◽  
Maria Grazia Busà ◽  
Mikhail Ordin

We tested the hypothesis that languages can be classified by their degree of tonal rhythm (Jun, 2014). The tonal rhythms of English and Italian were quantified using the following parameters: (a) regularity of tonal alternations in time, measured as durational variability in peak-to-peak and valley-to-valley intervals; (b) magnitude of F0 excursions, measured as the range of frequencies covered by the speaker between consecutive F0 maxima and minima; (c) number of tonal target points per intonational unit; and (d) similarity of F0 rising and falling contours within intonational units. The results show that, as predicted by Jun’s prosodic typology (2014), Italian has a stronger tonal rhythm than English, expressed by higher regularity in the distribution of F0 minima turning points, larger F0 excursions, and more frequent tonal targets, indicating alternating phonological H and L tones. This cross-language difference can be explained by the relative load of F0 and durational ratios on the perception and production of speech rhythm and prominence. We suggest that research on the role of speech rhythm in speech processing and language acquisition should not be restricted to syllabic rhythm, but should also examine the role of cross-language differences in tonal rhythm.
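
As a rough illustration of parameters (a) and (b) above, the following Python sketch computes a durational-variability score for peak-to-peak and valley-to-valley intervals and a mean F0 excursion between consecutive turning points. The abstract does not state which variability measure or frequency scale was used, so the coefficient of variation, the Hz scale, the function names, and the sample contour are all assumptions made for illustration only.

```python
from statistics import mean, pstdev


def interval_variability(times):
    """Coefficient of variation of the intervals between successive time points.

    Assumption: variability is expressed as population SD / mean of intervals;
    the study itself may have used a different measure.
    """
    intervals = [b - a for a, b in zip(times, times[1:])]
    return pstdev(intervals) / mean(intervals)


def mean_excursion(turning_points):
    """Mean absolute F0 difference (Hz) between consecutive turning points."""
    f0 = [hz for _, hz in turning_points]
    return mean(abs(b - a) for a, b in zip(f0, f0[1:]))


# Hypothetical contour: (time in s, F0 in Hz), alternating peaks and valleys.
contour = [(0.10, 220.0), (0.30, 180.0), (0.50, 230.0), (0.70, 175.0), (0.90, 225.0)]
peak_times = [t for i, (t, _) in enumerate(contour) if i % 2 == 0]
valley_times = [t for i, (t, _) in enumerate(contour) if i % 2 == 1]

peak_cv = interval_variability(peak_times)      # lower value = more regular peaks
valley_cv = interval_variability([0.30, 0.70, 1.30])  # an irregular example
excursion = mean_excursion(contour)
```

Under these assumptions, a lower coefficient of variation corresponds to more regular tonal alternation, and a larger mean excursion corresponds to a wider F0 range between neighbouring maxima and minima.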


2019 ◽  
Author(s):  
Sophie Bouton ◽  
Valerian Chambon ◽  
Narly Golestani ◽  
Anne-Lise Giraud

Brain functions are increasingly explored using data-driven methods, which make it possible to work with very large datasets collected in relatively natural experimental settings. However, like hypothesis-driven approaches, data-driven methods are not without drawbacks and pose interpretation problems, particularly in cognitive domains such as speech and language, where temporal processing is a key component. While hypothesis-driven methods explicitly address speech processing as a hierarchical system, data-driven approaches probe speech processing as a system that can flexibly combine multiple and distributed features. Given the disparity of available methods and underlying concepts, synthesizing the results of hypothesis- and data-driven experiments represents a substantial challenge. Taking a number of influential examples in the recent speech and language literature, we unpack advantages and limitations of both approaches, and highlight ways in which they can be fruitfully combined, for example by using time-resolved analyses, by applying specific models at each level of information transformation, or more generally by complementing data-driven, exploratory approaches with analysis methods that question the data within more constrained model-spaces.


Author(s):  
Shahina Haque

The chapter provides an overview of the theory of speech production, analysis, and synthesis, and of the status of Bangla speech processing. As nasality is a distinctive feature of Bangla and all the vowels have nasal counterparts, both Bangla vowels and nasality are also considered. The chapter reviews the state of the art of nasal vowel research, cross-language perception of vowel nasality, and vowel nasality transformation to be used in a speech synthesizer.


1991 ◽  
Vol 73 (1) ◽  
pp. 227-234
Author(s):  
Minola A. Pinard

Using a developmental approach, two aspects of debate in the speech perception literature were tested: (a) the nature of adult speech processing, the dichotomy being along nonlinguistic versus linguistic lines, and (b) the nature of speech processing by children of different ages, the hypotheses here implying detector-like processes in infancy and “adult-like” reorganizations of speech perception at age four. Children ranging in age from 4 to 18 years discriminated native and foreign speech contrasts. Results confirm the hypotheses for adults. It is clear that different processes are operating at different ages; however, more complex processes may come into play around the ages of 6 to 10 years, boys may use different strategies than girls, and, with age, a multiplicity of processes may be concurrently active.


2009 ◽  
Vol 14 (1) ◽  
pp. 78-89 ◽  
Author(s):  
Kenneth Hugdahl ◽  
René Westerhausen

The present paper is based on a talk on hemispheric asymmetry given by Kenneth Hugdahl at the Xth European Congress of Psychology, Praha, July 2007. Here, we propose that hemispheric asymmetry evolved because of a left hemisphere speech processing specialization. The evolution of speech and the need for air-based communication necessitated division of labor between the hemispheres in order to avoid having duplicate copies in both hemispheres that would increase processing redundancy. It is argued that the neuronal basis of this labor division is the structural asymmetry observed in the peri-Sylvian region in the posterior part of the temporal lobe, with a larger planum temporale area on the left than on the right. This is the only example where a structural, or anatomical, asymmetry matches a corresponding functional asymmetry. The increase in gray matter volume in the left planum temporale area corresponds to a functional asymmetry of speech processing, as indexed by both behavioral (dichotic listening) and functional neuroimaging studies. The functional anatomy of the corpus callosum also supports such a view, with regional specificity of information transfer between the hemispheres.


2004 ◽  
Vol 20 (4) ◽  
pp. 349-357 ◽  
Author(s):  
Ahmed M. Abdel-Khalek ◽  
Joaquin Tomás-Sabádo ◽  
Juana Gómez-Benito

Summary: To construct a Spanish version of the Kuwait University Anxiety Scale (S-KUAS), the Arabic and English versions of the KUAS have been separately translated into Spanish. To check the comparability in terms of meaning, the two Spanish preliminary translations were thoroughly scrutinized vis-à-vis both the Arabic and English forms by several experts. Bilingual subjects served to explore the cross-language equivalence of the English and Spanish versions of the KUAS. The correlation between the total scores on both versions was .93, and the t value was .30 (n.s.), denoting good similarity. The Alphas and 4-week test-retest reliabilities were greater than .84, while the criterion-related validity was .70 against scores on the trait subscale of the STAI. These findings denote good reliability and validity of the S-KUAS. Factor analysis yielded three high-loaded factors of Behavioral/Subjective, Cognitive/Affective, and Somatic Anxiety, equivalent to the original Arabic version. Female (n = 210) undergraduates attained significantly higher mean scores than their male (n = 102) counterparts. For the combined group of males and females, the correlation between the total score on the S-KUAS and age was -.17 (p < .01). By and large, the findings of the present study provide evidence of the utility of the S-KUAS in assessing trait anxiety levels in the Spanish undergraduate context.
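
For readers unfamiliar with the reliability index mentioned above, the sketch below computes an internal-consistency coefficient, assuming that "Alphas" refers to Cronbach's alpha. The function name and the response matrix are hypothetical and are not data from the study.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a matrix with one row per respondent, one column per item.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(responses[0])                      # number of items
    items = list(zip(*responses))              # transpose: one tuple per item
    item_var_sum = sum(variance(item) for item in items)
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical responses (4 respondents x 3 items), for illustration only.
sample = [
    [1, 2, 1],
    [2, 3, 2],
    [3, 3, 4],
    [4, 4, 4],
]
alpha = cronbach_alpha(sample)
```

Values closer to 1 indicate that the items vary together across respondents; the thresholds a study treats as acceptable are a matter of convention.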

