Modelling automatic detection of prosodic boundaries for Brazilian Portuguese spontaneous speech

2020 ◽  
Vol 9 ◽  
pp. 105-128
Author(s):  
Tommaso Raso ◽  
Bárbara Teixeira ◽  
Plínio Barbosa

Speech is segmented into intonational units marked by prosodic boundaries. This segmentation is claimed to have important consequences for syntax, information structure and cognition. This work aims both to investigate the phonetic-acoustic parameters that guide the production and perception of prosodic boundaries and to develop models for the automatic detection of prosodic boundaries in male monological spontaneous speech of Brazilian Portuguese. Two samples were segmented into intonational units by two groups of trained annotators. The boundaries perceived by the annotators were tagged as either terminal or non-terminal. A script was used to extract 111 phonetic-acoustic parameters along the speech signal in left and right windows around the boundary of each phonological word. The extracted parameters comprise measures of (1) speech rate and rhythm; (2) standardized segment duration; (3) fundamental frequency; (4) intensity; and (5) silent pause. The script treats as a prosodic boundary any position at which at least 50% of the annotators indicated a boundary of the same type. Models composed of the parameters extracted by the script were trained and then improved heuristically. The models were developed from each of the two samples and from the whole data set, using both non-balanced and balanced data. The Linear Discriminant Analysis algorithm was adopted to produce the models. The models for terminal boundaries perform much better than those for non-terminal ones. In this paper we (i) present the methodological procedures; (ii) analyze the different models; and (iii) discuss some strategies that could improve our results.
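The 50% agreement rule described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' script: the label names "T" (terminal) and "NT" (non-terminal), the function names, and the decision to leave an exact split between the two types unlabeled are all assumptions made for illustration.

```python
from collections import Counter
from typing import List, Optional, Sequence


def consensus_label(votes: Sequence[Optional[str]], threshold: float = 0.5) -> Optional[str]:
    """Return the boundary type chosen by at least `threshold` of the annotators.

    Each vote is "T", "NT", or None (no boundary perceived).
    """
    counts = Counter(v for v in votes if v is not None)
    needed = threshold * len(votes)
    winners = [label for label, n in counts.items() if n >= needed]
    # Assumption: if both types reach the threshold (an exact split),
    # the position is left unlabeled rather than forced into one class.
    return winners[0] if len(winners) == 1 else None


def consensus_boundaries(annotations: Sequence[Sequence[Optional[str]]]) -> List[Optional[str]]:
    """annotations[a][w] is annotator a's label at the boundary after word w."""
    return [consensus_label(column) for column in zip(*annotations)]


# Hypothetical data: four annotators, three word boundaries.
annotations = [
    ["NT", None, "T"],
    ["NT", None, "T"],
    [None, "NT", "T"],
    [None, None, "NT"],
]
labels = consensus_boundaries(annotations)
print(labels)
```

The consensus labels produced this way would serve as the reference classes against which a classifier is trained.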

2018 ◽  
Vol 26 (4) ◽  
pp. 1455 ◽  
Author(s):  
Bárbara Helohá Falcão Teixeira ◽  
Maryualê Malvessi Mittmann

Abstract: This work presents the results of the analysis of multiple acoustic parameters for the construction of a model for the automatic segmentation of speech into tone units. Based on a literature review, we defined sets of acoustic parameters related to the signaling of terminal and non-terminal boundaries. For each parameter, we extracted a series of measurements: 6 for speech rate and rhythm; 34 for duration; 65 for fundamental frequency; 4 for intensity; and 2 related to pauses. These parameters were extracted from spontaneous speech fragments that had previously been segmented manually into tone units by 14 human annotators. We used two methods of statistical classification, Random Forest (RF) and Linear Discriminant Analysis (LDA), to generate models for the identification of prosodic boundaries. After several phases of training and testing, both methods were relatively successful in identifying terminal and non-terminal boundaries. The LDA method was more accurate than the RF method in predicting terminal and non-terminal boundaries; therefore, the model obtained with LDA was further refined. As a result, the terminal boundary model is based on 20 acoustic measurements and shows a convergence of 80% with the boundaries identified by annotators in the speech sample. For non-terminal boundaries, we arrived at three models that, combined, showed a convergence of 98% with the boundaries identified by annotators in the sample.
Keywords: speech segmentation; prosodic boundaries; spontaneous speech.


2018 ◽  
Vol 61 (5) ◽  
pp. 1188-1202
Author(s):  
Talita Fortunato-Tavares ◽  
Richard G. Schwartz ◽  
Klara Marton ◽  
Claudia F. de Andrade ◽  
Derek Houston

Purpose: This study investigated prosodic boundary effects on the comprehension of attachment ambiguities in children with cochlear implants (CIs) and normal hearing (NH) and tested the absolute boundary hypothesis and the relative boundary hypothesis. Processing speed was also investigated. Method: Fifteen children with NH and 13 children with CIs (ages 8–12 years) who are monolingual speakers of Brazilian Portuguese participated in a computerized comprehension task with sentences containing prepositional phrase attachment ambiguity and manipulations of prosodic boundaries. Results: Children with NH and children with CIs differed in how they used prosodic forms to disambiguate sentences. Children in both groups provided responses consistent with half of the predictions of the relative boundary hypothesis. The absolute boundary hypothesis did not characterize the syntactic disambiguation of children with CIs. Processing speed was similar in both groups. Conclusions: Children with CIs do not use prosodic information to disambiguate sentences or to facilitate comprehension of unambiguous sentences similarly to children with NH. The results suggest that cross-linguistic differences may interact with syntactic disambiguation. Prosodic contrasts that affect sentence comprehension need to be addressed directly in intervention with children with CIs.


PLoS ONE ◽  
2021 ◽  
Vol 16 (5) ◽  
pp. e0250969
Author(s):  
Tirza Biron ◽  
Daniel Baum ◽  
Dominik Freche ◽  
Nadav Matalon ◽  
Netanel Ehrmann ◽  
...  

Automatic speech recognition (ASR) and natural language processing (NLP) are expected to benefit from an effective, simple, and reliable method to automatically parse conversational speech. The ability to parse conversational speech depends crucially on the ability to identify boundaries between prosodic phrases. This is done naturally by the human ear, yet has proved surprisingly difficult to achieve reliably and simply in an automatic manner. Efforts to date have focused on detecting phrase boundaries using a variety of linguistic and acoustic cues. We propose a method which does not require model training and utilizes two prosodic cues that are based on ASR output. Boundaries are identified using discontinuities in speech rate (pre-boundary lengthening and phrase-initial acceleration) and silent pauses. The resulting phrases preserve syntactic validity, exhibit pitch reset, and compare well with manual tagging of prosodic boundaries. Collectively, our findings support the notion of prosodic phrases that represent coherent patterns across textual and acoustic parameters.
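The two cues named in this abstract, silent pauses and speech-rate discontinuities, can be sketched over word timings of the kind an ASR system returns. The following is a hedged illustration and not the published method: the `Word` structure, the use of seconds per character as a speech-rate proxy, and both threshold values are assumptions chosen only to make the example concrete.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds


def seconds_per_char(word: Word) -> float:
    """Crude speech-rate proxy (assumption): word duration per character."""
    return (word.end - word.start) / max(len(word.text), 1)


def detect_boundaries(words: Sequence[Word],
                      min_pause: float = 0.2,
                      slowdown_ratio: float = 1.8) -> List[int]:
    """Return indices i such that a phrase boundary is placed after words[i].

    A boundary is placed when a silent pause follows the word, or when the
    word is much slower than the next one (pre-boundary lengthening followed
    by phrase-initial acceleration). Thresholds are illustrative only.
    """
    boundaries = []
    for i in range(len(words) - 1):
        cur, nxt = words[i], words[i + 1]
        pause = nxt.start - cur.end
        next_rate = seconds_per_char(nxt)
        slowed = next_rate > 0 and seconds_per_char(cur) / next_rate >= slowdown_ratio
        if pause >= min_pause or slowed:
            boundaries.append(i)
    return boundaries


# Hypothetical word timings.
words = [
    Word("so", 0.0, 0.2),
    Word("yesterday", 0.2, 1.1),
    Word("morning", 1.1, 2.5),   # lengthened before a boundary
    Word("we", 2.5, 2.6),        # fast phrase-initial word
    Word("left", 2.6, 2.9),
    Word("early", 3.4, 3.8),     # preceded by a 0.5 s pause
]
found = detect_boundaries(words)
print(found)
```

No model is trained here; the sketch only applies fixed rules to timing information, which is the general idea the abstract describes.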


Author(s):  
Frederico Amorim Cavalcante ◽  
Tommaso Raso ◽  
Giulia Bossaglia ◽  
Maryualê Mittmann ◽  
Bruno Rocha

This paper deals with an inter-annotator agreement test involving the identification of the information unit of Topic as defined within the framework of the Language into Act Theory (L-AcT). Fleiss’s kappa statistic was used to measure the agreement among the four annotators who took part in the test. The data used were sampled from C-ORAL-BRASIL II, a spontaneous speech corpus of Brazilian Portuguese. The paper begins by outlining the theoretical underpinnings of L-AcT, with special attention to aspects directly related to the notion of Topic. Section 2 presents the pilot test and discusses methodological and theoretical issues that were relevant for the design of the protocol that was eventually used in the actual test. Sections 3 and 4 deal with the test, its protocol and results (the kappa coefficient for the general agreement was 0.79, which by usual standards represents substantial agreement). Section 5 first provides a brief review of a few studies conducted according to other frameworks which have dealt with inter-rater agreement on the annotation of information structure categories. Finally, the errors observed in the test are analyzed qualitatively.
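For readers unfamiliar with the statistic, Fleiss's kappa can be computed as in the short Python sketch below. The formula is the standard one; the input layout (one row per item, one column per category, each cell holding the number of raters who chose that category) and the toy counts are assumptions for illustration and are not data from the paper.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss's kappa for a table where counts[i][j] is the number of raters
    who assigned item i to category j. Every row must sum to the same number
    of raters (at least 2)."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number of raters (>= 2)")

    # Observed agreement: mean over items of the proportion of agreeing rater pairs.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_items

    # Chance agreement from the overall category proportions.
    n_categories = len(counts[0])
    proportions = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    p_e = sum(p * p for p in proportions)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when only one category is ever used")
    return (p_bar - p_e) / (1.0 - p_e)


# Hypothetical example: four raters, categories (Topic, not Topic).
table = [[4, 0], [3, 1], [0, 4], [1, 3]]
kappa = fleiss_kappa(table)
print(round(kappa, 3))
```

On the commonly used Landis and Koch scale, values between 0.61 and 0.80 are read as substantial agreement, which is the interpretation the abstract gives for 0.79.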


2018 ◽  
Vol 15 (2) ◽  
pp. 130-138 ◽  
Author(s):  
Laszlo Toth ◽  
Ildiko Hoffmann ◽  
Gabor Gosztolya ◽  
Veronika Vincze ◽  
Greta Szatloczki ◽  
...  

Background: Even today the reliable diagnosis of the prodromal stages of Alzheimer's disease (AD) remains a great challenge. Our research focuses on the earliest detectable indicators of cognitive decline in mild cognitive impairment (MCI). Since the presence of language impairment has been reported even in the mild stage of AD, the aim of this study is to develop a sensitive neuropsychological screening method based on the analysis of spontaneous speech production during a memory task. In the future, this can form the basis of Internet-based interactive screening software for the recognition of MCI. Methods: Participants were 38 healthy controls and 48 clinically diagnosed MCI patients. Spontaneous speech was provoked by asking the participants to recall the content of 2 short black and white films (one immediately, one after a delay) and to answer one question. Acoustic parameters (hesitation ratio, speech tempo, length and number of silent and filled pauses, length of utterance) were extracted from the recorded speech signals, first manually (using the Praat software) and then automatically, with an automatic speech recognition (ASR) based tool. First, the extracted parameters were statistically analyzed. Then we applied machine learning algorithms to see whether the MCI and the control group can be discriminated automatically based on the acoustic features. Results: The statistical analysis showed significant differences for most of the acoustic parameters (speech tempo, articulation rate, silent pause, hesitation ratio, length of utterance, pause-per-utterance ratio). The most significant differences between the two groups were found in the speech tempo in the delayed recall task and in the number of pauses in the question-answering task. The fully automated version of the analysis process (that is, using the ASR-based features in combination with machine learning) was able to separate the two classes with an F1-score of 78.8%. Conclusion: The temporal analysis of spontaneous speech can be exploited in implementing a new, automatic detection-based tool for screening MCI for the community.
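The F1-score reported above is the harmonic mean of precision and recall for one class. The sketch below shows the standard computation; the class names "MCI" and "HC" and the toy predictions are hypothetical and do not reproduce the study's data or its classifier.

```python
from typing import Sequence


def f1_score(y_true: Sequence[str], y_pred: Sequence[str], positive: str) -> float:
    """F1 for the `positive` class: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        # No correct positive predictions: precision and recall are 0 or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: "MCI" = patient, "HC" = healthy control.
y_true = ["MCI", "MCI", "MCI", "HC", "HC"]
y_pred = ["MCI", "MCI", "HC", "MCI", "HC"]
score = f1_score(y_true, y_pred, positive="MCI")
print(round(score, 3))
```

Because F1 ignores true negatives, it is usually reported together with the class sizes, which the abstract gives as 38 controls and 48 MCI patients.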


2021 ◽  
pp. 002383092110333
Author(s):  
Katy Carlson ◽  
David Potter

There is growing evidence that pitch accents as well as prosodic boundaries can affect syntactic attachment. But is this an effect of their perceptual salience (the Salience Hypothesis), or is it because accents mark the position of focus (the Focus Attraction Hypothesis)? A pair of auditory comprehension experiments shows that focus position, as indicated by preceding wh-questions instead of by pitch accents, affects attachment by drawing the ambiguous phrase to the focus. This supports the Focus Attraction Hypothesis (or a pragmatic version of salience) for both these results and previous results on the effect of accents on attachment. These experiments show that information structure, as indicated with prosody or other means, influences sentence interpretation, and suggest a view on which modifiers are drawn to the most important information in a sentence.


2020 ◽  
pp. 1-21 ◽  
Author(s):  
Clément Dalloux ◽  
Vincent Claveau ◽  
Natalia Grabar ◽  
Lucas Emanuel Silva Oliveira ◽  
Claudia Maria Cabral Moro ◽  
...  

Automatic detection of negated content is often a prerequisite in information extraction systems in various domains. In the biomedical domain especially, this task is important because negation plays an important role. In this work, two main contributions are proposed. First, we work with languages which have been poorly addressed up to now: Brazilian Portuguese and French. We therefore developed new corpora for these two languages, manually annotated to mark up the negation cues and their scope. Second, we propose automatic methods based on supervised machine learning approaches for the automatic detection of negation cues and of their scopes. The methods prove to be robust in both languages (Brazilian Portuguese and French) and in cross-domain (general and biomedical languages) contexts. The approach is also validated on English data from the state of the art: it yields very good results and outperforms other existing approaches. In addition, the application is accessible and usable online. We assume that, through these contributions (new annotated corpora, an application accessible online, and cross-domain robustness), the reproducibility of the results and the robustness of the NLP applications will be improved.


2016 ◽  
Vol 8 (2) ◽  
pp. 139-153
Author(s):  
Judit Nagy

The management of given and new information is one of the key components of accomplishing coherence in oral discourse, which is claimed to be a problematic area for language learners (Celce-Murcia, Dörnyei, and Thurrell 1995: 14). Research on discourse intonation proposes that instead of the given/new dichotomy, givenness should be viewed as a continuum, with different types of accessibility (Baumann & Grice 2006). Moreover, Prince (1992) previously categorized information structure into Hearer-old/Hearer-new and Discourse-old/Discourse-new information. There is consensus that focus or prominence associated with new information is marked with nuclear pitch accent, and its main acoustic cue, fundamental frequency (f0) (Ward & Birner 2001: 120). Non-native intonation has been reported to display numerous differences in f0 range and patterns compared to native speech (Wennerstrom 1994; Baker 2010). This study is an attempt to address the issue of marking information structure in existential there sentences by means of f0 in non-native spontaneous speech. The data originate from task-based interactions in the Wildcat Corpus of Native- and Foreign-Accented English (Van Engen et al. 2010). This paper examines two issues: (1) information structure in relation to the notions of givenness and different types of accessibility (Baumann & Grice 2006) and to Prince’s (1992) multidimensional taxonomy and (2) the use of f0 peaks to mark the prominence of new information. Several differences were measured among native speakers regarding the use of f0, sentence type, and complexity.

