scholarly journals Comparison of Two Forced Alignments Systems for Aligning Bribri Speech

Author(s):  
Sofía Flores Solórzano ◽  
Rolando Coto-Solano

Abstract: Forced alignment provides drastic savings in time when aligning speech recordings and is particularly useful for the study of Indigenous languages, which are severely under-resourced in corpora and models. Here we compare two forced alignment systems, FAVE-align and EasyAlign, to determine which one provides more precision when processing running speech in the Chibchan language Bribri. We aligned a segment of a story narrated in Bribri and compared the errors in finding the center of the words and the edges of phonemes when compared with the manual correction. FAVE-align showed better performance: It has an error of 7% compared to 24% with EasyAlign when finding the center of words, and errors of 22~24 ms when finding the edges of phonemes, compared to errors of 86~130 ms with EasyAlign. In addition to this, EasyAlign failed to detect 7% of phonemes, while also inserting 58 spurious phones into the transcription. Future research includes verifying these results for other genres and other Chibchan languages. Finally, these results provide additional evidence for the applicability of natural language processing methods to Chibchan languages and point to future work such as the construction of corpora and the training of automated speech recognition systems.  Spanish Abstract: El alineamiento forzado provee un ahorro drástico de tiempo al alinear grabaciones del habla, y es útil para el estudio de las lenguas indígenas, las cuales cuentan con pocos recursos para generar corpus y modelos computacionales. Aquí comparamos dos sistemas de alineamiento, FAVE-align e EasyAlign, para determinar cuál provee mayor precisión al alinear habla en la lengua chibcha bribri. Alineamos una narración y comparamos el error al tratar de encontrar el centro de las palabras y los bordes de los fonemas con sus equivalentes en una corrección manual. FAVE-align tuvo mejor rendimiento, con un error de 7% comparado con 24% de EasyAlign para el centro de las palabras, y con errores de 22~24 ms para el borde de los fonemas, comparado con 86~130 ms con EasyAlign. Además, EasyAlign no pudo detectar el 7% de los fonemas, y al mismo tiempo añadió 58 sonidos espurios a la transcripción. Como trabajo futuro verificaremos estos resultados con otros géneros hablados y con otras lenguas chibchas. Finalmente, estos resultados comprueban la aplicabilidad de los métodos de procesamiento de lengua natural a las lenguas chibchas, y apuntan a trabajo futuro en la construcción de corpus y el entrenamiento de sistemas de reconocimiento automático del habla.

2021 ◽  
Vol 7 ◽  
pp. 205520762110021
Author(s):  
Catherine Diaz-Asper ◽  
Chelsea Chandler ◽  
R Scott Turner ◽  
Brigid Reynolds ◽  
Brita Elvevåg

Objective There is a critical need to develop rapid, inexpensive and easily accessible screening tools for mild cognitive impairment (MCI) and Alzheimer’s disease (AD). We report on the efficacy of collecting speech via the telephone to subsequently develop sensitive metrics that may be used as potential biomarkers by leveraging natural language processing methods. Methods Ninety-one older individuals who were cognitively unimpaired or diagnosed with MCI or AD participated from home in an audio-recorded telephone interview, which included a standard cognitive screening tool, and the collection of speech samples. In this paper we address six questions of interest: (1) Will elderly people agree to participate in a recorded telephone interview? (2) Will they complete it? (3) Will they judge it an acceptable approach? (4) Will the speech that is collected over the telephone be of a good quality? (5) Will the speech be intelligible to human raters? (6) Will transcriptions produced by automated speech recognition accurately reflect the speech produced? Results Participants readily agreed to participate in the telephone interview, completed it in its entirety, and rated the approach as acceptable. Good quality speech was produced for further analyses to be applied, and almost all recorded words were intelligible for human transcription. Not surprisingly, human transcription outperformed off the shelf automated speech recognition software, but further investigation into automated speech recognition shows promise for its usability in future work. Conclusion Our findings demonstrate that collecting speech samples from elderly individuals via the telephone is well tolerated, practical, and inexpensive, and produces good quality data for uses such as natural language processing.


Author(s):  
Oksana Chulanova

The article discusses the capabilities of artificial intelligence technologies - technologies based on the use of artificial intelligence, including natural language processing, intellectual decision support, computer vision, speech recognition and synthesis, and promising methods of artificial intelligence. The results of the author's study and the analysis of artificial intelligence technologies and their capabilities for optimizing work with staff are presented. A study conducted by the author allowed us to develop an author's concept of integrating artificial intelligence technologies into work with personnel in the digital paradigm.


2020 ◽  
Author(s):  
Kyle Mahowald ◽  
George Kachergis ◽  
Michael C. Frank

Ambridge (2019) calls for exemplar-based accounts of language acquisition. Do modern neural networks such as transformers or word2vec – which have been extremely successful in modern natural language processing (NLP) applications – count? Although these models often have ample parametric complexity to store exemplars from their training data, they also go far beyond simple storage by processing and compressing the input via their architectural constraints. The resulting representations have been shown to encode emergent abstractions. If these models are exemplar-based then Ambridge’s theory only weakly constrains future work. On the other hand, if these systems are not exemplar models, why is it that true exemplar models are not contenders in modern NLP?


2018 ◽  
Vol 24 (3) ◽  
pp. 393-413 ◽  
Author(s):  
STELLA FRANK ◽  
DESMOND ELLIOTT ◽  
LUCIA SPECIA

AbstractTwo studies on multilingual multimodal image description provide empirical evidence towards two questions at the core of the task: (i) whether target language speakers prefer descriptions generated directly in their native language, as compared to descriptions translated from a different language; (ii) whether images improve human translation of descriptions. These results provide guidance for future work in multimodal natural language processing by first showing that on the whole, translations are not distinguished from native language descriptions, and second delineating and quantifying the information gained from the image during the human translation task.


2020 ◽  
Vol 7 (1) ◽  
pp. 54-60
Author(s):  
Falia Amalia ◽  
Moch Arif Bijaksana

Abstract — The Qur'an is one of the research in linguistic branches that have not been studied by many experts in their field so it has not gotten a popular place. Whereas in the Qur'an, very many words can be used to be researched especially in terms of Natural Language Processing such as text classification, document clustering, text summarization, etc. One of them is like the semantic similarity and the Distribution Semantic Model. The purpose of this writing is to try to create an evaluation dataset in the model of semantic distribution in Bahasa Indonesia with two classes of words that are noun and verb, looking for equal value and linkage of 500 word-pairs provided. Hopefully by looking at this, the semantic sciences that exist for the study of the Qur'an are growing, especially in the translation of the Quran in the Indonesia Language. This research was created at the same time to create datasets such as previously conducted research, in order to hope that future research with the focus of other discussions can use this dataset to help with the research. The study uses 6236 number of verses and from the number of such verses, the system gets 2193 for nouns and 1733 for verbs. The amount is processed using the Sim-rail vector method, a questionnaire against 15 respondents and gold standard, to get the performance value measured using Spearman Rank and get a correlation result of 0.909. Keywords — Natural Language Processing; Distribution Semantic Model; Sim-Rel Vector; Spearman Rank


Sign in / Sign up

Export Citation Format

Share Document