Accent modification for speech recognition of non-native speakers using neural style transfer

Author(s):  
Kacper Radzikowski ◽  
Le Wang ◽  
Osamu Yoshie ◽  
Robert Nowak

Abstract Nowadays, automatic speech recognition (ASR) systems can achieve increasingly high accuracy rates, depending on the methodology applied and the datasets used. The rate decreases significantly when the ASR system is used with a non-native speaker of the language to be recognized. The main reason is the specific pronunciation and accent features related to the speaker's mother tongue. At the same time, the extremely limited volume of labeled non-native speech datasets makes it difficult to train, from the ground up, sufficiently accurate ASR systems for non-native speakers. In this research, we address the problem and its influence on the accuracy of ASR systems using the style transfer methodology. We designed a pipeline for modifying the speech of a non-native speaker so that it more closely resembles native speech. This paper covers experiments on accent modification using different setups and approaches, including neural style transfer and an autoencoder. The experiments were conducted on English spoken by Japanese speakers (UME-ERJ dataset). The results show a significant relative improvement in speech recognition accuracy. Our methodology reduces the need to train new algorithms for non-native speech (thus overcoming the obstacle of data scarcity) and can be used as a wrapper for any existing ASR system. The modification can be performed in real time, before a sample is passed into the speech recognition system itself.
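The wrapper idea described in the abstract can be sketched as plain function composition. This is only an illustration of the design, not the authors' implementation: `wrap_asr`, `halve_amplitude`, and `describe` are hypothetical names, and the two stand-in functions merely mark where a real accent-modification model and a real ASR system would go.

```python
from typing import Callable, List

Waveform = List[float]


def wrap_asr(
    modify_accent: Callable[[Waveform], Waveform],
    recognize: Callable[[Waveform], str],
) -> Callable[[Waveform], str]:
    """Return a recognizer that modifies each sample before recognition."""

    def recognize_with_modification(samples: Waveform) -> str:
        # The existing ASR system is left untouched; it only sees modified audio.
        return recognize(modify_accent(samples))

    return recognize_with_modification


# Hypothetical stand-ins, used only to show the call pattern.
def halve_amplitude(samples: Waveform) -> Waveform:
    """Placeholder for an accent-modification model."""
    return [0.5 * x for x in samples]


def describe(samples: Waveform) -> str:
    """Placeholder for an existing ASR system."""
    return f"{len(samples)} samples, peak {max(samples):.2f}"


wrapped = wrap_asr(halve_amplitude, describe)
```

Because the recognizer is passed in as an opaque callable, swapping in a different ASR back end would not require retraining or changing the modification step, which is the property the abstract emphasizes.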

Multilingua ◽  
2018 ◽  
Vol 37 (3) ◽  
pp. 275-304 ◽  
Author(s):  
Jette G. Hansen Edwards

Abstract The study employs a case study approach to examine the impact of educational backgrounds on nine Hong Kong tertiary students’ English and Cantonese language practices and identifications as native speakers of English and Cantonese. The study employed both survey and interview data to probe the participants’ English and Cantonese language use at home, school, and with peers/friends. Leung, Harris, and Rampton’s (1997, The idealized native speaker, reified ethnicities, and classroom realities. TESOL Quarterly 31(3). 543–560) framework of language affiliation, language expertise, and inheritance was used to examine the construction of a native language identity in a multilingual setting. The study found that educational background – and particularly international school experience in contrast to local government school education – had an impact on the participants’ English language usage at home and with peers, and also affected their language expertise in Cantonese. English language use at school also impacted their identifications as native speakers of both Cantonese and English, with Cantonese being viewed largely as native language based on inheritance while English was being defined as native based on their language expertise, affiliation and use, particularly in contrast to their expertise in, affiliation with, and use of Cantonese.


Author(s):  
Wire Bagye

One of the components of an English language course is listening, namely the skill of understanding spoken English sentences. Listening practice should use a native speaker or recorded material so that learners hear authentic pronunciation. If course participants are used as the sound source, there is a high probability of mistakes in the pronunciation of words or sentences. An application is therefore needed to support course activities by acting as a tutor or native speaker that provides correct English pronunciation. In this research, an English pronunciation application was built using Greenfoot with object-oriented programming concepts. The application supports pronunciation learning by playing audio when an object is clicked. The development method uses the SDLC, application modeling uses UML, and testing uses black-box testing. The application is compiled into a .jar file so that it can run on a computer with a Windows operating system that has the JDK installed. The test results show that this English learning application can replace tutors, produce the right audio, and replace native speakers, as indicated by the questionnaire result of 82%.


2014 ◽  
Vol 2 (2) ◽  
pp. 149
Author(s):  
Priya K. Nair

In India, acquiring English is imperative for anyone who wants to succeed in the increasingly competitive job market. With a booming population, the nation is filled with educated, technologically literate youth. English is not merely a foreign language in India: because the country is divided by a plethora of languages, knowledge of English is essential. As the teachers in India are not native speakers of English, the language they teach is not free from errors. Articulation is particularly problematic because the influence of the mother tongue is pronounced. Technology helps to reduce these errors. Movies as a tool can enhance the listening and speaking skills of our students. It is quite boring to work with disembodied voices, and the recorded conversations available in language labs do not sustain the learner’s interest. Learners are often forced to listen to recorded conversations of people they never see; the conversation is often stilted and contemporary idiom is hardly used. However, a completely new dimension can be added to aural practice in the classroom by using movies.


2021 ◽  
Vol 3 (3) ◽  
pp. 290-315
Author(s):  
Esther Olayinka Bamigbola ◽  
Fadekemi Rukayat Umar

This study investigates the factors that are responsible for the levelling of the Ìkàrẹ́-Àkókó dialect. Specifically, the paper examines the impacts of Nigerian indigenous languages, especially Yorùbá, on the dialect. The study aims at identifying the patterns of change in the dialect and their impacts on the ethnic identities of the people. The work is based on the variationist approach pioneered by William Labov in the late 1960s and early 1970s. The tools used for data collection include a questionnaire, oral interviews and observation. The findings of the study reveal that the dialect manifests different stages of change; vital domains like home, school and workplace, which are supposed to be the strongholds of this dialect, are being encroached upon by languages other than the mother tongue in the study area. It was found that the changes in the dialect are due not only to the influence of the English language but also to indigenous Nigerian languages, mostly Yorùbá. It was concluded that the gradual levelling of the Ìkàrẹ́-Àkókó dialect is caused in part by restricted domains of use; increase in population; lack of commitment to indigenous language use by the native speakers; and suppressive language policy in the nation. The study recommends sensitization campaigns as a way of maintaining and sustaining the status of indigenous languages.


2011 ◽  
Vol 161 ◽  
pp. 10-30 ◽  
Author(s):  
Lieven Buysse

Abstract This paper investigates how foreign language learners use discourse markers (such as so, well, you know, I mean) in English speech. These small words that do not contribute much, if anything at all, to the propositional content of a message but modify it in subtle ways, are often considered among the last elements acquired in a foreign language. This contribution reports on close scrutiny of a corpus of English-spoken interviews with Belgian native speakers of Dutch, half of whom are undergraduates majoring in Commercial Sciences and half of whom are majoring in English Linguistics, and sets it off against a comparable native speaker corpus. The investigation shows that the language learners exhibit a clear preference for “operative discourse markers” and neglect or avoid “involvement discourse markers”. It is argued that in learner speech the former take on functions typically fulfilled by the latter to a greater extent than in native speech, and that in some cases the learners revert to a code-switching strategy to cater for their pragmatic needs, bringing markers from Dutch into their English speech. Finally, questions are raised as to the place of such pragmatic devices in foreign language learning.


2018 ◽  
Vol 79 (11) ◽  
pp. 617
Author(s):  
Kelly McElroy ◽  
Laurie M. Bridges

It is widely accepted that English is the current lingua franca, especially in the scientific community. With approximately 527 million native speakers globally, English ranks as the third most-spoken language (after Chinese and Hindi-Urdu), but there are also an estimated 1.5 billion English-language learners in the world. The preeminence of English reflects the political power of the English-speaking world, carrying privileges for those who can speak, write, and read in English, and disadvantages to those who cannot. This is also the case in scholarly communication. Linguist Nicholas Subtirelu identifies three privileges for native English speakers: 1) easier access to social, political, and educational institutions; 2) access to additional forms of capital; and 3) avoiding negative opinions of one’s speech. For example, we were both born into families that speak American English at home, we were surrounded by English books and media growing up, and our entire education was in English. Even defining who counts as a “native” speaker can be refracted through other social identities. As college-educated white Americans, our English is never questioned, but the same is not true for many equally fluent people around the world.


ExELL ◽  
2014 ◽  
Vol 2 (1) ◽  
pp. 46-67 ◽  
Author(s):  
Mirna Begagić

Abstract The importance of collocations in second language learning has been recognized in the past few decades. Numerous studies in L2 acquisition research have investigated how the knowledge and use of collocations at different levels of proficiency affect learners’ communicative competence and language performance. Moreover, most of the studies investigating the collocational knowledge of students learning English as their L2 indicated students’ poor performance (Fayez-Hussein 1990; Aghbar 1990; Bahns and Eldaw 1993; Stubbs 2002; Wray 2002; Nesselhauf 2005; Ozaki 2011). The aim of this paper is to explain the notion of collocation as well as its most common classification, and to point out the importance of its proper use for English language students who are native speakers of the Bosnian/Croatian/Serbian (BCS) language. Furthermore, this study examines the productive and receptive knowledge of lexical collocations in order to assess students’ collocational competence. The results indicate students’ poor collocational knowledge. This may be because the collocations of the language students are learning interfere with the collocations of their mother tongue, but also because of the way students are taught English (neglect of vocabulary in comparison with grammar and unawareness of the importance of collocations in language learning).


Speech recognition is widely used in computer science to enable well-organized communication between humans and computers. This paper addresses the problem of speech recognition for Varhadi, a regional language of the state of Maharashtra in India. Varhadi is widely spoken in Maharashtra state, especially in the Vidharbh region. The Viterbi algorithm is used to recognize unknown words using a Hidden Markov Model (HMM). The dataset developed to train the system consists of 83 isolated Varhadi words. Mel frequency cepstral coefficients (MFCCs) are used as features for the acoustic analysis of the speech signal. A word model is implemented in speaker-independent mode for the proposed Varhadi automatic speech recognition system (V-ASR). The training and test datasets consist of isolated words uttered by 8 native speakers of the Varhadi language. The V-ASR system recognized the Varhadi words satisfactorily, with 92.77% recognition performance.
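To make the decoding step concrete, here is a minimal sketch of isolated-word recognition with one HMM per word, scored by the Viterbi algorithm. It is a toy under stated assumptions, not the V-ASR system: the observations are discrete symbols rather than MFCC vectors, and the two word models (`alpha`, `beta`), their states, and all probabilities are invented for illustration.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return (log-probability, state path) of the best path for obs."""
    # best[t][s]: log-probability of the best path that ends in state s at time t
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back = []  # back[t - 1][s]: best predecessor of state s at time t
    for symbol in obs[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p] + math.log(trans_p[p][s]))
            scores[s] = (best[-1][prev] + math.log(trans_p[prev][s])
                         + math.log(emit_p[s][symbol]))
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace the best final state back to the start.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return best[-1][last], path


def recognize(obs, word_models):
    """Pick the word whose HMM gives the highest Viterbi score for obs."""
    return max(word_models, key=lambda w: viterbi(obs, *word_models[w])[0])


# Hypothetical two-state models over the symbols "a" and "b" (all values invented).
STATES = ["s0", "s1"]
START = {"s0": 0.9, "s1": 0.1}
TRANS = {"s0": {"s0": 0.5, "s1": 0.5}, "s1": {"s0": 0.1, "s1": 0.9}}
WORD_MODELS = {
    "alpha": (STATES, START, TRANS,
              {"s0": {"a": 0.8, "b": 0.2}, "s1": {"a": 0.2, "b": 0.8}}),
    "beta": (STATES, START, TRANS,
             {"s0": {"a": 0.2, "b": 0.8}, "s1": {"a": 0.8, "b": 0.2}}),
}
```

With these toy models, `recognize(["a", "a", "b", "b"], WORD_MODELS)` selects `alpha`. Log-probabilities are used so that long observation sequences do not underflow; a system working on MFCC features would replace the discrete emission table with a continuous density such as a Gaussian mixture.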


Author(s):  
Yolanda Joy Calvo-Benzies

This paper focuses on non-native accents in ESP classrooms. In particular, it looks at the accents of native and non-native speakers of English used in the audio material accompanying six ESP textbooks. In a second study, a group of undergraduate ESP students of Law and Tourism were asked to assess some of the non-native speakers' accents found in these materials, focussing on aspects such as fluency, pronunciation, intelligibility and foreign accent. More specifically, they were asked to rate the following non-native accents in English: French, German, Polish, Chinese and Spanish. Results from the first part of the study show that native speaker models continue to be present in ESP textbooks to a far higher degree than non-native ones. In the second part, the non-native accents that students rated most positively were those of German and Polish speakers, and those rated most negatively were French and Spanish. In general, the Law students tended to value native accents more than non-native ones, whereas students of Tourism broadly accepted both native and non-native accents.

