Spoken Arabic dialect recognition using X-vectors

2020 ◽  
Vol 26 (6) ◽  
pp. 691-700
Author(s):  
Abualsoud Hanani ◽  
Rabee Naser

Abstract: This paper describes our automatic dialect identification system for recognizing four major Arabic dialects, as well as Modern Standard Arabic. We adapted the X-vector framework, which was originally developed for speaker recognition, to the task of Arabic dialect identification (ADI). The ADI training and development data from VarDial 2018 and VarDial 2017 were used to train and test all of our ADI systems. In addition to the introduced X-vectors, other systems use the traditional i-vectors, bottleneck features, phonetic features, word transcriptions, and GMM-tokens. X-vectors achieved good performance (0.687) on the ADI 2018 Discriminating between Similar Languages shared task testing dataset, outperforming other systems. The performance of the X-vector system is slightly improved (0.697) when fused with i-vectors, bottleneck features, and word uni-gram features.

Author(s):  
A. Nagesh

The feature vectors of a speaker identification (SID) system play a crucial role in its overall performance. Many new feature extraction methods are based on MFCC, but the ultimate goal is to maximize the performance of the SID system. The objective of this paper is to derive a new set of feature vectors based on Gammatone Frequency Cepstral Coefficients (GFCC), using a Gaussian mixture model (GMM), for speaker identification. MFCC are the default feature vectors for speaker recognition, but they are not very robust in the presence of additive noise. In recent studies, GFCC features have shown very good robustness against noise and acoustic change. The main idea of GFCC features with GMM-based feature extraction is to improve overall speaker identification performance in low signal-to-noise ratio (SNR) conditions.


1987 ◽  
Vol 16 (3) ◽  
pp. 359-367 ◽  
Author(s):  
Hassan R. Abd-El-Jawad

Abstract: Most researchers of Arabic sociolinguistics assume the existence of a sociolinguistic continuum with a local vernacular at the bottom and the standard variety at the top. Those researchers seem to equate the terms “prestige” and “standard”; consequently, they tend to consider Modern Standard Arabic (MSA) as the only prestige variety in all settings. This article presents evidence showing that if an adequate description of sociolinguistic variation of spoken Arabic is to be met, it is necessary to posit not only one standard speech variety, MSA, but also other prestigious local or regional varieties which act as local spoken standards competing with MSA in informal settings. It will be shown in the reported cases that in certain contexts speakers tend to switch from their local forms – though these latter may be identical to MSA – to other local features characteristic of other dominant social groups and that happen to be marked [–MSA]. These local prestigious norms act like the standard spoken norms in informal settings. (Diglossic model, prestigious varieties, stereotypes, dominant social groups, competing standards, spoken Arabic).


The performance of the Mel scale and the Bark scale is evaluated for a text-independent speaker identification system. Both scales are designed according to the human auditory system: the Mel scale is defined by how the human ear perceives pitch, while the Bark scale is based on the critical-band selectivity at which loudness becomes significantly different. In speech and speaker recognition systems, the filter bank structure is defined using the Mel or Bark scale to extract speaker-specific speech features. It is found that Bark scale centre frequencies are more effective than Mel scale centre frequencies for Indian dialect speaker databases. The recognition rate achieved using the Bark scale filter bank is 96% for the AISSMSIOIT database and 95% for the Marathi database.
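As an illustration of how filter bank centre frequencies can be placed on either scale, here is a minimal sketch. The abstract does not say which conversion formulas the authors used, so the common O'Shaughnessy Mel formula and the Traunmüller Bark approximation are assumptions, as are the function names and the choice of equal spacing on the warped axis.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """Hz to Mel (O'Shaughnessy formula; an assumed choice)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def hz_to_bark(f_hz: float) -> float:
    """Hz to Bark (Traunmueller approximation; an assumed choice)."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def bark_to_hz(bark: float) -> float:
    """Inverse of hz_to_bark (algebraic inversion of the formula above)."""
    return 1960.0 * (bark + 0.53) / (26.28 - bark)


def centre_frequencies(n_filters, f_min, f_max, to_scale, from_scale):
    """Centre frequencies in Hz, equally spaced on the warped scale."""
    lo, hi = to_scale(f_min), to_scale(f_max)
    step = (hi - lo) / (n_filters + 1)
    return [from_scale(lo + step * (i + 1)) for i in range(n_filters)]


# Hypothetical configuration: 20 filters between 100 Hz and 4 kHz.
mel_centres = centre_frequencies(20, 100.0, 4000.0, hz_to_mel, mel_to_hz)
bark_centres = centre_frequencies(20, 100.0, 4000.0, hz_to_bark, bark_to_hz)
```

The two lists differ only in where the centres fall between the same band edges, which is the quantity the comparison above is concerned with.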


Author(s):  
Musab T. S. Al-Kaltakchi ◽  
Haithem Abd Al-Raheem Taha ◽  
Mohanad Abd Shehab ◽  
Mohamed A.M. Abdullah

In this paper, different feature extraction and feature normalization methods are investigated for speaker recognition. To give a good representation of acoustic speech signals, Power Normalized Cepstral Coefficients (PNCCs) and Mel Frequency Cepstral Coefficients (MFCCs) are employed for feature extraction. Then, to mitigate the effect of the linear channel, Cepstral Mean-Variance Normalization (CMVN) and feature warping are utilized. The paper investigates a text-independent speaker identification system using 16 coefficients from both the MFCC and PNCC features. Eight speakers, two female and six male, are selected from the GRID audiovisual database. The speakers are modeled by coupling the Universal Background Model with Gaussian Mixture Models (GMM-UBM) in order to obtain a fast scoring technique and better performance. The system achieves 100% speaker identification accuracy. The results show that PNCC features outperform MFCC features, particularly when identifying female rather than male speakers. Furthermore, feature warping performed better than the CMVN method.
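To make the normalization step concrete, the following is a minimal sketch of per-utterance cepstral mean-variance normalization. It is not the authors' implementation: the function name, the list-of-lists frame layout, and the small `eps` guard are assumptions made for illustration.

```python
from statistics import fmean, pstdev


def cmvn(frames, eps=1e-10):
    """Normalize each cepstral coefficient to zero mean and unit variance.

    frames: list of feature vectors, one per frame, all of equal length.
    eps:    assumed small constant that avoids division by zero.
    """
    n_coeffs = len(frames[0])
    # Gather each coefficient's trajectory over the whole utterance.
    columns = [[frame[k] for frame in frames] for k in range(n_coeffs)]
    means = [fmean(col) for col in columns]
    stds = [pstdev(col) for col in columns]
    return [
        [(frame[k] - means[k]) / (stds[k] + eps) for k in range(n_coeffs)]
        for frame in frames
    ]


# Toy utterance: three frames of two coefficients each.
utterance = [[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]]
normalized = cmvn(utterance)
```

A stationary linear channel adds a constant offset to every cepstral frame, so subtracting the per-utterance mean cancels that offset; dividing by the standard deviation additionally equalizes the dynamic range of each coefficient.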


Author(s):  
Anny Tandyo ◽  
Martono Martono ◽  
Adi Widyatmoko

This article discusses a speaker identification system, which is a part of speaker recognition. The system identifies a subject by voice from a group of previously saved patterns. It uses the discrete wavelet transform as the feature extraction method and a back-propagation artificial neural network as the classification method. The voice input is processed by the discrete wavelet transform to obtain the low-frequency signal coefficients as a decomposition result that preserves each person's voice characteristics. The coefficients are then classified by the back-propagation artificial neural network. A system trial was conducted by collecting voice samples directly, using 225 microphones in non-soundproof rooms, from 15 subjects (persons), each of whom provided 15 voice samples. Ten samples were used as training voices and the other five as testing voices. The identification accuracy rate reached 84 percent. Testing was also done on subjects who pronounced the same words. It can be concluded that the selection of the same words by different subjects has no influence on the accuracy rate produced by the system.

Keywords: speaker identification, discrete wavelet transform, artificial neural network, back-propagation.
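The low-frequency (approximation) coefficients mentioned above can be sketched as follows. The abstract does not name the wavelet family or the decomposition depth, so the Haar wavelet, the function names, and the `levels` parameter are assumptions chosen only because Haar is the simplest case to show.

```python
import math


def haar_dwt_step(signal):
    """One level of the Haar DWT: returns (approximation, detail)."""
    samples = list(signal)
    if len(samples) % 2:
        # Assumed boundary handling: repeat the last sample for odd lengths.
        samples.append(samples[-1])
    scale = 1.0 / math.sqrt(2.0)
    approx = [(samples[i] + samples[i + 1]) * scale
              for i in range(0, len(samples), 2)]
    detail = [(samples[i] - samples[i + 1]) * scale
              for i in range(0, len(samples), 2)]
    return approx, detail


def low_frequency_coefficients(signal, levels):
    """Keep only the approximation branch after `levels` decompositions."""
    approx = list(signal)
    for _ in range(levels):
        if len(approx) < 2:
            break
        approx, _detail = haar_dwt_step(approx)
    return approx


# Hypothetical use: compress a short frame into a small feature vector
# that could then be fed to a classifier.
features = low_frequency_coefficients([1.0, 1.0, 2.0, 4.0, 3.0, 3.0, 0.0, 2.0], 2)
```

Each level halves the number of coefficients while keeping the smooth, low-frequency part of the signal, which is why the approximation branch is a compact input for a classifier.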


1978 ◽  
Vol 14 (2) ◽  
pp. 227-258 ◽  
Author(s):  
T. F. Mitchell

Educated spoken Arabic (ESA), like any of the world's innumerable other koineized forms of speech, greatly depends for its maintenance and dissemination on the binding power of writing, on its quasi-permanence and transferability in space and time. It is understandable, too, that koines – and the koineizing tradition is an ancient one in the eastern Mediterranean – should regularly call upon earlier, whence written ancestral forms and, as far as the Arab and Islamic worlds are concerned, the fundamental importance of the immutable Koran cannot be over-estimated. Although a koine needs a spoken base, Classical Arabic, itself probably never the dialect of any single group or region (cf. Ferguson, 1959), substantially contributed through its more or less fixed written norms to an older koine and nowadays, via the so-called Modern Standard Arabic (MSA) of contemporary literature, journalism and ‘spoken prose’, to a more recently emergent pan-Arabic or, conceivably, pan-Arabics. In a not dissimilar way, in the Romance-speaking area, Latin existed for centuries as the language, for example, of clerics, side by side with developing regional koines and subsequent new literary languages which borrowed greatly from it, just as the Classical Latin koine was itself modelled by writers, like Cicero, whose cultural values were determined and carried by Greek. A crucial difference, however, between Latin vis-à-vis the Romance languages and Classical Arabic or the substantially similar MSA vis-à-vis ESA is that the latter qua spoken language may not be any more freely written than the regional vernaculars of Arabic. It is not, for instance, orthographic or orthoepic difficulties that inhibit the ‘transcribing’ of spoken Arabic of whatever kind but rather the almost mystical regard in which Arabs hold their written language to the detriment of spoken counterparts.


2016 ◽  
Vol 25 (4) ◽  
pp. 529-538
Author(s):  
H.S. Jayanna ◽  
B.G. Nagaraja

Abstract: Most state-of-the-art speaker identification systems work in a monolingual (preferably English) scenario. Therefore, predominantly English-speaking countries can use such systems efficiently for speaker recognition. However, many countries, including India, are multilingual in nature, and people in such countries are accustomed to speaking multiple languages. An existing speaker identification system may perform poorly if a speaker’s training and test data are in different languages. Thus, developing a robust multilingual speaker identification system is an issue in many countries. In this work, an experimental evaluation of modeling techniques for multilingual speaker identification is presented, including self-organizing map (SOM), learning vector quantization (LVQ), and Gaussian mixture model-universal background model (GMM-UBM) classifiers. The monolingual and crosslingual speaker identification studies are conducted using 50 speakers from our own database. The experimental results show that the GMM-UBM classifier gives better identification performance than the SOM and LVQ classifiers. Furthermore, we propose a combination of speaker-specific information from different languages for crosslingual speaker identification, and the combined feature gives better performance in all the crosslingual speaker identification experiments.


2004 ◽  
Vol 25 (4) ◽  
pp. 495-512 ◽  
Author(s):  
ELINOR SAIEGH–HADDAD

The study examined the impact of the phonemic and lexical distance between Modern Standard Arabic (MSA) and a spoken Arabic vernacular (SAV) on phonological analysis among kindergarten (N=24) and first grade (N=42) native Arabic-speaking children. We tested the effect of the lexical status of the word (SAV, MSA, and pseudoword), as well as the linguistic affiliation of the target phoneme (SAV vs. MSA), on initial and final phoneme isolation. Results showed that, when words were composed of SAV phonemes only, the lexical status of the word did not affect phoneme isolation. However, when MSA and pseudowords encoded both SAV and MSA phonemes, kindergarteners found MSA words significantly more difficult to analyze. Comparing children's ability to isolate SAV versus MSA phonemes revealed that all children found MSA phonemes significantly more difficult to isolate. Kindergarteners found MSA phonemes that were embedded within MSA words even more difficult to isolate. Results underscore the role of the lexical status of the stimulus word, as well as the linguistic affiliation of the target phoneme in phonological analysis in a diglossic context.


2020 ◽  
Vol 2 (1) ◽  
pp. 26-53 ◽  
Author(s):  
Khaled Al Masaeed ◽  
Naoko Taguchi ◽  
Mohammed Tamimi

Abstract: This study examined the relationship between L2 proficiency and (1) appropriateness of refusals, (2) use of refusal strategies, and (3) multidialectal practices in performing refusals in Arabic. Using a spoken discourse completion task (spoken DCT), data were collected from 45 learners of Arabic at three different proficiency levels and from 15 Arabic native speakers. The situations used in the spoken DCT varied in power and social distance (i.e., refusing a friend’s request to lend money, refusing a neighbor’s request to lend a car, and refusing a boss’s request to stay late to work extra hours). Findings generally revealed a positive relationship between proficiency and L2 Arabic learners’ appropriateness, use of refusal strategies, and multidialectal practices in their refusals. However, results showed that native speakers solely employed spoken Arabic (i.e., the dialect), while learners relied heavily on Modern Standard Arabic. Analysis of refusal strategies showed that native speakers tended to provide vague explanations in their refusals except when refusing the neighbor’s request, whereas the learners preferred to provide specific reasons for their refusals. Moreover, advanced-level learners were substantially verbose; as a result, their refusals could be perceived as lecturing or criticizing their interlocutor. This paper concludes with implications for researching and teaching L2 Arabic refusals with special attention to multidialectal practices.

