FuzzyGCP: A deep learning architecture for automatic spoken language identification from speech signals

2021 ◽  
Vol 168 ◽  
pp. 114416
Author(s):  
Avishek Garain ◽  
Pawan Kumar Singh ◽  
Ram Sarkar
2020 ◽  
Vol 32 ◽  
pp. 01010
Author(s):  
Shubham Godbole ◽  
Vaishnavi Jadhav ◽  
Gajanan Birajdar

Spoken language is the most common method of communication in the present day. Efforts to build language identification systems for Indian languages have been quite limited owing to issues of speaker availability and language readability. However, the need for spoken language identification (SLID) is growing day by day for civil and defence applications. Feature extraction is a basic and important step in LID. An audio sample is converted into a spectrogram, a visual representation that describes its spectrum of frequencies with respect to time. Three such spectrogram visuals were generated, namely the Log Spectrogram, Gammatonegram and IIR-CQT Spectrogram, for audio samples from the standardized IIIT-H Indic Speech Database. These visual representations capture language-specific details and the nature of each language. The spectrogram images were then used as input to a CNN. A classification accuracy of 98.86% was obtained using the proposed methodology.
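
As a rough illustration of the pipeline summarised above (audio sample to log spectrogram image to CNN classifier), the Python sketch below uses librosa and Keras. The sampling rate, spectrogram parameters, image size, layer widths, and number of languages are placeholder assumptions rather than values reported by the authors, and the Gammatonegram and IIR-CQT variants, which need dedicated filterbank code, are omitted.

# A rough sketch of the audio -> log-spectrogram -> CNN pipeline (assumed
# parameters throughout; not the authors' exact configuration).
import numpy as np
import librosa
import tensorflow as tf

def log_spectrogram_image(path, sr=16000, n_fft=1024, hop=256, size=(128, 128)):
    # Load the audio sample, compute an STFT magnitude spectrogram in dB,
    # and resize it to a fixed image shape suitable for a CNN input.
    y, _ = librosa.load(path, sr=sr)
    spec = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop))
    spec_db = librosa.amplitude_to_db(spec, ref=np.max)        # log scale
    img = tf.image.resize(spec_db[..., np.newaxis], size)      # (128, 128, 1)
    img_min, img_max = tf.reduce_min(img), tf.reduce_max(img)
    return (img - img_min) / (img_max - img_min + 1e-8)        # normalise to [0, 1]

def build_cnn(num_languages):
    # Small CNN over spectrogram images; the layer widths are placeholders.
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(128, 128, 1)),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(num_languages, activation="softmax"),
    ])

model = build_cnn(num_languages=3)   # set to the number of languages in the database
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])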


2019 ◽  
Vol 31 (12) ◽  
pp. 8483-8501 ◽  
Author(s):  
Himadri Mukherjee ◽  
Subhankar Ghosh ◽  
Shibaprasad Sen ◽  
Obaidullah Sk Md ◽  
K. C. Santosh ◽  
...  

2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Gundeep Singh ◽  
Sahil Sharma ◽  
Vijay Kumar ◽  
Manjit Kaur ◽  
Mohammed Baz ◽  
...  

Spoken language identification (SLID) is defined as the process of detecting the language of an audio clip from an unknown speaker, regardless of the speaker's gender, manner of speaking, or age. The key task is to identify features that distinguish between languages clearly and efficiently. The model takes audio files and converts them into spectrogram images, then applies a convolutional neural network (CNN) to extract the main attributes, or features, from which the output is easily detected. The main objective is to detect the language from among English, French, Spanish, German, Estonian, Tamil, Mandarin, Turkish, Chinese, Arabic, Hindi, Indonesian, Portuguese, Japanese, Latin, Dutch, Pushto, Romanian, Korean, Russian, Swedish, Thai, and Urdu. An experiment was conducted on different audio files using the Kaggle dataset named spoken language identification. These audio files comprise utterances, each spanning a fixed duration of 10 seconds. The whole dataset is split into training and test sets. Preliminary results give an overall accuracy of 98%, while extensive and accurate testing shows an overall accuracy of 88%.
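
The sketch below illustrates the workflow just described: fixed-length 10-second clips are turned into mel spectrogram "images", split into training and test sets, and used to fit and evaluate a CNN. It assumes librosa, scikit-learn, and Keras; the directory layout (one folder per language), the 80/20 split, and the network shape are illustrative assumptions, not the paper's exact setup.

# A sketch of the 10-second-clip workflow: build mel-spectrogram "images",
# split them into training and test sets, then fit and evaluate a CNN.
# Directory layout, split ratio, and network shape are assumptions.
import glob
import os
import numpy as np
import librosa
import tensorflow as tf
from sklearn.model_selection import train_test_split

SR = 22050
CLIP_SECONDS = 10                    # every utterance spans a fixed 10 s

def clip_to_melspec(path):
    # Load exactly 10 s of audio and return a dB-scaled mel spectrogram
    # with a trailing channel axis, ready for a 2-D convolution.
    y, _ = librosa.load(path, sr=SR, duration=CLIP_SECONDS)
    y = librosa.util.fix_length(y, size=SR * CLIP_SECONDS)      # pad/trim to 10 s
    mel = librosa.feature.melspectrogram(y=y, sr=SR, n_mels=128)
    return librosa.power_to_db(mel, ref=np.max)[..., np.newaxis]

# Hypothetical layout: spoken_language_id/<language>/<clip>.wav
files = sorted(glob.glob("spoken_language_id/*/*.wav"))
languages = sorted({os.path.basename(os.path.dirname(f)) for f in files})
X = np.stack([clip_to_melspec(f) for f in files])
y = np.array([languages.index(os.path.basename(os.path.dirname(f))) for f in files])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=X.shape[1:]),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(len(languages), activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(X_train, y_train, epochs=10, validation_split=0.1)
print("test accuracy:", model.evaluate(X_test, y_test)[1])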

