Deep learning for spoken language identification: Can we visualize speech signal patterns?

Himadri Mukherjee; Subhankar Ghosh; Shibaprasad Sen; Obaidullah Sk Md; K. C. Santosh; Santanu Phadikar; Kaushik Roy

doi:10.1007/s00521-019-04468-3

Neural Computing and Applications ◽

10.1007/s00521-019-04468-3 ◽

2019 ◽

Vol 31 (12) ◽

pp. 8483-8501 ◽

Cited By ~ 6

Author(s):

Himadri Mukherjee ◽

Subhankar Ghosh ◽

Shibaprasad Sen ◽

Obaidullah Sk Md ◽

K. C. Santosh ◽

...

Keyword(s):

Deep Learning ◽

Speech Signal ◽

Spoken Language ◽

Language Identification

Multiclass Spoken Language Identification for Indian Languages using Deep Learning

2020 IEEE Bombay Section Signature Conference (IBSSC) ◽

10.1109/ibssc51096.2020.9332161 ◽

2020 ◽

Author(s):

Lakshmana Rao Arla ◽

Sridevi Bonthu ◽

Abhinav Dayal

Keyword(s):

Deep Learning ◽

Spoken Language ◽

Language Identification ◽

Indian Languages

Indian Language Identification using Deep Learning

ITM Web of Conferences ◽

10.1051/itmconf/20203201010 ◽

2020 ◽

Vol 32 ◽

pp. 01010

Author(s):

Shubham Godbole ◽

Vaishnavi Jadhav ◽

Gajanan Birajdar

Keyword(s):

Feature Extraction ◽

Deep Learning ◽

Classification Accuracy ◽

Visual Representations ◽

Spoken Language ◽

Language Identification ◽

Indian Language ◽

Regular Method ◽

Speech Database ◽

Day By Day

Spoken language is the most common means of communication today. Efforts to develop language identification systems for Indian languages have been quite limited because of the issues of speaker availability and language readability. However, the need for spoken language identification (SLID) in civilian and defense applications is increasing day by day. Feature extraction is a basic and important step in language identification (LID). An audio sample is converted into a spectrogram, a visual representation that shows the range of frequencies with respect to time. Three such spectrograms, namely the Log Spectrogram, Gammatonegram, and IIR-CQT Spectrogram, were generated for audio samples from the standardized IIIT-H Indic Speech Database. These visual representations depict language-specific details and the nature of each language. The spectrogram images were then used as input to a CNN. The proposed methodology achieved a classification accuracy of 98.86%.
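The conversion of an audio sample into a log spectrogram, as described in the abstract above, can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `log_spectrogram`, the frame length, the hop size, the Hann window, and the synthetic 1 kHz test tone are all assumptions chosen for illustration, and the Gammatonegram and IIR-CQT variants are not covered.

```python
import numpy as np

def log_spectrogram(signal, frame_len=256, hop=128, eps=1e-10):
    """Return a (freq_bins, frames) log-power spectrogram in dB.

    frame_len, hop and eps are illustrative defaults, not values from the paper.
    """
    signal = np.asarray(signal, dtype=np.float64)
    if signal.size < frame_len:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(frame_len)
    n_frames = 1 + (signal.size - frame_len) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # Power spectrum of each frame (real-input FFT keeps non-negative frequencies).
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Log compression; eps avoids log(0). Transpose so rows are frequency bins.
    return 10.0 * np.log10(power + eps).T

# Synthetic stand-in for a speech sample: one second of a 1 kHz tone at 8 kHz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)

spec = log_spectrogram(tone)
peak_bin = int(spec.mean(axis=1).argmax())
peak_hz = peak_bin * sample_rate / 256  # bin spacing = sample_rate / frame_len
```

The resulting 2-D array can be treated as a single-channel image, which is the form in which a spectrogram would typically be passed to a CNN.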

FuzzyGCP: A deep learning architecture for automatic spoken language identification from speech signals

Expert Systems with Applications ◽

10.1016/j.eswa.2020.114416 ◽

2021 ◽

Vol 168 ◽

pp. 114416

Author(s):

Avishek Garain ◽

Pawan Kumar Singh ◽

Ram Sarkar

Keyword(s):

Deep Learning ◽

Spoken Language ◽

Language Identification ◽

Speech Signals

A Deep Dive Into Deep Learning Techniques for Solving Spoken Language Identification Problems

Intelligent Speech Signal Processing ◽

10.1016/b978-0-12-818130-0.00005-2 ◽

2019 ◽

pp. 81-100 ◽

Cited By ~ 4

Author(s):

Himanish Shekhar Das ◽

Pinki Roy

Keyword(s):

Deep Learning ◽

Spoken Language ◽

Language Identification ◽

Identification Problems ◽

Deep Dive ◽

Learning Techniques

Spoken language identification using i-vectors, x-vectors, PLDA and logistic regression

Bulletin of Electrical Engineering and Informatics ◽

10.11591/eei.v10i4.2893 ◽

2021 ◽

Vol 10 (4) ◽

pp. 2237-2244

Author(s):

Ahmad Iqbal Abdurrahman ◽

Amalia Zahra

Keyword(s):

Logistic Regression ◽

Discriminant Analysis ◽

Linear Discriminant Analysis ◽

Speech Signal ◽

Spoken Language ◽

Language Identification ◽

Vector Method ◽

Linear Discriminant ◽

Multiple Parameters ◽

Speech Data

In this paper, i-vectors and x-vectors are used to extract features from speech signals in local Indonesian languages, namely Javanese, Sundanese, and Minang, to help a classifier identify the language spoken by the speaker. Probabilistic linear discriminant analysis (PLDA) is used as the baseline classifier, and logistic regression is also used because prior studies have shown that it performs better than PLDA for classifying speech data. Once the features are extracted, they are classified using these classifiers. In the experiment, the test data is segmented into three lengths: 3, 10, and 30 seconds. The study is extended by testing multiple parameters of the i-vector and x-vector methods and then comparing the performance of PLDA and logistic regression as the classifier. With PLDA as the classifier, the x-vector scores better than the i-vector for every segment length; with logistic regression, however, the i-vector still achieves better accuracy than the x-vector.
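The classification step described above, logistic regression applied to fixed-length utterance embeddings, can be sketched as follows. This is only an illustration under stated assumptions: the vectors are synthetic Gaussian clusters standing in for real i-vectors or x-vectors, the 16-dimensional size, learning rate, and step count are arbitrary, and the helper names `train_softmax_regression` and `predict` are hypothetical. It does not reproduce the paper's data or results, and PLDA is not shown.

```python
import numpy as np

def train_softmax_regression(X, y, n_classes, lr=0.1, steps=300):
    """Fit multinomial logistic regression with plain gradient descent."""
    n, d = X.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot targets
    for _ in range(steps):
        logits = X @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)            # softmax probabilities
        G = (P - Y) / n                              # gradient w.r.t. logits
        W -= lr * (X.T @ G)
        b -= lr * G.sum(axis=0)
    return W, b

def predict(X, W, b):
    """Return the index of the most probable class for each row of X."""
    return (X @ W + b).argmax(axis=1)

# Synthetic stand-ins for utterance embeddings (NOT real i-vectors/x-vectors).
rng = np.random.default_rng(0)
languages = ["Javanese", "Sundanese", "Minang"]
dim = 16  # assumed embedding size, for illustration only
centers = 3.0 * rng.normal(size=(len(languages), dim))
y = rng.integers(0, len(languages), size=300)
X = centers[y] + rng.normal(size=(300, dim))

# Hold out the last 60 vectors as a test set.
X_train, y_train = X[:240], y[:240]
X_test, y_test = X[240:], y[240:]

# Standardize with training statistics only.
mean, std = X_train.mean(axis=0), X_train.std(axis=0)
X_train = (X_train - mean) / std
X_test = (X_test - mean) / std

W, b = train_softmax_regression(X_train, y_train, n_classes=len(languages))
accuracy = float((predict(X_test, W, b) == y_test).mean())
```

In a real experiment the embeddings would come from a trained i-vector or x-vector extractor, and accuracy would be reported separately for each test segment length (3, 10, and 30 seconds).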
