Speech recognition is a crucial topic in human-computer interaction. It is the ability of a machine to recognize words and phrases in spoken language and convert them into a machine-readable format. Digit recognition is one task within speech recognition. In this paper, three spectral features, Mel-frequency cepstral coefficients (MFCC), linear predictive coding (LPC) coefficients, and formant frequencies, are used to classify the ten spoken Kurdish digits (0-9). The features are extracted from the entire speech signal and fed to a pairwise support vector machine (SVM) classifier. Experiments are conducted with each feature individually and with different forms of feature fusion, and the results are reported. Fusing the features significantly improves the results, which indicates that the different features carry complementary information. The proposed model is evaluated on a dataset collected in Kurdistan.
Keywords: speech recognition, MFCC, LPC, formant frequencies, uttered digits, SVM
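As an illustration of the two ideas named in the abstract, feature fusion and pairwise classification, the following is a minimal sketch under stated assumptions and not the implementation used in this paper. It assumes that fusion means concatenating the per-utterance feature vectors, which is only one of several possible forms of fusion. A nearest-centroid rule stands in for each binary SVM so that the example needs no external library. The names `fuse`, `train_pairwise`, and `predict_pairwise` are hypothetical, and the numbers are made-up toy values, not Kurdish speech data.

```python
from collections import Counter
from itertools import combinations


def fuse(*feature_vectors):
    """Feature-level fusion (assumed form): concatenate the vectors."""
    fused = []
    for vec in feature_vectors:
        fused.extend(vec)
    return fused


def centroid(rows):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_pairwise(samples, labels):
    """Build one binary model per unordered pair of classes (one-vs-one)."""
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    models = {}
    for a, b in combinations(sorted(by_class), 2):
        # Stand-in binary model: the two class centroids (NOT an SVM).
        models[(a, b)] = (centroid(by_class[a]), centroid(by_class[b]))
    return models


def predict_pairwise(models, x):
    """Each pairwise model casts one vote; the class with most votes wins."""
    votes = Counter()
    for (a, b), (ca, cb) in models.items():
        votes[a if sq_dist(x, ca) <= sq_dist(x, cb) else b] += 1
    # Ties are broken in favour of the smallest label.
    return min(votes, key=lambda c: (-votes[c], c))


# Toy rows: (digit, "MFCC" part, "LPC" part, "formant" part), all invented.
toy = [
    (0, [0.0, 0.1], [0.2], [0.3]),
    (0, [0.1, 0.0], [0.1], [0.2]),
    (1, [1.0, 1.1], [1.2], [1.3]),
    (1, [1.1, 1.0], [1.1], [1.2]),
    (2, [2.0, 2.1], [2.2], [2.3]),
    (2, [2.1, 2.0], [2.1], [2.2]),
]
samples = [fuse(m, l, f) for _, m, l, f in toy]
labels = [digit for digit, *_ in toy]

models = train_pairwise(samples, labels)
prediction = predict_pairwise(models, fuse([1.05, 1.0], [1.15], [1.25]))
print(len(models), prediction)
```

With ten digit classes, the pairwise scheme trains one binary classifier for each of the 45 class pairs. In practice each stand-in centroid pair would be replaced by a trained binary SVM, while the voting step would remain the same.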