Feature Extraction Methods Proposed for Speech Recognition Are Effective on Road Condition Monitoring Using Smartphone Inertial Sensors

Sensors ◽  
2019 ◽  
Vol 19 (16) ◽  
pp. 3481 ◽  
Author(s):  
Frederico Soares Cabral ◽  
Hidekazu Fukai ◽  
Satoshi Tamura

The objective of our project is to develop an automatic survey system for road condition monitoring using smartphone devices. One of its main tasks is the classification of paved and unpaved roads. Because, in practice, recordings will be acquired from vehicles with various types of suspension system and at various speeds, we use the multiple sensors found in smartphones together with state-of-the-art machine learning techniques for signal processing. Although it usually receives little attention, the feature extraction step strongly affects the classification results, so both the classification method and the feature extraction method, along with their parameters, must be chosen carefully. Simple statistics-based features are most commonly used to extract road surface information from acceleration data. In this study, we evaluated mel-frequency cepstral coefficients (MFCC) and perceptual linear prediction (PLP) coefficients as the feature extraction step to improve the accuracy of paved and unpaved road classification. Although both MFCC and PLP were developed in the human speech recognition field, we found that modified MFCC and PLP can improve on the commonly used statistical method.
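As a rough illustration of this kind of pipeline (not the authors' exact implementation), the sketch below treats a vertical-axis accelerometer trace as a 1-D signal and extracts MFCC-style features with librosa; the sampling rate, frame sizes, number of coefficients, and the simple per-recording averaging are assumptions made for the example.

```python
import numpy as np
import librosa

def accel_mfcc_features(accel_z, sr=100, n_mfcc=13):
    """Extract MFCC-style features from a 1-D accelerometer signal.

    accel_z : vertical acceleration samples (float array)
    sr      : sensor sampling rate in Hz (assumed 100 Hz here)
    n_mfcc  : number of cepstral coefficients to keep
    """
    accel_z = np.asarray(accel_z, dtype=float)
    # Remove the gravity/DC component before spectral analysis.
    accel_z = accel_z - accel_z.mean()
    # librosa works on any 1-D signal; frame length, hop, and the small
    # mel filterbank are chosen for a low-rate inertial signal, not audio.
    mfcc = librosa.feature.mfcc(y=accel_z, sr=sr, n_mfcc=n_mfcc,
                                n_fft=256, hop_length=64, n_mels=20)
    # One fixed-length vector per recording: mean and std over frames.
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

# Example: features for a 30 s recording sampled at 100 Hz.
# features = accel_mfcc_features(accel_trace, sr=100)
# These vectors can then be fed to any paved/unpaved classifier.
```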

2009 ◽  
Vol 50 ◽  
pp. 212-216
Author(s):  
Antanas Leonas Lipeika

Investigation of Formant Features in Speech Recognition

The use of formant features in speech recognition is investigated. It was established that formant features can be used for recognition, but recognition accuracy depends strongly on the formant feature extraction method. The best recognition results were obtained when singular prediction polynomials were used for formant feature extraction. These polynomials can be computed from the parameters of linear prediction models of even or odd order, and either symmetric or antisymmetric singular prediction polynomials can be used for recognition. It is also important to investigate how recognition results depend not only on the choice of singular prediction polynomials but also on other parameters of the recognition system: the analysis frame length, the number of formants used for recognition, and the frequency scale used to represent the formant features. The experiments showed that the best recognition results were obtained using two or three formants computed from symmetric singular prediction polynomials. An analysis of the informativeness of individual formants showed that the second formant contributes most to recognition; the contributions of the first, third, and fourth formants are roughly equal, but higher formants are less robust to white noise. Regarding the analysis frame length, the best recognition results were obtained with a frame of 500 samples, and results also improved when formant trajectories were represented on the mel frequency scale.
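As a companion to this summary, the sketch below estimates formants with the generic LPC root-finding method rather than the singular prediction polynomials studied in the paper; the sampling rate, LPC order, and windowing are assumptions.

```python
import numpy as np
import librosa

def lpc_formants(frame, sr=16000, order=12):
    """Estimate formant frequencies of one speech frame from LPC pole angles.

    This is the generic root-finding approach; the paper instead derives
    formants from symmetric/antisymmetric singular prediction polynomials.
    """
    frame = np.asarray(frame, dtype=float)
    frame = frame * np.hamming(len(frame))          # taper the analysis frame
    a = librosa.lpc(frame, order=order)             # LPC polynomial coefficients
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]               # one root per conjugate pair
    freqs = np.angle(roots) * sr / (2 * np.pi)      # pole angles -> frequencies in Hz
    return np.sort(freqs)

# Example on a 500-sample frame (the frame length the paper found best):
# formants = lpc_formants(signal[i:i + 500], sr=16000)
# The lowest two or three values approximate F1-F3.
```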


Author(s):  
Gurpreet Kaur ◽  
Mohit Srivastava ◽  
Amod Kumar

The speech and speaker recognition field has grown rapidly as many artificial intelligence algorithms have been applied to it. Speech conveys not only the spoken message but also emotion, gender, and speaker identity. Many real healthcare applications rely on speech and speaker recognition, for example a voice-controlled wheelchair. In this paper, we use a genetic algorithm (GA) for combined speaker and speech recognition, relying on GA-optimized Mel Frequency Cepstral Coefficient (MFCC) features, with classification performed by a Deep Neural Network (DNN). In the first phase, MFCC features are extracted and then optimized with the GA; in the second phase, the DNN is trained on the optimized features. The proposed model is evaluated and validated in a real environment, and its efficiency is reported in terms of accuracy, precision, recall, sensitivity, and specificity. The paper also evaluates feature extraction methods, namely linear predictive coding coefficients (LPCC), perceptual linear prediction (PLP), mel frequency cepstral coefficients (MFCC), and relative spectra filtering (RASTA), for combined speaker and speech recognition systems, and compares the different methods with existing techniques in both clean and noisy environments.
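A minimal sketch of GA-based feature optimization is given below, assuming the GA selects a binary mask over MFCC feature dimensions; the fitness function (cross-validated accuracy of a simple logistic-regression stand-in for the DNN), population size, and mutation rate are illustrative assumptions rather than the paper's settings.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

def fitness(mask, X, y):
    """Cross-validated accuracy using only the MFCC dimensions selected by mask."""
    if mask.sum() == 0:
        return 0.0
    clf = LogisticRegression(max_iter=1000)   # stand-in for the paper's DNN
    return cross_val_score(clf, X[:, mask.astype(bool)], y, cv=3).mean()

def ga_select(X, y, pop_size=20, generations=30, p_mut=0.05):
    """Evolve a binary mask over feature columns that maximizes fitness."""
    n = X.shape[1]
    pop = rng.integers(0, 2, size=(pop_size, n))
    for _ in range(generations):
        scores = np.array([fitness(ind, X, y) for ind in pop])
        parents = pop[np.argsort(scores)[::-1][: pop_size // 2]]   # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, n)                 # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            flip = rng.random(n) < p_mut             # bit-flip mutation
            children.append(np.where(flip, 1 - child, child))
        pop = np.vstack([parents, children])
    scores = np.array([fitness(ind, X, y) for ind in pop])
    return pop[np.argmax(scores)].astype(bool)

# X: (n_utterances, n_mfcc_features) matrix, y: speaker or word labels.
# best_mask = ga_select(X, y); X_opt = X[:, best_mask]  # features passed to the DNN
```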


HardwareX ◽  
2018 ◽  
Vol 4 ◽  
pp. e00045 ◽  
Author(s):  
Tian Lei ◽  
Abduallah A. Mohamed ◽  
Christian Claudel

2018 ◽  
Vol 29 (1) ◽  
pp. 327-344 ◽  
Author(s):  
Mohit Dua ◽  
Rajesh Kumar Aggarwal ◽  
Mantosh Biswas

Abstract The classical approach to build an automatic speech recognition (ASR) system uses different feature extraction methods at the front end and various parameter classification techniques at the back end. The Mel-frequency cepstral coefficients (MFCC) and perceptual linear prediction (PLP) techniques are the conventional approaches used for many years for feature extraction, and the hidden Markov model (HMM) has been the most obvious selection for feature classification. However, the performance of MFCC-HMM and PLP-HMM-based ASR systems degrades in real-time environments. The proposed work discusses the implementation of a discriminatively trained Hindi ASR system using noise-robust integrated features and a refined HMM model. It sequentially combines MFCC with PLP and MFCC with gammatone-frequency cepstral coefficients (GFCC) to obtain MF-PLP and MF-GFCC integrated feature vectors, respectively. The HMM parameters are refined using a genetic algorithm (GA) and particle swarm optimization (PSO). Discriminative training of the acoustic model using maximum mutual information (MMI) and minimum phone error (MPE) is performed to enhance the accuracy of the proposed system. The results show that discriminative training using MPE with the MF-GFCC integrated feature vector and PSO-based HMM parameter refinement gives significantly better results than the other implemented techniques.
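The sketch below illustrates only the sequential feature-integration step, assuming MFCC frames from librosa and a precomputed GFCC matrix from an external gammatone implementation (librosa has no GFCC routine); the frame length and hop are assumptions, not the paper's exact configuration.

```python
import numpy as np
import librosa

def mf_gfcc(y, gfcc_frames, sr=16000, n_mfcc=13, n_fft=400, hop_length=160):
    """Build an MF-GFCC integrated feature matrix.

    gfcc_frames : precomputed gammatone-frequency cepstral coefficients of
                  shape (n_gfcc, n_frames), produced with the same frame
                  length and hop so the two streams line up frame by frame.
    """
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc,
                                n_fft=n_fft, hop_length=hop_length)
    T = min(mfcc.shape[1], gfcc_frames.shape[1])   # align frame counts
    # Sequential (stacked) combination: each frame becomes a
    # (n_mfcc + n_gfcc)-dimensional MF-GFCC vector for the HMM back end.
    return np.vstack([mfcc[:, :T], gfcc_frames[:, :T]])
```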


Author(s):  
Felix Kortmann ◽  
Julin Horstkotter ◽  
Alexander Warnecke ◽  
Nicolas Meier ◽  
Jens Heger ◽  
...  
