A Novel Weighted Dynamic Time Warping for Light Weight Speaker-Dependent Speech Recognition in Noisy and Bad Recording Conditions

2014 ◽  
Vol 490-491 ◽  
pp. 1347-1355
Author(s):  
Xiang Lilan Zhang ◽  
Ji Ping Sun ◽  
Xu Hui Huang ◽  
Zhi Gang Luo

Lightweight speaker-dependent (SD) automatic speech recognition (ASR) is a promising way to address two problems: the risk of disclosing personal information, and the difficulty of obtaining training material for many seldom-used English words and (often non-English) names. The dynamic time warping (DTW) algorithm is the state-of-the-art algorithm for small-footprint SD ASR applications, which have limited storage space and a small vocabulary. In our previous work, we developed two fast and accurate DTW variations for clean speech data. However, speech recognition in adverse conditions remains a major challenge. To improve recognition accuracy in noisy and bad recording conditions, such as a recording volume that is too high or too low, we introduce a novel weighted DTW method. This method defines a feature index for each time frame of the training data and applies it in the core DTW process to tune the final alignment score. In extensive experiments on one representative SD dataset of three speakers' recordings, our method achieves better accuracy than DTW, with a 0.5% relative reduction of error rate (RRER) on clean speech data and a 7.5% RRER on noisy and badly recorded speech data. To the best of our knowledge, our weighted DTW is the first weighted DTW method specially designed for speech data in noisy and bad recording conditions.
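
The abstract does not say how the per-frame feature index is computed or how it enters the recurrence, so the following is only a minimal Python sketch of one plausible reading: each training (template) frame carries a weight that scales its local distance inside the standard DTW recurrence. The function name `weighted_dtw` and the `weights` argument are hypothetical and are not taken from the paper.

```python
import math


def weighted_dtw(template, query, weights):
    """Accumulated DTW cost with one weight per template frame.

    template, query: sequences of equal-dimension feature tuples.
    weights: one non-negative number per template frame (an assumed
    stand-in for the per-frame feature index mentioned in the abstract).
    """
    n, m = len(template), len(query)
    if len(weights) != n:
        raise ValueError("need exactly one weight per template frame")
    # acc[i][j] = best accumulated cost aligning template[:i] with query[:j]
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Local Euclidean distance, scaled by the template frame's weight.
            local = weights[i - 1] * math.dist(template[i - 1], query[j - 1])
            acc[i][j] = local + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]


template = [(0.0,), (5.0,), (0.0,)]
query = [(0.0,), (0.0,), (0.0,)]
plain = weighted_dtw(template, query, [1.0, 1.0, 1.0])
damped = weighted_dtw(template, query, [1.0, 0.2, 1.0])
```

In this toy case, lowering the weight of the middle template frame reduces its influence on the final alignment score, which is the general effect a per-frame weighting is meant to have on unreliable frames.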

2014 ◽  
Vol 29 (6) ◽  
pp. 1072-1082 ◽  
Author(s):  
Xiang-Lilan Zhang ◽  
Zhi-Gang Luo ◽  
Ming Li

Forests ◽  
2019 ◽  
Vol 10 (11) ◽  
pp. 1040 ◽  
Author(s):  
Kai Cheng ◽  
Juanle Wang

Efficient methodologies for mapping forest types in complicated mountain areas are essential for the implementation of sustainable forest management practices and monitoring. Existing solutions for forest-type mapping focus primarily on supervised machine learning algorithms (MLAs) applied to remote sensing time-series images. However, in mountainous areas MLAs are challenged by complex forest-type compositions, a lack of training data, the loss of temporal data caused by cloud obscuration, and the selection of input feature sets. Time-weighted dynamic time warping (TWDTW) is a supervised classifier that adapts the dynamic time warping method for time-series analysis to land cover classification. This study evaluates the performance of the TWDTW method, using a combination of Sentinel-2 and Landsat-8 time-series images, for forest-type classification in a mountainous area of southern China with complex topographic conditions and forest-type compositions. The classification outputs were compared with those produced by MLAs, namely random forest (RF) and support vector machine (SVM). The results showed that the three forest-type maps obtained by TWDTW, RF, and SVM are highly consistent in spatial distribution. TWDTW achieved the best results, with a mean overall accuracy of 93.81% and a mean kappa coefficient of 0.93, followed by RF and SVM. The TWDTW method achieved higher classification accuracy than RF and SVM even with less training data. This demonstrates that the TWDTW method is robust and less sensitive to training samples when applied to mountain forest-type classification.
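
To illustrate the time weighting, here is a simplified Python sketch in which the local DTW cost is the difference between two observations plus a logistic penalty on the gap between their acquisition days. The logistic form is the one commonly associated with TWDTW, but the parameter values `alpha=0.1` and `beta=50.0`, the single-band inputs, and the full start-to-end alignment are assumptions made for this example and are not details given in the abstract.

```python
import math


def logistic_time_weight(day_gap, alpha=0.1, beta=50.0):
    """Penalty in (0, 1) that grows with the time gap; equals 0.5 at day_gap == beta."""
    return 1.0 / (1.0 + math.exp(-alpha * (day_gap - beta)))


def twdtw_distance(pattern, series, alpha=0.1, beta=50.0):
    """Time-weighted DTW between two lists of (day, value) pairs sorted by day."""
    n, m = len(pattern), len(series)
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        day_p, val_p = pattern[i - 1]
        for j in range(1, m + 1):
            day_s, val_s = series[j - 1]
            # Local cost = value difference + penalty for matching distant dates.
            local = abs(val_p - val_s) + logistic_time_weight(
                abs(day_p - day_s), alpha, beta
            )
            acc[i][j] = local + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]


# Hypothetical vegetation-index profile and the same profile shifted by 100 days.
pattern = [(10, 0.2), (60, 0.8), (110, 0.3)]
shifted = [(day + 100, value) for day, value in pattern]
same_season = twdtw_distance(pattern, pattern)
off_season = twdtw_distance(pattern, shifted)
```

Plain DTW would treat the shifted profile as a perfect match; with the time penalty, the off-season copy scores worse than the in-season one, which is what lets the classifier separate classes with similar shapes but different phenology.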


2012 ◽  
Author(s):  
Rubita Sudirman ◽  
Sh. Hussain Salleh ◽  
Shaharuddin Salleh

This paper presents a pre-processing method for linear predictive coefficient (LPC) features that prepares reliable reference templates for the set of words to be recognized using an artificial neural network. The paper also proposes using a pitch feature derived from the recorded speech data as an additional input feature. The dynamic time warping (DTW) algorithm is the backbone of the newly developed algorithm, called the DTW frame fixing algorithm (DTW-FF), which is designed to perform template matching for the input pre-processing. The purpose of the new algorithm is to align the input frames in the test set to the template frames in the reference set. This frame normalization is required because the neural network is designed to compare data of the same length, whereas utterances of the same word usually vary in length. With frame fixing, the input frames and the reference frames are brought to the same number of frames, following the reference frames. Another task of the study is to extract pitch features using the harmonic filter algorithm. After pitch extraction, and once the LPC features have been fixed to the desired number of frames, speech recognition using the neural network is performed. Recognition accuracy as high as 98% was achieved using the combination of the DTW-FF features and the pitch features. Finally, the convergence of the conjugate gradient descent (CGD), Quasi-Newton, and steepest gradient descent (SGD) search directions is compared, and the results show that CGD outperformed Quasi-Newton and SGD.
Keywords: dynamic time warping, time normalization, neural network, speech recognition, conjugate gradient descent
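
The abstract does not give the exact DTW-FF rules, so the following Python sketch shows only one plausible way to bring an input utterance to the reference frame count: compute the DTW warping path, then average all input frames that the path maps onto each reference frame. The names `dtw_path` and `fix_frames`, and the averaging rule itself, are assumptions made for illustration.

```python
import math


def dtw_path(ref, inp):
    """Return the optimal warping path as a list of (ref_index, inp_index) pairs."""
    n, m = len(ref), len(inp)
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(ref[i - 1], inp[j - 1])
            acc[i][j] = cost + min(acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1])
    # Backtrack from the end, always stepping to the cheapest predecessor.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (acc[i - 1][j - 1], i - 1, j - 1),
            (acc[i - 1][j], i - 1, j),
            (acc[i][j - 1], i, j - 1),
        )
    path.reverse()
    return path


def fix_frames(ref, inp):
    """Resample inp to len(ref) frames by averaging the frames aligned to each ref frame."""
    buckets = [[] for _ in ref]
    for ref_idx, inp_idx in dtw_path(ref, inp):
        buckets[ref_idx].append(inp[inp_idx])
    # Component-wise mean of the input frames collected for each reference frame.
    return [[sum(col) / len(col) for col in zip(*frames)] for frames in buckets]


reference = [[0.0], [1.0], [2.0]]
utterance = [[0.0], [0.0], [1.0], [2.0], [2.0]]  # same shape, spoken more slowly
fixed = fix_frames(reference, utterance)
```

Because every reference frame appears at least once on the warping path, the output always has exactly as many frames as the reference, which is the property a fixed-size neural network input layer needs.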


Sensors ◽  
2019 ◽  
Vol 19 (13) ◽  
pp. 2882 ◽  
Author(s):  
Xiaoqun Yu ◽  
Shuping Xiong

Older people have difficulty sustaining conventional rehabilitation exercises for improving physical function over a long period because of the passive nature of conventional exercise, its inconvenience, and its cost. This study aims to develop and validate a dynamic time warping (DTW) based algorithm for assessing Kinect-enabled home-based physical rehabilitation exercises, in order to support auto-coaching in a virtual gaming environment. A DTW-based algorithm was first applied to compute the motion similarity between two time series, one from an individual user and one from a virtual coach. We chose eight bone vectors of the human skeleton and the body orientation as input features and proposed a simple but innovative method to convert the DTW distance into a meaningful performance score expressed as a percentage (0–100%), without requiring training data or expert experience. The effectiveness of the proposed algorithm was validated in a follow-up experiment with 21 subjects playing a Tai Chi exergame. Results showed that the algorithm scores had a strong positive linear relationship (r = 0.86) with experts' ratings and that the calibrated algorithm scores were comparable to the gold standard. These findings suggest that the DTW-based algorithm can be used effectively for automatic performance evaluation of individuals performing home-based rehabilitation exercises.
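
The abstract does not describe how the DTW distance is converted to a percentage, so the Python sketch below is a purely hypothetical mapping, not the authors' method. It assumes a reference "worst" distance is available (for example, the distance between the coach's motion and a motionless pose) and scales the score linearly between 0 and 100.

```python
def dtw_distance_to_score(distance, worst_distance):
    """Map a DTW distance to a 0-100 score (hypothetical linear mapping).

    distance: DTW distance between the user's motion and the coach's motion.
    worst_distance: assumed reference distance that should score 0.
    """
    if worst_distance <= 0:
        raise ValueError("worst_distance must be positive")
    # Clamp the ratio so that distances beyond the reference still score 0.
    ratio = min(max(distance / worst_distance, 0.0), 1.0)
    return 100.0 * (1.0 - ratio)


perfect = dtw_distance_to_score(0.0, 10.0)
halfway = dtw_distance_to_score(5.0, 10.0)
beyond = dtw_distance_to_score(20.0, 10.0)
```

A mapping of this kind needs no training data, which matches the stated goal, but the choice of reference distance strongly affects the scores and would need calibration against expert ratings.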


2017 ◽  
Vol 10 (13) ◽  
pp. 248
Author(s):  
John Sahaya Rani Alex ◽  
Mitali Bhojwani

The objective of this research is to implement a speech recognition algorithm in a smaller form factor device. Speech recognition is extensively used in mobile and numerous consumer electronics devices. The dynamic time warping (DTW) method, which is based on dynamic programming, is chosen for speech recognition because of the latest trend in evolving computing power. A field-programmable gate array (FPGA) implementation of DTW is chosen for its flexibility, parallelization, and shorter time to market. The algorithm is implemented in Verilog on Xilinx ISE. The simulation output verifies that the warping cost is lower for similar sequences and higher for dissimilar sequences. The results indicate that a real-time implementation of DTW-based speech recognition could be realized in the future.
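
The cost behaviour described above can be reproduced with a short software model. The Python sketch below is not the Verilog design from the paper; it is a reference model of the DTW dynamic-programming recurrence that keeps only two rows of the cost matrix, a memory layout assumed here because it suits small devices. The sample sequences are made up for the example.

```python
def dtw_cost(a, b):
    """Accumulated DTW cost between two numeric sequences, using two rows of memory."""
    inf = float("inf")
    # prev[j] holds the best cost of aligning the processed prefix of a with b[:j].
    prev = [0.0] + [inf] * len(b)
    for x in a:
        curr = [inf] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            # Absolute difference as local cost, plus the cheapest of three predecessors.
            curr[j] = abs(x - y) + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[len(b)]


reference = [1, 2, 3, 2, 1]
similar = [1, 1, 2, 3, 2, 1]     # same shape, first sample repeated
dissimilar = [5, 5, 5, 5, 5]     # flat sequence with a different level
cost_similar = dtw_cost(reference, similar)
cost_dissimilar = dtw_cost(reference, dissimilar)
```

The time-stretched copy aligns at zero cost while the flat sequence accumulates a large cost, mirroring the simulation result reported in the abstract.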

