Speech Recognition for Endoscopic Automatic Positioning System

2012 ◽  
Vol 588-589 ◽  
pp. 1296-1299
Author(s):  
Ning Ma ◽  
Xiao Dong Chen ◽  
Ya Nan Li ◽  
Qing Yun Yin ◽  
Yi Wang ◽  
...  

This paper presents a novel system for minimally invasive surgery. The system uses an Endoscopic Automatic Positioner (EAP) controlled by a speech recognition engine to clamp and dynamically position the laparoscope. The EAP's motion instructions are derived from the voice commands of a specific doctor, recognized by an improved algorithm named Normalized Average-Dynamic Time Warping (NA-DTW). An ARM-based embedded platform is designed to run NA-DTW on the Windows CE operating system. 1250 groups of experiments from 10 individual speakers demonstrate the performance of DTW. Compared with traditional algorithms, the enhanced algorithm improves the recognition rate from 96.6% to 99.76% and shortens the calculation time by 51%. The results show that the enhanced algorithm is effective and can satisfy the real-time requirement of an embedded system.
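The abstract does not give the details of NA-DTW, so the following is only a minimal sketch of the general idea it builds on: template matching with plain DTW, where the cumulative cost is divided by the warping-path length so that scores are comparable across templates of different lengths. The function names, the 1-D toy features, and the command words are hypothetical and are not taken from the paper.

```python
from math import inf


def dtw_distance(a, b):
    """Return (total_cost, path_length) of the cheapest DTW alignment
    between two 1-D sequences, using absolute difference as local cost."""
    n, m = len(a), len(b)
    # cost[i][j]: cheapest cumulative cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    # steps[i][j]: number of path cells used by that cheapest alignment
    steps = [[0] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            prev_cost, prev_steps = min(
                (cost[i - 1][j], steps[i - 1][j]),          # advance a only
                (cost[i][j - 1], steps[i][j - 1]),          # advance b only
                (cost[i - 1][j - 1], steps[i - 1][j - 1]),  # advance both
            )
            cost[i][j] = prev_cost + d
            steps[i][j] = prev_steps + 1
    return cost[n][m], steps[n][m]


def normalized_dtw(a, b):
    """Average cost per path step (an assumed normalization, not NA-DTW itself)."""
    total, length = dtw_distance(a, b)
    return total / length


def recognize(utterance, templates):
    """Return the label of the template with the smallest normalized distance."""
    return min(templates, key=lambda label: normalized_dtw(utterance, templates[label]))


# Hypothetical 1-D "features" for two voice commands.
templates = {"up": [1, 2, 3, 4], "down": [4, 3, 2, 1]}
print(recognize([1, 1, 2, 3, 4, 4], templates))
```

In a real recognizer the sequences would be frames of spectral features rather than scalars, and the local cost would be a vector distance.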

2012 ◽  
Vol 542-543 ◽  
pp. 1324-1329
Author(s):  
Zhi Guo He ◽  
Ze Min Liu

The derivative dynamic time warping (DDTW) algorithm can overcome the shortcomings of dynamic time warping (DTW) without increasing the computational complexity. In this paper, DDTW is applied to Chinese connected-word speech recognition. Each isolated word serves as the basic recognition unit and has its own independent reference template; each word of the test string is matched against a reference template by DDTW, and the reference string with the minimum cumulative distance is output. The experimental results show that the method is clearly superior to all the methods based on DTW, and the recognition rate reaches 90%.
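As a rough illustration of how DDTW differs from DTW, the sketch below warps the estimated local derivatives of two sequences instead of their raw values. The abstract does not say which derivative estimator the authors use; the one here is the commonly cited estimate from the original DDTW formulation by Keogh and Pazzani, and all function names are hypothetical.

```python
from math import inf


def derivative(q):
    """Estimate the local derivative of a 1-D sequence.
    Interior points: average of the left slope and the slope through both
    neighbours. End points copy their nearest interior estimate."""
    if len(q) < 3:
        raise ValueError("need at least 3 points to estimate derivatives")
    inner = [
        ((q[i] - q[i - 1]) + (q[i + 1] - q[i - 1]) / 2) / 2
        for i in range(1, len(q) - 1)
    ]
    return [inner[0]] + inner + [inner[-1]]


def dtw(a, b):
    """Cumulative DTW cost with squared difference as the local cost."""
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def ddtw(a, b):
    """DTW applied to derivative estimates rather than raw values."""
    return dtw(derivative(a), derivative(b))


# Two ramps with the same shape but a constant offset.
x = [0, 1, 2, 3]
y = [10, 11, 12, 13]
print(dtw(x, y), ddtw(x, y))
```

Because only the shape of the sequences enters the cost, the two offset ramps are a perfect match under DDTW while plain DTW reports a large distance. The extra work is one linear pass per sequence, which is consistent with the claim that the complexity does not increase.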


2014 ◽  
Vol 29 (6) ◽  
pp. 1072-1082 ◽  
Author(s):  
Xiang-Lilan Zhang ◽  
Zhi-Gang Luo ◽  
Ming Li

2012 ◽  
Author(s):  
Rubita Sudirman ◽  
Sh. Hussain Salleh ◽  
Shaharuddin Salleh

This paper presents a pre-processing of linear predictive coefficient (LPC) features for preparing reliable reference templates for the set of words to be recognized using an artificial neural network. The paper also proposes the use of a pitch feature derived from the recorded speech data as another input feature. The Dynamic Time Warping (DTW) algorithm is the backbone of the newly developed algorithm, called the DTW fixing frame algorithm (DTW-FF), which is designed to perform template matching for the input pre-processing. The purpose of the new algorithm is to align the input frames in the test set to the template frames in the reference set. This frame normalization is required because the neural network is designed to compare data of the same length, whereas the same word is usually spoken with varying length. By doing frame fixing, the input frames are adjusted to the same number of frames as the reference frames. Another task of the study is to extract pitch features using the Harmonic Filter algorithm. After pitch extraction and after the LPC features are fixed to the desired number of frames, speech recognition using the neural network is performed. The results show that recognition accuracy as high as 98% can be achieved using the combination of DTW-FF features and pitch features. At the end of the paper, a convergence comparison between conjugate gradient descent (CGD), Quasi-Newton, and steepest gradient descent (SGD) search directions is performed, and the results show that CGD outperforms Quasi-Newton and SGD.
Key words: Dynamic time warping, time normalization, neural network, speech recognition, conjugate gradient descent
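The frame normalization step can be sketched as follows. This is not the paper's DTW-FF procedure, whose exact compression and expansion rules are not given in the abstract; it is a simplified stand-in that aligns the input to the reference with DTW and then averages the input frames matched to each reference frame, so the output always has the reference's frame count. All names and the toy one-coefficient frames are hypothetical.

```python
from math import inf


def frame_dist(x, y):
    """Squared Euclidean distance between two feature frames."""
    return sum((p - q) ** 2 for p, q in zip(x, y))


def dtw_path(inp, ref):
    """Return the DTW warping path as (input_index, reference_index) pairs."""
    n, m = len(inp), len(ref)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = frame_dist(inp[i - 1], ref[j - 1]) + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    # Backtrack from the end of both sequences to the start.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return path


def fix_frames(inp, ref):
    """Resample `inp` to len(ref) frames by averaging the input frames
    that the DTW path matches to each reference frame."""
    matched = [[] for _ in ref]
    for i, j in dtw_path(inp, ref):
        matched[j].append(inp[i])
    return [
        [sum(col) / len(frames) for col in zip(*frames)]
        for frames in matched
    ]


# Toy example: 5 input frames squeezed onto 3 reference frames.
inp = [[0.0], [0.0], [1.0], [2.0], [2.0]]
ref = [[0.0], [1.0], [2.0]]
print(fix_frames(inp, ref))
```

The same function also stretches a short input, because every reference frame is matched to at least one input frame on a DTW path. The fixed-length output is what a network with a fixed input size needs.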


2017 ◽  
Vol 10 (13) ◽  
pp. 248
Author(s):  
John Sahaya Rani Alex ◽  
Mitali Bhojwani

The objective of this research is to implement a speech recognition algorithm in a smaller form factor device. Speech recognition is extensively used in mobile and numerous consumer electronics devices. The dynamic time warping (DTW) method, which is based on dynamic programming, is chosen for speech recognition because of the latest trend in evolving computing power. A field-programmable gate array implementation of DTW is chosen for its flexibility, parallelization, and shorter time to market. The algorithm is implemented using Verilog on Xilinx ISE. The warping cost is lower when the sequences are similar and higher for dissimilar sequences, which is verified in the simulation output. The results indicate that a real-time implementation of DTW-based speech recognition could be done in the future.


2012 ◽  
Vol 12 (02) ◽  
pp. 1250016 ◽  
Author(s):  
K. C. SANTOSH ◽  
CHOLWICH NATTEE ◽  
BART LAMIROY

In this paper, we propose a new scheme for Devanagari natural handwritten character recognition. It is primarily based on spatial similarity-based stroke clustering. The feature of a stroke consists of a string of pen-tip positions and the direction at every pen-tip position along the trajectory. The scheme uses the dynamic time warping algorithm to align handwritten strokes with stored stroke templates and determine their similarity. Experiments are carried out with the help of 25 native writers, and a recognition rate of approximately 95% is achieved. Our recognizer is robust to a wide range of writing styles and handles variation in the number of strokes, their order, shapes, and sizes, as well as similarities among classes.
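The stroke feature described above, a sequence of pen-tip positions with a direction at each position, might be built along the following lines. The abstract does not define how direction is computed, so the angle toward the next sampled point is an assumption here, and `stroke_features` is a hypothetical name.

```python
from math import atan2


def stroke_features(points):
    """Turn a stroke, given as a list of (x, y) pen-tip samples, into a
    list of (x, y, theta) tuples. theta is the angle in radians of the
    segment to the next sample; the last sample reuses the previous angle."""
    if len(points) < 2:
        raise ValueError("a stroke needs at least two pen-tip samples")
    feats = []
    theta = 0.0
    for k, (x, y) in enumerate(points):
        if k < len(points) - 1:
            nx, ny = points[k + 1]
            theta = atan2(ny - y, nx - x)
        feats.append((x, y, theta))
    return feats


# An L-shaped stroke: one step right, then one step up.
print(stroke_features([(0, 0), (1, 0), (1, 1)]))
```

Feature sequences of this kind can then be aligned against stored stroke templates with DTW, using a local cost that combines positional and angular differences.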

