Speech to Text Processing for Interactive Agent of Virtual Tour Navigation

Author(s):  
Dian Ahkam Sani ◽  
Muchammad Saifulloh

Advances in science and technology offer new ways for humans to interact with computers, one of which is voice input. Converting speech into text with the backpropagation method is realized through feature extraction, in particular Linear Predictive Coding (LPC). LPC is a way of representing a signal so that the features of each sound pattern can be obtained. In brief, the speech recognition system works as follows: a human voice is captured through a microphone as an analog signal and sampled at 8000 Hz by the computer's sound card to produce a digital signal. The digitized signal then enters a preprocessing stage using LPC, which yields a set of LPC coefficients. The LPC outputs are trained with the backpropagation learning method, and the learned results are classified as a word and stored in a database. The resulting recognition program can display the voice plots, and in real-time testing it recognized the respondents' voices stored in the database with 80% accuracy over 100 test samples.
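As a rough illustration of the pipeline described above (8 kHz audio, framewise LPC coefficients, a backpropagation-trained classifier), the following minimal sketch uses librosa and scikit-learn; the file names, word labels, frame sizes and network size are assumptions for illustration, not the authors' exact setup.

```python
# Minimal sketch: 8 kHz audio -> framewise LPC coefficients -> backpropagation classifier.
import numpy as np
import librosa
from sklearn.neural_network import MLPClassifier

def lpc_features(path, order=12, frame_len=256, hop=128, sr=8000):
    """Load a word recording at 8 kHz and return the mean LPC coefficient vector."""
    y, _ = librosa.load(path, sr=sr)                       # resample to 8000 Hz
    frames = librosa.util.frame(y, frame_length=frame_len, hop_length=hop)
    coeffs = [librosa.lpc(f.astype(float), order=order)[1:]  # drop the leading 1
              for f in frames.T]
    return np.mean(coeffs, axis=0)                         # one fixed-length vector per word

# Hypothetical training data: recorded words and their labels.
paths  = ["word_forward.wav", "word_back.wav", "word_left.wav", "word_right.wav"]
labels = ["forward", "back", "left", "right"]
X = np.vstack([lpc_features(p) for p in paths])

# Backpropagation network trained on the LPC feature vectors.
net = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
net.fit(X, labels)
print(net.predict([lpc_features("test_word.wav")]))
```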

2021 ◽  
pp. 1-16
Author(s):  
Adwait Naik

In recent years, the integration of human-robot interaction with speech recognition has gained considerable pace in the manufacturing industries. Conventional methods of controlling robots include semi-autonomous, fully autonomous, and wired approaches. Operating through a teaching pendant or a joystick is easy to implement but is not effective when the robot is deployed to perform complex repetitive tasks. Speech and touch are natural ways for humans to communicate, and speech recognition, being the best option, is a heavily researched technology. In this study, we aim to develop a stable and robust speech recognition system that allows humans to communicate with machines (a robotic arm) in a seamless manner. This paper investigates the potential of the linear predictive coding technique to develop a stable and robust HMM-based phoneme speech recognition system for applications in robotics. Our system is divided into three segments: a microphone array, a voice module, and a robotic arm with three degrees of freedom (DOF). To validate our approach, we performed experiments with simple and complex sentences for various robotic activities such as manipulating a cube and pick-and-place tasks. Moreover, we analyzed the test results to address problems of accuracy and recognition score.
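In the spirit of the HMM-over-LPC approach described above, the sketch below trains one Gaussian HMM per spoken command and picks the command whose model scores a new utterance highest; the use of hmmlearn and librosa, and the command vocabulary, are assumptions for illustration only.

```python
# Hedged sketch of word-level HMM recognition over LPC feature sequences.
import numpy as np
import librosa
from hmmlearn.hmm import GaussianHMM

def lpc_sequence(path, order=10, sr=16000, frame_len=400, hop=160):
    """Return a (n_frames, order) matrix of LPC coefficients for one utterance."""
    y, _ = librosa.load(path, sr=sr)
    frames = librosa.util.frame(y, frame_length=frame_len, hop_length=hop).T
    return np.array([librosa.lpc(f.astype(float), order=order)[1:] for f in frames])

def train_command_model(paths, n_states=5):
    """Fit one HMM on all training utterances of a single command (e.g. 'pick')."""
    seqs = [lpc_sequence(p) for p in paths]
    model = GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=50)
    model.fit(np.vstack(seqs), [len(s) for s in seqs])
    return model

def recognize(utterance_path, models):
    """Choose the command whose HMM gives the highest log-likelihood."""
    feats = lpc_sequence(utterance_path)
    return max(models, key=lambda cmd: models[cmd].score(feats))

# models = {"pick": train_command_model([...]), "place": train_command_model([...])}
# print(recognize("new_command.wav", models))
```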


2020 ◽  
Vol 13 (4) ◽  
pp. 650-656
Author(s):  
Somayeh Khajehasani ◽  
Louiza Dehyadegari

Background: Today, the need for automatic intelligent systems has drawn increasing attention to modern techniques for human-machine interaction. These techniques generally fall into two types: audio and visual methods. At the same time, developing algorithms that enable machines to recognize human speech is of high importance and has been widely studied. Objective: Artificial intelligence methods have led to better results in human speech recognition, but a basic problem is the lack of an appropriate strategy for selecting the recognition data from the huge amount of speech information, which in practice makes it impossible for the available algorithms to work. Method: In this article, to solve this problem, linear predictive coding coefficient extraction is used to summarize the data related to the pronunciation of English digits. After the feature set is extracted, it is fed to an Elman neural network to learn the relation between the linear predictive coding coefficients of an audio file and the pronounced digit. Results: The results show that this method performs well compared to other methods. According to the experiments, the network training results (99% recognition accuracy) indicate that the network outperforms an RBF network despite many errors. Conclusion: The experiments showed that the Elman memory neural network performs acceptably in recognizing the speech signal compared to the other algorithms. Using the linear predictive coding coefficients together with the Elman neural network leads to higher recognition accuracy and an improved speech recognition system.
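A minimal sketch of an Elman-style digit recognizer over LPC frames is shown below; it uses PyTorch's nn.RNN, which implements the Elman recurrence, and all dimensions, the digit set and the dummy data are illustrative assumptions rather than the paper's exact configuration.

```python
# Sketch: Elman recurrent network mapping a sequence of LPC frames to a digit class.
import torch
import torch.nn as nn

class ElmanDigitNet(nn.Module):
    def __init__(self, lpc_order=12, hidden=32, n_digits=10):
        super().__init__()
        # nn.RNN with tanh units is the classic Elman recurrence.
        self.rnn = nn.RNN(input_size=lpc_order, hidden_size=hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_digits)

    def forward(self, x):            # x: (batch, frames, lpc_order)
        _, h = self.rnn(x)           # h: final hidden state, shape (1, batch, hidden)
        return self.out(h.squeeze(0))

net = ElmanDigitNet()
dummy = torch.randn(4, 60, 12)       # 4 utterances, 60 LPC frames each (illustrative)
logits = net(dummy)                  # (4, 10) scores over digits 0-9
loss = nn.CrossEntropyLoss()(logits, torch.tensor([0, 1, 2, 3]))
loss.backward()                      # backpropagation-through-time training step
```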


2019 ◽  
Vol 6 (2) ◽  
pp. 78-85 ◽  
Author(s):  
Saman Muhammad Omer ◽  
Jihad Anwar Qadir ◽  
Zrar Khalid Abdul

Speech recognition is a crucial subject in the area of human-computer interaction. Speech recognition is the ability of a machine to recognize words and phrases in spoken language and convert them into a machine-readable format; digit recognition is one part of a speech recognition system. In this paper, three spectral features, Mel Frequency Cepstral Coefficients (MFCC), Linear Predictive Coding (LPC) and formant frequencies, are proposed to classify ten spoken Kurdish digits (0-9). The features are extracted from the entire speech signal and fed to a pairwise SVM classifier. Experiments with each individual feature and with different forms of fusion are conducted and the results are reported. Fusing the features significantly improves the results and shows that the different features carry complementary information. The proposed model is evaluated on a dataset collected in Kurdistan. Keywords: speech recognition, MFCC, LPC, formant frequencies, uttered digits, SVM
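The feature-fusion idea can be sketched as follows: concatenate MFCC, LPC and formant estimates (here taken from the angles of the LPC polynomial roots) into one vector and feed a pairwise (one-vs-one) SVM. The librosa/scikit-learn calls, frame settings and formant heuristic are assumptions for illustration, not the authors' exact procedure.

```python
# Sketch: fused MFCC + LPC + formant features with a pairwise SVM classifier.
import numpy as np
import librosa
from sklearn.svm import SVC

def fused_features(y, sr=16000, lpc_order=12, n_mfcc=13, n_formants=3):
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).mean(axis=1)
    a = librosa.lpc(y.astype(float), order=lpc_order)
    # Rough formant estimate: angles of LPC roots in the upper half-plane.
    roots = [r for r in np.roots(a) if np.imag(r) > 0]
    freqs = sorted(np.angle(roots) * sr / (2 * np.pi))[:n_formants]
    formants = np.pad(freqs, (0, n_formants - len(freqs)))
    return np.concatenate([mfcc, a[1:], formants])

# SVC trains one binary SVM per digit pair, i.e. a pairwise (one-vs-one) scheme.
clf = SVC(kernel="rbf", decision_function_shape="ovo")
# X = np.vstack([fused_features(sig) for sig in digit_signals]); clf.fit(X, digit_labels)
```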


2019 ◽  
Vol 8 (4) ◽  
pp. 7272-7277

In recent years, speech recognition has become extensively used in customer-service organizations. It has attracted a great deal of research in pattern matching with machine learning (learning speech by experience) and in neural-network-based speech recognition. Speech recognition is the technology of capturing and perceiving the human voice, interpreting it, producing text from it, controlling digital devices, and assisting visually impaired and older adults by means of digital signal processing. In this paper we present a comprehensive study of different methodologies in Android-enabled speech recognition systems, focused on analyzing the operability and reliability of voice note apps. We then propose and evaluate an Android-based speech recognizer app, Annotate, which focuses primarily on voice dictation in five languages (English, Hindi, Tamil, Malayalam and Telugu) and on extracting text from images, using Automatic Speech Recognition (ASR) and Optical Character Recognition (OCR) algorithms. Finally, we identify opportunities for future enhancements in this realm.
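The app itself is Android-based; purely as a hedged desktop analogue of its two steps (multilingual dictation and OCR text extraction), the sketch below uses the SpeechRecognition and pytesseract packages. The language codes and file names are illustrative assumptions and this is not the Annotate implementation.

```python
# Desktop analogue of the two steps: speech dictation (ASR) and text-from-image (OCR).
import speech_recognition as sr
import pytesseract
from PIL import Image

def dictate(wav_path, language="ta-IN"):
    """Transcribe a recorded voice note; 'ta-IN' is Tamil, 'hi-IN' Hindi, and so on."""
    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = recognizer.record(source)
    return recognizer.recognize_google(audio, language=language)

def extract_text(image_path, lang="eng"):
    """OCR step: pull printed text out of a photographed page."""
    return pytesseract.image_to_string(Image.open(image_path), lang=lang)

# print(dictate("note_tamil.wav")); print(extract_text("page.png"))
```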


2020 ◽  
Vol 9 (1) ◽  
pp. 2431-2435

ASR is the use of software- and hardware-based techniques to identify and process the human voice. In this research, Tamil words are analyzed and segmented into syllables, followed by feature extraction and recognition. Syllables are segmented using short-term energy (STE), and segmentation is performed in order to minimize the corpus size. The syllable segmentation algorithm works by computing the STE function of the continuous speech signal. The proposed approach to speech recognition uses a combination of Mel-Frequency Cepstral Coefficients (MFCC) and Linear Predictive Coding (LPC). MFCC features are used to extract a feature vector containing the information about the linguistic message. LPC provides a robust, dependable and accurate technique for estimating the parameters that characterize the vocal tract system, and LPC features can reduce the bit rate of speech (i.e. reduce the size of the transmitted signal). The combined feature extraction technique therefore minimizes the size of the transmitted signal. The proposed feature extraction algorithm is then evaluated on the speech corpus using a Random Forest approach. Random Forest is an effective algorithm that can build a reliable training model with a short training time, because each classifier works on a subset of the features.
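A minimal sketch of this front end follows: short-term-energy syllable segmentation, fused MFCC+LPC features per segment, and a Random Forest classifier. The energy threshold, frame sizes and library calls are assumptions for illustration.

```python
# Sketch: STE-based syllable segmentation, MFCC+LPC fusion, Random Forest classification.
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier

def ste_segments(y, frame_len=400, hop=160, thresh_ratio=0.1):
    """Return (start, end) sample indices of high-energy (syllable-like) regions."""
    frames = librosa.util.frame(y, frame_length=frame_len, hop_length=hop).T
    energy = (frames ** 2).sum(axis=1)                 # short-term energy per frame
    active = energy > thresh_ratio * energy.max()
    segments, start = [], None
    for i, on in enumerate(active):
        if on and start is None:
            start = i * hop
        elif not on and start is not None:
            segments.append((start, i * hop + frame_len)); start = None
    if start is not None:
        segments.append((start, len(y)))
    return segments

def mfcc_lpc(seg, sr=16000, n_mfcc=13, order=12):
    """Fused feature vector for one syllable segment: mean MFCCs plus LPC coefficients."""
    mfcc = librosa.feature.mfcc(y=seg, sr=sr, n_mfcc=n_mfcc).mean(axis=1)
    return np.concatenate([mfcc, librosa.lpc(seg.astype(float), order=order)[1:]])

clf = RandomForestClassifier(n_estimators=200, random_state=0)
# X = np.vstack([mfcc_lpc(y[s:e]) for y in signals for s, e in ste_segments(y)])
# clf.fit(X, syllable_labels)
```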


2017 ◽  
Vol 24 (2) ◽  
pp. 17-26
Author(s):  
Mustafa Yagimli ◽  
Huseyin Kursat Tezer

Abstract: The real-time voice command recognition system used in this study aims to increase situational awareness, and therefore the safety of navigation, particularly during the close manoeuvres of warships and the courses of commercial vessels in narrow waters. With the developed software, based on voice command recognition, the safety of navigation, which is especially important in precision manoeuvres, becomes controllable. The system was observed to work with 90.6% accuracy using Mel Frequency Cepstral Coefficients (MFCC) with Dynamic Time Warping (DTW), and with 85.5% accuracy using Linear Predictive Coding (LPC) with DTW.
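The higher-scoring pairing above, MFCC features matched by DTW against stored command templates, can be sketched as follows; the command vocabulary and file names are illustrative assumptions.

```python
# Sketch: template matching of voice commands with MFCC features and DTW.
import librosa

def mfcc(path, sr=16000, n_mfcc=13):
    y, _ = librosa.load(path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)

def dtw_cost(a, b):
    # librosa's DTW takes feature matrices of shape (n_features, n_frames);
    # the bottom-right cell of D is the total accumulated alignment cost.
    D, _ = librosa.sequence.dtw(X=a, Y=b, metric="euclidean")
    return D[-1, -1]

templates = {"port": mfcc("cmd_port.wav"), "starboard": mfcc("cmd_starboard.wav")}

def recognize(path):
    """Return the stored command whose template aligns to the query at lowest cost."""
    query = mfcc(path)
    return min(templates, key=lambda cmd: dtw_cost(query, templates[cmd]))
```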


2021 ◽  
Author(s):  
Akuwan Saleh

Technological developments in the world have no boundaries, and one of them is speech recognition. At first, words spoken by humans could not be recognized by computers; to be recognizable, the speech must be processed with a specific method. The Linear Predictive Coding (LPC) method is used in this research to extract the characteristics of speech. The output of the LPC method is a set of LPC coefficients whose number equals the LPC order plus 1. The LPC coefficients are processed with a 512-point Fast Fourier Transform (FFT) to simplify the recognition process, and the results are then trained with a Backpropagation Neural Network (BPNN) to recognize the spoken word. In the program, speech recognition is implemented as a motion controller for animated objects on the computer, so the end result of this research is animated objects that move in accordance with the spoken word. The optimal BPNN structure in this research uses the traingda training function, 3 hidden nodes, a learning rate of 0.05, 1000 epochs, and a performance goal of 0.00001. This structure produces the smallest MSE value, 0.000009957, and recognizes new words with 100% accuracy for the training data, 80% for the same respondents as the training data, and 67.5% for new respondents.
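The reported structure (3 hidden nodes, learning rate 0.05, 1000 epochs, goal 1e-5) uses MATLAB's traingda, i.e. gradient descent with an adaptive learning rate; as a rough Python analogue only, the sketch below maps LPC coefficients through a 512-point FFT magnitude and trains a small MLP with an adaptive-rate SGD solver. The feature mapping and data shapes are illustrative assumptions.

```python
# Rough analogue of the reported LPC -> 512-point FFT -> BPNN setup.
import numpy as np
from sklearn.neural_network import MLPClassifier

def lpc_to_fft_features(lpc_coeffs, n_fft=512):
    """Magnitude of a 512-point FFT of the LPC coefficient vector, normalised."""
    spectrum = np.abs(np.fft.rfft(lpc_coeffs, n=n_fft))
    return spectrum / (spectrum.max() + 1e-12)

net = MLPClassifier(hidden_layer_sizes=(3,),      # 3 hidden nodes
                    solver="sgd",
                    learning_rate="adaptive",     # analogue of traingda
                    learning_rate_init=0.05,
                    max_iter=1000,                # epochs
                    tol=1e-5,                     # performance goal
                    random_state=0)
# X = np.vstack([lpc_to_fft_features(c) for c in lpc_coefficient_sets])
# net.fit(X, word_labels)
```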

