scholarly journals SPEECH RECOGNITION APPLICATION AS AN ANIMATED OBJECT MOVEMENT CONTROLLER SYSTEM

2021 ◽  
Author(s):  
akuwan saleh

Technological developments in the world have no boundaries. One of them is Speech Recognition. At first, words spoken by humans cannot be recognized by computers. To be recognizable, the word is processed using a specific method. Linear Predictive Coding Method (LPC) is a method used in this research to extract the characteristics of speech. The result of the LPC method is the LPC coefficient which is the number of LPC orders plus 1. The LPC coefficient is processed using Fast Fourier Transform (FFT) 512 to simplify the process of speech recognition. The results are then trained using Backpropagation Neural Network (BPNN) to recognize the spoken word. Speech recognition on the program is implemented as an animated object motion controller on the computer. The end result of this research is animated objects move in accordance with the spoken word. The optimal BPNN structure in this research is to use traingda training function, number of nodes 3, learning rate 0.05, epoch 1000, performance goal 0,00001. This structure can produce the smallest MSE value that is 0,000009957. So, this structure can recognize new words with 100% accuracy for trained data, 80% for the same respondents with trained data and reach 67.5% for new respondents.

2021 ◽  
pp. 1-16
Author(s):  
Adwait Naik

In recent years, the integration of human-robot interaction with speech recognition has gained a lot of pace in the manufacturing industries. Conventional methods to control the robots include semi-autonomous, fully-autonomous, and wired methods. Operating through a teaching pendant or a joystick is easy to implement but is not effective when the robot is deployed to perform complex repetitive tasks. Speech and touch are natural ways of communicating for humans and speech recognition, being the best option, is a heavily researched technology. In this study, we aim at developing a stable and robust speech recognition system to allow humans to communicate with machines (roboticarm) in a seamless manner. This paper investigates the potential of the linear predictive coding technique to develop a stable and robust HMM-based phoneme speech recognition system for applications in robotics. Our system is divided into three segments: a microphone array, a voice module, and a robotic arm with three degrees of freedom (DOF). To validate our approach, we performed experiments with simple and complex sentences for various robotic activities such as manipulating a cube and pick and place tasks. Moreover, we also analyzed the test results to rectify problems including accuracy and recognition score.


Author(s):  
Dian Ahkam Sani ◽  
Muchammad Saifulloh

The development of science and technology is one way to replace the method of human interaction with computers, one of which is to provide voice input. Conversion of sound into text form with the Backpropagation method can be understood and realized through feature extraction, including the use of Linear Predictive Coding (LPC). Linear Predictive Coding is one way to represent the signal in obtaining the features of each sound pattern. In brief, the way this speech recognition system worked was by inputting human voice through a microphone (analog signal) which then sampled with a sampling speed of 8000 Hz so that it became a digital signal with the assistance of sound card on the computer. The digital signal from the sample then entered the initial process using LPC, so that several LPC coefficients were obtained. The LPC outputs were then trained using the Backpropagation learning method. The results of the learning were classified with a word and stored in a database afterwards. The results of the test were in the form of an introduction program that able display the voice plots. the results of speech recognition with voice recognition percentage of respondents in the database iss 80% of the 100 data in the test in Real Time


2020 ◽  
Vol 13 (4) ◽  
pp. 650-656
Author(s):  
Somayeh Khajehasani ◽  
Louiza Dehyadegari

Background: Today, the automatic intelligent system requirement has caused an increasing consideration on the interactive modern techniques between human being and machine. These techniques generally consist of two types: audio and visual methods. Meanwhile, the need for developing the algorithms that enable the human speech recognition by machine is of high importance and frequently studied by the researchers. Objective: Using artificial intelligence methods has led to better results in human speech recognition, but the basic problem is the lack of an appropriate strategy to select the recognition data among the huge amount of speech information that practically makes it impossible for the available algorithms to work. Method: In this article, to solve the problem, the linear predictive coding coefficients extraction method is used to sum up the data related to the English digits pronunciation. After extracting the database, it is utilized to an Elman neural network to recognize the relation between the linear coding coefficients of an audio file with the pronounced digit. Results: The results show that this method has a good performance compared to other methods. According to the experiments, the obtained results of network training (99% recognition accuracy) indicate that the network still has better performance than RBF despite many errors. Conclusion: The results of the experiments showed that the Elman memory neural network has had an acceptable performance in recognizing the speech signal compared to the other algorithms. The use of the linear predictive coding coefficients along with the Elman neural network has led to higher recognition accuracy and improved the speech recognition system.


2021 ◽  
Vol 2 (2) ◽  
pp. 95-100
Author(s):  
Davita Nadia Fadhilah ◽  
Rita Magdalena ◽  
Sofia Sa’idah

Humans have a variety of characteristics that are different from one another. Characteristics possessed by humans are genuine which can be used as a differentiator between one individual and another, one of which is sound. Voice recognition is called speech recognition. In this study, it was developed as an individual voice recognition system using a combination of the Linear Predictive Coding (LPC) method of feature extraction and K-Nearest Neighbor (K-NN) classification in the speech recognition process. Testing is done by testing changes in several parameters, namely the LPC order value, the number of frames, the K value, and different distance methods. The results of the parameter combination test showed a fairly good presentation of 73.56321839% with the combination parameter or LPC 8, the number of frames 480, the value of K 5, with the distance method used by Chebychev.


2019 ◽  
Vol 6 (2) ◽  
pp. 78-85 ◽  
Author(s):  
Saman Muhammad Omer ◽  
Jihad Anwar Qadir ◽  
Zrar Khalid Abdul

Speech recognition is a crucial subject in human computer interaction area. The ability of a machine to recognize words and phrases in spoken language is speech recognition and then convert them to a machine-readable format. Digit recognition is a part of the speech recognition system. In this paper, three spectral based features including Mel Frequency Cepstral Coefficient (MFCC), Linear predictive coding (LPC) and formant frequencies are proposed to classify ten Kurdish uttered digits (0-9). The features are extracted from entire speech signal, and feed a pairwise SVM classifier. Experiments including each individual feature and different forms of fusion are conducted and the results are shown. The fusion of the features significantly improves the result and shows that the different features carry complementary information. The proposed model is experimented on the dataset that have been collected in Kurdistan. Key words: Speech recognition, MFCC, LPC, Formant frequencies, uttered digits, SVM


Sign in / Sign up

Export Citation Format

Share Document