SPEECH RECOGNITION APPLICATION AS AN ANIMATED OBJECT MOVEMENT CONTROLLER SYSTEM

Mapping Intimacies ◽

10.31219/osf.io/swdht ◽

2021 ◽

Author(s):

Akuwan Saleh

Keyword(s):

Speech Recognition ◽

Predictive Coding ◽

Spoken Word ◽

Object Motion ◽

Specific Method ◽

Linear Predictive Coding ◽

Performance Goal ◽

New Words ◽

Technological Developments ◽

Function Number

Technological developments in the world have no boundaries. One of them is speech recognition. Initially, words spoken by humans cannot be recognized by computers; to be recognizable, each word must be processed using a specific method. This research uses the Linear Predictive Coding (LPC) method to extract the characteristics of speech. The LPC method yields a set of LPC coefficients whose count equals the LPC order plus 1. The LPC coefficients are processed using a 512-point Fast Fourier Transform (FFT) to simplify the speech recognition process. The results are then used to train a Backpropagation Neural Network (BPNN) to recognize the spoken word. In the program, speech recognition is implemented as a motion controller for an animated object on the computer. The end result of this research is that animated objects move in accordance with the spoken word. The optimal BPNN structure in this research uses the traingda training function, 3 nodes, a learning rate of 0.05, 1000 epochs, and a performance goal of 0.00001. This structure produces the smallest MSE value, 0.000009957. With this structure, the system recognizes words with 100% accuracy for trained data, 80% for the same respondents as in the trained data, and 67.5% for new respondents.
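The statement that an LPC analysis of a given order produces order plus 1 coefficients can be illustrated with a minimal sketch. It assumes the autocorrelation method with the Levinson-Durbin recursion, which is a common way to compute LPC but is not confirmed by the abstract. The function names `autocorrelation` and `lpc`, the order of 10, and the synthetic test frame are all hypothetical.

```python
import math
import random

def autocorrelation(frame, max_lag):
    """Return autocorrelation values r[0..max_lag] of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def lpc(frame, order):
    """Levinson-Durbin recursion (assumed method, not taken from the paper).

    Returns order + 1 coefficients a[0..order] with a[0] == 1, so that the
    predicted sample is -(a[1]*x[n-1] + ... + a[order]*x[n-order]).
    """
    r = autocorrelation(frame, order)
    a = [1.0] + [0.0] * order
    err = r[0]
    if err == 0.0:          # silent frame: nothing to predict
        return a
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err      # reflection coefficient for this step
        updated = a[:]
        for j in range(1, i):
            updated[j] = a[j] + k * a[i - j]
        updated[i] = k
        a = updated
        err *= (1.0 - k * k)
    return a

# Synthetic frame (hypothetical): a 440 Hz tone at 8000 Hz plus a little noise.
rng = random.Random(0)
frame = [math.sin(2 * math.pi * 440 * t / 8000) + 0.01 * rng.gauss(0, 1)
         for t in range(240)]
coefficients = lpc(frame, order=10)
print(len(coefficients))  # 11, i.e. order + 1
```

In a pipeline like the one described, each frame's coefficient vector would then go through the FFT step before reaching the network; that step is omitted here.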


HMM-based phoneme speech recognition system for the control and command of industrial robots

Czasopismo Techniczne ◽

10.37705/techtrans/e2021002 ◽

2021 ◽

pp. 1-16

Author(s):

Adwait Naik

Keyword(s):

Speech Recognition ◽

Degrees Of Freedom ◽

Microphone Array ◽

Predictive Coding ◽

Recognition System ◽

Human Robot Interaction ◽

Industrial Robots ◽

Speech Recognition System ◽

Linear Predictive Coding ◽

Complex Sentences

In recent years, the integration of human-robot interaction with speech recognition has gained momentum in the manufacturing industries. Conventional methods of controlling robots include semi-autonomous, fully autonomous, and wired methods. Operating a robot through a teaching pendant or a joystick is easy to implement but is not effective when the robot is deployed to perform complex repetitive tasks. Speech and touch are natural ways of communicating for humans, and speech recognition, being the best option, is a heavily researched technology. In this study, we aim to develop a stable and robust speech recognition system that allows humans to communicate with machines (a robotic arm) in a seamless manner. This paper investigates the potential of the linear predictive coding technique for developing a stable and robust HMM-based phoneme speech recognition system for applications in robotics. Our system is divided into three segments: a microphone array, a voice module, and a robotic arm with three degrees of freedom (DOF). To validate our approach, we performed experiments with simple and complex sentences for various robotic activities such as manipulating a cube and pick-and-place tasks. Moreover, we analyzed the test results to rectify problems with accuracy and recognition score.
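The abstract does not describe the HMM topology or how decoding is done, so the following is only a sketch of how an HMM can map a sequence of observations to a phoneme sequence with the Viterbi algorithm. The two phoneme states, the discrete symbols "A" and "B" (standing in for quantized acoustic features), and every probability are hypothetical.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely state sequence, computed in the log domain."""
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # Score of the best path ending in each state after the first observation.
    scores = {s: log(start_p[s]) + log(emit_p[s][observations[0]])
              for s in states}
    backpointers = []
    for obs in observations[1:]:
        new_scores, pointers = {}, {}
        for s in states:
            best_prev = max(states,
                            key=lambda p: scores[p] + log(trans_p[p][s]))
            new_scores[s] = (scores[best_prev] + log(trans_p[best_prev][s])
                             + log(emit_p[s][obs]))
            pointers[s] = best_prev
        scores = new_scores
        backpointers.append(pointers)

    # Trace back from the best final state.
    path = [max(states, key=lambda s: scores[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path

# Hypothetical toy model for the two phonemes of the command "go".
states = ["g", "o"]
start_p = {"g": 0.9, "o": 0.1}
trans_p = {"g": {"g": 0.6, "o": 0.4}, "o": {"g": 0.1, "o": 0.9}}
emit_p = {"g": {"A": 0.8, "B": 0.2}, "o": {"A": 0.1, "B": 0.9}}

print(viterbi(["A", "A", "B", "B"], states, start_p, trans_p, emit_p))
# ['g', 'g', 'o', 'o']
```

A real system would emit continuous feature vectors (for example LPC-derived features) rather than two discrete symbols, and would learn its probabilities from data.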


Speech to Text Processing for Interactive Agent of Virtual Tour Navigation

International Journal of Artificial Intelligence & Robotics (IJAIR) ◽

10.25139/ijair.v1i1.2030 ◽

2019 ◽

Vol 1 (1) ◽

pp. 31

Author(s):

Dian Ahkam Sani ◽

Muchammad Saifulloh

Keyword(s):

Speech Recognition ◽

Text Processing ◽

Predictive Coding ◽

Digital Signal ◽

Recognition System ◽

Human Interaction ◽

Linear Predictive Coding ◽

Voice Input ◽

Human Voice ◽

Backpropagation Method

The development of science and technology is one way to replace the method of human interaction with computers, one of which is to provide voice input. Conversion of sound into text with the Backpropagation method can be understood and realized through feature extraction, including the use of Linear Predictive Coding (LPC). Linear Predictive Coding is one way to represent the signal in order to obtain the features of each sound pattern. In brief, this speech recognition system works by capturing the human voice through a microphone (analog signal), which is then sampled at 8000 Hz so that it becomes a digital signal with the assistance of the computer's sound card. The sampled digital signal then enters the initial LPC processing stage, which yields several LPC coefficients. The LPC outputs are then trained using the Backpropagation learning method. The results of the learning are classified with a word and then stored in a database. The test results took the form of a recognition program that is able to display the voice plots. The speech recognition rate for respondents in the database is 80% of the 100 data items tested in real time.
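Between sampling and LPC analysis, a speech signal is usually cut into short overlapping frames. The abstract states only the 8000 Hz sampling rate, so the sketch below rests on assumptions: the 30 ms frame length (240 samples), the 10 ms hop (80 samples), the Hamming window, and the names `hamming` and `split_into_frames` are all hypothetical choices.

```python
import math

SAMPLE_RATE = 8000  # Hz, the sampling rate stated in the abstract

def hamming(length):
    """Hamming window weights for a frame of the given length."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]

def split_into_frames(samples, frame_len, hop):
    """Cut the signal into overlapping, windowed frames (assumed pre-processing)."""
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frames.append([samples[start + i] * window[i]
                       for i in range(frame_len)])
    return frames

# One second of a synthetic 200 Hz tone stands in for microphone input.
one_second = [math.sin(2 * math.pi * 200 * t / SAMPLE_RATE)
              for t in range(SAMPLE_RATE)]
frames = split_into_frames(one_second, frame_len=240, hop=80)
print(len(frames), len(frames[0]))  # 98 240
```

Each of these frames would then be passed to the LPC stage to obtain one coefficient vector per frame.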


Speech Recognition Using Elman Artificial Neural Network and Linear Predictive Coding

Recent Advances in Computer Science and Communications ◽

10.2174/2213275912666190411113728 ◽

2020 ◽

Vol 13 (4) ◽

pp. 650-656

Author(s):

Somayeh Khajehasani ◽

Louiza Dehyadegari

Keyword(s):

Neural Network ◽

Speech Recognition ◽

Intelligent System ◽

Recognition Accuracy ◽

Predictive Coding ◽

Recognition System ◽

Visual Methods ◽

Linear Predictive Coding ◽

Elman Neural Network ◽

Human Speech

Background: Today, the demand for automatic intelligent systems has drawn increasing attention to modern techniques for interaction between humans and machines. These techniques generally fall into two types: audio and visual methods. The need to develop algorithms that enable machines to recognize human speech is therefore of high importance and is frequently studied by researchers. Objective: Artificial intelligence methods have led to better results in human speech recognition, but the basic problem is the lack of an appropriate strategy for selecting the recognition data from the huge amount of speech information, which makes it practically impossible for the available algorithms to work. Method: In this article, to solve the problem, the linear predictive coding coefficient extraction method is used to summarize the data related to the pronunciation of English digits. After the database is extracted, it is fed to an Elman neural network to learn the relation between the linear predictive coding coefficients of an audio file and the pronounced digit. Results: The results show that this method performs well compared to other methods. According to the experiments, the results of network training (99% recognition accuracy) indicate that the network still performs better than RBF despite many errors. Conclusion: The experiments showed that the Elman memory neural network has acceptable performance in recognizing the speech signal compared to the other algorithms. Using the linear predictive coding coefficients with the Elman neural network led to higher recognition accuracy and improved the speech recognition system.
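What distinguishes an Elman network is its context layer: the hidden activations from the previous time step are fed back as extra input. The sketch below shows only the forward pass, with untrained random weights, to make that recurrence concrete. The layer sizes (11 inputs for an assumed LPC order of 10, 8 hidden units, 10 digit classes), the tanh and softmax activations, and the class name `ElmanNetwork` are assumptions rather than details from the article.

```python
import math
import random

def make_matrix(rows, cols, rng):
    """Small random weight matrix (untrained, for illustration only)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(matrix, vector):
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

def softmax(values):
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

class ElmanNetwork:
    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        self.n_hidden = n_hidden
        self.w_in = make_matrix(n_hidden, n_in, rng)           # input -> hidden
        self.w_context = make_matrix(n_hidden, n_hidden, rng)  # context -> hidden
        self.w_out = make_matrix(n_out, n_hidden, rng)         # hidden -> output

    def forward(self, sequence):
        """Run one utterance (a list of feature vectors) through the network."""
        context = [0.0] * self.n_hidden  # context layer starts empty
        for x in sequence:
            pre = [a + b for a, b in zip(matvec(self.w_in, x),
                                         matvec(self.w_context, context))]
            context = [math.tanh(p) for p in pre]  # becomes next step's context
        return softmax(matvec(self.w_out, context))

# Hypothetical utterance: 5 frames of 11 LPC coefficients each.
rng = random.Random(1)
utterance = [[rng.uniform(-1, 1) for _ in range(11)] for _ in range(5)]
net = ElmanNetwork(n_in=11, n_hidden=8, n_out=10)
probabilities = net.forward(utterance)
predicted_digit = probabilities.index(max(probabilities))
```

Because the weights are random, the predicted digit is meaningless here; training (for example by backpropagation) would be needed before the output could be interpreted.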


INDIVIDUAL IDENTIFICATION SYSTEM DESIGN THROUGH VOICE USING LINEAR PREDICTIVE CODING METHOD AND K-NEAREST NEIGHBOR

Jurnal Teknik Informatika (Jutif) ◽

10.20884/1.jutif.2021.2.2.71 ◽

2021 ◽

Vol 2 (2) ◽

pp. 95-100

Author(s):

Davita Nadia Fadhilah ◽

Rita Magdalena ◽

Sofia Sa’idah

Keyword(s):

Speech Recognition ◽

Nearest Neighbor ◽

Predictive Coding ◽

Voice Recognition ◽

Individual Identification ◽

Identification System ◽

K Nearest Neighbor ◽

Linear Predictive Coding ◽

Distance Method ◽

K Value

Humans have a variety of characteristics that differ from one person to another. These characteristics are genuine and can be used to distinguish one individual from another; one of them is the voice. Voice recognition is also called speech recognition. In this study, an individual voice recognition system was developed using a combination of Linear Predictive Coding (LPC) feature extraction and K-Nearest Neighbor (K-NN) classification. Testing was done by varying several parameters, namely the LPC order, the number of frames, the K value, and the distance method. The parameter combination test showed a fairly good percentage of 73.56321839% with an LPC order of 8, 480 frames, a K value of 5, and the Chebyshev distance method.
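The classification step can be sketched as a K-NN vote using the Chebyshev distance, which is the largest absolute difference across feature dimensions. The value K = 5 matches the abstract; the two-dimensional feature vectors, the speaker labels, and the function names are hypothetical stand-ins for real LPC features.

```python
from collections import Counter

def chebyshev(a, b):
    """Chebyshev distance: the largest per-dimension absolute difference."""
    return max(abs(x - y) for x, y in zip(a, b))

def knn_predict(train, query, k):
    """Majority vote among the k training items closest to the query."""
    neighbours = sorted(train, key=lambda item: chebyshev(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical feature vectors for two speakers.
train = [
    ([0.1, 0.2], "speaker_a"), ([0.0, -0.1], "speaker_a"), ([0.3, 0.1], "speaker_a"),
    ([5.0, 5.2], "speaker_b"), ([4.8, 5.1], "speaker_b"), ([5.3, 4.9], "speaker_b"),
]

print(chebyshev([1, 5], [4, 3]))            # 3
print(knn_predict(train, [0.2, 0.0], k=5))  # speaker_a
```

With K = 5 and three training items per speaker, the vote is always 3 to 2, so ties cannot occur in this toy set.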


Uttered Kurdish digit recognition system

Journal of Raparin University ◽

10.26750/vol(6).no(2).paper5 ◽

2019 ◽

Vol 6 (2) ◽

pp. 78-85 ◽

Cited By ~ 1

Author(s):

Saman Muhammad Omer ◽

Jihad Anwar Qadir ◽

Zrar Khalid Abdul

Keyword(s):

Speech Recognition ◽

Predictive Coding ◽

Recognition System ◽

SVM Classifier ◽

Linear Predictive Coding ◽

Formant Frequencies ◽

Digit Recognition ◽

Proposed Model ◽

Machine Readable ◽

Mel Frequency Cepstral Coefficient

Speech recognition is a crucial subject in the area of human-computer interaction. Speech recognition is the ability of a machine to recognize words and phrases in spoken language and then convert them to a machine-readable format. Digit recognition is a part of the speech recognition system. In this paper, three spectral-based features, namely Mel Frequency Cepstral Coefficient (MFCC), Linear Predictive Coding (LPC), and formant frequencies, are proposed to classify ten uttered Kurdish digits (0-9). The features are extracted from the entire speech signal and fed to a pairwise SVM classifier. Experiments with each individual feature and with different forms of fusion are conducted, and the results are shown. The fusion of the features significantly improves the result and shows that the different features carry complementary information. The proposed model is tested on a dataset that was collected in Kurdistan. Key words: Speech recognition, MFCC, LPC, Formant frequencies, uttered digits, SVM
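Two ideas in this abstract can be sketched together: fusing features by concatenation and pairwise (one-vs-one) voting over ten digit classes, which needs 10 × 9 / 2 = 45 binary decisions. To stay self-contained, the sketch replaces each binary SVM with a nearest-centroid rule; that stand-in is not the paper's classifier. Concatenation as the fusion method, the one-value-per-feature toy data, and all function names are assumptions as well.

```python
from collections import Counter
from itertools import combinations

def fuse(*feature_vectors):
    """Feature-level fusion by concatenation (assumed fusion form)."""
    fused = []
    for vector in feature_vectors:
        fused.extend(vector)
    return fused

def centroid(vectors):
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train_pairwise(samples_by_class):
    """One binary model per class pair; a nearest-centroid rule stands in for an SVM."""
    centroids = {label: centroid(v) for label, v in samples_by_class.items()}
    return [(a, b, centroids[a], centroids[b])
            for a, b in combinations(sorted(centroids), 2)]

def predict(pairwise_models, x):
    """Each pairwise model votes for one of its two classes; most votes wins."""
    votes = Counter()
    for a, b, centre_a, centre_b in pairwise_models:
        closer_to_a = squared_distance(x, centre_a) <= squared_distance(x, centre_b)
        votes[a if closer_to_a else b] += 1
    return votes.most_common(1)[0][0]

# Hypothetical toy data: one value per feature type for each digit 0-9.
samples_by_class = {
    d: [fuse([d + 0.1], [2 * d - 0.1], [-float(d)]),
        fuse([d - 0.1], [2 * d + 0.1], [-float(d)])]
    for d in range(10)
}
models = train_pairwise(samples_by_class)
print(len(models))                                    # 45
print(predict(models, fuse([3.05], [6.0], [-3.0])))   # 3
```

Swapping in a real SVM would change only `train_pairwise` and the per-pair decision inside `predict`; the voting scheme stays the same.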
