Implementation of Voice Recognition Via CNN and LSTM

Voice recognition systems rely heavily on CNNs because of their strong ability to recognize and classify targets. CNNs, however, have a drawback: the larger the target to be recognized, the higher the computational cost. In this paper, we address this problem through MFCC feature extraction and a model combining CNN and LSTM, demonstrating the possibility of performing voice recognition even on low-cost devices.
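As a rough illustration of this approach (not the authors' exact architecture), the sketch below extracts MFCCs with librosa and feeds them to a small CNN+LSTM stack in Keras; the layer sizes, sample rate, and number of classes are assumptions, not values from the paper.

```python
# Minimal sketch of MFCC feature extraction feeding a CNN+LSTM classifier.
# Layer sizes, the sample rate, and num_classes are illustrative assumptions.
import librosa
import numpy as np
from tensorflow.keras import layers, models

def extract_mfcc(path, n_mfcc=13):
    """Load an audio file and return its MFCC matrix (time_steps x coefficients)."""
    signal, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
    return mfcc.T  # shape: (time_steps, n_mfcc)

def build_cnn_lstm(time_steps, n_mfcc, num_classes=10):
    """1-D convolutions summarize local spectral patterns cheaply; the LSTM
    then models the temporal sequence of those pooled features."""
    model = models.Sequential([
        layers.Input(shape=(time_steps, n_mfcc)),
        layers.Conv1D(32, kernel_size=3, activation="relu", padding="same"),
        layers.MaxPooling1D(2),   # halves the sequence length, cutting compute
        layers.Conv1D(64, kernel_size=3, activation="relu", padding="same"),
        layers.MaxPooling1D(2),
        layers.LSTM(64),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

The pooling layers are what make this pairing attractive on cheap hardware: the convolutions shrink the sequence before the comparatively expensive LSTM sees it.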

Author(s):  
Basavaraj N Hiremath ◽  
Malini M Patil

The voice recognition system is about cognizing signals through feature extraction and identification of related parameters; the whole process is referred to as voice analytics. The paper aims at analysing and synthesizing the phonetics of voice using a computer program called "PRAAT". The work also covers voice segmentation labelling, analysis of unique voice cues, and the physics of voice, and the process is further extended to recognize sarcasm. The unique features identified in the work are intensity, pitch, and formants, related to read, spoken, interactive, and declarative sentences, analysed using principal component analysis.
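Measurements of this kind are scriptable from Python through Parselmouth, a library that wraps Praat; the sketch below pulls the cue types the abstract lists (pitch, intensity, formants), which could then be fed to PCA as in the paper. The file name is a placeholder, and this is not the authors' script.

```python
# Sketch: extracting the voice cues named above (pitch, intensity, formants)
# with Parselmouth, a Python interface to Praat. "speech.wav" is a placeholder.
import parselmouth

snd = parselmouth.Sound("speech.wav")

pitch = snd.to_pitch()
f0 = pitch.selected_array["frequency"]    # Hz per analysis frame; 0 = unvoiced
mean_f0 = f0[f0 > 0].mean()

intensity = snd.to_intensity()
mean_intensity = intensity.values.mean()  # dB

formants = snd.to_formant_burg()
t_mid = snd.duration / 2
f1 = formants.get_value_at_time(1, t_mid)  # first formant at the midpoint
f2 = formants.get_value_at_time(2, t_mid)  # second formant at the midpoint

print(f"F0 {mean_f0:.1f} Hz, intensity {mean_intensity:.1f} dB, "
      f"F1 {f1:.0f} Hz, F2 {f2:.0f} Hz")
```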


2019 ◽  
Vol 12 (3) ◽  
pp. 22-28
Author(s):  
Jinan N. Shehab

Home automation has become important because it gives the user a convenient and easy way to operate home appliances. This paper aims to help people with special needs, physical disabilities, or paralysis injuries to control any device through infrared technology using voice commands. The system, built around a voice recognition unit (V3), can recognize voice commands, convert them to the desired data format, and transmit the data via an IR transmitter driven by a microcontroller (Arduino Uno). The signal is received by the IR sensor of a TV receiver, yielding a full remote control that works by voice commands. The microcontroller software is written in the Micro C language. The system is low-cost and flexible, supporting a growing variety of controllable devices.
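The paper's Micro C firmware is not reproduced here, but the control flow it describes (recognized command in, IR code out) can be sketched on a host machine. The Python/pySerial sketch below is purely illustrative: the port name, baud rate, and command-to-code table are all hypothetical, and the real system implements this mapping on the Arduino itself.

```python
# Illustrative sketch only: maps recognized voice commands to IR codes and
# forwards them over a serial link. The port name, baud rate, and code table
# are hypothetical; the paper implements this logic in Micro C on the Arduino.
import serial

# Hypothetical mapping of recognized commands to NEC-style IR codes.
IR_CODES = {
    "power":       0x20DF10EF,
    "volume up":   0x20DF40BF,
    "volume down": 0x20DFC03F,
    "channel up":  0x20DF00FF,
}

def send_command(port: serial.Serial, command: str) -> bool:
    """Look up the command and write its IR code as 4 bytes; False if unknown."""
    code = IR_CODES.get(command)
    if code is None:
        return False
    port.write(code.to_bytes(4, "big"))
    return True

if __name__ == "__main__":
    with serial.Serial("/dev/ttyUSB0", 9600, timeout=1) as arduino:
        send_command(arduino, "volume up")
```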


1986 ◽  
Vol 30 (7) ◽  
pp. 638-641
Author(s):  
John P. Zenyuh ◽  
John M. Reising

The objective of this study was to compare the relative effectiveness of three modes of subsystem control: a voice recognition system with visual feedback presented on the head-up display, a standard multifunction control device with tailored switching logic, and a remotely operated multifunction control with feedback presented on the head-up display. Comparisons were based on measures of interference with a loading task and overall speed and accuracy of the control operations performed. The working hypothesis was that the voice system and head-up multifunction control would manifest substantially lower interference with the primary task, while subsystem control operation times would remain unaffected by control mode. The results indicate that performance with the remote touch panel was significantly poorer than with the voice or standard multifunction control systems.


In this paper, text-dependent and text-independent speaker identification systems were considered. Feature extraction was performed using mel-frequency cepstral coefficients (MFCC). The vector quantization (VQ) method for automatic identification of a person by voice was investigated. Using the extracted features, a codebook for each speaker was built by clustering the feature vectors, and speakers were modeled by these codebooks; the codebooks of all speakers were collected in a database. From the results, it can be said that vector quantization over cepstral features produces good results for building a voice recognition system.
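A minimal version of this codebook pipeline can be assembled from standard tools: MFCC extraction, per-speaker k-means clustering, and identification by lowest average quantization distortion. The sketch below assumes librosa and scikit-learn; the codebook size and file paths are placeholders, not values from the paper.

```python
# Sketch of VQ-based speaker identification: one k-means codebook per speaker,
# identification by lowest average quantization distortion. The codebook size
# and file paths are illustrative assumptions.
import numpy as np
import librosa
from sklearn.cluster import KMeans

def mfcc_frames(path):
    """MFCC feature vectors for one utterance, shape (frames, 13)."""
    y, sr = librosa.load(path, sr=16000)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T

def train_codebook(paths, codebook_size=32):
    """Cluster all of a speaker's MFCC frames; the centroids form the codebook."""
    frames = np.vstack([mfcc_frames(p) for p in paths])
    return KMeans(n_clusters=codebook_size, n_init=10).fit(frames).cluster_centers_

def distortion(frames, codebook):
    """Mean distance from each frame to its nearest codeword."""
    dists = np.linalg.norm(frames[:, None, :] - codebook[None, :, :], axis=2)
    return dists.min(axis=1).mean()

def identify(path, codebooks):
    """codebooks: dict speaker -> codebook. Lowest distortion wins."""
    frames = mfcc_frames(path)
    return min(codebooks, key=lambda spk: distortion(frames, codebooks[spk]))
```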


AVITEC ◽  
2019 ◽  
Vol 1 (1) ◽  
Author(s):  
Noor Fita Indri Prayoga

Voice is one way to communicate and express yourself. Speaker recognition is a process carried out by a device to recognize a speaker through the voice. This study designed a speaker recognition system, implemented in MATLAB, that identifies speakers based on what they say, using the dynamic time warping (DTW) method. The design begins with the processing of reference data and test data. Both follow the same steps: sound recording, preprocessing, and feature extraction. In this system, the Fast Fourier Transform (FFT) is used to extract the features. The feature sets from the two data are then compared using DTW, and the comparison producing the smallest value is taken as the output. Test results show that the system can identify voices with a best recognition accuracy of 90% and an average recognition accuracy of 80%. The results were obtained from 50 tests, carried out by 5 people (3 men and 2 women), each speaker saying a predetermined word.
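The matching stage can be sketched compactly: FFT-magnitude frames for the reference and test utterances are compared with the textbook DTW recurrence, and the smallest cumulative distance wins. The frame length, hop size, and sample rate below are assumptions; the paper's MATLAB implementation is not reproduced.

```python
# Sketch of the matching stage: FFT-magnitude frames compared with the
# textbook DTW recurrence; the reference with the smallest cumulative
# distance is the output. Frame length, hop, and sample rate are assumptions.
import numpy as np
import librosa

def fft_features(path, n_fft=512, hop=256):
    """Magnitude spectrogram frames, shape (frames, frequency_bins)."""
    y, _ = librosa.load(path, sr=16000)
    return np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop)).T

def dtw_distance(a, b):
    """Classic O(len(a)*len(b)) dynamic time warping with Euclidean frame cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def recognize(test_path, references):
    """references: dict label -> reference wav path; smallest DTW value wins."""
    test = fft_features(test_path)
    return min(references,
               key=lambda lbl: dtw_distance(test, fft_features(references[lbl])))
```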


Author(s):  
Vishakha Patil

Elevators have over time become an important part of our day-to-day life, used as everyday transport devices for moving goods as well as people. In the modern world, cities and crowded areas require multi-storey buildings, and under wheelchair-access laws, elevators/lifts are a mandatory requirement in new multi-storey buildings. The main purpose of this project is to operate an elevator by voice command, which could help people with disabilities or of short stature travel from one floor to another without the help of any other person. A microcontroller is used to control the different devices and integrate each module, namely the voice module, the motor module, and an LCD; the LCD displays the present status of the lift. The leading edge of our project is the voice recognition system, which generates exceptional results when recognizing speech.


Human voice recognition by computers has been an ever-developing area since 1952. It is a challenging task for a computer to understand and act according to human voice rather than commands or programs. The reason is that no two humans' voices, styles, or pitches are alike, and not every word is pronounced by everyone in the same fashion. Background noises and disturbances may confuse the system, and the voice or accent of the same person may change with mood, situation, time, etc. Despite all these challenges, voice recognition and speech-to-text conversion have reached a successful stage, yet voice processing technology still deserves more research. As a tip of the iceberg of this research, we contribute our work in this area and propose a new method, VRSML (Voice Recognition System through Machine Learning), which mainly focuses on speech-to-text conversion and then on analysing the text extracted from speech, in the form of tokens, through machine learning. After analysing the derived text, reports are created in textual as well as graphical format to represent the vocabulary levels used in that speech. As a supervised machine learning algorithm is employed to classify the tokens derived from the text, the reports are more accurate and are generated faster.
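VRSML is not specified in enough detail to reproduce, but the shape of the pipeline (tokenize the transcript, classify tokens with a supervised model, report vocabulary levels) can be sketched with scikit-learn. Everything below, including the "basic"/"advanced" labels and the tiny training vocabulary, is invented for illustration.

```python
# Illustrative sketch of a VRSML-style pipeline: tokenize a transcript,
# classify each token's vocabulary level with a supervised model, and count
# levels for a textual report. Training words and labels are invented.
from collections import Counter
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical labelled vocabulary for training the token classifier.
train_words  = ["walk", "eat", "house", "paradigm", "ubiquitous", "ephemeral"]
train_levels = ["basic", "basic", "basic", "advanced", "advanced", "advanced"]

# Character n-grams let the model generalize to words it has never seen.
clf = make_pipeline(
    CountVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    MultinomialNB(),
).fit(train_words, train_levels)

transcript = "the ubiquitous devices walk us toward a new paradigm"
tokens = transcript.split()
levels = Counter(clf.predict(tokens))
print(levels)  # vocabulary-level counts for the textual report
```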


1987 ◽  
Vol 31 (4) ◽  
pp. 424-427
Author(s):  
Christian P. Skriver

This report presents the results of an experiment that measured performance in a simulated ASW message entry task with two modes of data input—vocal and manual. The subjects (Ss) were 12 Naval enlisted men. The independent variable was message data entry mode—vocal or manual. The dependent variables were: time to enter 20 lines of text, data entry errors that were corrected by the Ss, and errors that remained undetected. All Ss were trained to use the voice recognition system with a 100 word vocabulary set. The task was for the S to read one line of message text from a display and then re-enter the text below the displayed text via either voice recognizer or keyboard until 20 lines of text had been entered. Keyboard entry was found to be slightly faster (11%) than voice recognition input. While the number of initial errors (corrected) in the vocal input mode was over three times greater than the number for manual input, the remaining input errors (uncorrected) were about the same.


Generally, in hospitals the dental chair is moved forward/backward or upward/downward according to the treatment of the patient, and this movement is operated by a human. Sometimes the chair does not function properly because of piston rust or an overweight patient, and the dentist may suffer leg pain from continuously operating the chair. To overcome these issues, we design a voice-recognition dental chair for doctors in hospitals. This project describes the design of a smart, motorized, voice-controlled dental chair. The voice command is given by the dentist; a sensor captures the voice and sends the command to an Arduino, where it is converted to a string that drives the movement of the chair. The intelligent dental chair is designed so that it can be controlled easily by the doctor, with the added advantage of a low-cost design. The system was designed and developed to save the doctor's energy and time.


Author(s):  
Mohammad Shahrul Izham Sharifuddin ◽  
Sharifalillah Nordin ◽  
Azliza Mohd Ali

In this paper, we develop an intelligent wheelchair using CNN and SVM voice recognition methods. The data were collected from Google, and some were self-recorded. Four commands are to be recognized: go, left, right, and stop. Voice data are extracted using the MFCC feature extraction technique, and CNNs and SVM are then used to classify and recognize the voice data. A motor driver attached to a Raspberry Pi 3B+ controls the movement of the wheelchair prototype. The CNN produced higher accuracy, 95.30%, compared to only 72.39% for the SVM. On the other hand, the SVM took only 8.21 seconds to execute, while the CNN took 250.03 seconds. The CNN therefore produces better results, because noise is filtered in the feature extraction layers before classification in the classification layer, but it takes longer because of the complexity of the network; the lower complexity of the SVM implementation gives a shorter processing time.
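The SVM side of this comparison can be sketched with librosa and scikit-learn; the file paths, sample rate, and train/test split below are placeholders, not the authors' dataset, and the CNN branch would follow the pattern of the first sketch in this listing.

```python
# Sketch of the SVM branch of the comparison: mean MFCC vectors for the four
# commands ("go", "left", "right", "stop") classified with an RBF-kernel SVM.
# File paths, the sample rate, and the split are placeholders.
import numpy as np
import librosa
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

COMMANDS = ["go", "left", "right", "stop"]

def clip_features(path, n_mfcc=13):
    """Mean MFCC vector over the clip: a fixed-length summary an SVM can use."""
    y, sr = librosa.load(path, sr=16000)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).mean(axis=1)

def train_svm(paths, labels):
    """paths: wav files; labels: one of COMMANDS per file."""
    X = np.stack([clip_features(p) for p in paths])
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, labels, test_size=0.2, stratify=labels)
    svm = SVC(kernel="rbf").fit(X_tr, y_tr)
    print(f"hold-out accuracy: {svm.score(X_te, y_te):.2%}")
    return svm
```

Collapsing each clip to a single mean MFCC vector is what keeps the SVM fast, and it is also why it discards temporal detail the CNN can exploit, consistent with the accuracy/latency trade-off reported above.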

