Jarvis: Desktop Assistant

Author(s):  
Sohan Singh ◽  
Anupam Lakhanpal ◽  
Shashwat Shukla ◽  
Srishti Sinha

“Jarvis” was the name of Tony Stark’s life assistant in the Iron Man films. Unlike the original comics, in which Jarvis was Stark’s human butler, the film version of Jarvis is an intelligent computer that converses with Stark, monitors his household, and helps build and program his superhero suit. In this project, Jarvis is a digital life assistant that uses common human communication channels such as Twitter, instant messaging, and voice to create a two-way connection between a person and their apartment: controlling lights and appliances, assisting with cooking, delivering breaking news and Facebook notifications, and more. Since voice is the primary communication channel in our project, Jarvis is essentially a speech recognition application. Speech technology encompasses two technologies: synthesis and recognition. A speech synthesizer takes text as input and produces an audio stream as output; a speech recognizer does the opposite, taking an audio stream as input and turning it into a text transcription. The voice is a signal carrying a great deal of information, and direct analysis and synthesis of the complex voice signal is impractical for that reason. Therefore, digital signal processing steps such as feature extraction and feature matching are introduced to represent the voice signal compactly. In this project we use a speech engine whose feature extraction technique is the mel-scaled frequency cepstrum. Mel-frequency cepstral coefficients (MFCCs), derived from the Fourier transform and filter-bank analysis, are perhaps the most widely used front end in state-of-the-art speech recognition systems. Our aim is to add functionality that assists people in their daily lives and reduces their effort. In our tests we checked that all of this functionality works properly, evaluating accuracy with two speakers (one female and one male).
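The MFCC front end mentioned above can be sketched in a few steps: frame the signal, window it, take the magnitude spectrum, pass it through a mel filterbank, take logs, and apply a DCT. The following is a minimal NumPy sketch, not the abstract's actual engine; frame length, hop, and filter counts are illustrative assumptions.

```python
import numpy as np

def hz_to_mel(f):
    # Standard mel-scale mapping used in MFCC front ends.
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale.
    mels = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):
            fb[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fb[i - 1, k] = (r - k) / max(r - c, 1)
    return fb

def mfcc(signal, sr=16000, frame_len=400, hop=160, n_fft=512,
         n_filters=26, n_ceps=13):
    # Frame, apply a Hamming window, take the power spectrum,
    # pass through the mel filterbank, take logs, then a type-II DCT.
    n_frames = 1 + (len(signal) - frame_len) // hop
    fb = mel_filterbank(n_filters, n_fft, sr)
    window = np.hamming(frame_len)
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_filters))
    ceps = []
    for t in range(n_frames):
        frame = signal[t * hop:t * hop + frame_len] * window
        spec = np.abs(np.fft.rfft(frame, n_fft)) ** 2
        ceps.append(dct @ np.log(fb @ spec + 1e-10))
    return np.array(ceps)

# Example: MFCCs of one second of a synthetic 440 Hz tone.
sr = 16000
t = np.arange(sr) / sr
feats = mfcc(np.sin(2 * np.pi * 440 * t), sr=sr)
print(feats.shape)  # (98, 13): one 13-coefficient vector per 10 ms frame
```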

Author(s):  
M. Suman ◽  
K. Harish ◽  
K. Manoj Kumar ◽  
S. Samrajyam

<p>Speaker recognition is the computing task of verifying a user’s claimed identity using characteristics extracted from their voice. This technique is one of the most useful and popular biometric recognition techniques in the world, especially in areas where security is a major concern. It can be used for authentication, surveillance, forensic speaker recognition, and a variety of related activities. The speaker recognition process consists of two modules, namely feature extraction and feature matching. Feature extraction is the process in which we extract a small amount of data from the voice signal that will later be used to represent each speaker. Feature matching involves identifying the unknown speaker by comparing the features extracted from his/her voice input with those from a set of known speakers. Our proposed work consists of truncating a recorded voice signal, framing it, passing it through a window function, calculating the short-term FFT, extracting its features, and matching them against a stored template. Cepstral coefficient calculation and Mel-frequency cepstral coefficients (MFCCs) are applied for feature extraction, and the VQ-LBG (Vector Quantization via Linde-Buzo-Gray) algorithm is used for template generation and feature matching.</p>
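The VQ-LBG step named above grows a codebook by repeatedly splitting centroids and refining them, and matching then compares average quantization distortion per enrolled speaker. A small NumPy sketch under toy data (the random "MFCC" vectors and codebook size are assumptions, not the paper's settings):

```python
import numpy as np

def lbg_codebook(vectors, size, eps=0.01, n_iter=20):
    """Linde-Buzo-Gray: grow a VQ codebook by repeated splitting,
    refining with k-means-style updates after each split."""
    codebook = vectors.mean(axis=0, keepdims=True)
    while len(codebook) < size:
        # Split every centroid into a perturbed pair.
        codebook = np.vstack([codebook * (1 + eps), codebook * (1 - eps)])
        for _ in range(n_iter):
            # Assign each training vector to its nearest centroid ...
            d = np.linalg.norm(vectors[:, None, :] - codebook[None, :, :], axis=2)
            nearest = d.argmin(axis=1)
            # ... and move each centroid to the mean of its cell.
            for i in range(len(codebook)):
                members = vectors[nearest == i]
                if len(members):
                    codebook[i] = members.mean(axis=0)
    return codebook

def vq_distortion(features, codebook):
    # Average distance from each feature vector to its nearest codeword;
    # the enrolled speaker with the lowest distortion is the match.
    d = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
    return d.min(axis=1).mean()

# Toy example with random stand-in feature vectors for two speakers.
rng = np.random.default_rng(0)
spk_a = rng.normal(0.0, 1.0, (200, 12))
spk_b = rng.normal(3.0, 1.0, (200, 12))
book_a = lbg_codebook(spk_a, 8)
book_b = lbg_codebook(spk_b, 8)
test = rng.normal(0.0, 1.0, (50, 12))   # utterance drawn like speaker A
print(vq_distortion(test, book_a) < vq_distortion(test, book_b))  # True
```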


Loquens ◽  
2017 ◽  
Vol 4 (1) ◽  
pp. 040
Author(s):  
Zulema Santana-López ◽  
Óscar Domínguez-Jaén ◽  
Jesús B. Alonso ◽  
María Del Carmen Mato-Carrodeguas

Voice pathologies, caused either by functional dysphonia or organic lesions, or even by just an inappropriate emission of the voice, may lead to vocal abuse, significantly affecting the communication process. The present study is based on the case of a single patient diagnosed with myasthenia gravis (Erb-Goldflam syndrome). In this case, the condition has caused, among other disruptions, dysarthria. For its treatment, a technique for the education and re-education of the voice has been used, based on a resonator element: the cellophane screen. This article shows the results obtained in the patient after applying a vocal re-education technique called the Cimardi Method: the Cellophane Screen, a pioneering technique in this field. Changes in the patient’s voice signal have been studied before and after the application of the Cimardi Method in different domains of study: time-frequency, spectrum, and cepstrum. Moreover, parameters for voice quality measurement, such as shimmer, jitter, and harmonic-to-noise ratio (HNR), have been used to quantify the results obtained with the Cimardi Method. Once the results were analyzed, it was observed that the Cimardi Method helps to produce a more natural and free vocal emission, which is very useful as a rehabilitation therapy for people presenting certain vocal disorders.
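The jitter and shimmer measures cited above are cycle-to-cycle variations of pitch period and peak amplitude. A minimal sketch of the local variants, assuming the pitch periods and cycle amplitudes have already been extracted from the recording (the synthetic values below are illustrative, not patient data):

```python
import numpy as np

def jitter_local(periods):
    # Mean absolute difference between consecutive pitch periods,
    # relative to the mean period (usually reported as a percentage).
    periods = np.asarray(periods, dtype=float)
    return np.mean(np.abs(np.diff(periods))) / periods.mean()

def shimmer_local(amplitudes):
    # The same measure applied to peak amplitudes of consecutive cycles.
    amplitudes = np.asarray(amplitudes, dtype=float)
    return np.mean(np.abs(np.diff(amplitudes))) / amplitudes.mean()

def hnr_db(signal, sr, f0_min=75):
    # Rough HNR estimate from the normalized autocorrelation peak.
    signal = signal - signal.mean()
    ac = np.correlate(signal, signal, mode="full")[len(signal) - 1:]
    ac = ac / ac[0]
    r = ac[1:sr // f0_min].max()
    return 10 * np.log10(r / (1 - r))

# Toy example: a nearly periodic voice with small cycle-to-cycle variation.
rng = np.random.default_rng(1)
periods = 0.008 + rng.normal(0, 0.00004, 100)   # ~125 Hz with tiny jitter
amps = 1.0 + rng.normal(0, 0.02, 100)
print(f"jitter  = {jitter_local(periods) * 100:.2f}%")
print(f"shimmer = {shimmer_local(amps) * 100:.2f}%")
```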


Author(s):  
A. SUBASH CHANDAR ◽  
S. SURIYANARAYANAN ◽  
M. MANIKANDAN

This paper proposes a method of speech recognition using Self-Organizing Maps (SOM) and actuation through a network in Matlab. The different words spoken by the user at the client end are captured and filtered using the Least Mean Square (LMS) algorithm to remove acoustic noise. The FFT of the filtered voice signal is then taken. The voice spectrum is recognized using a trained SOM and the appropriate label is sent to the server PC. Communication between the client and the server is established using the User Datagram Protocol (UDP). A microcontroller (AT89S52) is used to control the speed of the actuator depending on the input it receives from the client. Real-time working of the prototype system has been verified with successful speech recognition, transmission, reception, and actuation via the network.


Proceedings ◽  
2019 ◽  
Vol 31 (1) ◽  
pp. 54
Author(s):  
Benítez-Guijarro ◽  
Callejas ◽  
Noguera ◽  
Benghazi

Devices with oral interfaces are enabling interesting new interaction scenarios in ambient intelligence settings. The use of several such devices in the same environment opens up the possibility of comparing the inputs gathered from each of them and performing more accurate recognition and processing of user speech. However, the combination of multiple devices presents coordination challenges, as the processing of one voice signal by different speech processing units may result in conflicting outputs, and it is necessary to decide which is the most reliable source. This paper presents an approach to rank several sources of spoken input in multi-device environments in order to give preference to the input with the highest estimated quality. The voice signals received by the multiple devices are assessed in terms of their calculated acoustic quality and the reliability of the speech recognition hypotheses produced. After this assessment, each input is assigned a unique score that allows the audio sources to be ranked so as to pick the best to be processed by the system. To validate this approach, we performed an evaluation using a corpus of 4608 audios recorded in a two-room intelligent environment with 24 microphones. The experimental results show that our ranking approach makes it possible to successfully orchestrate an increasing number of acoustic inputs, obtaining better recognition rates than considering a single input, both in clear and noisy settings.
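The scoring idea described above can be sketched as combining an acoustic-quality estimate with the recognizer's confidence and sorting the sources. This toy implementation is not the paper's actual scoring function; the SNR proxy, the equal weighting, and the example devices are all assumptions.

```python
import numpy as np

def snr_db(signal, noise_floor=1e-3):
    # Crude acoustic-quality proxy: energy of the loudest frames
    # (speech bursts) against the quietest (background noise).
    frames = signal[:len(signal) // 160 * 160].reshape(-1, 160)
    energy = np.sort((frames ** 2).mean(axis=1))
    loud = energy[-len(energy) // 4:].mean()
    quiet = max(energy[:len(energy) // 4].mean(), noise_floor)
    return 10 * np.log10(loud / quiet)

def rank_sources(sources):
    """Each source is (device_id, audio, asr_confidence). Combine a
    normalized acoustic-quality score with the recognizer's confidence
    and return device ids best-first."""
    snrs = np.array([snr_db(audio) for _, audio, _ in sources])
    quality = (snrs - snrs.min()) / (snrs.max() - snrs.min() + 1e-9)  # 0..1
    scores = [0.5 * q + 0.5 * conf
              for q, (_, _, conf) in zip(quality, sources)]
    order = np.argsort(scores)[::-1]
    return [sources[i][0] for i in order]

# Toy example: the near microphone has a cleaner signal and higher confidence.
rng = np.random.default_rng(3)
tone = np.sin(2 * np.pi * 200 * np.arange(8000) / 8000)
gate = (np.arange(8000) % 1600 < 800).astype(float)   # bursts plus silence
near = tone * gate + rng.normal(0, 0.01, 8000)
far = tone * gate * 0.2 + rng.normal(0, 0.2, 8000)
print(rank_sources([("near_mic", near, 0.92), ("far_mic", far, 0.55)]))
```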


2014 ◽  
Vol 602-605 ◽  
pp. 2913-2916
Author(s):  
Man Hua Yu ◽  
He Gong ◽  
Zi Yu Wu ◽  
Shi Jun Li

To remedy the shortcomings of taxi-hailing software, and considering the disadvantages of existing car-sharing systems for taxis, a voice carpool system embedding voice recognition technology and wireless transmission technology was proposed and designed. The recognition module recognizes the voices of drivers and passengers, and data and information in the system are transferred using wireless transmission technology. An ultra-thin LED screen is the core part of the system. The address spoken by a driver or passenger is captured as a voice signal; the voice recognition part of the system then converts the voice signal into a corresponding digital signal and transmits it to the display part of the system. Eventually the passenger's address information is displayed on the electronic screen, so that passengers on the street can see the direction in which the taxi will go. The system operated stably and achieved voice input of carpool information, wireless data transmission, and a bi-directional display function on the LED screen. It will help people obtain carpool information clearly, intuitively, and quickly, and can also help improve the utilization rate of taxis, so the voice carpool system is likely to find widespread application.


2019 ◽  
Vol 8 (4) ◽  
pp. 7272-7277

In recent years, speech recognition has become extensively used in customer-service-based organizations. It has attracted a great deal of research in pattern-matching machine learning (learning speech by experience) and neural-network-based speech recognition. Speech recognition is the technology of capturing and perceiving human voice, interpreting it, producing text from it, managing digital devices, and assisting visually impaired and older adults using digital signal processing. In this paper we present a comprehensive study of different methodologies in Android-enabled speech recognition systems, focused on the operability and reliability of voice note apps. Subsequently we propose and evaluate an Android-based speech recognizer app named Annotate, which predominantly focuses on voice dictation in five different languages (English, Hindi, Tamil, Malayalam, and Telugu) and on extracting text from images, using Automatic Speech Recognition (ASR) and Optical Character Recognition (OCR) algorithms. Finally, we identify opportunities for future enhancements in this realm.


2018 ◽  
Vol 8 (4) ◽  
pp. 3153-3156
Author(s):  
A. Y. Al-Rawashdeh ◽  
Z. Al-Qadi

Voice signals are one of the most popular data types. They are used in various applications such as security systems. In the current study, a method based on the wave equation was proposed, implemented, and tested. This method was used to generate a correct feature array, which can be used as a key to identify a voice signal without any dependence on the voice signal's type or size. Results indicated that the proposed method can produce a unique feature array for each voice signal. They also showed that the proposed method can be faster than other feature extraction methods.


2017 ◽  
Vol 9 (2) ◽  
pp. 23
Author(s):  
Pardeep Sangwan ◽  
Dinesh Sheoran ◽  
Saurabh Bhardwaj

Speech recognition by machine may be defined as the conversion of the human speech signal into textual form automatically, without any human intervention. Two feature extraction techniques for speech recognition, utilizing the DWT (Discrete Wavelet Transform) and WPD (Wavelet Packet Decomposition), are discussed in the present article. This paper compares two speech recognizers, the first based on the Discrete Wavelet Transform and the second on Wavelet Packet Decomposition, each paired with four classifiers. The proposed method is implemented on a database of ten digits spoken by two hundred speakers, giving 2000 speech samples in total. The results present the accuracy rates of the respective speech recognizers.
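A DWT front end of the kind described above splits the signal recursively into approximation and detail bands and uses sub-band statistics as features. A minimal sketch using the Haar wavelet (the article does not specify its wavelet or feature statistics; both are assumptions here):

```python
import numpy as np

def haar_dwt_features(x, levels=3):
    """Multi-level Haar DWT: at each level the approximation is split into
    a coarser approximation and a detail band; the band energies form a
    compact feature vector, as in wavelet-based front ends."""
    x = np.asarray(x, dtype=float)
    features = []
    for _ in range(levels):
        if len(x) % 2:
            x = x[:-1]
        approx = (x[0::2] + x[1::2]) / np.sqrt(2)
        detail = (x[0::2] - x[1::2]) / np.sqrt(2)
        features.append(np.sum(detail ** 2))   # detail sub-band energy
        x = approx
    features.append(np.sum(x ** 2))            # final approximation energy
    return np.array(features)

# Example: the energy split separates a slow tone from high-frequency content.
sr = 8000
t = np.arange(sr) / sr
feats = haar_dwt_features(np.sin(2 * np.pi * 100 * t))
# A 100 Hz tone keeps almost all energy in the final approximation band.
print(feats.argmax() == len(feats) - 1)  # True
```

WPD differs only in that detail bands are also split recursively, giving a full binary tree of sub-bands instead of this one-sided ladder.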


2014 ◽  
Vol 945-949 ◽  
pp. 2447-2450
Author(s):  
Cong Cong Chen ◽  
Wei Gong ◽  
Wen Long Fu

In a speech emotion recognition system, voice signal recognition is the most critical step, and simplistic signal recognition can lead to errors. In this paper, a cultural genetic algorithm applied to speech recognition optimizes the combination of voice features to find the optimal solution, providing an effective method for improving the efficiency of speech recognition.
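The feature-combination search above can be illustrated with a plain genetic algorithm over binary feature masks (a cultural algorithm would additionally maintain a belief space that guides mutation, omitted here). The fitness function below is a stand-in for recognition accuracy, and all parameters are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)

def fitness(mask, n_informative=4):
    # Stand-in for recognition accuracy: in this toy problem only
    # features 0-3 carry signal, and larger subsets are penalized.
    if mask.sum() == 0:
        return 0.0
    return mask[:n_informative].sum() / n_informative - 0.05 * mask.sum()

def evolve(n_features=10, pop_size=30, generations=40, p_mut=0.05, elite=2):
    pop = rng.integers(0, 2, (pop_size, n_features))
    for _ in range(generations):
        scores = np.array([fitness(ind) for ind in pop])
        pop = pop[np.argsort(scores)[::-1]]          # best first
        children = [pop[i].copy() for i in range(elite)]   # elitism
        while len(children) < pop_size:
            a, b = pop[rng.integers(0, pop_size // 2, 2)]  # fitter half
            cut = rng.integers(1, n_features)              # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            flip = rng.random(n_features) < p_mut          # bit-flip mutation
            child[flip] ^= 1
            children.append(child)
        pop = np.array(children)
    return max(pop, key=fitness)

best = evolve()
print(best[:4].sum() >= 3)   # most informative features were selected
```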

