Jarvis: Desktop Assistant

Author(s):  
Sohan Singh ◽  
Anupam Lakhanpal ◽  
Shashwat Shukla ◽  
Srishti Sinha

“Jarvis” was the name of Tony Stark’s life assistant in the Iron Man films. Unlike the original comics, in which Jarvis was Stark’s human butler, the film version of Jarvis is an intelligent computer that converses with Stark, monitors his household, and helps build and program his superhero suit. In this project, Jarvis is a digital life assistant that uses common human communication channels such as Twitter, instant messaging, and voice to create a two-way connection between a person and their apartment: controlling lights and appliances, assisting with cooking, delivering breaking news and Facebook notifications, and more. Since voice is the primary communication channel in our project, Jarvis is essentially a speech recognition application. Speech technology encompasses two technologies: synthesis and recognition. A speech synthesizer takes text as input and produces an audio stream as output; a speech recognizer does the opposite, taking an audio stream as input and turning it into a text transcription. The voice is a signal carrying a great deal of information, and direct analysis and synthesis of the complex voice signal is impractical for that reason. Therefore, digital signal processing steps such as feature extraction and feature matching are introduced to represent the voice signal compactly. In this project we use a speech engine whose feature extraction technique is the mel-scaled frequency cepstrum. Mel-frequency cepstral coefficients (MFCCs), derived from the Fourier transform and filter-bank analysis, are perhaps the most widely used front end in state-of-the-art speech recognition systems. Our aim is to add functionality that assists people in their daily lives and reduces their effort. In our tests we checked that all of this functionality works properly, evaluating accuracy with two speakers (one female and one male).
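The MFCC front end mentioned above can be sketched in a few steps: frame the signal, window it, take the magnitude spectrum, pass it through a mel filterbank, take logs, and apply a DCT. The following is a minimal NumPy sketch, not the abstract's actual engine; frame length, hop, and filter counts are illustrative assumptions.

```python
import numpy as np

def hz_to_mel(f):
    # Standard mel-scale mapping used in MFCC front ends.
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale.
    mels = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):
            fb[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fb[i - 1, k] = (r - k) / max(r - c, 1)
    return fb

def mfcc(signal, sr=16000, frame_len=400, hop=160, n_fft=512,
         n_filters=26, n_ceps=13):
    # Frame, apply a Hamming window, take the power spectrum,
    # pass through the mel filterbank, take logs, then a type-II DCT.
    n_frames = 1 + (len(signal) - frame_len) // hop
    fb = mel_filterbank(n_filters, n_fft, sr)
    window = np.hamming(frame_len)
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_filters))
    ceps = []
    for t in range(n_frames):
        frame = signal[t * hop:t * hop + frame_len] * window
        spec = np.abs(np.fft.rfft(frame, n_fft)) ** 2
        ceps.append(dct @ np.log(fb @ spec + 1e-10))
    return np.array(ceps)

# Example: MFCCs of one second of a synthetic 440 Hz tone.
sr = 16000
t = np.arange(sr) / sr
feats = mfcc(np.sin(2 * np.pi * 440 * t), sr=sr)
print(feats.shape)  # (98, 13): one 13-coefficient vector per 10 ms frame
```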

Author(s):  
M. Suman ◽  
K. Harish ◽  
K. Manoj Kumar ◽  
S. Samrajyam

<p>Speaker recognition is the computing task of verifying a user’s claimed identity using characteristics extracted from their voice. This technique is one of the most useful and popular biometric recognition techniques in the world, especially in areas where security is a major concern. It can be used for authentication, surveillance, forensic speaker recognition, and a variety of related activities. The speaker recognition process consists of two modules, namely feature extraction and feature matching. Feature extraction is the process in which we extract a small amount of data from the voice signal that will later be used to represent each speaker. Feature matching involves identifying the unknown speaker by comparing the features extracted from his/her voice input with those from a set of known speakers. Our proposed work consists of truncating a recorded voice signal, framing it, passing it through a window function, calculating the short-term FFT, extracting its features, and matching them against a stored template. Cepstral coefficient calculation and Mel-frequency cepstral coefficients (MFCCs) are applied for feature extraction, and the VQ-LBG (Vector Quantization via Linde-Buzo-Gray) algorithm is used for template generation and feature matching.</p>
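The VQ-LBG step named above grows a codebook by repeatedly splitting centroids and refining them, and matching then compares average quantization distortion per enrolled speaker. A small NumPy sketch under toy data (the random "MFCC" vectors and codebook size are assumptions, not the paper's settings):

```python
import numpy as np

def lbg_codebook(vectors, size, eps=0.01, n_iter=20):
    """Linde-Buzo-Gray: grow a VQ codebook by repeated splitting,
    refining with k-means-style updates after each split."""
    codebook = vectors.mean(axis=0, keepdims=True)
    while len(codebook) < size:
        # Split every centroid into a perturbed pair.
        codebook = np.vstack([codebook * (1 + eps), codebook * (1 - eps)])
        for _ in range(n_iter):
            # Assign each training vector to its nearest centroid ...
            d = np.linalg.norm(vectors[:, None, :] - codebook[None, :, :], axis=2)
            nearest = d.argmin(axis=1)
            # ... and move each centroid to the mean of its cell.
            for i in range(len(codebook)):
                members = vectors[nearest == i]
                if len(members):
                    codebook[i] = members.mean(axis=0)
    return codebook

def vq_distortion(features, codebook):
    # Average distance from each feature vector to its nearest codeword;
    # the enrolled speaker with the lowest distortion is the match.
    d = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
    return d.min(axis=1).mean()

# Toy example with random stand-in feature vectors for two speakers.
rng = np.random.default_rng(0)
spk_a = rng.normal(0.0, 1.0, (200, 12))
spk_b = rng.normal(3.0, 1.0, (200, 12))
book_a = lbg_codebook(spk_a, 8)
book_b = lbg_codebook(spk_b, 8)
test = rng.normal(0.0, 1.0, (50, 12))   # utterance drawn like speaker A
print(vq_distortion(test, book_a) < vq_distortion(test, book_b))  # True
```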


Loquens ◽  
2017 ◽  
Vol 4 (1) ◽  
pp. 040
Author(s):  
Zulema Santana-López ◽  
Óscar Domínguez-Jaén ◽  
Jesús B. Alonso ◽  
María Del Carmen Mato-Carrodeguas

Voice pathologies, caused either by functional dysphonia or organic lesions, or even by just an inappropriate emission of the voice, may lead to vocal abuse, significantly affecting the communication process. The present study is based on the case of a single patient diagnosed with myasthenia gravis (Erb-Goldflam syndrome). In this case, the condition has caused, among other disruptions, dysarthria. For its treatment, a technique for the education and re-education of the voice has been used, based on a resonator element: the cellophane screen. This article shows the results obtained in the patient after applying a vocal re-education technique called the Cimardi Method: the Cellophane Screen, a pioneering technique in this field. Changes in the patient’s voice signal have been studied before and after the application of the Cimardi Method in different domains of study: time-frequency, spectrum, and cepstrum. Moreover, parameters for voice quality measurement, such as shimmer, jitter, and harmonic-to-noise ratio (HNR), have been used to quantify the results obtained with the Cimardi Method. Once the results were analyzed, it was observed that the Cimardi Method helps to produce a more natural and free vocal emission, which is very useful as a rehabilitation therapy for people presenting certain vocal disorders.
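The jitter and shimmer measures cited above are cycle-to-cycle variations of pitch period and peak amplitude. A minimal sketch of the local variants, assuming the pitch periods and cycle amplitudes have already been extracted from the recording (the synthetic values below are illustrative, not patient data):

```python
import numpy as np

def jitter_local(periods):
    # Mean absolute difference between consecutive pitch periods,
    # relative to the mean period (usually reported as a percentage).
    periods = np.asarray(periods, dtype=float)
    return np.mean(np.abs(np.diff(periods))) / periods.mean()

def shimmer_local(amplitudes):
    # The same measure applied to peak amplitudes of consecutive cycles.
    amplitudes = np.asarray(amplitudes, dtype=float)
    return np.mean(np.abs(np.diff(amplitudes))) / amplitudes.mean()

def hnr_db(signal, sr, f0_min=75):
    # Rough HNR estimate from the normalized autocorrelation peak.
    signal = signal - signal.mean()
    ac = np.correlate(signal, signal, mode="full")[len(signal) - 1:]
    ac = ac / ac[0]
    r = ac[1:sr // f0_min].max()
    return 10 * np.log10(r / (1 - r))

# Toy example: a nearly periodic voice with small cycle-to-cycle variation.
rng = np.random.default_rng(1)
periods = 0.008 + rng.normal(0, 0.00004, 100)   # ~125 Hz with tiny jitter
amps = 1.0 + rng.normal(0, 0.02, 100)
print(f"jitter  = {jitter_local(periods) * 100:.2f}%")
print(f"shimmer = {shimmer_local(amps) * 100:.2f}%")
```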


Author(s):  
A. SUBASH CHANDAR ◽  
S. SURIYANARAYANAN ◽  
M. MANIKANDAN

This paper proposes a method of speech recognition using Self-Organizing Maps (SOM) and actuation through a network in Matlab. The different words spoken by the user at the client end are captured and filtered using the Least Mean Square (LMS) algorithm to remove acoustic noise. The FFT of the filtered voice signal is then taken. The voice spectrum is recognized using a trained SOM and the appropriate label is sent to the server PC. Communication between the client and the server is established using the User Datagram Protocol (UDP). A microcontroller (AT89S52) is used to control the speed of the actuator depending on the input it receives from the client. Real-time working of the prototype system has been verified with successful speech recognition, transmission, reception, and actuation via the network.


Proceedings ◽  
2019 ◽  
Vol 31 (1) ◽  
pp. 54
Author(s):  
Benítez-Guijarro ◽  
Callejas ◽  
Noguera ◽  
Benghazi

Devices with oral interfaces are enabling interesting new interaction scenarios in ambient intelligence settings. The use of several such devices in the same environment opens up the possibility of comparing the inputs gathered from each of them and performing more accurate recognition and processing of user speech. However, the combination of multiple devices presents coordination challenges, as the processing of one voice signal by different speech processing units may result in conflicting outputs, and it is necessary to decide which is the most reliable source. This paper presents an approach to rank several sources of spoken input in multi-device environments in order to give preference to the input with the highest estimated quality. The voice signals received by the multiple devices are assessed in terms of their calculated acoustic quality and the reliability of the speech recognition hypotheses produced. After this assessment, each input is assigned a unique score that allows the audio sources to be ranked so as to pick the best to be processed by the system. To validate this approach, we performed an evaluation using a corpus of 4608 audios recorded in a two-room intelligent environment with 24 microphones. The experimental results show that our ranking approach makes it possible to successfully orchestrate an increasing number of acoustic inputs, obtaining better recognition rates than considering a single input, both in clear and noisy settings.
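The scoring idea described above can be sketched as combining an acoustic-quality estimate with the recognizer's confidence and sorting the sources. This toy implementation is not the paper's actual scoring function; the SNR proxy, the equal weighting, and the example devices are all assumptions.

```python
import numpy as np

def snr_db(signal, noise_floor=1e-3):
    # Crude acoustic-quality proxy: energy of the loudest frames
    # (speech bursts) against the quietest (background noise).
    frames = signal[:len(signal) // 160 * 160].reshape(-1, 160)
    energy = np.sort((frames ** 2).mean(axis=1))
    loud = energy[-len(energy) // 4:].mean()
    quiet = max(energy[:len(energy) // 4].mean(), noise_floor)
    return 10 * np.log10(loud / quiet)

def rank_sources(sources):
    """Each source is (device_id, audio, asr_confidence). Combine a
    normalized acoustic-quality score with the recognizer's confidence
    and return device ids best-first."""
    snrs = np.array([snr_db(audio) for _, audio, _ in sources])
    quality = (snrs - snrs.min()) / (snrs.max() - snrs.min() + 1e-9)  # 0..1
    scores = [0.5 * q + 0.5 * conf
              for q, (_, _, conf) in zip(quality, sources)]
    order = np.argsort(scores)[::-1]
    return [sources[i][0] for i in order]

# Toy example: the near microphone has a cleaner signal and higher confidence.
rng = np.random.default_rng(3)
tone = np.sin(2 * np.pi * 200 * np.arange(8000) / 8000)
gate = (np.arange(8000) % 1600 < 800).astype(float)   # bursts plus silence
near = tone * gate + rng.normal(0, 0.01, 8000)
far = tone * gate * 0.2 + rng.normal(0, 0.2, 8000)
print(rank_sources([("near_mic", near, 0.92), ("far_mic", far, 0.55)]))
```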


2014 ◽  
Vol 602-605 ◽  
pp. 2913-2916
Author(s):  
Man Hua Yu ◽  
He Gong ◽  
Zi Yu Wu ◽  
Shi Jun Li

To remedy the shortcomings of taxi-hailing software, and considering the disadvantages of existing car-sharing systems for taxis, a voice carpool system embedding voice recognition technology and wireless transmission technology was proposed and designed. The recognition module recognizes the voices of drivers and passengers, and data and information in the system are transferred using wireless transmission technology. An ultra-thin LED screen is the core part of the system. The address spoken by a driver or passenger is captured as a voice signal; the voice recognition part of the system then converts the voice signal into a corresponding digital signal and transmits it to the display part of the system. Eventually the passenger's address information is displayed on the electronic screen, so that passengers on the street can see the direction in which the taxi will go. The system operated stably and achieved voice input of carpool information, wireless data transmission, and a bi-directional display function on the LED screen. It will help people obtain carpool information clearly, intuitively, and quickly, and can also help improve the utilization rate of taxis, so the voice carpool system is likely to find widespread application.


2019 ◽  
Vol 8 (4) ◽  
pp. 7272-7277

In recent years, speech recognition has become extensively used in customer-service-based organizations. It has attracted a great deal of research in pattern-matching machine learning (learning speech by experience) and neural-network-based speech recognition. Speech recognition is the technology of capturing and perceiving human voice, interpreting it, producing text from it, managing digital devices, and assisting visually impaired and older adults using digital signal processing. In this paper we present a comprehensive study of different methodologies in Android-enabled speech recognition systems, focused on the operability and reliability of voice note apps. Subsequently we propose and evaluate an Android-based speech recognizer app named Annotate, which predominantly focuses on voice dictation in five different languages (English, Hindi, Tamil, Malayalam, and Telugu) and on extracting text from images, using Automatic Speech Recognition (ASR) and Optical Character Recognition (OCR) algorithms. Finally, we identify opportunities for future enhancements in this realm.


2018 ◽  
Vol 8 (4) ◽  
pp. 3153-3156
Author(s):  
A. Y. Al-Rawashdeh ◽  
Z. Al-Qadi

Voice signals are one of the most popular data types. They are used in various applications such as security systems. In the current study, a method based on the wave equation was proposed, implemented, and tested. This method was used to generate a correct feature array, which can be used as a key to identify a voice signal without any dependence on the voice signal's type or size. Results indicated that the proposed method can produce a unique feature array for each voice signal. They also showed that the proposed method can be faster than other feature extraction methods.


2017 ◽  
Vol 9 (2) ◽  
pp. 23
Author(s):  
Pardeep Sangwan ◽  
Dinesh Sheoran ◽  
Saurabh Bhardwaj

Speech recognition by machine may be defined as the conversion of the human speech signal into textual form automatically, without any human intervention. Two feature extraction techniques for speech recognition, utilizing the DWT (Discrete Wavelet Transform) and WPD (Wavelet Packet Decomposition), are discussed in the present article. This paper compares two speech recognizers, the first based on the Discrete Wavelet Transform and the second on Wavelet Packet Decomposition, each paired with four classifiers. The proposed method is implemented on a database of ten digits spoken by two hundred speakers, giving 2000 speech samples in total. The results present the accuracy rates of the respective speech recognizers.
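A DWT front end of the kind described above splits the signal recursively into approximation and detail bands and uses sub-band statistics as features. A minimal sketch using the Haar wavelet (the article does not specify its wavelet or feature statistics; both are assumptions here):

```python
import numpy as np

def haar_dwt_features(x, levels=3):
    """Multi-level Haar DWT: at each level the approximation is split into
    a coarser approximation and a detail band; the band energies form a
    compact feature vector, as in wavelet-based front ends."""
    x = np.asarray(x, dtype=float)
    features = []
    for _ in range(levels):
        if len(x) % 2:
            x = x[:-1]
        approx = (x[0::2] + x[1::2]) / np.sqrt(2)
        detail = (x[0::2] - x[1::2]) / np.sqrt(2)
        features.append(np.sum(detail ** 2))   # detail sub-band energy
        x = approx
    features.append(np.sum(x ** 2))            # final approximation energy
    return np.array(features)

# Example: the energy split separates a slow tone from high-frequency content.
sr = 8000
t = np.arange(sr) / sr
feats = haar_dwt_features(np.sin(2 * np.pi * 100 * t))
# A 100 Hz tone keeps almost all energy in the final approximation band.
print(feats.argmax() == len(feats) - 1)  # True
```

WPD differs only in that detail bands are also split recursively, giving a full binary tree of sub-bands instead of this one-sided ladder.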


2014 ◽  
Vol 945-949 ◽  
pp. 2447-2450
Author(s):  
Cong Cong Chen ◽  
Wei Gong ◽  
Wen Long Fu

In a speech emotion recognition system, voice signal recognition is the most critical step, and simplistic signal recognition can lead to errors. In this paper, a cultural genetic algorithm applied to speech recognition optimizes the combination of voice features to find the optimal solution, providing an effective method for improving the efficiency of speech recognition.
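The feature-combination search above can be illustrated with a plain genetic algorithm over binary feature masks (a cultural algorithm would additionally maintain a belief space that guides mutation, omitted here). The fitness function below is a stand-in for recognition accuracy, and all parameters are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)

def fitness(mask, n_informative=4):
    # Stand-in for recognition accuracy: in this toy problem only
    # features 0-3 carry signal, and larger subsets are penalized.
    if mask.sum() == 0:
        return 0.0
    return mask[:n_informative].sum() / n_informative - 0.05 * mask.sum()

def evolve(n_features=10, pop_size=30, generations=40, p_mut=0.05, elite=2):
    pop = rng.integers(0, 2, (pop_size, n_features))
    for _ in range(generations):
        scores = np.array([fitness(ind) for ind in pop])
        pop = pop[np.argsort(scores)[::-1]]          # best first
        children = [pop[i].copy() for i in range(elite)]   # elitism
        while len(children) < pop_size:
            a, b = pop[rng.integers(0, pop_size // 2, 2)]  # fitter half
            cut = rng.integers(1, n_features)              # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            flip = rng.random(n_features) < p_mut          # bit-flip mutation
            child[flip] ^= 1
            children.append(child)
        pop = np.array(children)
    return max(pop, key=fitness)

best = evolve()
print(best[:4].sum() >= 3)   # most informative features were selected
```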

