speech interface
Recently Published Documents

TOTAL DOCUMENTS: 198 (FIVE YEARS: 28)
H-INDEX: 12 (FIVE YEARS: 3)

2022 ◽  
Vol 12 (2) ◽  
pp. 827
Author(s):  
Ki-Seung Lee

Previously established silent speech interface (SSI) methods achieve only moderate intelligibility and naturalness. A common problem is that they estimate spectral details poorly, which yields synthesized speech that sounds rough, harsh, and unclear. In this study, harmonic enhancement (HE) was applied during postprocessing to alleviate this problem by emphasizing the spectral fine structure of the speech signal. To improve the subjective quality of the synthesized speech, the difference between synthesized and actual speech was measured as a distance in perceptual domains rather than as the conventional mean square error (MSE). Two deep neural networks (DNNs), connected in a cascade, were employed to separately estimate the speech spectra and the HE filter coefficients. The DNNs were trained to incrementally and iteratively minimize both the MSE and the perceptual distance (PD). A feasibility test showed that the perceptual evaluation of speech quality (PESQ) and the short-time objective intelligibility (STOI) scores improved by 17.8% and 2.9%, respectively, compared with previous methods. Subjective listening tests confirmed that the proposed method was perceptually preferred over the conventional MSE-based method.
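
As a rough illustration of the training objective described above, the following minimal sketch (Python/PyTorch) cascades a spectrum-estimation network into a harmonic-enhancement network and trains against a combined MSE-plus-PD loss. All module names, layer sizes, and the perceptual_distance() proxy are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class SpectrumEstimator(nn.Module):
    """DNN 1: maps input (e.g., sensor) features to a speech spectrum."""
    def __init__(self, in_dim=64, spec_dim=257):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 512), nn.ReLU(),
                                 nn.Linear(512, spec_dim))
    def forward(self, x):
        return self.net(x)

class HarmonicEnhancer(nn.Module):
    """DNN 2: maps the estimated spectrum to HE filter coefficients."""
    def __init__(self, spec_dim=257, n_coef=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(spec_dim, 256), nn.ReLU(),
                                 nn.Linear(256, n_coef))
    def forward(self, spec):
        return self.net(spec)

def perceptual_distance(est, ref):
    # Stand-in for the paper's PD term: squared distance in a
    # log-compressed magnitude domain as a crude perceptual proxy.
    return torch.mean((torch.log1p(est.abs()) - torch.log1p(ref.abs())) ** 2)

def combined_loss(est_spec, ref_spec, alpha=0.5):
    # Joint objective: conventional MSE plus a weighted perceptual term.
    mse = nn.functional.mse_loss(est_spec, ref_spec)
    return mse + alpha * perceptual_distance(est_spec, ref_spec)

# Cascade: features -> estimated spectrum -> HE filter coefficients.
estimator, enhancer = SpectrumEstimator(), HarmonicEnhancer()
features = torch.randn(8, 64)          # dummy batch
spec = estimator(features)
coefs = enhancer(spec)                 # consumed by the HE postfilter
```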


2021 ◽  
Vol E104.D (12) ◽  
pp. 2209-2217
Author(s):  
Hongcui WANG ◽  
Pierre ROUSSEL ◽  
Bruce DENBY

2021 ◽  
Author(s):  
Thomas Wachsmuth ◽  
Christian Thormann ◽  
Alexander Winkler

Sensors ◽  
2021 ◽  
Vol 21 (16) ◽  
pp. 5291
Author(s):  
Eldad Holdengreber ◽  
Roi Yozevitch ◽  
Vitali Khavkin

Muteness at its various levels is a common disability. Most technological solutions to the problem create vocal speech by transitioning from sign languages to vocal acoustic sounds. We present a new approach to creating speech: a technology that does not require prior knowledge of sign language. It builds on the most basic level of speech, the phonetic division into vowels and consonants. Speech is expressed through sensed hand movements, decomposed into three rotations: yaw, pitch, and roll. The proposed algorithm maps these rotations to vowels and consonants. A depth camera senses the hand movements, and standard speakers produce the sounds. The programmed depth camera and the speakers, together with the cognitive activity of the brain, are integrated into a unique speech interface. Using this interface, the user can develop speech through an intuitive cognitive process in accordance with ongoing brain activity, similar to the natural use of the vocal cords. Based on the performance of the presented speech interface prototype, the proposed device could be a solution for people with speech disabilities.
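
A minimal sketch of the core mapping idea follows (plain Python). The bin boundaries, the roll-based vowel/consonant split, and the phoneme tables are hypothetical illustrations, not the authors' actual mapping.

```python
# Quantize hand rotations (yaw, pitch, roll) into phoneme selections.
VOWELS = ["a", "e", "i", "o", "u"]
CONSONANTS = ["b", "d", "g", "k", "l", "m", "n", "p", "s", "t"]

def bin_index(angle_deg, n_bins, lo=-90.0, hi=90.0):
    """Map an angle in [lo, hi] degrees to one of n_bins equal bins."""
    angle = max(lo, min(hi, angle_deg))
    idx = int((angle - lo) / (hi - lo) * n_bins)
    return min(idx, n_bins - 1)

def rotation_to_phoneme(yaw, pitch, roll):
    """Hypothetical convention: roll sign selects vowel vs. consonant;
    yaw or pitch then picks the phoneme within that class."""
    if roll >= 0:
        return VOWELS[bin_index(yaw, len(VOWELS))]
    return CONSONANTS[bin_index(pitch, len(CONSONANTS))]

print(rotation_to_phoneme(yaw=30.0, pitch=0.0, roll=15.0))    # a vowel
print(rotation_to_phoneme(yaw=0.0, pitch=-45.0, roll=-10.0))  # a consonant
```

In a full system, each selected phoneme would then be rendered through the speakers; here the mapping alone is shown.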


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Utku Kale ◽  
Michael Herrera ◽  
András Nagy

Purpose
The purpose of this research is to investigate pragmatic failure and other language-related risks in intercultural aviation communication between pilots and air traffic controllers. The paper provides recommendations for minimizing these risks, thereby improving aviation safety by reducing the rate of aviation incidents and accidents. Pragmatic failure refers to the miscomprehension of intended pragmatic meaning, which, as opposed to semantic meaning, depends on context and is highly influenced by culture.

Design/methodology/approach
The risk of pragmatic failure in aviation is presented hypothetically, and examples of language-related communication failure in air-to-ground communication between pilots and air traffic controllers (ATCOs) are examined, including an example involving pragmatic failure. A questionnaire was developed to survey pilots and ATCOs who communicate over radiotelephony. Results from 212 respondents are presented and conclusions are drawn.

Findings
Based on linguistic theory and the survey results, the authors propose that native English-speaking aviation operators become more familiar with the inner workings of the English language, in particular the difference between semantic and pragmatic meaning. This awareness helps them, whenever communicating with people of other cultures, to develop the valuable skill of conveying semantic meaning without adding pragmatic meaning. Doing so minimizes the potential for misunderstanding when an emergency arises that cannot be handled with International Civil Aviation Organization (ICAO) standard phraseology and the listener is from a different culture.

Practical implications
Language and communication play a vital role in reducing the rate of aircraft incidents and accidents. In aviation, pilots and ATCOs communicate with neither face-to-face contact nor a video speech interface between them; their communication is conducted entirely through radio messages using a specialized language designed to make it as accurate and efficient as possible. This study is therefore important for investigating the risks of pragmatic failure and of language errors in general between pilots and ATCOs. The research will also be a useful guide for designing operator (pilot and ATCO) training.

Originality/value
The main focus of the study is to investigate the reasons for pragmatic failure and other language-related causes of misunderstanding between pilots and air traffic controllers in air-to-ground communication. To this end, a questionnaire was developed for pilots and ATCOs who communicate over aeronautical radiotelephony, and examples of aircraft accidents are given.


2021 ◽  
Vol 8 ◽  
Author(s):  
Lukas Grasse ◽  
Sylvain J. Boutros ◽  
Matthew S. Tata

The COVID-19 pandemic has had a widespread effect across the globe, and its impact on health-care workers and the vulnerable populations they serve has been of particular concern. Near-complete lockdown has been a common strategy for reducing its spread in environments such as live-in care facilities. Robotics is a promising area of research for reducing the spread of COVID-19 while avoiding the need for complete physical isolation. The research presented in this paper demonstrates a speech-controlled, self-sanitizing robot that delivers items from a visitor to a resident of a care facility. The system is automated to reduce the burden on facility staff, and it is controlled entirely through hands-free audio interaction in order to reduce transmission of the virus. We demonstrate an end-to-end delivery test and an in-depth evaluation of the speech interface. We also recorded a speech dataset under two conditions, with the talker wearing a face mask and without, and used it to evaluate the speech recognition system. This enabled us to test the effect of face masks on speech recognition interfaces in the context of autonomous systems.
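
The evaluation step lends itself to a short sketch: compute the word error rate (WER) separately for the masked and unmasked recordings and compare. The transcribe() hook and the dataset variables below are placeholders for an actual ASR system and the recorded data, not the authors' code.

```python
def wer(ref_words, hyp_words):
    """Standard Levenshtein word error rate between two word lists."""
    d = [[0] * (len(hyp_words) + 1) for _ in range(len(ref_words) + 1)]
    for i in range(len(ref_words) + 1):
        d[i][0] = i
    for j in range(len(hyp_words) + 1):
        d[0][j] = j
    for i in range(1, len(ref_words) + 1):
        for j in range(1, len(hyp_words) + 1):
            sub = d[i - 1][j - 1] + (ref_words[i - 1] != hyp_words[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[-1][-1] / max(len(ref_words), 1)

def mean_wer(utterances, transcribe):
    """utterances: list of (audio_path, reference_text) pairs;
    transcribe: callable that returns the ASR hypothesis for a path."""
    scores = [wer(ref.split(), transcribe(path).split())
              for path, ref in utterances]
    return sum(scores) / len(scores)

# Usage (hypothetical): compare the two recording conditions.
# masked_wer   = mean_wer(masked_set,   my_asr.transcribe)
# unmasked_wer = mean_wer(unmasked_set, my_asr.transcribe)
```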


Sensors ◽  
2021 ◽  
Vol 21 (4) ◽  
pp. 1399
Author(s):  
Wookey Lee ◽  
Jessica Jiwon Seong ◽  
Busra Ozlu ◽  
Bong Sup Shim ◽  
Azizbek Marakhimov ◽  
...  

Voice is one of the essential mechanisms by which human beings communicate and express their intentions. There are many causes of voice loss, including disease, accidents, vocal abuse, medical surgery, ageing, and environmental pollution, and the risk of voice loss continues to increase. Because losing one's voice seriously undermines quality of life and can lead to isolation from society, novel approaches to speech recognition and production are needed. In this review, we survey mouth interface technologies, i.e., mouth-mounted devices for speech recognition, production, and volitional control, and the corresponding research on artificial mouth technologies based on various sensors, including electromyography (EMG), electroencephalography (EEG), electropalatography (EPG), electromagnetic articulography (EMA), permanent magnet articulography (PMA), gyros, images, and 3-axial magnetic sensors, especially in combination with deep learning techniques. In particular, we examine deep learning technologies related to voice recognition, including visual speech recognition and silent speech interfaces, analyze their processing flow, and organize them into a taxonomy. Finally, we discuss methods to solve the communication problems of people with speaking disabilities and future research directions with respect to deep learning components.
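
To make the idea of such a taxonomy concrete, a toy sketch follows. The grouping is an illustrative assumption based only on the sensors named in the abstract, not the paper's actual classification.

```python
# Hypothetical grouping of the surveyed sensing modalities.
TAXONOMY = {
    "muscle/neural activity": ["EMG", "EEG"],
    "articulator tracking":   ["EPG", "EMA", "PMA", "3-axial magnetic sensors"],
    "motion":                 ["gyros"],
    "visual":                 ["images (visual speech recognition)"],
}

for branch, sensors in TAXONOMY.items():
    print(f"{branch}: {', '.join(sensors)}")
```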

