speech information
Recently Published Documents


TOTAL DOCUMENTS

288
(FIVE YEARS 83)

H-INDEX

25
(FIVE YEARS 3)

2021 ◽  
Author(s):  
René Groh ◽  
Zhengdong Lei ◽  
Lisa Martignetti ◽  
Nicole YK Li-Jessen ◽  
Andreas M Kist

Mobile health wearables are often embedded with small processors for signal acquisition and analysis. These embedded wearable systems are, however, limited by low available memory and computational power. Advances in machine learning, especially deep neural networks (DNNs), have been adopted for efficient and intelligent applications to overcome constrained computational environments. In this study, evolutionary optimized DNNs were analyzed to classify three common airway-related symptoms, namely coughs, throat clears and dry swallows. As opposed to typical microphone-acoustic signals, mechano-acoustic data signals, which contain no identifiable speech information and thus offer better privacy protection, were acquired from laboratory-generated and publicly available datasets. The optimized DNNs had a low footprint of less than 150 kB and predicted the airway symptoms of interest with 83.7% accuracy on unseen data. Explainable AI techniques, namely occlusion experiments and class activation maps, identified mel-frequency bands up to 8,000 Hz as the most important features for the classification. We further found that DNN decisions consistently relied on these specific features, fostering trust in and transparency of the proposed DNNs. Our proposed efficient and explainable DNN is expected to support edge computing on mechano-acoustic sensing wearables for remote, long-term monitoring of airway symptoms.
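The occlusion experiments mentioned in this abstract can be illustrated with a minimal sketch. It assumes a spectrogram stored as a NumPy array (frequency bins by time frames) and measures how much a model's score drops when one frequency band is zeroed out. The names `occlusion_importance` and `toy_predict`, the band height, and the choice of zero as the occlusion value are all hypothetical; the abstract does not describe the authors' actual setup.

```python
import numpy as np

def occlusion_importance(predict, spectrogram, band_height=4):
    """Return the score drop caused by zeroing each horizontal frequency band."""
    baseline = predict(spectrogram)
    n_bands = spectrogram.shape[0] // band_height
    drops = np.zeros(n_bands)
    for b in range(n_bands):
        occluded = spectrogram.copy()
        # Blank out one band of frequency bins across all time frames.
        occluded[b * band_height:(b + 1) * band_height, :] = 0.0
        drops[b] = baseline - predict(occluded)
    return drops

# Hypothetical stand-in for a trained classifier: it only looks at bins 8-11.
def toy_predict(spec):
    return float(spec[8:12, :].mean())

spec = np.ones((16, 10))  # synthetic 16-bin, 10-frame spectrogram
drops = occlusion_importance(toy_predict, spec)
print(drops)  # the largest drop marks the band the model depends on
```

A band whose occlusion causes a large drop is one the model relies on; with a real classifier, `predict` would return the probability of the class under study.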


Author(s):  
Weigao Su ◽  
Daibo Liu ◽  
Taiyuan Zhang ◽  
Hongbo Jiang

Motion sensors in modern smartphones have been exploited for audio eavesdropping in loudspeaker mode due to their sensitivity to vibrations. In this paper, we go one step further and explore the feasibility of using the built-in accelerometer to eavesdrop on the telephone conversation of a caller or callee who holds the phone against the cheek and ear, and we design our attack, Vibphone. The inspiration behind Vibphone is that speech-induced vibrations (SIV) can be transmitted through the physical phone-cheek contact to the accelerometer and carry traces of the voice content. To this end, Vibphone faces three main challenges: i) accurately detecting SIV signals amid miscellaneous disturbances; ii) combating the impact of device diversity to work across a variety of attack scenarios; and iii) enhancing the feature-agnostic recognition model to generalize to newly released devices and reduce training overhead. To address these challenges, we first conduct an in-depth investigation of SIV features to determine the root cause of device diversity impacts and identify a set of critical features that are highly relevant to the voice content retained in SIV signals and independent of specific devices. Building on these observations, we propose a combined method that integrates the extracted critical features with a deep neural network to recognize speech information from the spectrogram representation of acceleration signals. We implement the attack using commodity smartphones, and the results show it is highly effective. Our work brings to light a fundamental design vulnerability in the vast majority of currently deployed smartphones, which may put people's speech privacy at risk during phone calls. We also propose a practical and effective defense and validate that randomly varying the sampling rate can prevent audio eavesdropping.


2021 ◽  
Author(s):  
Mahmoud Keshavarzi ◽  
Áine Ní Choisdealbha ◽  
Adam Attaheri ◽  
Sinead Rocha ◽  
Perrine Brusini ◽  
...  

Computational models that successfully translate neural activity into speech are multiplying in the adult literature, with non-linear convolutional neural network (CNN) approaches joining the more frequently employed linear and mutual information (MI) models. Despite the promise of these methods for uncovering the neural basis of language acquisition by the human brain, similar studies with infants are rare. Existing infant studies rely on simpler cross-correlation and other linear techniques and aim only to establish neural tracking of the broadband speech envelope. Here, three novel computational models were applied to measure whether low-frequency speech envelope information was encoded in infant neural activity. Backward linear and CNN models were applied to estimate speech information from neural activity using linear versus nonlinear approaches, and an MI model measured how well the acoustic stimuli were encoded in infant neural responses. Fifty infants provided EEG recordings when aged 4, 7, and 11 months, while listening passively to natural speech (sung nursery rhymes) presented by video with a female singer. Each model computed speech information for these nursery rhymes in two different frequency bands, delta (1–4 Hz) and theta (4–8 Hz), thought to provide different types of linguistic information. All three models demonstrated significant levels of performance for delta-band and theta-band neural activity from 4 months of age. All models also demonstrated higher accuracy for the delta-band neural response in the infant brain. However, only the linear and MI models showed developmental (age-related) effects, and these developmental effects differed by model. Accordingly, the choice of algorithm used to decode speech envelope information from neural activity in the infant brain may determine the developmental conclusions that can be drawn. Better understanding of the strengths and weaknesses of each modelling approach will be fundamental to improving our understanding of how the human brain builds a language system.
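To make the idea of a backward linear model concrete, the sketch below reconstructs a stimulus envelope from time-lagged multichannel recordings using ridge regression. It is an illustration under stated assumptions, not the authors' pipeline: the data are synthetic (a white-noise "envelope" and three channels that are delayed, noisy copies of it), and the function names, lag count, and ridge value are hypothetical.

```python
import numpy as np

def lagged(eeg, n_lags):
    """Stack time-advanced copies of eeg (time x channels) as regression features."""
    T, C = eeg.shape
    X = np.zeros((T, C * n_lags))
    for lag in range(n_lags):
        # The neural response follows the stimulus, so row t holds eeg[t + lag].
        X[:T - lag, lag * C:(lag + 1) * C] = eeg[lag:, :]
    return X

def fit_backward_model(eeg, envelope, n_lags=5, ridge=1.0):
    """Ridge-regression weights mapping lagged EEG to the stimulus envelope."""
    X = lagged(eeg, n_lags)
    return np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ envelope)

def reconstruct(eeg, weights, n_lags=5):
    """Estimate the envelope from EEG with previously fitted weights."""
    return lagged(eeg, n_lags) @ weights

# Synthetic data (assumption): each channel is a delayed, noisy copy of the envelope.
rng = np.random.default_rng(0)
envelope = rng.standard_normal(2000)
eeg = np.stack([np.roll(envelope, d) for d in (1, 2, 3)], axis=1)
eeg += 0.1 * rng.standard_normal(eeg.shape)

weights = fit_backward_model(eeg[:1000], envelope[:1000])   # train on first half
estimate = reconstruct(eeg[1000:], weights)                 # test on second half
r = np.corrcoef(estimate, envelope[1000:])[0, 1]
print(f"held-out reconstruction correlation: {r:.3f}")
```

The correlation between the reconstructed and actual envelope on held-out data is the usual accuracy measure for this kind of model; in a real study the signals would first be band-limited, for example to the delta or theta band.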


2021 ◽  
Vol Publish Ahead of Print ◽  
Author(s):  
Sigrid Polspoel ◽  
Sophia E. Kramer ◽  
Bas van Dijk ◽  
Cas Smits

Author(s):  
Muratova Nafisa

Abstract: Speech conditions comprise the level of the listener and the speaker, their behavior in the speech process, and the purpose of the speech; all of these are extralinguistic means, distinct from linguistic units. A special place in speech is given to paralinguistic devices that accompany linguistic units in the process of interaction. Manifestations of the use of extralinguistic means by the sender and the addressee during discourse are examined. Keywords: paralinguistic means, extralinguistic means, sender, addressee, verbal and nonverbal means, forms of speech communication, types of speech, information exchange


2021 ◽  
pp. 9-15
Author(s):  
H. P. Orel

This article considers the components of the legal provision of human rights in the development of social networks. The legal status of persons who participate in Internet communication is considered. Their rights include the right to association; the right to freedom of thought and speech; and information rights related to the dissemination, transmission, receipt and use of information. The article also covers illegal acts that violate legal rights and interests. For an individual user, these include illegal access to personal data, disclosure of confidential information, defamation, copyright infringement, fraud, and misuse of bank data. The security of the personal data of social network users is also covered. The main legal act in force today in the field of personal data protection on the Internet is the Council of Europe Convention for the Protection of Individuals with regard to Automatic Processing of Personal Data. It is determined that social networks strengthen the right to participate in the management of state affairs, including through free elections, by providing additional opportunities for public debate, improving its quality, and stimulating democratic processes and the activity, initiative, awareness and involvement of citizens in issues related to public administration. It is stated that, given the potential threats arising from the functioning of social networks and other institutions of Internet communication, a promising direction is the creation of legal regimes of human rights that regulate Internet relations for disseminating information while ensuring the balance of interests of all participants and their harmonization with the basics of public order. At the same time, certain problems, such as reputation protection and protection of intellectual property, should be solved in line with the already established sectoral regulation, developing it to take into account the specifics of Internet communication.


2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Xiaoqiang Chi ◽  
Yang Xiang

Paraphrase generation is an essential yet challenging task in natural language processing. Neural-network-based approaches to paraphrase generation have achieved remarkable success in recent years. Previous neural paraphrase generation approaches ignore linguistic knowledge, such as part-of-speech information, regardless of its availability. The underlying assumption is that neural nets can learn such information implicitly when given sufficient data. However, it is difficult for neural nets to learn such information properly when data are scarce. In this work, we probe the efficacy of explicit part-of-speech information for the task of paraphrase generation in low-resource scenarios. To this end, we devise three mechanisms to fuse part-of-speech information under the framework of sequence-to-sequence learning. We demonstrate the utility of part-of-speech information in low-resource paraphrase generation through extensive experiments on multiple datasets of varying sizes and genres.


2021 ◽  
Vol 64 (10) ◽  
pp. 4014-4029
Author(s):  
Kathy R. Vander Werff ◽  
Christopher E. Niemczak ◽  
Kenneth Morse

Purpose: Background noise has been categorized as energetic masking due to spectrotemporal overlap of the target and masker on the auditory periphery or informational masking due to cognitive-level interference from relevant content such as speech. The effects of masking on cortical and sensory auditory processing can be objectively studied with the cortical auditory evoked potential (CAEP). However, whether effects on neural response morphology are due to energetic spectrotemporal differences or informational content is not fully understood. The current multi-experiment series was designed to assess the effects of speech versus nonspeech maskers on the neural encoding of speech information in the central auditory system, specifically in terms of the effects of speech babble noise maskers varying by talker number. Method: CAEPs were recorded from normal-hearing young adults in response to speech syllables in the presence of energetic maskers (white or speech-shaped noise) and varying amounts of informational maskers (speech babble maskers). The primary manipulation of informational masking was the number of talkers in speech babble, and results on CAEPs were compared to those of nonspeech maskers with different temporal and spectral characteristics. Results: Even when nonspeech noise maskers were spectrally shaped and temporally modulated to speech babble maskers, notable changes in the typical morphology of the CAEP in response to speech stimuli were identified in the presence of primarily energetic maskers and speech babble maskers with varying numbers of talkers. Conclusions: While differences in CAEP outcomes did not reach significance by number of talkers, neural components were significantly affected by speech babble maskers compared to nonspeech maskers. These results suggest an informational masking influence on neural encoding of speech information at the sensory cortical level of auditory processing, even without active participation on the part of the listener.


Doklady BGUIR ◽  
2021 ◽  
Vol 19 (6) ◽  
pp. 14-22
Author(s):  
N. S. Sanko ◽  
M. I. Vashkevich

The purpose of this article is to investigate the application of a DFT-modulated filter bank in systems with significant spectral component amplification, such as hearing aids. The analysis/synthesis method based on the short-time Fourier transform (STFT), which is used in most speech information processing systems, is described. It is shown that the DFT-modulated filter bank is a generalization of the STFT method. In an analysis/synthesis system based on a DFT-modulated filter bank, the input signal is divided into subbands by the analysis filter bank, each subband is amplified, and the signal is then reconstructed by the synthesis filter bank. However, in digital systems with significant spectral component amplification, the reconstructed signal is distorted because the amplification factor differs between subbands. The article provides expressions for the distortion and aliasing functions, which make it possible to estimate the distortion introduced by the analysis/synthesis system based on the DFT-modulated filter bank. Efficient algorithms for calculating the distortion and aliasing functions are also proposed. In the future, the authors plan to develop a procedure for optimizing the DFT-modulated filter bank based on the proposed efficient algorithms for calculating distortion and spectral aliasing in the filter bank.
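The analysis, per-subband amplification, and synthesis chain described in this abstract can be sketched with a plain STFT. This is a minimal illustration, assuming a square-root Hann window with 50% overlap (a standard choice that gives perfect reconstruction when all gains are equal). The function name `stft_subband_gain` and its parameters are hypothetical, and this is not the authors' DFT-modulated filter bank design or their distortion formulas.

```python
import numpy as np

def stft_subband_gain(x, gains, n_fft=64, hop=32):
    """Split x into DFT subbands, scale subband k by gains[k], and resynthesize."""
    # Periodic sqrt-Hann window: its square overlap-adds to 1 at 50% overlap.
    n = np.arange(n_fft)
    window = np.sqrt(0.5 - 0.5 * np.cos(2 * np.pi * n / n_fft))
    # Pad so every original sample is covered by two overlapping frames.
    padded = np.concatenate([np.zeros(hop), x, np.zeros(n_fft)])
    y = np.zeros_like(padded)
    for start in range(0, len(padded) - n_fft + 1, hop):
        frame = padded[start:start + n_fft] * window   # analysis window
        spectrum = np.fft.rfft(frame)                  # analysis: DFT subbands
        spectrum *= gains                              # per-subband amplification
        # Synthesis: inverse DFT, synthesis window, overlap-add.
        y[start:start + n_fft] += np.fft.irfft(spectrum, n_fft) * window
    return y[hop:hop + len(x)]

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
n_bins = 64 // 2 + 1

flat = stft_subband_gain(x, np.ones(n_bins))          # unity gain in every subband
boosted = stft_subband_gain(x, np.full(n_bins, 2.0))  # the same gain in every subband
shaped_gains = np.ones(n_bins)
shaped_gains[8:] = 10.0                               # strong boost in upper subbands only
shaped = stft_subband_gain(x, shaped_gains)

print(np.allclose(flat, x), np.allclose(boosted, 2 * x))
```

With equal gains the output is an exact scaled copy of the input. Once the gains differ between subbands, the overlapping frames no longer combine into a simple filtered copy, which is the kind of reconstruction error the abstract's distortion and aliasing functions are meant to quantify.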


Author(s):  
В.С. Бабин

The development of aviation and rocket-space special communication systems is currently characterized by an increase in the volume of speech information messages. Speech, as the most familiar form of communication, conveys the speaker's emotions and personal qualities, which is much more difficult to achieve through other means of information exchange. The growing demand for voice communication creates the need to develop promising technologies for improving the efficiency of such systems; this article analyzes those technologies.

