audio signal
Recently Published Documents

2022 ◽  
Vol 3 (1) ◽  
pp. 1-11
Flavio Bertini ◽  
Davide Allevi ◽  
Gianluca Lutero ◽  
Danilo Montesi ◽  
Laura Calzà

The World Health Organization estimates that 50 million people are currently living with dementia worldwide and that this figure will almost triple by 2050. Current pharmacological treatments are only symptomatic, and drugs or other therapies are ineffective in slowing down or curing the neurodegenerative process underlying dementia. Early detection of cognitive decline is therefore of the utmost importance for responding effectively and delivering preventive interventions. Recently, researchers have shown that speech alterations might be one of the earliest signs of cognitive decline, observable well before other cognitive deficits become manifest. In this article, we propose a fully automated method that classifies the subjects' audio files according to the progression level of the pathology. In particular, we trained a specific type of artificial neural network, called an autoencoder, on the visual representation of each subject's audio signal, that is, the spectrogram. Moreover, we used a data augmentation approach to overcome the need for the large amount of annotated data usually required during the training phase, which is one of the major obstacles in deep learning. We evaluated the proposed method on a dataset of 288 audio files from 96 subjects: 48 healthy controls and 48 cognitively impaired participants. The proposed method obtained good classification results compared to state-of-the-art neuropsychological screening tests and, with an accuracy of 90.57%, outperformed methods based on manual transcription and annotation of speech.
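
The abstract above feeds spectrograms to an autoencoder and relies on data augmentation. As a minimal sketch of that preprocessing only (not the authors' pipeline), the NumPy code below computes a log-magnitude spectrogram and produces noisy copies of a signal. The frame length, hop size, the synthetic test tone, and the choice of additive white noise as the augmentation are all assumptions made for illustration; the abstract does not say which augmentation was used.

```python
import numpy as np

def log_spectrogram(signal, frame_len=256, hop=128):
    """Log-magnitude spectrogram (frequency bins x frames) from a short-time FFT."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log1p(magnitude).T  # shape: (frame_len // 2 + 1, n_frames)

def add_noise(signal, snr_db, rng):
    """Hypothetical augmentation: add white noise at the requested SNR (in dB)."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    return signal + rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)

# Synthetic stand-in for a speech recording: one second of a 440 Hz tone.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440 * t)

spec = log_spectrogram(tone)
rng = np.random.default_rng(0)
augmented = [log_spectrogram(add_noise(tone, snr_db, rng)) for snr_db in (20, 10)]
```

Each array in `augmented` has the same shape as `spec`, so the original and its noisy variants could be stacked into one training set for an image-style model.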

2022 ◽  
pp. 108174
Habib Ben Abdallah ◽  
Christopher J. Henry ◽  
Sheela Ramanna

2022 ◽  
Vol 31 (2) ◽  
pp. 693-706
Rakesh Kumar ◽  
Meenu Gupta ◽  
Shakeel Ahmed ◽  
Abdulaziz Alhumam ◽  
Tushar Aggarwal

2022 ◽  
pp. 100488
João Paulo Lemos Escola ◽  
Uender Barbosa de Souza ◽  
Rodrigo Capobianco Guido ◽  
Ivan Nunes da Silva ◽  
Jovander da Silva Freitas ◽  

2022 ◽  
pp. 272-290
Soumen Santra ◽  
Arpan Deyasi

Text-to-Braille and speech-to-Braille conversion have so far not been available in combined form for the visually impaired, and there is a pressing need for a device that serves this group. This chapter presents a novel model designed to help people with either visual or hearing impairments. The proposal is unique and is supported by experimental results obtained under laboratory conditions. The device helps people read text in Braille and also converts the same content to an audio signal. Since text and audio are the two main interfaces through which a person communicates with the external world, apart from the functions of the sensory organs, the work is relevant. With the help of DANET, the same data, in text or speech form, can be accessed on more than one digital device simultaneously.

2022 ◽  
pp. 125-142
Vijay Srinivas Tida ◽  
Raghabendra Shah ◽  
Xiali Hei

Laser-based audio signal injection can be used to attack voice-controllable systems. An attacker can aim amplitude-modulated light at the microphone's aperture, and the injected signal acts as a remote voice-command attack on the system. Attackers exploit such vulnerabilities to steal physical devices or virtual assets, for example by placing orders or withdrawing money. Detecting these signals is therefore important, because almost every device can be attacked using amplitude-modulated laser signals. In this project, the authors use deep learning to classify incoming signals as normal voice commands or laser-based audio signals. Mel frequency cepstral coefficients (MFCC) derived from the audio signals serve as the classification features. If an audio signal is identified as a laser signal, the voice command can be disabled and an alert displayed to the victim. The maximum accuracy of the machine learning model was 100%, and in the real world it is around 95%.
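
The MFCC features mentioned above can be sketched in plain NumPy as a chain of framing, windowing, power spectrum, mel filterbank, logarithm, and DCT. This is an illustrative sketch, not the authors' feature extractor: the frame size, hop, 26 filters, 13 coefficients, and the random stand-in signal are assumptions, and the deep learning classifier itself is not shown.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale, shape (n_filters, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    bank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):      # rising edge of the triangle
            bank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge of the triangle
            bank[i - 1, k] = (right - k) / (right - center)
    return bank

def dct2(x, n_out):
    """Unnormalized DCT-II along the last axis, keeping the first n_out coefficients."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    return x @ basis.T

def mfcc(signal, sample_rate, n_fft=512, hop=256, n_filters=26, n_coeffs=13):
    """Return an array of shape (n_frames, n_coeffs)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2 / n_fft
    mel_energy = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_mel = np.log(mel_energy + 1e-10)   # small offset avoids log(0)
    return dct2(log_mel, n_coeffs)

# Stand-in input: one second of white noise at an assumed 16 kHz sample rate.
sample_rate = 16000
signal = np.random.default_rng(0).normal(size=sample_rate)
features = mfcc(signal, sample_rate)
```

Each row of `features` describes one frame; a classifier would typically consume either the per-frame rows or a summary such as their mean over time.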

2022 ◽  
Vol 2153 (1) ◽  
pp. 012014
E Gelvez-Almeida ◽  
A Vásquez-Coronel ◽  
R Guatelli ◽  
V Aubin ◽  
M Mora

Extreme learning machine is an algorithm that has shown good performance on classification and regression problems. It has gained wide acceptance in the scientific community due to the simplicity of the model and its great generalization capacity. This work proposes the use of extreme learning machine neural networks to classify Parkinson’s disease patients and healthy individuals. The descriptor is the feature vector generated by applying the local binary pattern algorithm to grayscale spectrograms. The spectrograms are obtained from the audio signal samples of the considered repository. Experiments are conducted with single hidden layer and multilayer extreme learning machine networks, comparing the results of each structure. Results show that the hierarchical extreme learning machine with three hidden layers has better overall performance than multilayer extreme learning machine networks and a single hidden layer extreme learning machine. The success rate obtained is within the ranges reported in the literature. However, training the hierarchical network is considerably faster than training multilayer networks with two or three hidden layers.
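
For readers unfamiliar with the technique, a single hidden layer extreme learning machine can be sketched in a few lines: the hidden weights are drawn at random and left untrained, and only the output weights are solved by least squares. The class name, the tanh activation, the hidden layer size, and the two-cluster toy data below are assumptions for illustration; the hierarchical and multilayer variants and the local binary pattern features from the abstract are not reproduced.

```python
import numpy as np

class ExtremeLearningMachine:
    """Single hidden layer ELM: random fixed hidden weights, least-squares output weights."""

    def __init__(self, n_hidden, rng):
        self.n_hidden = n_hidden
        self.rng = rng

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        n_classes = int(y.max()) + 1
        # Hidden layer parameters are sampled once and never updated.
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        targets = np.eye(n_classes)[y]                     # one-hot labels
        # Output weights via the Moore-Penrose pseudoinverse of the hidden activations.
        self.beta = np.linalg.pinv(self._hidden(X)) @ targets
        return self

    def predict(self, X):
        return np.argmax(self._hidden(X) @ self.beta, axis=1)

# Toy stand-in for descriptor vectors: two well-separated Gaussian clusters.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-3, 1, (100, 2)), rng.normal(3, 1, (100, 2))])
y = np.repeat([0, 1], 100)

model = ExtremeLearningMachine(n_hidden=20, rng=rng).fit(X, y)
accuracy = float(np.mean(model.predict(X) == y))  # training accuracy on the toy data
```

Because fitting reduces to one pseudoinverse, training cost is dominated by a single linear solve rather than iterative gradient updates.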

2021 ◽  
Vol 4 (4) ◽  
Roman O. Yaroshenko

Visualization systems are widespread as personal computer software. This article presents a system that processes audio data. The system visualizes the ratio of spectrum amplitudes and has a fixed binding of frequencies to colors. The audio signal processing technology used by the device and the components of the device are considered. To increase the information processing speed, a 32-bit controller and a graphic equalizer with seven passbands are used. Music visualization is a function that is widespread in media player software on different operating systems. It shows animated images that depend on the music signal. The images are usually rendered in real time and synchronized with the audio track being played. Music and visualization merge in different art forms: opera, ballet, music drama, or film. The dependencies between auditory and visual sensations are used to increase emotional perception for ordinary listeners. Systems that are currently being actively promoted use several tools for personal computers, such as After Effects (the Audio Spectrum effect), VSDC Video Editor Free (Audio Spectrum Visualizer), and Magic Music Visuals. The software mentioned above has one disadvantage: it cannot produce streaming video while simultaneously receiving audio, and it requires processing and rendering of the resulting video sequence. The purpose of the work is to determine the features of spectral analysis of music information, taking into account real-time data processing, and to propose a variant of a music information visualization system that displays the spectral composition of the music and the amplitude of individual harmonics, fills an LED matrix with the appropriate color depending on the amplitude of the audio signal, and supports wireless signal transmission from the music source to the visual effects device.
Frequency analysis of the spectrum, with estimation of the amplitudes of the spectral components of the musical data arriving at the device, is the technology chosen for this project. The method is based on analyzing the spectrum in selected frequency bands, which simplifies finding maxima at different frequencies. The proposed variant of the music information visualization system displays on the LED matrix the colors that correspond to the frequency components of the musical composition. Moreover, the number of LEDs lit is proportional to the ratio of the amplitudes of the signal's frequency components. The desired result is achieved by using a Fast Fourier Transform with a Hann or Hamming window to improve the analysis of the signal spectrum. The amplitudes of the individual spectral components are also estimated, and each frequency band has its own color. The system analyzes the spectral components and frequencies of the music, and this information drives the colors displayed on the LED matrix. Using a 32-bit microcontroller provides sufficient audio signal processing speed with minimal delay. To increase the accuracy and speed of the frequency analysis, the audible range is divided into seven bands using the MSGEQ7 seven-band graphic equalizer. Music is transmitted to the system via Bluetooth, which greatly simplifies the selection and connection of the music source.
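
The windowed FFT and seven-band mapping described above can be approximated in software as follows. This is a sketch under stated assumptions rather than the device firmware: the band centers are the nominal MSGEQ7 values, while the geometric-midpoint band edges, the eight LEDs per column, the block size, and the function names are hypothetical choices for illustration. On the real device the MSGEQ7 performs the band filtering in hardware.

```python
import numpy as np

# Nominal MSGEQ7 band centers in Hz.
BAND_CENTERS_HZ = (63, 160, 400, 1000, 2500, 6250, 16000)

def band_levels(block, sample_rate, window="hann"):
    """Peak spectral magnitude in each of the seven bands for one block of samples."""
    win = np.hanning(len(block)) if window == "hann" else np.hamming(len(block))
    spectrum = np.abs(np.fft.rfft(block * win))
    freqs = np.fft.rfftfreq(len(block), d=1.0 / sample_rate)
    centers = np.array(BAND_CENTERS_HZ, dtype=float)
    # Assumed band edges: geometric midpoints between neighboring centers.
    edges = np.concatenate(([0.0], np.sqrt(centers[:-1] * centers[1:]),
                            [sample_rate / 2]))
    return np.array([spectrum[(freqs >= lo) & (freqs < hi)].max(initial=0.0)
                     for lo, hi in zip(edges[:-1], edges[1:])])

def leds_per_band(levels, leds_per_column=8):
    """Number of LEDs to light per band, proportional to the band's relative amplitude."""
    peak = levels.max()
    if peak == 0:
        return np.zeros(len(levels), dtype=int)
    return np.round(leds_per_column * levels / peak).astype(int)

# Stand-in input: a 1 kHz tone sampled at 44.1 kHz.
sample_rate = 44100
t = np.arange(2048) / sample_rate
block = np.sin(2 * np.pi * 1000 * t)

levels = band_levels(block, sample_rate)
leds = leds_per_band(levels)
```

For the 1 kHz test tone, the fourth band dominates and its column is fully lit; assigning a fixed color to each column index would give the frequency-to-color binding the abstract describes.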

Symmetry ◽  
2021 ◽  
Vol 14 (1) ◽  
pp. 17
Wanying Dai ◽  
Xiangliang Xu ◽  
Xiaoming Song ◽  
Guodong Li

The data space of audio signals is large and their correlation is strong, so traditional encryption algorithms cannot meet the needs of efficiency and security. To solve this problem, an audio encryption algorithm based on the Chen memristor chaotic system is proposed. The core idea of the algorithm is to encrypt the audio signal into color image information. Most traditional audio encryption algorithms transmit the ciphertext in the form of noise, which easily attracts the attention of attackers. In this paper, a special encryption method is used to obtain higher security. First, the Fast Walsh–Hadamard Transform (FWHT) is used to compress and denoise the signal. Unlike the Fast Fourier Transform (FFT) and the Discrete Cosine Transform (DCT), the FWHT has good energy compression characteristics. In addition, compared with the trigonometric basis functions of the Fast Fourier Transform, the rectangular basis functions of the FWHT can be implemented more efficiently in digital circuits to transform the reconstructed dual-channel audio signal into the R and B layers of the digital image matrix, respectively. Furthermore, a new Chen memristor chaotic system solves the periodic window problems, such as the limited chaos range and nonuniform distribution. It can generate a mask block with high complexity, which is filled into the G layer of the color image matrix to obtain a color audio image. Next, by combining plaintext information with the color audio image, interactive channel shuffling not only weakens the correlation between adjacent samples but also effectively resists chosen-plaintext attacks. Finally, the cryptographic block is used for overlapping diffusion encryption to fill the silent periods of the speech signal, yielding the ciphertext audio. Experimental results and comparative analysis show that the algorithm is suitable for different types of audio signals and can resist many common cryptanalytic attacks. Compared with similar audio encryption algorithms, the algorithm has a better security index and greatly improved efficiency.
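
The transform step named above can be illustrated with a short sketch of the fast Walsh-Hadamard transform, which uses only additions and subtractions. The code is a generic unnormalized FWHT in natural (Hadamard) order, not the authors' implementation. The `compress` helper, which keeps the largest coefficients and zeroes the rest, is a hypothetical stand-in for the compression and denoising step, whose exact rule the abstract does not give.

```python
import numpy as np

def fwht(x):
    """Unnormalized fast Walsh-Hadamard transform; the length must be a power of two."""
    a = np.asarray(x, dtype=float).copy()
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            top = a[start:start + h].copy()
            bottom = a[start + h:start + 2 * h].copy()
            a[start:start + h] = top + bottom          # butterfly: sums
            a[start + h:start + 2 * h] = top - bottom  # butterfly: differences
        h *= 2
    return a

def compress(x, keep):
    """Hypothetical compression: keep the `keep` largest-magnitude coefficients, then invert."""
    coeffs = fwht(x)
    drop = np.argsort(np.abs(coeffs))[: len(coeffs) - keep]
    coeffs[drop] = 0.0
    return fwht(coeffs) / len(coeffs)  # the transform is its own inverse up to 1/n

signal = np.array([1, 0, 1, 0, 0, 1, 1, 0], dtype=float)
coeffs = fwht(signal)                   # [4, 2, 0, -2, 0, 2, 0, 2]
restored = fwht(coeffs) / len(signal)   # recovers the original samples
approx = compress(signal, keep=4)
```

Because each butterfly stage needs no multiplications, the transform maps naturally onto simple digital logic, which is the implementation advantage the abstract points to.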

2021 ◽  
Azza Dandooh ◽  
Adel S. El‐Fishawy ◽  
Fathi E. Abd El‐Samie ◽  
Ezz El‐Din Hemdan
