Deep Learning Approach for Protecting Voice-Controllable Devices From Laser Attacks

2022 ◽  
pp. 125-142
Author(s):  
Vijay Srinivas Tida ◽  
Raghabendra Shah ◽  
Xiali Hei

Laser-based audio signal injection can be used to attack voice-controllable systems. An attacker aims amplitude-modulated light at a microphone's aperture, and the injected signal acts as a remote voice-command attack on the device. Attackers exploit this vulnerability to steal physical or virtual assets, for example by placing orders or withdrawing money. Detecting these signals is therefore important, because almost any device with a microphone can be attacked using amplitude-modulated laser light. In this project, the authors use deep learning to classify incoming signals as either normal voice commands or laser-based audio signals. Mel-frequency cepstral coefficients (MFCCs) are derived from the audio signals and used as the classification features. If an audio signal is identified as a laser signal, the voice command can be disabled and an alert displayed to the victim. The maximum accuracy of the machine learning model was 100%, and in real-world conditions it is around 95%.
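MFCC extraction is the core feature step in this work. As a rough illustration (not the authors' implementation), the standard pipeline — framing, power spectrum, triangular mel filterbank, log compression, and DCT — can be sketched in NumPy; the frame size, hop, and filter counts below are illustrative defaults, not the paper's settings:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=16000, n_fft=512, hop=256, n_mels=26, n_mfcc=13):
    """Simplified MFCC: frame -> power spectrum -> mel filterbank -> log -> DCT."""
    # Frame the signal with a Hann window
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        frames.append(signal[start:start + n_fft] * np.hanning(n_fft))
    frames = np.array(frames)
    # Per-frame power spectrum
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2
    # Triangular mel filterbank, equally spaced on the mel scale
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):
            fbank[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fbank[i - 1, k] = (r - k) / max(r - c, 1)
    log_energy = np.log(power @ fbank.T + 1e-10)
    # DCT-II over the mel bands; keep the first n_mfcc coefficients
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_mfcc), (2 * n + 1)) / (2 * n_mels))
    return log_energy @ dct.T  # shape: (num_frames, n_mfcc)
```

Each row of the result is one frame's MFCC vector, which is the kind of feature matrix the classifier would consume.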

Sensors ◽  
2021 ◽  
Vol 21 (3) ◽  
pp. 676
Author(s):  
Andrej Zgank

Animal activity acoustic monitoring is becoming one of the necessary tools in agriculture, including beekeeping. It can assist in the control of beehives in remote locations, and bee swarm activity can be classified from audio signals with such approaches. A deep neural network (DNN) IoT-based acoustic swarm classification approach is proposed in this paper. Audio recordings were obtained from the Open Source Beehive project, and Mel-frequency cepstral coefficient features were extracted from the audio signal. The lossless WAV and lossy MP3 audio formats were compared for IoT-based solutions, and the impact of the deep neural network parameters on the classification results was analyzed. The best overall classification accuracy with uncompressed audio was 94.09%, but MP3 compression degraded the DNN accuracy by over 10%. The evaluation of the proposed DNN IoT-based bee activity acoustic classification showed improved results compared to the previous hidden Markov model system.
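The DNN classifier itself is not specified in detail in the abstract. As a minimal hedged sketch, a one-hidden-layer network trained with plain gradient descent on MFCC-style feature vectors might look like this; the layer size, learning rate, and epoch count are illustrative, not the paper's parameters:

```python
import numpy as np

def train_dnn(X, y, hidden=16, lr=0.1, epochs=500, seed=0):
    """One-hidden-layer network (tanh + sigmoid output) trained with
    plain gradient descent on binary labels y in {0, 1}."""
    rng = np.random.default_rng(seed)
    W1 = rng.standard_normal((X.shape[1], hidden)) * 0.1
    b1 = np.zeros(hidden)
    W2 = rng.standard_normal(hidden) * 0.1
    b2 = 0.0
    for _ in range(epochs):
        h = np.tanh(X @ W1 + b1)
        p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))   # predicted P(class 1)
        g = (p - y) / len(y)                        # cross-entropy gradient at output
        gh = np.outer(g, W2) * (1 - h ** 2)         # backprop through tanh
        W2 -= lr * (h.T @ g)
        b2 -= lr * g.sum()
        W1 -= lr * (X.T @ gh)
        b1 -= lr * gh.sum(axis=0)
    # Return a predict function: True means class 1
    return lambda Xn: (np.tanh(Xn @ W1 + b1) @ W2 + b2) > 0
```

On well-separated feature clusters this toy network reaches high accuracy quickly; the paper's analysis of DNN parameters would correspond to varying `hidden`, `lr`, and `epochs` here.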


2021 ◽  
Author(s):  
Monika Gupta ◽  
R K Singh ◽  
Sachin Singh

Abstract The COVID-19 pandemic has pushed many activities online. People tired of typing prefer to give voice commands, yet most voice-based applications and devices are not prepared to handle native languages. Moreover, in a party environment it is difficult to identify a voice command because many speakers talk at once. The proposed work addresses the cocktail party problem for the Indian language Gujarati. Voice response systems such as Siri, Alexa, and Google Assistant currently work on a single voice command. The proposed algorithm, G-Cocktail, would help these applications identify a command given in Gujarati even from a mixed voice signal. The benchmark dataset is taken from Microsoft and the Linguistic Data Consortium for Indian Languages (LDC-IL) and comprises single words and phrases. G-Cocktail utilizes the CatBoost algorithm to classify and identify the voice. A voice print of each sound file is created using pitch and Mel-frequency cepstral coefficients (MFCC). Seventy percent of the voice prints are used to train the network and thirty percent for testing. The proposed work is tested and compared with K-means, Naïve Bayes, and LightGBM.
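The voice prints above combine pitch with MFCC features. The exact pitch extractor is not stated in the abstract; one common approach is autocorrelation over a short frame, sketched here (the lag bounds assume typical speech fundamentals of roughly 60-400 Hz):

```python
import numpy as np

def estimate_pitch(frame, sr, fmin=60.0, fmax=400.0):
    """Estimate the fundamental frequency of a voiced frame via autocorrelation:
    the strongest autocorrelation peak in the plausible lag range marks the period."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)   # lag range for fmin..fmax
    lag = lo + np.argmax(ac[lo:hi])
    return sr / lag
```

Per-frame pitch values like this, concatenated with MFCCs, would form the kind of voice-print vector the classifier consumes.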


2021 ◽  
Vol 18 (2(Suppl.)) ◽  
pp. 0925
Author(s):  
Asroni Asroni ◽  
Ku Ruhana Ku-Mahamud ◽  
Cahya Damarjati ◽  
Hasan Basri Slamat

Deep learning convolutional neural networks have been widely used to recognize and classify voice. Various techniques have been used together with convolutional neural networks to prepare voice data before the training process when developing a classification model. However, not every model produces good classification accuracy, as there are many types of voice and speech. Classification of Arabic alphabet pronunciation is one such task, and accurate pronunciation is required when learning to read the Qur'an. Thus, processing the pronunciation data and training on it require a specific approach. To address this, a method based on padding and a deep learning convolutional neural network is proposed to evaluate the pronunciation of the Arabic alphabet. Voice data from six school children were recorded and used to test the performance of the proposed method. The padding technique was used to augment the voice data before feeding the data to the CNN structure to develop the classification model. In addition, three other feature extraction techniques were introduced to enable comparison with the proposed padding-based method. The performance of the proposed method with the padding technique is on par with the spectrogram and better than the mel-spectrogram and mel-frequency cepstral coefficients. Results also show that the proposed method was able to distinguish Arabic alphabet letters that are difficult to pronounce. The proposed method with the padding technique may be extended to address voice pronunciation tasks beyond the Arabic alphabet.
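The padding step prepares variable-length recordings for a fixed-input CNN. A minimal sketch of such zero-padding, assuming 1-D waveform clips (the paper's exact padding scheme may differ):

```python
import numpy as np

def pad_batch(clips, target_len=None, value=0.0):
    """Zero-pad (or truncate) variable-length 1-D clips to a common length
    so they can be stacked into one array as CNN input."""
    if target_len is None:
        target_len = max(len(c) for c in clips)
    out = np.full((len(clips), target_len), value, dtype=np.float32)
    for i, c in enumerate(clips):
        n = min(len(c), target_len)
        out[i, :n] = c[:n]
    return out
```

Padding to the longest clip in the batch, as done here by default, preserves every sample of every recording at the cost of some wasted computation on the padded zeros.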


2019 ◽  
Vol 11 (01) ◽  
pp. 20-25
Author(s):  
Indra Saputra ◽  
Parulian Silalahi ◽  
Bayu Cahyawan ◽  
Imam Akbar

Bicycles are not equipped with turn signals. For riding safety, a bicycle helmet with turn signals controlled by voice recognition was designed. It uses an Arduino Nano as a controller to switch the turn signal lights ON and OFF with voice commands. The device uses a voice recognition sensor and a microphone placed on the bicycle helmet. When a voice command is spoken into the microphone, the voice recognition sensor detects the specified command and sends a signal to the Arduino; the turn signal then lights up as instructed, and the Arduino on the helmet sends an indicator signal via the Bluetooth module. The device is able to detect sound with an accuracy of 80%. The tool works within a distance of <2 meters and at noise levels <71 dB.


Electronics ◽  
2021 ◽  
Vol 10 (11) ◽  
pp. 1349
Author(s):  
Stefan Lattner ◽  
Javier Nistal

Lossy audio codecs compress (and decompress) digital audio streams by removing information that tends to be inaudible in human perception. Under high compression rates, such codecs may introduce a variety of impairments in the audio signal. Many works have tackled the problem of audio enhancement and compression artifact removal using deep-learning techniques. However, only a few works tackle the restoration of heavily compressed audio signals in the musical domain. In such a scenario, there is no unique solution for the restoration of the original signal. Therefore, in this study, we test a stochastic generator of a Generative Adversarial Network (GAN) architecture for this task. Such a stochastic generator, conditioned on highly compressed musical audio signals, could one day generate outputs indistinguishable from high-quality releases. Therefore, the present study may yield insights into more efficient musical data storage and transmission. We train stochastic and deterministic generators on MP3-compressed audio signals with 16, 32, and 64 kbit/s. We perform an extensive evaluation of the different experiments utilizing objective metrics and listening tests. We find that the models can improve the quality of the audio signals over the MP3 versions for 16 and 32 kbit/s and that the stochastic generators are capable of generating outputs that are closer to the original signals than those of the deterministic generators.
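The distinction between the deterministic and stochastic generators can be illustrated with a toy sketch: the stochastic generator receives an extra noise vector z alongside the compressed conditioning input, so repeated sampling yields different plausible restorations of the same input. This is a conceptual illustration only, not the paper's GAN architecture:

```python
import numpy as np

def deterministic_generator(compressed, W):
    """Toy deterministic generator: the output depends only on the
    conditioning (compressed) input, so it is the same on every call."""
    return np.tanh(compressed @ W)

def stochastic_generator(compressed, W, Wz, rng):
    """Toy stochastic generator: an extra noise vector z lets the model
    propose different plausible restorations for the same compressed input."""
    z = rng.standard_normal(Wz.shape[0])
    return np.tanh(compressed @ W + z @ Wz)
```

In the paper's setting, the "no unique solution" property of restoring heavily compressed music is exactly what the z-conditioned generator is meant to capture.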


Author(s):  
Hanaa Torkey ◽  
Elhossiny Ibrahim ◽  
EZZ El-Din Hemdan ◽  
Ayman El-Sayed ◽  
Marwa A. Shouman

Abstract Communication between the sensors spread throughout healthcare systems may cause some of the transferred features to go missing. Repairing the data problems of sensing devices with artificial intelligence technologies has facilitated the Medical Internet of Things (MIoT) and its emerging applications in healthcare. MIoT has great potential to affect a patient's life. The volume of data collected from smart wearable devices increases dramatically with data collected from millions of patients suffering from diseases such as diabetes. However, sensor or human errors lead to missing values in the data. The major challenge is how to predict these values so that the performance of the data analysis model stays within a good range. In this paper, a complete healthcare system for diabetics is used, and two new algorithms are developed to handle the crucial problem of missing data from MIoT wearable sensors. The proposed work is based on the integration of Random Forest, mean, class mean, interquartile range (IQR), and deep learning to produce a clean and complete dataset, which can enhance the performance of any machine learning model. Moreover, an outlier repair technique is proposed based on detecting the sample's class in the dataset and then repairing the value with Deep Learning (DL). With the two steps of imputation and outlier repair, the final model achieves an accuracy of 97.41% and an Area Under the Curve (AUC) of 99.71%. The healthcare system is a web-based diabetes classification application built with Flask, intended for hospitals and healthcare centers to diagnose patients effectively.
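The class-mean imputation and IQR outlier detection mentioned above are standard building blocks; a hedged NumPy sketch of both follows (the paper's full pipeline also integrates Random Forest and deep learning, which are omitted here):

```python
import numpy as np

def class_mean_impute(X, y):
    """Replace NaNs in each feature with the mean of that feature
    computed within the sample's own class."""
    X = X.copy()
    for cls in np.unique(y):
        rows = y == cls
        col_means = np.nanmean(X[rows], axis=0)
        idx = np.where(np.isnan(X) & rows[:, None])
        X[idx] = col_means[idx[1]]
    return X

def iqr_outliers(col):
    """Flag values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    q1, q3 = np.percentile(col, [25, 75])
    iqr = q3 - q1
    return (col < q1 - 1.5 * iqr) | (col > q3 + 1.5 * iqr)
```

Imputing within each class rather than globally, as above, keeps the filled-in values consistent with the class structure that the downstream classifier must learn.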


Animals ◽  
2021 ◽  
Vol 11 (2) ◽  
pp. 357
Author(s):  
Dae-Hyun Jung ◽  
Na Yeon Kim ◽  
Sang Ho Moon ◽  
Changho Jhin ◽  
Hak-Jin Kim ◽  
...  

The priority placed on animal welfare in the meat industry is increasing the importance of understanding livestock behavior. In this study, we developed a web-based monitoring and recording system based on artificial intelligence analysis for the classification of cattle sounds. The deep learning classification model of the system is a convolutional neural network (CNN) model that takes voice information converted to Mel-frequency cepstral coefficients (MFCCs) as input. The CNN model first achieved an accuracy of 91.38% in recognizing cattle sounds. Further, short-time Fourier transform-based noise filtering was applied to remove background noise, improving the classification model accuracy to 94.18%. Categorized cattle voices were then classified into four classes, and a total of 897 classification records were acquired for the classification model development. A final accuracy of 81.96% was obtained for the model. Our proposed web-based platform that provides information obtained from a total of 12 sound sensors provides cattle vocalization monitoring in real time, enabling farm owners to determine the status of their cattle.
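The short-time Fourier transform-based noise filtering can be illustrated with a simple spectral gate: estimate a per-frequency noise floor from a noise-only recording, zero out time-frequency bins below it, and resynthesize by overlap-add. This is a generic sketch, not the authors' exact filter:

```python
import numpy as np

def spectral_gate(signal, noise_profile, n_fft=512, hop=256, factor=2.0):
    """Simple STFT noise gate: zero out time-frequency bins whose magnitude
    falls below a threshold estimated from a noise-only recording, then
    resynthesize the gated spectrogram by normalized overlap-add."""
    win = np.hanning(n_fft)

    def stft(x):
        starts = range(0, len(x) - n_fft + 1, hop)
        return np.array([np.fft.rfft(x[s:s + n_fft] * win) for s in starts])

    noise_mag = np.abs(stft(noise_profile))
    threshold = factor * noise_mag.mean(axis=0)   # per-frequency noise floor
    S = stft(signal)
    S[np.abs(S) < threshold] = 0.0
    # Normalized overlap-add resynthesis
    out = np.zeros(len(signal))
    norm = np.zeros(len(signal))
    for i, s in enumerate(range(0, len(signal) - n_fft + 1, hop)):
        out[s:s + n_fft] += np.fft.irfft(S[i], n=n_fft) * win
        norm[s:s + n_fft] += win ** 2
    return out / np.maximum(norm, 1e-8)
```

On a tonal vocalization buried in broadband noise, gating like this suppresses most off-signal bins, which is the kind of preprocessing credited with raising the classifier's accuracy from 91.38% to 94.18%.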

