Deep Learning Approach for Protecting Voice-Controllable Devices From Laser Attacks

2022 ◽  
pp. 125-142
Author(s):  
Vijay Srinivas Tida ◽  
Raghabendra Shah ◽  
Xiali Hei

Laser-based audio signal injection can be used to attack voice-controllable systems. An attacker aims amplitude-modulated light at a microphone's aperture, and the injected signal acts as a remote voice-command attack on the device. Attackers exploit this vulnerability to steal physical or virtual assets, for example by placing orders or withdrawing money. Detecting these signals is therefore important, because almost any device with a microphone can be attacked using amplitude-modulated laser light. In this project, the authors use deep learning to classify incoming signals as either normal voice commands or laser-based audio signals. Mel-frequency cepstral coefficients (MFCCs) are derived from the audio signals and used as the classification features. If an audio signal is identified as a laser signal, the voice command can be disabled and an alert displayed to the victim. The maximum accuracy of the machine learning model was 100%, and in real-world conditions it is around 95%.
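MFCC extraction is the core feature step in this work. As a rough illustration (not the authors' implementation), the standard pipeline — framing, power spectrum, triangular mel filterbank, log compression, and DCT — can be sketched in NumPy; the frame size, hop, and filter counts below are illustrative defaults, not the paper's settings:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=16000, n_fft=512, hop=256, n_mels=26, n_mfcc=13):
    """Simplified MFCC: frame -> power spectrum -> mel filterbank -> log -> DCT."""
    # Frame the signal with a Hann window
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        frames.append(signal[start:start + n_fft] * np.hanning(n_fft))
    frames = np.array(frames)
    # Per-frame power spectrum
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2
    # Triangular mel filterbank, equally spaced on the mel scale
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):
            fbank[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fbank[i - 1, k] = (r - k) / max(r - c, 1)
    log_energy = np.log(power @ fbank.T + 1e-10)
    # DCT-II over the mel bands; keep the first n_mfcc coefficients
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_mfcc), (2 * n + 1)) / (2 * n_mels))
    return log_energy @ dct.T  # shape: (num_frames, n_mfcc)
```

Each row of the result is one frame's MFCC vector, which is the kind of feature matrix the classifier would consume.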

Sensors ◽  
2021 ◽  
Vol 21 (3) ◽  
pp. 676
Author(s):  
Andrej Zgank

Animal activity acoustic monitoring is becoming one of the necessary tools in agriculture, including beekeeping. It can assist in the control of beehives in remote locations, and bee swarm activity can be classified from audio signals with such approaches. A deep neural network (DNN) IoT-based acoustic swarm classification approach is proposed in this paper. Audio recordings were obtained from the Open Source Beehive project, and Mel-frequency cepstral coefficient features were extracted from the audio signal. The lossless WAV and lossy MP3 audio formats were compared for IoT-based solutions, and the impact of the deep neural network parameters on the classification results was analyzed. The best overall classification accuracy with uncompressed audio was 94.09%, but MP3 compression degraded the DNN accuracy by over 10%. The evaluation of the proposed DNN IoT-based bee activity acoustic classification showed improved results compared to the previous hidden Markov model system.
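The DNN classifier itself is not specified in detail in the abstract. As a minimal hedged sketch, a one-hidden-layer network trained with plain gradient descent on MFCC-style feature vectors might look like this; the layer size, learning rate, and epoch count are illustrative, not the paper's parameters:

```python
import numpy as np

def train_dnn(X, y, hidden=16, lr=0.1, epochs=500, seed=0):
    """One-hidden-layer network (tanh + sigmoid output) trained with
    plain gradient descent on binary labels y in {0, 1}."""
    rng = np.random.default_rng(seed)
    W1 = rng.standard_normal((X.shape[1], hidden)) * 0.1
    b1 = np.zeros(hidden)
    W2 = rng.standard_normal(hidden) * 0.1
    b2 = 0.0
    for _ in range(epochs):
        h = np.tanh(X @ W1 + b1)
        p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))   # predicted P(class 1)
        g = (p - y) / len(y)                        # cross-entropy gradient at output
        gh = np.outer(g, W2) * (1 - h ** 2)         # backprop through tanh
        W2 -= lr * (h.T @ g)
        b2 -= lr * g.sum()
        W1 -= lr * (X.T @ gh)
        b1 -= lr * gh.sum(axis=0)
    # Return a predict function: True means class 1
    return lambda Xn: (np.tanh(Xn @ W1 + b1) @ W2 + b2) > 0
```

On well-separated feature clusters this toy network reaches high accuracy quickly; the paper's analysis of DNN parameters would correspond to varying `hidden`, `lr`, and `epochs` here.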


2021 ◽  
Author(s):  
Monika Gupta ◽  
R K Singh ◽  
Sachin Singh

Abstract The COVID-19 pandemic has pushed many activities online. People tired of typing prefer to give voice commands, yet most voice-based applications and devices are not prepared to handle native languages. Moreover, in a party environment it is difficult to identify a voice command because many speakers talk at once. The proposed work addresses the cocktail party problem for the Indian language Gujarati. Voice response systems such as Siri, Alexa, and Google Assistant currently work on a single voice command. The proposed algorithm, G-Cocktail, would help these applications identify a command given in Gujarati even from a mixed voice signal. The benchmark dataset is taken from Microsoft and the Linguistic Data Consortium for Indian Languages (LDC-IL) and comprises single words and phrases. G-Cocktail utilizes the CatBoost algorithm to classify and identify the voice. A voice print of each sound file is created using pitch and Mel-frequency cepstral coefficients (MFCC). Seventy percent of the voice prints are used to train the network and thirty percent for testing. The proposed work is tested and compared with K-means, Naïve Bayes, and LightGBM.
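The voice prints above combine pitch with MFCC features. The exact pitch extractor is not stated in the abstract; one common approach is autocorrelation over a short frame, sketched here (the lag bounds assume typical speech fundamentals of roughly 60-400 Hz):

```python
import numpy as np

def estimate_pitch(frame, sr, fmin=60.0, fmax=400.0):
    """Estimate the fundamental frequency of a voiced frame via autocorrelation:
    the strongest autocorrelation peak in the plausible lag range marks the period."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)   # lag range for fmin..fmax
    lag = lo + np.argmax(ac[lo:hi])
    return sr / lag
```

Per-frame pitch values like this, concatenated with MFCCs, would form the kind of voice-print vector the classifier consumes.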


2021 ◽  
Vol 18 (2(Suppl.)) ◽  
pp. 0925
Author(s):  
Asroni Asroni ◽  
Ku Ruhana Ku-Mahamud ◽  
Cahya Damarjati ◽  
Hasan Basri Slamat

Deep learning convolutional neural networks have been widely used to recognize and classify voice. Various techniques have been used together with convolutional neural networks to prepare voice data before the training process when developing a classification model. However, not every model produces good classification accuracy, as there are many types of voice and speech. Classification of Arabic alphabet pronunciation is one such task, and accurate pronunciation is required when learning to read the Qur'an. Thus, processing the pronunciation data and training on it require a specific approach. To address this, a method based on padding and a deep learning convolutional neural network is proposed to evaluate the pronunciation of the Arabic alphabet. Voice data from six school children were recorded and used to test the performance of the proposed method. The padding technique was used to augment the voice data before feeding the data to the CNN structure to develop the classification model. In addition, three other feature extraction techniques were introduced to enable comparison with the proposed padding-based method. The performance of the proposed method with the padding technique is on par with the spectrogram and better than the mel-spectrogram and mel-frequency cepstral coefficients. Results also show that the proposed method was able to distinguish Arabic alphabet letters that are difficult to pronounce. The proposed method with the padding technique may be extended to address voice pronunciation tasks beyond the Arabic alphabet.
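The padding step prepares variable-length recordings for a fixed-input CNN. A minimal sketch of such zero-padding, assuming 1-D waveform clips (the paper's exact padding scheme may differ):

```python
import numpy as np

def pad_batch(clips, target_len=None, value=0.0):
    """Zero-pad (or truncate) variable-length 1-D clips to a common length
    so they can be stacked into one array as CNN input."""
    if target_len is None:
        target_len = max(len(c) for c in clips)
    out = np.full((len(clips), target_len), value, dtype=np.float32)
    for i, c in enumerate(clips):
        n = min(len(c), target_len)
        out[i, :n] = c[:n]
    return out
```

Padding to the longest clip in the batch, as done here by default, preserves every sample of every recording at the cost of some wasted computation on the padded zeros.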


2019 ◽  
Vol 11 (01) ◽  
pp. 20-25
Author(s):  
Indra Saputra ◽  
Parulian Silalahi ◽  
Bayu Cahyawan ◽  
Imam Akbar

Bicycles are not equipped with turn signals. For riding safety, a bicycle helmet with turn signals controlled by voice recognition was designed. It uses an Arduino Nano as a controller to switch the turn signal lights ON and OFF with voice commands. The device uses a voice recognition sensor and a microphone placed on the bicycle helmet. When a voice command is spoken into the microphone, the voice recognition sensor detects the specified command and sends a signal to the Arduino; the turn signal then lights up as instructed, and the Arduino on the helmet sends an indicator signal via the Bluetooth module. The device is able to detect sound with an accuracy of 80%. The tool works within a distance of <2 meters and at noise levels <71 dB.


Electronics ◽  
2021 ◽  
Vol 10 (11) ◽  
pp. 1349
Author(s):  
Stefan Lattner ◽  
Javier Nistal

Lossy audio codecs compress (and decompress) digital audio streams by removing information that tends to be inaudible in human perception. Under high compression rates, such codecs may introduce a variety of impairments in the audio signal. Many works have tackled the problem of audio enhancement and compression artifact removal using deep-learning techniques. However, only a few works tackle the restoration of heavily compressed audio signals in the musical domain. In such a scenario, there is no unique solution for the restoration of the original signal. Therefore, in this study, we test a stochastic generator of a Generative Adversarial Network (GAN) architecture for this task. Such a stochastic generator, conditioned on highly compressed musical audio signals, could one day generate outputs indistinguishable from high-quality releases. Therefore, the present study may yield insights into more efficient musical data storage and transmission. We train stochastic and deterministic generators on MP3-compressed audio signals with 16, 32, and 64 kbit/s. We perform an extensive evaluation of the different experiments utilizing objective metrics and listening tests. We find that the models can improve the quality of the audio signals over the MP3 versions for 16 and 32 kbit/s and that the stochastic generators are capable of generating outputs that are closer to the original signals than those of the deterministic generators.
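The distinction between the deterministic and stochastic generators can be illustrated with a toy sketch: the stochastic generator receives an extra noise vector z alongside the compressed conditioning input, so repeated sampling yields different plausible restorations of the same input. This is a conceptual illustration only, not the paper's GAN architecture:

```python
import numpy as np

def deterministic_generator(compressed, W):
    """Toy deterministic generator: the output depends only on the
    conditioning (compressed) input, so it is the same on every call."""
    return np.tanh(compressed @ W)

def stochastic_generator(compressed, W, Wz, rng):
    """Toy stochastic generator: an extra noise vector z lets the model
    propose different plausible restorations for the same compressed input."""
    z = rng.standard_normal(Wz.shape[0])
    return np.tanh(compressed @ W + z @ Wz)
```

In the paper's setting, the "no unique solution" property of restoring heavily compressed music is exactly what the z-conditioned generator is meant to capture.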


Author(s):  
Hanaa Torkey ◽  
Elhossiny Ibrahim ◽  
EZZ El-Din Hemdan ◽  
Ayman El-Sayed ◽  
Marwa A. Shouman

Abstract Communication between the sensors spread throughout healthcare systems may cause some of the transferred features to go missing. Repairing the data problems of sensing devices with artificial intelligence technologies has facilitated the Medical Internet of Things (MIoT) and its emerging applications in healthcare. MIoT has great potential to affect a patient's life. The volume of data collected from smart wearable devices increases dramatically with data collected from millions of patients suffering from diseases such as diabetes. However, sensor or human errors lead to missing values in the data. The major challenge is how to predict these values so that the performance of the data analysis model stays within a good range. In this paper, a complete healthcare system for diabetics is used, and two new algorithms are developed to handle the crucial problem of missing data from MIoT wearable sensors. The proposed work is based on the integration of Random Forest, mean, class mean, interquartile range (IQR), and deep learning to produce a clean and complete dataset, which can enhance the performance of any machine learning model. Moreover, an outlier repair technique is proposed based on detecting the sample's class in the dataset and then repairing the value with Deep Learning (DL). With the two steps of imputation and outlier repair, the final model achieves an accuracy of 97.41% and an Area Under the Curve (AUC) of 99.71%. The healthcare system is a web-based diabetes classification application built with Flask, intended for hospitals and healthcare centers to diagnose patients effectively.
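The class-mean imputation and IQR outlier detection mentioned above are standard building blocks; a hedged NumPy sketch of both follows (the paper's full pipeline also integrates Random Forest and deep learning, which are omitted here):

```python
import numpy as np

def class_mean_impute(X, y):
    """Replace NaNs in each feature with the mean of that feature
    computed within the sample's own class."""
    X = X.copy()
    for cls in np.unique(y):
        rows = y == cls
        col_means = np.nanmean(X[rows], axis=0)
        idx = np.where(np.isnan(X) & rows[:, None])
        X[idx] = col_means[idx[1]]
    return X

def iqr_outliers(col):
    """Flag values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    q1, q3 = np.percentile(col, [25, 75])
    iqr = q3 - q1
    return (col < q1 - 1.5 * iqr) | (col > q3 + 1.5 * iqr)
```

Imputing within each class rather than globally, as above, keeps the filled-in values consistent with the class structure that the downstream classifier must learn.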


Animals ◽  
2021 ◽  
Vol 11 (2) ◽  
pp. 357
Author(s):  
Dae-Hyun Jung ◽  
Na Yeon Kim ◽  
Sang Ho Moon ◽  
Changho Jhin ◽  
Hak-Jin Kim ◽  
...  

The priority placed on animal welfare in the meat industry is increasing the importance of understanding livestock behavior. In this study, we developed a web-based monitoring and recording system based on artificial intelligence analysis for the classification of cattle sounds. The deep learning classification model of the system is a convolutional neural network (CNN) model that takes voice information converted to Mel-frequency cepstral coefficients (MFCCs) as input. The CNN model first achieved an accuracy of 91.38% in recognizing cattle sounds. Further, short-time Fourier transform-based noise filtering was applied to remove background noise, improving the classification model accuracy to 94.18%. Categorized cattle voices were then classified into four classes, and a total of 897 classification records were acquired for the classification model development. A final accuracy of 81.96% was obtained for the model. Our proposed web-based platform that provides information obtained from a total of 12 sound sensors provides cattle vocalization monitoring in real time, enabling farm owners to determine the status of their cattle.
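The short-time Fourier transform-based noise filtering can be illustrated with a simple spectral gate: estimate a per-frequency noise floor from a noise-only recording, zero out time-frequency bins below it, and resynthesize by overlap-add. This is a generic sketch, not the authors' exact filter:

```python
import numpy as np

def spectral_gate(signal, noise_profile, n_fft=512, hop=256, factor=2.0):
    """Simple STFT noise gate: zero out time-frequency bins whose magnitude
    falls below a threshold estimated from a noise-only recording, then
    resynthesize the gated spectrogram by normalized overlap-add."""
    win = np.hanning(n_fft)

    def stft(x):
        starts = range(0, len(x) - n_fft + 1, hop)
        return np.array([np.fft.rfft(x[s:s + n_fft] * win) for s in starts])

    noise_mag = np.abs(stft(noise_profile))
    threshold = factor * noise_mag.mean(axis=0)   # per-frequency noise floor
    S = stft(signal)
    S[np.abs(S) < threshold] = 0.0
    # Normalized overlap-add resynthesis
    out = np.zeros(len(signal))
    norm = np.zeros(len(signal))
    for i, s in enumerate(range(0, len(signal) - n_fft + 1, hop)):
        out[s:s + n_fft] += np.fft.irfft(S[i], n=n_fft) * win
        norm[s:s + n_fft] += win ** 2
    return out / np.maximum(norm, 1e-8)
```

On a tonal vocalization buried in broadband noise, gating like this suppresses most off-signal bins, which is the kind of preprocessing credited with raising the classifier's accuracy from 91.38% to 94.18%.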

