Audio Keyword Reconstruction from On-Device Motion Sensor Signals via Neural Frequency Unfolding

Author(s):  
Tianshi Wang ◽  
Shuochao Yao ◽  
Shengzhong Liu ◽  
Jinyang Li ◽  
Dongxin Liu ◽  
...  

In this paper, we present a novel deep neural network architecture that reconstructs the high-frequency audio of selected spoken human words from low-sampling-rate signals of (ego-)motion sensors, such as accelerometer and gyroscope data, recorded on everyday mobile devices. As the sampling rate of such motion sensors is much lower than the Nyquist rate of ordinary human voice (around 6 kHz or higher), these motion sensor recordings suffer from a significant frequency aliasing effect. To recover the original high-frequency audio signal, our neural network introduces a novel layer, called the alias unfolding layer, specialized in expanding the bandwidth of an aliased signal by reversing the frequency folding process in the time-frequency domain. While perfect unfolding is known to be unrealizable, we leverage the sparsity of the original signal to arrive at a sufficiently accurate statistical approximation. Comprehensive experiments show that our neural network significantly outperforms the state of the art in audio reconstruction from motion sensor data, effectively reconstructing a pre-trained set of spoken keywords from low-frequency motion sensor signals (with sampling rates of 100–400 Hz). The approach demonstrates the potential risk of information leakage from motion sensors in smart mobile devices.
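The frequency folding the abstract refers to can be illustrated numerically: after sampling at rate fs, a tone at f_true appears at its distance from the nearest integer multiple of fs. The sketch below is a stand-alone illustration, not the authors' code:

```python
def alias_frequency(f_true, fs):
    """Frequency at which a tone of f_true Hz appears after sampling at fs Hz.

    Sampling folds the spectrum around multiples of the Nyquist
    frequency fs/2, so every tone lands in the band [0, fs/2].
    """
    k = round(f_true / fs)          # nearest multiple of the sampling rate
    return abs(f_true - k * fs)     # distance to it = apparent frequency

# A 1.3 kHz voice component recorded by a 400 Hz accelerometer
# shows up at 100 Hz, indistinguishable from a genuine 100 Hz tone.
print(alias_frequency(1300, 400))
```

Because many true frequencies fold onto the same apparent one, inverting the map is ill-posed, which is why the paper relies on signal sparsity rather than exact unfolding.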

Sensors ◽  
2019 ◽  
Vol 19 (3) ◽  
pp. 546 ◽  
Author(s):  
Haibin Yu ◽  
Guoxiong Pan ◽  
Mian Pan ◽  
Chong Li ◽  
Wenyan Jia ◽  
...  

Recently, egocentric activity recognition has attracted considerable attention in the pattern recognition and artificial intelligence communities because of its wide applicability in medical care, smart homes, and security monitoring. In this study, we developed and implemented a deep-learning-based hierarchical fusion framework for the recognition of egocentric activities of daily living (ADLs) in a wearable hybrid sensor system comprising motion sensors and cameras. Long short-term memory (LSTM) and a convolutional neural network are used to perform egocentric ADL recognition based on motion sensor data and the photo stream in different layers, respectively. The motion sensor data are used solely for activity classification according to motion state, while the photo stream is used for further specific activity recognition within the motion state groups. Thus, both motion sensor data and photo stream work in their most suitable classification mode to significantly reduce the negative influence of sensor differences on the fusion results. Experimental results show that the proposed method not only is more accurate than the existing direct fusion method (by up to 6%) but also avoids the time-consuming computation of optical flow in the existing method, which makes the proposed algorithm less complex and more suitable for practical application.
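The two-layer routing idea can be sketched as follows. This is a hypothetical stand-in: a variance threshold plays the role of the LSTM motion-state classifier, and tag lookups play the role of the CNN on the photo stream:

```python
def classify_motion_state(motion_window):
    # Stand-in rule for layer 1: low signal variance -> "static" state.
    mean = sum(motion_window) / len(motion_window)
    var = sum((x - mean) ** 2 for x in motion_window) / len(motion_window)
    return "static" if var < 0.1 else "ambulatory"

# Per-state recognizers; the real system runs a CNN on the photo stream.
PHOTO_CLASSIFIERS = {
    "static":     lambda tags: "reading" if "book" in tags else "watching TV",
    "ambulatory": lambda tags: "walking" if "outdoor" in tags else "housework",
}

def recognize_adl(motion_window, photo_tags):
    state = classify_motion_state(motion_window)   # layer 1: motion sensors
    return PHOTO_CLASSIFIERS[state](photo_tags)    # layer 2: photo stream

print(recognize_adl([0.0, 0.01, 0.02], {"book"}))  # -> "reading"
```

The point of the hierarchy is visible even in this toy: the photo classifier only ever has to discriminate activities within one motion-state group, never across all activities at once.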


2020 ◽  
Vol 10 (18) ◽  
pp. 6591
Author(s):  
Do-Soo Kwon ◽  
Chungkuk Jin ◽  
MooHyun Kim ◽  
Weoncheol Koo

This paper presents a machine learning method for detecting mooring failures of a submerged floating tunnel (SFT) based on a deep neural network (DNN). Floater-mooring-coupled hydro-elastic time-domain numerical simulations are conducted under various random wave excitations and failure/intact scenarios. Then, big data are collected at various locations of numerical motion sensors along the SFT to be used for the present DNN algorithm. In the input layer, tunnel motion-sensor signals and wave conditions are inputted, while the output layer provides the probabilities of 21 failure scenarios. In the optimization stage, the numbers of hidden layers, neurons in each layer, and epochs are selected for reliable performance. Several activation functions and optimizers are also tested for the present DNN model, and the Sigmoid function and Adamax are adopted, respectively, to enhance the classification accuracy. Moreover, a systematic sensitivity test with respect to the number and arrangement of sensors is performed to find the appropriate sensor combination to achieve the target prediction accuracy. A confusion matrix is used to represent the accuracy of the DNN algorithms for various cases, and a classification accuracy as high as 98.1% is obtained with seven sensors. The results of this study demonstrate that the DNN model can effectively monitor the mooring failures of SFTs using real-time sensor signals.
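The confusion-matrix evaluation mentioned above can be sketched generically (toy code, not the authors'; a small three-class example stands in for the 21 failure scenarios):

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """Rows index the true scenario, columns the predicted one."""
    m = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        m[t][p] += 1
    return m

def accuracy(matrix):
    # Overall accuracy is the trace divided by the total sample count.
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

# Toy run with 3 of the 21 scenarios.
y_true = [0, 0, 1, 2, 2, 2]
y_pred = [0, 1, 1, 2, 2, 0]
cm = confusion_matrix(y_true, y_pred, 3)
print(accuracy(cm))  # 4 of 6 correct
```

The off-diagonal cells show which failure scenarios are confused with which, which is exactly the information a plain accuracy number hides.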


Author(s):  
Musa Peker ◽  
Serkan Ballı ◽  
Ensar Arif Sağbaş

Human activity recognition (HAR) is a growing field that provides valuable information about a person. Sensor-equipped smartwatches stand out in these studies in terms of their portability and cost. HAR systems usually preprocess the raw signals, decompose them, and then extract attributes to be used in the classifier. Attribute selection is an important step to reduce data size and provide appropriate parameters. In this chapter, classification of eight different actions (brushing teeth, walking, running, vacuuming, writing on a board, writing on paper, using a keyboard, and remaining stationary) has been performed with smartwatch motion sensor data. Forty-two different features have been extracted from the motion sensor signals, and feature selection has been performed with the ReliefF algorithm. After that, performance evaluation has been carried out with four different machine learning methods. In this study, the best results have been obtained with the kernel-based extreme learning machine (KELM) algorithm, and human actions have been estimated with high accuracy.
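The feature-weighting idea behind ReliefF can be sketched with its simpler binary ancestor, Relief (illustrative code; the chapter itself uses ReliefF over 42 features and 8 classes):

```python
import random

def relief(samples, labels, n_iter=50, seed=0):
    """Simplified binary Relief: a feature gains weight when it differs
    from the nearest other-class sample ('miss') more than from the
    nearest same-class sample ('hit')."""
    rng = random.Random(seed)
    n_feat = len(samples[0])
    w = [0.0] * n_feat

    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    for _ in range(n_iter):
        i = rng.randrange(len(samples))
        hits   = [s for s, l in zip(samples, labels)
                  if l == labels[i] and s is not samples[i]]
        misses = [s for s, l in zip(samples, labels) if l != labels[i]]
        hit  = min(hits,   key=lambda s: dist(s, samples[i]))
        miss = min(misses, key=lambda s: dist(s, samples[i]))
        for f in range(n_feat):
            w[f] += abs(samples[i][f] - miss[f]) - abs(samples[i][f] - hit[f])
    return w

# Feature 0 separates the two classes; feature 1 is noise.
data   = [(0.0, 0.5), (0.1, 0.4), (1.0, 0.5), (0.9, 0.6)]
labels = [0, 0, 1, 1]
weights = relief(data, labels)
print(weights.index(max(weights)))  # -> 0: the discriminative feature
```

ReliefF extends this scheme to multiple classes and k nearest hits/misses, but the hit-versus-miss weight update is the same core idea.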


2021 ◽  
Author(s):  
Pierre Rouge ◽  
Ali Moukadem ◽  
Alain Dieterlen ◽  
Antoine Boutet ◽  
Carole Frindel

Author(s):  
K Ashwini ◽  
P M Durai Raj Vincent

Background: The cry is the universal language that babies use to communicate with others. Infant cry classification is a kind of speech recognition problem that should be treated wisely. In the last few years, it has been gaining momentum, and such systems can be very helpful for caretakers. Objective: This study aims to develop a predictive model for infant cry classification by converting the audio signals into spectrogram images and then applying a deep convolutional neural network. The model performs end-to-end learning, thereby reducing the complexity involved in audio signal analysis, and improves performance through an optimization technique. Method: A time-frequency analysis, the Short-Time Fourier Transform (STFT), is applied to generate the spectrograms; 256 DFT (Discrete Fourier Transform) points are used to compute the Fourier transform. A deep convolutional neural network based on AlexNet, with a few enhancements, is used in this work to classify the recorded infant cries. To improve the effectiveness of this network, Stochastic Gradient Descent with Momentum (SGDM) is used to train it. Results: The deep-neural-network-based infant cry classification system achieves a maximum accuracy of 95% in the classification of sleepy cries. The results show that the convolutional neural network with SGDM optimization achieves higher prediction accuracy. Conclusion: The proposed approach is compared with a convolutional neural network trained with plain SGD and with Naïve Bayes; based on the results, the convolutional neural network with SGDM performs better than the other techniques.
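The STFT step can be sketched with a direct DFT in plain Python (illustrative only; the frame length matches the paper's 256 DFT points, but the hop size and the unwindowed frames are assumptions):

```python
import cmath
import math

def stft_magnitude(signal, n_fft=256, hop=128):
    """Magnitude spectrogram via a direct DFT per frame (sketch only;
    real code would use an FFT library and apply a window function)."""
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        frame = signal[start:start + n_fft]
        spectrum = []
        for k in range(n_fft // 2 + 1):   # keep the non-negative-frequency bins
            acc = sum(x * cmath.exp(-2j * cmath.pi * k * n / n_fft)
                      for n, x in enumerate(frame))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames  # time x frequency

# 512-sample toy tone with 40 cycles per 256-sample frame.
tone = [math.sin(2 * math.pi * 40 * t / 256) for t in range(512)]
spec = stft_magnitude(tone)
print(len(spec), len(spec[0]))  # frames x bins
```

Each row of the result is one column of the spectrogram image that the CNN then consumes.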


Complexity ◽  
2018 ◽  
Vol 2018 ◽  
pp. 1-10 ◽  
Author(s):  
Vladimir A. Maksimenko ◽  
Semen A. Kurkin ◽  
Elena N. Pitsik ◽  
Vyacheslav Yu. Musatov ◽  
Anastasia E. Runnova ◽  
...  

We apply artificial neural network (ANN) for recognition and classification of electroencephalographic (EEG) patterns associated with motor imagery in untrained subjects. Classification accuracy is optimized by reducing complexity of input experimental data. From multichannel EEG recorded by the set of 31 electrodes arranged according to extended international 10-10 system, we select an appropriate type of ANN which reaches 80 ± 10% accuracy for single trial classification. Then, we reduce the number of the EEG channels and obtain an appropriate recognition quality (up to 73 ± 15%) using only 8 electrodes located in frontal lobe. Finally, we analyze the time-frequency structure of EEG signals and find that motor-related features associated with left and right leg motor imagery are more pronounced in the mu (8–13 Hz) and delta (1–5 Hz) brainwaves than in the high-frequency beta brainwave (15–30 Hz). Based on the obtained results, we propose further ANN optimization by preprocessing the EEG signals with a low-pass filter with different cutoffs. We demonstrate that the filtration of high-frequency spectral components significantly enhances the classification performance (up to 90 ± 5% accuracy using 8 electrodes only). The obtained results are of particular interest for the development of brain-computer interfaces for untrained subjects.
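The low-pass preprocessing step can be illustrated with a toy stand-in: a moving-average filter applied to synthetic mu- and beta-band tones (the sampling rate and filter choice here are assumptions, not the authors' setup):

```python
import math

def moving_average(signal, width):
    """Crude low-pass filter: each output sample is the mean of up to
    `width` neighbouring input samples (shrinking at the edges)."""
    half = width // 2
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out

fs = 250  # Hz; an assumed, typical EEG sampling rate
t = [n / fs for n in range(fs)]                      # one second of signal
mu   = [math.sin(2 * math.pi * 10 * x) for x in t]   # 10 Hz mu-band tone
beta = [math.sin(2 * math.pi * 25 * x) for x in t]   # 25 Hz beta-band tone
mixed = [a + b for a, b in zip(mu, beta)]

# After filtering, the 25 Hz component is attenuated far more than
# the 10 Hz one, mimicking the cutoff that boosted accuracy to ~90%.
filtered = moving_average(mixed, 9)
```

A real pipeline would use a proper FIR/IIR design with an explicit cutoff, but the effect (suppressing the uninformative high-frequency beta band while keeping mu and delta) is the same.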


Mathematics ◽  
2021 ◽  
Vol 9 (15) ◽  
pp. 1786
Author(s):  
Cezara Benegui ◽  
Radu Tudor Ionescu

In this paper, we propose an enhanced version of the Authentication with Built-in Camera (ABC) protocol by employing a deep learning solution based on built-in motion sensors. The standard ABC protocol identifies mobile devices based on the photo-response non-uniformity (PRNU) of the camera sensor, while also considering QR-code-based meta-information. During registration, users are required to capture photos using their smartphone camera. The photos are sent to a server that computes the camera fingerprint, storing it as an authentication trait. During authentication, the user is required to take two photos that contain two QR codes presented on a screen. The presented QR code images also contain a unique probe signal, similar to a camera fingerprint, generated by the protocol. During verification, the server computes the fingerprint of the received photos and authenticates the user if (i) the probe signal is present, (ii) the metadata embedded in the QR codes is correct and (iii) the camera fingerprint is identified correctly. However, the protocol is vulnerable to forgery attacks when the attacker can compute the camera fingerprint from external photos, as shown in our preliminary work. Hence, attackers can easily remove their PRNU from the authentication photos without completely altering the probe signal, resulting in attacks that bypass the defense systems of the ABC protocol. In this context, we propose an enhancement to the ABC protocol, using motion sensor data as an additional and passive authentication layer. Smartphones can be identified through their motion sensor data, which, unlike photos, is never posted by users on social media platforms, thus being more secure than using photographs alone. To this end, we transform motion signals into embedding vectors produced by deep neural networks, applying Support Vector Machines for the smartphone identification task. 
Our change to the ABC protocol results in a multi-modal protocol that lowers the false acceptance rate for the attack proposed in our previous work to a percentage as low as 0.07%. In this paper, we present the attack that makes ABC vulnerable, as well as our multi-modal ABC protocol along with relevant experiments and results.
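As a sketch of how the reported figure is defined (illustrative code, not the authors'), the false acceptance rate is the fraction of forged authentication attempts that pass verification:

```python
def false_acceptance_rate(attempts):
    """attempts: list of (is_forgery, was_accepted) pairs.
    FAR = accepted forgeries / all forgeries; genuine attempts are ignored."""
    forgeries = [accepted for is_forgery, accepted in attempts if is_forgery]
    if not forgeries:
        return 0.0
    return sum(forgeries) / len(forgeries)

# Toy log: 4 forged attempts, 1 slips through -> FAR = 25%.
log = [(True, False), (True, False), (True, True), (True, False),
       (False, True), (False, True)]   # last two are genuine logins
print(false_acceptance_rate(log))  # 0.25
```

In the paper's setting the attempts are forgeries against the multi-modal ABC protocol, and the reported rate is as low as 0.07%.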


2020 ◽  
Vol 10 (23) ◽  
pp. 8482
Author(s):  
Konstantinos Peppas ◽  
Apostolos C. Tsolakis ◽  
Stelios Krinidis ◽  
Dimitrios Tzovaras

Given the ubiquity of mobile devices, understanding the context of human activity with non-intrusive solutions is of great value. A novel deep neural network model is proposed, which combines feature extraction and convolutional layers, able to recognize human physical activity in real-time from tri-axial accelerometer data when run on a mobile device. It uses a two-layer convolutional neural network to extract local features, which are combined with 40 statistical features and are fed to a fully-connected layer. It improves the classification performance, while it takes up 5–8 times less storage space and outputs more than double the throughput of the current state-of-the-art user-independent implementation on the Wireless Sensor Data Mining (WISDM) dataset. It achieves 94.18% classification accuracy on a 10-fold user-independent cross-validation of the WISDM dataset. The model is further tested on the Actitracker dataset, achieving 79.12% accuracy, while the size and throughput of the model are evaluated on a mobile device.
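The fusion of hand-crafted and learned features that the abstract describes can be sketched as follows (the specific statistics and dimensions are illustrative; the paper concatenates 40 statistical features with the convolutional features):

```python
import math

def stats_features(axis):
    """A few illustrative per-axis statistics (mean, std, min, max)."""
    n = len(axis)
    mean = sum(axis) / n
    var = sum((x - mean) ** 2 for x in axis) / n
    return [mean, math.sqrt(var), min(axis), max(axis)]

def feature_vector(window, conv_features):
    """window: dict of tri-axial accelerometer samples;
    conv_features: stand-in for the two-layer CNN's local features."""
    handcrafted = []
    for axis in ("x", "y", "z"):
        handcrafted += stats_features(window[axis])
    # Concatenation feeds the fully-connected classification layer.
    return handcrafted + list(conv_features)

window = {"x": [0.1, 0.2, 0.3], "y": [9.7, 9.8, 9.9], "z": [0.0, 0.0, 0.1]}
vec = feature_vector(window, conv_features=[0.5] * 8)
print(len(vec))  # 12 handcrafted + 8 convolutional = 20
```

Keeping the statistical branch cheap is part of why the model fits on a phone: only the small convolutional front-end involves learned weights.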


Geophysics ◽  
2019 ◽  
Vol 84 (1) ◽  
pp. K1-K9 ◽  
Author(s):  
Xin Wu ◽  
Guoqiang Xue ◽  
Pan Xiao ◽  
Jutao Li ◽  
Lihua Liu ◽  
...  

In helicopter-borne transient electromagnetic (HTEM) signal processing, removal of motion-induced noise is one of the most important steps. A special type of short-term noise, which can be classified as high-frequency motion-induced noise (HFM noise) based on its cause and time-frequency features, was observed in the field data of the Chinese Academy of Sciences HTEM system. Because the HFM noise is an in-band noise for the HTEM response, it usually remains after the normal denoising procedure developed for conventional motion-induced noise. To solve this problem, we have developed a three-stage workflow to remove the HFM noise using a wavelet neural network (WNN). In the first stage, the WNN is trained, with data segments in which the HFM noise is dominant selected as the sample set. In the second stage, the HFM noise in data segments where the earth's response coexists with it is predicted using the well-trained WNN. In the last stage, the predicted HFM noise is removed from the corresponding original data. As an example, we applied our workflow to field data observed in Inner Mongolia; the HFM noise was removed effectively, and the results provide a strong data foundation for the subsequent processing procedures.
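The three-stage subtraction workflow can be sketched with a toy predictor standing in for the WNN (the "noise" is a known-frequency tone here purely for illustration):

```python
import math

def train_noise_model(noise_segment):
    """Stage 1 stand-in for WNN training: fit the amplitude of a
    known-frequency tone from a noise-dominant segment."""
    n = len(noise_segment)
    ref = [math.sin(2 * math.pi * 5 * i / n) for i in range(n)]
    amp = 2 * sum(a * b for a, b in zip(noise_segment, ref)) / n
    return lambda k, m: amp * math.sin(2 * math.pi * 5 * k / m)

n = 200
hfm_noise = [0.8 * math.sin(2 * math.pi * 5 * i / n) for i in range(n)]
earth_response = [math.exp(-i / 60) for i in range(n)]  # toy decay curve

model = train_noise_model(hfm_noise)                  # stage 1: train
predicted = [model(i, n) for i in range(n)]           # stage 2: predict
mixed = [e + h for e, h in zip(earth_response, hfm_noise)]
denoised = [m - p for m, p in zip(mixed, predicted)]  # stage 3: subtract
```

Because the HFM noise overlaps the response band, this predict-then-subtract scheme is what allows removal without also filtering out the earth's response.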

