Voice pathology detection and classification from speech signals and EGG signals based on a multimodal fusion method

2021 ◽  
Vol 0 (0) ◽  
Author(s):  
Lei Geng ◽  
Hongfeng Shan ◽  
Zhitao Xiao ◽  
Wei Wang ◽  
Mei Wei

Abstract Automatic voice pathology detection and classification plays an important role in the diagnosis and prevention of voice disorders. To accurately describe the pronunciation characteristics of patients with dysarthria and improve pathological voice detection, this study proposes a detection method based on a multimodal network structure. First, speech signals and electroglottography (EGG) signals are mapped from the time domain to frequency-domain spectrograms via a short-time Fourier transform (STFT). A Mel filter bank is then applied to the spectrograms to enhance the signals' harmonics and reduce noise. Second, a pre-trained convolutional neural network (CNN) is used as the backbone network to extract sound state features and vocal cord vibration features from the two signals. To improve classification, the fused features are fed into a long short-term memory (LSTM) network for voice feature selection and enhancement. The proposed system achieves 95.73% accuracy, a 96.10% F1-score, and 96.73% recall on the Saarbrucken Voice Database (SVD), providing a new method for pathological speech detection.
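
The front end described in this abstract (an STFT followed by a Mel filter bank) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the frame length, hop size, sample rate, test tone, and the HTK-style mel formula are all assumptions chosen for illustration, and the CNN and LSTM stages are omitted.

```python
import cmath
import math


def hann(n_points):
    """Periodic Hann window of length n_points."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / n_points) for n in range(n_points)]


def stft_magnitude(signal, frame_len, hop):
    """Magnitude spectrogram: one list of frame_len // 2 + 1 bins per frame."""
    window = hann(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        spectrum = []
        for k in range(frame_len // 2 + 1):
            # Naive DFT of the windowed frame (slow, but dependency-free).
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


def hz_to_mel(freq_hz):
    """HTK-style mel mapping (one of several conventions; an assumption here)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_centers(n_bands, f_min, f_max):
    """Center frequencies (Hz) of n_bands filters spaced evenly on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_bands + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_bands)]


# Hypothetical test signal: a 1 kHz tone sampled at 8 kHz.
sample_rate = 8000
frame_len, hop = 64, 32
signal = [math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(256)]

spec = stft_magnitude(signal, frame_len, hop)
peak_bins = [max(range(len(frame)), key=frame.__getitem__) for frame in spec]
peak_hz = [b * sample_rate / frame_len for b in peak_bins]
centers = mel_band_centers(10, 0.0, sample_rate / 2)
```

Each frame's strongest bin falls at the tone frequency, and the mel band centers crowd toward low frequencies, which is the property a Mel filter bank uses to emphasize the harmonic region of voiced speech.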

2020 ◽  
Vol 218 ◽  
pp. 01050
Author(s):  
Yiqing Hua

The goal of this paper is to compare the accuracy of bitcoin price (in USD) prediction using two different models: a long short-term memory (LSTM) network and an ARIMA model. Real-time price data is collected with Pycurl from Bitfine. The LSTM model is implemented with Keras and TensorFlow. The ARIMA model is included mainly as a classical time series forecasting baseline; as expected, it makes efficient predictions only over short time intervals, and the outcome depends on the time period. The LSTM reaches better performance, at the cost of additional, unavoidable training time, especially on a CPU.


2021 ◽  
Vol 13 (12) ◽  
pp. 6953
Author(s):  
Yixing Du ◽  
Zhijian Hu

Data-driven methods using synchrophasor measurements have a broad application prospect in Transient Stability Assessment (TSA). Most previous studies only focused on predicting whether the power system is stable or not after disturbance, which lacked a quantitative analysis of the risk of transient stability. Therefore, this paper proposes a two-stage power system TSA method based on snapshot ensemble long short-term memory (LSTM) network. This method can efficiently build an ensemble model through a single training process, and employ the disturbed trajectory measurements as the inputs, which can realize rapid end-to-end TSA. In the first stage, dynamic hierarchical assessment is carried out through the classifier, so as to screen out credible samples step by step. In the second stage, the regressor is used to predict the transient stability margin of the credible stable samples and the undetermined samples, and combined with the built risk function to realize the risk quantification of transient angle stability. Furthermore, by modifying the loss function of the model, it effectively overcomes sample imbalance and overlapping. The simulation results show that the proposed method can not only accurately predict binary information representing transient stability status of samples, but also reasonably reflect the transient safety risk level of power systems, providing reliable reference for the subsequent control.


Energies ◽  
2021 ◽  
Vol 14 (11) ◽  
pp. 3030
Author(s):  
Simon Liebermann ◽  
Jung-Sup Um ◽  
YoungSeok Hwang ◽  
Stephan Schlüter

Due to the globally increasing share of renewable energy sources like wind and solar power, precise forecasts for weather data are becoming more and more important. To compute such forecasts, numerous authors apply neural networks (NN), and these models have recently become ever more complex. Using solar irradiation as an example, we examine whether this additional complexity is required in terms of forecasting precision. Different NN models, namely the long short-term memory (LSTM) neural network, a convolutional neural network (CNN), and combinations of both, are benchmarked against each other. The naive forecast is included as a baseline. Various locations across Europe are tested to analyze the models' performance under different climate conditions. Forecasts up to 24 h in advance are generated and compared using different goodness of fit (GoF) measures. In addition, errors are analyzed in the time domain. As expected, the error of all models increases with rising forecasting horizon. Across all test stations, combining an LSTM network with a CNN yields the best performance. However, regarding the chosen GoF measures, differences to the alternative approaches are fairly small. The hybrid model's advantage lies not in the improved GoF but in its versatility: contrary to an LSTM or a CNN, it produces good results under all tested weather conditions.
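
A naive baseline of the kind mentioned in this abstract can be sketched as follows. The abstract does not define its naive forecast, so the persistence rule used here (predict the value at t + h as the value observed at t), the RMSE measure, and the irradiance values are all assumptions made for illustration.

```python
import math


def persistence_forecast(series, horizon):
    """Return (predicted, observed) pairs where the forecast for t + horizon is series[t]."""
    predicted = series[:-horizon]
    observed = series[horizon:]
    return predicted, observed


def rmse(predicted, observed):
    """Root mean squared error between two equally long sequences."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical hourly irradiation values (W/m^2) over part of a day.
irradiation = [0, 50, 200, 400, 500, 400, 200, 50, 0]

rmse_1h = rmse(*persistence_forecast(irradiation, 1))
rmse_2h = rmse(*persistence_forecast(irradiation, 2))
```

On this toy series the baseline error grows with the forecasting horizon, which matches the qualitative behavior the abstract reports for all models.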


Sensors ◽  
2021 ◽  
Vol 21 (4) ◽  
pp. 1181
Author(s):  
Chenhao Zhu ◽  
Sheng Cai ◽  
Yifan Yang ◽  
Wei Xu ◽  
Honghai Shen ◽  
...  

In applications such as carrier attitude control and mobile device navigation, a micro-electro-mechanical-system (MEMS) gyroscope will inevitably be affected by random vibration, which significantly affects the performance of the MEMS gyroscope. In order to solve the degradation of MEMS gyroscope performance in random vibration environments, in this paper, a combined method of a long short-term memory (LSTM) network and Kalman filter (KF) is proposed for error compensation, where Kalman filter parameters are iteratively optimized using the Kalman smoother and expectation-maximization (EM) algorithm. In order to verify the effectiveness of the proposed method, we performed a linear random vibration test to acquire MEMS gyroscope data. Subsequently, an analysis of the effects of input data step size and network topology on gyroscope error compensation performance is presented. Furthermore, the autoregressive moving average-Kalman filter (ARMA-KF) model, which is commonly used in gyroscope error compensation, was also combined with the LSTM network as a comparison method. The results show that, for the x-axis data, the proposed combined method reduces the standard deviation (STD) by 51.58% and 31.92% compared to the bidirectional LSTM (BiLSTM) network, and EM-KF method, respectively. For the z-axis data, the proposed combined method reduces the standard deviation by 29.19% and 12.75% compared to the BiLSTM network and EM-KF method, respectively. Furthermore, for x-axis data and z-axis data, the proposed combined method reduces the standard deviation by 46.54% and 22.30% compared to the BiLSTM-ARMA-KF method, respectively, and the output is smoother, proving the effectiveness of the proposed method.
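
The Kalman filter stage named in this abstract can be illustrated with a minimal scalar sketch. This is not the proposed LSTM-KF method: there is no LSTM, the noise variances are hand-picked rather than estimated with the Kalman smoother and EM algorithm, and the input is synthetic Gaussian noise instead of gyroscope data. It only shows how a Kalman filter lowers the standard deviation of a noisy output.

```python
import random
import statistics


def kalman_filter_1d(measurements, process_var, measurement_var, x0=0.0, p0=1.0):
    """Scalar Kalman filter with a random-walk state model."""
    x, p = x0, p0
    estimates = []
    for z in measurements:
        p = p + process_var             # predict: uncertainty grows
        k = p / (p + measurement_var)   # Kalman gain
        x = x + k * (z - x)             # update with the innovation
        p = (1.0 - k) * p               # posterior uncertainty
        estimates.append(x)
    return estimates


# Synthetic stand-in for a stationary sensor output (hypothetical values).
rng = random.Random(0)
raw = [rng.gauss(0.0, 1.0) for _ in range(500)]

filtered = kalman_filter_1d(raw, process_var=1e-4, measurement_var=1.0)

raw_std = statistics.pstdev(raw)
filtered_std = statistics.pstdev(filtered)
std_reduction = 1.0 - filtered_std / raw_std
```

The quantity `std_reduction` is the same kind of figure the abstract reports (a relative reduction in standard deviation), but the value obtained here depends entirely on the assumed variances and is not comparable to the paper's results.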


2021 ◽  
Vol 2 (2) ◽  
Author(s):  
Kate Highnam ◽  
Domenic Puzio ◽  
Song Luo ◽  
Nicholas R. Jennings

Abstract Botnets and malware continue to avoid detection by static rule engines when using domain generation algorithms (DGAs) for callouts to unique, dynamically generated web addresses. Common DGA detection techniques fail to reliably detect DGA variants that combine random dictionary words to create domain names that closely mirror legitimate domains. To combat this, we created a novel hybrid neural network, Bilbo the “bagging” model, that analyses domains and scores the likelihood they are generated by such algorithms and therefore are potentially malicious. Bilbo is the first parallel usage of a convolutional neural network (CNN) and a long short-term memory (LSTM) network for DGA detection. Our unique architecture is found to be the most consistent in performance in terms of AUC, $F_1$ score, and accuracy when generalising across different dictionary DGA classification tasks compared to current state-of-the-art deep learning architectures. We validate using reverse-engineered dictionary DGA domains and detail our real-time implementation strategy for scoring real-world network logs within a large enterprise. In 4 h of actual network traffic, the model discovered at least five potential command-and-control networks that commercial vendor tools did not flag.


Coatings ◽  
2021 ◽  
Vol 11 (8) ◽  
pp. 909
Author(s):  
Azamatjon Kakhramon ugli Malikov ◽  
Younho Cho ◽  
Young H. Kim ◽  
Jeongnam Kim ◽  
Junpil Park ◽  
...  

Ultrasonic non-destructive analysis is a promising and effective method for the inspection of protective coating materials. Offshore coating exhibits a high attenuation rate of ultrasonic energy due to the absorption and ultrasonic pulse echo testing becomes difficult due to the small amplitude of the second echo from the back wall of the coating layer. In order to address these problems, an advanced ultrasonic signal analysis has been proposed. An ultrasonic delay line was applied due to the high attenuation of the coating layer. A short-time Fourier transform (STFT) of the waveform was implemented to measure the thickness and state of bonding of coating materials. The thickness of the coating material was estimated by the projection of the STFT into the time-domain. The bonding and debonding of the coating layers were distinguished using the ratio of the STFT magnitude peaks of the two subsequent wave echoes. In addition, the advantage of the STFT-based approach is that it can accurately and quickly estimate the time of flight (TOF) of a signal even at low signal-to-noise ratios. Finally, a convolutional neural network (CNN) was applied to automatically determine the bonding state of the coatings. The time–frequency representation of the waveform was used as the input to the CNN. The experimental results demonstrated that the proposed method automatically determines the bonding state of the coatings with high accuracy. The present approach is more efficient compared to the method of estimating bonding state using attenuation.
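
The two measurements named in this abstract, thickness from time of flight and the ratio of successive echo peaks, can be written down in a short sketch. It uses the standard pulse-echo relation (thickness = velocity × round-trip time / 2) rather than the authors' STFT projection, and the sound velocity, echo times, and peak magnitudes are hypothetical values, not data from the paper.

```python
def thickness_from_tof(velocity_m_per_s, t_first_echo_s, t_second_echo_s):
    """Pulse-echo thickness: the wave crosses the layer twice between echoes."""
    round_trip = t_second_echo_s - t_first_echo_s
    return velocity_m_per_s * round_trip / 2.0


def echo_peak_ratio(first_peak, second_peak):
    """Ratio of the second echo's peak magnitude to the first echo's."""
    return second_peak / first_peak


# Hypothetical coating: 2500 m/s sound velocity, echoes 0.8 microseconds apart.
thickness_m = thickness_from_tof(2500.0, 4.0e-6, 4.8e-6)

# Hypothetical STFT magnitude peaks of two subsequent echoes.
ratio = echo_peak_ratio(first_peak=1.0, second_peak=0.25)
```

How the ratio maps to a bonded or debonded state (for example, which threshold to use) is not given in the abstract and is deliberately left out of the sketch.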


Author(s):  
Zhang Chao ◽  
Wang Wei-zhi ◽  
Zhang Chen ◽  
Fan Bin ◽  
Wang Jian-guo ◽  
...  

Accurate and reliable fault diagnosis is one of the key and difficult issues in mechanical condition monitoring. In recent years, the convolutional neural network (CNN) has been widely used in mechanical condition monitoring and has been a great breakthrough in the field of bearing fault diagnosis. However, a CNN can only extract local features of signals. When raw vibration signals are processed by a CNN alone, model accuracy and generalization are very low. To address these problems, this paper improves the traditional convolution layer of the CNN and builds a learning module for local characteristics (local feature learning block, LFLB). In addition, a long short-term memory (LSTM) network is introduced to extract global features. The resulting network is an improved CNN-LSTM network, and the deep features it extracts are used for fault classification. The improved CNN-LSTM network is applied to vibration signals of faulty bearings collected by the bearing failure laboratory of Inner Mongolia University of Science and Technology. The results show that the accuracy of the improved CNN-LSTM network on the same batch test set is 98.75%, which is about 24% higher than that of the traditional CNN. The proposed network is then applied, with its parameters unchanged, to the bearing data collection of Western Reserve University. The experiment shows that the improved CNN-LSTM network has better generalization than the traditional CNN.


Author(s):  
Niels Hørbye Christiansen ◽  
Per Erlend Torbergsen Voie ◽  
Jan Høgsberg ◽  
Nils Sødahl

Dynamic analyses of slender marine structures are computationally expensive. Recently it has been shown how a hybrid method which combines FEM models and artificial neural networks (ANN) can reduce the computation time spent on the time domain simulations associated with fatigue analysis of mooring lines by two orders of magnitude. The present study shows how an ANN trained to perform nonlinear dynamic response simulation can be optimized using a method known as optimal brain damage (OBD) and thereby be used to rank the importance of all analysis input. Both the training and the optimization of the ANN are based on one short time domain simulation sequence generated by a FEM model of the structure. This means that it is possible to evaluate the importance of input parameters based on this single simulation only. The method is tested on a numerical model of mooring lines on a floating off-shore installation. It is shown that it is possible to estimate the cost of ignoring one or more input variables in an analysis.
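
Optimal brain damage assigns each weight a saliency of one half times the diagonal Hessian entry times the squared weight, an estimate of how much the training error rises if that weight is removed. The sketch below applies that formula and ranks inputs by the summed saliency of their first-layer weights. The aggregation per input is an assumption about how a ranking could be formed, not a procedure taken from the paper, and the input names and numbers are hypothetical.

```python
def obd_saliencies(weights, hessian_diag):
    """OBD saliency per weight: 0.5 * h_kk * w_k ** 2."""
    return [0.5 * h * w * w for w, h in zip(weights, hessian_diag)]


def rank_inputs(input_weights, input_hessian_diag):
    """Rank inputs by the total saliency of the weights leaving each input."""
    scores = {
        name: sum(obd_saliencies(ws, input_hessian_diag[name]))
        for name, ws in input_weights.items()
    }
    ranking = sorted(scores, key=scores.get, reverse=True)
    return ranking, scores


# Hypothetical first-layer weights and diagonal Hessian entries for three inputs.
input_weights = {"a": [2.0, -1.0], "b": [0.1, 0.2], "c": [1.0, 0.5]}
input_hessian_diag = {"a": [1.0, 2.0], "b": [4.0, 1.0], "c": [0.5, 2.0]}

ranking, scores = rank_inputs(input_weights, input_hessian_diag)
```

In this toy case input "b" has by far the lowest total saliency, so under the OBD approximation it would be the cheapest input to ignore.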


Symmetry ◽  
2018 ◽  
Vol 10 (8) ◽  
pp. 329
Author(s):  
Jiufei Luo ◽  
Haitao Xu ◽  
Kai Zheng ◽  
Xinyi Li ◽  
Song Feng

Asymmetric windows are of increasing interest to researchers because of their nonlinear and adjustable phase response, as well as their alterable time delay. Short-time phase distortion can provide an essential improvement in speech coding, and also has better performance in speech recognition. The merits of asymmetric windows in terms of spectral behavior play an important role in frequency component detection and parameter estimation. In this paper, the phase response of windows was further studied, and the phase characteristics of symmetric and asymmetric windows are described. The relationship between the barycenter of windows in the time domain and the phase characteristic at the center of the main lobe in the frequency domain was established. In light of this relationship, an improved version of the asymmetric window-based frequency estimation algorithm was proposed. The improved algorithm has the advantages of straightforward implementation and computational efficiency. The numeric simulation results also indicate that the improved approach is more robust than the traditional method against additive random noise.
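
One standard identity linking the two quantities named in this abstract is that, for a real window with positive sum, the slope of the spectral phase at the center of the main lobe (zero frequency) equals minus the window's time-domain barycenter. The abstract does not state the exact relation the authors derive, so the sketch below only checks this textbook identity numerically; the asymmetric window shape is a hypothetical example.

```python
import cmath
import math


def dtft(window, omega):
    """Discrete-time Fourier transform of the window at angular frequency omega."""
    return sum(w * cmath.exp(-1j * omega * n) for n, w in enumerate(window))


def barycenter(window):
    """Time-domain center of mass of the window, in samples."""
    return sum(n * w for n, w in enumerate(window)) / sum(window)


def phase_slope_at_dc(window, step=1e-4):
    """Central-difference estimate of d(phase)/d(omega) at omega = 0."""
    upper = cmath.phase(dtft(window, step))
    lower = cmath.phase(dtft(window, -step))
    return (upper - lower) / (2.0 * step)


N = 32
# Symmetric Hann window.
symmetric = [0.5 - 0.5 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]
# Hypothetical asymmetric window: a squared sine with a warped time axis.
asymmetric = [math.sin(math.pi * math.sqrt(n / (N - 1))) ** 2 for n in range(N)]

sym_center, sym_slope = barycenter(symmetric), phase_slope_at_dc(symmetric)
asym_center, asym_slope = barycenter(asymmetric), phase_slope_at_dc(asymmetric)
```

For the symmetric window the barycenter sits at the midpoint (N - 1) / 2, while the asymmetric window's barycenter, and therefore its delay at the main-lobe center, is shifted toward the start of the window.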


2021 ◽  
Author(s):  
Chonghua Xue ◽  
Cody Karjadi ◽  
Ioannis Ch. Paschalidis ◽  
Rhoda Au ◽  
Vijaya B. Kolachalama

Abstract

Background: Identification of reliable, affordable and easy-to-use strategies for detection of dementia is sorely needed. Digital technologies, such as individual voice recordings, offer an attractive modality to assess cognition but methods that could automatically analyze such data without any pre-processing are not readily available.

Methods: We used a subset of 1264 digital voice recordings of neuropsychological examinations administered to participants from the Framingham Heart Study (FHS), a community-based longitudinal observational study. The recordings were 73 minutes in duration, on average, and contained at least two speakers (participant and clinician). Of the total voice recordings, 483 were of participants with normal cognition (NC), 451 recordings were of participants with mild cognitive impairment (MCI), and 330 were of participants with dementia. We developed two deep learning models (a two-level long short-term memory (LSTM) network and a convolutional neural network (CNN)), which used the raw audio recordings to classify if the recording included a participant with only NC or only dementia, and also to differentiate between recordings corresponding to non-demented (NC+MCI) and demented participants.

Findings: Based on 5-fold cross-validation, the LSTM model achieved a mean (±std) area under the sensitivity-specificity curve (AUC) of 0.744±0.038, mean accuracy of 0.680±0.032, mean sensitivity of 0.719±0.112, and mean specificity of 0.652±0.089 in predicting cases with dementia from those with normal cognition. The CNN model achieved a mean AUC of 0.805±0.027, mean accuracy of 0.740±0.033, mean sensitivity of 0.735±0.094, and mean specificity of 0.750±0.083 in predicting cases with only dementia from those with only NC. For the task related to classification of demented participants from non-demented ones, the LSTM model achieved a mean AUC of 0.659±0.043, mean accuracy of 0.701±0.057, mean sensitivity of 0.245±0.161 and mean specificity of 0.856±0.105. The CNN model achieved a mean AUC of 0.730±0.039, mean accuracy of 0.735±0.046, mean sensitivity of 0.443±0.113, and mean specificity of 0.840±0.076 in predicting cases with dementia from those who were not demented.

Interpretation: This proof-of-concept study demonstrates the potential that raw audio recordings of neuropsychological testing performed on individuals recruited within a community cohort setting can provide a level of screening for dementia.
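
The accuracy, sensitivity, and specificity figures in this abstract follow the usual confusion-matrix definitions, which the sketch below spells out. The counts are hypothetical and chosen only to show how the three measures can diverge when the classes are imbalanced; they are not taken from the study.

```python
def binary_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity (recall on positives), and specificity (recall on negatives)."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical fold: 40 positive recordings and 100 negative recordings.
metrics = binary_metrics(tp=30, fp=20, tn=80, fn=10)
```

Because negatives outnumber positives here, accuracy is pulled toward specificity, which is one reason the abstract reports all three measures alongside the AUC.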

