Voice pathology detection and classification from speech signals and EGG signals based on a multimodal fusion method

Biomedical Engineering / Biomedizinische Technik ◽

10.1515/bmt-2021-0112 ◽

2021 ◽

Vol 0 (0) ◽

Author(s):

Lei Geng ◽

Hongfeng Shan ◽

Zhitao Xiao ◽

Wei Wang ◽

Mei Wei

Keyword(s):

Short Term Memory ◽

Multimodal Fusion ◽

Speech Signals ◽

Speech Detection ◽

Voice Pathology Detection ◽

Voice Detection ◽

Lstm Network ◽

The Time Domain ◽

Short Time ◽

Pathological Voice

Automatic voice pathology detection and classification play an important role in the diagnosis and prevention of voice disorders. To describe the pronunciation characteristics of patients with dysarthria more accurately and to improve pathological voice detection, this study proposes a detection method based on a multimodal network structure. First, speech signals and electroglottography (EGG) signals are mapped from the time domain to frequency-domain spectrograms via a short-time Fourier transform (STFT). A Mel filter bank is applied to the spectrograms to enhance the signal’s harmonics and reduce noise. Second, a pre-trained convolutional neural network (CNN) is used as the backbone network to extract sound state features and vocal cord vibration features from the two signals. To improve classification, the fused features are fed into a long short-term memory (LSTM) network for voice feature selection and enhancement. The proposed system achieves 95.73% accuracy, a 96.10% F1-score, and 96.73% recall on the Saarbrucken Voice Database (SVD), offering a new method for pathological speech detection.
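The front end described in this abstract (STFT followed by a Mel filter bank) can be sketched for a single frame as follows. This is only an illustration under stated assumptions: the Hann window, the HTK-style mel formula, the triangular filter shape, and all function names are choices made here, not details given in the abstract.

```python
import cmath
import math


def hz_to_mel(f_hz):
    """Convert Hz to mel (HTK-style formula, assumed here)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def frame_power_spectrum(frame):
    """Power spectrum of one Hann-windowed frame via a naive DFT (one STFT column)."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(windowed[i] * cmath.exp(-2j * math.pi * k * i / n)
                  for i in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sample_rate / 2.0)
    # n_filters + 2 edge points, expressed as fractional DFT bin positions.
    edges = [mel_to_hz(mel_max * i / (n_filters + 1)) * n_fft / sample_rate
             for i in range(n_filters + 2)]
    bank = []
    for m in range(1, n_filters + 1):
        left, centre, right = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for k in range(n_bins):
            if left < k <= centre:
                row.append((k - left) / (centre - left))    # rising slope
            elif centre < k < right:
                row.append((right - k) / (right - centre))  # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mel_energies(frame, n_filters, sample_rate):
    """Apply the mel filter bank to the power spectrum of one frame."""
    power = frame_power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    return [sum(w * p for w, p in zip(row, power)) for row in bank]
```

Repeating `mel_energies` over overlapping frames of the speech and EGG signals would give the two mel spectrograms that a CNN backbone could consume; the naive DFT is used only to keep the sketch dependency-free.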

Bitcoin price prediction using ARIMA and LSTM

E3S Web of Conferences ◽

10.1051/e3sconf/202021801050 ◽

2020 ◽

Vol 218 ◽

pp. 01050

Author(s):

Yiqing Hua

Keyword(s):

Short Term Memory ◽

Arima Model ◽

Time Interval ◽

Short Time Interval ◽

Time Period ◽

Efficient Prediction ◽

Model Training ◽

Time Price ◽

Lstm Network ◽

Short Time

The goal of this paper is to compare the accuracy of bitcoin price (in USD) prediction by two different models: a long short-term memory (LSTM) network and an ARIMA model. Real-time price data is collected with Pycurl from Bitfine. The LSTM model is implemented with Keras and TensorFlow. The ARIMA model is used mainly as a classical time series forecasting baseline; as expected, it makes efficient predictions only over short time intervals, and the outcome depends on the time period. The LSTM achieves better performance, at the cost of additional, unavoidable model training time, especially on a CPU.

Power System Transient Stability Assessment Based on Snapshot Ensemble LSTM Network

Sustainability ◽

10.3390/su13126953 ◽

2021 ◽

Vol 13 (12) ◽

pp. 6953

Author(s):

Yixing Du ◽

Zhijian Hu

Keyword(s):

Power Systems ◽

Power System ◽

Transient Stability ◽

Short Term Memory ◽

Risk Function ◽

Stability Margin ◽

Risk Level ◽

Stability Assessment ◽

Lstm Network ◽

Hierarchical Assessment

Data-driven methods using synchrophasor measurements have broad application prospects in transient stability assessment (TSA). Most previous studies focused only on predicting whether the power system is stable after a disturbance and lacked a quantitative analysis of transient stability risk. This paper therefore proposes a two-stage power system TSA method based on a snapshot ensemble long short-term memory (LSTM) network. The method efficiently builds an ensemble model in a single training process and takes the disturbed trajectory measurements as inputs, enabling rapid end-to-end TSA. In the first stage, dynamic hierarchical assessment is carried out by the classifier to screen out credible samples step by step. In the second stage, the regressor predicts the transient stability margin of the credible stable samples and the undetermined samples, and this prediction is combined with the constructed risk function to quantify the risk of transient angle stability. Furthermore, modifying the model's loss function effectively overcomes sample imbalance and overlapping. The simulation results show that the proposed method not only accurately predicts the binary transient stability status of samples but also reasonably reflects the transient safety risk level of power systems, providing a reliable reference for subsequent control.
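The abstract does not say how the snapshots are obtained in a single training run. One common recipe for snapshot ensembles is a cyclic cosine-annealed learning rate with one saved model per cycle; the sketch below assumes that recipe, and every name and parameter in it is hypothetical.

```python
import math


def snapshot_lr(step, total_steps, n_cycles, lr_max):
    """Cyclic cosine-annealed learning rate; restarts at lr_max every cycle."""
    cycle_len = math.ceil(total_steps / n_cycles)
    pos = step % cycle_len  # position inside the current cycle
    return 0.5 * lr_max * (math.cos(math.pi * pos / cycle_len) + 1.0)


def snapshot_steps(total_steps, n_cycles):
    """Steps at which a snapshot would be saved (the last step of each cycle)."""
    cycle_len = math.ceil(total_steps / n_cycles)
    return [min((c + 1) * cycle_len, total_steps) - 1 for c in range(n_cycles)]


def ensemble_average(snapshot_probs):
    """Average the class probabilities that each snapshot predicts for one sample."""
    n = len(snapshot_probs)
    n_classes = len(snapshot_probs[0])
    return [sum(p[i] for p in snapshot_probs) / n for i in range(n_classes)]
```

The learning rate is smallest just before each restart, which is when the model sits near a local minimum and is worth saving; averaging the saved models' outputs then gives the ensemble prediction without extra training runs.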

Performance Evaluation of Neural Network-Based Short-Term Solar Irradiation Forecasts

Energies ◽

10.3390/en14113030 ◽

2021 ◽

Vol 14 (11) ◽

pp. 3030

Author(s):

Simon Liebermann ◽

Jung-Sup Um ◽

YoungSeok Hwang ◽

Stephan Schlüter

Keyword(s):

Neural Network ◽

Goodness Of Fit ◽

Renewable Energy Sources ◽

Weather Conditions ◽

Solar Irradiation ◽

Weather Data ◽

Short Term ◽

Climate Conditions ◽

Lstm Network ◽

The Time Domain

Due to the globally increasing share of renewable energy sources such as wind and solar power, precise forecasts of weather data are becoming more and more important. To compute such forecasts, numerous authors apply neural networks (NN), and the models have recently become ever more complex. Using solar irradiation as an example, we examine whether this additional complexity is required in terms of forecasting precision. Different NN models, namely the long short-term memory (LSTM) neural network, a convolutional neural network (CNN), and combinations of both, are benchmarked against each other. The naive forecast is included as a baseline. Various locations across Europe are tested to analyze the models’ performance under different climate conditions. Forecasts up to 24 h in advance are generated and compared using different goodness of fit (GoF) measures. In addition, errors are analyzed in the time domain. As expected, the error of all models increases with the forecasting horizon. Across all test stations, combining an LSTM network with a CNN yields the best performance. However, with respect to the chosen GoF measures, the differences from the alternative approaches are fairly small. The hybrid model’s advantage lies not in improved GoF but in its versatility: unlike an LSTM or a CNN alone, it produces good results under all tested weather conditions.
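The abstract does not define its naive forecast. A frequent choice is the persistence forecast, which reuses the latest observation as the prediction; the sketch below assumes that definition and uses RMSE as a stand-in for the unspecified GoF measures. The function names are illustrative.

```python
import math


def persistence_forecast(series, horizon):
    """Naive forecast: the value `horizon` steps ahead equals the latest observation."""
    return series[:-horizon]


def rmse(predicted, actual):
    """Root mean squared error between two equally long sequences."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def persistence_rmse(series, horizon):
    """RMSE of the persistence forecast at a given horizon (in samples)."""
    predicted = persistence_forecast(series, horizon)
    actual = series[horizon:]  # observations aligned with the forecasts
    return rmse(predicted, actual)
```

On an hourly series, `persistence_rmse(series, 1)` gives the error floor that any one-hour-ahead NN model has to beat to justify its complexity.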

A Combined Method for MEMS Gyroscope Error Compensation Using a Long Short-Term Memory Network and Kalman Filter in Random Vibration Environments

Sensors ◽

10.3390/s21041181 ◽

2021 ◽

Vol 21 (4) ◽

pp. 1181

Author(s):

Chenhao Zhu ◽

Sheng Cai ◽

Yifan Yang ◽

Wei Xu ◽

Honghai Shen ◽

...

Keyword(s):

Kalman Filter ◽

Standard Deviation ◽

Error Compensation ◽

Random Vibration ◽

Short Term Memory ◽

Combined Method ◽

Short Term ◽

Mems Gyroscope ◽

Long Short Term Memory ◽

Lstm Network

In applications such as carrier attitude control and mobile device navigation, a micro-electro-mechanical-system (MEMS) gyroscope is inevitably affected by random vibration, which significantly degrades its performance. To address this degradation in random vibration environments, this paper proposes a combined method of a long short-term memory (LSTM) network and a Kalman filter (KF) for error compensation, in which the Kalman filter parameters are iteratively optimized using the Kalman smoother and the expectation-maximization (EM) algorithm. To verify the effectiveness of the proposed method, we performed a linear random vibration test to acquire MEMS gyroscope data. Subsequently, an analysis of the effects of input data step size and network topology on gyroscope error compensation performance is presented. Furthermore, the autoregressive moving average-Kalman filter (ARMA-KF) model, which is commonly used in gyroscope error compensation, was also combined with the LSTM network as a comparison method. The results show that, for the x-axis data, the proposed combined method reduces the standard deviation (STD) by 51.58% and 31.92% compared to the bidirectional LSTM (BiLSTM) network and the EM-KF method, respectively. For the z-axis data, the proposed combined method reduces the standard deviation by 29.19% and 12.75% compared to the BiLSTM network and the EM-KF method, respectively. Furthermore, for the x-axis and z-axis data, the proposed combined method reduces the standard deviation by 46.54% and 22.30%, respectively, compared to the BiLSTM-ARMA-KF method, and the output is smoother, demonstrating the effectiveness of the proposed method.
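To make the reported metric concrete, the sketch below runs a scalar Kalman filter over a synthetic noisy signal and computes the percentage reduction in standard deviation. It is not the paper's method: the variances are fixed by hand rather than tuned by EM, there is no LSTM stage, and the random-walk state model, the synthetic data, and all names are assumptions.

```python
import random
import statistics


def simulate_noisy_gyro(n, seed):
    """Synthetic stand-in for gyroscope output: zero rate plus Gaussian noise."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def kalman_1d(measurements, process_var, meas_var):
    """Scalar Kalman filter with a random-walk state model."""
    estimate, error_var = measurements[0], meas_var
    filtered = [estimate]
    for z in measurements[1:]:
        error_var += process_var                   # predict: uncertainty grows
        gain = error_var / (error_var + meas_var)  # Kalman gain
        estimate += gain * (z - estimate)          # update with the innovation
        error_var *= (1.0 - gain)                  # posterior uncertainty
        filtered.append(estimate)
    return filtered


def std_reduction_percent(before, after):
    """Percentage by which the standard deviation drops from `before` to `after`."""
    s_before = statistics.pstdev(before)
    return 100.0 * (s_before - statistics.pstdev(after)) / s_before
```

A figure such as the 51.58% quoted above would correspond to `std_reduction_percent(baseline_output, proposed_output)` evaluated on the two methods' compensated signals.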

Real-Time Detection of Dictionary DGA Network Traffic Using Deep Learning

SN Computer Science ◽

10.1007/s42979-021-00507-w ◽

2021 ◽

Vol 2 (2) ◽

Author(s):

Kate Highnam ◽

Domenic Puzio ◽

Song Luo ◽

Nicholas R. Jennings

Keyword(s):

Neural Network ◽

Deep Learning ◽

Real Time ◽

Network Traffic ◽

Short Term Memory ◽

Domain Names ◽

Control Networks ◽

Detection Techniques ◽

Lstm Network ◽

And Control

Botnets and malware continue to avoid detection by static rule engines when using domain generation algorithms (DGAs) for callouts to unique, dynamically generated web addresses. Common DGA detection techniques fail to reliably detect DGA variants that combine random dictionary words to create domain names that closely mirror legitimate domains. To combat this, we created a novel hybrid neural network, Bilbo the “bagging” model, that analyses domains and scores the likelihood they are generated by such algorithms and therefore are potentially malicious. Bilbo is the first parallel usage of a convolutional neural network (CNN) and a long short-term memory (LSTM) network for DGA detection. Our unique architecture is found to be the most consistent in performance in terms of AUC, F1 score, and accuracy when generalising across different dictionary DGA classification tasks compared to current state-of-the-art deep learning architectures. We validate using reverse-engineered dictionary DGA domains and detail our real-time implementation strategy for scoring real-world network logs within a large enterprise. In 4 h of actual network traffic, the model discovered at least five potential command-and-control networks that commercial vendor tools did not flag.

Ultrasonic Assessment of Thickness and Bonding Quality of Coating Layer Based on Short-Time Fourier Transform and Convolutional Neural Networks

Coatings ◽

10.3390/coatings11080909 ◽

2021 ◽

Vol 11 (8) ◽

pp. 909

Author(s):

Azamatjon Kakhramon ugli Malikov ◽

Younho Cho ◽

Young H. Kim ◽

Jeongnam Kim ◽

Junpil Park ◽

...

Keyword(s):

Fourier Transform ◽

Coating Layer ◽

Ultrasonic Pulse ◽

Short Time Fourier Transform ◽

Coating Materials ◽

Time Frequency ◽

High Attenuation ◽

Bonding State ◽

The Time Domain ◽

Short Time

Ultrasonic non-destructive analysis is a promising and effective method for the inspection of protective coating materials. Offshore coatings exhibit a high attenuation rate of ultrasonic energy due to absorption, and ultrasonic pulse-echo testing becomes difficult because the second echo from the back wall of the coating layer has a small amplitude. To address these problems, an advanced ultrasonic signal analysis is proposed. An ultrasonic delay line was applied because of the high attenuation of the coating layer. A short-time Fourier transform (STFT) of the waveform was implemented to measure the thickness and state of bonding of coating materials. The thickness of the coating material was estimated by the projection of the STFT into the time domain. The bonding and debonding of the coating layers were distinguished using the ratio of the STFT magnitude peaks of the two subsequent wave echoes. In addition, the STFT-based approach has the advantage that it can accurately and quickly estimate the time of flight (TOF) of a signal even at low signal-to-noise ratios. Finally, a convolutional neural network (CNN) was applied to automatically determine the bonding state of the coatings. The time–frequency representation of the waveform was used as the input to the CNN. The experimental results demonstrated that the proposed method automatically determines the bonding state of the coatings with high accuracy. The present approach is more efficient than estimating the bonding state from attenuation.
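The thickness and bonding indicators described in this abstract can be illustrated on a time-domain envelope, such as a time projection of STFT magnitude. The sketch below assumes the standard pulse-echo relation (thickness equals velocity times TOF divided by two); the peak-picking rule, the synthetic envelope, and all names are hypothetical.

```python
import math


def gaussian_echo_envelope(n, centres, amplitudes, width):
    """Synthetic envelope made of Gaussian echoes (illustration only)."""
    return [sum(a * math.exp(-0.5 * ((i - c) / width) ** 2)
                for c, a in zip(centres, amplitudes))
            for i in range(n)]


def two_largest_peaks(envelope):
    """Indices of the two largest local maxima, returned in time order."""
    peaks = [i for i in range(1, len(envelope) - 1)
             if envelope[i - 1] < envelope[i] >= envelope[i + 1]]
    top_two = sorted(peaks, key=lambda i: envelope[i], reverse=True)[:2]
    return sorted(top_two)


def thickness_and_echo_ratio(envelope, sample_rate_hz, velocity_m_s):
    """Thickness from the time of flight between two echoes, plus their peak ratio."""
    first, second = two_largest_peaks(envelope)
    tof_s = (second - first) / sample_rate_hz
    thickness_m = velocity_m_s * tof_s / 2.0  # the pulse crosses the layer twice
    ratio = envelope[second] / envelope[first]
    return thickness_m, ratio
```

With a 100 MHz sampling rate and an assumed sound velocity of 2500 m/s, two echoes 200 samples apart correspond to a 2.5 mm layer; the peak ratio is the quantity the abstract uses to separate bonded from debonded coatings.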

Extraction of local and global features by a convolutional neural network–long short-term memory network for diagnosing bearing faults

Proceedings of the Institution of Mechanical Engineers Part C Journal of Mechanical Engineering Science ◽

10.1177/09544062211016505 ◽

2021 ◽

pp. 095440622110165

Author(s):

Zhang Chao ◽

Wang Wei-zhi ◽

Zhang Chen ◽

Fan Bin ◽

Wang Jian-guo ◽

...

Keyword(s):

Neural Network ◽

Fault Diagnosis ◽

Condition Monitoring ◽

Short Term Memory ◽

Vibration Signal ◽

Short Term ◽

Global Features ◽

Term Memory ◽

Long Short Term Memory ◽

Lstm Network

Accurate and reliable fault diagnosis is one of the key and difficult issues in mechanical condition monitoring. In recent years, the convolutional neural network (CNN) has been widely used in mechanical condition monitoring and has been a great breakthrough in the field of bearing fault diagnosis. However, a CNN can only extract local features of signals. When the original vibration signals are processed by a CNN alone, model accuracy and generalization are very low. To address these problems, this paper improves the traditional convolution layer of the CNN and builds a learning module for local characteristics (local feature learning block, LFLB). At the same time, long short-term memory (LSTM) is introduced into the network to extract global features. The result is a new neural network, the improved CNN-LSTM network. The extracted deep features are used for fault classification. The improved CNN-LSTM network is applied to vibration signals of faulty bearings collected by the bearing failure laboratory of Inner Mongolia University of Science and Technology. The results show that the accuracy of the improved CNN-LSTM network on the same batch test set is 98.75%, which is about 24% higher than that of the traditional CNN. The proposed network is then applied to the bearing dataset of Western Reserve University with the network parameters unchanged. The experiment shows that the improved CNN-LSTM network generalizes better than the traditional CNN.

Optimized Mooring Line Simulation Using a Hybrid Method Time Domain Scheme

Volume 1B: Offshore Technology ◽

10.1115/omae2014-23939 ◽

2014 ◽

Cited By ~ 3

Author(s):

Niels Hørbye Christiansen ◽

Per Erlend Torbergsen Voie ◽

Jan Høgsberg ◽

Nils Sødahl

Keyword(s):

Hybrid Method ◽

Time Domain ◽

Computation Time ◽

Mooring Line ◽

Mooring Lines ◽

Fem Model ◽

Input Variables ◽

The Time Domain ◽

The Cost ◽

Short Time

Dynamic analyses of slender marine structures are computationally expensive. Recently it has been shown how a hybrid method that combines FEM models and artificial neural networks (ANN) can reduce the computation time spent on the time domain simulations associated with fatigue analysis of mooring lines by two orders of magnitude. The present study shows how an ANN trained to perform nonlinear dynamic response simulation can be optimized using a method known as optimal brain damage (OBD) and thereby be used to rank the importance of all analysis input. Both the training and the optimization of the ANN are based on one short time domain simulation sequence generated by a FEM model of the structure. This means that it is possible to evaluate the importance of input parameters based on this single simulation only. The method is tested on a numerical model of mooring lines on a floating offshore installation. It is shown that it is possible to estimate the cost of ignoring one or more input variables in an analysis.
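In optimal brain damage, the saliency of a weight is usually taken as half the corresponding diagonal Hessian entry times the squared weight. The sketch below ranks inputs by summing the saliencies of the weights attached to each one. That aggregation, the input names, and the numbers are assumptions made for illustration; the abstract does not describe how inputs are scored.

```python
def obd_saliency(weight, hessian_diag):
    """OBD saliency of one weight: 0.5 * diagonal Hessian entry * weight squared."""
    return 0.5 * hessian_diag * weight ** 2


def rank_inputs(input_weights, input_hessians):
    """Rank inputs by the summed saliency of their first-layer weights.

    Summing per input is an assumption of this sketch, not a detail from the abstract.
    """
    totals = {}
    for name, weights in input_weights.items():
        totals[name] = sum(obd_saliency(w, h)
                           for w, h in zip(weights, input_hessians[name]))
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical input names and values, for illustration only.
example_weights = {"surge": [0.8, -0.5], "heave": [0.1, 0.05], "pitch": [0.3, -0.2]}
example_hessians = {name: [1.0, 2.0] for name in example_weights}
ranking = rank_inputs(example_weights, example_hessians)
```

The summed saliency approximates the increase in training error if an input were removed, which is one way to read the abstract's "cost of ignoring one or more input variables".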

Barycenter Theorem in Phase Characteristics of Symmetric and Asymmetric Windows

Symmetry ◽

10.3390/sym10080329 ◽

2018 ◽

Vol 10 (8) ◽

pp. 329

Author(s):

Jiufei Luo ◽

Haitao Xu ◽

Kai Zheng ◽

Xinyi Li ◽

Song Feng

Keyword(s):

Frequency Estimation ◽

Random Noise ◽

Estimation Algorithm ◽

Frequency Component ◽

Phase Response ◽

Numeric Simulation ◽

Simulation Results ◽

The Time Domain ◽

Short Time ◽

The Relationship

Asymmetric windows are of increasing interest to researchers because of their nonlinear and adjustable phase response, as well as their alterable time delay. Short-time phase distortion can provide an essential improvement in speech coding and also gives better performance in speech recognition. The spectral behavior of asymmetric windows is valuable for frequency component detection and parameter estimation. In this paper, the phase response of windows was further studied, and the phase characteristics of symmetric and asymmetric windows are described. The relationship between the barycenter of a window in the time domain and the phase characteristic at the center of the main lobe in the frequency domain was established. In light of this relationship, an improved version of the asymmetric window-based frequency estimation algorithm was proposed. The improved algorithm has the advantages of straightforward implementation and computational efficiency. The numeric simulation results also indicate that the improved approach is more robust than the traditional method against additive random noise.
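The abstract does not state the barycenter relationship explicitly. One standard identity consistent with it is that, for a real window with nonzero sum, the negative phase slope (group delay) at the main-lobe center, ω = 0, equals the window's barycenter in samples. The sketch below checks this numerically; the function names and the finite-difference step are choices made here.

```python
import cmath
import math


def hann(n):
    """Symmetric Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def barycenter(window):
    """Centre of mass of the window samples along the time (index) axis."""
    return sum(n * w for n, w in enumerate(window)) / sum(window)


def dtft(window, omega):
    """Discrete-time Fourier transform of the window at angular frequency omega."""
    return sum(w * cmath.exp(-1j * omega * n) for n, w in enumerate(window))


def group_delay_at_zero(window, d_omega=1e-4):
    """Negative phase slope at omega = 0, by a central finite difference."""
    phase_plus = cmath.phase(dtft(window, d_omega))
    phase_minus = cmath.phase(dtft(window, -d_omega))
    return -(phase_plus - phase_minus) / (2.0 * d_omega)
```

For a symmetric window the barycenter is the midpoint (N − 1)/2, which recovers the familiar linear-phase delay; skewing the window moves the barycenter and the phase slope at the main-lobe center by the same amount.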

Detection of dementia on raw voice recordings using deep learning: A Framingham Heart Study

10.1101/2021.03.04.21252582 ◽

2021 ◽

Author(s):

Chonghua Xue ◽

Cody Karjadi ◽

Ioannis Ch. Paschalidis ◽

Rhoda Au ◽

Vijaya B. Kolachalama

Keyword(s):

Deep Learning ◽

Framingham Heart Study ◽

Short Term Memory ◽

Neuropsychological Testing ◽

Heart Study ◽

Audio Recordings ◽

Lstm Network ◽

Sensitivity Specificity ◽

Normal Cognition ◽

Mean Sensitivity

Background: Reliable, affordable and easy-to-use strategies for detection of dementia are sorely needed. Digital technologies, such as individual voice recordings, offer an attractive modality to assess cognition, but methods that could automatically analyze such data without any pre-processing are not readily available. Methods: We used a subset of 1264 digital voice recordings of neuropsychological examinations administered to participants from the Framingham Heart Study (FHS), a community-based longitudinal observational study. The recordings were 73 minutes in duration, on average, and contained at least two speakers (participant and clinician). Of the total voice recordings, 483 were of participants with normal cognition (NC), 451 were of participants with mild cognitive impairment (MCI), and 330 were of participants with dementia. We developed two deep learning models (a two-level long short-term memory (LSTM) network and a convolutional neural network (CNN)), which used the raw audio recordings to classify whether the recording included a participant with only NC or only dementia, and also to differentiate between recordings corresponding to non-demented (NC+MCI) and demented participants. Findings: Based on 5-fold cross-validation, the LSTM model achieved a mean (±std) area under the sensitivity-specificity curve (AUC) of 0.744±0.038, mean accuracy of 0.680±0.032, mean sensitivity of 0.719±0.112, and mean specificity of 0.652±0.089 in predicting cases with dementia from those with normal cognition. The CNN model achieved a mean AUC of 0.805±0.027, mean accuracy of 0.740±0.033, mean sensitivity of 0.735±0.094, and mean specificity of 0.750±0.083 in predicting cases with only dementia from those with only NC. For the task of classifying demented participants from non-demented ones, the LSTM model achieved a mean AUC of 0.659±0.043, mean accuracy of 0.701±0.057, mean sensitivity of 0.245±0.161, and mean specificity of 0.856±0.105. The CNN model achieved a mean AUC of 0.730±0.039, mean accuracy of 0.735±0.046, mean sensitivity of 0.443±0.113, and mean specificity of 0.840±0.076 in predicting cases with dementia from those who were not demented. Interpretation: This proof-of-concept study demonstrates that raw audio recordings of neuropsychological testing performed on individuals recruited within a community cohort setting can potentially provide a level of screening for dementia.
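The figures above combine a low sensitivity with a fairly high accuracy for the demented versus non-demented task, which follows from the class imbalance (330 demented against 483 + 451 = 934 non-demented recordings). The sketch below shows how the three metrics are derived from confusion-matrix counts. The counts are hypothetical, chosen only to roughly match the class sizes; they are not reported in the abstract, which gives cross-validation means.

```python
def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # share of positives that are detected
        "specificity": tn / (tn + fp),  # share of negatives that are cleared
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Hypothetical counts: 330 demented (tp + fn) and 934 non-demented (tn + fp).
illustrative = binary_metrics(tp=81, fn=249, tn=800, fp=134)
```

With these counts roughly three quarters of the demented recordings are missed, yet accuracy stays near 0.70 because the majority class is mostly classified correctly; this is why sensitivity and specificity are reported alongside accuracy.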
