spectrogram feature
Recently Published Documents



2021 ◽  
Vol 15 (1) ◽  
pp. 41-55
Author(s):  
Hoang Van Truong ◽  
Nguyen Chi Hieu ◽  
Pham Ngoc Giao ◽  
Nguyen Xuan Phong

Anomaly detection in the sound from machines is an important task in machine monitoring. A conventional approach in this domain is an autoencoder that scores anomalies by its reconstruction error on a log-Mel spectrogram feature. However, this approach performs poorly when the sounds of the target machine are non-stationary. In this paper, we propose a novel approach consisting of a new choice of features and a new autoencoder architecture. We created the Mixed Feature, which is a mixture of different sound representations, and a new deep learning method called Fully-Connected U-Net, a form of autoencoder architecture. In experiments on the same dataset as the baseline system, using the same architecture for all types of machines, our methods outperformed the baseline system in terms of the AUC and pAUC evaluation metrics. The optimized model achieved 83.38% AUC and 64.51% pAUC on average over all machine types on the development dataset and outperformed the published baseline by 13.43% AUC and 8.13% pAUC.
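The reconstruction-error scoring and the AUC metric mentioned in this abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names and the toy numbers are assumptions made for illustration, frames are plain lists of feature values, and the autoencoder itself is omitted (its outputs are passed in as `reconstructions`). pAUC is not covered here.

```python
def reconstruction_error(frame, reconstruction):
    """Mean squared error between one feature frame and its reconstruction."""
    return sum((x - y) ** 2 for x, y in zip(frame, reconstruction)) / len(frame)


def clip_score(frames, reconstructions):
    """Anomaly score of a clip: reconstruction error averaged over its frames."""
    errors = [reconstruction_error(f, r) for f, r in zip(frames, reconstructions)]
    return sum(errors) / len(errors)


def auc(normal_scores, anomalous_scores):
    """Fraction of (anomalous, normal) pairs ranked correctly; ties count half."""
    wins = 0.0
    for a in anomalous_scores:
        for n in normal_scores:
            if a > n:
                wins += 1.0
            elif a == n:
                wins += 0.5
    return wins / (len(normal_scores) * len(anomalous_scores))


# Hypothetical toy data: two frames of two features each.
frames = [[1.0, 2.0], [0.0, 0.0]]
reconstructions = [[1.0, 4.0], [0.0, 0.0]]
score = clip_score(frames, reconstructions)

# Hypothetical anomaly scores for three normal and two anomalous clips.
toy_auc = auc([0.1, 0.2, 0.3], [0.25, 0.4])
```

A clip whose score exceeds a chosen threshold would be flagged as anomalous; the AUC summarizes ranking quality across all possible thresholds.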


Electronics ◽  
2020 ◽  
Vol 9 (5) ◽  
pp. 713 ◽  
Author(s):  
Yeonguk Yu ◽  
Yoon-Joong Kim

We propose a speech-emotion recognition (SER) model with an “attention-Long Short-Term Memory (LSTM)-attention” component that combines IS09, a commonly used feature for SER, with the mel spectrogram, and we analyze the reliability problem of the interactive emotional dyadic motion capture (IEMOCAP) database. The attention mechanism of the model focuses on the emotion-related elements of the IS09 and mel spectrogram features and on the emotion-related time spans within those features. Thus, the model extracts emotion information from a given speech signal. In the baseline study, the proposed model achieved a weighted accuracy (WA) of 68% on the improvised dataset of IEMOCAP. However, neither the proposed model of the main study nor the modified models could exceed a WA of 68% on the improvised dataset. This is because of the reliability limit of the IEMOCAP dataset. A more reliable dataset is required for a more accurate evaluation of the model’s performance. Therefore, in this study, we reconstructed a more reliable dataset based on the labeling results provided by IEMOCAP. On this more reliable dataset, the model achieved a WA of 73%.
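The abstract reports weighted accuracy (WA) without defining it. The sketch below assumes the convention common in SER work, where WA is the overall fraction of correctly classified utterances and unweighted accuracy (UA) is the mean of per-class recalls; the paper may define it differently. The function names and emotion labels are hypothetical.

```python
from collections import Counter


def weighted_accuracy(y_true, y_pred):
    """Overall accuracy: correct predictions divided by all utterances."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_accuracy(y_true, y_pred):
    """Mean per-class recall, so every emotion class counts equally."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Hypothetical labels: the majority class dominates WA but not UA.
y_true = ["neu", "neu", "neu", "neu", "ang", "sad"]
y_pred = ["neu", "neu", "neu", "neu", "neu", "sad"]
wa = weighted_accuracy(y_true, y_pred)
ua = unweighted_accuracy(y_true, y_pred)
```

On imbalanced data the two metrics can diverge noticeably, which is one reason a reported WA should be read together with the class distribution of the dataset.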


2019 ◽  
Vol 1 (2) ◽  
pp. p113 ◽  
Author(s):  
Wu Min ◽  
Zhu Shanshan

Language recognition is an important branch of speech technology. As a front-end technology of speech information processing, it requires high recognition accuracy. Research shows that the spectrograms of different languages differ noticeably, and these differences can be used for language identification. This paper uses a convolutional neural network as the classification model and experimentally compares traditional language recognition features with spectrogram features on a five-language recognition task covering Chinese, Japanese, Vietnamese, Russian, and Spanish. The i-vector feature performs best, while the spectrogram feature achieves a higher F value than the low-dimensional i-vector feature.
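The abstract does not say which F value is used. As a sketch under the assumption that it is the F1 score (harmonic mean of precision and recall) averaged over languages, the computation could look as follows; the function names and language codes are hypothetical.

```python
def f1_for_label(y_true, y_pred, label):
    """F1 score for one class, treating it as the positive class."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes present in y_true."""
    labels = sorted(set(y_true))
    return sum(f1_for_label(y_true, y_pred, lab) for lab in labels) / len(labels)


# Hypothetical predictions for two of the five languages.
y_true = ["zh", "zh", "ja", "ja"]
y_pred = ["zh", "ja", "ja", "ja"]
f_value = macro_f1(y_true, y_pred)
```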

