SMCS: Automatic Real-Time Classification of Ambient Sounds, Based on a Deep Neural Network and Mel Frequency Cepstral Coefficients

Author(s):  
María José Mora-Regalado ◽  
Omar Ruiz-Vivanco ◽  
Alexandra González-Eras ◽  
Pablo Torres-Carrión
2021 ◽  
Author(s):  
AKHILA NAZ K A ◽  
R S Jeena ◽  
P Niyas

Abstract An arrhythmia is an irregular heartbeat: the heart beats too fast, too slow, or too early compared with a normal heartbeat. Proper analysis, detection, and classification of life-threatening arrhythmias support the diagnosis of various cardiac conditions. Computer-aided automatic detection can provide accurate and fast results compared with manual processing. This paper proposes a reliable and novel arrhythmia classification approach using deep learning. A Deep Neural Network (DNN) with three hidden layers has been developed for arrhythmia classification using the MIT-BIH arrhythmia database. The network classifies the input ECG signals into six groups: normal heartbeat and five arrhythmia classes. The proposed model was found to be very promising, with an accuracy of 99.45 percent. Real-time signal classification and the application of the Internet of Things (IoT) are the other highlights of the work.


2016 ◽  
Vol 23 (3) ◽  
pp. 325-350 ◽  
Author(s):  
ROMAIN SERIZEL ◽  
DIEGO GIULIANI

Abstract This paper introduces deep neural network (DNN)–hidden Markov model (HMM)-based methods to tackle speech recognition in heterogeneous groups of speakers including children. We target three speaker groups consisting of children, adult males and adult females. Two kinds of approaches are introduced here: approaches based on DNN adaptation and approaches relying on vocal-tract length normalisation (VTLN). First, the recent approach that consists in adapting a general DNN to domain/language specific data is extended to target age/gender groups in the context of DNN–HMM. Then, VTLN is investigated by training a DNN–HMM system using either mel frequency cepstral coefficients normalised with standard VTLN or acoustic features derived from mel frequency cepstral coefficients combined with the posterior probabilities of the VTLN warping factors. In this latter, novel approach, the posterior probabilities of the warping factors are obtained with a separate DNN and decoding can be performed in a single pass, whereas the standard VTLN approach requires two decoding passes. Finally, the different approaches presented here are combined to take advantage of their complementarity. The combination of several approaches is shown to improve the baseline phone error rate performance by thirty per cent to thirty-five per cent relative and the baseline word error rate performance by about ten per cent relative.


2021 ◽  
Author(s):  
Nowrin Akter Surovi ◽  
Audelia G. Dharmawan ◽  
Gim Song Soh

Abstract In Wire Arc Additive Manufacturing (WAAM), weld beads are deposited bead-by-bead and layer-by-layer, leading to the final part. Thus, a lack of uniformity or a geometrically defective bead will subsequently lead to voids in the printed part, which will have a great impact on the overall part quality and mechanical strength. To resolve this, several techniques have been proposed to identify such defects using vision or thermal-based sensing, so as to aid in the implementation of in-situ corrective measures to save time and cost. However, due to the environment that they are operating in, these sensors are not as effective at picking up irregularities as acoustic sensing. Therefore, in this paper, we study three acoustic feature-based machine learning frameworks: Principal Component Analysis (PCA) + K-Nearest Neighbors (KNN), Mel Frequency Cepstral Coefficients (MFCC) + Neural Network (NN), and Mel Frequency Cepstral Coefficients (MFCC) + Convolutional Neural Network (CNN), and evaluate their performance for the real-time identification of geometrically defective weld beads. Experiments are carried out on stainless steel (ER316LSi), bronze (ERCuNiAl) and a mixed dataset containing both stainless steel and bronze. The results show that all three frameworks outperform the state-of-the-art acoustic signal based ANN approach in terms of accuracy. The best performing framework, PCA+KNN, outperforms ANN by more than 15%, 30% and 30% for the stainless steel, bronze and mixed datasets, respectively.


2020 ◽  
Vol 2 (4) ◽  
pp. 167-172
Author(s):  
Nen-Fu Huang ◽  
Dong-Lin Chou ◽  
Chia-An Lee ◽  
Feng-Ping Wu ◽  
An-Chi Chuang ◽  
...  

2021 ◽  
Vol 57 (4) ◽  
pp. 30-39
Author(s):  
Thuận Thương Thái

Voice control is an important function in many mobile devices and smart home systems; in particular, it is a solution that helps people with disabilities control common devices in daily life. This paper presents a method for recognizing short voice commands using MFCC (Mel frequency cepstral coefficients) features and a convolutional neural network (CNN) model. The input audio data are wave files assumed to have a duration of exactly 1 second. A sliding window of 30 ms with a shift of 10 ms is moved across the input data to compute the MFCC parameters. Each input file yields 98 MFCC features, and each MFCC feature is a 40-dimensional vector (corresponding to the 40 coefficients of the Mel-scale filters). The study proposes three neural network models to classify these voice command files: a one-layer Vanilla Neural Network (1 softmax layer), a Deep Neural Network - DNN (with 3 fully connected hidden layers and 1 output layer), and a Convolutional Neural Network - CNN. The experiments were carried out on Google's “Speech Commands Dataset” (https://ai.googleblog.com/2017/08/launching-speech-commands-dataset.html), which contains 65,000 samples divided into 30 classes. The experimental results show that the CNN model achieves...
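The figure of 98 feature vectors per file follows from the framing parameters in the abstract (1 second of audio, 30 ms window, 10 ms shift). The short Python sketch below checks that arithmetic. It assumes only full windows are counted and that no padding is applied, which the abstract does not state. The function name `num_frames` is hypothetical and is not taken from the paper.

```python
def num_frames(duration_ms: int, window_ms: int, hop_ms: int) -> int:
    """Count the full analysis windows that fit in the signal.

    Assumption: no padding, so a window is kept only if it lies
    entirely inside the signal.
    """
    if duration_ms < window_ms:
        return 0
    return (duration_ms - window_ms) // hop_ms + 1


# Parameters quoted in the abstract: 1 s clip, 30 ms window, 10 ms shift.
frames = num_frames(duration_ms=1000, window_ms=30, hop_ms=10)

# Each frame is described as a 40-dimensional MFCC vector.
feature_shape = (frames, 40)

print(frames)         # 98
print(feature_shape)  # (98, 40)
```

Under these assumptions, each one-second clip becomes a 98 x 40 feature matrix, which is consistent with the count given in the abstract.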


Author(s):  
David T. Wang ◽  
Brady Williamson ◽  
Thomas Eluvathingal ◽  
Bruce Mahoney ◽  
Jennifer Scheler
