Voice command recognition with a convolutional neural network (CNN)

Voice control is an important feature in many mobile devices and smart-home systems; in particular, it is a solution that helps people with disabilities control common devices in daily life. This paper presents a method for recognizing short voice commands using MFCC (Mel frequency cepstral coefficients) features and a convolutional neural network (CNN) model. The input audio data are wave files assumed to be exactly 1 second long. A sliding window of 30 ms with a step of 10 ms moves across the input data to compute the MFCC parameters. Each input file yields 98 MFCC features, and each MFCC feature is a 40-dimensional vector (corresponding to the 40 coefficients of the Mel-scale filters). The study proposes three neural network models to classify these voice command files: a single-layer vanilla neural network (1 softmax layer), a deep neural network (DNN, with 3 fully connected hidden layers and 1 output layer), and a convolutional neural network (CNN). The experiments were carried out on Google's “Speech Commands Dataset” (https://ai.googleblog.com/2017/08/launching-speech-commands-dataset.html), which comprises 65,000 samples divided into 30 classes. The experimental results show that the CNN model achieves...
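The figure of 98 feature vectors per file follows from the stated window and step sizes. The sketch below checks that arithmetic, assuming no padding at the clip edges (the abstract does not say how edges are handled); the function names are illustrative and are not taken from the paper.

```python
def num_frames(duration_ms: int, window_ms: int, hop_ms: int) -> int:
    """Count the full windows that fit when sliding without padding."""
    if duration_ms < window_ms:
        return 0
    return (duration_ms - window_ms) // hop_ms + 1


def frame_bounds(duration_ms: int, window_ms: int, hop_ms: int):
    """Return (start_ms, end_ms) for every full window."""
    last_start = duration_ms - window_ms
    return [(s, s + window_ms) for s in range(0, last_start + 1, hop_ms)]


# Values stated in the abstract: 1 s clips, 30 ms window, 10 ms step,
# 40 coefficients per frame.
frames = num_frames(1000, 30, 10)
feature_shape = (frames, 40)
bounds = frame_bounds(1000, 30, 10)
print(frames, feature_shape, bounds[0], bounds[-1])
```

Under these assumptions each clip becomes a 98 x 40 matrix, which is the kind of two-dimensional input a CNN can treat like an image.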

Multilayered convolutional neural network-based auto-CODEC for audio signal denoising using mel-frequency cepstral coefficients

Neural Computing and Applications ◽

10.1007/s00521-021-05782-5 ◽

2021 ◽

Author(s):

Shivangi Raj ◽

P. Prakasam ◽

Shubham Gupta

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Audio Signal ◽

Signal Denoising ◽

Mel Frequency Cepstral Coefficients ◽

Cepstral Coefficients

SMCS: Automatic Real-Time Classification of Ambient Sounds, Based on a Deep Neural Network and Mel Frequency Cepstral Coefficients

Communications in Computer and Information Science - Applied Technologies ◽

10.1007/978-3-030-42520-3_20 ◽

2020 ◽

pp. 245-253

Author(s):

María José Mora-Regalado ◽

Omar Ruiz-Vivanco ◽

Alexandra González-Eras ◽

Pablo Torres-Carrión

Keyword(s):

Neural Network ◽

Real Time ◽

Deep Neural Network ◽

Mel Frequency Cepstral Coefficients ◽

Cepstral Coefficients ◽

Real Time Classification

AutoVAT: An Automated Visual Acuity Test Using Spoken Digit Recognition with Mel Frequency Cepstral Coefficients and Convolutional Neural Network

Procedia Computer Science ◽

10.1016/j.procs.2021.01.029 ◽

2021 ◽

Vol 179 ◽

pp. 458-467

Author(s):

Derryl Taufik ◽

Novita Hanafiah

Keyword(s):

Neural Network ◽

Visual Acuity ◽

Convolutional Neural Network ◽

Mel Frequency Cepstral Coefficients ◽

Digit Recognition ◽

Visual Acuity Test ◽

Cepstral Coefficients ◽

Acuity Test

Active Sonar Target Classification with Power-Normalized Cepstral Coefficients and Convolutional Neural Network

Applied Sciences ◽

10.3390/app10238450 ◽

2020 ◽

Vol 10 (23) ◽

pp. 8450

Author(s):

Seungwoo Lee ◽

Iksu Seo ◽

Jongwon Seok ◽

Yunsu Kim ◽

Dong Seog Han

Keyword(s):

Neural Network ◽

Experimental Data ◽

Convolutional Neural Network ◽

Signal To Noise Ratio ◽

Target Classification ◽

Signal To Noise ◽

Mel Frequency Cepstral Coefficients ◽

Active Sonar ◽

Cepstral Coefficients ◽

Noise Ratio

Detection and classification of unidentified underwater targets maneuvering in complex underwater environments are critical for active sonar systems. In previous studies, many detection methods separated targets from clutter using signals that exceed a preset threshold determined by the sonar console operator, because a target with a high signal-to-noise ratio has enough feature vector components to be separated. However, in a real environment, the signal-to-noise ratio of the received target does not always exceed the threshold. Therefore, a target detection algorithm for various target signal-to-noise ratio environments is required; strong clutter energy can lead to false detection, while weak target signals reduce the probability of detection. Long-range detection also involves long pulse repetition intervals and high ambient noise, requiring classification processing for each ping without accumulating pings. In this study, a target classification algorithm is proposed that can be applied to signals in real underwater environments above the noise level without a threshold set by the sonar console operator, and the classification performance of the algorithm is verified. The active sonar for long-range target detection has low-resolution data; thus, feature vector extraction algorithms are required. Feature vectors are extracted from the experimental data using Power-Normalized Cepstral Coefficients for target classification. Feature vectors are also extracted with Mel-Frequency Cepstral Coefficients and compared with the proposed algorithm. A convolutional neural network was employed as the classifier. In addition, the proposed algorithm is compared with the result of target classification using a spectrogram and a convolutional neural network. Experimental data were obtained using a hull-mounted active sonar system operating on a Korean naval ship in the East Sea of South Korea and a real maneuvering underwater target. From the experimental data with 29 pings, we extracted 361 target samples and 3351 clutter samples. It is difficult to collect real underwater target data from the real sea environment. Therefore, the number of target samples was increased using a data augmentation technique. Eighty percent of the data was used for training and the rest was used for testing. Accuracy curves and classification rate tables are presented for performance analysis and discussion. Results showed that the proposed algorithm has a higher classification rate than Mel-Frequency Cepstral Coefficients, with target classification unaffected by the signal level. Additionally, the results showed that target classification is possible within a single ping of data without any ping accumulation.
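To make the 80/20 split concrete, the sketch below splits the stated counts (361 target, 3351 clutter) per class. It assumes a stratified split and ignores the augmentation step, since the abstract does not say whether the split was stratified or whether augmentation happened before or after it; `stratified_split` is a hypothetical helper, not the authors' code.

```python
import random


def stratified_split(labels, train_fraction=0.8, seed=0):
    """Split sample indices per class so both sets keep the class ratio."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = int(len(indices) * train_fraction)  # floor of 80 percent
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return train, test


# Class counts taken from the abstract (before any augmentation).
labels = ["target"] * 361 + ["clutter"] * 3351
train_idx, test_idx = stratified_split(labels)
train_targets = sum(1 for i in train_idx if labels[i] == "target")
print(len(train_idx), len(test_idx), train_targets)
```

With roughly nine clutter samples for every target sample, an unstratified split could leave very few targets in the test set, which is one reason to split per class in a sketch like this.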

Deep-neural network approaches for speech recognition with heterogeneous groups of speakers including children

Natural Language Engineering ◽

10.1017/s135132491600005x ◽

2016 ◽

Vol 23 (3) ◽

pp. 325-350 ◽

Cited By ~ 15

Author(s):

Romain Serizel ◽

Diego Giuliani

Keyword(s):

Neural Network ◽

Speech Recognition ◽

Error Rate ◽

Deep Neural Network ◽

Vocal Tract ◽

Rate Performance ◽

Posterior Probabilities ◽

Mel Frequency Cepstral Coefficients ◽

Heterogeneous Groups ◽

Cepstral Coefficients

This paper introduces deep neural network (DNN)–hidden Markov model (HMM)-based methods to tackle speech recognition in heterogeneous groups of speakers including children. We target three speaker groups consisting of children, adult males and adult females. Two different kinds of approaches are introduced here: approaches based on DNN adaptation and approaches relying on vocal-tract length normalisation (VTLN). First, the recent approach that consists in adapting a general DNN to domain/language specific data is extended to target age/gender groups in the context of DNN–HMM. Then, VTLN is investigated by training a DNN–HMM system using either mel frequency cepstral coefficients normalised with standard VTLN, or acoustic features derived from mel frequency cepstral coefficients combined with the posterior probabilities of the VTLN warping factors. In this latter, novel approach, the posterior probabilities of the warping factors are obtained with a separate DNN and decoding can be performed in a single pass, whereas the standard VTLN approach requires two decoding passes. Finally, the different approaches presented here are combined to take advantage of their complementarity. The combination of several approaches is shown to improve the baseline phone error rate performance by thirty per cent to thirty-five per cent relative and the baseline word error rate performance by about ten per cent relative.
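The improvements above are stated in relative terms, which is easy to confuse with absolute percentage points. The sketch below shows the arithmetic only; the error rates in it are hypothetical values chosen for illustration and are not results from the paper.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction by which an error rate drops relative to its baseline."""
    if baseline <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline - improved) / baseline


# Hypothetical phone error rates in percent (NOT taken from the paper).
baseline_per = 20.0
improved_per = 13.0

relative = relative_reduction(baseline_per, improved_per)
absolute_points = baseline_per - improved_per
print(f"relative: {relative:.0%}, absolute: {absolute_points:.1f} points")
```

In this made-up case a drop of 7 absolute points corresponds to a 35 per cent relative reduction, so the two ways of reporting can differ considerably.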

Heartbeat sound classification using Mel-frequency cepstral coefficients and deep convolutional neural network

Advances in Computational Techniques for Biomedical Image Analysis ◽

10.1016/b978-0-12-820024-7.00006-2 ◽

2020 ◽

pp. 115-131

Author(s):

Shamik Tiwari ◽

Varun Sapra ◽

Anurag Jain

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Deep Convolutional Neural Network ◽

Mel Frequency Cepstral Coefficients ◽

Sound Classification ◽

Cepstral Coefficients

Covid Classification Using Audio Data

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2021.38675 ◽

2021 ◽

Vol 9 (10) ◽

pp. 1633-1637

Author(s):

Adwait Patil

Keyword(s):

Neural Network ◽

Deep Learning ◽

Convolutional Neural Network ◽

Binary Classification ◽

Image Data ◽

Audio Classification ◽

Mel Frequency Cepstral Coefficients ◽

Audio Data ◽

Cepstral Coefficients ◽

Audio Files

Abstract: The coronavirus outbreak has adversely affected the entire world. This project was developed to help the general public assess their chances of being COVID-positive using just a coughing sound and basic patient data. Audio classification is one of the most interesting applications of deep learning. Like image data, audio data is stored in the form of bits; to understand and analyze it, we use Mel frequency cepstral coefficients (MFCCs), which make it possible to feed the audio to our neural network. We use Coughvid, a crowdsourced dataset consisting of 27000 audio files and metadata for the same number of patients, and a 1D Convolutional Neural Network (CNN) to process the audio and metadata. A future extension of this project would be a model that rates how likely it is that a person is infected, instead of performing binary classification. Keywords: Audio classification, Mel frequency cepstral coefficients, Convolutional neural network, deep learning, Coughvid
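As a reminder of what the core operation of a 1D CNN does, here is a minimal pure-Python sketch of one convolution filter followed by ReLU and global max pooling. It is not the author's architecture; the input values and the kernel are made up for illustration, and a trained network would learn many such kernels.

```python
def conv1d_valid(signal, kernel):
    """Slide the kernel over the signal with no padding (cross-correlation)."""
    n, k = len(signal), len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(n - k + 1)
    ]


def relu(values):
    """Zero out negative activations."""
    return [max(0.0, v) for v in values]


def global_max_pool(values):
    """Collapse a feature map to its strongest response."""
    return max(values)


# Made-up one-coefficient track over six frames, and a made-up kernel.
signal = [0.0, 1.0, 3.0, 2.0, 0.0, -1.0]
kernel = [1.0, 0.0, -1.0]

feature_map = relu(conv1d_valid(signal, kernel))
pooled = global_max_pool(feature_map)
print(feature_map, pooled)
```

A pooled value like this is a fixed-size summary of a variable-length clip, which is one way audio features could be concatenated with patient metadata before a final classification layer.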

Method of determination of the text direction on the image with the use of convolutional neural network

Informatization and communication ◽

10.34219/2078-8320-2020-11-2-96-99 ◽

2020 ◽

pp. 96-99

Author(s):

P.L. Nikolaev

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Deep Neural Network ◽

Binary Classification ◽

Synthetic Data ◽

Real Data ◽

Method Of Determination ◽

Classification Of Images

This article deals with a method of binary classification of images with small text on them. Classification is based on the fact that the text can have two directions: it can be positioned horizontally and read from left to right, or it can be turned 180 degrees, so that the image must be rotated to read the sign. This type of text can be found on the covers of a variety of books, so when recognizing covers it is necessary to determine the direction of the text before recognizing the text itself. The article proposes a deep neural network for determining the text position in the context of book cover recognition. The results of training and testing a convolutional neural network on synthetic data, as well as examples of the network operating on real data, are presented.
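One plausible way to build synthetic training data for this two-class problem is to rotate upright images by 180 degrees and label each copy. The abstract does not describe how the synthetic data were generated, so the sketch below is an assumption; the label convention (0 = upright, 1 = rotated) and the tiny test image are hypothetical.

```python
def rotate_180(image):
    """Rotate a 2D image (list of rows) by 180 degrees."""
    return [row[::-1] for row in image[::-1]]


def make_pairs(upright_images):
    """Label each upright image 0 and its rotated copy 1."""
    samples = []
    for image in upright_images:
        samples.append((image, 0))
        samples.append((rotate_180(image), 1))
    return samples


# A tiny made-up binary image standing in for a text crop.
glyph = [
    [1, 1, 1],
    [1, 0, 0],
    [1, 0, 0],
]

dataset = make_pairs([glyph])
print(dataset[1])
```

Because rotating twice restores the original, a classifier trained on such pairs only needs to output one bit to tell a recognizer whether to flip the image first.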

Volumetric Feature-Based Alzheimer’s Disease Diagnosis From sMRI Data Using a Convolutional Neural Network and a Deep Neural Network

IEEE Access ◽

10.1109/access.2021.3059658 ◽

2021 ◽

Vol 9 ◽

pp. 29870-29882

Author(s):

Abol Basher ◽

Byeong C. Kim ◽

Kun Ho Lee ◽

Ho Yub Jung

Keyword(s):

Neural Network ◽

Alzheimer’s Disease ◽

Convolutional Neural Network ◽

Deep Neural Network ◽

Disease Diagnosis ◽

Feature Based ◽

Alzheimer’s Disease Diagnosis

Spectral Convolution Feature-Based SPD Matrix Representation for Signal Detection Using a Deep Neural Network

Entropy ◽

10.3390/e22090949 ◽

2020 ◽

Vol 22 (9) ◽

pp. 949

Author(s):

Jiangyi Wang ◽

Min Liu ◽

Xinwu Zeng ◽

Xiaoqiang Hua

Keyword(s):

Neural Network ◽

Signal Detection ◽

Convolutional Neural Network ◽

Deep Neural Network ◽

Detection Method ◽

Learning Algorithm ◽

Simulated Data ◽

Data Sets ◽

Feature Maps ◽

Simulated Data Sets

Convolutional neural networks perform strongly in many visual tasks because of their hierarchical structures and powerful feature extraction capabilities. The SPD (symmetric positive definite) matrix has attracted attention in visual classification because of its excellent ability to learn proper statistical representations and distinguish samples with different information. In this paper, a deep neural network signal detection method based on spectral convolution features is proposed. In this method, local features extracted from a convolutional neural network are used to construct the SPD matrix, and a deep learning algorithm for the SPD matrix is used to detect target signals. Feature maps extracted by two kinds of convolutional neural network models are applied in this study. With this method, signal detection becomes a binary classification problem over the signals in samples. To demonstrate the validity and superiority of this method, simulated and semi-physical simulated data sets are used. The results show that, under low SCR (signal-to-clutter ratio), compared with the spectral signal detection method based on the deep neural network, this method can obtain a gain of 0.5–2 dB on simulated data sets and semi-physical simulated data sets.
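A common way to turn a set of feature maps into an SPD matrix is to take the channel covariance and add a small multiple of the identity. The abstract does not say which construction the authors use, so the sketch below is an assumption; the feature values and the `eps` regulariser are illustrative, and the Cholesky factorisation serves only as a positive-definiteness check.

```python
import math


def covariance_spd(features, eps=1e-6):
    """Channel covariance of C feature maps (each a list of N values),
    plus eps on the diagonal so the result is positive definite."""
    c, n = len(features), len(features[0])
    means = [sum(ch) / n for ch in features]
    centered = [[v - m for v in ch] for ch, m in zip(features, means)]
    cov = [
        [sum(a * b for a, b in zip(centered[i], centered[j])) / (n - 1)
         for j in range(c)]
        for i in range(c)
    ]
    for i in range(c):
        cov[i][i] += eps
    return cov


def cholesky(matrix):
    """Return lower-triangular L with L L^T = matrix.
    Raises ValueError if the matrix is not positive definite."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                d = matrix[i][i] - s
                if d <= 0:
                    raise ValueError("matrix is not positive definite")
                lower[i][j] = math.sqrt(d)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


# Three made-up feature maps, each flattened to four local responses.
features = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 1.0],
]
spd = covariance_spd(features)
factor = cholesky(spd)  # succeeds only when spd is positive definite
```

The resulting matrix has one row and column per feature map, so its size depends on the number of channels rather than on the length of the input signal.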
