AutoVAT: An Automated Visual Acuity Test Using Spoken Digit Recognition with Mel Frequency Cepstral Coefficients and Convolutional Neural Network

Detection and classification of unidentified underwater targets maneuvering in complex underwater environments are critical for active sonar systems. In previous studies, many detection methods separated targets from clutter using signals that exceed a preset threshold determined by the sonar console operator, because a target with a high signal-to-noise ratio has enough feature-vector components to be separated. In a real environment, however, the signal-to-noise ratio of the received target does not always exceed the threshold. A target detection algorithm that works across a range of target signal-to-noise ratios is therefore required: strong clutter energy can lead to false detections, while weak target signals reduce the probability of detection. In addition, long pulse repetition intervals are used for long-range detection in high ambient noise, so classification must be performed on each ping without accumulating pings. In this study, a target classification algorithm is proposed that can be applied to signals above the noise level in real underwater environments without a threshold set by the sonar console operator, and its classification performance is verified. Because an active sonar for long-range target detection produces low-resolution data, feature vector extraction algorithms are required. Feature vectors are extracted from the experimental data using Power-Normalized Cepstral Coefficients (PNCC) for target classification; feature vectors are also extracted with Mel-Frequency Cepstral Coefficients (MFCC) and compared with the proposed algorithm. A convolutional neural network (CNN) is employed as the classifier. The proposed algorithm is also compared with target classification using a spectrogram and a CNN. Experimental data were obtained with a hull-mounted active sonar system operating on a Korean naval ship in the East Sea of South Korea and a real maneuvering underwater target.
From the experimental data of 29 pings, we extracted 361 target samples and 3,351 clutter samples. Because real underwater target data are difficult to collect at sea, the number of target samples was increased using data augmentation. Eighty percent of the data was used for training and the rest for testing. Accuracy curves and classification-rate tables are presented for performance analysis and discussion. The results show that the proposed algorithm achieves a higher classification rate than Mel-Frequency Cepstral Coefficients and that its target classification is not affected by the signal level. The results also show that target classification is possible from a single ping of data without any ping accumulation.
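
The abstract contrasts PNCC with MFCC features. One well-known difference is the compression step applied to the filter-bank energies: MFCC takes a logarithm, while PNCC uses a power-law nonlinearity (exponent 1/15 in the original PNCC formulation). The sketch below isolates only that step followed by a DCT-II; it is not the authors' implementation, it omits the gammatone filter bank and noise-suppression stages of PNCC, and the function names and toy energies are hypothetical.

```python
import math

def dct_ii(values, num_coeffs):
    """Unnormalized DCT-II: turns compressed band energies into cepstral coefficients."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
        for k in range(num_coeffs)
    ]

def cepstra(energies, num_coeffs=13, compression="log"):
    """Cepstral coefficients from the non-negative filter-bank energies of one frame.

    compression="log":   MFCC-style logarithmic compression.
    compression="power": PNCC-style power-law compression with exponent 1/15.
    The PNCC gammatone filter bank and noise-suppression stages are NOT modeled.
    """
    floor = 1e-10  # keeps log() finite for silent bands
    clipped = [max(e, floor) for e in energies]
    if compression == "log":
        compressed = [math.log(e) for e in clipped]
    elif compression == "power":
        compressed = [e ** (1.0 / 15.0) for e in clipped]
    else:
        raise ValueError("compression must be 'log' or 'power'")
    return dct_ii(compressed, num_coeffs)

# Toy band energies spanning five orders of magnitude (hypothetical values).
energies = [10.0 ** -(i % 5) for i in range(40)]
print("log  :", [round(c, 3) for c in cepstra(energies, compression="log")[:3]])
print("power:", [round(c, 3) for c in cepstra(energies, compression="power")[:3]])
```

As an energy approaches zero, the logarithm diverges toward minus infinity (hence the floor in the sketch), whereas the 1/15 power law tends smoothly to zero, so very weak bands pull the power-law cepstrum around far less.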

Heartbeat sound classification using Mel-frequency cepstral coefficients and deep convolutional neural network

Advances in Computational Techniques for Biomedical Image Analysis ◽

10.1016/b978-0-12-820024-7.00006-2 ◽

2020 ◽

pp. 115-131

Author(s):

Shamik Tiwari ◽

Varun Sapra ◽

Anurag Jain

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Deep Convolutional Neural Network ◽

Mel Frequency Cepstral Coefficients ◽

Sound Classification ◽

Cepstral Coefficients

Voice command recognition with a convolutional neural network (CNN)

Can Tho University Journal of Science ◽

10.22144/ctu.jvn.2021.111 ◽

2021 ◽

Vol 57 (4) ◽

pp. 30-39

Author(s):

Thuận Thương Thái

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Deep Neural Network ◽

Convolution Neural Network ◽

Mel Frequency Cepstral Coefficients ◽

Cepstral Coefficients

Voice control is an important function in many mobile devices and smart home systems; in particular, it helps people with disabilities control common everyday devices. This paper presents a method for recognizing short voice commands using MFCC (Mel frequency cepstral coefficients) features and a convolutional neural network (CNN) model. The input audio data are wave files assumed to be exactly 1 second long. A sliding window of 30 ms with a 10 ms step is moved across the input data to compute the MFCC parameters. Each input file yields 98 MFCC features, each of which is a 40-dimensional vector (corresponding to the 40 coefficients of the Mel-scale filters). The study proposed three neural network models for classifying these voice command files: a 1-layer Vanilla Neural Network (1 softmax layer), a Deep Neural Network - DNN (with 3 fully connected hidden layers and 1 output layer), and a Convolution Neural Network - CNN. Experiments were conducted on Google's “Speech Commands Dataset” (https://ai.googleblog.com/2017/08/launching-speech-commands-dataset.html), consisting of 65,000 samples divided into 30 classes. Experimental results show that the CNN model achieved...
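
The figure of 98 MFCC frames per file follows from the framing arithmetic: with a 30 ms window and a 10 ms hop and no end padding, (1000 - 30) / 10 + 1 = 98 windows fit in a 1 s clip. The sketch below checks this and lays out 40 triangular mel filters, one per coefficient. It assumes a 16 kHz sample rate (an assumption, not stated in the abstract) and the HTK-style mel formula; the function names are hypothetical.

```python
import math

def num_frames(num_samples, sample_rate, win_ms=30, hop_ms=10):
    """Count the full analysis windows that fit in a clip (no padding at the end)."""
    win = int(sample_rate * win_ms / 1000)  # 30 ms -> 480 samples at 16 kHz
    hop = int(sample_rate * hop_ms / 1000)  # 10 ms -> 160 samples at 16 kHz
    if num_samples < win:
        return 0
    return 1 + (num_samples - win) // hop

def hz_to_mel(f):
    # HTK-style mel scale (assumed; other mel variants exist)
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filter_edges(num_filters, f_min, f_max):
    """Edge/center frequencies (Hz) of num_filters triangular filters.

    Points are equally spaced on the mel scale; filter i spans
    edges[i] .. edges[i + 2] and peaks at edges[i + 1].
    """
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (num_filters + 1)
    return [mel_to_hz(lo + i * step) for i in range(num_filters + 2)]

sample_rate = 16000  # assumed sample rate of each 1 s clip
frames = num_frames(sample_rate * 1, sample_rate)
edges = mel_filter_edges(40, 0.0, sample_rate / 2)
print(frames, "frames x", len(edges) - 2, "mel coefficients")  # 98 frames x 40 mel coefficients
```

Each file therefore becomes a 98 x 40 feature matrix, which a CNN can treat like a single-channel image.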

Covid Classification Using Audio Data

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2021.38675 ◽

2021 ◽

Vol 9 (10) ◽

pp. 1633-1637

Author(s):

Adwait Patil

Keyword(s):

Neural Network ◽

Deep Learning ◽

Convolutional Neural Network ◽

Binary Classification ◽

Image Data ◽

Audio Classification ◽

Mel Frequency Cepstral Coefficients ◽

Audio Data ◽

Cepstral Coefficients ◽

Audio Files

Abstract: The coronavirus outbreak has adversely affected the entire world. This project was developed to help the general public estimate their chances of being COVID-positive using only a cough sound and basic patient data. Audio classification is one of the most interesting applications of deep learning. Like image data, audio data is stored as bits; to understand and analyze this audio data, we used Mel frequency cepstral coefficients (MFCCs), which make it possible to feed the audio to our neural network. In this project we used Coughvid, a crowdsourced dataset consisting of 27,000 audio files and metadata for the same number of patients. We used a 1D Convolutional Neural Network (CNN) to process the audio and metadata. Future work on this project will be a model that rates how likely it is that a person is infected, instead of performing binary classification. Keywords: Audio classification, Mel frequency cepstral coefficients, Convolutional neural network, deep learning, Coughvid
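
The abstract says a 1D CNN processes both the audio (as MFCCs) and patient metadata for binary classification, but does not describe the architecture. The sketch below shows one plausible forward pass under stated assumptions: a single 1D convolution over time on MFCC frames, ReLU, global average pooling, concatenation with the metadata vector, and a sigmoid output. The layer layout, function names, and weights are all hypothetical; in practice the weights would come from training.

```python
import math

def conv1d(frames, kernels, bias):
    """Valid 1D convolution over time followed by ReLU.

    frames:  T x C list (T MFCC frames, C coefficients each)
    kernels: F filters, each K x C
    returns: (T - K + 1) x F feature map
    """
    k = len(kernels[0])
    out = []
    for t in range(len(frames) - k + 1):
        row = []
        for f, kern in enumerate(kernels):
            s = bias[f]
            for dt in range(k):
                s += sum(w * x for w, x in zip(kern[dt], frames[t + dt]))
            row.append(max(0.0, s))  # ReLU
        out.append(row)
    return out

def global_avg_pool(feature_map):
    """Average each filter's activations over time (requires T >= K)."""
    if not feature_map:
        raise ValueError("clip is shorter than the kernel")
    n = len(feature_map)
    return [sum(col) / n for col in zip(*feature_map)]

def sigmoid(z):
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_positive(frames, metadata, kernels, conv_bias, dense_w, dense_b):
    """Probability of the positive class from audio frames plus tabular metadata."""
    pooled = global_avg_pool(conv1d(frames, kernels, conv_bias))
    features = pooled + list(metadata)  # hypothetical fusion: concatenate after pooling
    z = dense_b + sum(w * x for w, x in zip(dense_w, features))
    return sigmoid(z)
```

Pooling over time makes the output independent of clip length, which matters for crowdsourced cough recordings whose durations vary.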

Faculty Opinions recommendation of A new dynamic visual acuity test to assess peripheral vestibular function.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.5430957.5389055 ◽

2010 ◽

Author(s):

Detlef Kömpf ◽

Christoph Helmchen ◽

Björn Machner

Keyword(s):

Visual Acuity ◽

Vestibular Function ◽

Dynamic Visual Acuity ◽

Visual Acuity Test ◽

Acuity Test

A novel computerized visual acuity test in children

Journal of American Association for Pediatric Ophthalmology and Strabismus ◽

10.1016/j.jaapos.2006.11.103 ◽

2007 ◽

Vol 11 (1) ◽

pp. 92

Author(s):

Jeong-Min Hwang ◽

Young Joo Shin ◽

In Bum Lee ◽

Won Ryang Wee ◽

Jin Hak Lee

Keyword(s):

Visual Acuity ◽

Visual Acuity Test ◽

Acuity Test

SMCS: Automatic Real-Time Classification of Ambient Sounds, Based on a Deep Neural Network and Mel Frequency Cepstral Coefficients

Communications in Computer and Information Science - Applied Technologies ◽

10.1007/978-3-030-42520-3_20 ◽

2020 ◽

pp. 245-253

Author(s):

María José Mora-Regalado ◽

Omar Ruiz-Vivanco ◽

Alexandra González-Eras ◽

Pablo Torres-Carrión

Keyword(s):

Neural Network ◽

Real Time ◽

Deep Neural Network ◽

Mel Frequency Cepstral Coefficients ◽

Cepstral Coefficients ◽

Real Time Classification

Suggested Systems for the Uniform Illumination of Visual Acuity Test Charts

Military Medicine ◽

10.1093/milmed/116.1.37 ◽

1955 ◽

Vol 116 (1) ◽

pp. 37-42 ◽

Cited By ~ 1

Author(s):

Lawrence T. Odland ◽

Louise L. Sloan

Keyword(s):

Visual Acuity ◽

Uniform Illumination ◽

Visual Acuity Test ◽

Acuity Test

Persian Handwritten Digit Recognition Using Combination of Convolutional Neural Network and Support Vector Machine Methods

The International Arab Journal of Information Technology ◽

10.34028/iajit/17/4/16 ◽

2020 ◽

Vol 17 (4) ◽

pp. 572-578

Author(s):

Mohammad Parseh ◽

Mohammad Rahmanimanesh ◽

Parviz Keshavarzi

Keyword(s):

Neural Network ◽

Support Vector Machine ◽

Feature Extraction ◽

Convolutional Neural Network ◽

Recognition Rate ◽

Support Vector ◽

Svm Classifier ◽

Handwritten Digit Recognition ◽

Digit Recognition ◽

Handwritten Digit

Persian handwritten digit recognition is an important topic in image processing that has received significant attention from researchers because of its many applications. The most important challenge in Persian handwritten digit recognition is the variety of patterns in Persian digit writing, which makes the feature extraction step more complicated. Since handcrafted feature extraction methods are complicated processes and their performance is not stable, most recent studies have concentrated on proposing a suitable method for automatic feature extraction. In this paper, an automatic method based on machine learning is proposed for high-level feature extraction from Persian digit images using a Convolutional Neural Network (CNN). A non-linear multi-class Support Vector Machine (SVM) classifier is then used for classification in place of the fully connected final layer of the CNN. The proposed method was applied to the HODA dataset and obtained a recognition rate of 99.56%. The experimental results are comparable with those of previous state-of-the-art methods.
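
Replacing the CNN's fully connected layer with a non-linear SVM means each digit image is first mapped to a CNN feature vector, and class scores come from kernel SVM decision functions instead of a softmax. The abstract does not say which multi-class scheme is used; the sketch below assumes one-vs-rest with an RBF kernel, and the support vectors, coefficients, and gamma are hypothetical toy values standing in for trained parameters.

```python
import math

def rbf_kernel(a, b, gamma):
    """K(a, b) = exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

def binary_decision(x, support_vectors, dual_coefs, intercept, gamma):
    """f(x) = sum_i (alpha_i * y_i) * K(sv_i, x) + b for one binary SVM."""
    return intercept + sum(
        c * rbf_kernel(sv, x, gamma) for sv, c in zip(support_vectors, dual_coefs)
    )

def predict_digit(cnn_features, machines, gamma=0.5):
    """One-vs-rest: return the class whose binary SVM gives the largest score.

    machines maps class label -> (support_vectors, dual_coefs, intercept).
    """
    scores = {
        label: binary_decision(cnn_features, svs, coefs, b, gamma)
        for label, (svs, coefs, b) in machines.items()
    }
    return max(scores, key=scores.get)

# Hypothetical two-class toy model on 2-D "CNN features".
machines = {
    0: ([[0.0, 0.0]], [1.0], 0.0),
    1: ([[1.0, 1.0]], [1.0], 0.0),
}
print(predict_digit([0.9, 1.0], machines))
```

For ten Persian digits the same code would hold ten binary machines, one per digit.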
