AutoVAT: An Automated Visual Acuity Test Using Spoken Digit Recognition with Mel Frequency Cepstral Coefficients and Convolutional Neural Network

2021 ◽  
Vol 179 ◽  
pp. 458-467
Author(s):  
Derryl Taufik ◽  
Novita Hanafiah
2020 ◽  
Vol 10 (23) ◽  
pp. 8450
Author(s):  
Seungwoo Lee ◽  
Iksu Seo ◽  
Jongwon Seok ◽  
Yunsu Kim ◽  
Dong Seog Han

Detecting and classifying unidentified underwater targets maneuvering in complex underwater environments is critical for active sonar systems. Previous studies applied many detection methods that separate targets from clutter using signals exceeding a preset threshold determined by the sonar console operator. This works because a target with a high signal-to-noise ratio has enough feature vector components to be separated. In a real environment, however, the signal-to-noise ratio of the received target does not always exceed the threshold. A target detection algorithm that works across a range of target signal-to-noise ratios is therefore required: strong clutter energy can lead to false detections, while weak target signals reduce the probability of detection. Long-range detection also uses long pulse repetition intervals and faces high ambient noise, so classification must be performed on each ping without accumulating pings. In this study, a target classification algorithm is proposed that can be applied to signals above the noise level in real underwater environments without a threshold set by the sonar console operator, and the classification performance of the algorithm is verified. Active sonar for long-range target detection produces low-resolution data; thus, feature vector extraction algorithms are required. Feature vectors are extracted from the experimental data using Power-Normalized Cepstral Coefficients for target classification. Feature vectors are also extracted with Mel-Frequency Cepstral Coefficients and compared with the proposed algorithm. A convolutional neural network was employed as the classifier. In addition, the proposed algorithm is compared with target classification using a spectrogram and a convolutional neural network. Experimental data were obtained using a hull-mounted active sonar system operating on a Korean naval ship in the East Sea of South Korea and a real maneuvering underwater target.
From the experimental data with 29 pings, we extracted 361 target and 3351 clutter samples. Because it is difficult to collect real underwater target data in a real sea environment, the number of target samples was increased using data augmentation. Eighty percent of the data was used for training and the rest for testing. Accuracy curves and classification rate tables are presented for performance analysis and discussion. The results showed that the proposed algorithm has a higher classification rate than Mel-Frequency Cepstral Coefficients and that target classification is not affected by the signal level. The results also showed that target classification is possible within a single ping, without any ping accumulation.
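The 80/20 split described above can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say whether the split was stratified by class or how many target samples existed after augmentation, so the helper `stratified_split` (a hypothetical name) uses a per-class split and the pre-augmentation counts of 361 target and 3351 clutter samples.

```python
import random


def stratified_split(labels, train_fraction=0.8, seed=0):
    """Split sample indices into train/test, keeping each class's share.

    Stratification is an assumption of this sketch; the abstract only
    states that 80% of the data was used for training.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = int(round(len(indices) * train_fraction))
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)


# Pre-augmentation counts quoted in the abstract (augmented counts are not given).
labels = ["target"] * 361 + ["clutter"] * 3351
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Splitting per class keeps the roughly 1:9 target-to-clutter ratio the same in both partitions, which matters when one class is this much rarer than the other.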


2021 ◽  
Vol 57 (4) ◽  
pp. 30-39
Author(s):  
Thuận Thương Thái

Voice control is an important function in many mobile devices and smart home systems; in particular, it is a solution that lets people with disabilities control common everyday devices. This paper presents a method for recognizing short spoken control commands using MFCC (Mel frequency cepstral coefficients) features and a convolutional neural network (CNN) model. The input audio consists of wave files assumed to be exactly 1 second long. A 30 ms sliding window with a 10 ms step moves across the input to compute the MFCC parameters. Each input file yields 98 MFCC features, and each MFCC feature is a 40-dimensional vector (corresponding to the 40 coefficients of the Mel-scale filters). The study proposes three neural network models to classify these spoken command files: a one-layer Vanilla Neural Network (1 softmax layer), a Deep Neural Network - DNN (with 3 fully connected hidden layers and 1 output layer), and a Convolutional Neural Network - CNN. The experiments were carried out on Google's "Speech Commands Dataset" (https://ai.googleblog.com/2017/08/launching-speech-commands-dataset.html), which comprises 65,000 samples divided into 30 classes. The experimental results show that the CNN model achieves...
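The figure of 98 MFCC frames per file follows from the window and step sizes. A minimal sketch of the arithmetic, assuming a 16 kHz sampling rate (the abstract does not state one) and no padding at the edges of the signal; `num_frames` is a hypothetical helper name:

```python
def num_frames(num_samples: int, sample_rate: int, win_ms: float, hop_ms: float) -> int:
    """Count the full analysis windows that fit in a signal (no padding)."""
    win = int(round(sample_rate * win_ms / 1000))  # window length in samples
    hop = int(round(sample_rate * hop_ms / 1000))  # step length in samples
    if num_samples < win:
        return 0
    return 1 + (num_samples - win) // hop


SAMPLE_RATE = 16_000  # assumption: not given in the abstract
frames = num_frames(SAMPLE_RATE * 1, SAMPLE_RATE, win_ms=30, hop_ms=10)
print(frames)  # 98 windows, each described by a 40-dimensional MFCC vector
```

With these settings each 1-second clip becomes a 98 x 40 feature matrix, which is the input shape the classifiers would receive.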


Author(s):  
Adwait Patil

Abstract: The coronavirus outbreak has adversely affected the entire world. This project was developed to help the general public assess their chances of being COVID-positive using only a cough sound and basic patient data. Audio classification is one of the most interesting applications of deep learning. Like image data, audio data is stored as bits, and to understand and analyze it we used Mel frequency cepstral coefficients (MFCCs), which make it possible to feed the audio to our neural network. We used Coughvid, a crowdsourced dataset consisting of 27000 audio files and metadata for the same number of patients, and a 1D Convolutional Neural Network (CNN) to process the audio and metadata. The future scope of this project is a model that rates how likely it is that a person is infected, instead of performing binary classification. Keywords: Audio classification, Mel frequency cepstral coefficients, Convolutional neural network, deep learning, Coughvid


Author(s):  
Jeong-Min Hwang ◽  
Young Joo Shin ◽  
In Bum Lee ◽  
Won Ryang Wee ◽  
Jin Hak Lee

1955 ◽  
Vol 116 (1) ◽  
pp. 37-42
Author(s):  
Lawrence T. Odland ◽  
Louise L. Sloan

2020 ◽  
Vol 17 (4) ◽  
pp. 572-578
Author(s):  
Mohammad Parseh ◽  
Mohammad Rahmanimanesh ◽  
Parviz Keshavarzi

Persian handwritten digit recognition is an important topic in image processing that has received considerable attention from researchers because of its many applications. Its main challenge is the variety of patterns in Persian digit writing, which complicates the feature extraction step. Since handcrafted feature extraction methods are complicated and their performance is not stable, most recent studies have concentrated on proposing suitable methods for automatic feature extraction. In this paper, an automatic machine learning method is proposed for high-level feature extraction from Persian digit images using a Convolutional Neural Network (CNN). A non-linear multi-class Support Vector Machine (SVM) classifier is then used for classification in place of the fully connected layer in the final layer of the CNN. The proposed method was applied to the HODA dataset and obtained a recognition rate of 99.56%. The experimental results are comparable with previous state-of-the-art methods.

