Speaker Recognition of Noisy Short Utterance Based on Speech Frame Quality Discrimination and Three-stage Classification Model

2015 ◽  
Vol 8 (3) ◽  
pp. 135-146 ◽  
Author(s):  
Ying Chen ◽  
Zhenmin Tang

2020 ◽  
Author(s):  
Zhangfang Hu ◽  
Yaqin Fu ◽  
Xuan Xu ◽  
Hongwei Zhang

2013 ◽  
Author(s):  
A. Kanagasundaram ◽  
D. Dean ◽  
Javier Gonzalez-Dominguez ◽  
S. Sridharan ◽  
D. Ramos ◽  
...  

2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Huamin Zhao ◽  
Defang Xu ◽  
Olarewaju Lawal ◽  
Shujuan Zhang

Judging the maturity of muskmelon quickly and accurately is important to consumers and to muskmelon sorting staff. This paper presents a novel approach to muskmelon maturity stage classification in greenhouses and other complex environments. The color characteristics of muskmelon were used as the main feature for maturity discrimination. A modified 29-layer ResNet was applied with the proposed two-way data augmentation methods to classify muskmelon maturity stages, using indoor and outdoor datasets to build a robust classification model that generalizes better. The results showed that code data augmentation (the first way) caused more performance degradation than input image augmentation (the second way). This established the effectiveness of the code data augmentation compared to image augmentation. Nevertheless, the two-way data augmentation, together with the combination of outdoor and indoor datasets, yielded a classification model with an excellent F1 score of ∼99%, and hence the model is applicable to computer-based platforms for quick classification of muskmelon maturity stages.
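The abstract does not list the transforms used for input image augmentation, so the following is only a minimal sketch of the general idea, not the authors' pipeline. It assumes an RGB image stored as a NumPy array and uses random horizontal and vertical flips; the function name `augment_image` and the 50% flip probabilities are illustrative assumptions.

```python
import numpy as np


def augment_image(image, rng):
    """Return a randomly flipped copy of an (H, W, 3) image array.

    Hypothetical helper: the flip choices and probabilities are assumptions,
    not taken from the paper.
    """
    out = image.copy()
    if rng.random() < 0.5:
        out = out[:, ::-1, :]  # horizontal flip (mirror left-right)
    if rng.random() < 0.5:
        out = out[::-1, :, :]  # vertical flip (mirror top-bottom)
    return out


# Example: build several augmented variants of one synthetic image.
rng = np.random.default_rng(0)
image = rng.random((4, 6, 3))  # stand-in for a muskmelon photo
variants = [augment_image(image, rng) for _ in range(4)]
```

Flips only rearrange pixels, so the color distribution of each image is left unchanged. That seemed a reasonable choice for a sketch, given that the abstract names color as the main maturity feature.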


2018 ◽  
Vol 14 (7) ◽  
pp. 3244-3252 ◽  
Author(s):  
Zheli Liu ◽  
Zhendong Wu ◽  
Tong Li ◽  
Jin Li ◽  
Chao Shen

Information ◽  
2020 ◽  
Vol 11 (4) ◽  
pp. 205 ◽  
Author(s):  
Nikolaos Vryzas ◽  
Nikolaos Tsipas ◽  
Charalampos Dimoulas

Radio is evolving in a changing digital media ecosystem. Audio-on-demand has shaped the landscape of big unstructured audio data available online. In this paper, a framework for knowledge extraction is introduced, to improve discoverability and enrichment of the provided content. A web application for live radio production and streaming is developed. The application offers typical live mixing and broadcasting functionality, while performing real-time annotation as a background process by logging user operation events. For the needs of a typical radio station, a supervised speaker classification model is trained for the recognition of 24 known speakers. The model is based on a convolutional neural network (CNN) architecture. Since not all speakers are known in radio shows, a CNN-based speaker diarization method is also proposed. The trained model is used for the extraction of fixed-size identity d-vectors. Several clustering algorithms are evaluated, having the d-vectors as input. The supervised speaker recognition model for 24 speakers scores an accuracy of 88.34%, while unsupervised speaker diarization scores a maximum accuracy of 87.22%, as tested on an audio file with speech segments from three unknown speakers. The results are considered encouraging regarding the applicability of the proposed methodology.
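The diarization step described above, clustering fixed-size d-vectors so that segments from the same speaker share a label, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: the abstract does not name the clustering algorithms evaluated, so the sketch uses k-means on cosine similarity. The synthetic 8-dimensional "d-vectors", the function names, and the farthest-point initialisation are all hypothetical.

```python
import numpy as np


def l2_normalize(x):
    """Scale each row to unit length so dot products equal cosine similarity."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, 1e-12)


def cluster_dvectors(vectors, k, n_iter=20):
    """Cluster embeddings with k-means on cosine similarity (illustrative)."""
    x = l2_normalize(np.asarray(vectors, dtype=float))
    # Deterministic farthest-point initialisation: start from the first
    # vector, then repeatedly add the vector least similar to those chosen.
    chosen = [x[0]]
    for _ in range(1, k):
        sims = np.max(x @ np.array(chosen).T, axis=1)
        chosen.append(x[np.argmin(sims)])
    centroids = np.array(chosen)
    labels = np.zeros(len(x), dtype=int)
    for _ in range(n_iter):
        # Assign each vector to the most similar centroid.
        labels = np.argmax(x @ centroids.T, axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
        centroids = l2_normalize(centroids)
    return labels


# Synthetic stand-in for d-vectors: three unknown speakers, 20 segments each,
# each speaker scattered around its own direction in an 8-dimensional space.
rng = np.random.default_rng(0)
prototypes = np.eye(8)[:3]
dvectors = np.repeat(prototypes, 20, axis=0) + rng.normal(0.0, 0.05, (60, 8))
labels = cluster_dvectors(dvectors, k=3)
```

In this sketch the number of speakers `k` is given in advance. A real diarization system would usually have to estimate it, or use a clustering algorithm that does not require it.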

