Speaker Recognition of Noisy Short Utterance Based on Speech Frame Quality Discrimination and Three-stage Classification Model

Judging the maturity of muskmelon quickly and accurately is important to consumers and to muskmelon sorting staff. This paper presents a novel approach to muskmelon maturity stage classification in greenhouses and other complex environments. The color characteristics of muskmelon were used as the main feature for maturity discrimination. A modified 29-layer ResNet was applied together with the proposed two-way data augmentation methods to classify muskmelon maturity stages, using indoor and outdoor datasets to create a robust classification model that generalizes better. The results showed that code data augmentation (the first way) caused more performance degradation than input image augmentation (the second way). This established the effectiveness of the code data augmentation compared to image augmentation. Nevertheless, the two-way data augmentations, together with the combination of outdoor and indoor datasets, yielded a classification model with an excellent F1 score of ∼99%, and hence the model is applicable to computer-based platforms for quick classification of muskmelon maturity stages.

Prosodic Features Based Text-dependent Speaker Recognition with Short Utterance

Communications in Computer and Information Science - Computational Intelligence and Intelligent Systems, 2016, pp. 541-552

DOI: 10.1007/978-981-10-0356-1_57

Author(s): Jianwu Zhang, Jianchao He, Zhendong Wu, Ping Li

Keyword(s): Speaker Recognition, Prosodic Features, Short Utterance

GMM and CNN Hybrid Method for Short Utterance Speaker Recognition

IEEE Transactions on Industrial Informatics, 2018, Vol 14 (7), pp. 3244-3252

DOI: 10.1109/tii.2018.2799928

Cited By ~ 48

Author(s): Zheli Liu, Zhendong Wu, Tong Li, Jin Li, Chao Shen

Keyword(s): Hybrid Method, Speaker Recognition, Short Utterance

Web Radio Automation for Audio Stream Management in the Era of Big Data

Information, 2020, Vol 11 (4), pp. 205

DOI: 10.3390/info11040205

Cited By ~ 1

Author(s): Nikolaos Vryzas, Nikolaos Tsipas, Charalampos Dimoulas

Keyword(s): Digital Media, Speaker Recognition, Web Application, Clustering Algorithms, Radio Station, Classification Model, Speaker Diarization, Stream Management, Media Ecosystem, Audio Data

Radio is evolving in a changing digital media ecosystem. Audio-on-demand has shaped the landscape of big unstructured audio data available online. In this paper, a framework for knowledge extraction is introduced, to improve discoverability and enrichment of the provided content. A web application for live radio production and streaming is developed. The application offers typical live mixing and broadcasting functionality, while performing real-time annotation as a background process by logging user operation events. For the needs of a typical radio station, a supervised speaker classification model is trained for the recognition of 24 known speakers. The model is based on a convolutional neural network (CNN) architecture. Since not all speakers are known in radio shows, a CNN-based speaker diarization method is also proposed. The trained model is used for the extraction of fixed-size identity d-vectors. Several clustering algorithms are evaluated, having the d-vectors as input. The supervised speaker recognition model for 24 speakers scores an accuracy of 88.34%, while unsupervised speaker diarization scores a maximum accuracy of 87.22%, as tested on an audio file with speech segments from three unknown speakers. The results are considered encouraging regarding the applicability of the proposed methodology.
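
The diarization step described in the abstract above, clustering fixed-size d-vectors so that segments from the same unknown speaker end up together, can be illustrated with a minimal sketch. The abstract does not say which clustering algorithms were evaluated, so average-linkage agglomerative clustering with cosine distance is an assumption chosen here for illustration. The d-vectors are synthetic stand-ins (random vectors scattered around three hypothetical speaker centres), not outputs of the paper's CNN, and all function names are hypothetical.

```python
import math
import random


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def agglomerative_cluster(vectors, n_clusters):
    """Average-linkage agglomerative clustering; returns one label per vector.

    Assumption: the number of speakers is known in advance. A real
    diarization system would usually need a stopping threshold instead.
    """
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > n_clusters:
        best = None  # (distance, index_i, index_j) of the closest pair
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                total = sum(
                    cosine_distance(vectors[p], vectors[q])
                    for p in clusters[i]
                    for q in clusters[j]
                )
                dist = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])  # merge the closest pair
        del clusters[j]
    labels = [0] * len(vectors)
    for label, members in enumerate(clusters):
        for idx in members:
            labels[idx] = label
    return labels


def make_synthetic_dvectors(n_speakers=3, per_speaker=10, dim=16,
                            noise=0.05, seed=0):
    """Hypothetical stand-in for CNN d-vectors: noisy copies of speaker centres."""
    rng = random.Random(seed)
    vectors, truth = [], []
    for speaker in range(n_speakers):
        centre = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for _ in range(per_speaker):
            vectors.append([c + rng.gauss(0.0, noise) for c in centre])
            truth.append(speaker)
    return vectors, truth


vectors, truth = make_synthetic_dvectors()
labels = agglomerative_cluster(vectors, n_clusters=3)
```

With real d-vectors the clusters would be far less cleanly separated than in this toy data, which is presumably why the authors compare several clustering algorithms rather than relying on one.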
