Semi-Natural and Spontaneous Speech Recognition Using Deep Neural Networks with Hybrid Features Unification

Processes ◽  
2021 ◽  
Vol 9 (12) ◽  
pp. 2286
Author(s):  
Ammar Amjad ◽  
Lal Khan ◽  
Hsien-Tsung Chang

Recently, identifying speech emotions in a spontaneous database has been a complex and demanding study area. This research presents a new approach for recognizing semi-natural and spontaneous speech emotions with multiple feature fusion and deep neural networks (DNN). The proposed framework extracts the most discriminative features from hybrid acoustic feature sets. However, these feature sets may contain duplicate and irrelevant information, leading to inadequate emotion identification. Therefore, a support vector machine (SVM) algorithm is utilized to identify the most discriminative audio feature map after obtaining the relevant features learned by the fusion approach. We evaluated our approach on the eNTERFACE05 and BAUM-1s benchmark databases and observed identification accuracies of 76% and 59%, respectively, in speaker-independent experiments with the SVM. Furthermore, experiments on the eNTERFACE05 and BAUM-1s datasets indicate that the suggested framework outperforms current state-of-the-art techniques on semi-natural and spontaneous datasets.
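
A minimal sketch of the classification stage described above: fused (here, simply concatenated) acoustic feature sets are fed to an SVM, with folds grouped by speaker to approximate the speaker-independent protocol. The feature arrays, labels, and speaker IDs below are placeholders, and the paper's DNN fusion step is not reproduced.

```python
# Hedged sketch: SVM classification of emotions from fused acoustic features.
# All arrays are synthetic placeholders; feature-level fusion is a naive
# concatenation standing in for the paper's learned fusion.
import numpy as np
from sklearn.model_selection import GroupKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_utt = 200
mfcc_feats = rng.normal(size=(n_utt, 39))      # placeholder MFCC statistics
prosodic_feats = rng.normal(size=(n_utt, 16))  # placeholder prosodic features
labels = rng.integers(0, 6, size=n_utt)        # six basic emotion classes
speakers = rng.integers(0, 10, size=n_utt)     # speaker IDs for speaker-independent folds

fused = np.hstack([mfcc_feats, prosodic_feats])  # feature-level fusion (concatenation)

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0, gamma="scale"))
# Grouping by speaker keeps each speaker's utterances out of the training fold.
scores = cross_val_score(clf, fused, labels, groups=speakers, cv=GroupKFold(n_splits=5))
print(f"speaker-independent accuracy: {scores.mean():.2f} +/- {scores.std():.2f}")
```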

SLEEP ◽  
2021 ◽  
Vol 44 (Supplement_2) ◽  
pp. A164-A164
Author(s):  
Pahnwat Taweesedt ◽  
JungYoon Kim ◽  
Jaehyun Park ◽  
Jangwoon Park ◽  
Munish Sharma ◽  
...  

Abstract Introduction Obstructive sleep apnea (OSA) is a common sleep-related breathing disorder estimated to affect one billion people. Full-night polysomnography is considered the gold standard for OSA diagnosis. However, it is time-consuming, expensive, and not readily available in many parts of the world. Many screening questionnaires and scores have been proposed for OSA prediction, typically with high sensitivity but low specificity. The present study aims to develop models with various machine learning techniques to predict the severity of OSA by incorporating features from multiple questionnaires. Methods Subjects who underwent full-night polysomnography at the Torr sleep center, Texas, and completed 5 OSA screening questionnaires/scores were included. OSA was diagnosed using an Apnea-Hypopnea Index ≥ 5. We trained five different machine learning models: a deep neural network with scaled principal component analysis (DNN-PCA), a random forest (RF), an adaptive boosting classifier (ABC), a k-nearest neighbors classifier (KNC), and a support vector machine classifier (SVMC). A training:testing subject ratio of 65:35 was used. All features, including demographic data, body measurements, and snoring and sleepiness history, were obtained from the 5 OSA screening questionnaires/scores (STOP-BANG questionnaire, Berlin questionnaire, NoSAS score, NAMES score, and No-Apnea score). Performance metrics were used to compare the machine learning models. Results Of 180 subjects, 51.5% were male, with a mean (SD) age of 53.6 (15.1) years. One hundred and nineteen subjects were diagnosed with OSA. The Area Under the Receiver Operating Characteristic Curve (AUROC) of DNN-PCA, RF, ABC, KNC, SVMC, the STOP-BANG questionnaire, the Berlin questionnaire, the NoSAS score, the NAMES score, and the No-Apnea score was 0.85, 0.68, 0.52, 0.74, 0.75, 0.61, 0.63, 0.61, 0.58, and 0.58, respectively. DNN-PCA showed the highest AUROC, with a sensitivity of 0.79, specificity of 0.67, positive predictive value of 0.93, F1 score of 0.86, and accuracy of 0.77. Conclusion Our results showed that DNN-PCA outperformed the OSA screening questionnaires/scores and the other machine learning models.
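
A minimal sketch of the best-performing pipeline as described (scaling, PCA, then a neural network), evaluated with AUROC on a 65:35 split. The feature matrix, labels, layer sizes, and component count below are illustrative assumptions, not the study's data or architecture.

```python
# Hedged sketch: "DNN with scaled PCA" approximated as a scikit-learn pipeline
# (standardize -> PCA -> multilayer perceptron), scored by AUROC on a 35% test set.
# Data are synthetic placeholders, not the Torr sleep center cohort.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X = rng.normal(size=(180, 25))             # questionnaire items, demographics, body measurements
y = (rng.random(180) < 0.66).astype(int)   # 1 = OSA (AHI >= 5); prevalence roughly as reported

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.35, stratify=y, random_state=0)

dnn_pca = make_pipeline(
    StandardScaler(),
    PCA(n_components=10),
    MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=0),
)
dnn_pca.fit(X_tr, y_tr)
auroc = roc_auc_score(y_te, dnn_pca.predict_proba(X_te)[:, 1])
print(f"AUROC on held-out 35%: {auroc:.2f}")
```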


2021 ◽  
Vol 5 (2) ◽  
Author(s):  
Alexander Knyshov ◽  
Samantha Hoang ◽  
Christiane Weirauch

Abstract Automated insect identification systems have been explored for more than two decades but have only recently started to take advantage of powerful and versatile convolutional neural networks (CNNs). While typical CNN applications still require large training image datasets with hundreds of images per taxon, pretrained CNNs have recently been shown to be highly accurate even when trained on much smaller datasets. Here, we evaluate the performance of CNN-based machine learning approaches in identifying three curated species-level dorsal habitus datasets for Miridae, the plant bugs. Miridae are of economic importance, but species-level identifications are challenging and typically rely on information other than dorsal habitus (e.g., host plants, locality, genitalic structures). Each dataset contained 2–6 species and 126–246 images in total, with a mean of only 32 images per species for the most difficult dataset. We find that closely related species of plant bugs can be identified with 80–90% accuracy based on their dorsal habitus alone. The pretrained CNN performed 10–20% better than a taxon expert who had access to the same dorsal habitus images. We find that feature extraction protocols (selection and combination of blocks of CNN layers) affect identification accuracy much more than the classifying mechanism (support vector machine and deep neural network classifiers). While our network has much lower accuracy on photographs of live insects (62%), overall results confirm that a pretrained CNN can be straightforwardly adapted to collection-based images of a new taxonomic group and successfully extract relevant features to classify insect species.
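
A minimal sketch of the general approach: use a pretrained CNN as a fixed feature extractor and train a lightweight classifier on top. VGG16 with average pooling is an illustrative choice (the abstract does not name the network or the selected layer blocks), and the image arrays and species labels are placeholders for a curated habitus dataset.

```python
# Hedged sketch: pretrained-CNN features + SVM classifier for small image datasets.
# The backbone, pooling choice, and data below are assumptions for illustration.
import numpy as np
import tensorflow as tf
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Pretrained backbone used as a frozen feature extractor (global-average-pooled).
extractor = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                        pooling="avg", input_shape=(224, 224, 3))

images = np.random.rand(120, 224, 224, 3).astype("float32") * 255.0  # placeholder photos
labels = np.random.randint(0, 4, size=120)                            # placeholder species labels

feats = extractor.predict(tf.keras.applications.vgg16.preprocess_input(images), verbose=0)

svm = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))
acc = cross_val_score(svm, feats, labels, cv=5).mean()
print(f"cross-validated species identification accuracy: {acc:.2f}")
```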


2021 ◽  
Author(s):  
Guojun Huang ◽  
Cheng Wang ◽  
Xi Fu

Aims: Individualized patient profiling is instrumental for personalized management in hepatocellular carcinoma (HCC). This study built a model based on bidirectional deep neural networks (BiDNNs), an unsupervised machine-learning approach, to integrate multi-omics data and predict survival in HCC. Methods: DNA methylation and mRNA expression data for HCC samples from the TCGA database were integrated using BiDNNs. With optimal clusters as labels, a support vector machine model was developed to predict survival. Results: Using the BiDNN-based model, samples were clustered into two survival subgroups. The survival subgroup classification was an independent prognostic factor. BiDNNs were superior to multimodal autoencoders. Conclusion: This study constructed and validated a BiDNN-based model for predicting prognosis in HCC, with implications for individualized therapies in HCC.
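
A minimal sketch of the downstream step described in the abstract (cluster samples into two subgroups, then train an SVM to predict subgroup membership). The BiDNN integration itself is not reproduced: concatenation plus k-means serves here only as a labeled stand-in, and all data are synthetic placeholders rather than TCGA HCC samples.

```python
# Hedged sketch: subgroup labels from clustered multi-omics features, then an SVM
# predictor. The BiDNN step is replaced by naive concatenation + k-means (a stand-in).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(2)
methylation = rng.normal(size=(300, 50))   # placeholder DNA methylation features
expression = rng.normal(size=(300, 50))    # placeholder mRNA expression features

integrated = np.hstack([methylation, expression])   # stand-in for the BiDNN representation

# Two subgroups, as in the study, obtained here by k-means clustering.
subgroup = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(integrated)

# SVM trained to predict subgroup membership from the integrated profile.
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
acc = cross_val_score(svm, integrated, subgroup, cv=5).mean()
print(f"subgroup prediction accuracy: {acc:.2f}")
```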


2020 ◽  
Vol 12 (15) ◽  
pp. 2353
Author(s):  
Henning Heiselberg

Classification of ships and icebergs in Arctic satellite images is an important problem. We study how to train deep neural networks to improve the discrimination of ships and icebergs in multispectral satellite images, and we analyze synthetic-aperture radar (SAR) images for comparison. The annotated datasets of ships and icebergs are collected from multispectral Sentinel-2 data and taken from the C-CORE dataset of Sentinel-1 SAR images. Convolutional neural networks with a range of hyperparameters are tested and optimized. Classification accuracies are considerably better for deep neural networks than for support vector machines. Deeper networks improve the accuracy per epoch but at the cost of longer processing time. Extending the datasets with semi-supervised data from Greenland improves the accuracy considerably, whereas data augmentation by rotating and flipping the images has little effect. The resulting classification accuracies for ships and icebergs are 86% for the SAR data and 96% for the MSI data, owing to the latter's better resolution and larger number of spectral bands. The size and quality of the datasets are essential for training the deep neural networks, and methods to improve them are discussed. The reduced false alarm rates and the exploitation of multisensor data are important for Arctic search and rescue services.
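
A minimal sketch of a small two-class CNN for image patches, including the rotation/flip augmentation mentioned above. Patch size, band count, architecture, and the data itself are assumptions for illustration, not the Sentinel-1/Sentinel-2 datasets or the paper's tuned network.

```python
# Hedged sketch: compact CNN for ship-vs-iceberg patch classification with
# rotation/flip augmentation. All shapes and data are illustrative placeholders.
import numpy as np
import tensorflow as tf

n_bands = 10                                              # e.g., selected MSI bands (assumption)
patches = np.random.rand(500, 32, 32, n_bands).astype("float32")
labels = np.random.randint(0, 2, size=500)                # 0 = iceberg, 1 = ship (placeholder)

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(32, 32, n_bands)),
    tf.keras.layers.RandomFlip("horizontal_and_vertical"),   # flip augmentation
    tf.keras.layers.RandomRotation(0.25),                    # rotation augmentation
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(patches, labels, epochs=3, validation_split=0.2, verbose=0)
```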


2012 ◽  
Vol 4 (1) ◽  
pp. 1-16 ◽  
Author(s):  
Fei Peng ◽  
Juan Liu ◽  
Min Long

Examining the identification of natural images (NI) and computer-generated graphics (CG), a novel method based on hybrid features is proposed. Since the image acquisition pipelines differ, natural images and computer-generated graphics exhibit differences in statistical, visual, and noise characteristics. Firstly, the mean, variance, kurtosis, skewness, and median of the histograms of the grayscale image in the spatial and wavelet domains are selected as statistical features. Secondly, the fractal dimensions of the grayscale image and its wavelet sub-bands are extracted as visual features. Thirdly, considering the shortcomings of the photo response non-uniformity (PRNU) noise obtained from a wavelet-based de-noising filter, a Gaussian high-pass filter is applied to the image as pre-processing before PRNU extraction, and the physical features are calculated from the enhanced PRNU. For identification, a support vector machine (SVM) classifier is used, and an average classification accuracy of 94.29% is achieved, with 97.3% for computer-generated graphics and 91.28% for natural images. Analysis and discussion show that the method is suitable for identifying natural images and computer-generated graphics and achieves better identification accuracy than existing methods while using fewer feature dimensions.
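
A minimal sketch of the statistical part of the hybrid feature set: the five histogram statistics computed on the grayscale image and on its one-level wavelet sub-bands, then classified with an SVM. The fractal-dimension and PRNU features are omitted, and the images and labels are placeholders.

```python
# Hedged sketch: histogram statistics (mean, variance, kurtosis, skewness, median)
# in the spatial and wavelet domains, classified with an SVM. Data are placeholders.
import numpy as np
import pywt
from scipy.stats import kurtosis, skew
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def histogram_stats(band):
    v = band.ravel()
    return [v.mean(), v.var(), kurtosis(v), skew(v), np.median(v)]

def statistical_features(gray_img):
    feats = histogram_stats(gray_img)                  # spatial domain
    cA, (cH, cV, cD) = pywt.dwt2(gray_img, "haar")     # one-level wavelet decomposition
    for band in (cA, cH, cV, cD):
        feats += histogram_stats(band)                 # wavelet domain
    return feats

rng = np.random.default_rng(3)
images = rng.random((100, 128, 128))        # placeholder grayscale images
labels = rng.integers(0, 2, size=100)       # 0 = computer-generated, 1 = natural (placeholder)

X = np.array([statistical_features(img) for img in images])
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
print(f"cross-validated accuracy: {cross_val_score(svm, X, labels, cv=5).mean():.2f}")
```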


Symmetry ◽  
2019 ◽  
Vol 11 (4) ◽  
pp. 525 ◽  
Author(s):  
SON ◽  
KWON ◽  
PARK

Automatic gender classification in speech is a challenging research field with a wide range of applications in human-computer interaction (HCI). A couple of decades of research have shown promising results, but there is still a need for improvement. Until now, gender classification has relied on differences in the spectral characteristics of male and female speech. We assumed that a neutral margin exists between the male and female spectral ranges, and that this margin causes misclassification of gender. To address this limitation, we studied three non-lexical speech features (fillers, overlapping, and lengthening). From the statistical analysis, we found that overlapping and lengthening are effective in gender classification. Next, we performed gender classification using overlapping, lengthening, and the baseline acoustic feature, Mel-frequency cepstral coefficients (MFCC). We sought the best results by using various combinations of features, either simultaneously or sequentially. We used two types of machine-learning methods, support vector machines (SVM) and recurrent neural networks (RNN), to classify gender. We achieved 89.61% accuracy with the RNN using a feature set that combined MFCC, overlapping, and lengthening. We also reclassified, using only the non-lexical features, the data belonging to the neutral margin, which was selected empirically based on the result of gender classification with MFCC alone. As a result, the classification accuracy of the RNN using lengthening was 1.83% better than when MFCC alone was used. We conclude that new speech features can be effective in improving gender classification through a behavioral approach, notably in applications such as emergency calls.
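
A minimal sketch of the MFCC baseline for gender classification, using an SVM on per-utterance MFCC summaries. The non-lexical features (fillers, overlapping, lengthening) require annotated dialogue and are not reproduced; the audio clips and labels below are synthetic placeholders.

```python
# Hedged sketch: MFCC-only gender classification baseline with an SVM.
# Audio and labels are synthetic placeholders, not the study's corpus.
import numpy as np
import librosa
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

sr = 16000
rng = np.random.default_rng(4)
utterances = [rng.normal(size=sr * 2).astype(np.float32) for _ in range(60)]  # 2-second clips
genders = rng.integers(0, 2, size=60)   # 0 = male, 1 = female (placeholder labels)

def mfcc_vector(signal):
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13)
    # Summarize frame-level MFCCs with per-coefficient mean and standard deviation.
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

X = np.array([mfcc_vector(s) for s in utterances])
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
print(f"MFCC-only accuracy: {cross_val_score(svm, X, genders, cv=5).mean():.2f}")
```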

