On training targets for deep learning approaches to clean speech magnitude spectrum estimation

2021 ◽  
Vol 149 (5) ◽  
pp. 3273-3293
Author(s):  
Aaron Nicolson ◽  
Kuldip K. Paliwal
2020 ◽  
Author(s):  
Aaron Nicolson ◽  
Kuldip K. Paliwal

The estimation of the clean speech short-time magnitude spectrum (MS) is key for speech enhancement and separation. Moreover, an automatic speech recognition (ASR) system that employs a front-end relies on clean speech MS estimation to remain robust. Training targets for deep learning approaches to clean speech MS estimation fall into three main categories: computational auditory scene analysis (CASA), MS, and minimum mean-square error (MMSE) training targets. In this study, we aim to determine which training target produces enhanced/separated speech at the highest quality and intelligibility, and which is most suitable as a front-end for robust ASR. The training targets were evaluated using a temporal convolutional network (TCN) on the DEMAND Voice Bank and Deep Xi datasets, which include real-world non-stationary and coloured noise sources at multiple SNR levels. Seven objective measures were used, including the word error rate (WER) of the Deep Speech ASR system. We find that MMSE training targets produce the highest objective quality scores. We also find that CASA training targets, in particular the ideal ratio mask (IRM), produce the highest intelligibility scores and perform best as a front-end for robust ASR.
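As a point of reference for the CASA category, the sketch below shows how the ideal ratio mask (IRM) training target is commonly computed from parallel clean-speech and noise spectra; the STFT settings, the librosa dependency, and the exponent are illustrative assumptions rather than the paper's exact recipe.

    # Minimal sketch (assumed settings, not the paper's exact recipe): the IRM
    # training target computed per time-frequency bin from clean speech and noise.
    import numpy as np
    import librosa

    def ideal_ratio_mask(clean, noise, n_fft=512, hop_length=256, beta=0.5):
        """IRM(t, f) = (|S|^2 / (|S|^2 + |N|^2))^beta."""
        S = np.abs(librosa.stft(clean, n_fft=n_fft, hop_length=hop_length)) ** 2
        N = np.abs(librosa.stft(noise, n_fft=n_fft, hop_length=hop_length)) ** 2
        return (S / (S + N + 1e-10)) ** beta

    # Multiplying the noisy magnitude spectrum by the estimated mask yields the
    # estimated clean speech MS, which is then combined with the noisy phase.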


2021 ◽  
Author(s):  
Aaron Nicolson ◽  
Kuldip K. Paliwal

Estimation of the clean speech short-time magnitude spectrum (MS) is key for speech enhancement and separation. Moreover, an automatic speech recognition (ASR) system that employs a front-end relies on clean speech MS estimation to remain robust. Training targets for deep learning approaches to clean speech MS estimation fall into three categories: computational auditory scene analysis (CASA), MS, and minimum mean-square error (MMSE) estimator training targets. The choice of training target can have a significant impact on speech enhancement/separation and robust ASR performance. Motivated by this, we determine which training target produces enhanced/separated speech at the highest quality and intelligibility, and which is best suited to an ASR front-end. Three different deep neural network (DNN) types and two datasets that include real-world non-stationary and coloured noise sources at multiple SNR levels were used for evaluation. Ten objective measures were employed, including the word error rate (WER) of the Deep Speech ASR system. We find that training targets that estimate the a priori signal-to-noise ratio (SNR) for MMSE estimators produce the highest objective quality scores. Moreover, we find that the gain of MMSE estimators and the ideal amplitude mask (IAM) produce the highest objective intelligibility scores and are most suitable for an ASR front-end.
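For contrast with the masks above, here is a hedged sketch of two of the targets highlighted in this version: the instantaneous a priori SNR used by MMSE estimators and the ideal amplitude mask (IAM). The function names and epsilon are illustrative assumptions, not the authors' code.

    # Illustrative sketch (assumptions, not the authors' implementation) of two targets:
    # the instantaneous a priori SNR for MMSE estimators and the ideal amplitude mask.
    import numpy as np

    def a_priori_snr(clean_mag, noise_mag, eps=1e-10):
        """xi(t, f) = |S(t, f)|^2 / |N(t, f)|^2 per time-frequency bin."""
        return (clean_mag ** 2) / (noise_mag ** 2 + eps)

    def ideal_amplitude_mask(clean_mag, noisy_mag, eps=1e-10):
        """IAM(t, f) = |S(t, f)| / |Y(t, f)|; applied to |Y| it recovers |S|."""
        return clean_mag / (noisy_mag + eps)

    # In practice the a priori SNR (in dB) is usually compressed to [0, 1], e.g. with a
    # cumulative-distribution mapping, before being used as a DNN training target.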


2019 ◽  
Vol 2019 (1) ◽  
pp. 360-368
Author(s):  
Mekides Assefa Abebe ◽  
Jon Yngve Hardeberg

Various whiteboard image degradations greatly reduce the legibility of pen-stroke content as well as the overall quality of the images. Consequently, researchers have addressed the problem with a variety of image enhancement techniques. Most state-of-the-art approaches apply common image processing operations such as background-foreground segmentation, text extraction, contrast and color enhancement, and white balancing. However, such conventional enhancement methods cannot recover severely degraded pen-stroke content and produce artifacts in the presence of complex pen-stroke illustrations. To overcome these problems, the authors propose a deep learning based solution. They contribute a new whiteboard image dataset and adopt two deep convolutional neural network architectures for whiteboard image quality enhancement. Their evaluations of the trained models demonstrate superior performance over the conventional methods.
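To make the conventional baseline concrete, here is a minimal, assumed sketch of the kind of classical clean-up such methods perform (background estimation followed by white balancing); the OpenCV calls and kernel size are illustrative choices, not the authors' pipeline.

    # Minimal sketch of a conventional whiteboard enhancement baseline (an illustrative
    # assumption, not the authors' method): background estimation + white balancing.
    import cv2

    def enhance_whiteboard(image_bgr, kernel_size=31):
        # Estimate the pen-stroke-free board background with a morphological closing.
        kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (kernel_size, kernel_size))
        background = cv2.morphologyEx(image_bgr, cv2.MORPH_CLOSE, kernel)
        # Divide out the background so the board becomes uniformly white, then stretch
        # the contrast so the remaining pen strokes stand out.
        normalised = cv2.divide(image_bgr, background, scale=255)
        return cv2.normalize(normalised, None, 0, 255, cv2.NORM_MINMAX)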


2019 ◽  
Author(s):  
Qian Wu ◽  
Weiling Zhao ◽  
Xiaobo Yang ◽  
Hua Tan ◽  
Lei You ◽  
...  

2020 ◽  
Author(s):  
Priyanka Meel ◽  
Farhin Bano ◽  
Dr. Dinesh K. Vishwakarma

2019 ◽  
Vol 277 ◽  
pp. 02024 ◽  
Author(s):  
Lincan Li ◽  
Tong Jia ◽  
Tianqi Meng ◽  
Yizhe Liu

In this paper, an accurate two-stage deep learning method is proposed to detect vulnerable plaques in cardiovascular ultrasonic images. Firstly, a fully convolutional network (FCN) named U-Net is used to segment the original intravascular optical coherence tomography (IVOCT) cardiovascular images. We experiment with different threshold values to find the best threshold for removing noise and background from the original images. Secondly, a modified Faster R-CNN is adopted for precise detection. The modified Faster R-CNN utilizes six anchor scales (12², 16², 32², 64², 128², 256²) instead of the conventional one-scale or three-scale approaches. First, we present three problems in cardiovascular vulnerable plaque diagnosis; then we demonstrate how our method solves these problems. The proposed method applies deep convolutional neural networks to the whole diagnostic procedure. Test results show that the recall rate, precision rate, IoU (intersection-over-union) rate, and total score are 0.94, 0.885, 0.913, and 0.913, respectively, higher than those of the first-place team in the CCCV2017 Cardiovascular OCT Vulnerable Plaque Detection Challenge. The AP of the designed Faster R-CNN is 83.4%, higher than that of conventional approaches using one-scale or three-scale anchors. These results demonstrate the superior performance of our proposed method and the power of deep learning approaches in diagnosing cardiovascular vulnerable plaques.
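As a hedged illustration of the anchor-scale modification described above, the following sketch configures a torchvision Faster R-CNN whose region proposal network uses six anchor sizes (12, 16, 32, 64, 128 and 256 pixels) instead of the default three; the MobileNetV2 backbone, aspect ratios and class count are assumptions, not the authors' exact configuration.

    # Sketch (assumed configuration, not the paper's code): a Faster R-CNN with six
    # anchor scales in its region proposal network instead of the usual three.
    import torchvision
    from torchvision.models.detection import FasterRCNN
    from torchvision.models.detection.rpn import AnchorGenerator

    # A simple convolutional backbone; FasterRCNN needs backbone.out_channels to be set.
    backbone = torchvision.models.mobilenet_v2(weights=None).features
    backbone.out_channels = 1280

    # Six anchor sizes covering small plaques up to large ones, three aspect ratios each.
    anchor_generator = AnchorGenerator(sizes=((12, 16, 32, 64, 128, 256),),
                                       aspect_ratios=((0.5, 1.0, 2.0),))

    model = FasterRCNN(backbone,
                       num_classes=2,  # vulnerable plaque vs. background
                       rpn_anchor_generator=anchor_generator)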


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Shan Guleria ◽  
Tilak U. Shah ◽  
J. Vincent Pulido ◽  
Matthew Fasullo ◽  
Lubaina Ehsan ◽  
...  

Probe-based confocal laser endomicroscopy (pCLE) allows for real-time diagnosis of dysplasia and cancer in Barrett’s esophagus (BE) but is limited by low sensitivity. Even the gold standard of histopathology is hindered by poor agreement between pathologists. We deployed deep-learning-based image and video analysis in order to improve diagnostic accuracy of pCLE videos and biopsy images. Blinded experts categorized biopsies and pCLE videos as squamous, non-dysplastic BE, or dysplasia/cancer, and deep learning models were trained to classify the data into these three categories. Biopsy classification was conducted using two distinct approaches—a patch-level model and a whole-slide-image-level model. Gradient-weighted class activation maps (Grad-CAMs) were extracted from pCLE and biopsy models in order to determine tissue structures deemed relevant by the models. 1970 pCLE videos, 897,931 biopsy patches, and 387 whole-slide images were used to train, test, and validate the models. In pCLE analysis, models achieved a high sensitivity for dysplasia (71%) and an overall accuracy of 90% for all classes. For biopsies at the patch level, the model achieved a sensitivity of 72% for dysplasia and an overall accuracy of 90%. The whole-slide-image-level model achieved a sensitivity of 90% for dysplasia and 94% overall accuracy. Grad-CAMs for all models showed activation in medically relevant tissue regions. Our deep learning models achieved high diagnostic accuracy for both pCLE-based and histopathologic diagnosis of esophageal dysplasia and its precursors, similar to human accuracy in prior studies. These machine learning approaches may improve accuracy and efficiency of current screening protocols.
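Below is a hedged sketch of the Grad-CAM step described above; the ResNet-50 model and the choice of layer4 as the target layer are assumptions for illustration, not the study's exact models.

    # Minimal Grad-CAM sketch (assumed model and layer, not the study's exact setup):
    # highlight the image regions that drive the predicted class.
    import torch.nn.functional as F
    import torchvision

    model = torchvision.models.resnet50(weights=None).eval()
    feats, grads = {}, {}
    layer = model.layer4  # last convolutional block

    layer.register_forward_hook(lambda m, i, o: feats.update(a=o))
    layer.register_full_backward_hook(lambda m, gi, go: grads.update(a=go[0]))

    def grad_cam(image):                        # image: (1, 3, H, W) tensor
        logits = model(image)
        logits[0, logits.argmax()].backward()   # gradient of the top-class score
        weights = grads["a"].mean(dim=(2, 3), keepdim=True)   # channel importance
        cam = F.relu((weights * feats["a"]).sum(dim=1))       # weighted activation map
        return F.interpolate(cam[None], size=image.shape[-2:], mode="bilinear")[0, 0]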


2021 ◽  
Author(s):  
Isidro Lloret ◽  
José A. Troyano ◽  
Fernando Enríquez ◽  
Juan-José González-de-la-Rosa

2021 ◽  
Vol 22 (15) ◽  
pp. 7911
Author(s):  
Eugene Lin ◽  
Chieh-Hsin Lin ◽  
Hsien-Yuan Lane

A growing body of evidence suggests that deep learning approaches can serve as an essential cornerstone for the diagnosis and prediction of Alzheimer’s disease (AD). In light of the latest advancements in neuroimaging and genomics, numerous deep learning models have been employed in recent studies to distinguish AD from normal controls and/or from mild cognitive impairment. In this review, we focus on the latest developments in AD prediction using deep learning techniques in cooperation with the principles of neuroimaging and genomics. First, we survey investigations that use deep learning algorithms to establish AD prediction from genomics or neuroimaging data. In particular, we describe integrative neuroimaging genomics investigations that leverage deep learning methods to forecast AD on the basis of both neuroimaging and genomics data. Moreover, we outline the limitations of recent AD investigations that combine deep learning with neuroimaging and genomics. Finally, we discuss challenges and directions for future research. The main novelty of this work is that we summarize the major points of these investigations and scrutinize the similarities and differences among them.
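To make the neuroimaging and genomics integration concrete, here is a minimal, assumed sketch of one fusion pattern such studies commonly use: separate encoders for the MRI volume and the SNP genotypes whose features are concatenated before classification. All layer sizes are illustrative only.

    # Illustrative sketch (assumptions, not any specific study's model): late fusion of an
    # imaging encoder and a genomics encoder for AD vs. control classification.
    import torch
    import torch.nn as nn

    class MultimodalADClassifier(nn.Module):
        def __init__(self, n_snps=10000):
            super().__init__()
            self.imaging = nn.Sequential(            # 3D CNN over an MRI volume
                nn.Conv3d(1, 16, 3, stride=2), nn.ReLU(),
                nn.Conv3d(16, 32, 3, stride=2), nn.ReLU(),
                nn.AdaptiveAvgPool3d(1), nn.Flatten())
            self.genomics = nn.Sequential(           # MLP over SNP genotypes
                nn.Linear(n_snps, 256), nn.ReLU(), nn.Linear(256, 32), nn.ReLU())
            self.classifier = nn.Linear(32 + 32, 2)  # fused features -> AD / control

        def forward(self, mri, snps):
            fused = torch.cat([self.imaging(mri), self.genomics(snps)], dim=1)
            return self.classifier(fused)

    # Example: MultimodalADClassifier()(torch.randn(1, 1, 64, 64, 64), torch.randn(1, 10000))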

