A Unified Framework for the Generation of Glottal Signals in Deep Learning-based Parametric Speech Synthesis Systems

Statistical parametric speech synthesis based on Hidden Markov Models (HMMs) has been an important technique for producing artificial voices, owing to its ability to deliver highly intelligible results and sophisticated features such as voice conversion and accent modification with a small footprint, particularly for low-resource languages where deep learning-based techniques remain unexplored. Despite this progress, the quality of HMM-based results does not reach that of the predominant approaches, based on unit selection of speech segments or on deep learning. One proposal for improving the quality of HMM-based speech has been to incorporate postfiltering stages, which aim to increase quality while preserving the advantages of the process. In this paper, we present a new approach to postfiltering synthesized voices by applying discriminative postfilters built from several long short-term memory (LSTM) deep neural networks. Our motivation is to model specific mappings from synthesized to natural speech for the segments corresponding to voiced or unvoiced sounds, since these sounds have different qualities and HMM-based voices can degrade differently on each. The paper analyses the discriminative postfilters obtained using five voices, evaluated using three objective measures, Mel cepstral distance and subjective tests. The results indicate the advantages of the discriminative postfilters over the HTS voice and the non-discriminative postfilters.
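The abstract names Mel cepstral distance as an evaluation measure but does not state a formula. As a rough illustration only, the sketch below uses the commonly cited definition (per-frame distance in dB, averaged over time-aligned frames, with the 0th energy coefficient excluded). The function name, the list-of-lists frame layout, and the choice to skip coefficient 0 are assumptions, not details taken from the paper.

```python
import math


def mel_cepstral_distance(ref_frames, syn_frames):
    """Mean Mel cepstral distance in dB between two aligned sequences.

    Each argument is a list of frames; each frame is a list of
    mel-cepstral coefficients. Coefficient 0 (energy) is skipped,
    which is a common convention and an assumption here.
    """
    if not ref_frames or len(ref_frames) != len(syn_frames):
        raise ValueError("sequences must be non-empty and time-aligned")

    # Constant factor of the usual definition: (10 / ln 10) * sqrt(2).
    scale = 10.0 / math.log(10.0) * math.sqrt(2.0)

    total = 0.0
    for ref, syn in zip(ref_frames, syn_frames):
        squared_error = sum((r - s) ** 2 for r, s in zip(ref[1:], syn[1:]))
        total += scale * math.sqrt(squared_error)
    return total / len(ref_frames)


# Hypothetical toy frames, three coefficients each.
natural = [[0.0, 1.0, 0.0], [0.0, 0.5, 0.5]]
synthetic = [[0.0, 0.0, 0.0], [0.0, 0.5, 0.5]]
mcd = mel_cepstral_distance(natural, synthetic)
```

A lower value means the synthesized frames are closer to the natural ones, so a postfilter would be expected to reduce this number relative to the unfiltered voice.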


Towards Realizing Sign Language-to-Speech Conversion by Combining Deep Learning and Statistical Parametric Speech Synthesis

Communications in Computer and Information Science - Social Computing ◽

10.1007/978-981-10-2053-7_61 ◽

2016 ◽

pp. 678-690 ◽

Cited By ~ 2

Author(s):

Xiaochun An ◽

Hongwu Yang ◽

Zhenye Gan

Keyword(s):

Deep Learning ◽

Sign Language ◽

Speech Synthesis ◽

Statistical Parametric Speech Synthesis ◽

Parametric Speech Synthesis


A unified framework for automated person re-identification

Transport and Communication Science Journal ◽

10.25073/tcsj.71.7.11 ◽

2020 ◽

Vol 71 (7) ◽

pp. 868-880

Author(s):

Nguyen Hong-Quan ◽

Nguyen Thuy-Binh ◽

Tran Duc-Long ◽

Le Thi-Lan

Keyword(s):

Deep Learning ◽

Video Analysis ◽

Camera Network ◽

Unified Framework ◽

Person Detection ◽

Practical Applications ◽

Detection And Tracking ◽

Analysis System ◽

Bounding Boxes

With the rapid development of camera networks, video analysis systems have become increasingly popular and have been applied in various practical applications. In this paper, we focus on the person re-identification (person ReID) task, which is a crucial step in video analysis systems. The purpose of person ReID is to associate multiple images of a given person moving through a non-overlapping camera network. Many efforts have been devoted to person ReID. However, most studies on person ReID deal only with well-aligned bounding boxes that are annotated manually and treated as perfect inputs for person ReID. In practice, when building a fully automated person ReID system, the quality of the two preceding steps, person detection and tracking, may strongly affect person ReID performance. The contributions of this paper are twofold. First, a unified framework for person ReID based on deep learning models is proposed. This framework couples a deep neural network for person detection with a deep-learning-based tracking method. In addition, features extracted from an improved ResNet architecture are proposed for person representation to achieve higher ReID accuracy. Second, our self-built dataset is introduced and used to evaluate all three steps of the fully automated person ReID framework.


Integrating Articulatory Information in Deep Learning-Based Text-to-Speech Synthesis

10.21437/interspeech.2017-1762 ◽

2017 ◽

Cited By ~ 1

Author(s):

Beiming Cao ◽

Myungjong Kim ◽

Jan van Santen ◽

Ted Mau ◽

Jun Wang

Keyword(s):

Deep Learning ◽

Speech Synthesis ◽

Text To Speech ◽

Text To Speech Synthesis


Evaluation of Block-Wise Parameter Generation for Statistical Parametric Speech Synthesis

10.21437/ssw.2019-31 ◽

2019 ◽

Author(s):

Nobuyuki Nishizawa ◽

Tomohiro Obara ◽

Gen Hattori

Keyword(s):

Speech Synthesis ◽

Statistical Parametric Speech Synthesis ◽

Parametric Speech Synthesis


A Novel Load Forecasting Approach Based on Smart Meter Data Using Advance Preprocessing and Hybrid Deep Learning

Applied Sciences ◽

10.3390/app11062742 ◽

2021 ◽

Vol 11 (6) ◽

pp. 2742

Author(s):

Fatih Ünal ◽

Abdulaziz Almalaq ◽

Sami Ekici

Keyword(s):

Deep Learning ◽

Energy Consumption ◽

Prediction Models ◽

Short Term Memory ◽

Critical Role ◽

Load Forecasting ◽

Unified Framework ◽

Short Term ◽

Planning And Scheduling ◽

Short Term Load Forecasting

Short-term load forecasting models play a critical role in helping distribution companies make effective decisions in planning and scheduling for production and load balancing. Unlike aggregated load forecasting at the distribution level or at substations, forecasting the load profiles of many end-users at the customer level, made possible by smart meters, is a complicated problem due to the high variability and uncertainty of load consumption as well as customer privacy issues. For customers’ short-term load forecasting, these models involve a high level of nonlinearity between input data and output predictions, demanding more robustness, higher prediction accuracy, and generalizability. In this paper, we develop an advanced preprocessing technique coupled with a hybrid sequential learning-based energy forecasting model that employs a convolutional neural network (CNN) and bidirectional long short-term memory (BLSTM) within a unified framework for accurate energy consumption prediction. The energy consumption outliers and feature clustering are extracted at the advanced preprocessing stage. The novel hybrid deep learning approach based on data features coding and decoding is implemented in the prediction stage. The proposed approach is tested and validated using real-world datasets in Turkey, and the results outperformed the traditional prediction models compared in this paper.


TOWARDS AN UNIFIED FRAMEWORK FOR VALIDATING DEEP LEARNING METHODS FOR COLORECTAL POLYPS: FIRST STEPS

British Journal of Surgery ◽

10.1093/bjs/znab160.035 ◽

2021 ◽

Vol 108 (Supplement_3) ◽

Author(s):

L F Sánchez Peralta ◽

J F Ortega Morán ◽

Cr L Saratxaga ◽

J B Pagador ◽

A Picón ◽

...

Keyword(s):

Deep Learning ◽

Colorectal Polyps ◽

Polyp Detection ◽

Unified Framework ◽

Endoscopic Image ◽

Learning Techniques ◽

Great Utility ◽

Adenoma Detection ◽

Definition Of ◽

Basic Features

Introduction: Deep learning techniques have contributed significantly to the field of medical imaging analysis. In the case of colorectal cancer, they have shown great utility for increasing the adenoma detection rate at colonoscopy, but a common validation methodology is still missing. In this study, we present preliminary efforts towards the definition of a validation framework. Material and methods: Different models based on different backbones and encoder-decoder architectures have been trained with a publicly available dataset that contains white light and NBI colonoscopy videos, with 76 different lesions from colonoscopy procedures in 48 human patients. A computer aided detection (CADe) demonstrator has been implemented to show the performance of the models. Results: This CADe demonstrator shows the areas detected as polyp by overlaying the predicted mask on the endoscopic image. It allows the user to select the video to be used from among those in the test set. Although it only presents basic features such as play, pause and moving to the next video, it easily loads the model and allows for visualization of results. The demonstrator is accompanied by a set of metrics to be used depending on the intended task: polyp detection, localization and segmentation. Conclusions: The use of this CADe demonstrator, together with a publicly available dataset and predefined metrics, will allow for an easier and fairer comparison of methods. Further work is still required to validate the proposed framework.
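The abstract mentions predefined metrics for segmentation but does not name them. Purely as an illustration, the sketch below computes intersection over union (IoU) and the Dice coefficient, two measures often used to compare a predicted mask with a ground-truth mask. The choice of metrics, the function name, and the nested-list mask layout are assumptions and are not taken from the study.

```python
def iou_and_dice(pred_mask, true_mask):
    """Return (IoU, Dice) for two binary masks of equal shape.

    Masks are nested lists of 0/1 values (rows of pixels). Which
    metrics the framework actually uses is not stated in the abstract;
    IoU and Dice are assumed here as common segmentation measures.
    """
    intersection = union = pred_area = true_area = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for p, t in zip(pred_row, true_row):
            p, t = bool(p), bool(t)
            intersection += p and t
            union += p or t
            pred_area += p
            true_area += t

    # Two empty masks agree perfectly; avoid dividing by zero.
    if union == 0:
        return 1.0, 1.0
    return intersection / union, 2 * intersection / (pred_area + true_area)


# Hypothetical 2x2 masks: the prediction covers one extra pixel.
predicted = [[1, 1], [0, 0]]
ground_truth = [[1, 0], [0, 0]]
iou, dice = iou_and_dice(predicted, ground_truth)
```

Reporting the same overlap measures on the same public test videos is one way a shared framework could make results from different models directly comparable.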


Product Inspection Methodology via Deep Learning: An Overview

Sensors ◽

10.3390/s21155039 ◽

2021 ◽

Vol 21 (15) ◽

pp. 5039

Author(s):

Tae-Hyun Kim ◽

Hye-Rin Kim ◽

Yeong-Jun Cho

Keyword(s):

Deep Learning ◽

Product Quality ◽

Inspection System ◽

Quality Inspection ◽

Learning Models ◽

Unified Framework ◽

Learning Techniques ◽

Inspection Systems ◽

System Maintenance ◽

Test Scenarios

In this study, we present a framework for product quality inspection based on deep learning techniques. First, we categorize several deep learning models that can be applied to product inspection systems. In addition, we explain the steps for building a deep-learning-based inspection system in detail. Second, we address connection schemes that efficiently link deep learning models to product inspection systems. Finally, we propose an effective method that can maintain and enhance a product inspection system according to improvement goals of the existing product inspection systems. The proposed system is observed to possess good system maintenance and stability owing to the proposed methods. All the proposed methods are integrated into a unified framework and we provide detailed explanations of each proposed method. In order to verify the effectiveness of the proposed system, we compare and analyze the performance of the methods in various test scenarios. We expect that our study will provide useful guidelines to readers who desire to implement deep-learning-based systems for product inspection.
