A Survey on “Text-to-Speech Systems for Real-Time Audio Synthesis”

Author(s):  
Prof. Mrunalinee Patole ◽  
Akhilesh Pandey ◽  
Kaustubh Bhagwat ◽  
Mukesh Vaishnav ◽  
Salikram Chadar

Text-to-Speech (TTS) is a form of speech synthesis in which text is converted into spoken, human-like voice output. State-of-the-art approaches to TTS employ neural-network-based methods. This work aims to examine some of the problems and limitations present in current systems, particularly Tacotron-2, and attempts to further improve its performance by modifying its architecture. Many papers have been published on these topics, presenting a variety of TTS systems and new TTS products. The aim here is to study different Text-to-Speech systems; compared to other Text-to-Speech systems, Tacotron-2 has multiple advantages. In alternative algorithms such as CNN and Fast-CNN, the algorithm may not examine the image fully, whereas YOLO inspects the image completely by predicting bounding boxes and their probabilities with a convolutional network, detecting objects faster than the alternative algorithms.
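Tacotron-2 follows the now-standard two-stage neural TTS design: an acoustic model predicts a mel spectrogram from the input text, and a separate vocoder converts that spectrogram into a waveform. The minimal PyTorch sketch below only illustrates this data flow; the toy modules are placeholders and not the actual Tacotron-2 or WaveNet/WaveGlow architectures.

```python
# Minimal sketch of the two-stage neural TTS pipeline popularised by Tacotron-2:
# an acoustic model maps character IDs to a mel spectrogram, and a vocoder maps
# the spectrogram to audio samples. Layer choices are illustrative placeholders.
import torch
import torch.nn as nn

class ToyAcousticModel(nn.Module):            # stands in for Tacotron-2's encoder/decoder
    def __init__(self, vocab_size=64, n_mels=80):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, 128)
        self.rnn = nn.GRU(128, 128, batch_first=True)
        self.to_mel = nn.Linear(128, n_mels)

    def forward(self, char_ids):              # (batch, text_len) -> (batch, text_len, n_mels)
        x, _ = self.rnn(self.embed(char_ids))
        return self.to_mel(x)

class ToyVocoder(nn.Module):                  # stands in for a WaveNet/WaveGlow-style vocoder
    def __init__(self, n_mels=80, hop=256):
        super().__init__()
        self.upsample = nn.Linear(n_mels, hop)

    def forward(self, mel):                   # (batch, frames, n_mels) -> (batch, samples)
        return self.upsample(mel).flatten(1)

char_ids = torch.randint(0, 64, (1, 20))      # a dummy 20-character utterance
mel = ToyAcousticModel()(char_ids)
audio = ToyVocoder()(mel)
print(mel.shape, audio.shape)
```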

Author(s):  
Rajae Moumen ◽  
Raddouane Chiheb ◽  
Rdouan Faizi

The aim of this research is to propose a fully convolutional approach to the problem of real-time scene text detection for the Arabic language. Text detection is performed using a two-step, multi-scale approach. The first step uses a lightweight fully convolutional network, TextBlockDetector FCN, an adaptation of VGG-16, to eliminate non-textual elements, localize wide-scale text, and estimate text scale. The second step determines a narrow scale range of text using a fully convolutional network for maximum performance. To evaluate the system, we compare the framework's results with those obtained with a single VGG-16 fully deployed for text detection in one shot, as well as with previous results from the state of the art. For training and testing, we built a dataset of 575 manually processed images and applied data augmentation to enrich the training process. The system scores a precision of 0.651 vs. 0.64 for the state of the art and an FPS of 24.3 vs. 31.7 for a VGG-16 fully deployed.
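The two-step idea, a light network that rejects non-text regions and suggests a text scale before a heavier network is run only over that narrow scale range, can be sketched as follows. Both detector functions are hypothetical placeholders standing in for the paper's FCNs, not their actual implementations.

```python
# Sketch of the coarse-to-fine, two-step pipeline under assumed interfaces:
# stage 1 cheaply discards non-text regions and estimates a rough text scale,
# stage 2 runs precise detection only at scales near that estimate.
import numpy as np

def text_block_detector(image):
    """Stage 1 (placeholder): coarse text-block mask + rough glyph-height estimate."""
    mask = np.zeros(image.shape[:2], dtype=bool)
    mask[100:180, 50:400] = True          # pretend a text block was found here
    estimated_height_px = 40
    return mask, estimated_height_px

def fine_text_detector(image, scales):
    """Stage 2 (placeholder): precise detection, run only at the selected scales."""
    return [{"scale": s, "boxes": []} for s in scales]

def detect_text(image):
    mask, h = text_block_detector(image)
    if not mask.any():                    # early exit keeps the pipeline real-time
        return []
    scales = [h * 0.75, h, h * 1.25]      # narrow scale range around the estimate
    return fine_text_detector(image, scales)

detections = detect_text(np.zeros((480, 640, 3), dtype=np.uint8))
```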


2015 ◽  
Vol 738-739 ◽  
pp. 1105-1110 ◽  
Author(s):  
Yuan Qing Qin ◽  
Ying Jie Cheng ◽  
Chun Jie Zhou

This paper mainly surveys the state of the art in real-time communication in industrial wireless local area networks (WLANs) and identifies suitable approaches to meet real-time requirements in the future. Firstly, the paper summarizes the features of industrial WLANs and the challenges they encounter. Then, focusing on the real-time problems of industrial WLANs, the fundamental mechanism of each recent representative solution is analyzed in detail, and the characteristics and performance of these solutions are compared. Finally, the paper summarizes the current state of the research and discusses the future development of industrial WLANs.


Sensors ◽  
2022 ◽  
Vol 22 (2) ◽  
pp. 458
Author(s):  
Zakaria El Mrabet ◽  
Niroop Sugunaraj ◽  
Prakash Ranganathan ◽  
Shrirang Abhyankar

Power system failures or outages due to short-circuits or “faults” can result in long service interruptions leading to significant socio-economic consequences. It is critical for electrical utilities to quickly ascertain fault characteristics, including location, type, and duration, to reduce the service time of an outage. Existing fault detection mechanisms (relays and digital fault recorders) are slow to communicate the fault characteristics upstream to the substations and control centers for action to be taken quickly. Fortunately, due to the availability of high-resolution phasor measurement units (PMUs), more event-driven solutions can be captured in real time. In this paper, we propose a data-driven approach for determining fault characteristics using samples of fault trajectories. A random forest regressor (RFR)-based model is used to detect real-time fault location and duration simultaneously. This model combines multiple uncorrelated trees through bootstrap aggregation (bagging) to obtain robust generalization and greater accuracy without overfitting or underfitting. Four cases were studied to evaluate the performance of the RFR: (1) detecting fault location, (2) predicting fault duration, (3) handling missing data, and (4) identifying fault location and duration in a real-time streaming environment. A comparative analysis was conducted between the RFR algorithm and state-of-the-art models, including deep neural network, Hoeffding tree, neural network, support vector machine, decision tree, naive Bayes, and K-nearest neighbors. Experiments revealed that RFR consistently outperformed the other models in detection accuracy, prediction error, and processing time.
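A minimal sketch of this setup, with synthetic arrays standing in for the PMU fault-trajectory features: scikit-learn's RandomForestRegressor handles the two targets (location and duration) jointly through its native multi-output support. The feature layout and value ranges are assumptions for illustration only.

```python
# Random forest regressor predicting fault location and duration from PMU
# feature windows; the data below is synthetic and only stands in for real
# fault trajectories.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 60))                # 2000 fault windows, 60 PMU features each
y = np.column_stack([
    rng.uniform(0, 100, 2000),                 # target 1: fault location (arbitrary units)
    rng.uniform(0.01, 0.5, 2000),              # target 2: fault duration in seconds
])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Multi-output regression is supported natively by RandomForestRegressor.
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_tr, y_tr)
location_pred, duration_pred = model.predict(X_te).T
print(model.score(X_te, y_te))                 # R^2 averaged over both targets
```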


Author(s):  
FRANCK LECLERC ◽  
RÉJEAN PLAMONDON

This paper is a follow-up to an article published in 1989 by R. Plamondon and G. Lorette on the state of the art in automatic signature verification and writer identification. It summarizes the activity from 1989 to 1993 in automatic signature verification. For this purpose, we report on the different projects dealing with dynamic, static, and neural-network approaches. In each section, a brief description of the major investigations is given.


Author(s):  
Joseph Wilder ◽  
J.K. Aggarwal ◽  
P. Besl ◽  
T. Kanade ◽  
A. Slotwinski ◽  
...  

2021 ◽  
Author(s):  
Muhammad Shahroz Nadeem ◽  
Sibt Hussain ◽  
Fatih Kurugollu

This paper addresses the textual image deblurring problem. We propose a new loss function and provide an empirical evaluation of the design choices, based on which a memory-friendly CNN model is proposed that performs better than the state-of-the-art CNN method.
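The abstract does not specify the proposed loss or architecture, so the sketch below is only illustrative: a small residual CNN trained with a plain L1 reconstruction loss as a stand-in for the paper's loss, with a compact layer stack hinting at the memory-friendly constraint.

```python
# Illustrative deblurring training step; the L1 loss and tiny network are
# assumptions, not the paper's actual loss function or model.
import torch
import torch.nn as nn

class TinyDeblurCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),
        )

    def forward(self, blurred):
        return blurred + self.net(blurred)     # predict a residual correction

model = TinyDeblurCNN()
optim = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.L1Loss()                          # stand-in for the paper's loss

blurred = torch.rand(4, 1, 64, 64)             # dummy blurred text patches
sharp = torch.rand(4, 1, 64, 64)               # dummy ground-truth patches
loss = loss_fn(model(blurred), sharp)
loss.backward()
optim.step()
```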


Author(s):  
Chenggang Yan ◽  
Tong Teng ◽  
Yutao Liu ◽  
Yongbing Zhang ◽  
Haoqian Wang ◽  
...  

The difficulty of no-reference image quality assessment (NR IQA) often lies in the lack of knowledge about the distortion in the image, which makes quality assessment blind and thus inefficient. To tackle this issue, in this article we propose a novel scheme for precise NR IQA, which includes two successive steps, i.e., distortion identification and targeted quality evaluation. In the first step, we employ the well-known Inception-ResNet-v2 neural network to train a classifier that classifies the possible distortion in the image into the four most common distortion types, i.e., Gaussian white noise (WN), Gaussian blur (GB), JPEG compression (JPEG), and JPEG2000 compression (JP2K). Specifically, the deep neural network is trained on the large-scale Waterloo Exploration database, which ensures the robustness and high performance of distortion classification. In the second step, after determining the distortion type of the image, we design a specific approach to quantify the image distortion level, which can estimate the image quality more precisely. Extensive experiments performed on the LIVE, TID2013, CSIQ, and Waterloo Exploration databases demonstrate that (1) the accuracy of our distortion classification is higher than that of state-of-the-art distortion classification methods, and (2) the proposed NR IQA method outperforms state-of-the-art NR IQA methods in quantifying image quality.
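The two-step structure amounts to a classifier that routes each image to a distortion-specific quality estimator. The sketch below shows only that dispatch pattern; `distortion_classifier` stands in for the trained Inception-ResNet-v2 model, and the per-type estimators are toy placeholders, not the paper's actual quality measures.

```python
# Two-step NR IQA dispatch: identify the distortion type, then apply a
# type-specific quality estimator. All functions here are placeholders.
import numpy as np

DISTORTIONS = ["WN", "GB", "JPEG", "JP2K"]

def distortion_classifier(image):
    """Placeholder for the trained CNN; returns one of the four labels."""
    return "GB"

def estimate_wn(image):   return 1.0 / (1.0 + image.var())                      # toy noise measure
def estimate_gb(image):   return float(np.abs(np.diff(image, axis=1)).mean())   # toy sharpness measure
def estimate_jpeg(image): return 0.5                                            # placeholder
def estimate_jp2k(image): return 0.5                                            # placeholder

QUALITY_ESTIMATORS = {"WN": estimate_wn, "GB": estimate_gb,
                      "JPEG": estimate_jpeg, "JP2K": estimate_jp2k}

def assess_quality(image):
    distortion = QUALITY_ESTIMATORS, distortion_classifier(image)[0:0] or distortion_classifier(image)  # step 1
    return distortion, QUALITY_ESTIMATORS[distortion](image)                    # step 2: targeted score

print(assess_quality(np.random.rand(64, 64)))
```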


Sensors ◽  
2019 ◽  
Vol 19 (23) ◽  
pp. 5243
Author(s):  
Zhang ◽  
Pan ◽  
Ma

A docking ring is a circular hatch on a spacecraft that allows a servicing spacecraft to dock during various space missions. Detecting the ring is greatly beneficial to automatic capture, rendezvous, and docking. Based on its geometrical shape, we propose a real-time docking ring detection method for on-orbit spacecraft. Firstly, we extract arcs from the edge mask and classify them into four classes according to edge direction and convexity. Using an arc selection strategy, we select combinations of arcs that possibly belong to the same ellipse and then estimate the ellipse parameters via least-squares fitting. Candidate ellipses are validated according to how well the estimate fits the actual edge pixels. Experiments show that our method is superior to the state-of-the-art methods and can be used in real-time applications. The method can also be extended to other applications.
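The core fitting step can be sketched as follows, under assumed inputs: arcs arrive as pixel-coordinate arrays (in practice produced by edge detection and the paper's direction/convexity grouping), a candidate combination is merged, and an ellipse is fitted by least squares via OpenCV's fitEllipse. The validation against edge pixels mentioned in the abstract is only noted in a comment.

```python
# Least-squares ellipse fitting on a merged set of candidate arcs; the two
# synthetic arcs below stand in for arcs extracted from a real edge mask.
import numpy as np
import cv2

def fit_candidate_ellipse(arcs):
    """Merge arcs assumed to belong to one ellipse and fit its parameters."""
    points = np.vstack(arcs).astype(np.float32)
    if len(points) < 5:                      # fitEllipse needs at least 5 points
        return None
    (cx, cy), (major, minor), angle = cv2.fitEllipse(points)
    return (cx, cy), (major, minor), angle   # validation against edge pixels would follow

t1 = np.linspace(0, np.pi, 50)
t2 = np.linspace(np.pi, 1.5 * np.pi, 30)
arc_a = np.stack([100 + 40 * np.cos(t1), 100 + 40 * np.sin(t1)], axis=1)
arc_b = np.stack([100 + 40 * np.cos(t2), 100 + 40 * np.sin(t2)], axis=1)

print(fit_candidate_ellipse([arc_a, arc_b]))
```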


Author(s):  
Jianwen Jiang ◽  
Di Bao ◽  
Ziqiang Chen ◽  
Xibin Zhao ◽  
Yue Gao

3D shape retrieval has attracted much attention and has recently become a hot topic in the computer vision field. With the development of deep learning, 3D shape retrieval has also made great progress, and many view-based methods have been introduced in recent years. However, how to represent 3D shapes better is still a challenging problem, and the intrinsic hierarchical associations among views have not been well utilized. To tackle these problems, in this paper we propose a multi-loop-view convolutional neural network (MLVCNN) framework for 3D shape retrieval. In this method, multiple groups of views are first extracted from different loop directions. Given these multiple loop views, the proposed MLVCNN framework introduces a hierarchical view-loop-shape architecture, i.e., the view level, the loop level, and the shape level, to conduct 3D shape representation at different scales. At the view level, a convolutional neural network is first trained to extract view features. Then, the proposed Loop Normalization and an LSTM are applied to each loop of views to generate loop-level features, which take into account the intrinsic associations of the different views in the same loop. Finally, all the loop-level descriptors are combined into a shape-level descriptor for 3D shape representation, which is used for 3D shape retrieval. The proposed method has been evaluated on the public 3D shape benchmark ModelNet40. Experiments and comparisons with the state-of-the-art methods show that the proposed MLVCNN method achieves significant performance improvement on 3D shape retrieval tasks, outperforming the state-of-the-art methods by 4.84% in mAP. We have also evaluated the proposed method on the 3D shape classification task, where MLVCNN also achieves superior performance compared with recent methods.
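The hierarchical view-loop-shape aggregation can be sketched in PyTorch as follows, under assumed sizes: a shared CNN embeds every rendered view, an LSTM summarizes each loop of views, and the loop descriptors are pooled into one shape descriptor. The tiny backbone and the mean pooling are simple stand-ins for the paper's backbone and Loop Normalization.

```python
# Toy view-loop-shape aggregation: view-level CNN features, loop-level LSTM
# summaries, shape-level pooling. Shapes and layers are illustrative only.
import torch
import torch.nn as nn

class ToyMLVCNN(nn.Module):
    def __init__(self, feat=128):
        super().__init__()
        self.view_cnn = nn.Sequential(           # stand-in view-level backbone
            nn.Conv2d(3, 16, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, feat),
        )
        self.loop_rnn = nn.LSTM(feat, feat, batch_first=True)  # loop-level aggregation

    def forward(self, views):                    # (batch, loops, views, 3, H, W)
        b, l, v = views.shape[:3]
        x = self.view_cnn(views.flatten(0, 2))   # view-level features
        x = x.view(b * l, v, -1)
        _, (h, _) = self.loop_rnn(x)             # one descriptor per loop
        loops = h[-1].view(b, l, -1)
        return loops.mean(dim=1)                 # shape-level descriptor

shape_desc = ToyMLVCNN()(torch.rand(2, 3, 12, 3, 64, 64))
print(shape_desc.shape)                          # (2, 128)
```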


2020 ◽  
Vol 10 (2) ◽  
pp. 84 ◽  
Author(s):  
Atif Mehmood ◽  
Muazzam Maqsood ◽  
Muzaffar Bashir ◽  
Yang Shuyuan

Alzheimer’s disease (AD) may cause permanent damage to memory cells, resulting in dementia. Diagnosing Alzheimer’s disease at an early stage remains a challenging task. Machine learning and deep convolutional neural network (CNN) based approaches are readily available to solve various problems related to brain image data analysis, and in clinical research magnetic resonance imaging (MRI) is used to diagnose AD. For accurate classification of dementia stages, we need highly discriminative features obtained from MRI images. Recently, advanced deep CNN-based models have demonstrated high accuracy; however, the small number of image samples available in the datasets leads to over-fitting, which hinders the performance of deep learning approaches. In this research, we developed a Siamese convolutional neural network (SCNN) model inspired by VGG-16 (also called OxfordNet) to classify dementia stages. In our approach, we extend the insufficient and imbalanced data using augmentation techniques. Experiments are performed on the publicly available Open Access Series of Imaging Studies (OASIS) dataset; using the proposed approach, a test accuracy of 99.05% is achieved for the classification of dementia stages. We compared our model with state-of-the-art models and found that it outperformed them in terms of performance, efficiency, and accuracy.
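A Siamese arrangement of a VGG-16-style backbone can be sketched as below. The abstract describes the model only at a high level, so the embedding head, distance measure, and input sizes here are assumptions rather than the paper's exact configuration.

```python
# Siamese network with a shared VGG-16 backbone: both inputs pass through the
# same weights and are compared by embedding distance. Illustrative sketch only.
import torch
import torch.nn as nn
from torchvision import models

class SiameseVGG(nn.Module):
    def __init__(self, embed_dim=128):
        super().__init__()
        vgg = models.vgg16(weights=None)          # VGG-16 backbone, untrained here
        self.backbone = nn.Sequential(vgg.features, nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.head = nn.Linear(512, embed_dim)

    def embed(self, x):
        return self.head(self.backbone(x))

    def forward(self, x1, x2):
        # Distance between embeddings of the two MRI slices in the pair.
        return nn.functional.pairwise_distance(self.embed(x1), self.embed(x2))

model = SiameseVGG()
d = model(torch.rand(2, 3, 224, 224), torch.rand(2, 3, 224, 224))
print(d.shape)                                     # one distance per pair
```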

