Subword Recognition in Historical Arabic Documents using C-GRUs

Recent years have seen a growing effort to digitize historical manuscripts, which both preserves these collections and gives researchers and end-users direct access to the images. Recognition of Arabic handwriting is challenging due to the highly cursive nature of the script and the additional problems of historical documents, such as degradation. This paper presents an end-to-end system to recognize Arabic handwritten subwords in historical documents. More specifically, we introduce a hybrid CNN-GRU model in which a shallow convolutional network learns robust feature representations while the GRU layers carry out the sequence modelling and generate the transcription of the text. The proposed system is evaluated on two datasets, IBN SINA and VML-HD, with reported recognition rates of 96.10% and 98.60%, respectively. A comparison with existing techniques evaluated on the same datasets validates the effectiveness of the proposed model in characterizing Arabic subwords.

Efficient End-to-End Sentence-Level Lipreading with Temporal Convolutional Networks

Applied Sciences ◽

10.3390/app11156975 ◽

2021 ◽

Vol 11 (15) ◽

pp. 6975

Author(s):

Tao Zhang ◽

Lun He ◽

Xudong Li ◽

Guoqing Feng

Keyword(s):

Performance Improvement ◽

State Of The Art ◽

Error Rates ◽

Convolutional Network ◽

Convolutional Networks ◽

Sentence Level ◽

End To End ◽

High Level ◽

Improved Accuracy ◽

Talking Face

Lipreading aims to recognize sentences being spoken by a talking face. In recent years, lipreading methods have achieved a high level of accuracy on large datasets and made breakthrough progress. However, lipreading is still far from being solved: existing methods tend to have high error rates on in-the-wild data and suffer from vanishing gradients during training and slow convergence. To overcome these problems, we propose an efficient end-to-end sentence-level lipreading model, using an encoder based on a 3D convolutional network, ResNet50, and a Temporal Convolutional Network (TCN), with a CTC objective function as the decoder. More importantly, the proposed architecture incorporates the TCN as a feature learner to decode features. This can partly eliminate the vanishing-gradient and performance shortcomings of RNNs (LSTM, GRU), and it yields a notable performance improvement as well as faster convergence. Experiments show that training and convergence are 50% faster than with the state-of-the-art method, and that accuracy improves by 2.4% on the GRID dataset.
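
To make the role of a CTC output layer concrete, the following is a minimal sketch of greedy CTC decoding: take the best label in each frame, merge consecutive repeats, and drop the blank symbol. It is not taken from the paper. The blank index, the toy alphabet, and the function name `ctc_greedy_decode` are assumptions chosen for illustration, and a real system might use beam search instead.

```python
from typing import List, Optional, Sequence

BLANK = 0  # assumed index of the CTC blank symbol (hypothetical choice)


def ctc_greedy_decode(
    frame_scores: Sequence[Sequence[float]], blank: int = BLANK
) -> List[int]:
    """Pick the best label per frame, merge consecutive repeats, drop blanks."""
    decoded: List[int] = []
    previous: Optional[int] = None
    for scores in frame_scores:
        # Index of the highest-scoring label in this frame.
        best = max(range(len(scores)), key=lambda k: scores[k])
        # Emit only when the label changes and is not the blank.
        if best != previous and best != blank:
            decoded.append(best)
        previous = best
    return decoded


# Toy per-frame scores over a hypothetical alphabet {0: blank, 1: "a", 2: "b"}.
frames = [
    [0.10, 0.80, 0.10],  # "a"
    [0.10, 0.70, 0.20],  # "a" again, merged with the previous frame
    [0.90, 0.05, 0.05],  # blank
    [0.20, 0.70, 0.10],  # "a", kept because a blank separated it
    [0.10, 0.20, 0.70],  # "b"
]
print(ctc_greedy_decode(frames))
```

The blank frame is what allows the same label to appear twice in a row in the output; without it, the two runs of "a" would be merged into one.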

Adieu recurrence? End-to-end speech emotion recognition using a context stacking dilated convolutional network

2020 28th European Signal Processing Conference (EUSIPCO) ◽

10.23919/eusipco47968.2020.9287667 ◽

2021 ◽

Author(s):

Duowei Tang ◽

Peter Kuppens ◽

Luc Geurts ◽

Toon van Waterschoot

Keyword(s):

Emotion Recognition ◽

Speech Emotion Recognition ◽

Convolutional Network ◽

End To End

Hierarchical Concept-Driven Language Model

ACM Transactions on Knowledge Discovery from Data ◽

10.1145/3451167 ◽

2021 ◽

Vol 15 (6) ◽

pp. 1-22

Author(s):

Yashen Wang ◽

Huanhuan Zhang ◽

Zhirun Liu ◽

Qiang Zhou

Keyword(s):

Language Model ◽

Generation Process ◽

Data Generation ◽

Modeling Framework ◽

Long Distance ◽

Short Text ◽

Proposed Model ◽

Scalable Inference ◽

End To End ◽

Hidden Layer

Many semantic-driven methods have been proposed for guiding natural language generation. While clearly improving the performance of the end-to-end training task, these existing semantic-driven methods still have clear limitations: for example, (i) they only utilize shallow semantic signals (e.g., from topic models) with only a single stochastic hidden layer in their data generation process, which suffers easily from noise (especially on short texts) and lacks interpretability; (ii) they ignore the sentence order and document context, as they treat each document as a bag of sentences, and fail to capture the long-distance dependencies and global semantic meaning of a document. To overcome these problems, we propose a novel semantic-driven language modeling framework, which simultaneously learns a Hierarchical Language Model and a Recurrent Conceptualization-enhanced Gamma Belief Network. For scalable inference, we develop the auto-encoding Variational Recurrent Inference, allowing efficient end-to-end training while capturing global semantics from a text corpus. In particular, this article introduces concept information derived from the high-quality lexical knowledge graph Probase, which gives the proposed model strong interpretability and anti-noise capability. Moreover, the proposed model captures not only intra-sentence word dependencies, but also temporal transitions between sentences and inter-sentence concept dependence. Experiments conducted on several NLP tasks validate the superiority of the proposed approach, which can effectively infer a meaningful hierarchical concept structure of a document and hierarchical multi-scale structures of sequences, even compared with the latest state-of-the-art Transformer-based models.

A Convolutional Network with Multi-Scale and Attention Mechanisms for End-to-End Single-Channel Speech Enhancement

IEEE Signal Processing Letters ◽

10.1109/lsp.2021.3093859 ◽

2021 ◽

pp. 1-1

Author(s):

Xiang Xiaoxiao ◽

Zhang Xiaojuan ◽

Chen Haozhe

Keyword(s):

Speech Enhancement ◽

Single Channel ◽

Convolutional Network ◽

Multi Scale ◽

End To End

Adversarial Deep Structural Networks for Mammographic Mass Segmentation

10.1101/095786 ◽

2016 ◽

Cited By ~ 13

Author(s):

Wentao Zhu ◽

Xiaohui Xie

Keyword(s):

Conditional Random Fields ◽

Model Potential ◽

Natural Image ◽

Structural Learning ◽

Mass Detection ◽

Convolutional Network ◽

Adversarial Training ◽

Mass Segmentation ◽

End To End ◽

Public Datasets

Mass segmentation is an important task in mammogram analysis, providing effective morphological features and regions of interest (ROI) for mass detection and classification. Inspired by the success of using deep convolutional features for natural image analysis and conditional random fields (CRF) for structural learning, we propose an end-to-end network for mammographic mass segmentation. The network employs a fully convolutional network (FCN) to model the potential function, followed by a CRF to perform structural learning. Because the mass distribution varies greatly with pixel position, the FCN is combined with a position prior for the task. Due to the small size of mammogram datasets, we use adversarial training to control over-fitting. Four models with different convolutional kernels are further fused to improve the segmentation results. Experimental results on two public datasets, INbreast and DDSM-BCRP, show that our end-to-end network combined with adversarial training achieves state-of-the-art results.

Residual Invertible Spatio-Temporal Network for Video Super-Resolution

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33015981 ◽

2019 ◽

Vol 33 ◽

pp. 5981-5988 ◽

Cited By ~ 12

Author(s):

Xiaobin Zhu ◽

Zhuangzi Li ◽

Xiao-Yu Zhang ◽

Changsheng Li ◽

Yaqi Liu ◽

...

Keyword(s):

Spatial Information ◽

Super Resolution ◽

Temporal Consistency ◽

Temporal Network ◽

Convolutional Network ◽

Feature Representations ◽

Video Frames ◽

Temporal Features ◽

Benchmark Datasets ◽

Spatio Temporal

Video super-resolution is a challenging task that has attracted great attention in the research and industry communities. In this paper, we propose a novel end-to-end architecture, called Residual Invertible Spatio-Temporal Network (RISTN), for video super-resolution. The RISTN can sufficiently exploit the spatial information from low-resolution to high-resolution, and effectively models the temporal consistency of consecutive video frames. Compared with existing approaches based on recurrent convolutional networks, RISTN is much deeper but more efficient. It consists of three major components. In the spatial component, a lightweight residual invertible block is designed to reduce information loss during feature transformation and provide robust feature representations. In the temporal component, a novel recurrent convolutional model with residual dense connections is proposed to construct a deeper network and avoid feature degradation. In the reconstruction component, a new fusion method based on the sparse strategy is proposed to integrate the spatial and temporal features. Experiments on public benchmark datasets demonstrate that RISTN outperforms the state-of-the-art methods.

Auditory Inspired Convolutional Neural Networks for Ship Type Classification with Raw Hydrophone Data

Entropy ◽

10.3390/e20120990 ◽

2018 ◽

Vol 20 (12) ◽

pp. 990 ◽

Cited By ~ 6

Author(s):

Sheng Shen ◽

Honghui Yang ◽

Junhao Li ◽

Guanghui Xu ◽

Meiping Sheng

Keyword(s):

Neural Networks ◽

Convolutional Neural Networks ◽

Filter Banks ◽

Auditory Filter ◽

Feature Representations ◽

Proposed Model ◽

Energy Pooling ◽

Practical Guidelines ◽

Frequency Components ◽

Underwater Acoustic Signal

Detecting and classifying ships based on radiated noise provides practical guidelines for reducing the underwater noise footprint of shipping. In this paper, detection and classification are implemented by auditory inspired convolutional neural networks trained on raw underwater acoustic signals. The proposed model includes three parts. The first part is performed by a multi-scale 1D time convolutional layer initialized by auditory filter banks. Signals are decomposed into frequency components by the convolution operation. In the second part, the decomposed signals are converted into the frequency domain by a permute layer and an energy pooling layer to form a frequency distribution as in the auditory cortex. Then, 2D frequency convolutional layers are applied to discover spectro-temporal patterns, as well as to preserve locality and reduce spectral variations in ship noise. In the third part, the whole model is optimized with a classification objective function to obtain appropriate auditory filters and feature representations that correlate with ship categories. The optimization reflects the plasticity of the auditory system. Experiments on five ship types and background noise show that the proposed approach achieved an overall classification accuracy of 79.2%, an improvement of 6% over conventional approaches. Auditory filter banks were adaptive in shape to improve classification accuracy.

Graph-Driven Generative Models for Heterogeneous Multi-Task Learning

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i01.5446 ◽

2020 ◽

Vol 34 (01) ◽

pp. 979-988

Author(s):

Wenlin Wang ◽

Hongteng Xu ◽

Zhe Gan ◽

Bai Li ◽

Guoyin Wang ◽

...

Keyword(s):

Generative Models ◽

Healthcare Applications ◽

Convolutional Network ◽

Learning Tasks ◽

Proposed Model ◽

Heterogeneous Learning ◽

Clinical Topic ◽

Generative Processes ◽

Admission Type ◽

Uniform Manner

We propose a novel graph-driven generative model that unifies multiple heterogeneous learning tasks into the same framework. The proposed model is based on the fact that heterogeneous learning tasks, which correspond to different generative processes, often rely on data with a shared graph structure. Accordingly, our model combines a graph convolutional network (GCN) with multiple variational autoencoders, thus embedding the nodes of the graph (i.e., samples for the tasks) in a uniform manner, while specializing their organization and usage to different tasks. With a focus on healthcare applications (tasks), including clinical topic modeling, procedure recommendation and admission-type prediction, we demonstrate that our method successfully leverages information across different tasks, boosting performance in all tasks and outperforming existing state-of-the-art approaches.

Revisiting Graph Based Collaborative Filtering: A Linear Residual Graph Convolutional Network Approach

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i01.5330 ◽

2020 ◽

Vol 34 (01) ◽

pp. 27-34 ◽

Cited By ~ 5

Author(s):

Lei Chen ◽

Le Wu ◽

Richang Hong ◽

Kun Zhang ◽

Meng Wang

Keyword(s):

Collaborative Filtering ◽

Representation Learning ◽

Superior Performance ◽

Convolutional Network ◽

Convolutional Networks ◽

Proposed Model ◽

Non Linear ◽

Efficiency And Effectiveness ◽

Residual Graph ◽

Interaction Modeling

Graph Convolutional Networks (GCNs) are state-of-the-art graph-based representation learning models that iteratively stack multiple layers of convolution aggregation operations and non-linear activation operations. Recently, in Collaborative Filtering (CF) based Recommender Systems (RS), some researchers have modeled higher-layer collaborative signals with GCNs by treating the user-item interaction behavior as a bipartite graph. These GCN-based recommender models show superior performance compared to traditional works. However, these models suffer from training difficulty with non-linear activations for large user-item graphs. Besides, most GCN-based models cannot model deeper layers due to the over-smoothing effect of the graph convolution operation. In this paper, we revisit GCN-based CF models from two aspects. First, we empirically show that removing non-linearities enhances recommendation performance, which is consistent with the theories in simple graph convolutional networks. Second, we propose a residual network structure that is specifically designed for CF with user-item interaction modeling, which alleviates the over-smoothing problem in the graph convolution aggregation operation with sparse user-item interaction data. The proposed model is a linear model that is easy to train, scales to large datasets, and yields better efficiency and effectiveness on two real datasets. We publish the source code at https://github.com/newlei/LR-GCCF.
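
The two ideas in this abstract, propagation without non-linear activations and a residual combination of layers, can be illustrated with a small sketch. This is not the authors' implementation (see the repository linked above for that). It assumes a symmetric-normalized adjacency matrix with self-loops, omits any learned per-layer weight matrices, and realizes the residual structure by concatenating the embeddings of all layers, so that the dot-product score is the sum of the per-layer scores. All function names and the toy data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def normalized_adjacency(
    num_users: int, num_items: int, interactions: Sequence[Tuple[int, int]]
) -> Matrix:
    """Build D^{-1/2} (A + I) D^{-1/2} for the user-item bipartite graph."""
    n = num_users + num_items
    # Start from the identity so that every node has a self-loop (assumption).
    adj = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    for user, item in interactions:
        adj[user][num_users + item] = 1.0
        adj[num_users + item][user] = 1.0
    degree = [sum(row) for row in adj]
    return [
        [adj[r][c] / math.sqrt(degree[r] * degree[c]) for c in range(n)]
        for r in range(n)
    ]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain dense matrix product, enough for a toy graph."""
    return [
        [sum(a[r][k] * b[k][c] for k in range(len(b))) for c in range(len(b[0]))]
        for r in range(len(a))
    ]


def propagate(norm_adj: Matrix, embeddings: Matrix, num_layers: int) -> Matrix:
    """Linear propagation (no activation); concatenate every layer's output."""
    layers = [embeddings]
    for _ in range(num_layers):
        layers.append(matmul(norm_adj, layers[-1]))
    return [
        [value for layer in layers for value in layer[node]]
        for node in range(len(embeddings))
    ]


def score(final: Matrix, num_users: int, user: int, item: int) -> float:
    """Dot product of concatenated embeddings = sum of per-layer dot products."""
    return sum(u * i for u, i in zip(final[user], final[num_users + item]))


# Toy graph: 2 users, 2 items, three observed interactions (made-up data).
interactions = [(0, 0), (1, 0), (1, 1)]
norm_adj = normalized_adjacency(2, 2, interactions)
initial = [[0.1, 0.2], [0.3, 0.1], [0.2, 0.4], [0.5, 0.3]]  # users, then items
final = propagate(norm_adj, initial, num_layers=2)
print(round(score(final, 2, user=0, item=1), 4))
```

Because every step is a matrix product, the whole propagation stays linear in the initial embeddings, which is the property the abstract credits for easier training on large graphs.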

An Innovative Approach for the Protection of Healthcare Information Through the End-to-End Pseudo-Anonymization of End-Users

Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering - Internet of Things. User-Centric IoT ◽

10.1007/978-3-319-19656-5_30 ◽

2015 ◽

pp. 210-216

Author(s):

Panagiotis Gouvas ◽

Anastasios Zafeiropoulos ◽

Konstantinos Perakis ◽

Thanasis Bouras

Keyword(s):

End Users ◽

Innovative Approach ◽

Healthcare Information ◽

End To End
