APSIPA Transactions on Signal and Information Processing

Analyzing public opinion on COVID-19 through different perspectives and stages

APSIPA Transactions on Signal and Information Processing ◽

10.1017/atsip.2021.5 ◽

2021 ◽

pp. 1-13

Author(s):

Yuqi Gao ◽

Hang Hua ◽

Jiebo Luo

Keyword(s):

Public Opinion

Demystifying data and AI for manufacturing: case studies from a major computer maker

APSIPA Transactions on Signal and Information Processing ◽

10.1017/atsip.2021.3 ◽

2021 ◽

Vol 10 ◽

Author(s):

Yi-Chun Chen ◽

Bo-Huei He ◽

Shih-Sung Lin ◽

Jonathan Hans Soeseno ◽

Daniel Stanley Tan ◽

...

Keyword(s):

Training Data ◽

Smart Manufacturing ◽

Reliable System ◽

Process Innovations ◽

Optical Inspection ◽

Technical Details ◽

Automation Technology ◽

Electronic Parts ◽

Traditional Approaches ◽

Inspection Machine

In this article, we discuss the backgrounds and technical details about several smart manufacturing projects in a tier-one electronics manufacturing facility. We devise a process to manage logistic forecast and inventory preparation for electronic parts using historical data and a recurrent neural network to achieve significant improvement over current methods. We present a system for automatically qualifying laptop software for mass production through computer vision and automation technology. The result is a reliable system that can save hundreds of man-years in the qualification process. Finally, we create a deep learning-based algorithm for visual inspection of product appearances, which requires significantly less defect training data compared to traditional approaches. For production needs, we design an automatic optical inspection machine suitable for our algorithm and process. We also discuss the issues for data collection and enabling smart manufacturing projects in a factory setting, where the projects operate on a delicate balance between process innovations and cost-saving measures.

Audio-to-score singing transcription based on a CRNN-HSMM hybrid model

APSIPA Transactions on Signal and Information Processing ◽

10.1017/atsip.2021.4 ◽

2021 ◽

Vol 10 ◽

Author(s):

Ryo Nishikimi ◽

Eita Nakamura ◽

Masataka Goto ◽

Kazuyoshi Yoshii

Keyword(s):

Hybrid Model ◽

Viterbi Algorithm ◽

Language Model ◽

Expressive Power ◽

Acoustic Model ◽

Musical Language ◽

Musical Score ◽

Singing Voice ◽

Generative Process ◽

Music Signal

This paper describes an automatic singing transcription (AST) method that estimates a human-readable musical score of a sung melody from an input music signal. Because of the considerable pitch and temporal variation of a singing voice, a naive cascading approach that estimates an F0 contour and quantizes it with estimated tatum times cannot avoid many pitch and rhythm errors. To solve this problem, we formulate a unified generative model of a music signal that consists of a semi-Markov language model representing the generative process of latent musical notes conditioned on musical keys and an acoustic model based on a convolutional recurrent neural network (CRNN) representing the generative process of an observed music signal from the notes. The resulting CRNN-HSMM hybrid model enables us to estimate the most-likely musical notes from a music signal with the Viterbi algorithm, while leveraging both the grammatical knowledge about musical notes and the expressive power of the CRNN. The experimental results showed that the proposed method outperformed the conventional state-of-the-art method and the integration of the musical language model with the acoustic model has a positive effect on the AST performance.

Laplacian networks: bounding indicator function smoothness for neural networks robustness

APSIPA Transactions on Signal and Information Processing ◽

10.1017/atsip.2021.2 ◽

2021 ◽

Vol 10 ◽

Author(s):

Carlos Lassance ◽

Vincent Gripon ◽

Antonio Ortega

Keyword(s):

Neural Networks ◽

Deep Learning ◽

Supervised Learning ◽

Indicator Function ◽

Training Data ◽

Theoretical Justification ◽

The Past ◽

Noisy Examples

For the past few years, deep learning (DL) robustness (i.e. the ability to maintain the same decision when inputs are subject to perturbations) has become a question of paramount importance, in particular in settings where misclassification can have dramatic consequences. To address this question, authors have proposed different approaches, such as adding regularizers or training using noisy examples. In this paper we introduce a regularizer based on the Laplacian of similarity graphs obtained from the representation of training data at each layer of the DL architecture. This regularizer penalizes large changes (across consecutive layers in the architecture) in the distance between examples of different classes, and as such enforces smooth variations of the class boundaries. We provide theoretical justification for this regularizer and demonstrate its effectiveness to improve robustness on classical supervised learning vision datasets for various types of perturbations. We also show it can be combined with existing methods to increase overall robustness.

TGHop: an explainable, efficient, and lightweight method for texture generation

APSIPA Transactions on Signal and Information Processing ◽

10.1017/atsip.2021.15 ◽

2021 ◽

Vol 10 ◽

Author(s):

Xuejing Lei ◽

Ganning Zhao ◽

Kaitai Zhang ◽

C.-C. Jay Kuo

Keyword(s):

Deep Neural Networks ◽

High Quality ◽

Real Samples ◽

The Core ◽

Fast Speed ◽

Large Size ◽

Texture Generation ◽

Model Size ◽

Small Model ◽

Computationally Expensive

An explainable, efficient, and lightweight method for texture generation, called TGHop (an acronym of Texture Generation PixelHop), is proposed in this work. Although synthesis of visually pleasant texture can be achieved by deep neural networks, the associated models are large in size, difficult to explain in theory, and computationally expensive in training. In contrast, TGHop is small in its model size, mathematically transparent, efficient in training and inference, and able to generate high-quality texture. Given an exemplary texture, TGHop first crops many sample patches out of it to form a collection of sample patches called the source. Then, it analyzes pixel statistics of samples from the source and obtains a sequence of fine-to-coarse subspaces for these patches by using the PixelHop++ framework. To generate texture patches with TGHop, we begin with the coarsest subspace, which is called the core, and attempt to generate samples in each subspace by following the distribution of real samples. Finally, texture patches are stitched to form texture images of a large size. It is demonstrated by experimental results that TGHop can generate texture images of superior quality with a small model size and at a fast speed.

Robust deep convolutional neural network against image distortions

APSIPA Transactions on Signal and Information Processing ◽

10.1017/atsip.2021.14 ◽

2021 ◽

Vol 10 ◽

Author(s):

Liang-Yao Wang ◽

Sau-Gee Chen ◽

Feng-Tsun Chien

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

High Frequency ◽

Input Image ◽

Processing Unit ◽

Different Types ◽

Hybrid Module ◽

Frequency Components ◽

Discrete Wavelets ◽

Value Decomposition

Many approaches have been proposed in the literature to enhance the robustness of Convolutional Neural Network (CNN)-based architectures against image distortions. Attempts to combat various types of distortions can be made by combining multiple expert networks, each trained by a certain type of distorted images, which however lead to a large model with high complexity. In this paper, we propose a CNN-based architecture with a pre-processing unit in which only undistorted data are used for training. The pre-processing unit employs discrete cosine transform (DCT) and discrete wavelets transform (DWT) to remove high-frequency components while capturing prominent high-frequency features in the undistorted data by means of random selection. We further utilize the singular value decomposition (SVD) to extract features before feeding the preprocessed data into the CNN for training. During testing, distorted images directly enter the CNN for classification without having to go through the hybrid module. Five different types of distortions are produced in the SVHN dataset and the CIFAR-10/100 datasets. Experimental results show that the proposed DCT-DWT-SVD module built upon the CNN architecture provides a classifier robust to input image distortions, outperforming the state-of-the-art approaches in terms of accuracy under different types of distortions.

Automatic Deception Detection using Multiple Speech and Language Communicative Descriptors in Dialogs

APSIPA Transactions on Signal and Information Processing ◽

10.1017/atsip.2021.6 ◽

2021 ◽

Vol 10 ◽

Author(s):

Huang-Cheng Chou ◽

Yi-Wen Liu ◽

Chi-Chun Lee

Keyword(s):

Domain Knowledge ◽

Deception Detection ◽

Temporal Dynamics ◽

Human Life ◽

Acoustic Features ◽

Textual Information ◽

Behavior Understanding ◽

Proposed Model ◽

Deceptive Behavior ◽

Textual Features

While deceptive behaviors are a natural part of human life, it is well known that human is generally bad at detecting deception. In this study, we present an automatic deception detection framework by comprehensively integrating prior domain knowledge in deceptive behavior understanding. Specifically, we compute acoustics, textual information, implicatures with non-verbal behaviors, and conversational temporal dynamics for improving automatic deception detection in dialogs. The proposed model reaches start-of-the-art performance on the Daily Deceptive Dialogues corpus of Mandarin (DDDM) database, 80.61% unweighted accuracy recall in deception recognition. In the further analyses, we reveal that (i) the deceivers’ deception behaviors can be observed from the interrogators’ behaviors in the conversational temporal dynamics features and (ii) some of the acoustic features (e.g. loudness and MFCC) and textual features are significant and effective indicators to detect deception behaviors.

A protection method of trained CNN model with a secret key from unauthorized access

APSIPA Transactions on Signal and Information Processing ◽

10.1017/atsip.2021.9 ◽

2021 ◽

Vol 10 ◽

Author(s):

AprilPyone Maungmaung ◽

Hitoshi Kiya

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

State Of The Art ◽

Network Models ◽

The State ◽

Secret Key ◽

Neural Network Models ◽

Unauthorized Access ◽

Protection Method ◽

Novel Method

In this paper, we propose a novel method for protecting convolutional neural network models with a secret key set so that unauthorized users without the correct key set cannot access trained models. The method enables us to protect not only from copyright infringement but also the functionality of a model from unauthorized access without any noticeable overhead. We introduce three block-wise transformations with a secret key set to generate learnable transformed images: pixel shuffling, negative/positive transformation, and format-preserving Feistel-based encryption. Protected models are trained by using transformed images. The results of experiments with the CIFAR and ImageNet datasets show that the performance of a protected model was close to that of non-protected models when the key set was correct, while the accuracy severely dropped when an incorrect key set was given. The protected model was also demonstrated to be robust against various attacks. Compared with the state-of-the-art model protection with passports, the proposed method does not have any additional layers in the network, and therefore, there is no overhead during training and inference processes.

Cross-layer knowledge distillation with KL divergence and offline ensemble for compressing deep neural network

APSIPA Transactions on Signal and Information Processing ◽

10.1017/atsip.2021.16 ◽

2021 ◽

Vol 10 ◽

Author(s):

Hsing-Hung Chou ◽

Ching-Te Chiu ◽

Yi-Ping Liao

Keyword(s):

Semantic Segmentation ◽

Experimental Results ◽

Cross Layer ◽

Compression Rate ◽

Student Model ◽

Kl Divergence ◽

Knowledge Distillation ◽

Student Models ◽

Trained Teachers ◽

High Level

Deep neural networks (DNN) have solved many tasks, including image classification, object detection, and semantic segmentation. However, when there are huge parameters and high level of computation associated with a DNN model, it becomes difficult to deploy on mobile devices. To address this difficulty, we propose an efficient compression method that can be split into three parts. First, we propose a cross-layer matrix to extract more features from the teacher's model. Second, we adopt Kullback Leibler (KL) Divergence in an offline environment to make the student model find a wider robust minimum. Finally, we propose the offline ensemble pre-trained teachers to teach a student model. To address dimension mismatch between teacher and student models, we adopt a $1\times 1$ convolution and two-stage knowledge distillation to release this constraint. We conducted experiments with VGG and ResNet models, using the CIFAR-100 dataset. With VGG-11 as the teacher's model and VGG-6 as the student's model, experimental results showed that the Top-1 accuracy increased by 3.57% with a $2.08\times$ compression rate and 3.5x computation rate. With ResNet-32 as the teacher's model and ResNet-8 as the student's model, experimental results showed that Top-1 accuracy increased by 4.38% with a $6.11\times$ compression rate and $5.27\times$ computation rate. In addition, we conducted experiments using the ImageNet $64\times 64$ dataset. With MobileNet-16 as the teacher's model and MobileNet-9 as the student's model, experimental results showed that the Top-1 accuracy increased by 3.98% with a $1.59\times$ compression rate and $2.05\times$ computation rate.

Two-stage pyramidal convolutional neural networks for image colorization

APSIPA Transactions on Signal and Information Processing ◽

10.1017/atsip.2021.13 ◽

2021 ◽

Vol 10 ◽

Author(s):

Yu-Jen Wei ◽

Tsu-Tsai Wei ◽

Tien-Ying Kuo ◽

Po-Chyi Su

Keyword(s):

Neural Network ◽

Neural Networks ◽

Deep Learning ◽

Computational Complexity ◽

Color Space ◽

Research Trend ◽

Two Stage ◽

Map Generation ◽

Generation Network ◽

Chroma Components

The development of colorization algorithms through deep learning has become the current research trend. These algorithms colorize grayscale images automatically and quickly, but the colors produced are usually subdued and have low saturation. This research addresses this issue of existing algorithms by presenting a two-stage convolutional neural network (CNN) structure with the first and second stages being a chroma map generation network and a refinement network, respectively. To begin, we convert the color space of an image from RGB to HSV to predict its low-resolution chroma components and therefore reduce the computational complexity. Following that, the first-stage output is zoomed in and its detail is enhanced with a pyramidal CNN, resulting in a colorized image. Experiments show that, while using fewer parameters, our methodology produces results with more realistic color and higher saturation than existing methods.

APSIPA Transactions on Signal and Information Processing
Latest Publications

TOTAL DOCUMENTS

H-INDEX

Published By Cambridge University Press

Analyzing public opinion on COVID-19 through different perspectives and stages

Demystifying data and AI for manufacturing: case studies from a major computer maker

Audio-to-score singing transcription based on a CRNN-HSMM hybrid model

Laplacian networks: bounding indicator function smoothness for neural networks robustness

TGHop: an explainable, efficient, and lightweight method for texture generation

Robust deep convolutional neural network against image distortions

Automatic Deception Detection using Multiple Speech and Language Communicative Descriptors in Dialogs

A protection method of trained CNN model with a secret key from unauthorized access

Cross-layer knowledge distillation with KL divergence and offline ensemble for compressing deep neural network

Two-stage pyramidal convolutional neural networks for image colorization

Export Citation Format

APSIPA Transactions on Signal and Information ProcessingLatest Publications

TOTAL DOCUMENTS

H-INDEX

Published By Cambridge University Press

Analyzing public opinion on COVID-19 through different perspectives and stages

Demystifying data and AI for manufacturing: case studies from a major computer maker

Audio-to-score singing transcription based on a CRNN-HSMM hybrid model

Laplacian networks: bounding indicator function smoothness for neural networks robustness

TGHop: an explainable, efficient, and lightweight method for texture generation

Robust deep convolutional neural network against image distortions

Automatic Deception Detection using Multiple Speech and Language Communicative Descriptors in Dialogs

A protection method of trained CNN model with a secret key from unauthorized access

Cross-layer knowledge distillation with KL divergence and offline ensemble for compressing deep neural network

Two-stage pyramidal convolutional neural networks for image colorization

APSIPA Transactions on Signal and Information Processing
Latest Publications