scholarly journals CircConv: A Structured Convolution with Low Complexity

Author(s):  
Siyu Liao ◽  
Bo Yuan

Deep neural networks (DNNs), especially deep convolutional neural networks (CNNs), have emerged as the powerful technique in various machine learning applications. However, the large model sizes of DNNs yield high demands on computation resource and weight storage, thereby limiting the practical deployment of DNNs. To overcome these limitations, this paper proposes to impose the circulant structure to the construction of convolutional layers, and hence leads to circulant convolutional layers (CircConvs) and circulant CNNs. The circulant structure and models can be either trained from scratch or re-trained from a pre-trained non-circulant model, thereby making it very flexible for different training environments. Through extensive experiments, such strong structureimposing approach is proved to be able to substantially reduce the number of parameters of convolutional layers and enable significant saving of computational cost by using fast multiplication of the circulant tensor.

2021 ◽  
pp. 1-47
Author(s):  
Yeonjong Shin

Deep neural networks have been used in various machine learning applications and achieved tremendous empirical successes. However, training deep neural networks is a challenging task. Many alternatives have been proposed in place of end-to-end back-propagation. Layer-wise training is one of them, which trains a single layer at a time, rather than trains the whole layers simultaneously. In this paper, we study a layer-wise training using a block coordinate gradient descent (BCGD) for deep linear networks. We establish a general convergence analysis of BCGD and found the optimal learning rate, which results in the fastest decrease in the loss. We identify the effects of depth, width, and initialization. When the orthogonal-like initialization is employed, we show that the width of intermediate layers plays no role in gradient-based training beyond a certain threshold. Besides, we found that the use of deep networks could drastically accelerate convergence when it is compared to those of a depth 1 network, even when the computational cost is considered. Numerical examples are provided to justify our theoretical findings and demonstrate the performance of layer-wise training by BCGD.


2020 ◽  
Author(s):  
Jingbai Li ◽  
Patrick Reiser ◽  
André Eberhard ◽  
Pascal Friederich ◽  
Steven Lopez

<p>Photochemical reactions are being increasingly used to construct complex molecular architectures with mild and straightforward reaction conditions. Computational techniques are increasingly important to understand the reactivities and chemoselectivities of photochemical isomerization reactions because they offer molecular bonding information along the excited-state(s) of photodynamics. These photodynamics simulations are resource-intensive and are typically limited to 1–10 picoseconds and 1,000 trajectories due to high computational cost. Most organic photochemical reactions have excited-state lifetimes exceeding 1 picosecond, which places them outside possible computational studies. Westermeyr <i>et al.</i> demonstrated that a machine learning approach could significantly lengthen photodynamics simulation times for a model system, methylenimmonium cation (CH<sub>2</sub>NH<sub>2</sub><sup>+</sup>).</p><p>We have developed a Python-based code, Python Rapid Artificial Intelligence <i>Ab Initio</i> Molecular Dynamics (PyRAI<sup>2</sup>MD), to accomplish the unprecedented 10 ns <i>cis-trans</i> photodynamics of <i>trans</i>-hexafluoro-2-butene (CF<sub>3</sub>–CH=CH–CF<sub>3</sub>) in 3.5 days. The same simulation would take approximately 58 years with ground-truth multiconfigurational dynamics. We proposed an innovative scheme combining Wigner sampling, geometrical interpolations, and short-time quantum chemical trajectories to effectively sample the initial data, facilitating the adaptive sampling to generate an informative and data-efficient training set with 6,232 data points. Our neural networks achieved chemical accuracy (mean absolute error of 0.032 eV). Our 4,814 trajectories reproduced the S<sub>1</sub> half-life (60.5 fs), the photochemical product ratio (<i>trans</i>: <i>cis</i> = 2.3: 1), and autonomously discovered a pathway towards a carbene. The neural networks have also shown the capability of generalizing the full potential energy surface with chemically incomplete data (<i>trans</i> → <i>cis</i> but not <i>cis</i> → <i>trans</i> pathways) that may offer future automated photochemical reaction discoveries.</p>


2021 ◽  
Vol 11 (15) ◽  
pp. 6704
Author(s):  
Jingyong Cai ◽  
Masashi Takemoto ◽  
Yuming Qiu ◽  
Hironori Nakajo

Despite being heavily used in the training of deep neural networks (DNNs), multipliers are resource-intensive and insufficient in many different scenarios. Previous discoveries have revealed the superiority when activation functions, such as the sigmoid, are calculated by shift-and-add operations, although they fail to remove multiplications in training altogether. In this paper, we propose an innovative approach that can convert all multiplications in the forward and backward inferences of DNNs into shift-and-add operations. Because the model parameters and backpropagated errors of a large DNN model are typically clustered around zero, these values can be approximated by their sine values. Multiplications between the weights and error signals are transferred to multiplications of their sine values, which are replaceable with simpler operations with the help of the product to sum formula. In addition, a rectified sine activation function is utilized for further converting layer inputs into sine values. In this way, the original multiplication-intensive operations can be computed through simple add-and-shift operations. This trigonometric approximation method provides an efficient training and inference alternative for devices with insufficient hardware multipliers. Experimental results demonstrate that this method is able to obtain a performance close to that of classical training algorithms. The approach we propose sheds new light on future hardware customization research for machine learning.


Author(s):  
Chen Qi ◽  
Shibo Shen ◽  
Rongpeng Li ◽  
Zhifeng Zhao ◽  
Qing Liu ◽  
...  

AbstractNowadays, deep neural networks (DNNs) have been rapidly deployed to realize a number of functionalities like sensing, imaging, classification, recognition, etc. However, the computational-intensive requirement of DNNs makes it difficult to be applicable for resource-limited Internet of Things (IoT) devices. In this paper, we propose a novel pruning-based paradigm that aims to reduce the computational cost of DNNs, by uncovering a more compact structure and learning the effective weights therein, on the basis of not compromising the expressive capability of DNNs. In particular, our algorithm can achieve efficient end-to-end training that transfers a redundant neural network to a compact one with a specifically targeted compression rate directly. We comprehensively evaluate our approach on various representative benchmark datasets and compared with typical advanced convolutional neural network (CNN) architectures. The experimental results verify the superior performance and robust effectiveness of our scheme. For example, when pruning VGG on CIFAR-10, our proposed scheme is able to significantly reduce its FLOPs (floating-point operations) and number of parameters with a proportion of 76.2% and 94.1%, respectively, while still maintaining a satisfactory accuracy. To sum up, our scheme could facilitate the integration of DNNs into the common machine-learning-based IoT framework and establish distributed training of neural networks in both cloud and edge.


Electronics ◽  
2021 ◽  
Vol 10 (13) ◽  
pp. 1511
Author(s):  
Taylor Simons ◽  
Dah-Jye Lee

There has been a recent surge in publications related to binarized neural networks (BNNs), which use binary values to represent both the weights and activations in deep neural networks (DNNs). Due to the bitwise nature of BNNs, there have been many efforts to implement BNNs on ASICs and FPGAs. While BNNs are excellent candidates for these kinds of resource-limited systems, most implementations still require very large FPGAs or CPU-FPGA co-processing systems. Our work focuses on reducing the computational cost of BNNs even further, making them more efficient to implement on FPGAs. We target embedded visual inspection tasks, like quality inspection sorting on manufactured parts and agricultural produce sorting. We propose a new binarized convolutional layer, called the neural jet features layer, that learns well-known classic computer vision kernels that are efficient to calculate as a group. We show that on visual inspection tasks, neural jet features perform comparably to standard BNN convolutional layers while using less computational resources. We also show that neural jet features tend to be more stable than BNN convolution layers when training small models.


SLEEP ◽  
2021 ◽  
Vol 44 (Supplement_2) ◽  
pp. A164-A164
Author(s):  
Pahnwat Taweesedt ◽  
JungYoon Kim ◽  
Jaehyun Park ◽  
Jangwoon Park ◽  
Munish Sharma ◽  
...  

Abstract Introduction Obstructive sleep apnea (OSA) is a common sleep-related breathing disorder with an estimation of one billion people. Full-night polysomnography is considered the gold standard for OSA diagnosis. However, it is time-consuming, expensive and is not readily available in many parts of the world. Many screening questionnaires and scores have been proposed for OSA prediction with high sensitivity and low specificity. The present study is intended to develop models with various machine learning techniques to predict the severity of OSA by incorporating features from multiple questionnaires. Methods Subjects who underwent full-night polysomnography in Torr sleep center, Texas and completed 5 OSA screening questionnaires/scores were included. OSA was diagnosed by using Apnea-Hypopnea Index ≥ 5. We trained five different machine learning models including Deep Neural Networks with the scaled principal component analysis (DNN-PCA), Random Forest (RF), Adaptive Boosting classifier (ABC), and K-Nearest Neighbors classifier (KNC) and Support Vector Machine Classifier (SVMC). Training:Testing subject ratio of 65:35 was used. All features including demographic data, body measurement, snoring and sleepiness history were obtained from 5 OSA screening questionnaires/scores (STOP-BANG questionnaires, Berlin questionnaires, NoSAS score, NAMES score and No-Apnea score). Performance parametrics were used to compare between machine learning models. Results Of 180 subjects, 51.5 % of subjects were male with mean (SD) age of 53.6 (15.1). One hundred and nineteen subjects were diagnosed with OSA. Area Under the Receiver Operating Characteristic Curve (AUROC) of DNN-PCA, RF, ABC, KNC, SVMC, STOP-BANG questionnaire, Berlin questionnaire, NoSAS score, NAMES score, and No-Apnea score were 0.85, 0.68, 0.52, 0.74, 0.75, 0.61, 0.63, 0,61, 0.58 and 0,58 respectively. DNN-PCA showed the highest AUROC with sensitivity of 0.79, specificity of 0.67, positive-predictivity of 0.93, F1 score of 0.86, and accuracy of 0.77. Conclusion Our result showed that DNN-PCA outperforms OSA screening questionnaires, scores and other machine learning models. Support (if any):


2021 ◽  
Vol 11 (7) ◽  
pp. 3184
Author(s):  
Ismael Garrido-Muñoz  ◽  
Arturo Montejo-Ráez  ◽  
Fernando Martínez-Santiago  ◽  
L. Alfonso Ureña-López 

Deep neural networks are hegemonic approaches to many machine learning areas, including natural language processing (NLP). Thanks to the availability of large corpora collections and the capability of deep architectures to shape internal language mechanisms in self-supervised learning processes (also known as “pre-training”), versatile and performing models are released continuously for every new network design. These networks, somehow, learn a probability distribution of words and relations across the training collection used, inheriting the potential flaws, inconsistencies and biases contained in such a collection. As pre-trained models have been found to be very useful approaches to transfer learning, dealing with bias has become a relevant issue in this new scenario. We introduce bias in a formal way and explore how it has been treated in several networks, in terms of detection and correction. In addition, available resources are identified and a strategy to deal with bias in deep NLP is proposed.


Algorithms ◽  
2021 ◽  
Vol 14 (2) ◽  
pp. 39
Author(s):  
Carlos Lassance ◽  
Vincent Gripon ◽  
Antonio Ortega

Deep Learning (DL) has attracted a lot of attention for its ability to reach state-of-the-art performance in many machine learning tasks. The core principle of DL methods consists of training composite architectures in an end-to-end fashion, where inputs are associated with outputs trained to optimize an objective function. Because of their compositional nature, DL architectures naturally exhibit several intermediate representations of the inputs, which belong to so-called latent spaces. When treated individually, these intermediate representations are most of the time unconstrained during the learning process, as it is unclear which properties should be favored. However, when processing a batch of inputs concurrently, the corresponding set of intermediate representations exhibit relations (what we call a geometry) on which desired properties can be sought. In this work, we show that it is possible to introduce constraints on these latent geometries to address various problems. In more detail, we propose to represent geometries by constructing similarity graphs from the intermediate representations obtained when processing a batch of inputs. By constraining these Latent Geometry Graphs (LGGs), we address the three following problems: (i) reproducing the behavior of a teacher architecture is achieved by mimicking its geometry, (ii) designing efficient embeddings for classification is achieved by targeting specific geometries, and (iii) robustness to deviations on inputs is achieved via enforcing smooth variation of geometry between consecutive latent spaces. Using standard vision benchmarks, we demonstrate the ability of the proposed geometry-based methods in solving the considered problems.


Author(s):  
E. Yu. Shchetinin

The recognition of human emotions is one of the most relevant and dynamically developing areas of modern speech technologies, and the recognition of emotions in speech (RER) is the most demanded part of them. In this paper, we propose a computer model of emotion recognition based on an ensemble of bidirectional recurrent neural network with LSTM memory cell and deep convolutional neural network ResNet18. In this paper, computer studies of the RAVDESS database containing emotional speech of a person are carried out. RAVDESS-a data set containing 7356 files. Entries contain the following emotions: 0 – neutral, 1 – calm, 2 – happiness, 3 – sadness, 4 – anger, 5 – fear, 6 – disgust, 7 – surprise. In total, the database contains 16 classes (8 emotions divided into male and female) for a total of 1440 samples (speech only). To train machine learning algorithms and deep neural networks to recognize emotions, existing audio recordings must be pre-processed in such a way as to extract the main characteristic features of certain emotions. This was done using Mel-frequency cepstral coefficients, chroma coefficients, as well as the characteristics of the frequency spectrum of audio recordings. In this paper, computer studies of various models of neural networks for emotion recognition are carried out on the example of the data described above. In addition, machine learning algorithms were used for comparative analysis. Thus, the following models were trained during the experiments: logistic regression (LR), classifier based on the support vector machine (SVM), decision tree (DT), random forest (RF), gradient boosting over trees – XGBoost, convolutional neural network CNN, recurrent neural network RNN (ResNet18), as well as an ensemble of convolutional and recurrent networks Stacked CNN-RNN. The results show that neural networks showed much higher accuracy in recognizing and classifying emotions than the machine learning algorithms used. Of the three neural network models presented, the CNN + BLSTM ensemble showed higher accuracy.


Sign in / Sign up

Export Citation Format

Share Document