Text classification by untrained sentence embeddings

2021 ◽  
Vol 14 (2) ◽  
pp. 245-259
Author(s):  
Daniele Di Sarli ◽  
Claudio Gallicchio ◽  
Alessio Micheli

Recurrent Neural Networks (RNNs) represent a natural paradigm for modeling sequential data such as text written in natural language. RNNs and their variations have long been the architecture of choice in many applications; in practice, however, they require elaborate architectures (such as gating mechanisms) and computationally heavy training processes. In this paper we address the question of whether it is possible to generate sentence embeddings via completely untrained recurrent dynamics, on top of which a simple learning algorithm is applied for text classification. This would yield models that are extremely efficient in terms of training time. Our work investigates the extent to which this approach can be used by analyzing the results on different tasks. Finally, we show that, within certain limits, it is possible to build extremely efficient models for text classification that remain competitive in accuracy with state-of-the-art reference models.
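
As a rough illustration of the idea in this abstract, the following sketch embeds a sequence with a fixed, randomly initialized recurrence and trains only a linear readout by ridge regression. It is a minimal sketch in the spirit of reservoir computing, not the authors' model: the function names, the uniform initialization, the spectral-radius rescaling, the use of the final state as the embedding, and the toy data are all assumptions made for this example.

```python
import numpy as np

def make_reservoir(input_dim, hidden_dim, spectral_radius=0.9, seed=0):
    """Create fixed random weights; they are never trained (assumed init scheme)."""
    rng = np.random.default_rng(seed)
    w_in = rng.uniform(-1.0, 1.0, size=(hidden_dim, input_dim))
    w_rec = rng.uniform(-1.0, 1.0, size=(hidden_dim, hidden_dim))
    # Rescale so the largest eigenvalue magnitude equals spectral_radius.
    w_rec *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w_rec)))
    return w_in, w_rec

def embed(sequence, w_in, w_rec):
    """Run the untrained recurrence over a (T, input_dim) array; return the final state."""
    state = np.zeros(w_rec.shape[0])
    for x_t in sequence:
        state = np.tanh(w_in @ x_t + w_rec @ state)
    return state

def fit_ridge_readout(embeddings, targets, reg=1e-2):
    """Closed-form ridge regression; this readout is the only trained component."""
    h = embeddings
    return np.linalg.solve(h.T @ h + reg * np.eye(h.shape[1]), h.T @ targets)

# Toy usage: 20 random "sentences" of 7 token vectors each, 2 classes (made-up data).
rng = np.random.default_rng(1)
w_in, w_rec = make_reservoir(input_dim=4, hidden_dim=16)
sentences = rng.standard_normal((20, 7, 4))
labels = rng.integers(0, 2, size=20)
H = np.stack([embed(s, w_in, w_rec) for s in sentences])  # (20, 16) embeddings
Y = np.eye(2)[labels]                                     # one-hot targets
w_out = fit_ridge_readout(H, Y)                           # (16, 2) readout weights
predictions = np.argmax(H @ w_out, axis=1)
```

Because the recurrent weights stay fixed, training reduces to one linear solve, which is where the efficiency claimed in the abstract would come from under these assumptions.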

Author(s):  
Sungrae Park ◽  
Kyungwoo Song ◽  
Mingi Ji ◽  
Wonsung Lee ◽  
Il-Chul Moon

Successful applications that process sequential data, such as text and speech, require improved generalization performance from recurrent neural networks (RNNs). Dropout techniques for RNNs were introduced to respond to these demands, but we conjecture that dropout on RNNs can be improved by adopting the adversarial concept. This paper investigates ways to improve dropout for RNNs by utilizing intentionally generated dropout masks. Specifically, the guided dropout used in this research is called adversarial dropout, which adversarially disconnects neurons that are dominantly used to predict correct targets over time. Our analysis showed that our regularizer, which consists of a gap between the original and the reconfigured RNNs, is an upper bound on the gap between the training and the inference phases of random dropout. We demonstrated that minimizing our regularizer improved the effectiveness of dropout for RNNs on sequential MNIST tasks, semi-supervised text classification tasks, and language modeling tasks.


2020 ◽  
Author(s):  
Yuyao Yang ◽  
Shuangjia Zheng ◽  
Shimin Su ◽  
Jun Xu ◽  
Hongming Chen

Fragment-based drug design represents a promising drug discovery paradigm complementary to the traditional HTS-based lead generation strategy. How to link fragment structures to increase compound affinity remains a challenging task in this paradigm. Here, a novel deep generative model (AutoLinker) for linking fragments is developed, with the potential for application in the fragment-based lead generation scenario. The state-of-the-art transformer architecture was employed to learn the linker grammar and generate novel linkers. Our results show that, given starting fragments and user-customized linker constraints, our AutoLinker model can design abundant drug-like molecules fulfilling these constraints, and its performance was superior to other reference models. Moreover, several examples showcase that AutoLinker can be a useful tool for carrying out drug design tasks such as fragment linking, lead optimization and scaffold hopping.


2020 ◽  
Author(s):  
Dean Sumner ◽  
Jiazhen He ◽  
Amol Thakkar ◽  
Ola Engkvist ◽  
Esben Jannik Bjerrum

SMILES randomization, a form of data augmentation, has previously been shown to increase the performance of deep learning models compared to non-augmented baselines. Here, we propose a novel data augmentation method we call “Levenshtein augmentation”, which considers local SMILES sub-sequence similarity between reactants and their respective products when creating training pairs. The performance of Levenshtein augmentation was tested using two state-of-the-art models: transformer and sequence-to-sequence based recurrent neural networks with attention. Levenshtein augmentation demonstrated increased performance over non-augmented data and over conventional SMILES-randomization-augmented data when used for training of baseline models. Furthermore, Levenshtein augmentation seemingly results in what we define as attentional gain, an enhancement in the pattern recognition capabilities of the underlying network to molecular motifs.
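
To make the string-similarity idea concrete, the sketch below computes the Levenshtein (edit) distance between two strings and picks, from several hand-written equivalent SMILES strings for a reactant, the one closest to a product string. This is only an assumption-laden illustration: the abstract speaks of local sub-sequence similarity, whereas this example uses plain global edit distance over characters, and the helper `closest_variant` and the ethanol/acetaldehyde strings are hypothetical choices for the example, not the authors' procedure.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

def closest_variant(variants, target):
    """Hypothetical helper: return the variant with the smallest distance to target."""
    return min(variants, key=lambda s: levenshtein(s, target))

# Three equivalent ways of writing ethanol, compared against acetaldehyde.
reactant_variants = ["OCC", "C(O)C", "CCO"]
product = "CC=O"
best = closest_variant(reactant_variants, product)
```

Note that character-level distance treats multi-character SMILES tokens such as Cl or Br as separate symbols; a tokenized variant would be needed to avoid that.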


Author(s):  
Ravi Kauthale

Abstract: The aim here is to explore methods for automating the labelling of information present in bug trackers and client support systems. This is mainly based on classifying the content according to some criteria, e.g., priority or product area. Labelling tickets is important because it helps in handling them effectively and efficiently and leads to quicker and more comprehensive resolution. The main goal of the project is to analyze the existing methodologies used for automated labelling and then apply a newer approach and compare the results. The existing methodologies include both those based on neural networks and those that do not use neural networks. In this project, a newer approach based on recurrent neural networks with a hierarchical attention paradigm will be used. Keywords: Automated Labelling, Recurrent Neural Networks, Hierarchical Attention, Multi-class Text Classification, GRU


2018 ◽  
Vol 8 (12) ◽  
pp. 2416 ◽  
Author(s):  
Ansi Zhang ◽  
Honglei Wang ◽  
Shaobo Li ◽  
Yuxin Cui ◽  
Zhonghao Liu ◽  
...  

Prognostics, such as remaining useful life (RUL) prediction, is a crucial task in condition-based maintenance. A major challenge in data-driven prognostics is the difficulty of obtaining a sufficient number of samples of failure progression. However, for traditional machine learning methods and deep neural networks, enough training data is a prerequisite for training good prediction models. In this work, we propose a transfer learning algorithm based on Bi-directional Long Short-Term Memory (BLSTM) recurrent neural networks for RUL estimation, in which the models are first trained on different but related datasets and then fine-tuned on the target dataset. Extensive experimental results show that transfer learning can in general improve the prediction models on datasets with a small number of samples. The one exception is that, when transferring from multi-type operating conditions to single operating conditions, transfer learning led to a worse result.


Author(s):  
Weixiang Xu ◽  
Xiangyu He ◽  
Tianli Zhao ◽  
Qinghao Hu ◽  
Peisong Wang ◽  
...  

Large neural networks are difficult to deploy on mobile devices because of intensive computation and storage. To alleviate this, we study ternarization, a balance between efficiency and accuracy that quantizes both weights and activations into ternary values. In previous ternarized neural networks, a hard threshold Δ is introduced to determine quantization intervals. Although the selection of Δ greatly affects the training results, previous works estimate Δ via an approximation or treat it as a hyper-parameter, which is suboptimal. In this paper, we present Soft Threshold Ternary Networks (STTN), which enable the model to automatically determine quantization intervals instead of depending on a hard threshold. Concretely, we replace the original ternary kernel with the addition of two binary kernels at training time, where ternary values are determined by the combination of two corresponding binary values. At inference time, we add up the two binary kernels to obtain a single ternary kernel. Our method dramatically outperforms the current state of the art, narrowing the performance gap between full-precision networks and extremely low-bit networks. Experiments on ImageNet with AlexNet (Top-1 55.6%) and ResNet-18 (Top-1 66.2%) achieve a new state of the art.
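
The decomposition described in this abstract can be illustrated numerically. The sketch below sums two binary kernels with entries in {-1, +1} and checks that, because convolution is linear, filtering with the two kernels separately and adding the results matches filtering once with their sum. It is a minimal sketch under stated assumptions: the summed kernel here takes values in {-2, 0, +2}, i.e. ternary up to a scale factor, and the abstract does not specify scaling or how the binary kernels are trained, so the random kernels, the helper `correlate2d_valid`, and the array sizes are illustrative only.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def correlate2d_valid(x, kernel):
    """Plain 'valid' 2D cross-correlation, as used in convolutional layers."""
    windows = sliding_window_view(x, kernel.shape)  # (H-kh+1, W-kw+1, kh, kw)
    return np.einsum("ijkl,kl->ij", windows, kernel)

rng = np.random.default_rng(0)

# Two binary kernels with entries in {-1, +1} (random here; learned in practice).
b1 = np.where(rng.standard_normal((3, 3)) >= 0, 1.0, -1.0)
b2 = np.where(rng.standard_normal((3, 3)) >= 0, 1.0, -1.0)

# Their sum has only three possible values: -2, 0, +2.
ternary = b1 + b2

x = rng.standard_normal((8, 8))
two_branch = correlate2d_valid(x, b1) + correlate2d_valid(x, b2)  # training-time form
single = correlate2d_valid(x, ternary)                            # inference-time form
```

An entry of the summed kernel is zero exactly where the two binary kernels disagree, which is how the zero interval can emerge from the binary values rather than from a fixed threshold.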


Author(s):  
Yun-Peng Liu ◽  
Ning Xu ◽  
Yu Zhang ◽  
Xin Geng

The performance of deep neural networks (DNNs) crucially relies on the quality of labeling. In some situations, labels are easily corrupted and become noisy. Thus, designing algorithms that deal with noisy labels is of great importance for learning robust DNNs. However, it is difficult to distinguish between clean labels and noisy labels, which becomes the bottleneck of many methods. To address the problem, this paper proposes a novel method named Label Distribution based Confidence Estimation (LDCE). LDCE estimates the confidence of the observed labels based on label distribution. Then, the boundary between clean labels and noisy labels becomes clear according to confidence scores. To verify the effectiveness of the method, LDCE is combined with an existing learning algorithm to train robust DNNs. Experiments on both synthetic and real-world datasets substantiate the superiority of the proposed algorithm over state-of-the-art methods.


Author(s):  
Maosheng Guo ◽  
Yu Zhang ◽  
Ting Liu

Natural Language Inference (NLI) is an active research area, where numerous approaches based on recurrent neural networks (RNNs), convolutional neural networks (CNNs), and self-attention networks (SANs) have been proposed. Although they obtain impressive performance, previous recurrent approaches are hard to train in parallel, convolutional models tend to require more parameters, and self-attention networks are not good at capturing local dependencies in text. To address this problem, we introduce a Gaussian prior to the self-attention mechanism to better model the local structure of sentences. Then we propose an efficient RNN/CNN-free architecture named Gaussian Transformer for NLI, which consists of encoding blocks modeling both local and global dependency, high-order interaction blocks collecting the evidence of multi-step inference, and a lightweight comparison block that saves many parameters. Experiments show that our model achieves new state-of-the-art performance on both SNLI and MultiNLI benchmarks with significantly fewer parameters and considerably less training time. In addition, evaluation using the Hard NLI datasets demonstrates that our approach is less affected by undesirable annotation artifacts.
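
One common way to bias self-attention toward nearby tokens is to add a distance-dependent term to the attention logits before the softmax. The sketch below does this with a Gaussian log-prior over token positions. It is a hedged illustration of the general idea only: the exact parameterization used in the Gaussian Transformer is not given in the abstract, and the function name, the fixed `sigma`, and the single-head, unbatched setup are assumptions for this example.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def gaussian_prior_attention(q, k, v, sigma=2.0):
    """Scaled dot-product attention with an additive Gaussian position bias.

    q, k, v: arrays of shape (n_tokens, dim). sigma is an assumed fixed width.
    """
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)                     # content-based logits
    pos = np.arange(n)
    dist_sq = (pos[:, None] - pos[None, :]) ** 2      # squared token distance
    log_prior = -dist_sq / (2.0 * sigma ** 2)         # log of an unnormalized Gaussian
    weights = softmax(scores + log_prior, axis=-1)    # prior multiplies the weights
    return weights @ v, weights

# With identical queries and keys the content scores are uniform, so the
# resulting weights reflect the prior alone and decay with distance.
q = k = np.zeros((5, 8))
v = np.arange(40, dtype=float).reshape(5, 8)
output, weights = gaussian_prior_attention(q, k, v)
```

Adding the log-prior to the logits is equivalent to multiplying the softmax weights by the Gaussian and renormalizing, so distant tokens are down-weighted but never fully masked.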


2019 ◽  
Vol 9 (11) ◽  
pp. 2347 ◽  
Author(s):  
Hannah Kim ◽  
Young-Seob Jeong

As the amount of textual data increases exponentially, it becomes more important to develop models that analyze text data automatically. Texts may carry various labels such as gender, age, country, sentiment, and so forth. Using such labels may bring benefits to some industrial fields, so many studies of text classification have appeared. Recently, the Convolutional Neural Network (CNN) has been adopted for the task of text classification and has shown quite successful results. In this paper, we propose convolutional neural networks for the task of sentiment classification. Through experiments with three well-known datasets, we show that employing consecutive convolutional layers is effective for relatively longer texts, and that our networks outperform other state-of-the-art deep learning models.

