Revisiting LSTM Networks for Semi-Supervised Text Classification via Mixed Objective Function

Long-tailed Visual Recognition via Gaussian Clouded Logit Adjustment

10.36227/techrxiv.17031920.v1 ◽

2021 ◽

Author(s):

Mengke Li ◽

Yiu-ming Cheung ◽

Yang Lu

Keyword(s):

Visual Recognition ◽

Deep Neural Networks ◽

Sampling Strategy ◽

Cross Entropy ◽

Superior Performance ◽

Great Success ◽

Effective Number ◽

Entropy Loss ◽

Benchmark Datasets ◽

Varied Amplitude

Long-tailed data is still a big challenge for deep neural networks, even though they have achieved great success on balanced data. We observe that vanilla training on long-tailed data with cross-entropy loss makes the instance-rich head classes severely squeeze the spatial distribution of the tail classes, which leads to difficulty in classifying tail class samples. Furthermore, the original cross-entropy loss can only propagate gradient short-lively because the gradient in softmax form rapidly approaches zero as the logit difference increases. This phenomenon is called softmax saturation. It is unfavorable for training on balanced data, but can be utilized to adjust the validity of the samples in long-tailed data, thereby solving the distorted embedding space of long-tailed problems. To this end, this paper therefore proposes the Gaussian clouded logit adjustment by Gaussian perturbing different class logits with varied amplitude. We define the amplitude of perturbation as cloud size and set relatively large cloud sizes to tail classes. The large cloud size can reduce the softmax saturation and thereby making tail class samples more active as well as enlarging the embedding space. To alleviate the bias in the classifier, we accordingly propose the class-based effective number sampling strategy with classifier re-training. Extensive experiments on benchmark datasets validate the superior performance of the proposed method.

Download Full-text

Personal Interest Attention Graph Neural Networks for Session-Based Recommendation

Entropy ◽

10.3390/e23111500 ◽

2021 ◽

Vol 23 (11) ◽

pp. 1500

Author(s):

Xiangde Zhang ◽

Yuan Zhou ◽

Jianping Wang ◽

Xiaojun Lu

Keyword(s):

Neural Network ◽

Neural Networks ◽

Objective Function ◽

Cross Entropy ◽

Personal Interest ◽

Entropy Loss ◽

Convolutional Networks ◽

The Cross ◽

Graph Neural Networks

Session-based recommendations aim to predict a user’s next click based on the user’s current and historical sessions, which can be applied to shopping websites and APPs. Existing session-based recommendation methods cannot accurately capture the complex transitions between items. In addition, some approaches compress sessions into a fixed representation vector without taking into account the user’s interest preferences at the current moment, thus limiting the accuracy of recommendations. Considering the diversity of items and users’ interests, a personalized interest attention graph neural network (PIA-GNN) is proposed for session-based recommendation. This approach utilizes personalized graph convolutional networks (PGNN) to capture complex transitions between items, invoking an interest-aware mechanism to activate users’ interest in different items adaptively. In addition, a self-attention layer is used to capture long-term dependencies between items when capturing users’ long-term preferences. In this paper, the cross-entropy loss is used as the objective function to train our model. We conduct rich experiments on two real datasets, and the results show that PIA-GNN outperforms existing personalized session-aware recommendation methods.

Download Full-text

Can Cross Entropy Loss Be Robust to Label Noise?

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/305 ◽

2020 ◽

Author(s):

Lei Feng ◽

Senlin Shu ◽

Zhuoyi Lin ◽

Fengmao Lv ◽

Li Li ◽

...

Keyword(s):

Mean Squared Error ◽

Absolute Error ◽

Training Data ◽

Cross Entropy ◽

Loss Functions ◽

Label Noise ◽

Entropy Loss ◽

Squared Error ◽

Great Performance ◽

Benchmark Datasets

Trained with the standard cross entropy loss, deep neural networks can achieve great performance on correctly labeled data. However, if the training data is corrupted with label noise, deep models tend to overfit the noisy labels, thereby achieving poor generation performance. To remedy this issue, several loss functions have been proposed and demonstrated to be robust to label noise. Although most of the robust loss functions stem from Categorical Cross Entropy (CCE) loss, they fail to embody the intrinsic relationships between CCE and other loss functions. In this paper, we propose a general framework dubbed Taylor cross entropy loss to train deep models in the presence of label noise. Specifically, our framework enables to weight the extent of fitting the training labels by controlling the order of Taylor Series for CCE, hence it can be robust to label noise. In addition, our framework clearly reveals the intrinsic relationships between CCE and other loss functions, such as Mean Absolute Error (MAE) and Mean Squared Error (MSE). Moreover, we present a detailed theoretical analysis to certify the robustness of this framework. Extensive experimental results on benchmark datasets demonstrate that our proposed approach significantly outperforms the state-of-the-art counterparts.

Download Full-text

Long-tailed Visual Recognition via Gaussian Clouded Logit Adjustment

10.36227/techrxiv.17031920 ◽

2021 ◽

Author(s):

Mengke Li ◽

Yiu-ming Cheung ◽

Yang Lu

Keyword(s):

Visual Recognition ◽

Deep Neural Networks ◽

Sampling Strategy ◽

Cross Entropy ◽

Superior Performance ◽

Great Success ◽

Effective Number ◽

Entropy Loss ◽

Benchmark Datasets ◽

Varied Amplitude

Long-tailed data is still a big challenge for deep neural networks, even though they have achieved great success on balanced data. We observe that vanilla training on long-tailed data with cross-entropy loss makes the instance-rich head classes severely squeeze the spatial distribution of the tail classes, which leads to difficulty in classifying tail class samples. Furthermore, the original cross-entropy loss can only propagate gradient short-lively because the gradient in softmax form rapidly approaches zero as the logit difference increases. This phenomenon is called softmax saturation. It is unfavorable for training on balanced data, but can be utilized to adjust the validity of the samples in long-tailed data, thereby solving the distorted embedding space of long-tailed problems. To this end, this paper therefore proposes the Gaussian clouded logit adjustment by Gaussian perturbing different class logits with varied amplitude. We define the amplitude of perturbation as cloud size and set relatively large cloud sizes to tail classes. The large cloud size can reduce the softmax saturation and thereby making tail class samples more active as well as enlarging the embedding space. To alleviate the bias in the classifier, we accordingly propose the class-based effective number sampling strategy with classifier re-training. Extensive experiments on benchmark datasets validate the superior performance of the proposed method.

Download Full-text

On Parameter Adaptation in Softmax-Based Cross-Entropy Loss for Improved Convergence Speed and Accuracy in DNN-Based Speaker Recognition

10.21437/interspeech.2020-2264 ◽

2020 ◽

Author(s):

Magdalena Rybicka ◽

Konrad Kowalczyk

Keyword(s):

Speaker Recognition ◽

Convergence Speed ◽

Cross Entropy ◽

Parameter Adaptation ◽

Entropy Loss ◽

Speed And Accuracy

Download Full-text

Named Entity Recognition and Relation Extraction

ACM Computing Surveys ◽

10.1145/3445965 ◽

2021 ◽

Vol 54 (1) ◽

pp. 1-39

Author(s):

Zara Nasar ◽

Syed Waqar Jaffry ◽

Muhammad Kamran Malik

Keyword(s):

Deep Learning ◽

State Of The Art ◽

Named Entity Recognition ◽

Relation Extraction ◽

The State ◽

Entity Recognition ◽

Joint Models ◽

Named Entity ◽

Textual Data ◽

Benchmark Datasets

With the advent of Web 2.0, there exist many online platforms that result in massive textual-data production. With ever-increasing textual data at hand, it is of immense importance to extract information nuggets from this data. One approach towards effective harnessing of this unstructured textual data could be its transformation into structured text. Hence, this study aims to present an overview of approaches that can be applied to extract key insights from textual data in a structured way. For this, Named Entity Recognition and Relation Extraction are being majorly addressed in this review study. The former deals with identification of named entities, and the latter deals with problem of extracting relation between set of entities. This study covers early approaches as well as the developments made up till now using machine learning models. Survey findings conclude that deep-learning-based hybrid and joint models are currently governing the state-of-the-art. It is also observed that annotated benchmark datasets for various textual-data generators such as Twitter and other social forums are not available. This scarcity of dataset has resulted into relatively less progress in these domains. Additionally, the majority of the state-of-the-art techniques are offline and computationally expensive. Last, with increasing focus on deep-learning frameworks, there is need to understand and explain the under-going processes in deep architectures.

Download Full-text

Improving Neural Response Diversity with Frequency-Aware Cross-Entropy Loss

The World Wide Web Conference on - WWW '19 ◽

10.1145/3308558.3313415 ◽

2019 ◽

Cited By ~ 4

Author(s):

Shaojie Jiang ◽

Pengjie Ren ◽

Christof Monz ◽

Maarten de Rijke

Keyword(s):

Neural Response ◽

Cross Entropy ◽

Entropy Loss ◽

Response Diversity

Download Full-text

An Analysis of the Softmax Cross Entropy Loss for Learning-to-Rank with Binary Relevance

Proceedings of the 2019 ACM SIGIR International Conference on Theory of Information Retrieval - ICTIR '19 ◽

10.1145/3341981.3344221 ◽

2019 ◽

Cited By ~ 2

Author(s):

Sebastian Bruch ◽

Xuanhui Wang ◽

Michael Bendersky ◽

Marc Najork

Keyword(s):

Learning To Rank ◽

Cross Entropy ◽

Entropy Loss ◽

Binary Relevance

Download Full-text

Cross-Entropy Loss for Recommending Efficient Fold-Over Technique

Journal of Systems Science and Complexity ◽

10.1007/s11424-020-9267-9 ◽

2021 ◽

Vol 34 (1) ◽

pp. 402-439

Author(s):

Lin-Chen Weng ◽

A. M. Elsawah ◽

Kai-Tai Fang

Keyword(s):

Cross Entropy ◽

Entropy Loss

Download Full-text

SGAN4AbSum: A Semantic-Enhanced Generative Adversarial Network for Abstractive Text Summarization

10.21203/rs.3.rs-648146/v1 ◽

2021 ◽

Author(s):

Tham Vo

Keyword(s):

Ground Truth ◽

Text Summarization ◽

Generative Adversarial Network ◽

Convolutional Network ◽

Training Strategy ◽

Adversarial Network ◽

Deep Recurrent Neural Network ◽

Benchmark Datasets ◽

Latent Representations ◽

Abstractive Summarization

Abstract In abstractive summarization task, most of proposed models adopt the deep recurrent neural network (RNN)-based encoder-decoder architecture to learn and generate meaningful summary for a given input document. However, most of recent RNN-based models always suffer the challenges related to the involvement of much capturing high-frequency/reparative phrases in long documents during the training process which leads to the outcome of trivial and generic summaries are generated. Moreover, the lack of thorough analysis on the sequential and long-range dependency relationships between words within different contexts while learning the textual representation also make the generated summaries unnatural and incoherent. To deal with these challenges, in this paper we proposed a novel semantic-enhanced generative adversarial network (GAN)-based approach for abstractive text summarization task, called as: SGAN4AbSum. We use an adversarial training strategy for our text summarization model in which train the generator and discriminator to simultaneously handle the summary generation and distinguishing the generated summary with the ground-truth one. The input of generator is the jointed rich-semantic and global structural latent representations of training documents which are achieved by applying a combined BERT and graph convolutional network (GCN) textual embedding mechanism. Extensive experiments in benchmark datasets demonstrate the effectiveness of our proposed SGAN4AbSum which achieve the competitive ROUGE-based scores in comparing with state-of-the-art abstractive text summarization baselines.

Download Full-text