Shaping the learning landscape in neural networks around wide flat minima

Learning in deep neural networks takes place by minimizing a nonconvex high-dimensional loss function, typically by a stochastic gradient descent (SGD) strategy. The learning process is observed to find good minimizers without getting stuck in local critical points, and such minimizers are often satisfactory at avoiding overfitting. How these two features can be kept under control in nonlinear devices composed of millions of tunable connections is a profound and far-reaching open question. In this paper we study basic nonconvex one- and two-layer neural network models that learn random patterns and derive a number of basic geometrical and algorithmic features which suggest some answers. We first show that the error loss function presents few extremely wide flat minima (WFM) which coexist with narrower minima and critical points. We then show that the minimizers of the cross-entropy loss function overlap with the WFM of the error loss. We also show examples of learning devices for which WFM do not exist. From the algorithmic perspective we derive entropy-driven greedy and message-passing algorithms that focus their search on wide flat regions of minimizers. In the case of SGD and cross-entropy loss, we show that a slow reduction of the norm of the weights along the learning process also leads to WFM. We corroborate the results by a numerical study of the correlations between the volumes of the minimizers, their Hessian, and their generalization performance on real data.
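As a rough intuition for what "wide" versus "narrow" means here, the following toy sketch compares two zero-loss minima of a hypothetical one-dimensional loss by averaging the loss over a small neighbourhood of each. The loss function, the radius, and the helper names are assumptions made purely for illustration; this is not the model, the entropy measure, or any algorithm from the paper.

```python
# Toy 1-D illustration (hypothetical loss, not the networks studied in the paper).


def loss(w: float) -> float:
    """Loss with a narrow zero minimum at w = -2 and a wide zero minimum at w = 2."""
    narrow = 50.0 * (w + 2.0) ** 2  # steep well around w = -2
    wide = 0.5 * (w - 2.0) ** 2     # shallow well around w = 2
    return min(narrow, wide, 1.0)   # capped at 1 so a plateau separates the wells


def local_average_loss(center: float, radius: float, n: int = 201) -> float:
    """Average loss on an evenly spaced grid in [center - radius, center + radius]."""
    step = 2.0 * radius / (n - 1)
    return sum(loss(center - radius + i * step) for i in range(n)) / n


# Both minima reach zero loss, but their neighbourhoods differ.
narrow_score = local_average_loss(-2.0, radius=0.5)
wide_score = local_average_loss(2.0, radius=0.5)
print(f"narrow minimum: {narrow_score:.3f}, wide minimum: {wide_score:.3f}")
```

Under these assumptions the neighbourhood-averaged loss is much lower around the wide minimum, which is the sense in which such a minimum tolerates perturbations of the weights.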

Improving Deep Matrix Factorization with Normalized Cross Entropy Loss Function for Graph-Based MOOC Recommendation

14th International Conference on Computer Graphics, Visualization, Computer Vision and Image Processing ◽

10.33965/bigdaci2020_202011l017 ◽

2020 ◽

Keyword(s):

Loss Function ◽

Matrix Factorization ◽

Cross Entropy ◽

Entropy Loss

Hybrid Model Structure for Diabetic Retinopathy Classification

Journal of Healthcare Engineering ◽

10.1155/2020/8840174 ◽

2020 ◽

Vol 2020 ◽

pp. 1-9

Author(s):

Hao Liu ◽

Keqiang Yue ◽

Siyi Cheng ◽

Chengming Pan ◽

Jie Sun ◽

...

Keyword(s):

Diabetic Retinopathy ◽

Hybrid Model ◽

Network Models ◽

Classification Performance ◽

Cross Entropy ◽

Model Structure ◽

Training Process ◽

Neural Network Models ◽

Entropy Loss ◽

Model Structures

Diabetic retinopathy (DR) is one of the most common complications of diabetes and the main cause of blindness. The progression of the disease can be prevented by early diagnosis of DR. Due to differences in the distribution of medical conditions and low labor efficiency, the best time for diagnosis and treatment is often missed, which results in impaired vision. Using neural network models to classify and diagnose DR can improve efficiency and reduce costs. In this work, an improved loss function and three hybrid model structures, Hybrid-a, Hybrid-f, and Hybrid-c, were proposed to improve the performance of DR classification models. EfficientNetB4, EfficientNetB5, NASNetLarge, Xception, and InceptionResNetV2 CNNs were chosen as the basic models. These basic models were trained using enhance cross-entropy loss and cross-entropy loss, respectively. The output of the basic models was used to train the hybrid model structures. Experiments showed that enhance cross-entropy loss can effectively accelerate the training process of the basic models and improve the performance of the models under various evaluation metrics. The proposed hybrid model structures can also improve DR classification performance. Compared with the best-performing results in the basic models, the accuracy of DR classification was improved from 85.44% to 86.34%, the sensitivity was improved from 98.48% to 98.77%, the specificity was improved from 71.82% to 74.76%, the precision was improved from 90.27% to 91.37%, and the F1 score was improved from 93.62% to 93.9% by using hybrid model structures.
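The evaluation metrics named in this abstract have standard binary-classification definitions, sketched below from confusion-matrix counts. The counts are hypothetical, and the abstract does not say whether the reported figures were computed in this binary form or averaged over several DR grades, so this is only an illustration of the definitions, not a reproduction of the reported numbers.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard metrics from true/false positive and negative counts."""
    sensitivity = tp / (tp + fn)  # recall on the positive class
    specificity = tn / (tn + fp)  # recall on the negative class
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
metrics = binary_metrics(tp=90, fp=10, tn=40, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```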

Local Geometry of Cross Entropy Loss in Learning One-Hidden-Layer Neural Networks

2019 IEEE International Symposium on Information Theory (ISIT) ◽

10.1109/isit.2019.8849289 ◽

2019 ◽

Cited By ~ 1

Author(s):

Haoyu Fu ◽

Yuejie Chi ◽

Yingbin Liang

Keyword(s):

Neural Networks ◽

Cross Entropy ◽

Local Geometry ◽

Entropy Loss ◽

Hidden Layer

Approximating the Gradient of Cross-Entropy Loss Function

IEEE Access ◽

10.1109/access.2020.3001531 ◽

2020 ◽

Vol 8 ◽

pp. 111626-111635

Author(s):

Li Li ◽

Milos Doroslovacki ◽

Murray H. Loew

Keyword(s):

Loss Function ◽

Cross Entropy ◽

Entropy Loss

Adaptive Hybrid Higher Order Neural Networks for Prediction of Stock Market Behavior

Advances in Computational Intelligence and Robotics - Applied Artificial Higher Order Neural Networks for Control and Recognition ◽

10.4018/978-1-5225-0063-6.ch007 ◽

2016 ◽

pp. 174-191 ◽

Cited By ~ 2

Author(s):

Sarat Chandra Nayak ◽

Bijan Bihari Misra ◽

Himansu Sekhar Behera

Keyword(s):

Neural Networks ◽

Stock Market ◽

Learning Process ◽

Real Life ◽

Network Models ◽

Higher Order ◽

Market Behavior ◽

Neural Network Models ◽

Efficient Prediction ◽

Higher Order Neural Networks

This chapter presents two higher order neural networks (HONN) for efficient prediction of stock market behavior: the Pi-Sigma and the Sigma-Pi higher order neural network models. Along with traditional gradient descent learning, the chapter discusses how evolutionary computation techniques such as the genetic algorithm (GA) can be used effectively in the learning process. The learning process is made adaptive to handle the noise and uncertainties associated with stock market data. Further, different prediction approaches are discussed, and the application of HONN to time series forecasting is illustrated with real-life data taken from a number of stock markets across the globe.
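For readers unfamiliar with the two model families, the sketch below shows the textbook forward pass of each: a Pi-Sigma unit multiplies the outputs of several linear summing units, while a Sigma-Pi unit sums weighted products of selected inputs. The sigmoid output, the function names, and the example weights are assumptions for illustration; the chapter's actual architectures and its adaptive GA-based training are not described in the abstract.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def pi_sigma_forward(x, weights, biases) -> float:
    """Pi-Sigma: linear summing units feeding a single product unit."""
    sums = [
        sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
        for w, b in zip(weights, biases)
    ]
    return sigmoid(math.prod(sums))


def sigma_pi_forward(x, terms, weights) -> float:
    """Sigma-Pi: weighted sum of products over chosen input index tuples."""
    total = sum(
        w * math.prod(x[i] for i in idx) for w, idx in zip(weights, terms)
    )
    return sigmoid(total)


# Hypothetical inputs and weights.
x = [1.0, 2.0]
y_ps = pi_sigma_forward(x, weights=[[0.5, 0.5], [1.0, -1.0]], biases=[0.0, 0.5])
y_sp = sigma_pi_forward(x, terms=[(0,), (0, 1)], weights=[1.0, 0.5])
print(y_ps, y_sp)
```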

MPCE: A Maximum Probability Based Cross Entropy Loss Function for Neural Network Classification

IEEE Access ◽

10.1109/access.2019.2946264 ◽

2019 ◽

Vol 7 ◽

pp. 146331-146341 ◽

Cited By ~ 4

Author(s):

Yangfan Zhou ◽

Xin Wang ◽

Mingchuan Zhang ◽

Junlong Zhu ◽

Ruijuan Zheng ◽

...

Keyword(s):

Neural Network ◽

Loss Function ◽

Cross Entropy ◽

Maximum Probability ◽

Entropy Loss ◽

Neural Network Classification

Attention-Based Deep Feature Fusion for the Scene Classification of High-Resolution Remote Sensing Images

Remote Sensing ◽

10.3390/rs11171996 ◽

2019 ◽

Vol 11 (17) ◽

pp. 1996 ◽

Cited By ~ 7

Author(s):

Zhu ◽

Yan ◽

Mo ◽

Liu

Keyword(s):

Remote Sensing ◽

Loss Function ◽

Feature Fusion ◽

Cross Entropy ◽

Scene Classification ◽

Remote Sensing Images ◽

Graphic Processing Units ◽

Entropy Loss ◽

Deep Feature

Scene classification of high-resolution remote sensing images (HRRSI) is one of the most important means of land-cover classification. Deep learning techniques, especially the convolutional neural network (CNN), have been widely applied to the scene classification of HRRSI due to the advancement of graphic processing units (GPU). However, they tend to extract features from the whole images rather than discriminative regions. The visual attention mechanism can force the CNN to focus on discriminative regions, but it may suffer from the influence of intra-class diversity and repeated texture. Motivated by these problems, we propose an attention-based deep feature fusion (ADFF) framework that consists of three parts, namely attention maps generated by Gradient-weighted Class Activation Mapping (Grad-CAM), a multiplicative fusion of deep features, and the center-based cross-entropy loss function. First, we propose to use the attention maps generated by Grad-CAM as an explicit input in order to force the network to concentrate on discriminative regions. Then, deep features derived from the original images and the attention maps are fused by multiplicative fusion, in order to improve the ability to distinguish scenes with repeated texture while also accounting for the salient regions. Finally, the center-based cross-entropy loss function, which utilizes both the cross-entropy loss and the center loss function, is proposed to backpropagate fused features so as to reduce the effect of intra-class diversity on feature representations. The proposed ADFF architecture is tested on three benchmark datasets to show its performance in scene classification. The experiments confirm that the proposed method outperforms most competitive scene classification methods with an average overall accuracy of 94% under different training ratios.
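One plausible reading of a loss that "utilizes both the cross-entropy loss and center loss" is a weighted sum of the two, sketched below for a single sample. The weighting factor `lam`, the function names, and the use of a commonly used form of center loss (half the squared distance between a feature vector and its class center) are assumptions; the abstract does not state how the two terms are actually combined.

```python
import math


def cross_entropy(logits, label: int) -> float:
    """Softmax cross-entropy for one sample, computed stably via log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def center_loss(feature, center) -> float:
    """Half the squared Euclidean distance between a feature and its class center."""
    return 0.5 * sum((f - c) ** 2 for f, c in zip(feature, center))


def center_based_cross_entropy(logits, feature, label, centers, lam=0.01) -> float:
    """Assumed combination: cross-entropy plus lam times center loss."""
    return cross_entropy(logits, label) + lam * center_loss(feature, centers[label])


# Hypothetical two-class example with 2-D features.
centers = [[0.0, 0.0], [3.0, 3.0]]
value = center_based_cross_entropy(
    logits=[0.0, 0.0], feature=[1.0, 1.0], label=0, centers=centers, lam=0.5
)
print(value)
```

The center term pulls features of the same class toward a shared center, which matches the abstract's stated goal of reducing the effect of intra-class diversity.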

Building Outline Extraction Directly Using the U2-Net Semantic Segmentation Model from High-Resolution Aerial Images and a Comparison Study

Remote Sensing ◽

10.3390/rs13163187 ◽

2021 ◽

Vol 13 (16) ◽

pp. 3187

Author(s):

Xinchun Wei ◽

Xing Li ◽

Wei Liu ◽

Lianpeng Zhang ◽

Dayu Cheng ◽

...

Keyword(s):

Edge Detection ◽

Loss Function ◽

Semantic Segmentation ◽

Cross Entropy ◽

Aerial Images ◽

Building Extraction ◽

Precise Position ◽

Entropy Loss ◽

Imbalance Problem ◽

Outline Extraction

Deep learning techniques have greatly improved the efficiency and accuracy of building extraction using remote sensing images. However, obtaining high-quality building outline extraction results that can be applied to the field of surveying and mapping remains a significant challenge. In practice, most building extraction tasks are manually executed. Therefore, an automated procedure that produces building outlines with precise positions is required. In this study, we directly used the U2-net semantic segmentation model to extract the building outline. The extraction results showed that the U2-net model can provide the building outline with better accuracy and a more precise position than other models, based on comparisons with semantic segmentation models (Segnet, U-Net, and FCN) and edge detection models (RCF, HED, and DexiNed) applied to two datasets (Nanjing and Wuhan University (WHU)). We also modified the binary cross-entropy loss function in the U2-net model into a multiclass cross-entropy loss function to directly generate the binary map with the building outline and background. We achieved a further refined outline of the building, thus showing that with the modified U2-net model, it is not necessary to use non-maximum suppression as a post-processing step, as in the other edge detection models, to refine the edge map. Moreover, the modified model is less affected by the sample imbalance problem. Finally, we created an image-to-image program to further validate the modified U2-net semantic segmentation model for building outline extraction.
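For reference, the two per-pixel losses named in this abstract can be written generically as below: binary cross-entropy on a single sigmoid output, and multiclass cross-entropy on a softmax over one logit per class (here outline and background). The function names and example logits are assumptions, and the sketch only fixes notation; it does not reproduce the authors' modification of U2-net, which the abstract does not detail.

```python
import math


def binary_cross_entropy(logit: float, target: int) -> float:
    """Per-pixel BCE on a single logit passed through a sigmoid."""
    p = 1.0 / (1.0 + math.exp(-logit))
    return -(target * math.log(p) + (1 - target) * math.log(1.0 - p))


def multiclass_cross_entropy(logits, target: int) -> float:
    """Per-pixel softmax cross-entropy over one logit per class."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


# Hypothetical logits for one pixel: index 0 = background, index 1 = outline.
z_background, z_outline = 0.3, 1.1
two_channel = multiclass_cross_entropy([z_background, z_outline], target=1)
one_channel = binary_cross_entropy(z_outline - z_background, target=1)
print(two_channel, one_channel)
```

With exactly two classes, the two values coincide when the single logit equals the difference of the two class logits, so the sketch should not be read as an explanation of the improvements the authors report.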

Personal Interest Attention Graph Neural Networks for Session-Based Recommendation

Entropy ◽

10.3390/e23111500 ◽

2021 ◽

Vol 23 (11) ◽

pp. 1500

Author(s):

Xiangde Zhang ◽

Yuan Zhou ◽

Jianping Wang ◽

Xiaojun Lu

Keyword(s):

Neural Network ◽

Neural Networks ◽

Objective Function ◽

Cross Entropy ◽

Personal Interest ◽

Entropy Loss ◽

Convolutional Networks ◽

Graph Neural Networks

Session-based recommendations aim to predict a user’s next click based on the user’s current and historical sessions, which can be applied to shopping websites and apps. Existing session-based recommendation methods cannot accurately capture the complex transitions between items. In addition, some approaches compress sessions into a fixed representation vector without taking into account the user’s interest preferences at the current moment, thus limiting the accuracy of recommendations. Considering the diversity of items and users’ interests, a personalized interest attention graph neural network (PIA-GNN) is proposed for session-based recommendation. This approach utilizes personalized graph convolutional networks (PGNN) to capture complex transitions between items, invoking an interest-aware mechanism to activate users’ interest in different items adaptively. In addition, a self-attention layer is used to capture long-term dependencies between items when capturing users’ long-term preferences. In this paper, the cross-entropy loss is used as the objective function to train our model. We conduct extensive experiments on two real datasets, and the results show that PIA-GNN outperforms existing personalized session-aware recommendation methods.

A system of analysis and prediction of the loss of forging tool material applying artificial neural networks

Journal of Mining and Metallurgy Section B Metallurgy ◽

10.2298/jmmb180417023h ◽

2018 ◽

Vol 54 (3) ◽

pp. 323-337 ◽

Cited By ~ 2

Author(s):

M. Hawryluk ◽

B. Mrzyglod

Keyword(s):

Neural Networks ◽

Artificial Neural Networks ◽

Surface Treatment ◽

Learning Process ◽

Tool Material ◽

Single Layer ◽

Quality Parameters ◽

Operating Conditions ◽

Neural Network Models ◽

Artificial Neural

The article presents the use of artificial neural networks (ANN) to build a system for the analysis and forecasting of the durability of forging tools, and describes the process of acquiring the source knowledge necessary for the network learning process. In particular, the study focuses on the prediction of the geometrical loss of the tool material after different surface treatment variants. The methodology of developing neural network models and their quality parameters is also presented. Standard single-layer MLP networks were used here; their quality parameters are at a high level, and the results obtained with them are satisfactory and in line with technological practice. The data used in the learning process come from extensive, comprehensive performance tests of forging tools operating under extreme operating conditions (cyclic mechanical and thermal loads). The factors important for the selected forging process were parameterized and a database was developed, including 900 knowledge vectors, each of which provided information on the size of the geometrical loss of the tool material (explained variables). The value of wear was determined for the set values of explanatory variables such as: number of forgings, pressure, temperature on selected tool surfaces, friction path, and the variant of the applied surface treatment. The results presented in the study, confirmed by expert technologists, have a clear practical character: based on the presented solutions, the optimal treatment can be chosen and appropriate preventive measures applied, which will extend the service life.
