How do loss functions impact the performance of graph neural networks?

2021 ◽  
Author(s):  
Gabriel Jonas Duarte ◽  
Tamara Arruda Pereira ◽  
Erik Jhones Nascimento ◽  
Diego Mesquita ◽  
Amauri Holanda Souza Junior

Graph neural networks (GNNs) have become the de facto approach for supervised learning on graph data. To train these networks, most practitioners employ the categorical cross-entropy (CE) loss. We can attribute this largely to the probabilistic interpretation of models trained using CE, since it corresponds to the negative log of the categorical/softmax likelihood. Nonetheless, loss functions are a modeling choice, and other training criteria can be employed, e.g., hinge loss and mean absolute error (MAE). Indeed, recent works have shown that deep learning models can benefit from adopting other loss functions; for instance, neural networks trained with symmetric losses (such as MAE) are robust to label noise. Perhaps surprisingly, the effect of using different losses on GNNs has not been explored. In this preliminary work, we gauge the impact of different loss functions on the performance of GNNs for node classification under i) noisy labels and ii) different sample sizes. In contrast to findings on Euclidean domains, our results for GNNs show no significant difference between models trained with CE and other classical loss functions in either scenario.
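For illustration, a minimal NumPy sketch of the two losses compared above (not the authors' implementation): cross-entropy grows without bound on a confidently misclassified example, while MAE over the softmax probabilities is a symmetric, bounded loss, which is why it is reported to be more robust to label noise.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(probs, targets):
    # Negative log-likelihood of the one-hot targets under the softmax.
    return -np.mean(np.sum(targets * np.log(probs + 1e-12), axis=-1))

def mae(probs, targets):
    # Mean absolute error between class probabilities and one-hot targets;
    # a symmetric loss, bounded by 2 per example regardless of confidence.
    return np.mean(np.sum(np.abs(probs - targets), axis=-1))

logits = np.array([[2.0, 0.5, -1.0]])
y = np.array([[1.0, 0.0, 0.0]])        # true class 0
y_noisy = np.array([[0.0, 1.0, 0.0]])  # same example with a flipped label

p = softmax(logits)
# A single flipped label changes MAE by a bounded amount, while
# cross-entropy can grow arbitrarily large as the model grows confident.
print(cross_entropy(p, y), cross_entropy(p, y_noisy))
print(mae(p, y), mae(p, y_noisy))
```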

2021 ◽  
Author(s):  
Sayan Nag

Self-supervised learning and pre-training strategies have been developed over the last few years, especially for convolutional neural networks (CNNs). Recently, such methods have also been applied to graph neural networks (GNNs). In this paper, we use a graph-based self-supervised learning strategy with different loss functions (Barlow Twins, HSIC, VICReg) that have previously shown promising results when applied with CNNs. We also propose a hybrid loss function, VICRegHSIC, that combines the advantages of VICReg and HSIC. The performance of these methods is compared on two datasets, MUTAG and PROTEINS. Moreover, the impact of different batch sizes, projector dimensions, and data augmentation strategies is also explored. The results are preliminary, and we will continue to explore other datasets.
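As a reference point for the losses named above, a minimal NumPy sketch of the Barlow Twins objective (the batch size, embedding dimension, and λ weight are illustrative, not the paper's settings): the cross-correlation matrix of two views' embeddings is pushed toward the identity, so matched features correlate and distinct features decorrelate.

```python
import numpy as np

def barlow_twins_loss(z_a, z_b, lam=5e-3):
    # z_a, z_b: (batch, dim) embeddings of two augmented views.
    n, d = z_a.shape
    # Standardize each feature dimension across the batch.
    z_a = (z_a - z_a.mean(0)) / (z_a.std(0) + 1e-9)
    z_b = (z_b - z_b.mean(0)) / (z_b.std(0) + 1e-9)
    c = z_a.T @ z_b / n  # (dim, dim) cross-correlation matrix
    # Diagonal driven to 1 (invariance), off-diagonal to 0 (redundancy reduction).
    on_diag = np.sum((np.diag(c) - 1.0) ** 2)
    off_diag = np.sum((c - np.diag(np.diag(c))) ** 2)
    return on_diag + lam * off_diag

rng = np.random.default_rng(0)
z = rng.normal(size=(32, 8))
# Identical views give perfectly correlated features: the loss is near 0.
print(barlow_twins_loss(z, z))
```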


2021 ◽  
Vol 6 (1) ◽  
Author(s):  
Hussain Hussain ◽  
Tomislav Duricic ◽  
Elisabeth Lex ◽  
Denis Helic ◽  
Roman Kern

Abstract: Graph Neural Networks (GNNs) are effective in many applications. Still, there is a limited understanding of the effect of common graph structures on the learning process of GNNs. To fill this gap, we study the impact of community structure and homophily on the performance of GNNs in semi-supervised node classification on graphs. Our methodology consists of systematically manipulating the structure of eight datasets, and measuring the performance of GNNs on the original graphs and the change in performance in the presence and the absence of community structure and/or homophily. Our results show the major impact of both homophily and communities on the classification accuracy of GNNs, and provide insights on their interplay. In particular, by analyzing community structure and its correlation with node labels, we are able to make informed predictions on the suitability of GNNs for classification on a given graph. Using an information-theoretic metric for community-label correlation, we devise a guideline for model selection based on graph structure. With our work, we provide insights on the abilities of GNNs and the impact of common network phenomena on their performance. Our work improves model selection for node classification in semi-supervised settings.
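The abstract does not specify which information-theoretic metric is used; one common choice for community-label correlation is the empirical mutual information between community assignments and node labels, sketched below as an assumption-laden illustration (variable names and data are hypothetical).

```python
import math
from collections import Counter

def mutual_information(communities, labels):
    # Empirical mutual information I(C; Y), in nats, between per-node
    # community assignments and class labels. High values suggest the
    # community structure is informative about the labels.
    n = len(communities)
    joint = Counter(zip(communities, labels))
    pc = Counter(communities)
    py = Counter(labels)
    mi = 0.0
    for (c, y), n_cy in joint.items():
        p_cy = n_cy / n
        # p_cy / (p_c * p_y), written with counts to avoid extra divisions.
        mi += p_cy * math.log(p_cy * n * n / (pc[c] * py[y]))
    return mi

# Communities perfectly aligned with two balanced labels: I = ln 2.
comms = [0, 0, 0, 1, 1, 1]
labels = [0, 0, 0, 1, 1, 1]
print(mutual_information(comms, labels))  # ln 2 ≈ 0.693
```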


2021 ◽  
pp. bjophthalmol-2020-317391
Author(s):  
Takashi Omoto ◽  
Hiroshi Murata ◽  
Yuri Fujino ◽  
Masato Matsuura ◽  
Takehiro Yamashita ◽  
...  

Aim: To evaluate the usefulness of applying the clustering method to trend analysis (sectorwise regression) in comparison with pointwise linear regression (PLR). Methods: This study included 153 eyes of 101 patients with open-angle glaucoma. With PLR, the total deviation (TD) values of the 10th visual field (VF) were predicted using shorter VF sequences (from the first 3 to the first 9) by extrapolating TD values against time in a pointwise manner. Then, 68 test points were stratified into 29 sectors. In each sector, the mean of the TD values was calculated and allocated to all test points belonging to the sector. Subsequently, the TD values of the 10th VF were predicted by extrapolating the allocated TD value against time in a pointwise manner. Similar analyses were conducted to predict the 11th-16th VFs using the first 10 VFs. Results: When predicting the 10th VF using the shorter sequences, the mean absolute error (MAE) values were significantly smaller with the sectorwise regression than with PLR. When predicting the 11th-16th VFs using the first 10 VFs, the MAE values were significantly larger with the sectorwise regression than with PLR for the 11th VF; however, no significant difference was observed for the other VF predictions. Conclusion: Accurate prediction was achieved using the sectorwise regression, in particular when a small number of VFs were used in the prediction. The accuracy of the sectorwise regression was not hampered in longer follow-up compared with PLR.
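The pointwise extrapolation step can be sketched with ordinary least squares (the TD series below is illustrative, not study data); the sectorwise variant simply replaces each point's series with its sector mean before fitting the same trend.

```python
import numpy as np

def predict_td(times, td_values, t_future):
    # Ordinary least-squares trend of one test point's total deviation (TD)
    # values against time, extrapolated to a future visit.
    slope, intercept = np.polyfit(times, td_values, 1)
    return slope * t_future + intercept

# Illustrative series: TD at one VF test point declining over follow-up.
times = np.array([0.0, 0.5, 1.0, 1.5, 2.0])       # years
td = np.array([-1.0, -1.2, -1.5, -1.8, -2.0])     # dB
print(predict_td(times, td, 2.5))                 # predicted TD at year 2.5
```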


2019 ◽  
Author(s):  
Amr Farahat ◽  
Christoph Reichert ◽  
Catherine M. Sweeney-Reed ◽  
Hermann Hinrichs

Abstract. Objective: Convolutional neural networks (CNNs) have proven successful as function approximators and have therefore been used for classification problems, including electroencephalography (EEG) signal decoding for brain-computer interfaces (BCIs). Artificial neural networks, however, are considered black boxes, because they usually have thousands of parameters, making interpretation of their internal processes challenging. Here we systematically evaluate the use of CNNs for EEG signal decoding and investigate a method for visualizing the CNN model decision process. Approach: We developed a CNN model to decode the covert focus of attention from EEG event-related potentials during object selection. We compared the performance of the CNN and the commonly used linear discriminant analysis (LDA) classifier, applied to datasets with different dimensionality, and analyzed transfer learning capacity. Moreover, we validated the impact of single model components by systematically altering the model. Furthermore, we investigated the use of saliency maps as a tool for visualizing the spatial and temporal features driving the model output. Main results: The CNN model and the LDA classifier achieved comparable accuracy on the lower-dimensional dataset, but the CNN exceeded LDA performance significantly on the higher-dimensional dataset (without hypothesis-driven preprocessing), achieving an average decoding accuracy of 90.7% (chance level = 8.3%). Parallel convolutions, tanh or ELU activation functions, and dropout regularization proved valuable for model performance, whereas sequential convolutions, the ReLU activation function, and batch normalization reduced accuracy or yielded no significant difference. Saliency maps revealed meaningful features, displaying the typical spatial distribution and latency of the P300 component expected during this task. Significance: Following systematic evaluation, we provide recommendations for when and how to use CNN models in EEG decoding. Moreover, we propose a new approach for investigating the neural correlates of a cognitive task by training CNN models on raw high-dimensional EEG data and utilizing saliency maps for relevant feature extraction.
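A gradient-based saliency map, as used above, assigns each input feature the magnitude of the output's derivative with respect to it. A minimal sketch for a linear-softmax decoder (the weights and input below are random placeholders, not an EEG model):

```python
import numpy as np

def saliency_map(w, x, cls):
    # For a linear decoder y = softmax(W x), the saliency of input feature i
    # for class `cls` is |d y_cls / d x_i|, computed analytically.
    z = w @ x
    p = np.exp(z - z.max())
    p /= p.sum()
    # d y_cls / d x = p_cls * (W_cls - sum_k p_k W_k)
    grad = p[cls] * (w[cls] - p @ w)
    return np.abs(grad)

rng = np.random.default_rng(1)
w = rng.normal(size=(3, 8))  # 3 classes, 8 input features (e.g. channel x time bins)
x = rng.normal(size=8)
s = saliency_map(w, x, cls=0)
print(s.shape)  # one saliency value per input feature: (8,)
```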


PLoS ONE ◽  
2021 ◽  
Vol 16 (1) ◽  
pp. e0243915
Author(s):  
Vladimír Kunc ◽  
Jiří Kléma

Gene expression profiling was made more cost-effective by the NIH LINCS program, which profiles only ~1,000 selected landmark genes and uses them to reconstruct the whole profile. The D-GEX method employs neural networks to infer the entire profile. However, the original D-GEX can be significantly improved. We propose a novel transformative adaptive activation function that further improves gene expression inference and generalizes several existing adaptive activation functions. Our improved neural network achieves an average mean absolute error of 0.1340, a significant improvement over our reimplementation of the original D-GEX, which achieves an average mean absolute error of 0.1637. The proposed transformative adaptive function enables a significantly more accurate reconstruction of the full gene expression profiles with only a small increase in the complexity of the model and its training procedure compared to other methods.
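One way to read "transformative adaptive activation" is a base nonlinearity whose input and output are both scaled and shifted by learned parameters; the sketch below shows that general form with fixed values (the parameter names and the choice of tanh are assumptions for illustration, not the paper's exact formulation).

```python
import numpy as np

def transformative_adaptive(x, g=np.tanh, alpha=1.0, beta=1.0, gamma=0.0, delta=0.0):
    # Scales and shifts both the input and the output of a base
    # nonlinearity g; in an adaptive activation the four parameters
    # would be learned per neuron during training.
    return alpha * g(beta * x + gamma) + delta

x = np.linspace(-2, 2, 5)
# With alpha = beta = 1 and gamma = delta = 0, it reduces to plain tanh,
# so the family strictly generalizes the fixed activation.
print(np.allclose(transformative_adaptive(x), np.tanh(x)))  # True
```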


Data ◽  
2022 ◽  
Vol 7 (1) ◽  
pp. 10
Author(s):  
Davide Buffelli ◽  
Fabio Vandin

Graph Neural Networks (GNNs) rely on the graph structure to define an aggregation strategy where each node updates its representation by combining information from its neighbours. A known limitation of GNNs is that, as the number of layers increases, information gets smoothed and squashed and node embeddings become indistinguishable, negatively affecting performance. Therefore, practical GNN models employ few layers and only leverage the graph structure in terms of limited, small neighbourhoods around each node. Inevitably, practical GNNs do not capture information depending on the global structure of the graph. While there have been several works studying the limitations and expressivity of GNNs, the question of whether practical applications on graph structured data require global structural knowledge or not remains unanswered. In this work, we empirically address this question by giving access to global information to several GNN models, and observing the impact it has on downstream performance. Our results show that global information can in fact provide significant benefits for common graph-related tasks. We further identify a novel regularization strategy that leads to an average accuracy improvement of more than 5% on all considered tasks.


2020 ◽  
Vol 34 (04) ◽  
pp. 6656-6663 ◽  
Author(s):  
Huaxiu Yao ◽  
Chuxu Zhang ◽  
Ying Wei ◽  
Meng Jiang ◽  
Suhang Wang ◽  
...  

Towards the challenging problem of semi-supervised node classification, there have been extensive studies. As a frontier, Graph Neural Networks (GNNs) have aroused great interest recently, which update the representation of each node by aggregating information of its neighbors. However, most GNNs have shallow layers with a limited receptive field and may not achieve satisfactory performance especially when the number of labeled nodes is quite small. To address this challenge, we innovatively propose a graph few-shot learning (GFL) algorithm that incorporates prior knowledge learned from auxiliary graphs to improve classification accuracy on the target graph. Specifically, a transferable metric space characterized by a node embedding and a graph-specific prototype embedding function is shared between auxiliary graphs and the target, facilitating the transfer of structural knowledge. Extensive experiments and ablation studies on four real-world graph datasets demonstrate the effectiveness of our proposed model and the contribution of each component.


2020 ◽  
Vol 2020 ◽  
pp. 1-11
Author(s):  
Yiming Bie ◽  
Yunhao Wang ◽  
Le Zhang

This paper develops two types of estimation models to quantify the impacts of carriage crowding level on bus dwell time. The first model (model I) takes the crowding level and the number of alighting and boarding passengers into consideration and estimates the alighting time and boarding time, respectively. The second model (model II) adopts almost the same regression method, except that the impact of crowding on dwell time is neglected. The analysis was conducted along two major bus routes in Harbin, China, by manually collecting 640 groups of dwell times under crowded conditions. Compared with model II, the mean absolute error (MAE) of model I is reduced by 137.51%, which indicates that the accuracy of bus dwell time estimation can be greatly improved by introducing carriage crowding level into the model. Meanwhile, the MAE of model I is about 3.9 seconds, which is acceptable for travel time estimation and bus scheduling.
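The contrast between the two models can be sketched as a least-squares regression with and without the crowding covariate (the coefficients and data below are synthetic placeholders, not the paper's calibrated values):

```python
import numpy as np

def fit_dwell_time(boarding, alighting, crowding, dwell):
    # Least-squares fit of dwell time on boarding/alighting counts and a
    # carriage-crowding covariate (model I); dropping the crowding column
    # would give the reduced model II.
    X = np.column_stack([boarding, alighting, crowding, np.ones_like(dwell)])
    coef, *_ = np.linalg.lstsq(X, dwell, rcond=None)
    return coef

# Synthetic observations: 2 s per boarding, 1.5 s per alighting,
# plus 4 s per unit of crowding and a 5 s base time.
rng = np.random.default_rng(2)
b = rng.integers(0, 10, 50)
a = rng.integers(0, 10, 50)
c = rng.uniform(0, 1, 50)
t = 2.0 * b + 1.5 * a + 4.0 * c + 5.0
coef = fit_dwell_time(b, a, c, t)
print(np.round(coef, 3))  # ≈ [2.0, 1.5, 4.0, 5.0]
```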


2021 ◽  
pp. 1-33
Author(s):  
Andreas Knoblauch

Abstract: Supervised learning corresponds to minimizing a loss or cost function expressing the differences between model predictions y_n and the target values t_n given by the training data. In neural networks, this means backpropagating error signals through the transposed weight matrices from the output layer toward the input layer. For this, error signals in the output layer are typically initialized by the difference y_n - t_n, which is optimal for several commonly used loss functions like cross-entropy or sum of squared errors. Here I evaluate a more general error initialization method using power functions |y_n - t_n|^q for q > 0, corresponding to a new family of loss functions that generalize cross-entropy. Surprisingly, experiments on various learning tasks reveal that a proper choice of q can significantly improve the speed and convergence of backpropagation learning, in particular in deep and recurrent neural networks. The results suggest two main reasons for the observed improvements. First, compared to cross-entropy, the new loss functions provide better fits to the distribution of error signals in the output layer and therefore maximize the model's likelihood more efficiently. Second, the new error initialization procedure may often provide a better gradient-to-loss ratio over a broad range of neural output activity, thereby avoiding flat loss landscapes with vanishing gradients.
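A minimal sketch of the generalized error initialization, assuming the signed form sign(y_n - t_n)·|y_n - t_n|^q so that q = 1 recovers the standard difference (the exact parameterization in the paper may differ):

```python
import numpy as np

def output_error(y, t, q=1.0):
    # Generalized output-layer error initialization sign(y - t) * |y - t|^q.
    # q = 1 recovers the standard y - t used with cross-entropy / squared error;
    # q < 1 amplifies small residuals, which can help against vanishing gradients.
    d = y - t
    return np.sign(d) * np.abs(d) ** q

y = np.array([0.9, 0.05, 0.05])  # softmax outputs
t = np.array([1.0, 0.0, 0.0])    # one-hot target
print(output_error(y, t, q=1.0))  # ≈ [-0.1, 0.05, 0.05]
print(output_error(y, t, q=0.5))  # larger magnitudes for small residuals
```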

