Graph-based semi-supervised learning via improving the quality of the graph dynamically

2021 ◽  
Author(s):  
Jiye Liang ◽  
Junbiao Cui ◽  
Jie Wang ◽  
Wei Wei
2021 ◽  
Vol 182 (2) ◽  
pp. 95-110
Author(s):  
Linh Le ◽  
Ying Xie ◽  
Vijay V. Raghavan

The k Nearest Neighbor (KNN) algorithm has been widely applied in various supervised learning tasks due to its simplicity and effectiveness. However, the quality of KNN decision making is directly affected by the quality of the neighborhoods in the modeling space. Efforts have been made to map data to a better feature space either implicitly with kernel functions, or explicitly through learning linear or nonlinear transformations. However, all these methods use pre-determined distance or similarity functions, which may limit their learning capacity. In this paper, we present two loss functions, namely KNN Loss and Fuzzy KNN Loss, to quantify the quality of neighborhoods formed by KNN with respect to supervised learning, such that minimizing the loss function on the training data leads to maximizing KNN decision accuracy on the training data. We further present a deep learning strategy that is able to learn, by minimizing KNN loss, pairwise similarities of data that implicitly maps data to a feature space where the quality of KNN neighborhoods is optimized. Experimental results show that this deep learning strategy (denoted as Deep KNN) outperforms state-of-the-art supervised learning methods on multiple benchmark data sets.
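
To make the idea of "neighborhood quality" concrete, the following minimal sketch measures how often a training point is misclassified by a majority vote of its own k nearest neighbors. It is not the KNN Loss or Fuzzy KNN Loss of the paper, whose definitions are not given in the abstract; the function names, the Euclidean distance, and the toy data are assumptions chosen for illustration.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Plain Euclidean distance (an assumed, pre-determined metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_leave_one_out_error(points, labels, k, dist=euclidean):
    """Fraction of training points misclassified by their k nearest neighbors.

    Each point is excluded from its own neighborhood, so a low value
    suggests that neighborhoods are label-consistent under `dist`.
    """
    errors = 0
    for i, p in enumerate(points):
        # Rank every other training point by its distance to p.
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: dist(p, points[j]),
        )
        votes = Counter(labels[j] for j in others[:k])
        predicted = votes.most_common(1)[0][0]
        if predicted != labels[i]:
            errors += 1
    return errors / len(points)


# Hypothetical toy data: two well-separated groups of three points.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
labels = ["a", "a", "a", "b", "b", "b"]
error = knn_leave_one_out_error(points, labels, k=2)
```

Passing a different `dist` function is where a learned similarity could be plugged in; the abstract's point is that fixing this function in advance may limit what can be learned.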


Author(s):  
Yujin Yuan ◽  
Liyuan Liu ◽  
Siliang Tang ◽  
Zhongfei Zhang ◽  
Yueting Zhuang ◽  
...  

Distant supervision leverages knowledge bases to automatically label instances, thus allowing us to train relation extractors without human annotations. However, the generated training data typically contain substantial noise, which may result in poor performance with vanilla supervised learning. In this paper, we propose to conduct multi-instance learning with a novel Cross-relation Cross-bag Selective Attention (C2SA), which leads to noise-robust training for distantly supervised relation extractors. Specifically, we employ sentence-level selective attention to reduce the effect of noisy or mismatched sentences, while the correlation among relations is captured to improve the quality of the attention weights. Moreover, instead of treating all entity pairs equally, we pay more attention to entity pairs of higher quality, and we adopt the same selective attention mechanism to achieve this goal. Experiments with two types of relation extractors demonstrate the superiority of the proposed approach over the state of the art, while further ablation studies verify our intuitions and demonstrate the effectiveness of the two proposed techniques.
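
The sentence-level selective attention mentioned above can be sketched in its generic form: each sentence in a bag is scored against a relation query, the scores are normalized with a softmax, and the bag representation is the weighted sum of the sentence vectors. This is only an illustration of plain selective attention, not of C2SA itself; the dot-product scoring, the vector values, and all names below are assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(sentences, query):
    """Return attention weights and the weighted bag representation.

    Scoring by dot product with a relation query vector is an assumption.
    """
    scores = [sum(s_i * q_i for s_i, q_i in zip(s, query)) for s in sentences]
    weights = softmax(scores)
    dim = len(query)
    bag = [sum(w * s[d] for w, s in zip(weights, sentences)) for d in range(dim)]
    return weights, bag


# Hypothetical 2-d sentence encodings; the third stands in for a noisy,
# mismatched sentence that points away from the relation query.
relation_query = (1.0, 0.0)
bag_sentences = [(2.0, 0.0), (1.5, 0.5), (-1.0, 1.0)]
weights, bag_vector = attend(bag_sentences, relation_query)
```

In this toy bag the mismatched sentence receives the smallest weight, so it contributes least to the bag vector.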


Author(s):  
Muhammad Farooq Ahmed ◽  
Umer Waqas ◽  
Muhammad Saleem Khan ◽  
Hafiz Muhammad Awais Rashid ◽  
Shahab Saqib

2021 ◽  
Vol 11 (11) ◽  
pp. 4942
Author(s):  
Jorge E. Preciado-Velasco ◽  
Joan D. Gonzalez-Franco ◽  
Caridad E. Anias-Calderon ◽  
Juan I. Nieto-Hipolito ◽  
Raul Rivera-Rodriguez

The classification of services in 5G/B5G (Beyond 5G) networks has become important for telecommunications service providers, who face the challenge of simultaneously offering a better Quality of Service (QoS) in their networks and a better Quality of Experience (QoE) to users. Service classification allows 5G service providers to accurately select the network slices for each service, thereby improving the QoS of the network and the QoE perceived by users, and ensuring compliance with the Service Level Agreement (SLA). Some projects have developed systems for classifying these services based on the Key Performance Indicators (KPIs) that characterize the different services. However, Key Quality Indicators (KQIs) are also significant in 5G networks, although these are generally not considered. We propose a service classifier that uses a Machine Learning (ML) approach based on Supervised Learning (SL) to improve classification and to support a better distribution of resources and traffic over 5G/B5G-based networks. We carry out simulations of our proposed scheme using different SL algorithms, first with KPIs alone and then incorporating KQIs, and show that the latter achieves better prediction, with an accuracy of 97% and a Matthews correlation coefficient of 96.6% with a Random Forest classifier.
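
Because the abstract reports both accuracy and the Matthews correlation coefficient (MCC), a small sketch of how the two differ may help. The version below handles only the binary case, whereas the paper's service classes are presumably multi-class; the label vectors are invented for illustration and are not the paper's data.

```python
import math


def binary_mcc(y_true, y_pred):
    """Matthews correlation coefficient for 0/1 labels.

    Returns 0.0 when the denominator is zero, a common convention.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Hypothetical predictions: TP=2, FN=1, TN=4, FP=1.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
acc = accuracy(y_true, y_pred)    # 6 of 8 correct
mcc = binary_mcc(y_true, y_pred)  # (2*4 - 1*1) / sqrt(3*3*5*5) = 7/15
```

MCC uses all four cells of the confusion matrix, so it tends to be a stricter summary than accuracy when classes are imbalanced.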


Sensors ◽  
2020 ◽  
Vol 20 (9) ◽  
pp. 2717
Author(s):  
Caleb Vununu ◽  
Suk-Hwan Lee ◽  
Ki-Ryong Kwon

Classifying images of Human Epithelial type 2 (HEp-2) cells is one of the most important steps in the diagnosis of autoimmune diseases. Performing this classification manually is extremely complicated because of the heterogeneity of these cellular images. Hence, an automated classification scheme appears to be necessary. However, the majority of the available methods rely on supervised learning for this problem, and the need for thousands of manually labelled images can make this approach difficult. The first contribution of this work is to demonstrate that HEp-2 cell images can also be classified using the unsupervised learning paradigm. Unlike the majority of the existing methods, we propose a deep learning scheme that performs both feature extraction and cell discrimination through an end-to-end unsupervised paradigm. We propose the use of a deep convolutional autoencoder (DCAE) that performs feature extraction via an encoding–decoding scheme. At the same time, we embed in the network a clustering layer whose purpose is to automatically discriminate, during the feature learning process, the latent representations produced by the DCAE. Furthermore, we investigate how the quality of the network’s reconstruction can affect the quality of the produced representations. We have investigated the effectiveness of our method on some benchmark datasets and demonstrate that unsupervised learning, when done properly, performs at the same level as the supervised learning-based state-of-the-art methods in terms of accuracy.


2020 ◽  
Vol 15 ◽  
pp. 214-218
Author(s):  
Martyna Wawrzyk

The paper focuses on the application of a clustering algorithm and a Decision Trees (DTs) classifier as a semi-supervised method for classifying cognitive workload level. The analyzed data were collected with an eye-tracking device during the Digit Symbol Substitution Test (DSST). Twenty-six volunteers took part in the examination. The DSST was conducted in three parts with different levels of difficulty. As a result, three versions of the data were obtained: low, medium and high cognitive workload. The case study covered clustering of the collected data with the k-means algorithm to detect three or more clusters. The obtained clusters were evaluated with three internal indices to measure the quality of the clustering. The Davies-Bouldin index gave the best result for four clusters. Based on this information, it is possible to formulate the hypothesis that four clusters exist. The obtained clusters were adopted as classes in supervised learning and subjected to classification with DTs. The mean accuracy was 0.85 for three-class classification and 0.73 for four-class classification.
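
As an illustration of how an internal index can compare clusterings, here is a minimal sketch of the Davies-Bouldin index in its standard form (mean distance to the centroid as the scatter term, Euclidean distance between centroids); lower values indicate more compact, better-separated clusters. The toy clusters are hypothetical and unrelated to the eye-tracking data, and the abstract does not say which implementation the author used.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def davies_bouldin(clusters):
    """Davies-Bouldin index for a list of clusters (each a list of points)."""
    centers = [centroid(c) for c in clusters]
    # Scatter: average distance of a cluster's points to its centroid.
    scatter = [
        sum(math.dist(p, ctr) for p in c) / len(c)
        for c, ctr in zip(clusters, centers)
    ]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Worst-case similarity of cluster i to any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in range(k)
            if j != i
        )
    return total / k


# Hypothetical partitions: same centroids, different spread.
tight = [[(0, 0), (0, 2)], [(10, 0), (10, 2)]]
loose = [[(0, 0), (0, 8)], [(10, 0), (10, 8)]]
db_tight = davies_bouldin(tight)  # (1 + 1) / 10 = 0.2
db_loose = davies_bouldin(loose)  # (4 + 4) / 10 = 0.8
```

Choosing the number of clusters then amounts to running the clustering for several candidate values and keeping the one with the lowest index.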


2021 ◽  
Vol 12 ◽  
Author(s):  
V. V. Kuznetsov ◽  
V. A. Moskalenko ◽  
D. V. Gribanov ◽  
Nikolai Yu. Zolotykh

We propose a method for generating an electrocardiogram (ECG) signal for one cardiac cycle using a variational autoencoder. Our goal was to encode the original ECG signal using as few features as possible. Using this method, we extracted a vector of 25 new features, which in many cases can be interpreted. The generated ECG has a quite natural appearance. The low value of the Maximum Mean Discrepancy metric, 3.83 × 10⁻³, also indicates good quality of ECG generation. The extracted features will help to improve the quality of automatic diagnostics of cardiovascular diseases. Generating new synthetic ECGs will allow us to address the lack of labeled ECGs for use in supervised learning.
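
For readers unfamiliar with the Maximum Mean Discrepancy (MMD), the sketch below computes the usual biased estimate of the squared MMD between two samples. The RBF kernel, the bandwidth `gamma`, and the one-dimensional toy samples are assumptions; the abstract does not state which kernel or estimator produced the reported value.

```python
import math


def rbf(x, y, gamma):
    """RBF kernel exp(-gamma * ||x - y||^2); the kernel choice is assumed."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def mmd_squared(xs, ys, gamma=1.0):
    """Biased estimate of squared MMD between samples xs and ys."""
    kxx = sum(rbf(a, b, gamma) for a in xs for b in xs) / (len(xs) * len(xs))
    kyy = sum(rbf(a, b, gamma) for a in ys for b in ys) / (len(ys) * len(ys))
    kxy = sum(rbf(a, b, gamma) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2.0 * kxy


# Hypothetical samples: "real" data, a close imitation, and a poor one.
real = [(0.0,), (0.5,), (1.0,)]
near = [(0.1,), (0.6,), (1.1,)]
far = [(3.0,), (3.5,), (4.0,)]
mmd_near = mmd_squared(real, near)
mmd_far = mmd_squared(real, far)
```

A value close to zero means the two samples are hard to tell apart under the chosen kernel, which is why a small MMD is read as evidence of realistic generated signals.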


Author(s):  
Avirup Saha ◽  
Shreyas Sheshadri ◽  
Samik Datta ◽  
Niloy Ganguly ◽  
Disha Makhija ◽  
...  

With the proliferation of learning scenarios that have an abundance of instances but a limited number of high-quality labels, semi-supervised learning algorithms have come to prominence. Graph-based semi-supervised learning (G-SSL) algorithms, of which Label Propagation (LP) is a prominent example, are particularly well-suited for these problems. The premise of LP is the existence of homophily in the graph, but beyond that nothing is known about the efficacy of LP. In particular, there is no characterisation that connects the structural constraints, volume and quality of the labels to the accuracy of LP. In this work, we draw upon the notion of recovery from the literature on community detection, and provide guarantees on accuracy for partially-labelled graphs generated from the Partially-Labelled Stochastic Block Model (PLSBM). Extensive experiments performed on synthetic data verify the theoretical findings.
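
The homophily premise behind Label Propagation can be illustrated with a minimal sketch: seed nodes keep their labels, and every other node repeatedly takes the average of its neighbors' class scores. This is one common textbook variant, not necessarily the formulation analysed in the paper, and the six-node graph (two triangles joined by a single edge) is a hypothetical example rather than a PLSBM sample.

```python
def propagate_labels(neighbors, seeds, classes, iterations=100):
    """Iterative label propagation with clamped seed nodes.

    neighbors: dict mapping node -> list of adjacent nodes
    seeds:     dict mapping labelled node -> class
    """
    scores = {n: {c: 0.0 for c in classes} for n in neighbors}
    for node, cls in seeds.items():
        scores[node][cls] = 1.0
    for _ in range(iterations):
        updated = {}
        for node, nbrs in neighbors.items():
            if node in seeds:
                # Clamp: labelled nodes never change.
                updated[node] = scores[node]
            else:
                # Unlabelled nodes average their neighbors' scores.
                updated[node] = {
                    c: sum(scores[m][c] for m in nbrs) / len(nbrs)
                    for c in classes
                }
        scores = updated
    return {n: max(classes, key=lambda c: scores[n][c]) for n in neighbors}


# Hypothetical graph: triangle {0, 1, 2} and triangle {3, 4, 5},
# joined by the single edge 2-3, with one labelled node per community.
graph = {
    0: [1, 2],
    1: [0, 2],
    2: [0, 1, 3],
    3: [2, 4, 5],
    4: [3, 5],
    5: [3, 4],
}
predicted = propagate_labels(graph, seeds={0: "A", 5: "B"}, classes=["A", "B"])
```

On this graph each triangle inherits the label of its seed, because edges mostly connect nodes of the same community; the paper's question is how much labelling and structure are needed for such recovery to be guaranteed.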

