Graph-based semi-supervised learning via improving the quality of the graph dynamically

The k Nearest Neighbor (KNN) algorithm has been widely applied in various supervised learning tasks due to its simplicity and effectiveness. However, the quality of KNN decision making is directly affected by the quality of the neighborhoods in the modeling space. Efforts have been made to map data to a better feature space either implicitly with kernel functions, or explicitly through learning linear or nonlinear transformations. However, all these methods use pre-determined distance or similarity functions, which may limit their learning capacity. In this paper, we present two loss functions, namely KNN Loss and Fuzzy KNN Loss, to quantify the quality of neighborhoods formed by KNN with respect to supervised learning, such that minimizing the loss function on the training data leads to maximizing KNN decision accuracy on the training data. We further present a deep learning strategy that is able to learn, by minimizing KNN loss, pairwise similarities of data that implicitly maps data to a feature space where the quality of KNN neighborhoods is optimized. Experimental results show that this deep learning strategy (denoted as Deep KNN) outperforms state-of-the-art supervised learning methods on multiple benchmark data sets.
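The abstract does not define KNN Loss itself, so the following is only a minimal sketch of the quantity that minimizing the loss is meant to maximize: KNN decision accuracy on the training data, computed from a pairwise similarity matrix. The function name `loo_knn_accuracy`, the leave-one-out protocol, and the toy data are assumptions for illustration, not the authors' method.

```python
from collections import Counter

def loo_knn_accuracy(sim, labels, k):
    """Leave-one-out KNN accuracy computed from a pairwise similarity matrix.

    sim[i][j] is the similarity between points i and j (higher = closer).
    This is NOT the paper's KNN Loss, only the accuracy it aims to improve.
    """
    n = len(labels)
    correct = 0
    for i in range(n):
        # Rank every other point by similarity to point i, most similar first.
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: sim[i][j], reverse=True)
        # Majority vote among the k most similar neighbours.
        votes = Counter(labels[j] for j in others[:k])
        predicted = votes.most_common(1)[0][0]
        correct += predicted == labels[i]
    return correct / n

# Hypothetical toy data: two well-separated groups on a line.
points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
labels = [0, 0, 0, 1, 1, 1]
# Negative distance serves as a fixed, pre-determined similarity here;
# the paper's point is that this similarity could instead be learned.
sim = [[-abs(a - b) for b in points] for a in points]
accuracy = loo_knn_accuracy(sim, labels, k=2)
```

A learned similarity would replace the hand-written `sim` matrix; the evaluation loop would stay the same.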

Cross-Relation Cross-Bag Attention for Distantly-Supervised Relation Extraction

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.3301419 ◽

2019 ◽

Vol 33 ◽

pp. 419-426 ◽

Cited By ~ 6

Author(s):

Yujin Yuan ◽

Liyuan Liu ◽

Siliang Tang ◽

Zhongfei Zhang ◽

Yueting Zhuang ◽

...

Keyword(s):

Selective Attention ◽

Supervised Learning ◽

State Of The Art ◽

Relation Extraction ◽

Knowledge Bases ◽

Training Data ◽

Distant Supervision ◽

Sentence Level ◽

Noise Robust

Distant supervision leverages knowledge bases to automatically label instances, allowing us to train relation extractors without human annotation. However, the generated training data typically contain substantial noise, which can lead to poor performance with vanilla supervised learning. In this paper, we propose to conduct multi-instance learning with a novel Cross-relation Cross-bag Selective Attention (C2SA), which leads to noise-robust training for distantly supervised relation extractors. Specifically, we employ sentence-level selective attention to reduce the effect of noisy or mismatched sentences, while the correlation among relations is captured to improve the quality of attention weights. Moreover, instead of treating all entity pairs equally, we pay more attention to entity pairs of higher quality, again using the selective attention mechanism. Experiments with two types of relation extractor demonstrate the superiority of the proposed approach over the state of the art, while further ablation studies verify our intuitions and demonstrate the effectiveness of the two proposed techniques.
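To make sentence-level selective attention concrete, here is a minimal sketch under stated assumptions: each sentence in a bag is already encoded as a vector, and it is scored against a relation query vector by a plain dot product. The dot-product scoring, the function name `selective_attention`, and the toy vectors are hypothetical; the cross-relation and cross-bag components of C2SA are not reproduced.

```python
import math

def selective_attention(sentence_vecs, relation_query):
    """Softmax-weighted average of the sentence vectors in one bag.

    Sentences that match the relation query get larger weights, so noisy
    or mismatched sentences contribute less to the bag representation.
    """
    # Dot-product score of each sentence against the query (an assumption).
    scores = [sum(s * q for s, q in zip(vec, relation_query))
              for vec in sentence_vecs]
    # Numerically stable softmax over the sentences in the bag.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the sentence vectors gives the bag representation.
    dim = len(relation_query)
    bag_vec = [sum(w * vec[d] for w, vec in zip(weights, sentence_vecs))
               for d in range(dim)]
    return weights, bag_vec

# Hypothetical bag: two sentences aligned with the query, one mismatched.
bag = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.5]]
query = [2.0, 0.0]
weights, bag_vec = selective_attention(bag, query)
```

In this toy bag the third sentence points away from the query, so it receives the smallest weight and has little influence on `bag_vec`.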

Evaluation and classification of water quality of glacier‐fed channels using supervised learning and water quality index

Water and Environment Journal ◽

10.1111/wej.12708 ◽

2021 ◽

Author(s):

Muhammad Farooq Ahmed ◽

Umer Waqas ◽

Muhammad Saleem Khan ◽

Hafiz Muhammad Awais Rashid ◽

Shahab Saqib

Keyword(s):

Water Quality ◽

Supervised Learning ◽

Water Quality Index ◽

Quality Index

5G/B5G Service Classification Using Supervised Learning

Applied Sciences ◽

10.3390/app11114942 ◽

2021 ◽

Vol 11 (11) ◽

pp. 4942

Author(s):

Jorge E. Preciado-Velasco ◽

Joan D. Gonzalez-Franco ◽

Caridad E. Anias-Calderon ◽

Juan I. Nieto-Hipolito ◽

Raul Rivera-Rodriguez

Keyword(s):

Supervised Learning ◽

Quality Of Experience ◽

Service Providers ◽

Matthews Correlation Coefficient ◽

Service Level Agreement ◽

Service Level ◽

5G Networks ◽

Distribution Of Resources

The classification of services in 5G/B5G (Beyond 5G) networks has become important for telecommunications service providers, who face the challenge of simultaneously offering a better Quality of Service (QoS) in their networks and a better Quality of Experience (QoE) to users. Service classification allows 5G service providers to accurately select the network slices for each service, thereby improving the QoS of the network and the QoE perceived by users, and ensuring compliance with the Service Level Agreement (SLA). Some projects have developed systems for classifying these services based on the Key Performance Indicators (KPIs) that characterize the different services. However, Key Quality Indicators (KQIs) are also significant in 5G networks, although these are generally not considered. We propose a service classifier that uses a Machine Learning (ML) approach based on Supervised Learning (SL) to improve classification and to support a better distribution of resources and traffic over 5G/B5G-based networks. We carry out simulations of our proposed scheme using different SL algorithms, first with KPIs alone and then incorporating KQIs, and show that the latter achieves better prediction, with an accuracy of 97% and a Matthews correlation coefficient of 96.6% with a Random Forest classifier.
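For readers unfamiliar with the Matthews correlation coefficient reported above, the sketch below computes it for the binary case from the confusion-matrix counts. The service classifier in the abstract is presumably multi-class, so this two-class version and the toy labels are simplifying assumptions for illustration only.

```python
import math

def matthews_corrcoef(y_true, y_pred):
    """Binary Matthews correlation coefficient (labels are 0 or 1).

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
    Returns 0.0 when the denominator is zero, a common convention.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Hypothetical predictions: one false negative and one false positive.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
mcc = matthews_corrcoef(y_true, y_pred)  # (3*3 - 1*1) / sqrt(4*4*4*4) = 0.5
```

Unlike accuracy (0.75 on this toy example), MCC uses all four cells of the confusion matrix, which is why it is often reported alongside accuracy.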

Predicting Quality of Castings via Supervised Learning Method

International Journal of Metalcasting ◽

10.1007/s40962-021-00606-7 ◽

2021 ◽

Author(s):

Adam E. Kopper ◽

Diran Apelian

Keyword(s):

Supervised Learning ◽

Learning Method

A Strictly Unsupervised Deep Learning Method for HEp-2 Cell Image Classification

Sensors ◽

10.3390/s20092717 ◽

2020 ◽

Vol 20 (9) ◽

pp. 2717

Author(s):

Caleb Vununu ◽

Suk-Hwan Lee ◽

Ki-Ryong Kwon

Keyword(s):

Feature Extraction ◽

Deep Learning ◽

Unsupervised Learning ◽

Supervised Learning ◽

Feature Learning ◽

Automated Classification ◽

Benchmark Datasets ◽

Unsupervised Deep Learning ◽

Latent Representations

Classifying images of Human Epithelial type 2 (HEp-2) cells is one of the most important steps in the diagnosis of autoimmune diseases. Performing this classification manually is extremely difficult because of the heterogeneity of these cellular images, so an automated classification scheme is needed. However, the majority of the available methods rely on supervised learning, and the need for thousands of manually labelled images is a practical difficulty with this approach. The first contribution of this work is to demonstrate that HEp-2 cell images can also be classified using the unsupervised learning paradigm. Unlike the majority of existing methods, we propose a deep learning scheme that performs both feature extraction and cell discrimination through an end-to-end unsupervised paradigm. We propose the use of a deep convolutional autoencoder (DCAE) that performs feature extraction via an encoding–decoding scheme. At the same time, we embed in the network a clustering layer whose purpose is to automatically discriminate, during the feature learning process, the latent representations produced by the DCAE. Furthermore, we investigate how the quality of the network’s reconstruction affects the quality of the produced representations. We evaluate the effectiveness of our method on benchmark datasets and demonstrate that unsupervised learning, when done properly, matches the accuracy of state-of-the-art supervised learning methods.

Semi-supervised learning with the clustering and Decision Trees classifier for the task of cognitive workload study

Journal of Computer Sciences Institute ◽

10.35784/jcsi.1725 ◽

2020 ◽

Vol 15 ◽

pp. 214-218

Author(s):

Martyna Wawrzyk

Keyword(s):

Supervised Learning ◽

Clustering Algorithm ◽

Cognitive Workload ◽

Digit Symbol Substitution Test ◽

Eye Tracker ◽

Symbol Substitution ◽

Decision Trees ◽

High Level

The paper focuses on applying a clustering algorithm and a Decision Trees classifier (DTs) as a semi-supervised method for classifying cognitive workload level. The analyzed data were collected with an eye-tracker device during the Digit Symbol Substitution Test (DSST). Twenty-six volunteers took part in the examination. The DSST was conducted in three parts with different levels of difficulty, yielding three versions of the data: low, middle and high level of cognitive workload. The case study covered clustering the collected data with the k-means algorithm to detect three or more clusters. The resulting clusters were evaluated with three internal indices that measure clustering quality. The Davies-Bouldin index gave the best result for four clusters, which supports the hypothesis that four clusters exist. The clusters were then adopted as classes for supervised learning and classified with DTs. The mean accuracy was 0.85 for three-class classification and 0.73 for four-class classification.
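As an illustration of how an internal index can rank clusterings, the sketch below computes the Davies-Bouldin index (lower is better) for a given assignment of points to clusters. It uses Euclidean distance and mean distance to the centroid as the scatter term; the function name `davies_bouldin` and the toy points are assumptions, and no eye-tracking data from the study are reproduced.

```python
import math

def davies_bouldin(points, assignments):
    """Davies-Bouldin index of a clustering (lower means better separation).

    For each cluster, take the worst ratio (s_i + s_j) / d(c_i, c_j) over
    all other clusters, then average these worst-case ratios.
    """
    clusters = {}
    for point, cluster_id in zip(points, assignments):
        clusters.setdefault(cluster_id, []).append(point)
    # Centroid of each cluster: coordinate-wise mean of its points.
    centroids = {c: [sum(col) / len(pts) for col in zip(*pts)]
                 for c, pts in clusters.items()}
    # Scatter of each cluster: mean distance of its points to the centroid.
    scatter = {c: sum(math.dist(p, centroids[c]) for p in pts) / len(pts)
               for c, pts in clusters.items()}
    ids = list(clusters)
    total = 0.0
    for i in ids:
        total += max((scatter[i] + scatter[j])
                     / math.dist(centroids[i], centroids[j])
                     for j in ids if j != i)
    return total / len(ids)

# Hypothetical points forming two tight groups far apart.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = davies_bouldin(points, [0, 0, 1, 1])  # groups match the geometry
bad = davies_bouldin(points, [0, 1, 0, 1])   # groups cut across it
```

Running k-means for several values of k and keeping the k with the lowest index mirrors the model-selection step described in the abstract.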

Interpretable Feature Generation in ECG Using a Variational Autoencoder

Frontiers in Genetics ◽

10.3389/fgene.2021.638191 ◽

2021 ◽

Vol 12 ◽

Author(s):

V. V. Kuznetsov ◽

V. A. Moskalenko ◽

D. V. Gribanov ◽

Nikolai Yu. Zolotykh

Keyword(s):

Cardiovascular Diseases ◽

Supervised Learning ◽

Cardiac Cycle ◽

Ecg Signal ◽

Feature Generation ◽

Maximum Mean Discrepancy ◽

Variational Autoencoder ◽

Electrocardiogram Ecg

We propose a method for generating an electrocardiogram (ECG) signal for one cardiac cycle using a variational autoencoder. Our goal was to encode the original ECG signal using as few features as possible. Using this method we extracted a vector of 25 new features, which in many cases can be interpreted. The generated ECG has a quite natural appearance. The low value of the Maximum Mean Discrepancy metric, 3.83 × 10⁻³, also indicates good quality of ECG generation. The extracted features will help to improve the quality of automatic diagnostics of cardiovascular diseases. Generating new synthetic ECGs will allow us to address the lack of labeled ECGs for supervised learning.
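The Maximum Mean Discrepancy compares two samples through a kernel: it is near zero when generated and real samples are distributed alike. The sketch below is a minimal biased estimate of the squared MMD with a Gaussian (RBF) kernel. The kernel choice, the `gamma` value, and the toy samples are assumptions; the abstract does not state which kernel or estimator the authors used.

```python
import math

def rbf(a, b, gamma):
    """Gaussian kernel exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((u - v) ** 2 for u, v in zip(a, b)))

def mmd_squared(xs, ys, gamma=1.0):
    """Biased estimate of squared MMD between samples xs and ys.

    MMD^2 = mean k(x, x') + mean k(y, y') - 2 * mean k(x, y).
    """
    def mean_kernel(left, right):
        return (sum(rbf(a, b, gamma) for a in left for b in right)
                / (len(left) * len(right)))
    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2 * mean_kernel(xs, ys)

# Hypothetical one-dimensional "signals" standing in for ECG feature vectors.
real = [[0.0], [0.5], [1.0]]
close = [[0.1], [0.6], [1.1]]   # generated samples near the real ones
far = [[3.0], [3.5], [4.0]]     # generated samples far from the real ones
mmd_same = mmd_squared(real, real)
mmd_close = mmd_squared(real, close)
mmd_far = mmd_squared(real, far)
```

A small value such as the one reported above is therefore read as the generated signals being hard to distinguish from real ones under the chosen kernel.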

Understanding the Success of Graph-based Semi-Supervised Learning using Partially Labelled Stochastic Block Model

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/187 ◽

2020 ◽

Author(s):

Avirup Saha ◽

Shreyas Sheshadri ◽

Samik Datta ◽

Niloy Ganguly ◽

Disha Makhija ◽

...

Keyword(s):

Supervised Learning ◽

Synthetic Data ◽

Label Propagation ◽

Structural Constraints ◽

Block Model ◽

Stochastic Block Model ◽

Supervised Learning Algorithms ◽

Learning Scenarios ◽

Labelled Graphs

With the proliferation of learning scenarios that have an abundance of instances but a limited number of high-quality labels, semi-supervised learning algorithms have come to prominence. Graph-based semi-supervised learning (G-SSL) algorithms, of which Label Propagation (LP) is a prominent example, are particularly well-suited for these problems. The premise of LP is the existence of homophily in the graph, but beyond that nothing is known about the efficacy of LP. In particular, there is no characterisation that connects the structural constraints, volume and quality of the labels to the accuracy of LP. In this work, we draw upon the notion of recovery from the literature on community detection, and provide guarantees on accuracy for partially-labelled graphs generated from the Partially-Labelled Stochastic Block Model (PLSBM). Extensive experiments performed on synthetic data verify the theoretical findings.
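To show why homophily matters for Label Propagation, here is a minimal sketch of one common LP variant: seed nodes keep their labels, and every other node repeatedly takes the average of its neighbours' class scores. The function name `label_propagation`, the fixed iteration count, and the two-community toy graph are assumptions; the PLSBM analysis from the paper is not reproduced.

```python
def label_propagation(adjacency, seed_labels, num_classes, iterations=50):
    """Propagate seed labels over an undirected graph by neighbour averaging.

    adjacency[i] lists the neighbours of node i (every node needs at least
    one neighbour); seed_labels maps a few node ids to their known class.
    """
    n = len(adjacency)
    scores = [[0.0] * num_classes for _ in range(n)]
    for node, label in seed_labels.items():
        scores[node][label] = 1.0
    for _ in range(iterations):
        updated = []
        for node in range(n):
            if node in seed_labels:
                # Clamp labelled nodes to their known class.
                updated.append(scores[node][:])
                continue
            neighbours = adjacency[node]
            updated.append([sum(scores[m][c] for m in neighbours) / len(neighbours)
                            for c in range(num_classes)])
        scores = updated
    # Each node takes the class with the highest propagated score.
    return [max(range(num_classes), key=lambda c: row[c]) for row in scores]

# Hypothetical homophilous graph: two triangles joined by the edge 2-3.
adjacency = [[1, 2], [0, 2], [0, 1, 3], [2, 4, 5], [3, 5], [3, 4]]
seeds = {0: 0, 5: 1}  # one labelled node per community
labels = label_propagation(adjacency, seeds, num_classes=2)
```

With a single seed per community, the dense within-community edges carry each label across its own triangle, while the lone cross edge is too weak to flip any node.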
