A survey on semi-supervised learning

2019 ◽  
Vol 109 (2) ◽  
pp. 373-440 ◽  
Author(s):  
Jesper E. van Engelen ◽  
Holger H. Hoos

Abstract Semi-supervised learning is the branch of machine learning concerned with using labelled as well as unlabelled data to perform certain learning tasks. Conceptually situated between supervised and unsupervised learning, it permits harnessing the large amounts of unlabelled data available in many use cases in combination with typically smaller sets of labelled data. In recent years, research in this area has followed the general trends observed in machine learning, with much attention directed at neural network-based models and generative learning. The literature on the topic has also expanded in volume and scope, now encompassing a broad spectrum of theory, algorithms and applications. However, no recent surveys exist to collect and organize this knowledge, impeding the ability of researchers and engineers alike to utilize it. Filling this void, we present an up-to-date overview of semi-supervised learning methods, covering earlier work as well as more recent advances. We focus primarily on semi-supervised classification, where the large majority of semi-supervised learning research takes place. Our survey aims to provide researchers and practitioners new to the field as well as more advanced readers with a solid understanding of the main approaches and algorithms developed over the past two decades, with an emphasis on the most prominent and currently relevant work. Furthermore, we propose a new taxonomy of semi-supervised classification algorithms, which sheds light on the different conceptual and methodological approaches for incorporating unlabelled data into the training process. Lastly, we show how the fundamental assumptions underlying most semi-supervised learning algorithms are closely connected to each other, and how they relate to the well-known semi-supervised clustering assumption.

2021 ◽  
Vol 27 (12) ◽  
pp. 1390-1407
Author(s):  
Ani Vanyan ◽  
Hrant Khachatrian

Semi-supervised learning is a branch of machine learning focused on improving the performance of models when labeled data is scarce but a large number of unlabeled examples is available. Over the past five years there has been remarkable progress in designing algorithms that reach reasonable image classification accuracy with access to labels for only 0.1% of the samples. In this survey, we describe most of the recently proposed deep semi-supervised learning algorithms for image classification and identify the main trends of research in the field. Next, we compare several components of the algorithms, discuss the challenges of reproducing the results in this area, and highlight recently proposed applications of the methods originally developed for semi-supervised learning.


2021 ◽  
Author(s):  
Jorge Cabrera Alvargonzalez ◽  
Ana Larranaga Janeiro ◽  
Sonia Perez ◽  
Javier Martinez Torres ◽  
Lucia Martinez Lamas ◽  
...  

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has been and remains one of the major challenges humanity has faced thus far. Over the past few months, large amounts of information have been collected that are only now beginning to be assimilated. The present work investigates the existence of residual information in the large number of rRT-PCRs that tested positive among the almost half a million tests performed during the pandemic. This residual information is believed to be highly related to a pattern in the number of cycles needed to detect a sample as positive. Thus, a database of more than 20,000 positive samples was collected, and two supervised classification algorithms (a support vector machine and a neural network) were trained to temporally locate each sample based solely on the number of cycles determined in the rRT-PCR of each individual. Finally, the classification results show that the appearance of each wave coincides with the surge of each of the variants present in the region of Galicia (Spain) during the SARS-CoV-2 pandemic, and that the waves are clearly identified by the classification algorithm.


2015 ◽  
Vol 7 (1) ◽  
pp. 18-30
Author(s):  
Zalán Bodó ◽  
Lehel Csató

Abstract Semi-supervised learning has become an important and thoroughly studied subdomain of machine learning in the past few years, because gathering large amounts of unlabeled data is almost costless, and the costly human labeling process can be minimized by semi-supervision. Label propagation is a transductive semi-supervised learning method that operates on the (usually undirected) data graph. It was introduced in [8], and many variants have been proposed since. However, the base algorithm has two variants: the first variant presented in [8] and its slightly modified version used afterwards, e.g. in [7]. This paper presents and compares the two algorithms, both theoretically and experimentally, and also tries to make a recommendation on which variant to use.
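To make the idea of propagating labels over a data graph concrete, here is a minimal sketch in Python with NumPy. It is a generic clamped propagation scheme and is not claimed to match either variant compared in the paper. The function name `propagate_labels`, the Gaussian similarity, the `sigma` bandwidth, the fixed iteration count, and the convention of marking unlabeled points with `-1` are all assumptions made for illustration.

```python
import numpy as np

def propagate_labels(X, y, sigma=1.0, n_iter=200):
    """Illustrative label propagation; y holds class ids, -1 marks unlabeled points."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y[y >= 0])

    # Build a dense, undirected similarity graph with Gaussian edge weights.
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)

    # Row-normalise the weights to obtain a transition matrix.
    P = W / W.sum(axis=1, keepdims=True)

    # One-hot label matrix; rows of unlabeled points start at zero.
    rows = np.flatnonzero(y >= 0)
    cols = np.searchsorted(classes, y[rows])
    F = np.zeros((len(y), len(classes)))
    F[rows, cols] = 1.0
    clamp = F[rows].copy()

    for _ in range(n_iter):
        F = P @ F          # each node takes the weighted average of its neighbours
        F[rows] = clamp    # labeled nodes are reset to their known labels

    return classes[F.argmax(axis=1)]

# Two well-separated groups, with one labeled point in each.
X = [[0.0, 0.0], [0.1, 0.0], [0.2, 0.0], [5.0, 5.0], [5.1, 5.0], [5.2, 5.0]]
y = [0, -1, -1, 1, -1, -1]
pred = propagate_labels(X, y)
```

The method is transductive in the sense that it only assigns labels to the points already present in the graph; it does not produce a classifier for unseen inputs.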


2018 ◽  
Author(s):  
Nicholas A. Bokulich ◽  
Matthew Dillon ◽  
Evan Bolyen ◽  
Benjamin D. Kaehler ◽  
Gavin A. Huttley ◽  
...  

Abstract Microbiome studies often aim to predict outcomes or differentiate samples based on their microbial compositions, tasks that can be efficiently performed by supervised learning methods. Here we present a benchmark comparison of supervised learning classifiers and regressors implemented in scikit-learn, a Python-based machine-learning library. We additionally present q2-sample-classifier, a plugin for the QIIME 2 microbiome bioinformatics framework, that facilitates application of the scikit-learn classifiers to microbiome data. Random forest, extra trees, and gradient boosting models demonstrate the highest performance for both supervised classification and regression of microbiome data. Automated feature selection and hyperparameter tuning enhance performance of most methods but may not be necessary under all circumstances. The q2-sample-classifier plugin makes these methods more accessible and interpretable to a broad audience of microbiologists, clinicians, and others who wish to utilize supervised learning methods for predicting sample characteristics based on microbiome composition. The q2-sample-classifier source code is available at https://github.com/qiime2/q2-sample-classifier. It is released under a BSD-3-Clause license, and is freely available including for commercial use.
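As a rough illustration of the kind of scikit-learn workflow the abstract refers to, the sketch below cross-validates a random forest classifier. It does not use q2-sample-classifier or any microbiome data: the feature table is synthetic, generated with `make_classification`, and the sample count, feature count, number of trees, and fold count are arbitrary assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a samples-by-features table (not real microbiome data).
X, y = make_classification(
    n_samples=200, n_features=30, n_informative=5, random_state=0
)

# Random forest with a fixed seed so repeated runs give the same result.
model = RandomForestClassifier(n_estimators=100, random_state=0)

# 5-fold cross-validated accuracy: one score per held-out fold.
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
print(f"mean accuracy over {len(scores)} folds: {scores.mean():.3f}")
```

Swapping in `ExtraTreesClassifier` or `GradientBoostingClassifier` from `sklearn.ensemble` would follow the same pattern.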


2021 ◽  
Author(s):  
Pouyan Hosseinizadeh

While many modelling methods have been developed and introduced to predict the actual state of a system at the next point in time, the purpose of this research is to present and discuss two approaches that aim not to predict exact future states, but to identify the potential for final collapse of a system. The first approach is based on kernel methods, a subcategory of supervised learning, and attempts to provide a visualization method to classify active and dead companies and predict the potential collapse of a system. The second approach aims to analyze the inclination of a system by looking at the local changes observed over a certain period of time in the past. This research discusses the application of these modelling approaches to predicting collapse in companies belonging to two industrial sectors, based on the behaviour of their closing stock prices. Advantages and limitations of each approach are also discussed.


2012 ◽  
pp. 695-703
Author(s):  
George Tzanis ◽  
Christos Berberidis ◽  
Ioannis Vlahavas

Machine learning is one of the oldest subfields of artificial intelligence and is concerned with the design and development of computational systems that can adapt themselves and learn. The most common machine learning algorithms can be either supervised or unsupervised. Supervised learning algorithms generate a function that maps inputs to desired outputs, based on a set of examples with known output (labeled examples). Unsupervised learning algorithms find patterns and relationships over a given set of inputs (unlabeled examples). Other categories of machine learning are semi-supervised learning, where an algorithm uses both labeled and unlabeled examples, and reinforcement learning, where an algorithm learns a policy of how to act given an observation of the world.



2020 ◽  
Vol 1 (2) ◽  
pp. 1-4
Author(s):  
Priyam Guha ◽  
Abhishek Mukherjee ◽  
Abhishek Verma

This research paper deals with using supervised machine learning algorithms to detect the authenticity of bank notes. We achieved very high accuracy (on the order of 99%) by applying data preprocessing techniques and then running the processed data through supervised learning algorithms such as SVM, decision trees, logistic regression, and KNN. We then analyze the misclassified points and examine the confusion matrix to find out which algorithms produced more false positives and which produced more false negatives.


2015 ◽  
Vol 28 (6) ◽  
pp. 570-600 ◽  
Author(s):  
Grant Duwe ◽  
KiDeuk Kim

Recent research has produced mixed results as to whether newer machine learning algorithms outperform older, more traditional methods such as logistic regression in predicting recidivism. In this study, we compared the performance of 12 supervised learning algorithms to predict recidivism among offenders released from Minnesota prisons. Using multiple predictive validity metrics, we assessed the performance of these algorithms across varying sample sizes, recidivism base rates, and numbers of predictors in the data set. The newer machine learning algorithms generally yielded better predictive validity results. LogitBoost had the best overall performance, followed by random forests, MultiBoosting, bagged trees, and logistic model trees. Still, the gap between the best and worst algorithms was relatively modest, and no method performed best in all of the 10 scenarios we examined. The results suggest that multiple methods, including machine learning algorithms, should be considered in the development of recidivism risk assessment instruments.


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Eric Pettersson Ruiz ◽  
Jannis Angelis

Purpose: This study aims to explore how to deanonymize cryptocurrency money launderers with the help of machine learning (ML). Money is laundered through cryptocurrencies by distributing funds to multiple accounts and then reexchanging the crypto back. This process of exchanging currencies is done through cryptocurrency exchanges. Current preventive efforts are outdated, and ML may provide novel ways to identify illicit currency movements. Hence, this study investigates ML applicability for combatting money laundering activities using cryptocurrency.
Design/methodology/approach: Four supervised-learning algorithms were compared using the Bitcoin Elliptic Dataset. The method covered a quantitative analysis of the algorithmic performance, capturing differences in three key evaluation metrics: F1-score, precision and recall. Two complementary qualitative interviews were performed at cryptocurrency exchanges to identify fit and applicability of the algorithms.
Findings: The study results show that the currently implemented ML tools for preventing money laundering at cryptocurrency exchanges are all too slow and need to be optimized for the task. The results also show that, while no single algorithm is best suited to detecting transactions related to money laundering, the decision tree algorithm is the most suitable for adoption by cryptocurrency exchanges.
Originality/value: Given the growth of cryptocurrency use, this study explores the newly developed field of algorithmic tools to combat illicit currency movement, in particular in the growing arena of cryptocurrencies. The study results provide new insights into the applicability of ML as a tool to combat money laundering using cryptocurrency exchanges.
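For readers unfamiliar with the three evaluation metrics named in the abstract, the sketch below computes precision, recall and F1-score from confusion-matrix counts using their standard definitions. The function name `precision_recall_f1` and the example counts are illustrative assumptions and are not taken from the study.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, f1) from true positive, false positive and false negative counts."""
    # Precision: share of flagged transactions that were truly illicit.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of truly illicit transactions that were flagged.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Hypothetical counts: 8 correct flags, 2 false alarms, 4 missed cases.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=4)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

The zero-denominator guards are a design choice for this sketch: when a model flags nothing or there are no positive cases, the corresponding metric is reported as 0.0 rather than raising an error.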

