An Ensemble SSL Algorithm for Efficient Chest X-Ray Image Classification

A critical component in the computer-aided medical diagnosis of digital chest X-rays is the automatic detection of lung abnormalities, since effective identification at an early stage is a crucial factor in a patient's treatment. Rapid advances in computer and digital technologies have led to the development of large repositories of labeled and unlabeled images. Due to the effort and expense involved in labeling data, training datasets are of limited size, while electronic medical record systems contain a significant number of unlabeled images. Semi-supervised learning algorithms have become a hot topic of research as an alternative to traditional classification methods, combining the explicit classification information of labeled data with the knowledge hidden in the unlabeled data to build powerful and effective classifiers. In the present work, we evaluate the performance of an ensemble semi-supervised learning algorithm for the classification of chest X-rays of tuberculosis. The efficacy of the presented algorithm is demonstrated by several experiments and confirmed by statistical nonparametric tests, illustrating that reliable and robust prediction models can be developed using few labeled and many unlabeled data.

A Weighted Voting Ensemble Self-Labeled Algorithm for the Detection of Lung Abnormalities from X-Rays

Algorithms ◽

10.3390/a12030064 ◽

2019 ◽

Vol 12 (3) ◽

pp. 64 ◽

Cited By ~ 7

Author(s):

Ioannis Livieris ◽

Andreas Kanavos ◽

Vassilis Tampakas ◽

Panagiotis Pintelas

Keyword(s):

Machine Learning ◽

Supervised Learning ◽

Learning Algorithm ◽

X Rays ◽

Weighted Voting ◽

Useful Knowledge ◽

Lung Abnormalities ◽

Supervised Learning Algorithms ◽

Digital Chest ◽

Classification Information

During the last decades, intensive efforts have been devoted to the extraction of useful knowledge from large volumes of medical data employing advanced machine learning and data mining techniques. Advances in digital chest radiography have enabled research and medical centers to accumulate large repositories of images classified (labeled) by human experts and, mostly, of unclassified (unlabeled) images. Machine learning methods such as semi-supervised learning algorithms have been proposed as a new direction to address the shortage of available labeled data, by combining the explicit classification information of labeled data with the information hidden in the unlabeled data. In the present work, we propose a new ensemble semi-supervised learning algorithm for the classification of lung abnormalities from chest X-rays based on a new weighted voting scheme. The proposed algorithm assigns a vector of weights to each component classifier of the ensemble based on its accuracy on each class. Our numerical experiments illustrate the efficiency of the proposed ensemble methodology against other state-of-the-art classification methods.
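The per-class weighted voting idea described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the exact weight formula, so the sketch assumes that a classifier's weight for a class is the fraction of validation samples of that class it labels correctly. The function names `per_class_weights` and `weighted_vote` are hypothetical.

```python
from collections import defaultdict


def per_class_weights(y_true, y_pred, classes):
    """Return {class: weight} for one classifier.

    Assumed definition (not taken from the paper): the weight for class c
    is the fraction of samples whose true label is c that the classifier
    also predicted as c.
    """
    weights = {}
    for c in classes:
        idx = [i for i, y in enumerate(y_true) if y == c]
        correct = sum(1 for i in idx if y_pred[i] == c)
        weights[c] = correct / len(idx) if idx else 0.0
    return weights


def weighted_vote(predictions, weights):
    """Combine one prediction per classifier into a single label.

    Each classifier adds, to the class it voted for, its own weight for
    that class. Ties are broken by sorted label order.
    """
    scores = defaultdict(float)
    for label, w in zip(predictions, weights):
        scores[label] += w[label]
    return max(sorted(scores), key=lambda c: scores[c])


# Toy validation labels and the predictions of two classifiers on them.
y_val = ["normal", "normal", "abnormal", "abnormal"]
clf_a = per_class_weights(y_val, ["normal", "normal", "normal", "abnormal"],
                          ["normal", "abnormal"])
clf_b = per_class_weights(y_val, ["abnormal", "normal", "abnormal", "abnormal"],
                          ["normal", "abnormal"])
```

With such weights, a single classifier that is reliable on a class can outvote two classifiers that are weak on the class they chose, which plain majority voting cannot express.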

On Ensemble SSL Algorithms for Credit Scoring Problem

Informatics ◽

10.3390/informatics5040040 ◽

2018 ◽

Vol 5 (4) ◽

pp. 40 ◽

Cited By ~ 5

Author(s):

Ioannis Livieris ◽

Niki Kiriakidou ◽

Andreas Kanavos ◽

Vassilis Tampakas ◽

Panagiotis Pintelas

Keyword(s):

Supervised Learning ◽

Prediction Models ◽

Credit Scoring ◽

Learning Algorithms ◽

Unlabeled Data ◽

Useful Knowledge ◽

Customer Group ◽

Ensemble Techniques ◽

Supervised Learning Algorithms ◽

Robust Prediction

Credit scoring is generally recognized as one of the most significant operational research techniques used in banking and finance, aiming to identify whether a credit consumer belongs to a legitimate or a suspicious customer group. With the vigorous development of the Internet and the widespread adoption of electronic records, banks and financial institutions have accumulated large repositories of labeled and mostly unlabeled data. Semi-supervised learning constitutes an appropriate machine learning methodology for extracting useful knowledge from both labeled and unlabeled data. In this work, we evaluate the performance of two ensemble semi-supervised learning algorithms for the credit scoring problem. Our numerical experiments indicate that the proposed algorithms outperform their component semi-supervised learning algorithms, illustrating that reliable and robust prediction models can be developed by adapting ensemble techniques to the semi-supervised learning framework.

DEVELOPMENT AND COMPARATIVE ANALYSIS OF SEMI-SUPERVISED LEARNING ALGORITHMS ON A SMALL AMOUNT OF LABELED DATA

Bulletin of National Technical University KhPI Series System Analysis Control and Information Technologies ◽

10.20998/2079-0023.2021.01.16 ◽

2021 ◽

pp. 98-103

Author(s):

Klym Yamkovyi

Keyword(s):

Supervised Learning ◽

Nearest Neighbor ◽

Learning Algorithm ◽

Center Of Mass ◽

Unlabeled Data ◽

Learning Approaches ◽

Classification Problems ◽

K Nearest Neighbor ◽

Supervised Learning Algorithms ◽

Label Information

The paper is dedicated to the development and comparative experimental analysis of semi-supervised learning approaches, based on a mix of unsupervised and supervised approaches, for the classification of datasets with a small amount of labeled data, that is, identifying to which of a set of categories a new observation belongs using a training set of observations whose category membership is known. Semi-supervised learning is an approach to machine learning that combines a small amount of labeled data with a large amount of unlabeled data during training. Unlabeled data, when used in combination with a small quantity of labeled data, can produce a significant improvement in learning accuracy. The goal is to develop and analyze semi-supervised methods and to compare their accuracy and robustness on different synthetic datasets. The first proposed approach is based on the unsupervised K-medoids method, also known as the Partitioning Around Medoids algorithm; however, unlike K-medoids, the proposed algorithm first calculates medoids using only labeled data and then processes the unlabeled observations, assigning each one the label of the nearest medoid. Another proposed approach is a mix of the supervised K-nearest neighbor method and unsupervised K-means. Thus, the proposed learning algorithm uses information about both the nearest points and the class centers of mass. The methods were implemented in the Python programming language and experimentally investigated on classification problems using datasets with different distributions and spatial characteristics. The datasets were generated using the scikit-learn library. The developed approaches were compared by their average accuracy over all these datasets. It was shown that even small amounts of labeled data allow the use of semi-supervised learning, and that the proposed modifications improve accuracy and algorithm performance, as demonstrated in the experiments. As the amount of available label information increases, the accuracy of the algorithms grows. Thus, the developed algorithms use a distance metric that takes the available label information into account. Keywords: unsupervised learning, supervised learning, semi-supervised learning, clustering, distance, distance function, nearest neighbor, medoid, center of mass.
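The medoid-based labeling step described in this abstract might look roughly like the sketch below. It is an assumption-laden illustration rather than the paper's code: it uses Euclidean distance, performs a single labeling pass, and the helper names `medoid`, `fit_class_medoids`, and `label_by_nearest_medoid` are hypothetical.

```python
import math


def medoid(points):
    """Return the point with the smallest total distance to all others."""
    return min(points, key=lambda p: sum(math.dist(p, q) for q in points))


def fit_class_medoids(X_labeled, y_labeled):
    """Compute one medoid per class using only the labeled observations."""
    by_class = {}
    for x, y in zip(X_labeled, y_labeled):
        by_class.setdefault(y, []).append(x)
    return {c: medoid(pts) for c, pts in by_class.items()}


def label_by_nearest_medoid(X_unlabeled, medoids):
    """Assign each unlabeled observation the class of its nearest medoid."""
    return [
        min(medoids, key=lambda c: math.dist(x, medoids[c]))
        for x in X_unlabeled
    ]


# Toy data: two well-separated classes in the plane.
X_lab = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (5, 6)]
y_lab = [0, 0, 0, 1, 1, 1]
medoids = fit_class_medoids(X_lab, y_lab)
pseudo_labels = label_by_nearest_medoid([(0.5, 0.5), (4, 4.5)], medoids)
```

Unlike a centroid, a medoid is always one of the observed points, which makes this step less sensitive to outliers among the few labeled samples.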

On incrementally using a small portion of strong unlabeled data for semi-supervised learning algorithms

Pattern Recognition Letters ◽

10.1016/j.patrec.2013.08.026 ◽

2014 ◽

Vol 41 ◽

pp. 53-64 ◽

Cited By ~ 8

Author(s):

Thanh-Binh Le ◽

Sang-Woon Kim

Keyword(s):

Supervised Learning ◽

Learning Algorithms ◽

Unlabeled Data ◽

Supervised Learning Algorithms

An Auto-Adjustable Semi-Supervised Self-Training Algorithm

Algorithms ◽

10.3390/a11090139 ◽

2018 ◽

Vol 11 (9) ◽

pp. 139 ◽

Cited By ~ 5

Author(s):

Ioannis Livieris ◽

Andreas Kanavos ◽

Vassilis Tampakas ◽

Panagiotis Pintelas

Keyword(s):

Supervised Learning ◽

Predictive Models ◽

Learning Algorithm ◽

Learning Algorithms ◽

Classification Problem ◽

Classification Methods ◽

Training Algorithm ◽

Traditional Classification ◽

Supervised Learning Algorithms ◽

Significant Research

Semi-supervised learning algorithms have become a topic of significant research as an alternative to traditional classification methods, which exhibit remarkable performance on labeled data but cannot be applied to large amounts of unlabeled data. In this work, we propose a new semi-supervised learning algorithm that dynamically selects the most promising learner for a classification problem from a pool of classifiers based on a self-training philosophy. Our experimental results illustrate that the proposed algorithm outperforms its component semi-supervised learning algorithms in terms of accuracy, leading to more efficient, stable and robust predictive models.
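For readers unfamiliar with the self-training philosophy mentioned here, the following sketch shows a plain self-training loop with a single base learner. It does not implement the paper's dynamic selection among a pool of classifiers. The nearest-centroid learner, the confidence formula, the threshold value, and all function names are assumptions made for illustration, and the sketch requires at least two classes.

```python
import math


def centroids(X, y):
    """Return {class: mean point} for the labeled data."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: tuple(v / counts[c] for v in sums[c]) for c in sums}


def predict_with_confidence(x, cents):
    """Predict the nearest centroid's class with a margin-based confidence.

    Assumed confidence: d2 / (d1 + d2), where d1 and d2 are the distances
    to the nearest and second-nearest centroids (1.0 = certain, 0.5 = tie).
    """
    ranked = sorted(cents, key=lambda c: math.dist(x, cents[c]))
    d1 = math.dist(x, cents[ranked[0]])
    d2 = math.dist(x, cents[ranked[1]])
    conf = 1.0 if d1 + d2 == 0 else d2 / (d1 + d2)
    return ranked[0], conf


def self_train(X_lab, y_lab, X_unlab, threshold=0.8, max_iter=10):
    """Repeatedly pseudo-label confident unlabeled points and refit."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(max_iter):
        cents = centroids(X_lab, y_lab)
        remaining, added = [], 0
        for x in pool:
            label, conf = predict_with_confidence(x, cents)
            if conf >= threshold:
                X_lab.append(x)
                y_lab.append(label)
                added += 1
            else:
                remaining.append(x)
        pool = remaining
        if added == 0 or not pool:
            break
    return centroids(X_lab, y_lab), pool


# Toy 1-D data: two labeled points and four unlabeled ones.
model, leftover = self_train(
    [(0.0,), (10.0,)], ["a", "b"],
    [(1.0,), (9.0,), (4.0,), (5.0,)],
)
```

In this toy run the points near each labeled example are absorbed as pseudo-labeled data, while the ambiguous points in the middle stay unlabeled because their confidence never reaches the threshold.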

Predicting Secondary School Students' Performance Utilizing a Semi-supervised Learning Approach

Journal of Educational Computing Research ◽

10.1177/0735633117752614 ◽

2018 ◽

Vol 57 (2) ◽

pp. 448-470 ◽

Cited By ~ 11

Author(s):

Ioannis E. Livieris ◽

Konstantina Drakopoulou ◽

Vassilis T. Tampakas ◽

Tassos A. Mikropoulos ◽

Panagiotis Pintelas

Keyword(s):

Supervised Learning ◽

Prediction Models ◽

Learning Algorithms ◽

Educational Data Mining ◽

Semisupervised Learning ◽

Research Field ◽

Machine Learning Techniques ◽

School Students ◽

Supervised Learning Algorithms ◽

And Performance

Educational data mining constitutes a recent research field that has gained popularity over the last decade because of its ability to monitor students' academic performance and predict future progression. Numerous machine learning techniques, and especially supervised learning algorithms, have been applied to develop accurate models to predict students' characteristics which induce their behavior and performance. In this work, we examine and evaluate the effectiveness of two wrapper methods for semisupervised learning algorithms for predicting students' performance in the final examinations. Our preliminary numerical experiments indicate that the advantage of semisupervised methods is that classification accuracy can be significantly improved by utilizing few labeled and many unlabeled data for developing reliable prediction models.

Clustering-Based Transductive Semi-Supervised Learning for Learning-to-Rank

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001419510078 ◽

2019 ◽

Vol 33 (12) ◽

pp. 1951007 ◽

Cited By ~ 1

Author(s):

Ashwini Rahangdale ◽

Shital Raut

Keyword(s):

Supervised Learning ◽

Learning To Rank ◽

Cost Effective ◽

Unlabeled Data ◽

Training Data ◽

Density Region ◽

The Arts ◽

Proposed Model ◽

Supervised Learning Algorithms ◽

Multiple Loss

Learning-to-rank (LTR) is a very hot topic of research in information retrieval (IR). The LTR framework usually learns the ranking function from available training data that are costly, time-consuming to obtain, and biased. When a sufficient amount of training data is not available, semi-supervised learning is one of the machine learning paradigms that can be applied to obtain pseudo labels from unlabeled data. Cluster-and-label is a basic approach to semi-supervised learning that identifies the high-density regions in data space, which are mainly used to support supervised learning. However, clustering with conventional methods may lead to prediction performance that is worse than that of supervised learning algorithms in LTR applications. Thus, we propose rank preserving clustering (RPC) with PLocalSearch to obtain pseudo labels for unlabeled data. We present a semi-supervised learning method that adopts a clustering-based transductive approach and combines it with a non-measure-specific listwise approach to learn the LTR model. Moreover, each cluster follows multi-task learning to avoid the optimization of multiple loss functions. This reduces the training complexity of the adopted listwise approach from exponential order to polynomial order. Empirical analysis on the standard datasets (LETOR) shows that the proposed model gives better results than other state-of-the-art methods.

Comparison of Adjusted Methods for Selecting Useful Unlabeled Data for Semi-Supervised Learning Algorithms

Current Approaches in Applied Artificial Intelligence - Lecture Notes in Computer Science ◽

10.1007/978-3-319-19066-2_51 ◽

2015 ◽

pp. 526-535

Author(s):

Thanh-Binh Le ◽

Sang-Woon Kim

Keyword(s):

Supervised Learning ◽

Learning Algorithms ◽

Unlabeled Data ◽

Supervised Learning Algorithms

Time-Series Laplacian Semi-Supervised Learning for Indoor Localization †

Sensors ◽

10.3390/s19183867 ◽

2019 ◽

Vol 19 (18) ◽

pp. 3867 ◽

Cited By ~ 2

Author(s):

Jaehyun Yoo

Keyword(s):

Time Series ◽

Supervised Learning ◽

Indoor Localization ◽

Learning Algorithm ◽

Unlabeled Data ◽

Training Data ◽

Practical Implementation ◽

Additional Information ◽

Localization Scheme ◽

The Impact

Machine learning-based indoor localization has suffered from the burden of collecting, constructing, and maintaining labeled training databases for practical implementation. Semi-supervised learning methods have been developed as efficient indoor localization methods to reduce the use of labeled training data. To boost the efficiency and the accuracy of indoor localization, this paper proposes a new time-series semi-supervised learning algorithm. The key aspect of the developed method, which distinguishes it from conventional semi-supervised algorithms, is the use of unlabeled data. The learning algorithm finds spatio-temporal relationships in the unlabeled data, and pseudolabels are generated to compensate for the lack of labeled training data. In the next step, another balancing-optimization learning algorithm learns a positioning model. The proposed method is evaluated for estimating the location of a smartphone user by using Wi-Fi received signal strength indicator (RSSI) measurements. The experimental results show that the developed learning algorithm outperforms some existing semi-supervised algorithms as the number of training data and access points varies. The reasons for the better performance of the proposed method are also discussed through an analysis of the impact of the learning parameters. Moreover, an extended localization scheme in conjunction with a particle filter is executed to include additional information, such as a floor plan.

On Teacher-Student Semi-Supervised Learning for Chest X-ray Image Classification

10.21528/cbic2021-80 ◽

2021 ◽

Author(s):

Roberto Augusto Philippi Martins ◽

Danilo Silva

Keyword(s):

Deep Learning ◽

Supervised Learning ◽

Learning Algorithm ◽

Unlabeled Data ◽

Imaging Data ◽

X Ray ◽

Teacher Student ◽

Chest X Ray ◽

Deep Learning Model

The lack of labeled data is one of the main obstacles to the development of deep learning models, as they rely on large labeled datasets to achieve high accuracy in complex tasks. Our objective is to evaluate the performance gain from additional unlabeled data in the training of a deep learning model when working with medical imaging data. We present a semi-supervised learning algorithm that utilizes a teacher-student paradigm to leverage unlabeled data in the classification of chest X-ray images. Using our algorithm on the ChestX-ray14 dataset, we achieve a substantial increase in performance when using small labeled datasets. With our method, a model achieves an AUROC of 0.822 with only 2% labeled data and 0.865 with 5% labeled data, while a fully supervised method achieves an AUROC of 0.807 with 5% labeled data and only 0.845 with 10%.
