An Optimized Semi-Supervised Learning Approach for High Dimensional Datasets

Semi-supervised learning is one of the most active research areas in machine learning beyond purely supervised learning from data. Medical diagnosis mostly works in supervised mode, but in practice we face a large number of unlabeled samples and a small set of labeled examples, each characterized by thousands of features. This problem is known as “the curse of dimensionality”. In this study, we propose as a solution a new semi-supervised learning approach that we call Optim Co-forest. The Optim Co-forest algorithm combines data resampling (bagging; Breiman, 1996) with two selection strategies. The first selects a random subset of parameters to construct the ensemble of classifiers, following the principle of Co-forest (Li & Zhou, 2007). The second is an extension of the importance measure of Random Forest (RF; Breiman, 2001). Experiments on high-dimensional datasets confirm the contribution of the adopted selection strategies to the scalability of our method.
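
The two generic ingredients named in this abstract, bootstrap resampling and a random feature subset per ensemble member, can be sketched as follows. This is a minimal illustration and not the Optim Co-forest algorithm itself: the function names, the seed, and the agreement-based `vote_confidence` helper are assumptions introduced here, and the importance-measure extension is not modeled.

```python
import random
from collections import Counter


def bootstrap_indices(n_samples, rng):
    """Draw n_samples row indices with replacement (bagging)."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


def random_feature_subset(n_features, k, rng):
    """Pick k distinct feature indices for one ensemble member."""
    return sorted(rng.sample(range(n_features), k))


def plan_ensemble(n_samples, n_features, n_members, k, seed=0):
    """Return one (row indices, feature indices) pair per ensemble member.

    Each member would be trained on its own bootstrap sample of the
    labeled rows, restricted to its own random feature subset.
    """
    rng = random.Random(seed)
    return [
        (bootstrap_indices(n_samples, rng),
         random_feature_subset(n_features, k, rng))
        for _ in range(n_members)
    ]


def vote_confidence(votes):
    """Return the majority label and the fraction of members that agree.

    Hypothetical helper: an ensemble could use this agreement score to
    decide whether an unlabeled sample is confident enough to pseudo-label.
    """
    label, count = Counter(votes).most_common(1)[0]
    return label, count / len(votes)
```

With thousands of features and only a handful of labeled rows, a call such as `plan_ensemble(10, 1000, 5, 31)` gives each of the five members a different view of the data, which is what keeps the ensemble members diverse.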

A Semi-supervised Learning Approach for High Dimensional Android Malware Classification

Cyberspace Safety and Security - Lecture Notes in Computer Science ◽ 10.1007/978-3-030-73671-2_3 ◽ 2021 ◽ pp. 20-31

Author(s): Qiao Shang ◽ Ni Li ◽ Qi Qi ◽ Xiao-Wei Lin

Keyword(s): Supervised Learning ◽ High Dimensional ◽ Learning Approach ◽ Android Malware ◽ Malware Classification

Investigating the role of Simpson’s paradox in the analysis of top-ranked features in high-dimensional bioinformatics datasets

Briefings in Bioinformatics ◽ 10.1093/bib/bby126 ◽ 2019 ◽ Vol 21 (2) ◽ pp. 421-428 ◽ Cited By ~ 1

Author(s): Alex A Freitas

Keyword(s): Machine Learning ◽ High Dimensional ◽ Feature Ranking ◽ Ranking Methods ◽ Simpson’s Paradox ◽ Small Set ◽ High Dimensional Datasets ◽ Class Variable

An important problem in bioinformatics consists of identifying the most important features (or predictors) among a large number of features in a given classification dataset. This problem is often addressed by using a machine learning–based feature ranking method to identify a small set of top-ranked predictors (i.e. the most relevant features for classification). The large number of studies in this area has, however, an important limitation: they ignore the possibility that the top-ranked predictors occur in an instance of Simpson’s paradox, where the positive or negative association between a predictor and a class variable reverses sign upon conditioning on each of the values of a third (confounder) variable. In this work, we review and investigate the role of Simpson’s paradox in the analysis of top-ranked predictors in high-dimensional bioinformatics datasets, in order to avoid the potential danger of misinterpreting an association between a predictor and the class variable. We perform computational experiments using four well-known feature ranking methods from the machine learning field and five high-dimensional datasets of ageing-related genes, where the predictors are Gene Ontology terms. The results show that occurrences of Simpson’s paradox involving top-ranked predictors are much more common for one of the feature ranking methods.
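
The sign reversal described in this abstract can be checked mechanically for a binary predictor, a binary class, and a discrete confounder. The sketch below is only an illustration of the definition, not the paper's experimental procedure: the function names and the data layout (counts of positives and totals per predictor value, grouped by confounder value) are assumptions made here, and the association is measured as a simple difference in class rates.

```python
def rate_difference(cells):
    """cells maps predictor value (1 or 0) to (positives, total).

    Returns P(class=1 | predictor=1) - P(class=1 | predictor=0).
    """
    pos1, n1 = cells[1]
    pos0, n0 = cells[0]
    return pos1 / n1 - pos0 / n0


def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def is_simpson_reversal(strata):
    """strata maps each confounder value to its own `cells` table.

    True when the pooled association has one sign and every stratum
    shows the opposite sign.
    """
    pooled = {
        v: (sum(s[v][0] for s in strata.values()),
            sum(s[v][1] for s in strata.values()))
        for v in (0, 1)
    }
    overall = sign(rate_difference(pooled))
    within = {sign(rate_difference(s)) for s in strata.values()}
    return overall != 0 and within == {-overall}


# Illustrative counts: the predictor looks harmful when pooled,
# yet beneficial inside each confounder stratum.
example = {
    "stratum_a": {1: (81, 87), 0: (234, 270)},
    "stratum_b": {1: (192, 263), 0: (55, 80)},
}
```

Here `is_simpson_reversal(example)` is true because the pooled rates are 273/350 versus 289/350, while within each stratum the predictor-positive group has the higher rate.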

Maps Ensemble for Semi-Supervised Learning of Large High Dimensional Datasets

Lecture Notes in Computer Science - Foundations of Intelligent Systems ◽ 10.1007/978-3-540-68123-6_11 ◽ 2008 ◽ pp. 100-110 ◽ Cited By ~ 2

Author(s): Elie Prudhomme ◽ Stéphane Lallich

Keyword(s): Supervised Learning ◽ High Dimensional ◽ High Dimensional Datasets

A Supervised Learning Approach Involving Active Subspaces for an Efficient Genetic Algorithm in High-Dimensional Optimization Problems

SIAM Journal on Scientific Computing ◽ 10.1137/20m1345219 ◽ 2021 ◽ Vol 43 (3) ◽ pp. B831-B853

Author(s): Nicola Demo ◽ Marco Tezzele ◽ Gianluigi Rozza

Keyword(s): Genetic Algorithm ◽ Supervised Learning ◽ Optimization Problems ◽ High Dimensional ◽ Learning Approach ◽ Active Subspaces ◽ Dimensional Optimization

A Supervised Learning Approach for Dynamic Sampling (SLADS) in Raman Hyperspectral Imaging

Electronic Imaging ◽ 10.2352/issn.2470-1173.2018.15.coimg-132 ◽ 2018 ◽ Vol 2018 (15) ◽ pp. 132-1-1323

Author(s): Shijie Zhang ◽ Zhengtian Song ◽ G. M. Dilshan P. Godaliyadda ◽ Dong Hye Ye ◽ Atanu Sengupta ◽ ...

Keyword(s): Supervised Learning ◽ Hyperspectral Imaging ◽ Learning Approach ◽ Dynamic Sampling

Stochastic Parameter Estimation Neural Nets Supervised Learning Approach

10.21528/cbrn1994-010 ◽ 2016

Author(s): Atair Rios Neto

Keyword(s): Parameter Estimation ◽ Supervised Learning ◽ Neural Nets ◽ Learning Approach ◽ Stochastic Parameter

Different ways of managing risk as reported in 10‐Ks: A supervised learning approach

Financial Review ◽ 10.1111/fire.12268 ◽ 2021

Author(s): Richard Friberg ◽ Thomas Seiler

Keyword(s): Supervised Learning ◽ Learning Approach ◽ Managing Risk

Tropical Balls and Its Applications to K Nearest Neighbor over the Space of Phylogenetic Trees

Mathematics ◽ 10.3390/math9070779 ◽ 2021 ◽ Vol 9 (7) ◽ pp. 779

Author(s): Ruriko Yoshida

Keyword(s): Supervised Learning ◽ Phylogenetic Trees ◽ Nearest Neighbor ◽ Nearest Neighbors ◽ High Dimensional ◽ Learning Method ◽ Dimensional Vector ◽ K Nearest Neighbor ◽ K Nearest Neighbors

A tropical ball is a ball defined by the tropical metric over the tropical projective torus. In this paper we show several properties of tropical balls over the tropical projective torus and also over the space of phylogenetic trees with a given set of leaf labels. Then we discuss their application to the K nearest neighbors (KNN) algorithm, a supervised learning method used to classify a high-dimensional vector into given categories by looking at a ball centered at the vector, which contains K vectors in the space.
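
As a rough illustration of the idea in this abstract, the sketch below uses the standard tropical metric on the tropical projective torus, d(u, v) = max_i(u_i - v_i) - min_i(u_i - v_i), inside a plain majority-vote KNN classifier. The function names and the toy vectors are assumptions made for this example; the paper's own construction on the space of phylogenetic trees is not reproduced here.

```python
from collections import Counter


def tropical_distance(u, v):
    """Tropical metric: spread of the coordinate-wise differences.

    Adding the same constant to every coordinate of u or v leaves the
    distance unchanged, as expected on the tropical projective torus.
    """
    diffs = [a - b for a, b in zip(u, v)]
    return max(diffs) - min(diffs)


def knn_predict(train, query, k):
    """Classify `query` by majority vote among its k tropical-nearest
    training vectors. `train` is a list of (vector, label) pairs."""
    nearest = sorted(
        train, key=lambda item: tropical_distance(item[0], query)
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Toy training set with two labels (hypothetical data).
train = [
    ((0, 0, 0), "a"),
    ((1, 1, 1.5), "a"),
    ((0, 5, 9), "b"),
    ((0, 6, 8), "b"),
]
```

Choosing the K nearest vectors is the same as growing a tropical ball around the query until it contains K training vectors, which is the picture the abstract describes.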

A new method for mining disjunctive emerging patterns in high-dimensional datasets using hypergraphs

Information Systems ◽ 10.1016/j.is.2013.09.001 ◽ 2014 ◽ Vol 40 ◽ pp. 1-10 ◽ Cited By ~ 8

Author(s): Renato Vimieiro ◽ Pablo Moscato

Keyword(s): New Method ◽ High Dimensional ◽ Emerging Patterns ◽ High Dimensional Datasets

Fingerprinting Mobile Devices Using Personalized Configurations

Proceedings on Privacy Enhancing Technologies ◽ 10.1515/popets-2015-0027 ◽ 2016 ◽ Vol 2016 (1) ◽ pp. 4-19 ◽ Cited By ~ 32

Author(s): Andreas Kurtz ◽ Hugo Gascon ◽ Tobias Becker ◽ Konrad Rieck ◽ Felix Freiling

Keyword(s): Supervised Learning ◽ Mobile Devices ◽ Real World ◽ Third Party ◽ Learning Approach ◽ Total Accuracy ◽ Over Time

Recently, Apple removed access to various device hardware identifiers that were frequently misused by iOS third-party apps to track users. We are, therefore, now studying the extent to which users of smartphones can still be uniquely identified simply through their personalized device configurations. Using Apple’s iOS as an example, we show how a device fingerprint can be computed using 29 different configuration features. These features can be queried from arbitrary third-party apps via the official SDK. Experimental evaluations based on almost 13,000 fingerprints from approximately 8,000 different real-world devices show that (1) all fingerprints are unique and distinguishable; and (2) utilizing a supervised learning approach allows returning users or their devices to be recognized with a total accuracy of 97% over time.
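
One simple way to picture a configuration-based fingerprint and the uniqueness claim is to hash a canonical encoding of the feature values and check for collisions. This is a hedged sketch only: the abstract does not say how the fingerprint is encoded, so the SHA-256 hashing, the function names, and the feature keys in the example are all assumptions, and the supervised step that recognizes returning devices is not modeled.

```python
import hashlib
import json


def fingerprint(config):
    """Hash a configuration dictionary into a stable hex digest.

    Keys are sorted so that the same settings always give the same
    digest, regardless of the order in which they were collected.
    """
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def all_unique(configs):
    """True when no two configurations share a fingerprint."""
    prints = [fingerprint(c) for c in configs]
    return len(set(prints)) == len(prints)


# Hypothetical feature names, for illustration only.
device_a = {"keyboard_languages": ["de", "en"], "device_name": "phone-1"}
device_b = {"keyboard_languages": ["en"], "device_name": "phone-2"}
```

An exact hash only matches while a configuration stays identical, so recognizing a device whose settings drift over time needs a more tolerant comparison, which is where the supervised learning approach mentioned in the abstract would come in.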
