Predictive Clustering Trees
Recently Published Documents


TOTAL DOCUMENTS: 36 (last five years: 10)
H-INDEX: 9 (last five years: 1)

Author(s): Bijit Roy, Tomaž Stepišnik, Celine Vens, Sašo Džeroski

2021, pp. 107228
Author(s): Tomaž Stepišnik, Dragi Kocev

2021, Vol 7, pp. e506
Author(s): Tomaž Stepišnik, Dragi Kocev

Semi-supervised learning combines supervised and unsupervised learning approaches to learn predictive models from both labeled and unlabeled data. It is most appropriate for problems where labeled examples are difficult to obtain but unlabeled examples are readily available (e.g., drug repurposing). Semi-supervised predictive clustering trees (SSL-PCTs) are a prominent method for semi-supervised learning that achieves good performance on various predictive modeling tasks, including structured output prediction. Their main issue, however, is that the learning time scales quadratically with the number of features. In contrast to axis-parallel trees, which use only individual features to split the data, oblique predictive clustering trees (SPYCTs) use linear combinations of features. This makes the splits more flexible and expressive and often leads to better predictive performance. With a carefully designed criterion function, we can use efficient optimization techniques to learn oblique splits. In this paper, we propose semi-supervised oblique predictive clustering trees (SSL-SPYCTs). We adjust the split learning to take unlabeled examples into account while remaining efficient. The main advantage over SSL-PCTs is that the proposed method scales linearly with the number of features. The experimental evaluation confirms the theoretical computational advantage and shows that SSL-SPYCTs often outperform SSL-PCTs and supervised PCTs in both single-tree and ensemble settings. We also show that SSL-SPYCTs are better at producing meaningful feature importance scores than supervised SPYCTs when the amount of labeled data is limited.
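The linear scaling comes from optimizing a single weight vector over all examples, rather than scoring every feature against every candidate threshold. The sketch below illustrates the general idea of a semi-supervised oblique split in Python; the soft sigmoid assignment, the alpha-weighted criterion, and the random-restart optimizer are illustrative assumptions, not the authors' exact formulation (the paper relies on a carefully designed criterion with an efficient optimizer).

```python
import numpy as np

def ssl_split_criterion(w, b, X, y, labeled, alpha=0.5):
    """Semi-supervised impurity of an oblique split sign(X @ w - b).

    Soft side-membership s in (0, 1) keeps the criterion smooth.
    alpha trades off supervised impurity (targets, labeled rows only)
    against unsupervised impurity (features, all rows). Illustrative
    choices only, not the published SSL-SPYCT criterion."""
    s = 1.0 / (1.0 + np.exp(-(X @ w - b)))        # soft side-membership

    def weighted_var(V, weights):
        wsum = weights.sum() + 1e-12
        mean = (weights[:, None] * V).sum(0) / wsum
        return (weights[:, None] * (V - mean) ** 2).sum() / wsum

    sup = weighted_var(y[labeled], s[labeled]) + weighted_var(y[labeled], 1 - s[labeled])
    uns = weighted_var(X, s) + weighted_var(X, 1 - s)
    return alpha * sup + (1 - alpha) * uns

# toy data: 200 examples, 5 features, two targets, only 20 labeled
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=(5, 2))
labeled = np.zeros(200, dtype=bool)
labeled[:20] = True

# crude stand-in for the paper's optimizer: random restarts, just to
# show the criterion being minimized over oblique split parameters
best = None
for _ in range(500):
    w, b = rng.normal(size=5), rng.normal()
    score = ssl_split_criterion(w, b, X, y, labeled)
    if best is None or score < best[0]:
        best = (score, w, b)
print("best criterion value:", best[0])
```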


2020, Vol 109 (11), pp. 2121-2139
Author(s): Aljaž Osojnik, Panče Panov, Sašo Džeroski

Abstract In many application settings, labeling data examples is a costly endeavor, while unlabeled examples are abundant and cheap to produce. Labeling examples can be particularly problematic in an online setting, where arbitrarily many examples can arrive at high frequencies. It is also problematic when we need to predict complex values (e.g., multiple real values), a task that has started to receive considerable attention, though mostly in the batch setting. In this paper, we propose a method for online semi-supervised multi-target regression. It is based on incremental trees for multi-target regression and the predictive clustering framework, and it utilizes unlabeled examples to improve predictive performance compared to using labeled examples alone. We compare the proposed iSOUP-PCT method with supervised tree methods, which do not use unlabeled examples, and with an oracle method, which uses unlabeled examples as though they were labeled. Additionally, we compare the proposed method to the available state-of-the-art methods. The method achieves good predictive performance at the cost of increased computational resources compared to its supervised variant. It also outperforms the state of the art when very few labeled examples are available, while achieving comparable performance when labeled examples are more common.
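As a rough illustration of the ingredients such an online method needs, the sketch below maintains incremental per-leaf statistics for a multi-target regression tree. The class name and the handling of unlabeled examples (updating feature statistics only) are hypothetical simplifications for exposition, not the iSOUP-PCT implementation.

```python
import numpy as np

class LeafStats:
    """Incremental statistics for one leaf of an online multi-target
    regression tree (hypothetical sketch, not the iSOUP-PCT code).

    Labeled examples update the target means used for prediction;
    unlabeled examples still update feature means, which an online
    predictive clustering tree can use when scoring candidate splits."""

    def __init__(self, n_features, n_targets):
        self.n = 0                       # all examples seen
        self.n_labeled = 0               # labeled examples seen
        self.fx = np.zeros(n_features)   # running feature means
        self.ty = np.zeros(n_targets)    # running target means

    def update(self, x, y=None):
        self.n += 1
        self.fx += (x - self.fx) / self.n        # incremental mean update
        if y is not None:                        # only labeled examples
            self.n_labeled += 1
            self.ty += (y - self.ty) / self.n_labeled

    def predict(self):
        return self.ty                   # per-leaf multi-target prediction

leaf = LeafStats(n_features=3, n_targets=2)
leaf.update(np.array([1.0, 0.5, -0.2]), np.array([3.0, 1.0]))   # labeled
leaf.update(np.array([0.8, 0.4, -0.1]))                         # unlabeled
print(leaf.predict())   # -> [3. 1.]
```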


2020, Vol 109 (11), pp. 2213-2241
Author(s): Dragi Kocev, Michelangelo Ceci, Tomaž Stepišnik

2020, Vol 17 (10), pp. 109-128
Author(s): Tomaž Stepišnik, Dragi Kocev, Sašo Džeroski

2020, Vol 17 (2), pp. 459-486
Author(s): Tomaž Stepišnik, Aljaž Osojnik, Sašo Džeroski, Dragi Kocev

Decision trees are one of the most widely used predictive modelling methods, primarily because they are readily interpretable and fast to learn. These desirable properties come at the price of predictive performance. Moreover, standard decision tree induction suffers from myopia: a single split, selected greedily, is chosen at each internal node, so the resulting tree may be sub-optimal. To address these issues, option trees have been proposed; they can include several alternative splits in a new type of internal node called an option node. An option tree can thus also be regarded as a condensed representation of an ensemble. In this work, we propose to learn option trees for multi-target regression (MTR), the task of learning predictive models for multiple numeric target variables, based on the predictive clustering framework. The resulting models are called option predictive clustering trees (OPCTs). We evaluate the proposed OPCTs on 11 benchmark MTR data sets. The results reveal that OPCTs achieve statistically significantly better predictive performance than a single predictive clustering tree (PCT) and are competitive with bagging and random forests of PCTs. By limiting the number of option nodes, we can achieve a good trade-off between predictive power and efficiency (model size and learning time). We also perform a parameter sensitivity analysis and a bias-variance decomposition of the mean squared error. Our analysis shows that OPCTs can reduce the variance of PCTs nearly as much as ensemble methods do; in terms of bias, OPCTs occasionally outperform other methods. Finally, we demonstrate the potential of OPCTs for multifaceted interpretability and illustrate the potential for including domain knowledge in the tree learning process.
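To make the "condensed ensemble" view concrete, here is a minimal sketch of prediction through an option node: each option is an alternative subtree, and their multi-target predictions are averaged. The class names and the averaging rule are illustrative assumptions, not the OPCT implementation.

```python
import numpy as np

class Leaf:
    def __init__(self, prediction):
        self.prediction = np.asarray(prediction)   # multi-target prototype
    def predict(self, x):
        return self.prediction

class SplitNode:
    def __init__(self, feature, threshold, left, right):
        self.feature, self.threshold = feature, threshold
        self.left, self.right = left, right
    def predict(self, x):
        child = self.left if x[self.feature] <= self.threshold else self.right
        return child.predict(x)

class OptionNode:
    """Holds several alternative subtrees; their predictions are
    averaged, which is why one option tree acts like a small ensemble."""
    def __init__(self, options):
        self.options = options
    def predict(self, x):
        return np.mean([o.predict(x) for o in self.options], axis=0)

# tiny tree: an option node with two alternative splits on a 2-target task
tree = OptionNode([
    SplitNode(0, 0.5, Leaf([1.0, 2.0]), Leaf([3.0, 4.0])),
    SplitNode(1, 0.0, Leaf([1.5, 2.5]), Leaf([2.5, 3.5])),
])
print(tree.predict(np.array([0.2, 0.7])))   # average of [1, 2] and [2.5, 3.5]
```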


2019, Vol 8 (4), pp. 4039-4042

Learning from unbalanced data has recently emerged as a predominant problem in several applications, and since multi-label classification is an evolving data mining task, learning from unbalanced multi-label data is now being examined. However, the available SMOTE-based algorithms use the same sampling rate for every instance of the minority class, which leads to sub-optimal performance. To deal with this problem, a new Particle Swarm Optimization based SMOTE (PSOSMOTE) algorithm is proposed. The PSOSMOTE algorithm employs diverse sampling rates for the minority class instances and searches for the optimal combination of sampling rates to handle the classification of unbalanced datasets. In addition, a Bayesian technique combined with random forests for multi-label classification (BARF-MLC) is proposed to address the inherent label dependencies among samples; it is evaluated against classifiers such as ML-FOREST, Predictive Clustering Trees (PCT), and the Hierarchy of Multi-Label Classifiers (HOMER) using metrics including precision, recall, F-measure, accuracy, and error rate.
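The core departure from standard SMOTE here is the per-instance sampling rate. The sketch below performs SMOTE-style interpolation with a separate rate per minority instance; the PSO search that PSOSMOTE uses to choose those rates is omitted (rates are passed in), and the function name and parameters are hypothetical.

```python
import numpy as np

def smote_per_instance(X_min, rates, k=5, rng=None):
    """SMOTE with a separate sampling rate per minority instance
    (sketch; the PSO search that chooses `rates` is omitted).

    rates[i] = number of synthetic samples to create from X_min[i]."""
    rng = rng or np.random.default_rng()
    synthetic = []
    for i, r in enumerate(rates):
        # k nearest minority neighbours of instance i (excluding itself)
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbours = np.argsort(d)[1:k + 1]
        for _ in range(int(r)):
            j = rng.choice(neighbours)
            gap = rng.random()                    # interpolation factor
            synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synthetic)

rng = np.random.default_rng(1)
X_min = rng.normal(size=(10, 4))                  # 10 minority instances
rates = rng.integers(0, 4, size=10)               # per-instance rates (PSO's job)
print(smote_per_instance(X_min, rates, k=3, rng=rng).shape)
```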

