Reliable Multilabel Classification: Prediction with Partial Abstention

In contrast to conventional (single-label) classification, the setting of multilabel classification (MLC) allows an instance to belong to several classes simultaneously. Thus, instead of selecting a single class label, predictions take the form of a subset of all labels. In this paper, we study an extension of the setting of MLC, in which the learner is allowed to partially abstain from a prediction, that is, to deliver predictions on some but not necessarily all class labels. This option is useful in cases of uncertainty, where the learner is not confident enough about the entire label set. Adopting a decision-theoretic perspective, we propose a formal framework of MLC with partial abstention, which rests on two main building blocks: first, the extension of underlying MLC loss functions so as to accommodate abstention in a proper way, and second, the problem of optimal prediction, that is, finding the Bayes-optimal prediction minimizing this generalized loss in expectation. It is well known that different (generalized) loss functions may have different risk-minimizing predictions, and finding the Bayes predictor typically comes down to solving a computationally complex optimization problem. In the most general case, given a prediction of the (conditional) joint distribution of possible labelings, the minimizer of the expected loss needs to be found over a number of candidates which is exponential in the number of class labels. We elaborate on properties of risk minimizers for several commonly used (generalized) MLC loss functions, show them to have a specific structure, and leverage this structure to devise efficient methods for computing Bayes predictors. Experimentally, we show MLC with partial abstention to be effective in the sense of reducing loss when the learner is allowed to abstain.
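
To make the idea of a structured risk minimizer concrete, here is a minimal sketch for one simple case. It assumes a Hamming-style loss in which a wrong label costs 1 and each abstained label costs a fixed amount `abstain_cost`; this cost model, the use of marginal probabilities as input, and all function names are illustrative assumptions, not taken from the abstract. Under these assumptions the expected loss decomposes over labels, so the minimizer can be found label by label instead of searching exponentially many candidates.

```python
from typing import List, Optional


def predict_with_abstention(marginals: List[float],
                            abstain_cost: float) -> List[Optional[int]]:
    """Label-wise risk minimizer for a Hamming-style loss with abstention.

    Assumption (hypothetical cost model): a wrong label costs 1, a correct
    label costs 0, and abstaining on a label costs `abstain_cost`.
    Returns 1 or 0 for committed labels and None where the learner abstains.
    """
    prediction: List[Optional[int]] = []
    for p in marginals:
        # Expected cost of committing to the more probable value of the label.
        expected_error = min(p, 1.0 - p)
        if expected_error > abstain_cost:
            prediction.append(None)  # abstaining is cheaper in expectation
        else:
            prediction.append(1 if p >= 0.5 else 0)
    return prediction


def expected_loss(marginals: List[float],
                  prediction: List[Optional[int]],
                  abstain_cost: float) -> float:
    """Expected loss of a (partial) prediction under the same cost model."""
    total = 0.0
    for p, y_hat in zip(marginals, prediction):
        if y_hat is None:
            total += abstain_cost
        elif y_hat == 1:
            total += 1.0 - p  # probability that the true label is 0
        else:
            total += p        # probability that the true label is 1
    return total


if __name__ == "__main__":
    probs = [0.9, 0.55, 0.1, 0.4]
    partial = predict_with_abstention(probs, abstain_cost=0.2)
    print(partial, expected_loss(probs, partial, 0.2))
```

With these example probabilities the learner commits only on the two confident labels and abstains on the others, which gives a lower expected loss than committing on all four labels. Other loss functions generally do not decompose this way and need their own analysis.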

BiLabel-Specific Features for Multi-Label Classification

ACM Transactions on Knowledge Discovery from Data ◽ 10.1145/3458283 ◽ 2021 ◽ Vol 16 (1) ◽ pp. 1-23

Author(s): Min-Ling Zhang ◽ Jun-Peng Fang ◽ Yi-Bo Wang

Keyword(s): Predictive Models ◽ Comparative Studies ◽ State Of The Art ◽ Classification Model ◽ Generation Process ◽ Prototype Selection ◽ Class Label ◽ Benchmark Datasets ◽ Label Correlations ◽ Class Labels

In multi-label classification, the task is to induce predictive models which can assign a set of relevant labels to an unseen instance. The strategy of label-specific features has been widely employed in learning from multi-label examples, where the classification model for predicting the relevancy of each class label is induced based on its tailored features rather than the original features. Existing approaches work by generating a group of tailored features for each class label independently, so label correlations are not fully considered in the label-specific feature generation process. In this article, we extend the existing strategy by proposing a simple yet effective approach based on BiLabel-specific features. Specifically, a group of tailored features is generated for a pair of class labels with heuristic prototype selection and embedding. Thereafter, predictions of classifiers induced by BiLabel-specific features are ensembled to determine the relevancy of each class label for an unseen instance. To thoroughly evaluate the BiLabel-specific features strategy, extensive experiments are conducted over a total of 35 benchmark datasets. Comparative studies against state-of-the-art label-specific features techniques clearly validate the superiority of utilizing BiLabel-specific features to yield stronger generalization performance for multi-label classification.

Vector space model for patent documents with hierarchical class labels

Journal of Information Science ◽ 10.1177/0165551512437635 ◽ 2012 ◽ Vol 38 (3) ◽ pp. 222-233 ◽ Cited By ~ 6

Author(s): Yen-Liang Chen ◽ Yu-Ting Chiu

Keyword(s): Feature Selection ◽ Vector Space ◽ Selection Process ◽ Vector Space Model ◽ Class Label ◽ Discriminative Ability ◽ New Approach ◽ Space Model ◽ Patent Documents ◽ Class Labels

A vector space model (VSM) composed of selected important features is a common way to represent documents, including patent documents. Patent documents have some special characteristics that make it difficult to apply traditional feature selection methods directly: (a) it is difficult to find common terms for patent documents in different categories; and (b) the class label of a patent document is hierarchical rather than flat. Hence, in this article we propose a new approach that includes a hierarchical feature selection (HFS) algorithm which can be used to select more representative features with greater discriminative ability to represent a set of patent documents with hierarchical class labels. The performance of the proposed method is evaluated through application to two document sets with 2400 and 9600 patent documents, where we extract candidate terms from their titles and abstracts. The experimental results reveal that a VSM whose features are selected by a proportional selection process gives better coverage, while a VSM whose features are selected with a weighted-summed selection process gives higher accuracy.

Query-Driven Multi-Instance Learning

Proceedings of the AAAI Conference on Artificial Intelligence ◽ 10.1609/aaai.v34i04.5836 ◽ 2020 ◽ Vol 34 (04) ◽ pp. 4158-4165

Author(s): Yen-Chi Hsu ◽ Cheng-Yao Hong ◽ Ming-Sui Lee ◽ Tyng-Luh Liu

Keyword(s): Experimental Results ◽ Weighted Sum ◽ Class Label ◽ Action Classification ◽ Video Clips ◽ A New Technique ◽ Class Labels ◽ The Given ◽ Network Component ◽ Generalized Compatibility

We introduce a query-driven approach (qMIL) to multi-instance learning where the queries aim to uncover the class labels embodied in a given bag of instances. Specifically, it solves a multi-instance multi-label learning (MIML) problem with a more challenging setting than the conventional one. Each MIML bag in our formulation is annotated only with a binary label indicating whether the bag contains an instance of a certain class, and the query is specified by the word2vec of a class label/name. To learn a deep-net model for qMIL, we construct a network component that achieves a generalized compatibility measure for query-visual co-embedding and yields proper instance attentions to the given query. The bag representation is then formed as the attention-weighted sum of the instances' weights, and passed to the classification layer at the end of the network. In addition, the qMIL formulation is flexible for extending the network to classify unseen class labels, leading to a new technique to solve the zero-shot MIML task through an iterative querying process. Experimental results on action classification over video clips and three MIML datasets from MNIST, CIFAR10 and Scene are provided to demonstrate the effectiveness of our method.
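
The attention-weighted pooling step can be illustrated with a small, self-contained sketch. It is not the qMIL network: the compatibility score is assumed to be a plain dot product between the query vector and each instance vector, whereas the abstract describes a learned generalized compatibility measure, and the function names are hypothetical.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(query: List[float],
                   instances: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Pool a bag of instance vectors into one vector, guided by a query.

    Assumption: compatibility is a dot product (a stand-in for a learned
    measure). Returns the attention weights and the pooled bag vector.
    """
    # One compatibility score per instance in the bag.
    scores = [sum(q * x for q, x in zip(query, inst)) for inst in instances]
    attention = softmax(scores)
    dim = len(instances[0])
    # Bag representation: attention-weighted sum of the instance vectors.
    bag = [sum(a * inst[d] for a, inst in zip(attention, instances))
           for d in range(dim)]
    return attention, bag


if __name__ == "__main__":
    weights, bag_vector = attention_pool([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
    print(weights, bag_vector)
```

In this toy bag the instance aligned with the query receives the larger attention weight, so it dominates the pooled representation that a classification layer would then consume.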

A Survey on Imbalanced Data Handling Techniques for Classification

International Journal of Emerging Trends in Engineering Research ◽ 10.30534/ijeter/2021/089102021 ◽ 2021 ◽ Vol 9 (10) ◽ pp. 1341-1347

Keyword(s): Real World ◽ Imbalanced Data ◽ Learning Task ◽ High Accuracy ◽ Data Handling ◽ Imbalanced Dataset ◽ Minority Class ◽ Class Labels ◽ Very High ◽ F Measure

Classification is a supervised learning task that categorizes items into groups on the basis of class labels. Algorithms are trained with labeled datasets to accomplish the task of classification. In the process of classification, datasets play an important role. If, in a dataset, instances of one label/class (the majority class) far outnumber instances of another label/class (the minority class), so that it becomes hard for a classifier to learn the characteristics of the minority class, the dataset is termed an imbalanced dataset. Such datasets raise the problem of biased prediction or misclassification in the real world: models trained on them may show very high accuracy during training but, being unfamiliar with minority class instances, are unable to predict the minority class and thus perform poorly. A survey of various techniques proposed by researchers for handling imbalanced data is presented, and a comparison of the techniques based on F-measure is discussed.
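
The gap between accuracy and F-measure on imbalanced data is easy to reproduce. The sketch below uses a made-up dataset of 95 majority and 5 minority instances and a trivial classifier that always predicts the majority class; the data and the function names are illustrative assumptions, not results from the survey.

```python
from typing import List


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    """Fraction of instances whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f_measure(y_true: List[int], y_pred: List[int], positive: int = 1) -> float:
    """F1 score (harmonic mean of precision and recall) for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical imbalanced dataset: label 1 is the minority class.
    y_true = [0] * 95 + [1] * 5
    # A degenerate classifier that always predicts the majority class.
    y_pred = [0] * 100
    print("accuracy:", accuracy(y_true, y_pred))
    print("F-measure (minority):", f_measure(y_true, y_pred))
```

The always-majority classifier reaches 95% accuracy while its F-measure on the minority class is zero, which is why F-measure is the more informative basis for comparing imbalance-handling techniques.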

Impact of PDS Based kNN Classifiers on Kyoto Dataset

International Journal of Rough Sets and Data Analysis ◽ 10.4018/ijrsda.2019040105 ◽ 2019 ◽ Vol 6 (2) ◽ pp. 61-72

Author(s): Kailasam Swathi ◽ Bobba Basaveswara Rao

Keyword(s): Correlation Coefficient ◽ Intrusion Detection Systems ◽ Computational Time ◽ Network Intrusion Detection ◽ Class Label ◽ Detection Systems ◽ Network Intrusion ◽ Network Intrusion Detection Systems ◽ Significant Difference ◽ Class Labels

This article compares the performance of different Partial Distance Search-based (PDS) kNN classifiers on the benchmark Kyoto 2006+ dataset for Network Intrusion Detection Systems (NIDS). These PDS classifiers are named according to how the features are indexed: i) Simple PDS kNN (SPDS), in which the features are not indexed; ii) Variance indexing-based kNN (VIPDS), in which the features are indexed by their variance; and iii) Correlation coefficient indexing-based kNN (CIPDS), in which the features are indexed by their correlation coefficient with the class label. For the comparative study between these classifiers, computational time and accuracy are used as performance measures. The experimental study shows that CIPDS performs better in terms of computational time, whereas VIPDS shows better accuracy, although the difference from CIPDS is not significant. The study suggests adopting CIPDS when class labels are available without any ambiguity, and VIPDS otherwise.
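
As a rough illustration of partial distance search, the sketch below accumulates the squared Euclidean distance one feature at a time and abandons a training point as soon as the partial sum reaches the distance of the current k-th nearest neighbour. It is a generic PDS kNN, not the authors' implementation. The `variance_order` helper sorts features by descending variance, which is one assumed reading of "indexed by the variance"; all names are hypothetical.

```python
import heapq
from collections import Counter
from typing import List, Optional, Sequence


def variance_order(train_X: Sequence[Sequence[float]]) -> List[int]:
    """Feature indices sorted by descending variance (assumed ordering)."""
    n = len(train_X)
    variances = []
    for f in range(len(train_X[0])):
        column = [row[f] for row in train_X]
        mean = sum(column) / n
        variances.append(sum((v - mean) ** 2 for v in column) / n)
    return sorted(range(len(variances)), key=lambda f: variances[f], reverse=True)


def pds_knn_predict(train_X: Sequence[Sequence[float]],
                    train_y: Sequence[str],
                    query: Sequence[float],
                    k: int = 3,
                    feature_order: Optional[Sequence[int]] = None) -> str:
    """Classify `query` by majority vote among its k nearest neighbours."""
    order = list(feature_order) if feature_order is not None else list(range(len(query)))
    # Max-heap of the k best neighbours so far, stored as (-sq_distance, index).
    heap: list = []
    for idx, row in enumerate(train_X):
        # Pruning bound: squared distance of the current k-th nearest neighbour.
        bound = -heap[0][0] if len(heap) == k else float("inf")
        partial = 0.0
        pruned = False
        for f in order:
            diff = row[f] - query[f]
            partial += diff * diff
            if partial >= bound:
                pruned = True  # cannot beat the current k-th neighbour
                break
        if pruned:
            continue
        if len(heap) == k:
            heapq.heapreplace(heap, (-partial, idx))
        else:
            heapq.heappush(heap, (-partial, idx))
    votes = Counter(train_y[i] for _, i in heap)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
    y = ["normal"] * 3 + ["attack"] * 3
    print(pds_knn_predict(X, y, [0.2, 0.1], k=3, feature_order=variance_order(X)))
```

The feature order does not change which neighbours are found, only how early a distant point can be discarded, which is consistent with the abstract's focus on computational time as a performance measure.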

ML-EC2

International Journal of Web-Based Learning and Teaching Technologies ◽ 10.4018/ijwltt.2020040102 ◽ 2020 ◽ Vol 15 (2) ◽ pp. 19-33

Author(s): Aakanksha Sharaff ◽ Naresh Kumar Nagwani

Keyword(s): Text Classification ◽ Hybrid Algorithm ◽ Latent Dirichlet Allocation ◽ Text Clustering ◽ Mapping Technique ◽ Performance Parameters ◽ Class Label ◽ Single Class ◽ Cluster Label ◽ Email Classification

A multi-label variant of email classification named ML-EC2 (multi-label email classification using clustering) is proposed in this work. ML-EC2 is a hybrid algorithm based on text clustering, text classification, frequent-term calculation (based on latent Dirichlet allocation), and a taxonomic term-mapping technique. It is an example of classification using a text clustering technique. It studies the problem where each email cluster represents a single class label while it is associated with a set of cluster labels. It is a multi-label text-clustering-based classification algorithm in which an email cluster can be mapped to more than one email category when the cluster label matches more than one category term. The algorithm is helpful when there is only a vague idea of the topic. The performance parameters Entropy and Davies-Bouldin Index are used to evaluate the designed algorithm.

A loss minimization problem

Moscow University Computational Mathematics and Cybernetics ◽ 10.3103/s0278641909020034 ◽ 2009 ◽ Vol 33 (2) ◽ pp. 67-76

Author(s): K. K. Osipenko

Keyword(s): Minimization Problem ◽ Loss Minimization

Leveraging Unlabeled Data for Classification

Encyclopedia of Data Warehousing and Mining, Second Edition ◽ 10.4018/978-1-60566-010-3.ch181 ◽ 2011 ◽ pp. 1164-1169

Author(s): Yinghui Yang ◽ Balaji Padmanabhan

Keyword(s): Research Question ◽ Unlabeled Data ◽ Training Data ◽ Bank Loan ◽ Classification Model ◽ Classification Models ◽ Class Label ◽ Data Record ◽ Model Training ◽ Class Labels

Classification is a form of data analysis that can be used to extract models to predict categorical class labels (Han & Kamber, 2001). Data classification has proven to be very useful in a wide variety of applications. For example, a classification model can be built to categorize bank loan applications as either safe or risky. In order to build a classification model, training data containing multiple independent variables and a dependent variable (class label) is needed. If a data record has a known value for its class label, this data record is termed “labeled”. If the value for its class is unknown, it is “unlabeled”. There are situations with a large amount of unlabeled data and a small amount of labeled data. Using only labeled data to build classification models can potentially ignore useful information contained in the unlabeled data. Furthermore, unlabeled data can often be much cheaper and more plentiful than labeled data, and so if useful information can be extracted from it that reduces the need for labeled examples, this can be a significant benefit (Balcan & Blum, 2005). The default practice is to use only the labeled data to build a classification model and then assign class labels to the unlabeled data. However, when the amount of labeled data is not enough, the classification model built using only the labeled data can be biased and far from accurate. The class labels assigned to the unlabeled data can then be inaccurate. How to leverage the information contained in the unlabeled data to help improve the accuracy of the classification model is an important research question. There are two streams of research that address the challenging issue of how to appropriately use unlabeled data for building classification models. The details are discussed below.

Formulation of loss minimization problem using genetic algorithm and line-flow-based equations

2008 40th North American Power Symposium ◽ 10.1109/naps.2008.5307358 ◽ 2008 ◽ Cited By ~ 3

Author(s): Sharanya Jaganathan ◽ Arun Sekar ◽ Wenzhong Gao

Keyword(s): Genetic Algorithm ◽ Minimization Problem ◽ Loss Minimization ◽ Line Flow
