partially labeled data
Recently Published Documents


TOTAL DOCUMENTS

67
(FIVE YEARS 18)

H-INDEX

9
(FIVE YEARS 3)

Author(s):  
Daniel Kottke ◽  
Marek Herde ◽  
Tuan Pham Minh ◽  
Alexander Benz ◽  
Pascal Mergard ◽  
...  

Machine learning applications often need large amounts of training data to perform well. While unlabeled data are easy to gather, labeling is difficult, time-consuming, or expensive in most applications. Active learning helps solve this problem by querying labels for those data points that will improve performance the most, so that the learning algorithm performs sufficiently well with fewer labels. We provide a library called scikit-activeml that covers the most relevant query strategies and implements tools for working with partially labeled data. It is written in Python and builds on top of scikit-learn.
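The core idea behind a pool-based query strategy such as uncertainty sampling can be sketched in a few lines of plain Python. This is an illustration of the concept only, not scikit-activeml's actual API; the function name and the probability values below are made up for the example:

```python
# Least-confidence uncertainty sampling: from a pool of unlabeled
# points, query the one whose predicted class is least certain.

def least_confidence_query(probas):
    """probas: one list of per-class probabilities per unlabeled point.
    Returns the index of the point the model is least confident about."""
    # Confidence of a point = probability of its most likely class.
    confidences = [max(p) for p in probas]
    # Query the point with the lowest confidence.
    return min(range(len(confidences)), key=lambda i: confidences[i])

# Hypothetical model outputs for four unlabeled points (binary task).
pool_probas = [
    [0.95, 0.05],  # very confident
    [0.60, 0.40],
    [0.51, 0.49],  # nearly undecided -> most informative
    [0.80, 0.20],
]
print(least_confidence_query(pool_probas))  # index 2 is queried for a label
```

In an active-learning loop, the queried point would be sent to an annotator, added to the labeled set, and the model retrained before the next query.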


2021 ◽  
Vol 544 ◽  
pp. 500-518 ◽  
Author(s):  
Can Gao ◽  
Jie Zhou ◽  
Duoqian Miao ◽  
Jiajun Wen ◽  
Xiaodong Yue

2020 ◽  
Vol 7 (1) ◽  
Author(s):  
Mehrdad Rostami ◽  
Kamal Berahmand ◽  
Saman Forouzandeh

Abstract In recent decades, advances in computer and database technologies have led to the rapid growth of large-scale datasets. At the same time, data mining applications on high-dimensional datasets, which demand both speed and accuracy, are becoming increasingly common. Semi-supervised learning is a class of machine learning in which labeled and unlabeled data are used simultaneously to improve feature selection. The goal of feature selection over partially labeled data (semi-supervised feature selection) is to choose a subset of available features with the lowest redundancy among themselves and the highest relevancy to the target class, the same objective as feature selection over entirely labeled data. The proposed method discretizes similarity values to reduce ambiguity in their range. First, the similarity values of each pair are collected; these values are then divided into intervals, and the average of each interval is computed. Next, the number of pairs falling in each interval is counted. Finally, using the strength and similarity matrices, a new constrained feature selection ranking is proposed. The performance of the presented method was compared to state-of-the-art and well-known semi-supervised feature selection approaches on eight datasets. The results indicate that the proposed approach improves on previous related approaches with respect to the accuracy of the constrained score. In particular, the numerical results showed that the presented approach improved classification accuracy by about 3% and reduced the number of selected features by 1%. Consequently, the proposed method reduces the computational complexity of the learning algorithm while increasing classification accuracy.
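The interval step described in the abstract (collect pairwise similarities, split them into bins, and record each bin's average and pair count) can be sketched as follows. The similarity values, the number of intervals, and the helper name are illustrative assumptions, not the paper's actual implementation:

```python
def bin_similarities(sims, n_intervals):
    """Divide similarity values in [0, 1] into equal-width intervals,
    returning (average, pair count) for each interval."""
    width = 1.0 / n_intervals
    bins = [[] for _ in range(n_intervals)]
    for s in sims:
        # Clamp the index so s == 1.0 falls into the last interval.
        idx = min(int(s / width), n_intervals - 1)
        bins[idx].append(s)
    return [
        (sum(b) / len(b) if b else 0.0, len(b))  # (interval mean, pair count)
        for b in bins
    ]

# Hypothetical pairwise similarities for a handful of sample pairs.
pairwise = [0.1, 0.15, 0.4, 0.45, 0.9, 1.0]
print(bin_similarities(pairwise, 2))  # one (mean, count) tuple per interval
```

The per-interval means and counts would then feed into the strength matrix used for the constrained ranking.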



2020 ◽  
Author(s):  
Kamal Berahmand ◽  
Mehrdad Rostami ◽  
Saman Forouzandeh

Abstract In recent years, scientific and technological progress has produced increasingly large datasets in many fields, and these datasets are often described by a large number of features. In a high-dimensional dataset, many features are generally redundant or irrelevant for a given learning task, which hurts computational cost and performance. The goal of feature selection over partially labeled data (semi-supervised feature selection) is to choose a subset of available features with the lowest redundancy among themselves and the highest relevancy to the target class, the same objective as feature selection over entirely labeled data. Appropriately reducing the dimensionality therefore saves time and improves performance. In this paper, side information in the form of pairwise constraints is used to rank features and reduce the dimensionality. The proposed method assesses the quality (strength or uncertainty) of the pairwise constraints, which is usually not taken into account in dimension reduction. In the first step, the strength matrix is created from a similarity matrix and an uncertainty region. Then, using the strength and similarity matrices, a new constrained feature selection ranking is proposed. The performance of the presented method was compared to state-of-the-art and well-known semi-supervised feature selection approaches on eight datasets. The findings indicate that the proposed approach improves on previous related approaches with respect to the accuracy of constrained clustering. In particular, the numerical results showed that the presented approach improved classification accuracy by about 3% and reduced the number of selected features by 1%. Consequently, the proposed method reduces the computational complexity of the learning algorithm while increasing classification accuracy.
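A constraint-based feature score of the general kind described above can be sketched by ranking a feature on how well its per-feature distances agree with must-link and cannot-link constraints. This is an illustration only, not the authors' actual strength-matrix construction; the data, pair lists, and function name are assumptions:

```python
def constraint_score(feature_values, must_link, cannot_link):
    """Score one feature: small distances on must-link pairs and large
    distances on cannot-link pairs are rewarded (lower score is better)."""
    ml = sum(abs(feature_values[i] - feature_values[j]) for i, j in must_link)
    cl = sum(abs(feature_values[i] - feature_values[j]) for i, j in cannot_link)
    # Penalize spread within must-link pairs; reward spread across
    # cannot-link pairs.
    return ml - cl

# Toy data: one informative feature and one noisy feature, five samples.
informative = [0.0, 0.1, 0.9, 1.0, 0.05]
noisy = [0.5, 0.9, 0.4, 0.1, 0.7]
ml_pairs = [(0, 1), (2, 3)]  # pairs known to share a class
cl_pairs = [(0, 2), (1, 3)]  # pairs known to differ in class

scores = {"informative": constraint_score(informative, ml_pairs, cl_pairs),
          "noisy": constraint_score(noisy, ml_pairs, cl_pairs)}
# The informative feature receives the lower (better) score.
print(min(scores, key=scores.get))
```

Selecting the top-ranked features under such a score is what yields the smaller feature subset the abstract reports.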


2019 ◽  
Vol 28 (08) ◽  
pp. 1960009 ◽  
Author(s):  
Gabriella Casalino ◽  
Giovanna Castellano ◽  
Corrado Mencar

A data stream classification method called DISSFCM (Dynamic Incremental Semi-Supervised FCM) is presented, based on an incremental semi-supervised fuzzy clustering algorithm. The method assumes that partially labeled data belonging to different classes continuously arrive over time in the form of chunks. Each chunk is processed by semi-supervised fuzzy clustering, leading to a cluster-based classification model. DISSFCM dynamically adapts the number of clusters to the data stream by splitting low-quality clusters so as to improve classification quality. Experimental results on both synthetic and real-world data show the effectiveness of the proposed method for data stream classification.
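The chunk-processing loop can be sketched schematically on 1-D toy data. The quality measure (mean distance of a cluster's points to its centroid) and the split rule below are simplified stand-ins for DISSFCM's fuzzy-clustering details, not the published algorithm:

```python
def process_chunk(centroids, chunk, quality_threshold=0.3):
    """Assign a chunk of 1-D points to the nearest centroid; if a cluster's
    quality is too low (points far from the centroid), split it in two."""
    assignments = {i: [] for i in range(len(centroids))}
    for x in chunk:
        nearest = min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))
        assignments[nearest].append(x)

    new_centroids = []
    for i, points in assignments.items():
        if not points:
            new_centroids.append(centroids[i])  # empty cluster: keep as-is
            continue
        mean_dist = sum(abs(x - centroids[i]) for x in points) / len(points)
        if mean_dist > quality_threshold and len(points) >= 2:
            # Low-quality cluster: split into its lower and upper halves.
            points.sort()
            half = len(points) // 2
            new_centroids.append(sum(points[:half]) / half)
            new_centroids.append(sum(points[half:]) / (len(points) - half))
        else:
            new_centroids.append(sum(points) / len(points))
    return new_centroids

# One centroid, then a chunk drawn from two well-separated groups:
# the single low-quality cluster is split into two.
print(process_chunk([0.5], [0.0, 0.1, 0.9, 1.0]))
```

In the actual method each cluster also carries class information from the labeled portion of the chunk, so the resulting clusters act as a classifier for subsequent data.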

