missing labels
Recently Published Documents


TOTAL DOCUMENTS: 59 (FIVE YEARS 33)

H-INDEX: 13 (FIVE YEARS 4)

2022, Vol 1
Author(s): Mickael Tardy, Diana Mateus

In breast cancer screening, binary classification of mammograms is a common task that aims to determine whether a case is malignant or benign. A Computer-Aided Diagnosis (CADx) system based on a trainable classifier requires clean data and labels coming from a confirmed diagnosis. Unfortunately, such labels are not easy to obtain in clinical practice: the histopathological biopsy reports may not be available alongside the mammograms, and normal cases may lack an explicit follow-up confirmation. Such ambiguities either reduce the number of samples eligible for training or introduce label uncertainty that may degrade performance. In this work, we maximize the number of training samples by relying on multi-task learning. We design a deep-neural-network-based classifier that yields multiple outputs in one forward pass. The predicted classes include binary malignancy, cancer probability estimation, breast density, and image laterality. Since few samples have all classes available and confirmed, we propose to introduce the uncertainty related to the classes as a per-sample weight during training. Such weighting prevents updating the network's parameters when training on uncertain or missing labels. We evaluate our approach on the public INBreast and private datasets, showing statistically significant improvements compared to baseline and independent state-of-the-art approaches. Moreover, we use mammograms from the Susan G. Komen Tissue Bank for fine-tuning, further demonstrating that performance can be improved in our multi-task learning setup from raw clinical data. We achieved a binary classification performance of AUC = 80.46 on our private dataset and AUC = 85.23 on the INBreast dataset.
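
The per-sample weighting described in this abstract can be illustrated with a small sketch. This is not the authors' implementation: the function name `masked_multitask_bce`, the restriction to binary tasks, and the plain sum over tasks are all assumptions made for illustration. The point it shows is that a weight of zero removes a task's term from the loss, so a missing label cannot drive a parameter update.

```python
import math

def masked_multitask_bce(probs, labels, weights):
    """Weighted binary cross-entropy over several tasks for one sample.

    probs   -- predicted probabilities, one per task
    labels  -- targets in {0, 1}; may be None where the label is missing
    weights -- per-task confidence in [0, 1]; 0 marks a missing label
    (hypothetical interface, assumed for this sketch)
    """
    eps = 1e-12
    total = 0.0
    for p, y, w in zip(probs, labels, weights):
        if w == 0.0:
            continue  # missing label: no loss term, hence no gradient
        p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
        total += -w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total

# Task 1 has a confirmed label, task 2 has none.
loss = masked_multitask_bce([0.9, 0.2], [1, None], [1.0, 0.0])
```

An uncertain (rather than missing) label would receive a weight strictly between 0 and 1, which scales its contribution down instead of removing it.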


2021, Vol 2090 (1), pp. 012170
Author(s): Jr Cristovão Iglesias, Varun Mehta, Alina Venereo-Sanchez, Xingge Xu, Julien Robitaille, ...

Training Deep Learning (DL) models with missing labels is a challenge in diverse engineering applications. Missing value imputation methods have been proposed to address this problem, but their performance degrades under a Massive Proportion of Missing Labels (MPML). This paper presents an approach for handling MPML in Multivariate Long-Term Time Series Forecasting. It is a two-step process in which interpolation (using Gaussian Processes Regression (GPR) and domain knowledge from experts) and the prediction model are separated to enable the integration of prior domain knowledge. First, a set of samples of the possible interpolations of the missing outputs is generated by the GPR based on the domain knowledge. Second, the observed input sensor data and the interpolated labels from GPR are used to train the prediction model. We evaluated our approach by developing a soft sensor on one real dataset to forecast the biomass during recombinant adeno-associated virus (rAAV) production in bioreactors. Our experimental results demonstrate the potential of the approach through quantitative evaluation of the generated forecasts in a case where training a DL model would otherwise be extremely difficult due to MPML.
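
The two-step process in this abstract can be sketched roughly as follows. Several simplifications are assumed here and are not taken from the paper: the GP uses a fixed RBF kernel, only the posterior mean is computed (the paper draws a set of samples), domain knowledge is reduced to a hand-picked `length_scale`, and the "prediction model" is a least-squares line rather than a DL model. All function names and data values are hypothetical.

```python
import math

def rbf(a, b, length_scale):
    """Squared-exponential kernel between two time points."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def gp_interpolate(t_obs, y_obs, t_all, length_scale=1.0, noise=1e-6):
    """Step 1: GP posterior mean of the label at every time step."""
    mean_y = sum(y_obs) / len(y_obs)
    K = [[rbf(a, b, length_scale) + (noise if i == j else 0.0)
          for j, b in enumerate(t_obs)] for i, a in enumerate(t_obs)]
    alpha = solve(K, [y - mean_y for y in y_obs])
    return [mean_y + sum(rbf(t, a, length_scale) * w
                         for a, w in zip(t_obs, alpha)) for t in t_all]

def fit_linear(x, y):
    """Step 2 (stand-in model): least-squares line y ~ a * x + b."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx
    return a, my - a * mx

# Hypothetical data: 10 time steps, labels measured at only 3 of them.
t_all = list(range(10))
sensor = [0.5 * t for t in t_all]            # fully observed input
t_obs, y_obs = [0, 4, 9], [0.0, 2.1, 4.4]    # sparse labels
labels = gp_interpolate(t_obs, y_obs, t_all, length_scale=3.0)
a, b = fit_linear(sensor, labels)
```

Keeping interpolation and prediction as separate functions mirrors the separation the abstract describes: prior knowledge enters only through the first step, and the second step sees a fully labeled series.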


2021
Author(s): Shengyuan Liu, Haobo Wang, Tianlei Hu, Ke Chen

2021
Author(s): Hyebin Song, Garvesh Raskutti, Rebecca Willett

Author(s): Mohammadreza Qaraei, Erik Schultheis, Priyanshu Gupta, Rohit Babbar

Author(s): Jun Huang, Linchuan Xu, Kun Qian, Jing Wang, Kenji Yamanishi

Multi-label learning deals with data examples that are associated with multiple class labels simultaneously. Despite the success of existing approaches to multi-label learning, one problem is still neglected by researchers: not only are some of the values of observed labels missing, but some of the labels are also completely unobserved in the training data. We refer to this problem as multi-label learning with missing and completely unobserved labels, and argue that it is necessary to discover these completely unobserved labels in order to mine useful knowledge and gain a deeper understanding of what is behind the data. In this paper, we propose a new approach named MCUL to solve multi-label learning with Missing and Completely Unobserved Labels. We try to discover the unobserved labels of a multi-label data set with a clustering-based regularization term, describe their semantic meanings based on the label-specific features learned by MCUL, and overcome the problem of missing labels by exploiting label correlations. The proposed method MCUL can predict both the observed and the newly discovered labels simultaneously for unseen data examples. Experimental results on ten benchmark datasets demonstrate that the proposed method can outperform other state-of-the-art approaches on observed labels and obtain acceptable performance on the newly discovered labels as well.
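
As a rough illustration of what "exploiting label correlations" can mean for missing label values, the sketch below fills each missing entry from the co-occurrence statistics of the labels that are observed in the same row. This is a generic heuristic, not MCUL: it does not learn label-specific features and does not discover completely unobserved labels. The names `cooccurrence` and `fill_missing`, the threshold, and the toy matrix are assumptions.

```python
def cooccurrence(Y):
    """cond[k][j] estimates P(label j = 1 | label k = 1),
    using only rows where both labels are observed (not None)."""
    q = len(Y[0])
    cond = [[0.0] * q for _ in range(q)]
    for k in range(q):
        for j in range(q):
            rows = [r for r in Y if r[k] == 1 and r[j] is not None]
            if rows:
                cond[k][j] = sum(r[j] for r in rows) / len(rows)
    return cond

def fill_missing(Y, threshold=0.5):
    """Replace each None with 0 or 1 based on correlated observed labels."""
    cond = cooccurrence(Y)
    filled = []
    for row in Y:
        positives = [k for k, v in enumerate(row) if v == 1]
        new_row = []
        for j, v in enumerate(row):
            if v is not None:
                new_row.append(v)          # observed value is kept
            elif positives:
                score = sum(cond[k][j] for k in positives) / len(positives)
                new_row.append(1 if score >= threshold else 0)
            else:
                new_row.append(0)          # no evidence: default to negative
        filled.append(new_row)
    return filled

# Toy label matrix (rows = examples, columns = labels, None = missing).
Y = [[1, 1, 0],
     [1, 1, 0],
     [1, None, 0],
     [0, 0, 1],
     [0, None, 1]]
completed = fill_missing(Y)
```

In the toy matrix, label 2 always co-occurs with label 1 when both are observed, so the missing entry in the third row is filled with 1, while the fifth row, whose only positive label never co-occurs with label 2, is filled with 0.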

