Partial Multi-Label Learning with Noisy Label Identification

2020
Vol 34 (04)
pp. 6454-6461
Author(s):  
Ming-Kun Xie ◽  
Sheng-Jun Huang

Partial multi-label learning (PML) deals with problems where each instance is assigned a candidate label set that contains multiple relevant labels along with some noisy labels. Recent studies usually solve PML problems with the disambiguation strategy, which recovers ground-truth labels from the candidate label set by simply assuming that the noisy labels are generated randomly. In real applications, however, noisy labels are usually caused by ambiguous content of the example. Based on this observation, we propose a partial multi-label learning approach that simultaneously recovers the ground-truth information and identifies the noisy labels. The two objectives are formalized in a unified framework with trace norm and ℓ1 norm regularizers. Under the supervision of the observed noise-corrupted label matrix, the multi-label classifier and the noisy label identifier are jointly optimized by incorporating label correlation exploitation and a feature-induced noise model. Extensive experiments on synthetic as well as real-world data sets validate the effectiveness of the proposed approach.
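The abstract does not state the objective explicitly; a hedged sketch of the general form such a unified framework usually takes (the squared loss and the symbol names are assumptions, not the authors' exact formulation) is

\min_{U,V}\; \big\| X(U+V) - Y \big\|_F^2 \;+\; \lambda\,\|U\|_* \;+\; \beta\,\|V\|_1,

where Y is the observed noise-corrupted label matrix, the low-rank (trace norm regularized) component U plays the role of the multi-label classifier and captures label correlations, and the sparse (ℓ1 regularized) component V acts as the feature-induced noisy label identifier.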

Author(s):  
Nan Cao ◽  
Teng Zhang ◽  
Hai Jin

Partial multi-label learning deals with the circumstance in which the ground-truth labels are not directly available but hidden in a candidate label set. Due to the presence of other irrelevant labels, vanilla multi-label learning methods are prone to be misled and fail to generalize well on unseen data, so enabling them to get rid of the noisy labels becomes the core problem of partial multi-label learning. In this paper, we propose the Partial Multi-Label Optimal margin Distribution Machine (PML-ODM), which distinguishes the noisy labels by explicitly optimizing the distribution of the ranking margin, and exhibits better generalization performance than minimum-margin-based counterparts. In addition, we propose a novel feature prototype representation to further enhance the disambiguation ability, and non-linear kernels can also be applied to promote the generalization performance for linearly inseparable data. Extensive experiments on real-world data sets validate the superiority of our proposed method.
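For orientation, the ranking margin of instance x_i between a relevant label p and an irrelevant label q is \gamma_{i,p,q} = w_p^{\top}\phi(x_i) - w_q^{\top}\phi(x_i), and an optimal-margin-distribution objective optimizes its first two moments rather than only the worst case, roughly

\min_{\{w_k\}}\; \tfrac{1}{2}\sum_k \|w_k\|^2 \;+\; \lambda_1\,\widehat{\operatorname{Var}}(\gamma) \;-\; \lambda_2\,\bar{\gamma}.

This is only a schematic of the ODM idea; the actual PML-ODM formulation additionally has to disambiguate which candidate labels are treated as relevant and includes slack terms not shown here.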


Author(s):  
Jinpeng Chen ◽  
Yu Liu ◽  
Deyi Li

The recommender systems community is paying great attention to diversity as a key quality beyond accuracy in real recommendation scenarios. Multifarious diversity-increasing approaches have been developed in the related literature to enhance recommendation diversity while making personalized recommendations to users. In this work, we present the Gaussian Cloud Recommendation Algorithm (GCRA), a novel method designed to balance accuracy and diversity in personalized top-N recommendation lists in order to capture the user's complete spectrum of tastes. Our proposed algorithm does not require semantic information. Meanwhile, we propose a unified framework that extends traditional CF algorithms with GCRA to improve recommendation performance. Our work builds upon prior research on recommender systems. We show that, though detrimental to average accuracy, our method can capture the user's complete spectrum of interests. Systematic experiments on three real-world data sets have demonstrated the effectiveness of our proposed approach in achieving both accuracy and diversity.
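The abstract does not describe GCRA's internals; its name points to the Gaussian (normal) cloud model, whose standard forward generator is sketched below purely as background. The function name and parameter values are illustrative, not taken from the paper.

```python
import numpy as np

def forward_normal_cloud(Ex, En, He, n_drops=1000, seed=0):
    """Standard forward normal cloud generator: Ex is the expectation,
    En the entropy (spread), He the hyper-entropy (spread of the spread).
    Returns cloud drops and their certainty degrees."""
    rng = np.random.default_rng(seed)
    En_i = rng.normal(En, He, n_drops)                          # per-drop entropy
    drops = rng.normal(Ex, np.abs(En_i))                        # cloud drops
    mu = np.exp(-(drops - Ex) ** 2 / (2 * En_i ** 2 + 1e-12))   # certainty degrees
    return drops, mu

drops, certainty = forward_normal_cloud(Ex=3.5, En=0.8, He=0.1)
```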


Author(s):  
Xuan Wu ◽  
Qing-Guo Chen ◽  
Yao Hu ◽  
Dengbao Wang ◽  
Xiaodong Chang ◽  
...  

Multi-view multi-label learning serves as an important framework for learning from objects with diverse representations and rich semantics. Existing multi-view multi-label learning techniques focus on exploiting a shared subspace for fusing multi-view representations, while helpful view-specific information for discriminative modeling is usually ignored. In this paper, a novel multi-view multi-label learning approach named SIMM is proposed which leverages shared subspace exploitation and view-specific information extraction. For shared subspace exploitation, SIMM jointly minimizes a confusion adversarial loss and a multi-label loss to utilize shared information from all views. For view-specific information extraction, SIMM enforces an orthogonal constraint w.r.t. the shared subspace to utilize view-specific discriminative information. Extensive experiments on real-world data sets clearly show the favorable performance of SIMM against other state-of-the-art multi-view multi-label learning approaches.
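One common way to realize such an orthogonal constraint is a soft penalty on the cross-correlation between the shared and the view-specific embeddings; the sketch below shows this generic idea in PyTorch, with all names and the squared Frobenius norm being assumptions rather than SIMM's exact loss.

```python
import torch

def orthogonality_penalty(shared, specific):
    """Soft orthogonality between the shared-subspace embedding and a
    view-specific embedding (both of shape [batch, dim]): the squared
    Frobenius norm of their cross-correlation matrix."""
    return torch.norm(shared.t() @ specific, p='fro') ** 2

# would be added to the multi-label loss and the confusion adversarial loss
shared = torch.randn(32, 64, requires_grad=True)
specific = torch.randn(32, 64, requires_grad=True)
loss_orth = orthogonality_penalty(shared, specific)
```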


2019
Vol 13 (1)
pp. 120-126
Author(s):  
K. Bhavanishankar ◽  
M. V. Sudhamani

Objective: Lung cancer is proving to be one of the deadliest diseases haunting mankind in recent years. Timely detection of lung nodules would surely enhance the survival rate. This paper focuses on the classification of candidate lung nodules into nodules/non-nodules in a CT scan of the patient. A deep learning approach, the autoencoder, is used for the classification. Investigation/Methodology: Candidate lung nodule patches obtained as the result of lung segmentation serve as input to the autoencoder model. The ground truth data from the LIDC repository is prepared and submitted to the autoencoder training module. After a series of experiments, a 4-stacked autoencoder was chosen. The model is trained on over 600 LIDC cases and the trained module is tested on the remaining data sets. Results: The results of the classification are evaluated with respect to performance measures such as sensitivity, specificity, and accuracy. The results obtained are also compared with other related works, and the proposed approach was found to be better by 6.2% with respect to accuracy. Conclusion: In this paper, a deep learning approach, the autoencoder, has been used for the classification of candidate lung nodules into nodules/non-nodules. The performance of the proposed approach was evaluated with respect to sensitivity, specificity, and accuracy, and the obtained values are 82.6%, 91.3%, and 87.0%, respectively. This result is then compared with existing related works, and an improvement of 6.2% with respect to accuracy has been observed.
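A hedged sketch of what a 4-stacked autoencoder with a classification head can look like in PyTorch; the patch size, layer widths, sigmoid head, and absence of layer-wise pre-training are all assumptions, since the abstract does not give the exact architecture.

```python
import torch
import torch.nn as nn

class StackedAutoencoderClassifier(nn.Module):
    """Illustrative 4-stacked autoencoder used as a nodule/non-nodule
    classifier: four encoder stages, a mirrored decoder for
    reconstruction, and a sigmoid classification head."""
    def __init__(self, in_dim=32 * 32, hidden=(512, 256, 128, 64)):
        super().__init__()
        dims = (in_dim,) + hidden
        # four encoder stages, one per stacked autoencoder
        self.encoder = nn.Sequential(*[
            layer
            for i in range(4)
            for layer in (nn.Linear(dims[i], dims[i + 1]), nn.ReLU())
        ])
        # mirrored decoder used for unsupervised reconstruction
        self.decoder = nn.Sequential(*[
            layer
            for i in range(4, 0, -1)
            for layer in (nn.Linear(dims[i], dims[i - 1]), nn.ReLU())
        ])
        self.classifier = nn.Sequential(nn.Linear(hidden[-1], 1), nn.Sigmoid())

    def forward(self, x):
        z = self.encoder(x)
        return self.classifier(z), self.decoder(z)

model = StackedAutoencoderClassifier()
prob, recon = model(torch.rand(8, 32 * 32))  # 8 candidate nodule patches
```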


2019
Vol 38 (12)
pp. 934-942
Author(s):  
Xing Zhao ◽  
Ping Lu ◽  
Yanyan Zhang ◽  
Jianxiong Chen ◽  
Xiaoyang Li

Noise attenuation for ordinary images using machine learning technology has achieved great success in the computer vision field. However, directly applying these models to seismic data would not be effective, since the evaluation criteria of the geophysical domain require a high-quality visualized image and the ability to preserve the original seismic signals within the contaminated wavelets. This paper introduces an approach equipped with a specially designed deep learning model that can effectively attenuate swell noise with different intensities and characteristics from shot gathers, with a relatively simple workflow applicable to marine seismic data sets. The proposed deep learning model brings three significant benefits. First, our deep learning model does not require a pure swell-noise model; instead, a contaminated swell-noise model derived from field data sets (which may contain other noise or primary signals) can be used for training. Second, inspired by conventional algorithms for coherent noise attenuation, our neural network model is designed to learn and detect the swell noise rather than to infer the attenuated seismic data. Third, several comparisons (signal-to-noise ratio, mean squared error, and intensity of residual swell noise) indicate that the deep learning approach is capable of removing swell noise without harming the primary signals. The proposed deep learning-based approach can be considered an alternative that combines and takes advantage of both conventional and data-driven methods to better serve swell-noise attenuation. The comparable results also indicate that the deep learning method has strong potential to solve other coherent noise-attenuation tasks for seismic data.
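Because the network targets the noise itself, the attenuation step reduces to subtracting the predicted swell noise from the input gather. A minimal sketch follows; the stand-in network and tensor shapes are illustrative assumptions, not the paper's architecture.

```python
import torch

def attenuate_swell_noise(model, shot_gather):
    """Apply a noise-learning network: the model predicts the swell
    noise present in the gather, and the attenuated result is the
    input minus that prediction."""
    with torch.no_grad():
        predicted_noise = model(shot_gather)
    return shot_gather - predicted_noise

# stand-in image-to-image network and gather, for illustration only
model = torch.nn.Conv2d(1, 1, kernel_size=3, padding=1)
gather = torch.randn(1, 1, 512, 256)   # (batch, channel, time samples, traces)
clean = attenuate_swell_noise(model, gather)
```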


2021
Vol 22 (1)
Author(s):  
Audrey Hulot ◽  
Denis Laloë ◽  
Florence Jaffrézic

Abstract. Background: Integrating data from different sources is a recurring question in computational biology. Much effort has been devoted to the integration of data sets of the same type, typically multiple numerical data tables. However, data types are generally heterogeneous: it is commonplace to gather data in the form of trees, networks or factorial maps, as these representations all have an appealing visual interpretation that helps to study grouping patterns and interactions between entities. The question we aim to answer in this paper is that of the integration of such representations. Results: To this end, we provide a simple procedure to compare data of various types, in particular trees or networks, that relies essentially on two steps: the first step projects the representations into a common coordinate system; the second step then uses a multi-table integration approach to compare the projected data. We rely on efficient and well-known methodologies for each step: the projection step is achieved by retrieving a distance matrix for each representation form and then applying multidimensional scaling to provide a new set of coordinates from all the pairwise distances. The integration step is then achieved by applying a multiple factor analysis to the multiple tables of the new coordinates. This procedure provides tools to integrate and compare data available, for instance, as tree or network structures. Our approach is complementary to kernel methods, traditionally used to answer the same question. Conclusion: Our approach is evaluated on simulations and used to analyze two real-world data sets: first, we compare several clusterings for different cell types obtained from a transcriptomics single-cell data set in mouse embryos; second, we use our procedure to aggregate a multi-table data set from the TCGA breast cancer database, in order to compare several protein networks inferred for different breast cancer subtypes.
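A minimal sketch of this two-step procedure in Python, assuming every representation has already been converted to a pairwise-distance matrix over the same ordered set of entities; the multiple factor analysis step is approximated here by its usual recipe (per-table scaling by the first singular value, then a global PCA), and function and variable names are illustrative.

```python
import numpy as np
from sklearn.manifold import MDS
from sklearn.decomposition import PCA

def integrate_representations(distance_matrices, n_coords=2):
    """Step 1: project each representation (tree, network, ...) into a
    common coordinate system via metric MDS on its distance matrix.
    Step 2: integrate the resulting tables with an MFA-like analysis."""
    tables = []
    for D in distance_matrices:
        coords = MDS(n_components=n_coords,
                     dissimilarity="precomputed",
                     random_state=0).fit_transform(D)
        # MFA-style weighting: normalise each table by its largest singular value
        coords = coords / np.linalg.svd(coords, compute_uv=False)[0]
        tables.append(coords)
    # global analysis of the concatenated, weighted tables
    return PCA(n_components=n_coords).fit_transform(np.hstack(tables))
```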


2007
Vol 19 (12)
pp. 3369-3391
Author(s):  
Kiruthika Ramanathan ◽  
Sheng-Uei Guan

This letter proposes to use multiorder neurons for clustering irregularly shaped data arrangements. Multiorder neurons are an evolutionary extension of the use of higher-order neurons in clustering. Higher-order neurons parametrically model complex neuron shapes by replacing the classic synaptic weight with higher-order tensors. The multiorder neuron goes one step further and eliminates two problems associated with higher-order neurons. First, it uses evolutionary algorithms to select the best neuron order for a given problem. Second, it obtains more information about the underlying data distribution by identifying the correct order for a given cluster of patterns. Empirically, we observed that when the correlation of the clusters found with ground truth information is used to measure clustering accuracy, the proposed evolutionary multiorder neuron method outperforms other related clustering methods. The simulation results on the Iris, Wine, and Glass data sets show significant improvement when compared to the results obtained using self-organizing maps and higher-order neurons. The letter also proposes an intuitive model by which multiorder neurons can be grown, thereby determining the number of clusters in the data.
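As an illustration of the underlying idea (not the letter's exact formulation): a second-order neuron replaces the weight vector with a matrix, so its receptive field becomes an ellipsoid and a pattern is assigned to the neuron with the smallest quadratic-form distance; higher orders generalize the matrix to higher-order tensors.

```python
import numpy as np

def second_order_distance(x, center, M):
    """Quadratic-form distance of pattern x to a second-order neuron
    parameterized by a center and a positive semi-definite matrix M
    (an ellipsoidal receptive field). Illustrative only."""
    d = x - center
    return d @ M @ d

# assign a pattern to the closer of two hypothetical second-order neurons
x = np.array([1.0, 2.0])
neurons = [(np.zeros(2), np.eye(2)), (np.array([1.0, 1.5]), np.diag([2.0, 0.5]))]
label = int(np.argmin([second_order_distance(x, c, M) for c, M in neurons]))
```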


Author(s):  
Abramo Agosti ◽  
Enea Shaqiri ◽  
Matteo Paoletti ◽  
Francesca Solazzo ◽  
Niels Bergsland ◽  
...  

Abstract. Objective: In this study we address the automatic segmentation of selected muscles of the thigh and leg through a supervised deep learning approach. Material and methods: The application of quantitative imaging in neuromuscular diseases requires the availability of regions of interest (ROI) drawn on muscles to extract quantitative parameters. Up to now, manual drawing of ROIs has been considered the gold standard in clinical studies, with no clear and universally accepted standardized procedure for segmentation. Several automatic methods, based mainly on machine learning and deep learning algorithms, have recently been proposed to discriminate between skeletal muscle, bone, subcutaneous and intermuscular adipose tissue. We develop a supervised deep learning approach based on a unified framework for ROI segmentation. Results: The proposed network generates segmentation maps with high accuracy, with Dice scores ranging from 0.89 to 0.95 with respect to “ground truth” manually segmented labelled images, and also shows high average performance in both mild and severe cases of disease involvement (i.e. extent of fatty replacement). Discussion: The presented results are promising and potentially translatable to different skeletal muscle groups and other MRI sequences with different contrast and resolution.
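For reference, the reported accuracy metric is the Dice similarity coefficient between the predicted mask and the manually segmented mask; a minimal implementation (array names are illustrative):

```python
import numpy as np

def dice_score(pred_mask, true_mask, eps=1e-7):
    """Dice similarity coefficient between a predicted and a manually
    segmented ("ground truth") binary muscle mask; the reported 0.89-0.95
    scores are per-muscle values of this quantity."""
    pred = pred_mask.astype(bool)
    true = true_mask.astype(bool)
    intersection = np.logical_and(pred, true).sum()
    return (2.0 * intersection + eps) / (pred.sum() + true.sum() + eps)
```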


Author(s):  
K Sobha Rani

Collaborative filtering suffers from the problems of data sparsity and cold start, which dramatically degrade recommendation performance. To help resolve these issues, we propose TrustSVD, a trust-based matrix factorization technique. By analyzing social trust data from four real-world data sets, we conclude that not only the explicit but also the implicit influence of both ratings and trust should be taken into consideration in a recommendation model. Hence, we build on top of a state-of-the-art recommendation algorithm, SVD++, which inherently involves the explicit and implicit influence of rated items, by further incorporating both the explicit and implicit influence of trusted users on the prediction of items for an active user. To our knowledge, the work reported is the first to extend SVD++ with social trust information. Experimental results on the four data sets demonstrate that our approach, TrustSVD, achieves better accuracy than ten other counterparts and can better handle the concerned issues.
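If the underlying model follows the original TrustSVD formulation (Guo et al.), the predicted rating of item j for an active user u takes roughly the form

\hat{r}_{uj} = \mu + b_u + b_j + q_j^{\top}\Big( p_u + |I_u|^{-1/2} \sum_{i \in I_u} y_i + |T_u|^{-1/2} \sum_{v \in T_u} w_v \Big),

where I_u is the set of items rated by u (implicit influence of ratings) and T_u the set of users trusted by u (implicit influence of trust); during training, the trust relations themselves are also factorized using the same user vectors w_v. This is a schematic of the published TrustSVD model rather than a statement of this particular paper's implementation.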


2021
Vol 0 (0)
Author(s):  
Nick Le Large ◽  
Frank Bieder ◽  
Martin Lauer

Abstract. For the application of an automated, driverless race car, we aim to ensure high map and localization quality for successful driving on previously unknown, narrow race tracks. To achieve this goal, it is essential to choose an algorithm that fulfills the requirements in terms of accuracy, computational resources and run time. We propose both a filter-based and a smoothing-based Simultaneous Localization and Mapping (SLAM) algorithm and evaluate them using real-world data collected by a Formula Student Driverless race car. The accuracy is measured by comparing the SLAM-generated map to a ground truth map which was acquired using high-precision Differential GPS (DGPS) measurements. The results of the evaluation show that both algorithms meet the required time constraints thanks to a parallelized architecture, with GraphSLAM draining the computational resources much faster than Extended Kalman Filter (EKF) SLAM. However, the analysis of the maps generated by the algorithms shows that GraphSLAM outperforms EKF SLAM in terms of accuracy.
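One straightforward way to turn the DGPS ground-truth comparison into a single number is nearest-neighbour landmark matching; the sketch below is an assumption about the metric, not the paper's exact evaluation procedure (which may also include a map-alignment step).

```python
import numpy as np

def map_accuracy(slam_landmarks, dgps_landmarks):
    """Match every SLAM-estimated landmark (e.g. track cone) to its
    nearest DGPS-surveyed ground-truth landmark and report the RMSE of
    those distances. Both inputs are (N, 2) positions in a common frame."""
    diffs = slam_landmarks[:, None, :] - dgps_landmarks[None, :, :]
    nearest = np.min(np.linalg.norm(diffs, axis=2), axis=1)
    return np.sqrt(np.mean(nearest ** 2))
```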

