cluster label
Recently Published Documents


TOTAL DOCUMENTS

30
(FIVE YEARS 11)

H-INDEX

5
(FIVE YEARS 1)

Symmetry ◽  
2021 ◽  
Vol 13 (11) ◽  
pp. 2060
Author(s):  
Xiaofeng Zhao ◽  
Wei Zhao ◽  
Mingao Yuan

In network data mining, community detection refers to the problem of partitioning the nodes of a network into clusters (communities). This is equivalent to identifying the cluster label of each node. A label estimator is said to achieve exact recovery of the true labels (communities) if it coincides with the true labels with probability tending to one. In this work, we consider the effect of label information on the exact recovery of communities in an m-uniform Hypergraph Stochastic Block Model (HSBM). We investigate two scenarios of label information: (1) a noisy label for each node is observed independently, matching the true label with probability 1−αn; (2) the true label of each node is observed independently with probability 1−αn. We derive sharp boundaries for exact recovery under both scenarios from an information-theoretic point of view. The label information improves the sharp detection boundary if and only if αn=n−β+o(1) for a constant β>0.
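As a toy sketch of the two label-information scenarios (not code from the paper; `n` and `alpha_n` are illustrative values), the observations can be simulated as:

```python
import numpy as np

rng = np.random.default_rng(0)
n, alpha_n = 1000, 0.1                      # illustrative values only

true = rng.integers(0, 2, size=n)           # true community labels
# Scenario (1): a noisy label per node, matching the truth with prob 1 - alpha_n
flip = rng.random(n) < alpha_n
noisy = np.where(flip, 1 - true, true)
# Scenario (2): the true label is revealed independently with prob 1 - alpha_n
revealed = rng.random(n) < 1 - alpha_n
partial = np.where(revealed, true, -1)      # -1 marks an unobserved label
```

An exact-recovery estimator would have to agree with `true` on every node with probability tending to one as n grows.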


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Harsh Patel ◽  
David M. Vock ◽  
G. Elisabeta Marai ◽  
Clifton D. Fuller ◽  
Abdallah S. R. Mohamed ◽  
...  

Abstract: To improve risk prediction for oropharyngeal cancer (OPC) patients using cluster analysis on the radiomic features extracted from pre-treatment Computed Tomography (CT) scans. 553 OPC patients, randomly split into training (80%) and validation (20%) sets, were classified into 2 or 3 risk groups by applying hierarchical clustering over the co-occurrence matrix obtained from a random survival forest (RSF) trained over 301 radiomic features. The cluster label was included together with other clinical data to train an ensemble model using five predictive models (Cox, random forest, RSF, logistic regression, and logistic-elastic net). Ensemble performance was evaluated over the independent test set for both recurrence-free survival (RFS) and overall survival (OS). The Kaplan–Meier curves for OS stratified by cluster label show significant differences for both training and testing (p < 0.0001). Compared to the models trained using clinical data only, inclusion of the cluster label improves test AUC from .62 to .79 for OS and from .66 to .80 for RFS. The extraction of a single feature, namely a cluster label, to represent the high-dimensional radiomic feature space reduces the dimensionality and sparsity of the data. Moreover, inclusion of the cluster label improves model performance compared to clinical data only and offers comparable performance to the models including raw radiomic features.
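The clustering step can be sketched roughly as follows (a hedged illustration, not the authors' code; the random matrix stands in for the RSF co-occurrence matrix, and the patient count, linkage method, and cluster count are assumptions):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(1)
n = 20
# Stand-in for the RSF co-occurrence matrix: co[i, j] = fraction of trees in
# which patients i and j fall in the same terminal node
co = rng.random((n, n))
co = (co + co.T) / 2
np.fill_diagonal(co, 1.0)

dist = 1.0 - co                                  # co-occurrence -> dissimilarity
np.fill_diagonal(dist, 0.0)
Z = linkage(squareform(dist, checks=False), method="average")
labels = fcluster(Z, t=2, criterion="maxclust")  # one risk-group label per patient
```

The resulting `labels` vector is the single cluster-label feature appended to the clinical covariates for the ensemble models.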


2021 ◽  
Vol 10 (6) ◽  
pp. 386
Author(s):  
Jennie Gray ◽  
Lisa Buckner ◽  
Alexis Comber

This paper reviews geodemographic classifications and developments in contemporary classifications. It develops a critique of current approaches and identifies a number of key limitations. These include the problems associated with the geodemographic cluster label (few cluster members are typical or have the same properties as the cluster centre) and the failure of the static label to describe anything about the underlying neighbourhood processes and dynamics. To address these limitations, this paper proposes a data primitives approach. Data primitives are the fundamental dimensions or measurements that capture the processes of interest. They can be used to describe the current state of an area in a multivariate feature space, and states can be compared over multiple time periods for which data are available, through for example a change vector approach. In this way, emergent social processes, which may be too weak to result in a change in a cluster label but are nonetheless important signals, can be captured. As states are updated (for example, as new data become available), inferences about different social processes can be made, as well as classification updates if required. State changes can also be used to determine neighbourhood trajectories and to predict or infer future states. A list of data primitives is suggested from a review of the mechanisms driving a number of neighbourhood-level social processes, with the aim of improving the wider understanding of the interaction of complex neighbourhood processes and their effects. A small case study is provided to illustrate the approach. In this way, the methods outlined in this paper suggest a more nuanced approach to geodemographic research, away from a focus on classifications and static data, towards approaches that capture the social dynamics experienced by neighbourhoods.
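A minimal sketch of the change-vector idea (the primitives and their values are hypothetical, purely for illustration):

```python
import numpy as np

# Hypothetical data primitives measured for one neighbourhood at two times,
# e.g. population turnover, deprivation score, dwelling density
state_t1 = np.array([0.32, 0.11, 0.54])
state_t2 = np.array([0.35, 0.09, 0.61])

change = state_t2 - state_t1
magnitude = np.linalg.norm(change)   # strength of the emerging process
direction = change / magnitude       # which primitives are moving, and which way
```

A shift too small to flip the neighbourhood's cluster label still shows up as a non-zero `magnitude` with an interpretable `direction` in the primitive space.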


Author(s):  
Sheung Wai Chan ◽  
Yiu-Ming Cheung

Existing image retrieval methods generally require at least one complete image as a query sample. From a practical point of view, a user may not have an image sample in hand for query. Instead, partial information from multiple image samples may be available. This paper therefore attempts to deal with this problem by presenting a novel framework that allows a user to compose an image query from pieces of partial information extracted from multiple image samples via Boolean operations (i.e., AND, OR, and NOT). Based on the request from the query, a Descriptor Cluster Label Table (DCLT) is designed to efficiently compute the result of Boolean operations on partial information. Experiments on commodity query and criminal investigation show promising results for the proposed framework, and it is applicable to other scenarios as well by changing descriptors.
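The Boolean composition over a DCLT can be illustrated with a toy inverted index (all labels and image ids here are invented; the real table maps cluster labels of quantized partial-image descriptors to the images containing them):

```python
# Toy stand-in for the Descriptor Cluster Label Table: each cluster label
# maps to the set of ids of images containing a matching region
dclt = {
    "red_handbag": {1, 2, 5},
    "gold_clasp":  {2, 3, 5},
    "logo_x":      {3, 4},
}

# Query: images with a red handbag AND a gold clasp, but NOT logo_x
result = (dclt["red_handbag"] & dclt["gold_clasp"]) - dclt["logo_x"]
# -> {2, 5}
```

Because each operand is just a set of image ids, arbitrary AND/OR/NOT combinations reduce to set intersection, union, and difference.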


2021 ◽  
Author(s):  
Harsh Patel ◽  
David Vock ◽  
Elisabeta Marai ◽  
Clifton Fuller ◽  
Abdallah Mohamed ◽  
...  

Abstract OBJECTIVE: To improve risk prediction for oropharyngeal cancer (OPC) patients using cluster analysis on the radiomic features extracted from pre-treatment Computed Tomography (CT) scans. MATERIALS AND METHODS: OPC patients were classified into 2 or 3 risk groups by applying hierarchical clustering over the co-occurrence matrix obtained from a random survival forest (RSF) trained over 301 radiomic features. The cluster label was included together with other clinical data to train an ensemble model using five predictive models (Cox, random forest, RSF, logistic regression, and logistic-elastic net). Ensemble performance was evaluated over an independent test set for both recurrence-free survival (RFS) and overall survival (OS). RESULTS: The Kaplan–Meier curves for OS stratified by cluster label show significant differences for both training (p < 0.0001) and testing (p = 0.005). Inclusion of the cluster label outperforms clinical data alone, improving AUC from .60 to .76 for OS and from .63 to .75 for RFS. CONCLUSION: The extraction of a single feature, namely a cluster label, to represent the high-dimensional radiomic feature space reduces the dimensionality and sparsity of the data. Moreover, inclusion of the cluster label improves model performance compared to clinical data only and offers performance comparable to that of the raw radiomic features.


2020 ◽  
Vol 13 (3) ◽  
pp. 531-535
Author(s):  
Vijayasherly Velayutham ◽  
Srimathi Chandrasekaran

Aim: To develop a prediction model grounded on Machine Learning using a Support Vector Machine (SVM). Background: Prediction of workload in a cloud environment is one of the primary tasks in provisioning resources. Forecasting future workload requirements depends on a prediction technique that can maximize the usage of resources in a cloud computing environment. Objective: To reduce the training time of the SVM model. Methods: First, K-Means clustering is applied to the training dataset to form 'n' clusters. Then, for every tuple in a cluster, the tuple's class label is compared with the cluster's label. If the two labels are identical, the tuple is rightly classified; such a tuple would not contribute much to the SVM training process that formulates the separating hyperplane with the lowest generalization error. Otherwise, the tuple is added to the reduced training dataset. This selective addition of tuples for SVM training is carried out for all clusters. The support vectors are the few samples in the reduced training dataset that determine the optimal separating hyperplane. Results: On the Google Cluster Trace dataset, the proposed model achieved a reduction in training time and Root Mean Square Error, and a marginal increase in the R2 score, over the traditional SVM. The model has also been tested on Los Alamos National Laboratory's Mustang and Trinity cluster traces. Conclusion: CloudSim's CPU utilization (VM and Cloudlet utilization) was measured and found to increase when running the same set of tasks through the proposed model.
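A hedged sketch of the Methods step using scikit-learn on synthetic data (assigning each cluster the majority class of its members is an assumption about how the cluster's label is defined; the cluster count is illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=400, n_features=6, random_state=0)

km = KMeans(n_clusters=8, n_init=10, random_state=0).fit(X)
# Assumed cluster label: the majority class among the cluster's members
cluster_class = {c: np.bincount(y[km.labels_ == c]).argmax() for c in range(8)}

# Keep only tuples whose class disagrees with their cluster's label; these
# lie near class boundaries and are the likely support-vector candidates
mask = np.array([y[i] != cluster_class[km.labels_[i]] for i in range(len(y))])
X_red, y_red = X[mask], y[mask]
```

An SVM trained on `X_red, y_red` then sees mostly boundary-region tuples, which is where the support vectors come from, cutting training time.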


Author(s):  
Yuchen Yang ◽  
Gang Li ◽  
Huijun Qian ◽  
Kirk C Wilhelmsen ◽  
Yin Shen ◽  
...  

Abstract Batch effect correction has been recognized to be indispensable when integrating single-cell RNA sequencing (scRNA-seq) data from multiple batches. State-of-the-art methods ignore single-cell cluster label information, but such information can improve the effectiveness of batch effect correction, particularly under realistic scenarios where biological differences are not orthogonal to batch effects. To address this issue, we propose SMNN for batch effect correction of scRNA-seq data via supervised mutual nearest neighbor detection. Our extensive evaluations in simulated and real datasets show that SMNN provides improved merging within the corresponding cell types across batches, leading to reduced differentiation across batches over MNN, Seurat v3 and LIGER. Furthermore, SMNN retains more cell-type-specific features, partially manifested by differentially expressed genes identified between cell types after SMNN correction being biologically more relevant, with precision improving by up to 841.0%.
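The supervised mutual-nearest-neighbour step can be sketched as follows (a simplified brute-force illustration, not the SMNN package's implementation; the key point is that the neighbour search is restricted to cells sharing a cluster label):

```python
import numpy as np

def mutual_nn(A, B, k=3):
    """Pairs (i, j) where A[i] is among B[j]'s k nearest and vice versa."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    nn_ab = np.argsort(d, axis=1)[:, :k]      # A -> B neighbours
    nn_ba = np.argsort(d, axis=0)[:k, :].T    # B -> A neighbours
    return [(i, j) for i in range(len(A)) for j in nn_ab[i] if i in nn_ba[j]]

def smnn_pairs(A, la, B, lb, k=3):
    """Supervised step: only match cells with the same cluster label."""
    pairs = []
    for lab in set(la) & set(lb):
        ia, ib = np.where(la == lab)[0], np.where(lb == lab)[0]
        pairs += [(ia[i], ib[j]) for i, j in mutual_nn(A[ia], B[ib], k)]
    return pairs
```

Restricting the search to same-label cells is what prevents cross-cell-type matches when biological differences are not orthogonal to the batch effect.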


2020 ◽  
Vol 34 (07) ◽  
pp. 13114-13121 ◽  
Author(s):  
Zhihui Zhu ◽  
Xinyang Jiang ◽  
Feng Zheng ◽  
Xiaowei Guo ◽  
Feiyue Huang ◽  
...  

Although great progress in supervised person re-identification (Re-ID) has been made recently, Re-ID remains a massive visual challenge due to the viewpoint variation of a person. Most existing viewpoint-based person Re-ID methods project images from each viewpoint into separated and unrelated sub-feature spaces. They only model the identity-level distribution inside an individual viewpoint but ignore the underlying relationship between different viewpoints. To address this problem, we propose a novel approach, called Viewpoint-Aware Loss with Angular Regularization (VA-reID). Instead of one subspace for each viewpoint, our method projects the features from different viewpoints into a unified hypersphere and effectively models the feature distribution at both the identity level and the viewpoint level. In addition, rather than modeling different viewpoints as hard labels used for conventional viewpoint classification, we introduce viewpoint-aware adaptive label smoothing regularization (VALSR), which assigns an adaptive soft label to the feature representation. VALSR can effectively resolve the ambiguity of viewpoint cluster label assignment. Extensive experiments on the Market1501 and DukeMTMC-reID datasets demonstrate that our method outperforms state-of-the-art supervised Re-ID methods.
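One way to picture an adaptive soft viewpoint label (a guess at the flavour of the idea, not the paper's exact VALSR formulation; the bin centers and temperature `tau` are invented):

```python
import numpy as np

def soft_viewpoint_label(angle_deg, centers=(0, 90, 180, 270), tau=30.0):
    """Soften a hard viewpoint label into a distribution over viewpoint
    bins, weighted by angular proximity (illustrative sketch only)."""
    diffs = np.array([min(abs(angle_deg - c), 360 - abs(angle_deg - c))
                      for c in centers])
    w = np.exp(-diffs / tau)        # closer bins get exponentially more weight
    return w / w.sum()              # normalize to a probability distribution
```

A view at 45° then contributes equally to the 0° and 90° bins instead of being forced into one hard label, which is what removes the ambiguity of borderline viewpoint assignments.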


Author(s):  
Aakanksha Sharaff ◽  
Naresh Kumar Nagwani

A multi-label variant of email classification named ML-EC2 (multi-label email classification using clustering) is proposed in this work. ML-EC2 is a hybrid algorithm based on text clustering, text classification, frequent-term calculation (based on Latent Dirichlet Allocation), and a taxonomic term-mapping technique. It is an example of classification using a text-clustering technique. It addresses the setting where each email cluster represents a single class label while being associated with a set of cluster labels. It is a multi-label text-clustering-based classification algorithm in which an email cluster can be mapped to more than one email category when the cluster label matches more than one category term. The algorithm is helpful when the user has only a vague idea of the topic. The performance measures Entropy and Davies–Bouldin Index are used to evaluate the designed algorithm.
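The cluster-to-category mapping step can be illustrated with toy data (the cluster labels and category terms here are invented for the example):

```python
# Each email cluster carries a set of cluster labels (frequent terms); a
# cluster is assigned every category whose terms overlap those labels.
cluster_labels = {
    "c1": {"invoice", "payment", "meeting"},
    "c2": {"travel", "booking"},
}
category_terms = {
    "finance": {"invoice", "payment"},
    "work":    {"meeting", "deadline"},
    "leisure": {"travel", "holiday"},
}

mapping = {c: [cat for cat, terms in category_terms.items() if labels & terms]
           for c, labels in cluster_labels.items()}
# -> {"c1": ["finance", "work"], "c2": ["leisure"]}
```

Cluster "c1" lands in two categories at once, which is exactly the multi-label behaviour the algorithm is after.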

