class labels
Recently Published Documents

TOTAL DOCUMENTS: 322 (five years: 190)
H-INDEX: 15 (five years: 6)

2022 ◽  
Author(s):  
Stephen Coleman ◽  
Xaquin Castro Dopico ◽  
Gunilla B Karlsson Hedestam ◽  
Paul DW Kirk ◽  
Chris Wallace

Systematic differences between batches of samples present significant challenges when analysing biological data. Such batch effects are well studied and are liable to occur in any setting where multiple batches are assayed. Many existing methods for correcting them focus on high-dimensional data such as RNA-seq and carry assumptions that reflect this. Here we focus on batch correction in low-dimensional classification problems. We propose a semi-supervised Bayesian generative classifier based on mixture models that jointly predicts class labels and models batch effects. Our model allows observations to be probabilistically assigned to classes in a way that incorporates the uncertainty arising from batch effects. We explore two choices for the within-class densities: the multivariate normal (MVN) and the multivariate t (MVT). A simulation study demonstrates that our method performs well compared with popular off-the-shelf machine learning methods and is also quick: on a dataset of 500 samples with 2 measurements each, 15,000 iterations run in 7.3 seconds for the MVN mixture model and 11.9 seconds for the MVT mixture model. We apply our model to two datasets generated using the enzyme-linked immunosorbent assay (ELISA), a spectrophotometric assay often used to screen for antibodies. The examples we consider were collected in 2020 and measure seropositivity for SARS-CoV-2. We use our model to estimate seroprevalence in the populations studied. We implement the models in C++ using a Metropolis-within-Gibbs algorithm; this is available in the R package at https://github.com/stcolema/BatchMixtureModel. Scripts to recreate our analysis are at https://github.com/stcolema/BatchClassifierPaper.
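The core idea — probabilistic class assignment with an additive batch effect — can be illustrated with a minimal one-dimensional sketch. All names, means, and shifts below are hypothetical for illustration; this is not the authors' C++/R implementation.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a univariate normal with mean mu and std sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def class_posterior(x, batch, class_means, batch_shifts, sigma=1.0, weights=None):
    """Posterior class probabilities for one observation under an additive
    batch effect: an observation from batch b in class k is modelled as
    class_means[k] + batch_shifts[b] + noise, so the likelihood of each
    class uses the batch-corrected mean."""
    k_classes = len(class_means)
    weights = weights or [1.0 / k_classes] * k_classes
    likes = [weights[k] * normal_pdf(x, class_means[k] + batch_shifts[batch], sigma)
             for k in range(k_classes)]
    total = sum(likes)
    return [l / total for l in likes]

# A seronegative-like reading of 0.6 from a batch with a +0.5 shift:
probs = class_posterior(0.6, batch=1, class_means=[0.0, 3.0],
                        batch_shifts=[0.0, 0.5])
```

In the paper's sampler, the class means, batch shifts, and allocations would all be updated jointly via Metropolis-within-Gibbs; here they are fixed only to show how the batch term enters the class likelihood.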


Author(s):  
Arindom Ain

Abstract: Land use and land cover (LULC) classification provides a way to categorize objects on the surface of the Earth. This paper aims to identify varying land cover classes by stacking 6 spectral bands together with 10 different indices generated from those bands. We consider multispectral Landsat 7 images for our research. Instead of using only the basic spectral bands (blue, green, red, NIR, SWIR1 and SWIR2) for classification, stacking indices relevant to multiple target classes, such as NDVI, EVI, NBR and BU, with the basic bands produces more precise results. In this study, we use automated clustering techniques to generate 5 different class labels for training the model. These labels are then used to develop a predictive model that classifies LULC classes. The proposed classifier is compared with SVM and KNN classifiers, and the results show that the proposed strategy outperforms these other techniques. After training the model over 50 epochs, an accuracy of 93.29% is achieved. Keywords: land use, land cover, CNN, ISODATA, indices
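The band-plus-index stacking step can be sketched as follows, using NDVI = (NIR − RED) / (NIR + RED) as the one worked example; the function names and the single-index feature vector are illustrative only (the paper stacks 10 indices, not one).

```python
def ndvi(nir, red, eps=1e-9):
    """Normalized Difference Vegetation Index; eps avoids division by zero."""
    return (nir - red) / (nir + red + eps)

def stack_features(bands):
    """bands: dict mapping band name -> per-pixel reflectance list.
    Returns one feature vector per pixel: the 6 raw bands followed by
    derived indices (here just NDVI, as an example)."""
    n = len(bands["red"])
    features = []
    for i in range(n):
        vec = [bands[b][i] for b in ("blue", "green", "red", "nir", "swir1", "swir2")]
        vec.append(ndvi(bands["nir"][i], bands["red"][i]))
        features.append(vec)
    return features
```

In practice each additional index (EVI, NBR, BU, …) would append one more entry per pixel before the stacked vectors are fed to the clustering and classification stages.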


2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Li Zhang

Feature selection is the key step in the analysis of high-dimensional, small-sample data. Its core is to analyse and quantify the correlation between features and class labels and the redundancy among features. However, most existing feature selection algorithms consider only the classification contribution of individual features and ignore interfeature redundancy and correlation. Therefore, this paper proposes a feature selection algorithm for nonlinear dynamic conditional relevance (NDCRFS), based on a study and analysis of existing feature selection ideas and methods. First, redundancy and relevance between features, and between features and class labels, are assessed using mutual information, conditional mutual information, and interactive mutual information. Second, the selected features and candidate features are dynamically weighted using information gain factors. Finally, to evaluate its performance, NDCRFS was compared against 6 other feature selection algorithms on three classifiers, using 12 different data sets, with respect to variability and classification metrics. The experimental results show that NDCRFS improves the quality of the feature subsets and obtains better classification results.
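The relevance and redundancy terms above are built from information-theoretic quantities. As a minimal sketch, here is a plug-in estimator of mutual information I(X;Y) between a discrete feature and the class labels; it is a generic estimator, not the NDCRFS algorithm itself.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in nats from paired discrete samples:
    sum over (x, y) of p(x, y) * log( p(x, y) / (p(x) * p(y)) )."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), c in pxy.items():
        p_joint = c / n
        mi += p_joint * math.log(p_joint * n * n / (px[x] * py[y]))
    return mi
```

A feature that determines the label perfectly attains I(X;Y) = H(Y), while an independent feature scores 0; conditional and interactive mutual information extend the same counting idea to triples of variables.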


2021 ◽  
Author(s):  
Wei Liu ◽  
Xu Liao ◽  
Xiang Zhou ◽  
Xingjie Shi ◽  
Jin Liu

Dimension reduction and (spatial) clustering are two key steps in the analysis of both single-cell RNA-sequencing (scRNA-seq) and spatial transcriptomics data collected from different platforms. Most existing methods perform dimension reduction and (spatial) clustering sequentially, treating them as two consecutive stages in tandem analysis. However, the low-dimensional embeddings estimated in the dimension reduction step may not be relevant to the class labels inferred in the clustering step, and may thus impair the performance of clustering and other downstream analyses. Here, we develop a computational method, DR-SC, to perform dimension reduction and (spatial) clustering jointly in a unified framework. Joint analysis in DR-SC ensures accurate (spatial) clustering results and effective extraction of biologically informative low-dimensional features. Importantly, DR-SC is applicable not only to cell type clustering in scRNA-seq studies but also to spatial clustering in spatial transcriptomics, which characterizes the spatial organization of a tissue by segregating it into multiple tissue structures. For spatial transcriptomics analysis, DR-SC relies on an underlying latent hidden Markov random field model to encourage spatial smoothness of the detected spatial cluster boundaries. We also develop an efficient expectation-maximization algorithm based on iterated conditional modes. DR-SC is not only scalable to large sample sizes but is also capable of optimizing the spatial smoothness parameter in a data-driven manner. Comprehensive simulations show that DR-SC outperforms existing clustering methods such as Seurat, and spatial clustering methods such as BayesSpace and SpaGCN, and extracts more biologically relevant features than conventional dimension reduction methods such as PCA and scVI.
Using 16 benchmark scRNA-seq datasets, we demonstrate that the low-dimensional embeddings and class labels estimated by DR-SC lead to improved trajectory inference. In addition, analyzing three published scRNA-seq and spatial transcriptomics datasets from three platforms, we show that DR-SC improves both spatial and non-spatial clustering performance, resolves a low-dimensional representation with improved visualization, and facilitates downstream analyses such as trajectory inference.
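The Markov-random-field smoothing step can be illustrated with a plain iterated-conditional-modes (ICM) update on a grid of spots: each spot takes the label maximizing its log-likelihood plus a Potts-style bonus for agreeing with its neighbours. This is a generic textbook ICM sketch, not the DR-SC implementation, and the grid, scores, and smoothness value are made up for illustration.

```python
def neighbours(i, w, n):
    """4-neighbourhood of spot i on a row-major grid of width w, size n."""
    out = []
    if i % w > 0:
        out.append(i - 1)
    if i % w < w - 1:
        out.append(i + 1)
    if i - w >= 0:
        out.append(i - w)
    if i + w < n:
        out.append(i + w)
    return out

def icm(labels, loglik, w, beta=1.0, sweeps=3):
    """Iterated conditional modes under a Potts prior.

    labels      current label per spot (row-major grid of width w)
    loglik[i][k] log-likelihood of label k at spot i
    beta        spatial smoothness: each spot maximizes
                loglik[i][k] + beta * (# neighbours currently labelled k)
    """
    labels = list(labels)
    n, k_classes = len(labels), len(loglik[0])
    for _ in range(sweeps):
        for i in range(n):
            nb = neighbours(i, w, n)
            scores = [loglik[i][k] + beta * sum(labels[j] == k for j in nb)
                      for k in range(k_classes)]
            labels[i] = max(range(k_classes), key=scores.__getitem__)
    return labels

# A 2x2 grid where one noisy spot weakly prefers the wrong label:
smoothed = icm([0, 0, 0, 1],
               loglik=[[0, -5], [0, -5], [0, -5], [-0.5, 0]],
               w=2)
```

In DR-SC the analogous update sits inside an EM loop that also re-estimates the low-dimensional embedding and tunes beta from the data.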


2021 ◽  
Vol 12 (1) ◽  
pp. 148
Author(s):  
Francesca Lizzi ◽  
Camilla Scapicchio ◽  
Francesco Laruina ◽  
Alessandra Retico ◽  
Maria Evelina Fantacci

We propose and evaluate a procedure for the explainability of a deep-learning-based breast density classifier. A total of 1662 mammography exams labeled according to the BI-RADS breast density categories were used. We built a residual Convolutional Neural Network (CNN), trained it, and studied the responses of the model to input changes, such as different distributions of class labels in the training and test sets and suitable image pre-processing. The aim was to identify the steps of the analysis with a relevant impact on classifier performance and on model explainability. We used the Grad-CAM algorithm to produce saliency maps for the CNN and computed Spearman's rank correlation between input images and saliency maps as a measure of explanation accuracy. We found that pre-processing is critical not only for the accuracy, precision and recall of a model but also for obtaining a reasonable explanation of the model itself. Our CNN reaches good performance compared with the state of the art, and it considers the dense pattern when making the classification: the saliency maps strongly correlate with the dense pattern. This work is a starting point towards a standard framework for evaluating both CNN performance and the explainability of CNN predictions in medical image classification problems.
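The explanation-accuracy metric used above, Spearman's rank correlation, can be sketched in a few lines: rank the two flattened arrays (image and saliency map) and take the Pearson correlation of the ranks. This is a generic stdlib implementation with average ranks for ties, not the authors' pipeline code.

```python
def rank(values):
    """Average 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman's rho: Pearson correlation of the rank vectors."""
    rx, ry = rank(xs), rank(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx) ** 0.5
    vy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (vx * vy)
```

A value near 1 indicates the saliency map brightens exactly where the image is dense, which is the behaviour the paper reports for well pre-processed inputs.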


2021 ◽  
Vol 11 (24) ◽  
pp. 12145
Author(s):  
Jun Huang ◽  
Qian Xu ◽  
Xiwen Qu ◽  
Yaojin Lin ◽  
Xiao Zheng

In multi-label learning, each object is represented by a single instance and is associated with more than one class label, where the labels may be correlated with each other. Exploiting label correlations can markedly improve the performance of a multi-label classification model. Existing methods mainly model label correlations indirectly, i.e., by adding extra constraints on the coefficients or outputs of a model based on a pre-learned label correlation graph. Meanwhile, the high dimensionality of the feature space also poses great challenges to multi-label learning, such as high time and memory costs. To address these issues, in this paper we propose a new approach for Multi-Label Learning by Correlation Embedding, namely MLLCE, in which feature-space dimension reduction and multi-label classification are integrated into a unified framework. Specifically, we project the original high-dimensional feature space to a low-dimensional latent space via a mapping matrix. To model label correlation, we learn an embedding matrix from the pre-defined label correlation graph by graph embedding. We then construct a multi-label classifier from the low-dimensional latent feature space to the label space, using the embedding matrix as the model coefficients. Finally, we extend MLLCE to a nonlinear version, NL-MLLCE. Comparison experiments with state-of-the-art approaches show that MLLCE achieves competitive performance in multi-label learning.
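Structurally, prediction in such a factorized model is two matrix products: project the features into the latent space, then score each label with the embedding matrix as coefficients. The sketch below shows only that shape; the matrices V and E are tiny made-up examples, not learned MLLCE parameters, and the thresholding rule is a common convention rather than the paper's.

```python
def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def predict(x, v_proj, e_embed, threshold=0.0):
    """Multi-label prediction with a factorized coefficient matrix.

    v_proj:  r x d mapping matrix projecting a d-dim feature vector
             into an r-dim latent space.
    e_embed: L x r label-correlation embedding used as classifier
             coefficients, one row per label.
    A label is assigned when its score exceeds the threshold.
    """
    z = matvec(v_proj, x)        # latent representation, length r
    scores = matvec(e_embed, z)  # one score per label
    return [int(s > threshold) for s in scores], scores
```

Because the d-dim features never meet the L labels directly, the model stores r x (d + L) numbers instead of d x L, which is where the time and memory savings come from.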


2021 ◽  
Vol 11 (6) ◽  
pp. 7824-7835
Author(s):  
H. Alalawi ◽  
M. Alsuwat ◽  
H. Alhakami

The importance of classification algorithms has increased in recent years. Classification is a branch of supervised learning whose goal is to predict categorical class labels for new cases. Moreover, since the spread of Coronavirus (COVID-19) began in 2019, the world still faces a great challenge in defeating COVID-19, even with modern methods and technologies. This paper gives an overview of classification algorithms to provide readers with an understanding of state-of-the-art classification algorithms and their applications in COVID-19 diagnosis and detection. It also describes some of the published research on classification algorithms, the existing gaps in that research, and future research directions. This article encourages both academics and machine learning learners to further strengthen their grounding in classification methods.


2021 ◽  
Vol 5 (12) ◽  
pp. 282
Author(s):  
Siu-Hei Cheung ◽  
V. Ashley Villar ◽  
Ho-Sang Chan ◽  
Shirley Ho

Abstract: Using the second data release from the Zwicky Transient Facility (ZTF), Chen et al. created the ZTF Catalog of Periodic Variable Stars (ZTF CPVS) of 781,602 periodic variable stars (PVSs) with 11 class labels. Here, we provide a new classification model of PVSs in the ZTF CPVS using a convolutional variational autoencoder and a hierarchical random forest. We cross-match the sky coordinates of PVSs in the ZTF CPVS with those in the SIMBAD catalog and identify non-stellar objects that were not previously classified, including extragalactic objects such as quasi-stellar objects, active galactic nuclei, supernovae and planetary nebulae. We then create a new labeled training set with 13 classes on two levels. We obtain a reasonable level of completeness (≳90%) for certain classes of PVSs, although completeness is poorer for other classes (∼40% in some cases). Our new labels for the ZTF CPVS are available via Zenodo.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Olga Zolotareva ◽  
Reza Nasirigerdeh ◽  
Julian Matschinske ◽  
Reihaneh Torkzadehmahani ◽  
Mohammad Bakhtiari ◽  
...  

Abstract: Aggregating transcriptomics data across hospitals can increase the sensitivity and robustness of differential expression analyses, yielding deeper clinical insights. Because data exchange is often restricted by privacy legislation, meta-analyses are frequently employed to pool local results. However, their accuracy can drop if class labels are inhomogeneously distributed among cohorts. Flimma (https://exbio.wzw.tum.de/flimma/) addresses this issue by implementing the state-of-the-art limma voom workflow in a federated manner, i.e., patient data never leaves its source site. Flimma results are identical to those generated by limma voom on aggregated datasets, even in imbalanced scenarios where meta-analysis approaches fail.
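The general principle behind such federated analyses — sites exchange sufficient statistics, never raw patient data, yet the pooled result is exact — can be shown with the simplest possible case, a federated mean and variance. This toy sketch illustrates the idea only; Flimma's actual protocol federates the full limma voom workflow, not these two moments.

```python
def local_stats(values):
    """Computed at each site: only (n, sum, sum of squares) leave the site."""
    n = len(values)
    return n, sum(values), sum(v * v for v in values)

def aggregate(stats):
    """Pool per-site summaries into the exact global mean and (population)
    variance, identical to computing them on the never-materialized
    combined dataset."""
    n = sum(s[0] for s in stats)
    total = sum(s[1] for s in stats)
    sq = sum(s[2] for s in stats)
    mean = total / n
    var = sq / n - mean * mean
    return mean, var

# Two "hospitals" pool their cohorts without sharing a single raw value:
pooled_mean, pooled_var = aggregate([local_stats([1, 2, 3]),
                                     local_stats([4, 5])])
```

Because the aggregation is exact rather than an approximation of per-site results, this style of computation does not suffer the accuracy loss that meta-analysis shows under imbalanced label distributions.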

