Statistical significance of cluster membership for unsupervised evaluation of cell identities

2020 ◽  
Vol 36 (10) ◽  
pp. 3107-3114 ◽  
Author(s):  
Neo Christopher Chung

Abstract Motivation: Single-cell RNA-sequencing (scRNA-seq) allows us to dissect transcriptional heterogeneity arising from cellular types, spatio-temporal contexts and environmental stimuli. Transcriptional heterogeneity may reflect phenotypes and molecular signatures that are often unmeasured or unknown a priori. Cell identities of samples derived from heterogeneous subpopulations are then determined by clustering of scRNA-seq data. These cell identities are used in downstream analyses. How can we examine if cell identities are accurately inferred? Unlike external measurements or labels for single cells, clustering-based cell identities can result in spurious signals and false discoveries when used in such analyses. Results: We introduce non-parametric methods to evaluate cell identities by testing cluster memberships in an unsupervised manner. Diverse simulation studies demonstrate the accuracy of the jackstraw test for cluster membership. We propose a posterior probability that a cell should be included in its clustering-based subpopulation. Posterior inclusion probabilities (PIPs) for cluster memberships can be used to select and visualize samples relevant to subpopulations. The proposed methods are applied to three scRNA-seq datasets. First, a mixture of Jurkat and 293T cell lines provides two distinct cellular populations. Second, Cell Hashing yields cell identities corresponding to eight donors, which are independently analyzed by the jackstraw. Third, peripheral blood mononuclear cells are used to explore heterogeneous immune populations. The proposed P-values and PIPs lead to probabilistic feature selection of single cells that can be visualized using principal component analysis (PCA), t-distributed stochastic neighbor embedding (t-SNE) and others. By learning uncertainty in clustering high-dimensional data, the proposed methods enable unsupervised evaluation of cluster membership. Availability and implementation: https://cran.r-project.org/package=jackstraw.
Supplementary information: Supplementary data are available at Bioinformatics online.
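
The idea of testing cluster membership against a resampling-based null can be illustrated with a short sketch. This is not the implementation in the jackstraw R package, and the abstract does not give the algorithm in detail. The sketch assumes k-means clustering, an F-statistic from regressing each cell's profile on its cluster center, and a null built by permuting a few cells at a time and re-clustering. The function names and the parameters `s` (cells permuted per iteration) and `B` (iterations) are hypothetical.

```python
import numpy as np

def nearest(X, centers):
    """Index of the closest center (squared Euclidean distance) for each row."""
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def kmeans(X, k, rng, n_iter=50):
    """Plain Lloyd's algorithm; returns (labels, centers)."""
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = nearest(X, centers)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return nearest(X, centers), centers

def f_stat(y, c):
    """F-statistic for the simple linear regression of profile y on center c."""
    r = np.corrcoef(y, c)[0, 1]
    r2 = min(r * r, 1 - 1e-12)  # guard against division by zero
    return r2 / (1.0 - r2) * (len(y) - 2)

def jackstraw_cluster_pvalues(X, k, s=5, B=20, seed=0):
    """Empirical p-values for each row's membership in its assigned cluster."""
    rng = np.random.default_rng(seed)
    labels, centers = kmeans(X, k, rng)
    obs = np.array([f_stat(X[i], centers[labels[i]]) for i in range(len(X))])
    null = []
    for _ in range(B):
        Xb = X.copy()
        idx = rng.choice(len(X), size=s, replace=False)
        for i in idx:
            Xb[i] = rng.permutation(X[i])  # synthetic null: break the cell's structure
        lb, cb = kmeans(Xb, k, rng)        # re-cluster with the nulls included
        null.extend(f_stat(Xb[i], cb[lb[i]]) for i in idx)
    null = np.asarray(null)
    # Share of null statistics at least as large as each observed statistic.
    exceed = (null[None, :] >= obs[:, None]).sum(axis=1)
    return labels, (1 + exceed) / (1 + len(null))
```

Permuting only a few cells per iteration keeps the overall cluster structure nearly intact, so each null statistic is computed against centers that resemble the original ones. Turning the resulting p-values into posterior inclusion probabilities would need a further step that is not sketched here.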

2018 ◽  
Author(s):  
Neo Christopher Chung

Abstract Single cell RNA sequencing (scRNA-seq) allows us to dissect transcriptional heterogeneity arising from cellular types, spatio-temporal contexts, and environmental stimuli. Cell identities of samples derived from heterogeneous subpopulations are routinely determined by clustering of scRNA-seq data. Computational cell identities are then used in downstream analysis, feature selection, and visualization. However, how can we examine if cell identities are accurately inferred? To this end, we introduce non-parametric methods to evaluate cell identities by testing cluster memberships of single cell samples in an unsupervised manner. We propose posterior inclusion probabilities for cluster memberships to select and visualize samples relevant to subpopulations. Beyond simulation studies, we examined two scRNA-seq datasets: a mixture of Jurkat and 293T cells and a large family of peripheral blood mononuclear cells. We demonstrated probabilistic feature selection and improved t-SNE visualization. By learning uncertainty in clustering, the proposed methods enable rigorous testing of cell identities in scRNA-seq.


Nutrients ◽  
2019 ◽  
Vol 11 (7) ◽  
pp. 1489 ◽  
Author(s):  
Jed W. Fahey ◽  
Kristina L. Wade ◽  
Katherine K. Stephenson ◽  
Anita A. Panjwani ◽  
Hua Liu ◽  
...  

We examined whether gastric acidity would affect the activity of myrosinase, co-delivered with glucoraphanin (GR), to convert GR to sulforaphane (SF). A broccoli seed and sprout extract (BSE) rich in GR and active myrosinase was delivered before and after participants began taking the anti-acid omeprazole, a potent proton pump inhibitor. Gastric acidity appears to attenuate GR bioavailability, as evidenced by more SF and its metabolites being excreted after participants started taking omeprazole. Enteric coating enhanced conversion of GR to SF, perhaps by sparing myrosinase from the acidity of the stomach. There were negligible effects of age, sex, ethnicity, BMI, vegetable consumption, and bowel movement frequency and quality. Greater body mass correlated with reduced conversion efficiency. Changes in the expression of 20 genes in peripheral blood mononuclear cells were evaluated as possible pharmacodynamic indicators. When grouped by their primary functions based on a priori knowledge, expression of genes associated with inflammation decreased non-significantly, and those genes associated with cytoprotection, detoxification and antioxidant functions increased significantly with bioavailability. Using principal components analysis, component loadings of the changes in gene expression confirmed these groupings in a sensitivity analysis.


2021 ◽  
Author(s):  
Léa Guyonnet ◽  
Grégoire Detriché ◽  
Nicolas Gendron ◽  
Aurélien Philippe ◽  
Christian Latremouille ◽  
...  

Abstract The Aeson® total artificial heart (A-TAH) has been developed as a total heart replacement for patients at risk of death from biventricular failure. We previously described endothelialization of the hybrid membrane inside the A-TAH, which is probably the origin of its acquired hemocompatibility. We aimed to quantify vasculogenic stem cells in the peripheral blood of patients with long-term A-TAH implantation. Four male adult patients were included in this study. Peripheral blood mononuclear cells were collected before A-TAH implantation (T0) and after implantation at one month (T1), between two and five months (T2), and then between six and twelve months (T3). Supervised analysis of flow cytometry data confirmed the presence of the previously identified Lin−CD133+CD45− and Lin−CD34+ populations with different CD45 intensity levels. Lin−CD133+CD45−, Lin−CD34+CD45− and Lin−CD34+CD45+ were not modulated after A-TAH implantation. However, we demonstrated a significant mobilization of Lin−CD34+CD45dim (p = 0.01) one month after A-TAH implantation regardless of the expression of CD133 or c-Kit. We then visualized data for the resulting clusters on a uniform manifold approximation and projection (UMAP) plot showing all single cells of the live Lin− and CD34+ events selected from downsampled files concatenated at T0 and T1. The three clusters upregulated at T1 are CD45dim clusters, confirming our results. In conclusion, using a flow cytometry approach, we demonstrated in A-TAH-transplanted patients a significant mobilization of Lin−CD34+CD45dim in peripheral blood one month after A-TAH implantation.


2019 ◽  
Author(s):  
Ralph Patrick ◽  
David T. Humphreys ◽  
Vaibhao Janbandhu ◽  
Alicia Oshlack ◽  
Joshua W.K. Ho ◽  
...  

Abstract High-throughput single-cell RNA-seq (scRNA-seq) is a powerful tool for studying gene expression in single cells. Most current scRNA-seq bioinformatics tools focus on analysing overall expression levels, largely ignoring alternative mRNA isoform expression. We present a computational pipeline, Sierra, that readily detects differential transcript usage from data generated by commonly used polyA-captured scRNA-seq technology. We validate Sierra by comparing cardiac scRNA-seq cell-types to bulk RNA-seq of matched populations, finding significant overlap in differential transcripts. Sierra detects differential transcript usage across human peripheral blood mononuclear cells and the Tabula Muris, and 3’UTR shortening in cardiac fibroblasts. Sierra is available at https://github.com/VCCRI/Sierra.


2017 ◽  
Author(s):  
Navpreet Ranu ◽  
Alexandra-Chloé Villani ◽  
Nir Hacohen ◽  
Paul C. Blainey

There is rising interest in applying single-cell transcriptome analysis and other single-cell sequencing methods to resolve differences between cells. Pooled processing of thousands of single cells is now routinely practiced by introducing cell-specific DNA barcodes early in cell processing protocols [1-5]. However, researchers must sequence a large number of cells to sample rare subpopulations [6-8], even when fluorescence-activated cell sorting (FACS) is used to pre-enrich rare cell populations. Here, a new molecular enrichment method is used in conjunction with FACS enrichment to enable efficient sampling of rare dendritic cell (DC) populations, including the recently identified AXL+SIGLEC6+ subset (AS DCs) [7], within a 10X Genomics single-cell RNA-Seq library. DC populations collectively represent 1-2% of total peripheral blood mononuclear cells (PBMCs), with AS DCs representing only 1-3% of human blood DCs and 0.01-0.06% of total PBMCs.
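
The quoted frequencies are mutually consistent, which a short calculation confirms. The sketch below multiplies the two ranges and then estimates how many unenriched PBMCs would have to be profiled, on average, to capture a given number of AS DCs. The target of 100 cells is a hypothetical figure chosen for illustration and does not come from the abstract.

```python
# Fractions quoted in the abstract.
dc_of_pbmc = (0.01, 0.02)   # DCs as a fraction of total PBMCs (1-2%)
as_of_dc = (0.01, 0.03)     # AS DCs as a fraction of blood DCs (1-3%)

# AS DCs as a fraction of total PBMCs: product of the two ranges.
as_of_pbmc_low = dc_of_pbmc[0] * as_of_dc[0]    # about 0.0001, i.e. 0.01%
as_of_pbmc_high = dc_of_pbmc[1] * as_of_dc[1]   # about 0.0006, i.e. 0.06%

def cells_needed(target_cells, fraction):
    """Expected number of unenriched cells to profile to capture target_cells."""
    return target_cells / fraction

target = 100  # hypothetical number of AS DCs wanted
print(f"AS DC share of PBMCs: {as_of_pbmc_low:.2%} to {as_of_pbmc_high:.2%}")
print(f"PBMCs needed without enrichment: "
      f"{cells_needed(target, as_of_pbmc_high):,.0f} to "
      f"{cells_needed(target, as_of_pbmc_low):,.0f}")
```

Under these assumptions, between roughly 170,000 and 1,000,000 unenriched cells would be required, which is the sampling burden that motivates enrichment.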


2021 ◽  
Author(s):  
Wenkai Han ◽  
Yuqi Cheng ◽  
Jiayang Chen ◽  
Huawen Zhong ◽  
Zhihang Hu ◽  
...  

Single-cell RNA-sequencing (scRNA-seq) has become a powerful tool to reveal the complex biological diversity and heterogeneity among cell populations. However, the technical noise and bias of the technology still have negative impacts on the downstream analysis. Here, we present a self-supervised Contrastive LEArning framework for scRNA-seq (CLEAR) profile representation and the downstream analysis. CLEAR overcomes the heterogeneity of the experimental data with a specifically designed representation learning task and thus can handle batch effects and dropout events. In the task, the deep learning model learns to pull together the representations of similar cells while pushing apart distinct cells, without manual labeling. It achieves superior performance on a broad range of fundamental tasks, including clustering, visualization, dropout correction, batch effect removal, and pseudo-time inference. The proposed method successfully identifies and illustrates inflammatory-related mechanisms in a COVID-19 disease study with 43,695 single cells from peripheral blood mononuclear cells. Further experiments to process a million-scale single-cell dataset demonstrate the scalability of CLEAR. This scalable method generates effective scRNA-seq data representation while eliminating technical noise, and it will serve as a general computational framework for single-cell data analysis.
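
The "pull together, push apart" objective described above is commonly written as an InfoNCE-style contrastive loss. The abstract does not specify CLEAR's exact loss, augmentations, or network, so the following NumPy sketch shows only the generic form. It assumes that row i of `z_a` and row i of `z_b` are embeddings of two views of the same cell; the function name and the `temperature` value are illustrative choices.

```python
import numpy as np

def info_nce(z_a, z_b, temperature=0.5):
    """Generic InfoNCE loss: row i of z_a and row i of z_b form a positive pair."""
    # L2-normalize so the dot product is a cosine similarity.
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature                    # shape (n, n)
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positive pairs sit on the diagonal; all other entries act as negatives.
    return -np.mean(np.diag(log_prob))

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
aligned = info_nce(z, z + 0.01 * rng.normal(size=z.shape))  # two near-identical views
unrelated = info_nce(z, rng.normal(size=z.shape))           # no shared structure
print(aligned < unrelated)
```

Minimizing this loss raises the similarity of each positive pair relative to every other cell in the batch, which is why no manual labels are needed.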


Hypertension ◽  
2014 ◽  
Vol 64 (suppl_1) ◽  
Author(s):  
Maria João Pinho ◽  
Manuel Vaz-da-Silva ◽  
José M Cabral ◽  
Joana Afonso ◽  
Paula Serrão ◽  
...  

MicroRNAs (miRNAs) are regulators of gene expression and play a key role in the pathophysiology of various disease processes, notably cardiovascular disease. This study aimed to identify the miRNA expression pattern of peripheral blood mononuclear cells derived from hypertensive subjects (HT) and non-hypertensive subjects with cardiac disease (CD) using a genome-wide approach. Furthermore, we explored the potential of orthogonal partial least squares-discriminant analysis (OPLS-DA) to analyze miRNA expression data. The homogeneity of the population was assessed with principal component analysis (PCA) and verified by the clear separation of the three groups on the basis of the miRNA expression levels. Moreover, local PCA of the HT group indicates that it splits into two distinct subgroups. This distinction correlates with two subject characteristics (smoking habits and history of myocardial infarction) that are closest to the CD group. OPLS-DA confirms the separation of the three groups and identifies the miRNAs most important for discriminating between them. Several of the miRNAs most influential in positively differentiating HT from CD and control are relevant to endothelial dysfunction. On the other hand, miRNAs associated with angiogenesis positively differentiate CD from HT and control. Mir-186, a regulator of myogenin, is the most important miRNA for discriminating HT from CD. These results provide evidence of an altered miRNA profile in hypertensive subjects and in cardiac disease, and demonstrate that PCA and OPLS-DA are very useful and robust tools for analyzing how different miRNAs contribute to the separation of groups and for finding interesting patterns within multidimensional data.


2007 ◽  
Vol 13 (5) ◽  
pp. 578-583 ◽  
Author(s):  
Roberto Alvarez-Lafuente ◽  
Virginia de las Heras ◽  
Marta Garcia-Montojo ◽  
Manuel Bartolome ◽  
Rafael Arroyo

Recently, it has been suggested that human herpesvirus-6 (HHV-6) may play a role in the pathogenesis of relapsing-remitting multiple sclerosis (RRMS), but there is not enough information on the role of HHV-6 in secondary-progressive MS (SPMS). To address this question, we evaluated HHV-6 prevalence, active viral replication and viral load, measured by quantitative real-time PCR, in DNA and mRNA extracted from peripheral blood mononuclear cells (PBMCs) and in DNA extracted from serum; the samples were collected from 31 SPMS and 31 RRMS patients in a one-year follow-up study, and from sex- and age-matched controls. The results were as follows: i) We found a statistically significant difference in HHV-6 prevalence between RRMS and SPMS patients in DNA extracted from PBMCs (P = 0.027), DNA extracted from serum (P = 0.010) and mRNA extracted from PBMCs (P = 0.010). When we compared HHV-6 prevalence in RRMS patients in relapse and in remission with that in SPMS patients, we found statistical significance only for relapses (P = 0.003 in DNA from PBMCs, and P < 0.001 in DNA from serum samples and mRNA from PBMCs). ii) We found only HHV-6 variant A among HHV-6 positive samples in serum. iii) We did not find any difference in HHV-6 viral loads. These results suggest that HHV-6A does not play an active role in SPMS, while this virus may contribute to the pathogenesis of RRMS by triggering MS attacks in a subset of patients. Multiple Sclerosis 2007; 13: 578-583. http://msj.sagepub.com


2007 ◽  
Vol 25 (18_suppl) ◽  
pp. 3518-3518 ◽  
Author(s):  
S. Kummar ◽  
R. Kinders ◽  
M. Gutierrez ◽  
L. Rubinstein ◽  
R. E. Parchment ◽  
...  

3518 Background: Inhibition of PARP activity sensitizes tumor cells to the effects of DNA damaging agents. We conducted a phase 0 pharmacokinetic (PK) and pharmacodynamic (PD) study of ABT-888, an oral inhibitor of PARP. Methods: The objectives were to determine a dose range at which ABT-888 inhibits PARP in tumor tissue and in peripheral blood mononuclear cells (PBMCs), and to determine the PK of ABT-888. Patients with advanced solid tumors refractory to at least one line of therapy were eligible; patients with CLL or follicular lymphomas were also eligible if standard therapy was not currently indicated. A single oral dose of ABT-888 was administered per patient; dose escalations were planned in cohorts of 3 patients each (10 mg, 25 mg, 50 mg, 100 mg, and 150 mg). PBMC and tumor sampling were performed before and after drug administration for real-time PK and PD analyses. All patients underwent PBMC sampling; tumor biopsies were planned once significant inhibition of PARP activity in PBMCs was seen in 1 of 3 patients in a cohort or a plasma Cmax of 210 nM was achieved in at least 1 patient. Tumor biopsies were performed at baseline in the week prior to drug administration and then 3–6 hours post drug administration. Significant inhibition of PARP activity was defined as at least a 0.69 reduction on the log scale, which also satisfied statistical significance. Results: A total of 6 patients have been studied so far, 3 each for the 10 mg and 25 mg cohorts. No treatment-related adverse events have been observed. Target Cmax was exceeded in the first cohort; all patients in the next cohort underwent tumor biopsies in addition to PBMC sampling. A trend towards inhibition of PARP activity in PBMCs was observed in the first cohort. Significant inhibition of PAR levels was observed in tumor biopsies from all 3 patients in the second cohort (92%, 99%, 100% reductions respectively, as compared to baseline). Greater than 85% reduction of PAR levels was observed in PBMCs from 2 of the 3 patients in the second cohort (one patient was not evaluable). Conclusions: ABT-888 is orally bioavailable and inhibits PARP activity in PBMCs and tumor cells. Target assay feasibility was established in human samples. Funded in part by NCI Contract N01-CO-12400. No significant financial relationships to disclose.
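
The threshold of "0.69 on the log scale" is easier to interpret once converted to a percentage. The abstract does not state the base of the logarithm. The sketch below shows both common readings, and the natural-log reading (0.69 is close to ln 2) is an assumption rather than something the source confirms.

```python
import math

threshold = 0.69  # reduction on the log scale, as quoted in the abstract

# If the scale is natural log, the remaining activity is exp(-0.69) of baseline.
remaining_ln = math.exp(-threshold)
# If the scale is log10, the remaining activity is 10**(-0.69) of baseline.
remaining_log10 = 10 ** (-threshold)

print(f"natural log: {1 - remaining_ln:.1%} reduction from baseline")
print(f"log base 10: {1 - remaining_log10:.1%} reduction from baseline")
```

Under the natural-log assumption the threshold corresponds to roughly a halving of PARP activity; under a base-10 reading it would be a reduction of about 80%.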


2016 ◽  
Vol 34 (2_suppl) ◽  
pp. 313-313
Author(s):  
Ben Yiming Zhang ◽  
Shaun M. Riska ◽  
Douglas W. Mahoney ◽  
James Robert Cerhan ◽  
Brian Addis Costello ◽  
...  

313 Background: We evaluated the association of variation in candidate genes implicated in either initiation or progression of prostate cancer with survival in the castration-resistant prostate cancer (CRPC) stage. Methods: Germline DNA was extracted from peripheral blood mononuclear cells of CRPC patients enrolled in a clinically annotated registry. Fourteen candidate genes were tagged using single nucleotide polymorphisms (SNPs) from HapMap with minor allele frequency of > 5%. The primary endpoint was overall survival (OS), defined as time from development of CRPC to death. Principal component analysis was used for gene-level tests of significance. For SNP-level results, per-allele hazard ratios (HR) and 95% confidence intervals (CI) were estimated under the additive allele model using Cox regression adjusted for age at CRPC and Gleason score (GS). Results: Two hundred and forty-five CRPC patients who met the criterion of having adequate DNA were genotyped (14 genes, 84 SNPs). The median age of the cohort was 69 years (range 43-93). The GS distribution was 55% with GS ≥ 8, 32% with GS = 7 and 13% with GS < 7 or unknown. Median time from castration resistance to death for the cohort was 2.67 years (IQ range: 1.6-4.07, 144 deaths). At the gene level, JAK2 was found to be associated with OS. Six of 18 JAK2 SNPs were associated with OS after adjustment for age and GS as well as pruning of SNPs in high linkage disequilibrium with each other (see table). In our multivariate model including age, GS, rs2149556, and rs4372063, the adjusted HRs for rs2149556 and rs4372063 were 0.67 (95% CI 0.38-1.18) and 2.17 (95% CI 1.25-3.76), respectively. The protective effect of rs2149556 appears after adjusting for the presence of minor alleles for rs4372063 in this dataset. Conclusions: Germline variation in the JAK2 gene is associated with survival in the CRPC stage and warrants further validation as a potential prognostic biomarker. [Table: see text]
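
The reported hazard ratios and confidence intervals can be turned into approximate test statistics. The sketch below assumes the intervals are Wald intervals that are symmetric on the log scale, which is the usual output of Cox regression software but is not stated in the abstract. The function name is hypothetical, and the p-values it returns are approximations that the authors did not report.

```python
import math

def wald_from_ci(hr, lower, upper, z_crit=1.959964):
    """Recover the log-HR standard error, z statistic and two-sided p-value
    from a hazard ratio and its 95% CI, assuming a Wald interval."""
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return se, z, p

for snp, hr, lo, hi in [("rs2149556", 0.67, 0.38, 1.18),
                        ("rs4372063", 2.17, 1.25, 3.76)]:
    se, z, p = wald_from_ci(hr, lo, hi)
    # The geometric mean of the CI limits should reproduce the reported HR.
    print(f"{snp}: midpoint HR = {math.sqrt(lo * hi):.2f}, "
          f"SE = {se:.3f}, z = {z:.2f}, p = {p:.3f}")
```

The geometric mean of each interval's limits matches the reported HR to two decimals, which supports the symmetry assumption. As expected from the intervals, the one for rs2149556 includes 1 and the one for rs4372063 does not.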

