Discriminative feature of cells characterizes cell populations of interest by a small subset of genes

2021 ◽  
Author(s):  
Takeru Fujii ◽  
Kazumitsu Maehara ◽  
Masatoshi Fujita ◽  
Yasuyuki Ohkawa

ABSTRACT Statistical methods for detecting differences in individual gene expression are indispensable for understanding cell types. However, conventional statistical methods have faced difficulties associated with the inflation of P-values because of both the large sample size and selection bias introduced by exploratory data analysis such as single-cell transcriptomics. Here, we propose the concept of discriminative feature of cells (DFC), an alternative to differentially expressed gene-based approaches. We implemented DFC using logistic regression with an adaptive LASSO penalty to perform binary classification for the discrimination of a population of interest and variable selection to obtain a small subset of defining genes. Using artificial data, we demonstrated that DFC prioritized gene pairs with non-independent expression, and we showed that DFC enabled characterization of the muscle satellite cell population. The results revealed that DFC captured cell-type-specific markers, specific gene expression patterns, and subcategories of this cell population well. DFC may complement differentially expressed gene-based methods for interpreting large data sets.
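The large-sample problem mentioned in the abstract can be illustrated with a small calculation. This is a sketch under stated assumptions, not the authors' analysis: it assumes a two-sample z-test with unit variance in both groups and an observed mean difference fixed at 0.05 standard deviations. The function name `two_sample_p_value` is hypothetical.

```python
import math

def two_sample_p_value(effect_size, n_per_group):
    """Two-sided P-value of a z-test when the observed mean difference equals
    `effect_size` (in standard-deviation units) and both groups have unit variance."""
    z = effect_size / math.sqrt(2.0 / n_per_group)
    return math.erfc(abs(z) / math.sqrt(2.0))

# The same negligible shift (0.05 SD) evaluated at growing numbers of cells per group
for n in (100, 10_000, 100_000):
    print(n, two_sample_p_value(0.05, n))
```

With the effect held constant, the P-value shrinks as the number of cells grows, so at single-cell scale a very small P-value by itself says little about how well a gene marks a population.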

2021 ◽  
Vol 17 (11) ◽  
pp. e1009579
Author(s):  
Takeru Fujii ◽  
Kazumitsu Maehara ◽  
Masatoshi Fujita ◽  
Yasuyuki Ohkawa

Organisms are composed of various cell types with specific states. To obtain a comprehensive understanding of the functions of organs and tissues, cell types have been classified and defined by identifying specific marker genes. Statistical tests, which often involve evaluating differences in the mean expression levels of genes, are critical for identifying marker genes. Differentially expressed gene (DEG)-based analysis has been the most frequently used method of this kind. However, as sample sizes have increased, for example in single-cell analysis, DEG-based analysis has faced difficulties associated with the inflation of P-values. Here, we propose the concept of discriminative feature of cells (DFC), an alternative to DEG-based approaches. We implemented DFC using logistic regression with an adaptive LASSO penalty to perform binary classification for discriminating a population of interest and variable selection to obtain a small subset of defining genes. Using artificial data, we demonstrated that DFC prioritized gene pairs with non-independent expression, and we showed that DFC enabled characterization of the muscle satellite/progenitor cell population. The results revealed that DFC captured cell-type-specific markers, specific gene expression patterns, and subcategories of this cell population well. DFC may complement DEG-based methods for interpreting large data sets. DEG-based analysis uses lists of genes with differences in expression between groups, while DFC, which can be termed a discriminative approach, has potential applications in the task of cell characterization. With recent advances in the high-throughput analysis of single cells, data from cell characterization methods such as scRNA-seq can be effectively subjected to discriminative methods.
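The combination of binary classification and variable selection described above can be sketched as follows. This is a minimal illustration and not the authors' implementation. It makes several assumptions: the adaptive weights come from an initial ridge-penalized fit, the weighted L1 problem is solved by proximal gradient descent, the data are synthetic (only genes 0 and 1 carry signal), and the values of `lam`, `gamma` and `ridge` are arbitrary. All function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, l1_weights=None, lam=0.0, ridge=0.0, n_iter=3000):
    """Proximal gradient descent for logistic regression with an optional
    ridge term and an optional weighted L1 term lam * sum(w_j * |beta_j|)."""
    n, p = X.shape
    beta, intercept = np.zeros(p), 0.0
    # Step size from a Lipschitz bound on the gradient of the mean log-loss
    X_aug = np.hstack([X, np.ones((n, 1))])
    step = 1.0 / (0.25 * np.linalg.norm(X_aug, 2) ** 2 / n + ridge)
    for _ in range(n_iter):
        resid = sigmoid(X @ beta + intercept) - y
        beta = beta - step * (X.T @ resid / n + ridge * beta)
        intercept -= step * resid.mean()
        if l1_weights is not None:
            # Soft-thresholding: the proximal step of the weighted L1 penalty
            thresh = step * lam * l1_weights
            beta = np.sign(beta) * np.maximum(np.abs(beta) - thresh, 0.0)
    return beta, intercept

def adaptive_lasso_logistic(X, y, lam=0.05, gamma=1.0, ridge=0.01):
    # Stage 1 (assumption): ridge fit supplies the initial coefficient estimates
    beta_init, _ = fit_logistic(X, y, ridge=ridge)
    # Stage 2: genes with small initial coefficients receive a heavier penalty
    weights = 1.0 / (np.abs(beta_init) + 1e-6) ** gamma
    return fit_logistic(X, y, l1_weights=weights, lam=lam)

# Synthetic "cells x genes" matrix; label 1 marks the population of interest
rng = np.random.default_rng(0)
n_cells, n_genes = 500, 30
X = rng.normal(size=(n_cells, n_genes))
logits = 2.0 * X[:, 0] - 2.0 * X[:, 1]
y = (rng.random(n_cells) < sigmoid(logits)).astype(float)

beta, intercept = adaptive_lasso_logistic(X, y)
selected = np.flatnonzero(beta)
print("selected gene indices:", selected)
```

The genes with non-zero coefficients play the role of the small defining subset. In practice the penalty strength would be chosen by cross-validation rather than fixed by hand.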


2020 ◽  
Vol 26 (29) ◽  
pp. 3619-3630
Author(s):  
Saumya Choudhary ◽  
Dibyabhaba Pradhan ◽  
Noor S. Khan ◽  
Harpreet Singh ◽  
George Thomas ◽  
...  

Background: Psoriasis is a chronic immune-mediated skin disorder with a global prevalence of 0.2-11.4%. Although mortality is rare, the severity of the disease can be understood from the accompanying comorbidities, which have even led to psychological problems among several patients. The cause and the disease mechanism still remain elusive. Objective: To identify potential therapeutic targets and affected pathways for better insight into the disease pathogenesis. Method: The gene expression profiles GSE13355 and GSE14905 were retrieved from the NCBI Gene Expression Omnibus database. The GEO profiles were integrated and the DEGs of lesional and non-lesional psoriasis skin were identified using the affy package in R software. The Kyoto Encyclopaedia of Genes and Genomes pathways of the DEGs were analyzed using clusterProfiler. Cytoscape V3.7.1 was utilized to construct a protein interaction network and analyze the interactome map of candidate proteins encoded by DEGs. Functionally relevant clusters were detected through Cytohubba and MCODE. Results: A total of 1013 genes were differentially expressed in lesional skin, of which 557 were upregulated and 456 were downregulated. Seven dysregulated genes were extracted in non-lesional skin. The disease gene network of these DEGs revealed 75 newly identified differentially expressed genes that might have a role in the development and progression of the disease. GO analysis revealed keratinocyte differentiation and positive regulation of cytokine production to be the most enriched biological process and molecular function. Cytokine-cytokine receptor interaction was the most enriched pathway. Among the 1013 DEGs identified in the lesional group, 36 DEGs were found to have an altered genetic signature, including IL1B and STAT3, which are also reported as hub genes. CCNB1, CCNA2, CDK1, IL1B, CXCL8, MKI67, ESR1, UBE2C, STAT1 and STAT3 were the top 10 hub genes.
Conclusion: The hub genes, genomically altered DEGs and other newly identified dysregulated genes would improve our understanding of psoriasis pathogenesis; moreover, the hub genes could be explored as potential therapeutic targets for psoriasis.


2020 ◽  
Vol 15 ◽  
Author(s):  
Chen-An Tsai ◽  
James J. Chen

Background: Gene set enrichment analyses (GSEA) provide a useful and powerful approach to identify differentially expressed gene sets with prior biological knowledge. Several GSEA algorithms have been proposed to perform enrichment analyses on groups of genes. However, many of these algorithms have focused on identification of differentially expressed gene sets in a given phenotype. Objective: In this paper, we propose a gene set analytic framework, Gene Set Correlation Analysis (GSCoA), that simultaneously measures within and between gene sets variation to identify sets of genes enriched for differential expression and highly co-related pathways. Methods: We apply co-inertia analysis to the comparisons of cross-gene sets in gene expression data to measure the co-structure of expression profiles in pairs of gene sets. Co-inertia analysis (CIA) is a multivariate method for identifying trends or co-relationships in multiple datasets that contain the same samples. The objective of CIA is to seek ordinations (dimension reduction diagrams) of two gene sets such that the squared covariance between the projections of the gene sets on successive axes is maximized. Simulation studies illustrate that CIA offers superior performance in identifying co-relationships between gene sets in all simulation settings when compared to correlation-based gene set methods. Result and Conclusion: We also combine between-gene set CIA and GSEA to discover the relationships between gene sets significantly associated with phenotypes. In addition, we provide a graphical technique for visualizing and simultaneously exploring the associations between and within gene sets and their interaction and network. We then demonstrate integration of within and between gene sets variation using CIA and GSEA, applied to the p53 gene expression data using the c2 curated gene sets.
Ultimately, the GSCoA approach provides an attractive tool for identification and visualization of novel associations between pairs of gene sets by integrating co-relationships between gene sets into gene set analysis.
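The CIA objective stated in the abstract, maximizing the squared covariance between projections of two gene sets, can be sketched in its simplest form. This is not the GSCoA implementation. It assumes uniform sample weights, column-centered data and no prior ordination step, in which case the axes are the singular vectors of the cross-covariance matrix. The two gene sets are synthetic and share one hidden factor.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 50
shared = rng.normal(size=(n_samples, 1))   # hidden factor driving both gene sets
X = shared @ rng.normal(size=(1, 8)) + rng.normal(size=(n_samples, 8))   # gene set A
Y = shared @ rng.normal(size=(1, 5)) + rng.normal(size=(n_samples, 5))   # gene set B

# Center each gene, then form the cross-covariance between the two sets
Xc = X - X.mean(axis=0)
Yc = Y - Y.mean(axis=0)
cross_cov = Xc.T @ Yc / (n_samples - 1)

# Singular vectors give successive pairs of axes; singular values are the covariances
U, S, Vt = np.linalg.svd(cross_cov, full_matrices=False)
u1, v1 = U[:, 0], Vt[0]
cov_first_axis = np.cov(Xc @ u1, Yc @ v1)[0, 1]

print("covariance on first pair of axes:", cov_first_axis)
print("share of total co-inertia on first axis:", S[0] ** 2 / np.sum(S ** 2))
```

The sum of squared singular values is the total co-inertia under these assumptions, so the printed share indicates how much of the co-structure the first pair of axes captures.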


2021 ◽  
Author(s):  
Richard J White ◽  
Eirinn Mackay ◽  
Stephen W Wilson ◽  
Elisabeth M Busch-Nentwich

In model organisms, RNA sequencing is frequently used to assess the effect of genetic mutations on cellular and developmental processes. Typically, animals heterozygous for a mutation are crossed to produce offspring with different genotypes. Resultant embryos are grouped by genotype to compare homozygous mutant embryos to heterozygous and wild-type siblings. Genes that are differentially expressed between the groups are assumed to reveal insights into the pathways affected by the mutation. Here we show that in zebrafish, differentially expressed genes are often overrepresented on the same chromosome as the mutation due to different levels of expression of alleles from different genetic backgrounds. Using an incross of haplotype-resolved wild-type fish, we found evidence of widespread allele-specific expression, which appears as differential expression when comparing embryos homozygous for a region of the genome to their siblings. When analysing mutant transcriptomes, this means that differentially expressed genes on the same chromosome as a mutation of interest may not be caused by that mutation. Typically, the genomic location of a differentially expressed gene is not considered when interpreting its importance with respect to the phenotype. This could lead to pathways being erroneously implicated or overlooked due to the noise of spurious differentially expressed genes on the same chromosome as the mutation. These observations have implications for the interpretation of RNA-seq experiments involving outbred animals and non-inbred model organisms.


Blood ◽  
2004 ◽  
Vol 104 (11) ◽  
pp. 420-420
Author(s):  
Christian Flotho ◽  
Susana C. Raimondi ◽  
James R. Downing

Abstract We have demonstrated that expression profiling of leukemic blasts can accurately identify the known prognostic subtypes of ALL, including T-ALL, E2A-PBX1, TEL-AML1, MLL rearrangements, BCR-ABL, and hyperdiploid >50 chromosomes (HD>50). Interestingly, almost 70% of the genes that defined HD>50 ALL localized to chromosome 21 or X. To further explore the relationship between gene expression and chromosome dosage, we compared the expression profiles obtained using the Affymetrix U133A&B microarrays of 17 HD>50 ALLs to 78 diploid or pseudodiploid ALLs. Our analysis demonstrated that the average expression level for all genes on a chromosome could be used to predict chromosome copy numbers. Specifically, the copy number for each chromosome calculated by gene expression profiling predicted the numerical chromosomal abnormalities detected by standard cytogenetics. For chromosomes that were trisomic in HD>50 ALL, the mean chromosome-specific gene expression level was increased approximately 1.5-fold compared to that observed in diploid or pseudodiploid ALL cases. Similarly, for chromosome 21 and X, the mean chromosome-specific gene expression levels were increased approximately 2-fold, consistent with a duplication of the active X chromosome and tetrasomy of chromosome 21, a finding verified by standard cytogenetics in >90% of the HD>50 cases. These findings indicate that the aberrant gene expression levels seen in HD>50 ALL primarily reflect gene dosages. Importantly, we did not observe any clustering of aberrantly expressed genes across the duplicated chromosomes, making regional gain or loss of genomic material unlikely. Paradoxically, however, a more detailed analysis revealed a small but statistically significant number of genes on the trisomic/tetrasomic chromosomes whose expression levels were markedly reduced when compared to that seen in diploid or pseudodiploid leukemic samples.
Using the Statistical Analysis of Microarrays (SAM) algorithm we identified 20 genes whose expression was reduced >2-fold despite having an increase in copy number. Interestingly, included within this group are several known tumor suppressors, including AKAP12, which is specifically silenced by methylation in fos-transformed cells, and IGF2R and IGFBP7, negative regulators of insulin-like growth factor signaling. In addition to the silencing of a small subset of genes, we also identified 21 genes on these chromosomes whose expression levels were markedly higher (>3-fold) than would be predicted solely based on copy number. Although the mechanism responsible for their increased expression remains unknown, included in this group are four genes involved in signal transduction (IL3RA, IL13RA1, SNX9, and GASP) and a novel cytokine, C17, whose expression is normally limited to CD34+ hematopoietic progenitors. Taken together, these data suggest that aberrant growth in HD>50 ALL is in part driven by increased expression of a large number of genes secondary to chromosome duplications, coupled with a further enhanced expression of a limited number of growth promoting genes, and the specific silencing of a small subset of negative growth regulatory genes. Understanding the mechanisms responsible for the non-dosage related changes in gene expression should provide important insights into the pathology of HD>50 ALL.
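The dosage relationship reported above (about 1.5-fold for trisomy and 2-fold for tetrasomy) amounts to simple proportional arithmetic. The sketch below assumes expression scales linearly with copy number from a two-copy baseline. The function name and its `baseline_copies` parameter are hypothetical and not taken from the abstract.

```python
def estimate_copy_number(fold_change, baseline_copies=2):
    """Estimate chromosome copy number from the ratio of mean chromosome-wide
    expression in a sample to that in diploid reference cases."""
    return round(baseline_copies * fold_change)

print(estimate_copy_number(1.5))                     # trisomic autosome
print(estimate_copy_number(2.0))                     # tetrasomic chromosome 21
print(estimate_copy_number(2.0, baseline_copies=1))  # single active X at baseline
```

The last call reflects the abstract's note that the 2-fold increase for X is consistent with duplication of the active X chromosome, where the baseline is one active copy.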


Blood ◽  
2011 ◽  
Vol 118 (21) ◽  
pp. 801-801
Author(s):  
Lili Wang ◽  
Alex K Shalek ◽  
Jellert Gaublomme ◽  
Nir Yosef ◽  
Jennifer R Brown ◽  
...  

Abstract 801 We have recently found that the Wnt/β-catenin signaling pathway plays a key role in chronic lymphocytic leukemia (CLL). We were, however, intrigued by the question of whether this aberrant pathway may function differently in independent leukemias, and contribute to disease heterogeneity. To assess differential activity of the Wnt pathway across patients, we tested the effects of blocking Wnt activation on CLL cell survival. We knocked down a key downstream gene, LEF1, which is the most differentially expressed gene in CLL compared to normal B cells (based on gene expression microarrays). Addressing this question requires genetic manipulation of primary normal and malignant human B cells, and yet these cells are notoriously difficult to transfect. We therefore focused on developing a method for introducing siRNAs into normal and malignant B cells. We adapted a novel delivery system consisting of vertical silicon nanowires (SiNWs, Shalek et al PNAS 2010) that penetrate the plasma membrane in a minimally invasive fashion and deliver biomolecular cargo directly into the cytoplasm. We achieved consistent and reliable delivery of fluorescently labeled siRNAs (at 50–200 pmol) into normal and CLL B cells. siRNA was delivered to >90% of cells with >85% cell viability remaining after 48 hours. We used this platform to knock down LEF1 in 20 CLL-B and 5 normal CD19+ B cell samples, and examined cell survival 48 hours after siRNA delivery using an ATP-based CellTiter-Glo assay. Indeed, our studies revealed a heterogeneous response among CLL-B cells to LEF1 inhibition. As a group, CLL-B cells were significantly more sensitive to LEF1 knockdown with a survival rate of 77% (12% s.e.m) compared to 97% (13% s.e.m) in normal B cells. CLL B cells from different patients showed differential sensitivity to LEF1 knockdown, with 8 non-responders, 8 intermediate responders and 4 strong responders (i.e. significant death).
Sensitivity to LEF1 inhibition did not correlate with known CLL cytogenetic prognostic factors. To determine if the differential response to LEF1 knockdown was associated with specific gene signatures, we examined gene expression data generated from CLL-B cells from 12 (4 strong, 3 intermediate, and 5 non-responders) of the 20 CLLs tested (using the Affymetrix U133 Plus 2 Array). To increase statistical power, we used each CLL's expression profile (using only genes that showed variability across samples) to create clusters of ∼19 CLLs that showed similar expression profiles (using microarray data from our compendium of 177 additional CLLs). We further reduced the number of genes to ∼4000 genes by retaining only those whose expression levels were significantly different in at least one associated cluster relative to normal CD19+ B cell controls (T-test, FDR < 10⁻⁴; p-values converted using the Benjamini-Hochberg method). These analyses led to the identification of several hundred genes whose expression correlated significantly with LEF1 knockdown's effect on cell viability. Analysis of these differentially expressed genes identified several potentially important pathways. Ongoing analyses include the identification and validation of a molecular signature for this effect. This signature could enable rapid identification of patients who would be most responsive to therapy with LEF1 inhibitors, which are under development along with other Wnt pathway inhibitors. Disclosures: No relevant conflicts of interest to declare.
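The Benjamini-Hochberg conversion mentioned in the gene-filtering step can be sketched as below. This is a generic implementation of the standard step-up adjustment, not the authors' pipeline. The four P-values are made up for illustration and the function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending P-value
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotone adjusted values
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

example = [0.01, 0.04, 0.03, 0.005]   # illustrative values only
print(benjamini_hochberg(example))
```

A gene passes a filter such as the one in the abstract when its adjusted value falls below the chosen FDR threshold.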


2007 ◽  
Vol 25 (18_suppl) ◽  
pp. 21106-21106 ◽  
Author(s):  
J. Kim ◽  
J. H. Pak ◽  
W. H. Choi ◽  
J. Y. Kim ◽  
W. D. Joo ◽  
...  

21106 Background: To detect genes differentially expressed in ovarian cancer, we analysed genes in ovarian cancer and normal ovary by differentially expressed gene (DEG) PCR using RNA extracted from both tissues. We examined the relationship between the genes specific to ovarian cancer and the pathogenesis of ovarian cancer. Methods: Differentially expressed genes were screened by ACP-based PCR. Differentially expressed bands were extracted from agarose gel and then directly sequenced. Finally, we determined the clinical importance of the differentially expressed genes. Results: Some genes were overexpressed in ovarian cancer tissue relative to normal ovary, such as plexin B1 (PLXNB1), aminoacylase 1 (ACY1), solute carrier family 25 protein (SLC25A5), triosephosphate isomerase 1 (TPI1), poliovirus receptor-related 3 protein (PVRL3), clusterin, and LY6/PLAUR domain containing 1 protein (LYPDC1). Five other genes were more highly expressed in normal ovary than in ovarian cancer: ribosomal proteins L11 and L23, tenascin XB (TNXB), complement component 1 and actin alpha 2. Conclusions: Clusterin, which has been identified as having anti- or proapoptotic activity regulated by calcium homeostasis in prostate, breast and colorectal cancers, was highly expressed in ovarian cancer tissue. This suggests the possibility that regulating clusterin activity could break down cancer cells' resistance to apoptosis in ovarian cancer. Ribosomal proteins L11 and L23, which play an important role in regulating the stability and function of the p53 tumor suppressor protein, were highly expressed in normal ovary. This suggests that suppression of ribosomal protein L11 may play an important role in the proliferation of ovarian cancer and that over-expression of ribosomal protein L11 may play an important role in cell cycle arrest in the treatment of ovarian cancer. No significant financial relationships to disclose.


2017 ◽  
Vol 35 (15_suppl) ◽  
pp. 7011-7011
Author(s):  
Kamal Chamoun ◽  
Christopher Brent Benton ◽  
Ahmed AlRawi ◽  
Rodrigo Jacamo ◽  
Patrick Williams ◽  
...  

7011 Background: AML LSC are believed to be responsible for residual and resistant leukemic disease leading to relapse. Understanding differences between bulk AML and the LSC subpopulation may allow the identification of novel LSC targets, especially for the most adverse risk AML where few patients are cured. Targeting LSC may be needed to eradicate AML, and immune-based therapies provide an approach for eliminating LSC. The transcriptional landscape of immune-related genes in LSC is not well understood. Methods: Samples were collected at diagnosis from 12 patients with high-risk AML prior to therapy. Bulk (CD45-dim blasts) and LSC (Lin-CD34+CD38-CD123+) AML marrow cells were FACS-sorted and analyzed using whole genome RNA-sequencing. Transcriptomes were analyzed using AltAnalyze software to identify differentially expressed genes in bulk AML cells and in AML LSC populations. These genes were further assessed by gene enrichment analysis using data from Gene Ontology (GO) and the Cancer Genome Atlas Project (CGAP). Results: Sixty-eight genes were identified with greater than 3-fold differential expression between bulk AML and LSC. GO enrichment analysis demonstrated more than 10-fold enrichment of genes involved in the molecular functions, biologic processes, and cell components related to the antigen presentation pathway, with the comparative down-regulation occurring in LSC. Among the top differentially expressed gene clusters, both the MHC class II and interferon-gamma signaling/response pathway gene expression was blunted in LSC. Additional expression analysis revealed that 42% of a CGAP-curated list of 201 antigen-processing and -presentation genes had significantly decreased expression in the LSC subpopulation compared to bulk AML. Conclusions: LSC from primary AML patient samples are characterized by reduction in expression of MHC class II receptor and antigen presentation genes compared to bulk AML. 
These results suggest that impairment in the presentation and/or processing of tumor associated antigens by MHC class II on LSC, along with tonic sponging of immune response cells and diversion away from LSC by bulk AML, may contribute to LSC evasion of immune surveillance and response.


2015 ◽  
Vol 18 (3) ◽  
pp. 281-289 ◽  
Author(s):  
Chao-Pin Hsiao ◽  
Swarnalatha Y. Reddy ◽  
Mei-Kuang Chen ◽  
Leorey N. Saligan

Purpose: The purpose of this study was to explore gene expression changes in fatigued men with nonmetastatic prostate cancer receiving localized external beam radiation therapy (EBRT). Methods: Fatigue was measured in 40 men with prostate cancer (20 receiving EBRT and 20 controls on active surveillance) using the Functional Assessment of Cancer Therapy–Fatigue (FACT-F). EBRT subjects were followed from baseline to midpoint and end point of EBRT, while controls were seen at one time point. EBRT subjects were categorized into high- and low-fatigue groups based on change in FACT-F scores from baseline to EBRT completion. Full genome microarray was performed from peripheral leukocyte RNA to determine gene expression changes related to fatigue phenotypes. Real-time polymerase chain reaction and enzyme-linked immunosorbent assay confirmed the most differentially expressed gene in the microarray experiment. Results: At baseline, mean FACT-F scores were not different between EBRT subjects (44.3 ± 7.16) and controls (46.7 ± 4.32, p = .24). Fatigue scores of EBRT subjects decreased at treatment midpoint (38.6 ± 9.17, p = .01) and completion (37.6 ± 9.9, p = .06), indicating worsening fatigue. Differential expression of 42 genes was observed between fatigue groups when EBRT time points were controlled. Membrane-spanning four domains, subfamily A, member 1 (MS4A1) was the most differentially expressed gene and was associated with fatigue at treatment end point (r = −.46, p = .04). Conclusion: Fatigue intensification was associated with MS4A1 downregulation, suggesting that fatigue during EBRT may be related to impairment in B-cell immune response. The 42 differentially expressed fatigue-related genes are associated with glutathione biosynthesis, γ-glutamyl cycle, and antigen presentation pathways.


F1000Research ◽  
2018 ◽  
Vol 7 ◽  
pp. 474
Author(s):  
Andy R. Eugene ◽  
Jolanta Masiak ◽  
Beata Eugene

Background: We sought to test the hypothesis that transcriptome-level gene signatures are differentially expressed between male and female bipolar patients, prior to lithium treatment, in a patient cohort who later were clinically classified as lithium treatment responders. Methods: Gene expression study data were obtained from the Lithium Treatment-Moderate dose Use Study data accessed from the National Center for Biotechnology Information’s Gene Expression Omnibus via accession number GSE4548. Differential gene expression analysis was conducted using the Linear Models for Microarray and RNA-Seq (limma) package and the Random Forests machine learning algorithm in R. Results: In pre-treatment lithium responders, the following genes had a greater than 0.5 fold-change and were differentially expressed, indicating a male bias: RBPMS2, SIDT2, CDH23, LILRA5, and KIR2DS5; the female-biased genes were: HLA-H, RPS23, FHL3, RPL10A, NBPF14, PSTPIP2, FAM117B, CHST7, and ABRACL. Conclusions: Using machine learning, we developed a pre-treatment gender- and gene-expression-based predictive model selective for lithium responders with an ROC AUC of 0.92 for men and an ROC AUC of 1 for women.
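To make the reported ROC AUC values easier to interpret, the sketch below computes AUC as the fraction of positive-negative pairs that a classifier ranks correctly, with ties counted as half. The scores and labels are made up for illustration and do not come from GSE4548. The function name `roc_auc` is hypothetical.

```python
def roc_auc(scores, labels):
    """ROC AUC as the probability that a random positive outscores a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Every responder scored above every non-responder: perfect separation
print(roc_auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))
# One responder scored below one non-responder
print(roc_auc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]))
```

An AUC of 1 therefore means that, on the samples evaluated, every responder received a higher score than every non-responder.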

