A scaling-free minimum enclosing ball method to detect differentially expressed genes for RNA-seq data

BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Yan Zhou ◽  
Bin Yang ◽  
Junhui Wang ◽  
Jiadi Zhu ◽  
Guoliang Tian

Abstract Background Identifying differentially expressed genes within the same species or between different species is a pressing need in biological and medical research. For RNA-seq data, systematic technical effects and different sequencing depths are usually encountered when conducting experiments. Normalization is regarded as an essential step in the discovery of biologically important changes in expression. Existing methods usually normalize the data with a scaling factor and then detect significant genes. However, more than one scaling factor may exist because of the complexity of real data. Consequently, methods that normalize data by a single scaling factor may deliver suboptimal performance or may not even work. The development of modern machine learning techniques has provided a new perspective regarding discrimination between differentially expressed (DE) and non-DE genes. However, in reality, the non-DE genes comprise only a small set and may contain housekeeping genes (in the same species) or conserved orthologous genes (in different species). Therefore, the process of detecting DE genes can be formulated as a one-class classification problem, where only non-DE genes are observed, while DE genes are completely absent from the training data.
Results In this study, we transform the problem into an outlier detection problem by treating DE genes as outliers, and we propose a scaling-free minimum enclosing ball (SFMEB) method that constructs the smallest possible ball containing the known non-DE genes in a feature space. The genes outside the minimum enclosing ball can then be naturally considered to be DE genes. Compared with the existing methods, the proposed SFMEB method does not require data normalization, which is particularly attractive when the RNA-seq data include more than one scaling factor. Furthermore, the SFMEB method can be easily extended to different species without normalization.
Conclusions Simulation studies demonstrate that the SFMEB method works well in a wide range of settings, especially when the data are heterogeneous or consist of biological replicates. Analysis of the real data also supports the conclusion that the SFMEB method outperforms existing competitors. The R package of the proposed method is available at https://bioconductor.org/packages/MEB.
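
The idea of treating DE genes as points that fall outside a ball fitted to known non-DE genes can be illustrated with a minimal Python sketch. This is not the SFMEB implementation: it uses plain Euclidean distance in the input space rather than a kernel feature space, it approximates the ball with the Badoiu-Clarkson update, and the two-dimensional feature vectors and the function names `enclosing_ball` and `flag_outliers` are hypothetical.

```python
import math


def enclosing_ball(points, iterations=1000):
    """Approximate the minimum enclosing ball of `points`.

    Uses the Badoiu-Clarkson update: repeatedly move the center a
    shrinking step toward the point that is currently farthest away.
    The returned radius is the largest distance from the final center
    to any input point, so every input point lies inside the ball.
    """
    center = list(points[0])
    for i in range(1, iterations + 1):
        farthest = max(points, key=lambda p: math.dist(p, center))
        center = [c + (f - c) / (i + 1) for c, f in zip(center, farthest)]
    radius = max(math.dist(p, center) for p in points)
    return center, radius


def flag_outliers(center, radius, candidates, tol=1e-9):
    """Return True for each candidate that lies outside the ball."""
    return [math.dist(p, center) > radius + tol for p in candidates]


# Hypothetical feature vectors for known non-DE genes (illustrative only).
non_de = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
center, radius = enclosing_ball(non_de)

# Genes to classify: the first sits inside the ball, the second far outside.
print(flag_outliers(center, radius, [(1.0, 1.0), (10.0, 10.0)]))
```

Because the ball is fitted only to the non-DE genes, no scaling factor enters the sketch; how well that carries over to real count data depends on the feature map, which is assumed away here.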

2017 ◽  
Vol 3 (3) ◽  
pp. 31 ◽  
Author(s):  
Isabel González Gayte ◽  
Rocío Bautista Moreno ◽  
Pedro Seoane Zonjic ◽  
M. Gonzalo Claros

Differential gene expression analysis based on RNA-seq is widely used. Bioinformatics skills are required since no algorithm is appropriate for all experimental designs. Moreover, when working with organisms without a reference genome, functional analysis is less than straightforward in most situations. DEgenes Hunter, an attempt to automate the process, is based on two independent scripts, one for differential expression and one for functional interpretation. Based on the replicates, the R script decides which of the edgeR, DESeq2, NOISeq and limma algorithms are appropriate. It performs quality control calculations, provides the prevalent, most reliable set of differentially expressed genes, and lists all other possible candidates for further functional interpretation. It also provides a combined P-value that allows differentially expressed genes to be ranked. It has been tested with synthetic and real-world datasets, showing in both cases ease of use and reliable results. With real data, DEgenes Hunter offers straightforward functional interpretation.
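
The abstract does not say how the combined P-value is computed. As an illustration only, the sketch below uses Fisher's method, one common way to combine independent P-values; whether DEgenes Hunter uses this method is an assumption, and the function name `fisher_combined_p` is hypothetical.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent P-values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null hypothesis. For even degrees
    of freedom the upper tail has the closed form
    exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!, so no external library is needed.
    Each P-value must lie in (0, 1].
    """
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # equals X / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i  # (X/2)^i / i!, built incrementally
        total += term
    return math.exp(-half_stat) * total


# Hypothetical per-algorithm P-values for one gene (illustrative only).
print(fisher_combined_p([0.01, 0.03, 0.20, 0.04]))
```

Genes can then be sorted by the combined value. Note that P-values produced by different algorithms on the same data are not independent, so the result is better read as a ranking score than as a calibrated P-value.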


2020 ◽  
Author(s):  
Chanwoo Kim ◽  
Hanbin Lee ◽  
Juhee Jeong ◽  
Keehoon Jung ◽  
Buhm Han

Abstract A common approach to analyzing single-cell RNA-sequencing data is to cluster cells first and then identify differentially expressed genes based on the clustering result. However, clustering has an innate uncertainty and can be imperfect, undermining the reliability of differential expression analysis results. To overcome this challenge, we present MarcoPolo, a clustering-free approach to exploring differentially expressed genes. To find informative genes without clustering, MarcoPolo exploits the bimodality of gene expression to learn the group information of the cells with respect to the expression level directly from given data. Using simulations and real data analyses, we showed that our method puts biologically informative genes at higher ranks more accurately and robustly than other existing methods. As our method provides information on how cells can be grouped for each gene, it can help identify cell types that are not separated well in the standard clustering process. Our method can also be used as a feature selection method to improve the robustness against changes in the number of genes used in clustering.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Yance Feng ◽  
Lei M. Li

Abstract Background Normalization of RNA-seq data aims at identifying biological expression differentiation between samples by removing the effects of unwanted confounding factors. Explicitly or implicitly, the justification of normalization requires a set of housekeeping genes. However, the existence of housekeeping genes common to a very large collection of samples, especially under a wide range of conditions, is questionable. Results We propose to carry out pairwise normalization with respect to multiple references, selected from representative samples. Then the pairwise intermediates are integrated based on a linear model that adjusts for the reference effects. Motivated by the notion of housekeeping genes and their statistical counterparts, we adopt robust least trimmed squares regression in pairwise normalization. The proposed method (MUREN) is compared with other existing tools on some standard data sets. The assessment of normalization quality emphasizes preserving possible asymmetric differentiation, whose biological significance is exemplified by a single-cell data set of the cell cycle. MUREN is implemented as an R package. The code, under the GPL-3 license, is available on GitHub: github.com/hippo-yf/MUREN and on conda: anaconda.org/hippo-yf/r-muren. Conclusions MUREN performs RNA-seq normalization using a two-step statistical regression induced from a general principle. We propose that the densities of pairwise differentiations be used to evaluate the goodness of normalization. MUREN adjusts the mode of differentiation toward zero while preserving the skewness due to biological asymmetric differentiation. Moreover, by robustly integrating pre-normalized counts with respect to multiple references, MUREN is immune to individual outlier samples.


Viruses ◽  
2021 ◽  
Vol 13 (2) ◽  
pp. 244 ◽  
Author(s):  
Antonio Victor Campos Coelho ◽  
Rossella Gratton ◽  
João Paulo Britto de Melo ◽  
José Leandro Andrade-Santos ◽  
Rafael Lima Guimarães ◽  
...  

HIV-1 infection elicits complex dynamics in the expression of various host genes. High-throughput sequencing has added a substantial amount of information regarding HIV-1 infection and pathogenesis. RNA sequencing (RNA-Seq) is currently the tool of choice to investigate gene expression in a wide range of experimental settings. This study aims at performing a meta-analysis of RNA-Seq expression profiles in samples of HIV-1 infected CD4+ T cells compared to uninfected cells to assess consistently differentially expressed genes in the context of HIV-1 infection. We selected two studies (22 samples: 15 experimentally infected and 7 mock-infected). We found 208 differentially expressed genes in infected cells when compared to uninfected/mock-infected cells. This result had moderate overlap when compared to previous studies of HIV-1 infection transcriptomics, but we identified 64 genes already known to interact with HIV-1 according to the HIV-1 Human Interaction Database. A gene ontology (GO) analysis revealed enrichment of several pathways involved in immune response, cell adhesion, cell migration, inflammation, apoptosis, Wnt, Notch and ERK/MAPK signaling.


2019 ◽  
Vol 32 (5) ◽  
pp. 515-526 ◽  
Author(s):  
William E. Fry ◽  
Sean P. Patev ◽  
Kevin L. Myers ◽  
Kan Bao ◽  
Zhangjun Fei

Sporangia of Phytophthora infestans from pure cultures on agar plates are typically used in lab studies, whereas sporangia from leaflet lesions drive natural infections and epidemics. Multiple assays were performed to determine if sporangia from these two sources are equivalent. Sporangia from plate cultures showed much lower rates of indirect germination and produced much less disease in field and moist-chamber tests. This difference in aggressiveness was observed whether the sporangia had been previously incubated at 4°C (to induce indirect germination) or at 21°C (to prevent indirect germination). Furthermore, lesions caused by sporangia from plates produced much less sporulation. RNA-Seq analysis revealed that thousands of the >17,000 P. infestans genes with an RPKM (reads per kilobase of exon model per million mapped reads) >1 were differentially expressed in sporangia obtained from plate cultures of two independent field isolates compared with sporangia of those isolates from leaflet lesions. Among the significant differentially expressed genes (DEGs), putative RxLR effectors were overrepresented, with almost half of the 355 effectors with RPKM >1 being up- or down-regulated. DEGs of both isolates include nine flagellar-associated genes, and all were down-regulated in plate sporangia. Ten elicitin genes were also detected as DEGs in both isolates, and nine (including INF1) were up-regulated in plate sporangia. These results corroborate previous observations that sporangia produced from plates and leaflets sometimes yield different experimental results and suggest hypotheses for potential mechanisms. We caution that use of plate sporangia in assays may not always produce results reflective of natural infections and epidemics.
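
The RPKM threshold used above follows directly from the definition spelled out in the abstract. A minimal Python sketch of that definition is given below; the function name `rpkm` and the example counts are hypothetical and do not come from the study.

```python
def rpkm(read_count, exon_length_bp, total_mapped_reads):
    """Reads per kilobase of exon model per million mapped reads.

    read_count: reads mapped to the gene's exons
    exon_length_bp: total exon length of the gene, in base pairs
    total_mapped_reads: all mapped reads in the sample
    """
    kilobases = exon_length_bp / 1_000
    millions_mapped = total_mapped_reads / 1_000_000
    return read_count / (kilobases * millions_mapped)


# Hypothetical gene: 500 reads on a 2 kb exon model, 10 million mapped reads.
value = rpkm(500, 2_000, 10_000_000)
print(value)      # 25.0
print(value > 1)  # True, so this gene would pass an RPKM > 1 filter
```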


2021 ◽  
Author(s):  
Chengang Guo ◽  
Zhimin Wei ◽  
Wei Lyu ◽  
Yanlou Geng

Abstract Quinoa saponins have complex, diverse and evident physiological activities. However, the key regulatory genes for quinoa saponin metabolism are not yet well studied. The purpose of this study was to explore genes closely related to quinoa saponin metabolism. In this study, the significantly differentially expressed genes in yellow quinoa were first screened using RNA-seq. Then, the key genes for saponin metabolism were selected by gene set enrichment analysis (GSEA) and principal component analysis (PCA). Finally, the specificity of the key genes was verified by hierarchical clustering. The differential analysis yielded 1654 differentially expressed genes after pseudogenes were removed. Of these, 142 were long non-coding genes and 1512 were protein-coding genes. Based on GSEA, 116 key candidate genes were found to be significantly correlated with quinoa saponin metabolism. Through PCA dimension reduction, 57 key genes were finally obtained. Hierarchical cluster analysis further demonstrated that these key genes can clearly separate the four groups of samples. The present results could provide references for the breeding of sweet quinoa and would be helpful for the rational utilization of quinoa saponins.


2020 ◽  
Author(s):  
Xue Fan ◽  
Meng Li ◽  
Min Xiao ◽  
Cong Liu ◽  
Mingguo Xu

Abstract Background: Kawasaki disease (KD) leads to coronary artery damage, and the etiology of KD is unknown. The present study was designed to explore the differentially expressed genes (DEGs) in KD serum-induced human coronary artery endothelial cells (HCAECs) by RNA sequencing (RNA-seq). Methods: HCAECs were stimulated for 24 hours with serum (15% v/v) collected from 20 healthy children and 20 KD patients. DEGs were then detected and analyzed by RNA-seq and bioinformatics analysis. Results: The expression of SMAD1, SMAD6, CD34, CXCL1, PITX2, and APLN was validated by qPCR. A total of 102 genes, 59 up-regulated and 43 down-regulated, were significantly differentially expressed in the KD group. GO enrichment analysis showed that DEGs were enriched in cellular response to cytokines, cytokine-mediated signaling pathway, and regulation of immune cell migration and chemotaxis. KEGG signaling pathway analysis showed that DEGs were mainly involved in cytokine-cytokine receptor interaction, chemokine signaling pathway, and TGF-β signaling pathway. In addition, the mRNA expression levels of SMAD1, SMAD6, CD34, CXCL1, and APLN in the KD group were significantly up-regulated compared with the normal group, while PITX2 was significantly down-regulated. Conclusion: 102 DEGs in KD serum-induced HCAECs were identified, and six new targets were proposed as potential indicators of KD.

