Expression Matrix
Recently Published Documents

2021 ◽  
Zi-xuan Wu ◽  
Xuyan Huang ◽  
Min-jie Cai ◽  
Peidong Huang ◽  
Zunhui Guan

Abstract Background: Major depressive disorder (MDD) is an emotional disorder that has a negative effect on patients' studies and daily lives. Many studies have found that miRNAs play an important role in the development of MDD and that they can be used as biomarkers for its diagnosis and treatment. However, there have been few investigations of nerve-immunity interaction therapy for the brains of MDD patients. Methods: We used bioinformatics methods to evaluate MDD in the gene expression matrix database, together with miRNAs in plasma samples from healthy controls. Four plasma miRNA (DE-miRNA) samples from MDD patients were found. FunRich was used to predict the transcription factors and target genes of the miRNAs, and TF and GO enrichment was examined. The intersecting mRNAs were identified by comparing the differential expression of the predicted target genes with 5 mRNA (DE-mRNA) samples. In the end, 34 DE-miRNAs, 386 DE-mRNAs, and 17 intersecting mRNAs were detected. The intersecting core genes were then investigated using GO and KEGG enrichment analysis to identify candidate genes and pathways in neurology and immunology that may be associated with MDD for further investigation. Results: We discovered 17 important hub genes through the development of a miRNA-mRNA network, and 5 hub DE-mRNAs were derived following CytoNCA topology analysis. Conclusion: Our findings from a comprehensive bioinformatics analysis of miRNAs and mRNAs in MDD show that DE-miRNAs such as miR-338-3p and miR-206 may be excellent biomarkers and potential therapeutic targets for the treatment of MDD via nerve-immunity interaction.

PeerJ ◽  
2021 ◽  
Vol 9 ◽  
pp. e12233
Diem-Trang Tran ◽  
Matthew Might

Normalization of RNA-seq data has been an active area of research since the problem was first recognized a decade ago. Despite the active development of new normalizers, their performance measures have been given little attention. To evaluate normalizers, researchers have been relying on ad hoc measures, most of which are either qualitative, potentially biased, or easily confounded by parametric choices of downstream analysis. We propose a metric called condition-number based deviation, or cdev, to quantify normalization success. cdev measures how much an expression matrix differs from another. If a ground truth normalization is given, cdev can then be used to evaluate the performance of normalizers. To establish experimental ground truth, we compiled an extensive set of public RNA-seq assays with external spike-ins. This data collection, together with cdev, provides a valuable toolset for benchmarking new and existing normalization methods.

Computation ◽  
2021 ◽  
Vol 9 (10) ◽  
pp. 106
Angelica Alejandra Serrano-Rubio ◽  
Guillermo B. Morales-Luna ◽  
Amilcar Meneses-Viveros

Genetic expression analysis is a principal tool to explain the behavior of genes in an organism when exposed to different experimental conditions. Many clustering algorithms have been proposed in the state of the art. The amount of biological data is overwhelming, and its high-dimensional structure exceeds the capacity of most current computational architectures. Optimizing computational time and memory consumption therefore becomes a decisive factor in choosing a clustering algorithm. We propose a clustering algorithm based on Non-negative Matrix Factorization and K-means that reduces data dimensionality while preserving the biological context and prioritizing gene selection. It is implemented within parallel GPU-based environments through the CUDA library. A well-known dataset is used in our tests, and the quality of the results is measured through the Rand and Accuracy Index. The results show a speedup of 6.22× compared to the sequential version. The algorithm is competitive in the analysis of biological datasets and is invariant with respect to the number of classes and the size of the gene expression matrix.
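
The abstract above reports clustering quality through the Rand index. As a rough illustration only, the index can be computed as the fraction of sample pairs on which two labelings agree. The function name `rand_index` and the toy labels are assumptions made for this sketch; it is not the authors' CUDA implementation.

```python
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Fraction of sample pairs on which two clusterings agree.

    A pair "agrees" when both clusterings put the two samples together,
    or both put them apart. Label values themselves do not matter.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label lists must have the same length")
    pairs = list(combinations(range(len(labels_true)), 2))
    if not pairs:
        raise ValueError("need at least two samples")
    agree = 0
    for i, j in pairs:
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true == same_pred:
            agree += 1
    return agree / len(pairs)


if __name__ == "__main__":
    # Hypothetical labels: identical partitions with swapped cluster names.
    print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
```

Because only pair co-membership is compared, renaming clusters does not change the score, which is why the index suits unsupervised output whose cluster ids are arbitrary.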

2021 ◽  
Vol 11 ◽  
Ling Zhang ◽  
Zheng-Shuai Song ◽  
Zhi-Shun Wang ◽  
Yong-Lian Guo ◽  
Chang-Geng Xu ◽  

Objective: Tumor metabolism has always been a focus of cancer research. SLC16A1, a key factor in catalyzing monocarboxylate transport across the plasma membrane, has been found to be associated with the occurrence and metastasis of a variety of cancers, but its prognostic significance and mechanism in different tumors are still unclear. Methods: Based on the gene expression matrix and clinical information of human cancer tissues acquired from the TCGA and GTEx databases, the differential expression of SLC16A1 in different tumors and normal tissues was analyzed. The association between its expression, mutations of MMR genes, and the expression level of DNMTs was also examined. Univariate Cox regression was applied to analyze the association between SLC16A1 expression and patient prognosis. The effect of SLC16A1 expression on patient survival was examined by Kaplan-Meier analysis. GSEA was used to identify related signaling pathways. Results: SLC16A1 was differentially expressed in most tumors, especially in the urinary tract, where it is commonly highly expressed, and its expression differed across clinical stages. SLC16A1 expression was significantly positively correlated with MMR gene mutation and DNMT expression. Moreover, high SLC16A1 expression was associated with poorer overall survival (OS) and progression-free survival (PFS) in urological cancers. In particular, the enrichment analysis showed that SLC16A1 was associated with processes such as cell adhesion, and many signaling pathways affecting the cell cycle were significantly enriched in the group with high SLC16A1 expression. Conclusion: SLC16A1 expression was upregulated in urological cancer. SLC16A1 may promote tumor development by regulating the epigenetic process of urological cancer, and it demonstrated great potential as a prognostic biomarker for urological cancer patients.
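
The abstract above relies on Kaplan-Meier analysis to compare survival between expression groups. The following is a minimal sketch of the product-limit estimator for a single group, assuming right-censored follow-up data. The function name `kaplan_meier` and the toy values are hypothetical and are not taken from the study.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one event.

    times:  follow-up time per patient
    events: 1 if the event (e.g. death) was observed, 0 if censored
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both events and censored patients leave the risk set after t.
        at_risk -= removed
    return curve


if __name__ == "__main__":
    # Hypothetical cohort: the patient at t=2 is censored.
    for t, s in kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]):
        print(t, s)
```

In a study like this one, the estimator would be run separately on the high- and low-expression groups and the two curves compared, typically with a log-rank test that this sketch does not include.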

2021 ◽  
Vol 12 ◽  
Chengnan Tian ◽  
Yanchen Yang ◽  
Yingjie Ke ◽  
Liang Yang ◽  
Lishan Zhong ◽  

Tricuspid regurgitation (TR) induces right ventricular cardiomyopathy, a common heart disease, and eventually leads to severe heart failure and serious clinical complications. Accumulating evidence shows that long non-coding RNAs (lncRNAs) are involved in the pathological process of a variety of cardiovascular diseases. However, the regulatory mechanisms and functional roles of RNA interactions in TR-induced right ventricular cardiomyopathy are still unclear. Accordingly, we performed integrative analyses of genes associated with right ventricular cardiomyopathy induced by TR to study the roles of lncRNAs in the pathogenesis of this disease. In this study, we used high-throughput sequencing data of tissue samples from nine clinical cases of right ventricular myocardial cardiomyopathy induced by TR and nine controls with normal right ventricular myocardium from the Genotype-Tissue Expression database. We identified differentially expressed lncRNAs and constructed a protein-protein interaction and lncRNA-messenger RNA (mRNA) co-expression network. Furthermore, we determined hub lncRNA-mRNA modules related to right ventricular myocardial disease induced by TR and constructed a competitive endogenous RNA network for TR-induced right ventricular myocardial disease by integrating the interaction of lncRNA-miRNA-mRNA. In addition, we analyzed the immune infiltration using integrated data and the correlation of each immune-related gene with key genes of the integrated expression matrix. The present study identified 648 differentially expressed mRNAs, 201 differentially expressed miRNAs, and 163 differentially expressed lncRNAs. Protein-protein interaction network analysis confirmed that ADRA1A, AVPR1B, OPN4, IL-1B, IL-1A, CXCL4, ADCY2, CXCL12, GNB4, CCL20, CXCL8, and CXCL1 were hub genes. CTD-2314B22.3, hsa-miR-653-5p, and KIF17ceRNA; SRGAP3-AS2, hsa-miR-539-5p, and SHANK1; CERS6-AS1, hsa-miR-497-5p, and OPN4; INTS6-AS1, hsa-miR-4262, and NEURL1B; TTN-AS1, hsa-miR-376b-3p, and TRPM5; and DLX6-AS1, hsa-miR-346, and BIRC7 axes were obtained by constructing the ceRNA networks. Through the immune infiltration analysis, we found that the proportion of CD4 and CD8 T cells was about 20%, and the proportion of fibroblasts and endothelial cells was high. Our findings provide some insights into the mechanisms of RNA interaction in TR-induced right ventricular cardiomyopathy and suggest that lncRNAs are a potential therapeutic target for treating right ventricular myocardial disease induced by TR.

2021 ◽  
Jing Zhao ◽  
Zhaoqian Liu ◽  
Bingqiang Liu ◽  
Qi Wang ◽  
Dongjun Chung ◽  

Unveiling disease-associated microbial biomarkers (e.g., key species, genes, and pathways) is an efficient strategy for the diagnosis and therapy of diseases. However, the heterogeneity and large size of microbial data bring tremendous challenges for fundamental characteristics discovery. We present IDAM, a novel method for disease-associated biomarker identification from metagenomic and metatranscriptomic data, without requiring prior metadata. It integrates gene context conservation and regulatory mechanism through a mathematical model for maximizing the number of connected components between local low-rank submatrices of a gene expression matrix and known uber-operon structures. We applied IDAM to 813 inflammatory bowel disease-associated datasets and showed IDAM outperformed existing methods in microbial biomarker identification. In addition, the identified biomarkers successfully distinguished disease subtypes and showcased their power in discovering novel disease subtypes/states. IDAM is freely available at

2021 ◽  
Vol 12 (1) ◽  
Kai Xing ◽  
Huatao Liu ◽  
Fengxia Zhang ◽  
Yibing Liu ◽  
Yong Shi ◽  

Abstract Background Fat deposition is an important economic consideration in pig production. The amount of fat deposition in pigs seriously affects production efficiency, quality, and reproductive performance, while also affecting consumers’ choice of pork. Weighted gene co-expression network analysis (WGCNA) is effective in pig genetic studies. Therefore, this study aimed to identify modules that co-express genes associated with fat deposition in pigs (Songliao black and Landrace breeds) with extreme levels of backfat (high and low) and to identify the core genes in each of these modules. Results We used RNA sequences generated in different pig tissues to construct a gene expression matrix consisting of 12,862 genes from 36 samples. Eleven co-expression modules were identified using WGCNA and the number of genes in these modules ranged from 39 to 3,363. Four co-expression modules were significantly correlated with backfat thickness. A total of 16 genes (RAD9A, IGF2R, SCAP, TCAP, SMYD1, PFKM, DGAT1, GPS2, IGF1, MAPK8, FABP, FABP5, LEPR, UCP3, APOF, and FASN) were associated with fat deposition. Conclusions RAD9A, TCAP, SMYD1, PFKM, GPS2, and APOF were the key genes in the four modules based on the degree of gene connectivity. Combining these results with those from differential gene analysis, SMYD1 and PFKM were proposed as strong candidate genes for body size traits. This study explored the key genes that regulate porcine fat deposition and lays the foundation for further research into the molecular regulatory mechanisms underlying porcine fat deposition.
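
The abstract above ranks genes by connectivity within co-expression modules. As a hedged sketch of the underlying idea, an unsigned co-expression adjacency can be built by raising the absolute Pearson correlation to a soft-thresholding power, with connectivity taken as the row sum. The power `beta=6`, the function names, and the toy gene names are assumptions for illustration; this is not the WGCNA R package or the study's pipeline.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return sxy / math.sqrt(sxx * syy)


def soft_threshold_adjacency(expr, beta=6):
    """expr maps gene -> expression values across samples.

    Returns adj[g][h] = |cor(g, h)| ** beta (unsigned network).
    """
    genes = list(expr)
    adj = {g: {} for g in genes}
    for i, g in enumerate(genes):
        for h in genes[i:]:
            a = 1.0 if g == h else abs(pearson(expr[g], expr[h])) ** beta
            adj[g][h] = a
            adj[h][g] = a
    return adj


def connectivity(adj):
    """Sum of adjacencies to all other genes; high values suggest hub genes."""
    return {g: sum(v for h, v in row.items() if h != g) for g, row in adj.items()}


if __name__ == "__main__":
    # Hypothetical expression profiles over four samples.
    expr = {"geneA": [1, 2, 3, 4], "geneB": [2, 4, 6, 8], "geneC": [1, -1, 1, -1]}
    print(connectivity(soft_threshold_adjacency(expr)))
```

Raising correlations to a power suppresses weak correlations more strongly than strong ones, which is what lets tightly co-expressed genes stand out when modules are later detected.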

2021 ◽  
Benjamin R Babcock ◽  
Astrid Kosters ◽  
Junkai Yang ◽  
Mackenzie L White ◽  
Eliver Ghosn

Single-cell RNA sequencing (scRNA-seq) can reveal accurate and sensitive RNA abundance in a single sample, but robust integration of multiple samples remains challenging. Large-scale scRNA-seq data generated by different workflows or laboratories can contain batch-specific systemic variation. Such variation challenges data integration by confounding sample-specific biology with undesirable batch-specific systemic effects. Therefore, there is a need for guidance in selecting computational and experimental approaches to minimize batch-specific impacts on data interpretation and a need to empirically evaluate the sources of systemic variation in a given dataset. To uncover the contributions of experimental variables to systemic variation, we intentionally perturb four potential sources of batch-effect in five human peripheral blood samples. We investigate sequencing replicate, sequencing depth, sample replicate, and the effects of pooling libraries for concurrent sequencing. To quantify the downstream effects of these variables on data interpretation, we introduced a new scoring metric, the Cell Misclassification Statistic (CMS), which identifies losses to cell type fidelity that occur when merging datasets of different batches. CMS reveals an undesirable overcorrection by popular batch-effect correction and data integration methods. We show that optimizing gene expression matrix normalization and merging can reduce the need for batch-effect correction and minimize the risk of overcorrecting true biological differences between samples.
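
The abstract above names sequencing depth as one source of batch variation and points to expression matrix normalization as a remedy. One common scheme, shown here only as an illustrative assumption and not as the procedure the authors evaluated, scales each cell's counts to a fixed total and applies a log transform. The function name `log_normalize` and the scale factor of 10,000 are choices made for this sketch.

```python
import math


def log_normalize(counts, scale=10_000):
    """Depth-normalize a cells x genes count matrix, then apply log(1 + x).

    counts: list of per-cell lists of raw gene counts
    scale:  target total per cell (an assumed, conventional value)
    """
    normalized = []
    for cell in counts:
        total = sum(cell)
        if total == 0:
            # An empty cell stays all zeros rather than dividing by zero.
            normalized.append([0.0] * len(cell))
            continue
        normalized.append([math.log1p(c / total * scale) for c in cell])
    return normalized


if __name__ == "__main__":
    # Hypothetical cells with the same composition at 10x different depth.
    shallow, deep = log_normalize([[1, 3], [10, 30]])
    print(shallow == deep)
```

After this step two cells with identical composition but different sequencing depth have identical profiles, so depth alone no longer separates them.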

2021 ◽  
Vol 12 ◽  
Jing Li ◽  
Urminder Singh ◽  
Zebulun Arendsee ◽  
Eve Syrkin Wurtele

The “dark transcriptome” can be considered the multitude of sequences that are transcribed but not annotated as genes. We evaluated expression of 6,692 annotated genes and 29,354 unannotated open reading frames (ORFs) in the Saccharomyces cerevisiae genome across diverse environmental, genetic and developmental conditions (3,457 RNA-Seq samples). Over 30% of the highly transcribed ORFs have translation evidence. Phylostratigraphic analysis infers most of these transcribed ORFs would encode species-specific proteins (“orphan-ORFs”); hundreds have mean expression comparable to annotated genes. These data reveal unannotated ORFs most likely to be protein-coding genes. We partitioned a co-expression matrix by Markov Chain Clustering; the resultant clusters contain 2,468 orphan-ORFs. We provide the aggregated RNA-Seq yeast data with extensive metadata as a project in MetaOmGraph (MOG), a tool designed for interactive analysis and visualization. This approach enables reuse of public RNA-Seq data for exploratory discovery, providing a rich context for experimentalists to make novel, experimentally testable hypotheses about candidate genes.

2021 ◽  
Boxuan Liu ◽  
Yun Zhao ◽  
Shuanying Yang

Abstract Background: Lung adenocarcinoma is the most common pathological type of non-small cell lung cancer. Although great progress has been made in early diagnosis and precision treatment in recent years, the overall 5-year survival rate of patients remains low. In our study, we aimed to construct an autophagy-related lncRNA prognostic signature that may guide clinical practice. Methods: The mRNA and lncRNA expression matrices of lung adenocarcinoma patients were retrieved from the TCGA database. Next, we constructed a co-expression network of lncRNAs and autophagy-related genes. Lasso regression and multivariate Cox regression were then applied to establish a prognostic risk model. Subsequently, a risk score was generated to differentiate high- and low-risk groups, and a ROC curve and nomogram were used to visualize the predictive ability of the signature. Finally, gene ontology and pathway enrichment analyses were performed via GSEA. Results: A total of 1,703 autophagy-related lncRNAs were screened, and five autophagy-related lncRNAs (LINC01137, AL691432.2, LINC01116, AL606489.1 and HLA-DQB1-AS1) were finally included in our signature. Judging from univariate (HR = 1.075, 95% CI: 1.046–1.104) and multivariate (HR = 1.088, 95% CI: 1.057–1.120) Cox regression analysis, the risk score is an independent factor for LUAD patients. Further, the AUC values based on the risk score for 1, 3, and 5 years were 0.735, 0.672 and 0.662, respectively. Finally, the lncRNAs included in our signature were primarily enriched in the autophagy process, metabolism, the p53 pathway and the JAK/STAT pathway. Conclusions: Overall, our study indicated that the prognostic model we generated had a certain predictive ability for the prognosis of LUAD patients.
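
The abstract above derives a risk score from a Cox model and splits patients into high- and low-risk groups. A common construction, assumed here because the abstract does not state the formula or the cutoff, is a coefficient-weighted sum of signature expression values with a median split. All names, coefficients, and expression values below are placeholders, not results from the study.

```python
from statistics import median


def risk_scores(expression, coefficients):
    """Linear risk score per patient: sum of coefficient * expression.

    expression:   patient -> {gene: expression value}
    coefficients: gene -> model coefficient (placeholders in this sketch)
    """
    return {
        patient: sum(coefficients[g] * values[g] for g in coefficients)
        for patient, values in expression.items()
    }


def split_by_median(scores):
    """Label patients 'high' if strictly above the median score, else 'low'."""
    cutoff = median(scores.values())
    return {p: ("high" if s > cutoff else "low") for p, s in scores.items()}


if __name__ == "__main__":
    # Hypothetical two-lncRNA signature and three patients.
    coefs = {"lnc_a": 0.5, "lnc_b": -0.25}
    expr = {
        "p1": {"lnc_a": 2.0, "lnc_b": 4.0},
        "p2": {"lnc_a": 4.0, "lnc_b": 0.0},
        "p3": {"lnc_a": 0.0, "lnc_b": 4.0},
    }
    print(split_by_median(risk_scores(expr, coefs)))
```

The resulting groups are what a survival comparison or ROC analysis would then be run on.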
