Mining sponge phenomena in RNA expression data

Author(s):  
Fabrizio Angiulli ◽  
Teresa Colombo ◽  
Fabio Fassetti ◽  
Angelo Furfaro ◽  
Paola Paci

In the last few years, the interactions among competing endogenous RNAs (ceRNAs) have been recognized as a key post-transcriptional regulatory mechanism in cell differentiation, tissue development, and disease. Notably, such sponge phenomena, which subtract active microRNAs from their silencing targets, have been recognized as having a potential oncosuppressive, or oncogenic, role in several cancer types. Hence, the ability to predict sponges from the analysis of large expression data sets (e.g. from international cancer projects) has become an important data mining task in bioinformatics. We present a technique designed to mine sponge phenomena whose presence or absence may discriminate between healthy and unhealthy populations of samples in tumoral or normal expression data sets, thus providing lists of candidates potentially relevant in the pathology. With this aim, we search for pairs of elements acting as ceRNAs for a given miRNA; namely, we aim at discovering miRNA-RNA pairs involved in phenomena which are clearly present in one population and almost absent in the other one. The results on tumoral expression data, concerning five different cancer types, confirmed the effectiveness of the approach in mining interesting knowledge. Indeed, 32 out of 33 miRNAs and 22 out of 25 protein-coding genes identified as top scoring in our analysis are corroborated by having been similarly associated with cancer processes in independent studies. In fact, the subset of miRNAs selected by the sponge analysis results in a significant enrichment of annotation for the KEGG pathway “microRNAs in cancer” when tested with the commonly used bioinformatic resource DAVID. Moreover, the cancer data sets in which our sponge analysis identified a miRNA as top scoring often match those already reported in the pertinent literature.

2019 ◽  
Vol 18 (4) ◽  
pp. 255-266 ◽  
Author(s):  
Baohong Liu ◽  
Yu Shyr ◽  
Jianping Cai ◽  
Qi Liu

Abstract MicroRNAs (miRNAs) are small endogenous non-coding functional RNAs that post-transcriptionally regulate gene expression. They play essential roles in nearly all biological processes including cell development and differentiation, DNA damage repair, cell death as well as intercellular communication. They are highly involved in cancer, acting as tumor suppressors and/or promoters to modulate cell proliferation, epithelial-mesenchymal transition and tumor invasion and metastasis. Recent studies have shown that more than half of miRNAs are located within protein-coding or non-coding genes. Intragenic miRNAs and their host genes either share the promoter or have independent transcription. Meanwhile, miRNAs work as partners or antagonists of their host genes by fine-tuning their target genes functionally associated with host genes. This review outlined the complicated relationship between intragenic miRNAs and host genes. Focusing on miRNAs known as oncogenes or tumor suppressors in specific cancer types, it studied co-expression relationships between these miRNAs and host genes in the cancer types using TCGA data sets, which validated previous findings and revealed common, tumor-specific and even subtype-specific patterns. These observations will help understand the function of intragenic miRNAs and further develop miRNA therapeutics in cancer.


Blood ◽  
2015 ◽  
Vol 126 (23) ◽  
pp. 2663-2663
Author(s):  
Matthew A Care ◽  
Stephen M Thirdborough ◽  
Andrew J Davies ◽  
Peter W.M. Johnson ◽  
Andrew Jack ◽  
...  

Abstract Purpose To assess whether comparative gene network analysis can reveal characteristic immune response signatures that predict clinical response in diffuse large B-cell lymphoma (DLBCL). Background The wealth of available gene expression data sets for DLBCL and other cancer types provides a resource to define recurrent pathological processes at the level of gene expression and gene correlation neighbourhoods. This is of particular relevance in the context of cancer immune responses, where convergence onto common patterns may drive shared gene expression profiles. Where existing and novel immunotherapies harness the immune response for therapeutic benefit, such responses may provide predictive biomarkers. Methods We independently analysed publicly available DLBCL gene expression data sets and a wide compendium of gene expression data from diverse cancer types, and then asked whether common elements of cancer host response could be identified from the resulting networks. Using 10 DLBCL gene expression data sets, encompassing 2030 cases, we established pairwise gene correlation matrices per data set, which were merged to generate median correlations of gene pairs across all data sets. Gene network analysis and unsupervised clustering were then applied to define global representations of DLBCL gene expression neighbourhoods. In parallel, a diverse range of solid and lymphoid malignancies, including breast, colorectal, oesophageal, head and neck, non-small cell lung, prostate and pancreatic cancer, Hodgkin lymphoma, follicular lymphoma and DLBCL, were independently analysed using an orthogonal weighted gene correlation network analysis (WGCNA) of gene expression data sets, from which correlated modules across diverse cancer types were identified. The biology of the resulting gene neighbourhoods was assessed by signature and ontology enrichment, and the overlap between gene correlation neighbourhoods and WGCNA-derived modules associated with immune/host responses was analysed.
Results Amongst DLBCL data, we identified distinct gene correlation neighbourhoods associated with the immune response. These included elements of IFN-polarised responses, core T-cell and cytotoxic signatures, as well as distinct macrophage responses. Neighbourhoods linked to macrophages separated CD163 from CD68 and CD14. In the WGCNA analysis of diverse cancer types, clusters corresponding to these immune response neighbourhoods were independently identified, including a highly similar cluster related to CD163. The overlapping CD163 clusters in both analyses were linked to diverse Fc receptors, complement pathway components and patterns of scavenger receptors potentially linked to alternative macrophage activation. The relationship between the CD163 macrophage gene expression cluster and outcome was tested in DLBCL data sets, identifying a poor response in CD163-cluster-high patients, which reached statistical significance in one data set (GSE10846). Notably, the effect of the CD163-associated gene neighbourhood, which correlates with poor outcome after rituximab-containing immunochemotherapy, is distinct from the effect of IFNG-STAT1-IRF1 polarised cytotoxic responses. The latter represents the predominant immune response pattern separating cell-of-origin unclassifiable (Type-III) DLBCL from either ABC or GCB DLBCL subsets, and is associated with a trend toward positive outcome. Conclusion Comparative gene expression network analysis identifies common immune response signatures shared between DLBCL and other cancer types. Gene expression clusters linked to CD163 macrophage responses and IFNG-STAT1-IRF1 polarised cytotoxic responses are common patterns with apparently divergent outcome associations.
Disclosures Davies: CTI: Honoraria; Gilead: Consultancy, Honoraria, Research Funding; Mundipharma: Honoraria, Research Funding; Bayer: Research Funding; Takeda: Honoraria, Research Funding; Janssen: Honoraria, Research Funding; Roche: Honoraria, Research Funding; GSK: Research Funding; Pfizer: Honoraria; Celgene: Honoraria, Research Funding. Jack: Janssen: Research Funding.


2021 ◽  
Vol 12 ◽  
Author(s):  
Pingping Ren ◽  
Luying Lu ◽  
Shasha Cai ◽  
Jianghua Chen ◽  
Weiqiang Lin ◽  
...  

Alternative splicing (AS) is a complex coordinated transcriptional regulatory mechanism. It affects nearly 95% of all protein-coding genes and occurs in nearly all human organs. Aberrant alternative splicing can lead to various neurological diseases and cancers and is responsible for aging, infection, inflammation, immune and metabolic disorders, and so on. Though aberrant alternative splicing events and their regulatory mechanisms are widely recognized, the association between autoimmune disease and alternative splicing has not been extensively examined. Autoimmune diseases are characterized by the loss of tolerance of the immune system towards self-antigens and organ-specific or systemic inflammation and subsequent tissue damage. In the present review, we summarized the most recent reports on splicing events that occur in the immunopathogenesis of systemic lupus erythematosus (SLE) and rheumatoid arthritis (RA) and attempted to clarify the role that splicing events play in regulating autoimmune disease progression. We also identified the changes that occur in splicing factor expression. The foregoing information might improve our understanding of autoimmune diseases and help develop new diagnostic and therapeutic tools for them.


2020 ◽  
pp. 1-17
Author(s):  
Francisco Javier Balea-Fernandez ◽  
Beatriz Martinez-Vega ◽  
Samuel Ortega ◽  
Himar Fabelo ◽  
Raquel Leon ◽  
...  

Background: Sociodemographic data indicate a progressive increase in life expectancy and in the prevalence of Alzheimer’s disease (AD). AD has emerged as one of the greatest public health problems. Its etiology is twofold: non-modifiable factors on the one hand and modifiable factors on the other. Objective: This study aims to develop a processing framework based on machine learning (ML) and optimization algorithms to study sociodemographic, clinical, and analytical variables, selecting the best combination among them for accurate discrimination between controls and subjects with major neurocognitive disorder (MNCD). Methods: This research is based on an observational-analytical design. Two research groups were established: MNCD group (n = 46) and control group (n = 38). ML and optimization algorithms were employed to automatically diagnose MNCD. Results: Twelve out of 37 variables were identified in the validation set as the most relevant for MNCD diagnosis. A sensitivity of 100% and a specificity of 71% were achieved using a Random Forest classifier. Conclusion: ML is a potential tool for automatic prediction of MNCD which can be applied to relatively small preclinical and clinical data sets. These results can be interpreted to support the influence of the environment on the development of AD.
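For readers unfamiliar with the two reported metrics, the sketch below shows how sensitivity and specificity are computed from true and predicted labels. The function name and the toy labels are hypothetical and do not reproduce the study's data or classifier.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for binary labels.

    Sensitivity = TP / (TP + FN): share of true cases that are detected.
    Specificity = TN / (TN + FP): share of controls correctly cleared.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical labels: 1 = case, 0 = control (not the study's data).
toy_true = [1, 1, 1, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 0, 1, 0]
sens, spec = sensitivity_specificity(toy_true, toy_pred)
```

In this toy example every case is detected while one control is misclassified, which is the same pattern as a classifier with perfect sensitivity but lower specificity.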


2019 ◽  
Vol 20 (13) ◽  
pp. 3315 ◽  
Author(s):  
Simona Cantarella ◽  
Davide Carnevali ◽  
Marco Morselli ◽  
Anastasia Conti ◽  
Matteo Pellegrini ◽  
...  

Alu retroelements, whose retrotransposition requires prior transcription by RNA polymerase III to generate Alu RNAs, represent the most numerous non-coding RNA (ncRNA) gene family in the human genome. Alu transcription is generally kept to extremely low levels by tight epigenetic silencing, but it has been reported to increase under different types of cell perturbation, such as viral infection and cancer. Alu RNAs, being able to act as gene expression modulators, may be directly involved in the mechanisms determining cellular behavior in such perturbed states. To directly address the regulatory potential of Alu RNAs, we generated IMR90 fibroblasts and HeLa cell lines stably overexpressing two slightly different Alu RNAs, and analyzed genome-wide the expression changes of protein-coding genes through RNA-sequencing. Among the genes that were upregulated or downregulated in response to Alu overexpression in IMR90, but not in HeLa cells, we found a highly significant enrichment of pathways involved in cell cycle progression and mitotic entry. Accordingly, Alu overexpression was found to promote transition from G1 to S phase, as revealed by flow cytometry. Therefore, increased Alu RNA may contribute to sustained cell proliferation, which is an important factor of cancer development and progression.


Genetics ◽  
2000 ◽  
Vol 155 (1) ◽  
pp. 431-449 ◽  
Author(s):  
Ziheng Yang ◽  
Rasmus Nielsen ◽  
Nick Goldman ◽  
Anne-Mette Krabbe Pedersen

Abstract Comparison of relative fixation rates of synonymous (silent) and nonsynonymous (amino acid-altering) mutations provides a means for understanding the mechanisms of molecular sequence evolution. The nonsynonymous/synonymous rate ratio (ω = dN/dS) is an important indicator of selective pressure at the protein level, with ω = 1 meaning neutral mutations, ω < 1 purifying selection, and ω > 1 diversifying positive selection. Amino acid sites in a protein are expected to be under different selective pressures and have different underlying ω ratios. We develop models that account for heterogeneous ω ratios among amino acid sites and apply them to phylogenetic analyses of protein-coding DNA sequences. These models are useful for testing for adaptive molecular evolution and identifying amino acid sites under diversifying selection. Ten data sets of genes from nuclear, mitochondrial, and viral genomes are analyzed to estimate the distributions of ω among sites. In all data sets analyzed, the selective pressure indicated by the ω ratio is found to be highly heterogeneous among sites. Previously unsuspected Darwinian selection is detected in several genes in which the average ω ratio across sites is <1, but in which some sites are clearly under diversifying selection with ω > 1. Genes undergoing positive selection include the β-globin gene from vertebrates, mitochondrial protein-coding genes from hominoids, the hemagglutinin (HA) gene from human influenza virus A, and HIV-1 env, vif, and pol genes. Tests for the presence of positively selected sites and their subsequent identification appear quite robust to the specific distributional form assumed for ω and can be achieved using any of several models we implement. However, we encountered difficulties in estimating the precise distribution of ω among sites from real data sets.
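The interpretation of ω stated in the abstract, and the reason a gene-wide average can hide positively selected sites, can be illustrated with a short sketch. It only classifies ω values that are already given; it does not implement the likelihood models of the paper, and the function name, tolerance, and per-site values are hypothetical.

```python
from statistics import mean


def selection_regime(omega, tolerance=1e-9):
    """Map a dN/dS ratio to the regime named in the abstract."""
    if omega < 0:
        raise ValueError("omega must be non-negative")
    if abs(omega - 1.0) <= tolerance:
        return "neutral"
    return "purifying" if omega < 1.0 else "diversifying"


# Hypothetical per-site omega values for one gene (illustrative only).
site_omegas = [0.1, 0.2, 2.5, 0.3]

average_regime = selection_regime(mean(site_omegas))   # regime of the average
site_regimes = [selection_regime(w) for w in site_omegas]
positively_selected = [i for i, w in enumerate(site_omegas) if w > 1.0]
```

Here the average over sites falls below 1, so a single gene-wide ratio would suggest purifying selection, yet one site has ω > 1. This is the situation that motivates site-heterogeneous models.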


2021 ◽  
Vol 22 (6) ◽  
pp. 3151 ◽  
Author(s):  
Roberto Piergentili ◽  
Simona Zaami ◽  
Anna Franca Cavaliere ◽  
Fabrizio Signore ◽  
Giovanni Scambia ◽  
...  

Endometrial cancer (EC) has been classified over the years for prognostic and therapeutic purposes. In recent years, classification systems have been emerging that are based not only on EC clinical and pathological characteristics but also on its genetic and epigenetic features. Noncoding RNAs (ncRNAs) are emerging as promising markers in several cancer types, including EC, for which their prognostic value is currently under investigation and will likely complement the present prognostic tools based on protein-coding genes. This review aims to underline the importance of genetic and epigenetic events in EC tumorigenesis by expounding upon the prognostic role of ncRNAs.


2021 ◽  
Vol 48 (4) ◽  
pp. 307-328
Author(s):  
Dominic Farace ◽  
Hélène Prost ◽  
Antonella Zane ◽  
Birger Hjørland ◽  
...  

This article presents and discusses different kinds of data documents, including data sets, data studies, data papers and data journals. It provides descriptive and bibliometric data on different kinds of data documents and discusses the theoretical and philosophical problems by classifying documents according to the DIKW model (data documents, information documents, knowledge documents and wisdom documents). Data documents are, on the one hand, an established category today, even with its own data citation index (DCI). On the other hand, data documents have blurred boundaries in relation to other kinds of documents and seem sometimes to be understood from the problematic philosophical assumption that a datum can be understood as “a single, fixed truth, valid for everyone, everywhere, at all times”.


1996 ◽  
Vol 118 (4) ◽  
pp. 284-291 ◽  
Author(s):  
C. Guedes Soares ◽  
A. C. Henriques

This work examines some aspects involved in the estimation of the parameters of the probability distribution of significant wave height, in particular the homogeneity of the data sets and the statistical methods of fitting a distribution to data. More homogeneous data sets are organized by collecting the data on a monthly basis and by separating the simple sea states from the combined ones. A three-parameter Weibull distribution is fitted to the data. The parameters of the fitted distribution are estimated by the methods of maximum likelihood, of regression, and of the moments. The uncertainty involved in estimating the probability distribution with the three methods is compared with the one that results from using more homogeneous data sets, and it is concluded that the uncertainty involved in the fitting procedure can be more significant unless the method of moments is not considered.
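As a rough illustration of the method of moments mentioned above, the sketch below fits a Weibull distribution to a sample under a simplifying assumption that is not in the paper: the location parameter is treated as known, so only shape and scale are estimated. The function names, the bisection bounds, and the toy wave heights are all hypothetical.

```python
import math
from statistics import mean, stdev


def weibull_cv(shape):
    """Coefficient of variation of a Weibull distribution with zero location."""
    g1 = math.gamma(1.0 + 1.0 / shape)
    g2 = math.gamma(1.0 + 2.0 / shape)
    return math.sqrt(g2 / g1 ** 2 - 1.0)


def fit_weibull_moments(data, location=0.0, lo=0.1, hi=50.0):
    """Estimate (shape, scale) by matching the sample mean and std deviation.

    The coefficient of variation decreases monotonically with the shape
    parameter, so the shape is found by bisection on [lo, hi].
    """
    shifted = [x - location for x in data]
    m, s = mean(shifted), stdev(shifted)
    target = s / m
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if weibull_cv(mid) > target:
            lo = mid   # variation too large: shape must increase
        else:
            hi = mid
    shape = (lo + hi) / 2.0
    scale = m / math.gamma(1.0 + 1.0 / shape)
    return shape, scale


# Hypothetical significant wave heights in metres (not measured data).
toy_heights = [1.2, 1.8, 2.5, 3.1, 2.2, 1.6, 2.9, 2.0]
shape, scale = fit_weibull_moments(toy_heights, location=0.5)
```

A full three-parameter moment fit would also match the sample skewness to estimate the location; fixing the location keeps the sketch short.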

