Pan-cancer Feature Selection and Classification Reveals Important Long Non-coding RNAs

Author(s):  
Abdullah Al Mamun ◽  
Wenrui Duan ◽  
Ananda Mohan Mondal
Author(s):  
Hongying Zhao ◽  
Jian Shi ◽  
Yunpeng Zhang ◽  
Aimin Xie ◽  
Lei Yu ◽  
...  

Abstract Long non-coding RNAs (lncRNAs) are associated with human diseases. Although lncRNA–disease associations have received significant attention, no online repository is available to collect lncRNA-mediated regulatory mechanisms, key downstream targets, and important biological functions driven by disease-related lncRNAs in human diseases. We thus developed LncTarD (http://biocc.hrbmu.edu.cn/LncTarD/ or http://bio-bigdata.hrbmu.edu.cn/LncTarD), a manually-curated database that provides a comprehensive resource of key lncRNA–target regulations, lncRNA-influenced functions, and lncRNA-mediated regulatory mechanisms in human diseases. LncTarD offers (i) 2822 key lncRNA–target regulations involving 475 lncRNAs and 1039 targets associated with 177 human diseases; (ii) 1613 experimentally-supported functional regulations and 1209 expression associations in human diseases; (iii) important biological functions driven by disease-related lncRNAs in human diseases; (iv) lncRNA–target regulations responsible for drug resistance or sensitivity in human diseases and (v) lncRNA microarray, lncRNA sequence data and transcriptome data of an 11 373 pan-cancer patient cohort from TCGA to help characterize the functional dynamics of these lncRNA–target regulations. LncTarD also provides a user-friendly interface to conveniently browse, search, and download data. LncTarD will be a useful resource platform for the further understanding of functions and molecular mechanisms of lncRNA deregulation in human disease, which will help to identify novel and sensitive biomarkers and therapeutic targets.


PeerJ ◽  
2020 ◽  
Vol 8 ◽  
pp. e8797 ◽  
Author(s):  
Matthew Ung ◽  
Evelien Schaafsma ◽  
Daniel Mattox ◽  
George L. Wang ◽  
Chao Cheng

Background The “dark matter” of the genome harbors several non-coding RNA species, including long non-coding RNAs (lncRNAs), which have been implicated in neoplasia but remain understudied. RNA-seq has provided deep insights into the nature of lncRNAs in cancer, but current RNA-seq data are rarely accompanied by longitudinal patient survival information. In contrast, a plethora of microarray studies have collected these clinical metadata, which can be leveraged to identify novel associations between gene expression and clinical phenotypes. Methods In this study, we developed an analysis framework that computationally integrates RNA-seq and microarray data to systematically screen 9,463 lncRNAs for association with mortality risk across 20 cancer types. Results In total, we identified a comprehensive list of associations between lncRNAs and patient survival and demonstrated that these prognostic lncRNAs are under selective pressure and may be functional. Our results provide valuable insights that facilitate further exploration of lncRNAs and their potential as cancer biomarkers and drug targets.


PeerJ ◽  
2019 ◽  
Vol 7 ◽  
pp. e6388 ◽  
Author(s):  
Asanigari Saleembhasha ◽  
Seema Mishra

Despite years of research, we are still unraveling crucial stages of gene expression regulation in cancer. On the basis of major biological hallmarks, we hypothesized that there must be a uniform gene expression pattern and regulation across cancer types. Among non-coding genes, long non-coding RNAs (lncRNAs) are emerging as key gene regulators playing powerful roles in cancer. Using TCGA RNAseq data, we analyzed coding (mRNA) and non-coding (lncRNA) gene expression across 15 and 9 common cancer types, respectively. We identified 70 significantly differentially expressed genes common to all 15 cancer types. Correlating with protein expression levels from the Human Protein Atlas, we observed 34 positively correlated gene sets, which are enriched in the biological processes of gene expression, transcription from RNA Pol-II, regulation of transcription and mitotic cell cycle. Further, 24 lncRNAs were among the common significantly differentially expressed non-coding genes. Using a guilt-by-association method, we predicted these lncRNAs to be involved in the same biological processes. Combining RNA-RNA interaction prediction and transcription regulatory networks, we identified the E2F1, FOXM1 and PVT1 regulatory path as a recurring pan-cancer regulatory entity. PVT1 is predicted to interact with SYNE1 at the 3′-UTR; DNAJC9 and RNPS1 at the 5′-UTR; and ATXN2L, ALAD, FOXM1 and IRAK1 at CDS sites. The key finding is that, through the E2F1, FOXM1 and PVT1 regulatory axis and possible interactions with different coding genes, PVT1 may play a prominent role in pan-cancer development and progression.


IEEE Access ◽  
2020 ◽  
Vol 8 ◽  
pp. 181683-181697
Author(s):  
Robson P. Bonidia ◽  
Jaqueline Sayuri Machida ◽  
Tatianne C. Negri ◽  
Wonder A. L. Alves ◽  
Andre Y. Kashiwabara ◽  
...  

EBioMedicine ◽  
2016 ◽  
Vol 7 ◽  
pp. 62-72 ◽  
Author(s):  
Travers Ching ◽  
Karolina Peplowska ◽  
Sijia Huang ◽  
Xun Zhu ◽  
Yi Shen ◽  
...  

2018 ◽  
Author(s):  
Kevin Walters ◽  
Radmir Sarsenov ◽  
Wen Siong Too ◽  
Roseanna K. Hare ◽  
Ian C. Paterson ◽  
...  

Abstract Long non-coding RNAs (lncRNAs) are emerging as crucial regulators of cellular processes in diseases such as cancer, although the functions of most remain poorly understood. To address this, here we apply a novel strategy to integrate gene expression profiles across 32 cancer types, and cluster human lncRNAs based on their pan-cancer protein-coding gene associations. By doing so, we derive 16 lncRNA modules whose unique properties allow simultaneous inference of function, disease specificity and regulation for over 800 lncRNAs. Remarkably, modules could be grouped into just four functional themes: transcription regulation, immunological, extracellular, and neurological, with module generation frequently driven by lncRNA tissue specificity. Notably, three modules associated with the extracellular matrix represented potential networks of lncRNAs regulating key events in tumour progression. These included a tumour-specific signature of 33 lncRNAs that may play a role in inducing epithelial-mesenchymal transition through modulation of TGFβ signalling, and two stromal-specific modules comprising 26 lncRNAs linked to a tumour suppressive microenvironment, and 12 lncRNAs related to cancer-associated fibroblasts. At least one member of the 12-lncRNA signature was experimentally supported by siRNA knockdown, which resulted in attenuated differentiation of quiescent fibroblasts to a cancer-associated phenotype. Overall, the study provides a unique pan-cancer perspective on the lncRNA functional landscape, acting as a global source of novel hypotheses on lncRNA contribution to tumour progression.
Author Summary The established view of protein production is that genomic DNA is transcribed into RNA, which is then translated into protein. Proteins play a critical role in shaping the function of each individual cell in the human body yet they represent less than 2% of human genomic sequence whilst up to 90% of the genome is transcribed. To explain this disparity, the existence of thousands of long non-coding RNAs (lncRNAs) has emerged that do not encode proteins but perform function as an RNA molecule. Most lncRNAs have yet to be assigned a specific biological role, so to address this we apply a novel computational approach to characterise the function of >800 lncRNAs through consistent association with protein coding genes across multiple cancer types. By doing so, we discover 16 “modules” of closely related lncRNAs that share broad functional themes, the most compelling of which consists of 12 lncRNAs that could regulate activation of specific cells neighbouring the tumour, leading to accelerated tumour progression and invasion. Overall, the study provides the most robust view of the lncRNA-protein coding gene landscape to date, adding to growing evidence that lncRNAs are key regulators of cancer, and have therapeutic potential comparable to proteins.


2019 ◽  
Vol 35 (21) ◽  
pp. 4344-4349 ◽  
Author(s):  
Yuwei Zhang ◽  
Yang Tao ◽  
Huihui Ji ◽  
Wei Li ◽  
Xingli Guo ◽  
...  

Abstract Motivation The genome-scale CRISPR/Cas9 system has become a democratized gene-editing technique and is widely used to investigate gene functions in biological processes and diseases, especially cancers. Aiming to characterize gene aberrations and assess their effects on cancer, we designed a pipeline to identify the essential genes for pan-cancer. Methods CRISPR screening data, collected from published studies and integrated with the Robust Rank Aggregation algorithm, were used to identify the essential genes. Then, the hypergeometric test and random walk with restart (RWR) were used to predict additional essential genes on a broader scale. Finally, the expression status and potential roles of these genes were explored based on the TCGA portal and regulatory network analysis. Results We collected 926 samples from 10 CRISPR-based screening studies involving 33 different types of cancer to identify cancer-essential genes, which consist of 799 protein-coding genes (PCGs) and 97 long non-coding RNAs (lncRNAs). Then, we constructed a ‘bi-colored’ network with both PCGs and lncRNAs and applied it to predict additional essential genes, including 495 PCGs and 280 lncRNAs, on a broader scale using the hypergeometric test and RWR. After obtaining all essential genes, we further investigated their potential roles in cancer and found that essential genes have higher and more stable expression levels, and are associated with multiple cancer-associated biological processes and survival time. The regulatory network analysis detected two intriguing modules of essential genes participating in the regulation of cell cycle and ribosome biogenesis in cancer. Supplementary information Supplementary data are available at Bioinformatics online.
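
The random walk with restart step named in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the toy network, the node names (`PCG_A`, `LNC_X`, and so on), the restart probability of 0.3, and the function name are all assumptions made for illustration. The walker starts from known essential genes (the seeds), moves to a random neighbour at each step, and jumps back to a seed with a fixed probability; nodes with high steady-state probability are candidates for additional essential genes.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities for every node.

    adjacency: dict mapping node -> set of neighbours (undirected network).
    seeds: known essential genes; the walker restarts uniformly among them.
    restart: probability of jumping back to a seed at each step (assumed value).
    """
    nodes = sorted(adjacency)
    seeds = [s for s in seeds if s in adjacency]
    if not seeds:
        raise ValueError("at least one seed must be present in the network")
    # Restart distribution: uniform over the seed nodes.
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            neighbours = adjacency[u]
            if not neighbours:
                # An isolated node keeps its own walking mass.
                nxt[u] += (1.0 - restart) * p[u]
                continue
            share = (1.0 - restart) * p[u] / len(neighbours)
            for v in neighbours:
                nxt[v] += share
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Hypothetical 'bi-colored' toy network mixing protein-coding genes and lncRNAs.
edges = [("PCG_A", "PCG_B"), ("PCG_A", "LNC_X"), ("PCG_B", "LNC_X"),
         ("LNC_X", "LNC_Y"), ("LNC_Y", "PCG_C")]
network = {}
for a, b in edges:
    network.setdefault(a, set()).add(b)
    network.setdefault(b, set()).add(a)

scores = random_walk_with_restart(network, seeds=["PCG_A"])
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy run, nodes closer to the seed receive higher scores than distant ones, which is the property that lets the method propose new candidates from network proximity.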


Biomedicines ◽  
2021 ◽  
Vol 9 (9) ◽  
pp. 1263
Author(s):  
Baohong Liu ◽  
Yu Shyr ◽  
Qi Liu

MicroRNAs (miRNAs) are small endogenous non-coding RNAs that play important roles in regulating gene expression. Most miRNAs are located within or close to genes (hosts). miRNAs and their host genes have either coordinated or independent transcription. We performed a comprehensive investigation on co-transcriptional patterns of miRNAs and host genes based on 4707 patients across 21 cancer types. We found that only 11.6% of miRNA-host pairs were co-transcribed consistently and strongly across cancer types. Most miRNA-host pairs showed strong coexpression only in some specific cancer types, demonstrating a highly heterogeneous pattern. For two particular types of intergenic miRNAs, readthrough and divergent miRNAs, readthrough miRNAs showed higher coexpression with their host genes than divergent ones. miRNAs located within non-coding genes had tighter co-transcription with their hosts than those located within protein-coding genes, especially exonic and junction miRNAs. A few precursor miRNAs changed their dominant form between 5′ and 3′ strands in different cancer types, including miR-486, miR-99b, let-7e, miR-125a, let-7g, miR-339, miR-26a, miR-16, and miR-218, whereas only two miRNAs with multiple host genes switched their co-transcriptional partner in different cancer types (miR-219a-1 with SLC39A7/HSD17B8 and miR-3615 with RAB37/SLC9A3R1). miRNAs generated from distinct precursors (such as miR-125b from miR-125b-1 or miR-125b-2) were more likely to have cancer-dependent main contributors. miRNAs and hosts were less co-expressed in KIRC than other cancer types, possibly due to its frequent VHL mutations. Our findings shed new light on miRNA biogenesis and cancer diagnosis and treatments.
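
One simple way to picture the distinction between pairs co-transcribed consistently and pairs co-expressed only in specific cancer types is to compute a correlation per cancer type and compare it with a cutoff. The sketch below is only an illustration: the abstract does not state the statistic or threshold used, so the Pearson correlation, the 0.5 cutoff, the function names, and the expression values are all assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def classify_pair(expr_by_cancer, strong=0.5):
    """Label one miRNA-host pair by its coexpression pattern.

    expr_by_cancer: cancer type -> (miRNA expression list, host expression list).
    strong: assumed cutoff for calling a correlation strong.
    """
    r = {cancer: pearson(mirna, host)
         for cancer, (mirna, host) in expr_by_cancer.items()}
    strong_types = [cancer for cancer, value in r.items() if value >= strong]
    if len(strong_types) == len(r):
        return "consistent", r
    if strong_types:
        return "cancer-specific", r
    return "independent", r


# Made-up expression values for a single hypothetical miRNA-host pair.
toy_pair = {
    "BRCA": ([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]),
    "KIRC": ([1.0, 2.0, 3.0, 4.0], [5.0, 1.0, 4.0, 2.0]),
}
label, correlations = classify_pair(toy_pair)
```

Here the pair is strongly correlated in one toy cancer type and not in the other, so it falls into the cancer-specific group rather than the consistent one.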


Author(s):  
ShiJian Ding ◽  
Hao Li ◽  
Yu-Hang Zhang ◽  
XianChao Zhou ◽  
KaiYan Feng ◽  
...  

There are many types of cancer. Although they share some hallmarks, such as proliferation and metastasis, they still differ in many respects. They grow in different organs or tissues. Does each cancer have a unique gene expression pattern that distinguishes it from other cancer types? Since the Cancer Genome Atlas (TCGA) project, pan-cancer studies have become increasingly common. Researchers want to obtain robust gene expression signatures from pan-cancer patients, but heterogeneity causes large variance among cancer patients, and the sample size needed for robust results would be too large to recruit. In this study, we tried another approach to obtain robust pan-cancer biomarkers, using cell line data to reduce the variance. We applied several advanced computational methods to analyze the Cancer Cell Line Encyclopedia (CCLE) gene expression profiles, which included 988 cell lines from 20 cancer types. Two feature selection methods, Boruta and max-relevance and min-redundancy (mRMR), were applied to the cell line gene expression data one after the other, generating a feature list. This list was fed into an incremental feature selection method, incorporating one classification algorithm, to extract biomarkers and construct optimal classifiers and decision rules. The optimal classifiers performed well and can be useful tools for identifying cell lines from different cancer types, whereas the biomarkers (e.g. NCKAP1, TNFRSF12A, LAMB2, FKBP9, PFN2, TOM1L1) and rules identified in this work may provide a meaningful and precise reference for differentiating multiple types of cancer and contribute to the personalized treatment of tumors.
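
The incremental feature selection step can be sketched as follows. This is a minimal illustration, not the study's implementation: the abstract does not name the classifier, so a nearest-centroid classifier with leave-one-out evaluation is used as a stand-in, and the ranked feature list (which in the study would come from Boruta and mRMR) is supplied by hand. All data values, class labels, and function names are hypothetical.

```python
from math import dist


def loo_accuracy(samples, labels, feature_idx):
    """Leave-one-out accuracy of a nearest-centroid classifier on chosen features."""
    correct = 0
    for i, (x, y) in enumerate(zip(samples, labels)):
        # Build class centroids from every sample except the held-out one.
        sums, counts = {}, {}
        for j, (xj, yj) in enumerate(zip(samples, labels)):
            if j == i:
                continue
            acc = sums.setdefault(yj, [0.0] * len(feature_idx))
            for d, k in enumerate(feature_idx):
                acc[d] += xj[k]
            counts[yj] = counts.get(yj, 0) + 1
        centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}
        query = [x[k] for k in feature_idx]
        predicted = min(centroids, key=lambda c: dist(query, centroids[c]))
        correct += predicted == y
    return correct / len(samples)


def incremental_feature_selection(samples, labels, ranked_features):
    """Evaluate the top-1, top-2, ... feature subsets and keep the best one.

    Ties are resolved in favour of the smaller subset.
    """
    best_k, best_acc = 0, -1.0
    for k in range(1, len(ranked_features) + 1):
        acc = loo_accuracy(samples, labels, ranked_features[:k])
        if acc > best_acc:
            best_k, best_acc = k, acc
    return ranked_features[:best_k], best_acc


# Toy expression matrix: rows are cell lines, columns are three hypothetical genes.
samples = [
    [1.0, 5.0, 9.0], [1.2, 4.0, 1.0], [0.9, 6.0, 5.0],   # class "lung"
    [3.0, 5.5, 8.0], [3.1, 4.5, 2.0], [2.9, 5.0, 4.0],   # class "breast"
]
labels = ["lung", "lung", "lung", "breast", "breast", "breast"]
ranked_features = [0, 1, 2]  # assumed output of an upstream ranking method

selected, accuracy = incremental_feature_selection(samples, labels, ranked_features)
```

Because the first ranked feature already separates the two toy classes, adding lower-ranked features cannot improve the score, and the procedure stops at the smallest subset that reaches the best accuracy.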

