Lowest expressing microRNAs capture indispensable information: identifying cancer types

2016 ◽  
Author(s):  
Roni Rasnic ◽  
Nathan Linial ◽  
Michal Linial

ABSTRACT The primary function of microRNAs (miRNAs) is to maintain cell homeostasis. In cancerous tissues, miRNA expression undergoes drastic alterations. In this study, we used miRNA expression profiles from The Cancer Genome Atlas (TCGA) covering 24 cancer types and 3 healthy tissues, collected from >8500 samples. We seek to classify the cancer’s origin and identify the tissue using the expression of 1046 reported miRNAs. Despite the apparently uniform appearance of miRNAs among cancerous samples, we recover indispensable information about the cancer/tissue types from lowly expressed miRNAs. Multiclass support vector machine classification yields an average recall of 58% in identifying the correct tissue and tumor types. Data discretization led to a substantial improvement, reaching an average recall of 91% (95% median). We propose a straightforward protocol as a crucial step in classifying tumors of unknown primary origin. Our counter-intuitive conclusion is that in almost all cancer types, highly expressed miRNAs mask the significant signal that lowly expressed miRNAs provide.
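The abstract reports an average and a median recall over classes and credits data discretization for the improvement, without stating the discretization scheme. The sketch below illustrates both ideas under stated assumptions: `discretize` is a hypothetical threshold binarization (not the authors' protocol), and the labels and values are invented for illustration rather than taken from TCGA.

```python
from collections import defaultdict
from statistics import mean, median


def discretize(values, threshold):
    """Hypothetical binarization: 1 if expression exceeds the threshold, else 0.

    The abstract does not specify the discretization scheme; this is an assumption.
    """
    return [1 if v > threshold else 0 for v in values]


def per_class_recall(y_true, y_pred):
    """Return {label: recall}, where recall = correct predictions / samples of that label."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    return {label: correct[label] / total[label] for label in total}


# Invented expression values for one sample (illustration only).
binary_profile = discretize([0.0, 3.2, 150.0, 0.4], threshold=1.0)

# Invented true and predicted class labels (illustration only).
y_true = ["BRCA", "BRCA", "BRCA", "BRCA", "LUAD", "LUAD", "KIRC", "KIRC", "KIRC", "KIRC"]
y_pred = ["BRCA", "BRCA", "BRCA", "LUAD", "LUAD", "BRCA", "KIRC", "KIRC", "KIRC", "KIRC"]

recalls = per_class_recall(y_true, y_pred)
average_recall = mean(recalls.values())   # each class counts equally
median_recall = median(recalls.values())  # less sensitive to a few poorly recalled classes
```

Averaging per-class recall weights every cancer type equally regardless of sample count, which is presumably why a median is reported alongside the mean: a few hard classes pull the mean down more than the median.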

2021 ◽  
Vol 11 ◽  
Author(s):  
Yi Zhang ◽  
Lei Xia ◽  
Dawei Ma ◽  
Jing Wu ◽  
Xinyu Xu ◽  
...  

Cancer of unknown primary (CUP), in which metastatic disease exists without an identifiable primary location, accounts for about 3–5% of all cancer diagnoses. Successful diagnosis and treatment of such patients are difficult. This study aimed to assess the expression characteristics of 90 genes as a method of identifying the primary site from CUP samples. We validated a 90-gene expression assay and explored its potential diagnostic utility in 44 patients at Jiangsu Cancer Hospital. For each clinical specimen, the expression of 90 tumor-specific genes was analyzed and similarity scores were obtained. The tumor type predicted by the assay was then compared with the reference diagnosis; the overall accuracy was 95.4%. In addition, we verified the consistency of the expression profiles of the 90 genes in CUP secondary malignancies and metastatic malignancies in The Cancer Genome Atlas, where the 90-gene expression profile generally classified CUP accurately into the cluster of its primary tumor. We also report a detailed description of the next-generation coding sequences for CUP patients. Results from sequencing of the exome transcriptome, covering 556 frequently mutated oncogenes, were not significantly related to the 90-gene analysis. Our results demonstrate that the expression characteristics of these 90 genes can be used as a powerful tool to accurately identify the primary sites of CUP. In the future, including the 90-gene expression assay in pathological diagnosis will help oncologists use precise treatments, thereby improving the care and outcomes of CUP patients.


2020 ◽  
Vol 14 ◽  
pp. 117955492096626
Author(s):  
Yun Liu ◽  
Fu Liu ◽  
Xintong Hu ◽  
Jiaxue He ◽  
Yanfang Jiang

Motivation: Although several prognostic signatures for lung adenocarcinoma (LUAD) have been developed, they are mainly based on a single-omics data set. This article aims to develop a novel set of prognostic signatures by combining genetic mutation and expression profiles of LUAD patients. Methods: The genetic mutation and expression profiles, together with the clinical profiles, of a cohort of LUAD patients were downloaded from The Cancer Genome Atlas (TCGA). Patients were separated into 2 groups, high-risk and low-risk, according to their overall survival. Differential analysis was then performed to determine differentially expressed genes (DEGs) and differentially mutated genes (DMGs) between the 2 groups in the expression and mutation profiles, respectively. Finally, a prognostic model based on the support vector machine (SVM) algorithm was developed by combining the expression values of the DEGs and the mutation counts of the DMGs. Results: A total of 13 DEGs and 7 DMGs were identified between the 2 groups. Their prognostic values were validated using independent cohorts. Compared with several existing signatures, the proposed prognostic signatures exhibited better prediction performance on the testing set. In addition, 1 of the 7 DMGs, GRIN2B, was found to be mutated much more frequently in the high-risk group, indicating potential value as a therapeutic target. Conclusions: Combining multi-omics data sets is a practical way to identify novel prognostic signatures and to improve prognostic prediction for LUAD, and the approach may be instructive for other types of cancer.


2005 ◽  
Vol 17 (06) ◽  
pp. 300-308 ◽  
Author(s):  
LI-YEH CHUANG ◽  
CHENG-HONG YANG ◽  
LI-CHENG JIN

The support vector machine (SVM) is a new learning method that has shown comparable or better results than neural networks in some applications. In this paper, we applied SVM to classify multiple cancer types by gene expression profiles and explored some strategies for the SVM method, including fuzzy logic and statistical theories. Using the proposed strategies and outlier detection methods, the FSVM (fuzzy support vector machine) can achieve comparable or better performance than other methods, and it provides a more flexible architecture to discriminate between SRBCT and non-SRBCT samples.


2021 ◽  
Vol 12 ◽  
Author(s):  
Dongfang Jia ◽  
Cheng Chen ◽  
Chen Chen ◽  
Fangfang Chen ◽  
Ningrui Zhang ◽  
...  

Mastering the molecular mechanism of breast cancer (BC) can provide an in-depth understanding of BC pathology. This study reviewed existing technologies for diagnosing BC, such as mammography, ultrasound, magnetic resonance imaging (MRI), computed tomography (CT), and positron emission tomography (PET), and summarized their disadvantages. The purpose of this article is to use gene expression profiles from The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) to classify BC samples and normal samples. The proposed method overcomes some of the shortcomings of traditional diagnostic methods and can diagnose BC more rapidly, with high sensitivity and without radiation. This study first selected the genes most relevant to cancer through weighted gene co-expression network analysis (WGCNA) and differential expression analysis (DEA). Then it used the protein–protein interaction (PPI) network to screen 23 hub genes. Finally, it used the support vector machine (SVM), decision tree (DT), Bayesian network (BN), artificial neural network (ANN), and the convolutional neural networks CNN-LeNet and CNN-AlexNet to process the expression levels of the 23 hub genes. For gene expression profiles, the ANN model performs best in classifying cancer samples. Averaged over ten runs, the accuracy is 97.36% (±0.34%), the F1 value is 0.8535 (±0.0260), the sensitivity is 98.32% (±0.32%), the specificity is 89.59% (±3.53%) and the AUC is 0.99. In summary, this method effectively classifies cancer samples and normal samples and provides reasonable new ideas for the early diagnosis of cancer.
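The metrics reported above (accuracy, F1, sensitivity, specificity) all derive from the four cells of a binary confusion matrix. As a minimal sketch, the function below computes them from counts; the counts in the example are invented and do not reproduce the study's figures, and the helper name `binary_metrics` is an assumption.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary classification metrics from confusion-matrix counts.

    tp/fn: cancer samples predicted as cancer / as normal.
    tn/fp: normal samples predicted as normal / as cancer.
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # recall on the positive (cancer) class
    specificity = tn / (tn + fp)   # recall on the negative (normal) class
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Invented counts for illustration only.
metrics = binary_metrics(tp=90, fp=5, tn=45, fn=10)
```

Because tumor samples usually far outnumber normal samples in such cohorts, accuracy and sensitivity can be high while specificity lags, so the metrics are best read together.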


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Milad Mostavi ◽  
Yu-Chiao Chiu ◽  
Yidong Chen ◽  
Yufei Huang

Abstract Background: State-of-the-art deep learning based cancer type prediction can only predict cancer types whose samples are available during training, where the sample size is commonly large. In this paper, we consider how to use the existing training samples to predict cancer types unseen during training. We hypothesize the existence of a set of type-agnostic expression representations that define the similarity/dissimilarity between samples of the same/different types and propose a novel one-shot learning model called CancerSiamese to learn this common representation. CancerSiamese accepts a pair of query and support samples (gene expression profiles) and learns the representation of similar or dissimilar cancer types through two parallel convolutional neural networks joined by a similarity function. Results: We trained CancerSiamese for cancer type prediction for primary and metastatic tumors using samples from The Cancer Genome Atlas (TCGA) and MET500. Network transfer learning was used to facilitate the training of the CancerSiamese models. CancerSiamese was tested for different N-way predictions and yielded an average accuracy improvement of 8% and 4% over the benchmark 1-Nearest Neighbor (1-NN) classifier for primary and metastatic tumors, respectively. Moreover, we applied the guided gradient saliency map and feature selection to CancerSiamese to examine 100 and 200 top marker-gene candidates for the prediction of primary and metastatic cancers, respectively. Functional analysis of these marker genes revealed several cancer-related functions between primary and metastatic tumors. Conclusion: This work demonstrated, for the first time, the feasibility of predicting unseen cancer types whose samples are limited. Thus, it could inspire new and ingenious applications of one-shot and few-shot learning solutions for improving cancer diagnosis, prognosis, and our understanding of cancer.
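The 1-NN benchmark in an N-way one-shot setting can be sketched in a few lines: each candidate class contributes a single support sample, and the query takes the label of its closest support. The abstract does not state the distance used, so Euclidean distance is an assumption here, and the three-feature vectors and class names are invented for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length expression vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def one_shot_1nn(query, support):
    """Assign the query the label of its nearest support sample.

    support: dict mapping each class label to ONE expression vector (one shot per class).
    """
    return min(support, key=lambda label: euclidean(query, support[label]))


# Invented 3-way task: one support vector per class (illustration only).
support = {
    "type_A": [5.0, 0.1, 0.2],
    "type_B": [0.2, 4.8, 0.1],
    "type_C": [0.1, 0.3, 5.1],
}
query = [0.4, 4.5, 0.3]
predicted = one_shot_1nn(query, support)
```

A Siamese model keeps this query-versus-support structure but replaces the fixed distance on raw expression values with a similarity computed on learned representations.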


2021 ◽  
Vol 19 (1) ◽  
Author(s):  
Jiani Wu ◽  
Dongqiang Zeng ◽  
Shimeng Zhi ◽  
Zilan Ye ◽  
Wenjun Qiu ◽  
...  

Abstract Background: Tumor-derived exosomes (TEXs) are involved in tumor progression and the immune modulation process and mediate intercellular communication in the tumor microenvironment. Although exosomes are considered promising liquid biomarkers for disease diagnosis, it is difficult to discriminate TEXs and to develop TEX-based predictive biomarkers. Methods: In this study, gene expression profiles and clinical information were collected from The Cancer Genome Atlas (TCGA) database, IMvigor210 cohorts, and six independent Gene Expression Omnibus datasets. A TEX-associated signature named TEXscore was established to predict overall survival in multiple cancer types and in patients undergoing immune checkpoint blockade therapies. Results: Based on exosome-associated genes, we first constructed a tumor-derived exosome signature named TEXscore using a principal component analysis algorithm. In single-cell RNA-sequencing data analysis, increasing TEXscore was associated with disease progression and poor clinical outcomes. In the TCGA Pan-Cancer cohort, TEXscore was elevated in tumor samples relative to normal tissues, thereby serving as a reliable biomarker to distinguish cancer from non-cancer sources. Moreover, high TEXscore was associated with shorter overall survival across 12 cancer types. TEXscore showed great potential in predicting immunotherapy response in melanoma, urothelial cancer, and renal cancer. The immunosuppressive microenvironment characterized by macrophages, cancer-associated fibroblasts, and myeloid-derived suppressor cells was associated with high TEXscore in the TCGA and immunotherapy cohorts. In addition, TEXscore-associated miRNAs and gene mutations were identified. Further experimental research will help extend TEXscore to tumor-associated exosomes. Conclusions: TEXscore, which captures tumor-derived exosome features, might be a robust biomarker for prognosis and treatment response in independent cohorts.


2020 ◽  
Author(s):  
Nikolaos Dikaios

Abstract This paper aims to differentiate cancer types from primary tumour samples based on somatic point mutations (SPM). Primary cancer site identification is necessary to perform site-specific and potentially targeted treatment. Current methods such as histopathology and lab tests cannot accurately determine a cancer's origin, which results in empirical patient treatment and poor survival rates. The availability of large deoxyribonucleic-acid sequencing datasets has allowed scientists to examine the ability of SPM to classify primary cancer sites. These datasets are highly sparse, since most genes will not be mutated; have a low signal-to-noise ratio; and are imbalanced, since rare cancers have fewer samples. To overcome these limitations, a sparse-input neural network (spinn) is suggested that projects the input data into a lower-dimensional space, where the more informative genes are used for learning. To train and evaluate spinn, an extensive dataset was collected from The Cancer Genome Atlas containing 7624 samples spanning 32 cancer types. Different sampling strategies were tried to balance the dataset, but none benefited the classifier's performance except removing Tomek links. This is probably due to a high degree of class overlap. Spinn consistently outperformed algorithms such as extreme gradient boosting, deep neural networks and support vector machines, achieving an accuracy of up to 73% on independent testing data.
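A Tomek link is a pair of samples from different classes that are each other's nearest neighbour; removing one member of each link cleans up the class boundary. The sketch below is a brute-force illustration under stated assumptions: Euclidean distance, dropping the majority-class member of each link (one common policy, not confirmed by the abstract), and invented two-dimensional points in place of mutation profiles.

```python
import math
from collections import Counter


def nearest_neighbor(i, points):
    """Index of the point closest to points[i], excluding i itself."""
    return min(
        (j for j in range(len(points)) if j != i),
        key=lambda j: math.dist(points[i], points[j]),
    )


def tomek_links(points, labels):
    """Return index pairs (i, j), i < j, that are mutual nearest neighbours with different labels."""
    links = []
    for i in range(len(points)):
        j = nearest_neighbor(i, points)
        if i < j and labels[i] != labels[j] and nearest_neighbor(j, points) == i:
            links.append((i, j))
    return links


# Invented toy data (illustration only): "common" is the majority class.
points = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (1.05, 0.0), (3.0, 0.0)]
labels = ["common", "common", "common", "rare", "rare"]

links = tomek_links(points, labels)

# Assumed policy: drop only the majority-class member of each link.
majority = Counter(labels).most_common(1)[0][0]
to_drop = {idx for pair in links for idx in pair if labels[idx] == majority}
kept = [i for i in range(len(points)) if i not in to_drop]
```

The brute-force neighbour search is quadratic in the number of samples, which is acceptable for a sketch but would call for an indexed nearest-neighbour search on a dataset of several thousand samples.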


2019 ◽  
Vol 48 (D1) ◽  
pp. D956-D963 ◽  
Author(s):  
Jiang Li ◽  
Yawen Xue ◽  
Muhammad Talal Amin ◽  
Yanbo Yang ◽  
Jiajun Yang ◽  
...  

Abstract Numerous studies indicate that non-coding RNAs (ncRNAs) have critical functions across biological processes, and single-nucleotide polymorphisms (SNPs) could contribute to diseases or traits through influencing ncRNA expression. However, the associations between SNPs and ncRNA expression are largely unknown. Therefore, genome-wide expression quantitative trait loci (eQTL) analysis to assess the effects of SNPs on ncRNA expression, especially in multiple cancer types, will help to understand how risk alleles contribute toward tumorigenesis and cancer development. Using genotype data and expression profiles of ncRNAs of >8700 samples from The Cancer Genome Atlas (TCGA), we developed a computational pipeline to systematically identify ncRNA-related eQTLs (ncRNA-eQTLs) across 33 cancer types. We identified a total of 6 133 278 and 721 122 eQTL-ncRNA pairs in cis-eQTL and trans-eQTL analyses, respectively. Further survival analyses identified 8312 eQTLs associated with patient survival times. Furthermore, we linked ncRNA-eQTLs to genome-wide association study (GWAS) data and found 262 332 ncRNA-eQTLs overlapping with known disease- and trait-associated loci. Finally, a user-friendly database, ncRNA-eQTL (http://ibi.hzau.edu.cn/ncRNA-eQTL), was developed for free searching, browsing and downloading of all ncRNA-eQTLs. We anticipate that such an integrative and comprehensive resource will improve our understanding of the mechanistic basis of human complex phenotypic variation, especially for ncRNA- and cancer-related studies.


2018 ◽  
Vol 20 (4) ◽  
pp. 1322-1328 ◽  
Author(s):  
Qin Tang ◽  
Qiong Zhang ◽  
Yao Lv ◽  
Ya-Ru Miao ◽  
An-Yuan Guo

Abstract Human specifically expressed genes (SEGs) usually serve as potential biomarkers for disease diagnosis and treatment. However, the regulation underlying their specific expression remains to be revealed. In this study, we constructed the SEG regulation database (SEGreg; available at http://bioinfo.life.hust.edu.cn/SEGreg) to present SEGs and their regulation by transcription factors (TFs) and microRNAs (miRNAs) under different physiological conditions, which include normal tissue, cancer tissue and cell line. In total, SEGreg collected 6387, 1451, 4506 and 5320 SEGs from expression profiles of 34 cancer types and 55 tissues of The Cancer Genome Atlas, Cancer Cell Line Encyclopedia, Human Body Map and Genotype-Tissue Expression databases/projects, respectively. The miRNAs and TFs expressed in the corresponding cancer or tissue were identified from miRNA and gene expression profiles, and their targets were collected from several public resources. The regulatory networks of all SEGs were then constructed and integrated into SEGreg. Through a user-friendly interface, users can browse and search SEGreg by gene name, data source, tissue, cancer type and regulators. In summary, SEGreg is a specialized resource for exploring SEGs and their regulation, which provides clues to the mechanisms of carcinogenesis and biological processes.


2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Amy Li ◽  
Bjoern Chapuy ◽  
Xaralabos Varelas ◽  
Paola Sebastiani ◽  
Stefano Monti

Abstract The emergence of large-scale multi-omics data warrants method development for data integration. Genomic studies of cancer patients have identified epigenetic and genetic regulators – such as methylation marks, somatic mutations, and somatic copy number alterations (SCNAs), among others – as predictive features of cancer outcome. However, identification of “driver genes” associated with a given alteration remains a challenge. To this end, we developed a computational tool, iEDGE, to model cis and trans effects of (epi-)DNA alterations and identify potential cis driver genes, where cis and trans genes denote those genes falling within and outside the genomic boundaries of a given (epi-)genetic alteration, respectively. iEDGE first identifies the cis and trans gene expression signatures associated with the presence/absence of a particular epi-DNA alteration across samples. It then applies tests of statistical mediation to determine the cis genes predictive of the trans gene expression. Finally, cis and trans effects are annotated by pathway enrichment analysis to gain insights into the underlying regulatory networks. We used iEDGE to perform integrative analysis of SCNAs and gene expression data from breast cancer and 18 additional cancer types included in The Cancer Genome Atlas (TCGA). Notably, cis gene drivers identified by iEDGE were found to be significantly enriched for known driver genes from multiple compendia of validated oncogenes and tumor suppressors, suggesting that the remainder are of equal importance. Furthermore, predicted drivers were enriched for functionally relevant cancer genes with amplification-driven dependencies, which are of potential prognostic and therapeutic value. All analysis results are accessible at https://montilab.bu.edu/iEDGE.
In summary, integrative analysis of SCNAs and gene expression using iEDGE successfully identified known cancer driver genes and putative cancer therapeutic targets across 19 cancer types in the TCGA. The proposed method can easily be applied to the integration of gene expression profiles with other epi-DNA assays in a variety of disease contexts.

