A model and algorithm for identifying driver pathways based on weighted non-binary mutation matrix

Author(s):  
Jingli Wu ◽  
Kai Zhu ◽  
Gaoshi Li ◽  
Jinyan Wang ◽  
Qirong Cai

Abstract It is generally acknowledged that driver pathways play a decisive role in the occurrence and progression of tumors, and identifying them has become imperative for precision or personalized medicine. Because of inevitable sequencing errors, the noise contained in single-omics cancer data usually has a negative effect on identification. Now that large amounts of multi-omics cancer data are available, integrating them rather than relying on a single data type is a feasible approach, and identifying driver pathways by integrating multi-omics cancer data has recently attracted the attention of bioinformatics researchers. In this paper, a weighted non-binary mutation matrix is constructed by integrating copy number variations, somatic mutations and gene expression. Based on this matrix, a new identification model is proposed by defining new measurements of coverage and exclusivity. A cooperative coevolutionary algorithm, CGA-MWS, is then put forward to solve the model. Both real and simulated cancer data were used to compare the methods Dendrix, GA, iMCMC, MOGA, PGA-MWS and CGA-MWS. In most cases, more genes in the pathway identified by CGA-MWS are enriched in a known signaling pathway than in the pathways identified by the other five methods. The high efficiency of CGA-MWS also makes it practical for realistic applications. These findings were verified through a number of experiments.
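To make the terms "coverage" and "exclusivity" concrete, the following is a minimal sketch of the classic binary-matrix weight used by Dendrix, one of the baseline methods named above: W(M) = 2|Γ(M)| − Σ_g |Γ(g)|, where Γ(g) is the set of patients mutated in gene g. It is not the weighted non-binary measure proposed in this paper, whose definitions the abstract does not give. The gene names, patient identifiers and function names are hypothetical.

```python
from typing import Dict, Iterable, Set


def coverage(matrix: Dict[str, Set[str]], genes: Iterable[str]) -> Set[str]:
    """Patients carrying a mutation in at least one gene of the set."""
    covered: Set[str] = set()
    for gene in genes:
        covered |= matrix.get(gene, set())
    return covered


def dendrix_weight(matrix: Dict[str, Set[str]], genes: Iterable[str]) -> int:
    """Coverage minus coverage overlap: W(M) = 2*|cov(M)| - sum_g |cov(g)|."""
    genes = list(genes)
    total_coverage = len(coverage(matrix, genes))
    # Overlap counts every extra time a patient is hit by a second, third, ... gene.
    overlap = sum(len(matrix.get(g, set())) for g in genes) - total_coverage
    return total_coverage - overlap


# Toy binary mutation data (invented for illustration): gene -> mutated patients.
toy = {
    "GENE_A": {"p1", "p2"},
    "GENE_B": {"p3", "p4"},
    "GENE_C": {"p1", "p2", "p3"},
}

exclusive_pair = dendrix_weight(toy, ["GENE_A", "GENE_B"])    # no shared patients
overlapping_pair = dendrix_weight(toy, ["GENE_A", "GENE_C"])  # p1 and p2 hit twice
```

A gene set scores highly when it covers many patients while rarely hitting the same patient twice, which is why the mutually exclusive pair outscores the overlapping one in the toy data.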

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Lidong Guo ◽  
Mengyang Xu ◽  
Wenchao Wang ◽  
Shengqiang Gu ◽  
Xia Zhao ◽  
...  

Abstract Background Synthetic long reads (SLR) with long-range co-barcoding information are now widely applied in genomics research. Although several tools have been developed for each specific SLR technique, a robust standalone scaffolder with high efficiency is warranted for hybrid genome assembly. Results In this work, we developed a standalone scaffolding tool, SLR-superscaffolder, to link together contigs in draft assemblies using co-barcoding and paired-end read information. Our top-to-bottom scheme first builds a global scaffold graph based on Jaccard Similarity to determine the order and orientation of contigs, and then locally improves the scaffolds with the aid of paired-end information. We also exploited a screening algorithm to reduce the negative effect of misassembled contigs in the input assembly. We applied SLR-superscaffolder to a human single tube long fragment read sequencing dataset and increased the scaffold NG50 of its corresponding draft assembly 1349 fold. Moreover, benchmarking on different input contigs showed that this approach overall outperformed existing SLR scaffolders, providing longer contiguity and fewer misassemblies, especially for short contigs assembled by next-generation sequencing data. The open-source code of SLR-superscaffolder is available at https://github.com/BGI-Qingdao/SLR-superscaffolder. Conclusions SLR-superscaffolder can dramatically improve the contiguity of a draft assembly by integrating a hybrid assembly strategy.
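The Jaccard similarity mentioned above can be illustrated with a small sketch: two contigs whose reads share many barcodes are likely to be neighbours on the same long fragment. This is a simplified illustration under assumed inputs (one barcode set per whole contig), not the SLR-superscaffolder implementation; contig names, barcode labels and function names are hypothetical.

```python
from typing import Dict, Optional, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B|; defined as 0.0 for two empty sets."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def best_neighbour(name: str, barcodes: Dict[str, Set[str]]) -> Optional[str]:
    """Return the other contig sharing the largest barcode similarity, if any."""
    best_name, best_score = None, 0.0
    for other, other_barcodes in barcodes.items():
        if other == name:
            continue
        score = jaccard(barcodes[name], other_barcodes)
        if score > best_score:
            best_name, best_score = other, score
    return best_name


# Toy co-barcoding data (invented): contig -> barcodes of reads mapped to it.
contig_barcodes = {
    "contig_1": {"bc01", "bc02", "bc03", "bc04"},
    "contig_2": {"bc03", "bc04", "bc05"},
    "contig_3": {"bc09"},
}

similarity_1_2 = jaccard(contig_barcodes["contig_1"], contig_barcodes["contig_2"])
neighbour_of_1 = best_neighbour("contig_1", contig_barcodes)
```

In a scaffold graph, such pairwise scores would serve as edge weights between contigs; determining orientation requires additional information that this sketch leaves out.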


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Weihua Pan ◽  
Desheng Gong ◽  
Da Sun ◽  
Haohui Luo

Abstract Due to the high complexity of the cancer genome, it has so far been too difficult to generate a complete cancer genome map containing the sequence of every DNA molecule. Phasing each chromosome of the cancer genome into two haplotypes according to germline mutations nevertheless provides a suboptimal route to understanding the cancer genome. Phasing the cancer genome is itself a challenging problem, however, because of the limits of experimental and computational technologies. Hi-C data has been widely used for phasing in recent years because of its long-range linkage information, and it offers an opportunity to solve the problem of phasing the cancer genome. Existing Hi-C based phasing methods cannot be applied to the cancer genome directly, because somatic mutations such as somatic SNPs, copy number variations and structural variations greatly reduce correctness and completeness. Here, we propose a new Hi-C based pipeline for phasing the cancer genome called HiCancer. HiCancer handles different kinds of somatic mutations and variations, and takes advantage of allelic copy number imbalance and linkage disequilibrium to improve the correctness and completeness of phasing. According to our experiments in K562 and KBM-7 cell lines, HiCancer is able to generate very high-quality chromosome-level haplotypes for the cancer genome with only Hi-C data.


Author(s):  
Wenhui Li ◽  
Wanjun Lei ◽  
Xiaopei Chao ◽  
Xiaochen Song ◽  
Yalan Bi ◽  
...  

Abstract The association between human papillomavirus (HPV) integration and relevant genomic changes in uterine cervical adenocarcinoma is poorly understood. This study depicts the genomic mutational landscape in a cohort of 20 patients. HPV+ and HPV− groups were defined as patients with and without HPV integration in the host genome. The genetic changes between these two groups were described and compared by whole-genome sequencing (WGS) and whole-exome sequencing (WES). WGS identified 2916 copy number variations and 743 structural variations. WES identified 6113 somatic mutations, with a mutational burden of 2.4 mutations/Mb. Six genes were predicted as driver genes: PIK3CA, KRAS, TRAPPC12, NDN, GOLGA6L4 and BAIAP3. PIK3CA, NDN, GOLGA6L4, and BAIAP3 were recognized as significantly mutated genes (SMGs). HPV was detected in 95% (19/20) of patients with cervical adenocarcinoma, 7 of whom (36.8%) had HPV integration (HPV+ group). In total, 1036 genes with somatic mutations were confirmed in the HPV+ group, while 289 genes with somatic mutations were confirmed in the group without HPV integration (HPV− group); only 2.1% were shared between the two groups. In the HPV+ group, GOLGA6L4 and BAIAP3 were confirmed as SMGs, while PIK3CA, NDN, KRAS, FUT1, and GOLGA6L64 were identified in the HPV− group. ZDHHC3, PKD1P1, and TGIF2 showed copy number amplifications after HPV integration. In addition, the HPV+ group had significantly more neoantigens. HPV integration rather than HPV infection results in different genomic changes in cervical adenocarcinoma.


2016 ◽  
Author(s):  
Stephen G. Gaffney ◽  
Jeffrey P. Townsend

ABSTRACT Summary: PathScore quantifies the level of enrichment of somatic mutations within curated pathways, applying a novel approach that identifies pathways enriched across patients. The application provides several user-friendly, interactive graphic interfaces for data exploration, including tools for comparing pathway effect sizes, significance, gene-set overlap and enrichment differences between projects. Availability and Implementation: Web application available at pathscore.publichealth.yale.edu. Site implemented in Python and MySQL, with all major browsers supported. Source code available at github.com/sggaffney/pathscore with a GPLv3 license. Contact: [email protected] Supplementary Information: Additional documentation can be found at http://pathscore.publichealth.yale.edu/faq.


The analysis of cancer and normal data to predict somatic mutation occurrences plays an important role, yet several challenges persist in detecting somatic mutations, chiefly the complexity of classifying large volumes of data with good accuracy. In many situations the dataset contains redundant and less significant features, which need to be removed to improve classification performance. Feature selection techniques are useful for dimensionality reduction. PCA is adopted in this paper as a feature selection technique to identify significant attributes. A novel technique, a PCA-based regression decision tree, is proposed for classifying somatic mutation data. The performance of this classification process for detecting somatic mutations is compared with existing algorithms, and satisfactory results are obtained with the proposed model.


Author(s):  
Jun Wang ◽  
Ziying Yang ◽  
Carlotta Domeniconi ◽  
Xiangliang Zhang ◽  
Guoxian Yu

Abstract Discovering driver pathways is an essential step to uncover the molecular mechanism underlying cancer and to explore precise treatments for cancer patients. However, due to the difficulties of mapping genes to pathways and the limited knowledge about pathway interactions, most previous work focuses on identifying individual pathways. In practice, two (or even more) pathways interplay and often cooperatively trigger cancer. In this study, we proposed a new approach called CDPathway to discover cooperative driver pathways. First, CDPathway introduces a driver impact quantification function to quantify the driver weight of each gene. CDPathway assumes that genes with larger weights contribute more to the occurrence of the target disease and identifies them as candidate driver genes. Next, it constructs a heterogeneous network composed of gene, miRNA and pathway nodes based on the known intra(inter)-relations between them and assigns the quantified driver weights to gene–pathway and gene–miRNA relational edges. To transfer driver impacts of genes to pathway interaction pairs, CDPathway collaboratively factorizes the weighted adjacency matrices of the heterogeneous network to explore the latent relations between genes, miRNAs and pathways. After this, it reconstructs the pathway interaction network and identifies the pathway pairs with maximal interactive and driver weights as cooperative driver pathways. Experimental results on the breast, uterine corpus endometrial carcinoma and ovarian cancer data from The Cancer Genome Atlas show that CDPathway can effectively identify candidate driver genes [area under the receiver operating characteristic curve (AUROC) of ≥0.9] and reconstruct the pathway interaction network (AUROC of >0.9), and it uncovers many more known (potential) driver genes than other competitive methods. In addition, CDPathway identifies 150% more driver pathways and 60% more potential cooperative driver pathways than the competing methods.
The code of CDPathway is available at http://mlda.swu.edu.cn/codes.php?name=CDPathway.
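For readers unfamiliar with the AUROC figures quoted above, the following is a minimal sketch of how the metric can be computed from gene scores and known driver labels. It uses the rank-based (Mann-Whitney) definition: the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counted as one half. The scores and labels are invented, and this is not code from CDPathway.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Hypothetical driver weights for four genes; label 1 marks a known driver gene.
perfect = auroc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
partial = auroc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0])
```

The pairwise loop is quadratic in the number of genes, which is fine for illustration; a sort-based rank computation gives the same value more efficiently on large gene lists.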


2020 ◽  
Vol 21 (2) ◽  
pp. 685 ◽  
Author(s):  
Antonietta Arcella ◽  
Fiona Limanaqi ◽  
Rosangela Ferese ◽  
Francesca Biagioni ◽  
Maria Antonietta Oliva ◽  
...  

Recently, several studies have focused on the genetics of gliomas. This has allowed the identification of several germline loci that contribute to individual risk for tumor development, as well as various somatic mutations that are key for disease classification. Unfortunately, none of the germline loci clearly confers increased risk per se. In contrast, somatic mutations identified within the glioma tissue define tumor genotype, thus representing valid diagnostic and prognostic markers. Thus, genetic features can be used in glioma classification and guided therapy. Such copious genomic variabilities are screened routinely in glioma diagnosis. In detail, Sanger sequencing or pyrosequencing, fluorescence in-situ hybridization, and microsatellite analyses were added to immunohistochemistry as diagnostic markers. Recently, Next Generation Sequencing was set up as an all-in-one diagnostic tool aimed at detecting both DNA copy number variations and mutations in gliomas. This approach is also widely used to detect circulating tumor DNA within cerebrospinal fluid from patients affected by primary brain tumors. Such an approach provides an alternative cost-effective strategy to genotype all gliomas, which avoids surgical tissue collection and repeated tumor biopsies. This review summarizes available molecular features that represent solid tools for the genetic diagnosis of gliomas at present or in the near future.


2020 ◽  
Vol 14 (4) ◽  
pp. 671-685 ◽  
Author(s):  
Mieke R. Van Bockstal ◽  
Marie Colombe Agahozo ◽  
Ronald van Marion ◽  
Peggy N. Atmodimedjo ◽  
Hein F. B. M. Sleddens ◽  
...  

2020 ◽  
Vol 38 (15_suppl) ◽  
pp. e21530-e21530
Author(s):  
Dongcheng Liu ◽  
Shixuan Li ◽  
Jiaqing Wang ◽  
Xuefeng Sun ◽  
Guofeng Li ◽  
...  

e21530 Background: Tumor heterogeneity is an important characteristic of malignant tumors that reflects the high complexity and diversity in the process of evolution. It can lead to differences in tumor growth rate, invasion, metastasis, drug sensitivity and prognosis. At present, there exist few genome-wide studies on heterogeneous cases of multiple nodules for patients with non-small-cell lung cancer (NSCLC). Methods: In this study, we performed genetic profiling using a combination of targeted panel and whole-exome sequencing (WES) on all lesions of three NSCLC patients with multiple nodules. Results: Driver gene alterations were detected in all lesion samples of the three patients. One patient presented with three nodules, which harbored EGFR L858R, an ERBB2 exon 20 insertion and an EML4-ALK fusion. Evolutionary analysis showed strong heterogeneity among the three lesions. The three lesions of patient 2 showed EGFR L858R in two lesions (2A and 2C), and an ERBB2 exon 20 insertion in the other (2B). Mutational signatures and cluster analysis of somatic mutations also showed commonality between samples 2A and 2C, suggesting that they might arise from the same clone. Patient 3 had seven lesions. Analysis of somatic mutations and copy number variations revealed a possible route of metastasis. Conclusions: Detailed analysis of genetic mutation characteristics and biological evolution trees showed that there were substantial heterogeneity and distinct evolutionary relationships among the different lesions of the same individual. Consequently, such information could aid in distinguishing primary from metastatic tumor lesions and guide the selection of treatment options.

