OncodriveCLUSTL: a sequence-based clustering method to identify cancer drivers

2018 ◽  
Author(s):  
Claudia Arnedo-Pac ◽  
Loris Mularoni ◽  
Ferran Muiños ◽  
Abel Gonzalez-Perez ◽  
Nuria Lopez-Bigas

Abstract. Summary: The identification of the genomic alterations driving tumorigenesis is one of the main goals in oncogenomics research. Given the evolutionary principles of cancer development, computational methods that detect signals of positive selection in the pattern of tumor mutations have been effectively applied in the search for cancer genes. One of these signals is the abnormal clustering of mutations, which has been shown to be complementary to other signals in the detection of driver genes. We have developed OncodriveCLUSTL, a new sequence-based clustering algorithm to detect significant clustering signals across genomic regions. OncodriveCLUSTL is based on a local background model derived from the simulation of mutations accounting for the composition of tri- or penta-nucleotide context substitutions observed in the cohort under study. Our method is able to identify known clusters and bona fide cancer drivers across cohorts of tumor whole exomes, outperforming the existing OncodriveCLUST algorithm and complementing other methods based on different signals of positive selection. We show that OncodriveCLUSTL may be applied to the analysis of non-coding genomic elements and non-human mutation data. Availability and implementation: OncodriveCLUSTL is available as an installable Python 3.5 package. The source code and running examples are freely available at https://bitbucket.org/bbglab/oncodriveclustl under the GNU Affero General Public License.

2019 ◽  
Vol 35 (22) ◽  
pp. 4788-4790 ◽  
Author(s):  
Claudia Arnedo-Pac ◽  
Loris Mularoni ◽  
Ferran Muiños ◽  
Abel Gonzalez-Perez ◽  
Nuria Lopez-Bigas

Abstract. Motivation: Identification of the genomic alterations driving tumorigenesis is one of the main goals in oncogenomics research. Given the evolutionary principles of cancer development, computational methods that detect signals of positive selection in the pattern of tumor mutations have been effectively applied in the search for cancer genes. One of these signals is the abnormal clustering of mutations, which has been shown to be complementary to other signals in the detection of driver genes. Results: We have developed OncodriveCLUSTL, a new sequence-based clustering algorithm to detect significant clustering signals across genomic regions. OncodriveCLUSTL is based on a local background model derived from the simulation of mutations accounting for the composition of tri- or penta-nucleotide context substitutions observed in the cohort under study. Our method can identify known clusters and bona fide cancer drivers across cohorts of tumor whole exomes, outperforming the existing OncodriveCLUST algorithm and complementing other methods based on different signals of positive selection. Our results indicate that OncodriveCLUSTL can be applied to the analysis of non-coding genomic elements and non-human mutation data. Availability and implementation: OncodriveCLUSTL is available as an installable Python 3.5 package. The source code and running examples are freely available at https://bitbucket.org/bbglab/oncodriveclustl under the GNU Affero General Public License. Supplementary information: Supplementary data are available at Bioinformatics online.
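The general idea of scoring mutation clustering against a context-aware simulated background can be illustrated with a minimal Python sketch. This is not the OncodriveCLUSTL implementation: the triangular smoothing kernel, the `half_window` width, the peak-density score, the `context_probs` table and the simulation over the whole element (rather than a local window) are all simplifying assumptions made for illustration.

```python
import itertools
import random


def trinucleotide_contexts(seq):
    """Return the trinucleotide centred on each position (None at the two edges)."""
    return [seq[i - 1:i + 2] if 0 < i < len(seq) - 1 else None for i in range(len(seq))]


def cluster_score(positions, length, half_window=5):
    """Hypothetical score: peak of a triangular-kernel smoothing of mutation counts."""
    density = [0.0] * length
    for p in positions:
        for d in range(-half_window, half_window + 1):
            q = p + d
            if 0 <= q < length:
                density[q] += 1 - abs(d) / (half_window + 1)
    return max(density, default=0.0)


def clustering_pvalue(seq, positions, context_probs, n_sim=1000, seed=0):
    """Empirical p-value of the observed score against context-weighted simulations."""
    weights = [context_probs.get(c, 0.0) if c else 0.0 for c in trinucleotide_contexts(seq)]
    observed = cluster_score(positions, len(seq))
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        # Redistribute the same number of mutations, favouring more mutable contexts.
        simulated = rng.choices(range(len(seq)), weights=weights, k=len(positions))
        if cluster_score(simulated, len(seq)) >= observed:
            hits += 1
    return (hits + 1) / (n_sim + 1)


# Toy example with a hypothetical uniform context-probability table.
seq = "ACGT" * 50
uniform = {"".join(t): 1.0 for t in itertools.product("ACGT", repeat=3)}
clustered = clustering_pvalue(seq, [50, 51, 52, 53, 54], uniform)
dispersed = clustering_pvalue(seq, [10, 50, 90, 130, 170], uniform)
```

Drawing simulated positions with context weights, rather than uniformly, is what prevents mutation-prone sequence contexts from being mistaken for clusters; with a real mutational spectrum, `context_probs` would be non-uniform.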


Author(s):  
Oriol Pich ◽  
Iker Reyes-Salazar ◽  
Abel Gonzalez-Perez ◽  
Nuria Lopez-Bigas

Abstract. Mutations in genes that confer a selective advantage on hematopoietic stem cells (HSCs) under certain conditions drive clonal hematopoiesis (CH). While some CH drivers have been identified experimentally or through epidemiological studies, the compendium of all genes able to drive CH when mutated in HSCs is far from complete. We propose that identifying signals of positive selection in blood somatic mutations may be an effective way to identify CH driver genes, similar to the approach used to identify cancer genes. Using a reverse somatic variant calling approach, we repurposed whole-genome and whole-exome blood/tumor paired samples from more than 12,000 donors in two large cancer genomics cohorts to identify blood somatic mutations. Applying IntOGen, a robust driver discovery pipeline, to blood somatic mutations across both cohorts and more than 24,000 targeted-sequencing samples yielded a list of close to 70 genes with signals of positive selection in CH, available at http://www.intogen.org/ch. This approach recovers all known CH genes and discovers novel candidates. Generating this compendium is an essential step toward understanding the molecular mechanisms of CH and accurately detecting individuals with CH in order to ascertain their risk of developing related diseases.
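The "reverse" calling idea can be sketched as swapping the usual roles of the paired samples: the blood sample is treated as the case and the tumor as the control. The toy caller below is a hypothetical illustration with assumed read-count thresholds (`min_alt`, `min_vaf`, `max_ctrl_alt`); it is not the variant caller used in the study, which would model sequencing error and sample contamination.

```python
from dataclasses import dataclass


@dataclass
class SiteCounts:
    """Reference/alternate read counts at one site in the paired samples."""
    ref_blood: int
    alt_blood: int
    ref_tumor: int
    alt_tumor: int


def simple_somatic_call(ref_case, alt_case, ref_ctrl, alt_ctrl,
                        min_alt=3, min_vaf=0.02, max_ctrl_alt=0):
    """Toy somatic caller (hypothetical thresholds): supported in case, absent in control."""
    depth = ref_case + alt_case
    if depth == 0:
        return False
    return alt_case >= min_alt and alt_case / depth >= min_vaf and alt_ctrl <= max_ctrl_alt


def reverse_call(site: SiteCounts) -> bool:
    """Reverse calling: blood is the 'case' sample and the tumor is the 'control'."""
    return simple_somatic_call(site.ref_blood, site.alt_blood, site.ref_tumor, site.alt_tumor)


print(reverse_call(SiteCounts(90, 10, 100, 0)))  # variant seen only in blood
```

Because tumor samples usually contain some infiltrating blood cells, a practical pipeline would likely need a non-zero tolerance for alternate reads in the tumor; the `max_ctrl_alt` parameter is where such an assumption would go.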


2018 ◽  
Author(s):  
Felix Dietlein ◽  
Donate Weghorn ◽  
Amaro Taylor-Weiner ◽  
André Richters ◽  
Brendan Reardon ◽  
...  

Many cancer genomes contain large numbers of somatic mutations, but few of these mutations drive tumor development. Current approaches to identify cancer driver genes are largely based on mutational recurrence, i.e. they search for genes with an increased number of nonsynonymous mutations relative to the local background mutation rate. Multiple studies have noted that the sensitivity of recurrence-based methods is limited in tumors with high background mutation rates, because passenger mutations dilute their statistical power. Here, we observe that passenger mutations tend to occur in characteristic nucleotide sequence contexts, while driver mutations follow a different distribution pattern determined by the location of functionally relevant genomic positions along the protein-coding sequence. To discover new cancer genes, we searched for genes with an excess of mutations in unusual nucleotide contexts that deviate from the characteristic context around passenger mutations. By applying this statistical framework to whole-exome sequencing data from 12,004 tumors, we discovered a long tail of novel candidate cancer genes with mutation frequencies as low as 1% and with supporting functional evidence. Our results show that considering both the number of mutations and their surrounding nucleotide context helps identify novel cancer driver genes, particularly in tumors with high background mutation rates.
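The intuition that driver mutations fall in contexts that are "unusual" relative to the passenger background can be sketched as a Monte Carlo test. This is a simplified illustration, not the authors' statistical framework: the surprise score (sum of negative log background probabilities) and the sampling null are assumptions, and unlike a real method it ignores which contexts are actually available in each gene's coding sequence.

```python
import math
import random
from collections import Counter


def background_distribution(passenger_contexts):
    """Relative frequency of each nucleotide context among passenger mutations."""
    counts = Counter(passenger_contexts)
    total = sum(counts.values())
    return {ctx: n / total for ctx, n in counts.items()}


def surprise(contexts, bg, pseudo=1e-6):
    """Sum of -log P(context | background); rare contexts contribute more."""
    return sum(-math.log(bg.get(c, pseudo)) for c in contexts)


def context_pvalue(gene_contexts, bg, n_sim=2000, seed=1):
    """Probability that background-distributed mutations look at least as unusual."""
    rng = random.Random(seed)
    ctxs, probs = zip(*bg.items())
    observed = surprise(gene_contexts, bg)
    hits = 0
    for _ in range(n_sim):
        sim = rng.choices(ctxs, weights=probs, k=len(gene_contexts))
        if surprise(sim, bg) >= observed:
            hits += 1
    return (hits + 1) / (n_sim + 1)


# Hypothetical passenger spectrum dominated by TCA/TCT contexts.
passengers = ["TCA"] * 90 + ["TCT"] * 9 + ["ACG"]
bg = background_distribution(passengers)
unusual = context_pvalue(["ACG"] * 5, bg)
typical = context_pvalue(["TCA"] * 5, bg)
```

A gene whose mutations all sit in the rare `ACG` context gets a small p-value, while one matching the dominant passenger context does not, even though both carry the same number of mutations.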


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Cesim Erten ◽  
Aissa Houdjedj ◽  
Hilal Kazan

Abstract. Background: Recent cancer genomic studies have generated detailed molecular data on a large number of cancer patients. A key remaining problem in cancer genomics is the identification of driver genes. Results: We propose BetweenNet, a computational approach that integrates genomic data with a protein-protein interaction network to identify cancer driver genes. BetweenNet uses a measure based on betweenness centrality on patient-specific networks to identify so-called outlier genes, which correspond to dysregulated genes in each patient. After linking the mutated genes to the outliers through a bipartite graph, it runs a random-walk process on this graph, which yields the final prioritization of the mutated genes. We compare BetweenNet against state-of-the-art cancer gene prioritization methods on lung, breast, and pan-cancer datasets. Conclusions: Our evaluations show that BetweenNet is better at recovering known cancer genes based on multiple reference databases. Additionally, we show that the GO terms and reference pathways enriched in BetweenNet-ranked genes overlap significantly with those enriched in known cancer genes, compared with the overlaps achieved by the rankings of the alternative methods.
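The random-walk step on the bipartite mutated-gene/outlier graph can be illustrated with a random walk with restart computed by power iteration. This is a sketch under assumptions: the restart formulation, the restart probability `alpha`, the uniform outlier weights and the toy graph are illustrative choices, not BetweenNet's published parameters, and the betweenness-based outlier detection is not shown.

```python
def random_walk_with_restart(edges, restart, alpha=0.3, tol=1e-12, max_iter=10_000):
    """Stationary distribution of a random walk with restart on an undirected graph.

    edges: iterable of (u, v) pairs; restart: dict node -> non-negative weight.
    """
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    nodes = list(neighbours)
    total = sum(restart.get(n, 0.0) for n in nodes)
    if total <= 0:
        raise ValueError("restart weights must be positive on at least one node")
    r = {n: restart.get(n, 0.0) / total for n in nodes}
    p = dict(r)
    for _ in range(max_iter):
        # With probability alpha jump back to the restart set, else move to a random neighbour.
        new = {n: alpha * r[n] for n in nodes}
        for n in nodes:
            share = (1 - alpha) * p[n] / len(neighbours[n])
            for m in neighbours[n]:
                new[m] += share
        converged = sum(abs(new[n] - p[n]) for n in nodes) < tol
        p = new
        if converged:
            break
    return p


# Hypothetical bipartite graph: mutated genes (M*) linked to outlier genes (O*).
edges = [("M1", "O1"), ("M1", "O2"), ("M1", "O3"), ("M2", "O1"), ("M3", "O3")]
scores = random_walk_with_restart(edges, {"O1": 1.0, "O2": 1.0, "O3": 1.0})
ranking = sorted((g for g in scores if g.startswith("M")), key=scores.get, reverse=True)
```

Restarting from the outlier side means a mutated gene accumulates probability in proportion to how many (and how strongly weighted) outliers it connects to, so `M1` ranks first in this toy graph.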


2017 ◽  
Author(s):  
Heiko Horn ◽  
Michael S. Lawrence ◽  
Candace R. Chouinard ◽  
Yashaswi Shrestha ◽  
Jessica Xin Hu ◽  
...  

Abstract. Approaches that integrate molecular network information and tumor genome data could complement gene-based statistical tests to identify likely new cancer genes, but they are challenging to validate at scale and their predictive value remains unclear. We developed a robust statistic (NetSig) that integrates protein interaction networks and data from 4,742 tumor exomes, and used it to accurately classify known driver genes in 60% of tested tumor types and to predict 62 new candidates. We designed a quantitative experimental framework to compare the in vivo tumorigenic potential of NetSig candidates, known oncogenes and random genes in mice, showing that NetSig candidates induce tumors at rates comparable to known oncogenes and 10-fold higher than random genes. By reanalyzing nine tumor-inducing NetSig candidates in 242 patients with oncogene-negative lung adenocarcinomas, we find that two (AKT2 and TFDP2) are significantly amplified. Overall, we illustrate a scalable, integrated computational and experimental workflow to expand discovery from cancer genomes.
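A network statistic of this kind asks whether a gene's interaction partners are more significantly mutated than expected. The sketch below aggregates the partners' mutation p-values as a sum of -log10 p and compares it with randomly drawn gene sets of the same size. The aggregation and the random-neighbour null are simplifying assumptions for illustration and are not necessarily the published NetSig procedure; all gene names and p-values are hypothetical.

```python
import math
import random


def neighbour_pvalue(gene, network, pvalues, n_sim=2000, seed=2):
    """Empirical p-value for the combined mutation signal of a gene's network neighbours."""
    rng = random.Random(seed)
    neighbours = [n for n in network.get(gene, ()) if n in pvalues]
    observed = sum(-math.log10(pvalues[n]) for n in neighbours)
    pool = [g for g in pvalues if g != gene]
    hits = 0
    for _ in range(n_sim):
        # Null: an equally sized set of genes drawn at random from the whole pool.
        sample = rng.sample(pool, len(neighbours))
        if sum(-math.log10(pvalues[g]) for g in sample) >= observed:
            hits += 1
    return (hits + 1) / (n_sim + 1)


# Hypothetical per-gene mutation p-values and interaction lists.
pvalues = {f"G{i}": 0.5 for i in range(50)}
pvalues.update({"D1": 1e-8, "D2": 1e-8, "D3": 1e-8})
network = {"X": {"D1", "D2", "D3"}, "Y": {"G0", "G1", "G2"}}
p_x = neighbour_pvalue("X", network, pvalues)
p_y = neighbour_pvalue("Y", network, pvalues)
```

Gene `X`, whose partners are all highly significant, stands out even though its own mutation data are not used; a more faithful null would preserve node degrees, because hub genes have more chances to touch significant partners.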


PLoS ONE ◽  
2014 ◽  
Vol 9 (3) ◽  
pp. e91237 ◽  
Author(s):  
Cornelia Di Gaetano ◽  
Giovanni Fiorito ◽  
Maria Francesca Ortu ◽  
Fabio Rosa ◽  
Simonetta Guarrera ◽  
...  

ESC CardioMed ◽  
2018 ◽  
pp. 669-671
Author(s):  
Eric Schulze-Bahr

The human genome consists of approximately 3 billion (3 × 10⁹) base pairs of DNA (around 20,000 genes), organized as 23 pairs of chromosomes (a diploid set, one copy inherited from each parent), and a small mitochondrial genome of maternal origin (37 genes, including 13 protein-coding genes; 16,569 base pairs). Most human genetic variation is natural, that is, common or rare (minor allele frequency >0.1%), and does not cause disease. Apart from any true disease-causing (bona fide) mutation, each individual genome harbours more than 3.5 million single nucleotide variants (including >10,000 non-synonymous changes causing amino acid substitutions) and 200–300 large structural or copy number variants (insertions/deletions of up to several thousand base pairs); these variants do not cause disease and are scattered throughout coding and non-coding genomic regions.


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Felix Grassmann ◽  
Yudi Pawitan ◽  
Kamila Czene

Abstract. Genes involved in cancer are under constant evolutionary pressure, potentially resulting in diverse molecular properties. In this study, we explore 23 omic features from publicly available databases to define the molecular profile of different classes of cancer genes. Cancer genes were grouped according to mutational landscape (germline and somatically mutated genes), role in cancer initiation (cancer driver genes) or cancer survival (survival genes), and implication by genome-wide association studies (GWAS genes). For each gene, we also computed feature scores based on all omic features, effectively summarizing how closely a gene resembles cancer genes of the respective class. In general, cancer genes are longer, have a lower GC content, have more isoforms with shorter exons, are expressed in more tissues and have more transcription factor binding sites than non-cancer genes. We found that germline genes more closely resemble single-tissue GWAS genes, while somatic genes are more similar to pleiotropic cancer GWAS genes. As a proof of principle, we used aggregated feature scores to prioritize genes in breast cancer GWAS loci and found that top-ranking genes were enriched in cancer-related pathways. In conclusion, we have identified multiple omic features associated with different classes of cancer genes, which can assist the prioritization of genes in cancer gene discovery.
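One simple way to turn many omic features into a single "resemblance" score is to standardize each feature across genes and compare each gene with the average profile of a reference class. The sketch below does exactly that; it is a swapped-in technique chosen for illustration, not the scoring model used in the study, and the two features (gene length in kb and GC fraction) and all values are hypothetical.

```python
from statistics import mean, pstdev


def standardise(table):
    """Z-score each feature column across all genes. table: gene -> list of feature values."""
    genes = list(table)
    n_features = len(next(iter(table.values())))
    z = {g: [0.0] * n_features for g in genes}
    for j in range(n_features):
        col = [table[g][j] for g in genes]
        mu, sd = mean(col), pstdev(col)
        for g in genes:
            z[g][j] = (table[g][j] - mu) / sd if sd > 0 else 0.0
    return z


def feature_scores(table, reference_genes):
    """Average agreement between each gene's z-profile and the reference-class mean profile."""
    z = standardise(table)
    n_features = len(next(iter(z.values())))
    profile = [mean(z[g][j] for g in reference_genes) for j in range(n_features)]
    return {g: sum(a * b for a, b in zip(z[g], profile)) / n_features for g in z}


# Hypothetical features: [gene length (kb), GC fraction].
table = {
    "C1": [100.0, 0.38],
    "C2": [120.0, 0.40],
    "N1": [10.0, 0.55],
    "N2": [15.0, 0.60],
    "Q": [110.0, 0.39],
}
scores = feature_scores(table, reference_genes=["C1", "C2"])
```

Here the unlabeled gene `Q`, which is long and GC-poor like the reference genes, scores positively, while the short, GC-rich genes score negatively.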


2020 ◽  
Vol 49 (D1) ◽  
pp. D1289-D1301 ◽  
Author(s):  
Tao Wang ◽  
Shasha Ruan ◽  
Xiaolu Zhao ◽  
Xiaohui Shi ◽  
Huajing Teng ◽  
...  

Abstract. The prevalence of neutral mutations in cancer cell populations makes it difficult to distinguish cancer-causing driver mutations from passenger mutations. To systematically prioritize the oncogenic potential of somatic mutations and cancer genes, we constructed a platform, OncoVar (https://oncovar.org/), which employs published bioinformatics algorithms and incorporates known driver events to identify driver mutations and driver genes. We identified 20 162 cancer driver mutations, 814 driver genes and 2360 pathogenic pathways with high confidence by reanalyzing 10 769 exomes from 33 cancer types in The Cancer Genome Atlas (TCGA) and 1942 genomes from 18 cancer types in the International Cancer Genome Consortium (ICGC). OncoVar provides four points of view, ‘Mutation’, ‘Gene’, ‘Pathway’ and ‘Cancer’, to help researchers visualize the relationships between cancers and driver variants. Importantly, the identification of actionable driver alterations points to promising druggable targets and to repurposing opportunities for combination therapies. OncoVar provides a user-friendly interface for browsing, searching and downloading somatic driver mutations, driver genes and pathogenic pathways in various cancer types. This platform will facilitate the identification of cancer drivers across individual cancer cohorts and help rank mutations or genes for better decision-making among clinical oncologists, cancer researchers and the broad scientific community interested in cancer precision medicine.


Genes ◽  
2019 ◽  
Vol 10 (11) ◽  
pp. 938 ◽  
Author(s):  
Islam ◽  
Li ◽  
Liu ◽  
Berihulay ◽  
Abied ◽  
...  

Detection of selection footprints provides insight into the evolutionary process and the underlying mechanisms controlling the phenotypic diversity of traits that have been exposed to selection. Selection on particular traits often leaves the corresponding genomic regions with a loss of genetic diversity and an increased level of homozygosity. Therefore, runs of homozygosity (ROHs), homozygosity by descent (HBD), and effective population size (Ne) are effective tools for exploring genetic diversity, understanding demographic history, detecting signatures of directional selection, and improving breeding strategies to use and conserve genetic resources. We characterized the ROH, HBD, Ne, and signatures of selection of six Chinese goat populations using Illumina 50K single nucleotide polymorphism (SNP) BeadChips. Our results show an inverse relationship between the length and frequency of ROHs. Long ROHs, a higher level of inbreeding, long HBD segments, and a smaller Ne in Guangfeng (GF) goats suggest intensive selection pressure and recent inbreeding in this breed. We identified six reproduction-related genes within genomic regions with a high ROH frequency, two of which overlapped with a putative selection signature. The estimated pairwise genetic differentiation (FST) among the populations is 9.60%, and inter- and intra-population differences account for 9.68% and 89.6% of the molecular variation, respectively, indicating low to moderate genetic differentiation. Our selection signature analysis revealed 54 loci harboring 86 putative candidate genes with a strong signature of selection. Further analysis showed that several candidate genes, including MARF1, SYCP2, TMEM200C, SF1, ADCY1, and BMP5, are involved in goat fecundity. We identified 11 candidate genes using cross-population extended haplotype homozygosity (XP-EHH) estimates, of which MARF1 and SF1 are under strong positive selection, as they are differentiated between high- and low-reproduction groups according to all three approaches used. Gene ontology enrichment analysis revealed that different biological pathways could be involved in the variation of fecundity in female goats. This study provides new insight into ROH patterns for the maintenance of within-breed diversity and suggests a role for positive selection on genetic variation influencing fecundity in Chinese goats.
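Two of the quantities mentioned above, runs of homozygosity with the derived inbreeding coefficient F_ROH, and pairwise F_ST, can be illustrated with a minimal sketch. The ROH scan below is a simplified consecutive-SNP rule with hypothetical thresholds (`min_snps`, `min_length`), not the sliding-window algorithm of tools such as PLINK, and the F_ST function is Wright's per-SNP two-population formula, F_ST = (H_T − H_S) / H_T; the abstract does not state which estimators the study used.

```python
def call_roh(positions, genotypes, min_snps=20, min_length=1_000_000):
    """Runs of consecutive homozygous SNPs (genotypes coded 0/1/2; 1 = heterozygous).

    Thresholds are hypothetical; returns a list of (start_bp, end_bp) tuples.
    """
    runs = []
    start = None
    for i, g in enumerate(list(genotypes) + [1]):  # sentinel heterozygote closes the last run
        if g != 1:
            if start is None:
                start = i
        elif start is not None:
            end = i - 1
            if end - start + 1 >= min_snps and positions[end] - positions[start] >= min_length:
                runs.append((positions[start], positions[end]))
            start = None
    return runs


def f_roh(runs, autosome_length):
    """Inbreeding coefficient: fraction of the autosomal length covered by ROHs."""
    return sum(end - start for start, end in runs) / autosome_length


def wright_fst(p1, p2):
    """Per-SNP two-population F_ST = (H_T - H_S) / H_T from alternate-allele frequencies."""
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return (h_t - h_s) / h_t if h_t > 0 else 0.0


# Hypothetical SNPs every 100 kb, with a single heterozygous site at index 30.
positions = [i * 100_000 for i in range(60)]
genotypes = [0] * 15 + [2] * 15 + [1] + [2] * 29
runs = call_roh(positions, genotypes)
inbreeding = f_roh(runs, autosome_length=10_000_000)
```

For genome-wide F_ST, per-SNP values are usually combined as a ratio of summed numerators and denominators rather than by averaging the per-SNP ratios.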

