Non-Coding Variants in Cancer: Mechanistic Insights and Clinical Potential for Personalized Medicine

2021 ◽  
Vol 7 (3) ◽  
pp. 47
Author(s):  
Marios Lange ◽  
Rodiola Begolli ◽  
Antonis Giakountis

The cancer genome is characterized by extensive variability, in the form of Single Nucleotide Polymorphisms (SNPs) or structural variations such as Copy Number Alterations (CNAs) across wider genomic areas. At the molecular level, most SNPs and/or CNAs reside in non-coding sequences, ultimately affecting the regulation of oncogenes and/or tumor suppressors in a cancer-specific manner. Notably, inherited non-coding variants can predispose to cancer decades prior to disease onset. Furthermore, accumulation of additional non-coding driver mutations during progression of the disease gives rise to genomic instability, acting as the driving force of neoplastic development and malignant evolution. Therefore, detection and characterization of such mutations can improve risk assessment for healthy carriers and expand the diagnostic and therapeutic toolbox for the patient. This review focuses on functional variants that reside in transcribed or non-transcribed non-coding regions of the cancer genome and presents a collection of appropriate state-of-the-art methodologies to study them.

2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Qingbo S. Wang ◽  
David R. Kelley ◽  
Jacob Ulirsch ◽  
Masahiro Kanai ◽  
Shuvom Sadhuka ◽  
...  

The large majority of variants identified by GWAS are non-coding, motivating detailed characterization of the function of non-coding variants. Experimental methods to assess variants’ effects on gene expression in native chromatin context via direct perturbation are low-throughput. Existing high-throughput computational predictors have thus lacked large gold-standard sets of regulatory variants for training and validation. Here, we leverage a set of 14,807 putative causal eQTLs in humans obtained through statistical fine-mapping, and we use 6121 features to directly train a predictor of whether a variant modifies nearby gene expression. We call the resulting prediction the expression modifier score (EMS). We validate EMS by comparing its ability to prioritize functional variants with other major scores. We then use EMS as a prior for statistical fine-mapping of eQTLs to identify an additional 20,913 putatively causal eQTLs, and we incorporate EMS into co-localization analysis to identify 310 additional candidate genes across UK Biobank phenotypes.
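As a rough illustration of what "using a functional score as a prior for fine-mapping" means, the sketch below computes posterior inclusion probabilities under the simplifying assumption of exactly one causal variant per locus, where each posterior is proportional to prior times Bayes factor. This is not the authors' pipeline: the function name, the Bayes factors, and the EMS-like prior values are all hypothetical.

```python
def posterior_inclusion_probabilities(bayes_factors, priors=None):
    """Single-causal-variant fine-mapping (simplified sketch).

    PIP_i = prior_i * BF_i / sum_j(prior_j * BF_j)
    With priors=None, every variant gets the same prior weight.
    """
    n = len(bayes_factors)
    if n == 0:
        raise ValueError("need at least one variant")
    if priors is None:
        priors = [1.0 / n] * n  # uniform prior
    if len(priors) != n:
        raise ValueError("priors and bayes_factors must have equal length")

    weights = [p * bf for p, bf in zip(priors, bayes_factors)]
    total = sum(weights)
    if total <= 0:
        raise ValueError("prior-weighted evidence must be positive")
    return [w / total for w in weights]


# Hypothetical locus: two variants in tight LD have identical statistical evidence.
bayes_factors = [10.0, 10.0, 1.0]

# Uniform prior: the two tied variants cannot be separated.
uniform_pips = posterior_inclusion_probabilities(bayes_factors)

# Hypothetical functional prior (EMS-like, normalized to sum to 1)
# favouring the first variant breaks the tie.
functional_prior = [0.6, 0.2, 0.2]
informed_pips = posterior_inclusion_probabilities(bayes_factors, functional_prior)
```

With the uniform prior the two tied variants share the posterior mass equally; the functional prior shifts mass toward the first variant, which is how an informative prior can raise additional variants above a causal-probability threshold.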


2022 ◽  
Vol 13 (1) ◽  
Author(s):  
John K. L. Wong ◽  
Christian Aichmüller ◽  
Markus Schulze ◽  
Mario Hlevnjak ◽  
Shaymaa Elgaafary ◽  
...  

Cancer-driving mutations are difficult to identify, especially in the non-coding part of the genome. Here, we present sigDriver, an algorithm dedicated to calling driver mutations. Using 3813 whole-genome sequenced tumors from the International Cancer Genome Consortium, The Cancer Genome Atlas Program, and a childhood pan-cancer cohort, we employ mutational signatures based on single-base substitutions in the context of tri- and penta-nucleotide motifs for hotspot discovery. Knowledge-based annotations on mutational hotspots reveal enrichment in coding regions and regulatory elements for 6 mutational signatures, including APOBEC and somatic hypermutation signatures. APOBEC activity is associated with 32 hotspots, of which 11 are known and 11 are putative regulatory drivers. Somatic single nucleotide variant clusters detected at hypermutation-associated hotspots are distinct from translocations or gene amplifications. Patients carrying APOBEC-induced PIK3CA driver mutations show lower occurrence of signature SBS39. In summary, sigDriver uncovers mutational processes associated with known and putative tumor drivers and hotspots, particularly in the non-coding regions of the genome.
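To make "single-base substitutions in the context of tri- and penta-nucleotide motifs" concrete, the following sketch labels a substitution with its flanking bases using the common pyrimidine-centred convention (purine reference bases are reported on the opposite strand). It is an illustrative helper only, not code from sigDriver; the function names and the toy sequences are assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
VALID_BASES = set("ACGT")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def sbs_channel(reference, position, alt, flank=1):
    """Label a substitution with its sequence context, e.g. 'T[C>T]A'.

    reference: DNA string; position: 0-based index of the mutated base;
    alt: the alternate base; flank=1 gives a trinucleotide context,
    flank=2 a pentanucleotide context.
    """
    reference = reference.upper()
    alt = alt.upper()
    if position - flank < 0 or position + flank >= len(reference):
        raise ValueError("not enough flanking sequence")
    ref = reference[position]
    if ref not in VALID_BASES or alt not in VALID_BASES or alt == ref:
        raise ValueError("not a valid single-base substitution")

    context = reference[position - flank: position + flank + 1]
    if ref in "AG":
        # Purine reference: describe the same event on the opposite strand
        # so that the mutated base is always C or T.
        context = reverse_complement(context)
        ref = context[flank]
        alt = alt.translate(COMPLEMENT)
    return f"{context[:flank]}[{ref}>{alt}]{context[flank + 1:]}"


tri = sbs_channel("ATCAG", 2, "T")                 # C>T flanked by T and A
tri_purine = sbs_channel("ATGAG", 2, "A")          # G>A, reported as C>T
penta = sbs_channel("ATCAG", 2, "G", flank=2)      # pentanucleotide context
```

Counting these labels per tumor yields the substitution spectrum on which signature analyses operate; T[C>T]A, for instance, falls within the TCN contexts commonly attributed to APOBEC activity.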


10.1038/10290 ◽  
1999 ◽  
Vol 22 (3) ◽  
pp. 231-238 ◽  
Author(s):  
Michele Cargill ◽  
David Altshuler ◽  
James Ireland ◽  
Pamela Sklar ◽  
Kristin Ardlie ◽  
...  

2019 ◽  
Vol 97 (3) ◽  
pp. 381
Author(s):  
Jorge Ricaño-Rodríguez ◽  
Enrique Hipólito-Romero ◽  
José M. Ramos-Prado ◽  
Eliezer Cocoletzi-Vásquez

Background: Single nucleotide polymorphisms (SNPs) have been identified in Theobroma cacao through a genotyping-by-sequencing approach. This research shares for the first time a set of results on the genetic variability and the nature of conserved coding regions in reduced nucleotide sequences of native Mexican varieties of cocoa.
Hypothesis: Obtaining reduced genomes of T. cacao specimens using restriction enzymes (REs) allows the characterization of single nucleotide polymorphisms (SNPs) as well as conserved coding regions (CDs).
Species of study and dates: Theobroma cacao L. (Malvaceae)
Study site: Theobroma cacao twigs came from traditional agroforestry plots located in the municipalities of Cardenas, Huimanguillo, Comalcalco, Paraiso, Jalpa de Mendez and Cunduacan, Tabasco, as well as Ixtacomitan and Pichucalco, Chiapas, Mexico; they were collected and grafted between May and June 2018.
Methods: A genotyping-by-sequencing method for the characterization of biobanks was developed. Filtering of raw sequences, genome assembly, identification of SNPs, molecular taxonomic characterization, characterization of coding regions, and minimum-evolution analysis of protein transcripts were performed.
Results: Theobroma cacao samples showed different SNP percentages (2–11 %), and the molecular evolution analyses suggested similar maximum compound probabilities with respect to their phylogeny. Conserved sequences were observed in the genomes' coding regions, which suggests heuristic ontological predictions that have been evolutionarily regrouped into five clusters related to transcription processes and secondary metabolism.
Conclusions: The GBS method allows the identification of SNPs in cocoa. The characterization of reduced genomes determined the structural and transcriptional correlation between the samples and the reference genome of cacao Criollo.


2020 ◽  
Author(s):  
Qingbo S. Wang ◽  
David R. Kelley ◽  
Jacob Ulirsch ◽  
Masahiro Kanai ◽  
Shuvom Sadhuka ◽  
...  

The large majority of variants identified by GWAS are non-coding, motivating detailed characterization of the function of non-coding variants. Experimental methods to assess variants’ effects on gene expression in native chromatin context via direct perturbation are low-throughput. Existing high-throughput computational predictors have thus lacked large gold-standard sets of regulatory variants for training and validation. Here, we leverage a set of 14,807 putative causal eQTLs in humans obtained through statistical fine-mapping, and we use 6,121 features to directly train a predictor of whether a variant modifies nearby gene expression. We call the resulting prediction the expression modifier score (EMS). We validate EMS by comparing its ability to prioritize functional variants with other major scores. We then use EMS as a prior for statistical fine-mapping of eQTLs to identify an additional 20,913 putatively causal eQTLs, and we incorporate EMS into co-localization analysis to identify 310 additional candidate genes across UK Biobank phenotypes.


Author(s):  
Neha Rajput ◽  
Gagandeep Kaur Gahlay

ZP2, an important component of the zona matrix, surrounds mammalian oocytes and facilitates fertilization. Recently, some studies have documented the association of mutations in genes encoding the zona matrix with infertility in human females. Single nucleotide polymorphisms are the most common type of genetic variation observed in a population and, as per the dbSNP database, around 5,152 SNPs are reported to exist in the human ZP2 (hZP2) gene. Although a wide range of computational tools is publicly available, no computational studies have been done to date to identify and analyze the structural and functional effects of deleterious SNPs on hZP2. In this study, we conducted a comprehensive in silico analysis of all the SNPs found in hZP2. Six different computational tools, including SIFT and PolyPhen-2, predicted 18 common nsSNPs as deleterious, of which 12 were predicted to most likely affect the structural/functional properties. These were present either in the N-terminal region crucial for sperm-zona interaction or in the zona domain. Thirty-one additional SNPs in both coding and non-coding regions were also identified. Interestingly, some of these SNPs have been found in infertile females in recent studies.


2021 ◽  
Author(s):  
Mauro Lúcio Ferreira Souza ◽  
Jaime Viana de Sousa ◽  
João Farias Guerreiro

Single nucleotide polymorphisms (SNPs) in the first intron of the FTO gene (alpha-ketoglutarate-dependent dioxygenase) identified by a genome-wide association study (GWAS) in 2007 continue to be the known variants with the greatest effect on adiposity in different human populations. Currently available data reveal a total of 61 different intronic SNPs associated with adiposity. Coding variants in the FTO gene, on the other hand, have been little explored, but data from complete sequencing of the exomes of various populations are available in public databases and provide an excellent opportunity to investigate potential functional variants in FTO. This study aimed to track nonsynonymous variants in the exons of the FTO gene in different population groups using the ExAC database (gnomAD) (http://exac.broadinstitute.org/) and to analyze the potential functional impact of these variants on the FTO protein. Variants were analyzed using five publicly available pathogenicity prediction programs. Of the 158 mutations identified (152 missense and 6 stop-gain), 64 (40.5%) were classified as pathogenic, 67 (42.4%) were classified as benign, and 27 (17%) were classified as inconclusive. Thirty variants were classified as pathogenic by all five predictors used in this study, and 16 mutations were classified as pathogenic by only one predictor. The largest number of mutations was found in Europeans (non-Finnish) (85/158), all with very low frequencies, and half (32/64) of the variants classified as pathogenic by the five predictors used were also found in this population. The data obtained in this analysis show that a large number of rare coding variants classified as pathogenic or potentially pathogenic by different in silico pathogenicity prediction programs are not detected by GWAS due to the low linkage disequilibrium as well as the limitations of GWAS in capturing rare variants present in less than 1.0% of the population.


2020 ◽  
Vol 36 (12) ◽  
pp. 3637-3644 ◽  
Author(s):  
Mark F Rogers ◽  
Tom R Gaunt ◽  
Colin Campbell

Motivation: Next-generation sequencing technologies have accelerated the discovery of single nucleotide variants in the human genome, stimulating the development of predictors for classifying which of these variants are likely functional in disease, and which neutral. Recently, we proposed CScape, a method for discriminating between cancer driver mutations and presumed benign variants. For the neutral class, this method relied on benign germline variants found in the 1000 Genomes Project database. Discrimination could, therefore, be influenced by the distinction of germline versus somatic, rather than neutral versus disease driver. This motivates this article in which we consider predictive discrimination between recurrent and rare somatic single point mutations based solely on using cancer data, and the distinction between these two somatic classes and germline single point mutations.
Results: For somatic point mutations in coding and non-coding regions of the genome, we propose CScape-somatic, an integrative classifier for predictively discriminating between recurrent and rare variants in the human cancer genome. In this study, we use purely cancer genome data and investigate the distinction between minimal occurrence and significantly recurrent somatic single point mutations in the human cancer genome. We show that this type of predictive distinction can give novel insight, and may deliver more meaningful prediction in both coding and non-coding regions of the cancer genome. Tested on somatic mutations, CScape-somatic outperforms alternative methods, reaching 74% balanced accuracy in coding regions and 69% in non-coding regions, whereas even higher accuracy may be achieved using thresholds to isolate high-confidence predictions.
Availability and implementation: Predictions and software are available at http://CScape-somatic.biocompute.org.uk/.
Contact: [email protected] or [email protected]
Supplementary information: Supplementary data are available at Bioinformatics online.
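For readers unfamiliar with the metric quoted above, balanced accuracy is the mean of sensitivity and specificity, which keeps a classifier from looking good merely by predicting the majority class. The sketch below is a generic illustration with made-up labels (1 = recurrent, 0 = rare); it is not taken from the CScape-somatic software.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (recall on class 1) and specificity (recall on class 0)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def accuracy(y_true, y_pred):
    """Plain fraction of correct predictions, for comparison."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Made-up, imbalanced example: 10 recurrent (1) and 90 rare (0) mutations.
y_true = [1] * 10 + [0] * 90

# A degenerate classifier that always predicts "rare".
always_rare = [0] * 100
plain = accuracy(y_true, always_rare)              # high despite learning nothing
balanced = balanced_accuracy(y_true, always_rare)  # chance level

# A classifier that finds 7 of 10 recurrent and 72 of 90 rare mutations.
mixed = [1] * 7 + [0] * 3 + [0] * 72 + [1] * 18
balanced_mixed = balanced_accuracy(y_true, mixed)
```

In this toy setting the always-rare classifier scores 90% plain accuracy but only 50% balanced accuracy, which is why balanced accuracy is the more informative figure when the two classes differ greatly in size.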

