Influence of neighboring small sequence variants on functional impact prediction

2019 ◽  
Author(s):  
Jan-Simon Baasner ◽  
Dakota Howard ◽  
Boas Pucker

Abstract: Once a suitable reference sequence is generated, genomic differences within a species are often assessed by re-sequencing. Variant calling processes can reveal all differences between two strains, accessions, genotypes, or individuals. These variants can be enriched with predictions about their functional implications based on available structural annotations, i.e., gene models. Although these functional impact predictions on a per-variant basis are often accurate, some challenging cases require the simultaneous incorporation of multiple adjacent variants into the prediction process. Examples are neighboring variants that modify each other's functional impact. The Neighborhood-Aware Variant Impact Predictor (NAVIP) considers all variants within a given protein-coding sequence when predicting the functional consequences. As a proof of concept, variants between the Arabidopsis thaliana accessions Columbia-0 and Niederzenz-1 were annotated. NAVIP is freely available on GitHub: https://github.com/bpucker/NAVIP.
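To illustrate why neighboring variants can modify each other's predicted impact, the following minimal sketch compares a per-variant prediction with a joint prediction for two substitutions in the same codon. It is not NAVIP's implementation; the function names, the single-codon example, and the reduced codon table are assumptions made only for this illustration.

```python
# Reduced codon table: only the codons needed for this illustration (assumption).
CODON_TABLE = {"GAA": "E", "TAA": "*", "GCA": "A", "TCA": "S"}


def apply_snvs(cds, snvs):
    """Return a copy of cds with {0-based position: alternative base} applied."""
    bases = list(cds)
    for pos, alt in snvs.items():
        bases[pos] = alt
    return "".join(bases)


def translate(cds):
    """Translate a coding sequence codon by codon using the reduced table."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


reference = "GAA"            # hypothetical codon encoding glutamate (E)
variant_1 = {0: "T"}         # alone: GAA -> TAA, a premature stop codon
variant_2 = {1: "C"}         # alone: GAA -> GCA, alanine (A)

alone_1 = translate(apply_snvs(reference, variant_1))
alone_2 = translate(apply_snvs(reference, variant_2))
# Considered together: GAA -> TCA, serine (S), so no stop codon is introduced.
joint = translate(apply_snvs(reference, {**variant_1, **variant_2}))
```

Evaluated on its own, the first substitution looks like a premature stop codon; evaluated together with its neighbor, the codon encodes an amino acid exchange instead.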

2016 ◽  
Vol 2 ◽  
pp. e71
Author(s):  
Justin Bedo ◽  
Benjamin Goudey ◽  
Jeremy Wazny ◽  
Zeyu Zhou

While traditional methods for calling variants across whole genome sequence data rely on alignment to an appropriate reference sequence, alternative techniques are needed when a suitable reference does not exist. We present a novel alignment- and assembly-free variant calling method based on information theoretic principles, designed to detect variants that have strong statistical evidence for their ability to segregate samples in a given dataset. Our method uses the context surrounding a particular nucleotide to define variants. Given a set of reads, we model the probability of observing a given nucleotide conditioned on the surrounding prefix and suffix of length k as a multinomial distribution. We then estimate which of these contexts are stable intra-sample and varying inter-sample using a statistic based on the Kullback–Leibler divergence. The utility of the variant calling method was evaluated through analysis of a pair of bacterial datasets and a mouse dataset. We found that our variants are highly informative for supervised learning tasks, with performance similar to standard reference-based calls and another reference-free method (DiscoSNP++). Comparisons against reference-based calls showed that our method was able to capture very similar population structure on the bacterial dataset. The algorithm's focus on discriminatory variants makes it suitable for many common analysis tasks for organisms that are too diverse to be mapped back to a single reference sequence.
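The idea of scoring a context by how strongly its nucleotide distribution differs between samples can be sketched as follows. This is not the authors' statistic: the pseudocount smoothing, the comparison against a pooled distribution, and all function names are assumptions chosen to keep the example short.

```python
import math

BASES = "ACGT"


def nucleotide_distribution(counts, pseudocount=1.0):
    """Smoothed multinomial estimate of the base observed within one context."""
    total = sum(counts.get(b, 0) for b in BASES) + pseudocount * len(BASES)
    return {b: (counts.get(b, 0) + pseudocount) / total for b in BASES}


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) over the four bases."""
    return sum(p[b] * math.log(p[b] / q[b]) for b in BASES)


def context_score(per_sample_counts):
    """Mean divergence of each sample's distribution from the pooled one."""
    pooled = {b: sum(c.get(b, 0) for c in per_sample_counts) for b in BASES}
    q = nucleotide_distribution(pooled)
    divergences = [kl_divergence(nucleotide_distribution(c), q)
                   for c in per_sample_counts]
    return sum(divergences) / len(divergences)


# Hypothetical read counts for one context (same prefix and suffix) in four samples.
segregating = [{"A": 30}, {"A": 29, "C": 1}, {"G": 30}, {"G": 31}]
uninformative = [{"A": 30}, {"A": 28}, {"A": 31}, {"A": 29}]

score_segregating = context_score(segregating)
score_uninformative = context_score(uninformative)
```

A context whose central base differs between groups of samples receives a higher score than one in which every sample shows the same base.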


Hypertension ◽  
2014 ◽  
Vol 64 (suppl_1) ◽  
Author(s):  
Bruno Cerrato ◽  
Oscar Carretero ◽  
Hernán Grecco ◽  
Mariela M Gironacci

G protein-coupled receptors (R) exist as homo- or hetero-oligomers, which is essential for receptor function. Since bradykinin (BK) actions were blocked by a Mas R antagonist and angiotensin-(1-7) [Ang-(1-7)] responses disappeared when the BK receptor B2 was blocked, we hypothesized that Mas and B2 Rs on the plasma membrane may interact through hetero-oligomer formation. Our aim was to investigate the existence of heteromerization between Mas and B2 Rs by the fluorescence resonance energy transfer (FRET) technique and the functional consequences of this oligomer formation. HEK293T cells were transfected with the coding sequence for Mas R fused to YFP and B2 R fused to CFP. After 48 h, cells were incubated in the absence or presence of 1 μM Ang-(1-7) or BK for 15 min, and the interaction between Mas and B2 R was evaluated by FRET. Functional consequences of this interaction were determined by ligand binding assays. A positive FRET signal was observed in cells cotransfected with MasR-YFP and B2R-CFP, suggesting that Mas and B2 Rs interact constitutively by forming a hetero-oligomer. This hetero-oligomer was not altered by the agonists, because FRET was not modified when the cells were stimulated with BK or Ang-(1-7). Ang-(1-7) or BK induced internalization of this hetero-oligomer into early endosomes, since MasR-YFP or B2R-CFP colocalized with Rab-5, an early endosome marker, after ligand stimulation. When cells transfected with MasR-YFP plus B2R-CFP were stimulated with Ang-(1-7), there was a decrease of 82±6% in Mas R and 58±4% in B2 R present in the plasma membrane. Conversely, when cells transfected with MasR-YFP plus B2R-CFP were stimulated with BK, there was a decrease of 91±4% in B2 R and 53±3% in Mas R in the plasma membrane. This result demonstrates that, in cells co-expressing both receptors, selective stimulation of one of the GPCRs promotes co-internalization of both receptors.
We conclude that Mas and B2 Rs constitutively interact through hetero-oligomer formation at the plasma membrane, which may explain the cross-talk between Ang-(1-7) and BK. This hetero-oligomer is internalized upon stimulation with either Ang-(1-7) or BK, leading to a decrease in the number of Rs present in the membrane.


2015 ◽  
Vol 1 ◽  
pp. e33 ◽  
Author(s):  
Elisha D. Roberson

CRISPR/Cas9 is emerging as one of the most-used methods of genome modification in organisms ranging from bacteria to human cells. However, the efficiency of editing varies tremendously from site to site. A recent report identified a novel motif, called the 3′GG motif, which substantially increases the efficiency of editing at all sites tested in C. elegans. Furthermore, the report highlighted that previously published gRNAs with high editing efficiency also had this motif. I designed a Python command-line tool, ngg2, to identify 3′GG gRNA sites from indexed FASTA files. As a proof-of-concept, I screened for these motifs in six model genomes: Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, Danio rerio, Mus musculus, and Homo sapiens. I also scanned the genomes of pig (Sus scrofa) and African elephant (Loxodonta africana) to demonstrate the utility in non-model organisms. I identified more than 60 million single match 3′GG motifs in these genomes. Greater than 61% of all protein coding genes in the reference genomes had at least one unique 3′GG gRNA site overlapping an exon. In particular, more than 96% of mouse and 93% of human protein coding genes have at least one unique, overlapping 3′GG gRNA. These identified sites can be used as a starting point in gRNA selection, and the ngg2 tool provides an important ability to identify 3′GG editing sites in any species with an available genome sequence.
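A motif scan of this kind can be sketched with a regular expression. The block below is not the ngg2 implementation and does not reproduce its command-line interface; it assumes, for illustration only, that a 3′GG site is a 20-nt protospacer ending in GG that is immediately followed by an NGG PAM, and the function names are hypothetical.

```python
import re

# Assumed site definition: 18 bases + GG (protospacer), then an NGG PAM.
# The lookahead lets overlapping sites be reported.
SITE = re.compile(r"(?=([ACGT]{18}GG)[ACGT]GG)")
COMPLEMENT = str.maketrans("ACGT", "TGCA")
SITE_LENGTH = 23  # 20-nt protospacer plus 3-nt PAM


def reverse_complement(seq):
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_3gg_sites(seq):
    """Return (strand, 0-based forward-strand start, protospacer) tuples."""
    seq = seq.upper()
    sites = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for match in SITE.finditer(strand_seq):
            start = match.start()
            if strand == "-":
                # Convert the reverse-strand offset to forward coordinates.
                start = len(seq) - (match.start() + SITE_LENGTH)
            sites.append((strand, start, match.group(1)))
    return sites


# Hypothetical 23-nt sequence containing exactly one forward-strand site.
example = "A" * 18 + "GG" + "TGG"
example_sites = find_3gg_sites(example)
```

Bases other than A, C, G, and T (for example N) never match, so sites spanning ambiguous positions are skipped in this sketch.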


2016 ◽  
Author(s):  
Héctor Climente-González ◽  
Eduard Porta-Pardo ◽  
Adam Godzik ◽  
Eduardo Eyras

Summary: Alternative splicing changes are frequently observed in cancer and are starting to be recognized as important signatures for tumor progression and therapy. However, their functional impact and relevance to tumorigenesis remain mostly unknown. We carried out a systematic analysis to characterize the potential functional consequences of alternative splicing changes in thousands of tumor samples. This analysis revealed that a subset of alternative splicing changes affect protein domain families that are frequently mutated in tumors and potentially disrupt protein-protein interactions in cancer-related pathways. Moreover, there was a negative correlation between the number of these alternative splicing changes in a sample and the number of somatic mutations in drivers. We propose that a subset of the alternative splicing changes observed in tumors may represent independent oncogenic processes that could be relevant to explain the functional transformations in cancer, and that some of them could potentially be considered alternative splicing drivers (AS-drivers).


2020 ◽  
Vol 2020 ◽  
pp. 1-16
Author(s):  
Javad Behroozi ◽  
Shirin Shahbazi ◽  
Mohammad Reza Bakhtiarizadeh ◽  
Habibollah Mahmoodzadeh

RNA editing is a posttranscriptional nucleotide modification in humans. Of the various types of RNA editing, the adenosine-to-inosine substitution, which is mediated by the ADAR family enzymes, is the most widespread in higher eukaryotes. Inosine is recognized by the biological machinery as guanosine; therefore, editing could have substantial functional effects throughout the genome. RNA editing could contribute to cancer either by exclusive editing of tumor suppressor/promoting genes or by introducing transcriptomic diversity to promote cancer progression. Here, we provided a comprehensive overview of the RNA editing sites in gastric adenocarcinoma and highlighted some of their possible contributions to gastric cancer. RNA-seq data corresponding to 8 gastric adenocarcinoma samples and their paired nontumor counterparts were retrieved from the GEO database. After preprocessing and variant calling steps, a stringent filtering pipeline was employed to distinguish potential RNA editing sites from SNPs. The identified potential editing sites were annotated and compared with those in the DARNED database. In total, 12,362 high-confidence adenosine-to-inosine RNA editing sites were detected across all samples. Of these, 12,105 and 257 were known and novel editing events, respectively. These editing sites were unevenly distributed across genomic regions, and nearly half of them were located in 3′ UTRs. Our results revealed that 4,868 editing sites were common to both normal and cancer tissues. Of the remaining sites, 3,985 and 3,509 were exclusive to normal and cancer tissues, respectively. Further analysis revealed a significant number of differentially edited events among these sites, which were located in protein coding genes and microRNAs. Given the distinct pattern of RNA editing in gastric adenocarcinoma and adjacent normal tissue, edited sites have the potential to serve as diagnostic biomarkers and therapeutic targets in gastric cancer.
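The site counts reported in this abstract can be cross-checked with a few lines of arithmetic. The variable names below are only labels for the published numbers; nothing beyond the figures stated in the abstract is assumed.

```python
# Counts as reported in the abstract.
total_sites = 12362
known_sites, novel_sites = 12105, 257
common_sites, normal_only, cancer_only = 4868, 3985, 3509

# Both partitions of the detected sites should add up to the reported total.
known_plus_novel = known_sites + novel_sites
tissue_partition = common_sites + normal_only + cancer_only

# Share of sites not previously catalogued, as a fraction of all sites.
novel_fraction = novel_sites / total_sites
```

Both sums equal the reported total of 12,362 sites, and novel events make up roughly 2% of all detected sites.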


Blood ◽  
2013 ◽  
Vol 122 (21) ◽  
pp. 1535-1535
Author(s):  
Becky Pinjou Tsai ◽  
Liang Li ◽  
Min Li ◽  
Patrick Halliday ◽  
Anastasia Rosales ◽  
...  

Abstract: Therapy-related myelodysplasia/acute myeloid leukemia (t-MDS) is a lethal complication of cytotoxic cancer treatment. The incidence of t-MDS is particularly high in patients undergoing autologous hematopoietic cell transplantation (aHCT) for Hodgkin lymphoma (HL) or non-Hodgkin lymphoma (NHL). To better understand the pathogenetic mechanisms underlying development of t-MDS, we are prospectively following a cohort of patients undergoing aHCT for lymphoma at our center. In previous studies using samples from this cohort, we identified altered expression of genes related to mitochondrial function in pre-aHCT samples from lymphoma patients who later developed t-MDS (cases) compared to patients from the cohort who did not develop t-MDS (controls) (Cancer Cell 2011 20:591). Altered mitochondrial gene expression was associated with increased levels of mitochondrial reactive oxygen species (ROS) in samples from patients who subsequently developed t-MDS. These results suggest a possible role for mitochondrial dysfunction and increased ROS in the pathogenesis of t-MDS. Somatic mutations of mitochondrial DNA (mtDNA) are observed at increased frequency in many malignant conditions. Somatic mtDNA mutations may reflect increased susceptibility to mutagenesis and could contribute to altered mitochondrial function in patients with t-MDS. Here we sought to investigate the role of mitochondrial genomic instability in t-MDS development by analyzing the mutation profile of mtDNA in hematopoietic cells from pre-HCT samples obtained from cases that developed t-MDS and controls that did not develop t-MDS after aHCT for lymphoma. We isolated myeloid and lymphoid cell populations from t-MDS cases (n=13) and controls (n=18) using flow cytometry, amplified mtDNA by PCR using specific primers, and sequenced pooled, barcoded amplicons using next generation sequencing on an Illumina HiSeq instrument.
Sequences were aligned to the revised Cambridge Reference Sequence (rCRS) in the MITOMAP database. We did not identify significant differences in the abundance of variations between controls and cases in protein coding genes (control 13.9, case 14.9), the hypervariable region (control 7.7, case 6.6), or rRNA/tRNA genes (control 5.7, case 5.6). In addition, we did not observe differences in the abundance of variation between cases and controls in individual genes. We also did not detect significant differences in the abundance of SNPs resulting in non-synonymous amino acid changes in lymphoid compared to myeloid populations from either cases (lymphoid 5.0, myeloid 4.7) or controls (lymphoid 5.3, myeloid 4.3). Several new variations in both coding and noncoding regions were identified that were not previously reported in the rCRS database; these were present in greater abundance in myeloid (control 5.3, case 5.3) compared to lymphoid cells (control 1.7, case 2.2). Amongst new variations, the top 5 protein coding genes with the highest number of mutations were found in myeloid cells and included NADH dehydrogenase subunits 2, 4, and 5, and cytochrome c oxidase subunits 1 and 2. We also analyzed the specific mutations that displayed a variation frequency greater than 25% in myeloid control and case samples. Interestingly, the A10398G mutation in the NADH dehydrogenase subunit 3 (ND3) coding region, which is associated with metabolic syndrome and increased risk for breast cancer, was found in ∼30% of t-MDS myeloid case samples. In summary, our data suggest that there is increased abundance of novel variations in mtDNA in myeloid compared to lymphoid populations. Although several new, previously unreported variations were identified, we did not observe any significant differences in the type of lesions, the abundance of lesions in coding and non-coding regions, or in individual genes between cases and controls.
We conclude that changes in mtDNA sequence are not significantly different between patients who later develop t-MDS after aHCT (cases) and controls who do not develop t-MDS. These observations suggest that differences in mtDNA sequence cannot by themselves explain the altered mitochondrial function seen early during the course of development of t-MDS. Disclosures: No relevant conflicts of interest to declare.


2019 ◽  
Author(s):  
Gloria M. Sheynkman ◽  
Katharine S. Tuttle ◽  
Elizabeth Tseng ◽  
Jason G. Underwood ◽  
Liang Yu ◽  
...  

Abstract: Most human protein-coding genes are expressed as multiple isoforms. This in turn greatly expands the functional repertoire of the encoded proteome. While at least one reliable open reading frame (ORF) model has been assigned for every gene, the majority of alternative isoforms remain uncharacterized experimentally. This is primarily due to: i) vast differences in overall expression levels between different isoforms expressed from common genes, and ii) the difficulty of obtaining contiguous full-length ORF sequences. Here, we present ORF Capture-Seq (OCS), a flexible and cost-effective method that addresses both challenges for targeted full-length isoform sequencing applications using collections of cloned ORFs as probes. As proof-of-concept, we show that an OCS pipeline focused on genes coding for transcription factors increases isoform detection by an order of magnitude compared to an unenriched sample. In short, OCS enables rapid discovery of isoforms from custom-selected genes and will allow mapping of the full set of human isoforms at reasonable cost.


2021 ◽  
Vol 118 (11) ◽  
pp. e2016274118 ◽  
Author(s):  
Julia V. Halo ◽  
Amanda L. Pendleton ◽  
Feichen Shen ◽  
Aurélien J. Doucet ◽  
Thomas Derrien ◽  
...  

Technological advances have allowed improvements in genome reference sequence assemblies. Here, we combined long- and short-read sequence resources to assemble the genome of a female Great Dane dog. This assembly has improved continuity compared to the existing Boxer-derived (CanFam3.1) reference genome. Annotation of the Great Dane assembly identified 22,182 protein-coding gene models and 7,049 long noncoding RNAs, including 49 protein-coding genes not present in the CanFam3.1 reference. The Great Dane assembly spans the majority of sequence gaps in the CanFam3.1 reference and illustrates that 2,151 gaps overlap the transcription start site of a predicted protein-coding gene. Moreover, a subset of the resolved gaps, which have an 80.95% median GC content, localize to transcription start sites and recombination hotspots more often than expected by chance, suggesting the stable canine recombinational landscape has shaped genome architecture. Alignment of the Great Dane and CanFam3.1 assemblies identified 16,834 deletions and 15,621 insertions, as well as 2,665 deletions and 3,493 insertions located on secondary contigs. These structural variants are dominated by retrotransposon insertion/deletion polymorphisms and include 16,221 dimorphic canine short interspersed elements (SINECs) and 1,121 dimorphic long interspersed element-1 sequences (LINE-1_Cfs). Analysis of sequences flanking the 3′ end of LINE-1_Cfs (i.e., LINE-1_Cf 3′-transductions) suggests multiple retrotransposition-competent LINE-1_Cfs segregate among dog populations. Consistent with this conclusion, we demonstrate that a canine LINE-1_Cf element with intact open reading frames can retrotranspose its own RNA and that of a SINEC_Cf consensus sequence in cultured human cells, implicating ongoing retrotransposon activity as a driver of canine genetic variation.

