AGFusion: annotate and visualize gene fusions

2016 ◽  
Author(s):  
Charlie Murphy ◽  
Olivier Elemento

Abstract Summary The discovery of novel gene fusions in tumor samples has rapidly accelerated with the rise of next-generation sequencing. A growing number of tools enable discovery of gene fusions from RNA-seq data. However, it is likely that not all gene fusions drive tumors. Assessing the potential functional consequences of a fusion is critical to understanding its driver role. It is also challenging, as gene fusions are described by chromosomal breakpoint coordinates that need to be translated into an actual amino acid fusion sequence and a predicted domain architecture of the fusion protein. Currently there are no easy-to-use tools that can automatically reconstruct and visualize fusion proteins from genomic breakpoints. To facilitate the functional interpretation of gene fusions, we developed AGFusion, available as an online web tool that can be readily used by non-computational researchers as well as a Python package that can be built into computational pipelines. With minimal input from the user, AGFusion predicts the cDNA, CDS, and protein sequences of all gene fusion products based on all combinations of gene isoforms. For protein-coding fusions, AGFusion can annotate and visualize the protein domain architecture. AGFusion currently supports Homo sapiens (genome builds GRCh37 and GRCh38) and Mus musculus (genome build GRCm38), and new genomes can easily be added. Availability The AGFusion Python package is freely available at https://github.com/murphycj/AGFusion under the MIT license. The AGFusion web app is available at http://agfusion.info
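The step from breakpoints to a fusion protein sequence can be illustrated with a minimal Python sketch. This is not the AGFusion API: the function names `fuse_cds` and `translate` are hypothetical, and the junctions are assumed to be given already as offsets into each partner's CDS rather than as chromosomal coordinates (mapping genomic breakpoints onto transcripts requires a gene annotation and is omitted here).

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds):
    """Translate a CDS up to the first stop codon or the last complete codon."""
    protein = []
    for pos in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE[cds[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def fuse_cds(cds5, junction5, cds3, junction3):
    """Join the first `junction5` bases of the 5' partner CDS to the
    3' partner CDS starting at offset `junction3` (hypothetical helper).

    Returns (fused_cds, in_frame). The fusion is in frame when the 3'
    partner keeps its original codon phase after the junction.
    """
    fused = cds5[:junction5] + cds3[junction3:]
    in_frame = junction5 % 3 == junction3 % 3
    return fused, in_frame


# Toy sequences, invented for illustration only.
cds_a = "ATGGCCAAATGA"      # 5' partner
cds_b = "ATGCCCGGGTTTTAA"   # 3' partner
fused, in_frame = fuse_cds(cds_a, 6, cds_b, 3)
print(in_frame, translate(fused))
```

Repeating the call for every pair of isoforms would give the "all combinations" behaviour the abstract describes; an out-of-frame junction still yields a sequence, but the 3' partner's residues are no longer preserved.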

2020 ◽  
Author(s):  
Kari Salokas ◽  
Rigbe G. Weldatsadik ◽  
Markku Varjosalo

ABSTRACT Oncogenic gene fusions are estimated to account for up to 20% of cancer morbidity. Originally, oncofusions were identified in blood cancer, but recently multiple sequence-level studies of cancer genomes have established oncofusions throughout all tissue types. However, the functional implications of the identified oncofusions have often not been investigated. In this study, the identified oncofusions from a fusion detection approach (DEEPEST) were analyzed in more detail. In total, DEEPEST contains 28863 unique fusions. From sequence analysis, we found that almost 30% of them (8225) are expected to produce functional fusion proteins with features from both parent genes. Kinases and transcription factors were found to be the main gene families of the protein-producing fusions. Considering their role as initiators, actors, and termination points of cellular signaling pathways, we focused our in-depth analyses on them. The domain architecture of the fusions, as well as of their expected interactors, suggests that the abnormal molecular context of intact protein domains brought about by fusion events may unlock the oncogenic potential of the wild-type counterparts of the fusion proteins. To understand the overall effects of oncofusions on cellular signaling, we performed differential expression analysis using TCGA cancer project samples. Results indicated oncofusion-specific alterations in expression levels of individual genes, and overall lowering of the expression levels of key cellular pathways, such as signal transduction, proteolysis, microtubule cytoskeleton organization, and in particular regulation of transcription. The sum of our results suggests that kinase and transcription factor oncofusions globally deregulate cellular signaling, possibly via acquiring novel functions.


2021 ◽  
Author(s):  
Lambert Moyon ◽  
Camille Berthelot ◽  
Alexandra Louis ◽  
Nga Thi Thuy Nguyen ◽  
Hugues Roest Crollius

Whole genome sequencing (WGS) is increasingly used to diagnose medical conditions of genetic origin. While both coding and non-coding DNA variants contribute to a wide range of diseases, most patients who receive a WGS-based diagnosis today harbour a protein-coding mutation. Functional interpretation and prioritization of non-coding variants represents a persistent challenge, and disease-causing non-coding variants remain largely unidentified. Depending on the disease, WGS fails to identify a candidate variant in 20-80% of patients, severely limiting the usefulness of sequencing for personalised medicine. Here we present FINSURF, a machine-learning approach to predict the functional impact of non-coding variants in regulatory regions. FINSURF outperforms state-of-the-art methods, owing to control optimisation during training. In addition to ranking candidate variants, FINSURF also delivers diagnostic information on functional consequences of mutations. We applied FINSURF to a diverse set of 30 diseases with described causative non-coding mutations, and correctly identified the disease-causative non-coding variant within the ten top hits in 22 cases. FINSURF is implemented as an online server as well as custom browser tracks, and provides a quick and efficient solution to prioritize candidate non-coding variants in realistic clinical settings.


2017 ◽  
Vol 11 ◽  
pp. 117793221771247 ◽  
Author(s):  
Larissa Catharina ◽  
Carlyle Ribeiro Lima ◽  
Alexander Franca ◽  
Ana Carolina Ramos Guimarães ◽  
Marcelo Alves-Ferreira ◽  
...  

We present an approach for detecting enzymes that are specific to Leishmania major compared with Homo sapiens and provide targets that may assist research in drug development. This approach is based on traditional techniques of sequence homology comparison by similarity search and Markov modeling; it integrates the characterization of enzymatic functionality, secondary and tertiary protein structures, protein domain architecture, and metabolic environment. From 67 enzymes represented by 42 enzymatic activities classified by AnEnPi (Analogous Enzymes Pipeline) as specific to L major compared with H sapiens, only 40 (23 Enzyme Commission [EC] numbers) could actually be considered as strictly specific to L major, and 27 enzymes (19 EC numbers) were disregarded for having ambiguous homologies or analogies with H sapiens. Among the 40 strictly specific enzymes, we identified sterol 24-C-methyltransferase, pyruvate phosphate dikinase, trypanothione synthetase, and RNA-editing ligase as 4 essential enzymes for L major that may serve as targets for drug development.


2018 ◽  
Author(s):  
Sarah Klass ◽  
Matthew J. Smith ◽  
Tahoe Fiala ◽  
Jessica Lee ◽  
Anthony Omole ◽  
...  

Herein, we describe a new series of fusion proteins that have been developed to self-assemble spontaneously into stable micelles that are 27 nm in diameter after enzymatic cleavage of a solubilizing protein tag. The sequences of the proteins are based on a human intrinsically disordered protein, which has been appended with a hydrophobic segment. The micelles were found to form across a broad range of pH, ionic strength, and temperature conditions, with critical micelle concentration (CMC) values below 1 µM being observed in some cases. The reported micelles were found to solubilize hydrophobic metal complexes and organic molecules, suggesting their potential suitability for catalysis and drug delivery applications.


Insects ◽  
2021 ◽  
Vol 12 (7) ◽  
pp. 652
Author(s):  
Hongwei Tan ◽  
Muhammad Naeem ◽  
Hussain Ali ◽  
Muhammad Shakeel ◽  
Haiou Kuang ◽  
...  

In Pakistan, Apis cerana, the Asian honeybee, has been used for honey production and pollination services. However, its genomic makeup and phylogenetic relationship with those in other countries are still unknown. We collected A. cerana samples from the main cerana-keeping region in Pakistan and performed whole genome sequencing. A total of 28 Gb of Illumina shotgun reads were generated, which were used to assemble the genome. The obtained genome assembly had a total length of 214 Mb, with a GC content of 32.77%. The assembly had a scaffold N50 of 2.85 Mb and a BUSCO completeness score of 99%, suggesting a remarkably complete genome sequence for A. cerana in Pakistan. A MAKER pipeline was employed to annotate the genome sequence, and a total of 11,864 protein-coding genes were identified. Of them, 6750 genes were assigned at least one GO term, and 8813 genes were annotated with at least one protein domain. Genome-scale phylogeny analysis indicated an unexpectedly close relationship between A. cerana in Pakistan and those in China, suggesting a potential human introduction of the species between the two countries. Our results will facilitate the genetic improvement and conservation of A. cerana in Pakistan.
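The assembly statistics quoted above (scaffold N50 and GC content) follow from simple definitions, sketched here in Python. This is an illustrative sketch, not the authors' pipeline: the function names `n50` and `gc_content` are hypothetical, N50 is taken in its usual sense (the length of the shortest scaffold in the smallest set of longest scaffolds covering at least half of the assembly), and GC content is assumed to be computed over unambiguous A, C, G and T bases only.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def gc_content(sequence):
    """Return the GC percentage over unambiguous bases (N and gaps ignored)."""
    sequence = sequence.upper()
    counted = sum(sequence.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0
    return 100.0 * (sequence.count("G") + sequence.count("C")) / counted


# Toy values, invented for illustration only.
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
print(gc_content("ATGCNN"))
```

A large N50 relative to the total assembly length, as reported in the abstract, indicates that most of the genome sits in a small number of long scaffolds.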


2021 ◽  
Vol 19 (1) ◽  
Author(s):  
Indrani Datta ◽  
Houtan Noushmehr ◽  
Chaya Brodie ◽  
Laila M. Poisson

Abstract Background Clinically relevant glioma subtypes, such as the glioma-CpG island methylator phenotype (G-CIMP), have been defined by epigenetics. In this study, the role of long non-coding RNAs in association with the poor-prognosis G-CIMP-low phenotype and the good-prognosis G-CIMP-high phenotype was investigated. Functional associations of lncRNAs with mRNAs and miRNAs were examined to hypothesize influencing factors of the aggressive phenotype. Methods RNA-seq data on 250 samples from TCGA’s Pan-Glioma study, quantified for lncRNA and mRNAs (GENCODE v28), were analyzed for differential expression between G-CIMP-low and G-CIMP-high phenotypes. Functional interpretation of the differential lncRNAs was performed by Ingenuity Pathway Analysis. Spearman rank order correlation estimates between lncRNA, miRNA, and mRNA nominated differential lncRNA with a likely miRNA sponge function. Results We identified 4371 differentially expressed features (mRNA = 3705; lncRNA = 666; FDR ≤ 5%). From these, the protein-coding gene TP53 was identified as an upstream regulator of differential lncRNAs PANDAR and PVT1 (p = 0.0237) and enrichment was detected in the “development of carcinoma” (p = 0.0176). Two lncRNAs (HCG11, PART1) were positively correlated with 342 mRNAs, and their correlation estimates diminish after adjusting for either of the target miRNAs: hsa-miR-490-3p, hsa-miR-129-5p. This suggests a likely sponge function for HCG11 and PART1. Conclusions These findings identify differential lncRNAs with oncogenic features that are associated with G-CIMP phenotypes. Further investigation with controlled experiments is needed to confirm the molecular relationships.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Giovanni Scala ◽  
Antonio Federico ◽  
Dario Greco

Abstract Background The investigation of molecular alterations associated with the conservation and variation of DNA methylation in eukaryotes is gaining interest in the biomedical research community. Among the different determinants of methylation stability, the DNA composition of the regions surrounding CpGs has been shown to have a crucial role in the maintenance and establishment of methylation statuses. This aspect has been previously characterized in a quantitative manner by inspecting the nucleotide composition in the region. Research in this field still lacks a qualitative perspective, linked to the identification of certain sequences (or DNA motifs) related to particular DNA methylation phenomena. Results Here we present a novel computational strategy based on short DNA motif discovery in order to characterize sequence patterns related to aberrant CpG methylation events. We provide our framework as a user-friendly, shiny-based application, CpGmotifs, to easily retrieve and characterize DNA patterns related to CpG methylation in the human genome. Our tool supports the functional interpretation of deregulated methylation events by predicting transcription factor binding sites (TFBS) encompassing the identified motifs. Conclusions CpGmotifs is open source software. Its source code is available on GitHub at https://github.com/Greco-Lab/CpGmotifs and a ready-to-use docker image is provided on DockerHub at https://hub.docker.com/r/grecolab/cpgmotifs.


Author(s):  
Veeraya Weerawongwiwat ◽  
Seokmin Yoon ◽  
Jong-Hwa Kim ◽  
Jung-Hoon Yoon ◽  
Jung Sook Lee ◽  
...  

A Gram-stain-negative, aerobic, motile, short rod-shaped, catalase-negative and oxidase-positive bacterium, strain CAU 1568T, was isolated from marine sediment sand sampled at Sido Island in the Republic of Korea. The optimum conditions for growth were at 25–30 °C, at pH 6.5–8.5 and with 0–4.0 % (w/v) NaCl. Phylogenetic analysis based on the 16S rRNA gene sequence indicated that strain CAU 1568T was a member of the genus Photobacterium with high similarity to Photobacterium salinisoli JCM 30852T (97.7 %), Photobacterium halotolerans KACC 17089T (97.3 %) and Photobacterium galatheae LMG F28894T (97.3 %). The predominant cellular fatty acids were C16 : 0, summed feature 3 (C16 : 1 ω6c and/or C16 : 1 ω7c) and summed feature 8 (C18 : 1 ω7c and/or C18 : 1 ω6c), with Q-8 as the major isoprenoid quinone. The polar lipid profile consisted of diphosphatidylglycerol, phosphatidylglycerols, phosphatidylcholine, phosphatidylethanolamine, phospholipid, two aminophospholipids and three unidentified lipids. The whole genome size of strain CAU 1568T was 4.8 Mb with 50.1 mol% G+C content, including 38 contigs and 4233 protein-coding genes. These taxonomic data support CAU 1568T as representing a novel Photobacterium species, for which the name Photobacterium arenosum sp. nov. is proposed. The type strain of this novel species is CAU 1568T (=KCTC 82404T=MCCC 1K05668T).


2018 ◽  
Author(s):  
Alexander J. Hart ◽  
Samuel Ginzburg ◽  
Muyang (Sam) Xu ◽  
Cera R. Fisher ◽  
Nasim Rahmatpour ◽  
...  

ABSTRACT EnTAP (Eukaryotic Non-Model Transcriptome Annotation Pipeline) was designed to improve the accuracy, speed, and flexibility of functional gene annotation for de novo assembled transcriptomes in non-model eukaryotes. This software package addresses the fragmentation and related assembly issues that result in inflated transcript estimates and poor annotation rates, while focusing primarily on protein-coding transcripts. Following filters applied through assessment of true expression and frame selection, open-source tools are leveraged to functionally annotate the translated proteins. Downstream features include fast similarity search across three repositories, protein domain assignment, orthologous gene family assessment, and Gene Ontology term assignment. The final annotation integrates across multiple databases and selects an optimal assignment from a combination of weighted metrics describing similarity search score, taxonomic relationship, and informativeness. Researchers have the option to include additional filters to identify and remove contaminants, identify associated pathways, and prepare the transcripts for enrichment analysis. This fully featured pipeline is easy to install and configure, and runs significantly faster than comparable annotation packages. EnTAP is optimized to generate extensive functional information for the gene space of organisms with limited or poorly characterized genomic resources.


2015 ◽  
Vol 1 ◽  
pp. e33 ◽  
Author(s):  
Elisha D. Roberson

CRISPR/Cas9 is emerging as one of the most-used methods of genome modification in organisms ranging from bacteria to human cells. However, the efficiency of editing varies tremendously site-to-site. A recent report identified a novel motif, called the 3′GG motif, which substantially increases the efficiency of editing at all sites tested in C. elegans. Furthermore, they highlighted that previously published gRNAs with high editing efficiency also had this motif. I designed a Python command-line tool, ngg2, to identify 3′GG gRNA sites from indexed FASTA files. As a proof-of-concept, I screened for these motifs in six model genomes: Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, Danio rerio, Mus musculus, and Homo sapiens. I also scanned the genomes of pig (Sus scrofa) and African elephant (Loxodonta africana) to demonstrate the utility in non-model organisms. I identified more than 60 million single match 3′GG motifs in these genomes. Greater than 61% of all protein coding genes in the reference genomes had at least one unique 3′GG gRNA site overlapping an exon. In particular, more than 96% of mouse and 93% of human protein coding genes have at least one unique, overlapping 3′GG gRNA. These identified sites can be used as a starting point in gRNA selection, and the ngg2 tool provides an important ability to identify 3′GG editing sites in any species with an available genome sequence.
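The kind of scan described above can be sketched in a few lines of Python. This is not the ngg2 implementation or its interface: the name `find_3gg_sites` is hypothetical, and the sketch assumes that a 3′GG site is a 20-nt protospacer ending in GG that is immediately followed by an NGG PAM, searched on both strands of an in-memory sequence rather than an indexed FASTA file.

```python
import re

# 18 arbitrary bases + GG (the assumed 3'GG protospacer), then an NGG PAM.
# The lookahead lets overlapping sites be reported.
SITE = re.compile(r"(?=([ACGT]{18}GG)[ACGT]GG)")
SITE_LENGTH = 23  # 20-nt protospacer plus 3-nt PAM
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return sequence.translate(COMPLEMENT)[::-1]


def find_3gg_sites(sequence):
    """Return (start, strand, protospacer) for each candidate site.

    `start` is the 0-based leftmost forward-strand coordinate of the
    23-nt site; the protospacer is reported 5'->3' on its own strand.
    """
    sequence = sequence.upper()
    sites = []
    for match in SITE.finditer(sequence):
        sites.append((match.start(), "+", match.group(1)))
    for match in SITE.finditer(reverse_complement(sequence)):
        # Convert the reverse-strand offset back to forward coordinates.
        start = len(sequence) - (match.start() + SITE_LENGTH)
        sites.append((start, "-", match.group(1)))
    return sites


# Toy sequence, invented for illustration only.
toy = "A" * 18 + "GG" + "TGG"
print(find_3gg_sites(toy))
```

The sketch reports every candidate and does not check that a protospacer is unique in the genome; the "single match" and "unique" counts quoted in the abstract would need an additional genome-wide uniqueness filter.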

