Improving GENCODE reference gene annotation using a high-stringency proteogenomics workflow

2016 ◽  
Vol 7 (1) ◽  
Author(s):  
James C. Wright ◽  
Jonathan Mudge ◽  
Hendrik Weisser ◽  
Mitra P. Barzine ◽  
Jose M. Gonzalez ◽  
...  
2015 ◽  
Vol 26 (9-10) ◽  
pp. 366-378 ◽  
Author(s):  
Jonathan M. Mudge ◽  
Jennifer Harrow

2021 ◽  
Vol 15 (1) ◽  
Author(s):  
Zeeshan Ahmed ◽  
Eduard Gibert Renart ◽  
Saman Zeeshan ◽  
XinQi Dong

Abstract Background Genetic disposition is considered critical for identifying subjects at high risk for disease development. Investigating disease-causing genes and genes with high or low expression can support finding the root causes of uncertainties in patient care. However, independent and timely analysis of high-throughput next-generation sequencing data is still a challenge for non-computational biologists and geneticists. Results In this manuscript, we present a findable, accessible, interactive, and reusable (FAIR) bioinformatics platform, i.e., GVViZ (visualizing genes with disease-causing variants). GVViZ is a user-friendly, cross-platform database application for RNA-seq-driven variable and complex gene-disease data annotation and expression analysis with a dynamic heat map visualization. GVViZ has the potential to find patterns across millions of features and extract actionable information, which can support the early detection of complex disorders and the development of new therapies for personalized patient care. The execution of GVViZ is based on a set of simple instructions that users without a computational background can follow to design and perform customized data analysis. It can assimilate patients’ transcriptomics data with public, proprietary, and our in-house developed gene-disease databases to query, easily explore, and access information on gene annotation and classified disease phenotypes with greater visibility and customization. To test its performance and understand the clinical and scientific impact of GVViZ, we present GVViZ analysis for different chronic diseases and conditions, including Alzheimer’s disease, arthritis, asthma, diabetes mellitus, heart failure, hypertension, obesity, osteoporosis, and multiple cancer disorders.
The results are visualized using GVViZ and can be exported as image (PNG/TIFF) and text (CSV) files that include gene names, Ensembl (ENSG) IDs, quantified abundances, expressed transcript lengths, and annotated oncology and non-oncology diseases. Conclusions We emphasize that automated and interactive visualization should be an indispensable component of modern RNA-seq analysis, which is currently not the case. However, experts in clinics and researchers in life sciences can use GVViZ to visualize and interpret transcriptomics data, making it a powerful tool to study the dynamics of gene expression and regulation. Furthermore, with successful deployment in clinical settings, GVViZ has the potential to enable high-throughput correlations between patient diagnoses based on clinical and transcriptomics data.


Insects ◽  
2021 ◽  
Vol 12 (5) ◽  
pp. 396
Author(s):  
Natrada Mitpuangchon ◽  
Kwan Nualcharoen ◽  
Singtoe Boonrotpong ◽  
Patamarerk Engsontia

Many animal species can produce venom for defense, predation, and competition. The venom usually contains diverse peptide and protein toxins, including neurotoxins, proteolytic enzymes, protease inhibitors, and allergens. Some drugs for cancer, neurological disorders, and analgesics were developed based on animal toxin structures and functions. Several caterpillar species possess venoms that cause varying effects on humans both locally and systemically. However, toxins from only a few species have been investigated, limiting the full understanding of the Lepidoptera toxin diversity and evolution. We used the RNA-seq technique to identify toxin genes from the stinging nettle caterpillar, Parasa lepida (Cramer, 1799). We constructed a transcriptome from caterpillar urticating hairs and reported 34,968 unique transcripts. Using our toxin gene annotation pipeline, we identified 168 candidate toxin genes, including protease inhibitors, proteolytic enzymes, and allergens. The 21 P. lepida novel Knottin-like peptides, which do not show sequence similarity to any known peptide, have predicted 3D structures similar to tarantula, scorpion, and cone snail neurotoxins. We highlighted the importance of convergent evolution in the Lepidoptera toxin evolution and the possible mechanisms. This study opens a new path to understanding the hidden diversity of Lepidoptera toxins, which could be a fruitful source for developing new drugs.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Fei Xiong ◽  
Xiangyun Cheng ◽  
Chao Zhang ◽  
Roland Manfred Klar ◽  
Tao He

Abstract Background Reverse transcription quantitative real-time polymerase chain reaction (RT-qPCR) remains one of the best-established techniques to assess gene expression patterns. However, appropriate reference gene selection remains a critical and challenging subject, because inappropriate reference gene selection can distort results and lead to false interpretations. To date, opinions remain mixed on how to choose the optimal reference gene sets in accordance with the Minimum Information for Publication of Quantitative Real-Time PCR Experiments (MIQE) guidelines. Therefore, the purpose of this study was to investigate which schemes were the most feasible for the identification of reference genes in a bone and cartilage bioengineering experimental setting. In this study, rat bone mesenchymal stem cells (rBMSCs), skeletal muscle tissue and adipose tissue were utilized, undergoing either chondrogenic or osteogenic induction, to investigate the optimal reference gene set identification scheme that would subsequently ensure stable and accurate interpretation of gene expression in bone and cartilage bioengineering. Results The stability and pairwise variance of eight candidate reference genes were analyzed using geNorm. The V0.15- vs. Vmin-based normalization scheme in rBMSCs had no significant effect on the eventual normalization of target genes. In terms of the muscle tissue, the results of the correlation of NF values between the V0.15 and Vmin schemes and the variance of target gene expression levels generated by these two schemes showed that different schemes do indeed have a significant effect on the eventual normalization of target genes.
Three selection schemes were adopted in terms of the adipose tissue, including the three optimal reference genes (Opt3), V0.20 and Vmin schemes, and the analysis of NF values with eventual normalization of target genes showed that the different selection schemes also have a significant effect on the eventual normalization of target genes. Conclusions Based on these results, the proposed cut-off value of Vn/n+1 under 0.15, according to the geNorm algorithm, should be considered with caution. For cell-only experiments, at least with rBMSCs, a Vn/n+1 under 0.15 is sufficient in RT-qPCR studies. However, when using certain tissue types such as skeletal muscle and adipose tissue, the minimum Vn/n+1 should be used instead, as this provides a far superior mode of generating accurate gene expression results. We thus recommend that when the stability and variation of candidate reference genes in a specific study are unclear, the minimum Vn/n+1 should always be used, as this ensures the best and most accurate gene expression value is achieved during RT-qPCR assays.
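
The pairwise variation Vn/n+1 discussed in this abstract can be illustrated with a short sketch. It assumes the commonly described geNorm definitions: the normalization factor (NF) of a sample is the geometric mean of the relative quantities of the n top-ranked reference genes, and Vn/n+1 is the standard deviation, across samples, of the log2-transformed ratio NFn/NFn+1. The function names, the input layout, and the example data are hypothetical, and the stability ranking of the genes is taken as given rather than computed. This is not the geNorm software itself.

```python
import math
from statistics import stdev


def geometric_mean(values):
    """Geometric mean of positive numbers, computed via logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def normalization_factors(quantities, genes):
    """Per-sample NF: geometric mean over the chosen reference genes.

    quantities maps gene name -> list of relative quantities, one per sample
    (hypothetical input layout).
    """
    n_samples = len(quantities[genes[0]])
    return [
        geometric_mean([quantities[g][s] for g in genes])
        for s in range(n_samples)
    ]


def pairwise_variation(quantities, ranked_genes, n):
    """V(n/n+1): SD across samples of log2(NF_n / NF_{n+1})."""
    nf_n = normalization_factors(quantities, ranked_genes[:n])
    nf_n1 = normalization_factors(quantities, ranked_genes[:n + 1])
    return stdev(math.log2(a / b) for a, b in zip(nf_n, nf_n1))


def compare_schemes(quantities, ranked_genes, cutoff=0.15):
    """Return all V values, the first n below the cut-off, and the n of minimum V.

    How the minimum-V index maps onto the final number of reference genes
    is an assumption left to the reader; the abstract does not specify it.
    """
    v = {
        n: pairwise_variation(quantities, ranked_genes, n)
        for n in range(2, len(ranked_genes))
    }
    first_below = next((n for n in sorted(v) if v[n] < cutoff), None)
    n_at_min = min(v, key=v.get)
    return v, first_below, n_at_min


# Made-up relative quantities for four genes in three samples.
example = {
    "A": [1.0, 1.0, 1.0],
    "B": [1.0, 1.0, 1.0],
    "C": [1.0, 8.0, 1.0],
    "D": [1.0, 1.0, 1.0],
}
v_values, n_cutoff, n_min = compare_schemes(example, ["A", "B", "C", "D"])
```

Comparing `n_cutoff` with `n_min` on real data shows whether the fixed 0.15 threshold and the minimum-V scheme would select different reference gene sets, which is the situation the abstract reports for muscle and adipose tissue.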


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Michael F. Z. Wang ◽  
Madhav Mantri ◽  
Shao-Pei Chou ◽  
Gaetano J. Scuderi ◽  
David W. McKellar ◽  
...  

Abstract Conventional scRNA-seq expression analyses rely on the availability of a high-quality genome annotation. Yet, as we show here with scRNA-seq experiments and analyses spanning human, mouse, chicken, mole rat, lemur and sea urchin, genome annotations are often incomplete, in particular for organisms that are not routinely studied. To overcome this hurdle, we created a scRNA-seq analysis routine that recovers biologically relevant transcriptional activity beyond the scope of the best available genome annotation by performing scRNA-seq analysis on any region in the genome for which transcriptional products are detected. Our tool generates a single-cell expression matrix for all transcriptionally active regions (TARs), performs single-cell TAR expression analysis to identify biologically significant TARs, and then annotates TARs using gene homology analysis. This procedure uses single-cell expression analyses as a filter to direct annotation efforts to biologically significant transcripts and thereby uncovers biology to which scRNA-seq would otherwise be in the dark.
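
The idea of looking for transcriptional activity outside the existing annotation can be illustrated with a toy interval sketch. It is not the authors' tool: the abstract does not say how TARs are called, so the gap-based merging, the half-open coordinates on a single chromosome, the `max_gap` parameter, and all function names are assumptions made purely for illustration.

```python
def merge_intervals(intervals, max_gap=0):
    """Merge half-open (start, end) intervals separated by at most max_gap bases."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start - merged[-1][1] <= max_gap:
            # Close enough to the previous region: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(m) for m in merged]


def overlaps(a, b):
    """True if two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def unannotated_regions(covered, annotated_genes, max_gap=500):
    """Candidate active regions that overlap no annotated gene.

    covered: intervals with read coverage (hypothetical input).
    annotated_genes: gene intervals from the existing annotation.
    """
    regions = merge_intervals(covered, max_gap)
    return [
        r for r in regions
        if not any(overlaps(r, g) for g in annotated_genes)
    ]


# Made-up coordinates: two nearby covered blocks inside a gene, one far outside.
covered = [(100, 200), (250, 300), (5000, 5200)]
genes = [(0, 400)]
candidates = unannotated_regions(covered, genes, max_gap=100)
```

In a pipeline of the kind the abstract describes, regions like those in `candidates` would then be quantified per cell and passed on to expression analysis and homology-based annotation.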


Author(s):  
E-Ming Rau ◽  
Inga Marie Aasen ◽  
Helga Ertesvåg

Abstract Thraustochytrids are oleaginous marine eukaryotic microbes currently used to produce the essential omega-3 fatty acid docosahexaenoic acid (DHA, C22:6 n-3). To improve the production of this essential fatty acid by strain engineering, it is important to deeply understand how thraustochytrids synthesize fatty acids. While DHA is synthesized by a dedicated enzyme complex, other fatty acids are probably synthesized by the fatty acid synthase, followed by desaturases and elongases. Which unsaturated fatty acids are produced differs between different thraustochytrid genera and species; for example, Aurantiochytrium sp. T66, but not Aurantiochytrium limacinum SR21, synthesizes palmitoleic acid (C16:1 n-7) and vaccenic acid (C18:1 n-7). How strain T66 can produce these fatty acids has not been known, because BLAST analyses suggest that strain T66 does not encode any Δ9-desaturase-like enzyme. However, it does encode one Δ12-desaturase-like enzyme. In this study, the latter enzyme was expressed in A. limacinum SR21, and both C16:1 n-7 and C18:1 n-7 could be detected in the transgenic cells. Our results show that this desaturase, annotated T66Des9, is a Δ9-desaturase accepting C16:0 as a substrate. Phylogenetic studies indicate that the corresponding gene probably has evolved from a Δ12-desaturase-encoding gene. This possibility has not been reported earlier and is important to consider when one tries to deduce the potential a given organism has for producing unsaturated fatty acids based on its genome sequence alone. Key points • In thraustochytrids, automatic gene annotation does not always explain the fatty acids produced. • T66Des9 is shown to synthesize palmitoleic acid (C16:1 n-7). • T66des9 has probably evolved from Δ12-desaturase-encoding genes.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Penghua Gao ◽  
Hao Zhang ◽  
Huijun Yan ◽  
Qigang Wang ◽  
Bo Yan ◽  
...  

Abstract Background Rose is an important economic crop in horticulture. However, its field growth and postharvest quality are negatively affected by grey mould disease caused by Botrytis c. However, it is unclear how rose plants defend themselves against this fungal pathogen. Here, we used transcriptomic, metabolomic and VIGS analyses to explore the mechanism of resistance to Botrytis c. Result In this study, a protein activity analysis revealed a significant increase in defence enzyme activities in infected plants. RNA-Seq of plants infected for 0 h, 36 h, 60 h and 72 h produced a total of 54 GB of clean reads. Among these reads, 3990, 5995 and 8683 differentially expressed genes (DEGs) were found in CK vs. T36, CK vs. T60 and CK vs. T72, respectively. Gene annotation and cluster analysis of the DEGs revealed a variety of defence responses to Botrytis c. infection, including resistance (R) proteins, MAPK cascade reactions, plant hormone signal transduction pathways, plant-pathogen interaction pathways, Ca2+ and disease resistance-related genes. qPCR verification showed the reliability of the transcriptome data. The PTRV2-RcTGA1-infected plant material showed improved susceptibility of rose to Botrytis c. A total of 635 metabolites were detected in all samples, which could be divided into 29 groups. Metabonomic data showed that a total of 59, 78 and 74 DEMs were obtained for T36, T60 and T72 (T36: Botrytis c. inoculated rose flowers at 36 h; T60: Botrytis c. inoculated rose flowers at 60 h; T72: Botrytis c. inoculated rose flowers at 72 h) compared to CK, respectively. A variety of secondary metabolites are related to biological disease resistance, including tannins, amino acids and derivatives, and alkaloids, among others; they were significantly increased and enriched in phenylpropanoid biosynthesis, glucosinolates and other disease resistance pathways. This study provides a theoretical basis for breeding new cultivars that are resistant to Botrytis c. 
Conclusion Fifty-four GB of clean reads were generated through RNA-Seq. R proteins, ROS signalling, Ca2+ signalling, MAPK signalling, and SA signalling were activated in the Old Blush response to Botrytis c. RcTGA1 positively regulates rose resistance to Botrytis c. A total of 635 metabolites were detected in all samples. DEMs were enriched in phenylpropanoid biosynthesis, glucosinolates and other disease resistance pathways.

