novel transcripts
Recently Published Documents


Total documents: 165 (five years: 32)
H-index: 29 (five years: 2)

2021 ◽  
Author(s):  
Luis Ferrandez-Peral ◽  
Xiaoyu Zhan ◽  
Marina Alvarez-Estape ◽  
Cristina Chiva ◽  
Paula Esteller-Cucala ◽  
...  

Transcriptomic diversity greatly contributes to the fundamentals of disease, lineage-specific biology, and environmental adaptation. However, much of the actual isoform repertoire contributing to shaping primate evolution remains unknown. Here, we combined deep long- and short-read sequencing complemented with mass spectrometry proteomics in a panel of lymphoblastoid cell lines (LCLs) from human, three other great apes, and rhesus macaque, producing the largest full-length isoform catalog in primates to date. Our transcriptomes reveal thousands of novel transcripts, some of them under active translation, expanding and completing the repertoire of primate gene models. Our comparative analyses unveil hundreds of transcriptomic innovations and isoform usage changes related to immune function and immunological disorders. The confluence of these innovations with signals of positive selection and their limited impact in the proteome points to changes in alternative splicing in genes involved in immune response as an important target of recent regulatory divergence in primates.


PeerJ ◽  
2021 ◽  
Vol 9 ◽  
pp. e12454
Author(s):  
Zehu Yuan ◽  
Ling Ge ◽  
Jingyi Sun ◽  
Weibo Zhang ◽  
Shanhe Wang ◽  
...  

Background: Nowadays, both customers and producers prefer thin-tailed fat sheep. To effectively breed for this phenotype, it is important to identify candidate genes and uncover the genetic mechanism related to tail fat deposition in sheep. Accumulating evidence suggests that post-transcriptional modification events of precursor messenger RNA (pre-mRNA), including alternative splicing (AS) and alternative polyadenylation (APA), may regulate tail fat deposition in sheep. Differentially expressed transcript (DET) analysis is a way to identify candidate genes related to tail fat deposition. However, owing to technological limitations, post-transcriptional modification events in the tail fat of sheep and DETs between thin-tailed and fat-tailed sheep remain unclear. Methods: In the present study, we applied pooled PacBio isoform sequencing (Iso-Seq) to generate transcriptomic data of tail fat tissue from six sheep (three thin-tailed sheep and three fat-tailed sheep). By comparison with the reference genome, potential gene loci and novel transcripts were identified. Post-transcriptional modification events, including AS and APA, and lncRNAs in sheep tail fat were uncovered using pooled Iso-Seq data. Combining Iso-Seq data with six RNA-sequencing (RNA-Seq) datasets, DETs between thin- and fat-tailed sheep were identified. Protein-protein interaction (PPI) network, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses were implemented to investigate the potential functions of DETs. Results: We revealed the transcriptomic complexity of the tail fat of sheep, identifying 9,001 potential novel gene loci, 17,834 AS events, 5,791 APA events, and 3,764 lncRNAs. Combining Iso-Seq data with RNA-Seq data, we identified hundreds of DETs between thin- and fat-tailed sheep. Among them, 21 differentially expressed lncRNAs, such as ENSOART00020036299, ENSOART00020033641, ENSOART00020024562, ENSOART00020003848 and 9.53.1, may regulate tail fat deposition. Many novel transcripts were identified as DETs, including 15.527.13 (DGAT2), 13.624.23 (ACSS2), 11.689.28 (ACLY), 11.689.18 (ACLY), 11.689.14 (ACLY), 11.660.12 (ACLY), 22.289.6 (SCD), 22.289.3 (SCD) and 22.289.14 (SCD). Most of the identified DETs were enriched in GO and KEGG pathways related to the extracellular matrix (ECM). Our results revealed the transcriptome complexity and identified many candidate transcripts in tail fat, which could enhance the understanding of the molecular mechanisms behind tail fat deposition.


Genes ◽  
2021 ◽  
Vol 12 (10) ◽  
pp. 1549
Author(s):  
Amit Singh ◽  
Géza Schermann ◽  
Sven Reislöhner ◽  
Nikola Kellner ◽  
Ed Hurt ◽  
...  

A correct genome annotation is fundamental for research in the field of molecular and structural biology. The annotation of the reference genome of Chaetomium thermophilum has been reported previously, but it is essentially limited to open reading frames (ORFs) of protein coding genes and contains only a few noncoding transcripts. In this study, we identified and annotated full-length transcripts of C. thermophilum by deep RNA sequencing. We annotated 7044 coding genes and 4567 noncoding genes. Astonishingly, 23% of the coding genes are alternatively spliced. We identified 679 novel coding genes as well as 2878 novel noncoding genes and corrected the structural organization of more than 50% of the previously annotated genes. Furthermore, we substantially extended the Gene Ontology (GO) and Enzyme Commission (EC) lists, which provide comprehensive search tools for potential industrial applications and basic research. The identified novel transcripts and improved annotation will help to understand the gene regulatory landscape in C. thermophilum. The analysis pipeline developed here can be used to build transcriptome assemblies and identify coding and noncoding RNAs of other species.


2021 ◽  
Author(s):  
Ridvan Eksi ◽  
Daiyao Yi ◽  
Hongyang Li ◽  
Bradley Godfrey ◽  
Lisa R. Mathew ◽  
...  

Studying isoform expression at the microscopic level has always been a challenging task. A classical example is the kidney, where the glomerular and tubulo-interstitial compartments carry out drastically different physiological functions, and thus their isoform expression presumably also differs. We aimed to develop an experimental and computational pipeline for identifying isoforms at the level of microscopic structures. We microdissected glomerular and tubulo-interstitial compartments from healthy human kidney tissues from two cohorts. The two compartments were separately sequenced with the PacBio RS II platform. These transcripts were then validated against transcripts from the same samples obtained with the traditional Illumina RNA-Seq protocol, distinct Illumina RNA-Seq short reads from European Renal cDNA Bank (ERCB) samples, and the annotated GENCODE transcript list, thus identifying novel transcripts. We identified 14,739 and 14,259 annotated transcripts, and 17,268 and 13,118 potentially novel transcripts in the glomerular and tubulo-interstitial compartments, respectively. Of note, relying solely on either short or long reads would have resulted in many erroneous identifications. We identified distinct pathways involved in the glomerular and tubulo-interstitial compartments at the isoform level. We demonstrated the possibility of micro-dissecting a tissue and incorporating both long- and short-read sequencing to identify isoforms for each compartment.


2021 ◽  
Author(s):  
Caiqiu Gao ◽  
Pei-Long Wang ◽  
Xiao-Jin Lei ◽  
Yuan-Yuan Wang ◽  
Bai-chao Liu ◽  
...  

Aim: Cadmium (Cd) pollution is widely detected in soil and has been recognized as a major environmental problem. Tamarix hispida is a woody halophyte, which can form natural forest on desert and the soil with 0.5–1% salt content, making it an ideal plant for research investigating the effects of various stresses on plants. However, no systematic study has investigated the molecular mechanism of Cd tolerance in T. hispida. Methods: In this study, the RNA-seq technique was applied to analyze the transcriptomic changes in T. hispida treated with 150 µmol L⁻¹ CdCl2 for 24, 48 and 72 h compared with control. Results: In total, 72764 unigenes exhibited similar sequences in the NR database, while 41528 unigenes (36.3% of all the unigenes) did not exhibit similar sequences, which may be new transcripts. In addition, 6778, 8282 and 8601 DEGs were detected at 24, 48 and 72 h, respectively. Functional annotation analysis indicated that many genes may be involved in several aspects of the Cd stress response, including ion bonding, signal transduction, stress sensing, hormone responses and ROS metabolism. A ThUGT gene from the abscisic acid (ABA) signaling pathway can enhance the Cd resistance ability of T. hispida by regulating the production of reactive oxygen species under Cd stress and inhibiting T. hispida absorption of Cd. Conclusion: The new transcriptome resources and data that we present in this study for T. hispida may substantially facilitate molecular studies of the mechanisms governing Cd resistance.


2021 ◽  
Vol 14 (1) ◽  
Author(s):  
Zoltán Maróti ◽  
Dóra Tombácz ◽  
István Prazsák ◽  
Norbert Moldován ◽  
Zsolt Csabai ◽  
...  

Objective: In this study, we applied two long-read sequencing (LRS) approaches, including single-molecule real-time and nanopore-based sequencing methods, to investigate the time-lapse transcriptome patterns of host gene expression as a response to Vaccinia virus infection. Transcriptomes determined using short-read sequencing approaches are incomplete because these platforms are inefficient or fail to distinguish between polycistronic RNAs, transcript isoforms, transcriptional start sites, as well as transcriptional readthroughs and overlaps. Long-read sequencing is able to read full-length nucleic acids and can therefore be used to assemble complete transcriptome atlases. Results: In this work, we identified a number of novel transcripts and transcript isoforms of Chlorocebus sabaeus. Additionally, analysis of the most abundant 768 host transcripts revealed a significant overrepresentation of the class of genes in the “regulation of signaling receptor activity” Gene Ontology annotation as a result of viral infection.


2021 ◽  
Vol 3 (3) ◽  
Author(s):  
Sébastien Riquier ◽  
Chloé Bessiere ◽  
Benoit Guibert ◽  
Anne-Laure Bouge ◽  
Anthony Boureux ◽  
...  

The huge body of publicly available RNA-sequencing (RNA-seq) libraries is a treasure trove of functional information that allows the expression of known or novel transcripts in tissues to be quantified. However, transcript quantification commonly relies on alignment methods that require substantial computational resources and processing time, and so do not scale easily to large datasets. K-mer decomposition constitutes a new way to process RNA-seq data for the identification of transcriptional signatures, as k-mers can be used to quantify gene expression accurately in a less resource-consuming way. We present the Kmerator Suite, a set of three tools designed to extract specific k-mer signatures, quantify these k-mers in RNA-seq datasets and quickly visualize large dataset characteristics. The core tool, Kmerator, produces specific k-mers for 97% of human genes, enabling the measurement of gene expression with high accuracy in simulated datasets. KmerExploR, a direct application of Kmerator, uses a set of predictor gene-specific k-mers to infer metadata including library protocol, sample features or contaminations from RNA-seq datasets. KmerExploR results are visualized through a user-friendly interface. Moreover, we demonstrate that the Kmerator Suite can be used for advanced queries targeting known or new biomarkers such as mutations, gene fusions or long non-coding RNAs for human health applications.
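The idea of a gene-specific k-mer signature described in this abstract can be illustrated with a small sketch. This is not the Kmerator implementation: the function names (`kmers`, `specific_kmers`, `count_in_reads`), the toy sequences and the choice of k = 4 are assumptions made for illustration only, and reverse complements are ignored for brevity.

```python
from collections import Counter


def kmers(seq, k):
    """Return the set of all k-length substrings of seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def specific_kmers(target, others, k):
    """k-mers of the target sequence that occur in none of the other sequences."""
    background = set()
    for seq in others:
        background |= kmers(seq, k)
    return kmers(target, k) - background


def count_in_reads(signature, reads, k):
    """Count how often each signature k-mer appears across the reads."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer in signature:
                counts[kmer] += 1
    return counts


# Toy data (hypothetical): two short "genes" and two "reads".
gene_a = "ACGTACGTGA"
gene_b = "TTGTACGTCC"
signature = specific_kmers(gene_a, [gene_b], k=4)
counts = count_in_reads(signature, ["CGTGA", "TACGT"], k=4)
```

In this toy case only the first read contributes counts, because the second read consists entirely of k-mers shared by both sequences. A real tool would work with much longer k-mers and a full reference transcriptome as background.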


BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Tejaswi Yarra ◽  
Kirti Ramesh ◽  
Mark Blaxter ◽  
Anne Hüning ◽  
Frank Melzner ◽  
...  

Background: Biomineralization by molluscs involves regulated deposition of calcium carbonate crystals within a protein framework to produce complex biocomposite structures. Effective biomineralization is a key trait for aquaculture, and animal resilience under future climate change. While many enzymes and structural proteins have been identified from the shell and in mantle tissue, understanding biomineralization is impeded by a lack of fundamental knowledge of the genes and pathways involved. In adult bivalves, shells are secreted by the mantle tissue during growth, maintenance and repair, with the repair process, in particular, amenable to experimental dissection at the transcriptomic level in individual animals. Results: Gene expression dynamics were explored in the adult blue mussel, Mytilus edulis, during experimentally induced shell repair, using the two valves of each animal as a matched treatment-control pair. Gene expression was assessed using high-resolution RNA-Seq against a de novo assembled database of functionally annotated transcripts. A large number of differentially expressed transcripts were identified in the repair process. Analysis focused on genes encoding proteins and domains identified in shell biology, using a new database of proteins and domains previously implicated in biomineralization in mussels and other molluscs. The genes implicated in repair included many otherwise novel transcripts that encoded proteins with domains found in other shell matrix proteins, as well as genes previously associated with primary shell formation in larvae. Genes with roles in intracellular signalling and maintenance of membrane resting potential were among the loci implicated in the repair process. While haemocytes have been proposed to be actively involved in repair, no evidence was found for this in the M. edulis data. Conclusions: The shell repair experimental model and a newly developed shell protein domain database efficiently identified transcripts involved in M. edulis shell production. In particular, the matched pair analysis allowed factoring out of much of the inherent high level of variability between individual mussels. This snapshot of the damage repair process identified a large number of genes putatively involved in biomineralization from initial signalling, through calcium mobilization to shell construction, providing many novel transcripts for future in-depth functional analyses.


2021 ◽  
Author(s):  
Sebastien Riquier ◽  
Chloe Bessiere ◽  
Benoit Guibert ◽  
Anne-Laure Bouge ◽  
Anthony Boureux ◽  
...  

The huge body of publicly available RNA-seq libraries is a treasure trove of functional information that allows the expression of known or novel transcripts in tissues to be quantified. However, transcript quantification commonly relies on alignment methods that require substantial computational resources and processing time, and so do not scale easily to large datasets. K-mer decomposition constitutes a new way to process RNA-seq data for the identification of transcriptional signatures, as k-mers can be used to quantify gene expression accurately in a less resource-consuming way. We present the Kmerator Suite, a set of three tools designed to extract specific k-mer signatures, quantify these k-mers in RNA-seq datasets and quickly visualize large dataset characteristics. The core tool, Kmerator, produces specific k-mers for 97% of human genes, enabling the measurement of gene expression with high accuracy in simulated datasets. KmerExploR, a direct application of Kmerator, uses a set of predictor gene-specific k-mers to infer metadata including library protocol, sample features or contaminations from RNA-seq datasets. KmerExploR results are visualised through a user-friendly interface. Moreover, we demonstrate that the Kmerator Suite can be used for advanced queries targeting known or new biomarkers such as mutations, gene fusions or long non-coding RNAs for human health applications.


2021 ◽  
Vol 12 ◽  
Author(s):  
Julian Droste ◽  
Christian Rückert ◽  
Jörn Kalinowski ◽  
Mohamed Belal Hamed ◽  
Jozef Anné ◽  
...  

Streptomyces lividans TK24 is a relevant Gram-positive soil-inhabiting bacterium and one of the model organisms of the genus Streptomyces. It is known for its potential to produce secondary metabolites, antibiotics, and other industrially relevant products. S. lividans TK24 is the plasmid-free derivative of S. lividans 66 and a close genetic relative of the strain Streptomyces coelicolor A3(2). In this study, we used transcriptome and proteome data to improve the annotation of the S. lividans TK24 genome. The RNA-seq data of primary 5′-ends of transcripts were used to determine transcription start sites (TSS) in the genome. We identified 5,424 TSS, of which 4,664 were assigned to annotated CDS and ncRNAs, 687 to antisense transcripts distributed between 606 CDS and their UTRs, 67 to tRNAs, and 108 to novel transcripts and CDS. Using the TSS data, the promoter regions and their motifs were analyzed in detail, revealing a conserved -10 (TAnnnT) and a weakly conserved -35 region (nTGACn). The analysis of the 5′ untranslated regions (UTRs) of S. lividans TK24 revealed 17% leaderless transcripts. Several cis-regulatory elements, such as riboswitches or attenuator structures, could be detected in the 5′-UTRs. The S. lividans TK24 transcriptome contains at least 929 operons. The genome harbors 27 secondary metabolite gene clusters, of which 26 could be shown to be transcribed under at least one of the applied conditions. Comparison of the reannotated genome with that of the strain Streptomyces coelicolor A3(2) revealed a high degree of similarity. This study presents an extensive reannotation of the S. lividans TK24 genome based on transcriptome and proteome analyses. The analysis of TSS data revealed insights into the promoter structure, 5′-UTRs, cis-regulatory elements, attenuator structures and novel transcripts, such as small RNAs. Finally, the repertoire of secondary metabolite gene clusters was examined. These data provide a basis for future studies regarding gene characterization, transcriptional regulatory networks, and usage as a secondary metabolite producing strain.
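The two promoter consensus motifs reported in this abstract, -10 (TAnnnT) and -35 (nTGACn), can be turned into a simple upstream scan. The sketch below is only an illustration and not the authors' analysis pipeline: the function name `find_promoter_motifs`, the 50-base window, the forward-strand-only search and the toy sequence are all assumptions.

```python
import re

# Consensus motifs from the abstract; 'n' is read as any base.
# Lookaheads are used so that overlapping matches are also reported.
MINUS_10 = re.compile(r"(?=TA[ACGT]{3}T)")
MINUS_35 = re.compile(r"(?=[ACGT]TGAC[ACGT])")


def find_promoter_motifs(genome, tss, window=50):
    """Scan the `window` bases upstream of a forward-strand TSS.

    `tss` is a 0-based index into `genome`. Returns, for each motif,
    the start positions of matches relative to the TSS (negative = upstream).
    """
    start = max(0, tss - window)
    upstream = genome[start:tss].upper()
    hits = {}
    for name, pattern in (("-10", MINUS_10), ("-35", MINUS_35)):
        hits[name] = [m.start() + start - tss for m in pattern.finditer(upstream)]
    return hits


# Toy sequence (hypothetical): a -35-like and a -10-like element, then the TSS.
toy_genome = "GG" + "ATGACC" + "G" * 17 + "TAGGGT" + "G" * 6 + "ATGCCC"
toy_tss = 37  # index of the first transcribed base in the toy sequence
motif_hits = find_promoter_motifs(toy_genome, toy_tss)
```

A pattern scan like this only flags candidate sites. Short degenerate motifs match frequently by chance, so any real analysis would also need to constrain motif position and spacing and handle the reverse strand.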

