Global Transcriptome Characterization and Assembly of Thermophilic Ascomycete Chaetomium thermophilum
An accurate genome annotation is fundamental for research in molecular and structural biology. An annotation of the Chaetomium thermophilum reference genome has been reported previously, but it is limited to the open reading frames (ORFs) of genes and contains only a few noncoding transcripts. In this study, we used deep RNA sequencing to identify and annotate full-length transcripts of C. thermophilum. We annotated 7044 coding genes and a large number of noncoding genes (n = 4567). Astonishingly, 23% of the coding genes are alternatively spliced. We identified 679 novel coding genes and corrected the structural organization of more than 50% of the previously annotated genes. Furthermore, we substantially extended the Gene Ontology (GO) and Enzyme Commission (EC) lists, which provide comprehensive search tools for potential industrial applications and basic research. The novel transcripts and improved annotation will help in understanding the gene regulatory landscape of C. thermophilum. The analysis pipeline developed here can be used to build transcriptome assemblies and to identify coding and noncoding RNAs in other species. The R packages for the gene and GO annotation database are available at https://www.bzh.uni-heidelberg.de/brunner/Chaetomium_thermophilum.