Improved reference genome annotation of Brassica rapa by PacBio RNA sequencing

2021 ◽  
Author(s):  
Zhicheng Zhang ◽  
Jing Guo ◽  
Xu Cai ◽  
Yufang Li ◽  
Xi Xi ◽  
...  

The species Brassica rapa includes several important vegetable crops. The draft reference genome of B. rapa ssp. pekinensis was completed in 2011, and it has since been updated twice. The pangenome with structural variations of 18 B. rapa accessions was published in 2021. Although extensive genomic analysis has been conducted on B. rapa, a comprehensive genome annotation including gene structure, alternative splicing events, and non-coding genes is still lacking. Therefore, we used the Pacific Biosciences (PacBio) single-molecule long-read technology to improve gene models and produced the annotated genome version 3.5. In total, we obtained 753,041 full-length non-chimeric (FLNC) reads and collapsed these into 92,810 non-redundant consensus isoforms, capturing 48% of the genes annotated in the B. rapa reference genome annotation v3.1. Based on the isoform data, we identified 830 novel protein-coding genes that were missed in previous genome annotations, defined the UTR regions of 20,340 annotated genes and corrected 886 incorrectly spliced genes. We also identified 28,564 alternative splicing (AS) events and 1,480 long non-coding RNAs (lncRNAs). We produced a relatively complete and high-quality reference transcriptome for B. rapa that can facilitate further functional genomic research.

2021 ◽  
Vol 12 ◽  
Author(s):  
Ali Ali ◽  
Gary H. Thorgaard ◽  
Mohamed Salem

Rainbow trout is an important model organism that has received concerted international efforts to study the transcriptome. For this purpose, short-read sequencing has been primarily used over the past decade. However, these sequences are too short to resolve the transcriptome complexity. This study reported a first full-length transcriptome assembly of the rainbow trout using single-molecule long-read isoform sequencing (Iso-Seq). Extensive computational approaches were used to refine and validate the reconstructed transcriptome. The study identified 10,640 high-confidence transcripts not previously annotated, in addition to 1,479 isoforms not mapped to the current Swanson reference genome. Most of the identified lncRNAs were non-coding variants of coding transcripts. The majority of genes had multiple transcript isoforms (average ∼3 isoforms/locus). Intron retention (IR) and exon skipping (ES) accounted for 56% of alternative splicing (AS) events. Iso-Seq improved the reference genome annotation, which allowed identification of characteristic AS associated with fish growth, muscle accretion, disease resistance, stress response, and fish migration. For instance, an ES event in the GVIN1 gene existed in fish susceptible to bacterial cold-water disease (BCWD). In addition, under five stress conditions, there was a commonly regulated exon in the prolyl 4-hydroxylase subunit alpha-2 (P4HA2) gene. The reconstructed gene models and their posttranscriptional processing in rainbow trout provide invaluable resources that could be further used for future genetics and genomics studies. Additionally, the study identified characteristic transcription events associated with economically important phenotypes, which could be applied in selective breeding.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Ying Hu ◽  
Vincent Colantonio ◽  
Bárbara S. F. Müller ◽  
Kristen A. Leach ◽  
Adalena Nanni ◽  
...  

Abstract Sweet corn is one of the most important vegetables in the United States and Canada. Here, we present a de novo assembly of a sweet corn inbred line Ia453 with the mutated shrunken2-reference allele (Ia453-sh2). This mutation accumulates more sugar and is present in most commercial hybrids developed for the processing and fresh markets. The ten pseudochromosomes cover 92% of the total assembly and 99% of the estimated genome size, with a scaffold N50 of 222.2 Mb. This reference genome completely assembles the large structural variation that created the mutant sh2-R allele. Furthermore, comparative genomics analysis with six field corn genomes highlights differences in single-nucleotide polymorphisms, structural variations, and transposon composition. Phylogenetic analysis of 5,381 diverse maize and teosinte accessions reveals genetic relationships between sweet corn and other types of maize. Our results show evidence for a common origin in northern Mexico for modern sweet corn in the U.S. Finally, population genomic analysis identifies regions of the genome under selection and candidate genes associated with sweet corn traits, such as early flowering, endosperm composition, plant and tassel architecture, and kernel row number. Our study provides a high-quality reference-genome sequence to facilitate comparative genomics, functional studies, and genomic-assisted breeding for sweet corn.


BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Joonhyung Jung ◽  
Changkyun Kim ◽  
Joo-Hwan Kim

Abstract Background Commelinaceae (Commelinales) comprise 41 genera and are widely distributed in both the Old and New Worlds, except in Europe. The relationships among genera in this family have been suggested in several morphological and molecular studies. However, it is difficult to explain their relationships due to high morphological variations and low support values. Currently, many researchers have been using complete chloroplast genome data for inferring the evolution of land plants. In this study, we completed 15 new plastid genome sequences of subfamily Commelinoideae using the MiSeq platform. We utilized genome data to reveal the structural variations and reconstruct the problematic positions of genera for the first time. Results All examined species of Commelinoideae have three pseudogenes (accD, rpoA, and ycf15), and the former two might be a synapomorphy within Commelinales. Only four species in tribe Commelineae presented IR expansion, which affected duplication of the rpl22 gene. We identified inversions that range from approximately 3 to 15 kb in four taxa (Amischotolype, Belosynapsis, Murdannia, and Streptolirion). The phylogenetic analysis using 77 chloroplast protein-coding genes with maximum parsimony, maximum likelihood, and Bayesian inference suggests that Palisota is most closely related to tribe Commelineae, supported by high support values. This result differs significantly from the current classification of Commelinaceae. Also, we resolved the unclear position of Streptoliriinae and the monophyly of Dichorisandrinae. Among the ten CDS (ndhH, rpoC2, ndhA, rps3, ndhG, ndhD, ccsA, ndhF, matK, and ycf1), which have high nucleotide diversity values (Pi > 0.045) and over 500 bp length, four CDS (ndhH, rpoC2, matK, and ycf1) show that they are congruent with the topology derived from 77 chloroplast protein-coding genes. Conclusions In this study, we provide detailed information on the 15 complete plastid genomes of Commelinoideae taxa. We identified characteristic pseudogenes and nucleotide diversity, which can be used to infer the family evolutionary history. Also, further research is needed to revise the position of Palisota in the current classification of Commelinaceae.
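The nucleotide diversity threshold quoted above (Pi > 0.045) refers to the average number of pairwise differences per site among aligned sequences. The following is a minimal sketch of that calculation, assuming equal-length, gap-free aligned sequences; the function name `nucleotide_diversity` and the toy sequences are hypothetical illustrations, not the authors' pipeline or data.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average pairwise differences per site (Pi) for aligned sequences.

    Assumption: all sequences have the same non-zero length and no gaps.
    """
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same non-zero length")
    pairs = list(combinations(seqs, 2))
    # Count mismatching positions for every pair of sequences.
    total_diff = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs
    )
    return total_diff / (len(pairs) * length)


# Toy alignment (hypothetical data).
toy_alignment = ["ACGT", "ACGA", "TCGA"]
pi = nucleotide_diversity(toy_alignment)
```

In practice, per-gene values computed this way could be compared against a cutoff such as 0.045 to shortlist variable coding regions; real alignments would also need a rule for gaps and ambiguous bases.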


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Michael F. Z. Wang ◽  
Madhav Mantri ◽  
Shao-Pei Chou ◽  
Gaetano J. Scuderi ◽  
David W. McKellar ◽  
...  

Abstract Conventional scRNA-seq expression analyses rely on the availability of a high-quality genome annotation. Yet, as we show here with scRNA-seq experiments and analyses spanning human, mouse, chicken, mole rat, lemur and sea urchin, genome annotations are often incomplete, in particular for organisms that are not routinely studied. To overcome this hurdle, we created a scRNA-seq analysis routine that recovers biologically relevant transcriptional activity beyond the scope of the best available genome annotation by performing scRNA-seq analysis on any region in the genome for which transcriptional products are detected. Our tool generates a single-cell expression matrix for all transcriptionally active regions (TARs), performs single-cell TAR expression analysis to identify biologically significant TARs, and then annotates TARs using gene homology analysis. This procedure uses single-cell expression analyses as a filter to direct annotation efforts to biologically significant transcripts and thereby uncovers biology to which scRNA-seq would otherwise be in the dark.


Minerals ◽  
2021 ◽  
Vol 11 (1) ◽  
pp. 74
Author(s):  
Jinjin Chen ◽  
Yilan Liu ◽  
Patrick Diep ◽  
Radhakrishnan Mahadevan

Acidithiobacillus ferridurans JAGS is a newly isolated acidophile from an acid mine drainage (AMD). The genome of isolate JAGS was sequenced and compared with eight other published genomes of Acidithiobacillus. The pairwise mutation distance (Mash) and average nucleotide identity (ANI) revealed that isolate JAGS had a close evolutionary relationship with A. ferridurans JCM18981, but whole-genome alignment showed that it had higher similarity in genomic structure with A. ferrooxidans species. Pan-genome analysis revealed that nine genomes were comprised of 4601 protein coding sequences, of which 43% were core genes (1982) and 23% were unique genes (1064). A. ferridurans species had more unique genes (205–246) than A. ferrooxidans species (21–234). Functional gene categorizations showed that A. ferridurans strains had a higher portion of genes involved in energy production and conversion while A. ferrooxidans had more for inorganic ion transport and metabolism. A high abundance of kdp, mer and ars genes, as well as mobile genetic elements, was found in isolate JAGS, which might contribute to its resistance to harsh environments. These findings expand our understanding of the evolutionary adaptation of Acidithiobacillus and indicate that A. ferridurans JAGS is a promising candidate for biomining and AMD biotreatment applications.
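The core and unique gene shares reported above follow directly from the counts (1982/4601 ≈ 43% and 1064/4601 ≈ 23%). As a rough illustration of how such categories can be derived, the sketch below classifies gene families from a presence/absence table. It assumes that "core" means present in every genome and "unique" means present in exactly one; the function name `classify_pangenome` and the toy data are hypothetical, and the study's own pan-genome tool may define the categories differently.

```python
def classify_pangenome(presence):
    """Split gene families into core, accessory, and unique sets.

    presence: dict mapping genome name -> set of gene-family identifiers.
    Assumption: core = found in all genomes, unique = found in exactly one.
    """
    n_genomes = len(presence)
    counts = {}
    for genes in presence.values():
        for gene in genes:
            counts[gene] = counts.get(gene, 0) + 1
    core = {g for g, c in counts.items() if c == n_genomes}
    unique = {g for g, c in counts.items() if c == 1}
    accessory = set(counts) - core - unique
    return core, accessory, unique


# Toy presence/absence data (hypothetical).
toy = {
    "genome_A": {"g1", "g2", "g3"},
    "genome_B": {"g1", "g2", "g4"},
    "genome_C": {"g1", "g5"},
}
core, accessory, unique = classify_pangenome(toy)

# Shares computed from the counts given in the abstract.
core_pct = round(100 * 1982 / 4601)
unique_pct = round(100 * 1064 / 4601)
```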


2004 ◽  
Vol 33 (Database issue) ◽  
pp. D75-D79 ◽  
Author(s):  
P. Kim

2018 ◽  
Vol 35 (15) ◽  
pp. 2654-2656 ◽  
Author(s):  
Guoli Ji ◽  
Wenbin Ye ◽  
Yaru Su ◽  
Moliang Chen ◽  
Guangzao Huang ◽  
...  

Abstract Summary Alternative splicing (AS) is a well-established mechanism for increasing transcriptome and proteome diversity; however, detecting AS events and distinguishing among AS types in organisms without available reference genomes remains challenging. We developed a de novo approach called AStrap for AS analysis without using a reference genome. AStrap identifies AS events by extensive pair-wise alignments of transcript sequences and predicts AS types by a machine-learning model integrating more than 500 assembled features. We evaluated AStrap using collected AS events from reference genomes of rice and human as well as single-molecule real-time sequencing data from Amborella trichopoda. Results show that AStrap can identify many more AS events with comparable or higher accuracy than the competing method. AStrap also possesses a unique feature of predicting AS types, which achieves an overall accuracy of ∼0.87 for different species. Extensive evaluation of AStrap using different parameters, sample sizes and machine-learning models on different species also demonstrates the robustness and flexibility of AStrap. AStrap could be a valuable addition to the community for the study of AS in non-model organisms with limited genetic resources. Availability and implementation AStrap is available for download at https://github.com/BMILAB/AStrap. Supplementary information Supplementary data are available at Bioinformatics online.


2021 ◽  
Vol 12 ◽  
Author(s):  
Fenghua Tian ◽  
Changtian Li ◽  
Yu Li

Yuanmo [Sarcomyxa edulis (Y.C. Dai, Niemelä & G.F. Qin) T. Saito, Tonouchi & T. Harada] is an important edible and medicinal mushroom endemic to Northeastern China. Here we report the de novo sequencing and assembly of the S. edulis genome using single-molecule real-time sequencing technology. The whole genome was approximately 35.65 Mb, with a G + C content of 48.31%. Genome assembly generated 41 contigs with an N50 length of 1,772,559 bp. The genome comprised 9,364 annotated protein-coding genes, many of which encoded enzymes involved in the modification, biosynthesis, and degradation of glycoconjugates and carbohydrates or enzymes predicted to be involved in the biosynthesis of secondary metabolites such as terpene, type I polyketide, siderophore, and fatty acids, which are responsible for the pharmacodynamic activities of S. edulis. We also identified genes encoding 1,3-β-glucan synthase and endo-1,3(4)-β-glucanase, which are involved in polysaccharide and uridine diphosphate glucose biosynthesis. Phylogenetic and comparative analyses of Basidiomycota fungi based on a single-copy orthologous protein indicated that the Sarcomyxa genus is an independent group that evolved from the Pleurotaceae family. The annotated whole-genome sequence of S. edulis can serve as a reference for investigations of bioactive compounds with medicinal value and the development and commercial production of superior S. edulis varieties.
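The N50 statistic quoted for this assembly is the length of the contig at which the running total, taken from the longest contig downward, first reaches half of the total assembly length. A minimal sketch of that standard definition is shown below; the function name `n50` and the toy lengths are hypothetical and are not the contig lengths of the S. edulis assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("need at least one length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum to half the
        # assembly length defines the N50.
        if running >= half_total:
            return length


# Toy contig lengths (hypothetical): total 28, half 14, so N50 is 8.
toy_n50 = n50([10, 8, 5, 3, 2])
```

A higher N50 for a similar total size generally indicates a more contiguous assembly, which is why it is reported alongside the contig count.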


Genes ◽  
2021 ◽  
Vol 12 (12) ◽  
pp. 1857
Author(s):  
Lulu Wang ◽  
Gang Zheng ◽  
Yiming Yuan ◽  
Ziyi Wang ◽  
Changjun Liu ◽  
...  

Marek’s disease (MD) is an immunosuppressive disease induced by Marek’s disease virus (MDV). MD causes huge economic losses to the global poultry industry, but it also provides an ideal model for studying diseases induced by oncogenic viruses. Alternative splicing (AS) produces different isoform transcripts, which are involved in various diseases and individual development. To investigate AS events in MD, RNA-Seq was performed in tumorous spleens (TS), spleens from the survivors (SS) without any lesion after MDV infection, and non-infected chicken spleens (NS). In this study, 32,703 and 25,217 AS events were identified in TS and SS groups with NS group as the control group, and 1198, 1204, and 348 differentially expressed (DE) AS events (p-value < 0.05 and FDR < 0.05) were identified in TS vs. NS, TS vs. SS, and SS vs. NS, respectively. Additionally, functional enrichment analysis showed that ubiquitin-mediated proteolysis, p53 signaling pathway, and phosphatidylinositol signaling system were significantly enriched (p-value < 0.05). Small structural variations including SNP and indel were analyzed based on RNA-Seq data, and it showed that the TS group possessed more variants on the splice site region than those in SS and NS groups, which might cause more AS events in the TS group. Combined with previous circRNA data, we found that 287 genes could produce both circular and linear RNAs, which suggested these genes were more active in MD lymphoma transformation. This study has expanded the understanding of the MDV infection process and provided new insights for further analysis of resistance/susceptibility mechanisms.
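The dual cutoff described above (p-value < 0.05 and FDR < 0.05) can be illustrated with a small filter. The abstract does not state how the FDR was computed, so the sketch below assumes the Benjamini-Hochberg adjustment; the function names and the toy event list are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_events(events, alpha=0.05):
    """Keep event ids whose raw p-value and adjusted p-value are below alpha.

    events: list of (event_id, pvalue) tuples.
    """
    qvalues = benjamini_hochberg([p for _, p in events])
    return [
        event_id
        for (event_id, p), q in zip(events, qvalues)
        if p < alpha and q < alpha
    ]


# Toy AS events (hypothetical p-values).
toy_events = [("e1", 0.01), ("e2", 0.04), ("e3", 0.03), ("e4", 0.20)]
kept = significant_events(toy_events)
```

In this toy case three events pass the raw p-value cutoff but only one survives the adjusted threshold, which shows why the two criteria are reported together.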

