Genome of the Single Human Chromosome 18 as a “Gold Standard” for Its Transcriptome

The cutoff level applied in sequencing analysis varies according to the sequencing technology, sample type, and study purpose, and it can substantially affect the coverage and reliability of the data obtained. In this study, we aimed to determine the optimal combination of parameters for reliable transcriptome data analysis. Toward this end, we compared the results obtained from different transcriptome analysis platforms (quantitative polymerase chain reaction, Illumina RNASeq, and Oxford Nanopore Technologies MinION) for the transcriptome encoded by human chromosome 18 (Chr 18) using the same sample types (HepG2 cells and liver tissue). A total of 275 protein-coding genes encoded by Chr 18 was taken as the gene set for evaluation. The combination of Illumina RNASeq and MinION nanopore technologies enabled the detection of at least one transcript with a non-zero expression level for each of the 275 protein-coding genes encoded by Chr 18. This combination also reduced the probability of false-positive detection of low-copy transcripts, because the presence of a transcript was confirmed simultaneously by two fundamentally different technologies: short-read sequencing, which is essential for reliable detection (Illumina RNASeq), and long-read sequencing (MinION). This approach can help distinguish biological from technical reasons for the absence of mRNA detection for a given gene in transcriptomics.
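The cross-platform confirmation described in this abstract can be pictured as a set operation. The following is a minimal sketch, not the authors' pipeline: the function name `summarize_detection` and the gene identifiers are hypothetical, and it assumes each platform's output has already been reduced to a set of genes with non-zero expression.

```python
def summarize_detection(gene_set, short_read_hits, long_read_hits):
    """Classify genes by which platform(s) detected a transcript.

    All inputs are hypothetical sets of gene identifiers; this is an
    illustrative sketch, not the pipeline used in the study.
    """
    gene_set = set(gene_set)
    # Ignore hits that fall outside the gene set being evaluated.
    short_hits = set(short_read_hits) & gene_set
    long_hits = set(long_read_hits) & gene_set
    return {
        "confirmed_by_both": short_hits & long_hits,
        "detected_by_either": short_hits | long_hits,
        "short_only": short_hits - long_hits,
        "long_only": long_hits - short_hits,
        "undetected": gene_set - (short_hits | long_hits),
    }


# Toy example with made-up identifiers (not real data).
genes = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
illumina = {"GENE_A", "GENE_B"}
minion = {"GENE_B", "GENE_C"}
result = summarize_detection(genes, illumina, minion)
```

In this toy case, the "confirmed_by_both" set holds genes supported by both technologies, while "undetected" lists genes for which neither platform reported a transcript.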


Telomere-to-telomere genome assembly of asparaginase-producing Trichoderma simmonsii

BMC Genomics ◽

10.1186/s12864-021-08162-4 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Dawoon Chung ◽

Yong Min Kwon ◽

Youngik Yang

Keyword(s):

Rna Sequencing ◽

Draft Genome ◽

Sequencing Analysis ◽

Sequencing Data ◽

Terrestrial Plants ◽

Protein Coding ◽

Oxford Nanopore ◽

Wide Range ◽

The Family ◽

Encoding Genes

Abstract Background Trichoderma is a genus of fungi in the family Hypocreaceae and includes species known to produce enzymes with commercial use. They are largely found in soil and terrestrial plants. Recently, Trichoderma simmonsii isolated from decaying bark and decorticated wood was newly identified in the Harzianum clade of Trichoderma. Due to a wide range of applications in agriculture and other industries, genomes of at least 12 Trichoderma spp. have been studied. Moreover, antifungal and enzymatic activities have been extensively characterized in Trichoderma spp. However, the genomic information and bioactivities of T. simmonsii from a particular marine-derived isolate remain largely unknown. While we screened for asparaginase-producing fungi, we observed that T. simmonsii GH-Sj1 strain isolated from edible kelp produced asparaginase. In this study, we report a draft genome of T. simmonsii GH-Sj1 using Illumina and Oxford Nanopore technologies. Furthermore, to facilitate biotechnological applications of this species, RNA-sequencing was performed to elucidate the transcriptional profile of T. simmonsii GH-Sj1 in response to asparaginase-rich conditions. Results We generated ~14 Gb of sequencing data assembled in a ~40 Mb genome. The T. simmonsii GH-Sj1 genome consisted of seven telomere-to-telomere scaffolds with no sequencing gaps, where the N50 length was 6.4 Mb. The total number of protein-coding genes was 13,120, constituting ~99% of the genome. The genome harbored 176 tRNAs, which encode a full set of 20 amino acids. In addition, it had an rRNA repeat region consisting of seven repeats of the 18S-ITS1–5.8S-ITS2–26S cluster. The T. simmonsii genome also harbored 7 putative asparaginase-encoding genes with potential medical applications. Using RNA-sequencing analysis, we found that 3 genes among the 7 putative genes were significantly upregulated under asparaginase-rich conditions. Conclusions The genome and transcriptome of T. simmonsii GH-Sj1 established in the current work represent valuable resources for future comparative studies on fungal genomes and asparaginase production.


Why Are the Correlations between mRNA and Protein Levels so Low among the 275 Predicted Protein-Coding Genes on Human Chromosome 18?

Journal of Proteome Research ◽

10.1021/acs.jproteome.7b00348 ◽

2017 ◽

Vol 16 (12) ◽

pp. 4311-4318 ◽

Cited By ~ 9

Author(s):

Ekaterina V. Poverennaya ◽

Ekaterina V. Ilgisonis ◽

Elena A. Ponomarenko ◽

Arthur T. Kopylov ◽

Victor G. Zgoda ◽

...

Keyword(s):

Human Chromosome ◽

Chromosome 18 ◽

Protein Coding ◽

Protein Levels ◽

Protein Coding Genes


Assembly of a Complete Mitogenome of Chrysanthemum nankingense Using Oxford Nanopore Long Reads and the Diversity and Evolution of Asteraceae Mitogenomes

Genes ◽

10.3390/genes9110547 ◽

2018 ◽

Vol 9 (11) ◽

pp. 547 ◽

Cited By ~ 9

Author(s):

Shuaibin Wang ◽

Qingwei Song ◽

Shanshan Li ◽

Zhigang Hu ◽

Gangqiang Dong ◽

...

Keyword(s):

Structural Variation ◽

Sequencing Data ◽

Protein Coding ◽

Protein Coding Genes ◽

The Core ◽

Assembly Method ◽

Long Reads ◽

Oxford Nanopore ◽

Complete Mitogenome ◽

Rna Genes

Diversity in structure and organization is one of the main features of angiosperm mitochondrial genomes (mitogenomes). The ultra-long reads of Oxford Nanopore Technology (ONT) provide an opportunity to obtain a complete mitogenome and investigate the structural variation in unprecedented detail. In this study, we compared mitogenome assembly methods using Illumina and/or ONT sequencing data and obtained the complete mitogenome (208 kb) of Chrysanthemum nankingense based on the hybrid assembly method. The mitogenome encoded 19 transfer RNA genes, three ribosomal RNA genes, and 34 protein-coding genes with 21 group II introns disrupting eight intron-contained genes. A total of seven medium repeats were related to homologous recombination at different frequencies as supported by the long ONT reads. Subsequently, we investigated the variations in gene content and constitution of 28 near-complete mitogenomes from Asteraceae. A total of six protein-coding genes were missing in all Asteraceae mitogenomes, while four other genes were not detected in some lineages. The core fragments (~88 kb) of the Asteraceae mitogenomes had a higher GC content (~46.7%) than the variable and specific fragments. The phylogenetic topology based on the core fragments of the Asteraceae mitogenomes was highly consistent with the topologies obtained from the corresponding plastid datasets. Our results highlighted the advantages of the complete assembly of the C. nankingense mitogenome and the investigation of its structural variation based on ONT sequencing data. Moreover, the method based on local collinear blocks of the mitogenomes could achieve the alignment of highly rearrangeable and variable plant mitogenomes as well as construct a robust phylogenetic topology.


Chromosomal assembly of the nuclear genome of the endosymbiont-bearing trypanosomatid Angomonas deanei

G3 Genes|Genome|Genetics ◽

10.1093/g3journal/jkaa018 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

John W Davey ◽

Carolina M C Catta-Preta ◽

Sally James ◽

Sarah Forrester ◽

Maria Cristina M Motta ◽

...

Keyword(s):

Chromosome Number ◽

Noncoding Rnas ◽

Nuclear Genome ◽

Supernumerary Chromosome ◽

Ribosomal Rnas ◽

Protein Coding ◽

Transfer Rnas ◽

Protein Coding Genes ◽

Oxford Nanopore ◽

Genome Assemblies

Abstract Angomonas deanei is an endosymbiont-bearing trypanosomatid with several highly fragmented genome assemblies and unknown chromosome number. We present an assembly of the A. deanei nuclear genome based on Oxford Nanopore sequence that resolves into 29 complete or close-to-complete chromosomes. The assembly has several previously unknown special features; it has a supernumerary chromosome, a chromosome with a 340-kb inversion, and there is a translocation between two chromosomes. We also present an updated annotation of the chromosomal genome with 10,365 protein-coding genes, 59 transfer RNAs, 26 ribosomal RNAs, and 62 noncoding RNAs.


Dual Isoform Sequencing Reveals a Multifaceted Transcriptional Architecture of a Prototype Baculovirus

10.21203/rs.3.rs-637036/v1 ◽

2021 ◽

Author(s):

Gábor Torma ◽

Dóra Tombácz ◽

Norbert Moldován ◽

Ádám Fülöp ◽

István Prazsák ◽

...

Keyword(s):

Protein Coding ◽

Rna Molecules ◽

Non Coding Rna ◽

Oxford Nanopore ◽

The Pacific ◽

Viral Genes ◽

Long Read ◽

Oxford Nanopore Technologies ◽

Overlapping Transcripts

Abstract In this study, we used two long-read sequencing (LRS) techniques, Sequel from Pacific Biosciences and MinION from Oxford Nanopore Technologies, for the transcriptional characterization of a prototype baculovirus, Autographa californica multiple nucleopolyhedrovirus. LRS is able to read full-length RNA molecules, and thereby to distinguish between transcript isoforms, mono- and polycistronic RNAs, and overlapping transcripts. Altogether, we detected 875 transcripts, of which 759 are novel and 116 have been annotated previously. These RNA molecules include 41 novel putative protein-coding transcripts (each containing 5’-truncated in-frame ORFs), 14 monocistronic transcripts, 99 multicistronic RNAs, 101 non-coding RNAs, and 504 length isoforms. We also detected RNA methylation in 12 viral genes and RNA hyper-editing in the longer 5’-UTR transcript isoform of the ORF 19 gene.


Chromosome-level assembly of Drosophila bifasciata reveals important karyotypic transition of the X chromosome

10.1101/847558 ◽

2019 ◽

Author(s):

Ryan Bracewell ◽

Anita Tran ◽

Kamalakar Chatla ◽

Doris Bachtrog

Keyword(s):

X Chromosome ◽

Genome Assembly ◽

De Novo ◽

Pericentromeric Region ◽

Species Group ◽

Chromosome 15 ◽

Protein Coding ◽

Protein Coding Genes ◽

Long Read ◽

Chromosome Level

Abstract The Drosophila obscura species group is one of the most studied clades of Drosophila and harbors multiple distinct karyotypes. Here we present a de novo genome assembly and annotation of D. bifasciata, a species which represents an important subgroup for which no high-quality chromosome-level genome assembly currently exists. We combined long-read sequencing (Nanopore) and Hi-C scaffolding to achieve a highly contiguous genome assembly approximately 193 Mb in size, with repetitive elements constituting 30.1% of the total length. Drosophila bifasciata harbors four large metacentric chromosomes and the small dot, and our assembly contains each chromosome in a single scaffold, including the highly repetitive pericentromeres, which were largely composed of Jockey and Gypsy transposable elements. We annotated a total of 12,821 protein-coding genes, and comparisons of synteny with D. athabasca orthologs show that the large metacentric pericentromeric regions of multiple chromosomes are conserved between these species. Importantly, Muller A (X chromosome) was found to be metacentric in D. bifasciata, and the pericentromeric region appears homologous to the pericentromeric region of the fused Muller A-AD (XL and XR) of pseudoobscura/affinis subgroup species. Our finding suggests that a metacentric ancestral X fused to a telocentric Muller D and created the large neo-X (Muller A-AD) chromosome ∼15 MYA. We also confirm the fusion of Muller C and D in D. bifasciata and show that it likely involved a centromere-centromere fusion.


Chromosome-scale assembly of the Sparassis latifolia genome obtained using long-read and Hi-C sequencing

10.1101/2021.01.08.426014 ◽

2021 ◽

Author(s):

Chi Yang ◽

Lu Ma ◽

Donglai Xiao ◽

Xiaoyu Liu ◽

Xiaoling Jiang ◽

...

Keyword(s):

Repetitive Sequences ◽

Draft Genome ◽

Edible Mushroom ◽

Illumina Hiseq ◽

Protein Coding ◽

Long Reads ◽

Oxford Nanopore ◽

Genome Features ◽

Long Read ◽

Genomic Studies

Sparassis latifolia is a valuable edible mushroom cultivated in China. In 2018, our research group reported an incomplete and low-quality genome of S. latifolia obtained by Illumina HiSeq 2500 sequencing. The limitations of the available genome have constrained genetic and genomic studies of this mushroom resource. Herein, an updated draft genome sequence of S. latifolia was generated by Oxford Nanopore sequencing and the Hi-C technique. A total of 8.24 Gb of Oxford Nanopore long reads representing ~198.08X coverage of the S. latifolia genome were generated. Subsequently, a high-quality genome of 41.41 Mb, with scaffold and contig N50 sizes of 3.31 Mb and 1.51 Mb, respectively, was assembled. Hi-C scaffolding of the genome resulted in 12 pseudochromosomes containing 93.56% of the bases in the assembled genome. Genome annotation further revealed that 17.47% of the genome was composed of repetitive sequences. In addition, 13,103 protein-coding genes were predicted, among which 98.72% were functionally annotated. BUSCO assay results further revealed that there were 92.07% complete BUSCOs. The improved chromosome-scale assembly and genome features described here will aid further molecular elucidation of various traits, breeding of S. latifolia, and evolutionary studies with related taxa.
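The scaffold and contig N50 sizes quoted in this abstract follow the usual N50 definition: the length L such that sequences of length at least L contain half or more of the assembled bases. A minimal sketch of that calculation is given here; the function name `n50` and the toy lengths are illustrative assumptions, not values from the S. latifolia assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("lengths must not be empty")
    half_total = sum(ordered) / 2
    running = 0
    # Walk from the longest sequence down until half the bases are covered.
    for length in ordered:
        running += length
        if running >= half_total:
            return length


toy_lengths = [50, 40, 30, 20, 10]  # made-up contig lengths, not real data
toy_n50 = n50(toy_lengths)
```

With these toy lengths the total is 150, and the two longest sequences (50 + 40 = 90) already cover more than half of it, so the N50 is 40.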


TagSeqTools: a flexible and comprehensive analysis pipeline for NAD tagSeq data

10.1101/2020.03.09.982934 ◽

2020 ◽

Cited By ~ 1

Author(s):

Huan Zhong ◽

Zongwei Cai ◽

Zhu Yang ◽

Yiji Xia

Keyword(s):

Rna Sequencing ◽

Comprehensive Analysis ◽

Enzymatic Reactions ◽

Computational Tool ◽

Sequencing Data ◽

Analysis Pipeline ◽

Oxford Nanopore ◽

Long Read ◽

Identification And Characterization

Abstract NAD tagSeq has recently been developed for the identification and characterization of NAD+-capped RNAs (NAD-RNAs). This method adopts a strategy of chemo-enzymatic reactions to label the NAD-RNAs with a synthetic RNA tag before subjecting them to Oxford Nanopore direct RNA sequencing. A computational tool designed for analyzing the sequencing data of tagged RNA will facilitate the broader application of this method. Hence, we introduce TagSeqTools as a flexible, general pipeline for the identification and quantification of tagged RNAs (i.e., NAD+-capped RNAs) using long-read transcriptome sequencing data generated by the NAD tagSeq method. TagSeqTools comprises two major modules: TagSeek, for differentiating tagged and untagged reads, and TagSeqQuant, for the quantification and further characterization of genes and isoforms. In addition, the pipeline integrates some advanced functions to identify antisense or splicing, and supports data reformatting for visualization. Therefore, TagSeqTools provides a convenient and comprehensive workflow for researchers to analyze the data produced by the NAD tagSeq method or other tagging-based experiments using Oxford Nanopore direct RNA sequencing. The pipeline is available at https://github.com/dorothyzh/TagSeqTools, under Apache License 2.0.


Constructing a Reference Genome in a Single Lab: The Possibility to Use Oxford Nanopore Technology

Plants ◽

10.3390/plants8080270 ◽

2019 ◽

Vol 8 (8) ◽

pp. 270 ◽

Cited By ~ 4

Author(s):

Yun Gyeong Lee ◽

Sang Chul Choi ◽

Yuna Kang ◽

Kyeong Min Kim ◽

Chon-Sik Kang ◽

...

Keyword(s):

Plant Species ◽

Genome Sequencing ◽

Reference Genome ◽

Genome Structure ◽

Plant Genome ◽

Sequence Information ◽

Sequencing Analysis ◽

Oxford Nanopore ◽

A Genome ◽

Long Read

Whole genome sequencing (WGS) has become a crucial tool in understanding genome structure and genetic variation. MinION sequencing from Oxford Nanopore Technologies (ONT) is an excellent approach for performing WGS, and it has advantages in comparison with other next-generation sequencing (NGS) platforms: it is relatively inexpensive, portable, has simple library preparation, can be monitored in real time, and has no theoretical limit on read length. Sorghum bicolor (L.) Moench is diploid (2n = 2x = 20) with a genome size of about 730 Mb, and its genome sequence information is released in the Phytozome database. Therefore, sorghum can be used as a good reference. However, plant species have complex and large genomes when compared to animals or microorganisms. As a result, complete genome sequencing is difficult for plant species. MinION sequencing, which produces long reads, can be an excellent tool for overcoming the weak assembly of short reads generated from NGS by minimizing the generation of gaps or covering the repetitive sequences that appear in plant genomes. Here, we conducted genome sequencing for S. bicolor cv. BTx623 using the MinION platform and obtained 895,678 reads and 17.9 gigabases (Gb) (ca. 25× coverage of the reference) of long-read sequence data. Using two different de novo assembly tools, a total of 6124 contigs (covering 45.9%) were generated from Canu, and a total of 2661 contigs (covering 50%) were generated from Minimap and Miniasm with Racon; the assembled contigs were then mapped against the sorghum reference genome. Our results provide an optimal series of long-read sequencing analyses for plant species using the MinION platform and a clue for determining the total sequencing scale needed for optimal coverage based on various genome sizes.
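The "ca. 25×" figure in this abstract is consistent with the usual estimate of mean coverage as total sequenced bases divided by genome size. A minimal sketch of that arithmetic follows, assuming that the reported 17.9 Gb refers to sequenced bases and taking the genome size as the stated ~730 Mb; the function name `mean_coverage` is a hypothetical helper.

```python
def mean_coverage(total_bases, genome_size):
    """Approximate mean depth: sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


# Figures quoted in the abstract: 17.9 Gb of reads, ~730 Mb genome.
# Treating "Gb" as gigabases is an assumption made for this sketch.
depth = mean_coverage(17.9e9, 730e6)
print(f"{depth:.1f}x")  # prints "24.5x"
```

The same function can be rearranged to estimate how many bases are needed for a target depth on a genome of a different size, which is the kind of planning the abstract alludes to.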


1202. Multimodal Sequencing of a Clonal Case Cluster of Carbapenem-Resistant Citrobacter Reveals Unexpectedly Rapid Dynamics of KPC3-Containing Plasmids

Open Forum Infectious Diseases ◽

10.1093/ofid/ofy210.1035 ◽

2018 ◽

Vol 5 (suppl_1) ◽

pp. S364-S364

Author(s):

Roby Bhattacharyya ◽

Alejandro Pironti ◽

Bruce J Walker ◽

Abigail Manson ◽

Virginia Pierce ◽

...

Keyword(s):

Point Mutations ◽

Illumina Miseq ◽

Nucleotide Polymorphisms ◽

Sequencing Data ◽

Single Nucleotide ◽

Carbapenem Resistant ◽

Oxford Nanopore ◽

Close Relationship ◽

Long Read ◽

Carbapenem Resistant Enterobacteriaceae

Abstract Background Carbapenem-resistant Enterobacteriaceae (CRE) are a major public health threat. We report four clonally related Citrobacter freundii isolates harboring the blaKPC-3 carbapenemase in April–May 2017 that are nearly identical to a strain from 2014 at the same institution. Despite differing by ≤5 single nucleotide polymorphisms (SNPs), these isolates exhibited dramatic differences in carbapenemase plasmid architecture. Methods We sequenced four carbapenem-resistant C. freundii isolates from 2017 and compared them with an ongoing CRE surveillance project at our institution. SNPs were identified from Illumina MiSeq data aligned to a reference genome using the variant caller Pilon. Plasmids were assembled from Illumina and Oxford Nanopore sequencing data using Unicycler. Results The four 2017 isolates differed from one another by 0–5 chromosomal SNPs; two were identical. With one exception, these isolates differed by >38,000 SNPs from 25 C. freundii isolates sequenced from 2013 to 2017 at the same institution for CRE surveillance. The exception was a 2014 isolate that differed by 13–16 SNPs from each 2017 isolate, with 13 SNPs common to all four. Each C. freundii isolate harbored wild-type blaKPC-3. Despite the close relationship among the 2017 cluster, the plasmids harboring the blaKPC-3 genes differed dramatically: the carbapenemase occurred in one of the two different plasmids, with rearrangements between these plasmids across isolates. The related 2014 isolate harbored both plasmids, each with a separate copy of blaKPC-3. No transmission chains were found between any of the affected patients. Conclusion WGS confirmed clonality among four contemporaneous blaKPC-3-containing C. freundii isolates, and marked similarity with a 2014 isolate, within an institution. That only 13–16 SNPs varied between the 2014 and 2017 isolates suggests durable persistence of the blaKPC-3 gene within this lineage in a hospital ecosystem. The plasmids harboring these carbapenemase genes proved remarkably plastic, with plasmid loss and rearrangements occurring on the same time scale as two to three chromosomal point mutations. Combining short and long-read sequencing in a case cluster uniquely revealed unexpectedly rapid dynamics of carbapenemase plasmids, providing critical insight into their manner of spread. Disclosures M. J. Ferraro, SeLux Diagnostics: Scientific Advisor and Shareholder, Consulting fee. D. C. Hooper, SeLux Diagnostics: Scientific Advisor, Consulting fee.
