De Novo Genome Sequence Assemblies of Gossypium raimondii and Gossypium turneri

Cotton is an agriculturally important crop. Because of its importance, a genome sequence of a diploid cotton species (Gossypium raimondii, D-genome) was first assembled using Sanger sequencing data in 2012. Improvements to DNA sequencing technology have improved accuracy and correctness of assembled genome sequences. Here we report a new de novo genome assembly of G. raimondii and its close relative G. turneri. The two genomes were assembled to a chromosome level using PacBio long-read technology, HiC, and Bionano optical mapping. This report corrects some minor assembly errors found in the Sanger assembly of G. raimondii. We also compare the genome sequences of these two species for gene composition, repetitive element composition, and collinearity. Most of the identified structural rearrangements between these two species are due to intra-chromosomal inversions. More inversions were found in the G. turneri genome sequence than the G. raimondii genome sequence. These findings and updates to the D-genome sequence will improve accuracy and translation of genomics to cotton breeding and genetics.

Download Full-text

Identification of a genome-specific repetitive element in the Gossypium D genome

PeerJ ◽

10.7717/peerj.8344 ◽

2020 ◽

Vol 8 ◽

pp. e8344

Author(s):

Hejun Lu ◽

Xinglei Cui ◽

Yanyan Zhao ◽

Richard Odongo Magwanga ◽

Pengcheng Li ◽

...

Keyword(s):

Repetitive Element ◽

Repetitive Sequences ◽

Long Terminal Repeat Retrotransposons ◽

Gossypium Arboreum ◽

D Genome ◽

Cotton Species ◽

A Genome ◽

Gossypium Raimondii ◽

The Common

The activity of genome-specific repetitive sequences is the main cause of genome variation between Gossypium A and D genomes. Through comparative analysis of the two genomes, we retrieved a repetitive element termed ICRd motif, which appears frequently in the diploid Gossypium raimondii (D5) genome but rarely in the diploid Gossypium arboreum (A2) genome. We further explored the existence of the ICRd motif in chromosomes of G. raimondii, G. arboreum, and two tetraploid (AADD) cotton species, Gossypium hirsutum and Gossypium barbadense, by fluorescence in situ hybridization (FISH), and observed that the ICRd motif exists in the D5 and D-subgenomes but not in the A2 and A-subgenomes. The ICRd motif comprises two components, a variable tandem repeat (TR) region and a conservative sequence (CS). The two constituents each have hundreds of repeats that evenly distribute across 13 chromosomes of the D5genome. The ICRd motif (and its repeats) was revealed as the common conservative region harbored by ancient Long Terminal Repeat Retrotransposons. Identification and investigation of the ICRd motif promotes the study of A and D genome differences, facilitates research on Gossypium genome evolution, and provides assistance to subgenome identification and genome assembling.

Download Full-text

NovoGraph: Human genome graph construction from multiple long-read de novo assemblies

F1000Research ◽

10.12688/f1000research.15895.2 ◽

2018 ◽

Vol 7 ◽

pp. 1391

Author(s):

Evan Biederstedt ◽

Jeffrey C. Oliver ◽

Nancy F. Hansen ◽

Aarti Jajoo ◽

Nathan Dunn ◽

...

Keyword(s):

Human Genome ◽

De Novo ◽

Wide Spectrum ◽

Third Party ◽

Sequencing Data ◽

Multiple Sequence ◽

Human Genomes ◽

A Genome ◽

Long Read ◽

Genome Graph

Genome graphs are emerging as an important novel approach to the analysis of high-throughput human sequencing data. By explicitly representing genetic variants and alternative haplotypes in a mappable data structure, they can enable the improved analysis of structurally variable and hyperpolymorphic regions of the genome. In most existing approaches, graphs are constructed from variant call sets derived from short-read sequencing. As long-read sequencing becomes more cost-effective and enables de novo assembly for increasing numbers of whole genomes, a method for the direct construction of a genome graph from sets of assembled human genomes would be desirable. Such assembly-based genome graphs would encompass the wide spectrum of genetic variation accessible to long-read-based de novo assembly, including large structural variants and divergent haplotypes. Here we present NovoGraph, a method for the construction of a human genome graph directly from a set of de novo assemblies. NovoGraph constructs a genome-wide multiple sequence alignment of all input contigs and creates a graph by merging the input sequences at positions that are both homologous and sequence-identical. NovoGraph outputs resulting graphs in VCF format that can be loaded into third-party genome graph toolkits. To demonstrate NovoGraph, we construct a genome graph with 23,478,835 variant sites and 30,582,795 variant alleles from de novo assemblies of seven ethnically diverse human genomes (AK1, CHM1, CHM13, HG003, HG004, HX1, NA19240). Initial evaluations show that mapping against the constructed graph reduces the average mismatch rate of reads from sample NA12878 by approximately 0.2%, albeit at a slightly increased rate of reads that remain unmapped.

Download Full-text

Genomic Characterization Provides an Insight into the Pathogenicity of the Poplar Canker Bacterium Lonsdalea populi

Genes ◽

10.3390/genes12020246 ◽

2021 ◽

Vol 12 (2) ◽

pp. 246

Author(s):

Xiaomeng Chen ◽

Rui Li ◽

Yonglin Wang ◽

Aining Li

Keyword(s):

Genome Sequence ◽

Extracellular Enzymes ◽

De Novo ◽

Whole Genome Sequence ◽

Hybrid Poplars ◽

A Genome ◽

Conserved Genes ◽

Genomic Characterization ◽

Molecular Bases ◽

Insight Into

An emerging poplar canker caused by the gram-negative bacterium, Lonsdalea populi, has led to high mortality of hybrid poplars Populus × euramericana in China and Europe. The molecular bases of pathogenicity and bark adaptation of L. populi have become a focus of recent research. This study revealed the whole genome sequence and identified putative virulence factors of L. populi. A high-quality L. populi genome sequence was assembled de novo, with a genome size of 3,859,707 bp, containing approximately 3434 genes and 107 RNAs (75 tRNA, 22 rRNA, and 10 ncRNA). The L. populi genome contained 380 virulence-associated genes, mainly encoding for adhesion, extracellular enzymes, secretory systems, and two-component transduction systems. The genome had 110 carbohydrate-active enzyme (CAZy)-coding genes and putative secreted proteins. The antibiotic-resistance database annotation listed that L. populi was resistant to penicillin, fluoroquinolone, and kasugamycin. Analysis of comparative genomics found that L. populi exhibited the highest homology with the L. britannica genome and L. populi encompassed 1905 specific genes, 1769 dispensable genes, and 1381 conserved genes, suggesting high evolutionary diversity and genomic plasticity. Moreover, the pan genome analysis revealed that the N-5-1 genome is an open genome. These findings provide important resources for understanding the molecular basis of the pathogenicity and biology of L. populi and the poplar-bacterium interaction.

Download Full-text

The Gossypium stocksii genome as a novel resource for cotton improvement

G3 Genes|Genome|Genetics ◽

10.1093/g3journal/jkab125 ◽

2021 ◽

Author(s):

Corrinne E Grover ◽

Daojun Yuan ◽

Mark A Arick ◽

Emma R Miller ◽

Guanjing Hu ◽

...

Keyword(s):

Genome Sequence ◽

De Novo ◽

Pest Resistance ◽

Cotton Leaf ◽

High Quality ◽

Cotton Leaf Curl Virus ◽

Cotton Species ◽

Wild Cotton ◽

Leaf Curl Virus ◽

Insight Into

Abstract Cotton is an important textile crop whose gains in production over the last century have been challenged by various diseases. Because many modern cultivars are susceptible to several pests and pathogens, breeding efforts have included attempts to introgress wild, naturally resistant germplasm into elite lines. Gossypium stocksii is a wild cotton species native to Africa, which is part of a clade of vastly understudied species. Most of what is known about this species comes from pest resistance surveys and/or breeding efforts, which suggests that G. stocksii could be a valuable reservoir of natural pest resistance. Here we present a high-quality de novo genome sequence for G. stocksii. We compare the G. stocksii genome with resequencing data from a closely related, understudied species (G. somalense) to generate insight into the relatedness of these cotton species. Finally, we discuss the utility of the G. stocksii genome for understanding pest resistance in cotton, particularly resistance to cotton leaf curl virus.

Download Full-text

Trio deep-sequencing does not reveal unexpected off-target and on-target mutations in Cas9-edited rhesus monkeys

Nature Communications ◽

10.1038/s41467-019-13481-y ◽

2019 ◽

Vol 10 (1) ◽

Cited By ~ 8

Author(s):

Xin Luo ◽

Yaoxi He ◽

Chao Zhang ◽

Xiechao He ◽

Lanzhen Yan ◽

...

Keyword(s):

Rhesus Monkeys ◽

De Novo ◽

Preclinical Model ◽

Structural Variants ◽

Sequencing Data ◽

De Novo Mutations ◽

Target Region ◽

Long Read ◽

Target Effect

AbstractCRISPR-Cas9 is a widely-used genome editing tool, but its off-target effect and on-target complex mutations remain a concern, especially in view of future clinical applications. Non-human primates (NHPs) share close genetic and physiological similarities with humans, making them an ideal preclinical model for developing Cas9-based therapies. However, to our knowledge no comprehensive in vivo off-target and on-target assessment has been conducted in NHPs. Here, we perform whole genome trio sequencing of Cas9-treated rhesus monkeys. We only find a small number of de novo mutations that can be explained by expected spontaneous mutations, and no unexpected off-target mutations (OTMs) were detected. Furthermore, the long-read sequencing data does not detect large structural variants in the target region.

Download Full-text

Fully phased human genome assembly without parental data using single-cell strand sequencing and long reads

Nature Biotechnology ◽

10.1038/s41587-020-0719-5 ◽

2020 ◽

Author(s):

David Porubsky ◽

◽

Peter Ebert ◽

Peter A. Audano ◽

Mitchell R. Vollger ◽

...

Keyword(s):

Single Cell ◽

Genome Assembly ◽

De Novo ◽

Error Rates ◽

Sequencing Data ◽

Single Nucleotide Variants ◽

De Novo Genome Assembly ◽

Parental Data ◽

Human Genome Assembly ◽

Long Read

AbstractHuman genomes are typically assembled as consensus sequences that lack information on parental haplotypes. Here we describe a reference-free workflow for diploid de novo genome assembly that combines the chromosome-wide phasing and scaffolding capabilities of single-cell strand sequencing1,2 with continuous long-read or high-fidelity3 sequencing data. Employing this strategy, we produced a completely phased de novo genome assembly for each haplotype of an individual of Puerto Rican descent (HG00733) in the absence of parental data. The assemblies are accurate (quality value > 40) and highly contiguous (contig N50 > 23 Mbp) with low switch error rates (0.17%), providing fully phased single-nucleotide variants, indels and structural variants. A comparison of Oxford Nanopore Technologies and Pacific Biosciences phased assemblies identified 154 regions that are preferential sites of contig breaks, irrespective of sequencing technology or phasing algorithms.

Download Full-text

Closing Human Reference Genome Gaps: Identifying and Characterizing Gap-Closing Sequences

G3 Genes|Genome|Genetics ◽

10.1534/g3.120.401280 ◽

2020 ◽

Vol 10 (8) ◽

pp. 2801-2809 ◽

Cited By ~ 1

Author(s):

Tingting Zhao ◽

Zhongqu Duan ◽

Georgi Z. Genchev ◽

Hui Lu

Keyword(s):

Reference Genome ◽

De Novo ◽

Sequence Length ◽

Sequencing Data ◽

Human Reference Genome ◽

Satellite Sequences ◽

Long Read ◽

Data Gap ◽

Simple Repeats ◽

Gap Closing

Despite continuous updates of the human reference genome, there are still hundreds of unresolved gaps which account for about 5% of the total sequence length. Given the availability of whole genome de novo assemblies, especially those derived from long-read sequencing data, gap-closing sequences can be determined. By comparing 17 de novo long-read sequencing assemblies with the human reference genome, we identified a total of 1,125 gap-closing sequences for 132 (16.9% of 783) gaps and added up to 2.2 Mb novel sequences to the human reference genome. More than 90% of the non-redundant sequences could be verified by unmapped reads from the Simons Genome Diversity Project dataset. In addition, 15.6% of the non-reference sequences were found in at least one of four non-human primate genomes. We further demonstrated that the non-redundant sequences had high content of simple repeats and satellite sequences. Moreover, 43 (32.6%) of the 132 closed gaps were shown to be polymorphic; such sequences may play an important biological role and can be useful in the investigation of human genetic diversity.

Download Full-text

Complete Genome Sequence of Rubrobacter xylanophilus Strain AA3-22, Isolated from Arima Onsen in Japan

Microbiology Resource Announcements ◽

10.1128/mra.00818-19 ◽

2019 ◽

Vol 8 (34) ◽

Cited By ~ 1

Author(s):

Natsuki Tomariguchi ◽

Kentaro Miyazaki

Keyword(s):

Genome Sequence ◽

Complete Genome Sequence ◽

Complete Genome ◽

Hot Spring ◽

Sequencing Data ◽

Short Read ◽

Content Type ◽

Short Read Sequencing ◽

Oxford Nanopore ◽

Long Read

Rubrobacter xylanophilus strain AA3-22, belonging to the phylum Actinobacteria, was isolated from nonvolcanic Arima Onsen (hot spring) in Japan. Here, we report the complete genome sequence of this organism, which was obtained by combining Oxford Nanopore long-read and Illumina short-read sequencing data.

Download Full-text

NanoGalaxy: Nanopore long-read sequencing data analysis in Galaxy

GigaScience ◽

10.1093/gigascience/giaa105 ◽

2020 ◽

Vol 9 (10) ◽

Cited By ~ 1

Author(s):

Willem de Koning ◽

Milad Miladi ◽

Saskia Hiltemann ◽

Astrid Heikema ◽

John P Hays ◽

...

Keyword(s):

Genome Assembly ◽

Bioinformatics Analysis ◽

De Novo ◽

Sequence Data ◽

Ease Of Use ◽

Easy Access ◽

Complex Data ◽

Sequencing Data ◽

Long Read ◽

Sequencing Platforms

Abstract Background Long-read sequencing can be applied to generate very long contigs and even completely assembled genomes at relatively low cost and with minimal sample preparation. As a result, long-read sequencing platforms are becoming more popular. In this respect, the Oxford Nanopore Technologies–based long-read sequencing “nanopore" platform is becoming a widely used tool with a broad range of applications and end-users. However, the need to explore and manipulate the complex data generated by long-read sequencing platforms necessitates accompanying specialized bioinformatics platforms and tools to process the long-read data correctly. Importantly, such tools should additionally help democratize bioinformatics analysis by enabling easy access and ease-of-use solutions for researchers. Results The Galaxy platform provides a user-friendly interface to computational command line–based tools, handles the software dependencies, and provides refined workflows. The users do not have to possess programming experience or extended computer skills. The interface enables researchers to perform powerful bioinformatics analysis, including the assembly and analysis of short- or long-read sequence data. The newly developed “NanoGalaxy" is a Galaxy-based toolkit for analysing long-read sequencing data, which is suitable for diverse applications, including de novo genome assembly from genomic, metagenomic, and plasmid sequence reads. Conclusions A range of best-practice tools and workflows for long-read sequence genome assembly has been integrated into a NanoGalaxy platform to facilitate easy access and use of bioinformatics tools for researchers. NanoGalaxy is freely available at the European Galaxy server https://nanopore.usegalaxy.eu with supporting self-learning training material available at https://training.galaxyproject.org.

Download Full-text

Next generation sequencing allows deeper analysis and understanding of genomes and transcriptomes including aspects to fertility

Reproduction Fertility and Development ◽

10.1071/rd10247 ◽

2011 ◽

Vol 23 (1) ◽

pp. 75 ◽

Cited By ~ 7

Author(s):

Thomas Werner

Keyword(s):

Next Generation Sequencing ◽

Transcriptional Control ◽

Target Genes ◽

De Novo ◽

Alternative Promoters ◽

Next Generation ◽

Sequencing Data ◽

Genome Wide ◽

A Genome ◽

Generation Sequencing

Reproduction and fertility are controlled by specific events naturally linked to oocytes, testes and early embryonal tissues. A significant part of these events involves gene expression, especially transcriptional control and alternative transcription (alternative promoters and alternative splicing). While methods to analyse such events for carefully predetermined target genes are well established, until recently no methodology existed to extend such analyses into a genome-wide de novo discovery process. With the arrival of next generation sequencing (NGS) it becomes possible to attempt genome-wide discovery in genomic sequences as well as whole transcriptomes at a single nucleotide level. This does not only allow identification of the primary changes (e.g. alternative transcripts) but also helps to elucidate the regulatory context that leads to the induction of transcriptional changes. This review discusses the basics of the new technological and scientific concepts arising from NGS, prominent differences from microarray-based approaches and several aspects of its application to reproduction and fertility research. These concepts will then be illustrated in an application example of NGS sequencing data analysis involving postimplantation endometrium tissue from cows.

Download Full-text