The long-read genome assembly of hop (Humulus lupulus) uncovers the pseudoautosomal region and other genomic features

2021 ◽  
pp. 1-16
Author(s):  
L.K. Padgitt-Cobb ◽  
S. Kothen-Hill ◽  
J. Henning ◽  
D.A. Hendrix
2021 ◽  
Vol 4 (1) ◽  
Author(s):  
Xing Wang ◽  
Yi Zhang ◽  
Yufeng Zhang ◽  
Mingming Kang ◽  
Yuanbo Li ◽  
...  

Earthworms (Annelida: Crassiclitellata) are widely distributed around the world owing to their ancient origin as well as their adaptation and invasion after introduction into new habitats over the past few centuries. Herein, we report a 1.2 Gb complete genome assembly of the earthworm Amynthas corticis based on a strategy combining third-generation long-read sequencing and Hi-C mapping. A total of 29,256 protein-coding genes are annotated in this genome. Analysis of resequencing data indicates that this earthworm is a triploid species. Furthermore, gene family evolution analysis shows that comprehensive expansion of gene families in the Amynthas corticis genome has produced more defensive functions compared with other species in Annelida. Quantitative proteomic iTRAQ analysis shows that expression of 147 proteins changed in the body of Amynthas corticis, and 16S rDNA sequencing shows that the abundance of 28 microorganisms changed in the gut of Amynthas corticis, when the earthworm was incubated with pathogenic Escherichia coli O157:H7. Our genome assembly provides abundant and valuable resources for the earthworm research community, serving as a first step toward uncovering the mysteries of this species, and may provide molecular-level indicators of its powerful defensive functions, adaptation to complex environments and invasion ability.


Author(s):  
Xiaolin Zhao ◽  
Zhichao Zhang ◽  
Sujiao Zheng ◽  
Wenwu Ye ◽  
Xiaobo Zheng ◽  
...  

Diaporthe-Phomopsis disease complex causes considerable yield losses in soybean production worldwide. As one of the major pathogens, Phomopsis longicolla T. W. Hobbs (syn. Diaporthe longicolla) is not only the primary agent of Phomopsis seed decay, but also one of the agents of Phomopsis pod and stem blight, and Phomopsis stem canker. We performed both PacBio long read sequencing and Illumina short read sequencing, and obtained a genome assembly for the P. longicolla strain YC2-1, which was isolated from soybean stem with Phomopsis stem blight disease. The 63.1 Mb genome assembly contains 87 scaffolds, with a minimum, maximum, and N50 scaffold length of 20 kb, 4.6 Mb, and 1.5 Mb respectively, and a total of 17,407 protein-coding genes. The high-quality data expand the genomic resource of P. longicolla species and will provide a solid foundation for a better understanding of their genetic diversity and pathogenic mechanisms.
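The N50 scaffold length quoted above is a standard contiguity metric: the length L such that scaffolds of length L or longer together cover at least half of the total assembly. A minimal sketch of that calculation follows, assuming a plain list of scaffold lengths as input. The function name `n50` and the example lengths are illustrative only and are not taken from the YC2-1 assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    ordered = sorted(lengths, reverse=True)  # longest scaffold first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first scaffold that pushes the running sum past half is the N50.
        if running >= half_total:
            return length


# Hypothetical scaffold lengths in kb (illustrative, not real data).
example_lengths = [4600, 2000, 1500, 900, 400, 20]
print(n50(example_lengths))
```

Because N50 depends only on the length distribution, it says nothing about structural correctness; a misjoined assembly can have a high N50.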


2019 ◽  
Author(s):  
Ryan Bracewell ◽  
Anita Tran ◽  
Kamalakar Chatla ◽  
Doris Bachtrog

The Drosophila obscura species group is one of the most studied clades of Drosophila and harbors multiple distinct karyotypes. Here we present a de novo genome assembly and annotation of D. bifasciata, a species that represents an important subgroup for which no high-quality chromosome-level genome assembly currently exists. We combined long-read sequencing (Nanopore) and Hi-C scaffolding to achieve a highly contiguous genome assembly approximately 193 Mb in size, with repetitive elements constituting 30.1% of the total length. Drosophila bifasciata harbors four large metacentric chromosomes and the small dot, and our assembly contains each chromosome in a single scaffold, including the highly repetitive pericentromeres, which were largely composed of Jockey and Gypsy transposable elements. We annotated a total of 12,821 protein-coding genes, and comparisons of synteny with D. athabasca orthologs show that the large metacentric pericentromeric regions of multiple chromosomes are conserved between these species. Importantly, Muller A (X chromosome) was found to be metacentric in D. bifasciata, and the pericentromeric region appears homologous to the pericentromeric region of the fused Muller A-AD (XL and XR) of pseudoobscura/affinis subgroup species. Our finding suggests that a metacentric ancestral X fused to a telocentric Muller D and created the large neo-X (Muller A-AD) chromosome ∼15 MYA. We also confirm the fusion of Muller C and D in D. bifasciata and show that it likely involved a centromere-centromere fusion.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Yu Chen ◽  
Yixin Zhang ◽  
Amy Y. Wang ◽  
Min Gao ◽  
Zechen Chong

Long-read de novo genome assembly continues to advance rapidly. However, there is a lack of effective tools to accurately evaluate the assembly results, especially for structural errors. We present Inspector, a reference-free long-read de novo assembly evaluator which faithfully reports types of errors and their precise locations. Notably, Inspector can correct the assembly errors based on consensus sequences derived from raw reads covering erroneous regions. Based on in silico and long-read assembly results from multiple long-read data and assemblers, we demonstrate that in addition to providing generic metrics, Inspector can accurately identify both large-scale and small-scale assembly errors.


2021 ◽  
pp. gr.275325.121
Author(s):  
Rodrigo P. Baptista ◽  
Yiran Li ◽  
Adam Sateriale ◽  
Karen L. Brooks ◽  
Alan Tracey ◽  
...  

Cryptosporidiosis is a leading cause of waterborne diarrheal disease globally and an important contributor to mortality in infants and the immunosuppressed. Despite its importance, the Cryptosporidium community has only had access to a good, but incomplete, Cryptosporidium parvum IOWA reference genome sequence. Incomplete reference sequences hamper annotation, experimental design and interpretation. We have generated a new C. parvum IOWA genome assembly supported by PacBio and Oxford Nanopore long-read technologies and a new comparative and consistent genome annotation for three closely related species C. parvum, Cryptosporidium hominis and Cryptosporidium tyzzeri. We made 1,926 C. parvum annotation updates based on experimental evidence. They include new transporters, ncRNAs, introns and altered gene structures. The new assembly and annotation revealed a complete Dnmt2 methylase ortholog. Comparative annotation between C. parvum, C. hominis and C. tyzzeri revealed that most "missing" orthologs are found suggesting that the biological differences between the species must result from gene copy number variation, differences in gene regulation and single nucleotide variants (SNVs). Using the new assembly and annotation as reference, 190 genes are identified as evolving under positive selection, including many not detected previously. The new C. parvum IOWA reference genome assembly is larger, gap free and lacks ambiguous bases. This chromosomal assembly recovers all 16 chromosome ends, 13 of which are contiguously assembled. The three remaining chromosome ends are provisionally placed. These ends represent duplication of entire chromosome ends including subtelomeric regions revealing a new level of genome plasticity that will both inform and impact future research.


2017 ◽  
Author(s):  
Ruibang Luo ◽  
Fritz J. Sedlazeck ◽  
Charlotte A. Darby ◽  
Stephen M. Kelly ◽  
Michael C. Schatz

Motivation: Linked reads are a form of DNA sequencing commercialized by 10X Genomics that uses highly multiplexed barcoding within microdroplets to tag short reads to progenitor molecules. The linked reads, spanning tens to hundreds of kilobases, offer an alternative to long-read sequencing for de novo assembly, haplotype phasing and other applications. However, there is no available simulator, making it difficult to measure their capability or develop new informatics tools.

Results: Our analysis of 13 real linked-read datasets revealed their characteristics of barcodes, molecules and partitions. Based on this, we introduce LRSim, which simulates linked reads by emulating the library preparation and sequencing process with fine control of 1) the number of simulated variants; 2) the linked-read characteristics; and 3) the Illumina reads profile. From the phasing and genome assembly of multiple datasets, we derive recommendations on coverage, fragment length, and partitioning when sequencing human and non-human genomes.

Availability: LRSim is released under the MIT license and is freely available at https://github.com/aquaskyline/[email protected]


2020 ◽  
Author(s):  
Katherine A. Easterling ◽  
Nicholi J. Pitra ◽  
Taylan B. Morcol ◽  
Jenna R. Aquino ◽  
Lauren G. Lopes ◽  
...  

Hop (Humulus lupulus L.) is known for its use as a bittering agent in beer and has a rich history of cultivation, beginning in Europe and now spanning the globe. There are five wild varieties worldwide, which may have been introgressed with cultivated varieties. As a dioecious species, its obligate outcrossing, non-Mendelian inheritance, and genomic structural variability have confounded directed breeding efforts. Consequently, understanding genome evolution in Humulus represents a considerable challenge, requiring additional resources, including integrated genome maps. In order to facilitate cytogenetic investigations into the transmission genetics of hop, we report here the identification and characterization of 17 new and distinct tandem repeat sequence families. A tandem repeat discovery pipeline was developed using k-mer filtering and dot plot analysis of PacBio long-read sequences from the hop cultivar Apollo. We produced oligonucleotide FISH probes from conserved regions of HuluTR120 and HuluTR225 and demonstrated their utility to stain meiotic chromosomes from wild hop, var. neomexicanus. The HuluTR225 FISH probe hybridized to several loci per nucleus and exhibited irregular, non-Mendelian transmission in male meiocytes of wild hop. Collectively, these tandem repeat sequence families not only represent unique and valuable new cytogenetic reagents but also have the capacity to inform genome assembly efforts and support comparative genomic analyses.


2021 ◽  
Vol 15 (12) ◽  
pp. e0010062
Author(s):  
Julien Kincaid-Smith ◽  
Alan Tracey ◽  
Ronaldo de Carvalho Augusto ◽  
Ingo Bulla ◽  
Nancy Holroyd ◽  
...  

Schistosomes cause schistosomiasis, the world’s second most important parasitic disease after malaria in terms of public health and socioeconomic impacts. A peculiar feature of these dioecious parasites is their ability to produce viable and fertile hybrid offspring. Originally only present in the tropics, schistosomiasis is now also endemic in southern Europe. Based on the analysis of two genetic markers, the European schistosomes had previously been identified as hybrids between the livestock- and the human-infective species Schistosoma bovis and Schistosoma haematobium, respectively. Here, using PacBio long-read sequencing technology, we performed genome assembly improvement and annotation of S. bovis, one of the parental species for which no satisfactory genome assembly was available. We then describe the whole-genome introgression levels of the hybrid schistosomes, their morphometric parameters (eggs and adult worms) and their compatibility with two European snail strains used as vectors (Bulinus truncatus and Planorbarius metidjensis). Schistosome-snail compatibility is a key parameter for the parasite's life cycle progression, and thus for the capability of the parasite to establish in a given area. Our results show that this Schistosoma hybrid is strongly introgressed genetically, composed of 77% S. haematobium and 23% S. bovis origin. This genomic admixture suggests an ancient hybridization event and subsequent backcrosses with the human-specific species, S. haematobium, before its introduction in Corsica. We also show that egg morphology (commonly used as a species diagnostic) does not allow for accurate hybrid identification while genetic tests do.


2019 ◽  
Author(s):  
Lillian K. Padgitt-Cobb ◽  
Sarah B. Kingan ◽  
Jackson Wells ◽  
Justin Elser ◽  
Brent Kronmiller ◽  
...  

Hop (Humulus lupulus L. var. lupulus) is a diploid, dioecious plant with a history of cultivation spanning more than one thousand years. Hop cones are valued for their use in brewing, and around the world, hop has been used in traditional medicine to treat a variety of ailments. Efforts to determine how biochemical pathways responsible for desirable traits are regulated have been challenged by the large, repetitive, and heterozygous genome of hop. We present the first report of a haplotype-phased assembly of a large plant genome. Our assembly and annotation of the Cascade cultivar genome is the most extensive to date. PacBio long-read sequences from hop were assembled with FALCON and phased with FALCON-Unzip. Using the diploid assembly to assess haplotype variation, we discovered genes under positive selection enriched for stress-response, growth, and flowering functions. Comparative analysis of haplotypes provides insight into large-scale structural variation and the selective pressures that have driven hop evolution. Previous studies estimated repeat content at around 60%. With improved resolution of long terminal repeat retrotransposons (LTRs) due to long-read sequencing, we found that hop is nearly 78% repetitive. Our quantification of repeat content provides context for the size of the hop genome, and supports the hypothesis of whole genome duplication (WGD), rather than expansion due to LTRs. With our more complete assembly, we have identified a homolog of cannabidiolic acid synthase (CBDAS) that is expressed in multiple tissues. The approaches we developed to analyze a phased, diploid assembly serve to deepen our understanding of the genomic landscape of hop and may have broader applicability to the study of other large, complex genomes.

