A de novo assembly of the sweet cherry (Prunus avium cv. Tieton) genome using linked-read sequencing technology

PeerJ ◽  
2020 ◽  
Vol 8 ◽  
pp. e9114 ◽  
Author(s):  
Jiawei Wang ◽  
Weizhen Liu ◽  
Dongzi Zhu ◽  
Xiang Zhou ◽  
Po Hong ◽  
...  

The sweet cherry (Prunus avium) is one of the most economically important fruit species in the world. However, limited genetic information is available for this species, which hinders breeding efforts at the molecular level. We describe a high-quality reference genome assembly and annotation of the diploid sweet cherry (2n = 2x = 16) cv. Tieton, produced using linked-read sequencing technology. We generated over 750 million clean reads, representing 112.63 GB of raw sequencing data. The Supernova assembler produced a more highly ordered and continuous genome sequence than the current P. avium draft genome, with a contig N50 of 63.65 KB and a scaffold N50 of 2.48 MB. The final scaffold assembly was 280.33 MB in length, representing 82.12% of the estimated Tieton genome. Eight chromosome-scale pseudomolecules were constructed, covering 214 MB of the final scaffold assembly. De novo, homology-based, and RNA-seq methods were used together to predict 30,975 protein-coding loci. In total, 98.39% of core eukaryotic genes and 97.43% of single-copy orthologues were identified in the embryo plant, indicating the completeness of the assembly. Linked-read sequencing technology was effective in constructing a high-quality reference genome of the sweet cherry, which will benefit molecular breeding and cultivar identification in this species.
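The contig and scaffold N50 values quoted above follow the usual definition: the length of the sequence at which the running total, taken from longest to shortest, first reaches half of the total assembly length. A minimal sketch of that calculation is given below. The function name `n50` and the toy lengths are illustrative assumptions and are not taken from the paper or from the Supernova assembler.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Lengths are sorted from longest to shortest and accumulated; the N50 is
    the length at which the running total first reaches half of the sum.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")


# Hypothetical contig lengths (arbitrary units), not data from the assembly.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # 70: 80 + 70 = 150, which is half of the total 300
```

Applied to real contig or scaffold lengths, the same function would give the two N50 figures reported for an assembly.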

2017 ◽  
Author(s):  
Zhipeng Li ◽  
Zeshan Lin ◽  
Lei Chen ◽  
Hengxing Ba ◽  
Yongzhi Yang ◽  
...  

Abstract Background Reindeer (Rangifer tarandus) is the only fully domesticated species in the Cervidae family, and is the only cervid with a circumpolar distribution. Unlike in all other cervids, female reindeer, as well as males, regularly grow cranial appendages (antlers, the defining characteristic of cervids). Moreover, reindeer milk contains more protein and less lactose than bovids’ milk. A high-quality reference genome of this species will assist efforts to elucidate these and other important features in the reindeer. Findings We obtained 723.2 Gb (Gigabase) of raw reads on an Illumina HiSeq 4000 platform, and a 2.64 Gb final assembly, representing 95.7% of the estimated genome (2.76 Gb according to k-mer analysis), including 92.6% of expected genes according to BUSCO analysis. The contig N50 and scaffold N50 sizes were 89.7 kilobases (kb) and 0.94 megabases (Mb), respectively. We annotated 21,555 protein-coding genes and 1.07 Gb of repetitive sequences by de novo and homology-based prediction. Homology-based searches detected 159 rRNA, 547 miRNA, 1,339 snRNA and 863 tRNA sequences in the genome of R. tarandus. The divergence time between R. tarandus and the ancestors of Bos taurus and Capra hircus is estimated to be 29.55 million years ago (Mya). Conclusions Our results provide the first high-quality reference genome for the reindeer, and a valuable resource for studying evolution, domestication and other unusual characteristics of the reindeer.


2020 ◽  
Vol 10 (10) ◽  
pp. 3489-3495
Author(s):  
Natascha van Lieshout ◽  
Ate van der Burgt ◽  
Michiel E. de Vries ◽  
Menno ter Maat ◽  
David Eickholt ◽  
...  

With the rapid expansion of the application of genomics and sequencing in plant breeding, there is a constant drive for better reference genomes. In potato (Solanum tuberosum), the third largest food crop in the world, the related species S. phureja, designated “DM”, has been used as the most popular reference genome for the last 10 years. Here, we introduce the de novo sequenced genome of Solyntus as the next standard reference in potato genome studies. A true Solanum tuberosum made up of 116 contigs that is also highly homozygous, diploid, vigorous and self-compatible, Solyntus provides a more direct and contiguous reference than ever before available. It was constructed by sequencing with state-of-the-art long- and short-read technology and assembled with Canu. The 116 contigs were assembled into scaffolds to form each pseudochromosome, with 3 to 17 contigs per chromosome. This assembly contains 93.7% of the single-copy gene orthologs from the Solanaceae set and has an N50 of 63.7 Mbp. The genome and related files can be found at https://www.plantbreeding.wur.nl/Solyntus/. With the release of this research line and its draft genome we anticipate many exciting developments in (diploid) potato research.


PeerJ ◽  
2019 ◽  
Vol 7 ◽  
pp. e8210
Author(s):  
Xueqing Zhao ◽  
Ming Yan ◽  
Yu Ding ◽  
Yan Huo ◽  
Zhaohe Yuan

Background Sweet cherry (Prunus avium) is one of the most popular temperate fruits. Previous studies have demonstrated that there are several haplotypes in the chloroplast genome of sweet cherry cultivars. However, no chloroplast genome of a sweet cherry cultivar had yet been released, and the phylogenetic relationships among Prunus based on chloroplast genome data were unclear. Methods In this study, we assembled and annotated the complete chloroplast genome of a sweet cherry cultivar, P. avium ‘Summit’, from high-throughput sequencing data. Gene Ontology (GO) terms were assigned to classify the function of the annotated genes. Maximum likelihood (ML) trees were constructed to reveal the phylogenetic relationships within Prunus species, using LSC (large single-copy) regions, SSC (small single-copy) regions, IR (inverted repeat) regions, CDS (coding sequences), intergenic regions, and whole cp genome datasets, respectively. Results The complete plastid genome was 157,886 bp in length with a typical quadripartite structure of LSC (85,990 bp) and SSC (19,080 bp) regions, separated by a pair of IR regions (26,408 bp). It contained 131 genes, including 86 protein-coding genes, 37 transfer RNA genes and 8 ribosomal RNA genes. A total of 77 genes were assigned to three major GO categories: molecular function, cellular component and biological process. Comparison with other Prunus species showed that P. avium ‘Summit’ was quite conserved in gene content and structure. The non-coding regions ndhC-trnV, rps12-trnV and rpl32-trnL were the most variable sequences between wild Mazzard cherry and ‘Summit’ cherry. A total of 73 simple sequence repeats (SSRs) were identified in ‘Summit’ cherry, and most of them were mononucleotide repeats. The ML phylogenetic tree within Prunus species revealed four clades: Amygdalus, Cerasus, Padus, and Prunus. The SSC and IR trees were incongruent with results using other cp data partitions.
These data provide valuable genetic resources for future research on sweet cherry and Prunus species.
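The reported region sizes can be checked against the total plastid genome length: in a quadripartite structure the inverted repeat occurs twice, so the total is LSC + SSC + 2 × IR. The short sketch below performs that arithmetic with the figures from the abstract. The helper name `plastome_length` is an illustrative assumption.

```python
def plastome_length(lsc_bp, ssc_bp, ir_bp):
    """Total length of a quadripartite plastome: LSC + SSC + two IR copies."""
    return lsc_bp + ssc_bp + 2 * ir_bp


# Region sizes as reported in the abstract (base pairs).
LSC_BP = 85_990
SSC_BP = 19_080
IR_BP = 26_408

total_bp = plastome_length(LSC_BP, SSC_BP, IR_BP)
print(total_bp)  # 157886, matching the reported genome length

# Gene counts as reported: protein-coding + tRNA + rRNA.
gene_total = 86 + 37 + 8
print(gene_total)  # 131, matching the reported gene number
```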


2020 ◽  
Author(s):  
C. Molitor ◽  
T.J. Kurowski ◽  
P.M. Fidalgo de Almeida ◽  
P. Eerolla ◽  
D.J. Spindlow ◽  
...  

Abstract Solanum sitiens is a self-incompatible wild relative of tomato, characterised by salt and drought resistance traits, with the potential to contribute to crop improvement in cultivated tomato. This species has a distinct morphology, classification and ecotype compared to other stress-resistant wild tomato relatives such as S. pennellii and S. chilense. Therefore, the availability of a high-quality reference genome for S. sitiens will facilitate the genetic and molecular understanding of salt and drought resistance. Here, we present a de novo genome and transcriptome assembly for S. sitiens (Accession LA1974). A hybrid assembly strategy was followed using Illumina short reads (∼159X coverage) and PacBio long reads (∼44X coverage), generating a total of ∼262 Gbp of DNA sequence; in addition, ∼2,670 Gbp of BioNano data was obtained. A reference genome of 1,245 Mbp, arranged in 1,481 scaffolds with an N50 of 1,826 Mbp, was generated. Genome completeness was estimated at 95% using the Benchmarking Universal Single-Copy Orthologs (BUSCO) and the K-mer Analysis Tool (KAT); this is within the range of current high-quality reference genomes for other tomato wild relatives. Additionally, we identified three large inversions compared to S. lycopersicum, containing several drought-resistance-related genes, such as beta-amylase 1 and YUCCA7. In addition, ∼63 Gbp of RNA-Seq data were generated to support the prediction of 31,164 genes from the assembly, and to perform a de novo transcriptome assembly. Some of the protein clusters unique to S. sitiens were associated with genes involved in drought and salt resistance, including GLO1 and FQR1. This first reference genome for S. sitiens will provide a valuable resource to progress QTL studies to the gene level, and will assist molecular breeding to improve crop production in water-limited environments.


GigaScience ◽  
2021 ◽  
Vol 10 (1) ◽  
Author(s):  
Monica M Sheffer ◽  
Anica Hoppe ◽  
Henrik Krehenwinkel ◽  
Gabriele Uhl ◽  
Andreas W Kuss ◽  
...  

Abstract Background Argiope bruennichi, the European wasp spider, has been investigated intensively as a focal species for studies on sexual selection, chemical communication, and the dynamics of rapid range expansion at a behavioral and genetic level. However, the lack of a reference genome has limited insights into the genetic basis for these phenomena. Therefore, we assembled a high-quality chromosome-level reference genome of the European wasp spider as a tool for more in-depth future studies. Findings We generated, de novo, a 1.67 Gb genome assembly of A. bruennichi using 21.8× Pacific Biosciences sequencing, polished with 19.8× Illumina paired-end sequencing data, and proximity ligation (Hi-C)-based scaffolding. This resulted in an N50 scaffold size of 124 Mb and an N50 contig size of 288 kb. We found 98.4% of the genome to be contained in 13 scaffolds, fitting the expected number of chromosomes (n = 13). Analyses showed the presence of 91.1% of complete arthropod BUSCOs, indicating a high-quality assembly. Conclusions We present the first chromosome-level genome assembly in the order Araneae. With this genomic resource, we open the door for more precise and informative studies on evolution and adaptation not only in A. bruennichi but also in arachnids overall, shedding light on questions such as the genomic architecture of traits, whole-genome duplication, and the genomic mechanisms behind silk and venom evolution.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Hai-Feng Tian ◽  
Qiao-Mu Hu ◽  
Zhong Li

Abstract The swamp eel (Monopterus albus) is an economically important fish in China and South-Eastern Asia and a good model species to study sex inversion. There are different genetic lineages and multiple local strains of swamp eel in China, and one local strain of M. albus with deep yellow coloration and big spots has been selected for consecutive selective breeding due to its superiority in growth rate and fecundity. A high-quality reference genome of the swamp eel would be a very useful resource for future selective breeding programs. In the present study, we applied the PacBio single-molecule real-time (SMRT) sequencing technique and high-throughput chromosome conformation capture (Hi-C) technology to assemble the M. albus genome. A 799 Mb genome was obtained with a contig N50 length of 2.4 Mb and a scaffold N50 length of 67.24 Mb, indicating 110-fold and ∼31.87-fold improvements compared to the earlier released assembly (∼22.24 Kb and 2.11 Mb, respectively). Aided by Hi-C data, a total of 750 contigs were reliably assembled into 12 chromosomes. Using the 22,373 protein-coding genes annotated here, analysis of the phylogenetic relationships of the swamp eel with other teleosts showed that the swamp eel separated from the common ancestor of the zig-zag eel ∼49.9 million years ago, and 769 gene families were found to be expanded, which are mainly enriched in the immune system, sensory system, and transport and catabolism. This highly accurate, chromosome-level reference genome of M. albus will be used for the development of genome-scale selective breeding.
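The fold improvements quoted in this abstract are ratios of the new N50 to the earlier one. The sketch below recomputes them from the rounded figures given in the text. The function name `fold_improvement` is an illustrative assumption, and because the inputs are rounded, the contig ratio comes out near 108 rather than exactly the reported 110.

```python
def fold_improvement(new_n50_bp, old_n50_bp):
    """Ratio of a new N50 to an earlier N50, both in base pairs."""
    return new_n50_bp / old_n50_bp


# Rounded values from the abstract, converted to base pairs.
contig_fold = fold_improvement(2.4e6, 22.24e3)      # 2.4 Mb vs ~22.24 Kb
scaffold_fold = fold_improvement(67.24e6, 2.11e6)   # 67.24 Mb vs 2.11 Mb

print(round(contig_fold, 1))    # 107.9
print(round(scaffold_fold, 2))  # 31.87
```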


Author(s):  
Maximilian Driller ◽  
Sibelle Torres Vilaça ◽  
Larissa Souza Arantes ◽  
Tomás Carrasco-Valenzuela ◽  
Felix Heeger ◽  
...  

Abstract Reduced representation libraries present an opportunity to perform large-scale studies on non-model species without the need for a reference genome. Methods that use restriction enzymes and fragment size selection to help obtain the desired number of loci, such as ddRAD, are highly flexible and therefore suitable for different types of studies. However, a number of technical issues are not approachable without a reference genome, such as size selection reproducibility across samples and coverage across fragment lengths. Moreover, identity thresholds are usually chosen arbitrarily in order to maximize the number of SNPs considering arbitrary parameters. We have developed a strategy to identify de novo a set of reduced-representation single-copy orthologs (R2SCOs). Our approach is based on overlapping reads that recreate original fragments and add information about coverage per fragment size. A further in silico digestion step limits the data to well-covered fragment sizes, increasing the chance of covering the majority of loci across different individuals. By using full sequences as putative alleles, we estimate optimal identity thresholds from pairwise comparisons. We have demonstrated our full workflow with data from five sea turtle species. Locus numbers were similar across all species, even at increasing phylogenetic distances. Our results indicated that sea turtles have in general very low levels of heterozygosity. Our approach produced a high-quality set of reference loci, eliminating a series of biological and experimental biases that can strongly affect downstream analysis, and allowed us to explore the genetic variability within and across sea turtle species.


GigaScience ◽  
2020 ◽  
Vol 9 (4) ◽  
Author(s):  
Matt A Field ◽  
Benjamin D Rosen ◽  
Olga Dudchenko ◽  
Eva K F Chan ◽  
Andre E Minoche ◽  
...  

Abstract Background The German Shepherd Dog (GSD) is one of the most common breeds on earth and has been bred for its utility and intelligence. It is often the first choice for police and military work, as well as protection, disability assistance, and search-and-rescue. Yet, GSDs are well known to be susceptible to a range of genetic diseases that can interfere with their training. Such diseases are of particular concern when they occur later in life, and fully trained animals are not able to continue their duties. Findings Here, we provide the draft genome sequence of a healthy German Shepherd female as a reference for future disease and evolutionary studies. We generated this improved canid reference genome (CanFam_GSD) utilizing a combination of Pacific Biosciences, Oxford Nanopore, 10X Genomics, Bionano, and Hi-C technologies. The GSD assembly is ∼80 times as contiguous as the current canid reference genome, CanFam v3.1 (20.9 vs 0.267 Mb contig N50), containing far fewer gaps (306 vs 23,876) and fewer scaffolds (429 vs 3,310). Two chromosomes (4 and 35) are assembled into single scaffolds with no gaps. BUSCO analyses of the genome assembly results show that 93.0% of the conserved single-copy genes are complete in the GSD assembly compared with 92.2% for CanFam v3.1. Homology-based gene annotation increases this value to ∼99%. Detailed examination of the evolutionarily important pancreatic amylase region reveals that there are most likely 7 copies of the gene, indicative of a duplication of 4 ancestral copies and the disruption of 1 copy. Conclusions GSD genome assembly and annotation were produced with major improvement in completeness, continuity, and quality over the existing canid reference. This resource will enable further research related to canine diseases, the evolutionary relationships of canids, and other aspects of canid biology.

