scholarly journals Chromosome-level genome assembly and annotation of the loquat (Eriobotrya japonica) genome

GigaScience ◽  
2020 ◽  
Vol 9 (3) ◽  
Author(s):  
Shuang Jiang ◽  
Haishan An ◽  
Fangjie Xu ◽  
Xueying Zhang

Abstract Background The loquat (Eriobotrya japonica) is a species of flowering plant in the family Rosaceae that is widely cultivated in Asian, European, and African countries. It blossoms in the winter and ripens in the early summer. The genome of loquat has to date not been published, which limits the study of molecular biology in this cultivated species. Here, we used the third-generation sequencing technology of Nanopore and Hi-C technology to sequence the genome of Eriobotrya. Findings We generated 100.10 Gb of long reads using Oxford Nanopore sequencing technologies. Three types of Illumina high-throughput sequencing data, including genome short reads (47.42 Gb), transcriptome short reads (11.06 Gb), and Hi-C short reads (67.25 Gb), were also generated to help construct the loquat genome. All data were assembled into a 760.1-Mb genome assembly. The contigs were mapped to chromosomes by using Hi-C technology based on the contacts between contigs, and then a genome was assembled exhibiting 17 chromosomes and a scaffold N50 length of 39.7 Mb. A total of 45,743 protein-coding genes were annotated in the Eriobotrya genome, and we investigated the phylogenetic relationships between the Eriobotrya and 6 other Rosaceae species. Eriobotrya shows a close relationship with Malus and Pyrus, with the divergence time of Eriobotrya and Malus being 6.76 million years ago. Furthermore, chromosome rearrangement was found in Eriobotrya and Malus. Conclusions We constructed the first high-quality chromosome-level Eriobotrya genome using Illumina, Nanopore, and Hi-C technologies. This work provides a valuable reference genome for molecular studies of the loquat and provides new insight into chromosome evolution in this species.

Diversity ◽  
2019 ◽  
Vol 11 (9) ◽  
pp. 144 ◽  
Author(s):  
Laís Coelho ◽  
Lukas Musher ◽  
Joel Cracraft

Current generation high-throughput sequencing technology has facilitated the generation of more genomic-scale data than ever before, thus greatly improving our understanding of avian biology across a range of disciplines. Recent developments in linked-read sequencing (Chromium 10×) and reference-based whole-genome assembly offer an exciting prospect of more accessible chromosome-level genome sequencing in the near future. We sequenced and assembled a genome of the Hairy-crested Antbird (Rhegmatorhina melanosticta), which represents the first publicly available genome for any antbird (Thamnophilidae). Our objectives were to (1) assemble scaffolds to chromosome level based on multiple reference genomes, and report on differences relative to other genomes, (2) assess genome completeness and compare content to other related genomes, and (3) assess the suitability of linked-read sequencing technology for future studies in comparative phylogenomics and population genomics studies. Our R. melanosticta assembly was both highly contiguous (de novo scaffold N50 = 3.3 Mb, reference based N50 = 53.3 Mb) and relatively complete (contained close to 90% of evolutionarily conserved single-copy avian genes and known tetrapod ultraconserved elements). The high contiguity and completeness of this assembly enabled the genome to be successfully mapped to the chromosome level, which uncovered a consistent structural difference between R. melanosticta and other avian genomes. Our results are consistent with the observation that avian genomes are structurally conserved. Additionally, our results demonstrate the utility of linked-read sequencing for non-model genomics. Finally, we demonstrate the value of our R. melanosticta genome for future researchers by mapping reduced representation sequencing data, and by accurately reconstructing the phylogenetic relationships among a sample of thamnophilid species.


2015 ◽  
Author(s):  
Mohammed-Amin Madoui ◽  
Carole Dossat ◽  
Leo d'Agata ◽  
Edwin van der Vossen ◽  
Jan van Oeveren ◽  
...  

Background Scaffolding is a crucial step in the genome assembly process. Current methods based on large fragment paired-end reads or long reads allow an increase in continuity but often lack consistency in repetitive regions, resulting in fragmented assemblies. Here, we describe a novel tool to link assemblies to a genome map to aid complex genome reconstruction by detecting assembly errors and allowing scaffold ordering and anchoring. Results We present MaGuS (map-guided scaffolding), a modular tool that uses a draft genome assembly, a genome map, and high-throughput paired-end sequencing data to estimate the quality and to enhance the continuity of an assembly. We generated several assemblies of the Arabidopsis genome using different scaffolding programs and applied MaGuS to select the best assembly using quality metrics. Then, we used MaGuS to perform map-guided scaffolding to increase continuity by creating new scaffold links in low-covered and highly repetitive regions where other commonly used scaffolding methods lack consistency. Conclusions MaGuS is a powerful reference-free evaluator of assembly quality and a map-guided scaffolder that is freely available at https://github.com/institut-de-genomique/MaGuS. Its use can be extended to other high-throughput sequencing data (e.g., long-read data) and also to other map data (e.g., genetic maps) to improve the quality and the continuity of large and complex genome assemblies.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Guifang Lin ◽  
Cheng He ◽  
Jun Zheng ◽  
Dal-Hoe Koo ◽  
Ha Le ◽  
...  

Abstract Background The maize inbred line A188 is an attractive model for elucidation of gene function and improvement due to its high embryogenic capacity and many contrasting traits to the first maize reference genome, B73, and other elite lines. The lack of a genome assembly of A188 limits its use as a model for functional studies. Results Here, we present a chromosome-level genome assembly of A188 using long reads and optical maps. Comparison of A188 with B73 using both whole-genome alignments and read depths from sequencing reads identify approximately 1.1 Gb of syntenic sequences as well as extensive structural variation, including a 1.8-Mb duplication containing the Gametophyte factor1 locus for unilateral cross-incompatibility, and six inversions of 0.7 Mb or greater. Increased copy number of carotenoid cleavage dioxygenase 1 (ccd1) in A188 is associated with elevated expression during seed development. High ccd1 expression in seeds together with low expression of yellow endosperm 1 (y1) reduces carotenoid accumulation, accounting for the white seed phenotype of A188. Furthermore, transcriptome and epigenome analyses reveal enhanced expression of defense pathways and altered DNA methylation patterns of the embryonic callus. Conclusions The A188 genome assembly provides a high-resolution sequence for a complex genome species and a foundational resource for analyses of genome variation and gene function in maize. The genome, in comparison to B73, contains extensive intra-species structural variations and other genetic differences. Expression and network analyses identify discrete profiles for embryonic callus and other tissues.


BMC Genetics ◽  
2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Liping Guan ◽  
Ke Cao ◽  
Yong Li ◽  
Jian Guo ◽  
Qiang Xu ◽  
...  

Abstract Background Peach (Prunus persica L.) is a diploid species and model plant of the Rosaceae family. In the past decade, significant progress has been made in peach genetic research via DNA markers, but the number of these markers remains limited. Results In this study, we performed a genome-wide DNA markers detection based on sequencing data of six distantly related peach accessions. A total of 650,693~1,053,547 single nucleotide polymorphisms (SNPs), 114,227~178,968 small insertion/deletions (InDels), 8386~12,298 structure variants (SVs), 2111~2581 copy number variants (CNVs) and 229,357~346,940 simple sequence repeats (SSRs) were detected and annotated. To demonstrate the application of DNA markers, 944 SNPs were filtered for association study of fruit ripening time and 15 highly polymorphic SSRs were selected to analyze the genetic relationship among 221 accessions. Conclusions The results showed that the use of high-throughput sequencing to develop DNA markers is fast and effective. Comprehensive identification of DNA markers, including SVs and SSRs, would be of benefit to genetic diversity evaluation, genetic mapping, and molecular breeding of peach.


2020 ◽  
Vol 10 (7) ◽  
pp. 2179-2183 ◽  
Author(s):  
Stefan Prost ◽  
Malte Petersen ◽  
Martin Grethlein ◽  
Sarah Joy Hahn ◽  
Nina Kuschik-Maczollek ◽  
...  

Ever decreasing costs along with advances in sequencing and library preparation technologies enable even small research groups to generate chromosome-level assemblies today. Here we report the generation of an improved chromosome-level assembly for the Siamese fighting fish (Betta splendens) that was carried out during a practical university master’s course. The Siamese fighting fish is a popular aquarium fish and an emerging model species for research on aggressive behavior. We updated the current genome assembly by generating a new long-read nanopore-based assembly with subsequent scaffolding to chromosome-level using previously published Hi-C data. The use of ∼35x nanopore-based long-read data sequenced on a MinION platform (Oxford Nanopore Technologies) allowed us to generate a baseline assembly of only 1,276 contigs with a contig N50 of 2.1 Mbp, and a total length of 441 Mbp. Scaffolding using the Hi-C data resulted in 109 scaffolds with a scaffold N50 of 20.7 Mbp. More than 99% of the assembly is comprised in 21 scaffolds. The assembly showed the presence of 96.1% complete BUSCO genes from the Actinopterygii dataset indicating a high quality of the assembly. We present an improved full chromosome-level assembly of the Siamese fighting fish generated during a university master’s course. The use of ∼35× long-read nanopore data drastically improved the baseline assembly in terms of continuity. We show that relatively in-expensive high-throughput sequencing technologies such as the long-read MinION sequencing platform can be used in educational settings allowing the students to gain practical skills in modern genomics and generate high quality results that benefit downstream research projects.


2019 ◽  
Vol 31 (7) ◽  
pp. 1189 ◽  
Author(s):  
Janine E. Deakin ◽  
Sally Potter

Marsupials have unique features that make them particularly interesting to study, and sequencing of marsupial genomes is helping to understand their evolution. A decade ago, it was a huge feat to sequence the first marsupial genome. Now, the advances in sequencing technology have made the sequencing of many more marsupial genomes possible. However, the DNA sequence is only one component of the structures it is packaged into: chromosomes. Knowing the arrangement of the DNA sequence on each chromosome is essential for a genome assembly to be used to its full potential. The importance of combining sequence information with cytogenetics has previously been demonstrated for rapidly evolving regions of the genome, such as the sex chromosomes, as well as for reconstructing the ancestral marsupial karyotype and understanding the chromosome rearrangements involved in the Tasmanian devil facial tumour disease. Despite the recent advances in sequencing technology assisting in genome assembly, physical anchoring of the sequence to chromosomes is required to achieve a chromosome-level assembly. Once chromosome-level assemblies are achieved for more marsupials, we will be able to investigate changes in the packaging and interactions between chromosomes to gain an understanding of the role genome architecture has played during marsupial evolution.


2015 ◽  
Author(s):  
Neeraja M Krishnan ◽  
Prachi Jain ◽  
Saurabh Gupta ◽  
Arun K Hariharan ◽  
Binay Panda

Neem (Azadirachta indica A. Juss.), an evergreen tree of the Meliaceae family, is known for its medicinal, cosmetic, pesticidal and insecticidal properties. We had previously sequenced and published the draft genome of the plant, using mainly short read sequencing data. In this report, we present an improved genome assembly generated using additional short reads from Illumina and long reads from Pacific Biosciences SMRT sequencer. We assembled short reads and error corrected long reads using Platanus, an assembler designed to perform well for heterozygous genomes. The updated genome assembly (v2.0) yielded 3- and 3.5-fold increase in N50 and N75, respectively; 2.6-fold decrease in the total number of scaffolds; 1.25-fold increase in the number of valid transcriptome alignments; 13.4-fold less mis-assembly and 1.85-fold increase in the percentage repeat, over the earlier assembly (v1.0). The current assembly also maps better to the genes known to be involved in the terpenoid biosynthesis pathway. Together, the data represents an improved assembly of the A. indica genome. The raw data described in this manuscript are submitted to the NCBI Short Read Archive under the accession numbers SRX1074131, SRX1074132, SRX1074133, and SRX1074134 (SRP013453).


2019 ◽  
Author(s):  
Liping Guan ◽  
ke Cao ◽  
yong Li ◽  
jian guo ◽  
qiang xu ◽  
...  

Abstract Background: Peach (Prunus persica L.) is a diploid species and model plant of the Rosaceae family. In the past decade, significant progress has been made in peach genetic research via DNA markers, but the number of these markers remains limited. Results: In this study, we performed a genome-wide DNA markers detection based on sequencing data of six distantly related peach accessions. A total of 650,693~1,053,547 single nucleotide polymorphisms (SNPs), 114,227~178,968 small insertion/deletions (InDels), 8,386~12,298 structure variants (SVs), 2,111~2,581 copy number variants (CNVs) and 229,357~346,940 simple sequence repeats (SSRs) were detected and annotated. To demonstrate the application of DNA markers, 944 SNPs were filtered for association study of fruit ripening time and 15 highly polymorphic SSRs were selected to analyze the genetic relationship among 221 accessions. Conclusions: The results showed that the use of high-throughput sequencing to develop DNA markers is fast and effective. Comprehensive identification of DNA markers, including SVs and SSRs, would be of benefit to genetic diversity evaluation, genetic mapping, and molecular breeding of peach.


2019 ◽  
Author(s):  
Liping Guan ◽  
ke Cao ◽  
yong Li ◽  
jian guo ◽  
qiang xu ◽  
...  

Abstract Abstract Background: Peach (Prunus persica L.) is a diploid species and model plant of the Rosaceae family. In the past decade, significant progress has been made in peach genetic research via DNA markers, but the number of these markers remains limited. Results: In this study, we performed a genome-wide DNA markers detection based on sequencing data of six distantly related peach accessions. A total of 650,693~1,053,547 single nucleotide polymorphisms (SNPs), 114,227~178,968 small insertion/deletions (InDels), 8,386~12,298 structure variants (SVs), 2,111~2,581 copy number variants (CNVs) and 229,357~346,940 simple sequence repeats (SSRs) were detected and annotated. To demonstrate the application of DNA markers, 944 SNPs were filtered for association study of fruit ripening time and 15 highly polymorphic SSRs were selected to analyze the genetic relationship among 221 accessions. Conclusions: The results showed that the use of high-throughput sequencing to develop DNA markers is fast and effective. Comprehensive identification of DNA markers, including SVs and SSRs, would be of benefit to genetic diversity evaluation, genetic mapping, and molecular breeding of peach.


Sign in / Sign up

Export Citation Format

Share Document