De novo genome assembly of the endangered Acer yangbiense, a plant species with extremely small populations endemic to Yunnan Province, China

GigaScience ◽  
2019 ◽  
Vol 8 (7) ◽  
Author(s):  
Jing Yang ◽  
Hafiz Muhammad Wariss ◽  
Lidan Tao ◽  
Rengang Zhang ◽  
Quanzheng Yun ◽  
...  

Abstract. Background: Acer yangbiense is a newly described, critically endangered endemic maple tree confined to Yangbi County in Yunnan Province in Southwest China. It was included in a programme for rescuing the most threatened species in China, focusing on “plant species with extremely small populations (PSESP)”. Findings: We generated 64, 94, and 110 Gb of raw DNA sequences and obtained a chromosome-level genome assembly of A. yangbiense through a combination of Pacific Biosciences Single-molecule Real-time, Illumina HiSeq X, and Hi-C mapping, respectively. The final genome assembly is ∼666 Mb, with 13 chromosomes covering ∼97% of the genome and a scaffold N50 of 45 Mb. Further, BUSCO analysis recovered 95.5% complete BUSCO genes. Repetitive elements account for 68.0% of the A. yangbiense genome. Genome annotation generated 28,320 protein-coding genes, assisted by a combination of prediction and transcriptome sequencing. In addition, a nearly 1:1 orthology ratio of dot plots of longer syntenic blocks revealed a similar evolutionary history between A. yangbiense and grape, indicating that the genome has not undergone a whole-genome duplication event after the core eudicot common hexaploidization. Conclusion: Here, we report a high-quality de novo genome assembly of A. yangbiense, the first genome for the genus Acer and the family Aceraceae. This will provide fundamental conservation genomics resources, as well as representing a new high-quality reference genome for the economically important Acer lineage and the wider order of Sapindales.
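
The scaffold N50 quoted in this abstract is a standard contiguity statistic: the length L such that scaffolds of length L or longer together contain at least half of the assembled bases. The following is a minimal sketch of that calculation, not the authors' pipeline; the function name `n50` and the toy scaffold lengths are hypothetical and are not taken from the A. yangbiense assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length of the sequence at which the running total,
    taken from longest to shortest, first reaches half of the
    combined length.
    """
    if not lengths:
        raise ValueError("need at least one length")
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length


# Hypothetical scaffold lengths in Mb (illustrative only).
toy_scaffolds = [45, 40, 30, 20, 10, 5]
print(n50(toy_scaffolds))
```

With these toy values the total is 150 Mb, so the running sum passes 75 Mb at the second scaffold and the function returns 40.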

2018 ◽  
Author(s):  
Sarah B. Kingan ◽  
Haynes Heaton ◽  
Juliana Cudini ◽  
Christine C. Lambert ◽  
Primo Baybayan ◽  
...  

Abstract A high-quality reference genome is a fundamental resource for functional genetics, comparative genomics, and population genomics, and is increasingly important for conservation biology. PacBio Single Molecule, Real-Time (SMRT) sequencing generates long reads with uniform coverage and high consensus accuracy, making it a powerful technology for de novo genome assembly. Improvements in throughput and concomitant reductions in cost have made PacBio an attractive core technology for many large genome initiatives; however, relatively high DNA input requirements (∼5 µg for standard library protocol) have placed PacBio out of reach for many projects on small organisms that have lower DNA content, or on projects with limited input DNA for other reasons. Here we present a high-quality de novo genome assembly from a single Anopheles coluzzii mosquito. A modified SMRTbell library construction protocol without DNA shearing and size selection was used to generate a SMRTbell library from just 100 ng of starting genomic DNA. The sample was run on the Sequel System with chemistry 3.0 and software v6.0, generating, on average, 25 Gb of sequence per SMRT Cell with 20 hour movies, followed by diploid de novo genome assembly with FALCON-Unzip. The resulting curated assembly had high contiguity (contig N50 3.5 Mb) and completeness (more than 98% of conserved genes are present and full-length). In addition, this single-insect assembly now places 667 (>90%) of formerly unplaced genes into their appropriate chromosomal contexts in the AgamP4 PEST reference. We were also able to resolve maternal and paternal haplotypes for over 1/3 of the genome. By sequencing and assembling material from a single diploid individual, only two haplotypes are present, simplifying the assembly process compared to samples from multiple pooled individuals. The method presented here can be applied to samples with starting DNA amounts as low as 100 ng per 1 Gb genome size.
This new low-input approach puts PacBio-based assemblies in reach for small highly heterozygous organisms that comprise much of the diversity of life.


Genes ◽  
2019 ◽  
Vol 10 (1) ◽  
pp. 62 ◽  
Author(s):  
Sarah Kingan ◽  
Haynes Heaton ◽  
Juliana Cudini ◽  
Christine Lambert ◽  
Primo Baybayan ◽  
...  

A high-quality reference genome is a fundamental resource for functional genetics, comparative genomics, and population genomics, and is increasingly important for conservation biology. PacBio Single Molecule, Real-Time (SMRT) sequencing generates long reads with uniform coverage and high consensus accuracy, making it a powerful technology for de novo genome assembly. Improvements in throughput and concomitant reductions in cost have made PacBio an attractive core technology for many large genome initiatives; however, relatively high DNA input requirements (~5 µg for standard library protocol) have placed PacBio out of reach for many projects on small organisms that have lower DNA content, or on projects with limited input DNA for other reasons. Here we present a high-quality de novo genome assembly from a single Anopheles coluzzii mosquito. A modified SMRTbell library construction protocol without DNA shearing and size selection was used to generate a SMRTbell library from just 100 ng of starting genomic DNA. The sample was run on the Sequel System with chemistry 3.0 and software v6.0, generating, on average, 25 Gb of sequence per SMRT Cell with 20 h movies, followed by diploid de novo genome assembly with FALCON-Unzip. The resulting curated assembly had high contiguity (contig N50 3.5 Mb) and completeness (more than 98% of conserved genes were present and full-length). In addition, this single-insect assembly now places 667 (>90%) of formerly unplaced genes into their appropriate chromosomal contexts in the AgamP4 PEST reference. We were also able to resolve maternal and paternal haplotypes for over 1/3 of the genome. By sequencing and assembling material from a single diploid individual, only two haplotypes were present, simplifying the assembly process compared to samples from multiple pooled individuals. The method presented here can be applied to samples with starting DNA amounts as low as 100 ng per 1 Gb genome size.
This new low-input approach puts PacBio-based assemblies in reach for small highly heterozygous organisms that comprise much of the diversity of life.


Gigabyte ◽  
2020 ◽  
Vol 2020 ◽  
pp. 1-12
Author(s):  
Weixue Mu ◽  
Jinpu Wei ◽  
Ting Yang ◽  
Yannan Fan ◽  
Le Cheng ◽  
...  

Nyssa yunnanensis is a deciduous tree species in the family Nyssaceae within the order Cornales. As only eight individual trees and two populations have been recorded in China’s Yunnan province, this species has been listed among China’s national Class I protection species since 1999 and also among 120 PSESP (Plant Species with Extremely Small Populations) in the Implementation Plan of Rescuing and Conserving China’s Plant Species with Extremely Small Populations (PSESP) (2011-2-15). Here, we present the draft genome assembly of N. yunnanensis. Using 10X Genomics linked-reads sequencing data, we carried out the de novo assembly and annotation analysis. The N. yunnanensis genome assembly is 1475 Mb in length, containing 288,519 scaffolds with a scaffold N50 length of 985.59 kb. Within the assembled genome, 799.51 Mb was identified as repetitive elements, accounting for 54.24% of the sequenced genome, and a total of 39,803 protein-coding genes were predicted. With the genomic characteristics of N. yunnanensis available, our study might facilitate future conservation biology studies to help protect this extremely threatened tree species.


2019 ◽  
Author(s):  
Chen-Shan Chin ◽  
Asif Khalak

Abstract De novo genome assembly provides comprehensive, unbiased genomic information and makes it possible to gain insight into new DNA sequences not present in reference genomes. Many de novo human genomes have been published in the last few years, leveraging a combination of inexpensive short-read and single-molecule long-read technologies. As long-read DNA sequencers become more prevalent, the computational burden of generating assemblies persists as a critical factor. The most common approach to long-read assembly, using an overlap-layout-consensus (OLC) paradigm, requires all-to-all read comparisons, which scale quadratically in computational complexity with the number of reads. We assert that recent achievements in sequencing technology (i.e., accuracy ~99% and read length ~10-15k) enable a fundamentally better strategy for OLC that is effectively linear rather than quadratic. Our genome assembly implementation, Peregrine, uses sparse hierarchical minimizers (SHIMMER) to index reads, thereby avoiding the need for an all-to-all read comparison step. Peregrine can assemble 30x human PacBio CCS read datasets in less than 30 CPU hours and around 100 wall-clock minutes to a high-contiguity assembly (N50 > 20 Mb). The continued advance of sequencing technologies coupled with the Peregrine assembler enables routine generation of human de novo assemblies. This will allow for population-scale measurements of more comprehensive genomic variations, beyond SNPs and small indels, as well as novel applications requiring rapid access to de novo assemblies.
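
The idea of indexing reads by minimizers instead of comparing every read with every other read can be illustrated with a small sketch. This is not Peregrine's SHIMMER implementation: it uses plain single-level window minimizers with lexicographic ordering, ignores reverse complements, and the function names, parameters `k` and `w`, and toy reads are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def minimizers(seq, k=5, w=4):
    """Return the set of (k-mer, position) window minimizers of seq.

    For every window of w consecutive k-mers, the lexicographically
    smallest k-mer is kept (leftmost on ties).
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    picked = set()
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        best = min(range(w), key=lambda j: (window[j], j))
        picked.add((window[best], start + best))
    return picked


def candidate_pairs(reads, k=5, w=4):
    """Return read-name pairs that share at least one minimizer k-mer.

    Each read is indexed once; only reads that land in the same
    bucket are paired, so no all-to-all comparison is performed.
    """
    index = defaultdict(set)
    for name, seq in reads.items():
        for kmer, _pos in minimizers(seq, k, w):
            index[kmer].add(name)
    pairs = set()
    for names in index.values():
        pairs.update(combinations(sorted(names), 2))
    return pairs


# Toy reads (hypothetical): r1 and r2 overlap by 10 bases, r3 is unrelated.
reads = {
    "r1": "ACGTACGGTCAGTCA",
    "r2": "CGGTCAGTCATTGAC",
    "r3": "TTTTTTTTTTTTTTT",
}
print(candidate_pairs(reads))
```

Here only the pair `("r1", "r2")` is reported, because any shared stretch of at least `w + k - 1` bases guarantees a common minimizer, while `r3` shares no k-mer with the other reads. In repeat-rich data a single bucket can still hold many reads, so this sketch alone does not guarantee linear behaviour.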


Author(s):  
Hailin Liu ◽  
Shigang Wu ◽  
Alun Li ◽  
Jue Ruan

Long-read single-molecule sequencing has revolutionized de novo genome assembly and enabled the automated reconstruction of reference-quality genomes. It has also been widely used to study structural variants, phase haplotypes and more. Here, we introduce SMARTdenovo, an SMS assembler that follows the overlap-layout-consensus (OLC) paradigm. SMARTdenovo (RRID: SCR_017622) was designed to be a fast assembler that did not require highly accurate raw reads for error correction, unlike other, contemporaneous SMS assemblers. It has performed well for evaluating congeneric assemblers and has been successful for a variety of assembly projects. It is compatible with Canu for assembling high-quality genomes, and several of the assembly strategies in this program have been incorporated into subsequent popular assemblers. The assembler has been in use since 2015, and here we provide information on the development of SMARTdenovo and how to implement its algorithms into current projects.


2020 ◽  
Author(s):  
Shangang Jia ◽  
Guoliang Wang ◽  
Guiming Liu ◽  
Jiangyong Qu ◽  
Beilun Zhao ◽  
...  

ABSTRACT The red alga Kappaphycus alvarezii is the most important aquaculture species in Kappaphycus, is widely distributed in tropical waters, and has become the main crop for carrageenan production at present. Its mechanisms of adaptation to high-temperature, high-salinity environments and its carbohydrate metabolism may provide important inspiration for marine algae studies. Scientific background knowledge such as genomic data will also be essential to improve disease resistance and production traits of K. alvarezii. In total, 43.28 Gb of short paired-end reads and 18.52 Gb of single-molecule long reads of K. alvarezii were generated on the Illumina HiSeq and PacBio RSII platforms, respectively. The de novo genome assembly was performed using Falcon_unzip and Canu software, and then improved with Pilon. The final assembled genome (336 Mb) consists of 888 scaffolds with a contig N50 of 849 kb. Further annotation analyses predicted 21,422 protein-coding genes, with 61.28% functionally annotated. Here we report the draft genome and annotations of K. alvarezii, which are valuable resources for future genomic and genetic studies in Kappaphycus and other algae.


Author(s):  
Weixue Mu ◽  
Jinpu Wei ◽  
Ting Yang ◽  
Yannan Fan ◽  
Le Cheng ◽  
...  

Nyssa yunnanensis is a deciduous tree in the family Nyssaceae within the order Cornales. As only 8 individuals at 2 sites have been recorded in Yunnan province of China, the species was listed as a national grade-I protected species of China in 1999, and also as one of 120 PSESP (Plant Species with Extremely Small Populations) in the Implementation Plan of Rescuing and Conserving China’s Plant Species with Extremely Small Populations (PSESP) (2011-2-15). N. yunnanensis has also been evaluated as Critically Endangered in the IUCN Red List and the Threatened Species List of China's Higher Plants. Hence, understanding the genomic characteristics of this highly endangered Tertiary relict tree species is essential, especially for developing conservation strategies. Here we sequenced and annotated the genome of N. yunnanensis using 10X Genomics linked-reads sequencing data. The de novo assembled genome is 1474 Mb in length with a scaffold N50 length of 985.59 kb. We identified 823.51 Mb of non-redundant sequence as repetitive elements and annotated 39,803 protein-coding genes in the assembly. Our results provide the genomic characteristics of N. yunnanensis, a valuable resource for future genomic and evolutionary studies, especially for conservation biology studies of this extremely threatened tree species.


BMC Genomics ◽  
2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Pingping Liang ◽  
Hafiz Sohaib Ahmed Saqib ◽  
Xiaomin Ni ◽  
Yingjia Shen

Abstract. Background: Marine medaka (Oryzias melastigma) is considered an important ecotoxicological indicator for studying the biochemical, physiological and molecular responses of marine organisms towards increasing amounts of pollutants in marine and estuarine waters. Results: In this study, we report a high-quality and accurate de novo genome assembly of marine medaka through the integration of single-molecule sequencing, Illumina paired-end sequencing, and 10X Genomics linked-reads. The 844.17 Mb assembly is estimated to cover more than 98% of the genome and is more continuous, with fewer gaps and errors, than the previous genome assembly. Comparison of O. melastigma with closely related species showed significant expansion of gene families associated with DNA repair and ATP-binding cassette (ABC) transporter pathways. We identified 274 genes that appear to be under significant positive selection and are involved in DNA repair, cellular transportation processes, and conservation and stability of the genome. The positive selection of genes and the considerable expansion in gene numbers, especially those related to stimulus responses, provide strong support for adaptations of O. melastigma under varying environmental stresses. Conclusions: The highly contiguous marine medaka genome and comparative genomic analyses will increase our understanding of the underlying mechanisms related to its extraordinary adaptation capability, leading towards acceleration in the ongoing and future investigations in marine ecotoxicology.

