The tea plant reference genome and improved gene annotation using long-read and paired-end sequencing data

2019 ◽  
Vol 6 (1) ◽  
Author(s):  
Enhua Xia ◽  
Fangdong Li ◽  
Wei Tong ◽  
Hua Yang ◽  
Songbo Wang ◽  
...  
2020 ◽  
Vol 10 (8) ◽  
pp. 2801-2809 ◽  
Author(s):  
Tingting Zhao ◽  
Zhongqu Duan ◽  
Georgi Z. Genchev ◽  
Hui Lu

Despite continuous updates of the human reference genome, there are still hundreds of unresolved gaps which account for about 5% of the total sequence length. Given the availability of whole genome de novo assemblies, especially those derived from long-read sequencing data, gap-closing sequences can be determined. By comparing 17 de novo long-read sequencing assemblies with the human reference genome, we identified a total of 1,125 gap-closing sequences for 132 (16.9% of 783) gaps and added up to 2.2 Mb of novel sequences to the human reference genome. More than 90% of the non-redundant sequences could be verified by unmapped reads from the Simons Genome Diversity Project dataset. In addition, 15.6% of the non-reference sequences were found in at least one of four non-human primate genomes. We further demonstrated that the non-redundant sequences had a high content of simple repeats and satellite sequences. Moreover, 43 (32.6%) of the 132 closed gaps were shown to be polymorphic; such sequences may play an important biological role and can be useful in the investigation of human genetic diversity.


2021 ◽  
Author(s):  
Milyausha Kaskinova ◽  
Bayazit Yunusbayev ◽  
Radick Altinbaev ◽  
Rika Raffiudin ◽  
Madeline H. Carpenter ◽  
...  

Apis mellifera L., the western honey bee, is a major crop pollinator that plays a key role in beekeeping and serves as an important model organism in social behavior studies. Recent efforts have improved the quality of the honey bee reference genome and developed a chromosome-level assembly of sixteen chromosomes, two of which are gapless. However, the rest suffer from 51 gaps, 160 unplaced/unlocalized scaffolds, and the lack of 2 distal telomeres. The gaps are located at the hard-to-assemble extended highly repetitive chromosomal regions that may contain functional genomic elements. Here, we use de-novo re-assemblies from the most recent reference genome Amel_HAv_3.1 raw reads and other long-read-based assemblies (INRA_AMelMel_1.0, ASM1384120v1, and ASM1384124v1) of the honey bee genome to resolve 13 gaps, five unplaced/unlocalized scaffolds, and the missing telomeres of Amel_HAv_3.1. The total length of the resolved gaps is 848,747 bp. The accuracy of the corrected assembly was validated by mapping PacBio reads and performing gene annotation assessment. Comparative analysis suggests that the PacBio-reads-based assemblies of the honey bee genomes failed in the same highly repetitive extended regions of the chromosomes, especially on chromosome 10. To fully resolve these extended repetitive regions, further work using ultra-long Nanopore sequencing would be needed. Our updated assembly facilitates more accurate reference-guided scaffolding and marker/sequence mapping in honey bee genomics studies.


Genes ◽  
2021 ◽  
Vol 12 (6) ◽  
pp. 847
Author(s):  
Vidhya Jagannathan ◽  
Christophe Hitte ◽  
Jeffrey M. Kidd ◽  
Patrick Masterson ◽  
Terence D. Murphy ◽  
...  

The domestic dog has evolved to be an important biomedical model for studies regarding the genetic basis of disease, morphology and behavior. Genetic studies in the dog have relied on a draft reference genome of a purebred female boxer dog named “Tasha” initially published in 2005. Derived from a Sanger whole genome shotgun sequencing approach coupled with limited clone-based sequencing, the initial assembly and subsequent updates have served as the predominant resource for canine genetics for 15 years. While the initial assembly produced a good-quality draft, as with all assemblies produced at the time, it contained gaps, assembly errors and missing sequences, particularly in GC-rich regions, which are found at many promoters and in the first exons of protein-coding genes. Here, we present Dog10K_Boxer_Tasha_1.0, an improved chromosome-level highly contiguous genome assembly of Tasha created with long-read technologies that increases sequence contiguity >100-fold, closes >23,000 gaps of the CanFam3.1 reference assembly and improves gene annotation by identifying >1200 new protein-coding transcripts. The assembly and annotation are available at NCBI under the accession GCF_000002285.5.


2020 ◽  
Author(s):  
Bo Wang ◽  
Houlin Yu ◽  
Yanyan Jia ◽  
Quanbin Dong ◽  
Christian Steinberg ◽  
...  

Here, we report a chromosome-level genome assembly of Fusarium oxysporum strain Fo47 (12 pseudomolecules; contig N50: 4.52 Mb), generated using a combination of PacBio long-read, Illumina paired-end and Hi-C sequencing data. Although F. oxysporum causes vascular wilt to over 100 plant species, the strain Fo47 is classified as an endophyte and widely used as a biocontrol agent for plant disease control. The Fo47 genome carries a single accessory chromosome of 4.23 Mb, compared to the reference genome of F. oxysporum f.sp. lycopersici strain Fol4287. The high-quality assembly and annotation of the Fo47 genome will be a valuable resource for studying the mechanisms underlying the endophytic interactions between F. oxysporum and plants, as well as deciphering the genome evolution of the F. oxysporum species complex.


2021 ◽  
Vol 7 (3) ◽  
Author(s):  
David R. Greig ◽  
Claire Jenkins ◽  
Saheer E. Gharbia ◽  
Timothy J. Dallman

Compared to short-read sequencing data, long-read sequencing facilitates single contiguous de novo assemblies and characterization of the prophage region of the genome. Here, we describe our methodological approach to using Oxford Nanopore Technology (ONT) sequencing data to quantify genetic relatedness and to look for microevolutionary events in the core and accessory genomes to assess the within-outbreak variation of four genetically and epidemiologically linked isolates. Analysis of both Illumina and ONT sequencing data detected one SNP between the four sequences of the outbreak isolates. The variant calling procedure highlighted the importance of masking homologous sequences in the reference genome regardless of the sequencing technology used. Variant calling also highlighted the systemic errors in ONT base-calling and ambiguous mapping of Illumina reads that results in variations in the genetic distance when comparing one technology to the other. The prophage component of the outbreak strain was analysed, and nine of the 16 prophages showed some similarity to the prophage in the Sakai reference genome, including the stx2a-encoding phage. Prophage comparison between the outbreak isolates identified minor genome rearrangements in one of the isolates, including an inversion and a deletion event. The ability to characterize the accessory genome in this way is the first step to understanding the significance of these microevolutionary events and their impact on the evolutionary history, virulence and potentially the likely source and transmission of this zoonotic, foodborne pathogen.


2020 ◽  
Author(s):  
Yunpeng Gai ◽  
Tao Xiong ◽  
Xiaoe Xiao ◽  
Pudong Li ◽  
Yating Zeng ◽  
...  

Melanose disease is one of the most widely distributed and economically important fungal diseases of citrus worldwide. The causative agent is the filamentous fungus Diaporthe citri Wolf (syn. Phomopsis citri H.S. Fawc.). Here, we report the genome assemblies of three strains of D. citri, namely strains ZJUD2, ZJUD14 and Q7, which were generated using a combination of PacBio Sequel long-read and Illumina paired-end sequencing data. The assembled genomes of D. citri ranged from 52.06 Mb to 63.61 Mb in genome size, containing 15,977 to 16,622 protein-coding genes. We also sequenced and annotated the genome sequences of two Citrus-related Diaporthe species, D. citriasiana and D. citrichinensis. In addition, a database for citrus-related Diaporthe genomes was established to provide a public platform to access genome sequences, genome annotation and comparative genomics data of these Diaporthe species. The described genome sequences and the citrus-related Diaporthe genomes database provide a useful resource for the study of fungal biology, pathogen-host interaction, molecular diagnostic marker development, and population genomic analyses of Diaporthe species. The database will be updated regularly when the genomes of newly isolated Diaporthe species are sequenced. The citrus-related Diaporthe genomes database is freely available for non-profit use at http://www.zjudata.com/blast/diaporthe.php.


2020 ◽  
Vol 33 (9) ◽  
pp. 1108-1111 ◽  
Author(s):  
Bo Wang ◽  
Houlin Yu ◽  
Yanyan Jia ◽  
Quanbin Dong ◽  
Christian Steinberg ◽  
...  

Here, we report a chromosome-level genome assembly of Fusarium oxysporum Fo47 (12 pseudomolecules; contig N50: 4.52 Mb), generated using a combination of PacBio long-read, Illumina paired end, and high-throughput chromosome conformation capture sequencing data. Although F. oxysporum causes vascular wilt to over 100 plant species, the strain Fo47 is classified as an endophyte and is widely used as a biocontrol agent for plant disease control. The Fo47 genome carries a single accessory chromosome of 4.23 Mb, compared with the reference genome of F. oxysporum f. sp. lycopersici Fol4287. The high-quality assembly and annotation of the Fo47 genome will be a valuable resource for studying the mechanisms underlying the endophytic interactions between F. oxysporum and plants as well as for deciphering the genome evolution of the F. oxysporum species complex.
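The contig N50 quoted above is a standard contiguity statistic: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly. The following is a minimal sketch of that calculation, not the authors' pipeline; the function name `n50` and the toy contig lengths are hypothetical and chosen only for illustration.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length
    of the contig at which the running total first reaches at least
    half of the summed length of all contigs.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length


# Hypothetical toy assembly (arbitrary units), not data from the paper.
toy_contigs = [2, 3, 4, 5, 6, 10]
print(n50(toy_contigs))
```

For real assemblies the lengths would be read from the FASTA records; the statistic itself does not depend on how they are obtained.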


2020 ◽  
Author(s):  
Mohamed Awad ◽  
Xiangchao Gan

High-quality genome assembly has wide applications in genetics and medical studies. However, it is still very challenging to achieve gap-free chromosome-scale assemblies using current workflows for long-read platforms. Here we propose GALA (Gap-free long-read assembler), a chromosome-by-chromosome assembly method implemented through a multi-layer computer graph that identifies mis-assemblies within preliminary assemblies or chimeric raw reads and partitions the data into chromosome-scale linkage groups. The subsequent independent assembly of each linkage group generates a gap-free assembly free from the mis-assembly errors which usually hamper existing workflows. This flexible framework also allows us to integrate data from various technologies, such as Hi-C, genetic maps, a reference genome and even motif analyses, to generate gap-free chromosome-scale assemblies. We de novo assembled the C. elegans and A. thaliana genomes using combined PacBio and Nanopore sequencing data from publicly available datasets. We also demonstrated the new method’s applicability with a gap-free assembly of a human genome with the help of a reference genome. In addition, GALA showed promising performance for PacBio high-fidelity long reads. Thus, our method enables straightforward assembly of genomes with multiple data sources and overcomes barriers that at present restrict the application of de novo genome assembly technology.


2019 ◽  
Author(s):  
Thu-Phuong Nguyen ◽  
Cornelia Mühlich ◽  
Setareh Mohammadin ◽  
Erik van den Bergh ◽  
Adrian E. Platts ◽  
...  

Background: The genus Aethionema is a sister-group to the core-group of the Brassicaceae family that includes Arabidopsis thaliana and the Brassica crops. Thus, Aethionema is phylogenetically well-placed for the investigation and understanding of genome and trait evolution across the family. We aimed to improve the quality of the reference genome draft version of the annual species Aethionema arabicum. Secondly, we constructed the first Ae. arabicum genetic map. The improved reference genome and genetic map enabled the development of each other. Results: We started with the initially published genome (version 2.5). PacBio and MinION sequencing together with genetic map v2.5 were incorporated to produce the new reference genome v3.0. The improved genome contains 203 MB of sequence, with approximately 94% of the assembly made up of called bases, assembled into 2,883 scaffolds. The N50 (10.3 MB) represents an 80-fold improvement over the initial genome release. We generated a Recombinant Inbred Line (RIL) population that was derived from two ecotypes: Cyprus and Turkey (the reference genotype). Using a Genotyping by Sequencing (GBS) approach, we generated a high-density genetic map with 749 (v2.5) and then 632 SNPs (v3.0). The genetic map and reference genome were integrated, thus greatly improving the scaffolding of the reference genome into 11 linkage groups. Conclusions: We show that long-read sequencing data and genetics are complementary, resulting in an improved genome assembly in Ae. arabicum. They will facilitate comparative genetic mapping work for the Brassicaceae family and are also valuable resources to investigate a wide range of life history traits in Aethionema.


2016 ◽  
Vol 3 (1) ◽  
Author(s):  
Jianwei Zhang ◽  
Ling-Ling Chen ◽  
Shuai Sun ◽  
Dave Kudrna ◽  
Dario Copetti ◽  
...  
