Population sequencing reveals clonal diversity and ancestral inbreeding in the grapevine cultivar Chardonnay

Abstract

Chardonnay is the basis of some of the world’s most iconic wines, and its success is underpinned by a historic program of clonal selection. Numerous Chardonnay clones are available that differ in key viticultural and oenological traits; these differences arose from the accumulation of somatic mutations during centuries of asexual propagation. However, the genetic variation that underlies them remains largely unknown. To address this knowledge gap, a high-quality, diploid-phased Chardonnay genome assembly was produced from single-molecule real-time sequencing and combined with re-sequencing data from 15 commercial Chardonnay clones. In total, 1620 markers were identified that distinguish the 15 clones. These markers were reliably used to identify the clonal origin of independent validation material, and to identify a potential genetic basis for some clonal phenotypic differences. The parental origin of the Chardonnay haplomes was determined by mapping sequence data from the predicted parents of Chardonnay (Gouais blanc and Pinot noir) against the Chardonnay reference genome. This enabled the detection of instances of heterosis, with differentially expanded gene families inherited from the parents of Chardonnay. Most surprisingly, however, the patterns of nucleotide variation in the Chardonnay genome indicate that Pinot noir and Gouais blanc share an extremely high degree of kinship, so that the Chardonnay genome displays characteristics indicative of inbreeding.

Author Summary

Phenotypic variation within a grapevine cultivar arises from the accumulation of mutations during serial vegetative propagation. Old cultivars such as Chardonnay have been propagated for centuries, resulting in hundreds of available ‘clones’ that carry unique genetic mutations and a range of phenotypic peculiarities. These mutations can be leveraged as genetic markers: they are useful for identifying specific clones in authenticity testing, or as breeding markers for new clonal selections where particular mutations are known to confer a phenotypic trait. We produced a high-quality genome assembly for Chardonnay and, using re-sequencing data for 15 popular clones, identified a large set of markers that are unique to at least one clone. We identified mutations that may confer phenotypic effects, and were able to identify clones from material independently sourced from nurseries and vineyards. The marker detection framework we describe for authenticity testing would be applicable to other grapevine cultivars, and even to other agriculturally important woody-plant crops that are vegetatively propagated, such as orchard fruit trees. Finally, we show that the Chardonnay genome contains extensive evidence of parental inbreeding, such that its parents, Gouais blanc and Pinot noir, may even be first-degree relatives.
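
The abstract does not describe how clone calls were made from the 1620 markers. As a minimal sketch under stated assumptions, a sample could be matched to a clone by comparing the set of diagnostic markers it carries with each clone's marker set, here using Jaccard similarity. The marker encoding, the clone names, the positions, and the scoring rule are all hypothetical illustrations, not the authors' method.

```python
Marker = tuple[str, int, str]  # hypothetical encoding: (chromosome, position, alternate allele)


def clone_scores(marker_table: dict[Marker, set[str]], observed: set[Marker]) -> dict[str, float]:
    """Jaccard similarity between a sample's diagnostic markers and each clone's marker set."""
    # Invert the table so each clone maps to the set of markers it carries.
    clone_markers: dict[str, set[Marker]] = {}
    for marker, clones in marker_table.items():
        for clone in clones:
            clone_markers.setdefault(clone, set()).add(marker)
    # Variants in the sample that are not in the marker table carry no clonal information.
    informative = observed & set(marker_table)
    scores: dict[str, float] = {}
    for clone, markers in clone_markers.items():
        union = markers | informative  # never empty: every clone here has >= 1 marker
        scores[clone] = len(markers & informative) / len(union)
    return scores


def identify_clone(marker_table: dict[Marker, set[str]], observed: set[Marker]) -> str:
    """Return the clone whose marker set best matches the sample (ties: first encountered)."""
    scores = clone_scores(marker_table, observed)
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Toy marker table with hypothetical clone names and positions, not data from the study.
    table = {
        ("chr1", 1000, "T"): {"clone_A"},
        ("chr2", 500, "G"): {"clone_A", "clone_B"},
        ("chr5", 42, "A"): {"clone_C"},
        ("chr7", 9, "C"): {"clone_B"},
    }
    sample = {("chr1", 1000, "T"), ("chr2", 500, "G"), ("chr9", 7, "A")}
    print(identify_clone(table, sample))  # clone_A
```

Jaccard similarity penalizes both missing clone markers and extra markers belonging to other clones; a real pipeline would also need to account for genotyping errors and low coverage at marker sites.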

First draft genome of loach (Oreonectes shuilongensis; Cypriniformes: Nemacheilidae) provides insights into the evolution of cavefish

10.21203/rs.3.rs-192229/v1 ◽

2021 ◽

Author(s):

Zhijin Liu ◽

Xuekun Qian ◽

Ziming Wang ◽

Huamei Wen ◽

Ling Han ◽

...

Keyword(s):

Single Molecule ◽

Genome Assembly ◽

Eye Development ◽

Draft Genome ◽

Evolutionary Process ◽

Integrated Approach ◽

Sequencing Data ◽

Retina Development ◽

Draft Genome Assembly ◽

Surface Dwelling

Abstract

Background: Loaches of the superfamily Cobitoidea (Cypriniformes, Nemacheilidae) are small, elongated, bottom-dwelling freshwater fishes with several barbels near the mouth. The genus Oreonectes, with 18 currently recognized species, contains representatives of all three key stages of the evolutionary process: a surface-dwelling lifestyle, facultative cave persistence, and permanent cave dwelling. Some Oreonectes species show typical cave-dwelling traits, such as partial or complete leucism and regression of the eyes, which makes them suitable subjects for studying micro-evolution. Genomic information for Oreonectes species is therefore an indispensable resource for research into the evolution of cavefishes.

Results: Here we assembled the genome sequence of O. shuilongensis, a surface-dwelling species, using an integrated approach that combined PacBio single-molecule real-time sequencing and Illumina HiSeq X Ten paired-end sequencing. Based on a total of 50.9 Gb of sequencing data, our genome assembly from Canu and Pilon spans approximately 515.64 Mb (estimated coverage of 100×) and contains 803 contigs with a contig N50 of 5.58 Mb. A total of 25,247 protein-coding genes were predicted, of which 95.65% have been functionally annotated. We also re-sequenced the genomes of three additional cave-dwelling Oreonectes species. Twenty-nine pseudogenes, annotated using DAVID, showed significant enrichment for the GO terms “eye development” and “retina development in camera-type eye”. We presume that these pseudogenes might contribute to eye degeneration in semi- and fully cave-dwelling Oreonectes species. Furthermore, Mc1r (melanocortin-1 receptor) has been pseudogenized by a deletion in O. daqikongensis, likely blocking melanin biosynthesis and leading to the albino phenotype.

Conclusions: We report the first draft genome assembly of an Oreonectes fish, which is also the first genome reference for Cobitoidea fishes. Pseudogenization of genes related to body color and eye development may be responsible for the loss of pigmentation and the deterioration of vision in cave-dwelling species. This genome assembly will contribute to the study of the evolution and adaptation of fishes within Oreonectes and beyond (Cobitoidea).
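
The reported ~100× coverage can be sanity-checked by dividing the sequencing volume by the genome size, and contig N50 is the length at which contigs of that size or longer together contain at least half of the assembled bases. The sketch below assumes the 515.64 Mb assembly size as the denominator; the authors' coverage estimate may instead be based on a survey-derived genome size. The toy contig lengths are illustrative only.

```python
def fold_coverage(total_bases: float, genome_size: float) -> float:
    """Average sequencing depth: total sequenced bases divided by genome size."""
    return total_bases / genome_size


def n50(lengths: list[int]) -> int:
    """Length L such that contigs of length >= L contain at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


if __name__ == "__main__":
    # 50.9 Gb of reads over a 515.64 Mb assembly: roughly 98.7x, close to the ~100x reported.
    print(f"{fold_coverage(50.9e9, 515.64e6):.1f}x")
    print(n50([5, 4, 3, 2, 1]))  # toy contig lengths, not data from the study
```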

Assembly of chromosome-scale contigs by efficiently resolving repetitive sequences with long reads

10.1101/345983 ◽

2018 ◽

Cited By ~ 2

Author(s):

Huilong Du ◽

Chengzhi Liang

Keyword(s):

Single Molecule ◽

High Efficiency ◽

Reference Genome ◽

Repetitive Sequences ◽

Sequencing Data ◽

High Quality ◽

Single Molecule Sequencing ◽

Genome Maps ◽

Long Reads ◽

Novel Method

Abstract

Because complex eukaryotic genomes contain large numbers of repetitive sequences, fragmented and incompletely assembled genomes lose value as reference sequences, often because their short contigs cannot be anchored onto chromosomes or are mispositioned on them. Here we report a novel method, Highly Efficient Repeat Assembly (HERA), which introduces a new concept called a connection graph, together with algorithms for constructing the graph. HERA resolves repeats with high efficiency using single-molecule sequencing data, and enables the assembly of chromosome-scale contigs by further integrating genome maps and Hi-C data. We tested HERA with the genomes of rice R498, maize B73, human HX1 and Tartary buckwheat Pinku1. HERA can correctly assemble most of the tandemly repetitive sequences in rice using single-molecule sequencing data only. Using the same maize and human sequencing data published by Jiao et al. (2017) and Shi et al. (2016), respectively, we dramatically improved sequence contiguity compared with the published assemblies, increasing the contig N50 from 1.3 Mb to 61.2 Mb in the maize B73 assembly and from 8.3 Mb to 54.4 Mb in the human HX1 assembly. We provide a high-quality maize reference genome with 96.9% of the gaps filled (only 76 gaps left) and several incorrectly positioned sequences fixed compared with the B73 RefGen_v4 assembly. Comparisons between the HERA assembly of HX1 and the human GRCh38 reference genome showed that many gaps in GRCh38 could be filled, and that GRCh38 contained some potential errors that could be fixed. We assembled the Pinku1 genome into 12 scaffolds with a contig N50 of 27.85 Mb. HERA serves both as a new genome assembly and phasing method for generating high-quality sequences of complex genomes and as a curation tool for improving the contiguity and completeness of existing reference genomes, including the correction of assembly errors in repetitive regions.
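
HERA's connection graph is not specified here in enough detail to reproduce. As a hypothetical, much-simplified illustration of the general idea of ordering contigs with long reads, the sketch below counts reads that bridge consecutive contigs, keeps links with enough read support, and chains contigs only where a link is unambiguous; a contig with two well-supported successors, as can happen at a repeat, stops extension. This is not HERA's algorithm: contig orientation, overlaps, and genome-map or Hi-C data are ignored, and the function names, input format, and threshold are assumptions.

```python
from collections import Counter


def build_links(read_paths: list[list[str]], min_support: int = 2) -> dict[tuple[str, str], int]:
    """Count long reads that bridge each ordered pair of adjacent contigs.

    read_paths: for each read, the contig IDs it aligns to, in read order (hypothetical input).
    Only links supported by at least `min_support` reads are kept.
    """
    support: Counter = Counter()
    for contigs in read_paths:
        for left, right in zip(contigs, contigs[1:]):
            support[(left, right)] += 1
    return {pair: n for pair, n in support.items() if n >= min_support}


def unambiguous_paths(links: dict[tuple[str, str], int]) -> list[list[str]]:
    """Chain contigs along links that are the only exit of one contig and the only entry of the next."""
    out_edges: dict[str, list[str]] = {}
    in_edges: dict[str, list[str]] = {}
    for left, right in links:
        out_edges.setdefault(left, []).append(right)
        in_edges.setdefault(right, []).append(left)
    # A contig with two supported successors (e.g. at a repeat) gets no successor here.
    successor = {
        left: rights[0]
        for left, rights in out_edges.items()
        if len(rights) == 1 and len(in_edges[rights[0]]) == 1
    }
    has_predecessor = set(successor.values())
    nodes = set(out_edges) | set(in_edges)
    paths = []
    for start in sorted(nodes - has_predecessor):
        path = [start]
        while path[-1] in successor:
            path.append(successor[path[-1]])
        paths.append(path)
    return paths  # contigs lying only on cycles are not reported in this sketch


if __name__ == "__main__":
    reads = [["c1", "c2", "c3"], ["c1", "c2"], ["c2", "c3"],
             ["c3", "c4"], ["c3", "c4"], ["c3", "c5"], ["c3", "c5"]]
    print(unambiguous_paths(build_links(reads)))  # [['c1', 'c2', 'c3'], ['c4'], ['c5']]
```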

ESCA pipeline: Easy-to-use SARS-CoV-2 genome Assembler

10.1101/2021.05.21.445156 ◽

2021 ◽

Author(s):

Martina Rueca ◽

Emanuela Giombini ◽

Francesco Messina ◽

Barbara Bartolini ◽

Antonino Di Caro ◽

...

Keyword(s):

Amino Acid ◽

Genome Assembly ◽

Global Level ◽

Sequencing Data ◽

High Quality ◽

Rapid Succession ◽

Novel Variants ◽

Low Coverage ◽

High Quality Genome ◽

Genome Assembler

Early sequencing and rapid analysis of the SARS-CoV-2 genome are helping to understand the dynamics of the COVID-19 epidemic and to design countermeasures at the global level. Amplicon-based NGS methods are widely used to sequence the SARS-CoV-2 genome and to identify novel variants, which are emerging in rapid succession and harbor multiple deletions and amino-acid-changing mutations. To facilitate the analysis of NGS data obtained from amplicon-based sequencing methods, we propose an easy-to-use SARS-CoV-2 genome assembler: the ESCA pipeline. Results showed that ESCA can perform high-quality genome assembly from Ion Torrent and Illumina raw data and helps the user easily correct low-coverage regions. Moreover, ESCA makes it possible to compare the assembled genomes from multi-sample runs in a simple table format.
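
One common way to handle low-coverage regions in an amplicon-based consensus is to mask positions whose read depth falls below a threshold, so that uncertain bases are reported as N rather than guessed, and to list the masked intervals for manual review. The abstract does not state how ESCA does this; the function names and the 10× threshold below are hypothetical.

```python
def mask_low_coverage(consensus: str, depth: list[int], min_depth: int = 10) -> str:
    """Replace consensus bases whose read depth is below `min_depth` with 'N'."""
    if len(consensus) != len(depth):
        raise ValueError("consensus and depth must have the same length")
    return "".join(base if d >= min_depth else "N" for base, d in zip(consensus, depth))


def low_coverage_regions(depth: list[int], min_depth: int = 10) -> list[tuple[int, int]]:
    """Half-open, 0-based (start, end) intervals where depth < min_depth, for manual review."""
    regions: list[tuple[int, int]] = []
    start = None
    for i, d in enumerate(depth):
        if d < min_depth and start is None:
            start = i  # a low-coverage run begins
        elif d >= min_depth and start is not None:
            regions.append((start, i))  # the run ended at the previous position
            start = None
    if start is not None:
        regions.append((start, len(depth)))  # run extends to the end of the genome
    return regions


if __name__ == "__main__":
    depth = [30, 30, 5, 4, 30, 30, 2, 30]  # toy per-base depths
    print(mask_low_coverage("ACGTACGT", depth))  # ACNNACNT
    print(low_coverage_regions(depth))           # [(2, 4), (6, 7)]
```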

Single-Molecule Real-Time Transcript Sequencing of Turnips Unveiling the Complexity of the Turnip Transcriptome

G3 Genes|Genome|Genetics ◽

10.1534/g3.120.401434 ◽

2020 ◽

Vol 10 (10) ◽

pp. 3505-3514

Author(s):

Hongmei Zhuang ◽

Qiang Wang ◽

Hongwei Han ◽

Huifang Liu ◽

Hao Wang

Keyword(s):

Real Time ◽

Brassica Rapa ◽

Single Molecule ◽

Developmental Stages ◽

Full Length ◽

Sequencing Data ◽

Smrt Sequencing ◽

High Quality ◽

Transcript Structure ◽

Novel Transcripts

This study aimed to generate the full-length transcriptome of Xinjiang green and purple turnips (Brassica rapa var. rapa) using single-molecule real-time (SMRT) sequencing. Samples of the two varieties at five developmental stages were collected and pooled for SMRT sequencing. In parallel, next-generation sequencing was performed to correct the SMRT sequencing data. A series of analyses was performed to investigate transcript structure. Finally, the obtained transcripts were mapped to the genome of Brassica rapa ssp. pekinensis Chiifu to identify potential novel transcripts. For green turnip (F01), a total of 19.54 Gb of clean data was obtained from 8 SMRT cells. The numbers of reads of insert (ROI) and full-length non-chimeric (FLNC) reads were 510,137 and 267,666, respectively. In addition, 82,640 consensus isoforms were obtained by isoform sequence clustering, of which 69,480 were high-quality; the remaining 13,160 low-quality sequences were corrected using Illumina RNA-seq data. For purple turnip (F02), there were 20.41 Gb of clean data, 552,829 ROIs, and 274,915 FLNC reads. A total of 93,775 consensus isoforms were obtained, of which 78,798 were high-quality, and the remaining 14,977 low-quality sequences were corrected. After removal of redundant sequences, there were 46,516 and 49,429 non-redundant transcripts for F01 and F02, respectively, and 7,774 and 9,385 alternative splicing events were predicted for F01 and F02, respectively. In addition, 63,890 simple sequence repeats, 59,460 complete coding sequences, and 535 long non-coding RNAs were predicted. Moreover, 5,194 and 5,369 novel transcripts were identified by mapping to Brassica rapa ssp. pekinensis Chiifu. The obtained transcriptome data may improve turnip genome annotation and facilitate further study of the Brassica rapa var. rapa genome and transcriptome.
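
The isoform counts reported above are internally consistent: for each sample, high-quality plus corrected low-quality isoforms equal the total number of consensus isoforms. The short check below recomputes this from the reported numbers and derives two ratios, FLNC reads per ROI and the high-quality isoform fraction; the ratio names and the dictionary layout are illustrative, not terms from the study.

```python
# Numbers as reported in the abstract for green (F01) and purple (F02) turnip.
REPORTED = {
    "F01": {"roi": 510_137, "flnc": 267_666, "isoforms": 82_640,
            "high_quality": 69_480, "low_quality": 13_160},
    "F02": {"roi": 552_829, "flnc": 274_915, "isoforms": 93_775,
            "high_quality": 78_798, "low_quality": 14_977},
}


def summarize(stats: dict[str, int]) -> dict[str, float]:
    """Check that isoform counts add up, then return two derived ratios."""
    if stats["high_quality"] + stats["low_quality"] != stats["isoforms"]:
        raise ValueError("high- and low-quality isoforms do not sum to the total")
    return {
        "flnc_per_roi": stats["flnc"] / stats["roi"],
        "high_quality_fraction": stats["high_quality"] / stats["isoforms"],
    }


if __name__ == "__main__":
    for sample, stats in REPORTED.items():
        ratios = summarize(stats)
        print(sample, f"FLNC/ROI {ratios['flnc_per_roi']:.1%}",
              f"HQ {ratios['high_quality_fraction']:.1%}")
```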

Hybrid de novo genome assembly of Chinese chestnut (Castanea mollissima)

GigaScience ◽

10.1093/gigascience/giz112 ◽

2019 ◽

Vol 8 (9) ◽

Cited By ~ 11

Author(s):

Yu Xing ◽

Yang Liu ◽

Qing Zhang ◽

Xinghua Nie ◽

Yamin Sun ◽

...

Keyword(s):

Single Molecule ◽

Genome Assembly ◽

Genetic Improvement ◽

De Novo ◽

Draft Genome ◽

Whole Genome Sequence ◽

Whole Genome ◽

High Quality ◽

Chinese Chestnut ◽

Castanea Mollissima

Abstract

Background: The Chinese chestnut (Castanea mollissima) is widely cultivated in China for nut production. This plant also plays an important ecological role in afforestation and ecosystem services. To facilitate and expand the use of C. mollissima in breeding and genetic improvement, we report here the whole-genome sequence of C. mollissima.

Findings: We produced a high-quality assembly of the C. mollissima genome using Pacific Biosciences single-molecule sequencing. The final draft genome is ∼785.53 Mb long, with a contig N50 of 944 kb, and we annotated 36,479 protein-coding genes in the genome. Phylogenetic analysis showed that C. mollissima diverged from Quercus robur, another member of the Fagaceae family, ∼13.62 million years ago.

Conclusions: The high-quality whole-genome assembly of C. mollissima will be a valuable resource for further genetic improvement and for breeding for disease resistance and nut quality.

A high-quality genome assembly for the endangered golden snub-nosed monkey (Rhinopithecus roxellana)

GigaScience ◽

10.1093/gigascience/giz098 ◽

2019 ◽

Vol 8 (8) ◽

Cited By ~ 5

Author(s):

Lu Wang ◽

Jinwei Wu ◽

Xiaomei Liu ◽

Dandan Di ◽

Yuhong Liang ◽

...

Keyword(s):

Single Molecule ◽

Genome Assembly ◽

Gene Families ◽

Rhinopithecus Roxellana ◽

High Quality ◽

Chromosome Conformation ◽

Protein Coding ◽

A Genome ◽

Close Relationship ◽

High Quality Genome

Abstract

Background: The golden snub-nosed monkey (Rhinopithecus roxellana) is an endangered colobine species endemic to China, with several distinct traits including a unique social structure. Although a genome assembly for R. roxellana is available, it is incomplete and fragmented because it was constructed using short-read sequencing technology, so important information such as genome structural variation and repeat sequences may be missing.

Findings: To obtain a high-quality chromosome-level assembly for R. roxellana qinlingensis, we used 5 methods: Pacific Biosciences single-molecule real-time sequencing, Illumina paired-end sequencing, BioNano optical maps, 10X Genomics linked reads, and high-throughput chromosome conformation capture (Hi-C). The assembled genome was ∼3.04 Gb, with a contig N50 of 5.72 Mb and a scaffold N50 of 144.56 Mb. This represented a 100-fold improvement over the previously published genome. In the new genome, 22,497 protein-coding genes were predicted, of which 22,053 were functionally annotated. Gene family analysis showed that 993 gene families were expanded and 2,745 were contracted. The reconstructed phylogeny recovered a close relationship between R. roxellana and Macaca mulatta, and these 2 species diverged ∼13.4 million years ago.

Conclusion: We constructed a high-quality genome assembly of the Qinling golden snub-nosed monkey with superior contiguity and accuracy, which may be useful for future genetic studies in this species and as a new standard reference genome for colobine primates. In addition, the updated genome assembly may improve our understanding of this species and could assist conservation efforts.

Efficient long single molecule sequencing for cost effective and accurate sequencing, haplotyping, and de novo assembly

10.1101/324392 ◽

2018 ◽

Author(s):

Ou Wang ◽

Robert Chin ◽

Xiaofang Cheng ◽

Michelle Ka Wu ◽

Qing Mao ◽

...

Keyword(s):

Single Molecule ◽

Genome Assembly ◽

De Novo ◽

Low Cost ◽

Variant Calling ◽

Cost Effective ◽

High Quality ◽

Single Molecule Sequencing ◽

Single Tube ◽

Complex Structural

Obtaining accurate sequences from long DNA molecules is very important for genome assembly and other applications. Here we describe single tube long fragment read (stLFR), a technology that enables this at low cost. It is based on adding the same barcode sequence to sub-fragments of the original long DNA molecule (DNA co-barcoding). To achieve this efficiently, stLFR uses the surface of microbeads to create millions of miniaturized barcoding reactions in a single tube. Using a combinatorial process, up to 3.6 billion unique barcode sequences were generated on beads, enabling practically non-redundant co-barcoding with 50 million barcodes per sample. Using stLFR, we demonstrate efficient unique co-barcoding of over 8 million genomic DNA fragments of 20-300 kb. Analysis of the human genome NA12878 with stLFR demonstrated high-quality variant calling and phasing into contigs with an N50 of up to 34 Mb. We also demonstrate detection of complex structural variants and complete diploid de novo assembly of NA12878. All of these analyses were performed using single stLFR libraries, whose construction did not significantly add to the time or cost of whole genome sequencing (WGS) library preparation. stLFR represents an easily automatable solution that enables high-quality sequencing, phasing, SV detection, scaffolding, cost-effective diploid de novo genome assembly, and other long DNA sequencing applications.
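
The figure of 3.6 billion barcodes is consistent with three rounds of combinatorial ligation, each drawing from 1,536 barcodes (1,536³ ≈ 3.62 × 10⁹); that design is an assumption here, not stated in the abstract. Assuming further that beads receive barcodes uniformly and independently at random, a birthday-problem estimate illustrates why 50 million beads per sample are "practically non-redundant": only about 1.4% of beads would be expected to share their barcode with another bead.

```python
import math


def combinatorial_barcodes(per_round: int, rounds: int) -> int:
    """Number of distinct barcodes from `rounds` ligations, each choosing 1 of `per_round`."""
    return per_round ** rounds


def fraction_sharing_barcode(n_beads: int, pool: int) -> float:
    """Probability that a given bead's barcode also appears on at least one other bead,
    assuming each bead's barcode is drawn uniformly and independently from `pool`."""
    # 1 - (1 - 1/pool)^(n_beads - 1), computed stably with log1p/expm1
    return -math.expm1((n_beads - 1) * math.log1p(-1.0 / pool))


if __name__ == "__main__":
    pool = combinatorial_barcodes(1536, 3)  # hypothetical design: 3 rounds x 1,536 barcodes
    print(pool)  # 3623878656
    print(f"{fraction_sharing_barcode(50_000_000, pool):.2%}")
```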

A High-Quality De Novo Genome Assembly from a Single Mosquito using PacBio Sequencing

10.1101/499954 ◽

2018 ◽

Author(s):

Sarah B. Kingan ◽

Haynes Heaton ◽

Juliana Cudini ◽

Christine C. Lambert ◽

Primo Baybayan ◽

...

Keyword(s):

Single Molecule ◽

Genome Assembly ◽

Population Genomics ◽

De Novo ◽

Anopheles Coluzzii ◽

High Quality ◽

De Novo Genome Assembly ◽

Core Technology ◽

Conserved Genes ◽

Diploid Individual

Abstract

A high-quality reference genome is a fundamental resource for functional genetics, comparative genomics, and population genomics, and is increasingly important for conservation biology. PacBio Single Molecule, Real-Time (SMRT) sequencing generates long reads with uniform coverage and high consensus accuracy, making it a powerful technology for de novo genome assembly. Improvements in throughput and concomitant reductions in cost have made PacBio an attractive core technology for many large genome initiatives; however, relatively high DNA input requirements (∼5 µg for the standard library protocol) have placed PacBio out of reach for many projects on small organisms with low DNA content, or on projects with limited input DNA for other reasons. Here we present a high-quality de novo genome assembly from a single Anopheles coluzzii mosquito. A modified SMRTbell library construction protocol without DNA shearing and size selection was used to generate a SMRTbell library from just 100 ng of starting genomic DNA. The sample was run on the Sequel System with chemistry 3.0 and software v6.0, generating on average 25 Gb of sequence per SMRT Cell with 20-hour movies, followed by diploid de novo genome assembly with FALCON-Unzip. The resulting curated assembly had high contiguity (contig N50 3.5 Mb) and completeness (more than 98% of conserved genes are present and full-length). In addition, this single-insect assembly now places 667 (>90%) of the genes formerly unplaced in the AgamP4 PEST reference into their appropriate chromosomal contexts. We were also able to resolve maternal and paternal haplotypes for over 1/3 of the genome. Because material from a single diploid individual was sequenced and assembled, only two haplotypes are present, which simplifies the assembly process compared with samples pooled from multiple individuals. The method presented here can be applied to samples with as little as 100 ng of starting DNA per 1 Gb of genome size.
This new low-input approach puts PacBio-based assemblies within reach for small, highly heterozygous organisms that make up much of the diversity of life.
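
The stated minimum of 100 ng of starting DNA per 1 Gb of genome size can be read as a linear scaling rule; that linearity is an assumption here, as is using the reported average of ~25 Gb per SMRT Cell to estimate per-cell coverage. The genome sizes in the usage lines are hypothetical and not taken from the study.

```python
def min_input_ng(genome_size_gb: float, ng_per_gb: float = 100.0) -> float:
    """Minimum starting DNA, assuming input scales linearly at `ng_per_gb` per Gb of genome."""
    return ng_per_gb * genome_size_gb


def coverage_per_cell(yield_gb: float, genome_size_gb: float) -> float:
    """Approximate fold coverage contributed by one SMRT Cell's sequence yield."""
    return yield_gb / genome_size_gb


if __name__ == "__main__":
    for size_gb in (0.25, 1.0):  # hypothetical genome sizes
        print(size_gb, "Gb:", min_input_ng(size_gb), "ng,",
              coverage_per_cell(25.0, size_gb), "x per cell")
```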

The chromosomal-level genome assembly and comprehensive transcriptomes of Chinese razor clam (Sinonovacula constricta) with deep-burrowing life style and broad-range salinity adaptation

10.1101/735142 ◽

2019 ◽

Cited By ~ 1

Author(s):

Yinghui Dong ◽

Qifan Zeng ◽

Jianfeng Ren ◽

Hanhan Yao ◽

Wenbin Ruan ◽

...

Keyword(s):

Genome Assembly ◽

Developmental Stages ◽

De Novo ◽

Gene Families ◽

Sequencing Data ◽

High Quality ◽

Protein Coding ◽

Sinonovacula Constricta ◽

Razor Clam ◽

Genomic Resources

Abstract

Background: The Chinese razor clam, Sinonovacula constricta, is a commercially important marine bivalve with a deep-burrowing lifestyle and a remarkable ability to adapt to a broad range of salinities. Despite its economic importance and its value as a representative of the less-understood deep-burrowing bivalve lifestyle, few genomic resources are available for exploring its unique biology and adaptive evolution. Here we report a high-quality chromosome-level reference genome of S. constricta, the first genome of the family Solenidae, along with a large amount of short-read and full-length transcriptomic data covering the whole-ontogeny developmental stages, all major adult tissues, and gill tissues under salinity challenge.

Findings: A total of 101.79 Gb and 129.73 Gb of sequencing data were obtained with the PacBio and Illumina platforms, respectively, representing approximately 186.63X genome coverage. In addition, a total of 160.90 Gb and 24.55 Gb of clean data were obtained with the Illumina and PacBio platforms, respectively, for transcriptomic investigation. A de novo genome assembly of 1,340.13 Mb was generated, with a contig N50 of 689.18 kb. Hi-C scaffolding resulted in 19 chromosomes with a scaffold N50 of 57.99 Mb. Repeat sequences account for 50.71% of the assembled genome. A total of 26,273 protein-coding genes were predicted, and 99.5% of them were annotated. Phylogenetic analysis revealed that S. constricta diverged from the Pteriomorphia lineage approximately 494 million years ago. Notably, the cytoskeletal protein tubulin and motor protein dynein gene families are rapidly expanded in the S. constricta genome and are highly expressed in the mantle and gill, suggesting potential genomic bases for the well-developed ciliary system of S. constricta.

Conclusions: The high-quality genome assembly and comprehensive transcriptomes generated in this work not only provide highly valuable genomic resources for future studies of S. constricta, but also lay a solid foundation for further investigation into the adaptive mechanisms of benthic burrowing mollusks.

Fully Phased Sequence of a Diploid Human Genome Determined de Novo from the DNA of a Single Individual

G3 Genes|Genome|Genetics ◽

10.1534/g3.119.400995 ◽

2020 ◽

Vol 10 (9) ◽

pp. 2911-2925

Author(s):

Ilya Soifer ◽

Nicole L Fong ◽

Nelda Yi ◽

Andrea T Ireland ◽

Irene Lam ◽

...

Keyword(s):

Single Molecule ◽

Genome Assembly ◽

De Novo ◽

Sequence Data ◽

Phenotypic Trait ◽

Single Individual ◽

Phase Information ◽

Metaphase Chromosomes ◽

De Novo Genome Assembly ◽

Diploid Genome

Abstract

In recent years, improved sequencing technology and computational tools have made de novo genome assembly more accessible. Many approaches, however, generate either an unphased or only partially resolved representation of a diploid genome, in which polymorphisms are detected but not assigned to one or the other of the homologous chromosomes. Yet chromosomal phase information is invaluable for understanding the inheritance of phenotypic traits in cases of compound heterozygosity, allele-specific expression, or cis-acting variants. Here we use a combination of tools and sequencing technologies to generate a de novo diploid assembly of the human primary cell line WI-38. First, data from PacBio single-molecule sequencing and Bionano Genomics optical mapping were combined to generate an unphased hybrid assembly. Next, 10x Genomics linked reads were combined with the hybrid assembly to generate a partially phased assembly. Lastly, we developed and optimized methods that use short-read (Illumina) sequencing of flow cytometry-sorted metaphase chromosomes to provide phase information. With the addition of approximately 2.5-fold coverage of Illumina data from the sequenced metaphase chromosomes, the final genome assembly was almost fully (94%) phased. The diploid nature of the final de novo genome assembly improved the resolution of structural variants between the WI-38 genome and the human reference genome. The phased WI-38 sequence data are available for browsing and download at wi38.research.calicolabs.com. Our work shows that assembling a completely phased diploid genome de novo from the DNA of a single individual is now readily achievable.
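
The abstract does not detail how reads from sorted metaphase chromosomes were turned into phase information. As a generic sketch only, assuming a library in which reads come predominantly from one homolog, each heterozygous site can be phased according to which of its two alleles that library supports; sites with too few reads or mixed support are left unphased. The function names, thresholds, and positions are hypothetical and do not describe the authors' pipeline.

```python
def phase_sites(het_sites: dict[int, tuple[str, str]],
                homolog_counts: dict[int, dict[str, int]],
                min_reads: int = 2, min_ratio: float = 0.8) -> dict[int, tuple[str, str]]:
    """Assign each heterozygous site's alleles to (haplotype 1, haplotype 2).

    het_sites: {position: (allele_a, allele_b)}
    homolog_counts: {position: {allele: read_count}} from a library assumed to
        contain reads mostly from the haplotype-1 homolog.
    """
    phased = {}
    for pos, (a, b) in het_sites.items():
        counts = homolog_counts.get(pos, {})
        n_a, n_b = counts.get(a, 0), counts.get(b, 0)
        total = n_a + n_b
        if total < min_reads:
            continue  # too little evidence: leave unphased
        if n_a / total >= min_ratio:
            phased[pos] = (a, b)
        elif n_b / total >= min_ratio:
            phased[pos] = (b, a)
        # otherwise support is mixed: leave unphased
    return phased


def phased_fraction(het_sites: dict, phased: dict) -> float:
    """Share of heterozygous sites that received a phase assignment."""
    return len(phased) / len(het_sites) if het_sites else 0.0


if __name__ == "__main__":
    sites = {100: ("A", "G"), 250: ("C", "T"), 400: ("G", "A"), 900: ("T", "C")}
    counts = {100: {"G": 3}, 250: {"C": 2}, 400: {"G": 1, "A": 1}}  # toy read counts
    result = phase_sites(sites, counts)
    print(result, phased_fraction(sites, result))
```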
