The first sheep graph pan-genome reveals the spectrum of structural variations and their effects on different tail phenotypes

2021 ◽  
Author(s):  
Ran Li ◽  
Mian Gong ◽  
Xinmiao Zhang ◽  
Fei Wang ◽  
Zhenyu Liu ◽  
...  

Structural variations (SVs) are a major contributor to genetic diversity and phenotypic variation; however, their prevalence and functions in domestic animals are largely unexplored. Here, we assembled 26 haplotype-resolved genome assemblies from 13 genetically diverse sheep breeds using PacBio HiFi sequencing. We then constructed an ovine graph pan-genome and demonstrated its advantage in discovering 142,593 biallelic SVs (insertions and deletions), 7,028 divergent alleles and 13,419 multiallelic variations with high accuracy and sensitivity. To link the SVs to genotypes, we genotyped the SVs in 687 resequenced individuals of domestic and wild sheep using a graph-based approach and identified numerous population-stratified variants, of which expression-associated SVs were detected by integrating RNA-seq data. Taking the varying sheep tail morphology as an example, we located a putative causative insertion in the HOXB13 gene responsible for the long tail and reported multiple large SVs associated with the fat tail. Beyond generating a benchmark resource for ovine structural variants, our study also highlighted that population genetics analysis based on a graph pan-genome rather than a reference genome will greatly benefit animal genetic research.

2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Michael Alonge ◽  
Sebastian Soyk ◽  
Srividya Ramakrishnan ◽  
Xingang Wang ◽  
Sara Goodwin ◽  
...  

Abstract We present RaGOO, a reference-guided contig ordering and orienting tool that leverages the speed and sensitivity of Minimap2 to accurately achieve chromosome-scale assemblies in minutes. After the pseudomolecules are constructed, RaGOO identifies structural variants, including those spanning sequencing gaps. We show that RaGOO accurately orders and orients 3 de novo tomato genome assemblies, including the widely used M82 reference cultivar. We then demonstrate the scalability and utility of RaGOO with a pan-genome analysis of 103 Arabidopsis thaliana accessions by examining the structural variants detected in the newly assembled pseudomolecules. RaGOO is available open source at https://github.com/malonge/RaGOO.


2021 ◽  
Author(s):  
Bo Wang ◽  
Yinping Jiao ◽  
Kapeel Chougule ◽  
Andrew Olson ◽  
Jian Huang ◽  
...  

Abstract Sorghum bicolor, one of the most important grass crops around the world, harbors a high degree of genetic diversity. We constructed chromosome-level genome assemblies for two important sorghum inbred lines, Tx2783 and RTx436. The final high-quality reference assemblies consist of 19 and 18 scaffolds, respectively, with contig N50 values of 25.6 and 20.3 Mb. Genes were annotated using evidence-based and de novo gene predictors, and RAMPAGE data demonstrate that transcription start sites were effectively captured. Together with other public sorghum genomes, BTx623, RTx430, and Rio, extensive structural variations (SVs) of various sizes were characterized using Tx2783 as a reference. Genome-wide scanning for disease resistance (R) genes revealed high levels of diversity among these five sorghum accessions. To characterize sugarcane aphid (SCA) resistance in Tx2783, we mapped the resistance region on chromosome 6 using a recombinant inbred line (RIL) population and found an SV of 191 kb containing a cluster of R genes in Tx2783. Using Tx2783 as a backbone, along with the SVs, we constructed a pan-genome to support alignment of resequencing data from 62 sorghum accessions, and then identified core and dispensable genes using this population. This study provides the first overview of the extent of genomic structural variations and R genes in the sorghum population, and reveals potential targets for breeding of SCA resistance.


2019 ◽  
Author(s):  
Michael Alonge ◽  
Sebastian Soyk ◽  
Srividya Ramakrishnan ◽  
Xingang Wang ◽  
Sara Goodwin ◽  
...  

Abstract Background As the number of new genome assemblies continues to grow, there is increasing demand for methods to coalesce contigs from draft assemblies into pseudomolecules. Most current methods use genetic maps, optical maps, chromatin conformation (Hi-C), or other long-range linking data; however, these data are expensive and analysis methods often fail to accurately order and orient a high percentage of assembly contigs. Other approaches utilize alignments to a reference genome for ordering and orienting; however, these tools rely on slow aligners and are not robust to repetitive contigs. Results We present RaGOO, an open-source reference-guided contig ordering and orienting tool that leverages the speed and sensitivity of Minimap2 to accurately achieve chromosome-scale assemblies in just minutes. With the pseudomolecules constructed, RaGOO identifies structural variants, including those spanning sequencing gaps that are not reported by alternative methods. We show that RaGOO accurately orders and orients contigs into nearly complete chromosomes based on de novo assemblies of Oxford Nanopore long-read sequencing from three wild and domesticated tomato genotypes, including the widely used M82 reference cultivar. We then demonstrate the scalability and utility of RaGOO with a pan-genome analysis of 103 Arabidopsis thaliana accessions by examining the structural variants detected in the newly assembled pseudomolecules. RaGOO is available open-source with an MIT license at https://github.com/malonge/RaGOO. Conclusions We demonstrate that with a highly contiguous assembly and a structurally accurate reference genome, reference-guided scaffolding with RaGOO outperforms error-prone reference-free methods and enables rapid pan-genome analysis.


2021 ◽  
Vol 4 (1) ◽  
Author(s):  
Chao Wang ◽  
Ola Wallerman ◽  
Maja-Louise Arendt ◽  
Elisabeth Sundström ◽  
Åsa Karlsson ◽  
...  

Abstract We present GSD_1.0, a high-quality domestic dog reference genome with chromosome length scaffolds and contiguity increased 55-fold over CanFam3.1. Annotation with generated and existing long and short read RNA-seq, miRNA-seq and ATAC-seq revealed that 32.1% of lifted over CanFam3.1 gaps harboured previously hidden functional elements, including promoters, genes and miRNAs in GSD_1.0. A catalogue of canine “dark” regions was made to facilitate mapping rescue. Alignment in these regions is difficult, but we demonstrate that they harbour trait-associated variation. Key genomic regions were completed, including the Dog Leucocyte Antigen (DLA), T Cell Receptor (TCR) and 366 COSMIC cancer genes. 10x linked-read sequencing of 27 dogs (19 breeds) uncovered 22.1 million SNPs, indels and larger structural variants. Subsequent intersection with protein coding genes showed that 1.4% of these could directly influence gene products, and so provide a source of normal or aberrant phenotypic modifications.


Author(s):  
Chao Wang ◽  
Ola Wallerman ◽  
Maja-Louise Arendt ◽  
Elisabeth Sundström ◽  
Åsa Karlsson ◽  
...  

Abstract Here we present a new high-quality canine reference genome with gap number reduced 41-fold, from 23,836 to 585. Analysis of existing and novel data, RNA-seq, miRNA-seq and ATAC-seq, revealed a large proportion of these harboured previously hidden elements, including genes, promoters and miRNAs. Short-read dark regions were detected, and genomic regions completed, including the DLA, TCR and 366 cancer genes. 10x sequencing of 27 dogs uncovered a total of 22.1 million SNPs, indels and larger structural variants (SVs). 1.4% overlap with protein coding genes and could provide a source of normal or aberrant phenotypic modifications.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
John T. Lovell ◽  
Nolan B. Bentley ◽  
Gaurab Bhattarai ◽  
Jerry W. Jenkins ◽  
Avinash Sreedasyam ◽  
...  

Abstract Genome-enabled biotechnologies have the potential to accelerate breeding efforts in long-lived perennial crop species. Despite the transformative potential of molecular tools in pecan and other outcrossing tree species, highly heterozygous genomes, significant presence–absence gene content variation, and histories of interspecific hybridization have constrained breeding efforts. To overcome these challenges, here, we present diploid genome assemblies and annotations of four outbred pecan genotypes, including a PacBio HiFi chromosome-scale assembly of both haplotypes of the ‘Pawnee’ cultivar. Comparative analysis and pan-genome integration reveal substantial and likely adaptive interspecific genomic introgressions, including an over-retained haplotype introgressed from bitternut hickory into pecan breeding pedigrees. Further, by leveraging our pan-genome presence–absence and functional annotation database among genomes and within the two outbred haplotypes of the ‘Lakota’ genome, we identify candidate genes for pest and pathogen resistance. Combined, these analyses and resources highlight significant progress towards functional and quantitative genomics in highly diverse and outbred crops.


Author(s):  
Wayne Xu ◽  
James R Tucker ◽  
Wubishet A Bekele ◽  
Frank M You ◽  
Yong-Bi Fu ◽  
...  

Abstract Barley (Hordeum vulgare L.) is one of the most important global crops. The six-row barley cultivar Morex reference genome has been used by the barley research community worldwide. However, this reference genome can have limitations when used for genomic and genetic diversity analysis studies, gene discovery, and marker development when working in two-row germplasm that is more common to Canadian barley. Here we assembled, for the first time, the genome sequence of a Canadian two-row malting barley, cultivar AAC Synergy. We applied deep Illumina paired-end reads, long mate-pair reads, PacBio sequences, 10X chromium linked read libraries, and chromosome conformation capture sequencing (Hi-C) to generate a contiguous assembly. The genome assembled from super-scaffolds had a size of 4.85 Gb, N50 of 2.32 Mb and an estimated 93.9% of complete genes from a plant database (BUSCO, benchmarking universal single-copy orthologous genes). After removal of small scaffolds (< 300 Kb), the assembly was arranged into pseudomolecules of 4.14 Gb in size with seven chromosomes plus unanchored scaffolds. The completeness and annotation of the assembly were assessed by comparing it with the updated version of six-row Morex and recently released two-row Golden Promise genome assemblies.
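The N50 figure quoted above is a standard contiguity statistic: the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly. The sketch below is a minimal illustration under that common definition; the function name `n50` and the toy lengths are hypothetical and are not taken from the study's pipeline or data.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths.

    Assumes the common definition: sort lengths from longest to shortest
    and return the length at which the running total first reaches at
    least half of the summed assembly length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Integer comparison avoids floating-point rounding at the midpoint.
        if 2 * running >= total:
            return length


# Hypothetical toy scaffold lengths (arbitrary units), not real barley data.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_lengths))
```

With these toy values the total is 300, so the running sum reaches the halfway point of 150 at the second-longest scaffold.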


BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Surajit Bhattacharya ◽  
Hayk Barseghyan ◽  
Emmanuèle C. Délot ◽  
Eric Vilain

Abstract Background Whole genome sequencing is effective at identification of small variants, but because it is based on short reads, assessment of structural variants (SVs) is limited. The advent of Optical Genome Mapping (OGM), which utilizes long fluorescently labeled DNA molecules for de novo genome assembly and SV calling, has allowed for increased sensitivity and specificity in SV detection. However, compared to small variant annotation tools, OGM-based SV annotation software has seen little development, and currently available SV annotation tools do not provide sufficient information for determination of variant pathogenicity. Results We developed an R-based package, nanotatoR, which provides comprehensive annotation as a tool for SV classification. nanotatoR uses both external (DGV; DECIPHER; Bionano Genomics BNDB) and internal (user-defined) databases to estimate SV frequency. Human genome reference GRCh37/38-based BED files are used to annotate SVs with overlapping, upstream, and downstream genes. Overlap percentages and distances for nearest genes are calculated and can be used for filtration. A primary gene list is extracted from public databases based on the patient’s phenotype and used to filter genes overlapping SVs, providing the analyst with an easy way to prioritize variants. If available, expression of overlapping or nearby genes of interest is extracted (e.g. from an RNA-Seq dataset), allowing the user to assess the effects of SVs on the transcriptome. Most quality-control filtration parameters are customizable by the user. The output is given in an Excel file format, subdivided into multiple sheets based on SV type and inheritance pattern (INDELs, inversions, translocations, de novo, etc.). nanotatoR passed all quality and run time criteria of Bioconductor, where it was accepted in the April 2019 release.
We evaluated nanotatoR’s annotation capabilities using publicly available reference datasets: the singleton sample NA12878, mapped with two types of enzyme labeling, and the NA24143 trio. nanotatoR was also able to accurately filter the known pathogenic variants in a cohort of patients with Duchenne Muscular Dystrophy for which we had previously demonstrated the diagnostic ability of OGM. Conclusions The extensive annotation enables users to rapidly identify potential pathogenic SVs, a critical step toward use of OGM in the clinical setting.
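The gene-overlap step described in the abstract (overlap percentages and distances to nearest genes) can be illustrated with plain interval arithmetic. The sketch below is not nanotatoR's code or API, and it is written in Python rather than R. It assumes half-open 0-based coordinates as in BED files, and it assumes "overlap percentage" means the share of the gene covered by the SV; the package may define it differently. All names and coordinates are hypothetical.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    name: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # exclusive


def overlap_bp(a, b):
    """Number of bases shared by two half-open intervals."""
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def gene_overlap_percent(sv, gene):
    """Percent of the gene's length covered by the SV (assumed definition)."""
    return 100.0 * overlap_bp(sv, gene) / (gene.end - gene.start)


def distance_bp(sv, gene):
    """Gap between SV and gene in bases; 0 if they overlap or touch."""
    return max(0, gene.start - sv.end, sv.start - gene.end)


def annotate(sv, genes):
    """Split genes into those overlapping the SV and the nearest other gene."""
    overlapping = [(g.name, gene_overlap_percent(sv, g))
                   for g in genes if overlap_bp(sv, g) > 0]
    others = [g for g in genes if overlap_bp(sv, g) == 0]
    nearest = min(others, key=lambda g: distance_bp(sv, g), default=None)
    nearest_info = (nearest.name, distance_bp(sv, nearest)) if nearest else None
    return overlapping, nearest_info


# Hypothetical SV and genes on a single chromosome.
sv = Interval("sv1", 1000, 2000)
genes = [Interval("geneA", 1500, 2500),
         Interval("geneB", 3000, 4000),
         Interval("geneC", 100, 400)]
overlapping, nearest = annotate(sv, genes)
print(overlapping, nearest)
```

A filter such as "keep SVs covering at least some percentage of a gene" then becomes a simple threshold on the returned values.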


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Christopher Alan Smith

Abstract The basidiomycete fungus Lentinula novae-zelandiae is endemic to New Zealand and is a sister taxon to Lentinula edodes, the second most cultivated mushroom in the world. To explore the biology of this organism, a high-quality chromosome level reference genome of L. novae-zelandiae was produced. Macrosyntenic comparisons between the genome assembly of L. novae-zelandiae, L. edodes and a set of three genome assemblies of diverse species from the Agaricomycota reveal a high degree of macrosyntenic restructuring within L. edodes consistent with a signal of domestication. These results show L. edodes has undergone significant genomic change during the course of its evolutionary history, likely a result of its cultivation and domestication over the last 1000 years.

