Pangenomics enables genotyping of known structural variants in 5202 diverse genomes

Science ◽  
2021 ◽  
Vol 374 (6574) ◽  
Author(s):  
Jouni Sirén ◽  
Jean Monlong ◽  
Xian Chang ◽  
Adam M. Novak ◽  
Jordan M. Eizenga ◽  
...  
BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Surajit Bhattacharya ◽  
Hayk Barseghyan ◽  
Emmanuèle C. Délot ◽  
Eric Vilain

Abstract

Background: Whole genome sequencing is effective at identifying small variants, but because it is based on short reads, its assessment of structural variants (SVs) is limited. The advent of Optical Genome Mapping (OGM), which uses long fluorescently labeled DNA molecules for de novo genome assembly and SV calling, has increased the sensitivity and specificity of SV detection. However, compared to small variant annotation tools, OGM-based SV annotation software has seen little development, and currently available SV annotation tools do not provide sufficient information to determine variant pathogenicity.

Results: We developed an R-based package, nanotatoR, which provides comprehensive annotation as a tool for SV classification. nanotatoR uses both external (DGV; DECIPHER; Bionano Genomics BNDB) and internal (user-defined) databases to estimate SV frequency. Human genome reference GRCh37/38-based BED files are used to annotate SVs with overlapping, upstream, and downstream genes. Overlap percentages and distances to the nearest genes are calculated and can be used for filtration. A primary gene list is extracted from public databases based on the patient’s phenotype and used to filter genes overlapping SVs, giving the analyst an easy way to prioritize variants. If available, expression of overlapping or nearby genes of interest is extracted (e.g. from an RNA-Seq dataset), allowing the user to assess the effects of SVs on the transcriptome. Most quality-control filtration parameters are customizable by the user. The output is given in an Excel file format, subdivided into multiple sheets based on SV type and inheritance pattern (INDELs, inversions, translocations, de novo, etc.). nanotatoR passed all quality and run time criteria of Bioconductor, where it was accepted in the April 2019 release. We evaluated nanotatoR’s annotation capabilities using publicly available reference datasets: the singleton sample NA12878, mapped with two types of enzyme labeling, and the NA24143 trio. nanotatoR was also able to accurately filter the known pathogenic variants in a cohort of patients with Duchenne Muscular Dystrophy for which we had previously demonstrated the diagnostic ability of OGM.

Conclusions: The extensive annotation enables users to rapidly identify potential pathogenic SVs, a critical step toward use of OGM in the clinical setting.
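The overlap percentage and nearest-gene distance mentioned in the abstract can be illustrated with a short sketch. This is not nanotatoR's code (nanotatoR is an R package, and the abstract does not give its formulas). The `Interval` class, the function names, and the choice to express overlap as a percentage of gene length are assumptions made for illustration. Coordinates follow the BED convention (0-based start, exclusive end).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Interval:
    """A genomic interval in BED-style coordinates (hypothetical helper type)."""
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    name: str = ""


def overlap_bases(a: Interval, b: Interval) -> int:
    """Number of bases shared by two intervals (0 if on different chromosomes)."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def percent_of_gene_overlapped(sv: Interval, gene: Interval) -> float:
    """Assumed definition: share of the gene's length covered by the SV, in percent."""
    gene_length = gene.end - gene.start
    if gene_length <= 0:
        return 0.0
    return 100.0 * overlap_bases(sv, gene) / gene_length


def distance_to_gene(sv: Interval, gene: Interval) -> Optional[int]:
    """Gap in bases between SV and gene; 0 if they overlap, None across chromosomes."""
    if sv.chrom != gene.chrom:
        return None
    if overlap_bases(sv, gene) > 0:
        return 0
    if gene.start >= sv.end:
        return gene.start - sv.end
    return sv.start - gene.end


def annotate(sv: Interval, genes: list, max_distance: int) -> dict:
    """Split genes into those overlapping the SV and those within max_distance of it."""
    overlapping, nearby = [], []
    for gene in genes:
        dist = distance_to_gene(sv, gene)
        if dist is None:
            continue
        pct = percent_of_gene_overlapped(sv, gene)
        if pct > 0:
            overlapping.append((gene.name, pct))
        elif dist <= max_distance:
            nearby.append((gene.name, dist))
    return {"overlapping": overlapping, "nearby": nearby}
```

A filter on these two values (for example, keeping only genes within a chosen distance) corresponds to the kind of user-adjustable filtration the abstract describes. Strand-aware upstream and downstream labels are left out of this sketch.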


2021 ◽  
Author(s):  
Parsoa Khorsand ◽  
Fereydoun Hormozdiari

Abstract Large-scale catalogs of common genetic variants (including indels and structural variants) are being created using data from second- and third-generation whole-genome sequencing technologies. However, genotyping these variants in newly sequenced samples is a nontrivial task that requires extensive computational resources. Furthermore, current approaches are mostly limited to specific types of variants and are generally prone to errors and ambiguities when genotyping complex events. We propose an ultra-efficient approach for genotyping any type of structural variation that is not limited by the shortcomings and complexities of current mapping-based approaches. Our method, Nebula, uses changes in k-mer counts to predict the genotype of structural variants. We show that Nebula is not only an order of magnitude faster than mapping-based approaches for genotyping structural variants, but also has accuracy comparable to state-of-the-art approaches. Furthermore, Nebula is a generic framework not limited to any specific type of event. Nebula is publicly available at https://github.com/Parsoa/Nebula.
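The general idea of genotyping from k-mer counts, without mapping reads, can be shown with a toy sketch. This is not Nebula's algorithm. The function names, the fixed thresholds `low` and `high`, and the simple depth ratio are assumptions chosen for illustration, and reverse-complement k-mers and sequencing errors are ignored.

```python
from collections import Counter


def kmers(seq: str, k: int) -> list:
    """All overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def count_kmers(reads: list, k: int) -> Counter:
    """Count every k-mer across all reads (no alignment involved)."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def signature_kmers(ref_allele: str, alt_allele: str, k: int):
    """k-mers that occur in only one of the two alleles."""
    ref_set, alt_set = set(kmers(ref_allele, k)), set(kmers(alt_allele, k))
    return ref_set - alt_set, alt_set - ref_set


def mean_count(counts: Counter, kmer_set: set) -> float:
    """Average observed count over a set of k-mers (0.0 for an empty set)."""
    if not kmer_set:
        return 0.0
    return sum(counts[kmer] for kmer in kmer_set) / len(kmer_set)


def genotype(reads, ref_allele, alt_allele, k=5, low=0.2, high=0.8) -> str:
    """Call 0/0, 0/1 or 1/1 from the share of depth on alt-specific k-mers.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    ref_only, alt_only = signature_kmers(ref_allele, alt_allele, k)
    counts = count_kmers(reads, k)
    ref_depth = mean_count(counts, ref_only)
    alt_depth = mean_count(counts, alt_only)
    total = ref_depth + alt_depth
    if total == 0:
        return "./."  # no informative k-mers observed
    alt_fraction = alt_depth / total
    if alt_fraction < low:
        return "0/0"
    if alt_fraction > high:
        return "1/1"
    return "0/1"
```

For a deletion, the alt-specific k-mers are those spanning the new junction, and the ref-specific k-mers are those touching the deleted sequence. A sample carrying both alleles contributes reads to both sets, which is what drives the heterozygous call here.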


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Josué Barrera-Redondo ◽  
Guillermo Sánchez-de la Vega ◽  
Jonás A. Aguirre-Liguori ◽  
Gabriela Castellanos-Morales ◽  
Yocelyn T. Gutiérrez-Guerrero ◽  
...  

Abstract Despite the economic importance and well-characterized domestication syndrome of Cucurbita species (pumpkins and squashes), the genomic impact of domestication remains poorly understood, and the variants underlying their domestication traits have yet to be identified. Cucurbita argyrosperma, also known as cushaw pumpkin or silver-seed gourd, is a Mexican crop consumed primarily for its seeds rather than fruit flesh. This makes it a good model to study Cucurbita domestication, as seeds were an essential component of early Mesoamerican diet and likely the first targets of human-guided selection in pumpkins and squashes. We obtained population-level data using tunable Genotype by Sequencing libraries for 192 individuals of the wild and domesticated subspecies of C. argyrosperma across Mexico. We also assembled the first high-quality wild Cucurbita genome. Comparative genomic analyses revealed several structural variants and presence/absence of genes related to domestication. Our results indicate a monophyletic origin of this domesticated crop in the lowlands of Jalisco. We found evidence of gene flow between the domesticated and wild subspecies, which likely alleviated the effects of the domestication bottleneck. We uncovered candidate domestication genes that are involved in the regulation of growth hormones, plant defense mechanisms, seed development, and germination. The presence of shared selected alleles with the closely related species Cucurbita moschata suggests domestication-related introgression between both taxa.


Author(s):  
Paul Vollrath ◽  
Harmeet S. Chawla ◽  
Sarah V. Schiessl ◽  
Iulian Gabur ◽  
HueyTyng Lee ◽  
...  

Key message A novel structural variant was discovered in the FLOWERING LOCUS T orthologue BnaFT.A02 by long-read sequencing. Nested association mapping in an elite winter oilseed rape population revealed that this 288 bp deletion associates with early flowering, putatively by modification of binding-sites for important flowering regulation genes.

Abstract Perfect timing of flowering is crucial for optimal pollination and high seed yield. Extensive previous studies of flowering behavior in Brassica napus (canola, rapeseed) identified mutations in key flowering regulators which differentiate winter, semi-winter and spring ecotypes. However, because these are generally fixed in locally adapted genotypes, they have only limited relevance for fine adjustment of flowering time in elite cultivar gene pools. In crosses between ecotypes, the ecotype-specific major-effect mutations mask minor-effect loci of interest for breeding. Here, we investigated flowering time in a multiparental mapping population derived from seven elite winter oilseed rape cultivars which are fixed for major-effect mutations separating winter-type rapeseed from other ecotypes. Association mapping revealed eight genomic regions on chromosomes A02, C02 and C03 associating with fine modulation of flowering time. Long-read genomic resequencing of the seven parental lines identified seven structural variants coinciding with candidate genes for flowering time within chromosome regions associated with flowering time. Segregation patterns for these variants in the elite multiparental population and a diversity set of winter types using locus-specific assays revealed significant associations with flowering time for three deletions on chromosome A02. One of these was a previously undescribed 288 bp deletion within the second intron of FLOWERING LOCUS T on chromosome A02, emphasizing the advantage of long-read sequencing for detection of structural variants in this size range. Detailed analysis revealed the impact of this specific deletion on flowering-time modulation under extreme environments and varying day lengths in elite, winter-type oilseed rape.


Minerals ◽  
2021 ◽  
Vol 11 (4) ◽  
pp. 395
Author(s):  
Anastasiia Topnikova ◽  
Elena Belokoneva ◽  
Olga Dimitrova ◽  
Anatoly Volkov ◽  
Dina Deyneko

Crystals of the new silicate-germanate Rb1.66Cs1.34Tb[Si5.43Ge0.57O15]·H2O have been synthesized hydrothermally in a multi-component system TbCl3:GeO2:SiO2 = 1:1:5 at T = 280 °C and P = 100 atm. K2CO3, Rb2CO3 and Cs2CO3 were added to the solution as mineralizers. The crystal structure was solved using single-crystal X-ray data: a = 15.9429(3), b = 14.8407(3), c = 7.2781(1) Å, sp. gr. Pbam. The new Rb,Cs,Tb-silicate-germanate consists of a [Si5.43Ge0.57O15]∞∞ corrugated tetrahedral layer linked by isolated TbO6 octahedra into a mixed microporous framework, as in synthetic K3Nd[Si6O15]·2H2O, K3Nd[Si6O15] and K3Eu[Si6O15]·2H2O, with the cavities occupied by Cs and Rb atoms and water molecules. The luminescence spectrum of the new crystals was obtained and analysed. A comparison with other representatives of related layered natural and synthetic silicates was carried out based on topology-symmetry analysis by the OD (order-disorder) approach. The wollastonite chain was selected as the initial structural unit. Three symmetrical ways of forming a ribbon from such a chain and three ways of further connecting ribbons to each other into a layer were revealed and described with symmetry groupoids. Hypothetical structural variants of the layers and ribbons in this family were predicted.
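As a small worked example, the unit-cell volume follows directly from the reported cell parameters. Pbam is an orthorhombic space group, so all cell angles are 90° and the volume is the product a·b·c. The function name below is hypothetical, and the result is a derived figure that the abstract itself does not state.

```python
def orthorhombic_cell_volume(a: float, b: float, c: float) -> float:
    """Unit-cell volume in cubic angstroms for an orthorhombic cell (all angles 90 degrees)."""
    return a * b * c


# Cell parameters from the abstract, in angstroms (standard uncertainties omitted).
a, b, c = 15.9429, 14.8407, 7.2781
volume = orthorhombic_cell_volume(a, b, c)
print(f"V = {volume:.1f} cubic angstroms")
```

With these inputs the volume comes to roughly 1722 Å³.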


1984 ◽  
Vol 195 (1-2) ◽  
pp. 153-158 ◽  
Author(s):  
Andras Gal ◽  
Jean-Louis Nahon ◽  
Gérard Lucotte ◽  
José M. Sala-Trepat

2013 ◽  
Vol 113 (4) ◽  
pp. 043709 ◽  
Author(s):  
M. Shaughnessy ◽  
L. Damewood ◽  
C. Y. Fong ◽  
L. H. Yang ◽  
C. Felser
