Genotyping and De Novo Discovery of Allelic Variants at the Brassicaceae Self-Incompatibility Locus from Short-Read Sequencing Data

2019 ◽  
Vol 37 (4) ◽  
pp. 1193-1201 ◽  
Author(s):  
Mathieu Genete ◽  
Vincent Castric ◽  
Xavier Vekemans

Abstract Plant self-incompatibility (SI) is a genetic system that prevents selfing and enforces outcrossing. Because of strong balancing selection, the genes encoding SI are predicted to maintain extraordinarily high levels of polymorphism, both in terms of the number of functionally distinct S-alleles that segregate in SI species and in terms of their nucleotide sequence divergence. However, because of these two combined features, documenting polymorphism of these genes also presents important methodological challenges that have so far largely prevented the comprehensive analysis of complete allelic series in natural populations, and have also made it difficult to obtain complete genic sequences for many S-alleles. Here, we develop a powerful methodological approach based on a computationally optimized comparison of short Illumina sequencing reads from genomic DNA to a database of known nucleotide sequences of the extracellular domain of SRK (eSRK). By examining mapping patterns along the reference sequences, we obtain highly reliable predictions of S-genotypes from individuals collected from natural populations of Arabidopsis halleri. Furthermore, using a de novo assembly approach of the filtered short reads, we obtain full-length sequences of eSRK even when the initial sequence in the database was only partial, and we discover putative new SRK alleles that were not initially present in the database. When including those new alleles in the reference database, we were able to resolve the complete diploid SI genotypes of all individuals. Beyond the specific case of Brassicaceae S-alleles, our approach can be readily applied to other polymorphic loci, provided that reference allelic sequences are available.
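The idea of predicting a genotype from mapping patterns along reference allele sequences can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the breadth-of-coverage criterion, and the thresholds (`min_breadth`, `min_depth`) are hypothetical choices made only for illustration, and the per-position depths are made-up toy values.

```python
from typing import Dict, List


def breadth_of_coverage(depths: List[int], min_depth: int = 1) -> float:
    """Fraction of reference positions covered by at least min_depth reads."""
    if not depths:
        return 0.0
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths)


def call_alleles(
    coverage: Dict[str, List[int]],
    min_breadth: float = 0.9,  # hypothetical threshold, not from the paper
    min_depth: int = 1,        # hypothetical threshold, not from the paper
    ploidy: int = 2,
) -> List[str]:
    """Return up to `ploidy` reference alleles whose coverage breadth passes
    the threshold, ranked by breadth, then mean depth, then name."""
    scored = []
    for name, depths in coverage.items():
        breadth = breadth_of_coverage(depths, min_depth)
        mean_depth = sum(depths) / len(depths) if depths else 0.0
        scored.append((breadth, mean_depth, name))
    passing = [s for s in scored if s[0] >= min_breadth]
    passing.sort(key=lambda s: (-s[0], -s[1], s[2]))
    return [name for _, _, name in passing[:ploidy]]


# Toy per-position read depths for three hypothetical reference alleles.
example = {
    "allele_A": [12, 15, 14, 13, 16, 12, 11, 14, 15, 13],
    "allele_B": [7, 8, 6, 9, 7, 8, 0, 7, 6, 8],
    "allele_C": [0, 0, 3, 0, 0, 0, 0, 2, 0, 0],
}
genotype = call_alleles(example)
```

In this toy input, two alleles are covered along most of their length and the third only sporadically, so only the first two are retained as the diploid genotype. An individual for which fewer than two alleles pass would, under this sketch, be a candidate for carrying an allele missing from the reference database.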

2019 ◽  
Author(s):  
Mathieu Genete ◽  
Vincent Castric ◽  
Xavier Vekemans

Abstract Plant self-incompatibility (SI) is a genetic system that prevents selfing and enforces outcrossing. Because of strong balancing selection, the genes encoding SI are predicted to maintain extraordinarily high levels of polymorphism, both in terms of the number of S-alleles that segregate in SI species and in terms of nucleotide sequence divergence among distinct S-allelic lines. However, because of these two combined features, documenting polymorphism of these genes also presents important methodological challenges that have so far largely prevented the comprehensive analysis of complete allelic series in natural populations, and have also made it difficult to obtain complete genic sequences for many S-alleles. Here, we present a novel methodological approach based on a computationally optimized comparison of short Illumina sequencing reads from genomic DNA to a database of known nucleotide sequences of the extracellular domain of SRK (eSRK). By examining mapping patterns along the reference sequences, we obtain highly reliable predictions of S-genotypes from individuals collected in natural populations of Arabidopsis halleri. Furthermore, using a de novo assembly approach of the filtered short reads, we obtain full-length sequences of eSRK even when the initial sequence in the database was only partial, and we discover new SRK alleles that were not initially present in the database. When including those new alleles in the reference database, we were able to resolve the complete diploid SI genotypes of all individuals. Beyond the specific case of Brassicaceae S-alleles, our approach can be readily applied to other polymorphic loci, provided that reference allelic sequences are available.


2019 ◽  
Author(s):  
Matthias H. Weissensteiner ◽  
Ignas Bunikis ◽  
Ana Catalán ◽  
Kees-Jan Francoijs ◽  
Ulrich Knief ◽  
...  

Abstract Structural variation (SV) accounts for a substantial part of genetic mutations segregating across eukaryotic genomes with important medical and evolutionary implications. Here, we characterized SV across evolutionary time scales in the songbird genus Corvus using de novo assembly and read mapping approaches. Combining information from short-read (N = 127) and long-read re-sequencing data (N = 31) as well as from optical maps (N = 16) revealed a total of 201,738 insertions, deletions and inversions. Population genetic analysis of SV in the Eurasian crow speciation model revealed an evolutionarily young (~530,000 years) cis-acting 2.25-kb retrotransposon insertion reducing expression of the NDP gene with consequences for premating isolation. Our results attest to the wealth of SV segregating in natural populations and demonstrate its evolutionary significance.


Genetics ◽  
2018 ◽  
Vol 211 (3) ◽  
pp. 943-961 ◽  
Author(s):  
John K. Kelly ◽  
Kimberly A. Hughes

We develop analytical and simulation tools for evolve-and-resequencing experiments and apply them to a new study of rapid evolution in Drosophila simulans. Likelihood test statistics applied to pooled population sequencing data suggest parallel evolution of 138 SNPs across the genome. This number is reduced by orders of magnitude from previous studies (thousands or tens of thousands), owing to differences in both experimental design and statistical analysis. Whole genome simulations calibrated from Drosophila genetic data sets indicate that major features of the genome-wide response could be explained by as few as 30 loci under strong directional selection with a corresponding hitchhiking effect. Smaller effect loci are likely also responding, but are below the detection limit of the experiment. Finally, SNPs showing strong parallel evolution in the experiment are intermediate in frequency in the natural population (usually 30–70%), which is indicative of balancing selection in nature. These loci also exhibit elevated differentiation among natural populations of D. simulans, suggesting environmental heterogeneity as a potential balancing mechanism.


2022 ◽  
Author(s):  
Leeban Yusuf ◽  
Venera Tyukmaeva ◽  
Anneli Hoikkala ◽  
Michael G Ritchie

Speciation with gene flow is now widely regarded as common. However, the frequency of introgression between recently diverged species and the evolutionary consequences of gene flow are still poorly understood. The virilis group of Drosophila contains around a dozen species that are geographically widespread and show varying levels of pre-zygotic and post-zygotic isolation. Here, we utilize de novo genome assemblies and whole-genome sequencing data to resolve phylogenetic relationships and describe patterns of introgression and divergence across the group. We suggest that the virilis group consists of three, rather than the traditional two, subgroups. We found evidence of pervasive phylogenetic discordance caused by ancient introgression events between distant lineages within the group, and much more recent gene flow between closely-related species. When assessing patterns of genome-wide divergence in species pairs across the group, we found no consistent genomic evidence of a disproportionate role for the X chromosome. Some genes undergoing rapid sequence divergence across the group were involved in chemical communication and may be related to the evolution of sexual isolation. We suggest that gene flow between closely-related species has potentially had an impact on lineage-specific adaptation and the evolution of reproductive barriers. Our results show how ancient and recent introgression confuse phylogenetic reconstruction, and suggest that shared variation can facilitate adaptation and speciation.


2021 ◽  
Vol 7 (3) ◽  
Author(s):  
David R. Greig ◽  
Claire Jenkins ◽  
Saheer E. Gharbia ◽  
Timothy J. Dallman

Compared to short-read sequencing data, long-read sequencing facilitates single contiguous de novo assemblies and characterization of the prophage region of the genome. Here, we describe our methodological approach to using Oxford Nanopore Technology (ONT) sequencing data to quantify genetic relatedness and to look for microevolutionary events in the core and accessory genomes to assess the within-outbreak variation of four genetically and epidemiologically linked isolates. Analysis of both Illumina and ONT sequencing data detected one SNP between the four sequences of the outbreak isolates. The variant calling procedure highlighted the importance of masking homologous sequences in the reference genome regardless of the sequencing technology used. Variant calling also highlighted the systemic errors in ONT base-calling and ambiguous mapping of Illumina reads, both of which result in variations in the genetic distance when comparing one technology to the other. The prophage component of the outbreak strain was analysed, and nine of the 16 prophages showed some similarity to the prophage in the Sakai reference genome, including the stx2a-encoding phage. Prophage comparison between the outbreak isolates identified minor genome rearrangements in one of the isolates, including an inversion and a deletion event. The ability to characterize the accessory genome in this way is the first step to understanding the significance of these microevolutionary events and their impact on the evolutionary history, virulence and potentially the likely source and transmission of this zoonotic, foodborne pathogen.
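The effect of masking on a SNP-based genetic distance can be shown with a small sketch. It assumes two sequences already aligned to the same reference coordinates and a list of masked regions given as half-open `(start, end)` intervals; the function names and the toy sequences are hypothetical and are not taken from the study.

```python
from typing import FrozenSet, Iterable, Tuple


def masked_positions(regions: Iterable[Tuple[int, int]]) -> FrozenSet[int]:
    """Expand half-open (start, end) intervals into a set of 0-based positions."""
    positions = set()
    for start, end in regions:
        positions.update(range(start, end))
    return frozenset(positions)


def snp_distance(a: str, b: str, masked: FrozenSet[int] = frozenset()) -> int:
    """Count positions where two aligned sequences differ, skipping masked
    positions and any position where either base is not A, C, G or T."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    return sum(
        1
        for i, (x, y) in enumerate(zip(a.upper(), b.upper()))
        if i not in masked and x in valid and y in valid and x != y
    )


# Toy aligned sequences differing at positions 3 and 8.
isolate_1 = "ACGTACGTAC"
isolate_2 = "ACGAACGTTC"

unmasked = snp_distance(isolate_1, isolate_2)
# Pretend positions 8-9 fall in a homologous (e.g. repeated) region.
masked = snp_distance(isolate_1, isolate_2, masked_positions([(8, 10)]))
```

Here the distance drops from two to one once the hypothetical homologous region is excluded, which is the kind of shift that makes masking choices matter when comparing distances obtained with different sequencing technologies.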


eLife ◽  
2019 ◽  
Vol 8 ◽  
Author(s):  
Maxime Chantreau ◽  
Céline Poux ◽  
Marc F Lensink ◽  
Guillaume Brysbaert ◽  
Xavier Vekemans ◽  
...  

How two-component genetic systems accumulate evolutionary novelty and diversify in the course of evolution is a fundamental problem in evolutionary systems biology. In the Brassicaceae, self-incompatibility (SI) is a spectacular example of a diversified allelic series in which numerous highly diverged receptor-ligand combinations are segregating in natural populations. However, the evolutionary mechanisms by which new SI specificities arise have remained elusive. Using in planta ancestral protein reconstruction, we demonstrate that two allelic variants segregating as distinct receptor-ligand combinations diverged through an asymmetrical process whereby one variant has retained the same recognition specificity as its (now extinct) putative ancestor, while the other has functionally diverged and now represents a novel specificity no longer recognized by the ancestor. Examination of the structural determinants of the shift in binding specificity suggests that qualitative rather than quantitative changes of the interaction are an important source of evolutionary novelty in this highly diversified receptor-ligand system.


2017 ◽  
Author(s):  
Benjamin J Callahan ◽  
Paul J McMurdie ◽  
Susan P Holmes

Abstract Recent advances have made it possible to analyze high-throughput marker-gene sequencing data without resorting to the customary construction of molecular operational taxonomic units (OTUs): clusters of sequencing reads that differ by less than a fixed dissimilarity threshold. New methods control errors sufficiently that sequence variants (SVs) can be resolved exactly, down to the level of single-nucleotide differences over the sequenced gene region. The benefits of finer taxonomic resolution are immediately apparent, and arguments for SV methods have focused on their improved resolution. Less obvious, but we believe more important, are the broad benefits deriving from the status of SVs as consistent labels with intrinsic biological meaning identified independently from a reference database. Here we discuss how those features grant SVs the combined advantages of closed-reference OTUs (including computational costs that scale linearly with study size, simple merging between independently processed datasets, and forward prediction) and of de novo OTUs (including accurate diversity measurement and applicability to communities lacking deep coverage in reference databases). We argue that the improvements in reusability, reproducibility and comprehensiveness are sufficiently great that SVs should replace OTUs as the standard unit of marker gene analysis and reporting.
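Why exact sequence variants act as consistent labels can be sketched in a few lines. The example assumes the reads have already been error-corrected by some denoising method (that step is not shown), and the function names and toy reads are hypothetical. Because the label of each variant is the sequence itself, tables built independently for two studies can be merged by simple key lookup, with no re-clustering.

```python
from collections import Counter
from typing import Dict, Iterable, List


def count_variants(reads: Iterable[str]) -> Counter:
    """Tally denoised reads by exact sequence; the sequence is the label."""
    return Counter(read.upper() for read in reads)


def merge_tables(*tables: Counter) -> Dict[str, List[int]]:
    """Merge independently built tables into one abundance table.
    Each row is keyed by sequence; each column is one input table."""
    all_sequences = sorted(set().union(*tables))
    return {seq: [table.get(seq, 0) for table in tables] for seq in all_sequences}


# Toy denoised reads from two independently processed studies.
study_1 = count_variants(["ACGT", "ACGT", "ACGA"])
study_2 = count_variants(["ACGA", "ACGG"])

merged = merge_tables(study_1, study_2)
```

A de novo OTU label, by contrast, is only defined relative to the dataset it was clustered from, so combining two studies would require pooling the reads and clustering again.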


2019 ◽  
Author(s):  
Maxime Chantreau ◽  
Céline Poux ◽  
Marc F. Lensink ◽  
Guillaume Brysbaert ◽  
Xavier Vekemans ◽  
...  

Abstract How two-component genetic systems accumulate evolutionary novelty and become diversified in the course of evolution is a fundamental problem in evolutionary systems biology. In the Brassicaceae, self-incompatibility (SI) is a spectacular example of a diversified allelic series in which numerous highly diverged receptor-ligand combinations are segregating in natural populations. However, the evolutionary mechanisms by which new SI specificities arise in the first place have remained elusive. Using in planta ancestral protein resurrection, we demonstrate that two allelic variants currently segregating as distinct receptor-ligand combinations diverged through an asymmetrical process whereby one variant has retained the same recognition specificity as the (now extinct) ancestor, while the other has functionally diverged and now represents a novel specificity no longer recognized by the ancestor. Examination of the structural determinants of the shift in binding specificity suggests that allosteric changes may be an important source of evolutionary novelty in this highly diversified receptor-ligand system.


2017 ◽  
Author(s):  
Rachael E. Workman ◽  
Alexander M. Myrka ◽  
Elizabeth Tseng ◽  
G. William Wong ◽  
Kenneth C. Welch ◽  
...  

Abstract Hummingbirds can support their high metabolic rates exclusively by oxidizing ingested sugars, which is unsurprising given their sugar-rich nectar diet and use of energetically expensive hovering flight. However, they cannot rely on dietary sugars as a fuel during fasting periods, such as during the night, at first light, or when undertaking long-distance migratory flights, and must instead rely exclusively on onboard lipids. This metabolic flexibility is remarkable both in that the birds can switch between exclusive use of each fuel type within minutes and in that de novo lipogenesis from dietary sugar precursors is the principal way in which fat stores are built, sometimes at exceptionally high rates, such as during the few days prior to a migratory flight. The hummingbird hepatopancreas is the principal location of de novo lipogenesis and likely plays a key role in fuel selection, fuel switching, and glucose homeostasis. Yet understanding how this tissue, and the whole organism, achieves and moderates high rates of energy turnover is hampered by a fundamental lack of information regarding how genes coding for relevant enzymes differ in their sequence, expression, and regulation in these unique animals. To address this knowledge gap, we generated a de novo transcriptome of the hummingbird liver using PacBio full-length cDNA sequencing (Iso-Seq), yielding a total of 8.6 Gb of sequencing data, or 2.6M reads from 4 different size fractions. We analyzed data using the SMRTAnalysis v3.1 Iso-Seq pipeline, including classification of reads and clustering of isoforms (ICE) followed by error-correction (Arrow). With COGENT, we clustered different isoforms into gene families to generate de novo gene contigs. We performed orthology analysis to identify closely related sequences between our transcriptome and other avian and human gene sets. We also aligned our transcriptome against the Calypte anna genome where possible. Finally, we closely examined homology of critical lipid metabolic genes between our transcriptome data and avian and human genomes. We confirmed high levels of sequence divergence within hummingbird lipogenic enzymes, suggesting a high probability of adaptive divergent function in the hepatic lipogenic pathways. Our results have leveraged cutting-edge technology and a novel bioinformatics pipeline to provide a compelling first direct look at the transcriptome of this incredible organism.

