Detection of shared balancing selection in the absence of trans-species polymorphism

2018
Author(s):  
Xiaoheng Cheng ◽  
Michael DeGiorgio

Abstract Trans-species polymorphism has been widely used as a key sign of long-term balancing selection across multiple species. However, such sites are often rare in the genome, and could result from mutational processes or technical artifacts. Few methods are yet available to specifically detect footprints of trans-species balancing selection without using trans-species polymorphic sites. In this study, we develop summary- and model-based approaches that are each specifically tailored to uncover regions of long-term balancing selection shared by a set of species by using genomic patterns of intra-specific polymorphism and inter-specific fixed differences. We demonstrate that our trans-species statistics have substantially higher power than single-species approaches to detect footprints of trans-species balancing selection, and are robust to balancing selection that does not affect all tested species. We further apply our model-based methods to human and chimpanzee whole genome sequencing data. In addition to the previously established MHC and malaria resistance-associated FREM3/GYPE regions, we also find outstanding genomic regions involved in barrier integrity and innate immunity, such as the GRIK1/CLDN17 intergenic region, and the SLC35F1 and ABCA13 genes. Our findings not only echo the significance of pathogen defense, but also reveal novel candidates in maintaining balanced polymorphisms across human and chimpanzee lineages. Finally, we show that these trans-species statistics can be applied to and work well for an arbitrary number of species, and integrate them into open-source software packages for ease of use by the scientific community.

2016
Author(s):  
Lyndal Henden ◽  
Stuart Lee ◽  
Ivo Mueller ◽  
Alyssa Barry ◽  
Melanie Bahlo

Abstract Identification of genomic regions that are identical by descent (IBD) has proven useful for human genetic studies where analyses have led to the discovery of familial relatedness and fine-mapping of disease critical regions. Unfortunately, IBD analyses have been underutilized in analysis of other organisms, including human pathogens. This is in part due to the lack of statistical methodologies for non-diploid genomes in addition to the added complexity of multiclonal infections. As such, we have developed an IBD methodology, called isoRelate, for analysis of haploid recombining microorganisms in the presence of multiclonal infections. Using the inferred IBD status at genomic locations, we have also developed a novel statistic for identifying loci under positive selection and propose relatedness networks as a means of exploring shared haplotypes within populations. We evaluate the performance of our methodologies for detecting IBD and selection, including comparisons with existing tools, then perform an exploratory analysis of whole genome sequencing data from a global Plasmodium falciparum dataset of more than 2500 genomes. This analysis identifies Southeast Asia as having many highly related isolates, possibly as a result of both reduced transmission from intensified control efforts and population bottlenecks following the emergence of antimalarial drug resistance. Many signals of selection are also identified, most of which overlap genes that are known to be associated with drug resistance, in addition to two novel signals observed in multiple countries that have yet to be explored in detail. Additionally, we investigate relatedness networks over the selected loci and determine that one of these sweeps has spread between continents while the other has arisen independently in different countries.
IBD analysis of microorganisms using isoRelate can be used for exploring population structure, positive selection and haplotype distributions, and will be a valuable tool for monitoring disease control and elimination efforts of many diseases.


2021
Author(s):  
Sophie Hoffman ◽  
Zena Lapp ◽  
Joyce Wang ◽  
Evan Snitkin

Increasing evidence of regional pathogen transmission networks highlights the importance of investigating the dissemination of multidrug-resistant organisms (MDROs) across a region to identify where transmission is happening and how pathogens move across regions. We developed a framework for investigating MDRO regional transmission dynamics using whole-genome sequencing data and created regentrans, an easy-to-use, open source R package that implements these methods (https://github.com/Snitkin-Lab-Umich/regentrans). Using a dataset of over 400 carbapenem-resistant Klebsiella pneumoniae isolates collected from patients in 21 long-term acute care hospitals (LTACHs) over a one-year period, we demonstrate how to use our framework to gain insights into differences in inter- and intra-facility transmission across different LTACHs and over time. These tools will allow investigators to better understand the origins and transmission patterns of MDROs, which is the first step in understanding how to stop transmission at the regional level.
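As a rough illustration of the kind of intra- versus inter-facility comparison described above, the following Python sketch computes pairwise SNV distances between aligned isolate sequences and labels each pair by whether the two isolates share a facility. This is not the regentrans API (which is an R package); the function name `pairwise_snv_distances` and the input layout are hypothetical, and the distance is assumed to be a simple count of differing aligned positions.

```python
from itertools import combinations


def pairwise_snv_distances(isolates):
    """Label every isolate pair with its SNV distance and facility relation.

    `isolates` maps an isolate name to a (facility, aligned_sequence) tuple.
    This layout is an assumption made for illustration only.
    Returns a list of (name_a, name_b, distance, "intra" or "inter").
    """
    results = []
    for (name_a, (fac_a, seq_a)), (name_b, (fac_b, seq_b)) in combinations(
        isolates.items(), 2
    ):
        if len(seq_a) != len(seq_b):
            raise ValueError("sequences must come from the same alignment")
        # Count aligned positions at which the two isolates differ.
        distance = sum(1 for x, y in zip(seq_a, seq_b) if x != y)
        relation = "intra" if fac_a == fac_b else "inter"
        results.append((name_a, name_b, distance, relation))
    return results


if __name__ == "__main__":
    toy = {
        "iso1": ("LTACH_A", "ACGT"),
        "iso2": ("LTACH_A", "ACGA"),
        "iso3": ("LTACH_B", "TCGA"),
    }
    for row in pairwise_snv_distances(toy):
        print(row)
```

Comparing the distribution of "intra" distances with that of "inter" distances is one simple way to ask whether closely related isolates concentrate within facilities.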


Author(s):  
Sofonias K Tessema ◽  
Nicholas J Hathaway ◽  
Noam B Teyssier ◽  
Maxwell Murphy ◽  
Anna Chen ◽  
...  

Abstract Background: Targeted next-generation sequencing offers the potential for consistent, deep coverage of information-rich genomic regions to characterize polyclonal Plasmodium falciparum infections. However, methods to identify and sequence these genomic regions are currently limited. Methods: A bioinformatic pipeline and multiplex methods were developed to identify and simultaneously sequence 100 targets and applied to dried blood spot (DBS) controls and field isolates from Mozambique. For comparison, whole-genome sequencing data were generated for the same controls. Results: Using publicly available genomes, 4465 high-diversity genomic regions suited for targeted sequencing were identified, representing the P. falciparum heterozygome. For this study, 93 microhaplotypes with high diversity (median expected heterozygosity = 0.7) were selected along with 7 drug resistance loci. The sequencing method achieved very high coverage (median 99%), specificity (99.8%), and sensitivity (90% for haplotypes with 5% within sample frequency in dried blood spots with 100 parasites/µL). In silico analyses revealed that microhaplotypes provided much higher resolution to discriminate related from unrelated polyclonal infections than biallelic single-nucleotide polymorphism barcodes. Conclusions: The bioinformatic and laboratory methods outlined here provide a flexible tool for efficient, low-cost, high-throughput interrogation of the P. falciparum genome, and can be tailored to simultaneously address multiple questions of interest in various epidemiological settings.
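The abstract ranks microhaplotypes by expected heterozygosity. A minimal Python sketch of that quantity follows, assuming the standard definition He = 1 - sum(p_i^2) over allele (haplotype) frequencies p_i at a locus. The function name and the optional small-sample correction are illustrative choices, not the authors' pipeline.

```python
from collections import Counter


def expected_heterozygosity(haplotypes, unbiased=False):
    """Expected heterozygosity of one locus from observed haplotype calls.

    `haplotypes` is an iterable of hashable allele labels, one per sampled
    genome. With unbiased=True the estimate is scaled by n / (n - 1), a
    common small-sample correction (an assumption, not stated in the source).
    """
    counts = Counter(haplotypes)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("at least one haplotype is required")
    he = 1.0 - sum((c / n) ** 2 for c in counts.values())
    if unbiased:
        if n < 2:
            raise ValueError("the unbiased estimate needs at least two samples")
        he *= n / (n - 1)
    return he


if __name__ == "__main__":
    # Two equally frequent haplotypes give He = 0.5.
    print(expected_heterozygosity(["h1", "h1", "h2", "h2"]))
```

A locus with many haplotypes at similar frequencies approaches He = 1, which is why multi-allelic microhaplotypes can be more informative than biallelic SNPs, whose He cannot exceed 0.5 under this definition.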


2020
Vol 37 (11)
pp. 3267-3291
Author(s):  
Xiaoheng Cheng ◽  
Michael DeGiorgio

Abstract Long-term balancing selection typically leaves narrow footprints of increased genetic diversity, and therefore most detection approaches only achieve optimal performances when sufficiently small genomic regions (i.e., windows) are examined. Such methods are sensitive to window sizes and suffer substantial losses in power when windows are large. Here, we employ mixture models to construct a set of five composite likelihood ratio test statistics, which we collectively term B statistics. These statistics are agnostic to window sizes and can operate on diverse forms of input data. Through simulations, we show that they exhibit comparable power to the best-performing current methods, and retain substantially high power regardless of window sizes. They also display considerable robustness to high mutation rates and uneven recombination landscapes, as well as an array of other common confounding scenarios. Moreover, we applied a specific version of the B statistics, termed B2, to a human population-genomic data set and recovered many top candidates from prior studies, including the then-uncharacterized STPG2 and CCDC169–SOHLH2, both of which are related to gamete functions. We further applied B2 on a bonobo population-genomic data set. In addition to the MHC-DQ genes, we uncovered several novel candidate genes, such as KLRD1, involved in viral defense, and SCN9A, associated with pain perception. Finally, we show that our methods can be extended to account for multiallelic balancing selection and integrated the set of statistics into open-source software named BalLeRMix for future applications by the scientific community.


2012
Vol 79 (1)
pp. 322-327
Author(s):  
Samuel Duodu ◽  
Knut Madslien ◽  
Eva Hjelm ◽  
Ylva Molin ◽  
Anna Paziewska-Harris ◽  
...  

ABSTRACT Infections with Bartonella spp. have been recognized as emerging zoonotic diseases in humans. Large knowledge gaps exist, however, relating to reservoirs, vectors, and transmission of these bacteria. We describe identification by culture, PCR, and housekeeping gene sequencing of Bartonella spp. in fed, wingless deer keds (Lipoptena cervi), deer ked pupae, and blood samples collected from moose, Alces alces, sampled within the deer ked distribution range in Norway. Direct sequencing from moose blood sampled in a deer ked-free area also indicated Bartonella infection but at a much lower prevalence. The sequencing data suggested the presence of mixed infections involving two species of Bartonella within the deer ked range, while moose outside the range appeared to be infected with a single species. Bartonella were not detected or cultured from unfed winged deer keds. The results may indicate that long-term bacteremia in the moose represents a reservoir of infection and that L. cervi acts as a vector for the spread of infection of Bartonella spp. Further research is needed to evaluate the role of L. cervi in the transmission of Bartonella to animals and humans and the possible pathogenicity of these bacteria for humans and animals.


2021
Author(s):  
Manu Kumar Gundappa ◽  
Thu-Hien To ◽  
Lars Grønvold ◽  
Samuel A M Martin ◽  
Sigbjørn Lien ◽  
...  

The long-term evolutionary impacts of whole genome duplication (WGD) are strongly influenced by the ensuing rediploidization process. Following autopolyploidization, rediploidization involves a transition from tetraploid to diploid meiotic pairing, allowing duplicated genes (ohnologues) to diverge genetically and functionally. Our understanding of autopolyploid rediploidization has been informed by a WGD event ancestral to salmonid fishes, where large genomic regions are characterized by temporally delayed rediploidization, allowing lineage-specific ohnologue sequence divergence in the major salmonid clades. Here, we investigate the long-term outcomes of autopolyploid rediploidization at genome-wide resolution, exploiting a recent 'explosion' of salmonid genome assemblies, including a new genome sequence for the huchen (Hucho hucho). We developed a genome alignment approach to capture duplicated regions across multiple species, allowing us to create 121,864 phylogenetic trees describing ohnologue divergence across salmonid evolution. Using molecular clock analysis, we show that 61% of the ancestral salmonid genome experienced an initial 'wave' of rediploidization in the late Cretaceous (85-106 Mya). This was followed by a period of relative genomic stasis lasting 17-39 My, where much of the genome remained in a tetraploid state. A second rediploidization wave began in the early Eocene and proceeded alongside species diversification, generating predictable patterns of lineage-specific ohnologue divergence, scaling in complexity with the number of speciation events. Finally, using gene set enrichment, gene expression, and codon-based selection analyses, we provide insights into potential functional outcomes of delayed rediploidization. Overall, this study enhances our understanding of delayed autopolyploid rediploidization and has broad implications for future studies of WGD events.


2020
Vol 10 (9)
pp. 3041-3046
Author(s):  
Silas Tittes

Abstract The availability of whole genome sequencing data from multiple related populations creates opportunities to test sophisticated population genetic models of convergent adaptation. Recent work by Lee and Coop (2017) developed models to infer modes of convergent adaptation at local genomic scales, providing a rich framework for assessing how selection has acted across multiple populations at the tested locus. Here I present rdmc, an R package that builds on the existing software implementation of Lee and Coop (2017) and prioritizes ease of use, portability, and scalability. I demonstrate installation and provide a comprehensive overview of the package’s current utilities.


2019
Vol 25 (3)
pp. 307-325
Author(s):  
Florence Briton ◽  
Claire Macher ◽  
Mathieu Merzeréaud ◽  
Christelle Le Grand ◽  
Spyros Fifas ◽  
...  

Abstract Well-established single-species approaches are not adapted to the management of mixed fisheries where multiple species are simultaneously caught in unselective fishing operations. In particular, ignoring joint production when setting total allowable catches (TACs) for individual species is likely to lead to over-quota discards or, when discards are not allowed, to lost fishing opportunities. Furthermore, economic and social objectives have been poorly addressed in the design of fisheries harvest strategies, despite being an explicit objective of ecosystem-based fisheries management in many jurisdictions worldwide. We introduce the notion of operating space as the ensemble of reachable, single-species fishing mortality targets, given joint production in a mixed fishery. We then use the concept of eco-viability to identify TAC combinations which simultaneously account for multiple objectives. The approach is applied to the joint management of hake and sole fishing in the Bay of Biscay, also accounting for catches of Norway lobster, European seabass and anglerfish. Results show that fishing at the upper end of the MSY range for sole and slightly above Fmsy for hake can generate gains in terms of long-term economic viability of the fleets without impeding the biological viability of the stocks, nor the incentives for crews to remain in the fishery. We also identify reachable fishing mortality targets in the MSY ranges for these two species, given existing technical interactions.


2017
Author(s):  
Jakob M. Goldmann ◽  
Vladimir B. Seplyarskiy ◽  
Wendy S.W. Wong ◽  
Thierry Vilboux ◽  
Dale L. Bodian ◽  
...  

Clustering of mutations has been found both in somatic mutations from cancer genomes and in germline de novo mutations (DNMs). We identified 1,755 clustered DNMs (cDNMs) within whole-genome sequencing data from 1,291 parent-offspring trios and investigated the underlying mutational mechanisms. We found that the number of clusters on the maternal allele was positively correlated with maternal age and that these clusters consist of more individual mutations with larger intra-mutational distances compared to paternal clusters. More than 50% of maternal clusters were located on chromosomes 8, 9 and 16, in regions with an overall increased maternal mutation rate. Maternal clusters in these regions showed a distinct mutation signature characterized by C>G mutations. Finally, we found that maternal clusters associate with processes involving double-stranded breaks (DSBs) such as meiotic gene conversions and de novo deletion events. These findings suggest accumulation of DSB-induced mutations throughout oocyte aging as an underlying mechanism leading to maternal mutation clusters.
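The abstract does not state how mutation clusters were defined. As a hedged illustration of one common approach, the Python sketch below groups mutations on the same chromosome whenever neighbouring positions lie within a maximum gap. The 20 kb default for `max_gap` and the function name are assumptions made for the example, not the authors' criteria.

```python
from collections import defaultdict


def cluster_mutations(mutations, max_gap=20_000):
    """Group mutations into clusters by distance between neighbours.

    `mutations` is an iterable of (chromosome, position) pairs from one
    individual. Adjacent mutations on a chromosome join the same cluster
    when they are at most `max_gap` bases apart (hypothetical threshold).
    Returns a list of (chromosome, [positions]) with two or more mutations.
    """
    by_chrom = defaultdict(list)
    for chrom, pos in mutations:
        by_chrom[chrom].append(pos)

    clusters = []
    for chrom, positions in by_chrom.items():
        positions.sort()
        current = [positions[0]]
        for pos in positions[1:]:
            if pos - current[-1] <= max_gap:
                current.append(pos)
            else:
                # Gap too large: close the running group, keep it if clustered.
                if len(current) >= 2:
                    clusters.append((chrom, current))
                current = [pos]
        if len(current) >= 2:
            clusters.append((chrom, current))
    return clusters


if __name__ == "__main__":
    calls = [("chr8", 100), ("chr8", 5_000), ("chr8", 900_000),
             ("chr9", 10), ("chr9", 15)]
    print(cluster_mutations(calls))
```

Under such a definition, the "intra-mutational distance" of a cluster corresponds to the gaps between consecutive positions within each returned group.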


2020
Author(s):  
Xi Wang ◽  
Pär K Ingvarsson

Abstract Detecting natural selection is one of the major goals of evolutionary genomics. Here, we sequence whole genomes of 34 Picea abies individuals and quantify the amount of selection across the genome. Using an estimate of the distribution of fitness effects, we show that negative selection is very limited in coding regions, while positive selection is rare in coding regions but very strong in non-coding regions, suggesting the great importance of regulatory changes in the evolution of Norway spruce. Additionally, we found a positive correlation between adaptive rate and recombination rate and a negative correlation between adaptive rate and gene density, suggesting a widespread influence of Hill-Robertson interference on the efficiency of protein adaptation in P. abies. Finally, population statistics that differ between genomic regions under either positive or balancing selection and neutral regions indicate an impact of selection on the genomic architecture of Norway spruce. Further gene ontology enrichment analysis for genes located in regions identified as undergoing either positive or long-term balancing selection also highlighted specific molecular functions and biological processes that appear to be targets of selection in Norway spruce.

