selscan 2.0: scanning for sweeps in unphased data

2021 ◽  
Author(s):  
Zachary A Szpiech

Haplotype-based scans to identify recent and ongoing positive selection have become commonplace in evolutionary genomics studies of numerous species across the tree of life. However, the most widely adopted approaches require phased haplotypes to compute the key statistics. Here we release a major update to the selscan software that re-defines popular haplotype-based statistics for use with unphased "multi-locus genotype" data. We provide unphased implementations of iHS, nSL, XP-EHH, and XP-nSL and evaluate their performance across a range of important parameters in a generic demographic history. Source code and executables are available at https://www.github.com/szpiech/selscan.
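The idea of computing haplotype homozygosity on unphased "multi-locus genotype" data can be illustrated with a small sketch. It is not selscan's implementation: the function names and toy data are hypothetical, and it omits the conditioning on the core allele that statistics such as iHS use. It only shows that the same "probability that two samples are identical over a window" calculation applies when each row is a string of 0/1/2 genotype codes rather than a phased haplotype.

```python
from collections import Counter
from math import comb


def homozygosity(rows):
    """Probability that two distinct rows drawn at random are identical."""
    n = len(rows)
    if n < 2:
        raise ValueError("need at least two samples")
    counts = Counter(tuple(r) for r in rows)
    return sum(comb(c, 2) for c in counts.values()) / comb(n, 2)


def ehh_decay(rows, core, max_dist):
    """Homozygosity of the window [core - d, core + d] for d = 0..max_dist."""
    n_sites = len(rows[0])
    out = []
    for d in range(max_dist + 1):
        lo = max(0, core - d)
        hi = min(n_sites, core + d + 1)
        out.append(homozygosity([r[lo:hi] for r in rows]))
    return out


# Hypothetical toy data: 4 diploid individuals, 5 sites,
# unphased genotypes coded as 0, 1 or 2 copies of the derived allele.
genotypes = [
    [0, 1, 2, 1, 0],
    [0, 1, 2, 1, 0],
    [1, 1, 2, 0, 0],
    [0, 2, 2, 1, 1],
]

# Homozygosity is 1.0 at the core site and drops to 1/6 once flanking
# sites are included, because only the first two rows stay identical.
decay = ehh_decay(genotypes, core=2, max_dist=2)
```

Passing phased haplotypes (rows of 0/1) to the same functions gives the usual phased homozygosity, so the sketch separates the counting logic from the choice of data encoding.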

2019 ◽  
Author(s):  
Lewis G. Spurgin ◽  
Mirte Bosse ◽  
Frank Adriaensen ◽  
Tamer Albayrak ◽  
Christos Barboutis ◽  
...  

Abstract A major aim of evolutionary biology is to understand why patterns of genomic diversity vary among populations and species. Large-scale genomic studies of widespread species are useful for studying how the environment and demographic history shape patterns of genomic divergence, and with the continually decreasing cost of sequencing and genotyping, such studies are now becoming feasible. Here, we carry out one of the most geographically comprehensive surveys of genomic variation in a wild vertebrate to date: the great tit (Parus major) HapMap project. We screened ca. 500,000 SNP markers across 647 individuals from 29 populations, spanning almost the entire geographic range of the European great tit subspecies. We found that genome-wide variation was consistent with a recent colonisation across Europe from a single refugium in South-East Europe, with bottlenecks and reduced genetic diversity in island populations. Differentiation across the genome was highly heterogeneous, with clear “islands of differentiation” even among populations with very low levels of genome-wide differentiation. Low local recombination rate in the genome was a strong predictor of high local genomic differentiation (FST), especially in island and peripheral mainland populations, suggesting that the interplay between genetic drift and recombination is a key driver of highly heterogeneous differentiation landscapes. We also detected genomic outlier regions that were confined to one or more peripheral great tit populations, most likely as a result of recent directional selection at the range edges of this species. Haplotype-based measures of selection were also related to recombination rate, albeit less strongly, and highlighted population-specific sweeps that likely resulted from positive selection. These regions under positive selection contained candidate genes associated with morphology, thermal adaptation and colouration, providing promising avenues for future investigation. Our study highlights how comprehensive screens of genomic variation in wild organisms can provide unique insights into evolution.


2018 ◽  
Author(s):  
Claudia Arnedo-Pac ◽  
Loris Mularoni ◽  
Ferran Muiños ◽  
Abel Gonzalez-Perez ◽  
Nuria Lopez-Bigas

Abstract Summary The identification of the genomic alterations driving tumorigenesis is one of the main goals in oncogenomics research. Given the evolutionary principles of cancer development, computational methods that detect signals of positive selection in the pattern of tumor mutations have been effectively applied in the search for cancer genes. One of these signals is the abnormal clustering of mutations, which has been shown to be complementary to other signals in the detection of driver genes. We have developed OncodriveCLUSTL, a new sequence-based clustering algorithm to detect significant clustering signals across genomic regions. OncodriveCLUSTL is based on a local background model derived from the simulation of mutations accounting for the composition of tri- or penta-nucleotide context substitutions observed in the cohort under study. Our method is able to identify known clusters and bona-fide cancer drivers across cohorts of tumor whole-exomes, outperforming the existing OncodriveCLUST algorithm and complementing other methods based on different signals of positive selection. We show that OncodriveCLUSTL may be applied to the analysis of non-coding genomic elements and non-human mutations data. Availability and implementation OncodriveCLUSTL is available as an installable Python 3.5 package. The source code and running examples are freely available at https://bitbucket.org/bbglab/oncodriveclustl under GNU Affero General Public License.


2017 ◽  
Author(s):  
Jiyun M. Moon ◽  
David M. Aronoff ◽  
John A. Capra ◽  
Patrick Abbot ◽  
Antonis Rokas

Abstract Sialic acids are nine-carbon sugars ubiquitously found on the surfaces of vertebrate cells and are involved in various immune response-related processes. In humans, at least 58 genes spanning diverse functions, from biosynthesis and activation to recycling and degradation, are involved in sialic acid biology. Because of their role in immunity, sialic acid biology genes have been hypothesized to exhibit elevated rates of evolutionary change. Consistent with this hypothesis, several genes involved in sialic acid biology have experienced higher rates of non-synonymous substitutions in the human lineage than their counterparts in other great apes, perhaps in response to ancient pathogens that infected hominins millions of years ago (paleopathogens). To test whether sialic acid biology genes have also experienced more recent positive selection during the evolution of the modern human lineage, reflecting adaptation to contemporary cosmopolitan or geographically restricted pathogens, we examined whether their protein-coding regions showed evidence of recent hard and soft selective sweeps. This examination involved the calculation of four measures that quantify changes in allele frequency spectra, extent of population differentiation, and haplotype homozygosity caused by recent hard and soft selective sweeps for 55 sialic acid biology genes using publicly available whole genome sequencing data from 1,668 humans from three ethnic groups. To disentangle evidence for selection from confounding demographic effects, we compared the observed patterns in sialic acid biology genes to simulated sequences of the same length under a model of neutral evolution that takes into account human demographic history. We found that the patterns of genetic variation of most sialic acid biology genes did not significantly deviate from neutral expectations and were not significantly different among genes belonging to different functional categories. Those few sialic acid biology genes that significantly deviated from neutrality either experienced soft sweeps or population-specific hard sweeps. Interestingly, while most hard sweeps occurred on genes involved in sialic acid recognition, most soft sweeps involved genes associated with recycling, degradation and activation, transport, and transfer functions. We propose that the lack of signatures of recent positive selection for the majority of the sialic acid biology genes is consistent with the view that these genes regulate immune responses against ancient rather than contemporary cosmopolitan or geographically restricted pathogens.


Genomics ◽  
2020 ◽  
Vol 112 (6) ◽  
pp. 4722-4731
Author(s):  
Mohamed Emam ◽  
João Paulo Machado ◽  
Agostinho Antunes

2017 ◽  
Author(s):  
Santiago Sánchez-Ramírez ◽  
Jean-Marc Moncalvo

Abstract Many different evolutionary processes may be responsible for explaining natural variation within genomes, some of which include natural selection at the molecular level and changes in population size. Fungi are highly adaptable organisms, and their relatively small genomes and short generation times make them amenable to evolutionary genomic studies. However, adaptation in wild populations has been less well documented than in experimental or clinical studies. Here, we analyzed DNA sequences from 502 putative single-copy orthologous genes in 63 samples that represent seven recently diverged North American Amanita (jacksonii-complex) lineages. For each gene and each species, we measured the genealogical sorting index (gsi) and infinite-site-based summary statistics, such as , and DTaj in coding and intron regions. MKT-based approaches and likelihood-ratio-test Kn/Ks models were used to measure natural selection in all coding sequences. Multi-locus (Extended) Bayesian Skyline Plots (eBSP) were used to model intraspecific demographic changes through time based on unlinked, putative neutral regions (introns). Most genes show evidence of long-term purifying selection, likely reflecting a functional bias implicit in single-copy genes. We find that two species have strongly negatively skewed Tajima’s D, while three others have a positive skew, corresponding well with patterns of demographic expansion and contraction. Standard MKT analyses resulted in a high incidence of near-zero α with a tendency towards negative values. In contrast, α estimates based on the distribution of fitness effects (DFE), which accounts for demographic effects and slightly deleterious mutations, suggest a higher proportion of sites fixed by positive selection. The difference was more pronounced in species with expansion signatures or with historically low population sizes, evidencing the concealing effects of specific demographic histories. Finally, we attempt to mitigate Gene Ontology term overrepresentation, highlighting the potential adaptive or ecological roles of some genes under positive selection.
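The standard MKT α mentioned in the abstract above is commonly estimated from four counts per gene. The following is a minimal sketch, assuming the usual estimator α = 1 − (Ds·Pn)/(Dn·Ps); the function name and the example counts are hypothetical and are not taken from the study. It shows how an excess of non-synonymous polymorphism, as expected from segregating slightly deleterious mutations, pushes α towards negative values.

```python
def mkt_alpha(dn, ds, pn, ps):
    """Estimate alpha, the proportion of non-synonymous substitutions
    fixed by positive selection, from McDonald-Kreitman counts.

    dn, ds: non-synonymous and synonymous fixed differences (divergence)
    pn, ps: non-synonymous and synonymous polymorphisms
    """
    if dn == 0 or ps == 0:
        raise ValueError("alpha is undefined when dn or ps is zero")
    return 1.0 - (ds * pn) / (dn * ps)


# Hypothetical counts: relatively more non-synonymous divergence than
# polymorphism gives a positive alpha.
alpha_adaptive = mkt_alpha(dn=10, ds=20, pn=5, ps=20)

# Hypothetical counts: an excess of non-synonymous polymorphism gives a
# negative alpha.
alpha_deleterious = mkt_alpha(dn=5, ds=20, pn=10, ps=20)
```

DFE-based estimators, as used in the study, model the slightly deleterious class explicitly instead of relying on this ratio alone.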


2020 ◽  
Vol 2 (3) ◽  
Author(s):  
Jessica Nye ◽  
Mayukh Mondal ◽  
Jaume Bertranpetit ◽  
Hafid Laayouni

Abstract After diverging, each chimpanzee subspecies has been the target of unique selective pressures. Here, we employ a machine learning approach to classify regions as under positive selection or neutrality genome-wide. The regions determined to be under selection reflect the unique demographic and adaptive history of each subspecies. The results indicate that effective population size is important for determining the proportion of the genome under positive selection. The chimpanzee subspecies share signals of selection in genes associated with immunity and gene regulation. With these results, we have created a selection map for each population that can be displayed in a genome browser (www.hsb.upf.edu/chimp_browser). This study is the first to use a detailed demographic history and machine learning to map selection genome-wide in chimpanzee. The chimpanzee selection map will improve our understanding of the impact of selection on closely related subspecies and will empower future studies of chimpanzee.


BMC Genomics ◽  
2014 ◽  
Vol 15 (1) ◽  
pp. 332 ◽  
Author(s):  
Xuanyao Liu ◽  
Woei-Yuh Saw ◽  
Mohammad Ali ◽  
Rick Twee-Hee Ong ◽  
Yik-Ying Teo

2020 ◽  
Vol 36 (19) ◽  
pp. 4970-4971
Author(s):  
Gonzalo Neira ◽  
Diego Cortez ◽  
Joaquin Jil ◽  
David S Holmes

Abstract Motivation There are about 600 available genome sequences of acidophilic organisms (grow at a pH < 5) from the three domains of the Tree of Life. Information about acidophiles is scattered over many heterogeneous sites making it extraordinarily difficult to link physiological traits with genomic data. We were motivated to generate a curated, searchable database to address this problem. Results AciDB 1.0 is a curated database of sequenced acidophiles that enables researchers to execute complex queries linking genomic features to growth data, environmental descriptions and taxonomic information. Availability and implementation AciDB 1.0 is freely available online at: http://AciDB.cl. The source code is released under an MIT license at: https://gitlab.com/Hawkline451/acidb/.

