selscan 2.0: scanning for sweeps in unphased data

10.1101/2021.10.22.465497 ◽

2021 ◽

Author(s):

Zachary A Szpiech

Keyword(s):

Positive Selection ◽

Demographic History ◽

Source Code ◽

Tree Of Life ◽

Evolutionary Genomics ◽

Genotype Data ◽

Numerous Species

Haplotype-based scans to identify recent and ongoing positive selection have become commonplace in evolutionary genomics studies of numerous species across the tree of life. However, the most widely adopted approaches require phased haplotypes to compute the key statistics. Here we release a major update to the selscan software that re-defines popular haplotype-based statistics for use with unphased "multi-locus genotype" data. We provide unphased implementations of iHS, nSL, XP-EHH, and XP-nSL and evaluate their performance across a range of important parameters in a generic demographic history. Source code and executables are available at https://www.github.com/szpiech/selscan.

The great tit HapMap project: a continental-scale analysis of genomic variation in a songbird

10.1101/561399 ◽

2019 ◽

Cited By ~ 3

Author(s):

Lewis G. Spurgin ◽

Mirte Bosse ◽

Frank Adriaensen ◽

Tamer Albayrak ◽

Christos Barboutis ◽

...

Keyword(s):

Positive Selection ◽

Recombination Rate ◽

Demographic History ◽

Great Tit ◽

Genomic Variation ◽

Genomic Diversity ◽

Hapmap Project ◽

Widespread Species ◽

Continental Scale ◽

Genome Wide

Abstract. A major aim of evolutionary biology is to understand why patterns of genomic diversity vary among populations and species. Large-scale genomic studies of widespread species are useful for studying how the environment and demographic history shape patterns of genomic divergence, and with the continually decreasing cost of sequencing and genotyping, such studies are now becoming feasible. Here, we carry out one of the most geographically comprehensive surveys of genomic variation in a wild vertebrate to date: the great tit (Parus major) HapMap project. We screened ca. 500,000 SNP markers across 647 individuals from 29 populations, spanning almost the entire geographic range of the European great tit subspecies. We found that genome-wide variation was consistent with a recent colonisation across Europe from a single refugium in South-East Europe, with bottlenecks and reduced genetic diversity in island populations. Differentiation across the genome was highly heterogeneous, with clear “islands of differentiation” even among populations with very low levels of genome-wide differentiation. Low local recombination rate in the genome was a strong predictor of high local genomic differentiation (FST), especially in island and peripheral mainland populations, suggesting that the interplay between genetic drift and recombination is a key driver of highly heterogeneous differentiation landscapes. We also detected genomic outlier regions that were confined to one or more peripheral great tit populations, most likely as a result of recent directional selection at the range edges of this species. Haplotype-based measures of selection were also related to recombination rate, albeit less strongly, and highlighted population-specific sweeps that likely resulted from positive selection. These regions under positive selection contained candidate genes associated with morphology, thermal adaptation and colouration, providing promising avenues for future investigation. Our study highlights how comprehensive screens of genomic variation in wild organisms can provide unique insights into evolution.

Using identity by descent estimation with dense genotype data to detect positive selection

European Journal of Human Genetics ◽

10.1038/ejhg.2012.148 ◽

2012 ◽

Vol 21 (2) ◽

pp. 205-211 ◽

Cited By ~ 28

Author(s):

Lide Han ◽

Mark Abney

Keyword(s):

Positive Selection ◽

Genotype Data ◽

Identity By Descent

OncodriveCLUSTL: a sequence-based clustering method to identify cancer drivers

10.1101/500132 ◽

2018 ◽

Author(s):

Claudia Arnedo-Pac ◽

Loris Mularoni ◽

Ferran Muiños ◽

Abel Gonzalez-Perez ◽

Nuria Lopez-Bigas

Keyword(s):

Positive Selection ◽

Clustering Algorithm ◽

Source Code ◽

Cancer Genes ◽

Driver Genes ◽

Local Background ◽

Nucleotide Context ◽

Bona Fide ◽

Cancer Drivers ◽

Genomic Regions

Abstract. Summary: The identification of the genomic alterations driving tumorigenesis is one of the main goals in oncogenomics research. Given the evolutionary principles of cancer development, computational methods that detect signals of positive selection in the pattern of tumor mutations have been effectively applied in the search for cancer genes. One of these signals is the abnormal clustering of mutations, which has been shown to be complementary to other signals in the detection of driver genes. We have developed OncodriveCLUSTL, a new sequence-based clustering algorithm to detect significant clustering signals across genomic regions. OncodriveCLUSTL is based on a local background model derived from the simulation of mutations accounting for the composition of tri- or penta-nucleotide context substitutions observed in the cohort under study. Our method is able to identify known clusters and bona fide cancer drivers across cohorts of tumor whole-exomes, outperforming the existing OncodriveCLUST algorithm and complementing other methods based on different signals of positive selection. We show that OncodriveCLUSTL may be applied to the analysis of non-coding genomic elements and non-human mutation data. Availability and implementation: OncodriveCLUSTL is available as an installable Python 3.5 package. The source code and running examples are freely available at https://bitbucket.org/bbglab/oncodriveclustl under the GNU Affero General Public License.

Examination of signatures of recent positive selection on genes involved in human sialic acid biology

10.1101/137034 ◽

2017 ◽

Author(s):

Jiyun M. Moon ◽

David M. Aronoff ◽

John A. Capra ◽

Patrick Abbot ◽

Antonis Rokas

Keyword(s):

Sialic Acid ◽

Positive Selection ◽

Transfer Functions ◽

Demographic History ◽

Neutral Evolution ◽

Human Lineage ◽

Selective Sweeps ◽

Frequency Spectra ◽

Recent Positive Selection ◽

Soft Selective Sweeps

Abstract. Sialic acids are nine-carbon sugars ubiquitously found on the surfaces of vertebrate cells and are involved in various immune response-related processes. In humans, at least 58 genes spanning diverse functions, from biosynthesis and activation to recycling and degradation, are involved in sialic acid biology. Because of their role in immunity, sialic acid biology genes have been hypothesized to exhibit elevated rates of evolutionary change. Consistent with this hypothesis, several genes involved in sialic acid biology have experienced higher rates of non-synonymous substitutions in the human lineage than their counterparts in other great apes, perhaps in response to ancient pathogens that infected hominins millions of years ago (paleopathogens). To test whether sialic acid biology genes have also experienced more recent positive selection during the evolution of the modern human lineage, reflecting adaptation to contemporary cosmopolitan or geographically restricted pathogens, we examined whether their protein-coding regions showed evidence of recent hard and soft selective sweeps. This examination involved the calculation of four measures that quantify changes in allele frequency spectra, extent of population differentiation, and haplotype homozygosity caused by recent hard and soft selective sweeps for 55 sialic acid biology genes using publicly available whole genome sequencing data from 1,668 humans from three ethnic groups. To disentangle evidence for selection from confounding demographic effects, we compared the observed patterns in sialic acid biology genes to simulated sequences of the same length under a model of neutral evolution that takes into account human demographic history. We found that the patterns of genetic variation of most sialic acid biology genes did not significantly deviate from neutral expectations and were not significantly different among genes belonging to different functional categories. Those few sialic acid biology genes that significantly deviated from neutrality either experienced soft sweeps or population-specific hard sweeps. Interestingly, while most hard sweeps occurred on genes involved in sialic acid recognition, most soft sweeps involved genes associated with recycling, degradation and activation, transport, and transfer functions. We propose that the lack of signatures of recent positive selection for the majority of the sialic acid biology genes is consistent with the view that these genes regulate immune responses against ancient rather than contemporary cosmopolitan or geographically restricted pathogens.

Evolutionary genomics of mammalian lung cancer genes reveals signatures of positive selection in APC, RB1 and TP53

Genomics ◽

10.1016/j.ygeno.2020.08.020 ◽

2020 ◽

Vol 112 (6) ◽

pp. 4722-4731

Author(s):

Mohamed Emam ◽

João Paulo Machado ◽

Agostinho Antunes

Keyword(s):

Lung Cancer ◽

Positive Selection ◽

Evolutionary Genomics ◽

Cancer Genes ◽

Mammalian Lung

Functional Bias and Demographic History Obscure Patterns of Selection among Single-Copy Genes in a Fungal Species Complex

10.1101/107326 ◽

2017 ◽

Author(s):

Santiago Sánchez-Ramírez ◽

Jean-Marc Moncalvo

Keyword(s):

Natural Selection ◽

Positive Selection ◽

Dna Sequences ◽

Demographic History ◽

Purifying Selection ◽

Single Copy ◽

Fungal Species ◽

Ratio Test ◽

Ontology Term ◽

Slightly Deleterious Mutations

Abstract. Many different evolutionary processes may be responsible for explaining natural variation within genomes, some of which include natural selection at the molecular level and changes in population size. Fungi are highly adaptable organisms, and their relatively small genomes and short generation times make them amenable to evolutionary genomic studies. However, adaptation in wild populations has been less well documented than in experimental or clinical studies. Here, we analyzed DNA sequences from 502 putative single-copy orthologous genes in 63 samples that represent seven recently diverged North American Amanita (jacksonii-complex) lineages. For each gene and each species, we measured the genealogical sorting index (gsi) and infinite-site-based summary statistics, such as Tajima’s D (DTaj), in coding and intron regions. MKT-based approaches and likelihood-ratio-test Kn/Ks models were used to measure natural selection in all coding sequences. Multi-locus (Extended) Bayesian Skyline Plots (eBSP) were used to model intraspecific demographic changes through time based on unlinked, putative neutral regions (introns). Most genes show evidence of long-term purifying selection, likely reflecting a functional bias implicit in single-copy genes. We find that two species have strongly negatively skewed Tajima’s D, while three others have a positive skew, corresponding well with patterns of demographic expansion and contraction. Standard MKT analyses resulted in a high incidence of near-zero α with a tendency towards negative values. In contrast, α estimates based on the distribution of fitness effects (DFE), which accounts for demographic effects and slightly deleterious mutations, suggest a higher proportion of sites fixed by positive selection. The difference was more pronounced in species with expansion signatures or with historically low population sizes, evidencing the concealing effects of specific demographic histories. Finally, we attempt to mitigate Gene Ontology term overrepresentation, highlighting the potential adaptive or ecological roles of some genes under positive selection.

A fully integrated machine learning scan of selection in the chimpanzee genome

NAR Genomics and Bioinformatics ◽

10.1093/nargab/lqaa061 ◽

2020 ◽

Vol 2 (3) ◽

Author(s):

Jessica Nye ◽

Mayukh Mondal ◽

Jaume Bertranpetit ◽

Hafid Laayouni

Keyword(s):

Machine Learning ◽

Positive Selection ◽

Demographic History ◽

Effective Population ◽

Fully Integrated ◽

Genome Wide ◽

A Genome ◽

Machine Learning Approach ◽

History Of ◽

The Impact

Abstract. After diverging, each chimpanzee subspecies has been the target of unique selective pressures. Here, we employ a machine learning approach to classify regions as under positive selection or neutrality genome-wide. The regions determined to be under selection reflect the unique demographic and adaptive history of each subspecies. The results indicate that effective population size is important for determining the proportion of the genome under positive selection. The chimpanzee subspecies share signals of selection in genes associated with immunity and gene regulation. With these results, we have created a selection map for each population that can be displayed in a genome browser (www.hsb.upf.edu/chimp_browser). This study is the first to use a detailed demographic history and machine learning to map selection genome-wide in chimpanzee. The chimpanzee selection map will improve our understanding of the impact of selection on closely related subspecies and will empower future studies of chimpanzee.

Evaluating the possibility of detecting evidence of positive selection across Asia with sparse genotype data from the HUGO Pan-Asian SNP Consortium

BMC Genomics ◽

10.1186/1471-2164-15-332 ◽

2014 ◽

Vol 15 (1) ◽

pp. 332 ◽

Cited By ~ 8

Author(s):

Xuanyao Liu ◽

Woei-Yuh Saw ◽

Mohammad Ali ◽

Rick Twee-Hee Ong ◽

Yik-Ying Teo

Keyword(s):

Positive Selection ◽

Genotype Data

AciDB 1.0: a database of acidophilic organisms, their genomic information and associated metadata

Bioinformatics ◽

10.1093/bioinformatics/btaa638 ◽

2020 ◽

Vol 36 (19) ◽

pp. 4970-4971

Author(s):

Gonzalo Neira ◽

Diego Cortez ◽

Joaquin Jil ◽

David S Holmes

Keyword(s):

Source Code ◽

Genomic Data ◽

Tree Of Life ◽

Physiological Traits ◽

Genomic Information ◽

Growth Data ◽

Genome Sequences ◽

Genomic Features ◽

Searchable Database ◽

Complex Queries

Abstract. Motivation: There are about 600 available genome sequences of acidophilic organisms (which grow at a pH < 5) from the three domains of the Tree of Life. Information about acidophiles is scattered over many heterogeneous sites, making it extraordinarily difficult to link physiological traits with genomic data. We were motivated to generate a curated, searchable database to address this problem. Results: AciDB 1.0 is a curated database of sequenced acidophiles that enables researchers to execute complex queries linking genomic features to growth data, environmental descriptions and taxonomic information. Availability and implementation: AciDB 1.0 is freely available online at http://AciDB.cl. The source code is released under an MIT license at https://gitlab.com/Hawkline451/acidb/.

Genomic regions exhibiting positive selection identified from dense genotype data

Genome Research ◽

10.1101/gr.4326505 ◽

2005 ◽

Vol 15 (11) ◽

pp. 1553-1565 ◽

Cited By ~ 161

Author(s):

C. S. Carlson

Keyword(s):

Positive Selection ◽

Genotype Data ◽

Genomic Regions
