Bait-ER: a Bayesian method to detect targets of selection in Evolve-and-Resequence experiments

Author(s):  
Carolina Barata ◽  
Rui Borges ◽  
Carolin Kosiol

For over a decade, experimental evolution has been combined with high-throughput sequencing techniques in so-called Evolve-and-Resequence (E&R) experiments, which allow testing for selection in populations kept in the laboratory under given experimental conditions. However, identifying signatures of adaptation in E&R datasets is far from trivial, and more efficient and statistically sound methods for detecting selection in genome-wide data are still needed. Here, we present Bait-ER, a fully Bayesian approach based on the Moran model of allele evolution, to estimate selection coefficients from E&R experiments. The model has overlapping generations, a feature that matches several experimental designs found in the literature. We tested our method under several different demographic and experimental conditions to assess its accuracy and precision, and it performs well in most scenarios. However, some care must be taken when analysing specific allele trajectories, particularly those where drift largely dominates and starting frequencies are low. We compared our method with other available software and found that it generally has high accuracy, even for very difficult trajectories. Furthermore, our approach avoids the computational burden of simulating an empirical null distribution, outperforming available software in terms of computational time and facilitating its use on genome-wide data. We implemented and released our method in a new open-source software package that can be accessed at https://github.com/mrborges23/Bait-ER.
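The abstract states that Bait-ER builds on the Moran model, whose overlapping generations suit many E&R designs. As a rough illustration of that population model only (not of Bait-ER's Bayesian inference, which the abstract does not detail), a forward simulation of a two-allele Moran process with selection might look like the sketch below. The function name and the fitness parameterization (1 + s for the selected allele, 1 for the other) are assumptions made for illustration.

```python
import random


def moran_trajectory(n, i0, s, steps, seed=None):
    """Simulate the count of allele A in a Moran population of fixed size n.

    Hypothetical helper for illustration. At each step one individual is
    picked to reproduce with probability proportional to fitness
    (1 + s for A carriers, 1 otherwise) and one individual is picked
    uniformly at random to die. Only one individual is replaced per step,
    so generations overlap; roughly n steps correspond to one generation.
    """
    if not 0 <= i0 <= n:
        raise ValueError("initial count must lie between 0 and n")
    rng = random.Random(seed)
    i = i0
    counts = [i]
    for _ in range(steps):
        if i == 0 or i == n:
            # Absorbing states: the allele has been lost or fixed.
            counts.append(i)
            continue
        # Probability that the reproducing individual carries A.
        p_birth_a = i * (1 + s) / (i * (1 + s) + (n - i))
        birth_a = rng.random() < p_birth_a
        # Probability that the dying individual carries A.
        death_a = rng.random() < i / n
        i += int(birth_a) - int(death_a)
        counts.append(i)
    return counts
```

For example, `moran_trajectory(n=100, i0=10, s=0.05, steps=10_000, seed=1)` returns 10,001 allele counts; dividing by `n` gives a frequency trajectory of the kind an E&R time series samples at a few time points. Because each step changes the count by at most one, drift dominates when `s` is small relative to `1/n`, consistent with the abstract's caution about drift-dominated trajectories.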

2020 ◽  
Author(s):  
Carolina Barata ◽  
Rui Borges ◽  
Carolin Kosiol



PeerJ ◽  
2016 ◽  
Vol 4 ◽  
pp. e2209 ◽  
Author(s):  
Georgios Georgiou ◽  
Simon J. van Heeringen

Summary. In this article we describe fluff, a software package that allows for simple exploration, clustering and visualization of high-throughput sequencing data mapped to a reference genome. The package contains three command-line tools to generate publication-quality figures in an uncomplicated manner using sensible defaults. Genome-wide data can be aggregated, clustered and visualized in a heatmap, according to different clustering methods. This includes a predefined setting to identify dynamic clusters between different conditions or developmental stages. Alternatively, clustered data can be visualized in a bandplot. Finally, fluff includes a tool to generate genomic profiles. As command-line tools, the fluff programs can easily be integrated into standard analysis pipelines. The installation is straightforward and documentation is available at http://fluff.readthedocs.org. Availability. fluff is implemented in Python and runs on Linux. The source code is freely available for download at https://github.com/simonvh/fluff.


2018 ◽  
Author(s):  
Jane Hosegood ◽  
Emily Humble ◽  
Rob Ogden ◽  
Mark de Bruyn ◽  
Si Creer ◽  
...  

Practical biodiversity conservation relies on delineation of biologically meaningful units, particularly with respect to global conventions and regulatory frameworks. Traditional approaches have typically relied on morphological observation, resulting in artificially broad delineations and non-optimal species units for conservation. More recently, species delimitation methods have been revolutionised by high-throughput sequencing approaches, allowing study of diversity within species radiations using genome-wide data. The highly mobile elasmobranchs, manta and devil rays (Mobula spp.), are threatened globally by targeted and bycatch fishing pressures, resulting in recent protection under several global conventions. However, a lack of global data, morphological similarities, a succession of recent taxonomic changes and ineffectual traceability measures combine to impede development and implementation of a coherent and enforceable conservation strategy. Here, we generate genome-wide Single Nucleotide Polymorphism (SNP) data from one of the most globally and taxonomically representative sets of mobulid tissues. The resulting phylogeny and delimitation of species units represents the most comprehensive assessment of mobulid diversity with molecular data to date. We find a mismatch between current species classifications and optimal species units for effective conservation. Specifically, we find robust evidence for an undescribed species of manta ray in the Gulf of Mexico and show that species recently synonymised are reproductively isolated. Further resolution is achieved at the population level, where cryptic diversity is detected in geographically distinct populations, indicating potential for future traceability work to determine the regional location of catch.
We estimate the optimal species tree and uncover substantial incomplete lineage sorting, where standing variation in extinct ancestral populations is identified as a driver of phylogenetic uncertainty, with further conservation implications. Our study provides a framework for molecular genetic species delimitation that is relevant to wide-ranging taxa of conservation concern, and highlights the potential for genomic data to support effective management, conservation and law enforcement strategies.


2020 ◽  
Vol 21 (16) ◽  
pp. 5778 ◽  
Author(s):  
Neetika Nath ◽  
Lisa Hagenau ◽  
Stefan Weiss ◽  
Ana Tzvetkova ◽  
Lars R. Jensen ◽  
...  

While ionizing radiation (IR) is a powerful tool in medical diagnostics, nuclear medicine, and radiology, it is also a serious threat to the integrity of genetic material. Mutagenic effects of IR on the human genome have long been the subject of research, yet comparatively little is known about the genome-wide effects of IR exposure at the DNA-sequence level. In this study, we employed high-throughput sequencing technologies to investigate IR-induced DNA alterations in human gingiva fibroblasts (HGF) that were acutely exposed to 0.5, 2, and 10 Gy of 240 kV X-radiation, followed by repair times of 16 h or 7 days before whole-genome sequencing (WGS). Our analysis of the obtained WGS datasets revealed patterns of IR-induced variant (SNV and InDel) accumulation across the genome, within chromosomes, and around the borders of topologically associating domains (TADs). Chromosome 19 consistently accumulated the most SNV and InDel events. Translocations showed variable patterns but with recurrent chromosomes of origin (e.g., Chr7 and Chr16). IR-induced InDels increased in number relative to SNVs and showed a characteristic signature with respect to the frequency of triplet deletions in areas without repetitive or microhomology features. Across all experimental conditions and datasets, the majority of SNVs per genome had little or no predicted functional impact, with a maximum of 62 showing damaging potential. Surprisingly, a dose-dependent effect of IR was not apparent. We also observed a significant reduction in transition/transversion (Ti/Tv) ratios for IR-dependent SNVs, which could point to a contribution of the mismatch repair (MMR) system, which strongly favors the repair of transitions over transversions, to the IR-induced DNA-damage response in human cells.
Taken together, our results show the presence of distinguishable characteristic patterns of IR-induced DNA-alterations on a genome-wide level and implicate DNA-repair mechanisms in the formation of these signatures.
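The Ti/Tv ratio mentioned above is the number of transitions (purine to purine, A↔G, or pyrimidine to pyrimidine, C↔T) divided by the number of transversions (purine↔pyrimidine). Since there are twice as many possible transversions as transitions, uniformly random substitutions would give a ratio of about 0.5. A minimal sketch of the computation, assuming SNVs are supplied as hypothetical (reference, alternate) base pairs rather than in any particular file format used by the study:

```python
BASES = set("ACGT")
# Transitions stay within purines (A, G) or within pyrimidines (C, T).
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def ti_tv_ratio(snvs):
    """Return transitions / transversions for (ref, alt) base pairs.

    Pairs that are not two distinct A/C/G/T bases are skipped.
    Returns inf when no transversions are observed.
    """
    ti = tv = 0
    for ref, alt in snvs:
        ref, alt = ref.upper(), alt.upper()
        if ref not in BASES or alt not in BASES or ref == alt:
            continue  # not a valid single-nucleotide substitution
        if frozenset((ref, alt)) in TRANSITIONS:
            ti += 1
        else:
            tv += 1
    return ti / tv if tv else float("inf")
```

For instance, three transitions (A>G, C>T, G>A) and one transversion (A>C) give a ratio of 3.0; a lower ratio in the IR-dependent SNV set than in a control set is the kind of shift the abstract reports.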


2016 ◽  
Author(s):  
Georgios Georgiou ◽  
Simon J. van Heeringen

Summary. In this application note we describe fluff, a software package that allows for simple exploration, clustering and visualization of high-throughput sequencing data mapped to a reference genome. The package contains three command-line tools to generate publication-quality figures in an uncomplicated manner using sensible defaults. Genome-wide data can be aggregated, clustered and visualized in a heatmap, according to different clustering methods. This includes a predefined setting to identify dynamic clusters between different conditions or developmental stages. Alternatively, clustered data can be visualized in a bandplot. Finally, fluff includes a tool to generate genomic profiles. As command-line tools, the fluff programs can easily be integrated into standard analysis pipelines. The installation is straightforward and documentation is available at http://fluff.readthedocs.org. Availability. fluff is implemented in Python and runs on Linux. The source code is freely available for download at http://github.com/simonvh/fluff.


2021 ◽  
Vol 7 (3) ◽  
pp. eabd9036
Author(s):  
Sara Saez-Atienzar ◽  
Sara Bandres-Ciga ◽  
Rebekah G. Langston ◽  
Jonggeol J. Kim ◽  
Shing Wan Choi ◽  
...  

Despite the considerable progress in unraveling the genetic causes of amyotrophic lateral sclerosis (ALS), we do not fully understand the molecular mechanisms underlying the disease. We analyzed genome-wide data involving 78,500 individuals using a polygenic risk score approach to identify the biological pathways and cell types involved in ALS. This data-driven approach identified multiple aspects of the biology underlying the disease that resolved into broader themes, namely, neuron projection morphogenesis, membrane trafficking, and signal transduction mediated by ribonucleotides. We also found that genomic risk in ALS maps consistently to GABAergic interneurons and oligodendrocytes, as confirmed in human single-nucleus RNA-seq data. Using two-sample Mendelian randomization, we nominated six differentially expressed genes (ATG16L2, ACSL5, MAP1LC3A, MAPKAPK3, PLXNB2, and SCFD1) within the significant pathways as relevant to ALS. We conclude that the disparate genetic etiologies of this fatal neurological disease converge on a smaller number of final common pathways and cell types.
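At its core, a polygenic risk score for one individual is a weighted sum of effect-allele dosages, with per-variant weights taken from GWAS effect sizes. The abstract does not describe how scores were built for the pathway analysis, so the following is only a minimal sketch of the basic score; the variant IDs, the dictionary inputs and the choice to skip missing variants are assumptions.

```python
def polygenic_risk_score(dosages, weights):
    """Return (score, n_variants_used) for one individual.

    dosages: variant ID -> effect-allele dosage for this individual (0 to 2)
    weights: variant ID -> per-allele effect size, e.g. a log odds ratio
    Variants absent from `dosages` are skipped here, a simplification;
    some tools instead impute missing genotypes.
    """
    score = 0.0
    used = 0
    for variant, beta in weights.items():
        dosage = dosages.get(variant)
        if dosage is None:
            continue  # variant not genotyped for this individual
        if not 0 <= dosage <= 2:
            raise ValueError(f"dosage out of range for {variant}: {dosage}")
        score += beta * dosage
        used += 1
    return score, used
```

With hypothetical weights `{"rs1": 0.2, "rs2": -0.1, "rs3": 0.05}` and dosages `{"rs1": 2, "rs2": 1}`, the score is 0.2·2 − 0.1·1 ≈ 0.3 from two variants. Restricting `weights` to variants mapped to one biological pathway yields a pathway-specific score, which is one way such scores can be related to pathways and cell types.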


2021 ◽  
Vol 7 (13) ◽  
pp. eabe4414
Author(s):  
Guido Alberto Gnecchi-Ruscone ◽  
Elmira Khussainova ◽  
Nurzhibek Kahbatkyzy ◽  
Lyazzat Musralina ◽  
Maria A. Spyrou ◽  
...  

The Scythians were a multitude of horse-warrior nomad cultures dwelling in the Eurasian steppe during the first millennium BCE. Because of the lack of first-hand written records, little is known about the origins of and relations among the different cultures. To address these questions, we produced genome-wide data for 111 ancient individuals retrieved from 39 archaeological sites from the first millennia BCE and CE across the Central Asian Steppe. We uncovered major admixture events in the Late Bronze Age forming the genetic substratum for two main Iron Age gene pools emerging around the Altai and the Urals, respectively. Their demise was mirrored by new genetic turnovers, linked to the spread of the eastern nomad empires in the first centuries CE. Compared to the high genetic heterogeneity of the past, the homogenization of the present-day Kazakh gene pool is notable, likely a result of 400 years of strict exogamous social rules.


GigaScience ◽  
2021 ◽  
Vol 10 (1) ◽  
Author(s):  
Taras K Oleksyk ◽  
Walter W Wolfsberger ◽  
Alexandra M Weber ◽  
Khrystyna Shchubelka ◽  
Olga T Oleksyk ◽  
...  

Background. The main goal of this collaborative effort is to provide genome-wide data for a previously underrepresented population in Eastern Europe, and to cross-validate data from genome sequences and genotypes of the same individuals acquired by different technologies. We collected 97 genome-grade DNA samples from individuals representing major regions of Ukraine who consented to public data release. BGISEQ-500 sequence data and Illumina GWAS chip genotypes were cross-validated on multiple samples and additionally referenced to 1 sample that was resequenced by Illumina NovaSeq6000 S4 at high coverage. Results. The genome data were searched for genomic variation represented in this population, and a number of variant types are reported: large structural variants, indels, copy number variations, single-nucleotide polymorphisms, and microsatellites. To our knowledge, this study provides the largest survey to date of genetic variation in Ukraine, creating a public reference resource intended to provide data for medical research in a large understudied population. Conclusions. Our results indicate that the genetic diversity of the Ukrainian population is uniquely shaped by evolutionary and demographic forces and cannot be ignored in future genetic and biomedical studies. These data contribute a wealth of new information, bringing forth novel, endemic and medically related alleles.
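The abstract does not say how the sequencing and genotyping-chip calls were compared. One common way to cross-validate two platforms on the same individual is per-site genotype concordance: the fraction of sites called by both platforms at which the genotypes agree. A minimal sketch, assuming hypothetical VCF-style genotype strings (`0/1`, `1|1`, `./.`) keyed by a site identifier:

```python
def normalize_genotype(gt):
    """Turn '0/1', '1|0', etc. into a sorted allele tuple; None if missing."""
    alleles = gt.replace("|", "/").split("/")
    if any(a in (".", "") for a in alleles):
        return None
    # Sorting makes unphased and phased heterozygotes compare equal.
    return tuple(sorted(alleles))


def genotype_concordance(seq_calls, chip_calls):
    """Return (concordance rate, number of sites compared).

    Only sites present in both call sets with non-missing genotypes
    are compared. The rate is NaN when no sites overlap.
    """
    compared = matched = 0
    for site, chip_gt in chip_calls.items():
        seq_gt = seq_calls.get(site)
        if seq_gt is None:
            continue  # site not called by sequencing
        a = normalize_genotype(seq_gt)
        b = normalize_genotype(chip_gt)
        if a is None or b is None:
            continue  # missing genotype on either platform
        compared += 1
        if a == b:
            matched += 1
    rate = matched / compared if compared else float("nan")
    return rate, compared
```

In practice, concordance is often reported separately for homozygous-reference, heterozygous and homozygous-alternate chip genotypes, because the overall rate is dominated by easy homozygous-reference sites.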


Nature ◽  
2021 ◽  
Vol 592 (7853) ◽  
pp. 253-257 ◽  
Author(s):  
Mateja Hajdinjak ◽  
Fabrizio Mafessoni ◽  
Laurits Skov ◽  
Benjamin Vernot ◽  
Alexander Hübner ◽  
...  

Modern humans appeared in Europe by at least 45,000 years ago [1–5], but the extent of their interactions with Neanderthals, who disappeared by about 40,000 years ago [6], and their relationship to the broader expansion of modern humans outside Africa are poorly understood. Here we present genome-wide data from three individuals dated to between 45,930 and 42,580 years ago from Bacho Kiro Cave, Bulgaria [1,2]. They are the earliest Late Pleistocene modern humans known to have been recovered in Europe so far, and were found in association with an Initial Upper Palaeolithic artefact assemblage. Unlike two previously studied individuals of similar ages from Romania [7] and Siberia [8] who did not contribute detectably to later populations, these individuals are more closely related to present-day and ancient populations in East Asia and the Americas than to later west Eurasian populations. This indicates that they belonged to a modern human migration into Europe that was not previously known from the genetic record, and provides evidence that there was at least some continuity between the earliest modern humans in Europe and later people in Eurasia. Moreover, we find that all three individuals had Neanderthal ancestors a few generations back in their family history, confirming that the first European modern humans mixed with Neanderthals and suggesting that such mixing could have been common.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
An Zheng ◽  
Michael Lamkin ◽  
Yutong Qiu ◽  
Kevin Ren ◽  
Alon Goren ◽  
...  

Background. A major challenge in evaluating quantitative ChIP-seq analyses, such as peak calling and differential binding, is a lack of reliable ground truth data. Accurate simulation of ChIP-seq data can mitigate this challenge, but existing frameworks are either too cumbersome to apply genome-wide or unable to model a number of important experimental conditions in ChIP-seq. Results. We present ChIPs, a toolkit for rapidly simulating ChIP-seq data using statistical models of key experimental steps. We demonstrate how ChIPs can be used for a range of applications, including benchmarking analysis tools and evaluating the impact of various experimental parameters. ChIPs is implemented as a standalone command-line program written in C++ and is available from https://github.com/gymreklab/chips. Conclusions. ChIPs is an efficient ChIP-seq simulation framework that generates realistic datasets over a flexible range of experimental conditions. It can serve as an important component in various ChIP-seq analyses where ground truth data are needed.

