An ABC method for whole-genome sequence data: inferring paleolithic and neolithic human expansions

Abstract: Species generally undergo a complex demographic history, consisting, in particular, of multiple changes in population size. Genome-wide sequencing data are potentially highly informative for reconstructing this demographic history. A crucial point is to extract the relevant information from these very large datasets. Here we designed an approach for inferring past demographic events from a moderate number of fully sequenced genomes. Our new approach uses Approximate Bayesian Computation (ABC), a simulation-based statistical framework that allows (i) identifying the best demographic scenario among several competing scenarios, and (ii) estimating the best-fitting parameters under the chosen scenario. ABC relies on the computation of summary statistics. Using a cross-validation approach, we showed that statistics such as the lengths of haplotypes shared between individuals, or the decay of linkage disequilibrium with distance, can be combined with classical statistics (e.g., heterozygosity, Tajima’s D) to accurately infer complex demographic scenarios including bottlenecks and expansion periods. We also demonstrated the importance of simultaneously estimating the genotyping error rate. Applying our method to genome-wide human-sequence databases, we finally showed that a model consisting of a bottleneck followed by a Paleolithic and a Neolithic expansion was the most relevant for Eurasian populations.
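The two ABC steps described in this abstract (choosing among competing scenarios, then estimating parameters under the chosen one) can be illustrated with a minimal rejection-sampling sketch. This is not the authors' method: the two scenarios, the Gaussian toy simulator, the uniform priors, the tolerance and all function names below are hypothetical stand-ins for a real demographic simulator and real summary statistics.

```python
import math
import random
import statistics

# Hypothetical toy scenarios standing in for a real demographic simulator:
# each scenario only differs in the noise level of the simulated data.
SCENARIO_NOISE = {"scenario_A": 1.0, "scenario_B": 3.0}


def simulate_summaries(scenario, param, rng, n=50):
    """Simulate a toy dataset and reduce it to two summary statistics."""
    draws = [rng.gauss(param, SCENARIO_NOISE[scenario]) for _ in range(n)]
    return statistics.fmean(draws), statistics.stdev(draws)


def abc_rejection(observed, n_sims=10000, tolerance=0.3, seed=1):
    """Keep the (scenario, parameter) draws whose simulated summaries lie
    within `tolerance` (Euclidean distance) of the observed summaries."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_sims):
        scenario = rng.choice(sorted(SCENARIO_NOISE))  # uniform prior on scenarios
        param = rng.uniform(0.0, 10.0)                 # uniform prior on the parameter
        simulated = simulate_summaries(scenario, param, rng)
        if math.dist(simulated, observed) <= tolerance:
            accepted.append((scenario, param))
    return accepted


def summarize(accepted):
    """Approximate posterior scenario probabilities and, for the best-supported
    scenario, the posterior mean of its parameter. Assumes `accepted` is non-empty."""
    probs = {s: sum(1 for a, _ in accepted if a == s) / len(accepted)
             for s in SCENARIO_NOISE}
    best = max(probs, key=probs.get)
    estimate = statistics.fmean(p for s, p in accepted if s == best)
    return probs, best, estimate


if __name__ == "__main__":
    # Observed summaries (mean, standard deviation) chosen for illustration only.
    accepted = abc_rejection(observed=(3.0, 1.0))
    probs, best, estimate = summarize(accepted)
    print(best, probs, estimate)
```

In practice the tolerance trades accuracy against the number of accepted simulations, and summary statistics are usually rescaled before distances are computed; both choices are left out of this sketch for brevity.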


Diverse demographic histories in a guild of hymenopteran parasitoids

10.1101/2019.12.20.884403 ◽

2019 ◽

Author(s):

William Walton ◽

Graham N Stone ◽

Konrad Lohse

Keyword(s):

Population Size ◽

Sequence Data ◽

Demographic History ◽

Whole Genome Sequence ◽

Parasitoid Wasps ◽

Effective Population ◽

Glacial Cycles ◽

Genome Wide ◽

History Of ◽

Hymenopteran Parasitoids

Abstract: Signatures of changes in population size have been detected in genome-wide variation in many species. However, the causes of such changes and the extent to which they are shared across co-distributed species remain poorly understood. During Pleistocene glacial maxima, many temperate European species were confined to southern refugia. While vicariance and range expansion processes associated with glacial cycles have been widely studied, little is known about the demographic history of refugial populations, and the extent and causes of demographic variation among co-distributed species. We used whole genome sequence data to reconstruct and compare demographic histories during the Quaternary for Iberian refuge populations in a single ecological guild (seven species of chalcid parasitoid wasps associated with oak cynipid galls). We find support for large changes in effective population size (Ne) through the Pleistocene that coincide with major climate change events. However, there is little evidence that the timing, direction and magnitude of demographic change are shared across species, suggesting that demographic histories are largely idiosyncratic. Our results are compatible with the idea that specialist parasitoids attacking a narrow range of hosts experience greater fluctuations in Ne than generalists.


Sporadic Occurrence of Recent Selective Sweeps from Standing Variation in Humans as Revealed by an Approximate Bayesian Computation Approach

Genetics ◽

10.1093/genetics/iyab161 ◽

2021 ◽

Author(s):

Guillaume Laval ◽

Etienne Patin ◽

Pierre Boutillier ◽

Lluis Quintana-Murci

Keyword(s):

Approximate Bayesian Computation ◽

Sequence Data ◽

Machine Learning Algorithms ◽

Whole Genome Sequence ◽

Bayesian Computation ◽

Human Adaptation ◽

Selective Sweeps ◽

Standing Variation ◽

Genome Wide ◽

Approximate Bayesian

Abstract: During their dispersals over the last 100,000 years, modern humans have been exposed to a large variety of environments, resulting in genetic adaptation. While genome-wide scans for the footprints of positive Darwinian selection have increased knowledge of genes and functions potentially involved in human local adaptation, they have globally produced evidence of a limited contribution of selective sweeps in humans. Conversely, studies based on machine learning algorithms suggest that recent sweeps from standing variation are widespread in humans, an observation that has been recently questioned. Here, we sought to formally quantify the number of recent selective sweeps in humans, by leveraging approximate Bayesian computation and whole-genome sequence data. Our computer simulations revealed suitable ABC estimations, regardless of the frequency of the selected alleles at the onset of selection and the completion of sweeps. Under a model of recent selection from standing variation, we inferred that an average of 68 (from 56 to 79) and 140 (from 94 to 198) sweeps occurred over the last 100,000 years of human history, in African and Eurasian populations, respectively. The former estimation is compatible with human adaptation rates estimated since divergence with chimps, and reveals numbers of sweeps per generation per site in the range of values estimated in Drosophila. Our results confirm the rarity of selective sweeps in humans and show a low contribution of sweeps from standing variation to recent human adaptation.


Inferring species compositions of complex fungal communities from long- and short-read sequence data

10.1101/2021.05.02.442318 ◽

2021 ◽

Author(s):

Yiheng Hu ◽

Laszlo Irinyi ◽

Minh Thuy Vi Hoang ◽

Tavish Eenjes ◽

Abigail Graetz ◽

...

Keyword(s):

Community Composition ◽

Pathogen Detection ◽

High Throughput Sequencing ◽

Sequence Data ◽

Whole Genome Sequence ◽

Composition Analysis ◽

Sequencing Data ◽

Species Classification ◽

Shotgun Metagenomics ◽

Query Coverage

Background: The kingdom Fungi is crucial for life on earth and is highly diverse. Yet fungi are challenging to characterize. They can be difficult to culture and may be morphologically indistinct in culture. They can have complex genomes of over 1 Gb in size and are still underrepresented in whole genome sequence databases. Overall, their description and analysis lag far behind those of other microbes such as bacteria. At the same time, classification of species via high throughput sequencing without prior purification is increasingly becoming the norm for pathogen detection, microbiome studies, and environmental monitoring. However, standardized procedures for characterizing unknown fungi from complex sequencing data have not yet been established. Results: We compared different metagenomics sequencing and analysis strategies for the identification of fungal species. Using two fungal mock communities of 44 phylogenetically diverse species, we compared species classification and community composition analysis pipelines using shotgun metagenomics and amplicon sequencing data generated from both short and long read sequencing technologies. We show that regardless of the sequencing methodology used, the highest accuracy of species identification was achieved by sequence alignment against a fungi-specific database. During the assessment of classification algorithms, we found that applying cut-offs to the query coverage of each read or contig significantly improved the classification accuracy and community composition analysis without significant data loss. Conclusion: Overall, our study expands the toolkit for identifying fungi by improving sequence-based fungal classification, and provides a practical guide for the design of metagenomics analyses.
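The query-coverage cut-off mentioned in the Results can be sketched as a simple filter over alignment hits. The abstract does not give the record layout or the threshold used, so the `Hit` fields, the function names and the 0.5 cut-off in the usage example are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical alignment record for one read or contig (the query)."""
    query_id: str
    taxon: str
    query_start: int   # 1-based, inclusive start of the aligned region on the query
    query_end: int     # 1-based, inclusive end of the aligned region on the query
    query_length: int  # full length of the read or contig


def query_coverage(hit):
    """Fraction of the query that is covered by the alignment."""
    return (hit.query_end - hit.query_start + 1) / hit.query_length


def filter_hits(hits, min_coverage):
    """Keep only hits whose query coverage reaches the chosen cut-off."""
    return [h for h in hits if query_coverage(h) >= min_coverage]


if __name__ == "__main__":
    hits = [
        Hit("read1", "species_X", 1, 900, 1000),    # coverage 0.9
        Hit("read2", "species_Y", 101, 300, 1000),  # coverage 0.2
    ]
    kept = filter_hits(hits, min_coverage=0.5)  # illustrative threshold
    print([h.query_id for h in kept])
```

A short, locally aligned hit such as `read2` would be discarded here, which is the kind of weakly supported classification a coverage cut-off is meant to remove.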


ALSgeneScanner: a pipeline for the analysis and interpretation of DNA NGS data of ALS patients

10.1101/378158 ◽

2018 ◽

Author(s):

Alfredo Iacoangeli ◽

Ahmad Al Khleifat ◽

William Sproviero ◽

Aleksey Shatunov ◽

Ashley R Jones ◽

...

Keyword(s):

Motor Neurons ◽

Health Care Professionals ◽

Sequence Data ◽

Whole Genome Sequence ◽

Next Generation Sequencing Data ◽

Sequencing Data ◽

Whole Exome ◽

Exome Sequence Data ◽

Als Patients ◽

Ngs Data

Abstract: Amyotrophic lateral sclerosis (ALS, MND) is a neurodegenerative disease of upper and lower motor neurons resulting in death from neuromuscular respiratory failure, typically within two years of first symptoms. Genetic factors are an important cause of ALS, with variants in more than 25 genes having strong evidence, and weaker evidence available for variants in more than 120 genes. With the increasing availability of Next-Generation sequencing data, non-specialists, including health care professionals and patients, are obtaining their genomic information without a corresponding ability to analyse and interpret it. Furthermore, the relevance of novel or existing variants in ALS genes is not always apparent. Here we present ALSgeneScanner, a tool that is easy to install and use, able to provide an automatic, detailed, annotated report, on a list of ALS genes from whole genome sequence data in a few hours and whole exome sequence data in about one hour on a readily available mid-range computer. This will be of value to non-specialists and aid in the interpretation of the relevance of novel and existing variants identified in DNA sequencing data.


Ethnically diverse urban transmission networks of Neisseria gonorrhoeae without evidence of HIV serosorting

Sexually Transmitted Infections ◽

10.1136/sextrans-2019-054025 ◽

2019 ◽

Vol 96 (2) ◽

pp. 106-109

Author(s):

Jayshree Dave ◽

John Paul ◽

Thomas Joshua Pasvol ◽

Andy Williams ◽

Fiona Warburton ◽

...

Keyword(s):

Neisseria Gonorrhoeae ◽

Ethnic Groups ◽

Antimicrobial Susceptibility ◽

Sequence Data ◽

Small Sample ◽

Whole Genome Sequence ◽

Whole Genome ◽

Sequencing Data ◽

Transmission Networks ◽

Hiv Serosorting

Objective: We aimed to characterise gonorrhoea transmission patterns in a diverse urban population by linking genomic, epidemiological and antimicrobial susceptibility data. Methods: Neisseria gonorrhoeae isolates from patients attending sexual health clinics at Barts Health NHS Trust, London, UK, during an 11-month period underwent whole-genome sequencing and antimicrobial susceptibility testing. We combined laboratory and patient data to investigate the transmission network structure. Results: One hundred and fifty-eight isolates from 158 patients were available with associated descriptive data. One hundred and twenty-nine (82%) patients identified as male and 25 (16%) as female; four (3%) records lacked gender information. Self-described ethnicities were: 51 (32%) English/Welsh/Scottish; 33 (21%) white, other; 23 (15%) black British/black African/black, other; 12 (8%) Caribbean; 9 (6%) South Asian; 6 (4%) mixed ethnicity; and 10 (6%) other; data were missing for 14 (9%). Self-reported sexual orientations were 82 (52%) men who have sex with men (MSM); 49 (31%) heterosexual; 2 (1%) bisexual; data were missing for 25 individuals. Twenty-two (14%) patients were HIV positive. Whole-genome sequence data were generated for 151 isolates, which linked 75 (50%) patients to at least one other case. Using sequencing data, we found no evidence of transmission networks related to specific ethnic groups (p=0.64) or of HIV serosorting (p=0.35). Of 82 MSM/bisexual patients with sequencing data, 45 (55%) belonged to clusters of ≥2 cases, compared with 16/44 (36%) heterosexuals with sequencing data (p=0.06). Conclusion: We demonstrate links between 50% of patients in transmission networks using a relatively small sample in a large cosmopolitan city. We found no evidence of HIV serosorting. Our results do not support assortative selectivity as an explanation for differences in gonorrhoea incidence between ethnic groups.


MTBseq: a comprehensive pipeline for whole genome sequence analysis of Mycobacterium tuberculosis complex isolates

PeerJ ◽

10.7717/peerj.5895 ◽

2018 ◽

Vol 6 ◽

pp. e5895 ◽

Cited By ~ 35

Author(s):

Thomas Andreas Kohl ◽

Christian Utpatel ◽

Viola Schleusener ◽

Maria Rosaria De Filippo ◽

Patrick Beckert ◽

...

Keyword(s):

Antibiotic Resistance ◽

Mycobacterium Tuberculosis ◽

Genome Sequence ◽

Sequence Data ◽

Whole Genome Sequence ◽

Whole Genome Sequencing Data ◽

Phylogenomic Analysis ◽

Whole Genome ◽

Sequencing Data ◽

Desktop Computer

Analyzing whole-genome sequencing data of Mycobacterium tuberculosis complex (MTBC) isolates in a standardized workflow enables both comprehensive antibiotic resistance profiling and outbreak surveillance with highest resolution up to the identification of recent transmission chains. Here, we present MTBseq, a bioinformatics pipeline for next-generation genome sequence data analysis of MTBC isolates. Employing a reference-mapping-based workflow, MTBseq reports detected variant positions annotated with known association to antibiotic resistance and performs a lineage classification based on phylogenetic single nucleotide polymorphisms (SNPs). When comparing multiple datasets, MTBseq provides a joint list of variants and a FASTA alignment of SNP positions for use in phylogenomic analysis, and identifies groups of related isolates. The pipeline is customizable, expandable and can be used on a desktop computer or laptop without any internet connection, ensuring mobile usage and data security. MTBseq and accompanying documentation are available from https://github.com/ngs-fzb/MTBseq_source.


Whole-genome analysis of Malawian Plasmodium falciparum isolates identifies potential targets of allele-specific immunity to clinical malaria

10.1101/2020.09.16.20196253 ◽

2020 ◽

Author(s):

Zalak Shah ◽

Myo T Naung ◽

Kara A Moser ◽

Matthew Adams ◽

Andrea G Buchwald ◽

...

Keyword(s):

Plasmodium Falciparum ◽

Sequence Data ◽

Clinical Malaria ◽

Whole Genome Sequence ◽

Whole Genome ◽

Whole Genome Analysis ◽

Multiple Alleles ◽

Vaccine Candidates ◽

Genome Wide ◽

Allele Specific

Individuals acquire immunity to clinical malaria after repeated Plasmodium falciparum infections. This immunity to disease is thought to reflect the acquisition of a repertoire of responses to multiple alleles in diverse parasite antigens. In previous studies, we identified polymorphic sites within individual antigens that are associated with parasite immune evasion by examining antigen allele dynamics in individuals followed longitudinally. Here we expand this approach by analyzing genome-wide polymorphisms using whole genome sequence data from 140 parasite isolates representing malaria cases from a longitudinal study in Malawi and identify 25 genes that encode likely targets of naturally acquired immunity and that should be further characterized for their potential as vaccine candidates.


Population-level genome-wide STR typing in Plasmodium species reveals higher resolution population structure and genetic diversity relative to SNP typing

10.1101/2021.05.19.444768 ◽

2021 ◽

Author(s):

Jiru Han ◽

Jacob E Munro ◽

Anthony Kocoski ◽

Alyssa E Barry ◽

Melanie Bahlo

Keyword(s):

Genetic Diversity ◽

Large Scale ◽

Tandem Repeats ◽

Plasmodium Species ◽

Whole Genome Sequence ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data ◽

Genome Wide ◽

Field Samples

Short tandem repeats (STRs) are highly informative genetic markers that have been used extensively in population genetics analysis. They are an important source of genetic diversity and can also have functional impact. Despite the availability of bioinformatic methods that permit large-scale genome-wide genotyping of STRs from whole genome sequencing data, they have not previously been applied to sequencing data from large collections of malaria parasite field samples. Here, we have genotyped STRs using HipSTR in published whole-genome sequence data from more than 3,000 Plasmodium falciparum and 174 Plasmodium vivax samples collected across the globe. High levels of noise and variability in the resultant callset necessitated the development of a novel method for quality control of STR genotype calls. A set of high-quality STR loci (6,768 from P. falciparum and 3,496 from P. vivax) was used to study Plasmodium genetic diversity, population structures and genomic signatures of selection, and the results were compared to genome-wide single nucleotide polymorphism (SNP) genotyping data. In addition, the genome-wide information about genetic variation and other characteristics of STRs in P. falciparum and P. vivax has been made available in an interactive web-based R Shiny application, PlasmoSTR (https://github.com/bahlolab/PlasmoSTR).


Inferring the Joint Demographic History of Multiple Populations: Beyond the Diffusion Approximation

10.1101/103275 ◽

2017 ◽

Cited By ~ 2

Author(s):

Julien Jouganous ◽

Will Long ◽

Simon Gravel

Keyword(s):

Diffusion Approximation ◽

Sequence Data ◽

Demographic History ◽

Allele Frequencies ◽

Human Sequence ◽

Medical Study ◽

Joint Frequency ◽

Classical Models ◽

History Of ◽

Demographic Inference

Abstract: Understanding variation in allele frequencies across populations is a central goal of population genetics. Classical models for the distribution of allele frequencies, using forward simulation, coalescent theory, or the diffusion approximation, have been applied extensively for demographic inference, medical study design, and evolutionary studies. Here we propose a tractable model of ordinary differential equations for the evolution of allele frequencies that is closely related to the diffusion approximation but avoids many of its limitations and approximations. We show that the approach is typically faster, more numerically stable, and more easily generalizable than the state-of-the-art software implementation of the diffusion approximation. We present a number of applications to human sequence data, including demographic inference with a five-population joint frequency spectrum and a discussion of the transferability of demographic histories across populations.


PHARP: A pig haplotype reference panel for genotype imputation

10.1101/2021.06.03.446888 ◽

2021 ◽

Author(s):

Zhen Wang ◽

Zhenyang Zhang ◽

Zitao Chen ◽

Jiabao Sun ◽

Caiyun Cao ◽

...

Keyword(s):

Complex Traits ◽

Sequence Data ◽

Genotype Imputation ◽

Reference Panel ◽

Whole Genome Sequence ◽

Sequencing Data ◽

Large White ◽

Downstream Analysis ◽

Low Coverage ◽

Analytical Tools

Pigs not only function as a major meat source worldwide but also are commonly used as an animal model for studying human complex traits. A large haplotype reference panel has been used to facilitate efficient phasing and imputation of relatively sparse genome-wide microarray chips and low-coverage sequencing data. Using the imputed genotypes in the downstream analysis, such as GWASs, TWASs, eQTL mapping and genomic prediction (GS), is beneficial for obtaining novel findings. However, currently, there is still a lack of publicly available and high-quality pig reference panels with large sample sizes and high diversity, which greatly limits the application of genotype imputation in pigs. In response, we built the pig Haplotype Reference Panel (PHARP) database. PHARP provides a reference panel of 2,012 pig haplotypes at 34 million SNPs constructed using whole-genome sequence data from more than 49 studies of 71 pig breeds. It also provides Web-based analytical tools that allow researchers to carry out phasing and imputation consistently and efficiently. PHARP is freely accessible at http://alphaindex.zju.edu.cn/PHARP/index.php. We demonstrate its applicability for pig commercial 50K SNP arrays, by accurately imputing 2.6 billion genotypes at a concordance rate value of 0.971 in 81 Large White pigs (~ 17x sequencing coverage). We also applied our reference panel to impute the low-density SNP chip into the high-density data for three GWASs and found novel significantly associated SNPs that might be causal variants.
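A concordance rate such as the one quoted above is commonly computed as the fraction of imputed genotype calls that match the genotypes obtained from sequencing. The sketch below assumes that definition, genotypes coded as alternate-allele counts (0, 1 or 2) and `None` for missing calls; the abstract does not state how PHARP computes the value, and the function name is hypothetical.

```python
def concordance_rate(imputed, truth):
    """Fraction of comparable genotype calls that agree.

    Genotypes are assumed to be alternate-allele counts (0, 1, 2);
    positions where either call is missing (None) are skipped.
    """
    if len(imputed) != len(truth):
        raise ValueError("genotype lists must have the same length")
    pairs = [(i, t) for i, t in zip(imputed, truth)
             if i is not None and t is not None]
    if not pairs:
        raise ValueError("no comparable genotype calls")
    matches = sum(1 for i, t in pairs if i == t)
    return matches / len(pairs)


if __name__ == "__main__":
    imputed = [0, 1, 2, 1, None]
    truth = [0, 1, 1, 1, 2]
    print(concordance_rate(imputed, truth))  # 3 of 4 comparable calls agree
```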
