Robust and scalable inference of population history from hundreds of unphased whole genomes

2016 ◽  
Vol 49 (2) ◽  
pp. 303-309 ◽  
Author(s):  
Jonathan Terhorst ◽  
John A Kamm ◽  
Yun S Song
2020 ◽  
Vol 117 (17) ◽  
pp. 9458-9465 ◽  
Author(s):  
Daniel N. Harris ◽  
Michael D. Kessler ◽  
Amol C. Shetty ◽  
Daniel E. Weeks ◽  
Ryan L. Minster ◽  
...  

Archaeological studies estimate the initial settlement of Samoa at 2,750 to 2,880 y ago and identify only limited settlement and human modification to the landscape until about 1,000 to 1,500 y ago. At this point, a complex history of migration is thought to have begun with the arrival of people sharing ancestry with Near Oceanic groups (i.e., Austronesian-speaking and Papuan-speaking groups), and was then followed by the arrival of non-Oceanic groups during European colonialism. However, the specifics of this peopling are not entirely clear from the archaeological and anthropological records, and are therefore a focus of continued debate. To shed additional light on the Samoan population history that this peopling reflects, we employ a population genetic approach to analyze 1,197 Samoan high-coverage whole genomes. We identify population splits between the major Samoan islands and detect asymmetrical gene flow to the capital city. We also find an extreme bottleneck until about 1,000 y ago, which is followed by distinct expansions across the islands and subsequent bottlenecks consistent with European colonization. These results provide an increased understanding of Samoan population history and the dynamics that inform it, and also demonstrate how rapid demographic processes can shape modern genomes.


mBio ◽  
2014 ◽  
Vol 5 (6) ◽  
Author(s):  
Jessica Hedge ◽  
Daniel J. Wilson

ABSTRACT Phylogenetic inference in bacterial genomics is fundamental to understanding problems such as population history, antimicrobial resistance, and transmission dynamics. The field has been plagued by an apparent state of contradiction since the distorting effects of recombination on phylogeny were discovered more than a decade ago. Researchers persist with detailed phylogenetic analyses while simultaneously acknowledging that recombination seriously misleads inference of population dynamics and selection. Here we resolve this paradox by showing that phylogenetic tree topologies based on whole genomes robustly reconstruct the clonal frame topology but that branch lengths are badly skewed. Surprisingly, removing recombining sites can exacerbate branch length distortion caused by recombination.

IMPORTANCE Phylogenetic tree reconstruction is a popular approach for understanding the relatedness of bacteria in a population from differences in their genome sequences. However, bacteria frequently exchange regions of their genomes by a process called homologous recombination, which violates a fundamental assumption of phylogenetic methods. Since many researchers continue to use phylogenetics for recombining bacteria, it is important to understand how recombination affects the conclusions drawn from these analyses. We find that whole-genome sequences afford great accuracy in reconstructing evolutionary relationships despite concerns surrounding the presence of recombination, but the branch lengths of the phylogenetic tree are indeed badly distorted. Surprisingly, methods to reduce the impact of recombination on branch lengths can exacerbate the problem.


2020 ◽  
Vol 12 (12) ◽  
pp. 2535-2551
Author(s):  
Melanie Parejo ◽  
David Wragg ◽  
Dora Henriques ◽  
Jean-Daniel Charrière ◽  
Andone Estonba

Abstract Historical specimens in museum collections provide opportunities to gain insights into the genomic past. For the Western honey bee, Apis mellifera L., this is particularly important because its populations are currently under threat worldwide and have experienced many changes in management and environment over the last century. Using Swiss Apis mellifera mellifera as a case study, our research provides important insights into the genetic diversity of native honey bees prior to the industrial-scale introductions and trade of non-native stocks during the 20th century (the onset of intensive commercial breeding and the decline of wild honey bees following the arrival of Varroa destructor). We sequenced whole genomes of 22 honey bees from the Natural History Museum in Bern collected in Switzerland, including the oldest A. mellifera sample ever sequenced. We identify both a historic and a recent migrant, natural or human-mediated, which is consistent with the population history of honey bees in Switzerland. Contrary to what we expected, we find no evidence for a significant genetic bottleneck in Swiss honey bees, and find that genetic diversity is not only maintained, but even slightly increased, most probably due to modern apicultural practices. Finally, we identify signals of selection between historic and modern honey bee populations associated with genes enriched in functions linked to xenobiotics, suggesting a possible selective pressure from the increasing use and diversity of chemicals used in agriculture and apiculture over the last century.


1996 ◽  
Vol 50 (2) ◽  
pp. 284-285
Author(s):  
Eilidh Garrett

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Marleen M. Nieboer ◽  
Luan Nguyen ◽  
Jeroen de Ridder

Abstract Over the past years, large consortia have been established to fuel the sequencing of whole genomes of many cancer patients. Despite the increased abundance of tools to study the impact of SNVs, non-coding SVs have been largely ignored in these data. Here, we introduce svMIL2, an improved version of our Multiple Instance Learning-based method to study the effect of somatic non-coding SVs disrupting boundaries of TADs and CTCF loops in 1646 cancer genomes. We demonstrate that svMIL2 predicts pathogenic non-coding SVs with an average AUC of 0.86 across 12 cancer types, and identifies non-coding SVs affecting well-known driver genes. The disruption of active (super) enhancers in open chromatin regions appears to be a common mechanism by which non-coding SVs exert their pathogenicity. Finally, our results reveal that the contribution of pathogenic non-coding SVs as opposed to driver SNVs may vary greatly between cancers, with notably high numbers of genes being disrupted by pathogenic non-coding SVs in ovarian and pancreatic cancer. Taken together, our machine learning method offers a potent way to prioritize putatively pathogenic non-coding SVs and leverage non-coding SVs to identify driver genes. Moreover, our analysis of 1646 cancer genomes demonstrates the importance of including non-coding SVs in cancer diagnostics.


2021 ◽  
Author(s):  
Kimberley C. Batley ◽  
Jonathan Sandoval‐Castillo ◽  
Catherine Kemper ◽  
Nikki Zanardo ◽  
Ikuko Tomo ◽  
...  

Genetics ◽  
2000 ◽  
Vol 155 (3) ◽  
pp. 1429-1437
Author(s):  
Oliver G Pybus ◽  
Andrew Rambaut ◽  
Paul H Harvey

Abstract We describe a unified set of methods for the inference of demographic history using genealogies reconstructed from gene sequence data. We introduce the skyline plot, a graphical, nonparametric estimate of demographic history. We discuss both maximum-likelihood parameter estimation and demographic hypothesis testing. Simulations are carried out to investigate the statistical properties of maximum-likelihood estimates of demographic parameters. The simulations reveal that (i) the performance of exponential growth model estimates is determined by a simple function of the true parameter values and (ii) under some conditions, estimates from reconstructed trees perform as well as estimates from perfect trees. We apply our methods to HIV-1 sequence data and find strong evidence that subtypes A and B have different demographic histories. We also provide the first (albeit tentative) genetic evidence for a recent decrease in the growth rate of subtype B.
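The skyline plot mentioned in this abstract can be illustrated with a minimal sketch. It assumes the classic form of the estimator, in which the interval during which the genealogy has k lineages, of length gamma_k, yields the estimate gamma_k * k(k-1)/2, and it assumes all tips are sampled at time 0. The function name `classic_skyline` and the input layout are hypothetical and are not taken from the paper or from any software package.

```python
from typing import List, Tuple


def classic_skyline(coalescent_times: List[float]) -> List[Tuple[float, float, float]]:
    """Sketch of a classic skyline estimate (illustrative, not the authors' code).

    coalescent_times: times of the n-1 coalescent events in a genealogy of
    n tips, measured backwards from the present and sorted ascending.
    Assumption: every tip is sampled at time 0.

    Returns one (interval_start, interval_end, estimate) tuple per
    inter-coalescent interval, ordered from the present into the past.
    """
    if any(t < 0 for t in coalescent_times):
        raise ValueError("coalescent times must be non-negative")
    if sorted(coalescent_times) != list(coalescent_times):
        raise ValueError("coalescent times must be sorted ascending")

    lineages = len(coalescent_times) + 1  # n tips at the present
    previous = 0.0
    steps = []
    for t in coalescent_times:
        gamma = t - previous  # length of the interval with `lineages` lineages
        # Each of the k(k-1)/2 lineage pairs can coalesce, so the interval
        # length is scaled by that count to estimate population size.
        estimate = gamma * lineages * (lineages - 1) / 2.0
        steps.append((previous, t, estimate))
        previous = t
        lineages -= 1  # one coalescence merges two lineages into one
    return steps


if __name__ == "__main__":
    for start, end, estimate in classic_skyline([0.1, 0.3, 0.9]):
        print(f"{start:.2f} to {end:.2f}: {estimate:.3f}")
```

With the example input, the three intervals have lengths 0.1, 0.2, and 0.6 and pair counts 6, 3, and 1, so every step evaluates to roughly 0.6, which is the flat profile expected under constant population size. The estimates are expressed in the same units as the input times.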


2021 ◽  
Vol 15 (6) ◽  
pp. 1-22
Author(s):  
Yashen Wang ◽  
Huanhuan Zhang ◽  
Zhirun Liu ◽  
Qiang Zhou

Many semantic-driven methods have been proposed for guiding natural language generation. While clearly improving the performance of the end-to-end training task, these existing semantic-driven methods still have clear limitations: for example, (i) they only utilize shallow semantic signals (e.g., from topic models) with only a single stochastic hidden layer in their data generation process, which suffer easily from noise (especially for short text) and lack interpretability; (ii) they ignore the sentence order and document context, as they treat each document as a bag of sentences, and fail to capture the long-distance dependencies and global semantic meaning of a document. To overcome these problems, we propose a novel semantic-driven language modeling framework, which is a method to learn a Hierarchical Language Model and a Recurrent Conceptualization-enhanced Gamma Belief Network simultaneously. For scalable inference, we develop the auto-encoding Variational Recurrent Inference, allowing efficient end-to-end training and simultaneously capturing global semantics from a text corpus. In particular, this article introduces concept information derived from the high-quality lexical knowledge graph Probase, which lends strong interpretability and anti-noise capability to the proposed model. Moreover, the proposed model captures not only intra-sentence word dependencies, but also temporal transitions between sentences and inter-sentence concept dependence. Experiments conducted on several NLP tasks validate the superiority of the proposed approach, which can effectively infer the meaningful hierarchical concept structure of a document and the hierarchical multi-scale structures of sequences, even compared with the latest state-of-the-art Transformer-based models.

