Genome-wide RAD sequence data provide unprecedented resolution of species boundaries and relationships in the Lake Victoria cichlid adaptive radiation

Catherine E. Wagner; Irene Keller; Samuel Wittwer; Oliver M. Selz; Salome Mwaiko; Lucie Greuter; Arjun Sivasundar; Ole Seehausen

doi:10.1111/mec.12023

Stable Species Boundaries Despite Ten Million Years of Hybridization in Tropical Eels

10.1101/635631 ◽

2019 ◽

Author(s):

Julia M. I. Barth ◽

Chrysoula Gubili ◽

Michael Matschiner ◽

Ole K. Tørresen ◽

Shun Watanabe ◽

...

Keyword(s):

Sequence Data ◽

F1 Hybrids ◽

Species Boundaries ◽

Hybrid Breakdown ◽

Genome Wide ◽

Dynamic Scenario ◽

Stable Species ◽

Species Pairs ◽

History Of ◽

Tropical Eels

Genomic evidence increasingly indicates that hybridization between taxa is commonplace, challenging our views on the mechanisms that maintain their boundaries. Here, we focus on seven catadromous eel species (genus Anguilla), and use genome-wide sequence data from more than 450 individuals sampled across the tropical Indo-Pacific, morphological information, and three newly assembled draft genomes to compare contemporary patterns of hybridization with signatures of past gene flow across a time-calibrated phylogeny. We show that the seven species have remained distinct entities for up to 10 million years, despite a dynamic scenario of incomplete isolation whereby the current frequencies of hybridization across species pairs (over 5% of all individuals were either F1 hybrids or backcrosses) contrast remarkably with patterns of past introgression. Based on near-complete asymmetry in the directionality of hybridization and decreasing frequencies of later-generation hybrids, we identify cytonuclear incompatibilities and hybrid breakdown as two powerful mechanisms that can support species cohesion even when hybridization has been pervasive throughout the evolutionary history of entire clades.

RAD sequencing enables unprecedented phylogenetic resolution and objective species delimitation in recalcitrant divergent taxa

10.1101/019745 ◽

2015 ◽

Cited By ~ 3

Author(s):

Santiago Herrera ◽

Timothy M. Shank

Keyword(s):

Species Delimitation ◽

Sequence Data ◽

Conclusive Evidence ◽

Species Boundaries ◽

Effective Population ◽

DNA Sequence Data ◽

Genome Wide ◽

Phylogenetic Resolution ◽

Specific Factors ◽

Morphological Species

Species delimitation is problematic in many taxa due to the difficulty of evaluating predictions from species delimitation hypotheses, which chiefly rely on subjective interpretations of morphological observations and/or DNA sequence data. This problem is exacerbated in recalcitrant taxa for which genetic resources are scarce and inadequate to resolve questions regarding evolutionary relationships and uniqueness. In this case study we demonstrate the empirical utility of restriction site associated DNA sequencing (RAD-seq) by unambiguously resolving phylogenetic relationships among recalcitrant octocoral taxa with divergences greater than 80 million years. We objectively infer robust species boundaries in the genus Paragorgia, which contains some of the most important ecosystem engineers in the deep sea, by testing alternative taxonomy-guided or unguided species delimitation hypotheses using the Bayes factors delimitation method (BFD*) with genome-wide single nucleotide polymorphism data. We present conclusive evidence rejecting the current morphological species delimitation model for the genus Paragorgia and indicating the presence of cryptic species boundaries associated with environmental variables. We argue that the suitability limits of RAD-seq for phylogenetic inferences in divergent taxa cannot be assessed in terms of absolute time, but depend on taxon-specific factors such as mutation rate, generation time and effective population size. We show that classic morphological taxonomy can greatly benefit from integrative approaches that provide objective tests of species delimitation hypotheses. Our results pave the way for addressing further questions in biogeography, species ranges, community ecology, population dynamics, conservation, and evolution in octocorals and other marine taxa.

Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program

Nature ◽

10.1038/s41586-021-03205-y ◽

2021 ◽

Vol 590 (7845) ◽

pp. 290-299 ◽

Cited By ~ 22

Author(s):

Daniel Taliun ◽

Daniel N. Harris ◽

Michael D. Kessler ◽

Jedidiah Carlson ◽

...

Keyword(s):

Rare Variants ◽

Sequence Data ◽

Association Studies ◽

Genotype Imputation ◽

Genome Wide Association Studies ◽

Phenotypic Data ◽

Treatment And Prevention ◽

Genome Wide ◽

Diverse Backgrounds ◽

Unmapped Reads

The Trans-Omics for Precision Medicine (TOPMed) programme seeks to elucidate the genetic architecture and biology of heart, lung, blood and sleep disorders, with the ultimate goal of improving diagnosis, treatment and prevention of these diseases. The initial phases of the programme focused on whole-genome sequencing of individuals with rich phenotypic data and diverse backgrounds. Here we describe the TOPMed goals and design as well as the available resources and early insights obtained from the sequence data. The resources include a variant browser, a genotype imputation server, and genomic and phenotypic data that are available through dbGaP (Database of Genotypes and Phenotypes). In the first 53,831 TOPMed samples, we detected more than 400 million single-nucleotide and insertion or deletion variants after alignment with the reference genome. Additional previously undescribed variants were detected through assembly of unmapped reads and customized analysis in highly variable loci. Among the more than 400 million detected variants, 97% have frequencies of less than 1% and 46% are singletons that are present in only one individual (53% among unrelated individuals). These rare variants provide insights into mutational processes and recent human evolutionary history. The extensive catalogue of genetic variation in TOPMed studies provides unique opportunities for exploring the contributions of rare and noncoding sequence variants to phenotypic variation. Furthermore, combining TOPMed haplotypes with modern imputation methods improves the power and reach of genome-wide association studies to include variants down to a frequency of approximately 0.01%.

Genome diversity in Ukraine

GigaScience ◽

10.1093/gigascience/giaa159 ◽

2021 ◽

Vol 10 (1) ◽

Author(s):

Taras K Oleksyk ◽

Walter W Wolfsberger ◽

Alexandra M Weber ◽

Khrystyna Shchubelka ◽

Olga T Oleksyk ◽

...

Keyword(s):

Sequence Data ◽

Copy Number Variations ◽

Genomic Variation ◽

High Coverage ◽

Genome Data ◽

New Information ◽

Genome Wide ◽

Public Data ◽

Genome Wide Data ◽

Multiple Samples

Background: The main goal of this collaborative effort is to provide genome-wide data for the previously underrepresented population in Eastern Europe, and to provide cross-validation of the data from genome sequences and genotypes of the same individuals acquired by different technologies. We collected 97 genome-grade DNA samples from individuals representing major regions of Ukraine who consented to public data release. BGISEQ-500 sequence data and genotypes by an Illumina GWAS chip were cross-validated on multiple samples and additionally referenced to 1 sample that has been resequenced by Illumina NovaSeq6000 S4 at high coverage. Results: The genome data have been searched for genomic variation represented in this population, and a number of variants have been reported: large structural variants, indels, copy number variations, single-nucleotide polymorphisms, and microsatellites. To our knowledge, this study provides the largest to-date survey of genetic variation in Ukraine, creating a public reference resource aiming to provide data for medical research in a large understudied population. Conclusions: Our results indicate that the genetic diversity of the Ukrainian population is uniquely shaped by evolutionary and demographic forces and cannot be ignored in future genetic and biomedical studies. These data will contribute a wealth of new information, bringing forth many novel, endemic and medically related alleles.

Divergence and gene flow among Darwin's finches: A genome-wide view of adaptive radiation driven by interspecies allele sharing

BioEssays ◽

10.1002/bies.201500047 ◽

2015 ◽

Vol 37 (9) ◽

pp. 968-974 ◽

Cited By ~ 13

Author(s):

Daniela H. Palmer ◽

Marcus R. Kronforst

Keyword(s):

Gene Flow ◽

Adaptive Radiation ◽

Allele Sharing ◽

Darwin's Finches ◽

Genome Wide ◽

A Genome

A curated dataset of modern and ancient high-coverage shotgun human genomes

Scientific Data ◽

10.1038/s41597-021-00980-1 ◽

2021 ◽

Vol 8 (1) ◽

Author(s):

Pierpaolo Maisano Delser ◽

Eppie R. Jones ◽

Anahit Hovhannisyan ◽

Lara Cassidy ◽

Ron Pinhasi ◽

...

Keyword(s):

Sequence Data ◽

Whole Genome ◽

Reference Dataset ◽

High Coverage ◽

Sample Distribution ◽

Human Samples ◽

Human Genomes ◽

Genome Wide ◽

Genome Wide Data ◽

Computationally Intensive

Over the last few years, genome-wide data for a large number of ancient human samples have been collected. Whilst datasets of captured SNPs have been collated, high coverage shotgun genomes (which are relatively few but allow certain types of analyses not possible with ascertained captured SNPs) have to be reprocessed by individual groups from raw reads. This task is computationally intensive. Here, we release a dataset including 35 whole-genome sequenced samples, previously published and distributed worldwide, together with the genetic pipeline used to process them. The dataset contains 72,041,355 sites called across 19 ancient and 16 modern individuals and includes sequence data from four previously published ancient samples which we sequenced to higher coverage (10–18x). Such a resource will allow researchers to analyse their new samples with the same genetic pipeline and directly compare them to the reference dataset without re-processing published samples. Moreover, this dataset can be easily expanded to increase the sample distribution both across time and space.

Mismatch induced speciation in Salmonella: model and data

Philosophical Transactions of the Royal Society B Biological Sciences ◽

10.1098/rstb.2006.1925 ◽

2006 ◽

Vol 361 (1475) ◽

pp. 2045-2053 ◽

Cited By ~ 82

Author(s):

Daniel Falush ◽

Mia Torpdahl ◽

Xavier Didelot ◽

Donald F Conrad ◽

Daniel J Wilson ◽

...

Keyword(s):

New Species ◽

Natural Selection ◽

DNA Sequence ◽

Computer Simulations ◽

De Novo ◽

Sequence Data ◽

Biological Species ◽

Species Boundaries ◽

Population Subdivision ◽

Geographical Population

In bacteria, DNA sequence mismatches act as a barrier to recombination between distantly related organisms and can potentially promote the cohesion of species. We have performed computer simulations which show that the homology dependence of recombination can cause de novo speciation in a neutrally evolving population once a critical population size has been exceeded. Our model can explain the patterns of divergence and genetic exchange observed in the genus Salmonella, without invoking either natural selection or geographical population subdivision. If this model were validated, based on extensive sequence data, it would imply that the named subspecies of Salmonella enterica correspond to good biological species, making species boundaries objective. However, multilocus sequence typing data, analysed using several conventional tools, provide a misleading impression of relationships within S. enterica subspecies enterica and do not provide the resolution to establish whether new species are presently being formed.

Transcriptome analysis of Schistosoma mansoni larval development using serial analysis of gene expression (SAGE)

Parasitology ◽

10.1017/s0031182009005733 ◽

2009 ◽

Vol 136 (5) ◽

pp. 469-485 ◽

Cited By ~ 22

Author(s):

A. S. TAFT ◽

J. J. VERMEIRE ◽

J. BERNIER ◽

S. R. BIRKELAND ◽

M. J. CIPRIANO ◽

...

Keyword(s):

Gene Expression ◽

Sequence Data ◽

Subsequent Development ◽

Differentially Expressed ◽

cDNA Libraries ◽

Genome Wide ◽

A Genome ◽

Genome Wide Expression ◽

Cell Conditioned Medium

Infection of the snail, Biomphalaria glabrata, by the free-swimming miracidial stage of the human blood fluke, Schistosoma mansoni, and its subsequent development to the parasitic sporocyst stage is critical to establishment of viable infections and continued human transmission. We performed a genome-wide expression analysis of the S. mansoni miracidia and developing sporocyst using Long Serial Analysis of Gene Expression (LongSAGE). Five cDNA libraries were constructed from miracidia and in vitro cultured 6- and 20-day-old sporocysts maintained in sporocyst medium (SM) or in SM conditioned by previous cultivation with cells of the B. glabrata embryonic (Bge) cell line. We generated 21 440 SAGE tags and mapped 13 381 to the S. mansoni gene predictions (v4.0e) either by estimating theoretical 3′ UTR lengths or using existing 3′ EST sequence data. Overall, 432 transcripts were found to be differentially expressed amongst all 5 libraries. In total, 172 tags were differentially expressed between miracidia and 6-day conditioned sporocysts and 152 were differentially expressed between miracidia and 6-day unconditioned sporocysts. In addition, 53 and 45 tags, respectively, were differentially expressed in 6-day and 20-day cultured sporocysts, due to the effects of exposure to Bge cell-conditioned medium.

Creation and evaluation of full-text literature-derived, feature-weighted disease models of genetically determined developmental disorders

10.1101/2021.11.04.21265878 ◽

2021 ◽

Author(s):

Thabo Michael Yates ◽

Antoine Lain ◽

Jamie Campbell ◽

T. Ian Simpson ◽

David R FitzPatrick

Keyword(s):

Literature Review ◽

Full Text ◽

Developmental Disorders ◽

Sequence Data ◽

ROC Curves ◽

Disease Models ◽

Human Phenotype ◽

Automated Method ◽

Genome Wide ◽

Genetically Determined

There are >2500 different genetically-determined developmental disorders (DD), which, as a group, show very high levels of both locus and allelic heterogeneity. This has led to the widespread use of evidence-based filtering of genome-wide sequence data as a diagnostic tool in DD. Determining whether the association of a filtered variant at a specific locus is a plausible explanation of the phenotype in the proband is crucial and commonly requires extensive manual literature review by both clinical scientists and clinicians. Access to a database of weighted clinical features extracted from rigorously curated literature would increase the efficiency of this process and facilitate the development of robust phenotypic similarity metrics. However, given the large and rapidly increasing volume of published information, conventional biocuration approaches are becoming impractical. Here, we present a scalable, automated method for extraction of categorical phenotypic descriptors from full-text literature. Papers identified through literature review were downloaded and parsed using the Cadmus custom retrieval package. Human Phenotype Ontology terms were extracted using MetaMap, with 76-83% precision and 72-81% recall. Mean terms per paper increased from 9 in title + abstract to 69 using full text. We demonstrate that these literature-derived disease models plausibly reflect true disease expressivity more accurately than gold standard manually-curated models, through comparison with prospectively gathered data from the Deciphering Developmental Disorders study. AUC for ROC curves increased by 5-10% through use of literature-derived models. This work shows that scalable automated literature curation increases performance and adds weight to the need for this strategy to be integrated into informatic variant analysis pipelines.

Introduction of Neolophiotrema xiaokongense gen. et sp. nov. to the poorly represented Anteagloniaceae (Pleosporales, Dothideomycetes)

Phytotaxa ◽

10.11646/phytotaxa.482.1.3 ◽

2021 ◽

Vol 482 (1) ◽

pp. 25-35

Author(s):

GUANG-CONG REN ◽

DHANUSHKA N. WANASINGHE ◽

JUTAMART MONKAI ◽

KEVIN D. HYDE ◽

PETER E. MORTIMER ◽

...

Keyword(s):

Phylogenetic Analysis ◽

Sequence Data ◽

Species Boundaries ◽

Identification Of Species

The monotypic Neolophiotrema (typified by N. xiaokongense) is introduced for a wood-inhabiting taxon classified in Dothideomycetes. The genus is characterized by coriaceous, immersed to semi-immersed ascomata, hamathecium with cellular pseudoparaphyses and overlapping 1–2-seriate, hyaline ascospores. Phylogenetic analysis of combined SSU, LSU, ITS, tef1-α and rpb2 sequence data supports the placement of Neolophiotrema in Anteagloniaceae (Pleosporales). A morphology-based synopsis key is provided to facilitate the identification of species of Anteagloniaceae. The classification and nature of species boundaries in Anteagloniaceae are discussed.