Recoverability of Ancestral Recombination Graph Topologies

Recombination is a powerful evolutionary process that shapes the genetic diversity observed in the populations of many species. Reconstructing genealogies in the presence of recombination from sequencing data is a very challenging problem, as this relies on mutations having occurred on the correct lineages in order to detect the recombination and resolve the placement of edges in the local trees. We investigate the probability of recovering the true topology of ancestral recombination graphs (ARGs) under the coalescent with recombination and gene conversion. We explore how sample size and mutation rate affect the inherent uncertainty in reconstructed ARGs; this sheds light on the theoretical limitations of ARG reconstruction methods. We illustrate our results using estimates of evolutionary rates for several biological organisms; in particular, we find that for parameter values that are realistic for SARS-CoV-2, the probability of reconstructing genealogies that are close to the truth is low.

Investigation of Ongoing Recombination Through Genealogical Reconstruction for SARS-CoV-2

10.1101/2021.01.21.427579 ◽

2021 ◽

Author(s):

Anastasia Ignatieva ◽

Jotun Hein ◽

Paul A. Jenkins

Keyword(s):

South Africa ◽

Genetic Diversity ◽

Genetic Recombination ◽

Phylogenetic Analyses ◽

Evolutionary Process ◽

Recurrent Mutation ◽

Challenging Problem ◽

Sequencing Data ◽

Viral Pathogen ◽

History Of

Abstract. The evolutionary process of genetic recombination has the potential to rapidly change the properties of a viral pathogen, and its presence is a crucial factor to consider in the development of treatments and vaccines. It can also significantly affect the results of phylogenetic analyses and the inference of evolutionary rates. The detection of recombination from samples of sequencing data is a very challenging problem, and is further complicated for SARS-CoV-2 by its relatively slow accumulation of genetic diversity. The extent to which recombination is ongoing for SARS-CoV-2 is not yet resolved. To address this, we use a parsimony-based method to reconstruct possible genealogical histories for samples of SARS-CoV-2 sequences, which enables the analysis of recombination events that could have generated the data. We propose a framework for disentangling the effects of recurrent mutation from recombination in the history of a sample, and hence provide a way of estimating the probability that ongoing recombination is present. We apply this to samples of sequencing data collected in England and in South Africa, and find evidence of ongoing recombination.

Evolutionary process unveiled by the maximum genetic diversity hypothesis

Hereditas (Beijing) ◽

10.3724/sp.j.1005.2013.00599 ◽

2013 ◽

Vol 35 (5) ◽

pp. 599-606 ◽

Cited By ~ 1

Author(s):

Yi-Min HUANG ◽

Meng-Ying XIA ◽

Shi HUANG

Keyword(s):

Genetic Diversity ◽

Evolutionary Process ◽

Maximum Genetic Diversity

Distinguishing linear and branched evolution given single-cell DNA sequencing data of tumors

Algorithms for Molecular Biology ◽

10.1186/s13015-021-00194-5 ◽

2021 ◽

Vol 16 (1) ◽

Author(s):

Leah L. Weber ◽

Mohammed El-Kebir

Keyword(s):

DNA Sequencing ◽

Single Cell ◽

Evolutionary Process ◽

Treatment Decision ◽

Real Data ◽

Current Data ◽

Fast Method ◽

Sequencing Data ◽

Evolutionary Trajectory ◽

Cancer Types

Abstract. Background: Cancer arises from an evolutionary process where somatic mutations give rise to clonal expansions. Reconstructing this evolutionary process is useful for treatment decision-making as well as understanding evolutionary patterns across patients and cancer types. In particular, classifying a tumor’s evolutionary process as either linear or branched and understanding what cancer types and which patients have each of these trajectories could provide useful insights for both clinicians and researchers. While comprehensive cancer phylogeny inference from single-cell DNA sequencing data is challenging due to limitations with current sequencing technology and the complexity of the resulting problem, current data might provide sufficient signal to accurately classify a tumor’s evolutionary history as either linear or branched. Results: We introduce the Linear Perfect Phylogeny Flipping (LPPF) problem as a means of testing two alternative hypotheses for the pattern of evolution, and we prove that this problem is NP-hard. We develop Phyolin, which uses constraint programming to solve the LPPF problem. Through both in silico experiments and application to real data, we demonstrate that our method outperforms a competing machine learning approach. Conclusion: Phyolin is an accurate, easy-to-use and fast method for classifying an evolutionary trajectory as linear or branched given a tumor’s single-cell DNA sequencing data.
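
To make the "linear" case concrete, the following is a minimal sketch of the error-free version of the question only: given a binary cell-by-mutation matrix, does it fit a linear perfect phylogeny? It assumes rows are cells, columns are mutations, and entries are 0/1 with no sequencing errors, so it does not address the flipping step of LPPF and is not Phyolin's constraint-programming method. The function name and the toy matrices are hypothetical.

```python
from typing import List


def is_linear_perfect_phylogeny(matrix: List[List[int]]) -> bool:
    """Check whether an error-free 0/1 matrix (rows = cells, columns =
    mutations) is consistent with a linear (chain-shaped) phylogeny.

    In a chain, mutations are acquired one after another, so the sets of
    cells carrying each mutation must be nested by inclusion.
    """
    if not matrix:
        return True
    n_mutations = len(matrix[0])
    # For each mutation, collect the indices of the cells that carry it.
    carriers = [
        frozenset(i for i, row in enumerate(matrix) if row[j])
        for j in range(n_mutations)
    ]
    # Sort from most to least widespread; in a chain each carrier set
    # must contain the next one in this order.
    carriers.sort(key=len, reverse=True)
    return all(small <= big for big, small in zip(carriers, carriers[1:]))


# Hypothetical toy data: three cells, three mutations.
linear_example = [[1, 0, 0], [1, 1, 0], [1, 1, 1]]
branched_example = [[1, 1, 0], [1, 0, 1], [1, 0, 0]]

print(is_linear_perfect_phylogeny(linear_example))
print(is_linear_perfect_phylogeny(branched_example))
```

In the second matrix the last two mutations occur in disjoint sets of cells, which is the signature of branching under these assumptions.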

Recombination and Gene Conversion in a 170-kb Genomic Region of Arabidopsis thaliana

Genetics ◽

10.1093/genetics/161.3.1269 ◽

2002 ◽

Vol 161 (3) ◽

pp. 1269-1278 ◽

Cited By ~ 7

Author(s):

Bernhard Haubold ◽

Jürgen Kroymann ◽

Andreas Ratzka ◽

Thomas Mitchell-Olds ◽

Thomas Wiehe

Keyword(s):

Genetic Diversity ◽

Arabidopsis Thaliana ◽

Linkage Disequilibrium ◽

Gene Conversion ◽

Genomic Sequence ◽

Genomic Region ◽

Resistance To Herbivory ◽

Linkage Disequilibria ◽

Coalescent Simulations ◽

The Mean

Abstract. Arabidopsis thaliana is a highly selfing plant that nevertheless appears to undergo substantial recombination. To reconcile its selfing habit with the observations of recombination, we have sampled the genetic diversity of A. thaliana at 14 loci of ~500 bp each, spread across 170 kb of genomic sequence centered on a QTL for resistance to herbivory. A total of 170 of the 6321 nucleotides surveyed were polymorphic, with 169 being biallelic. The mean silent genetic diversity (πs) varied between 0.001 and 0.03. Pairwise linkage disequilibria between the polymorphisms were negatively correlated with distance, although this effect vanished when only pairs of polymorphisms with four haplotypes were included in the analysis. The absence of a consistent negative correlation between distance and linkage disequilibrium indicated that gene conversion might have played an important role in distributing genetic diversity throughout the region. We tested this by coalescent simulations and estimated that up to 90% of recombination is due to gene conversion.
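
As an illustration of the two quantities mentioned in this abstract, the sketch below computes the four-haplotype (four-gamete) condition and the pairwise linkage disequilibrium statistic r² for two biallelic sites. It assumes phased haplotypes coded 0/1 and is not the authors' analysis pipeline; the function names and the toy sites are hypothetical.

```python
from typing import Sequence


def has_four_haplotypes(site_a: Sequence[int], site_b: Sequence[int]) -> bool:
    """True if all four two-site haplotypes (00, 01, 10, 11) are observed."""
    return len(set(zip(site_a, site_b))) == 4


def r_squared(site_a: Sequence[int], site_b: Sequence[int]) -> float:
    """Squared correlation between alleles at two biallelic sites."""
    n = len(site_a)
    p_a = sum(site_a) / n                                  # frequency of allele 1 at site A
    p_b = sum(site_b) / n                                  # frequency of allele 1 at site B
    p_ab = sum(a & b for a, b in zip(site_a, site_b)) / n  # frequency of haplotype 11
    d = p_ab - p_a * p_b                                   # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    return d * d / denom


# Hypothetical toy haplotypes (one entry per sampled chromosome).
site_1 = [0, 0, 1, 1]
site_2 = [0, 1, 0, 1]  # all four haplotypes occur with site_1
site_3 = [0, 0, 1, 1]  # perfectly associated with site_1

print(has_four_haplotypes(site_1, site_2), r_squared(site_1, site_2))
print(has_four_haplotypes(site_1, site_3), r_squared(site_1, site_3))
```

Without recurrent mutation, a pair of sites showing all four haplotypes requires at least one recombination or gene conversion event between them, which is why such pairs are analysed separately.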

Whole genome sequencing reveals high differentiation, low levels of genetic diversity and short runs of homozygosity among Swedish wels catfish

Heredity ◽

10.1038/s41437-021-00438-5 ◽

2021 ◽

Author(s):

Axel Jensen ◽

Mette Lillie ◽

Kristofer Bergström ◽

Per Larsson ◽

Jacob Höglund

Keyword(s):

Genetic Diversity ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Whole Genome Sequencing Data ◽

Peripheral Populations ◽

Whole Genome ◽

Runs Of Homozygosity ◽

Sequencing Data ◽

Isolated Populations ◽

Native Populations

Abstract. The use of genetic markers in the context of conservation is largely being outcompeted by whole-genome data. Comparative studies between the two are sparse, and knowledge about the potential effects of this methodological shift is limited. Here, we used whole-genome sequencing data to assess the genetic status of peripheral populations of the wels catfish (Silurus glanis), and discuss the results in light of a recent microsatellite study of the same populations. The Swedish populations of the wels catfish have suffered from severe declines during the last centuries and persist in only a few isolated water systems. Fragmented populations generally are at greater risk of extinction, for example due to loss of genetic diversity, and may thus require conservation actions. We sequenced individuals from the three remaining native populations (Båven, Emån, and Möckeln) and one reintroduced population of admixed origin (Helge å), and found that genetic diversity was highest in Emån but low overall, with strong differentiation among the populations. No signature of recent inbreeding was found, but a considerable number of short runs of homozygosity were present in all populations, likely linked to historically small population sizes and bottleneck events. Genetic substructure within any of the native populations was at best weak. Individuals from the admixed population Helge å shared most genetic ancestry with the Båven population (72%). Our results are largely in agreement with the microsatellite study, and stress the need to protect these isolated populations at the northern edge of the distribution of the species.
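
For readers unfamiliar with runs of homozygosity, the following is a deliberately simplified sketch of the idea: scan one individual's genotypes along a chromosome and report stretches of consecutive homozygous calls. It assumes genotypes are coded as alternate-allele counts (0, 1, 2, with None for a missing call) and measures run length in markers, whereas real analyses typically work in base pairs and tolerate occasional heterozygous or missing calls. The function name, coding and threshold are assumptions, not the method used in the study.

```python
from typing import List, Optional, Sequence, Tuple


def homozygous_runs(
    genotypes: Sequence[Optional[int]], min_markers: int
) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs, end exclusive, for every stretch of
    at least `min_markers` consecutive homozygous calls (0 or 2).

    Heterozygous (1) and missing (None) calls both end the current run.
    """
    runs: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, genotype in enumerate(genotypes):
        if genotype in (0, 2):
            if start is None:
                start = i  # a new run begins here
        else:
            if start is not None and i - start >= min_markers:
                runs.append((start, i))
            start = None
    # Close a run that extends to the end of the chromosome.
    if start is not None and len(genotypes) - start >= min_markers:
        runs.append((start, len(genotypes)))
    return runs


# Hypothetical genotypes for one individual along one chromosome.
example_genotypes = [0, 0, 2, 1, 0, 0, 0, 0, 2, 1, 0]
print(homozygous_runs(example_genotypes, min_markers=3))
```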

Phylogenetic Analysis: Basic Concepts and Its Use as a Tool for Virology and Molecular Epidemiology

Acta Scientiae Veterinariae ◽

10.22456/1679-9216.81158 ◽

2018 ◽

Vol 44 (1) ◽

pp. 20

Author(s):

Eloiza Teles Caldart ◽

Helena Mata ◽

Cláudio Wageck Canal ◽

Ana Paula Ravazzolo

Keyword(s):

Phylogenetic Analysis ◽

Amino Acid ◽

Molecular Epidemiology ◽

Phylogenetic Analyses ◽

Phylogenetic Reconstruction ◽

Evolutionary Process ◽

Amino Acid Sequences ◽

Evolutionary Models ◽

Reconstruction Methods ◽

Basic Concepts

Background: Phylogenetic analyses are an essential part in the exploratory assessment of nucleic acid and amino acid sequences. Particularly in virology, they are able to delineate the evolution and epidemiology of disease etiologic agents and/or the evolutionary path of their hosts. The objective of this review is to help researchers who want to use phylogenetic analyses as a tool in virology and molecular epidemiology studies, presenting the most commonly used methodologies, describing the importance of the different techniques, their peculiar vocabulary and some examples of their use in virology. Review: This article starts by presenting basic concepts of molecular epidemiology and molecular evolution, emphasizing their relevance in the context of viral infectious diseases. It presents a section on the vocabulary relevant to the subject, bringing readers to the minimum level of knowledge needed throughout this literature review. Within its main subject, the text explains what a molecular phylogenetic analysis is, starting from a multiple alignment of nucleotide or amino acid sequences. The different software used to perform multiple alignments may apply different algorithms. To build a phylogeny based on amino acid or nucleotide sequences it is necessary to produce a data matrix based on a model for nucleotide or amino acid replacement, also called an evolutionary model. There are a number of evolutionary models available, varying in complexity according to the number of parameters (transition, transversion, GC content, nucleotide position in the codon, among others). Some papers presented herein provide techniques that can be used to choose evolutionary models. After the model is chosen, the next step is to opt for a phylogenetic reconstruction method that best fits the available data and the selected model. Here we present the most common reconstruction methods currently used, describing their principles, advantages and disadvantages. Distance methods, for example, are simpler and faster; however, they do not provide reliable estimations when the sequences are highly divergent. The accuracy of the analysis with probabilistic models (neighbour joining, maximum likelihood and Bayesian inference) strongly depends on the adherence of the actual data to the chosen evolutionary model. Finally, we also explore topology confidence tests, especially the most used one, the bootstrap. To assist the reader, this review presents figures to explain specific situations discussed in the text and numerous examples of previously published scientific articles in virology that demonstrate the importance of the techniques discussed herein, as well as their judicious use. Conclusion: The DNA sequence is not only a record of phylogeny and divergence times, but also keeps signs of how the evolutionary process has shaped its history and also the elapsed time in the evolutionary process of the population. Analyses of genomic sequences by molecular phylogeny have demonstrated a broad spectrum of applications. It is important to note that for the different available data and different purposes of phylogenies, reconstruction methods and evolutionary models should be wisely chosen. This review provides a theoretical basis for the choice of evolutionary models and phylogenetic reconstruction methods best suited to each situation. In addition, it presents examples of diverse applications of molecular phylogeny in virology.
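
The remark about distance methods and highly divergent sequences can be illustrated with the simplest evolutionary model, Jukes-Cantor (JC69), chosen here only as an example and not taken from the review itself. The sketch below computes the raw proportion of differing sites (p-distance) between two aligned nucleotide sequences and the JC69-corrected distance d = -(3/4) ln(1 - 4p/3); the function names and sequences are hypothetical.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences,
    ignoring columns where either sequence has a gap or ambiguity code."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in "ACGT" and b in "ACGT"
    ]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor_distance(seq_a: str, seq_b: str) -> float:
    """JC69 distance in expected substitutions per site.

    The correction is undefined once p reaches 0.75 (saturation), in which
    case infinity is returned.
    """
    p = p_distance(seq_a, seq_b)
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1 - 4 * p / 3)


# Hypothetical aligned sequences differing at one of eight sites.
print(p_distance("ACGTACGT", "ACGTACGA"))
print(jukes_cantor_distance("ACGTACGT", "ACGTACGA"))
```

As p approaches 0.75 the corrected distance grows without bound, so small sampling errors in p translate into large errors in the estimate, which is one way to see why distances between highly divergent sequences are unreliable.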

Using single-cell cytometry to illustrate integrated multi-perspective evaluation of clustering algorithms using Pareto fronts

Bioinformatics ◽

10.1093/bioinformatics/btab038 ◽

2021 ◽

Author(s):

Givanna H Putri ◽

Irena Koprinska ◽

Thomas M Ashhurst ◽

Nicholas J C King ◽

Mark N Read

Keyword(s):

Single Cell ◽

Performance Metrics ◽

Clustering Algorithms ◽

Latin Hypercube Sampling ◽

Supplementary Information ◽

Sequencing Data ◽

Evaluation Protocol ◽

Benchmark Datasets ◽

Pareto Fronts ◽

Parameter Values

Abstract. Motivation: Many ‘automated gating’ algorithms now exist to cluster cytometry and single-cell sequencing data into discrete populations. Comparative algorithm evaluations on benchmark datasets rely either on a single performance metric, or a few metrics considered independently of one another. However, single metrics emphasize different aspects of clustering performance and do not rank clustering solutions in the same order. This underlies the lack of consensus between comparative studies regarding optimal clustering algorithms and undermines the translatability of results onto other non-benchmark datasets. Results: We propose the Pareto fronts framework as an integrative evaluation protocol, wherein individual metrics are instead leveraged as complementary perspectives. Judged superior are algorithms that provide the best trade-off between the multiple metrics considered simultaneously. This yields a more comprehensive and complete view of clustering performance. Moreover, by broadly and systematically sampling algorithm parameter values using the Latin Hypercube sampling method, our evaluation protocol minimizes (un)fortunate parameter value selections as confounding factors. Furthermore, it reveals how meticulously each algorithm must be tuned in order to obtain good results, vital knowledge for users with novel data. We exemplify the protocol by conducting a comparative study between three clustering algorithms (ChronoClust, FlowSOM and Phenograph) using four common performance metrics applied across four cytometry benchmark datasets. To our knowledge, this is the first time Pareto fronts have been used to evaluate the performance of clustering algorithms in any application domain. Availability and implementation: Implementation of our Pareto front methodology and all scripts and datasets to reproduce this article are available at https://github.com/ghar1821/ParetoBench. Supplementary information: Supplementary data are available at Bioinformatics online.
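
The core idea of a Pareto front can be sketched in a few lines. The example below is not the ParetoBench implementation: it assumes each clustering solution is scored on several metrics where higher is better, and keeps only the solutions that no other solution dominates. The configuration names and scores are made up for illustration.

```python
from typing import Dict, List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b if it is at least as good on every metric and
    strictly better on at least one (higher is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: Dict[str, Sequence[float]]) -> List[str]:
    """Names of the solutions that are not dominated by any other solution."""
    return [
        name
        for name, own in scores.items()
        if not any(
            dominates(other, own)
            for other_name, other in scores.items()
            if other_name != name
        )
    ]


# Hypothetical scores: (metric_1, metric_2) per parameter configuration.
example_scores = {
    "config_a": (0.90, 0.60),
    "config_b": (0.70, 0.80),
    "config_c": (0.65, 0.75),  # worse than config_b on both metrics
}

print(pareto_front(example_scores))
```

Neither of the two surviving configurations is better on both metrics, which is exactly the trade-off a single-metric ranking would hide.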

Design and performance of a bovine 200 k SNP chip developed for endangered German Black Pied cattle (DSN)

BMC Genomics ◽

10.1186/s12864-021-08237-2 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Guilherme B. Neumann ◽

Paula Korkuć ◽

Danny Arends ◽

Manuel J. Wolf ◽

Katharina May ◽

...

Keyword(s):

Genetic Diversity ◽

Association Studies ◽

Bos Indicus ◽

Diversity Management ◽

Whole Genome Sequencing Data ◽

General Interest ◽

Whole Genome ◽

Sequencing Data ◽

SNP Chip ◽

Selection Of

Abstract. Background: German Black Pied cattle (DSN) are an endangered dual-purpose breed that was largely replaced by Holstein cattle owing to the lower milk yield of DSN. DSN cattle are kept as a genetic reserve with a current herd size of around 2500 animals. The ability to track sequence variants specific to DSN could help to support the conservation of DSN’s genetic diversity and to provide avenues for genetic improvement. Results: Whole-genome sequencing data of 304 DSN cattle were used to design a customized DSN200k SNP chip harboring 182,154 variants (173,569 SNPs and 8585 indels) based on ten selection categories. We included variants of interest to DSN such as DSN unique variants and variants from previous association studies in DSN, but also variants of general interest such as variants with predicted consequences of high, moderate, or low impact on the transcripts and SNPs from the Illumina BovineSNP50 BeadChip. Further, the selection of variants based on haplotype blocks ensured that the whole genome was uniformly covered with an average variant distance of 14.4 kb on autosomes. Using 300 DSN and 162 animals from other cattle breeds including Holstein, endangered local cattle populations, and also a Bos indicus breed, performance of the SNP chip was evaluated. Altogether, 171,978 (94.31%) of the variants were successfully called in at least one of the analyzed breeds. In DSN, the number of successfully called variants was 166,563 (91.44%) while 156,684 (86.02%) were segregating at a minor allele frequency > 1%. The concordance rate between technical replicates was 99.83 ± 0.19%. Conclusion: The DSN200k SNP chip proved useful for DSN and other Bos taurus as well as one Bos indicus breed. It is suitable for genetic diversity management and marker-assisted selection of DSN animals. Moreover, variants that were segregating in other breeds can be used for the design of breed-specific customized SNP chips. This will be of great value in the application of conservation programs for endangered local populations in the future.
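
Two of the per-variant quantities reported above, call rate and minor allele frequency (MAF), are simple to compute once genotypes are in hand. The sketch below assumes one variant's genotypes across animals coded as alternate-allele counts (0, 1, 2) with None for a failed call; this coding, the function names and the toy data are assumptions, and the authors' own calling criteria are not reproduced here.

```python
from typing import Optional, Sequence


def call_rate(genotypes: Sequence[Optional[int]]) -> float:
    """Fraction of animals with a non-missing genotype at this variant."""
    return sum(g is not None for g in genotypes) / len(genotypes)


def minor_allele_frequency(genotypes: Sequence[Optional[int]]) -> float:
    """Frequency of the rarer allele among called diploid genotypes."""
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    alt_frequency = sum(called) / (2 * len(called))  # two alleles per animal
    return min(alt_frequency, 1 - alt_frequency)


# Hypothetical genotypes for one variant in eight animals.
variant_genotypes = [0, 1, 2, None, 0, 0, 1, 0]

print(call_rate(variant_genotypes))
print(minor_allele_frequency(variant_genotypes))
# A variant counts as segregating at MAF > 1% if this comparison is true.
print(minor_allele_frequency(variant_genotypes) > 0.01)
```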

First draft genome of loach (Oreonectes shuilongensis; Cypriniformes: Nemacheilidae) provides insights into the evolution of cavefish

10.21203/rs.3.rs-192229/v1 ◽

2021 ◽

Author(s):

Zhijin Liu ◽

Xuekun Qian ◽

Ziming Wang ◽

Huamei Wen ◽

Ling Han ◽

...

Keyword(s):

Single Molecule ◽

Genome Assembly ◽

Eye Development ◽

Draft Genome ◽

Evolutionary Process ◽

Integrated Approach ◽

Sequencing Data ◽

Retina Development ◽

Draft Genome Assembly ◽

Surface Dwelling

Abstract. Background: Loaches of the superfamily Cobitoidea (Cypriniformes, Nemacheilidae) are small elongated bottom-dwelling freshwater fishes with several barbels near the mouth. The genus Oreonectes with 18 currently recognized species contains representatives for all three key stages of the evolutionary process (a surface-dwelling lifestyle, facultative cave persistence, and permanent cave dwelling). Some Oreonectes species show typical cave dwelling-related traits, such as partial or complete leucism and regression of the eyes, rendering them suitable study objects of micro-evolution. Genome information of Oreonectes species is therefore an indispensable resource for research into the evolution of cavefishes. Results: Here we assembled the genome sequence of O. shuilongensis, a surface-dwelling species, using an integrated approach that combined PacBio single-molecule real-time sequencing and Illumina X-ten paired-end sequencing. Based on a total of 50.9 Gb of sequencing data, our genome assembly from Canu and Pilon spans approximately 515.64 Mb (estimated coverage of 100 ×), containing 803 contigs with an N50 of 5.58 Mb. A total of 25,247 protein-coding genes were predicted, of which 95.65% have been functionally annotated. We also performed genome re-sequencing of three additional cave-dwelling Oreonectes fishes. Twenty-nine pseudogenes annotated using DAVID showed significant enrichment for the GO terms of “eye development” and “retina development in camera-type eye”. It is presumed that these pseudogenes might lead to eye degeneration of semi/complete cave-dwelling Oreonectes species. Furthermore, Mc1r (melanocortin-1 receptor) is pseudogenized by a deletion in O. daqikongensis, likely blocking biosynthesis of melanin and leading to the albino phenotype. Conclusions: We here report the first draft genome assembly of Oreonectes fishes, which is also the first genome reference for Cobitoidea fishes. Pseudogenization of genes related to body color and eye development may be responsible for loss of pigmentation and vision deterioration in cave-dwelling species. This genome assembly will contribute to the study of the evolution and adaptation of fishes within Oreonectes and beyond (Cobitoidea).
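
The N50 statistic quoted for this assembly has a simple definition: it is the length of the contig at which, going from longest to shortest, the running total first reaches half of the total assembly length. A minimal sketch follows; the function name and the contig lengths are hypothetical and unrelated to the real assembly.

```python
from typing import Sequence


def n50(contig_lengths: Sequence[int]) -> int:
    """Length L such that contigs of length >= L together cover at least
    half of the total assembly length."""
    if not contig_lengths:
        raise ValueError("no contigs given")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:  # reached half of the assembly
            return length
    raise AssertionError("unreachable for non-empty input")


# Hypothetical contig lengths (arbitrary units), total 290.
example_contigs = [80, 70, 50, 40, 30, 20]
print(n50(example_contigs))  # 80 + 70 = 150 >= 145, so N50 is 70
```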

Characterization of Plastidial and Nuclear SSR Markers for Understanding Invasion Histories and Genetic Diversity of Schinus molle L.

Biology ◽

10.3390/biology7030043 ◽

2018 ◽

Vol 7 (3) ◽

pp. 43 ◽

Cited By ~ 2

Author(s):

Rafael Lemos ◽

Cristiane Matielo ◽

Dalvan Beise ◽

Vanessa da Rosa ◽

Deise Sarzi ◽

...

Keyword(s):

Genetic Diversity ◽

Invasive Plant ◽

Whole Genome Sequencing Data ◽

Natural Occurrence ◽

Sequencing Data ◽

Dispersal Capacity ◽

Schinus Molle ◽

High Dispersal ◽

History Of

Invasive plant species are expected to display high dispersal capacity but low levels of genetic diversity due to the founder effect occurring at each invasion episode. Understanding the history of invasions and the levels of genetic diversity of such species is an important task for planning management and monitoring strategies for these events. Peruvian Peppertree (Schinus molle L.) is a pioneer tree species native to South America that was introduced to North America, Europe and Africa, becoming a threat to these non-native habitats. In this study, we report the discovery and characterization of 17 plastidial (ptSSR) and seven nuclear (nSSR) markers for S. molle based on low-coverage whole-genome sequencing data acquired through next-generation sequencing. The markers were tested in 56 individuals from two natural populations sampled in the Brazilian Caatinga and Pampa biomes. All loci are moderately to highly polymorphic and proved suitable for genetic monitoring of new invasions, for understanding the history of old invasions, as well as for genetic studies of native populations in their natural occurrence range and of orchards established for commercial purposes.
