An ancestral recombination graph of human, Neanderthal, and Denisovan genomes

Nathan K. Schaefer; Beth Shapiro; Richard E. Green

doi:10.1126/sciadv.abc0776


Science Advances ◽

10.1126/sciadv.abc0776 ◽

2021 ◽

Vol 7 (29) ◽

pp. eabc0776

Author(s):

Nathan K. Schaefer ◽

Beth Shapiro ◽

Richard E. Green

Keyword(s):

Incomplete Lineage Sorting ◽

Simulated Data ◽

Modern Human ◽

Ancestral Recombination Graph ◽

Lineage Sorting ◽

Human Genomes ◽

Genome Wide ◽

A Genome ◽

Graph Inference ◽

And Function

Many humans carry genes from Neanderthals, a legacy of past admixture. Existing methods detect this archaic hominin ancestry within human genomes using patterns of linkage disequilibrium or direct comparison to Neanderthal genomes. Each of these methods is limited in sensitivity and scalability. We describe a new ancestral recombination graph inference algorithm that scales to large genome-wide datasets and demonstrate its accuracy on real and simulated data. We then generate a genome-wide ancestral recombination graph including human and archaic hominin genomes. From this, we generate a map within human genomes of archaic ancestry and of genomic regions not shared with archaic hominins either by admixture or incomplete lineage sorting. We find that only 1.5 to 7% of the modern human genome is uniquely human. We also find evidence of multiple bursts of adaptive changes specific to modern humans within the past 600,000 years involving genes related to brain development and function.
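An ancestral recombination graph must place a recombination event wherever the sampled haplotypes cannot share a single tree. A classic signal for this is the four-gamete test. Under the infinite-sites assumption, if two biallelic sites show all four combinations (00, 01, 10, 11), at least one recombination must have occurred between them. The sketch below is only an illustration of that idea. It is not the ARG inference algorithm described in the paper. The haplotype matrix is hypothetical and coded 0/1 per site.

```python
from itertools import combinations

def four_gamete_violations(haplotypes):
    """Return pairs of site indices (i, j) at which all four gametes
    (00, 01, 10, 11) occur among the 0/1-coded haplotypes.
    Under infinite sites, each such pair implies >= 1 recombination between i and j."""
    n_sites = len(haplotypes[0])
    violations = []
    for i, j in combinations(range(n_sites), 2):
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:
            violations.append((i, j))
    return violations

# Hypothetical haplotypes (rows) at four biallelic sites (columns).
haps = [
    [0, 0, 0, 1],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [1, 1, 1, 0],
]
print(four_gamete_violations(haps))  # [(0, 1), (1, 2), (1, 3)]
```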

Consistency and identifiability of the polymorphism-aware phylogenetic models

10.1101/718320 ◽

2019 ◽

Author(s):

Rui Borges ◽

Carolin Kosiol

Keyword(s):

Incomplete Lineage Sorting ◽

Simulated Data ◽

Population Level ◽

Species Tree ◽

Population Variation ◽

Necessary Condition ◽

Lineage Sorting ◽

Data Set ◽

Genome Wide ◽

Phylogenetic Models

Polymorphism-aware phylogenetic models (PoMo) constitute an alternative approach for species tree estimation from genome-wide data. PoMo builds on the standard substitution models of DNA evolution but expands the classic alphabet of the four nucleotide bases to include polymorphic states. By doing so, PoMo accounts for ancestral and current intra-population variation, while also accommodating the population-level processes that govern substitution (e.g., genetic drift, mutation, allelic selection). PoMo has been shown to be a valuable tool in several phylogenetic applications, but a proof of statistical consistency (and of identifiability, a necessary condition for consistency) has been lacking. Here, we prove that PoMo is identifiable and, using this result, we further show that the maximum a posteriori (MAP) tree estimator of PoMo is a consistent estimator of the species tree. We complement our theoretical results with a simulated data set mimicking the diversity observed in natural populations exhibiting incomplete lineage sorting. We implemented PoMo in a Bayesian framework and show that the MAP tree easily recovers the true tree for the numbers of sites typically sampled in genome-wide analyses.
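To make the "expanded alphabet" concrete, the sketch below assumes the usual PoMo formulation with a virtual population of size N. There are 4 fixed (boundary) states, one per nucleotide. For each of the 6 unordered nucleotide pairs there are also N − 1 polymorphic states, which gives 4 + 6(N − 1) states in total. Representing a state as a dict of allele counts is an illustrative choice and is not taken from any PoMo implementation.

```python
from itertools import combinations

NUCLEOTIDES = "ACGT"

def pomo_states(n):
    """Enumerate PoMo states for a virtual population of size n (n >= 2).
    Each state maps nucleotides to allele counts that sum to n."""
    states = [{b: n} for b in NUCLEOTIDES]        # 4 fixed (boundary) states
    for a, b in combinations(NUCLEOTIDES, 2):      # 6 unordered nucleotide pairs
        for i in range(1, n):                      # n - 1 polymorphic frequencies
            states.append({a: i, b: n - i})
    return states

states = pomo_states(10)
print(len(states))  # 58, i.e. 4 + 6 * (10 - 1)
```

The state space grows linearly in N, which is why a small virtual population size is typically used in place of the true census size.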

A genome-wide scan for a simulated data set using two newly developed methods

Genetic Epidemiology ◽

10.1002/gepi.13701707101 ◽

1999 ◽

Vol 17 (S1) ◽

pp. S621-S626

Author(s):

Li Hsu ◽

Corinne Aragaki ◽

Filemon Quiaoit ◽

Xiangjing Wang ◽

Xiubin Xu ◽

...

Keyword(s):

Simulated Data ◽

Data Set ◽

Genome Wide ◽

A Genome ◽

Genome Wide Scan

Analysis of MicroRNA Transcriptomes Insights into the miR-148a-3p Inducer of Rumen Development in Goats

10.21203/rs.3.rs-40834/v1 ◽

2020 ◽

Author(s):

Tao Zhong ◽

Cheng Wang ◽

Jiangtao Hu ◽

Xiaoyong Chen ◽

Lili Niu ◽

...

Keyword(s):

Signaling Pathway ◽

Target Genes ◽

Genetic Regulation ◽

Mapk Signaling ◽

Regulation Mechanism ◽

Ras Signaling ◽

Genome Wide ◽

A Genome ◽

Differentially Expressed Mirnas ◽

And Function

Background: The rumen is an important digestive organ of ruminants. From the fetal to the adult stage, the morphology, structure, and function of the rumen change significantly, but knowledge of the underlying genetic regulation is still limited. We previously reported a genome-wide expression profile of miRNAs in prenatal goat rumens. In the present study, we jointly analyzed the transcriptomes of rumen miRNAs during prenatal (E60 and E135) and postnatal (D30 and D150) stages. Results: A total of 66 differentially expressed miRNAs (DEMs) were identified in the rumen tissues from D30 and D150 goats. Of these, 17 DEMs were consistently highly expressed in the rumen at the preweaning stages (E60, E135, and D30) but down-regulated at D150. Notably, annotation analysis revealed that the target genes regulated by the DEMs were mainly enriched in the MAPK, Jak-STAT, and Ras signaling pathways. The expression of miR-148a-3p was significantly high at the embryonic stages and down-regulated at D150. Potential binding sites between miR-148a-3p and QKI were predicted with TargetScan and verified by a dual-luciferase reporter assay. In situ hybridization showed co-localization of miR-148a-3p and QKI in rumen tissues but not in intestinal tracts. Moreover, the expression of miR-148a-3p in the epithelium was significantly higher than in the other layers, suggesting that miR-148a-3p is involved in the development of rumen epithelial cells by targeting QKI. Subsequently, a miR-148a-3p inhibitor was found to induce the proliferation of GES-1 cells. Conclusions: Taken together, these results identified the DEMs involved in the development of the rumen and provided insight into the regulatory mechanisms of goat rumen development.

A Clustering Approach for Motif Discovery in ChIP-Seq Dataset

Entropy ◽

10.3390/e21080802 ◽

2019 ◽

Vol 21 (8) ◽

pp. 802

Author(s):

Chun-xiao Sun ◽

Yu Yang ◽

Hua Wang ◽

Wen-hu Wang

Keyword(s):

Dna Sequences ◽

Motif Discovery ◽

Simulated Data ◽

Data Set ◽

Genome Wide ◽

A Genome ◽

Wide Scale ◽

Clustering Approach ◽

Ap Clustering ◽

Generation Sequencing

Chromatin immunoprecipitation combined with next-generation sequencing (ChIP-Seq) has enabled the identification of transcription factor binding sites (TFBSs) on a genome-wide scale. To discover TFBSs effectively and efficiently among the thousand or more DNA sequences generated by a ChIP-Seq data set, we propose a new algorithm named AP-ChIP. First, we set two thresholds based on probabilistic analysis to construct and then filter the cluster subsets. Next, we apply Affinity Propagation (AP) clustering to the candidate cluster subsets to find potential motifs. Experimental results on simulated data show that AP-ChIP predicts TFBSs with near accuracy in a reasonable time. The validity of AP-ChIP is also tested on a real ChIP-Seq data set.
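The clustering step named above is Affinity Propagation. Below is a minimal sketch of the generic algorithm (Frey and Dueck, 2007) applied to short DNA words. It is not AP-ChIP itself: the two probabilistic thresholds are omitted. Negative Hamming distance as the similarity, the median similarity as the preference, the damping factor, and the toy sequences are all assumptions chosen for illustration.

```python
import numpy as np

def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))

def affinity_propagation(S, damping=0.7, max_iter=400):
    """Generic affinity propagation on similarity matrix S.
    The diagonal of S holds each point's preference for being an exemplar."""
    n = S.shape[0]
    idx = np.arange(n)
    R = np.zeros((n, n))  # responsibilities r(i, k)
    A = np.zeros((n, n))  # availabilities a(i, k)
    for _ in range(max_iter):
        # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        AS = A + S
        best = np.argmax(AS, axis=1)
        max1 = AS[idx, best]
        AS[idx, best] = -np.inf
        max2 = AS.max(axis=1)
        R_new = S - max1[:, None]
        R_new[idx, best] = S[idx, best] - max2
        R = damping * R + (1 - damping) * R_new
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k))) for i != k
        # a(k,k) = sum_{i' != k} max(0, r(i',k))
        Rp = np.maximum(R, 0)
        Rp[idx, idx] = R[idx, idx]
        A_new = Rp.sum(axis=0)[None, :] - Rp
        self_avail = A_new[idx, idx].copy()
        A_new = np.minimum(A_new, 0)
        A_new[idx, idx] = self_avail
        A = damping * A + (1 - damping) * A_new
    # Points whose self-availability plus self-responsibility is positive are exemplars.
    exemplars = np.flatnonzero(np.diag(A) + np.diag(R) > 0)
    if exemplars.size == 0:
        return exemplars, np.full(n, -1)
    labels = np.argmax(S[:, exemplars], axis=1)
    labels[exemplars] = np.arange(exemplars.size)
    return exemplars, labels

# Hypothetical 6-mers drawn from two motif-like groups.
seqs = ["ACGTAC", "ACGTAA", "ACGTCC", "TTTGGG", "TTTGGC", "TTAGGG"]
S = -np.array([[hamming(a, b) for b in seqs] for a in seqs], dtype=float)
off_diag = S[~np.eye(len(seqs), dtype=bool)]
np.fill_diagonal(S, np.median(off_diag))  # a common default preference
exemplars, labels = affinity_propagation(S)
print([seqs[e] for e in exemplars], labels)
```

AP does not need the number of clusters in advance. The preference on the diagonal controls how many exemplars emerge, which suits motif discovery, where the number of distinct motifs is unknown.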

Identification and function of hypoxia-response genes in Drosophila melanogaster

Physiological Genomics ◽

10.1152/physiolgenomics.00262.2005 ◽

2006 ◽

Vol 25 (1) ◽

pp. 134-141 ◽

Cited By ~ 46

Author(s):

Guowen Liu ◽

Julianne Roy ◽

Eric A. Johnson

Keyword(s):

Recovery Time ◽

Normal Activity ◽

Mrna Levels ◽

Low Oxygen ◽

Hypoxia Response ◽

Oxygen Treatment ◽

Genome Wide ◽

A Genome ◽

And Function ◽

Genome Wide Study

Hypoxia, an insufficient level of oxygen in the cell, occurs during normal activity and also in pathological conditions such as ischemia and tumorigenesis. Although many hypoxia-response genes have been identified, an understanding of the functional role of these genes in the living animal is lacking. Here we present a genome-wide study of gene expression changes during hypoxia and then functionally test a subset of these genes for roles in survival of and recovery from hypoxia. We found 79 genes with increased mRNA levels when adult flies were treated with 0.5% O2 for 6 h. A subset of these genes had detectably increased levels after as little as 1 h of low-oxygen treatment. Mild hypoxia increased transcript levels of only 20 genes. Viability during hypoxia and recovery time from hypoxia-induced paralysis were examined in flies with reduced activity of hypoxia-response genes. The decreased viability and increased recovery time from paralysis observed in many of the lines demonstrate that the increased transcript levels seen after hypoxia are important for the response to low oxygen.

C4 photosynthetic evolution : sub-types, diversity, and function within the grass tribe Paniceae

10.32469/10355/64234 ◽

2017 ◽

Author(s):


Jacob Daniel Washburn

Keyword(s):

Incomplete Lineage Sorting ◽

Strong Support ◽

Cell Types ◽

Bioenergy Crops ◽

Comparative Genomic ◽

Lineage Sorting ◽

Dna And Rna ◽

Genomic Studies ◽

Rna Expression Profiling ◽

And Function

Most plants convert sunlight into chemical energy using a process known as C3 photosynthesis. However, some of the world's most successful plants instead use the C4 photosynthetic pathway, which allows them to use water, nitrogen, and solar energy more efficiently. In the past 30 million years, C4 photosynthesis has evolved convergently from C3 more than 60 times, and new lineages are in the process of evolving even today. Because of this complex evolutionary history, C4 is not "one" uniform photosynthetic type but a diverse collection of photosynthetic sub-types that are classically grouped according to their use of three different biochemical pathways. The grass tribe Paniceae is especially interesting in this respect because it contains all three of these biochemical sub-types as well as important food and bioenergy crops. To better understand the evolution of C4 photosynthesis, DNA and RNA sequencing were undertaken for various species within the Paniceae, and the data were used for phylogenetic and comparative genomic studies. Cell type-specific RNA expression profiling of the two major C4 cell types was also completed for representative species of each C4 sub-type. Streamlined bioinformatics pipelines for both chloroplast and nuclear phylogenetics were developed to process the data. These analyses resulted in: 1) the first "genome-scale" phylogenetic tree of the grass tribe Paniceae, 2) the clearest evidence to date of the evolutionary relationships among the three classically defined C4 sub-types, 3) the most convincing results to date that the chloroplast and nuclear phylogenies of the Paniceae are incongruent, 4) evidence that this chloroplast-nuclear incongruence is likely due to introgression and/or incomplete lineage sorting, and 5) strong support for sub-type mixing as well as for the existence of a PCK sub-type.

Divergence estimation in the presence of incomplete lineage sorting and migration

10.1101/174342 ◽

2017 ◽

Author(s):

Graham Jones

Keyword(s):

Incomplete Lineage Sorting ◽

Simulated Data ◽

Species Tree ◽

Lineage Sorting ◽

Data Set ◽

Migration Rates ◽

Isolation With Migration ◽

Multispecies Coalescent ◽

Tree Inference ◽

And Migration

This paper focuses on the problem of estimating a species tree from multilocus data in the presence of incomplete lineage sorting and migration. We develop a mathematical model of the relevant evolutionary processes, similar to IMa2 (Hey 2010), which allows both the population size parameters and the migration rates between pairs of species tree branches to be integrated out. We then describe DENIM, a BEAST2 package that is based on this model and uses an approximation to sample from the posterior. The approximation assumes that migrations are rare, and it samples only from those regions of the posterior that seem likely given this assumption. The method breaks down if there is a lot of migration. Using simulations, Leaché et al. (2014) showed that migration causes problems for species tree inference under the multispecies coalescent when it is present but ignored. We re-analyze these simulated data to explore DENIM's performance and demonstrate substantial improvements over *BEAST. We also re-analyze an empirical data set. [isolation-with-migration; incomplete lineage sorting; multispecies coalescent; species tree; phylogenetic analysis; Bayesian; Markov chain Monte Carlo]
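Incomplete lineage sorting is why gene trees can disagree with the species tree even without migration. For the simplest case, the multispecies coalescent gives a standard closed form. Take a rooted three-taxon species tree ((A,B),C) with one lineage sampled per species and an internal branch of length t coalescent units. Each of the two discordant gene-tree topologies then has probability e^(−t)/3, and the matching topology has probability 1 − (2/3)e^(−t). The sketch below only evaluates this textbook result. It is not DENIM's model, which additionally integrates over population sizes and migration rates.

```python
import math

def rooted_triplet_probs(t):
    """Gene-tree topology probabilities for species tree ((A,B),C) under the
    multispecies coalescent, one lineage per species.
    t: internal branch length in coalescent units (generations / 2N for diploids)."""
    if t < 0:
        raise ValueError("branch length must be non-negative")
    discordant = math.exp(-t) / 3.0  # probability of each discordant topology
    return {"((A,B),C)": 1.0 - 2.0 * discordant,
            "((A,C),B)": discordant,
            "((B,C),A)": discordant}

for t in (0.0, 0.5, 2.0):
    print(t, rooted_triplet_probs(t))
```

At t = 0 all three topologies are equally likely, and discordance decays exponentially as the internal branch lengthens. Unmodeled migration can produce discordance patterns that this ILS-only expectation does not explain.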

Co-estimating Reticulate Phylogenies and Gene Trees from Multi-locus Sequence Data

10.1101/095539 ◽

2016 ◽

Cited By ~ 6

Author(s):

Dingqiao Wen ◽

Luay Nakhleh

Keyword(s):

Gene Flow ◽

Incomplete Lineage Sorting ◽

Phylogenetic Network ◽

Simulated Data ◽

Biological Data ◽

Generative Model ◽

Divergence Times ◽

Gene Trees ◽

Lineage Sorting ◽

Coalescence Times

The multispecies network coalescent (MSNC) is a stochastic process that captures how gene trees grow within the branches of a phylogenetic network. Coupling the MSNC with a stochastic mutational process that operates along the branches of the gene trees gives rise to a generative model of how multiple loci from within and across species evolve in the presence of both incomplete lineage sorting (ILS) and reticulation (e.g., hybridization). We report on a Bayesian method for sampling the parameters of this generative model, including the species phylogeny, gene trees, divergence times, and population sizes, from DNA sequences of multiple independent loci. We demonstrate the utility of our method by analyzing simulated data and reanalyzing three biological data sets. Our results demonstrate the significance of not only co-estimating species phylogenies and gene trees, but also accounting for reticulation and ILS simultaneously. In particular, we show that when gene flow occurs, our method accurately estimates the evolutionary histories, coalescence times, and divergence times. Tree inference methods, on the other hand, underestimate divergence times and overestimate coalescence times when the evolutionary history is reticulate. While the MSNC corresponds to an abstract model of “intermixture,” we study the performance of the model and method on simulated data generated under a gene flow model. We show that the method accurately infers the most recent time at which gene flow occurs. Finally, we demonstrate the application of the new method to a 106-locus yeast data set. [Multispecies network coalescent; reticulation; incomplete lineage sorting; phylogenetic network; Bayesian inference; RJMCMC.]
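As a toy illustration of how the MSNC combines reticulation with ILS, consider the simplest case. Assume a hypothetical three-taxon network in which B is a hybrid, with one lineage sampled per species. B's single lineage is inherited from the A side with inheritance probability γ, or from the C side with probability 1 − γ. Once that choice is made, the lineages coalesce as in the corresponding displayed tree, ((A,B),C) with internal branch t1 or (A,(B,C)) with internal branch t2. The gene-tree distribution is then a γ-weighted mixture of the two trees' multispecies-coalescent distributions. The names and parameters below are assumptions for illustration. This is not the Bayesian sampler described above.

```python
import math

PAIRS = ("AB", "AC", "BC")  # gene-tree topology labelled by the first-coalescing pair

def triplet_probs(cherry, t):
    """MSC gene-tree probabilities for a rooted triplet species tree whose
    cherry is `cherry` and whose internal branch is t coalescent units."""
    d = math.exp(-t) / 3.0
    return {p: (1.0 - 2.0 * d if p == cherry else d) for p in PAIRS}

def network_gene_tree_probs(gamma, t1, t2):
    """Hypothetical network: hybrid B inherits from the A side with
    probability gamma (displayed tree ((A,B),C), internal branch t1) or from
    the C side with probability 1 - gamma (tree (A,(B,C)), internal branch t2)."""
    tree1 = triplet_probs("AB", t1)
    tree2 = triplet_probs("BC", t2)
    return {p: gamma * tree1[p] + (1.0 - gamma) * tree2[p] for p in PAIRS}

print(network_gene_tree_probs(gamma=0.7, t1=1.0, t2=0.5))
```

A single bifurcating tree cannot reproduce such a mixture in general, which is consistent with the point above that tree-based methods misestimate times when the history is reticulate.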

Practical Aspects of Phylogenetic Network Analysis Using PhyloNet

10.1101/746362 ◽

2019 ◽

Author(s):

Zhen Cao ◽

Xinhao Liu ◽

Huw A. Ogilvie ◽

Zhi Yan ◽

Luay Nakhleh

Keyword(s):

Incomplete Lineage Sorting ◽

Phylogenetic Network ◽

Synthetic Data ◽

Simulated Data ◽

Single Species ◽

Phylogenetic Networks ◽

Lineage Sorting ◽

Data Set ◽

Types Of Information ◽

Analyze Data

Phylogenetic networks extend trees to enable simultaneous modeling of both vertical and horizontal evolutionary processes. PhyloNet is a software package that has been under constant development for over 10 years and includes a wide array of functionalities for inferring and analyzing phylogenetic networks. These functionalities differ in terms of the input data they require, the criteria and models they employ, and the types of information they allow one to infer about the networks beyond their topologies. Furthermore, PhyloNet includes functionalities for simulating synthetic data on phylogenetic networks, quantifying the topological differences between phylogenetic networks, and evaluating evolutionary hypotheses given in the form of phylogenetic networks. In this paper, we use a simulated data set to illustrate the use of several of PhyloNet’s functionalities and make recommendations on how to analyze data sets and interpret the results when using these functionalities. All inference methods that we illustrate are incomplete lineage sorting (ILS) aware; that is, they account for the potential of ILS in the data while inferring the phylogenetic network. While the models do not include gene duplication and loss, we discuss how the methods can be used to analyze data in the presence of polyploidy. The concept of species is irrelevant for the computational analyses enabled by PhyloNet in that species-individuals mappings are user-defined. Consequently, none of the functionalities in PhyloNet deals with the task of species delimitation. In this sense, the data being analyzed could come from different individuals within a single species, in which case population structure along with potential gene flow is inferred (assuming the data have sufficient signal), or from different individuals sampled from different species, in which case the species phylogeny is being inferred.
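Among the functionalities listed above is quantifying topological differences. For the special case where both networks are rooted trees, a common measure is the cluster-based Robinson-Foulds distance: the number of clusters (leaf sets below an internal node) found in exactly one of the two trees. The sketch below is a generic illustration and not PhyloNet's implementation. Writing trees as nested tuples of leaf names is an assumption made for brevity.

```python
def clusters(tree):
    """Collect the leaf set (cluster) below every internal node of a rooted
    tree written as nested tuples of leaf-name strings."""
    found = set()

    def walk(node):
        if isinstance(node, str):  # a leaf
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found

def rf_distance(t1, t2):
    """Rooted Robinson-Foulds distance: number of clusters present in exactly
    one tree. The root cluster is shared when the leaf sets match."""
    return len(clusters(t1) ^ clusters(t2))

print(rf_distance((("A", "B"), ("C", "D")), (("A", "C"), ("B", "D"))))  # 4
```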

Genome-Wide Analysis Identifies an Essential Human TBX3 Pacemaker Enhancer

Circulation Research ◽

10.1161/circresaha.120.317054 ◽

2020 ◽

Vol 127 (12) ◽

pp. 1522-1535 ◽

Cited By ~ 2

Author(s):

Vincent W.W. van Eif ◽

Stephanie I. Protze ◽

Fernanda M. Bosada ◽

Xuefei Yuan ◽

Tanvi Sinha ◽

...

Keyword(s):

Heart Rate ◽

Heart Rate Recovery ◽

Homozygous Deletion ◽

Pacemaker Cell ◽

Pacemaker Cells ◽

Selective Loss ◽

Adult Mice ◽

Genome Wide ◽

A Genome ◽

And Function

Rationale: The development and function of the pacemaker cardiomyocytes of the sinoatrial node (SAN), the leading pacemaker of the heart, are tightly controlled by a conserved network of transcription factors, including TBX3 (T-box transcription factor 3), ISL1 (ISL LIM homeobox 1), and SHOX2 (short stature homeobox 2). Yet the regulatory DNA elements (REs) controlling target gene expression in SAN pacemaker cells have remained undefined. Objective: To identify the regulatory landscape of human SAN-like pacemaker cells and to functionally assess SAN-specific REs potentially involved in pacemaker cell gene regulation. Methods and Results: We performed the Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq) on human pluripotent stem cell–derived SAN-like pacemaker cells and ventricle-like cells and identified thousands of putative REs specific to either human cell type. We validated pacemaker cell–specific elements in the SHOX2 and TBX3 loci. CRISPR-mediated homozygous deletion of the mouse ortholog of a noncoding region with candidate pacemaker-specific REs in the SHOX2 locus resulted in selective loss of Shox2 expression from the developing SAN and in embryonic lethality. Putative pacemaker-specific REs were identified up to 1 Mbp upstream of TBX3, in a region close to MED13L that harbors variants associated with heart rate recovery after exercise. Deletion of the orthologous region in mice resulted in selective loss of Tbx3 expression from the SAN and (cardiac) ganglia and in neonatal lethality. Tbx3 expression was maintained in other tissues, including the atrioventricular conduction system, lungs, and liver. Heterozygous adult mice showed increased SAN recovery times after pacing. The human REs harboring the associated variants robustly drove expression in the SAN of transgenic mouse embryos.
Conclusions: We provided a genome-wide collection of candidate human pacemaker-specific REs, including the loci of SHOX2, TBX3, and ISL1, and identified a link between human genetic variants influencing heart rate recovery after exercise and a variant RE with highly conserved function that drives SAN expression of TBX3.