Evolutionary dynamics of abundant 7 bp satellites in the genome of Drosophila virilis

2019
Author(s): Jullien M. Flynn, Manyuan Long, Rod A. Wing, Andrew G. Clark

Abstract The factors that drive the rapid changes in satellite DNA genomic composition that we see in eukaryotes are not well understood. Drosophila virilis has one of the highest relative amounts of simple satellites of any organism that has been studied, with an estimated >40% of its genome composed of a few related 7 bp satellites. Here we use D. virilis as a model to understand technical biases affecting satellite sequencing and the evolutionary processes that drive satellite composition. By analyzing sequencing data from Illumina, PacBio, and Nanopore platforms, we identify platform-specific biases and suggest best practices for accurate characterization of satellites by sequencing. We use comparative genomics and cytogenetics to demonstrate that the highly abundant satellite family arose from a related satellite in the branch leading to the virilis phylad 4.5–11 million years ago before exploding in abundance in some species of the clade. The most abundant satellite is conserved in sequence and location in the pericentromeric region but has diverged widely in abundance among species, whereas the satellites nearest the centromere are rapidly turning over in sequence composition. By analyzing multiple strains of D. virilis, we saw that one centromere-proximal satellite is increasing in abundance along a geographical gradient while the other is contracting in an anti-correlated manner, suggesting ongoing conflicts at the centromere. In conclusion, we illuminate several key attributes of satellite evolutionary dynamics that we hypothesize to be driven by processes including selection, meiotic drive, and constraints on satellite sequence and abundance.


2020, Vol 37 (5), pp. 1362-1375
Author(s): Jullien M Flynn, Manyuan Long, Rod A Wing, Andrew G Clark

Abstract The factors that drive the rapid changes in abundance of tandem arrays of highly repetitive sequences, known as satellite DNA, are not well understood. Drosophila virilis has one of the highest relative amounts of simple satellites of any organism that has been studied, with an estimated >40% of its genome composed of a few related 7-bp satellites. Here, we use D. virilis as a model to understand technical biases affecting satellite sequencing and the evolutionary processes that drive satellite composition. By analyzing sequencing data from Illumina, PacBio, and Nanopore platforms, we identify platform-specific biases and suggest best practices for accurate characterization of satellites by sequencing. We use comparative genomics and cytogenetics to demonstrate that the highly abundant AAACTAC satellite family arose from a related satellite in the branch leading to the virilis phylad 4.5–11 Ma before exploding in abundance in some species of the clade. The most abundant satellite is conserved in sequence and location in the pericentromeric region but has diverged widely in abundance among species, whereas the satellites nearest the centromere are rapidly turning over in sequence composition. By analyzing multiple strains of D. virilis, we saw that the abundances of two centromere-proximal satellites are anticorrelated along a geographical gradient, which we suggest could be caused by ongoing conflicts at the centromere. In conclusion, we illuminate several key attributes of satellite evolutionary dynamics that we hypothesize to be driven by processes including selection, meiotic drive, and constraints on satellite sequence and abundance.
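The abstracts above treat satellite abundance as something estimated directly from sequencing reads. As a rough illustration only (not the authors' pipeline), the sketch below assumes error-free reads given as plain strings and reports the fraction of read bases that fall in perfect tandem arrays of a 7-bp repeat unit such as AAACTAC, accepting any cyclic rotation of the unit and of its reverse complement. The function names and the `min_copies` cutoff are hypothetical choices.

```python
def rotations(unit):
    """All cyclic rotations of a repeat unit, e.g. AAACTAC -> AACTACA, ..."""
    return {unit[i:] + unit[:i] for i in range(len(unit))}

def revcomp(seq):
    """Reverse complement of a DNA string (A/C/G/T/N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[b] for b in reversed(seq))

def satellite_fraction(reads, unit, min_copies=3):
    """Fraction of read bases inside perfect tandem runs of >= min_copies
    copies of `unit`, counting any rotation on either strand."""
    k = len(unit)
    units = rotations(unit) | rotations(revcomp(unit))
    covered = total = 0
    for read in reads:
        total += len(read)
        i = 0
        while i <= len(read) - k:
            word = read[i:i + k]
            if word in units:
                j = i + k
                while read[j:j + k] == word:   # extend the run copy by copy
                    j += k
                if (j - i) // k >= min_copies:
                    covered += j - i
                    i = j                      # skip past the whole array
                    continue
            i += 1
    return covered / total if total else 0.0
```

Because only perfect copies extend a run, any sequencing error breaks an array, so this naive count will underestimate abundance on noisier long-read data; it also inherits whatever platform-specific coverage bias the reads carry, which is the kind of technical bias the abstracts discuss.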



Viruses, 2021, Vol 13 (7), pp. 1338
Author(s): Morgan E. Meissner, Emily J. Julik, Jonathan P. Badalamenti, William G. Arndt, Lauren J. Mills, ...

Human immunodeficiency virus type 2 (HIV-2) accumulates fewer mutations during replication than HIV type 1 (HIV-1). Detailed studies of HIV-2 mutagenesis, however, have historically been confounded by high background error rates in traditional next-generation sequencing techniques. In this study, we describe the adaptation of the previously described maximum-depth sequencing (MDS) technique to studies of both HIV-1 and HIV-2 for the ultra-accurate characterization of viral mutagenesis. We also present a user-friendly Galaxy workflow for the bioinformatic analysis of sequencing data generated with the MDS technique, designed to improve replicability and accessibility for molecular virologists. The adapted MDS technique and analysis pipeline were validated by comparison with previously published analyses of the frequency and spectra of mutations in HIV-1 and HIV-2, and are readily expandable to studies of viral mutation across the genomes of both viruses. Using this novel sequencing pipeline, we observed that the background error rate was reduced 100-fold relative to standard Illumina error rates and 10-fold relative to traditional unique molecular identifier (UMI)-based sequencing. This technical advancement will allow for the exploration of novel and previously unrecognized sources of viral mutagenesis in both HIV-1 and HIV-2, which will expand our understanding of retroviral diversity and evolution.
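The MDS protocol itself is not described in this abstract, so the following is only a generic sketch of the UMI idea it uses as a comparison point: reads carrying the same UMI are assumed to derive from one template molecule, so a per-position majority vote within each UMI family suppresses random sequencing errors. It assumes reads are already aligned and of equal length within a family; `umi_consensus`, `error_rate`, and both thresholds are hypothetical.

```python
from collections import Counter, defaultdict

def umi_consensus(tagged_reads, min_family_size=3, min_agreement=0.9):
    """Collapse (umi, sequence) pairs into one consensus per UMI family.
    Positions lacking `min_agreement` support are masked with 'N'."""
    families = defaultdict(list)
    for umi, seq in tagged_reads:
        families[umi].append(seq)
    consensus = {}
    for umi, seqs in families.items():
        if len(seqs) < min_family_size:
            continue  # too few reads to outvote a sequencing error
        bases = []
        for column in zip(*seqs):
            base, count = Counter(column).most_common(1)[0]
            bases.append(base if count / len(column) >= min_agreement else "N")
        consensus[umi] = "".join(bases)
    return consensus

def error_rate(sequences, reference):
    """Mismatches per called base (masked 'N' positions ignored)."""
    called = errors = 0
    for seq in sequences:
        for base, ref in zip(seq, reference):
            if base == "N":
                continue
            called += 1
            errors += base != ref
    return errors / called if called else 0.0
```

Comparing `error_rate` on raw reads with `error_rate` on consensus sequences is one simple way to express the kind of fold reduction in background error the abstract reports.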



2021, pp. 1-4
Author(s): Yu-Wei Tseng, Chi-Chun Huang, Chih-Chiang Wang, Chiuan-Yu Li, Kuo-Hsiang Hung

Abstract Epilobium, a genus in the family Onagraceae, comprises approximately 200 species distributed worldwide, some of which have been used as medicinal plants. Epilobium nankotaizanense is an endemic and endangered herb that grows in the high mountains of Taiwan at elevations above 3300 m. Alpine herbs are severely threatened by climate change, which reduces their habitats and population sizes. However, only a few studies have addressed their genetic diversity and population genetics. In the present study, we developed a new set of microsatellite markers for E. nankotaizanense using high-throughput genome sequencing data. Twenty polymorphic microsatellite markers were developed and tested on 30 individuals collected from three natural populations. These loci were successfully amplified, and polymorphisms were observed in E. nankotaizanense. The number of alleles per locus (A) ranged from 2.000 to 3.000, and the observed (Ho) and expected (He) heterozygosities ranged from 0.000 to 0.929 and from 0.034 to 0.631, respectively. The developed polymorphic microsatellite markers will be useful in future conservation genetic studies of E. nankotaizanense, in developing an effective conservation strategy for this species, and in facilitating germplasm collection and sustainable utilization of other Epilobium species.
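For readers unfamiliar with the summary statistics quoted above, this sketch computes, for a single locus, the number of alleles, observed heterozygosity (Ho, the fraction of heterozygous individuals) and expected heterozygosity (He = 1 - sum of p_i^2, where p_i is the frequency of allele i). It is not the authors' software; genotypes are assumed to be (allele, allele) tuples of fragment sizes, and the optional `unbiased` flag applies Nei's small-sample correction 2n/(2n - 1), which some population-genetics packages report instead. The function name is hypothetical.

```python
from collections import Counter

def locus_diversity(genotypes, unbiased=False):
    """Return (n_typed, n_alleles, Ho, He) for one diploid locus.

    genotypes: one (allele_a, allele_b) tuple per individual, or None
    for a failed amplification (skipped).
    """
    typed = [g for g in genotypes if g is not None]
    n = len(typed)
    if n == 0:
        raise ValueError("no genotyped individuals")
    ho = sum(a != b for a, b in typed) / n           # observed heterozygosity
    counts = Counter(allele for g in typed for allele in g)
    total = 2 * n                                      # allele copies sampled
    he = 1.0 - sum((c / total) ** 2 for c in counts.values())
    if unbiased:
        he *= total / (total - 1)                      # Nei's correction
    return n, len(counts), ho, he
```

Averaging these per-population values over loci gives summaries of the kind reported in the abstract; the per-population averaging is also why A can be a non-integer such as 2.000.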



2020, Vol 15 (4), pp. 5-18
Author(s): Carlos Abraham Moya, Vincent Boly, Laure Morel, Daniel Gálvez, Mauricio Camargo


2018
Author(s): Arghavan Bahadorinejad, Ivan Ivanov, Johanna W Lampe, Meredith AJ Hullar, Robert S Chapkin, ...

Abstract We propose a Bayesian method for the classification of 16S rRNA metagenomic profiles of bacterial abundance by introducing a Poisson-Dirichlet-Multinomial hierarchical model for the sequencing data, constructing a prior distribution from sample data, calculating the posterior distribution in closed form, and deriving an Optimal Bayesian Classifier (OBC). The proposed algorithm is compared with state-of-the-art classification methods for 16S rRNA metagenomic data, including Random Forests and the phylogeny-based Metaphyl algorithm, for varying sample size, classification difficulty, and dimensionality (number of OTUs), using both synthetic and real metagenomic data sets. The results demonstrate that the proposed OBC method, with either noninformative or constructed priors, is competitive with or superior to the other methods. In particular, when the ratio of sample size to dimensionality is small, the proposed method can vastly outperform the others.

Author summary
Recent studies have highlighted the interplay between host genetics, gut microbes, and colorectal tumor initiation and progression. The characterization of microbial communities using metagenomic profiling has therefore received renewed interest. In this paper, we propose a method for classification, i.e., prediction of different outcomes, based on 16S rRNA metagenomic data. The proposed method employs a Bayesian approach, which is suitable for data sets with a small ratio of the number of available instances to the dimensionality. Results using both synthetic and real metagenomic data show that the proposed method can outperform other state-of-the-art metagenomic classification algorithms.
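A minimal sketch of the conjugate idea behind a closed-form posterior, assuming a simplified model rather than the authors' full Poisson-Dirichlet-Multinomial OBC: within each class all samples share one vector of OTU proportions with a Dirichlet prior, so the posterior is the prior plus the pooled training counts, and a new sample is scored by the closed-form Dirichlet-multinomial predictive likelihood. The Poisson model of sequencing depth is omitted on the assumption that depth does not depend on class, the flat `prior=1.0` stands in for a noninformative prior, and all function names are hypothetical.

```python
import math

def log_dm(x, alpha):
    """Dirichlet-multinomial log-likelihood of counts x given alpha, without
    the multinomial coefficient (it is identical for every class)."""
    a0, n = sum(alpha), sum(x)
    out = math.lgamma(a0) - math.lgamma(n + a0)
    for xk, ak in zip(x, alpha):
        out += math.lgamma(xk + ak) - math.lgamma(ak)
    return out

def fit(train, labels, prior=1.0):
    """Per-class posterior Dirichlet parameters (prior + pooled OTU counts)
    and log class-prior probabilities estimated from class frequencies."""
    d = len(train[0])
    post, log_pi = {}, {}
    for c in sorted(set(labels)):
        rows = [x for x, y in zip(train, labels) if y == c]
        post[c] = [prior + sum(r[k] for r in rows) for k in range(d)]
        log_pi[c] = math.log(len(rows) / len(labels))
    return post, log_pi

def classify(x, post, log_pi):
    """Pick the class with the highest posterior predictive score."""
    return max(post, key=lambda c: log_pi[c] + log_dm(x, post[c]))
```

Working in log space with `math.lgamma` keeps the Gamma-function ratios numerically stable even for samples with tens of thousands of reads.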



PeerJ, 2018, Vol 6, pp. e4840
Author(s): Kai Wei, Tingting Zhang, Lei Ma

Housekeeping genes are ubiquitously expressed and maintain basic cellular functions across tissues and cell types. The present study aimed to identify a set of pig housekeeping genes and to compare the structure, evolution and function of housekeeping genes in the human–pig lineage. Using RNA sequencing data, we identified 3,136 pig housekeeping genes. Compared with human housekeeping genes, pig housekeeping genes were longer, were subject to slightly weaker purifying selection, and showed faster neutral evolution. Common housekeeping genes, shared by the two species, are subject to stronger purifying selection than species-specific genes. However, pig- and human-specific housekeeping genes have similar functions. Some species-specific housekeeping genes have evolved independently to form similar protein active sites or structures, such as the classical catalytic serine–histidine–aspartate triad, implying that they have converged to maintain basic cellular functions, which allows them to adapt to the environment. Human and pig housekeeping genes differ in structure and gene composition, but they have converged to maintain the basic cellular functions essential for the existence of a cell, regardless of its specific role in the species. The results of our study shed light on the evolutionary dynamics of housekeeping genes.
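The abstract does not state the selection criteria used, so this is only an illustrative sketch of one common operational definition: a gene counts as housekeeping if it is expressed above a threshold in every tissue or cell type, optionally with stable expression across them. The TPM input format, the `min_tpm` and `max_cv` thresholds, and the function name are all assumptions.

```python
import statistics

def housekeeping_genes(tpm, min_tpm=1.0, max_cv=None):
    """Return genes expressed at >= min_tpm in every tissue.

    tpm: dict mapping gene -> list of TPM values, one per tissue/cell type.
    max_cv: if given, also require a coefficient of variation
    (population std / mean) no greater than max_cv.
    """
    selected = []
    for gene, values in tpm.items():
        if min(values) < min_tpm:
            continue  # silent or weak in at least one tissue
        if max_cv is not None:
            mean = statistics.mean(values)
            if mean == 0 or statistics.pstdev(values) / mean > max_cv:
                continue  # expression too variable across tissues
        selected.append(gene)
    return sorted(selected)
```

Comparing species then reduces to set operations, for example intersecting the pig and human lists to obtain shared housekeeping genes; mapping genes to orthologs beforehand is assumed and not shown.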



Author(s): Huan Zhong, Zongwei Cai, Zhu Yang, Yiji Xia

Abstract NAD tagSeq has recently been developed for the identification and characterization of NAD+-capped RNAs (NAD-RNAs). This method uses chemo-enzymatic reactions to label NAD-RNAs with a synthetic RNA tag before subjecting them to Oxford Nanopore direct RNA sequencing. A computational tool designed for analyzing the sequencing data of tagged RNAs would facilitate broader application of this method. Hence, we introduce TagSeqTools, a flexible, general pipeline for the identification and quantification of tagged RNAs (i.e., NAD+-capped RNAs) using long-read transcriptome sequencing data generated by the NAD tagSeq method. TagSeqTools comprises two major modules: TagSeek, for differentiating tagged and untagged reads, and TagSeqQuant, for quantitative and further characterization analysis of genes and isoforms. The pipeline also integrates advanced functions to identify antisense transcripts and splicing events, and supports data reformatting for visualization. TagSeqTools therefore provides a convenient and comprehensive workflow for researchers to analyze data produced by the NAD tagSeq method or other tagging-based experiments using Oxford Nanopore direct RNA sequencing. The pipeline is available at https://github.com/dorothyzh/TagSeqTools under the Apache License 2.0.
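TagSeek's actual algorithm is not described in the abstract. As a hypothetical sketch of the underlying task, the code below calls a read tagged if the synthetic tag matches within the first `window` bases of the read (assumed here to be the end where the tag was ligated), allowing a budget of edits via semi-global alignment with free gaps before and after the match. The tag sequence used in testing, the 60-nt window, the error-rate cutoff, and the assumption that U has already been converted to T are all illustrative.

```python
def best_match_distance(tag, text):
    """Smallest edit distance between `tag` and any substring of `text`
    (semi-global alignment: unaligned text before/after the match is free)."""
    prev = [0] * (len(text) + 1)        # empty tag matches anywhere at no cost
    for i, t in enumerate(tag, start=1):
        cur = [i] + [0] * len(text)     # i tag bases against empty text
        for j, s in enumerate(text, start=1):
            cur[j] = min(prev[j - 1] + (t != s),   # match or mismatch
                         prev[j] + 1,              # tag base unmatched (deletion)
                         cur[j - 1] + 1)           # extra text base (insertion)
        prev = cur
    return min(prev)                    # free end gap in text

def split_tagged(reads, tag, window=60, max_error_rate=0.2):
    """Partition (name, sequence) reads by whether the tag is found in the
    first `window` bases, allowing up to max_error_rate * len(tag) edits."""
    limit = int(len(tag) * max_error_rate)
    tagged, untagged = [], []
    for name, seq in reads:
        if best_match_distance(tag, seq[:window]) <= limit:
            tagged.append(name)
        else:
            untagged.append(name)
    return tagged, untagged
```

An edit-distance budget rather than exact matching is the natural choice here because nanopore direct RNA reads carry frequent insertions and deletions, which would cause an exact-substring search to miss many genuinely tagged reads.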



2012, Vol 18 (A), pp. 17
Author(s): R Giugno, F Abate, N Bombieri, M Delledonne, A Ferrarini, ...


2019
Author(s): Kate Chkhaidze, Timon Heide, Benjamin Werner, Marc J. Williams, Weini Huang, ...

Abstract Quantification of the effect of spatial tumour sampling on the patterns of mutations detected in next-generation sequencing data is largely lacking. Here we use a spatial stochastic cellular automaton model of tumour growth that accounts for somatic mutations, selection, drift and spatial constraints to simulate multi-region sequencing data derived from spatial sampling of a neoplasm. We show that the spatial structure of a solid cancer has a major impact on the detection of clonal selection and genetic drift from both bulk and single-cell sequencing data. Our results indicate that spatial constraints can introduce significant sampling biases when performing multi-region bulk sampling and that such bias becomes a major confounding factor for the measurement of the evolutionary dynamics of human tumours. We present a statistical inference framework that takes into account the spatial effects of a growing tumour and allows the evolutionary dynamics to be inferred from patient genomic data. Our analysis shows that measuring cancer evolution using next-generation sequencing while accounting for the numerous confounding factors requires a mechanistic model-based approach that captures the sources of noise in the data.

Summary
Sequencing the DNA of cancer cells from human tumours has become one of the main tools to study cancer biology. However, sequencing data are complex and often difficult to interpret. In particular, the way in which the tissue is sampled and the data are collected significantly affects the interpretation of the results. We argue that understanding cancer genomic data requires mathematical models and computer simulations that tell us what we expect the data to look like, with the aim of understanding the impact of confounding factors and biases in the data generation step.
In this study, we develop a spatial simulation of tumour growth that also simulates the data generation process, and demonstrate that biases in the sampling step and current technological limitations severely impact the interpretation of the results. We then provide a statistical framework that can be used to overcome these biases and more robustly measure aspects of the biology of tumours from the data.
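To make the sampling argument concrete, here is a deliberately stripped-down sketch, not the authors' model: cells grow on a 2D square lattice from one founder, a cell can only divide into an empty neighbouring site (a simple spatial constraint), both daughters acquire Poisson-distributed new mutations under an infinite-sites assumption, and a "biopsy" is a rectangular block of cells whose mutation counts are converted to variant allele frequencies (VAFs) for heterozygous mutations in a pure diploid sample. Selection, cell death, normal-tissue contamination, and read-sampling noise are all omitted, and every name and parameter is hypothetical.

```python
import math
import random
from collections import Counter

NEIGHBOURS = [(-1, 0), (1, 0), (0, -1), (0, 1)]

def poisson(rng, lam):
    """Knuth's multiplication method for a Poisson(lam) draw."""
    threshold, k, p = math.exp(-lam), 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k

def grow_tumour(n_cells, mu=1.0, n_founder=10, seed=0):
    """Return {(x, y): frozenset(mutation ids)} for a lattice tumour."""
    rng = random.Random(seed)
    lattice = {(0, 0): frozenset(range(n_founder))}  # founder (clonal) mutations
    surface = [(0, 0)]        # cells that may still have an empty neighbour
    next_id = n_founder       # infinite sites: every new mutation gets a new id
    while len(lattice) < n_cells:
        idx = rng.randrange(len(surface))
        x, y = surface[idx]
        empty = [(x + dx, y + dy) for dx, dy in NEIGHBOURS
                 if (x + dx, y + dy) not in lattice]
        if not empty:
            # Fully surrounded: drop it from the candidate list (swap-remove).
            surface[idx] = surface[-1]
            surface.pop()
            continue
        child = rng.choice(empty)
        parent = lattice[(x, y)]
        for cell in ((x, y), child):   # both daughters gain new mutations
            k = poisson(rng, mu)
            lattice[cell] = parent | frozenset(range(next_id, next_id + k))
            next_id += k
        surface.append(child)
    return lattice

def sample_vafs(lattice, x_range, y_range):
    """VAFs in a rectangular biopsy, assuming heterozygous mutations,
    diploid cells, no normal contamination and no read-sampling noise."""
    cells = [muts for (x, y), muts in lattice.items()
             if x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]]
    counts = Counter(m for muts in cells for m in muts)
    return {m: c / (2 * len(cells)) for m, c in counts.items()}
```

Calling `sample_vafs` on different blocks of the same simulated tumour is one way to see the effect the summary describes: founder mutations sit at VAF 0.5 in every block by construction, while which subclonal mutations appear, and at what frequency, can depend on where the block is placed.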



2018
Author(s): Jingxian Liu, Jackson Champer, Chen Liu, Joan Chung, Riona Reeves, ...

Abstract Estimating fitness differences between allelic variants is a central goal of experimental evolution. Current methods for inferring selection from allele frequency time series typically assume that evolutionary dynamics at the locus of interest can be described by a fixed selection coefficient. However, fitness is an aggregate of several components including mating success, fecundity, and viability, and distinguishing between these components could be critical in many scenarios. Here we develop a flexible maximum likelihood framework that can disentangle different components of fitness and estimate them individually in males and females from genotype frequency data. As a proof of principle, we apply our method to experimentally evolved cage populations of Drosophila melanogaster, in which we tracked the relative frequencies of a loss-of-function allele and the wild-type allele of yellow. This X-linked gene produces a recessive yellow phenotype when disrupted and is involved in male courtship ability. We find that the fitness costs of the yellow phenotype take the form of substantially reduced mating preference of wild-type females for yellow males, together with a modest reduction in the viability of yellow males and females. Our framework should be generally applicable to situations where it is important to quantify fitness components of specific genetic variants, including quantitative characterization of the population dynamics of CRISPR gene drives.
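The abstract contrasts its framework with methods that fit a single fixed selection coefficient to an allele frequency time series. For orientation, the sketch below implements only that simpler baseline, not the authors' sex- and component-specific model: a deterministic haploid selection recursion p' = p(1 + s) / (1 + ps), started at the first observed frequency, with binomial sampling of each generation's allele count, maximized over a grid of s values. The haploid recursion, the grid, and all names are assumptions chosen for brevity; an X-linked locus with mating preference needs a richer model like the one the abstract describes.

```python
import math

def trajectory(p0, s, generations):
    """Deterministic haploid selection: p' = p(1 + s) / (1 + p s)."""
    ps = [p0]
    for _ in range(generations - 1):
        p = ps[-1]
        ps.append(p * (1 + s) / (1 + p * s))
    return ps

def log_likelihood(s, counts, sample_sizes):
    """Binomial log-likelihood of observed allele counts (binomial
    coefficients dropped, as they do not depend on s)."""
    p0 = counts[0] / sample_sizes[0]
    ll = 0.0
    for p, k, n in zip(trajectory(p0, s, len(counts)), counts, sample_sizes):
        p = min(max(p, 1e-12), 1 - 1e-12)   # avoid log(0)
        ll += k * math.log(p) + (n - k) * math.log(1 - p)
    return ll

def mle_s(counts, sample_sizes, grid=None):
    """Grid-search maximum likelihood estimate of a fixed s."""
    grid = grid or [i / 1000 for i in range(-900, 1001)]
    return max(grid, key=lambda s: log_likelihood(s, counts, sample_sizes))
```

Splitting fitness into components, as the abstract proposes, amounts to replacing the single-parameter recursion with one that tracks genotypes by sex and applies separate mating, fecundity, and viability terms, then maximizing the same kind of likelihood over those parameters.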


