Genome-wide analyses of human noroviruses provide insights on evolutionary dynamics and evidence of coexisting viral populations evolving under recombination constraints

Norovirus is a major cause of acute gastroenteritis worldwide. More than 30 genotypes, mostly from genogroups I (GI) and II (GII), have been shown to infect humans. Despite three decades of genome sequencing, our understanding of how genomic diversification has proceeded across continents and over time is incomplete. To close the spatiotemporal gap in genomic information on human noroviruses, we conducted a large-scale genome-wide analysis that included nearly full-length sequencing of 281 archival viruses that have circulated since the 1970s in more than 10 countries on four continents, with a major emphasis on norovirus genotypes that are currently underrepresented in public genome databases. We provide new genome information for 24 distinct genotypes, including the oldest genome information for 12 norovirus genotypes. Analyses of these new genomes, together with publicly available ones, showed that (i) noroviruses evolve at similar rates across genomic regions and genotypes; (ii) emerging viruses evolved from transiently circulating intermediate viruses; (iii) diversifying selection on the VP1 protein was detected in genotypes with multiple variants; (iv) the non-structural proteins showed similar branching patterns in their phylogenetic trees; and (v) contrary to current understanding, there are restrictions on recombination between different genomic regions, which result in co-circulating populations of viruses that evolve independently in human communities. This study provides a comprehensive genetic analysis of diverse norovirus genotypes and of the role of non-structural proteins in viral diversification, shedding new light on the mechanisms of norovirus evolution and transmission.
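Comparing evolutionary rates across genomic regions can be illustrated with root-to-tip regression, a quick screening method: root-to-tip distances from a rooted phylogeny are regressed on sampling year, the slope approximates the substitution rate, and the x-intercept gives a rough date for the root. The Python sketch below is a minimal illustration under stated assumptions, not the authors' method; the function name `root_to_tip_rate` is hypothetical, and the years and distances are made-up toy values chosen to lie exactly on a line.

```python
def root_to_tip_rate(years, distances):
    """Least-squares fit of root-to-tip distance on sampling year.

    Returns (rate, root_year): the slope (substitutions/site/year) and the
    x-intercept, a rough estimate of the date of the root.
    """
    n = len(years)
    if n != len(distances) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(years) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    if sxx == 0:
        raise ValueError("sampling years must vary")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, distances))
    slope = sxy / sxx
    if slope == 0:
        raise ValueError("no temporal signal: slope is zero")
    intercept = mean_y - slope * mean_x
    root_year = -intercept / slope  # year at which distance extrapolates to 0
    return slope, root_year


# Hypothetical toy data for one genomic region (not values from the study).
years = [1974, 1987, 1995, 2004, 2012, 2016]
distances = [0.012, 0.051, 0.075, 0.102, 0.126, 0.138]
rate, root = root_to_tip_rate(years, distances)
print(f"rate = {rate:.2e} subs/site/year, root ~ {root:.0f}")
```

Fitting each genomic region (for example, VP1 versus a non-structural region) separately and comparing slopes is one rough way to ask whether rates are similar; formal rate estimates would normally come from molecular-clock models fitted to the full alignment.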

Large-scale integration of meta-QTL and genome-wide association study discovers the genomic regions and candidate genes for yield and yield-related traits in bread wheat

Theoretical and Applied Genetics, 2021. DOI: 10.1007/s00122-021-03881-4

Author(s): Yang Yang, Aduragbemi Amo, Di Wei, Yongmao Chai, Jie Zheng, ...

Keyword(s): Association Study, Candidate Genes, Bread Wheat, Large Scale, Genome Wide Association Study, Genome Wide Association, Genome Wide, Large Scale Integration, Scale Integration, Genomic Regions

The Genomic Ecosystem of Transposable Elements in Maize

2019. DOI: 10.1101/559922

Cited by ~18

Author(s): Michelle C. Stitzer, Sarah N. Anderson, Nathan M. Springer, Jeffrey Ross-Ibarra

Keyword(s): Transposable Elements, Evolutionary Dynamics, Phenotypic Diversity, Flowering Plant, Genome Wide, Family Level, Genomic Environment, Single Category, The Impact

Transposable elements (TEs) constitute the majority of flowering plant DNA, reflecting their tremendous success in subverting, avoiding, and surviving the defenses of their host genomes to ensure their selfish replication. More than 85% of the sequence of the maize genome can be ascribed to past transposition, making TEs a major contributor to the structure of the genome. Evidence from individual loci has informed our understanding of how transposition has shaped the genome, and a number of individual TE insertions have been causally linked to dramatic phenotypic changes. However, genome-wide analyses in maize and other taxa have frequently treated TEs as a relatively homogeneous class of fragmentary relics of past transposition, obscuring their evolutionary history and their interactions with the host genome. Using an updated annotation of structurally intact TEs in the maize reference genome, we investigate the family-level ecological and evolutionary dynamics of TEs in maize. Integrating a variety of data, from descriptors of individual TEs such as coding capacity, expression, and methylation to similar features of the sequence into which they inserted, we model the relationship between these attributes of the genomic environment and the survival of TE copies and families. Our analyses reveal a diversity of ecological strategies among TE families, each representing the evolution of a distinct ecological niche that allows the family to survive. In contrast to the wholesale relegation of all TEs to a single category of junk DNA, these differences generate a rich ecology of the genome, suggesting that TE families coexisting in time and space compete and cooperate with each other.
We conclude that while the impact of transposition is highly family- and context-dependent, a family-level understanding of the ecology of TEs in the genome can refine our ability to predict the role of TEs in generating genetic and phenotypic diversity.

‘Lumping our beautiful collection of transposons into a single category is a crime’ - Michael R. Freeling, Mar. 10, 2017

The genomic footprint of coastal earthquake uplift

Proceedings of the Royal Society B: Biological Sciences, 2020, Vol 287 (1930), pp. 20200712. DOI: 10.1098/rspb.2020.0712

Cited by ~1

Author(s): Elahe Parvizi, Ceridwen I. Fraser, Ludovic Dutoit, Dave Craw, Jonathan M. Waters

Keyword(s): Large Scale, Biological Evolution, Habitat Choice, Natural Experiments, Ecological Constraints, Dispersal Capacity, Interspecific Differences, Genome Wide, Genetic Patterns

Theory suggests that catastrophic earth-history events can drive rapid biological evolution, but empirical evidence for such processes is scarce. Destructive geological events such as earthquakes can represent large-scale natural experiments for inferring such evolutionary processes. We capitalized on a major prehistoric (800 yr BP) geological uplift event affecting a southern New Zealand coastline to test for lasting genomic impacts of disturbance. Genome-wide analyses of three co-distributed keystone kelp taxa revealed that post-earthquake recolonization drove the evolution of novel, large-scale intertidal spatial genetic ‘sectors’ that are tightly linked to geological fault boundaries. Demographic simulations confirmed that, following widespread extirpation, parallel expansions into newly vacant habitats rapidly restructured genome-wide diversity. Interspecific differences in recolonization mode and tempo reflect differing ecological constraints related to habitat choice and dispersal capacity. This study highlights the rapid and enduring evolutionary effects of catastrophic ecosystem disturbance and reveals the key role of range expansion in reshaping spatial genetic patterns.

Use of the epigenetic toolbox to contextualize common variants associated with schizophrenia risk

Dialogues in Clinical Neuroscience, 2019, Vol 21 (4), pp. 407-416. DOI: 10.31887/dcns.2019.21.4/sakbarian

Cited by ~1

Keyword(s): Gene Networks, Large Scale, Clinical Diagnostics, Common Variants, Diagnostic Measures, Regulatory Variants, Risk Variants, Genome Wide, Genomic Regions, Generation Sequencing

Schizophrenia is a debilitating psychiatric disorder with a complex genetic architecture and a limited understanding of its neuropathology, reflected in the lack of diagnostic measures and effective pharmacological treatments. Geneticists have recently identified more than 145 risk loci comprising hundreds of common variants of small effect size, most of which lie in noncoding genomic regions. This review discusses how the epigenetic toolbox can be applied to contextualize genetic findings in schizophrenia. Progress in next-generation sequencing, along with increasing methodological complexity, has led to the compilation of genome-wide maps of DNA methylation, histone modifications, gene expression, and more. Integration of chromatin conformation datasets is one of the latest efforts to decipher schizophrenia risk, allowing the identification of genes in contact with regulatory variants across hundreds of kilobases. Large-scale multiomics studies will facilitate the prioritization of putative causal risk variants and gene networks that contribute to schizophrenia etiology, informing clinical diagnostics and treatment downstream.

Population genomic evidence of selection on structural variants in a natural hybrid zone

2022. DOI: 10.1101/2022.01.14.476419

Author(s): Linyi Zhang, Samridhi Chaturvedi, Chris Nice, Lauren Lucas, Zachariah Gompert

Keyword(s): Reproductive Isolation, Hybrid Zone, Structural Variants, Hybrid Fitness, Genome Wide, Oxford Nanopore, Jackson Hole, Long Read, Genomic Regions

Structural variants (SVs) can promote speciation by directly causing reproductive isolation or by suppressing recombination across large genomic regions. Although examples of each mechanism have been documented, systematic tests of the role of SVs in speciation are lacking. Here, we take advantage of long-read (Oxford Nanopore) whole-genome sequencing and a hybrid zone between two Lycaeides butterfly taxa (L. melissa and Jackson Hole Lycaeides) to comprehensively evaluate genome-wide patterns of introgression for SVs and relate these patterns to hypotheses about speciation. We found >100,000 SVs segregating within or between the two hybridizing species. SVs and SNPs exhibited similar levels of genetic differentiation between species, with the exception of inversions, which were more differentiated. We detected credible variation in patterns of introgression among SV loci in the hybrid zone, with 562 of 1419 ancestry-informative SVs exhibiting genomic clines that deviated from null expectations based on genome-average ancestry. Overall, hybrids exhibited a directional shift towards Jackson Hole Lycaeides ancestry at SV loci, consistent with the hypothesis that these loci experienced more selection, on average, than SNP loci. Surprisingly, we found that deletions, rather than inversions, showed the highest skew towards excess introgression from Jackson Hole Lycaeides. Excess Jackson Hole Lycaeides ancestry in hybrids was also especially pronounced for Z-linked SVs and for inversions containing many genes. In conclusion, our results show that SVs are ubiquitous and suggest that SVs in general, and deletions in particular, might contribute disproportionately to hybrid fitness and thus to (partial) reproductive isolation.
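A genomic cline describes how the probability of inheriting ancestry from one parental taxon at a particular locus changes with an individual's genome-wide hybrid index. The minimal Python sketch below assumes the two-parameter cline function used in Bayesian genomic cline (bgc-style) analyses, in which alpha shifts ancestry towards a reference parental taxon and beta changes the cline's steepness; whether this exact model was used in the study is an assumption, and the function names are hypothetical. With alpha = beta = 0, expected locus ancestry equals the hybrid index, which is the null expectation based on genome-average ancestry; a positive mean excess at a locus corresponds to the kind of directional shift towards one parental taxon reported above.

```python
def cline_probability(h, alpha, beta):
    """Expected probability that a locus carries ancestry from the reference
    parental taxon, given an individual's hybrid index h in [0, 1].

    Assumed bgc-style cline: phi = h + 2(h - h^2)(alpha + beta(2h - 1)).
    alpha > 0 shifts ancestry towards the reference taxon; beta > 0 makes the
    cline steeper than the genome-wide average.
    """
    if not 0.0 <= h <= 1.0:
        raise ValueError("hybrid index must lie in [0, 1]")
    phi = h + 2.0 * (h - h * h) * (alpha + beta * (2.0 * h - 1.0))
    return min(1.0, max(0.0, phi))  # clamp to a valid probability (a simplification)


def mean_ancestry_excess(hybrid_indices, locus_ancestry):
    """Mean of (locus ancestry - genome-wide hybrid index) across individuals.

    locus_ancestry holds each individual's proportion of reference-taxon
    ancestry at one locus (0, 0.5 or 1 for a diploid). A positive value means
    excess reference-taxon ancestry at this locus relative to the genome.
    """
    if len(hybrid_indices) != len(locus_ancestry) or not hybrid_indices:
        raise ValueError("need matching, non-empty inputs")
    diffs = [a - h for h, a in zip(hybrid_indices, locus_ancestry)]
    return sum(diffs) / len(diffs)


# Null cline (alpha = beta = 0) versus a locus with excess reference ancestry.
for h in (0.25, 0.5, 0.75):
    print(h, cline_probability(h, 0.0, 0.0), cline_probability(h, 0.4, 0.0))
```

In a real analysis, alpha and beta would be estimated per locus from genotype data, and a locus would count as deviating from the null only if its credible interval excludes zero.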

Non-B DNA: a major contributor to small- and large-scale variation in nucleotide substitution frequencies across the genome

Nucleic Acids Research, 2021, Vol 49 (3), pp. 1497-1516. DOI: 10.1093/nar/gkaa1269

Author(s): Wilfried M Guiblet, Marzia A Cremona, Robert S Harris, Di Chen, Kristin A Eckert, ...

Keyword(s): Large Scale, Nucleotide Substitution, Genetic Diseases, Nucleotide Polymorphisms, DNA Structures, Cellular Processes, Genome Wide, DNA Types, Flanking Regions

Approximately 13% of the human genome can fold into non-canonical (non-B) DNA structures (e.g. G-quadruplexes, Z-DNA), which have been implicated in vital cellular processes. Non-B DNA also hinders replication, increasing errors and facilitating mutagenesis, yet its contribution to genome-wide variation in mutation rates remains unexplored. Here, we conducted a comprehensive analysis of nucleotide substitution frequencies at non-B DNA loci within noncoding, non-repetitive genome regions, in their ±2 kb flanking regions, and in 1-megabase windows, using human-orangutan divergence and human single-nucleotide polymorphisms. Functional data analysis at single-base resolution demonstrated that substitution frequencies are usually elevated at non-B DNA, with patterns specific to each non-B DNA type. Mirror, direct, and inverted repeats have higher substitution frequencies in spacers than in repeat arms, whereas G-quadruplexes, particularly stable ones, have higher substitution frequencies in loops than in stems. Several non-B DNA types also affect substitution frequencies in their flanking regions. Finally, non-B DNA explains more variation than any other predictor in multiple regression models for diversity or divergence at the 1-megabase scale. Thus, non-B DNA contributes substantially to variation in substitution frequencies at both small and large scales. Our results highlight the role of non-B DNA in germline mutagenesis, with implications for evolution and genetic diseases.
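The stem-versus-loop comparison for G-quadruplexes can be illustrated with a toy calculation. The Python sketch below is a simplification under stated assumptions, not the study's pipeline (which used functional data analysis at single-base resolution): it finds G-quadruplex motifs with a commonly used regular-expression heuristic (four runs of three or more G separated by loops of 1-7 nt, G-rich strand only), labels G-run positions as stems and the remaining motif positions as loops, and then counts how often two aligned, gap-free sequences (for example, a human segment and its orangutan ortholog) differ at each class of position. All function names are hypothetical.

```python
import re

# Canonical G-quadruplex motif heuristic (G-rich strand only): four runs of
# >= 3 G ("stems") separated by loops of 1-7 nt. Assumed, not from the study.
G4_PATTERN = re.compile(r"(?:G{3,}[ACGT]{1,7}){3}G{3,}")
G_RUN = re.compile(r"G{3,}")


def stem_and_loop_positions(seq):
    """Split positions inside G4 motifs into stems (G-runs) and loops.
    Simplification: any run of >= 3 G inside a motif counts as a stem."""
    stems, loops = set(), set()
    for motif in G4_PATTERN.finditer(seq):
        start = motif.start()
        runs = set()
        for run in G_RUN.finditer(motif.group()):
            runs.update(range(start + run.start(), start + run.end()))
        stems |= runs
        loops |= set(range(motif.start(), motif.end())) - runs
    return stems, loops


def substitution_frequency(ref, other, positions):
    """Fraction of the given aligned positions where ref and other differ."""
    if not positions:
        return float("nan")
    diffs = sum(1 for i in positions if ref[i] != other[i])
    return diffs / len(positions)


def g4_substitution_summary(ref, other):
    """Substitution frequency in G4 stems vs loops for two aligned,
    gap-free, equal-length sequences (motifs are found in ref)."""
    ref, other = ref.upper(), other.upper()
    if len(ref) != len(other):
        raise ValueError("sequences must be aligned to equal length")
    stems, loops = stem_and_loop_positions(ref)
    return {"stem": substitution_frequency(ref, other, stems),
            "loop": substitution_frequency(ref, other, loops)}


# Toy alignment: one difference in a loop, one in a stem.
human = "AAAAGGGTGGGTGGGTGGGAAAA"
orangutan = "AAAAGGGAGAGTGGGTGGGAAAA"
print(g4_substitution_summary(human, orangutan))
```

In the toy alignment there are 12 stem positions and 3 loop positions, so a single difference in each class already gives a much higher loop frequency; real comparisons pool many loci genome-wide.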

Population Differentiation at the HLA Genes

2017. DOI: 10.1101/214668

Author(s): Débora Y. C. Brandt, Jônatas César, Jérôme Goudet, Diogo Meyer

Keyword(s): Population Differentiation, Balancing Selection, Global Scale, Nucleotide Polymorphisms, Single Nucleotide, HLA Genes, Genome Wide, Different Populations, Genomic Regions

Balancing selection is defined as a class of selective regimes that maintain polymorphism above the level expected under neutrality. Theory predicts that balancing selection reduces population differentiation, as measured by FST. However, balancing selection regimes in which different sets of alleles are maintained in different populations could increase population differentiation. To address this issue, we investigated population differentiation at the HLA genes, which constitute the most striking example of balancing selection in humans. We found that population differentiation of single nucleotide polymorphisms (SNPs) at the HLA genes is, on average, lower than that of SNPs in other genomic regions. However, this result depends on accounting for differences in allele frequency between selected and putatively neutral sites. Our finding of reduced differentiation at SNPs within HLA genes suggests a predominant role of selective pressures shared among populations at a global scale. However, in pairs of closely related populations, where genome-wide differentiation is low, differentiation at HLA is higher than in other genomic regions. This pattern was reproduced in simulations of overdominant selection. We conclude that population differentiation at the HLA genes is generally lower than genome-wide, but may be higher for recently diverged population pairs, and that this pattern can be explained by a simple overdominance regime.
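The FST calculation and the allele-frequency matching described above can be sketched in Python. This sketch assumes Hudson's estimator as formulated by Bhatia et al. (2013), combined across SNPs as a ratio of averages, plus a simple binning of SNPs by pooled minor allele frequency so that HLA and genome-wide SNPs can be compared at matched frequencies; the HLA study may have used a different estimator or binning scheme, and the function names and bin edges are hypothetical.

```python
from collections import defaultdict


def hudson_components(p1, p2, n1, n2):
    """Numerator and denominator of Hudson's FST for one biallelic SNP.
    p1, p2: sample allele frequencies; n1, n2: sampled chromosomes (> 1)."""
    num = ((p1 - p2) ** 2
           - p1 * (1 - p1) / (n1 - 1)
           - p2 * (1 - p2) / (n2 - 1))
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return num, den


def hudson_fst(snps):
    """Ratio of averages across SNPs: sum(num) / sum(den).
    snps: iterable of (p1, p2, n1, n2) tuples."""
    total_num = total_den = 0.0
    for p1, p2, n1, n2 in snps:
        num, den = hudson_components(p1, p2, n1, n2)
        total_num += num
        total_den += den
    if total_den == 0:
        raise ValueError("no variable sites")
    return total_num / total_den


def fst_by_maf_bin(snps, edges=(0.1, 0.2, 0.3, 0.4, 0.5)):
    """Hudson FST within bins of pooled minor allele frequency (MAF).
    Each SNP goes to the first edge >= its MAF; edges must end at 0.5."""
    bins = defaultdict(list)
    for snp in snps:
        p1, p2, n1, n2 = snp
        pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
        maf = min(pooled, 1 - pooled)
        label = next(e for e in edges if maf <= e)
        bins[label].append(snp)
    return {label: hudson_fst(group) for label, group in sorted(bins.items())}


# Hypothetical SNPs as (p1, p2, n1, n2); not data from the study.
example = [(0.15, 0.55, 101, 101), (0.30, 0.35, 200, 200), (1.0, 0.0, 10, 10)]
print(hudson_fst(example))
print(fst_by_maf_bin(example))
```

Comparing FST bin by bin, rather than pooling all SNPs, keeps differences in allele-frequency spectra between selected and putatively neutral sites from masquerading as differences in differentiation.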

A summary-statistics-based approach to examine the role of serotonin transporter promoter tandem repeat polymorphism in psychiatric phenotypes

European Journal of Human Genetics, 2021. DOI: 10.1038/s41431-021-00996-6

Author(s): Arunabha Majumdar, Preksha Patel, Bogdan Pasaniuc, Roel A. Ophoff

Keyword(s): Serotonin Transporter, Tandem Repeat, Large Scale, Disease Risk, Learning Algorithm, Summary Statistics, Significant Evidence, Individual Level, Genome Wide

In genetic studies of psychiatric disorders in the pre-genome-wide association study (GWAS) era, one of the most commonly studied loci was the serotonin transporter (SLC6A4) promoter polymorphism (5-HTTLPR), a 43-base-pair insertion/deletion polymorphism in the promoter region. However, association signals between 5-HTTLPR and psychiatric phenotypes have been inconsistent across many studies. Because the polymorphism cannot be tested with available SNP arrays, we previously proposed an efficient machine learning algorithm that predicts 5-HTTLPR genotypes from the genotypes of eight nearby SNPs; that approach requires access to individual-level genotype and phenotype data. To take advantage of publicly available GWAS summary statistics from studies with very large sample sizes, we developed a GWAS summary-statistics-based approach for testing associations between this variable number of tandem repeat (VNTR) polymorphism and various phenotypes. We first verified the accuracy of the summary-statistics-based approach for 61 phenotypes in the UK Biobank. Because results from the predicted individual-level 5-HTTLPR genotypes and from the summary-statistics-based approach were highly similar, we applied our method to neurobehavioral GWAS summary statistics available from large-scale GWAS. We found no genome-wide significant evidence for association between 5-HTTLPR and any of the neurobehavioral traits. We did, however, observe genome-wide significant evidence for association between this locus and human adult height, BMI, and total cholesterol. Our summary-statistics-based approach provides a systematic way to examine the role of VNTRs and related types of genetic polymorphisms in disease risk and trait susceptibility for phenotypes with large-scale GWAS summary statistics available.
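The general idea of testing a SNP-based genotype predictor using only GWAS summary statistics can be sketched as follows. This minimal Python example assumes that the predictor can be approximated by a linear combination of the nearby SNPs with weights w (the study used a machine learning model on eight SNPs, which may not reduce to this form), per-SNP association z-scores, and an LD (SNP-SNP correlation) matrix R from a reference panel. It applies the weighted z-score statistic z = w'z / sqrt(w'Rw) familiar from transcriptome-wide association studies; it is not necessarily the authors' formula, and the function names are hypothetical.

```python
import math


def weighted_z(weights, z_scores, ld):
    """Association z-score for a linear SNP-based predictor from summary
    statistics only: z = (w . z) / sqrt(w' R w), with R the LD matrix."""
    k = len(weights)
    if len(z_scores) != k or len(ld) != k or any(len(row) != k for row in ld):
        raise ValueError("dimension mismatch")
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    variance = sum(weights[i] * ld[i][j] * weights[j]
                   for i in range(k) for j in range(k))
    if variance <= 0:
        raise ValueError("w'Rw must be positive")
    return numerator / math.sqrt(variance)


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical weights, z-scores and LD for three SNPs (not study values).
w = [0.5, 0.3, 0.2]
z = [4.0, 3.5, 1.0]
R = [[1.0, 0.6, 0.1],
     [0.6, 1.0, 0.2],
     [0.1, 0.2, 1.0]]
stat = weighted_z(w, z, R)
print(stat, two_sided_p(stat))
```

The LD matrix matters: when the SNPs are correlated, their z-scores carry overlapping information, and w'Rw grows to discount it. Genome-wide significance (p < 5e-8) corresponds to |z| of about 5.45.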

VCF2PopTree: a one-click client-side software to construct population phylogeny from genome-wide SNPs

2019. DOI: 10.7287/peerj.preprints.27682

Author(s): Sankar Subramanian, Umayal Ramasamy, David Chen

Keyword(s): Phylogenetic Trees, Large Scale, Web Applications, Third Party, Genotype Data, Whole Genome, Genome Data, Genome Wide, Software Programs, Computationally Intensive

In past decades, a number of software programs have been developed to infer the phylogenetic relationships among populations. However, these programs are not suited to large-scale whole-genome data. Recently, a few standalone or web applications have been developed to handle genome-wide data, but they are either computationally intensive, dependent on third-party software, or require significant time and resources on a web server. In the post-genomic era, owing to reductions in the cost of sequencing and analysis, researchers can obtain bioinformatically processed, high-quality, publication-ready whole-genome data for many individuals in a population from next-generation sequencing companies. Such genotype data are typically provided in the Variant Call Format (VCF), and there is no simple software that uses these data to construct the phylogeny of populations quickly. To address this limitation, we have developed VCF2PopTree, a one-click, user-friendly program that uses genome-wide SNPs to construct and display phylogenetic trees in seconds to minutes; for example, it reads a 1 GB VCF file and draws a tree in less than 5 minutes. VCF2PopTree accepts genotype data from a local machine, constructs a tree using the UPGMA and Neighbour-Joining algorithms, and displays it in a web browser. It also produces a pairwise-diversity matrix in MEGA and PHYLIP file formats, as well as trees in Newick format, which can be used directly by other popular phylogenetic software programs. The software, including the source code, a test VCF input file, and short documentation, is available at: https://github.com/sansubs/vcf2pop.
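The UPGMA step, from a pairwise distance matrix to a Newick tree, can be sketched in a few dozen lines of Python. This is not VCF2PopTree's code; it is a minimal illustration that assumes a precomputed symmetric distance matrix between populations (for example, a pairwise-diversity matrix) and returns a Newick string with branch lengths. The function name `upgma` is hypothetical, and Neighbour-Joining is not shown.

```python
def upgma(names, dist):
    """Build a UPGMA tree; return it as a Newick string with branch lengths.

    names: taxon (population) labels; dist: symmetric matrix (list of lists).
    """
    # Active clusters: id -> (newick_subtree, number_of_taxa, node_height)
    clusters = {i: (names[i], 1, 0.0) for i in range(len(names))}
    # Distances keyed by (smaller_id, larger_id)
    d = {(i, j): dist[i][j]
         for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        (i, j), dij = min(d.items(), key=lambda kv: kv[1])  # closest pair
        ti, ni, hi = clusters.pop(i)
        tj, nj, hj = clusters.pop(j)
        height = dij / 2.0  # UPGMA places the new node at half the distance
        subtree = f"({ti}:{height - hi:.4g},{tj}:{height - hj:.4g})"
        # Size-weighted average distance from the merged cluster to the rest
        new_d = {}
        for k in clusters:
            dik = d[(min(i, k), max(i, k))]
            djk = d[(min(j, k), max(j, k))]
            new_d[k] = (ni * dik + nj * djk) / (ni + nj)
        d = {key: v for key, v in d.items() if i not in key and j not in key}
        for k, v in new_d.items():
            d[(k, next_id)] = v  # k < next_id, so the key stays ordered
        clusters[next_id] = (subtree, ni + nj, height)
        next_id += 1
    (tree, _, _), = clusters.values()
    return tree + ";"


print(upgma(["A", "B", "C"], [[0, 2, 6], [2, 0, 6], [6, 6, 0]]))
```

For the three-population example, the call returns `(C:3,(A:1,B:1):2);`. UPGMA assumes a constant rate of divergence (an ultrametric tree), which is one reason tools also offer Neighbour-Joining when that assumption is doubtful.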