scholarly journals Prediction of evolutionary constraint by genomic annotations improves prioritization of causal variants in maize

2021 ◽  
Author(s):  
Guillaume P. Ramstein ◽  
Edward S. Buckler

AbstractCrop improvement through cross-population genomic prediction and genome editing requires identification of causal variants at single-site resolution. Most genetic mapping studies have generally lacked such resolution. In contrast, evolutionary approaches can detect genetic effects at high resolution, but they are limited by shifting selection, missing data, and low depth of multiple-sequence alignments. Here we used genomic annotations to accurately predict nucleotide conservation across Angiosperms, as a proxy for fitness effect of mutations. Using only sequence analysis, we annotated non-synonymous mutations in 25,824 maize gene models, with information from bioinformatics (SIFT scores, GC content, transposon insertion, k-mer frequency) and deep learning (predicted effects of polymorphisms on protein representations by UniRep). Our predictions were validated by experimental information: within-species conservation, chromatin accessibility, gene expression and gene ontology enrichment. Importantly, they also improved genomic prediction for fitness-related traits (grain yield) in elite maize panels (+5% and +38% prediction accuracy within and across panels, respectively), by stringent prioritization of ≤ 1% of single-site variants (e.g., 104 sites and approximately 15 deleterious alleles per haploid genome). Together, our results suggest that our proposed approach may effectively prioritize sites most likely to impact fitness-related traits in crops. Such prioritizations could be useful to select polymorphisms for accurate genomic prediction, and candidate mutations for efficient base editing.

2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Elena N. Judd ◽  
Alison R. Gilchrist ◽  
Nicholas R. Meyerson ◽  
Sara L. Sawyer

Abstract Background The Type I interferon response is an important first-line defense against viruses. In turn, viruses antagonize (i.e., degrade, mis-localize, etc.) many proteins in interferon pathways. Thus, hosts and viruses are locked in an evolutionary arms race for dominance of the Type I interferon pathway. As a result, many genes in interferon pathways have experienced positive natural selection in favor of new allelic forms that can better recognize viruses or escape viral antagonists. Here, we performed a holistic analysis of selective pressures acting on genes in the Type I interferon family. We initially hypothesized that the genes responsible for inducing the production of interferon would be antagonized more heavily by viruses than genes that are turned on as a result of interferon. Our logic was that viruses would have greater effect if they worked upstream of the production of interferon molecules because, once interferon is produced, hundreds of interferon-stimulated proteins would activate and the virus would need to counteract them one-by-one. Results We curated multiple sequence alignments of primate orthologs for 131 genes active in interferon production and signaling (herein, “induction” genes), 100 interferon-stimulated genes, and 100 randomly chosen genes. We analyzed each multiple sequence alignment for the signatures of recurrent positive selection. Counter to our hypothesis, we found the interferon-stimulated genes, and not interferon induction genes, are evolving significantly more rapidly than a random set of genes. Interferon induction genes evolve in a way that is indistinguishable from a matched set of random genes (22% and 18% of genes bear signatures of positive selection, respectively). In contrast, interferon-stimulated genes evolve differently, with 33% of genes evolving under positive selection and containing a significantly higher fraction of codons that have experienced selection for recurrent replacement of the encoded amino acid. Conclusion Viruses may antagonize individual products of the interferon response more often than trying to neutralize the system altogether.


2020 ◽  
Vol 22 (1) ◽  
pp. 253
Author(s):  
Venura Herath ◽  
Jeanmarie Verchot

The basic region-leucine zipper (bZIP) transcription factors (TFs) form homodimers and heterodimers via the coil–coil region. The bZIP dimerization network influences gene expression across plant development and in response to a range of environmental stresses. The recent release of the most comprehensive potato reference genome was used to identify 80 StbZIP genes and to characterize their gene structure, phylogenetic relationships, and gene expression profiles. The StbZIP genes have undergone 22 segmental and one tandem duplication events. Ka/Ks analysis suggested that most duplications experienced purifying selection. Amino acid sequence alignments and phylogenetic comparisons made with the Arabidopsis bZIP family were used to assign the StbZIP genes to functional groups based on the Arabidopsis orthologs. The patterns of introns and exons were conserved within the assigned functional groups which are supportive of the phylogeny and evidence of a common progenitor. Inspection of the leucine repeat heptads within the bZIP domains identified a pattern of attractive pairs favoring homodimerization, and repulsive pairs favoring heterodimerization. These patterns of attractive and repulsive heptads were similar within each functional group for Arabidopsis and S. tuberosum orthologs. High-throughput RNA-seq data indicated the most highly expressed and repressed genes that might play significant roles in tissue growth and development, abiotic stress response, and response to pathogens including Potato virus X. These data provide useful information for further functional analysis of the StbZIP gene family and their potential applications in crop improvement.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Juan C. Muñoz-Escalante ◽  
Andreu Comas-García ◽  
Sofía Bernal-Silva ◽  
Daniel E. Noyola

AbstractRespiratory syncytial virus (RSV) is a major cause of respiratory infections and is classified in two main groups, RSV-A and RSV-B, with multiple genotypes within each of them. For RSV-B, more than 30 genotypes have been described, without consensus on their definition. The lack of genotype assignation criteria has a direct impact on viral evolution understanding, development of viral detection methods as well as vaccines design. Here we analyzed the totality of complete RSV-B G gene ectodomain sequences published in GenBank until September 2018 (n = 2190) including 478 complete genome sequences using maximum likelihood and Bayesian phylogenetic analyses, as well as intergenotypic and intragenotypic distance matrices, in order to generate a systematic genotype assignation. Individual RSV-B genes were also assessed using maximum likelihood phylogenetic analyses and multiple sequence alignments were used to identify molecular markers associated to specific genotypes. Analyses of the complete G gene ectodomain region, sequences clustering patterns, and the presence of molecular markers of each individual gene indicate that the 37 previously described genotypes can be classified into fifteen distinct genotypes: BA, BA-C, BA-CC, CB1-THB, GB1-GB4, GB6, JAB1-NZB2, SAB1, SAB2, SAB4, URU2 and a novel early circulating genotype characterized in the present study and designated GB0.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Farhan Quadir ◽  
Raj S. Roy ◽  
Randal Halfmann ◽  
Jianlin Cheng

AbstractDeep learning methods that achieved great success in predicting intrachain residue-residue contacts have been applied to predict interchain contacts between proteins. However, these methods require multiple sequence alignments (MSAs) of a pair of interacting proteins (dimers) as input, which are often difficult to obtain because there are not many known protein complexes available to generate MSAs of sufficient depth for a pair of proteins. In recognizing that multiple sequence alignments of a monomer that forms homomultimers contain the co-evolutionary signals of both intrachain and interchain residue pairs in contact, we applied DNCON2 (a deep learning-based protein intrachain residue-residue contact predictor) to predict both intrachain and interchain contacts for homomultimers using multiple sequence alignment (MSA) and other co-evolutionary features of a single monomer followed by discrimination of interchain and intrachain contacts according to the tertiary structure of the monomer. We name this tool DNCON2_Inter. Allowing true-positive predictions within two residue shifts, the best average precision was obtained for the Top-L/10 predictions of 22.9% for homodimers and 17.0% for higher-order homomultimers. In some instances, especially where interchain contact densities are high, DNCON2_Inter predicted interchain contacts with 100% precision. We also developed Con_Complex, a complex structure reconstruction tool that uses predicted contacts to produce the structure of the complex. Using Con_Complex, we show that the predicted contacts can be used to accurately construct the structure of some complexes. Our experiment demonstrates that monomeric multiple sequence alignments can be used with deep learning to predict interchain contacts of homomeric proteins.


2015 ◽  
Vol 28 (1) ◽  
pp. 46 ◽  
Author(s):  
David A. Morrison ◽  
Matthew J. Morgan ◽  
Scot A. Kelchner

Sequence alignment is just as much a part of phylogenetics as is tree building, although it is often viewed solely as a necessary tool to construct trees. However, alignment for the purpose of phylogenetic inference is primarily about homology, as it is the procedure that expresses homology relationships among the characters, rather than the historical relationships of the taxa. Molecular homology is rather vaguely defined and understood, despite its importance in the molecular age. Indeed, homology has rarely been evaluated with respect to nucleotide sequence alignments, in spite of the fact that nucleotides are the only data that directly represent genotype. All other molecular data represent phenotype, just as do morphology and anatomy. Thus, efforts to improve sequence alignment for phylogenetic purposes should involve a more refined use of the homology concept at a molecular level. To this end, we present examples of molecular-data levels at which homology might be considered, and arrange them in a hierarchy. The concept that we propose has many levels, which link directly to the developmental and morphological components of homology. Of note, there is no simple relationship between gene homology and nucleotide homology. We also propose terminology with which to better describe and discuss molecular homology at these levels. Our over-arching conceptual framework is then used to shed light on the multitude of automated procedures that have been created for multiple-sequence alignment. Sequence alignment needs to be based on aligning homologous nucleotides, without necessary reference to homology at any other level of the hierarchy. In particular, inference of nucleotide homology involves deriving a plausible scenario for molecular change among the set of sequences. Our clarifications should allow the development of a procedure that specifically addresses homology, which is required when performing alignment for phylogenetic purposes, but which does not yet exist.


Sign in / Sign up

Export Citation Format

Share Document