dinucleotide composition
Recently Published Documents


PLoS Biology, 2021, Vol 19 (4), pp. e3001201
Author(s): Jelke J. Fros, Imke Visser, Bing Tang, Kexin Yan, Eri Nakayama, ...

Most vertebrate RNA viruses show pervasive suppression of CpG and UpA dinucleotides, closely resembling the dinucleotide composition of host cell transcriptomes. In contrast, CpG suppression is absent in both invertebrate mRNA and RNA viruses that exclusively infect arthropods. Arthropod-borne (arbo) viruses are transmitted between vertebrate hosts by invertebrate vectors and thus encounter potentially conflicting evolutionary pressures in the different cytoplasmic environments. Using a newly developed Zika virus (ZIKV) model, we have investigated how demands for CpG suppression in vertebrate cells can be reconciled with potentially quite different compositional requirements in invertebrates and how this affects ZIKV replication and transmission. Mutant viruses with synonymously elevated CpG or UpA dinucleotide frequencies showed attenuated replication in vertebrate cell lines, which was rescued by knockout of the zinc-finger antiviral protein (ZAP). Conversely, in mosquito cells, ZIKV mutants with elevated CpG dinucleotide frequencies showed substantially enhanced replication compared to wild type. Host-driven effects on virus replication attenuation and enhancement were even more apparent in mouse and mosquito models. Infections with CpG- or UpA-high ZIKV mutants in mice did not cause typical ZIKV-induced tissue damage and completely protected mice during subsequent challenge with wild-type virus, which demonstrates their potential as live-attenuated vaccines. In contrast, the CpG-high mutants displayed enhanced replication in Aedes aegypti mosquitoes and a larger proportion of mosquitoes carried infectious virus in their saliva. These findings show that mosquito cells are also capable of discriminating RNA based on dinucleotide composition. 
However, the evolutionary pressure on the CpG dinucleotides of viral genomes in arthropod vectors directly opposes the pressure present in vertebrate host cells, which provides evidence that an adaptive compromise is required for arbovirus transmission. This suggests that the genome composition of arbo flaviviruses is crucial to maintain the balance between high-level replication in the vertebrate host and persistent replication in the mosquito vector.
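
Suppression of a dinucleotide such as CpG or UpA is commonly quantified as an observed/expected ratio: the frequency of the dinucleotide divided by the product of the frequencies of its two bases. The sketch below is a minimal illustration of that generic measure, not the authors' own pipeline; the function name `dinucleotide_odds_ratios` is hypothetical, and no masking of ambiguous bases is attempted.

```python
from collections import Counter


def dinucleotide_odds_ratios(seq):
    """Return {dinucleotide: observed/expected ratio} for one sequence.

    Expected frequency assumes the two bases occur independently, so a ratio
    well below 1 indicates suppression and a ratio above 1 over-representation.
    Only dinucleotides that occur at least once are reported.
    """
    seq = seq.upper().replace("T", "U")  # treat DNA and RNA input alike
    n = len(seq)
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(n - 1))
    ratios = {}
    for pair, count in di.items():
        observed = count / (n - 1)
        expected = (mono[pair[0]] / n) * (mono[pair[1]] / n)
        ratios[pair] = observed / expected
    return ratios


if __name__ == "__main__":
    # Toy sequence only; a real analysis would read a viral genome or ORF.
    for pair, ratio in sorted(dinucleotide_odds_ratios("AUGCGCGAUUACG").items()):
        print(pair, round(ratio, 3))
```

Comparing the `CG` and `UA` entries between a wild-type sequence and a synonymously recoded mutant would show how far the recoding moved each ratio.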


Genes, 2021, Vol 12 (3), pp. 354
Author(s): Lu Zhang, Xinyi Qin, Min Liu, Ziwei Xu, Guangzhong Liu

As a prevalent post-transcriptional modification of RNA, N6-methyladenosine (m6A) plays a crucial role in various biological processes. To better reveal its regulatory mechanism and provide new insights for drug design, accurate genome-wide identification of m6A sites is vital. As traditional experimental methods are time-consuming and cost-prohibitive, a more efficient computational method for detecting m6A sites is needed. In this study, we propose a novel cross-species computational method, DNN-m6A, based on a deep neural network (DNN) to identify m6A sites in multiple tissues of human, mouse and rat. First, binary encoding (BE), tri-nucleotide composition (TNC), enhanced nucleic acid composition (ENAC), K-spaced nucleotide pair frequencies (KSNPFs), nucleotide chemical property (NCP), pseudo dinucleotide composition (PseDNC), position-specific nucleotide propensity (PSNP) and position-specific dinucleotide propensity (PSDP) are employed to extract RNA sequence features, which are subsequently fused to construct the initial feature vector set. Second, we use elastic net to eliminate redundant features and build the optimal feature subset. Finally, the hyper-parameters of the DNN are tuned with Bayesian hyper-parameter optimization based on the selected feature subset. The five-fold cross-validation test on training datasets shows that the proposed DNN-m6A method outperformed the state-of-the-art method for predicting m6A sites, with an accuracy (ACC) of 73.58%–83.38% and an area under the curve (AUC) of 81.39%–91.04%. Furthermore, on the independent datasets the method achieved an ACC of 72.95%–83.04% and an AUC of 80.79%–91.09%, which shows the excellent generalization ability of our proposed method.
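
Two of the simpler encodings named above, binary encoding and tri-nucleotide composition, can be sketched in a few lines, together with the "fusion" step, which here is plain concatenation. This is an illustrative assumption about how such features are typically built, not the DNN-m6A source code; the function names and the fixed `ACGU` ordering are hypothetical choices.

```python
from itertools import product

ALPHABET = "ACGU"  # assumed base order; any fixed order works if used consistently


def binary_encoding(seq):
    """One-hot encode each base as 4 bits, giving a vector of length 4 * len(seq)."""
    vec = []
    for base in seq.upper():
        vec.extend(1 if base == a else 0 for a in ALPHABET)
    return vec


def trinucleotide_composition(seq):
    """Frequencies of all 64 trinucleotides in lexicographic order."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product(ALPHABET, repeat=3)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - 2):
        tri = seq[i:i + 3]
        if tri in counts:  # skip windows containing ambiguous bases
            counts[tri] += 1
    total = max(len(seq) - 2, 1)
    return [counts[k] / total for k in kmers]


def fuse_features(seq):
    """Concatenate the individual encodings into one initial feature vector."""
    return binary_encoding(seq) + trinucleotide_composition(seq)


if __name__ == "__main__":
    vector = fuse_features("GGACUAGGACU")
    print(len(vector))
```

Binary encoding preserves position information but grows with sequence length, while composition features have a fixed length; fusing both is one reason a later feature-selection step such as elastic net becomes useful.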


2021
Author(s): Jelke J. Fros, Imke Visser, Bing Tang, Kexin Yan, Eri Nakayama, ...

Abstract Most vertebrate RNA viruses show pervasive suppression of CpG and UpA dinucleotides, closely resembling the dinucleotide composition of host cell transcriptomes. In contrast, CpG suppression is absent in both invertebrate mRNA and RNA viruses that exclusively infect arthropods. Arthropod-borne (arbo) viruses are transmitted between vertebrate hosts by invertebrate vectors and thus encounter potentially conflicting evolutionary pressures in the different cytoplasmic environments. Using a newly developed Zika virus (ZIKV) model, we have investigated how demands for CpG suppression in vertebrate cells can be reconciled with potentially quite different compositional requirements in invertebrates, and how this affects ZIKV replication and transmission. Mutant viruses with synonymously elevated CpG or UpA dinucleotide frequencies showed attenuated replication in vertebrate cell lines, which was rescued by knockout of the zinc-finger antiviral protein (ZAP). Conversely, in mosquito cells, ZIKV mutants with elevated CpG dinucleotide frequencies showed substantially enhanced replication compared to wildtype. Host-driven effects on virus replication attenuation and enhancement were even more apparent in mouse and mosquito models. Infections with CpG- or UpA-high ZIKV mutants in mice did not cause typical ZIKV-induced tissue damage and completely protected mice during subsequent challenge with wildtype virus, which demonstrates their potential as live-attenuated vaccines. In contrast, the CpG-high mutants displayed enhanced replication in Aedes aegypti mosquitoes and a larger proportion of mosquitoes carried infectious virus in their saliva. These findings show that mosquito cells are also capable of discriminating RNA based on dinucleotide composition.
However, the evolutionary pressure on the CpG dinucleotides of viral genomes in arthropod vectors directly opposes the pressure present in vertebrate host cells, which provides evidence that an adaptive compromise is required for arbovirus transmission. This suggests that the genome composition of arthropod-borne flaviviruses is crucial to maintain the balance between high-level replication in the vertebrate host and persistent replication in the mosquito vector.


Author(s): Lijun Cai, Xuanbai Ren, Xiangzheng Fu, Li Peng, Mingyu Gao, ...

Abstract Motivation Enhancers are non-coding DNA fragments with high position variability and free scattering. They play an important role in controlling gene expression. As machine learning has become more widely used in identifying enhancers, a number of bioinformatic tools have been developed. Although several models for identifying enhancers and their strengths have been proposed, their accuracy and efficiency have yet to be improved. Results We propose a two-layer predictor called ‘iEnhancer-XG.’ It comprises a one-layer predictor (for identifying enhancers) and a second classifier (for their strength) and uses ‘XGBoost’ as a base classifier and five feature extraction methods, namely, k-Spectrum Profile, Mismatch k-tuple, Subsequence Profile, Position-specific scoring matrix (PSSM) and Pseudo dinucleotide composition (PseDNC). Each method has an independent output. We place the feature vector matrix into the ensemble learning for fusion. This experiment involves the method of ‘SHapley Additive explanations’ to provide interpretability for the previous black box machine learning methods and improve their credibility. The accuracies of the ensemble learning method are 0.811 (first layer) and 0.657 (second layer). The rigorous 10-fold cross-validation confirms that the proposed method is significantly better than existing technologies. Availability and implementation The source code and dataset for the enhancer predictions have been uploaded to https://github.com/jimmyrate/ienhancer-xg. Supplementary information Supplementary data are available at Bioinformatics online.
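
Of the five feature extraction methods listed, the k-Spectrum Profile and Mismatch k-tuple are closely related: the first counts exact occurrences of every k-mer, the second also credits k-mers that differ from a sequence window in at most m positions. The sketch below assumes that common definition; it is not taken from the iEnhancer-XG repository, and the name `mismatch_profile` and its defaults are hypothetical.

```python
from itertools import product


def mismatch_profile(seq, k=3, m=1, alphabet="ACGT"):
    """Count, for every possible k-mer, the windows of seq within Hamming distance m.

    With m=0 this reduces to the plain k-spectrum profile (exact k-mer counts).
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    profile = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        for kmer in kmers:
            mismatches = sum(a != b for a, b in zip(window, kmer))
            if mismatches <= m:
                profile[kmer] += 1
    return profile


if __name__ == "__main__":
    profile = mismatch_profile("ACGTTGCA", k=3, m=1)
    # Show the five k-mers with the highest counts.
    for kmer, count in sorted(profile.items(), key=lambda kv: -kv[1])[:5]:
        print(kmer, count)
```

The brute-force comparison against all 4^k k-mers is only practical for small k; it is written this way to keep the definition visible rather than for speed.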


2020, Vol 6 (2)
Author(s): Paul Digard, Hui Min Lee, Colin Sharp, Finn Grey, Eleanor Gaunt

Abstract CpG dinucleotides are under-represented in the genomes of single-stranded RNA viruses, and SARS-CoV-2 is no exception to this. Artificial modification of CpG frequency is a valid approach for live attenuated vaccine development; if this is to be applied to SARS-CoV-2, we must first understand the role CpG motifs play in regulating SARS-CoV-2 replication. Accordingly, the CpG composition of the SARS-CoV-2 genome was characterised. CpG suppression among coronaviruses does not differ between virus genera but does vary with host species and primary replication site (a proxy for tissue tropism), supporting the hypothesis that viral CpG content may influence cross-species transmission. Although SARS-CoV-2 exhibits overall strong CpG suppression, this varies considerably across the genome, and the Envelope (E) open reading frame (ORF) and ORF10 demonstrate an absence of CpG suppression. Across the Coronaviridae, E genes display remarkably high variation in CpG composition, with those of SARS and SARS-CoV-2 having much higher CpG content than other coronaviruses isolated from humans. This is an ancestrally derived trait reflecting their bat origins. Conservation of CpG motifs in these regions suggests that they have a functionality which over-rides the need to suppress CpG; an observation relevant to future strategies towards a rationally attenuated SARS-CoV-2 vaccine.
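
Variation in CpG suppression along a genome can be visualised with a sliding-window profile of the observed/expected CpG ratio. The following sketch shows one way to do that under simple assumptions; the window and step sizes are arbitrary illustrative values, the function names are hypothetical, and nothing here reproduces the authors' actual analysis.

```python
def cpg_odds_ratio(seq):
    """Observed/expected CpG ratio, or None if it is undefined for this sequence."""
    seq = seq.upper()
    n = len(seq)
    if n < 2:
        return None
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return None  # expected frequency would be zero
    observed = seq.count("CG") / (n - 1)  # "CG" cannot overlap itself
    expected = (c / n) * (g / n)
    return observed / expected


def windowed_cpg_profile(genome, window=500, step=100):
    """Return a list of (start, ratio) pairs for windows sliding along the genome."""
    profile = []
    last_start = max(len(genome) - window, 0)
    for start in range(0, last_start + 1, step):
        profile.append((start, cpg_odds_ratio(genome[start:start + window])))
    return profile


if __name__ == "__main__":
    toy_genome = "ATGCGCGTTAACGGATCCGA" * 50  # placeholder, not a real genome
    for start, ratio in windowed_cpg_profile(toy_genome, window=200, step=200):
        print(start, ratio)
```

Windows whose ratio sits near or above 1 inside an otherwise suppressed genome would correspond to regions like those the abstract describes as lacking CpG suppression.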


2020
Author(s): Kennosuke Wada, Yoshiko Wada, Toshimichi Ikemura

Abstract We first conducted time-series analysis of mono- and dinucleotide composition for over 10,000 SARS-CoV-2 genomes, as well as over 1500 Zaire ebolavirus genomes, and found clear time-series changes in the compositions on a monthly basis, which should reflect viral adaptations for efficient growth in human cells. We next developed a sequence alignment-free method that extensively searches for advantageous mutations and ranks them by the level of increase in their intrapopulation frequency. Time-series analysis of occurrences of oligonucleotides of diverse lengths for SARS-CoV-2 genomes revealed seven distinctive mutations that rapidly expanded their intrapopulation frequency and are thought to be candidate advantageous mutations for efficient growth in human cells.
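
A monthly time series of dinucleotide composition needs no alignment: each genome is reduced to its dinucleotide frequencies, and the frequencies are averaged per collection month. The sketch below illustrates that idea only; the record format (an ISO `YYYY-MM-DD` date paired with a sequence) and the function names are assumptions, and the authors' method for ranking candidate mutations is not reproduced.

```python
from collections import Counter, defaultdict


def dinucleotide_frequencies(seq):
    """Return {dinucleotide: frequency} over all overlapping positions in seq."""
    seq = seq.upper()
    total = len(seq) - 1
    if total < 1:
        return {}
    counts = Counter(seq[i:i + 2] for i in range(total))
    return {pair: count / total for pair, count in counts.items()}


def monthly_mean_frequency(records, dinucleotide):
    """Average one dinucleotide's frequency per collection month.

    records: iterable of (collection_date, sequence), with dates as 'YYYY-MM-DD'.
    Returns a dict keyed by 'YYYY-MM' in chronological order.
    """
    by_month = defaultdict(list)
    for date, seq in records:
        by_month[date[:7]].append(dinucleotide_frequencies(seq).get(dinucleotide, 0.0))
    return {month: sum(vals) / len(vals) for month, vals in sorted(by_month.items())}


if __name__ == "__main__":
    # Toy records; real input would come from dated genome submissions.
    toy = [("2020-01-05", "ACGCGT"), ("2020-01-20", "ACGTTT"), ("2020-02-02", "ATGTTT")]
    print(monthly_mean_frequency(toy, "CG"))
```

The same grouping works for mononucleotides or longer oligonucleotides by swapping the per-genome counting function.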


Author(s): Paul Digard, Hui Min Lee, Colin Sharp, Finn Grey, Eleanor Gaunt

Abstract CpG dinucleotides are under-represented in the genomes of single-stranded RNA viruses, and coronaviruses, including SARS-CoV-2, are no exception to this. Artificial modification of CpG frequency is a valid approach for live attenuated vaccine development, and if this is to be applied to SARS-CoV-2, we must first understand the role CpG motifs play in regulating SARS-CoV-2 replication. Accordingly, the CpG composition of the newly emerged SARS-CoV-2 genome was characterised in the context of other coronaviruses. CpG suppression amongst coronaviruses does not significantly differ according to genera of virus, but does vary according to host species and primary replication site (a proxy for tissue tropism), supporting the hypothesis that viral CpG content may influence cross-species transmission. Although SARS-CoV-2 exhibits overall strong CpG suppression, this varies considerably across the genome, and the Envelope (E) open reading frame (ORF) and ORF10 demonstrate an absence of CpG suppression. While ORF10 is only present in the genomes of a subset of coronaviruses, E is essential for virus replication. Across the Coronaviridae, E genes display remarkably high variation in CpG composition, with those of SARS and SARS-CoV-2 having much higher CpG content than other coronaviruses isolated from humans. Phylogeny indicates that this is an ancestrally-derived trait reflecting their origin in bats, rather than something selected for after zoonotic transfer. Conservation of CpG motifs in these regions suggests that they have a functionality which over-rides the need to suppress CpG; an observation relevant to future strategies towards a rationally attenuated SARS-CoV-2 vaccine.


Author(s): Fuyi Li, Jinxiang Chen, Zongyuan Ge, Ya Wen, Yanwei Yue, ...

Abstract Promoters are short consensus sequences of DNA, which are responsible for transcription activation or the repression of all genes. There are many types of promoters in bacteria with important roles in initiating gene transcription. Therefore, solving promoter-identification problems has important implications for improving the understanding of their functions. To this end, computational methods targeting promoter classification have been established; however, their performance remains unsatisfactory. In this study, we present a novel stacked-ensemble approach (termed SELECTOR) for identifying both promoters and their respective classification. SELECTOR combined the composition of k-spaced nucleic acid pairs, parallel correlation pseudo-dinucleotide composition, position-specific trinucleotide propensity based on single-strand, and DNA strand features, and used five popular tree-based ensemble learning algorithms to build a stacked model. Both 5-fold cross-validation tests using benchmark datasets and independent tests using the newly collected independent test dataset showed that SELECTOR outperformed state-of-the-art methods in both general and specific types of promoter prediction in Escherichia coli. Furthermore, this novel framework provides essential interpretations that aid understanding of model success by leveraging the powerful Shapley Additive exPlanation algorithm, thereby highlighting the most important features relevant for predicting both general and specific types of promoters and overcoming the limitations of existing ‘Black-box’ approaches that are unable to reveal causal relationships from large amounts of initially encoded features.
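
The composition of k-spaced nucleic acid pairs generalises dinucleotide composition: for each gap size, it records how often base x is followed by base y with exactly that many bases in between. The sketch below assumes that usual definition and is not SELECTOR's implementation; the function name, the `max_gap` default and the `ACGT` ordering are hypothetical.

```python
from itertools import product


def kspaced_pair_composition(seq, max_gap=2, alphabet="ACGT"):
    """Frequencies of the 16 ordered base pairs for every gap from 0 to max_gap.

    Gap 0 is ordinary dinucleotide composition. The result is a flat list of
    16 * (max_gap + 1) values, ordered by gap and then by pair.
    """
    seq = seq.upper()
    pairs = ["".join(p) for p in product(alphabet, repeat=2)]
    features = []
    for gap in range(max_gap + 1):
        counts = dict.fromkeys(pairs, 0)
        n_positions = len(seq) - gap - 1
        for i in range(max(n_positions, 0)):
            pair = seq[i] + seq[i + gap + 1]
            if pair in counts:  # ignore pairs involving ambiguous bases
                counts[pair] += 1
        denom = max(n_positions, 1)
        features.extend(counts[p] / denom for p in pairs)
    return features


if __name__ == "__main__":
    feats = kspaced_pair_composition("TTGACATATAAT", max_gap=2)
    print(len(feats))
```

Because the output length depends only on `max_gap`, these features can be stacked directly with other fixed-length encodings before being passed to tree-based learners.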

