Integration of viral transcriptome sequencing with structure and sequence motifs predicts novel regulatory elements in SARS-CoV-2


10.1101/2020.06.24.169144 ◽

2020 ◽

Author(s):

Brian J. Cox

Keyword(s):

Regulatory Elements ◽

Viral Gene ◽

Template Switching ◽

Sequence Motif ◽

Sequence Motifs ◽

Human Pathogens ◽

Sequencing Data ◽

Stem Loop ◽

Conserved Sequence ◽

Splice Junctions

Summary In the last twenty years, three separate coronaviruses have left their typical animal hosts and become human pathogens. An area of research interest is coronavirus transcription regulation, which uses an RNA-RNA mediated template-switching mechanism. It is not known how different transcriptional stoichiometries of each viral gene are generated. Analysis of SARS-CoV-2 RNA sequencing data from whole RNA transcriptomes identified TRS-dependent and TRS-independent transcripts. Integration of transcripts and 5’-UTR sequence motifs identified that the pentaloop and the stem-loop 3 were also located upstream of spliced genes. TRS-independent transcripts were detected as likely non-polyadenylated. Additionally, a novel conserved sequence motif was discovered at either end of the TRS-independent splice junctions. While both SARS viruses generated similar TRS-independent transcripts, they were more abundant in SARS-CoV-2. TRS-independent gene regulation requires investigation to determine its relationship to viral pathogenicity.


StoatyDive: Evaluation and Classification of Peak Profiles for Sequencing Data

10.1101/799114 ◽

2019 ◽

Cited By ~ 1

Author(s):

Florian Heyl ◽

Rolf Backofen

Keyword(s):

Quality Control ◽

Binding Sites ◽

High Throughput Sequencing ◽

Sequence Motif ◽

Sequence Motifs ◽

Sequencing Data ◽

Stem Loop ◽

Link Type ◽

Quality Control Tool ◽

Downstream Analysis

The prediction of binding sites (peak calling) is a common task in the data analysis of methods such as crosslinking or chromatin immunoprecipitation in combination with high-throughput sequencing (CLIP-Seq, ChIP-Seq). The predicted binding sites are often further analyzed, for example to predict sequence motifs or structure patterns. However, the peaks in the obtained set can vary in their profile shapes because of the peak-calling method used, different binding domains of the protein, protocol biases, or other factors. So far, a tool that evaluates and classifies the predicted peaks based on their shapes has been missing. We hereby present StoatyDive, a tool that can be used to filter for specific peak profile shapes of sequencing data such as CLIP and ChIP. StoatyDive therefore fine-tunes downstream analysis steps such as structure or sequence motif predictions and acts as a quality control. With StoatyDive we were able to classify distinct peak profile shapes from CLIP-seq data of the histone stem-loop-binding protein (SLBP). We show the potential of StoatyDive as a quality control tool and as a filter to pick different shapes based on biological or methodical questions. StoatyDive is open source and freely available under GPL-3 at https://github.com/BackofenLab/StoatyDive and at bioconda https://anaconda.org/bioconda/stoatydive.
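To make the idea of scoring a peak by its profile shape concrete, here is a minimal sketch. It is not StoatyDive's algorithm or API: the function name, the toy coverage profiles, and the choice of the coefficient of variation as a shape score are all assumptions made for illustration only.

```python
from statistics import mean, pstdev

def coefficient_of_variation(profile):
    """Hypothetical shape score: population std. dev. divided by mean coverage.

    A flat (broad) profile scores near 0; a sharp, spiky profile scores higher.
    """
    mu = mean(profile)
    if mu == 0:
        return 0.0  # an empty peak has no shape to score
    return pstdev(profile) / mu

# Toy per-nucleotide read coverage for two made-up peaks.
sharp_peak = [0, 0, 1, 10, 1, 0, 0]
broad_peak = [3, 3, 3, 3, 3, 3, 3]

scores = {
    "sharp": coefficient_of_variation(sharp_peak),
    "broad": coefficient_of_variation(broad_peak),
}
```

Under these assumptions, peaks could be filtered by thresholding such a score before motif or structure prediction; the threshold itself would have to be chosen per dataset.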


DNABERT: pre-trained Bidirectional Encoder Representations from Transformers model for DNA-language in genome

Bioinformatics ◽

10.1093/bioinformatics/btab083 ◽

2021 ◽

Author(s):

Yanrong Ji ◽

Zhihan Zhou ◽

Han Liu ◽

Ramana V Davuluri

Keyword(s):

Dna Sequences ◽

Regulatory Elements ◽

Ease Of Use ◽

Fine Tuning ◽

Supplementary Information ◽

Sequence Motifs ◽

Semantic Relationship ◽

Accurate Identification ◽

Conserved Sequence ◽

Genome Wide

Abstract Motivation Deciphering the language of non-coding DNA is one of the fundamental problems in genome research. Gene regulatory code is highly complex due to the existence of polysemy and distant semantic relationships, which previous informatics methods often fail to capture, especially in data-scarce scenarios. Results To address this challenge, we developed a novel pre-trained bidirectional encoder representation, named DNABERT, to capture global and transferable understanding of genomic DNA sequences based on up- and downstream nucleotide contexts. We compared DNABERT to the most widely used programs for genome-wide regulatory elements prediction and demonstrate its ease of use, accuracy and efficiency. We show that the single pre-trained transformers model can simultaneously achieve state-of-the-art performance on prediction of promoters, splice sites and transcription factor binding sites, after easy fine-tuning using small task-specific labeled data. Further, DNABERT enables direct visualization of nucleotide-level importance and semantic relationship within input sequences for better interpretability and accurate identification of conserved sequence motifs and functional genetic variant candidates. Finally, we demonstrate that DNABERT pre-trained on the human genome can even be readily applied to other organisms with exceptional performance. We anticipate that the pre-trained DNABERT model can be fine-tuned to many other sequence analysis tasks. Availability and implementation The source code, pretrained and finetuned model for DNABERT are available at GitHub (https://github.com/jerryji1993/DNABERT). Supplementary information Supplementary data are available at Bioinformatics online.


Tomato spotted wilt virus S-segment mRNAs have overlapping 3′-ends containing a predicted stem-loop structure and conserved sequence motif

Virus Research ◽

10.1016/j.virusres.2005.01.012 ◽

2005 ◽

Vol 110 (1-2) ◽

pp. 125-131 ◽

Cited By ~ 27

Author(s):

Ingeborg van Knippenberg ◽

Rob Goldbach ◽

Richard Kormelink

Keyword(s):

Tomato Spotted Wilt Virus ◽

Sequence Motif ◽

Loop Structure ◽

Stem Loop ◽

Conserved Sequence ◽

Tomato Spotted Wilt ◽

Stem Loop Structure ◽

Wilt Virus


An Amyloidogenic Sequence at the N-Terminus of the Androgen Receptor Impacts Polyglutamine Aggregation

10.20944/preprints201705.0126.v1 ◽

2017 ◽

Author(s):

Emmanuel Oppong ◽

Gunter Stier ◽

Miriam Gaal ◽

Rebecca Seeger ◽

Melanie Stoeck ◽

...

Keyword(s):

Androgen Receptor ◽

Amyloid Fibrils ◽

Intrinsic Property ◽

Activation Function ◽

Cysteine Residue ◽

Sequence Motif ◽

Sequence Motifs ◽

Conserved Sequence ◽

Amino Terminal ◽

Intrinsically Disordered

The human androgen receptor (AR) is a ligand-inducible transcription factor harboring an amino-terminal domain (AR-NTD) hosting the ligand-independent activation function. AR-NTD is intrinsically disordered and displays aggregation properties conferred by the presence of a poly-glutamine (polyQ) sequence of 22 residues. The length of the polyQ sequence, as well as the presence of adjacent sequence motifs, modulates this aggregation property. AR-NTD also contains a conserved sequence motif, KELCKAVSVSM, that displays an intrinsic property to form amyloid fibrils under mild oxidative conditions of its conserved cysteine residue. As peptide sequences with intrinsic ability to oligomerize are reported to have an impact on the aggregation of polyQ tracts, we determined the effect of the KELCKAVSVSM motif on the polyQ stretch in the context of the AR-NTD, using Atomic Force Microscopy (AFM). Here, we present evidence for a crosstalk between the amyloidogenic properties of the KELCKAVSVSM motif and the polyQ stretch at the AR-NTD.
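As a small illustration of how a fixed motif such as KELCKAVSVSM and a polyQ tract can be located in a protein sequence, the sketch below scans a string with regular expressions. The function name, the `min_q` threshold, and the toy sequence are hypothetical; the toy sequence is not the real AR-NTD sequence and says nothing about the actual spacing between the motif and the polyQ tract.

```python
import re

def find_motif_and_polyq(seq, motif="KELCKAVSVSM", min_q=10):
    """Return (motif_starts, polyq_runs) using 0-based positions.

    motif_starts: start index of every exact motif occurrence.
    polyq_runs:   (start, length) of every run of at least min_q glutamines.
    """
    motif_starts = [m.start() for m in re.finditer(re.escape(motif), seq)]
    polyq_runs = [
        (m.start(), len(m.group()))
        for m in re.finditer(r"Q{%d,}" % min_q, seq)
    ]
    return motif_starts, polyq_runs

# Made-up toy sequence: 3 residues, the motif, 4 residues, 22 glutamines, 4 residues.
toy_seq = "MEV" + "KELCKAVSVSM" + "AAPG" + "Q" * 22 + "ETSP"
motif_starts, polyq_runs = find_motif_and_polyq(toy_seq)
print(motif_starts, polyq_runs)  # [3] [(18, 22)]
```

Exact string matching is used here because the abstract names a single fixed motif; searching for degenerate motifs would need a pattern or a position weight matrix instead.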


Olfactory expression of trace amine-associated receptors requires cooperative cis-acting enhancers

Nature Communications ◽

10.1038/s41467-021-23824-3 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Ami Shah ◽

Madison Ratkowski ◽

Alessandro Rosa ◽

Paul Feinstein ◽

Thomas Bozza

Keyword(s):

Gene Expression ◽

Large Family ◽

Sequence Motifs ◽

Specific Expression ◽

Cis Acting ◽

Conserved Sequence ◽

Trace Amine ◽

Sequence Elements ◽

Cell Type Specific Expression ◽

Cell Type Specific

Abstract Olfactory sensory neurons express a large family of odorant receptors (ORs) and a small family of trace amine-associated receptors (TAARs). While both families are subject to so-called singular expression (expression of one allele of one gene), the mechanisms underlying TAAR gene choice remain obscure. Here, we report the identification of two conserved sequence elements in the mouse TAAR cluster (T-elements) that are required for TAAR gene expression. We observed that cell-type-specific expression of a TAAR-derived transgene required either T-element. Moreover, deleting either element reduced or abolished expression of a subset of TAAR genes, while deleting both elements abolished olfactory expression of all TAARs in cis with the mutation. The T-elements exhibit several features of known OR enhancers but also contain highly conserved, unique sequence motifs. Our data demonstrate that TAAR gene expression requires two cooperative cis-acting enhancers and suggest that ORs and TAARs share similar mechanisms of singular expression.


Antimutator mutations in the alpha subunit of Escherichia coli DNA polymerase III: identification of the responsible mutations and alignment with other DNA polymerases.

Genetics ◽

10.1093/genetics/134.4.1039 ◽

1993 ◽

Vol 134 (4) ◽

pp. 1039-1044 ◽

Cited By ~ 2

Author(s):

I J Fijalkowska ◽

R M Schaaper

Keyword(s):

Escherichia Coli ◽

Amino Acid ◽

Dna Polymerase ◽

Dna Polymerases ◽

Alpha Subunit ◽

Sequence Motifs ◽

Dna Polymerase Iii ◽

Conserved Sequence ◽

Polymerase Iii ◽

Dna Replication Errors

Abstract The dnaE gene of Escherichia coli encodes the DNA polymerase (alpha subunit) of the main replicative enzyme, DNA polymerase III holoenzyme. We have previously identified this gene as the site of a series of seven antimutator mutations that specifically decrease the level of DNA replication errors. Here we report the nucleotide sequence changes in each of the different antimutator dnaE alleles. For each a single, but different, amino acid substitution was found among the 1,160 amino acids of the protein. The observed substitutions are generally nonconservative. All affected residues are located in the central one-third of the protein. Some insight into the function of the regions of polymerase III containing the affected residues was obtained by amino acid alignment with other DNA polymerases. We followed the principles developed in 1990 by M. Delarue et al. who have identified in DNA polymerases from a large number of prokaryotic and eukaryotic sources three highly conserved sequence motifs, which are suggested to contain components of the polymerase active site. We succeeded in finding these three conserved motifs in polymerase III as well. However, none of the amino acid substitutions responsible for the antimutator phenotype occurred at these sites. This and other observations suggest that the effect of these mutations may be exerted indirectly through effects on polymerase conformation and/or DNA/polymerase interactions.


The first crystal structure of the peptidase domain of the U32 peptidase family

Acta Crystallographica Section D Biological Crystallography ◽

10.1107/s1399004715019549 ◽

2015 ◽

Vol 71 (12) ◽

pp. 2505-2512 ◽

Cited By ~ 4

Author(s):

Magdalena Schacherl ◽

Angelika A. M. Montada ◽

Elena Brunstein ◽

Ulrich Baumann

Keyword(s):

Crystal Structure ◽

Catalytic Mechanism ◽

Quaternary Structure ◽

Catalytic Domain ◽

Three Dimensional ◽

Zinc Ion ◽

Crystal Structure Analysis ◽

Dimensional Structure ◽

Sequence Motifs ◽

Conserved Sequence

The U32 family is a collection of over 2500 annotated peptidases in the MEROPS database with unknown catalytic mechanism. They mainly occur in bacteria and archaea, but a few representatives have also been identified in eukarya. Many of the U32 members have been linked to pathogenicity, such as proteins from Helicobacter and Salmonella. The first crystal structure analysis of a U32 catalytic domain from Methanopyrus kandleri (gene mk0906) reveals a modified (βα)8 TIM-barrel fold with some unique features. The connecting segment between strands β7 and β8 is extended and helix α7 is located on top of the C-terminal end of the barrel body. The protein exhibits a dimeric quaternary structure in which a zinc ion is symmetrically bound by histidine and cysteine side chains from both monomers. These residues reside in conserved sequence motifs. No typical proteolytic motifs are discernible in the three-dimensional structure, and biochemical assays failed to demonstrate proteolytic activity. A tunnel in which an acetate ion is bound is located in the C-terminal part of the β-barrel. Two hydrophobic grooves lead to a tunnel at the C-terminal end of the barrel in which an acetate ion is bound. One of the grooves binds to a Strep-Tag II of another dimer in the crystal lattice. Thus, these grooves may be binding sites for hydrophobic peptides or other ligands.


Genomic structure of murine methylmalonyl-CoA mutase: evidence for genetic and epigenetic mechanisms determining enzyme activity

Biochemical Journal ◽

10.1042/bj2960663 ◽

1993 ◽

Vol 296 (3) ◽

pp. 663-670 ◽

Cited By ~ 13

Author(s):

M F Wilkemeyer ◽

E R Andrews ◽

F D Ledley

Keyword(s):

Enzyme Activity ◽

Steady State ◽

Transcription Initiation ◽

Cultured Cells ◽

Genomic Structure ◽

Genetic Regulation ◽

Regulatory Elements ◽

Transcription Unit ◽

Mrna Levels ◽

Sequence Motifs

Methylmalonyl-CoA mutase (MCM) is a nuclear-encoded mitochondrial matrix enzyme. We have reported characterization of murine MCM and cloning of a murine MCM cDNA and now describe the murine Mut locus, its promoter and evidence for tissue-specific variation in MCM mRNA, enzyme and holo-enzyme levels. The Mut locus spans 30 kb and contains 13 exons constituting a unique transcription unit. A B1 repeat element was found in the 3′ untranslated region (exon 13). The transcription initiation site was identified and upstream sequences were shown to direct expression of a reporter gene in cultured cells. The promoter contains sequence motifs characteristic of: (1) TATA-less housekeeping promoters; (2) enhancer elements purportedly involved in co-ordinating expression of nuclear-encoded mitochondrial proteins; and (3) regulatory elements including CCAAT boxes, cyclic AMP-response elements and potential AP-2-binding sites. Northern blots demonstrate a greater than 10-fold variation in steady-state mRNA levels, which correlate with tissue levels of enzyme activity. However, the ratio of holoenzyme to total enzyme varies among different tissues, and there is no correlation between steady-state mRNA levels and holoenzyme activity. These results suggest that, although there may be regulation of MCM activity at the level of mRNA, the significance of genetic regulation is unclear owing to the presence of epigenetic regulation of holoenzyme formation.


Molecular and Cytological Analyses of Large Tracks of Centromeric DNA Reveal the Structure and Evolutionary Dynamics of Maize Centromeres

Genetics ◽

10.1093/genetics/163.2.759 ◽

2003 ◽

Vol 163 (2) ◽

pp. 759-770 ◽

Cited By ~ 5

Author(s):

Kiyotaka Nagaki ◽

Junqi Song ◽

Robert M Stupar ◽

Alexander S Parokonny ◽

Qiaoping Yuan ◽

...

Keyword(s):

Evolutionary Dynamics ◽

Artificial Chromosome ◽

Grass Species ◽

Sequence Motifs ◽

Long Terminal Repeats ◽

Satellite Repeat ◽

Sequence Comparisons ◽

Centromeric Dna ◽

Conserved Sequence ◽

Satellite Sequences

Abstract We sequenced two maize bacterial artificial chromosome (BAC) clones anchored by the centromere-specific satellite repeat CentC. The two BACs, consisting of ∼200 kb of cytologically defined centromeric DNA, are composed exclusively of satellite sequences and retrotransposons that can be classified as centromere specific or noncentromere specific on the basis of their distribution in the maize genome. Sequence analysis suggests that the original maize sequences were composed of CentC arrays that were expanded by retrotransposon invasions. Seven centromere-specific retrotransposons of maize (CRM) were found in BAC 16H10. The CRM elements inserted randomly into either CentC monomers or other retrotransposons. Sequence comparisons of the long terminal repeats (LTRs) of individual CRM elements indicated that these elements transposed within the last 1.22 million years. We observed that all of the previously reported centromere-specific retrotransposons in rice and barley, which belong to the same family as the CRM elements, also recently transposed with the oldest element having transposed ∼3.8 million years ago. Highly conserved sequence motifs were found in the LTRs of the centromere-specific retrotransposons in the grass species, suggesting that the LTRs may be important for the centromere specificity of this retrotransposon family.


Conserved sequence motifs among bacterial, eukaryotic, and archaeal phosphatases that define a new phosphohydrolase superfamily

Protein Science ◽

10.1002/pro.5560070722 ◽

1998 ◽

Vol 7 (7) ◽

pp. 1647-1652 ◽

Cited By ~ 96

Author(s):

Maria Cristina Thaller ◽

Serena Schippa ◽

Gian Maria Rossolini

Keyword(s):

Sequence Motifs ◽

Conserved Sequence ◽

Conserved Sequence Motifs
