Pattern recognition in the case of strong background noise

Mapping Intimacies ◽

10.51415/10321/2162 ◽

2001 ◽

Author(s):


Xingmei Wang

Keyword(s):

Dna Sequences ◽

Reference Point ◽

Tata Box ◽

Finite Alphabet ◽

General Nature ◽

Precise Description ◽

Transcription Start ◽

Promoter Regions ◽

Target Class ◽

Tata Motif

This study develops a method for recognizing a class of patterns in signals contaminated by strong noise. The signals considered are described by a finite alphabet. The target class of patterns is assumed to have specific statistical properties that can be conveniently captured by a position weight matrix (PWM) description. It is also assumed that the signals contain numerous patterns similar to those of the target class but belonging to different classes; these other patterns represent the noise in the signals. The method for improved recognition of the target class is based on clustering the target motifs with regard to their distance from a reference point (event) in the signal. This positional clustering enables a more precise description of the target class by means of PWMs. However, it requires as many PWMs as there are clusters of the target class. The method is of a general nature and applicable to the situations described. It is, however, applied here to the recognition of specific short motifs in DNA sequences. The short motif considered is the TATA box, one of the most important docking sites for proteins in eukaryotic polymerase II promoter regions. The reference point in the signals obtained from DNA sequences is the transcription start site (TSS). Positional clustering of the TATA-box motif resulted in 20 different PWMs, instead of the single PWM that describes the whole TATA motif class. These PWMs were more discriminative, and recognition accuracy increased by about a factor of two compared with recognition of the TATA motif based on the original PWM.
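The positional clustering idea in this abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the study: the fixed-width distance bins, the pseudocount, the uniform background, the function names, and the toy motifs. It builds one log-odds PWM per distance bin and scores a candidate window with the PWM of the bin it falls in.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def build_pwm(motifs, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM (one dict per position) from equal-length motifs."""
    total = len(motifs) + pseudocount * len(ALPHABET)
    pwm = []
    for i in range(len(motifs[0])):
        counts = Counter(m[i] for m in motifs)
        pwm.append({
            b: math.log2(((counts[b] + pseudocount) / total) / background)
            for b in ALPHABET
        })
    return pwm


def score(pwm, window):
    """Sum the per-position log-odds of a window under one PWM."""
    return sum(col[b] for col, b in zip(pwm, window))


def build_positional_pwms(sites, bin_width):
    """Group (motif, distance) pairs into distance bins; one PWM per bin.

    Fixed-width binning is a simplifying assumption, not the study's
    clustering procedure.
    """
    clusters = {}
    for motif, dist in sites:
        clusters.setdefault(dist // bin_width, []).append(motif)
    return {k: build_pwm(v) for k, v in clusters.items()}


def score_at(pwms, window, dist, bin_width):
    """Score a window with the PWM of its distance bin (None if no such bin)."""
    pwm = pwms.get(dist // bin_width)
    return None if pwm is None else score(pwm, window)


# Hypothetical toy sites: (motif, distance from the reference point).
sites = [("TATAAA", -30), ("TATATA", -28), ("TTTAAA", -22)]
pwms = build_positional_pwms(sites, bin_width=5)
print(sorted(pwms))
print(score_at(pwms, "TATAAA", -30, 5))
```

With real data, each bin would need enough aligned motifs to give stable frequency estimates, which is one practical cost of using many PWMs instead of one.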


DNA mismatch repair deficient tumors exhibit length variability of repetitive DNA sequences in diverse promoter regions

European Journal of Cancer ◽

10.1016/s0959-8049(97)84419-x ◽

1997 ◽

Vol 33 ◽

pp. S11

Author(s):

C. Sutter ◽

J. Gebert ◽

P. Bischoff ◽

D. Kube ◽

C. Herfarth ◽

...

Keyword(s):

Mismatch Repair ◽

Repetitive Dna ◽

Dna Sequences ◽

Dna Mismatch Repair ◽

Repetitive Dna Sequences ◽

Promoter Regions ◽

Dna Mismatch


A cis Element between the TATA Box and the Transcription Start Site of the Major Immediate-Early Promoter of Human Cytomegalovirus Determines Efficiency of Viral Replication

Journal of Virology ◽

10.1128/jvi.01593-07 ◽

2007 ◽

Vol 82 (2) ◽

pp. 849-858 ◽

Cited By ~ 23

Author(s):

Hiroki Isomura ◽

Mark F. Stinski ◽

Ayumi Kudoh ◽

Sanae Nakayama ◽

Takayuki Murata ◽

...

Keyword(s):

Viral Replication ◽

Transcription Start Site ◽

Gene Transcription ◽

Human Cytomegalovirus ◽

Recombinant Virus ◽

Tata Box ◽

Immediate Early ◽

Wild Type Virus ◽

Start Site ◽

Transcription Start

ABSTRACT The promoter of the major immediate-early (MIE) genes of human cytomegalovirus (HCMV), also referred to as the CMV promoter, possesses a cis-acting element positioned downstream of the TATA box between positions −14 and −1 relative to the transcription start site (+1). We determined the role of the cis-acting element in viral replication by comparing recombinant viruses with the cis-acting element replaced with other sequences. Recombinant virus with the simian CMV counterpart replicated efficiently in human foreskin fibroblasts, as well as wild-type virus. In contrast, replacement with the murine CMV counterpart caused inefficient MIE gene transcription, RNA splicing, MIE and early viral gene expression, and viral DNA replication. To determine which nucleotides in the cis-acting element are required for efficient MIE gene transcription and splicing, we constructed mutations within the cis-acting element in the context of a recombinant virus. While mutations in the cis-acting element have only a minor effect on in vitro transcription, the effects on viral replication are major. The nucleotides at −10 and −9 in the cis-acting element relative to the transcription start site (+1) affect efficient MIE gene transcription and splicing at early times after infection. The cis-acting element also acts as a cis-repression sequence when the viral IE86 protein accumulates in the infected cell. We demonstrate that the cis-acting element has an essential role in viral replication.


TATA box mutations in the Schizosaccharomyces pombe nmt1 promoter affect transcription efficiency but not the transcription start point or thiamine repressibility

Gene ◽

10.1016/0378-1119(93)90552-e ◽

1993 ◽

Vol 123 (1) ◽

pp. 131-136 ◽

Cited By ~ 409

Author(s):

Gabriele Basi ◽

Elisabeth Schmid ◽

Kinsey Maundrell

Keyword(s):

Schizosaccharomyces Pombe ◽

Tata Box ◽

Transcription Start Point ◽

Transcription Start ◽

Nmt1 Promoter ◽

Transcription Efficiency


Characterization of the KstR-dependent promoter of the gene for the first step of the cholesterol degradative pathway in Mycobacterium smegmatis

Microbiology ◽

10.1099/mic.0.049213-0 ◽

2011 ◽

Vol 157 (9) ◽

pp. 2670-2680 ◽

Cited By ~ 19

Author(s):

Iria Uhía ◽

Beatriz Galán ◽

Francisco Javier Medrano ◽

José Luis García

Keyword(s):

Mycobacterium Smegmatis ◽

Transcription Start ◽

Promoter Regions ◽

Degradative Pathway ◽

Terminal Domain ◽

Heterologous System ◽

Highly Hydrophobic ◽

Extension Analysis

The KstR-dependent promoter of the MSMEG_5228 gene of Mycobacterium smegmatis, which encodes the 3-β-hydroxysteroid dehydrogenase (3-β HSDMS) responsible for the first step in the cholesterol degradative pathway, has been characterized. Primer extension analysis of the P5228 promoter showed that the transcription starts at the ATG codon, thus generating a leaderless mRNA lacking a 5′ untranslated region (5′UTR). Footprint analyses demonstrated experimentally that KstR specifically binds to an operator region of 31 nt containing the quasi-palindromic sequence AACTGGAACGTGTTTCAGTT, located between the −5 and −35 positions with respect to the transcription start site. This region overlaps with the −10 and −35 boxes of the P5228 promoter, suggesting that KstR represses MSMEG_5228 transcription by preventing the binding of RNA polymerase. Using a P5228–β-galactosidase fusion we have demonstrated that KstR is able to work as a repressor in a heterologous system like Escherichia coli. A 3D model of the KstR protein revealed folding typical of TetR-type regulators, with two domains, i.e. a DNA-binding N-terminal domain and a regulator-binding C-terminal domain composed of six helices with a long tunnel-shaped hydrophobic pocket that might interact with a putative highly hydrophobic inducer. The finding that similar P5228 promoter regions have been found in all mycobacterial strains examined, with the sole exception of Mycobacterium tuberculosis, provides new clues about the role of cholesterol in the pathogenicity of this micro-organism.
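As a small illustration of what "quasi-palindromic" means for the operator sequence quoted above, the sketch below compares the sequence with its own reverse complement and counts agreeing positions. The function names are hypothetical, and this is only one simple way to quantify the symmetry, not the analysis used in the paper.

```python
# Translation table mapping each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def palindrome_matches(seq):
    """Count positions where a sequence equals its reverse complement."""
    return sum(a == b for a, b in zip(seq, reverse_complement(seq)))


operator = "AACTGGAACGTGTTTCAGTT"  # sequence quoted in the abstract
matches = palindrome_matches(operator)
print(f"{matches} of {len(operator)} positions match the reverse complement")
```

A perfect palindrome would match at every position; a quasi-palindrome matches at most of them.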


Binding and transcriptional activation of non-flagellar genes by the Escherichia coli flagellar master regulator FlhD2C2

Microbiology ◽

10.1099/mic.0.27879-0 ◽

2005 ◽

Vol 151 (6) ◽

pp. 1779-1788 ◽

Cited By ~ 46

Author(s):

Graham P. Stafford ◽

Tomoo Ogi ◽

Colin Hughes

Keyword(s):

Escherichia Coli ◽

Dna Sequences ◽

Transcriptional Activation ◽

Binding Sites ◽

Class Ii ◽

Promoter Regions ◽

E Coli ◽

Flagellar Genes

The gene hierarchy directing biogenesis of peritrichous flagella on the surface of Escherichia coli and other enterobacteria is controlled by the heterotetrameric master transcriptional regulator FlhD2C2. To assess the extent to which FlhD2C2 directly activates promoters of a wider regulon, a computational screen of the E. coli genome was used to search for gene-proximal DNA sequences similar to the 42–44 bp inverted repeat FlhD2C2 binding consensus. This identified the binding sequences upstream of all eight flagella class II operons, and also putative novel FlhD2C2 binding sites in the promoter regions of 39 non-flagellar genes. Nine representative non-flagellar promoter regions were all bound in vitro by active reconstituted FlhD2C2 over the KD range 38–356 nM, and of the nine corresponding chromosomal promoter–lacZ fusions, those of the four genes b1904, b2446, wzz fepE and gltI showed up to 50-fold dependence on FlhD2C2 in vivo. In comparison, four representative flagella class II promoters bound FlhD2C2 in the KD range 12–43 nM and were upregulated in vivo 30- to 990-fold. The FlhD2C2-binding sites of the four regulated non-flagellar genes overlap by 1 or 2 bp the predicted −35 motif of the FlhD2C2-activated σ70 promoters, as is the case with FlhD2C2-dependent class II flagellar promoters. The data indicate a wider FlhD2C2 regulon, in which non-flagellar genes are bound and activated directly, albeit less strongly, by the same mechanism as that regulating the flagella gene hierarchy.


Sequence-based prediction of transcription upregulation by auxin in plants

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720015400090 ◽

2015 ◽

Vol 13 (01) ◽

pp. 1540009 ◽

Cited By ~ 9

Author(s):

Petr M. Ponomarenko ◽

Mikhail P. Ponomarenko

Keyword(s):

Dna Sequences ◽

Binding Sites ◽

Growth And Development ◽

Gene Sequence ◽

Tata Binding Protein ◽

Relative Increase ◽

Tata Box ◽

Auxin Response ◽

High Importance ◽

Response Elements

Auxin is one of the main regulators of growth and development in plants. Prediction of auxin response based on gene sequence is of high importance. We found the TGTCNC consensus of 111 known natural and artificially mutated auxin response elements (AuxREs) with measured auxin-caused relative increase in genes' transcription levels, called either "a response to auxin" or "an auxin response." This consensus was identical to the most cited AuxRE motif. We also found several DNA sequence features that correlate with the auxin-caused increase in genes' transcription levels, namely: the number of matches with TGTCNC, a homology score based on nucleotide frequencies at the consensus positions, and the abundances of five trinucleotides and five B-helical DNA features around these known AuxREs. We combined these correlations into a four-step empirical model of auxin response based on a gene's sequence, namely: (1) search for AuxREs with no auxin; (2) stop at the found AuxRE; (3) repression of the basal transcription of the gene having this AuxRE; and (4) manifold increase of this gene's transcription in response to auxin. Independently measured increases in transcription levels in response to auxin for 70 Arabidopsis genes were found to correlate significantly with predictions of this equation (r = 0.44, p < 0.001), as well as with the affinity of TATA-binding protein (TBP) for the promoters of these genes and with nucleosome packing of these promoters (both p < 0.025). Finally, we improved our equation for predicting a gene's transcription increase in response to auxin by taking into account TBP binding and nucleosome packing (r = 0.53, p < 10⁻⁶). Fisher's F-test validated the significant impact of both TBP/promoter affinity and promoter nucleosome packing on auxin response in addition to that of the AuxRE (F = 4.07, p < 0.025). This means that both the TATA box and the nucleosome should be taken into account when recognizing transcription factor binding sites in DNA sequences: for TATA-less, nucleosome-rich promoters, recognition scores must be higher than for TATA-containing, nucleosome-free promoters at the same transcription activity.
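One of the sequence features named in this abstract, the number of matches with the TGTCNC consensus, is simple to compute. The sketch below is illustrative only: the function name, the optional reverse-strand scan, and the example sequences are assumptions, and the abstract does not say whether the authors counted matches on one strand or both.

```python
import re

# TGTCNC consensus, with N standing for any of the four bases.
AUXRE = re.compile(r"TGTC[ACGT]C")
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def count_auxre(seq, both_strands=False):
    """Count TGTCNC matches in seq, optionally also on the reverse strand."""
    seq = seq.upper()
    n = len(AUXRE.findall(seq))
    if both_strands:
        # Scan the reverse complement to pick up sites on the other strand.
        n += len(AUXRE.findall(seq.translate(_COMPLEMENT)[::-1]))
    return n


# Hypothetical promoter fragment with two forward-strand matches.
print(count_auxre("AATGTCTCGGTGTCGCAA"))
```

The homology score and the other features mentioned in the abstract would need the authors' frequency tables and are not reproduced here.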


The GC box as a silencer

Bioscience Reports ◽

10.1007/bf01122129 ◽

1987 ◽

Vol 7 (12) ◽

pp. 955-963 ◽

Cited By ~ 11

Author(s):

Jacek M. Jankowski ◽

Gordon H. Dixon

Keyword(s):

Rainbow Trout ◽

Binding Site ◽

Tata Box ◽

Sperm Cells ◽

Promoter Regions ◽

Protein Factor ◽

Eukaryotic Genes ◽

Initiation Of Transcription ◽

Multiple Copies ◽

Gc Box

A DNA control sequence, TGGGGCGGAATGGC, or the “GC” box, has been described in the promoter regions upstream of a number of eukaryotic genes transcribed by polymerase II (for review, see Dynan, W. S. and Tjian, R., Nature 316:774, 1985). The “GC” box can occur in single or multiple copies and is the binding site for a protein factor, Sp1, which activates initiation of transcription. We have observed in the rainbow trout protamine gene 3′ to the TATA box, three “GC” boxes spaced at 80 bp intervals. The first is 5′ to the cap site and possesses the ability to “silence” transcription from the protamine promoter in constructs linking this promoter to the bacterial chloramphenicol acetyl transferase (CAT) coding sequence following transfection to COS-1 cells. A model is proposed to account for the silencing of the protamine gene in all tissues except developing sperm cells.


The promoter regions of the T-cell receptor V9 γ (TRGV9) and V2 δ (TRDV2) genes display short direct repeats but no TATA box

FEBS Letters ◽

10.1016/0014-5793(89)81745-4 ◽

1989 ◽

Vol 256 (1-2) ◽

pp. 185-191 ◽

Cited By ~ 10

Author(s):

Piona Dariavach ◽

Marie-Paule Lefranc

Keyword(s):

T Cell ◽

T Cell Receptor ◽

Cell Receptor ◽

Tata Box ◽

Direct Repeats ◽

Promoter Regions


Genetic Features of MCR-1-Producing Colistin-Resistant Escherichia coli Isolates in South Africa

Antimicrobial Agents and Chemotherapy ◽

10.1128/aac.00444-16 ◽

2016 ◽

Vol 60 (7) ◽

pp. 4394-4397 ◽

Cited By ~ 89

Author(s):

Laurent Poirel ◽

Nicolas Kieffer ◽

Adrian Brink ◽

Jennifer Coetze ◽

Aurélie Jayol ◽

...

Keyword(s):

Escherichia Coli ◽

South Africa ◽

Transcription Start Site ◽

Dna Sequences ◽

Start Site ◽

Transcription Start ◽

Content Type ◽

Promoter Sequences ◽

Genetic Features ◽

Community Patients

ABSTRACT A series of colistin-resistant Escherichia coli clinical isolates was recovered from hospitalized and community patients in South Africa. Seven clonally unrelated isolates harbored the mcr-1 gene located on different plasmid backbones. Two distinct plasmids were fully sequenced, and identical 2,600-bp-long DNA sequences defining an mcr-1 cassette were identified. Promoter sequences responsible for the expression of mcr-1, deduced from the precise identification of the +1 transcription start site for mcr-1, were characterized.


An Efficient Coding Technique for Stochastic Processes

Entropy ◽

10.3390/e24010065 ◽

2021 ◽

Vol 24 (1) ◽

pp. 65

Author(s):

Jesús E. García ◽

Verónica A. González-López ◽

Gustavo H. Tasca ◽

Karina Y. Yaginuma

Keyword(s):

Dna Sequences ◽

Coding Theory ◽

Transition Probabilities ◽

Real Data ◽

Finite Alphabet ◽

Real Problem ◽

Huffman Code ◽

Efficient Coding ◽

Codeword Length ◽

Hand Modeling

In the framework of coding theory, under the assumption of a Markov process (Xt) on a finite alphabet A, the compressed representation of the data is composed of a description of the model used to code the data and the encoded data. Given the model, Huffman's algorithm is optimal for the number of bits needed to encode the data. On the other hand, modeling (Xt) through a Partition Markov Model (PMM) reduces the number of transition probabilities needed to define the model. This paper shows how the use of a Huffman code with a PMM reduces the number of bits needed in this process. We prove that the estimation of a PMM allows for estimating the entropy of (Xt), providing an estimator of the minimum expected codeword length per symbol. We show the efficiency of the new methodology in a simulation study and on a real problem, the compression of DNA sequences of SARS-CoV-2, obtaining a reduction of at least 10.4% on the real data.
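The relationship between Huffman codeword lengths and entropy that this abstract relies on can be seen in a small sketch. It is deliberately simplified: symbols are treated as independent draws from fixed frequencies, so it does not implement a Markov model or a PMM, and the function names and toy string are assumptions for illustration.

```python
import heapq
import math
from collections import Counter


def huffman_code_lengths(freqs):
    """Return {symbol: codeword length in bits} for a Huffman code."""
    if len(freqs) == 1:
        return {next(iter(freqs)): 1}
    # Heap entries: (weight, tie-breaker, {symbol: depth so far}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


def entropy_bits(freqs):
    """Shannon entropy in bits per symbol of the empirical distribution."""
    total = sum(freqs.values())
    return -sum(w / total * math.log2(w / total) for w in freqs.values() if w)


def mean_code_length(freqs, lengths):
    """Expected codeword length in bits per symbol."""
    total = sum(freqs.values())
    return sum(freqs[s] * lengths[s] for s in freqs) / total


freqs = Counter("AAAACCGT")  # hypothetical toy sequence
lengths = huffman_code_lengths(freqs)
print(lengths, mean_code_length(freqs, lengths), entropy_bits(freqs))
```

The mean codeword length of a Huffman code is never below the entropy; in this toy case the frequencies are powers of one half, so the two coincide.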
