Pattern recognition in the case of strong background noise

2001 ◽  
Author(s):  
◽  
Xingmei Wang

This study presents the development of a method for recognizing a class of patterns in signals contaminated by strong noise. The class of signals considered is described by a finite alphabet. The target class of patterns is assumed to have specific statistical properties that can be conveniently captured by a position weight matrix (PWM) description. It is also assumed that the signals contain numerous patterns similar to the patterns of the target class, but which belong to different classes. These other patterns represent the noise in the signals. The method for improved recognition of the target class of patterns is based on clustering the target motifs with regard to their distance from a reference point (event) in the signal. This positional clustering enables a more precise description of the target class of patterns by means of PWMs. However, it requires the use of as many PWMs as there are clusters of the target class. The method developed is of a general nature and applicable to the situations described. It is, however, applied here to the recognition of specific short motifs in DNA sequences. The short motif considered is the TATA-box, one of the most important docking sites for proteins in eukaryotic polymerase II promoter regions. The reference point in the signals obtained from DNA sequences is the transcription start site (TSS). The positional clustering of the TATA-box motif resulted in 20 different PWMs, instead of only one that describes the whole TATA motif class. This, however, resulted in more discriminative PWMs, and the recognition accuracy increased by about a factor of two compared with recognition of the TATA motif based on the original PWM.
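The idea of one PWM per positional cluster can be illustrated with a minimal Python sketch. It is not the authors' implementation: the log-odds scoring, the uniform background of 0.25, the pseudocount, the fixed-width distance bins (`bin_width`) and all function names are assumptions made for illustration only.

```python
import math
from collections import Counter

ALPHABET = "ACGT"

def build_pwm(motifs, pseudocount=1.0, background=0.25):
    """Log-odds PWM from equal-length aligned motifs (assumed scoring scheme)."""
    total = len(motifs) + pseudocount * len(ALPHABET)
    pwm = []
    for i in range(len(motifs[0])):
        counts = Counter(m[i] for m in motifs)
        pwm.append({
            b: math.log2(((counts[b] + pseudocount) / total) / background)
            for b in ALPHABET
        })
    return pwm

def score(pwm, window):
    """Sum of per-position log-odds for one candidate window."""
    return sum(col[b] for col, b in zip(pwm, window))

def build_positional_pwms(examples, bin_width=5):
    """Group (motif, distance_from_reference) pairs into distance bins
    and build one PWM per bin. Fixed-width bins are a hypothetical choice."""
    clusters = {}
    for motif, dist in examples:
        clusters.setdefault(dist // bin_width, []).append(motif)
    return {key: build_pwm(group) for key, group in clusters.items()}

def scan(seq, ref_index, pwms, motif_len, bin_width=5):
    """Score every window with the PWM that belongs to its own distance bin."""
    hits = []
    for start in range(len(seq) - motif_len + 1):
        key = (start - ref_index) // bin_width
        if key in pwms:
            hits.append((start, score(pwms[key], seq[start:start + motif_len])))
    return hits
```

A window is scored only by the matrix trained on motifs found at a similar distance from the reference point, which is what makes each matrix more specific than a single pooled PWM.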

2007 ◽  
Vol 82 (2) ◽  
pp. 849-858 ◽  
Author(s):  
Hiroki Isomura ◽  
Mark F. Stinski ◽  
Ayumi Kudoh ◽  
Sanae Nakayama ◽  
Takayuki Murata ◽  
...  

ABSTRACT The promoter of the major immediate-early (MIE) genes of human cytomegalovirus (HCMV), also referred to as the CMV promoter, possesses a cis-acting element positioned downstream of the TATA box between positions −14 and −1 relative to the transcription start site (+1). We determined the role of the cis-acting element in viral replication by comparing recombinant viruses with the cis-acting element replaced with other sequences. Recombinant virus with the simian CMV counterpart replicated efficiently in human foreskin fibroblasts, as well as wild-type virus. In contrast, replacement with the murine CMV counterpart caused inefficient MIE gene transcription, RNA splicing, MIE and early viral gene expression, and viral DNA replication. To determine which nucleotides in the cis-acting element are required for efficient MIE gene transcription and splicing, we constructed mutations within the cis-acting element in the context of a recombinant virus. While mutations in the cis-acting element have only a minor effect on in vitro transcription, the effects on viral replication are major. The nucleotides at −10 and −9 in the cis-acting element relative to the transcription start site (+1) affect efficient MIE gene transcription and splicing at early times after infection. The cis-acting element also acts as a cis-repression sequence when the viral IE86 protein accumulates in the infected cell. We demonstrate that the cis-acting element has an essential role in viral replication.


Microbiology ◽  
2011 ◽  
Vol 157 (9) ◽  
pp. 2670-2680 ◽  
Author(s):  
Iria Uhía ◽  
Beatriz Galán ◽  
Francisco Javier Medrano ◽  
José Luis García

The KstR-dependent promoter of the MSMEG_5228 gene of Mycobacterium smegmatis, which encodes the 3-β-hydroxysteroid dehydrogenase (3-β HSDMS) responsible for the first step in the cholesterol degradative pathway, has been characterized. Primer extension analysis of the P5228 promoter showed that the transcription starts at the ATG codon, thus generating a leaderless mRNA lacking a 5′ untranslated region (5′UTR). Footprint analyses demonstrated experimentally that KstR specifically binds to an operator region of 31 nt containing the quasi-palindromic sequence AACTGGAACGTGTTTCAGTT, located between the −5 and −35 positions with respect to the transcription start site. This region overlaps with the −10 and −35 boxes of the P5228 promoter, suggesting that KstR represses MSMEG_5228 transcription by preventing the binding of RNA polymerase. Using a P5228 –β-galactosidase fusion we have demonstrated that KstR is able to work as a repressor in a heterologous system like Escherichia coli. A 3D model of the KstR protein revealed folding typical of TetR-type regulators, with two domains, i.e. a DNA-binding N-terminal domain and a regulator-binding C-terminal domain composed of six helices with a long tunnel-shaped hydrophobic pocket that might interact with a putative highly hydrophobic inducer. The finding that similar P5228 promoter regions have been found in all mycobacterial strains examined, with the sole exception of Mycobacterium tuberculosis, provides new clues about the role of cholesterol in the pathogenicity of this micro-organism.
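The term "quasi-palindromic" can be made concrete by comparing the operator sequence quoted above with its own reverse complement. The following Python sketch is only an illustration; the function names are hypothetical and the position-by-position match count is an assumed measure, not one taken from the study.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def palindrome_matches(seq):
    """Count positions at which seq equals its reverse complement."""
    rc = reverse_complement(seq)
    return sum(a == b for a, b in zip(seq, rc))

operator = "AACTGGAACGTGTTTCAGTT"  # sequence quoted in the abstract
matches = palindrome_matches(operator)
print(f"{matches}/{len(operator)} positions match the reverse complement")
```

A perfect palindrome would match at every position; a quasi-palindrome matches at most of them.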


Microbiology ◽  
2005 ◽  
Vol 151 (6) ◽  
pp. 1779-1788 ◽  
Author(s):  
Graham P. Stafford ◽  
Tomoo Ogi ◽  
Colin Hughes

The gene hierarchy directing biogenesis of peritrichous flagella on the surface of Escherichia coli and other enterobacteria is controlled by the heterotetrameric master transcriptional regulator FlhD2C2. To assess the extent to which FlhD2C2 directly activates promoters of a wider regulon, a computational screen of the E. coli genome was used to search for gene-proximal DNA sequences similar to the 42–44 bp inverted repeat FlhD2C2 binding consensus. This identified the binding sequences upstream of all eight flagella class II operons, and also putative novel FlhD2C2 binding sites in the promoter regions of 39 non-flagellar genes. Nine representative non-flagellar promoter regions were all bound in vitro by active reconstituted FlhD2C2 over the KD range 38–356 nM, and of the nine corresponding chromosomal promoter–lacZ fusions, those of the four genes b1904, b2446, wzz fepE and gltI showed up to 50-fold dependence on FlhD2C2 in vivo. In comparison, four representative flagella class II promoters bound FlhD2C2 in the KD range 12–43 nM and were upregulated in vivo 30- to 990-fold. The FlhD2C2-binding sites of the four regulated non-flagellar genes overlap by 1 or 2 bp the predicted −35 motif of the FlhD2C2-activated σ70 promoters, as is the case with FlhD2C2-dependent class II flagellar promoters. The data indicate a wider FlhD2C2 regulon, in which non-flagellar genes are bound and activated directly, albeit less strongly, by the same mechanism as that regulating the flagella gene hierarchy.


2015 ◽  
Vol 13 (01) ◽  
pp. 1540009 ◽  
Author(s):  
Petr M. Ponomarenko ◽  
Mikhail P. Ponomarenko

Auxin is one of the main regulators of growth and development in plants. Prediction of auxin response based on gene sequence is of high importance. We found the TGTCNC consensus of 111 known natural and artificially mutated auxin response elements (AuxREs) with a measured auxin-caused relative increase in the genes' transcription levels, referred to as either "a response to auxin" or "an auxin response." This consensus was identical to the most cited AuxRE motif. We also found several DNA sequence features that correlate with the auxin-caused increase in genes' transcription levels, namely: the number of matches with TGTCNC, a homology score based on nucleotide frequencies at the consensus positions, and the abundances of five trinucleotides and five B-helical DNA features around these known AuxREs. We combined these correlations in a four-step empirical model of auxin response based on a gene's sequence: (1) search for AuxREs with no auxin; (2) stop at the found AuxRE; (3) repression of the basal transcription of the gene having this AuxRE; and (4) manifold increase of this gene's transcription in response to auxin. Independently measured increases in transcription levels in response to auxin for 70 Arabidopsis genes were found to correlate significantly with predictions of this equation (r = 0.44, p < 0.001) as well as with the affinity of TATA-binding protein (TBP) for the promoters of these genes and with nucleosome packing of these promoters (both p < 0.025). Finally, we improved our equation for predicting a gene's transcription increase in response to auxin by taking into account TBP binding and nucleosome packing (r = 0.53, p < 10⁻⁶). Fisher's F-test validated the significant impact of both TBP/promoter affinity and promoter nucleosome packing on auxin response in addition to that of AuxRE (F = 4.07, p < 0.025).
It means that both TATA-box and nucleosome should be taken into account to recognize transcription factor binding sites upon DNA sequences: in the case of the TATA-less nucleosome-rich promoters, recognition scores must be higher than in the case of the TATA-containing nucleosome-free promoters at the same transcription activity.
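One of the sequence features named above, the number of matches with the TGTCNC consensus, can be computed with a short Python sketch. This is an illustration under stated assumptions: counting overlapping matches on both strands and treating N as any of A, C, G or T are choices made here, and the function names are hypothetical rather than taken from the paper.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def count_consensus(seq, consensus="TGTCNC"):
    """Count overlapping consensus matches on both strands (only N is expanded)."""
    # A lookahead lets overlapping occurrences be counted separately.
    pattern = re.compile("(?=" + consensus.replace("N", "[ACGT]") + ")")
    forward = len(pattern.findall(seq))
    reverse = len(pattern.findall(reverse_complement(seq)))
    return forward + reverse
```

For example, `count_consensus("AATGTCTCGG")` finds one site on the forward strand, while `count_consensus("GAGACA")` finds one on the reverse strand.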


1987 ◽  
Vol 7 (12) ◽  
pp. 955-963 ◽  
Author(s):  
Jacek M. Jankowski ◽  
Gordon H. Dixon

A DNA control sequence TGGGGCGGAATGGC, or the “GC” box, has been described in the promoter regions upstream of a number of eukaryotic genes transcribed by polymerase II (for review, see Dynan, W. S. and Tjian, R., Nature 316:774, 1985). The “GC” box can occur in single or multiple copies and is the binding site for a protein factor, Sp1, which activates initiation of transcription. We have observed in the rainbow trout protamine gene, 3′ to the TATA box, three “GC” boxes spaced at 80 bp intervals. The first is 5′ to the cap site and possesses the ability to “silence” transcription from the protamine promoter in constructs linking this promoter to the bacterial chloramphenicol acetyl transferase (CAT) coding sequence following transfection into COS-1 cells. A model is proposed to account for the silencing of the protamine gene in all tissues except developing sperm cells.


2016 ◽  
Vol 60 (7) ◽  
pp. 4394-4397 ◽  
Author(s):  
Laurent Poirel ◽  
Nicolas Kieffer ◽  
Adrian Brink ◽  
Jennifer Coetze ◽  
Aurélie Jayol ◽  
...  

ABSTRACT A series of colistin-resistant Escherichia coli clinical isolates was recovered from hospitalized and community patients in South Africa. Seven clonally unrelated isolates harbored the mcr-1 gene located on different plasmid backbones. Two distinct plasmids were fully sequenced, and identical 2,600-bp-long DNA sequences defining an mcr-1 cassette were identified. Promoter sequences responsible for the expression of mcr-1, deduced from the precise identification of the +1 transcription start site for mcr-1, were characterized.


Entropy ◽  
2021 ◽  
Vol 24 (1) ◽  
pp. 65
Author(s):  
Jesús E. García ◽  
Verónica A. González-López ◽  
Gustavo H. Tasca ◽  
Karina Y. Yaginuma

In the framework of coding theory, under the assumption of a Markov process (Xt) on a finite alphabet A, the compressed representation of the data is composed of a description of the model used to code the data and the encoded data. Given the model, Huffman's algorithm is optimal for the number of bits needed to encode the data. On the other hand, modeling (Xt) through a Partition Markov Model (PMM) reduces the number of transition probabilities needed to define the model. This paper shows how the use of a Huffman code with a PMM reduces the number of bits needed in this process. We prove that the estimation of a PMM allows for estimating the entropy of (Xt), providing an estimator of the minimum expected codeword length per symbol. We show the efficiency of the new methodology in a simulation study and on a real problem of compressing DNA sequences of SARS-CoV-2, obtaining a reduction of at least 10.4% on the real data.
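The relation between Huffman code length and entropy can be illustrated with a small Python sketch. It deliberately uses the simplest setting, independent symbols with fixed frequencies, and does not implement a Partition Markov Model; the function names and the toy sequence are assumptions for illustration only.

```python
import heapq
import math
from collections import Counter

def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for a Huffman code."""
    if len(freqs) == 1:
        return {next(iter(freqs)): 1}
    # Heap items: (weight, unique tie-breaker, {symbol: depth in subtree}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every contained symbol one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]

def entropy_bits(freqs):
    """Shannon entropy in bits per symbol of the empirical distribution."""
    total = sum(freqs.values())
    return -sum((w / total) * math.log2(w / total) for w in freqs.values())

def mean_code_length(freqs, lengths):
    """Average number of bits per symbol under the given code lengths."""
    total = sum(freqs.values())
    return sum(freqs[s] * lengths[s] for s in freqs) / total

freqs = Counter("AAAACCGT")  # toy sequence, not real data
lengths = huffman_code_lengths(freqs)
print(lengths, mean_code_length(freqs, lengths), entropy_bits(freqs))
```

The entropy is a lower bound on the mean code length of any prefix code, so a model that lowers the estimated entropy also lowers the number of bits a Huffman code built on that model can be expected to need.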

