cRegions—a tool for detecting conserved cis-elements in multiple sequence alignment of diverged coding sequences

PeerJ ◽  
2019 ◽  
Vol 6 ◽  
pp. e6176 ◽  
Author(s):  
Mikk Puustusmaa ◽  
Aare Abroi

Identifying cis-acting elements and understanding the regulatory mechanisms of a gene is crucial to fully understanding the molecular biology of an organism. In general, it is difficult to identify previously uncharacterised cis-acting elements with an unknown consensus sequence. The task is especially problematic with viruses containing regions of limited or no similarity to other previously characterised sequences. Fortunately, the rapid increase in the number of sequenced genomes allows us to detect some of these elusive cis-elements. In this work, we introduce a web-based tool called cRegions. It was developed to identify regions within a protein-coding sequence where the conservation in the amino acid sequence is caused by the conservation in the nucleotide sequence. cRegions can be the first step in discovering novel cis-acting sequences from diverged protein-coding genes. The results can be used as a basis for future experimental analysis. We applied cRegions to the non-structural and structural polyproteins of alphaviruses as an example and successfully detected all known cis-acting elements. In this publication and in previous work, we have shown that cRegions is able to detect a wide variety of functional elements in DNA and RNA viruses. These functional elements include splice sites, stem-loops, overlapping reading frames, internal promoters, ribosome frameshifting signals and other embedded elements with yet unknown function. The cRegions web tool is available at http://bioinfo.ut.ee/cRegions/.
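The idea that amino acid conservation may be driven by nucleotide conservation can be illustrated with a minimal sketch. This is not the cRegions algorithm, whose details are not given in the abstract. It is a simplified stand-in that assumes a gap-aware, codon-aligned set of coding sequences and the standard genetic code. The function name `codon_conserved_columns` is hypothetical. It flags codon columns where every sequence encodes the same amino acid with the very same codon even though synonymous alternatives exist.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption: standard code applies).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
# Number of synonymous codons available for each amino acid.
SYNONYMS = Counter(CODON_TABLE.values())


def codon_conserved_columns(aligned_cds):
    """Return codon-column indices where a single codon is used by all
    sequences although the encoded amino acid has synonymous codons."""
    n_codons = min(len(seq) for seq in aligned_cds) // 3
    hits = []
    for i in range(n_codons):
        codons = {seq[3 * i:3 * i + 3].upper() for seq in aligned_cds}
        # Skip columns containing gaps or ambiguous nucleotides.
        if any(codon not in CODON_TABLE for codon in codons):
            continue
        amino_acids = {CODON_TABLE[codon] for codon in codons}
        if len(amino_acids) != 1:
            continue  # amino acid itself is not conserved
        amino_acid = next(iter(amino_acids))
        if SYNONYMS[amino_acid] > 1 and len(codons) == 1:
            hits.append(i)
    return hits


alignment = ["ATGCTGGCA", "ATGCTGGCC", "ATGCTGGCG"]
print(codon_conserved_columns(alignment))
```

With only a handful of closely related sequences most columns would be flagged by chance, so a sketch like this is only informative for diverged sequences, where synonymous sites are expected to vary unless something constrains the nucleotide sequence itself.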


Genetics ◽  
1999 ◽  
Vol 152 (3) ◽  
pp. 943-952
Author(s):  
James F Theis ◽  
Chen Yang ◽  
Christopher B Schaefer ◽  
Carol S Newlon

Abstract ARS elements of Saccharomyces cerevisiae are the cis-acting sequences required for the initiation of chromosomal DNA replication. Comparisons of the DNA sequences of unrelated ARS elements from different regions of the genome have revealed no significant DNA sequence conservation. We have compared the sequences of seven pairs of homologous ARS elements from two Saccharomyces species, S. cerevisiae and S. carlsbergensis. In all but one case, the ARS308-ARS308carl pair, significant blocks of homology were detected. In the cases of ARS305, ARS307, and ARS309, previously identified functional elements were found to be conserved in their S. carlsbergensis homologs. Mutation of the conserved sequences in the S. carlsbergensis ARS elements revealed that the homologous sequences are required for function. These observations suggested that the sequences important for ARS function would be conserved in other ARS elements. Sequence comparisons aided in the identification of the essential matches to the ARS consensus sequence (ACS) of ARS304, ARS306, and ARS310carl, though not of ARS310.
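Finding matches to the ACS can be sketched as a degenerate-motif scan. The abstract does not spell out the consensus. The string `WTTTAYRTTTW` used below is the commonly cited 11-bp form and is an assumption here, as are the function names and the mismatch threshold. The sketch does not reproduce the authors' comparison procedure.

```python
# IUPAC codes needed for the assumed ACS consensus (W = A/T, Y = C/T, R = A/G).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "AT", "Y": "CT", "R": "AG"}
ACS = "WTTTAYRTTTW"  # assumed consensus, not quoted from the abstract


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def acs_matches(seq, max_mismatches=1):
    """Scan both strands and return (strand, start, window, mismatches).

    For the '-' strand, `start` is the offset within the reverse
    complement, not a coordinate on the input strand.
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for i in range(len(s) - len(ACS) + 1):
            window = s[i:i + len(ACS)]
            mismatches = sum(base not in IUPAC[code]
                             for base, code in zip(window, ACS))
            if mismatches <= max_mismatches:
                hits.append((strand, i, window, mismatches))
    return hits


print(acs_matches("GGGATTTATGTTTAGGG", max_mismatches=0))
```

Because AT-rich yeast intergenic DNA contains many near-matches, a scan like this typically returns several candidates, which is consistent with the abstract's point that cross-species conservation helps single out the essential match.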



2019 ◽  
Vol 15 (01) ◽  
pp. 1-8
Author(s):  
Ashish C Patel ◽  
C G Joshi

Current data storage technologies can no longer keep pace with the exponentially growing amounts of data generated through the extensive use of social networking, photos, media and the like. The "digital world", which held 4.4 zettabytes in 2013, is predicted to reach 44 zettabytes by 2020. For the past 30 years, scientists and researchers have been trying to develop a robust way of storing data on a dense and long-lasting medium, and have found DNA to be the most promising storage medium. Unlike existing storage devices, DNA requires no maintenance other than storage in a cool, dark place. DNA is small and dense: just 1 gram of dry DNA can store about 455 exabytes of data. DNA stores information using four bases, viz., A, T, G, and C, whereas CDs, hard disks and other devices store information as 0s and 1s on spiral tracks. In a DNA-based storage system, the digital file is first converted into binary code; encoding and decoding are then the key steps. Once the digital file is encoded, the next step is to synthesize arbitrary single-stranded DNA sequences, which can be stored in a deep freeze until use. When the information needs to be recovered, this can be done by DNA sequencing. Next-generation sequencing (NGS) is capable of producing sequences with very high throughput at a much lower cost, less than 0.1 USD per MB of data, than the first sequencing technologies. Post-sequencing processing includes alignment of all reads using multiple sequence alignment (MSA) algorithms to obtain consensus sequences. The consensus sequence is decoded by reversing the encoding process. Most prior DNA data storage efforts sequenced and decoded the entire amount of stored digital information with no random access, but it has now become possible to extract selected files (e.g., retrieving only a required image from a collection) from a DNA pool using PCR-based random access.
Various scientists have successfully stored up to 110 zettabytes of data in one gram of DNA. In the future, with efficient encoding, error correction, and cheaper DNA synthesis and sequencing, DNA-based storage will become a practical solution for storing exponentially growing digital data.
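The encode, sequence, consensus, decode pipeline described above can be sketched in a few lines. The 2-bits-per-base mapping and the function names are illustrative assumptions, not the scheme used in any particular study. Practical encodings also avoid homopolymer runs and add error-correcting redundancy. The reads are assumed to be already aligned and of equal length, so a column-wise majority vote stands in for a real MSA step.

```python
from collections import Counter

# Illustrative mapping (assumption): two bits per nucleotide.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Binarize the bytes, then map each pair of bits to one base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def consensus(reads):
    """Column-wise majority vote over pre-aligned, equal-length reads."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))


def decode(dna: str) -> bytes:
    """Reverse the encoding: bases back to bits, bits back to bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in dna)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"Hi")
# Three simulated reads; two of them carry a single substitution error.
reads = [strand, "CTGACGGC", "CAGACGGA"]
print(strand, decode(consensus(reads)))
```

The majority vote recovers the original strand here because no column has more than one erroneous read. With fewer reads or clustered errors the consensus can be wrong, which is why the abstract lists error correction among the requirements for practical use.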



BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Geneviève Bart ◽  
Daniel Fischer ◽  
Anatoliy Samoylenko ◽  
Artem Zhyvolozhnyi ◽  
Pavlo Stehantsev ◽  
...  

Abstract Background: Human sweat is a mixture of secretions from three types of glands: eccrine, apocrine, and sebaceous. Eccrine glands open directly on the skin surface and produce large amounts of water-based fluid in response to heat, emotion, and physical activity, whereas the other glands produce oily fluids and waxy sebum. While most body fluids have been shown to contain nucleic acids, both as ribonucleoprotein complexes and associated with extracellular vesicles (EVs), these have not been investigated in sweat. In this study we aimed to explore and characterize the nucleic acids associated with sweat particles. Results: We used next generation sequencing (NGS) to characterize DNA and RNA in pooled and individual samples of EV-enriched sweat collected from volunteers performing rigorous exercise. In all sequenced samples, we identified DNA originating from all human chromosomes, but only the mitochondrial chromosome was highly represented, with 100% coverage. Most of the DNA mapped to unannotated regions of the human genome, with some regions highly represented in all samples. Approximately 5% of the reads mapped to other genomes, including bacteria (83%), archaea (3%), and viruses (13%); the identified bacterial species were consistent with those commonly colonizing the human upper body and arm skin. In small RNA-seq of EV-enriched pooled sweat RNA, 74% of the trimmed reads mapped to the human genome, with 29% corresponding to unannotated regions. Over 70% of the RNA reads mapping to an annotated region were tRNA, while misc. RNA (18.5%), protein-coding RNA (5%) and miRNA (1.85%) were much less represented. RNA-seq of individually processed EV-enriched sweat collections generally yielded a lower percentage of reads mapping to the human genome (7–45%), with 50–60% of those reads mapping to unannotated regions of the genome, 30–55% being tRNAs, and a lower percentage being rRNA, lincRNA, misc. RNA, and protein-coding RNA.
Conclusions: Our data demonstrate that sweat, like all other body fluids, contains a wealth of nucleic acids, including DNA and RNA of human and microbial origin, opening the possibility of investigating sweat as a source of biomarkers for specific health parameters.



1994 ◽  
Vol 14 (11) ◽  
pp. 7652-7659
Author(s):  
J F Theis ◽  
C S Newlon

ARS307 is highly active as a replication origin in its native location on chromosome III of Saccharomyces cerevisiae. Its ability to confer autonomous replication activity on plasmids requires the presence of an 11-bp autonomously replicating sequence (ARS) consensus sequence (ACS), which is also required for chromosomal origin function, as well as approximately 100 bp of sequence flanking the ACS called domain B. To further define the sequences required for ARS function, a linker substitution mutagenesis of domain B was carried out. The mutations defined two sequences, B1 and B2, that contribute to ARS activity. Therefore, like ARS1, domain B of ARS307 is composed of functional subdomains. Constructs carrying mutations in the B1 element were used to replace the chromosomal copy of ARS307. These mutations caused a reduction in chromosomal origin activity, demonstrating that the B1 element is required for efficient chromosomal origin function.



1993 ◽  
Vol 13 (1) ◽  
pp. 668-676
Author(s):  
V Lemarchandel ◽  
J Ghysdael ◽  
V Mignotte ◽  
C Rahuel ◽  
P H Roméo

The human glycoprotein IIB (GPIIB) gene is expressed only in megakaryocytes, and its promoter displays cell type specificity. We show that this specificity involves two cis-acting sequences. The first one, located at -55, contains a GATA binding site. Point mutations that abolish protein binding to this site decrease the activity of the GPIIB promoter but do not affect its tissue specificity. The second one, located at -40, contains an Ets consensus sequence, and we show that Ets-1 or Ets-2 protein can interact with this -40 GPIIB sequence. Point mutations that impair Ets binding decrease the activity of the GPIIB promoter to the same extent as do mutations that abolish GATA binding. A GPIIB 40-bp DNA fragment containing the GATA and Ets binding sites can confer activity to a heterologous promoter in megakaryocytic cells. This activity is independent of the GPIIB DNA fragment orientation, and mutations in each binding site result in decreased activity. Using cotransfection assays, we show that c-Ets-1 and human GATA1 can transactivate the GPIIB promoter in HeLa cells and can act additively. Northern (RNA) blot analysis indicates that the ets-1 mRNA level is increased during megakaryocyte-induced differentiation of erythrocytic/megakaryocytic cell lines. Gel retardation assays show that the same GATA-Ets association is found in the human GPIIB enhancer and the rat platelet factor 4 promoter, the other two characterized regulatory regions of megakaryocyte-specific genes. These results indicate that GATA and Ets cis-acting sequences are an important determinant of megakaryocyte-specific gene expression.



1988 ◽  
Vol 8 (10) ◽  
pp. 4009-4017 ◽  
Author(s):  
L R Coney ◽  
G S Roeder

Integration of a transposable element adjacent to a gene frequently results in an alteration in expression of the nearby gene. The purpose of our experiments was to identify cis-acting sequences within a yeast transposon (Ty) that are important for expression of the adjacent gene. The role of these sequences in Ty transcription was also analyzed in order to examine the relationship between Ty and adjacent gene expression. Three naturally occurring Ty elements located at the HIS4 locus were examined. These Ty elements differed by multiple sequence changes and had different effects on HIS4 expression. To determine which sequences were important to Ty and HIS4 expression, Ty::lacZ and Ty::HIS4::lacZ fusion genes were constructed and analyzed. Results of these experiments indicated that a sequence element is present in the Ty epsilon region that is necessary for HIS4 expression but which has only a modest effect on Ty transcription. Additionally, a mutation in the Ty promoter region decreased Ty transcription and increased HIS4 expression. The opposite effects of this mutation on Ty and adjacent gene expression were probably caused by promoter competition.



2020 ◽  
Vol 16 ◽  
pp. 117693432090373 ◽  
Author(s):  
Katherine E Noah ◽  
Jiasheng Hao ◽  
Luyan Li ◽  
Xiaoyan Sun ◽  
Brian Foley ◽  
...  

Deep phylogeny involving arthropod lineages is difficult to recover because the erosion of phylogenetic signals over time leads to unreliable multiple sequence alignment (MSA) and subsequent phylogenetic reconstruction. One way to alleviate the problem is to assemble a large number of gene sequences to compensate for the weakness in each individual gene. Such an approach has led to many robustly supported but contradictory phylogenies. A close examination shows that the supermatrix approach often suffers from two shortcomings. The first is that MSA is rarely checked for reliability and, as will be illustrated, can be poor. The second is that, to alleviate the problem of homoplasy at the third codon position of protein-coding genes due to convergent evolution of nucleotide frequencies, phylogeneticists may remove or degenerate the third codon position but may do it improperly and introduce new biases. We performed extensive reanalysis of one such “big data” set to highlight these two problems, and demonstrated the power and benefits of correcting or alleviating them. Our results support a new group with Xiphosura and Arachnopulmonata (Tetrapulmonata + Scorpiones) as sister taxa. This favors a new hypothesis in which the ancestor of Xiphosura and the extinct Eurypterida (sea scorpions, of which many later forms lived in brackish or freshwater) returned to the sea after the initial chelicerate invasion of land. Our phylogeny is supported even with the original data but processed with a new “principled” codon degeneration. We also show that removing the 1673 codon sites with both AGN and UCN codons (encoding serine) in our alignment can partially reconcile discrepancies between nucleotide-based and AA-based trees, partly because two sequences, one with AGN and the other with UCN, would be identical at the amino acid level but quite different at the nucleotide level.
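The last step, removing codon sites that contain both AGN and UCN codons, can be sketched as follows. The function names are hypothetical and the code is an illustration, not the authors' pipeline. It follows the abstract's AGN/UCN wording by checking only the first two codon positions. Under the standard genetic code only AGY encodes serine, so treating every AGN codon as serine is an assumption inherited from the source.

```python
def mixed_serine_sites(aligned_cds):
    """Return codon-column indices where some sequence has an AGN codon
    and another has a UCN (TCN in DNA) codon."""
    n_codons = min(len(seq) for seq in aligned_cds) // 3
    hits = []
    for i in range(n_codons):
        # First two nucleotides of each codon, written in RNA alphabet.
        prefixes = {seq[3 * i:3 * i + 2].upper().replace("T", "U")
                    for seq in aligned_cds}
        if "AG" in prefixes and "UC" in prefixes:
            hits.append(i)
    return hits


def drop_sites(aligned_cds, sites):
    """Return the alignment with the given codon columns removed."""
    skip = set(sites)
    return ["".join(seq[3 * i:3 * i + 3]
                    for i in range(len(seq) // 3) if i not in skip)
            for seq in aligned_cds]


alignment = ["ATGAGCGGA", "ATGTCAGGA", "ATGTCTGGC"]
sites = mixed_serine_sites(alignment)
print(sites, drop_sites(alignment, sites))
```

In the toy alignment the second codon column holds AGC, TCA and TCT. These all encode serine, yet AGC differs from the TCN codons at the first two positions, which is the nucleotide-level discrepancy the abstract describes.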



1996 ◽  
Vol 16 (7) ◽  
pp. 3833-3843 ◽  
Author(s):  
A N Hennigan ◽  
A Jacobson

The determinants of mRNA stability include specific cis-acting destabilizing sequences located within mRNA coding and noncoding regions. We have developed an approach for mapping coding-region instability sequences in unstable yeast mRNAs that exploits the link between mRNA translation and turnover and the dependence of nonsense-mediated mRNA decay on the activity of the UPF1 gene product. This approach, which involves the systematic insertion of in-frame translational termination codons into the coding sequence of a gene of interest in a upf1delta strain, differs significantly from conventional methods for mapping cis-acting elements in that it causes minimal perturbations to overall mRNA structure. Using the previously characterized MATalpha1 mRNA as a model, we have accurately localized its 65-nucleotide instability element (IE) within the protein coding region. Termination of translation 5' to this element stabilized the MATalpha1 mRNA two- to threefold relative to wild-type transcripts. Translation through the element was sufficient to restore an unstable decay phenotype, while internal termination resulted in different extents of mRNA stabilization dependent on the precise location of ribosome stalling. Detailed mutagenesis of the element's rare-codon/AU-rich sequence boundary revealed that the destabilizing activity of the MATalpha1 IE is observed when the terminal codon of the element's rare-codon interval is translated. This region of stability transition corresponds precisely to a MATalpha1 IE sequence previously shown to be complementary to 18S rRNA. Deletion of three nucleotides 3' to this sequence shifted the stability boundary one codon 5' to its wild-type location. Conversely, constructs containing an additional three nucleotides at this same location shifted the transition downstream by an equivalent sequence distance. 
Our results suggest a model in which the triggering of MATalpha1 mRNA destabilization results from establishment of an interaction between translating ribosomes and a downstream sequence element. Furthermore, our data provide direct molecular evidence for a relationship between mRNA turnover and mRNA translation.
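The mapping strategy, systematic insertion of in-frame termination codons along a coding sequence, can be sketched as a construct generator. The function name, the choice of `TAA` as the stop codon and the decision to insert rather than substitute a codon are all assumptions made for illustration. The abstract does not specify these details.

```python
def stop_insertion_constructs(cds, stop="TAA", step=1):
    """Map codon index -> sequence with `stop` inserted in frame
    immediately before that codon (the start codon is left intact)."""
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    constructs = {}
    for codon_index in range(1, len(cds) // 3, step):
        cut = 3 * codon_index
        # Inserting exactly three nucleotides keeps the downstream frame.
        constructs[codon_index] = cds[:cut] + stop + cds[cut:]
    return constructs


for index, construct in stop_insertion_constructs("ATGGCTGGA").items():
    print(index, construct)
```

Each construct terminates translation at a different position while leaving the rest of the transcript sequence almost unchanged. This is the property the abstract highlights as causing minimal perturbation to overall mRNA structure.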



2010 ◽  
Vol 5 (1) ◽  
pp. 24 ◽  
Author(s):  
Darío Guerrero ◽  
Rocío Bautista ◽  
David P Villalobos ◽  
Francisco R Cantón ◽  
M Gonzalo Claros

