Inter-STOP symbol distances for the identification of coding regions

2013 ◽  
Vol 10 (3) ◽  
pp. 31-39 ◽  
Author(s):  
Carlos A. C. Bastos ◽  
Vera Afreixo ◽  
Sara P. Garcia ◽  
Armando J. Pinho

Summary: In this study we explore the potential of inter-STOP symbol distances for finding coding regions in DNA sequences. We use the distance between STOP symbols in the DNA sequence and a chi-square statistic to evaluate the nonhomogeneity of the three possible reading frames and the occurrence of one long distance in one of the frames. The results of this exploratory study suggest that inter-STOP symbol distances have a strong ability to discriminate coding regions in prokaryotes and simple eukaryotes.
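
The idea in this summary can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact statistic, so the chi-square below simply compares STOP codon counts across the three reading frames against a uniform expectation, and the function names (`stop_positions`, `inter_stop_distances`, `frame_chi_square`) are hypothetical. Only the forward strand is scanned.

```python
# Standard STOP codons on the forward strand (DNA alphabet).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_positions(seq, frame):
    """Codon indices of STOP codons in one reading frame (frame = 0, 1 or 2)."""
    seq = seq.upper()
    return [
        (i - frame) // 3
        for i in range(frame, len(seq) - 2, 3)
        if seq[i:i + 3] in STOP_CODONS
    ]


def inter_stop_distances(seq, frame):
    """Distances, in codons, between consecutive STOP codons in one frame."""
    positions = stop_positions(seq, frame)
    return [b - a for a, b in zip(positions, positions[1:])]


def frame_chi_square(seq):
    """Chi-square of STOP counts over the three frames against a uniform
    expectation. This is an assumed, simplified statistic for illustration."""
    observed = [len(stop_positions(seq, f)) for f in range(3)]
    expected = sum(observed) / 3
    if expected == 0:
        return 0.0
    return sum((o - expected) ** 2 / expected for o in observed)


if __name__ == "__main__":
    example = "ATGTAAGGGTAGCCCTGA"
    for f in range(3):
        print(f, inter_stop_distances(example, f))
    print(frame_chi_square(example))
```

A frame with one unusually long inter-STOP distance, combined with a large chi-square value, would be a candidate coding frame under this simplified reading of the approach.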

2005 ◽  
Vol 2005 (2) ◽  
pp. 139-146 ◽  
Author(s):  
Jianbo Gao ◽  
Yan Qi ◽  
Yinhe Cao ◽  
Wen-wen Tung

Most codon indices used today are based on highly biased nonrandom usage of codons in coding regions. The background of a coding or noncoding DNA sequence, however, is fairly random, and can be characterized as a random fractal. When a gene-finding algorithm incorporates multiple sources of information about coding regions, it becomes more successful. It is thus highly desirable to develop new and efficient codon indices by simultaneously characterizing the fractal and periodic features of a DNA sequence. In this paper, we describe a novel way of achieving this goal. The efficiency of the new codon index is evaluated by studying all of the 16 yeast chromosomes. In particular, we show that the method automatically and correctly identifies which of the three reading frames is the one that contains a gene.


1983 ◽  
Vol 3 (3) ◽  
pp. 448-456 ◽  
Author(s):  
M A Schuler ◽  
P McOsker ◽  
E B Keller

DNA sequences have been determined for two actin genes which are closely linked in the genome of the sea urchin Strongylocentrotus purpuratus. The two genes have the same 5'-3' orientation; they were apparently formed originally by tandem gene duplication. The amino acids encoded by the two genes closely resemble those of cytoplasmic actins of mammals and slime molds and differ somewhat from those of mammalian muscle actin. Actin gene 1 had been tentatively identified earlier as the gene for an embryonic cytoplasmic actin by the homology of the 3' noncoding region with that of the cDNA of an embryonic actin mRNA from S. purpuratus. The DNA sequence of gene 1 shows presumptive signals for the initiation and termination of transcription which would govern the formation of a mature mRNA of 1.9 kilobases. Both actin genes 1 and 2 have introns in their coding regions at codons 121/122 and 204. These positions for actin introns have been reported so far only in the rat, not in lower organisms. The divergence of the sequences of these coding-region introns in the two actin genes is 66%, suggesting that the genes diverged about 90 million years ago. By contrast to the introns, the coding regions have been highly conserved; the amino acids of the two genes differ by only 1.3%, and the silent sites of the codons differ by only 12%.


2012 ◽  
Vol 2012 ◽  
pp. 1-6 ◽  
Author(s):  
Xingqin Qi ◽  
Edgar Fuller ◽  
Qin Wu ◽  
Cun-Quan Zhang

Sequence comparison is a primary technique for the analysis of DNA sequences. In order to make quantitative comparisons, one devises mathematical descriptors that capture the essence of the base composition and distribution of the sequence. Alignment methods and graphical techniques (where each sequence is represented by a curve in high-dimensional Euclidean space) have long been widely used. In this contribution we introduce a new nongraphical and nonalignment approach based on the frequencies of the dinucleotide XY in DNA sequences. The most important feature of this method is that it identifies not only adjacent XY pairs but also nonadjacent XY pairs, where X and Y are separated by some number of nucleotides. This methodology preserves information in the DNA sequence that is ignored by other methods. We test our method on the coding regions of exon-1 of β-globin for 11 species, and the utility of this new method is demonstrated.
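
As a rough illustration of counting adjacent and nonadjacent XY pairs, the following Python sketch computes the relative frequency of each ordered pair of nucleotides separated by a fixed number of intervening bases. The function name `gapped_dinucleotide_freqs` and the `gap` parameter are assumptions made for this example; the abstract does not specify the paper's exact descriptor or how sequences are then compared.

```python
from collections import Counter


def gapped_dinucleotide_freqs(seq, gap=0):
    """Relative frequencies of ordered pairs XY where X and Y are separated
    by `gap` intervening nucleotides (gap=0 gives adjacent dinucleotides)."""
    seq = seq.upper()
    step = gap + 1
    # Pair each base with the base `step` positions downstream.
    pairs = Counter(seq[i] + seq[i + step] for i in range(len(seq) - step))
    total = sum(pairs.values())
    if total == 0:
        return {}
    return {pair: count / total for pair, count in pairs.items()}


if __name__ == "__main__":
    print(gapped_dinucleotide_freqs("ACGT", gap=0))
    print(gapped_dinucleotide_freqs("ACGT", gap=1))
```

Computing these frequency tables for several gap values yields one numerical descriptor per sequence, which could then be compared between species with any vector distance.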


Author(s):  
Barbara Trask ◽  
Susan Allen ◽  
Anne Bergmann ◽  
Mari Christensen ◽  
Anne Fertitta ◽  
...  

Using fluorescence in situ hybridization (FISH), the positions of DNA sequences can be discretely marked with a fluorescent spot. The efficiency of marking DNA sequences of the size cloned in cosmids is 90-95%, and the fluorescent spots produced after FISH are ≈0.3 μm in diameter. Sites of two sequences can be distinguished using two-color FISH. Different reporter molecules, such as biotin or digoxigenin, are incorporated into DNA sequence probes by nick translation. These reporter molecules are labeled after hybridization with different fluorochromes, e.g., FITC and Texas Red. The development of dual band pass filters (Chromatechnology) allows these fluorochromes to be photographed simultaneously without registration shift.


2019 ◽  
Vol 104 (1) ◽  
pp. 33-48 ◽  
Author(s):  
Alejandro Zuluaga ◽  
Martin Llano ◽  
Ken Cameron

The subfamily Monsteroideae (Araceae) is the third richest clade in the family, with ca. 369 described species and ca. 700 estimated. It comprises mostly hemiepiphytic or epiphytic plants restricted to the tropics, with three intercontinental disjunctions. Using a dataset representing all 12 genera in Monsteroideae (126 taxa), and five plastid and two nuclear markers, we studied the systematics and historical biogeography of the group. We found high support for the monophyly of the three major clades (Spathiphylleae sister to Heteropsis Kunth and Rhaphidophora Hassk. clades), and for six of the genera within Monsteroideae. However, we found low rates of variation in the DNA sequences used and a lack of molecular markers suitable for species-level phylogenies in the group. We also performed ancestral state reconstruction of some morphological characters traditionally used for genera delimitation. Only seed shape and size, number of seeds, number of locules, and presence of endosperm showed utility in the classification of genera in Monsteroideae. We estimated ancestral ranges using a dispersal-extinction-cladogenesis model as implemented in the R package BioGeoBEARS and found evidence for a Gondwanan origin of the clade. One tropical disjunction (Monstera Adans. sister to Amydrium Schott–Epipremnum Schott) was found to be the product of a previous Boreotropical distribution. Two other disjunctions are more recent and likely due to long-distance dispersal: Spathiphyllum Schott (with Holochlamys Engl. nested within) represents a dispersal from South America to the Pacific Islands in Southeast Asia, and Rhaphidophora represents a dispersal from Asia to Africa. Future studies based on stronger phylogenetic reconstructions and complete morphological datasets are needed to explore the details of speciation and migration within and among areas in Asia.


2013 ◽  
Vol 41 (2) ◽  
pp. 548-553 ◽  
Author(s):  
Andrew A. Travers ◽  
Georgi Muskhelishvili

How much information is encoded in the DNA sequence of an organism? We argue that the informational, mechanical and topological properties of DNA are interdependent and act together to specify the primary characteristics of genetic organization and chromatin structures. Superhelicity generated in vivo, in part by the action of DNA translocases, can be transmitted to topologically sensitive regions encoded by less stable DNA sequences.


2021 ◽  
Vol 15 (1) ◽  
Author(s):  
Anastasios A. Tsonis ◽  
Geli Wang ◽  
Lvyi Zhang ◽  
Wenxu Lu ◽  
Aristotle Kayafas ◽  
...  

Background: Mathematical approaches have been used for decades to probe the structure of DNA sequences. This has led to the development of bioinformatics. In this exploratory work, a novel mathematical method is applied to probe the DNA structure of two related viral families: those of coronaviruses and those of influenza viruses. The coronaviruses are SARS-CoV-2, SARS-CoV-1, and MERS. The influenza viruses include H1N1-1918, H1N1-2009, H2N2-1957, and H3N2-1968. Methods: The mathematical method used is slow feature analysis (SFA), a rather new but promising method to delineate complex structure in DNA sequences. Results: The analysis indicates that the DNA sequences exhibit an elaborate and convoluted structure akin to complex networks. We define a measure of complexity and show that each DNA sequence exhibits a certain degree of complexity within itself, while at the same time there exist complex inter-relationships between the sequences within a family and between the two families. From these relationships, we find evidence, especially for the coronavirus family, that increasing complexity in a sequence is associated with a higher transmission rate but lower mortality. Conclusions: The complexity measure defined here may hold promise and could become a useful tool in the prediction of transmission and mortality rates in future new viral strains.


1999 ◽  
Vol 341 (1) ◽  
pp. 89-93 ◽  
Author(s):  
Gianluca TELL ◽  
Lucia PELLIZZARI ◽  
Gennaro ESPOSITO ◽  
Carlo PUCILLO ◽  
Paolo Emidio MACCHIA ◽  
...  

Pax proteins are transcriptional regulators that play important roles during embryogenesis. These proteins recognize specific DNA sequences via a conserved element: the paired domain (Prd domain). The low level of organized secondary structure, in the free state, is a general feature of Prd domains; however, these proteins undergo a dramatic gain in α-helical content upon interaction with DNA (‘induced fit’). Pax8 is expressed in the developing thyroid, kidney and several areas of the central nervous system. In humans, mutations of the Pax8 gene, which are mapped to the coding region of the Prd domain, give rise to congenital hypothyroidism. Here, we have investigated the molecular defects caused by a mutation in which leucine at position 62 is replaced by arginine. Leu62 is conserved among Prd domains, and contributes towards the packing together of helices 1 and 3. The binding affinity of the Leu62Arg mutant for a specific DNA sequence (the C sequence of thyroglobulin promoter) is decreased 60-fold with respect to the wild-type Pax8 Prd domain. However, the affinities with which the wild-type and the mutant proteins bind to a non-specific DNA sequence are very similar. CD spectra demonstrate that, in the absence of DNA, both wild-type Pax8 and the Leu62Arg mutant possess a low α-helical content; however, in the Leu62Arg mutant, the gain in α-helical content upon interaction with DNA is greatly reduced with respect to the wild-type protein. Thus the molecular defect of the Leu62Arg mutant causes a reduced capability for induced fit upon DNA interaction.


1985 ◽  
Vol 5 (4) ◽  
pp. 619-627
Author(s):  
M Montoya-Zavala ◽  
J L Hamlin

We have isolated overlapping recombinant cosmids that represent 150 kilobases of contiguous DNA sequence from the amplified dihydrofolate reductase domain of a methotrexate-resistant Chinese hamster ovary cell line (CHOC 400). This sequence includes the 25-kilobase dihydrofolate reductase gene and an origin of DNA synthesis. Eight cosmids that span this domain have been utilized as radioactive hybridization probes to analyze the similarities among the dihydrofolate reductase amplicons in four independently derived methotrexate-resistant Chinese hamster cell lines. We have observed no significant differences among the four cell lines within the 150-kilobase DNA sequence that we have examined, except for polymorphisms that result from the amplification of one or the other of two possible alleles of the dihydrofolate reductase domain. We also show that the restriction patterns of the amplicons in these four resistant cell lines are virtually identical to that of the corresponding, unamplified sequence in drug-susceptible parental cells. Furthermore, measurements of the relative copy numbers of fragments from widely separated regions of the amplicon suggest that all fragments in this 150-kilobase region may be amplified in unison. Our data show that in methotrexate-resistant Chinese hamster cells, the amplified unit is large relative to the dihydrofolate reductase gene itself. Furthermore, within the 150-kilobase amplified consensus sequence that we have examined, significant rearrangements do not seem to occur during the amplification process.

