The disconnect between DNA and species names: lessons from reptile species in the NCBI taxonomy database

Zootaxa ◽  
2019 ◽  
Vol 4706 (3) ◽  
pp. 401-407 ◽  
Author(s):  
AKHIL GARG ◽  
DETLEF LEIPE ◽  
PETER UETZ

We compared the species names in the Reptile Database, a dedicated taxonomy database, with those in the NCBI taxonomy database, which provides the taxonomic backbone for the GenBank sequence database. About 67% of the known ~11,000 reptile species are represented with at least one DNA sequence and a binary species name in GenBank. However, a common problem arises through the submission of preliminary species names (such as “Pelomedusa sp. A CK-2014”) to GenBank and thus the NCBI taxonomy. These names cannot be assigned to any accepted species names and thus create a disconnect between DNA sequences and species. While these names of unknown taxonomic meaning sometimes get updated, often they remain in GenBank which now contains sequences from ~1,300 such “putative” reptile species tagged by informal names (~15% of its reptile names). We estimate that NCBI/GenBank probably contain tens of thousands of such “disconnected” entries. We encourage sequence submitters to update informal species names after they have been published, otherwise the disconnect will cause increasing confusion and possibly misleading taxonomic conclusions.
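The difference between formal and provisional names can be shown with a small classifier. This is a hypothetical sketch. The regular expressions encode an assumed rule: a binomial is a capitalised genus followed by a lowercase epithet, and a provisional name contains "sp.", "cf." or "aff.". These are not NCBI's own rules, and the helper names are illustrative.

```python
import re

# Assumption: a formal binomial is a capitalised genus plus a lowercase
# epithet (optionally hyphenated). Illustrative rule, not NCBI's.
BINOMIAL = re.compile(r"^[A-Z][a-z]+ [a-z]+(?:-[a-z]+)?$")

# Assumption: "sp.", "cf." and "aff." mark provisional (informal) names,
# as in "Pelomedusa sp. A CK-2014".
INFORMAL = re.compile(r"\b(?:sp|cf|aff)\.")


def classify_name(name: str) -> str:
    """Return 'informal', 'binomial' or 'other' for a taxon name string."""
    name = name.strip()
    if INFORMAL.search(name):
        return "informal"
    if BINOMIAL.match(name):
        return "binomial"
    return "other"


def informal_share(names: list[str]) -> float:
    """Fraction of names classified as informal (0.0 for an empty list)."""
    if not names:
        return 0.0
    labels = [classify_name(n) for n in names]
    return labels.count("informal") / len(labels)


if __name__ == "__main__":
    sample = ["Pelomedusa subrufa", "Pelomedusa sp. A CK-2014", "Anolis carolinensis"]
    for n in sample:
        print(n, "->", classify_name(n))
    print(f"informal share: {informal_share(sample):.2f}")
```

The abstract's figures can be checked roughly for consistency. About 67% of ~11,000 species gives ~7,400 binomials. Adding ~1,300 informal names gives ~8,700 reptile names, and 1,300 of those is about 15%.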

Author(s):  
Barbara Trask ◽  
Susan Allen ◽  
Anne Bergmann ◽  
Mari Christensen ◽  
Anne Fertitta ◽  
...  

Using fluorescence in situ hybridization (FISH), the positions of DNA sequences can be discretely marked with a fluorescent spot. The efficiency of marking DNA sequences of the size cloned in cosmids is 90-95%, and the fluorescent spots produced after FISH are ≈0.3 μm in diameter. Sites of two sequences can be distinguished using two-color FISH. Different reporter molecules, such as biotin or digoxigenin, are incorporated into DNA sequence probes by nick translation. These reporter molecules are labeled after hybridization with different fluorochromes, e.g., FITC and Texas Red. The development of dual band pass filters (Chromatechnology) allows these fluorochromes to be photographed simultaneously without registration shift.


2013 ◽  
Vol 41 (2) ◽  
pp. 548-553 ◽  
Author(s):  
Andrew A. Travers ◽  
Georgi Muskhelishvili

How much information is encoded in the DNA sequence of an organism? We argue that the informational, mechanical and topological properties of DNA are interdependent and act together to specify the primary characteristics of genetic organization and chromatin structures. Superhelicity generated in vivo, in part by the action of DNA translocases, can be transmitted to topologically sensitive regions encoded by less stable DNA sequences.


2021 ◽  
Vol 15 (1) ◽  
Author(s):  
Anastasios A. Tsonis ◽  
Geli Wang ◽  
Lvyi Zhang ◽  
Wenxu Lu ◽  
Aristotle Kayafas ◽  
...  

Abstract
Background: Mathematical approaches have been used for decades to probe the structure of DNA sequences, and this work led to the development of bioinformatics. In this exploratory work, a novel mathematical method is applied to probe the DNA structure of two related viral families: the coronaviruses and the influenza viruses. The coronaviruses are SARS-CoV-2, SARS-CoV-1 and MERS. The influenza viruses include H1N1-1918, H1N1-2009, H2N2-1957 and H3N2-1968.
Methods: The mathematical method used is slow feature analysis (SFA), a relatively new but promising method for delineating complex structure in DNA sequences.
Results: The analysis indicates that the DNA sequences exhibit an elaborate and convoluted structure akin to complex networks. We define a measure of complexity and show that each DNA sequence exhibits a certain degree of complexity within itself. At the same time, there exist complex inter-relationships between the sequences within a family and between the two families. From these relationships we find evidence, especially for the coronavirus family, that increasing complexity in a sequence is associated with a higher transmission rate but lower mortality.
Conclusions: The complexity measure defined here may hold promise and could become a useful tool for predicting transmission and mortality rates in future viral strains.
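A minimal sketch of generic linear slow feature analysis on a nucleotide sequence follows. The abstract does not say which encoding, window length or nonlinear expansion the authors used, and it does not define the complexity measure. For that reason the one-hot encoding, the delay embedding of window `k` and the linear variant of SFA below are illustrative assumptions, as are all helper names. The sketch requires NumPy.

```python
import numpy as np

BASES = "ACGT"


def one_hot(seq: str) -> np.ndarray:
    """Encode a nucleotide string as a (len, 4) indicator matrix; other symbols stay all-zero."""
    index = {b: i for i, b in enumerate(BASES)}
    x = np.zeros((len(seq), 4))
    for t, base in enumerate(seq.upper()):
        if base in index:
            x[t, index[base]] = 1.0
    return x


def delay_embed(x: np.ndarray, k: int) -> np.ndarray:
    """Row t concatenates x[t], ..., x[t+k-1], giving a k-window signal."""
    n = x.shape[0] - k + 1
    return np.hstack([x[i:i + n] for i in range(k)])


def linear_sfa(x: np.ndarray, n_features: int = 1, eps: float = 1e-10):
    """Linear slow feature analysis.

    Returns the n slowest output signals (unit variance, mutually uncorrelated)
    and their slowness values, i.e. the variance of their first differences.
    """
    x = x - x.mean(axis=0)
    # Whitening; drop near-zero directions (one-hot columns are linearly dependent).
    evals, evecs = np.linalg.eigh(np.cov(x, rowvar=False))
    keep = evals > eps
    z = x @ (evecs[:, keep] / np.sqrt(evals[keep]))
    # Slowest directions = smallest eigenvalues of the difference covariance
    # (eigh returns eigenvalues in ascending order).
    d_evals, d_evecs = np.linalg.eigh(np.cov(np.diff(z, axis=0), rowvar=False))
    y = z @ d_evecs[:, :n_features]
    return y, d_evals[:n_features]


if __name__ == "__main__":
    seq = "AAAAACCCCCGGGGGTTTTT" * 5  # toy sequence with slow, blocky composition
    y, slowness = linear_sfa(delay_embed(one_hot(seq), k=3), n_features=2)
    print(y.shape, slowness)
```

The whitening step keeps only directions with nonzero variance. Within each window block the one-hot columns sum to a constant, so the raw covariance is always singular.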


Database ◽  
2020 ◽  
Vol 2020 ◽  
Author(s):  
Conrad L Schoch ◽  
Stacy Ciufo ◽  
Mikhail Domrachev ◽  
Carol L Hotton ◽  
Sivakumar Kannan ◽  
...  

Abstract
The National Center for Biotechnology Information (NCBI) Taxonomy includes organism names and classifications for every sequence in the nucleotide and protein sequence databases of the International Nucleotide Sequence Database Collaboration. Since the last review of this resource in 2012, it has undergone several improvements. Most notable is the shift from a single SQL database to a series of linked databases tied to a framework of data called NameBank. This means that relations among data elements can be adjusted in more detail, resulting in expanded annotation of synonyms, the ability to flag names with specific nomenclatural properties, enhanced tracking of publications tied to names and improved annotation of scientific authorities and types. Additionally, practices utilized by NCBI Taxonomy curators specific to major taxonomic groups are described, terms peculiar to NCBI Taxonomy are explained, external resources are acknowledged and updates to tools and other resources are documented. Database URL: https://www.ncbi.nlm.nih.gov/taxonomy
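The abstract gives only the web address of NCBI Taxonomy. One documented route for programmatic name lookups is NCBI's E-utilities ESearch service, queried with `db=taxonomy`. The sketch below builds such a query with the standard library. The helper names are hypothetical, the network call is left to the caller, and NCBI's usage limits apply.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities ESearch endpoint
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def taxonomy_search_url(name: str) -> str:
    """Build an ESearch URL that looks up a name in the NCBI Taxonomy database."""
    params = {"db": "taxonomy", "term": name, "retmode": "json"}
    return f"{ESEARCH}?{urlencode(params)}"


def taxonomy_ids(name: str) -> list[str]:
    """Return the TaxIDs matching a name (requires network access)."""
    with urlopen(taxonomy_search_url(name), timeout=30) as resp:
        data = json.load(resp)
    return data["esearchresult"]["idlist"]


if __name__ == "__main__":
    print(taxonomy_search_url("Pelomedusa subrufa"))
```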


1999 ◽  
Vol 341 (1) ◽  
pp. 89-93 ◽  
Author(s):  
Gianluca TELL ◽  
Lucia PELLIZZARI ◽  
Gennaro ESPOSITO ◽  
Carlo PUCILLO ◽  
Paolo Emidio MACCHIA ◽  
...  

Pax proteins are transcriptional regulators that play important roles during embryogenesis. These proteins recognize specific DNA sequences via a conserved element: the paired domain (Prd domain). A low level of organized secondary structure in the free state is a general feature of Prd domains; however, these proteins undergo a dramatic gain in α-helical content upon interaction with DNA ("induced fit"). Pax8 is expressed in the developing thyroid, kidney and several areas of the central nervous system. In humans, mutations of the Pax8 gene that map to the coding region of the Prd domain give rise to congenital hypothyroidism. Here, we have investigated the molecular defects caused by a mutation in which the leucine at position 62 is replaced by arginine. Leu62 is conserved among Prd domains and contributes to the packing together of helices 1 and 3. The binding affinity of the Leu62Arg mutant for a specific DNA sequence (the C sequence of the thyroglobulin promoter) is decreased 60-fold with respect to the wild-type Pax8 Prd domain. However, the affinities with which the wild-type and the mutant proteins bind to a non-specific DNA sequence are very similar. CD spectra demonstrate that, in the absence of DNA, both wild-type Pax8 and the Leu62Arg mutant possess a low α-helical content. However, in the Leu62Arg mutant, the gain in α-helical content upon interaction with DNA is greatly reduced with respect to the wild-type protein. Thus the molecular defect of the Leu62Arg mutant is a reduced capacity for induced fit upon DNA interaction.
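A 60-fold loss of affinity can also be expressed as a binding free-energy penalty, using ΔΔG = RT ln(Kd,mut / Kd,wt). The sketch below assumes two things: the fold change is a ratio of dissociation constants, and the temperature is 25 °C. The abstract states neither, and the helper name is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal mol^-1 K^-1


def ddg_from_fold_change(fold: float, temp_k: float = 298.15) -> float:
    """Binding free-energy penalty (kcal/mol) for a `fold`-fold increase in Kd.

    Uses ddG = RT ln(Kd_mut / Kd_wt); positive values mean weaker binding.
    """
    return R_KCAL * temp_k * math.log(fold)


if __name__ == "__main__":
    print(f"{ddg_from_fold_change(60):.2f} kcal/mol")  # prints 2.43 kcal/mol
```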


1985 ◽  
Vol 5 (4) ◽  
pp. 619-627
Author(s):  
M Montoya-Zavala ◽  
J L Hamlin

We have isolated overlapping recombinant cosmids that represent 150 kilobases of contiguous DNA sequence from the amplified dihydrofolate reductase domain of a methotrexate-resistant Chinese hamster ovary cell line (CHOC 400). This sequence includes the 25-kilobase dihydrofolate reductase gene and an origin of DNA synthesis. Eight cosmids that span this domain have been utilized as radioactive hybridization probes to analyze the similarities among the dihydrofolate reductase amplicons in four independently derived methotrexate-resistant Chinese hamster cell lines. We have observed no significant differences among the four cell lines within the 150-kilobase DNA sequence that we have examined, except for polymorphisms that result from the amplification of one or the other of two possible alleles of the dihydrofolate reductase domain. We also show that the restriction patterns of the amplicons in these four resistant cell lines are virtually identical to that of the corresponding, unamplified sequence in drug-susceptible parental cells. Furthermore, measurements of the relative copy numbers of fragments from widely separated regions of the amplicon suggest that all fragments in this 150-kilobase region may be amplified in unison. Our data show that in methotrexate-resistant Chinese hamster cells, the amplified unit is large relative to the dihydrofolate reductase gene itself. Furthermore, within the 150-kilobase amplified consensus sequence that we have examined, significant rearrangements do not seem to occur during the amplification process.


2014 ◽  
Vol 8 (1) ◽  
pp. 166-170 ◽  
Author(s):  
Jia Wang ◽  
Shuai Liu ◽  
Weina Fu

The formation and precise positioning of nucleosomes in chromatin play a very important role in the study of life processes. Many researchers have found that the positions at which DNA sequence fragments wrap around histone octamers in the genome are not random but regular, and that this positioning is closely related to the specific sequence of the core DNA. In this paper, we therefore analyzed the relation between affinity and the sequence structure of core DNA, and extracted a set of key positions at which the nucleotides probably play the main role in binding. First, we simplified and formatted the experimental affinity data. Then, to find the key positions in the wrapping, we used a neural network to analyze the positive and negative effects of each position in the core DNA sequences on nucleosome formation. This yielded a set of weights, one for each position, describing this effect. Finally, based on the positions with high weights, we analyzed why the chosen positions are key positions, and used them to construct a model for nucleosome positioning prediction. Experimental results show the effectiveness of our method.
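The abstract does not specify the network architecture, sequence encoding or training procedure. The sketch below therefore swaps in the simplest case: a single-layer (linear) network on one-hot encoded sequences, trained by stochastic gradient descent. It scores each position by the spread of its four learned base weights. The data and the affinity rule are synthetic and hypothetical, as are the helper names.

```python
import random

BASES = "ACGT"


def one_hot(seq):
    """Flatten a sequence into 4 indicator features per position (A, C, G, T)."""
    return [1.0 if base == b else 0.0 for base in seq for b in BASES]


def train_linear(X, y, lr=0.05, epochs=30):
    """Single-layer (linear) network fitted by SGD on squared error."""
    w = [0.0] * len(X[0])
    bias = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = bias + sum(wj * xj for wj, xj in zip(w, xi)) - yi
            bias -= lr * err
            for j, xj in enumerate(xi):
                if xj:  # only active one-hot features receive a gradient
                    w[j] -= lr * err * xj
    return w, bias


def position_scores(w, length):
    """Score each position by the spread (max - min) of its four base weights."""
    return [max(w[4 * p:4 * p + 4]) - min(w[4 * p:4 * p + 4]) for p in range(length)]


if __name__ == "__main__":
    rng = random.Random(0)
    length = 8
    seqs = ["".join(rng.choice(BASES) for _ in range(length)) for _ in range(200)]
    # Hypothetical affinity rule: A at position 3 and G at position 6 raise affinity.
    y = [1.0 * (s[3] == "A") + 0.5 * (s[6] == "G") for s in seqs]
    w, _ = train_linear([one_hot(s) for s in seqs], y)
    scores = position_scores(w, length)
    ranked = sorted(range(length), key=lambda p: scores[p], reverse=True)
    print("key positions:", ranked[:2])
```

Using the spread rather than the raw weights cancels the constant offset that one-hot features share with the bias. Positions with no effect therefore score near zero.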


Genetics ◽  
1993 ◽  
Vol 134 (4) ◽  
pp. 1195-1204
Author(s):  
S Tarès ◽  
J M Cornuet ◽  
P Abad

Abstract
An AluI family of highly reiterated nontranscribed sequences has been found in the genome of the honeybee Apis mellifera. This repeated sequence is shown to be present at approximately 23,000 copies per haploid genome, constituting about 2% of the total genomic DNA. The nucleotide sequence of 10 monomers was determined. The consensus sequence is 176 nucleotides long and has an A + T content of 58%. There are clusters of both direct and inverted repeats. Internal subrepeating units ranging from 11 to 17 nucleotides are observed, suggesting that the repeat could have evolved from a shorter sequence. DNA sequence data reveal that this repeat class is unusually homogeneous compared with other classes of invertebrate highly reiterated DNA sequences: the average pairwise sequence divergence between the repeats is 2.5%. In spite of this unusual homogeneity, divergence in the repeated-sequence hybridization ladder has been found among four different honeybee subspecies. Therefore, the AluI highly reiterated sequences provide a new probe for fingerprinting in A. m. mellifera.
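The summary statistics in this abstract can be computed with a few small functions. This sketch covers A + T content and the average pairwise divergence between aligned monomers. It also gives the haploid genome size implied by copy number × monomer length / genome fraction. The abstract does not say which divergence measure was used, so the uncorrected p-distance here is an assumption, and the helper names are hypothetical.

```python
from itertools import combinations


def at_content(seq: str) -> float:
    """Fraction of A and T bases in a sequence."""
    seq = seq.upper()
    return (seq.count("A") + seq.count("T")) / len(seq)


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mean_pairwise_divergence(seqs: list[str]) -> float:
    """Average p-distance over all pairs of repeat monomers."""
    pairs = list(combinations(seqs, 2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


def implied_genome_size(copies: int, monomer_bp: int, genome_fraction: float) -> float:
    """Haploid genome size (bp) implied by a repeat's copy number and genome share."""
    return copies * monomer_bp / genome_fraction


if __name__ == "__main__":
    # Figures from the abstract: ~23,000 copies of a 176-nt monomer, ~2% of the genome.
    print(f"{implied_genome_size(23_000, 176, 0.02) / 1e6:.0f} Mb")
```

With the abstract's approximate figures, the last helper gives about 4.0 Mb of repeat DNA. That implies a haploid genome of roughly 200 Mb, and the estimate is only as precise as the stated approximations.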


2019 ◽  
Author(s):  
O. Ordu ◽  
A. Lusser ◽  
N. H. Dekker

ABSTRACT
Eukaryotic genomes are hierarchically organized into protein-DNA assemblies for compaction into the nucleus. Nucleosomes, with the (H3-H4)2 tetrasome as a likely intermediate, are highly dynamic in nature by way of several different mechanisms. We have recently shown that tetrasomes spontaneously change the direction of their DNA wrapping between left- and right-handed conformations, which may prevent torque build-up in chromatin during active transcription or replication. DNA sequence has been shown to strongly affect nucleosome positioning throughout chromatin. It is not known, however, whether DNA sequence also impacts the dynamic properties of tetrasomes. To address this question, we examined tetrasomes assembled on a high-affinity DNA sequence using freely orbiting magnetic tweezers. In this context, we also studied the effects of mono- and divalent salts on the flipping dynamics. We found that neither DNA sequence nor altered buffer conditions affect overall tetrasome structure. In contrast, tetrasomes bound to high-affinity DNA sequences showed significantly altered flipping kinetics, predominantly via a reduction in the lifetime of the canonical state of left-handed wrapping. Increased mono- and divalent salt concentrations counteracted this behaviour. Thus, our study indicates that high-affinity DNA sequences impact not only the positioning of the nucleosome, but that they also endow the subnucleosomal tetrasome with enhanced conformational plasticity. This may provide a means to prevent histone loss upon exposure to torsional stress, thereby contributing to the integrity of chromatin at high-affinity sites.
STATEMENT OF SIGNIFICANCE
Canonical (H3-H4)2 tetrasomes possess high conformational flexibility, as evidenced by their spontaneous flipping between states of left- and right-handed DNA wrapping. Here, we show that these conformational dynamics of tetrasomes cannot be described by a fixed set of rates over all conditions. Instead, an accurate description of their behavior must take into account details of their loading, in particular the underlying DNA sequence. In vivo, differences in tetrasome flexibility could be regulated by modifications of the histone core or the tetrasomal DNA, and as such constitute an intriguing, potentially adjustable mechanism for chromatin to accommodate the torsional stress generated by processes such as transcription and replication.
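State lifetimes like the one discussed here are commonly estimated from dwell times in a two-state trace. The abstract does not describe the authors' analysis pipeline. This sketch assumes an idealized trace that has already been assigned to states L (left-handed) and R (right-handed) at a fixed sampling interval. It also assumes single-exponential kinetics, so that the mean dwell estimates the lifetime and its inverse estimates the exit rate. The first and last dwells are dropped because the observation window truncates them. The helper names are hypothetical.

```python
from itertools import groupby


def dwell_times(states, dt=1.0):
    """Group a sampled two-state trace into dwells, dropping the truncated first and last dwells."""
    runs = [(state, sum(1 for _ in group) * dt) for state, group in groupby(states)]
    dwells = {}
    for state, duration in runs[1:-1]:
        dwells.setdefault(state, []).append(duration)
    return dwells


def lifetimes_and_rates(states, dt=1.0):
    """Mean lifetime of each state and the corresponding exit rate (1 / lifetime)."""
    out = {}
    for state, ds in dwell_times(states, dt).items():
        tau = sum(ds) / len(ds)
        out[state] = (tau, 1.0 / tau)
    return out


if __name__ == "__main__":
    trace = "LLLRRLLLLRRRL"  # toy trace: L = left-handed, R = right-handed wrapping
    for state, (tau, k) in sorted(lifetimes_and_rates(trace).items()):
        print(f"{state}: lifetime {tau:.2f}, exit rate {k:.2f}")
```

In this framing, a shorter lifetime of the canonical left-handed state shows up as a shorter mean L dwell and a higher L exit rate.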


2005 ◽  
Vol 51 (12) ◽  
pp. 1045-1055 ◽  
Author(s):  
Zhen-Xiang Lu ◽  
André Laroche ◽  
Hung Chang Huang

Degenerate PCR primers corresponding to conserved domains of fungal chitinases were designed, and PCR was performed on genomic DNA of the entomogenous fungus Verticillium lecanii (Zimmermann) Viegas. Two distinct PCR fragments, chf1 and chf2, were isolated and used to identify two DNA contigs. Analyses of these two contigs revealed that we had obtained the full-length DNA sequences of two distinct chitinase-like genes, including the promoter, 5′ untranslated region, open reading frame (ORF) and 3′ untranslated region. The two genes exhibited 51% identity at the amino acid (aa) level and were designated acidic (chi1) and basic (chi2) chitinase-like genes. The isolated cDNA for the chi1 gene is 1110 bp, encoding a predicted protein of 370 aa with a molecular mass of 40.93 kDa, and its ORF is uninterrupted in the corresponding genomic DNA sequence. The cDNA for the chi2 gene is 1269 bp, encoding a predicted protein of 423 aa with a molecular mass of 45.95 kDa. In contrast, the chi2 ORF is interrupted by three introns in the corresponding genomic DNA. The basic chitinase gene (chi2) was successfully expressed in the Pichia pastoris system; optimum enzymatic activity was observed at 22 °C and pH 7.5. CHI1 and CHI2 clustered into two different phylogenetic groups according to their sequence alignments with 28 other fungal chitinases. A chitin-binding domain was identified in the 30 fungal chitinase sequences examined. It comprises two sub-domains that resemble bacterial chitin-binding domains at the aa level.
Key words: fungus, chitin, cloning, sequencing, transformation, Pichia sp. expression.
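Primers designed against conserved protein domains are degenerate: IUPAC ambiguity codes let one written primer stand for a pool of concrete oligonucleotides. The pool size is the product, over positions, of the number of bases each code allows. The abstract does not give the primer sequences, so the primer in this sketch is hypothetical, as are the helper names.

```python
from itertools import product
from math import prod

# IUPAC nucleotide codes and the bases each one stands for
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Number of distinct oligonucleotides encoded by a degenerate primer."""
    return prod(len(IUPAC[c]) for c in primer.upper())


def expand(primer: str) -> list[str]:
    """All concrete sequences a degenerate primer represents."""
    return ["".join(p) for p in product(*(IUPAC[c] for c in primer.upper()))]


if __name__ == "__main__":
    primer = "GAYGGNATH"  # hypothetical primer, not one from the study
    print(degeneracy(primer), expand(primer)[:3])
```

Keeping degeneracy low matters in practice, because each concrete variant makes up only a fraction of the primer pool.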

