2.1 Variation in DNA sequence length

2018 ◽  
Vol 2018 ◽  
pp. 1-13 ◽  
Author(s):  
Suk-Hwan Lee

A large number of studies have examined DNA storage to achieve information hiding in DNA sequences with DNA computing technology. However, most data hiding methods are irreversible in that the original DNA sequence cannot be recovered from the watermarked DNA sequence. This study presents reversible data hiding methods based on multilevel histogram shifting to prevent biological mutations, preserve sequence length, increase watermark capacity, and facilitate blind detection/recovery. The main features of our method are as follows. First, we encode a sequence of nucleotide bases with four-character symbols into integer values using the numeric order. Second, we embed multiple bits in each integer value by multilevel histogram shifting of noncircular type (NHS) and circular type (CHS). Third, we prevent the generation of false start/stop codons by verifying whether a start/stop codon is included in an integer value or between adjacent integer values. The results of our experiments confirmed that the NHS- and CHS-based methods have higher watermark capacities than conventional methods in terms of supplementary data used for decoding. Moreover, unlike conventional methods, our methods do not generate false start/stop codons.
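The first and third steps above (integer encoding and the start/stop codon check) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the mapping A=0, C=1, G=2, T=3, the block size `k`, and the function names are assumptions, since the abstract does not specify them. The histogram-shifting step itself is not shown.

```python
# Assumed mapping (not given in the abstract): bases in alphabetical order.
BASE_TO_DIGIT = {"A": 0, "C": 1, "G": 2, "T": 3}
DIGIT_TO_BASE = "ACGT"
# Standard start codon (ATG) and stop codons (TAA, TAG, TGA).
START_STOP = {"ATG", "TAA", "TAG", "TGA"}


def encode(seq, k=4):
    """Encode each block of k bases as one base-4 integer (k is hypothetical)."""
    if len(seq) % k:
        raise ValueError("sequence length must be a multiple of k")
    values = []
    for i in range(0, len(seq), k):
        v = 0
        for base in seq[i:i + k]:
            v = v * 4 + BASE_TO_DIGIT[base]
        values.append(v)
    return values


def decode(values, k=4):
    """Invert encode(): turn base-4 integers back into a base string."""
    blocks = []
    for v in values:
        digits = []
        for _ in range(k):
            v, d = divmod(v, 4)
            digits.append(DIGIT_TO_BASE[d])
        blocks.append("".join(reversed(digits)))
    return "".join(blocks)


def has_start_stop(seq):
    """True if a start/stop codon occurs at any offset.

    Scanning every offset covers codons inside one block and codons
    that straddle two adjacent blocks.
    """
    return any(seq[i:i + 3] in START_STOP for i in range(len(seq) - 2))
```

Under these assumptions, a candidate watermarked value would be decoded back to bases and rejected whenever `has_start_stop` reports a codon that the original sequence did not contain at that position.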


2020 ◽  
Author(s):  
Yifei Yan ◽  
Ansley Gnanapragasam ◽  
Swneke Bailey

Motivation: Chromatin immunoprecipitation sequencing (ChIP-Seq) of histone post-translational modifications, coupled with de novo motif elucidation and enrichment analyses, can identify transcription factors responsible for orchestrating transitions between cell and disease states. However, the identified regulatory elements can span several kilobases (kb) in length, which complicates motif-based analyses. Restricting the length of the target DNA sequence(s) can reduce false positives. Therefore, we present HisTrader, a computational tool that identifies the regions accessible to transcription factors, nucleosome-free regions (NFRs), within histone modification peaks to reduce the DNA sequence length required for motif analyses. Results: HisTrader accurately identifies NFRs from H3K27Ac ChIP-seq profiles of the lung cancer cell line A549, which are validated by the presence of DNaseI hypersensitivity. In addition, HisTrader reveals that multiple NFRs are common within individual regulatory elements, an easily overlooked feature that should be considered to improve the sensitivity of motif analyses using histone modification ChIP-seq data. Availability and implementation: The HisTrader script is open-source and available on GitHub (https://github.com/SvenBaileyLab/Histrader) under a GNU general public license (GPLv3). HisTrader is written in Perl and can be run on any platform with Perl installed.


2018 ◽  
Author(s):  
Eugenia Zarza ◽  
Robert B. O’Hara ◽  
Annette Klussmann-Kolb ◽  
Markus Pfenninger

One of the major problems in evolutionary biology is to elucidate the relationships between historical events and the tempo and mode of lineage divergence. The development of relaxed molecular clock models and the increasing availability of DNA sequences have resulted in more accurate estimates of taxa divergence times. However, finding the link between competing historical events and divergence is still challenging. Here we investigate assigning constrained-age priors to nodes of interest in a time-calibrated phylogeny as a means of hypothesis comparison. These priors are equivalent to historic scenarios for lineage origin. The hypothesis that best explains the data can be selected by comparing the likelihood values of the competing hypotheses, modelled with different priors. A simulation approach was taken to evaluate the performance of the prior-based method and to compare it with an unconstrained approach. We explored the effect of DNA sequence length and the temporal placement and span of competing hypotheses (i.e. historic scenarios) on selection of the correct hypothesis and the strength of the inference. Competing hypotheses were compared by applying a posterior simulation analogue of the Akaike Information Criterion and Bayes factors (obtained after calculation of the marginal likelihood with three estimators: Harmonic Mean, Stepping Stone and Path Sampling). We illustrate the potential application of the prior-based method on an empirical data set to compare competing geological hypotheses explaining the biogeographic patterns in Pleurodeles newts. The correct hypothesis was selected on average 89% of the time. The best performance was observed with DNA sequence lengths of 3500-10000 bp. The prior-based method is most reliable when the hypotheses compared are not temporally too close. The strongest inferences were obtained when using the Stepping Stone and Path Sampling estimators.
The prior-based approach proved effective in discriminating between competing hypotheses when used on empirical data. The unconstrained analysis performed well but probably requires additional computational effort. Researchers applying this approach should rely only on inferences with moderate to strong support. The prior-based approach could be applied to biogeographical and phylogeographical studies where robust methods for historical inference are still lacking.


2004 ◽  
Vol 02 (01) ◽  
pp. 47-60 ◽  
Author(s):  
S. LIANG ◽  
M. P. SAMANTA ◽  
B. A. BIEGEL

The cWINNOWER algorithm detects fuzzy motifs in DNA sequences rich in protein-binding signals. A signal is defined as any short nucleotide pattern having up to d mutations differing from a motif of length l. The algorithm finds such motifs if a clique consisting of a sufficiently large number of mutated copies of the motif (i.e., the signals) is present in the DNA sequence. The cWINNOWER algorithm substantially improves the sensitivity of the winnower method of Pevzner and Sze by imposing a consensus constraint, enabling it to detect much weaker signals. We studied the minimum detectable clique size qc as a function of sequence length N for random sequences. We found that qc increases linearly with N for a fast version of the algorithm based on counting three-member sub-cliques. Imposing consensus constraints reduces qc by a factor of three in this case, which makes the algorithm dramatically more sensitive. Our most sensitive algorithm, which counts four-member sub-cliques, needs a minimum of only 13 signals to detect motifs in a sequence of length N=12,000 for (l,d)=(15,4).
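The definition of a signal (a length-l pattern within d mutations of a motif) can be illustrated with a short sketch. This is not the cWINNOWER algorithm, which has to discover an unknown motif through clique detection; here the motif is assumed to be known, and the function names `hamming` and `find_signals` are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def find_signals(sequence, motif, d):
    """Start positions of all windows within d mutations of a known motif.

    Illustrates the (l, d) signal definition only; l is len(motif).
    """
    l = len(motif)
    return [
        i
        for i in range(len(sequence) - l + 1)
        if hamming(sequence[i:i + l], motif) <= d
    ]
```

For example, `find_signals("ACGTACGAACGT", "ACGT", 1)` reports the two exact copies and the single-mutation copy `ACGA`. Because each signal differs from the motif in at most d positions, any two signals of the same motif differ from each other in at most 2d positions, which is the property that clique-based methods build on.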


2018 ◽  
Vol 20 (5) ◽  
pp. 3109-3117 ◽  
Author(s):  
Li Wan ◽  
Takashi Nagata ◽  
Masato Katahira

The roles of the amino acid residues responsible for the deaminase activity of APOBEC3F were identified by mutation analysis.


Genome ◽  
1997 ◽  
Vol 40 (3) ◽  
pp. 342-356 ◽  
Author(s):  
Kathleen A. Hill ◽  
Shiva M. Singh

Prokaryote genomes and nuclear genomes of eukaryotes have a global DNA sequence organization that is species-type specific, determined primarily by nearest-neighbor nucleotide associations, and independent of gene function and sequence length. The determinants of such a global structure have remained largely uncharacterized. The monophyletic and endosymbiotic origin of mitochondria permits examination of the influence of evolutionary time and host species type. Different global structures were seen among (i) protozoan and plant, (ii) fungal, (iii) algal, (iv) nematode, (v) echinoderm, (vi) insect, and (vii) vertebrate species following examination of 28 complete mitochondrial genomes using chaos representation and measures of short-sequence representation. The mitochondrial genomes have biases in single-nucleotide and dinucleotide representation, specifically, an overrepresentation of A and T nucleotides and CC/GG and AG/CT dinucleotides and a deficiency of CG dinucleotides, in all but one genome. Dinucleotide representation is similar among (i) mitochondrial genomes of more closely related species; (ii) mitochondrial genomes and the Mycoplasma capricolum genome, a proposed progenitor of mitochondrial genomes; and (iii) mitochondrial genomes of diverse species, more so than between the mitochondrial and the nuclear genome of the same or a closely related species. It is hypothesized that sufficient evolutionary time has permitted host-specific constraints to affect nuclear and mitochondrial genomes and that different species-type-specific constraints influence nuclear and mitochondrial genome global structure.
Key words: chaos representation, mitochondrial genomes, primary sequence organization, oligonucleotide frequencies.
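Over- and underrepresentation of dinucleotides can be quantified in several ways. The sketch below uses one common measure, the odds ratio of the observed dinucleotide frequency to the product of its two single-nucleotide frequencies. This is an assumption chosen for illustration and is not necessarily the measure used by the authors; the function name is hypothetical, and only one strand is counted.

```python
from collections import Counter


def dinucleotide_odds_ratios(seq):
    """Odds ratio f(XY) / (f(X) * f(Y)) for each dinucleotide present in seq.

    Values above 1 suggest overrepresentation, values below 1 suggest
    underrepresentation, relative to independent base frequencies.
    """
    seq = seq.upper()
    n = len(seq)
    if n < 2:
        raise ValueError("sequence must contain at least two bases")
    mono = Counter(seq)
    # Overlapping dinucleotides: n - 1 windows in a sequence of length n.
    di = Counter(seq[i:i + 2] for i in range(n - 1))
    ratios = {}
    for pair, count in di.items():
        f_xy = count / (n - 1)
        f_x = mono[pair[0]] / n
        f_y = mono[pair[1]] / n
        ratios[pair] = f_xy / (f_x * f_y)
    return ratios
```

Dinucleotides that never occur are absent from the returned dictionary; a caller comparing genomes would treat them as having a ratio of zero.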


2008 ◽  
Vol 66 (4) ◽  
pp. 405-415 ◽  
Author(s):  
Jérôme Duminil ◽  
Delphine Grivet ◽  
Sébastien Ollier ◽  
Sylvain Jeandroz ◽  
Rémy J. Petit

Genetics ◽  
1988 ◽  
Vol 119 (4) ◽  
pp. 875-888
Author(s):  
C F Aquadro ◽  
K M Lado ◽  
W A Noon

A 40-kb region around the rosy and snake loci was analyzed for restriction map variation among 60 lines of Drosophila melanogaster and 30 lines of Drosophila simulans collected together at a single locality in Raleigh, North Carolina. DNA sequence variation in D. simulans was estimated to be 6.3 times greater than in D. melanogaster (heterozygosities per nucleotide of 1.9% vs. 0.3%). This result stands in marked contrast to results of studies of phenotypic variation including proteins (allozymes), morphology and chromosome arrangements, which are generally less variable and less geographically differentiated in D. simulans. Intraspecific polymorphism is not distributed uniformly over the 40-kb region. The level of heterozygosity per nucleotide varies more than 12-fold across the region in D. simulans, being highest over the hsc2 gene. Similar, though less extreme, variation in heterozygosity is also observed in D. melanogaster. Interspecific divergence (corrected for intraspecific polymorphism) averaged 3.8%. The pattern of interspecific divergence over the 40-kb region shows some disparities with the spatial distribution of intraspecific variation, but is generally consistent with selective neutrality predictions: the most polymorphic regions within species are generally the most divergent between species. Sequence-length polymorphism in D. melanogaster is observed at levels comparable to those of other gene regions in this species. In contrast, no sequence length variation was observed among D. simulans chromosomes (limit of resolution approximately 100 bp). These data indicate that transposable elements play at best a minor role in the generation of naturally occurring genetic variation in D. simulans compared to D. melanogaster.
We hypothesize that differences in species effective population size are the major determinant of the contrasting levels and patterns of DNA sequence and insertion/deletion variation that we report here and the patterns of allozyme and morphological variation and differentiation reported by other workers for these two species.


Author(s):  
Dandi Saleky ◽  
Sendy L Merly

Many gastropod species are morphologically similar (cryptic), which makes misidentification likely. Accurate species identification is needed to study the bioecology of a species. This research aims to identify Cassidula sp. collected from Peyum Beach, Merauke, by DNA barcoding with the COI gene marker. The primers used in this study were the forward primer LCO1490 and the reverse primer HCO2198. DNA barcoding identified the analyzed species as Cassidula angulifera, with a similarity of 99.53% over a DNA sequence length of 650 bp. Phylogenetic reconstruction separated the analyzed Cassidula sp. sequences by species and genetic distance, with high bootstrap values. The Cassidula sp. sequences form a monophyletic group, which means that they share a common ancestor. DNA barcoding is an accurate technique for identifying species.
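A similarity level of this kind is typically a percent identity between the query sequence and a reference sequence. The sketch below shows the arithmetic for two sequences that are already aligned, have equal length, and contain no gaps. These are simplifying assumptions: the abstract does not state which tool or alignment method produced its figure, and the function name is hypothetical.

```python
def percent_identity(a, b):
    """Percentage of matching positions between two pre-aligned sequences.

    Assumes equal length and no gap characters; real barcoding workflows
    rely on an alignment or database search tool for this step.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    if not a:
        raise ValueError("sequences must not be empty")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)
```

For two 8-base sequences that differ at one position, such as `ACGTACGT` and `ACGTACGA`, the function returns 87.5.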


Author(s):  
Barbara Trask ◽  
Susan Allen ◽  
Anne Bergmann ◽  
Mari Christensen ◽  
Anne Fertitta ◽  
...  

Using fluorescence in situ hybridization (FISH), the positions of DNA sequences can be discretely marked with a fluorescent spot. The efficiency of marking DNA sequences of the size cloned in cosmids is 90-95%, and the fluorescent spots produced after FISH are ≈0.3 μm in diameter. Sites of two sequences can be distinguished using two-color FISH. Different reporter molecules, such as biotin or digoxigenin, are incorporated into DNA sequence probes by nick translation. These reporter molecules are labeled after hybridization with different fluorochromes, e.g., FITC and Texas Red. The development of dual band pass filters (Chromatechnology) allows these fluorochromes to be photographed simultaneously without registration shift.

