Computational Inference Software for Tetrad Assembly from Randomly Arrayed Yeast Colonies

2019 ◽  
Author(s):  
Nikita A. Sakhanenko ◽  
Gareth A. Cromie ◽  
Aimée M. Dudley ◽  
David J. Galas

Abstract: Here, we describe an information-theory-based method and associated software for computationally identifying sister spores derived from the same meiotic tetrad. The method exploits specific DNA sequence features of tetrads that result from meiotic centromere and allele segregation patterns. Because the method uses only the genomic sequence, it removes the need for tetrad-specific barcodes or other genetic modifications to the strains. Using this method, strains derived from randomly arrayed spores can be efficiently grouped back into tetrads.

F1000Research ◽  
2018 ◽  
Vol 7 ◽  
pp. 165 ◽  
Author(s):  
Anatoliy Zubritskiy ◽  
Yulia A. Medvedeva

The presence of H3K27me3 has been demonstrated to correlate with CpG content. In this work, we tested whether H3K27ac has similar sequence preferences. We performed a translocation of DNA sequences with various properties into a beta-globin locus to control for the local chromatin environment. Our results suggest that, in contrast to H3K27me3, H3K27ac gain is unlikely to be affected by the CpG content of the underlying DNA sequence, while extremely high GC content might contribute to the gain of H3K27ac.
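The abstract above contrasts CpG content with GC content but does not say how either was measured. As a minimal sketch, assuming the commonly used definitions (GC content as the fraction of G and C bases, and the CpG observed/expected ratio as CpG count × length / (C count × G count)), the two quantities could be computed as follows. The function names are illustrative and are not taken from the paper.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq: str) -> float:
    """CpG observed/expected ratio: (#CG * N) / (#C * #G).

    Assumed definition, not confirmed by the abstract. Returns 0.0 when the
    sequence has no C or no G, since the ratio is undefined there.
    """
    seq = seq.upper()
    c_count = seq.count("C")
    g_count = seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    # "CG" cannot overlap itself, so str.count gives the exact dinucleotide count.
    cpg_count = seq.count("CG")
    return cpg_count * len(seq) / (c_count * g_count)
```

A sequence can be GC-rich yet CpG-poor (for example, one where every C is followed by C rather than G), which is why the two measures can have different associations with a histone mark.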


BMC Genomics ◽  
2009 ◽  
Vol 10 (1) ◽  
pp. 557 ◽  
Author(s):  
Soeren Hofmayer ◽  
Ijad Madisch ◽  
Sebastian Darr ◽  
Fabienne Rehren ◽  
Albert Heim

GigaScience ◽  
2020 ◽  
Vol 9 (11) ◽  
Author(s):  
Milton Silva ◽  
Diogo Pratas ◽  
Armando J Pinho

Abstract
Background: The increasing production of genomic data has led to an intensified need for models that can cope efficiently with the lossless compression of DNA sequences. Important applications include long-term storage and compression-based data analysis. In the literature, only a few recent articles propose the use of neural networks for DNA sequence compression. However, they fall short when compared with specific DNA compression tools, such as GeCo2. This limitation is due to the absence of models specifically designed for DNA sequences. In this work, we combine the power of neural networks with specific DNA models. For this purpose, we created GeCo3, a new genomic sequence compressor that uses neural networks for mixing multiple context and substitution-tolerant context models.
Findings: We benchmark GeCo3 as a reference-free DNA compressor in 5 datasets, including a balanced and comprehensive dataset of DNA sequences, the Y-chromosome and human mitogenome, 2 compilations of archaeal and virus genomes, 4 whole genomes, and 2 collections of FASTQ data of a human virome and ancient DNA. GeCo3 achieves a solid improvement in compression over the previous version (GeCo2) of 2.4%, 7.1%, 6.1%, 5.8%, and 6.0%, respectively. To test its performance as a reference-based DNA compressor, we benchmark GeCo3 in 4 datasets consisting of the pairwise compression of the chromosomes of the genomes of several primates. GeCo3 improves the compression by 12.4%, 11.7%, 10.8%, and 10.1% over the state of the art. The cost of this compression improvement is some additional computational time (1.7–3 times slower than GeCo2). The RAM use is constant, and the tool scales efficiently, independently of the sequence size. Overall, these values outperform the state of the art.
Conclusions: GeCo3 is a genomic sequence compressor with a neural network mixing approach that provides additional gains over top specific genomic compressors. The proposed mixing method is portable, requiring only the probabilities of the models as inputs, providing easy adaptation to other data compressors or compression-based data analysis tools. GeCo3 is released under GPLv3 and is available for free download at https://github.com/cobilab/geco3.
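To illustrate what "mixing" model probabilities means, here is a minimal sketch under stated assumptions. It is not GeCo3's network: it swaps in a much simpler single-layer softmax gate that learns one weight per model by online gradient descent on the log loss. The class name `SoftmaxMixer`, the learning rate, and the helper `code_length_bits` are hypothetical and chosen only for this example.

```python
import math


class SoftmaxMixer:
    """Mix per-model symbol probabilities with learned softmax weights.

    Illustrative stand-in for a neural mixer; NOT the GeCo3 architecture.
    """

    def __init__(self, n_models: int, learning_rate: float = 0.5):
        self.logits = [0.0] * n_models  # equal weights at the start
        self.lr = learning_rate

    def _weights(self):
        # Numerically stable softmax over the logits.
        m = max(self.logits)
        exps = [math.exp(x - m) for x in self.logits]
        total = sum(exps)
        return [e / total for e in exps]

    def mix(self, model_probs):
        """model_probs[k][s] is model k's probability for symbol s."""
        w = self._weights()
        n_symbols = len(model_probs[0])
        return [
            sum(w[k] * model_probs[k][s] for k in range(len(w)))
            for s in range(n_symbols)
        ]

    def update(self, model_probs, symbol: int):
        """One gradient step on -log(mixed probability of the true symbol).

        Assumes at least one model gives the true symbol nonzero probability.
        """
        w = self._weights()
        mixed = self.mix(model_probs)[symbol]
        for k in range(len(w)):
            # d(-log p)/d(logit_k) = -w_k * (p_k - p) / p, so step the other way.
            self.logits[k] += self.lr * w[k] * (model_probs[k][symbol] - mixed) / mixed


def code_length_bits(mixer, stream):
    """Ideal code length in bits when coding each symbol, then updating."""
    total = 0.0
    for model_probs, symbol in stream:
        total += -math.log2(mixer.mix(model_probs)[symbol])
        mixer.update(model_probs, symbol)
    return total
```

Because the mixer only consumes probability vectors, any set of predictive models can be plugged in, which is the portability property the abstract describes. Models that predict well gain weight over time, so the mixed code length approaches that of the best model.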


1996 ◽  
Vol 29 (7) ◽  
pp. 1187-1194 ◽  
Author(s):  
Ramón Román-Roldán ◽  
Pedro Bernaola-Galván ◽  
José L. Oliver

2015 ◽  
Author(s):  
Geoffrey H Siwo ◽  
Andrew Rider ◽  
Asako Tan ◽  
Richard S Pinapati ◽  
Scott Emrich ◽  
...  

The quantitative prediction of the transcriptional activity of genes from promoter sequence is fundamental to the engineering of biological systems for industrial purposes and to understanding the natural variation in gene expression. To catalyze the development of new algorithms for this purpose, the Dialogue on Reverse Engineering Assessment and Methods (DREAM) organized a community challenge seeking predictive models of promoter activity given normalized promoter activity data for 90 ribosomal protein promoters driving expression of a fluorescent reporter gene. By developing an unbiased modeling approach that performs an iterative search for predictive DNA sequence features using the frequencies of various k-mers, inferred DNA mechanical properties and spatial positions of promoter sequences, we achieved best-performer status in this challenge. The specific predictive features used in the model included the frequency of the nucleotide G, the length of polymeric tracts of T and TA, the frequencies of 6 distinct trinucleotides and 12 tetranucleotides, and the predicted protein deformability of the DNA sequence. Our method accurately predicted the activity of 20 natural variants of ribosomal protein promoters (Spearman correlation r = 0.73) as compared to 33 laboratory-mutated variants of the promoters (r = 0.57) in a test set that was hidden from participants. Notably, our model differed substantially from the rest in 2 main ways: i) it did not explicitly utilize transcription factor binding information, implying that subtle DNA sequence features are highly associated with gene expression, and ii) it was based exclusively on features extracted from the 100 bp region upstream of the translational start site, demonstrating that this region encodes much of the overall promoter activity. The findings from this study have important implications for the engineering of predictable gene expression systems and the evolution of gene expression in naturally occurring biological systems.
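The sequence features named in the abstract (k-mer frequencies, the frequency of G, and the lengths of T and TA tracts) can be sketched as below. This is an illustrative feature extractor only: the function names and the choice to normalize by the number of windows are assumptions, and the abstract does not specify which trinucleotides or tetranucleotides the authors' model retained.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq: str, k: int) -> dict:
    """Frequency of every A/C/G/T k-mer over all overlapping windows.

    Windows containing other letters (e.g. N) count toward the denominator
    but have no entry in the result.
    """
    seq = seq.upper()
    windows = len(seq) - k + 1
    if windows <= 0:
        return {}
    counts = Counter(seq[i:i + k] for i in range(windows))
    kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return {kmer: counts[kmer] / windows for kmer in kmers}


def longest_run(seq: str, unit: str) -> int:
    """Length in bases of the longest tandem run of `unit` (e.g. 'T' or 'TA')."""
    if not unit:
        raise ValueError("unit must be non-empty")
    seq = seq.upper()
    best = 0
    for start in range(len(seq)):
        end = start
        while seq.startswith(unit, end):
            end += len(unit)
        best = max(best, end - start)
    return best


def promoter_features(seq: str) -> dict:
    """Collect a small, hypothetical feature vector for one promoter sequence."""
    features = {
        "freq_G": kmer_frequencies(seq, 1).get("G", 0.0),
        "longest_T_tract": longest_run(seq, "T"),
        "longest_TA_tract": longest_run(seq, "TA"),
    }
    # Prefix k-mer features so they cannot collide with the names above.
    for kmer, freq in kmer_frequencies(seq, 3).items():
        features["tri_" + kmer] = freq
    return features
```

A feature vector like this, computed for the region upstream of the translational start site, could then be passed to any regression model; the iterative feature search described in the abstract is not reproduced here.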


2019 ◽  
Author(s):  
Mihály Kondrák ◽  
Andrea Kopp ◽  
Csilla Uri ◽  
Anita Sós-Hegedűs ◽  
Edina Csákvári ◽  
...  

Abstract: Virus resistance genes carried by wild plant species are valuable resources for plant breeding. The Rysto gene, conferring a broad spectrum of durable resistance, originated from Solanum stoloniferum and was introgressed into several commercial potato cultivars, including ‘White Lady’, by classical breeding. Rysto was mapped to chromosome XII in potato, and markers used for marker-assisted selection in breeding programmes were identified. Nevertheless, there was no information on the identity of the Rysto gene. To begin to identify Rysto, fine-scale genetic mapping was performed which, in combination with chromosome walking, narrowed down the locus of the gene to approximately 1 Mb. DNA sequence analysis of the locus identified six full-length NBS-LRR-type (NLR-type for short) putative resistance genes. Two of them, designated TMV2 and TMV3, were similar to a TMV resistance gene isolated from tobacco and to Y-1, which co-segregates with Ryadg, the extreme virus resistance gene originating from Solanum andigena and localised to chromosome XI. Furthermore, TMV2 of ‘White Lady’ was found to be 95% identical at the genomic sequence level to the recently isolated Rysto gene of the potato cultivar ‘Alicja’. In addition to the markers identified earlier, this work generated five new, tightly linked markers that can serve potato breeding efforts for extreme virus resistance.


2019 ◽  
Author(s):  
Thomas D. Schneider ◽  
Vishnu Jejjala

Abstract: Restriction enzymes recognize and bind to specific sequences on invading bacteriophage DNA. Like a key in a lock, these proteins require many contacts to specify the correct DNA sequence. Using information theory, we develop an equation that defines the number of independent contacts, which is the dimensionality of the binding. We show that EcoRI, which binds to the sequence GAATTC, functions in 24 dimensions. Information theory represents messages as spheres in high-dimensional spaces. Better sphere packing leads to better communications systems. The densest known packing of hyperspheres occurs on the Leech lattice in 24 dimensions. We suggest that the single protein EcoRI molecule employs a Leech lattice in its operation. Optimizing density of sphere packing explains why 6-base restriction enzymes are so common.


2020 ◽  
Author(s):  
Aakash Basu ◽  
Dmitriy G. Bobrovnikov ◽  
Basilio Cieza ◽  
Zan Qureshi ◽  
Taekjip Ha

Abstract: Sequence features have long been known to influence the local mechanical properties and shapes of DNA. However, a mechanical code (i.e. a comprehensive mapping between DNA sequence and mechanical properties), if it exists, has been difficult to determine experimentally because direct means of measuring the mechanical properties of DNA are typically limited in throughput. Here we use Loop-seq, a recently developed technique to measure the intrinsic cyclizabilities (a proxy for bendability) of DNA fragments in genomic-scale throughput, to characterize the mechanical code. We tabulate how DNA sequence features (distribution patterns of all possible dinucleotides and dinucleotide pairs) influence intrinsic cyclizability, and build a linear model to predict intrinsic cyclizability from sequence. Using our model, we predict that the DNA mechanical landscape shapes nucleosome organization around the promoters of various organisms and at the binding site of the transcription factor CTCF, and that hyperperiodic DNA in C. elegans leads to globally curved DNA segments. By performing Loop-seq on random libraries in the presence or absence of CpG methylation, we show that CpG methylation leads to global stiffening of DNA in a wide sequence context, and predict based on our model that CpG methylation widely changes the mechanical landscape around mouse promoters. This suggests how epigenetic modifications of DNA might alter gene expression and mediate cellular adaptation by affecting critical processes around promoters that require mechanical deformations of DNA, such as nucleosome organization and transcription initiation. Finally, we show that the genetic code and the mechanical code are linked: sequence-dependent mechanical properties of coding DNA constrain the amino acid sequence despite the degeneracy in the genetic code. Our measurements explain why the pattern of nucleosome organization along genes influences the distribution of amino acids in the translated polypeptide.

