periodicDNA: an R/Bioconductor package to investigate k-mer periodicity in DNA

F1000Research ◽  
2021 ◽  
Vol 10 ◽  
pp. 141
Author(s):  
Jacques Serizay ◽  
Julie Ahringer

Periodic occurrences of oligonucleotide sequences can impact the physical properties of DNA. For example, DNA bendability is modulated by 10-bp periodic occurrences of WW (W = A/T) dinucleotides. We present periodicDNA, an R package to identify k-mer periodicity and generate continuous tracks of k-mer periodicity over genomic loci of interest, such as regulatory elements. periodicDNA will facilitate investigation and improve understanding of how periodic DNA sequence features impact function.
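The idea of k-mer periodicity can be illustrated with a minimal Python sketch. It is not the periodicDNA implementation or its API: the function names, the choice of a pairwise-distance histogram, and the single-frequency Fourier estimate are all assumptions made for illustration. The sketch locates WW dinucleotides, counts how often pairs of them are separated by each distance, and measures how strongly that histogram oscillates at a given period.

```python
import cmath
from collections import Counter

# W = A or T, so WW covers these four dinucleotides.
WW = {"AA", "AT", "TA", "TT"}


def ww_positions(seq):
    """Return the start index of every WW dinucleotide in seq."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] in WW]


def distance_histogram(positions, max_dist):
    """Count pairwise distances (1..max_dist) between sorted motif positions."""
    counts = Counter()
    for idx, a in enumerate(positions):
        for b in positions[idx + 1:]:
            d = b - a
            if d > max_dist:
                break  # positions are sorted, so later ones are farther away
            counts[d] += 1
    return [counts[d] for d in range(1, max_dist + 1)]


def power_at_period(hist, period):
    """Squared magnitude of the mean-centred histogram's Fourier component."""
    mean = sum(hist) / len(hist)
    total = sum(
        (h - mean) * cmath.exp(-2j * cmath.pi * (d + 1) / period)
        for d, h in enumerate(hist)
    )
    return abs(total) ** 2 / len(hist)


# Hypothetical toy sequence with an AA placed every 10 bp.
toy = "AAGCGCGCGC" * 20
hist = distance_histogram(ww_positions(toy), max_dist=100)
print(power_at_period(hist, 10), power_at_period(hist, 7))
```

On a sequence with a built-in 10-bp repeat, the power at period 10 should clearly exceed the power at an unrelated period such as 7; real genomic sequence would need a background model to judge significance.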

Author(s):  
Laura-Jayne Gardiner ◽  
Rachel Rusholme-Pilcher ◽  
Josh Colmer ◽  
Hannah Rees ◽  
Juan Manuel Crescente ◽  
...  

Abstract. The circadian clock is an important adaptation to life on Earth. Here, we use machine learning to predict complex temporal circadian gene expression patterns in Arabidopsis. Most significantly, we classify circadian genes using DNA sequence features generated from public genomic resources, with no experimental work or prior knowledge needed. We use model explanation to rank DNA sequence features, observing transcript-specific combinations of potential circadian regulatory elements that discriminate temporal phase of expression. Model interpretation/explanation provides the backbone of our methodological advances, giving insight into biological processes and experimental design. Next, we use model interpretation to optimize sampling strategies when we predict circadian transcripts using reduced numbers of transcriptomic timepoints, saving both time and money. Finally, we predict the circadian time from a single transcriptomic timepoint, deriving novel marker transcripts that are most impactful for accurate prediction; this could facilitate the identification of altered clock function from existing datasets.


1989 ◽  
Vol 9 (11) ◽  
pp. 5219-5222
Author(s):  
A Celada ◽  
R Maki

The X box is a loosely conserved DNA sequence that is located upstream of all major histocompatibility class II genes and is one of the cis-acting regulatory elements. Despite the similarity between all X-box sequences, each promoter-proximal X box in the mouse appears to bind a separate nuclear factor.


2019 ◽  
Vol 36 (8) ◽  
pp. 2587-2588 ◽  
Author(s):  
Christopher M Ward ◽  
Thu-Hien To ◽  
Stephen M Pederson

Abstract. Motivation: High-throughput next-generation sequencing (NGS) has become exceedingly cheap, enabling studies with large sample numbers. Quality control (QC) is an essential stage of analytic pipelines, and the outputs of popular bioinformatics tools such as FastQC and Picard can provide information on individual samples. Although these tools provide considerable power when carrying out QC, large sample numbers can make inspection of all samples and identification of systemic bias a challenge. Results: We present ngsReports, an R package designed for the management and visualization of NGS reports from within an R environment. The available methods allow direct import into R of FastQC reports along with outputs from other tools. Visualization can be carried out across many samples using default, highly customizable plots with options to perform hierarchical clustering to quickly identify outlier libraries. Moreover, these can be displayed in an interactive shiny app or HTML report for ease of analysis. Availability and implementation: The ngsReports package is available on Bioconductor and the GUI shiny app is available at https://github.com/UofABioinformaticsHub/shinyNgsreports. Supplementary information: Supplementary data are available at Bioinformatics online.


2019 ◽  
Author(s):  
Deepank R Korandla ◽  
Jacob M Wozniak ◽  
Anaamika Campeau ◽  
David J Gonzalez ◽  
Erik S Wright

Abstract. Motivation: A core task of genomics is to identify the boundaries of protein-coding genes, which may cover over 90% of a prokaryote's genome. Several programs are available for gene finding, yet it is currently unclear how well these programs perform and whether any offers superior accuracy. This is in part because there is no universal benchmark for gene finding and, therefore, most developers select their own benchmarking strategy. Results: Here, we introduce AssessORF, a new approach for benchmarking prokaryotic gene predictions based on evidence from proteomics data and the evolutionary conservation of start and stop codons. We applied AssessORF to compare gene predictions offered by GenBank, GeneMarkS-2, Glimmer and Prodigal on genomes spanning the prokaryotic tree of life. Gene predictions were 88–95% in agreement with the available evidence, with Glimmer performing the worst but no clear winner. All programs were biased towards selecting start codons that were upstream of the actual start. Given these findings, there remains considerable room for improvement, especially in the detection of correct start sites. Availability and implementation: AssessORF is available as an R package via the Bioconductor package repository. Supplementary information: Supplementary data are available at Bioinformatics online.


2008 ◽  
Vol 18 (12) ◽  
pp. 1955-1968 ◽  
Author(s):  
S. G. Kuntz ◽  
E. M. Schwarz ◽  
J. A. DeModena ◽  
T. De Buysscher ◽  
D. Trout ◽  
...  

2017 ◽  
Author(s):  
Seyed Ali Madani Tonekaboni ◽  
Parisa Mazrooei ◽  
Victor Kofia ◽  
Benjamin Haibe-Kains ◽  
Mathieu Lupien

Abstract. Cellular identity relies on cell type-specific gene expression profiles controlled by cis-regulatory elements (CREs), such as promoters, enhancers and anchors of chromatin interactions. CREs are unevenly distributed across the genome, giving rise to distinct subsets such as individual CREs and Clusters Of cis-Regulatory Elements (COREs), also known as super-enhancers. Identifying COREs is a challenge due to technical and biological features that entail variability in the distribution of distances between CREs within a given dataset. To address this issue, we developed a new unsupervised machine learning approach termed Clustering of genomic REgions Analysis Method (CREAM) that outperforms the Ranking Of Super Enhancer (ROSE) approach. Specifically, CREAM-identified COREs are enriched in CREs strongly bound by master transcription factors according to ChIP-seq signal intensity, are proximal to highly expressed genes, are preferentially found near genes essential for cell growth and are more predictive of cell identity. Moreover, we show that CREAM enables subtyping primary prostate tumor samples according to their CORE distribution across the genome. We further show that COREs are enriched compared to individual CREs at TAD boundaries and these are preferentially bound by CTCF and factors of the cohesin complex (e.g., RAD21 and SMC3). Finally, applying CREAM to transcription factor ChIP-seq data reveals CTCF and cohesin-specific COREs preferentially at TAD boundaries compared to intra-TADs. CREAM is available as an open-source R package (https://CRAN.R-project.org/package=CREAM) to identify COREs from cis-regulatory annotation datasets from any biological sample.
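To make the notion of a cluster of CREs concrete, the following Python sketch groups intervals on a single chromosome whenever the gap to the preceding elements is no larger than a fixed threshold. It is a deliberately naive baseline and not the CREAM algorithm, which the abstract describes as an unsupervised learning approach that accounts for variability in inter-CRE distances; the function name, the fixed `max_gap`, and the `min_size` cutoff are assumptions for illustration only.

```python
def cluster_regions(regions, max_gap, min_size=2):
    """Group (start, end) intervals separated by gaps <= max_gap.

    Only groups with at least min_size members are returned.
    Hypothetical helper, not part of the CREAM package.
    """
    clusters, current, current_end = [], [], None
    for start, end in sorted(regions):
        # Start a new group when the gap to the running group end is too large.
        if current and start - current_end > max_gap:
            if len(current) >= min_size:
                clusters.append(current)
            current, current_end = [], None
        current.append((start, end))
        # Track the furthest end seen so nested intervals are handled.
        current_end = end if current_end is None else max(current_end, end)
    if len(current) >= min_size:
        clusters.append(current)
    return clusters


# Hypothetical coordinates for six CREs on one chromosome.
cres = [(100, 200), (250, 300), (5000, 5100),
        (320, 400), (9000, 9100), (9150, 9200)]
for group in cluster_regions(cres, max_gap=100):
    print(group)
```

A fixed threshold is easy to reason about but sensitive to dataset-specific spacing, which is the variability the abstract identifies as the main difficulty.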

