Memes: an R interface to the MEME Suite

Mapping Intimacies ◽

10.1101/2021.04.23.441089 ◽

2021 ◽

Author(s):

Spencer L. Nystrom ◽

Daniel J. McKay

Keyword(s):

Data Structures ◽

Source Code ◽

Data Access ◽

R Package ◽

Comprehensive Analysis ◽

Multidimensional Data ◽

Biological Sequences ◽

Bioconductor Package ◽

Motif Analysis ◽

Bioconductor Project

Abstract: Identification of biopolymer motifs represents a key step in the analysis of biological sequences. The MEME Suite is a widely used toolkit for comprehensive analysis of biopolymer motifs; however, these tools are poorly integrated within popular analysis frameworks like the R/Bioconductor project, creating barriers to their use. Here we present memes, an R package which provides a seamless R interface to the MEME Suite. memes provides a novel “data aware” interface to these tools, enabling rapid and complex discriminative motif analysis workflows. In addition to interfacing with popular MEME Suite tools, memes leverages existing R/Bioconductor data structures to store the complex, multidimensional data returned by MEME Suite tools for rapid data access and manipulation. Finally, memes provides data visualization capabilities to facilitate communication of results. memes is available as a Bioconductor package at https://bioconductor.org/packages/memes, and the source code can be found at github.com/snystrom/memes.

Memes: A motif analysis environment in R using tools from the MEME Suite

PLoS Computational Biology ◽

10.1371/journal.pcbi.1008991 ◽

2021 ◽

Vol 17 (9) ◽

pp. e1008991

Author(s):

Spencer L. Nystrom ◽

Daniel J. McKay

Keyword(s):

Data Access ◽

R Package ◽

Comprehensive Analysis ◽

Multidimensional Data ◽

Biological Sequences ◽

Bioconductor Package ◽

Motif Analysis ◽

Bioconductor Project ◽

Analysis Environment ◽

Selection Of

Identification of biopolymer motifs represents a key step in the analysis of biological sequences. The MEME Suite is a widely used toolkit for comprehensive analysis of biopolymer motifs; however, these tools are poorly integrated within popular analysis frameworks like the R/Bioconductor project, creating barriers to their use. Here we present memes, an R package that provides a seamless R interface to a selection of popular MEME Suite tools. memes provides a novel “data aware” interface to these tools, enabling rapid and complex discriminative motif analysis workflows. In addition to interfacing with popular MEME Suite tools, memes leverages existing R/Bioconductor data structures to store the multidimensional data returned by MEME Suite tools for rapid data access and manipulation. Finally, memes provides data visualization capabilities to facilitate communication of results. memes is available as a Bioconductor package at https://bioconductor.org/packages/memes, and the source code can be found at github.com/snystrom/memes.

NewWave: a scalable R/Bioconductor package for the dimensionality reduction and batch effect removal of single-cell RNA-seq data

10.1101/2021.08.02.453487 ◽

2021 ◽

Author(s):

Federico Agostinis ◽

Chiara Romualdi ◽

Gabriele Sales ◽

Davide Risso

Keyword(s):

Dimensionality Reduction ◽

Single Cell ◽

R Package ◽

Batch Effect ◽

Supplementary Information ◽

Bioconductor Package ◽

Rna Seq ◽

Sequencing Data ◽

Bioconductor Project ◽

Single Cell Rna Sequencing

Summary: We present NewWave, a scalable R/Bioconductor package for the dimensionality reduction and batch effect removal of single-cell RNA sequencing data. To achieve scalability, NewWave uses mini-batch optimization and can work with out-of-memory data, enabling users to analyze datasets with millions of cells. Availability and implementation: NewWave is implemented as an open-source R package available through the Bioconductor project at https://bioconductor.org/packages/NewWave/. Supplementary information: Supplementary data are available at Bioinformatics online.

WORCS: A workflow for open reproducible code in science

Data Science ◽

10.3233/ds-210031 ◽

2021 ◽

pp. 1-21

Author(s):

Caspar J. Van Lissa ◽

Andreas M. Brandmaier ◽

Loek Brinkman ◽

Anna-Lena Lamprecht ◽

Aaron Peikert ◽

...

Keyword(s):

Best Practices ◽

Source Code ◽

R Package ◽

Open Science ◽

Research Projects ◽

Tabular Data ◽

Step Procedure ◽

Starting Point ◽

Conducting Research ◽

And Training

Adopting open science principles can be challenging, requiring conceptual education and training in the use of new tools. This paper introduces the Workflow for Open Reproducible Code in Science (WORCS): a step-by-step procedure that researchers can follow to make a research project open and reproducible. This workflow intends to lower the threshold for adoption of open science principles. It is based on established best practices, and can be used either in parallel to, or in the absence of, top-down requirements by journals, institutions, and funding bodies. To facilitate widespread adoption, the WORCS principles have been implemented in the R package worcs, which offers an RStudio project template and utility functions for specific workflow steps. This paper introduces the conceptual workflow, discusses how it meets different standards for open science, and addresses the functionality provided by the R implementation, worcs. This paper is primarily targeted towards scholars conducting research projects in R that involve academic prose, analysis code, and tabular data. However, the workflow is flexible enough to accommodate other scenarios, and offers a starting point for customized solutions. The source code for the R package and manuscript, and a list of examples of WORCS projects, are available at https://github.com/cjvanlissa/worcs.

tidyMicro: a pipeline for microbiome data analysis and visualization using the tidyverse in R

BMC Bioinformatics ◽

10.1186/s12859-021-03967-2 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Charlie M. Carpenter ◽

Daniel N. Frank ◽

Kayla Williamson ◽

Jaron Arbet ◽

Brandie D. Wagner ◽

...

Keyword(s):

Microbial Communities ◽

Open Source ◽

Data Structures ◽

Negative Binomial ◽

Rocky Mountain ◽

R Package ◽

Microbiome Analysis ◽

External Data ◽

Data Tables ◽

Microbiome Data

Background: The drive to understand how microbial communities interact with their environments has inspired innovations across many fields. The data generated from sequence-based analyses of microbial communities typically are of high dimensionality and can involve multiple data tables consisting of taxonomic or functional gene/pathway counts. Merging multiple high dimensional tables with study-related metadata can be challenging. Existing microbiome pipelines available in R have created their own data structures to manage this problem. However, these data structures may be unfamiliar to analysts new to microbiome data or R and do not allow for deviations from internal workflows. Existing analysis tools also focus primarily on community-level analyses and exploratory visualizations, as opposed to analyses of individual taxa. Results: We developed the R package “tidyMicro” to serve as a more complete microbiome analysis pipeline. This open source software provides all of the essential tools available in other popular packages (e.g., management of sequence count tables, standard exploratory visualizations, and diversity inference tools) supplemented with multiple options for regression modelling (e.g., negative binomial, beta binomial, and/or rank based testing) and novel visualizations to improve interpretability (e.g., Rocky Mountain plots, longitudinal ordination plots). This comprehensive pipeline for microbiome analysis also maintains data structures familiar to R users to improve analysts’ control over workflow. A complete vignette is provided to aid new users in analysis workflow. Conclusions: tidyMicro provides a reliable alternative to popular microbiome analysis packages in R. We provide standard tools as well as novel extensions on standard analyses to improve the interpretability of results while maintaining object malleability to encourage open source collaboration. The simple examples and full workflow from the package are reproducible and applicable to external data sets.

Multidimensional Data Structures

Algorithms and Theory of Computation Handbook - Chapman & Hall/CRC Applied Algorithms and Data Structures series ◽

10.1201/9781420049503-c19 ◽

1998 ◽

Author(s):

Hanan Samet

Keyword(s):

Data Structures ◽

Multidimensional Data ◽

Multidimensional Data Structures

BioInstaller: a comprehensive R package to integrate bioinformatics resources

10.7287/peerj.preprints.27221v1 ◽

2018 ◽

Author(s):

Jianfeng Li ◽

Bowen Cui ◽

Yuting Dai ◽

Ling Bai ◽

Jinyan Huang

Keyword(s):

Source Code ◽

R Package ◽

Community Based ◽

Representational State Transfer ◽

State Transfer ◽

Application Programming ◽

Representational State ◽

Programming Interfaces ◽

Shiny Application ◽

R Functions

The number of bioinformatics resources, such as tools/scripts and databases, is growing exponentially. This poses a great challenge for users who need to access, manage, and integrate these resources. To address this need, we proposed a comprehensive R package, BioInstaller, which includes R functions, a Shiny application, and HTTP representational state transfer (REST) application programming interfaces (APIs). We also established a community-based configuration pool to collect, access, and share bioinformatics resources. The source code of BioInstaller is freely available from our lab website at http://bioinfo.rjh.com.cn/labs/jhuang/tools/bioinstaller or on GitHub at https://github.com/JhuangLab/BioInstaller. A Docker image can also be downloaded from DockerHub (https://hub.docker.com/r/bioinstaller).

TFutils: Data structures for transcription factor bioinformatics

F1000Research ◽

10.12688/f1000research.17976.1 ◽

2019 ◽

Vol 8 ◽

pp. 152

Author(s):

Benjamin J. Stubbs ◽

Shweta Gopaulakrishnan ◽

Kimberly Glass ◽

Nathalie Pochet ◽

Celine Everaert ◽

...

Keyword(s):

Transcription Factor ◽

Transcription Factors ◽

Data Structures ◽

Binding Sites ◽

Genome Wide Association Study ◽

Genome Wide Association ◽

Bioconductor Package ◽

Genome Wide ◽

Study Results ◽

Integrative Analyses

DNA transcription is intrinsically complex. Bioinformatic work with transcription factors (TFs) is complicated by a multiplicity of data resources and annotations. The Bioconductor package TFutils includes data structures and functions to enhance the precision and utility of integrative analyses that have components involving TFs. TFutils provides catalogs of human TFs from three reference sources (CISBP, HOCOMOCO, and GO), a catalog of TF targets derived from MSigDb, and multiple approaches to enumerating TF binding sites. Aspects of integration of TF binding patterns and genome-wide association study results are explored in examples.

Uso de Modelos de Dados Multidimensionais para a ampliação da Transparência Ativa │ Use of multidimensional data models to increase active

Liinc em Revista ◽

10.18617/liinc.v9i2.599 ◽

2013 ◽

Vol 9 (2) ◽

Author(s):

Fernando De Assis Rodrigues ◽

Ricardo César Gonçalves Sant'Ana

Keyword(s):

Information And Communication Technologies ◽

Business Intelligence ◽

Data Access ◽

Communication Technologies ◽

Multidimensional Data ◽

Dimensional Model ◽

Information And Communication ◽

Future Demands ◽

Citizen Monitoring ◽

Government Data

Abstract: Environments for access to government data via Information and Communication Technologies may expand possibilities for citizen monitoring, providing feedback for future demands. The aim of this study is to identify, in the data available via active transparency, the existence of elements that allow the construction of proposals for dimensional models, enabling the anticipation of demands for data access. As its theoretical-methodological framework, the text uses the concepts of Business Intelligence and Citizen Intelligence. As a result, a dimensional model was proposed based on the daily expenses query available on the Transparency Portal of the Brazilian Federal Government. Keywords: Public Transparency, Information and Communication Technologies, Collecting Data, Citizen Intelligence, Data Warehouse.

fullsibQTL: an R package for QTL mapping in biparental populations of outcrossing species

10.1101/2020.12.04.412262 ◽

2020 ◽

Author(s):

Rodrigo Gazaffi ◽

Rodrigo R. Amadeu ◽

Marcelo Mollinari ◽

João R. B. F. Rosa ◽

Cristiane H. Taniguti ◽

...

Keyword(s):

Qtl Mapping ◽

Open Source ◽

Qtl Analysis ◽

Source Code ◽

R Package ◽

Genetic Maps ◽

Linkage Phase ◽

Position Effects ◽

Genetic Features ◽

Outcrossing Species

Abstract: Accurate QTL mapping in outcrossing species requires software programs which consider genetic features of these populations, such as markers with different segregation patterns and different levels of information. Although the mapping procedures available to date allow inferring QTL position and effects, they are mostly not based on multilocus genetic maps. Having a QTL analysis based on such maps is crucial since they allow informative markers to propagate their information to less informative intervals of the map. We developed fullsibQTL, a novel and freely available R package to perform composite interval QTL mapping considering outcrossing populations and markers with different segregation patterns. It allows estimation of QTL position, effects, segregation patterns, and linkage phase with flanking markers. Additionally, several statistical and graphical tools are implemented for straightforward analysis and interpretation. fullsibQTL is an open source R package with C and R source code (GPLv3). It is multiplatform and can be installed from https://github.com/augusto-garcia/fullsibQTL.

karyoploteR: an R/Bioconductor package to plot customizable linear genomes displaying arbitrary data

10.1101/122838 ◽

2017 ◽

Cited By ~ 1

Author(s):

Bernat Gel ◽

Eduard Serra

Keyword(s):

Experimental Data ◽

Source Code ◽

Genomic Data ◽

Data Exploration ◽

Main Function ◽

Bioconductor Package ◽

End User ◽

Creation Process ◽

Whole Genomes ◽

Linear Genomes

Motivation: Data visualization is a crucial tool for data exploration, analysis and interpretation. For the visualization of genomic data, however, there is no tool to create customizable non-circular plots of whole genomes from any species. Results: We have developed karyoploteR, an R/Bioconductor package to create linear chromosomal representations of any genome with genomic annotations and experimental data plotted along them. The plot creation process is inspired by R base graphics, with a main function creating karyoplots with no data and multiple additional functions, including custom functions written by the end-user, adding data and other graphical elements. This approach allows the creation of highly customizable plots from arbitrary data with complete freedom on data positioning and representation. Availability: karyoploteR is released under the Artistic-2.0 License. Source code and documentation are freely available through Bioconductor (http://www.bioconductor.org/packages/karyoploteR).
