Memes: an R interface to the MEME Suite

2021 ◽  
Author(s):  
Spencer L. Nystrom ◽  
Daniel J. McKay

Abstract: Identification of biopolymer motifs represents a key step in the analysis of biological sequences. The MEME Suite is a widely used toolkit for comprehensive analysis of biopolymer motifs; however, these tools are poorly integrated within popular analysis frameworks like the R/Bioconductor project, creating barriers to their use. Here we present memes, an R package which provides a seamless R interface to the MEME Suite. memes provides a novel “data aware” interface to these tools, enabling rapid and complex discriminative motif analysis workflows. In addition to interfacing with popular MEME Suite tools, memes leverages existing R/Bioconductor data structures to store the complex, multidimensional data returned by MEME Suite tools for rapid data access and manipulation. Finally, memes provides data visualization capabilities to facilitate communication of results. memes is available as a Bioconductor package at https://bioconductor.org/packages/memes, and the source code can be found at github.com/snystrom/memes.

2021 ◽  
Vol 17 (9) ◽  
pp. e1008991
Author(s):  
Spencer L. Nystrom ◽  
Daniel J. McKay

Identification of biopolymer motifs represents a key step in the analysis of biological sequences. The MEME Suite is a widely used toolkit for comprehensive analysis of biopolymer motifs; however, these tools are poorly integrated within popular analysis frameworks like the R/Bioconductor project, creating barriers to their use. Here we present memes, an R package that provides a seamless R interface to a selection of popular MEME Suite tools. memes provides a novel “data aware” interface to these tools, enabling rapid and complex discriminative motif analysis workflows. In addition to interfacing with popular MEME Suite tools, memes leverages existing R/Bioconductor data structures to store the multidimensional data returned by MEME Suite tools for rapid data access and manipulation. Finally, memes provides data visualization capabilities to facilitate communication of results. memes is available as a Bioconductor package at https://bioconductor.org/packages/memes, and the source code can be found at github.com/snystrom/memes.


2021 ◽  
Author(s):  
Federico Agostinis ◽  
Chiara Romualdi ◽  
Gabriele Sales ◽  
Davide Risso

Summary: We present NewWave, a scalable R/Bioconductor package for the dimensionality reduction and batch effect removal of single-cell RNA sequencing data. To achieve scalability, NewWave uses mini-batch optimization and can work with out-of-memory data, enabling users to analyze datasets with millions of cells. Availability and implementation: NewWave is implemented as an open-source R package available through the Bioconductor project at https://bioconductor.org/packages/NewWave/. Supplementary information: Supplementary data are available at Bioinformatics online.


Data Science ◽  
2021 ◽  
pp. 1-21
Author(s):  
Caspar J. Van Lissa ◽  
Andreas M. Brandmaier ◽  
Loek Brinkman ◽  
Anna-Lena Lamprecht ◽  
Aaron Peikert ◽  
...  

Adopting open science principles can be challenging, requiring conceptual education and training in the use of new tools. This paper introduces the Workflow for Open Reproducible Code in Science (WORCS): a step-by-step procedure that researchers can follow to make a research project open and reproducible. The workflow is intended to lower the threshold for adoption of open science principles. It is based on established best practices, and can be used either in parallel to, or in the absence of, top-down requirements by journals, institutions, and funding bodies. To facilitate widespread adoption, the WORCS principles have been implemented in the R package worcs, which offers an RStudio project template and utility functions for specific workflow steps. This paper introduces the conceptual workflow, discusses how it meets different standards for open science, and addresses the functionality provided by the R implementation, worcs. The paper is primarily targeted at scholars conducting research projects in R that involve academic prose, analysis code, and tabular data. However, the workflow is flexible enough to accommodate other scenarios, and offers a starting point for customized solutions. The source code for the R package and manuscript, and a list of examples of WORCS projects, are available at https://github.com/cjvanlissa/worcs.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Charlie M. Carpenter ◽  
Daniel N. Frank ◽  
Kayla Williamson ◽  
Jaron Arbet ◽  
Brandie D. Wagner ◽  
...  

Abstract. Background: The drive to understand how microbial communities interact with their environments has inspired innovations across many fields. The data generated from sequence-based analyses of microbial communities are typically of high dimensionality and can involve multiple data tables consisting of taxonomic or functional gene/pathway counts. Merging multiple high-dimensional tables with study-related metadata can be challenging. Existing microbiome pipelines available in R have created their own data structures to manage this problem. However, these data structures may be unfamiliar to analysts new to microbiome data or R and do not allow for deviations from internal workflows. Existing analysis tools also focus primarily on community-level analyses and exploratory visualizations, as opposed to analyses of individual taxa. Results: We developed the R package “tidyMicro” to serve as a more complete microbiome analysis pipeline. This open source software provides all of the essential tools available in other popular packages (e.g., management of sequence count tables, standard exploratory visualizations, and diversity inference tools) supplemented with multiple options for regression modelling (e.g., negative binomial, beta binomial, and/or rank-based testing) and novel visualizations to improve interpretability (e.g., Rocky Mountain plots, longitudinal ordination plots). This comprehensive pipeline for microbiome analysis also maintains data structures familiar to R users to improve analysts’ control over workflow. A complete vignette is provided to aid new users in analysis workflow. Conclusions: tidyMicro provides a reliable alternative to popular microbiome analysis packages in R. We provide standard tools as well as novel extensions on standard analyses to improve the interpretability of results while maintaining object malleability to encourage open source collaboration. The simple examples and full workflow from the package are reproducible and applicable to external data sets.


2018 ◽  
Author(s):  
Jianfeng Li ◽  
Bowen Cui ◽  
Yuting Dai ◽  
Ling Bai ◽  
Jinyan Huang

The number of bioinformatics resources, such as tools, scripts, and databases, is growing exponentially. This makes it challenging for users to access, manage, and integrate these resources. To address this need, we propose a comprehensive R package, BioInstaller, which includes R functions, a Shiny application, and HTTP representational state transfer (REST) application programming interfaces (APIs). We also established a community-based configuration pool to collect, access, and share bioinformatics resources. The source code of BioInstaller is freely available at our lab website, http://bioinfo.rjh.com.cn/labs/jhuang/tools/bioinstaller, or on GitHub at https://github.com/JhuangLab/BioInstaller. A Docker image can also be downloaded from DockerHub (https://hub.docker.com/r/bioinstaller).


F1000Research ◽  
2019 ◽  
Vol 8 ◽  
pp. 152
Author(s):  
Benjamin J. Stubbs ◽  
Shweta Gopaulakrishnan ◽  
Kimberly Glass ◽  
Nathalie Pochet ◽  
Celine Everaert ◽  
...  

DNA transcription is intrinsically complex. Bioinformatic work with transcription factors (TFs) is complicated by a multiplicity of data resources and annotations. The Bioconductor package TFutils includes data structures and functions to enhance the precision and utility of integrative analyses that have components involving TFs. TFutils provides catalogs of human TFs from three reference sources (CISBP, HOCOMOCO, and GO), a catalog of TF targets derived from MSigDb, and multiple approaches to enumerating TF binding sites. Aspects of integration of TF binding patterns and genome-wide association study results are explored in examples.


2013 ◽  
Vol 9 (2) ◽  
Author(s):  
Fernando De Assis Rodrigues ◽  
Ricardo Ceśar Gonçalves Sant'Ana

Abstract: Environments for access to government data via Information and Communication Technologies may expand the possibilities for citizen monitoring, providing feedback for future demands. The aim of this study is to identify, in the data available via active transparency, elements that allow the construction of proposals for dimensional models, enabling the anticipation of demands for data access. As its theoretical-methodological framework, the text uses the concepts of Business Intelligence and Citizen Intelligence. As a result, a dimensional model was proposed based on the daily expenses query available in the Transparency Portal of the Brazilian Federal Government. Keywords: Public Transparency, Information and Communication Technologies, Data Collection, Citizen Intelligence, Data Warehouse.


2020 ◽  
Author(s):  
Rodrigo Gazaffi ◽  
Rodrigo R. Amadeu ◽  
Marcelo Mollinari ◽  
João R. B. F. Rosa ◽  
Cristiane H. Taniguti ◽  
...  

Abstract: Accurate QTL mapping in outcrossing species requires software that considers the genetic features of these populations, such as markers with different segregation patterns and different levels of information. Although the mapping procedures available to date allow inference of QTL position and effects, most are not based on multilocus genetic maps. Basing QTL analysis on such maps is crucial, since they allow informative markers to propagate their information to less informative intervals of the map. We developed fullsibQTL, a novel and freely available R package that performs composite interval QTL mapping for outcrossing populations and markers with different segregation patterns. It estimates QTL position, effects, segregation patterns, and linkage phase with flanking markers. Additionally, several statistical and graphical tools are implemented for straightforward analysis and interpretation. fullsibQTL is an open-source R package with C and R source code (GPLv3). It is multiplatform and can be installed from https://github.com/augusto-garcia/fullsibQTL.


2017 ◽  
Author(s):  
Bernat Gel ◽  
Eduard Serra

Abstract. Motivation: Data visualization is a crucial tool for data exploration, analysis and interpretation. For the visualization of genomic data, a tool is lacking to create customizable non-circular plots of whole genomes from any species. Results: We have developed karyoploteR, an R/Bioconductor package to create linear chromosomal representations of any genome with genomic annotations and experimental data plotted along them. The plot creation process is inspired by R base graphics, with a main function creating karyoplots with no data and multiple additional functions, including custom functions written by the end-user, adding data and other graphical elements. This approach allows the creation of highly customizable plots from arbitrary data with complete freedom on data positioning and representation. Availability: karyoploteR is released under the Artistic-2.0 License. Source code and documentation are freely available through Bioconductor (http://www.bioconductor.org/packages/karyoploteR).

