Declutter your R workflow with tidy tools

Author(s):  
Zev Ross ◽  
Hadley Wickham ◽  
David Robinson

The R language has withstood the test of time. Forty years after it was initially developed (in the form of the S language), R is used by millions of programmers on workflows the inventors of the language could never have imagined. Although base R packages perform well in most settings, workflows can be made more efficient by developing packages with more consistent arguments, inputs, and outputs, and by prioritizing continual code improvement over historical consistency. The universe of R packages known as the tidyverse, including dplyr, tidyr, and others, aims to improve workflows and make data analysis as smooth as possible by applying a set of core programming principles in package development.


F1000Research ◽  
2013 ◽  
Vol 2 ◽  
pp. 192 ◽  
Author(s):  
Emanuel Gonçalves ◽  
Julio Saez-Rodriguez

There is an increasing number of software packages to analyse biological experimental data in the R environment. In particular, Bioconductor, a repository of curated R packages, is one of the most comprehensive resources for bioinformatics and biostatistics. The use of these packages is increasing, but it requires a basic understanding of the R language, as well as the syntax of the specific package used. The availability of graphical user interfaces for these packages would shorten the learning curve and broaden their application. Here, we present a Cytoscape plug-in termed Cyrface that allows Cytoscape plug-ins to connect to any function and package developed in R. Cyrface can be used to run R packages from within the Cytoscape environment through a graphical user interface. Moreover, it links the R packages with the capabilities of Cytoscape and its plug-ins, in particular network visualization and analysis. Cyrface’s utility has been demonstrated for two Bioconductor packages (CellNOptR and DrugVsDisease), and here we further illustrate its usage by implementing a workflow of data analysis and visualization. Download links, installation instructions and user guides can be accessed from the Cyrface homepage (http://www.ebi.ac.uk/saezrodriguez/cyrface/).


2019 ◽  
Vol 4 (1) ◽  
Author(s):  
Chang Chen ◽  
Shixue Sun ◽  
Zhixin Cao ◽  
Yan Shi ◽  
Baoqing Sun ◽  
...  

Abstract Sample entropy is a powerful tool for analyzing the complexity and irregularity of physiological signals, which may be associated with human health. Nevertheless, the sophistication of its calculation hinders its universal application. As of today, the R language provides multiple open-source packages for calculating sample entropy, all of which, however, are designed for different scenarios. Therefore, when searching for a proper package, investigators may be confused about parameter settings and the selection of algorithms. To ease their selection, we have explored the functions of five existing R packages for calculating sample entropy and have compared their computing capability in several dimensions. We used four published datasets on respiratory and heart rate to study their input parameters, types of entropy, and program running time. In summary, NonlinearTseries and CGManalyzer can provide the analysis of sample entropy with different embedding dimensions and similarity thresholds. CGManalyzer is a good choice for calculating multiscale sample entropy of physiological signals because it not only shows sample entropy of all scales simultaneously but also provides various visualization plots. MSMVSampEn is the only package that can calculate multivariate multiscale entropies. In terms of computing time, NonlinearTseries, CGManalyzer, and MSMVSampEn run significantly faster than the other two packages. Moreover, we identify issues in the MSMVSampEn package. This article provides guidelines for researchers to find a suitable R package for their analysis and applications using sample entropy.


Author(s):  
Roger S. Bivand

Abstract Twenty years have passed since Bivand and Gebhardt (J Geogr Syst 2(3):307–317, 2000. 10.1007/PL00011460) indicated that there was a good match between the then nascent open-source R programming language and environment and the needs of researchers analysing spatial data. Recalling the development of classes for spatial data presented in book form in Bivand et al. (Applied spatial data analysis with R. Springer, New York, 2008, Applied spatial data analysis with R, 2nd edn. Springer, New York, 2013), it is important to present the progress now occurring in representation of spatial data, and possible consequences for spatial data handling and the statistical analysis of spatial data. Beyond this, it is imperative to discuss the relationships between R-spatial software and the larger open-source geospatial software community on whose work R packages crucially depend.


2017 ◽  
Vol 3 ◽  
pp. e129 ◽  
Author(s):  
Bruno Contrino ◽  
Eric Miele ◽  
Ronald Tomlinson ◽  
M. Paola Castaldi ◽  
Piero Ricchiuto

Background: Mass Spectrometry (MS) based chemoproteomics has recently become a main tool to identify and quantify cellular target protein interactions with ligands/drugs in drug discovery. The complexity associated with these new types of data requires scientists with a limited computational background to perform systematic data quality controls as well as to visualize the results derived from the analysis to enable rapid decision making. To date, there are no readily accessible platforms specifically designed for chemoproteomics data analysis. Results: We developed a Shiny-based web application named DOSCHEDA (Down Stream Chemoproteomics Data Analysis) to assess the quality of chemoproteomics experiments, to filter peptide intensities based on linear correlations between replicates, and to perform statistical analysis based on the experimental design. In order to increase its accessibility, DOSCHEDA is designed to be used with minimal user input and it does not require programming knowledge. Typical inputs can be protein fold changes or peptide intensities obtained from Proteome Discoverer, MaxQuant or other similar software. DOSCHEDA aggregates results from bioinformatics analyses performed on the input dataset into a dynamic interface; it encompasses interactive graphics and enables customized output reports. Conclusions: DOSCHEDA is implemented entirely in the R language. It can be launched on any system with R installed, including Windows, Mac OS and Linux distributions. DOSCHEDA is hosted on a shiny-server at https://doscheda.shinyapps.io/doscheda and is also available as a Bioconductor package (http://www.bioconductor.org/).


2017 ◽  
Author(s):  
Nabeel Siddiqui

This tutorial explores how scholars can organize 'tidy' data, understand R packages to manipulate data, and conduct basic data analysis.


2019 ◽  
Vol 491 (4) ◽  
pp. 4869-4883 ◽  
Author(s):  
Konstantinos Tanidis ◽  
Stefano Camera ◽  
David Parkinson

ABSTRACT Continuing our effort to develop a unified pipeline for large-scale structure data analysis with angular power spectra, we now include the weak lensing effect of magnification bias on galaxy clustering in a publicly available, modular parameter estimation code. We thus forecast constraints on the parameters of the concordance cosmological model, dark energy, and modified gravity theories from galaxy clustering tomographic angular power spectra. We find that a correct modelling of magnification is crucial not to bias the parameter estimation, especially in the case of deep galaxy surveys. Our case study adopts specifications of the Evolutionary Map of the Universe, which is a full-sky, deep radio-continuum survey, expected to probe the Universe up to redshift z ∼ 6. We assume the Limber approximation, and include magnification bias on top of density fluctuations and redshift-space distortions. By restricting our analysis to the regime where the Limber approximation holds true, we significantly reduce the computational time needed, compared to that of the exact calculation. We also show that there is a trend for more biased parameter estimates from neglecting magnification when the redshift bins are very wide. We conclude that this result implies a strong dependence on the lensing contribution, which is an integrated effect and becomes dominant when wide redshift bins are considered. Finally, we note that instead of being considered a contaminant, magnification bias encodes important cosmological information, and its inclusion leads to an alleviation of the degeneracy between the galaxy bias and the amplitude normalization of the matter fluctuations.

