ILSA Data Analysis with R Packages

2021 ◽  
pp. 271-282
Author(s):  
Laura Ringienė ◽  
Julius Žilinskas ◽  
Audronė Jakaitienė
Author(s):  
Roger S. Bivand

Abstract Twenty years have passed since Bivand and Gebhardt (J Geogr Syst 2(3):307–317, 2000. 10.1007/PL00011460) indicated that there was a good match between the then nascent open-source R programming language and environment and the needs of researchers analysing spatial data. Recalling the development of classes for spatial data presented in book form in Bivand et al. (Applied spatial data analysis with R. Springer, New York, 2008, Applied spatial data analysis with R, 2nd edn. Springer, New York, 2013), it is important to present the progress now occurring in representation of spatial data, and possible consequences for spatial data handling and the statistical analysis of spatial data. Beyond this, it is imperative to discuss the relationships between R-spatial software and the larger open-source geospatial software community on whose work R packages crucially depend.


2017 ◽  
Author(s):  
Nabeel Siddiqui

This tutorial explores how scholars can organize 'tidy' data, understand R packages to manipulate data, and conduct basic data analysis.


2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Xinyan Zhang ◽  
Nengjun Yi

Abstract Background Microbiome/metagenomic data have specific characteristics, including varying total sequence reads, over-dispersion, and zero-inflation, which require tailored analytic tools. Many microbiome/metagenomic studies follow a longitudinal design to collect samples, which further complicates the analysis methods needed. A flexible and efficient R package is needed for analyzing processed multilevel or longitudinal microbiome/metagenomic data. Results NBZIMM is a freely available R package that provides functions for setting up and fitting negative binomial mixed models, zero-inflated negative binomial mixed models, and zero-inflated Gaussian mixed models. It also provides functions to summarize the results from fitted models, both numerically and graphically. The main functions are built on top of the commonly used R packages nlme and MASS, allowing us to incorporate the well-developed analytic procedures into the framework for analyzing over-dispersed and zero-inflated count or proportion data with multilevel structures (e.g., longitudinal studies). The statistical methods and their implementations in NBZIMM particularly address the data characteristics and the complex designs in microbiome/metagenomic studies. The package is freely available from the public GitHub repository https://github.com/nyiuab/NBZIMM. Conclusion The NBZIMM package provides useful tools for complex microbiome/metagenomics data analysis.
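
As an illustration of the zero-inflated negative binomial distribution named in the abstract above, the following is a minimal sketch using a common textbook parameterization (mean `mu`, dispersion `theta`, structural-zero probability `pi`). It is not taken from NBZIMM, whose internal parameterization and fitting routines may differ; the function names `nb_pmf` and `zinb_pmf` are hypothetical and it covers only the probability mass function, not mixed-model fitting.

```python
import math


def nb_pmf(y, mu, theta):
    """Negative binomial pmf with mean mu and dispersion theta.

    Under this (assumed) parameterization the variance is mu + mu**2 / theta,
    so smaller theta means stronger over-dispersion.
    """
    log_p = (
        math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
        + theta * math.log(theta / (theta + mu))
        + y * math.log(mu / (theta + mu))
    )
    return math.exp(log_p)


def zinb_pmf(y, mu, theta, pi):
    """Zero-inflated NB pmf: a structural zero with probability pi,
    otherwise a draw from the negative binomial."""
    base = (1.0 - pi) * nb_pmf(y, mu, theta)
    return pi + base if y == 0 else base


# Illustrative values only: compare the zero probability with and without inflation.
print(nb_pmf(0, mu=2.0, theta=1.5), zinb_pmf(0, mu=2.0, theta=1.5, pi=0.3))
```

The extra mass at zero is what lets such a model accommodate the excess zeros typical of count data like sequence reads, while `theta` absorbs the over-dispersion.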


2017 ◽  
Author(s):  
Florian Privé ◽  
Hugues Aschard ◽  
Michael G.B. Blum

Abstract Motivation: Genome-wide datasets produced for association studies have dramatically increased in size over the past few years, with modern datasets commonly including millions of variants measured in tens of thousands of individuals. This increase in data size is a major challenge that severely slows down genomic analyses. Specialized software has been developed for every part of the analysis pipeline to handle large genomic data. However, combining all of these tools into a single data analysis pipeline can be technically difficult. Results: Here we present two R packages, bigstatsr and bigsnpr, that allow management and analysis of large-scale genomic data to be performed within a single comprehensive framework. To address large data size, the packages use memory-mapping to access data matrices stored on disk instead of in RAM. To perform data pre-processing and data analysis, the packages integrate most of the commonly used tools, either through transparent system calls to existing software or through updated or improved implementations of existing methods. In particular, the packages implement a fast derivation of Principal Component Analysis, functions to remove SNPs in Linkage Disequilibrium, and algorithms to learn Polygenic Risk Scores on millions of SNPs. We illustrate applications of the two R packages by analysing a case-control genomic dataset for celiac disease, performing an association study and computing Polygenic Risk Scores. Finally, we demonstrate the scalability of the R packages by analysing a simulated genome-wide dataset including 500,000 individuals and 1 million markers on a single desktop computer. Availability: https://privefl.github.io/bigstatsr/ & https://privefl.github.io/bigsnpr/ Contact: [email protected] & [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
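
The memory-mapping idea mentioned in the abstract above can be sketched in a few lines. This is only an illustration of the general technique using Python's standard library, not the file-backed matrix format that bigstatsr actually uses; the matrix shape, the row-major float64 layout, and the function names `write_matrix` and `column_mean` are all assumptions made for the example.

```python
import mmap
import os
import struct
import tempfile

N_ROWS, N_COLS = 1000, 50       # hypothetical matrix shape
ITEM = struct.calcsize("<d")    # 8 bytes per little-endian float64 entry


def write_matrix(path):
    """Write a row-major float64 matrix to disk; entry (i, j) holds i * N_COLS + j."""
    with open(path, "wb") as f:
        for i in range(N_ROWS):
            row = [float(i * N_COLS + j) for j in range(N_COLS)]
            f.write(struct.pack(f"<{N_COLS}d", *row))


def column_mean(path, j):
    """Mean of column j, read through a read-only memory map.

    Only the bytes that are actually touched get paged in by the operating
    system; the file is never read into a Python object as a whole.
    """
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        total = 0.0
        for i in range(N_ROWS):
            offset = (i * N_COLS + j) * ITEM   # byte offset of entry (i, j)
            (value,) = struct.unpack_from("<d", mm, offset)
            total += value
        return total / N_ROWS


path = os.path.join(tempfile.mkdtemp(), "matrix.bin")
write_matrix(path)
print(column_mean(path, 3))
```

Because entries are located by byte offset, the same access pattern works whether the file is a few kilobytes or far larger than available RAM, which is the property the abstract relies on.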


Author(s):  
Zev Ross ◽  
Hadley Wickham ◽  
David Robinson

The R language has withstood the test of time. Forty years after it was initially developed (in the form of the S language), R is being used by millions of programmers on workflows the inventors of the language could never have imagined. Although base R packages perform well in most settings, workflows can be made more efficient by developing packages with more consistent arguments, inputs and outputs, and by prioritizing constantly improving code over historical code consistency. The universe of R packages known as the tidyverse, including dplyr, tidyr and others, aims to improve workflows and make data analysis as smooth as possible by applying a set of core programming principles in package development.


Author(s):  
Peter A. Henderson

Ecological Methods, by the late T. R. E. Southwood and revised over the years by P. A. Henderson, has developed into a classic reference work for the field biologist. It provides a handbook of ecological methods and analytical techniques pertinent to the study of animals, with an emphasis on non-microscopic animals in both terrestrial and aquatic environments. It remains unique in the breadth of the methods presented and in the depth of the literature cited, stretching right back to the earliest days of ecological research. The universal availability of R as an open-source package has radically changed the way ecologists analyze their data. In response, Southwood’s classic text has been thoroughly revised to be more relevant and useful to a new generation of ecologists, making the vast resource of R packages more readily available to the wider ecological community. By focusing on the use of R for data analysis, supported by worked examples, the book is now more accessible than previous editions to students requiring support and ideas for their projects.


F1000Research ◽  
2013 ◽  
Vol 2 ◽  
pp. 192 ◽  
Author(s):  
Emanuel Gonçalves ◽  
Julio Saez-Rodriguez

There is an increasing number of software packages to analyse biological experimental data in the R environment. In particular, Bioconductor, a repository of curated R packages, is one of the most comprehensive resources for bioinformatics and biostatistics. The use of these packages is increasing, but it requires a basic understanding of the R language, as well as the syntax of the specific package used. The availability of graphical user interfaces for these packages would ease the learning curve and broaden their application. Here, we present a Cytoscape plug-in termed Cyrface that allows Cytoscape plug-ins to connect to any function and package developed in R. Cyrface can be used to run R packages from within the Cytoscape environment through a graphical user interface. Moreover, it links the R packages with the capabilities of Cytoscape and its plug-ins, in particular network visualization and analysis. Cyrface’s utility has been demonstrated for two Bioconductor packages (CellNOptR and DrugVsDisease), and here we further illustrate its usage by implementing a workflow of data analysis and visualization. Download links, installation instructions and user guides can be accessed from the Cyrface homepage (http://www.ebi.ac.uk/saezrodriguez/cyrface/).


PLoS ONE ◽  
2021 ◽  
Vol 16 (5) ◽  
pp. e0244122
Author(s):  
Dario Righelli ◽  
Claudia Angelini

In recent years, “irreproducibility” has become a general problem in omics data analysis due to the use of sophisticated and poorly described computational procedures. To avoid misleading results, it is necessary to inspect and reproduce the entire data analysis as a unified product. Reproducible Research (RR) provides general guidelines for public access to the analytic data and related analysis code combined with natural language documentation, allowing third parties to reproduce the findings. We developed easyreporting, a novel R/Bioconductor package, to facilitate the implementation of an RR layer inside reports/tools. We describe the main functionalities and illustrate the organization of an analysis report using a typical case study concerning the analysis of RNA-seq data. Then, we show how to use easyreporting in other projects to trace R functions automatically. This latter feature helps developers to implement procedures that automatically keep track of the analysis steps. Easyreporting can support the reproducibility of any data analysis project and is particularly advantageous for the implementation of R packages and GUIs. It is especially helpful in bioinformatics, where the complexity of the analyses makes it extremely difficult to trace all the steps and parameters used in the study.

