Packaging data analytical work reproducibly using R (and friends)

Author(s):  
Ben Marwick ◽  
Carl Boettiger ◽  
Lincoln Mullen

Computers are a central tool in the research process, enabling complex and large-scale data analysis. As computer-based research has increased in complexity, so have the challenges of ensuring that this research is reproducible. To address this challenge, we review the concept of the research compendium as a solution for providing a standard and easily recognisable way of organising the digital materials of a research project to enable other researchers to inspect, reproduce, and extend the research. We investigate how the structure and tooling of software packages of the R programming language are being used to produce research compendia in a variety of disciplines. We also describe how software engineering tools and services are being used by researchers to streamline working with research compendia. Using real-world examples, we show how researchers can improve the reproducibility of their work using research compendia based on R packages and related tools.
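A minimal sketch of what an R-package-shaped compendium can look like, scaffolded with the 'usethis' package (the package name "mypaper" and the analysis/ directory layout are illustrative conventions assumed here, not a fixed standard from the paper):

# Sketch: scaffold a research compendium as an R package with 'usethis'.
library(usethis)

create_package("mypaper")            # DESCRIPTION, NAMESPACE, R/ -- the package skeleton
dir.create("mypaper/analysis")       # manuscript and figures (illustrative convention)
dir.create("mypaper/analysis/data")  # raw data kept out of R/ (illustrative)

# Functions used by the analysis live in R/ and can be loaded and checked
# with the usual package tooling, e.g.:
# devtools::load_all("mypaper"); devtools::check("mypaper")

Because the compendium is a valid R package, standard tooling (R CMD check, devtools, continuous-integration services) can verify that the code actually runs, which is what makes this layout useful for reproducibility.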


2017 ◽  
Vol 9 (1) ◽  
pp. 65-78
Author(s):  
Konrad Grzanek

Abstract Dynamic typing in the R programming language can cause quality problems in the large-scale data-science and machine-learning projects for which the language is used. Following our efforts to provide a gradual typing library for Clojure, we present chR, a library that offers run-time type-related checks in R. The solution is not only a dynamic type checker; it also helps to systematize thinking about types in the language, while offering high expressiveness and full adherence to a functional programming style.
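The abstract does not spell out chR's API, so the following is only a generic illustration of the run-time type-check idea in base R; it is not chR's interface:

# Generic run-time type check, NOT the chR API: verify a value against a
# predicate on every call and fail loudly with a readable message.
check_type <- function(value, predicate, expected) {
  if (!predicate(value)) {
    stop(sprintf("type check failed: expected %s, got %s",
                 expected, paste(class(value), collapse = "/")),
         call. = FALSE)
  }
  invisible(value)
}

safe_mean <- function(x) {
  check_type(x, is.numeric, "numeric vector")  # checked at run time
  mean(x)
}

safe_mean(c(1, 2, 3))  # 2
# safe_mean("a")       # error: expected numeric vector, got character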


Author(s):  
Roger S. Bivand

Abstract Twenty years have passed since Bivand and Gebhardt (J Geogr Syst 2(3):307–317, 2000. 10.1007/PL00011460) indicated that there was a good match between the then nascent open-source R programming language and environment and the needs of researchers analysing spatial data. Recalling the development of classes for spatial data presented in book form in Bivand et al. (Applied spatial data analysis with R. Springer, New York, 2008, Applied spatial data analysis with R, 2nd edn. Springer, New York, 2013), it is important to present the progress now occurring in representation of spatial data, and possible consequences for spatial data handling and the statistical analysis of spatial data. Beyond this, it is imperative to discuss the relationships between R-spatial software and the larger open-source geospatial software community on whose work R packages crucially depend.
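As a concrete illustration of the newer simple-features representation discussed here, the 'sf' package (which wraps the open-source GDAL, GEOS and PROJ libraries on which R-spatial packages crucially depend) ships with a small example shapefile:

# Read a shapefile bundled with 'sf' into a simple-features data frame.
library(sf)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
class(nc)              # "sf" "data.frame": geometries plus attributes
st_crs(nc)             # coordinate reference system, handled via PROJ
plot(st_geometry(nc))  # North Carolina county boundaries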


Author(s):  
Glenn-Peter Sætre ◽  
Mark Ravinet

Evolutionary genetics is the study of how genetic variation leads to evolutionary change. With the recent explosion in the availability of whole genome sequence data, vast quantities of genetic data are being generated at an ever-increasing pace with the result that programming has become an essential tool for researchers. Most importantly, a thorough understanding of evolutionary principles is essential for making sense of this genetic data. This up-to-date textbook covers all the major components of modern evolutionary genetics, carefully explaining fundamental processes such as mutation, natural selection, genetic drift, and speciation, together with their consequences. In addition to the text, study questions are provided to motivate the reader to think and reflect on the concepts in each chapter. Practical experience is essential when it comes to developing an understanding of how to use genetic data to analyze and address interesting questions in the life sciences and how to interpret results in meaningful ways. Throughout the book, a series of online, computer-based tutorials serves as an introduction to programming and analysis of evolutionary genetic data centered on the R programming language, which stands out as an ideal all-purpose platform to handle and analyze such data. The book and its online materials take full advantage of the authors’ own experience in working in a post-genomic revolution world, and introduce readers to the plethora of molecular and analytical methods that have only recently become available.
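As a flavour of the kind of exercise such R tutorials typically contain (this sketch is illustrative, not taken from the book's materials), genetic drift at a single biallelic locus can be simulated in a few lines by binomially resampling the allele frequency each generation:

# Wright-Fisher drift: p is resampled each generation from 2N gene copies.
drift <- function(p0 = 0.5, N = 50, generations = 100) {
  p <- numeric(generations)
  p[1] <- p0
  for (t in 2:generations) {
    p[t] <- rbinom(1, size = 2 * N, prob = p[t - 1]) / (2 * N)
  }
  p
}

set.seed(1)
matplot(replicate(10, drift()), type = "l", lty = 1,
        xlab = "Generation", ylab = "Allele frequency p")

Smaller N gives faster random fixation or loss of the allele, which is the core intuition behind drift.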


2014 ◽  
Vol 513-517 ◽  
pp. 1752-1755 ◽  
Author(s):  
Chun Liu ◽  
Kun Tan

For a safety-critical computer, large-scale data such as a database that must be transferred in a short time cannot be voted on directly. This paper proposes a database update algorithm for safety-critical computers based on a status vote, which votes on the database status instead of the database itself. The algorithm solves the problem of voting on too much data in a short time, and compares the database versions of different modules in real time. A Markov model is built to calculate the safety and reliability of the algorithm. The results show that the algorithm meets the update requirements of a safety-critical computer.
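The core idea, sketched in R for illustration (this is not the paper's implementation; the status strings are placeholders): each redundant module publishes a compact status, such as a version tag or checksum of its database, and the voter compares these statuses by majority instead of comparing the full data.

# Majority vote over database *status* values rather than the data itself.
status_vote <- function(statuses) {
  counts <- table(statuses)
  winner <- names(counts)[which.max(counts)]
  list(
    agreed = max(counts) > length(statuses) / 2,  # strict majority reached?
    status = winner,                              # the majority status
    faulty = which(statuses != winner)            # dissenting module indices
  )
}

# Example: module 3 reports a stale database version.
status_vote(c("v2.1-9f3a", "v2.1-9f3a", "v2.0-77c1"))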


Author(s):  
Polina Lemenkova

The main purpose of this article is to present the use of the R programming language in cartographic visualization, demonstrating machine learning methods in geographic education. Current trends in education technologies are largely influenced by the possibilities of distance learning, e-learning and self-learning. In view of this, the main tendencies in modern geographic education include active use of open-source GIS and publicly available free geospatial datasets that can be used by students for cartographic exercises, data visualization and mapping, both at intermediate and advanced levels. This paper contributes to the development of these methods and is fully based on datasets and tools available to every student: the R programming language and free open-source datasets. The case studies demonstrated in this paper show examples of both physical-geographic mapping (geomorphology) and socio-economic geography (regional mapping) that can be used in classes and in self-learning. The objectives of this research include geomorphological modelling of the terrain relief of Italy and regional mapping. The data include the SRTM90 DEM and datasets on the regional borders of Italy embedded in the R packages 'maps' and 'mapdata'. The modelling addresses the characteristics of slope, aspect, hillshade and elevation, visualized using the R packages 'raster' and 'tmap'. Regional mapping of Italy was made using the 'ggmap' package, with 'ggplot2' as a wrapper. The results present five thematic maps (slope, aspect, hillshade, elevation and regions of Italy) created in the R language. Traditionally used in statistical analysis, R is less well known as an excellent tool for geographic education. This paper contributes to the development of methods in geographic education by presenting new machine-learning-based methods of mapping.
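A condensed sketch of the terrain part of this workflow, using the 'raster' package functions that implement these operations (the package names come from the abstract; the input file name is a placeholder):

# Slope, aspect and hillshade from a DEM with the 'raster' package.
library(raster)

dem    <- raster("srtm90_italy.tif")     # placeholder: an SRTM90 DEM tile
slope  <- terrain(dem, opt = "slope")    # slope, in radians
aspect <- terrain(dem, opt = "aspect")   # aspect, in radians
hills  <- hillShade(slope, aspect, angle = 45, direction = 315)

plot(hills, col = grey(0:100 / 100), legend = FALSE, main = "Hillshade")
plot(dem, col = terrain.colors(25), alpha = 0.4, add = TRUE)  # drape elevation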


2021 ◽  
Author(s):  
Daniel Lüdecke ◽  
Indrajeet Patil ◽  
Mattan S. Ben-Shachar ◽  
Brenton M. Wiernik ◽  
Philip Waggoner ◽  
...  

The see package is embedded in the easystats ecosystem, a collection of R packages that operate in synergy to provide a consistent and intuitive syntax when working with statistical models in the R programming language (R Core Team, 2021). Most easystats packages return comprehensive numeric summaries of model parameters and performance. The see package complements these numeric summaries with a host of functions and tools to produce a range of publication-ready visualizations for model parameters, predictions, and performance diagnostics. As a core pillar of easystats, the see package helps users to utilize visualization for more informative, communicable, and well-rounded scientific reporting.
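Typical usage looks like the following sketch: another easystats package computes the numeric summary, and 'see' supplies the plot() method (the model here is an arbitrary example, not from the paper):

# 'parameters' computes the summary; 'see' provides the plot() method.
library(parameters)
library(see)

model <- lm(mpg ~ wt + cyl, data = mtcars)
plot(model_parameters(model))  # publication-ready coefficient plot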


2020 ◽  
Author(s):  
Maxime Meylan ◽  
Etienne Becht ◽  
Catherine Sautès-Fridman ◽  
Aurélien de Reyniès ◽  
Wolf H. Fridman ◽  
...  

Abstract
Summary: We previously reported MCP-counter and mMCP-counter, methods that allow precise estimation of the immune and stromal composition of human and murine samples from bulk transcriptomic data, but they were only distributed as R packages. Here, we report webMCP-counter, a user-friendly web interface to allow all users to use these methods, regardless of their proficiency in the R programming language.
Availability and Implementation: Freely available from http://134.157.229.105:3838/webMCP/. Website developed with the R package shiny. Source code available from GitHub: https://github.com/FPetitprez/webMCP-counter.
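Since the abstract notes the site was built with the R package shiny, a minimal shape for such a wrapper might look like the sketch below (the scoring function is a hypothetical stand-in, not the MCP-counter code):

# Minimal shiny app wrapping an R analysis behind a file upload.
library(shiny)

score_stub <- function(m) {                # hypothetical placeholder analysis
  data.frame(sample = colnames(m), score = colMeans(m))
}

ui <- fluidPage(
  fileInput("expr", "Bulk expression matrix (CSV)"),
  tableOutput("scores")
)

server <- function(input, output) {
  output$scores <- renderTable({
    req(input$expr)                        # wait for an upload
    expr <- read.csv(input$expr$datapath, row.names = 1)
    score_stub(expr)
  })
}

shinyApp(ui, server)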


2018 ◽  
Vol 2 ◽  
pp. e26060
Author(s):  
Pamela Soltis

Digitized natural history data are enabling a broad range of innovative studies of biodiversity. Large-scale data aggregators such as the Global Biodiversity Information Facility (GBIF) and Integrated Digitized Biocollections (iDigBio) provide easy, global access to millions of specimen records contributed by thousands of collections. A developing community of eager users of specimen data – whether locality, image, trait, etc. – is perhaps unaware of the effort and resources required to curate specimens, digitize information, capture images, mobilize records, serve the data, and maintain the infrastructure (human and cyber) to support all of these activities. Tracking of specimen information throughout the research process is needed to provide appropriate attribution to the institutions and staff that have supplied and served the records. Such tracking may also allow for annotation and comment on particular records or collections by the global community. Detailed data tracking is also required for open, reproducible science. Despite growing recognition of the value of and need for thorough data tracking, both technical and sociological challenges continue to impede progress. In this talk, I will present a brief vision of how applying a DOI to each iteration of a data set in a typical research project could provide attribution to the provider, opportunity for comment and annotation of records, and the foundation for reproducible science based on natural history specimen records. Sociological change – such as journal requirements for data deposition of all iterations of a data set – can be accomplished through community meetings and workshops, along with editorial efforts, as was done for DNA sequence data two decades ago.

