Developing and deploying an integrated workshop curriculum teaching computational skills for reproducible research

Inspired by well-established material and pedagogy provided by The Carpentries, we developed a two-day workshop curriculum that teaches introductory R programming for managing, analyzing, plotting and reporting data using packages from the tidyverse, the Unix shell, version control with git, and GitHub. While the official Software Carpentry curriculum is comprehensive, we found that it contains too much content for a two-day workshop. We also felt that the independent nature of the lessons left learners confused about how to integrate the newly acquired programming skills in their own work. Thus, we developed a new curriculum (https://umcarpentries.org/intro-curriculum-r/) that aims to teach novices how to implement reproducible research principles in their own data analysis. The curriculum integrates live coding lessons with individual-level and group-based practice exercises, and also serves as a succinct resource that learners can reference both during and after the workshop. Moreover, it lowers the entry barrier for new instructors as they do not have to develop their own teaching materials or sift through extensive content. We developed this curriculum during a two-day sprint, successfully used it to host a two-day virtual workshop with almost 40 participants, and updated the material based on instructor and learner feedback. We hope that our new curriculum will prove useful to future instructors interested in teaching workshops with similar learning objectives.

hagis, an R Package Resource for Pathotype Analysis of Phytophthora sojae Populations Causing Stem and Root Rot of Soybean

Molecular Plant-Microbe Interactions ◽

10.1094/mpmi-07-19-0180-a ◽

2019 ◽

Vol 32 (12) ◽

pp. 1574-1576 ◽

Cited By ~ 2

Author(s):

Austin G. McCoy ◽

Zachary Noel ◽

Adam H. Sparks ◽

Martin Chilvers

Keyword(s):

Plant Pathology ◽

Data Analysis ◽

Root Rot ◽

R Package ◽

Diversity Indices ◽

Phytophthora Sojae ◽

Reproducible Research ◽

Microsoft Excel ◽

R Programming Language ◽

R Programming

Phytophthora sojae is a significant pathogen of soybean worldwide. Pathotype surveys for Phytophthora sojae are conducted to monitor resistance gene efficacy and determine if new resistance genes are needed. Valuable measurements for pathotype analysis include the distribution of susceptible reactions, pathotype complexity, pathotype frequency, and diversity indices for pathotype distributions. Previously the Habgood-Gilmour Spreadsheet (HaGiS), written in Microsoft Excel, was used for data analysis. However, the growing popularity of the R programming language in plant pathology and desire for reproducible research made HaGiS a prime candidate for conversion into an R package. Here we report on the development and use of an R package, hagis, that can be used to produce all outputs from the HaGiS Excel sheet for P. sojae or other gene-for-gene pathosystem studies.
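
The measurements listed in this abstract can be illustrated with a minimal sketch. It is written in Python rather than R, it is not the hagis API, and the isolate data are invented for illustration. The diversity indices use the common Shannon form and the 1 - sum(p^2) form of the Simpson index, which is an assumption about convention, not a statement of what the package computes.

```python
import math
from collections import Counter

# Hypothetical survey data: isolate name -> resistance genes on which the
# isolate produced a susceptible reaction. All values are made up.
reactions = {
    "isolate_1": {"Rps1a", "Rps7"},
    "isolate_2": {"Rps1a", "Rps7"},
    "isolate_3": {"Rps1a"},
    "isolate_4": set(),
}


def summarize_pathotypes(reactions):
    """Compute basic pathotype statistics from per-isolate susceptible reactions."""
    n = len(reactions)
    if n == 0:
        raise ValueError("at least one isolate is required")

    # Pathotype complexity: number of genes each isolate overcomes.
    complexity = {iso: len(genes) for iso, genes in reactions.items()}

    # Distribution of susceptible reactions: percent of isolates virulent on each gene.
    gene_counts = Counter(g for genes in reactions.values() for g in genes)
    susceptible_pct = {g: 100 * c / n for g, c in gene_counts.items()}

    # Pathotype frequency: how often each distinct gene combination occurs.
    pathotype_counts = Counter(frozenset(genes) for genes in reactions.values())
    proportions = [c / n for c in pathotype_counts.values()]

    return {
        "complexity": complexity,
        "mean_complexity": sum(complexity.values()) / n,
        "susceptible_pct": susceptible_pct,
        "pathotype_counts": pathotype_counts,
        # Simple index: distinct pathotypes per isolate sampled.
        "simple": len(pathotype_counts) / n,
        "shannon": -sum(p * math.log(p) for p in proportions),
        "simpson": 1 - sum(p * p for p in proportions),
    }


summary = summarize_pathotypes(reactions)
print(summary["mean_complexity"], summary["simple"], summary["simpson"])
```

Representing each pathotype as a frozenset makes gene combinations hashable, so identical pathotypes are counted together regardless of the order in which genes were recorded.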

The New Statistics with R

10.1093/oso/9780198798170.001.0001 ◽

2021 ◽

Cited By ~ 1

Author(s):

Andy Hector

Keyword(s):

Linear Model ◽

Evolutionary Biology ◽

Environmental Science ◽

Research Training ◽

Model Analysis ◽

Scientific Data ◽

Information Criteria ◽

Reproducible Research ◽

Data Sets ◽

R Programming

Statistics is a fundamental component of the scientific toolbox, but learning the basics of this area of mathematics is one of the most challenging parts of a research training. This book gives an up-to-date introduction to the classical techniques and modern extensions of linear-model analysis—one of the most useful approaches in the analysis of scientific data in the life and environmental sciences. The book emphasizes an estimation-based approach that takes account of recent criticisms of overuse of probability values and introduces the alternative approach using information criteria. The book is based on the use of the open-source R programming language for statistics and graphics, which is rapidly becoming the lingua franca in many areas of science. This second edition adds new chapters, including one discussing some of the complexities of linear-model analysis and another introducing reproducible research documents using the R Markdown package. Statistics is introduced through worked analyses performed in R using interesting data sets from ecology, evolutionary biology, and environmental science. The data sets and R scripts are available as supporting material.
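
As a rough illustration of the information-criteria approach mentioned in this abstract, the sketch below compares an intercept-only model with a simple linear regression using AIC. It is written in Python rather than R, the data are invented, and the formula n*ln(RSS/n) + 2k is the Gaussian least-squares AIC up to an additive constant that cancels when models fitted to the same data are compared. None of it is taken from the book.

```python
import math


def rss_mean_only(y):
    """Residual sum of squares for the intercept-only model."""
    mean_y = sum(y) / len(y)
    return sum((v - mean_y) ** 2 for v in y)


def rss_simple_regression(x, y):
    """Residual sum of squares for y = intercept + slope * x, fitted by least squares."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))


def aic_gaussian(rss, n, n_coefficients):
    """AIC up to an additive constant; k counts coefficients plus the error variance."""
    k = n_coefficients + 1
    return n * math.log(rss / n) + 2 * k


# Hypothetical data with a clear linear trend.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]

aic_mean = aic_gaussian(rss_mean_only(y), len(y), n_coefficients=1)
aic_slope = aic_gaussian(rss_simple_regression(x, y), len(y), n_coefficients=2)
print(f"intercept only: {aic_mean:.2f}, with slope: {aic_slope:.2f}")
```

The model with the lower AIC is preferred. The penalty term 2k is what keeps an extra parameter from being favoured unless it reduces the residual sum of squares enough to pay for itself.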

Extending TCGA queries to automatically identify analogous genomic data from dbGaP

F1000Research ◽

10.12688/f1000research.9837.1 ◽

2017 ◽

Vol 6 ◽

pp. 319

Author(s):

Erin K. Wagner ◽

Satyajeet Raje ◽

Liz Amos ◽

Jessica Kurata ◽

Abhijit S. Badve ◽

...

Keyword(s):

Genomic Data ◽

The Cancer Genome Atlas ◽

Genomic Research ◽

Reproducible Research ◽

Software Pipeline ◽

Individual Level ◽

Related Data ◽

Cancer Genome Atlas ◽

Existing Data ◽

Genome Atlas

Data sharing is critical to advancing genomic research: it reduces the need to collect new data by enabling the reuse and combination of existing data, and it promotes reproducible research. The Cancer Genome Atlas (TCGA) is a popular resource for individual-level genotype-phenotype cancer-related data. The Database of Genotypes and Phenotypes (dbGaP) contains many datasets similar to those in TCGA. We have created a software pipeline that allows researchers to discover relevant genomic data in dbGaP by matching TCGA metadata. The result is an easy-to-use tool that connects these two data sources.

Pengajaran Semantik pada Mahasiswa IKIP PGRI Pontianak [Teaching Semantics to Students of IKIP PGRI Pontianak]

Jurnal Penelitian dan Pengembangan Sains dan Humaniora ◽

10.23887/jppsh.v4i1.24389 ◽

2020 ◽

Vol 4 (1) ◽

pp. 37

Author(s):

Mai Yuliastri Simarmata

Keyword(s):

Data Analysis ◽

Direct Observation ◽

Learning Objectives ◽

Learning Objective ◽

Learning Activities ◽

Student Research ◽

Direct Communication ◽

Communication Techniques ◽

Analysis Technique ◽

Essay Test

This study aims to describe the process of teaching semantics to students at IKIP PGRI Pontianak. The techniques used in this research are direct observation and direct communication. The instruments used are observation guidelines and interview guidelines. The data were processed and analyzed through reflection activities. Data were collected through an essay test on synonymy and antonymy. The data analysis was carried out in stages: (1) checking the results of observation carefully; (2) determining the suitability of the learning objectives formulated by lecturers in teaching the listening subject related to meaning relations; (3) determining the suitability of the material with the learning objectives; (4) analyzing the suitability of the learning activities conducted by the lecturer; and (5) analyzing the interview data from the lecturer. The results show that, in general, IKIP PGRI Pontianak students can distinguish the meaning relations of synonymy and antonymy.

Progress in the R ecosystem for representing and handling spatial data

Journal of Geographical Systems ◽

10.1007/s10109-020-00336-0 ◽

2020 ◽

Cited By ~ 1

Author(s):

Roger S. Bivand

Keyword(s):

New York ◽

Data Analysis ◽

Open Source ◽

Spatial Data ◽

Spatial Data Analysis ◽

Data Handling ◽

Good Match ◽

R Programming Language ◽

R Packages ◽

R Programming

Twenty years have passed since Bivand and Gebhardt (J Geogr Syst 2(3):307–317, 2000. 10.1007/PL00011460) indicated that there was a good match between the then nascent open-source R programming language and environment and the needs of researchers analysing spatial data. Recalling the development of classes for spatial data presented in book form in Bivand et al. (Applied spatial data analysis with R. Springer, New York, 2008, Applied spatial data analysis with R, 2nd edn. Springer, New York, 2013), it is important to present the progress now occurring in representation of spatial data, and possible consequences for spatial data handling and the statistical analysis of spatial data. Beyond this, it is imperative to discuss the relationships between R-spatial software and the larger open-source geospatial software community on whose work R packages crucially depend.

Integrated open-source software for multiscale electrophysiology

Scientific Data ◽

10.1038/s41597-019-0242-z ◽

2019 ◽

Vol 6 (1) ◽

Cited By ~ 2

Author(s):

Konstantinos Nasiotis ◽

Martin Cousineau ◽

François Tadel ◽

Adrien Peyrache ◽

Richard M. Leahy ◽

...

Keyword(s):

Data Analysis ◽

Open Source ◽

Open Source Software ◽

Analysis Data ◽

Open Science ◽

Dense Array ◽

Unrestricted Access ◽

Research Transparency ◽

Programming Skills ◽

User Friendly

The methods for electrophysiology in neuroscience have evolved tremendously in recent years, with a growing emphasis on dense-array signal recordings. This increase in the complexity and volume of recorded data has not been accompanied by efforts to streamline and facilitate access to processing methods, which are themselves likely to grow in sophistication. Moreover, unsuccessful attempts to reproduce peer-reviewed publications indicate a problem of transparency in science. This growing problem could be tackled by unrestricted access to methods that promote research transparency and data sharing, ensuring the reproducibility of published results. Here, we provide free, extensive, open-source software that offers data-analysis, data-management and multi-modality integration solutions for invasive neurophysiology. Users can perform their entire analysis through a user-friendly environment without the need for programming skills, in a tractable (logged) way. This work contributes to open science, analysis standardization, transparency and reproducibility in invasive neurophysiology.

ColiCoords: A Python package for the analysis of bacterial fluorescence microscopy data

10.1101/608109 ◽

2019 ◽

Author(s):

Jochem H. Smit ◽

Yichen Li ◽

Eliza M. Warszawik ◽

Andreas Herrmann ◽

Thorben Cordes

Keyword(s):

Data Analysis ◽

Fluorescence Microscopy ◽

Single Molecule ◽

Statistical Significance ◽

Reproducible Research ◽

Cellular Processes ◽

Analysis Package ◽

Microscopy Data ◽

The Cost ◽

Single Molecule Localization Microscopy

Single-molecule fluorescence microscopy studies of bacteria provide unique insights into the mechanisms of cellular processes and protein machineries in ways that are unrivalled by any other technique. With the cost of microscopes dropping and the availability of fully automated microscopes, the volume of microscopy data produced has increased tremendously. These developments have moved the bottleneck of throughput from image acquisition and sample preparation to data analysis. Furthermore, requirements for analysis procedures have become more stringent given the requirement of various journals to make data and analysis procedures available. To address this we have developed a new data analysis package for analysis of fluorescence microscopy data of rod-like cells. Our software ColiCoords structures microscopy data at the single-cell level and implements a coordinate system describing each cell. This allows for the transformation of Cartesian coordinates of both cellular images (e.g. from transmission light or fluorescence microscopy) and single-molecule localization microscopy (SMLM) data to cellular coordinates. Using this transformation, many cells can be combined to increase the statistical significance of fluorescence microscopy datasets of any kind. ColiCoords is open source, implemented in the programming language Python, and is extensively documented. This allows for modifications for specific needs or to inspect and publish data analysis procedures. By providing a format that allows for easy sharing of code and associated data, we intend to promote open and reproducible research. The source code and documentation can be found via the project’s GitHub page.
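
The idea of mapping Cartesian positions into a per-cell coordinate system can be sketched as follows. This is a deliberately simplified stand-in that assumes a straight cell axis between two poles. It does not use the ColiCoords API or reproduce its actual cell model, and all function names and coordinates are hypothetical.

```python
import math


def to_cell_coords(point, pole_a, pole_b):
    """Map a Cartesian point to (l, r) relative to a straight cell axis.

    l is the relative position along the axis (0 at pole_a, 1 at pole_b);
    r is the signed perpendicular distance from the axis.
    """
    dx, dy = pole_b[0] - pole_a[0], pole_b[1] - pole_a[1]
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("the two poles must be distinct")
    px, py = point[0] - pole_a[0], point[1] - pole_a[1]
    along = (px * dx + py * dy) / length   # projection onto the axis
    across = (dx * py - dy * px) / length  # 2D cross product gives the signed offset
    return along / length, across


# Two hypothetical cells with different positions and orientations in the image.
cell_1 = {"poles": ((0.0, 0.0), (4.0, 0.0)), "spot": (1.0, 1.0)}
cell_2 = {"poles": ((10.0, 10.0), (10.0, 14.0)), "spot": (9.0, 11.0)}

# After the transformation, localizations from both cells share one reference frame.
pooled = [to_cell_coords(c["spot"], *c["poles"]) for c in (cell_1, cell_2)]
print(pooled)
```

Because each localization is expressed relative to its own cell, measurements from cells that differ in position and orientation can be pooled into one dataset, which is the property the abstract relies on to increase statistical significance.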

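PENERAPAN MODEL PEMBELAJARAN THINK TALK WRITE UNTUK MENINGKATKAN PEMAHAMAN KONSEP MAHASISWA PADA MATAKULIAH BIOLOGI SEL [Applying the Think Talk Write Learning Model to Improve Students' Conceptual Understanding in the Cell Biology Course]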

Bioilmi Jurnal Pendidikan ◽

10.19109/bioilmi.v3i2.1400 ◽

2017 ◽

Vol 3 (2) ◽

pp. 94-99

Author(s):

Nuril Hidayati

Keyword(s):

Data Analysis ◽

Action Research ◽

Learning Experiences ◽

Learning Objectives ◽

Reflection Data ◽

Data Analysis Technique ◽

Analysis Technique ◽

Planning Implementation ◽

Learning Competencies ◽

Learning By Using

Students can achieve learning objectives and learning competencies through learning experiences that are meaningful to them. One cause of poor learning outcomes is misunderstanding of the concepts in the subject matter. Students' conceptual understanding can be improved by using the think, talk, write learning model. This study is classroom action research consisting of two cycles, each comprising planning, implementation, observation, and reflection. The data were analyzed by calculating individual and classical completeness from the scores obtained. The analysis shows that learning outcomes increased by 23.32% from cycle 1 to cycle 2.

Big Data Analysis with R Programming and RHadoop

International Journal of Trend in Scientific Research and Development ◽

10.31142/ijtsrd15705 ◽

2018 ◽

Vol 2 (4) ◽

pp. 2623-2627

Author(s):

U. Prathibha ◽

M. Thillainayaki ◽

A. Jenneth

Keyword(s):

Big Data ◽

Data Analysis ◽

Big Data Analysis ◽

R Programming

Soil Data Analysis and Crop Yield Prediction in Data Mining using R – Programming

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.c8683.019320 ◽

2020 ◽

Vol 9 (3) ◽

pp. 1857-1860

Keyword(s):

Data Mining ◽

Data Analysis ◽

Decision Tree ◽

Crop Yield ◽

Climatic Condition ◽

Research Work ◽

Yield Prediction ◽

Decision Tree Algorithm ◽

Data Set ◽

R Programming

Data mining is a good choice for an emerging research field: soil data analysis. Crop yield prediction is an important issue in selecting a crop. Previously, crop prediction relied on the farmer's experience with a particular type of field and crop, based on factors such as soil type, climatic conditions, season, weather, rainfall, and irrigation facilities. Data mining techniques are a better choice for predicting the crop. Soil analysis plays an important role in agriculture, and soil fertility prediction is one of its most important factors. This research work predicts crop yield using a decision tree algorithm. The aim of this research is to assess the accuracy of yield prediction using a decision tree: the C4.5 algorithm, implemented in R programming, is used to predict crop yield and to find the range of magnesium in the collected soil data set. This prediction will be very useful to farmers in predicting crop yield for cultivation.
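
The core step of a C4.5-style decision tree, choosing the attribute with the highest gain ratio, can be sketched as follows. The sketch is in Python rather than R, the four soil records are invented, and it shows only the split criterion, not the paper's implementation or a complete tree learner.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of categorical values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(rows, attribute, target):
    """Information gain of splitting on `attribute`, normalized by split information."""
    n = len(rows)
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    # Weighted entropy of the target after the split.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy([row[target] for row in rows]) - remainder
    split_info = entropy([row[attribute] for row in rows])
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical soil records; attribute names and values are made up.
rows = [
    {"soil_type": "clay", "rainfall": "high", "crop_yield": "high"},
    {"soil_type": "clay", "rainfall": "low", "crop_yield": "high"},
    {"soil_type": "sandy", "rainfall": "high", "crop_yield": "low"},
    {"soil_type": "sandy", "rainfall": "low", "crop_yield": "low"},
]

ratios = {a: gain_ratio(rows, a, "crop_yield") for a in ("soil_type", "rainfall")}
best_attribute = max(ratios, key=ratios.get)
print(ratios, best_attribute)
```

In this toy data set soil type separates the yield classes completely while rainfall carries no information, so the tree would split on soil type first. Normalizing by split information is what distinguishes the C4.5 criterion from plain information gain, since it discounts attributes that have many distinct values.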
