Developing and deploying an integrated workshop curriculum teaching computational skills for reproducible research

2021 ◽  
Author(s):  
Zena Lapp ◽  
Kelly L Sovacool ◽  
Nicholas A Lesniak ◽  
Dana King ◽  
Catherine Barnier ◽  
...  

Inspired by well-established material and pedagogy provided by The Carpentries, we developed a two-day workshop curriculum that teaches introductory R programming for managing, analyzing, plotting and reporting data using packages from the tidyverse, the Unix shell, version control with git, and GitHub. While the official Software Carpentry curriculum is comprehensive, we found that it contains too much content for a two-day workshop. We also felt that the independent nature of the lessons left learners confused about how to integrate the newly acquired programming skills in their own work. Thus, we developed a new curriculum (https://umcarpentries.org/intro-curriculum-r/) that aims to teach novices how to implement reproducible research principles in their own data analysis. The curriculum integrates live coding lessons with individual-level and group-based practice exercises, and also serves as a succinct resource that learners can reference both during and after the workshop. Moreover, it lowers the entry barrier for new instructors as they do not have to develop their own teaching materials or sift through extensive content. We developed this curriculum during a two-day sprint, successfully used it to host a two-day virtual workshop with almost 40 participants, and updated the material based on instructor and learner feedback. We hope that our new curriculum will prove useful to future instructors interested in teaching workshops with similar learning objectives.

2019 ◽  
Vol 32 (12) ◽  
pp. 1574-1576 ◽  
Author(s):  
Austin G. McCoy ◽  
Zachary Noel ◽  
Adam H. Sparks ◽  
Martin Chilvers

Phytophthora sojae is a significant pathogen of soybean worldwide. Pathotype surveys for Phytophthora sojae are conducted to monitor resistance gene efficacy and determine if new resistance genes are needed. Valuable measurements for pathotype analysis include the distribution of susceptible reactions, pathotype complexity, pathotype frequency, and diversity indices for pathotype distributions. Previously the Habgood-Gilmour Spreadsheet (HaGiS), written in Microsoft Excel, was used for data analysis. However, the growing popularity of the R programming language in plant pathology and desire for reproducible research made HaGiS a prime candidate for conversion into an R package. Here we report on the development and use of an R package, hagis, that can be used to produce all outputs from the HaGiS Excel sheet for P. sojae or other gene-for-gene pathosystem studies.
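
The measurements named in this abstract can be illustrated with a short sketch. It is written in Python for illustration only and is not the hagis R package or its API. The isolate names, gene sets, and function names are hypothetical, and the two diversity indices use common textbook forms (Shannon and Gini-Simpson), which are an assumption and may differ from the definitions used by HaGiS or hagis.

```python
import math
from collections import Counter

# Hypothetical survey data: each isolate maps to the set of resistance genes
# on which it produced a susceptible reaction. Values are illustrative only.
isolates = {
    "iso1": {"Rps1a", "Rps1c", "Rps7"},
    "iso2": {"Rps1a", "Rps7"},
    "iso3": {"Rps1a", "Rps1c", "Rps7"},
    "iso4": {"Rps7"},
}


def susceptible_distribution(isolates):
    """Percentage of isolates giving a susceptible reaction on each gene."""
    counts = Counter(gene for genes in isolates.values() for gene in genes)
    n = len(isolates)
    return {gene: 100.0 * c / n for gene, c in sorted(counts.items())}


def complexities(isolates):
    """Pathotype complexity: number of genes each isolate overcomes."""
    return {name: len(genes) for name, genes in isolates.items()}


def pathotype_frequencies(isolates):
    """Number of isolates sharing each pathotype (set of genes overcome)."""
    return Counter(frozenset(genes) for genes in isolates.values())


def shannon(freqs):
    """Shannon index H = -sum(p * ln p) over pathotype proportions."""
    total = sum(freqs.values())
    return -sum((c / total) * math.log(c / total) for c in freqs.values())


def gini_simpson(freqs):
    """Gini-Simpson index 1 - sum(p^2) over pathotype proportions."""
    total = sum(freqs.values())
    return 1.0 - sum((c / total) ** 2 for c in freqs.values())


if __name__ == "__main__":
    freqs = pathotype_frequencies(isolates)
    print(susceptible_distribution(isolates))
    print(complexities(isolates))
    print(shannon(freqs), gini_simpson(freqs))
```

Representing each pathotype as a frozenset lets isolates with the same set of overcome genes be counted together regardless of the order in which the genes were recorded.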


Author(s):  
Andy Hector

Statistics is a fundamental component of the scientific toolbox, but learning the basics of this area of mathematics is one of the most challenging parts of a research training. This book gives an up-to-date introduction to the classical techniques and modern extensions of linear-model analysis—one of the most useful approaches in the analysis of scientific data in the life and environmental sciences. The book emphasizes an estimation-based approach that takes account of recent criticisms of overuse of probability values and introduces the alternative approach using information criteria. The book is based on the use of the open-source R programming language for statistics and graphics, which is rapidly becoming the lingua franca in many areas of science. This second edition adds new chapters, including one discussing some of the complexities of linear-model analysis and another introducing reproducible research documents using the R Markdown package. Statistics is introduced through worked analyses performed in R using interesting data sets from ecology, evolutionary biology, and environmental science. The data sets and R scripts are available as supporting material.


F1000Research ◽  
2017 ◽  
Vol 6 ◽  
pp. 319
Author(s):  
Erin K. Wagner ◽  
Satyajeet Raje ◽  
Liz Amos ◽  
Jessica Kurata ◽  
Abhijit S. Badve ◽  
...  

Data sharing is critical to advancing genomic research: it reduces the demand to collect new data by reusing and combining existing data, and it promotes reproducible research. The Cancer Genome Atlas (TCGA) is a popular resource for individual-level genotype-phenotype cancer-related data. The Database of Genotypes and Phenotypes (dbGaP) contains many datasets similar to those in TCGA. We have created a software pipeline that will allow researchers to discover relevant genomic data from dbGaP, based on matching TCGA metadata. The resulting research provides an easy-to-use tool to connect these two data sources.


2020 ◽  
Vol 4 (1) ◽  
pp. 37
Author(s):  
Mai Yuliastri Simarmata

This study aims to determine the process of semantic teaching among IKIP PGRI Pontianak students. The techniques used in this research are direct observation and direct communication, and the tools used are observation guidelines and interview guidelines. The data are processed and analyzed through reflection activities. Data were collected through an essay test related to synonymy and antonymy. The data analysis was carried out in stages: (1) checking the results of observation carefully; (2) determining the suitability of the learning objectives formulated by lecturers in teaching the listening subject related to meaning relations; (3) determining the suitability of the material with the learning objectives; (4) analyzing the suitability of the learning activities conducted by the lecturer; and (5) analyzing the lecturer's interview data. The results show that IKIP PGRI Pontianak students can, in general, distinguish meaning relations related to synonymy and antonymy.


Author(s):  
Roger S. Bivand

Twenty years have passed since Bivand and Gebhardt (J Geogr Syst 2(3):307–317, 2000. 10.1007/PL00011460) indicated that there was a good match between the then nascent open-source R programming language and environment and the needs of researchers analysing spatial data. Recalling the development of classes for spatial data presented in book form in Bivand et al. (Applied spatial data analysis with R. Springer, New York, 2008, Applied spatial data analysis with R, 2nd edn. Springer, New York, 2013), it is important to present the progress now occurring in representation of spatial data, and possible consequences for spatial data handling and the statistical analysis of spatial data. Beyond this, it is imperative to discuss the relationships between R-spatial software and the larger open-source geospatial software community on whose work R packages crucially depend.


2019 ◽  
Vol 6 (1) ◽  
Author(s):  
Konstantinos Nasiotis ◽  
Martin Cousineau ◽  
François Tadel ◽  
Adrien Peyrache ◽  
Richard M. Leahy ◽  
...  

The methods for electrophysiology in neuroscience have evolved tremendously over recent years, with a growing emphasis on dense-array signal recordings. This increased complexity and the growing volume of recorded data have not been accompanied by efforts to streamline and facilitate access to processing methods, which are themselves likely to grow in sophistication. Moreover, unsuccessful attempts to reproduce peer-reviewed publications indicate a problem of transparency in science. This growing problem could be tackled by unrestricted access to methods that promote research transparency and data sharing, ensuring the reproducibility of published results. Here, we provide free, extensive, open-source software that offers data-analysis, data-management and multi-modality integration solutions for invasive neurophysiology. Users can perform their entire analysis through a user-friendly environment without the need for programming skills, in a tractable (logged) way. This work contributes to open science, analysis standardization, transparency and reproducibility in invasive neurophysiology.


2019 ◽  
Author(s):  
Jochem H. Smit ◽  
Yichen Li ◽  
Eliza M. Warszawik ◽  
Andreas Herrmann ◽  
Thorben Cordes

Single-molecule fluorescence microscopy studies of bacteria provide unique insights into the mechanisms of cellular processes and protein machineries in ways that are unrivalled by any other technique. With the cost of microscopes dropping and the availability of fully automated microscopes, the volume of microscopy data produced has increased tremendously. These developments have moved the bottleneck of throughput from image acquisition and sample preparation to data analysis. Furthermore, requirements for analysis procedures have become more stringent given the requirement of various journals to make data and analysis procedures available. To address this we have developed a new data analysis package for analysis of fluorescence microscopy data of rod-like cells. Our software ColiCoords structures microscopy data at the single-cell level and implements a coordinate system describing each cell. This allows for the transformation of Cartesian coordinates of both cellular images (e.g. from transmission light or fluorescence microscopy) and single-molecule localization microscopy (SMLM) data to cellular coordinates. Using this transformation, many cells can be combined to increase the statistical significance of fluorescence microscopy datasets of any kind. ColiCoords is open source, implemented in the programming language Python, and is extensively documented. This allows for modifications for specific needs or to inspect and publish data analysis procedures. By providing a format that allows for easy sharing of code and associated data, we intend to promote open and reproducible research. The source code and documentation can be found via the project’s GitHub page.
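
The idea of transforming Cartesian coordinates into cellular coordinates so that many cells can be pooled can be sketched as follows. This is not the ColiCoords API. The function name, the pole points, and the localizations are hypothetical, and the sketch assumes a straight cell axis between two known poles, which ignores the curvature a real rod-like cell may have.

```python
import math


def to_cell_coords(point, pole_a, pole_b):
    """Map a Cartesian (x, y) point to cell coordinates (l, r).

    l is the position along the cell axis, normalized so that pole_a is 0
    and pole_b is 1; r is the perpendicular distance from the axis line.
    Assumes a straight axis (an illustrative simplification).
    """
    ax, ay = pole_a
    bx, by = pole_b
    px, py = point
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("cell poles must be distinct points")
    # Scalar projection of the point onto the axis, as a fraction of length.
    l = ((px - ax) * dx + (py - ay) * dy) / length ** 2
    # Perpendicular distance via the magnitude of the 2D cross product.
    r = abs((px - ax) * dy - (py - ay) * dx) / length
    return l, r


# Two hypothetical cells with different positions, orientations and lengths.
cells = [
    {"poles": ((0.0, 0.0), (4.0, 0.0)), "points": [(1.0, 1.0), (3.0, 0.5)]},
    {"poles": ((10.0, 10.0), (10.0, 12.0)), "points": [(10.5, 11.0)]},
]

# Pool all localizations in the shared, orientation-independent frame.
pooled = [
    to_cell_coords(p, *cell["poles"])
    for cell in cells
    for p in cell["points"]
]
```

Because every point is expressed relative to its own cell's axis, localizations from cells imaged at different positions and orientations end up in one comparable frame, which is what allows them to be combined.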


2017 ◽  
Vol 3 (2) ◽  
pp. 94-99
Author(s):  
Nuril Hidayati

Students can achieve learning objectives and learning competencies through learning experiences that are meaningful to them. One cause of low learning outcomes is misunderstanding of the concept of matter. Students' conceptual understanding can be improved by using the think-talk-write learning model. This study is classroom action research consisting of two cycles, each comprising planning, implementation, observation, and reflection. Data were analyzed by calculating individual and classical completeness from the scores obtained. The analysis shows that learning outcomes increased by 23.32% from cycle 1 to cycle 2.


2018 ◽  
Vol 2 (4) ◽  
pp. 2623-2627
Author(s):  
U. Prathibha ◽  
M. Thillainayaki ◽  
A. Jenneth ◽  

Data mining is a better choice for the emerging research field of soil data analysis. Crop yield prediction is an important issue in selecting a crop. Traditionally, farmers have predicted yield from their own experience with a particular type of field and crop, based on factors such as soil type, climatic conditions, season, weather, rainfall, and irrigation facilities. Data mining techniques are a better choice for this prediction. Soil analysis plays an important role in agriculture, and soil fertility prediction is one of its most important factors. This research work predicts crop yield using a decision tree algorithm. The aim is to assess the accuracy of the prediction, to estimate crop yield using a decision tree with the C4.5 algorithm implemented in the R programming language, and to find the range of magnesium in the collected soil data set. This prediction will be very useful for farmers in estimating crop yield for cultivation.
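
The C4.5 algorithm mentioned in this abstract chooses splits by gain ratio, which is information gain divided by the split information of the attribute. The sketch below shows that criterion for categorical attributes only. It is in Python, not the R code used in the study, and the records, attribute names, and function names are hypothetical. Handling of continuous attributes such as a magnesium level, and pruning, are omitted.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (c / total) * math.log2(c / total) for c in Counter(labels).values()
    )


def gain_ratio(rows, attribute, target):
    """Gain ratio of splitting `rows` on a categorical `attribute`."""
    total = len(rows)
    # Group the target labels by the value of the candidate attribute.
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    # Weighted entropy remaining after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy([row[target] for row in rows]) - remainder
    # Split information penalizes attributes with many distinct values.
    split_info = -sum(
        (len(g) / total) * math.log2(len(g) / total) for g in groups.values()
    )
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy records; not taken from the study's soil data set.
soil = [
    {"soil_type": "clay", "rainfall": "high", "yield": "good"},
    {"soil_type": "clay", "rainfall": "low", "yield": "poor"},
    {"soil_type": "loam", "rainfall": "high", "yield": "good"},
    {"soil_type": "loam", "rainfall": "low", "yield": "poor"},
]

# The attribute with the highest gain ratio becomes the root split.
best = max(["soil_type", "rainfall"], key=lambda a: gain_ratio(soil, a, "yield"))
```

In this toy data set rainfall separates the yield classes perfectly while soil type carries no information, so rainfall is selected as the first split.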

