A First Course in Statistical Programming with R

2021 ◽  
Author(s):  
W. John Braun ◽  
Duncan J. Murdoch

This third edition of Braun and Murdoch's bestselling textbook now includes discussion of the use and design principles of the tidyverse packages in R, including expanded coverage of ggplot2 and R Markdown. The expanded simulation chapter introduces the Box–Muller and Metropolis–Hastings algorithms. New examples and exercises have been added throughout. This is the only introduction you'll need to start programming in R, the computing standard for analyzing data. This book comes with real R code that teaches the standards of the language. Unlike other introductory books on the R system, this book emphasizes portable programming skills that apply to most computing languages and techniques used to develop more complex projects. Solutions, datasets, and any errata are available from www.statprogr.science. Worked examples from real applications, hundreds of exercises, and downloadable code, datasets, and solutions make a complete package for anyone working in or learning practical data science.
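As a hypothetical illustration of one of the simulation topics mentioned in this blurb (sketched here in Python for illustration, not taken from the book's R code), the Box–Muller transform maps two independent Uniform(0, 1) draws to two independent standard normal draws:

```python
import math
import random

def box_muller(u1, u2):
    """Map two independent Uniform(0,1) draws to two independent N(0,1) draws."""
    r = math.sqrt(-2.0 * math.log(u1))
    theta = 2.0 * math.pi * u2
    return r * math.cos(theta), r * math.sin(theta)

random.seed(1)
samples = []
for _ in range(50000):
    # Use 1 - random() so the first draw lies in (0, 1], avoiding log(0).
    z0, z1 = box_muller(1.0 - random.random(), random.random())
    samples.extend([z0, z1])

mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)
print(round(mean, 2), round(var, 2))  # close to 0 and 1
```

The sample mean and variance of the generated values should be close to 0 and 1, the moments of the standard normal distribution.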

2018 ◽  
Vol 6 (3) ◽  
pp. 669-686 ◽  
Author(s):  
Michael Dietze

Abstract. Environmental seismology is the study of the seismic signals emitted by Earth surface processes. This emerging research field is at the intersection of seismology, geomorphology, hydrology, meteorology, and further Earth science disciplines. It amalgamates a wide variety of methods from across these disciplines and ultimately fuses them in a common analysis environment. This overarching scope of environmental seismology requires coherent yet integrative software that is accepted by many of the involved scientific disciplines. The statistical software R has gained paramount importance in the majority of data science research fields. R has well-justified advantages over other, mostly commercial, software, which makes it the ideal language to base a comprehensive analysis toolbox on. The article introduces the avenues and needs of environmental seismology, and how these are met by the R package eseis. The conceptual structure, example data sets, and available functions are demonstrated. Worked examples illustrate possible applications of the package and provide in-depth descriptions of the flexible use of the functions. The package has a registered DOI, is available under the GPL licence on the Comprehensive R Archive Network (CRAN), and is maintained on GitHub.


2020 ◽  
Vol 23 (5) ◽  
pp. 895-911 ◽  
Author(s):  
Michael Burch ◽  
Elisabeth Melby

Abstract The growing number of students can be a challenge for teaching visualization lectures, supervision, evaluation, and grading. Moreover, designing visualization courses that match the different experiences and skills of the students is a major goal, in order to find a common solvable task for all of them. In particular, the given task is important for following a common project goal and for collaborating in small project groups, but also for further experiencing, learning, or extending programming skills. In this article, we survey our experiences from teaching 116 student project groups in 6 bachelor courses on information visualization with varying topics. Two teaching strategies were tried: 2 courses were held without lectures and assignments but with weekly scrum sessions (denoted TS1), and 4 courses were guided by weekly lectures and assignments (denoted TS2). A total of 687 students took part in these 6 courses. Managing the ever-growing number of students in computer and data science is a big challenge these days; here, the students typically apply a design-based active learning scenario while being supported by weekly lectures, assignments, or scrum sessions. As a major outcome, we identified regular supervision, either by lectures and assignments or by regular scrum sessions, as important, because the students were relatively inexperienced bachelor students with a wide range of programming skills but nearly no visualization background. In this article, we explain the successive stages needed to handle the problems that arise and describe how much supervision was involved in the development of the visualization projects. The project task is described with a minimal number of requirements but can be extended in many directions, with most decisions, such as programming languages, visualization approaches, or interaction techniques, left to the students.
Finally, we discuss the benefits and drawbacks of both teaching strategies.


2021 ◽  
Author(s):  
Oliver Ruebel ◽  
Andrew J. Tritt ◽  
Ryan Ly ◽  
Benjamin K. Dichter ◽  
Satrajit S. Ghosh ◽  
...  

The neurophysiology of cells and tissues is monitored electrophysiologically and optically in diverse experiments and species, ranging from flies to humans. Understanding the brain requires integration of data across this diversity, and thus these data must be findable, accessible, interoperable, and reusable (FAIR). This requires a standard language for data and metadata that can coevolve with neuroscience. We describe design and implementation principles for a language for neurophysiology data. Our software (Neurodata Without Borders, NWB) defines and modularizes the interdependent, yet separable, components of a data language. We demonstrate NWB's impact through unified description of neurophysiology data across diverse modalities and species. NWB exists in an ecosystem that includes data management, analysis, visualization, and archive tools. Thus, the NWB data language enables reproduction, interchange, and reuse of diverse neurophysiology data. More broadly, the design principles of NWB are generally applicable to enhance discovery across biology through data FAIRness.


2014 ◽  
Vol 28 (5) ◽  
pp. 371-378 ◽  
Author(s):  
Brian K. Fitzgerald ◽  
Steve Barkanic ◽  
Isabel Cardenas-Navia ◽  
Karen Elzey ◽  
Debbie Hughes ◽  
...  

Partnerships between higher education and business have long been an important part of the academic landscape, but often they are based on shorter-term transactional objectives rather than on longer-term strategic goals. BHEF's National Higher Education and Workforce Initiative brings together business and academia at the institutional, regional, and national levels to create sustainable new opportunities for undergraduates to learn about emerging fields such as data science and analytics, cybersecurity, energy, risk management, and social and mobile technologies through direct engagement with the companies working in these areas. These partnerships are built on a base of evidence, strategic business engagement, and design principles that aim to align needs with existing and enhanced capacity.


2021 ◽  
Vol 17 (11) ◽  
pp. e1009534
Author(s):  
Luíza Zuvanov ◽  
Ana Letycia Basso Garcia ◽  
Fernando Henrique Correr ◽  
Rodolfo Bizarria ◽  
Ailton Pereira da Costa Filho ◽  
...  

Computational biology has gained traction as an independent scientific discipline in South America in recent years. However, there is still a growing need for bioscientists from different backgrounds and levels of experience to acquire programming skills, which could reduce the time from data to insights and bridge communication between life scientists and computer scientists. Python is a programming language extensively used in bioinformatics and data science, and it is particularly suitable for beginners. Here, we describe the conception, organization, and implementation of the Brazilian Python Workshop for Biological Data. This workshop has been organized by graduate and undergraduate students and supported, mostly in administrative matters, by experienced faculty members since 2017. The workshop was conceived for teaching bioscientists, mainly students in Brazil, how to program in a biological context. The goal of this article is to share our experience with the 2020 edition of the workshop in its virtual format, adopted due to the Coronavirus Disease 2019 (COVID-19) pandemic, and to compare and contrast this year's experience with the previous in-person editions. We describe a hands-on, live-coding workshop model for teaching introductory Python programming. We also highlight the adaptations made from the in-person to the online format in 2020, the participants' assessment of learning progression, and general workshop management. Lastly, we provide a summary and reflections from our personal experiences from the workshops of the last 4 years. Our takeaways include the benefits of learning from learners' feedback (LLF), which allowed us to improve the workshop in real time, in the short term, and likely in the long term.
We conclude that the Brazilian Python Workshop for Biological Data is a highly effective workshop model for teaching a programming language, one that allows bioscientists to go beyond an initial exploration of programming skills for data analysis in the medium to long term.
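As a hypothetical example of the kind of beginner exercise such a workshop might use (illustrative only, not taken from the workshop materials), computing the GC content of a DNA sequence combines basic string handling with a biological question:

```python
def gc_content(sequence):
    """Return the fraction of G and C bases in a DNA sequence."""
    sequence = sequence.upper()
    gc = sum(1 for base in sequence if base in "GC")
    return gc / len(sequence) if sequence else 0.0

print(gc_content("ATGCGC"))  # 4 of 6 bases are G or C -> 0.666...
```

Exercises of this shape let beginners practice loops, conditionals, and functions on data they already understand biologically.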


2021 ◽  
Vol 13 (8) ◽  
pp. 203
Author(s):  
Klaus Kammerer ◽  
Manuel Göster ◽  
Manfred Reichert ◽  
Rüdiger Pryss

A deep understanding of a field of research is valuable for academic researchers. In addition to technical knowledge, this includes knowledge about subareas, open research questions, and social communities (networks) of individuals and organizations within a given field. With bibliometric analyses, researchers can acquire quantitatively valuable knowledge about a research area by using bibliographic information on academic publications provided by bibliographic data providers. Bibliometric analyses include the calculation of bibliometric networks to describe affiliations or similarities of bibliometric entities (e.g., authors) and group them into clusters representing subareas or communities. Calculating and visualizing bibliometric networks is a nontrivial and time-consuming data science task that requires highly skilled individuals. In addition to domain knowledge, researchers must often have statistical knowledge and programming skills, or use software tools with limited functionality and usability. In this paper, we present the ambalytics bibliometric platform, which reduces the complexity of bibliometric network analysis and the visualization of results. It accompanies users through the process of bibliometric analysis and eliminates the need for individuals to have programming skills and statistical knowledge, while preserving advanced functionality, such as algorithm parameterization, for experts. As a proof of concept, and as an example of bibliometric analysis outcomes, we show the calculation of research front networks based on a hybrid similarity approach. Being designed to scale, ambalytics makes use of distributed systems concepts and technologies. It is based on the microservice architecture concept and uses the Kubernetes framework for orchestration. This paper presents the initial building block of a comprehensive bibliometric analysis platform called ambalytics, which aims at high usability as well as scalability.
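As a hypothetical sketch of the kind of bibliometric network described above (the data and function name are illustrative, not part of the ambalytics API), a co-authorship network can be built by counting how often pairs of authors appear on the same publication; the counts become edge weights for clustering or visualization:

```python
from collections import Counter
from itertools import combinations

def coauthorship_edges(papers):
    """Count how often each pair of authors appears on the same paper.

    `papers` is a list of author-name lists; the result maps sorted
    author pairs to co-occurrence counts (the network's edge weights).
    """
    edges = Counter()
    for authors in papers:
        for pair in combinations(sorted(set(authors)), 2):
            edges[pair] += 1
    return edges

# Toy bibliography: each inner list is one paper's author list.
papers = [
    ["Kammerer", "Pryss"],
    ["Kammerer", "Reichert", "Pryss"],
    ["Reichert", "Pryss"],
]
edges = coauthorship_edges(papers)
print(edges[("Kammerer", "Pryss")])  # 2
```

Real platforms add entity disambiguation, similarity measures beyond raw co-occurrence, and clustering on top of such a network, but the underlying graph construction follows this pattern.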


2020 ◽  
Vol 48 (2) ◽  
pp. 399-409
Author(s):  
Baizhen Gao ◽  
Rushant Sabnis ◽  
Tommaso Costantini ◽  
Robert Jinkerson ◽  
Qing Sun

Microbial communities drive diverse processes that impact nearly everything on this planet, from global biogeochemical cycles to human health. Harnessing the power of these microorganisms could provide solutions to many of the challenges that face society. However, naturally occurring microbial communities are not optimized for anthropogenic use. An emerging area of research is focusing on engineering synthetic microbial communities to carry out predefined functions. Microbial community engineers are applying design principles like top-down and bottom-up approaches to create synthetic microbial communities with a myriad of real-life applications in health care, disease prevention, and environmental remediation. Multiple genetic engineering tools and delivery approaches can be used to "knock in" new gene functions into microbial communities. A systematic study of microbial interactions, community assembly principles, and engineering tools is necessary for us to understand microbial communities and to utilize them better. Continued analysis and effort are required to further the current and potential applications of synthetic microbial communities.


Author(s):  
Charles Bouveyron ◽  
Gilles Celeux ◽  
T. Brendan Murphy ◽  
Adrian E. Raftery
