CiDAEN: An Online Data Science Course

Abstract. Objective: To discuss and illustrate the utility of two open collaborative data science platforms and how they would benefit data science and informatics education. Methods and Materials: The features of two online data science platforms are outlined. Both are useful for new data projects, and both are integrated with common programming languages used for data analysis. One platform focuses more on data exploration, and the other focuses on containerizing, visualization, and sharing code repositories. Results: Both data science platforms are open and free, and both allow for collaboration. Both are capable of visual, descriptive, and predictive analytics. Discussion: Data science education benefits from having affordable, open, and collaborative platforms to conduct a variety of data analyses. Conclusion: Open collaborative data science platforms are particularly useful for teaching data science skills to clinical and nonclinical informatics students. Commercial data science platforms exist but are cost-prohibitive and generally limited to specific programming languages.

Big Data is Too Small: Research Implications of Class Inequality for Online Data Collection

10.31235/osf.io/zm6xy ◽

2018 ◽

Author(s):

Jen Schradie

Keyword(s):

Big Data ◽

Data Science ◽

Digital Data ◽

The Internet ◽

Sociological Research ◽

Marginalized Populations ◽

Online Data ◽

Persistent Problem ◽

Current State ◽

Using Data

With a growing interest in data science and online analytics, researchers are increasingly using data derived from the Internet. Whether for qualitative or quantitative analysis, online data, including “Big Data,” can often exclude marginalized populations, especially those from the poor and working class, as the digital divide remains a persistent problem. This methodological commentary on the current state of digital data and methods disentangles the hype from the reality of digitally produced data for sociological research. In the process, it offers strategies to address the weaknesses of data that is derived from the Internet in order to represent marginalized populations.

Advancing Open and Reproducible Water Data Science by Integrating Data Analytics with an Online Data Repository

10.1002/essoar.10509223.1 ◽

2021 ◽

Author(s):

Jeffery Horsburgh ◽

Scott Black ◽

Anthony Castronova

Keyword(s):

Data Analytics ◽

Data Science ◽

Data Repository ◽

Online Data

Exploring Data with CODAP

Mathematics Teacher ◽

10.5951/mathteacher.112.6.0473 ◽

2019 ◽

Vol 112 (6) ◽

pp. 473-476 ◽

Cited By ~ 2

Author(s):

Gemma F. Mojica ◽

Christina N. Azmy ◽

Hollylynne S. Lee

Keyword(s):

Science Education ◽

Data Science ◽

Statistics Education ◽

The Internet ◽

Online Data ◽

Web Browser ◽

Web Based ◽

Teachers And Students ◽

And Mathematics ◽

Analysis Platform

Concord Consortium's Common Online Data Analysis Platform (CODAP), a free Web-based data tool designed for students in grades 6-12 and higher, is continuously being updated and developed for diverse projects in data science, science education, and mathematics/statistics education (https://codap.concord.org/). Teachers and students can access CODAP without downloading software or registering for accounts. Although some Web-based technology tools provide certain features for free and require users to pay a fee to use additional features, CODAP has no hidden costs. Devices need only be connected to the Internet using an updated Web browser (Chrome is preferred). CODAP is not optimized (yet) for use on such touchscreen devices as tablets or iPads®.

The democratization of data science education

10.7287/peerj.preprints.3195v1 ◽

2017 ◽

Cited By ~ 1

Author(s):

Sean Kross ◽

Roger D Peng ◽

Brian S Caffo ◽

Ira Gooding ◽

Jeffrey T Leek

Keyword(s):

Machine Learning ◽

Science Education ◽

Data Analysis ◽

Data Science ◽

Online Data ◽

The Past ◽

The Us ◽

Science Curricula ◽

The Impact ◽

And Training

Over the last three decades, data has become ubiquitous and cheap. This transition has accelerated over the last five years, and training in statistics, machine learning, and data analysis has struggled to keep up. In April 2014 we launched a program of nine courses, the Johns Hopkins Data Science Specialization, which has now had more than 4 million enrollments over the past three years. Here the program is described and compared to both standard and more recently developed data science curricula. We show that novel pedagogical and administrative decisions introduced in our program are now standard in online data science programs. The impact of the Data Science Specialization on data science education in the US is also discussed. Finally, we conclude with some thoughts about the future of data science education in a data-democratized world.

Facilitating API lookup for novices learning data wrangling using thumbnail graphics

Foundations of Data Science ◽

10.3934/fods.2021032 ◽

2021 ◽

Vol 0 (0) ◽

pp. 0

Author(s):

Lovisa Sundin ◽

Nourhan Sakr ◽

Juho Leinonen ◽

Quintin Cutts

Keyword(s):

Data Science ◽

Application Programming Interface ◽

Online Data ◽

Negative Results ◽

Application Programming ◽

And Performance ◽

Learner Activity ◽

Programming Interface ◽

Learning Data

With the rising demand for data science skills, the ability to wrangle data programmatically becomes a crucial barrier. In this paper, we discuss the centrality of API (application programming interface) lookup to data wrangling, and how an ontology-structured command menu could facilitate it. We design thumbnail graphics as visual alternatives to explaining data wrangling operations and use a survey to validate their quality. We furthermore predict that thumbnail graphics make the menu more navigable, improving lookup efficiency and performance. Our predictions are tested using Slice N Dice, an online data wrangling tutorial platform that collects learner activity. It includes both non-programmatic and programmatic data wrangling exercises. Participants from a multi-institutional sample (n = 200) were randomly assigned the tutorial either with or without thumbnail graphics. Our results show that thumbnail graphics reduce the need for clarifications, thereby assisting API lookup for novices learning data wrangling. We further present some negative results regarding performance gain and follow up with a discussion on why the differences are subtle and how they can be improved. Last but not least, we complement our statistical results with a qualitative study where we receive positive feedback from our participants on the design and helpfulness of the thumbnail graphics.

Model-Based Clustering and Classification for Data Science

10.1017/9781108644181 ◽

2019 ◽

Cited By ~ 17

Author(s):

Charles Bouveyron ◽

Gilles Celeux ◽

T. Brendan Murphy ◽

Adrian E. Raftery

Keyword(s):

Data Science ◽

Model Based Clustering ◽

Model Based ◽

Clustering And Classification

Reliability of MTurk Data From Masters and Workers

Journal of Individual Differences ◽

10.1027/1614-0001/a000300 ◽

2020 ◽

Vol 41 (1) ◽

pp. 30-36

Author(s):

Steven V. Rouse

Keyword(s):

Cognitive Abilities ◽

Online Survey ◽

Quality Data ◽

Standard Samples ◽

Online Data ◽

Approval Ratings ◽

Cognitive Abilities Test ◽

Elite Status ◽

Master Status ◽

Amazon's Mechanical Turk

Abstract. Previous research has supported the use of Amazon’s Mechanical Turk (MTurk) for online data collection in individual differences research. Although MTurk Masters have reached an elite status because of strong approval ratings on previous tasks (and therefore gain higher payment for their work), no research has empirically examined whether researchers actually obtain higher quality data when they require that their MTurk Workers have Master status. In two different online survey studies (one using a personality test and one using a cognitive abilities test), the psychometric reliability of MTurk data was compared between a sample that required a Master qualification type and a sample that placed no status-level qualification requirement. In both studies, the Master samples failed to outperform the standard samples.

Analyzing TEDS using the new graphing features of the online data analysis system (DAS)

PsycEXTRA Dataset ◽

10.1037/e434252005-001 ◽

2005 ◽

Author(s):

Keyword(s):

Data Analysis ◽

Online Data ◽

Analysis System ◽

Data Analysis System
