The democratization of data science education

Author(s):  
Sean Kross ◽  
Roger D Peng ◽  
Brian S Caffo ◽  
Ira Gooding ◽  
Jeffrey T Leek

Over the last three decades, data has become ubiquitous and cheap. This transition has accelerated over the last five years, and training in statistics, machine learning, and data analysis has struggled to keep up. In April 2014 we launched a program of nine courses, the Johns Hopkins Data Science Specialization, which has now had more than 4 million enrollments over the past three years. Here the program is described and compared to both standard and more recently developed data science curricula. We show that novel pedagogical and administrative decisions introduced in our program are now standard in online data science programs. The impact of the Data Science Specialization on data science education in the US is also discussed. Finally, we conclude with some thoughts about the future of data science education in a data-democratized world.


Psychology ◽  
2020 ◽  
Author(s):  
Jeffrey Stanton

The term “data science” refers to an emerging field of research and practice that focuses on obtaining, processing, visualizing, analyzing, preserving, and re-using large collections of information. A related term, “big data,” has been used to refer to one of the important challenges faced by data scientists in many applied environments: the need to analyze large data sources, in certain cases using high-speed, real-time data analysis techniques. Data science encompasses much more than big data, however, as a result of many advancements in cognate fields such as computer science and statistics. Data science has also benefited from the widespread availability of inexpensive computing hardware—a development that has enabled “cloud-based” services for the storage and analysis of large data sets. The techniques and tools of data science have broad applicability in the sciences. Within the field of psychology, data science offers new opportunities for data collection and data analysis that have begun to streamline and augment efforts to investigate the brain and behavior. The tools of data science also enable new areas of research, such as computational neuroscience. As an example of the impact of data science, psychologists frequently use predictive analysis as an investigative tool to probe the relationships between a set of independent variables and one or more dependent variables. While predictive analysis has traditionally been accomplished with techniques such as multiple regression, recent developments in the area of machine learning have put new predictive tools in the hands of psychologists. These machine learning tools relax distributional assumptions and facilitate exploration of non-linear relationships among variables. These tools also enable the analysis of large data sets by opening options for parallel processing. 
In this article, a range of relevant areas from data science is reviewed for applicability to key research problems in psychology including large-scale data collection, exploratory data analysis, confirmatory data analysis, and visualization. This bibliography covers data mining, machine learning, deep learning, natural language processing, Bayesian data analysis, visualization, crowdsourcing, web scraping, open source software, application programming interfaces, and research resources such as journals and textbooks.


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Li Tong ◽  
Po-Yen Wu ◽  
John H. Phan ◽  
Hamid R. Hassazadeh ◽  
...  

To use next-generation sequencing technology such as RNA-seq for medical and health applications, choosing proper analysis methods for biomarker identification remains a critical challenge for most users. The US Food and Drug Administration (FDA) has led the Sequencing Quality Control (SEQC) project to conduct a comprehensive investigation of 278 representative RNA-seq data analysis pipelines consisting of 13 sequence mapping, three quantification, and seven normalization methods. In this article, we focused on the impact of the joint effects of RNA-seq pipelines on gene expression estimation as well as the downstream prediction of disease outcomes. First, we developed and applied three metrics (i.e., accuracy, precision, and reliability) to quantitatively evaluate each pipeline’s performance on gene expression estimation. We then investigated the correlation between the proposed metrics and the downstream prediction performance using two real-world cancer datasets (i.e., the SEQC neuroblastoma dataset and the NIH/NCI TCGA lung adenocarcinoma dataset). We found that RNA-seq pipeline components jointly and significantly impacted the accuracy of gene expression estimation, and that this impact extended to the downstream prediction of these cancer outcomes. Specifically, RNA-seq pipelines that produced more accurate, precise, and reliable gene expression estimation tended to perform better in the prediction of disease outcome. In the end, we provided scenarios as guidelines for users to use these three metrics to select sensible RNA-seq pipelines for improved accuracy, precision, and reliability of gene expression estimation, which leads to improved downstream gene expression-based prediction of disease outcome.


Author(s):  
N. Dolzhenko ◽  
E. Mailyanova ◽  
I. Assilbekova ◽  
Z. Konakbay

Cloudiness and range of visibility are the most significant flight conditions for aircraft. The impact of clouds and visibility on the safety of aircraft flights, especially small aircraft, cannot be overestimated. According to the Interstate Air Committee, Kazakhstan ranks second in the number of aviation disasters. The average age of a third of Kazakhstan's small aircraft is more than 30 years. Over the past few years, 14 air accidents have occurred in the Republic of Kazakhstan, 11 of them with small aircraft. In this work, we investigate long-term data on cloudiness and visibility at the most weather-favorable airfield in Balkhash, for the possibility of safe and economical flights of small aircraft and planning training flights.


Author(s):  
Brian Granger ◽  
Fernando Pérez

Project Jupyter is an open-source project for interactive computing widely used in data science, machine learning, and scientific computing. We argue that even though Jupyter helps users perform complex, technical work, Jupyter itself solves problems that are fundamentally human in nature. Namely, Jupyter helps humans to think and tell stories with code and data. We illustrate this by describing three dimensions of Jupyter: interactive computing, computational narratives, and the idea that Jupyter is more than software. We illustrate the impact of these dimensions on a community of practice in Earth and climate science.


PEDIATRICS ◽  
1996 ◽  
Vol 97 (5) ◽  
pp. 733-735
Author(s):  
Modena Wilson ◽  
Donald M. Berwick ◽  
Carolyn DiGuiseppi

Preventive services compose a large portion of primary care pediatrics, and pediatricians by their nature and training seem extraordinarily disposed toward clinical prevention. Therefore, when the first edition of the Guide to Clinical Preventive Services appeared in 1989 from the US Preventive Services Task Force (USPSTF), the negative reaction of the organized pediatric community was disappointing. The second edition of that guide has just been released, and we three pediatricians, who have worked hard during the past 5 years as members and staff of the second task force, hope for a far more positive reaction from our colleagues this time around.


2020 ◽  
Vol 9 (2) ◽  
pp. 25-36
Author(s):  
Necmi Gürsakal ◽  
Ecem Ozkan ◽  
Fırat Melih Yılmaz ◽  
Deniz Oktay

Interest in data science has been increasing in recent years. Data science, including mathematics, statistics, big data, machine learning, and deep learning, can be considered as the intersection of statistics, mathematics and computer science. Although the debate continues about the core area of data science, the subject is a huge hit. Universities face a high demand for data science and are trying to meet this demand by opening postgraduate and doctoral programs. Since the subject is a new field, there are significant differences between the data science programs offered by universities. Besides, since the subject is close to statistics, data science programs are most often opened in statistics departments, and this also causes differences between the programs. In this article, we summarize the developments in data science education in the world and in Turkey specifically, and discuss what data science education should look like at the graduate level.


JAMIA Open ◽  
2018 ◽  
Vol 1 (2) ◽  
pp. 159-165
Author(s):  
Robert Hoyt ◽  
Victoria Wangia-Anderson

Objective: To discuss and illustrate the utility of two open collaborative data science platforms, and how they would benefit data science and informatics education. Methods and Materials: The features of two online data science platforms are outlined. Both are useful for new data projects and both are integrated with common programming languages used for data analysis. One platform focuses more on data exploration and the other focuses on containerizing, visualization, and sharing code repositories. Results: Both data science platforms are open, free, and allow for collaboration. Both are capable of visual, descriptive, and predictive analytics. Discussion: Data science education benefits by having affordable open and collaborative platforms to conduct a variety of data analyses. Conclusion: Open collaborative data science platforms are particularly useful for teaching data science skills to clinical and nonclinical informatics students. Commercial data science platforms exist but are cost-prohibitive and generally limited to specific programming languages.


2005 ◽  
Vol 08 (04) ◽  
pp. 637-657 ◽  
Author(s):  
Shuh-Chyi Doong ◽  
Sheng-Yung Yang ◽  
Thomas C. Chiang

This paper examines autocorrelation and cross-autocorrelation patterns for selected Asian stock returns. Special attention is given to examination of Asian stock returns and the impact on them of the past information. By employing a class of asymmetric specification of conditional mean and conditional variance models, we find the autocorrelation coefficient to be negative for the Japanese market and positive for the rest of the Asian markets studied. Our findings suggest that the Asian markets respond sensitively to the US market, especially on the down side. The asymmetric effects are found to be present in both mean and variance equations. The evidence is consistent with behavior in which investors in Asian markets tend to react more significantly to negative stock news originating from US sources than they do to positive news.


Author(s):  
Fidel Antonio Cárdenas Salgado

Despite the enhanced importance given by science educators and researchers to evaluation in the sciences in the past, the search for better and more reliable assessment processes is nowadays a research priority. Although the areas of science to be assessed have been generally accepted as including knowledge of concepts and facts, process skills, science thinking, problem-solving skills, abilities needed to manipulate laboratory equipment and the disposition of students to apply scientific knowledge, the impact of assessment on students, teachers and parents has not been overlooked. However, advances in cognitive psychology, science education and research on assessment in science are calling for new dimensions, such as complex reasoning, to be researched in the field. Complex reasoning is a competence included as a desirable outcome of science education in many science curricula and is characterized by the following attributes: problem-solving, decision-making and critical and creative thinking. This paper presents a theoretical framework for assessment in science involving the development of competencies through science education and emphasizes complex reasoning. A working model to categorize assessment tasks in complex reasoning is described and some of the main questions to be researched in the line are stated. The need for interdisciplinary work as well as close interaction with other lines of investigation has also been put forward. It is the main objective of this line of research to explore and propose more systematic assessment processes in science education to trace the evolution of students' understandings and achievements, mainly in secondary schools.

