Linking Educational Resources on Data Science

Author(s):  
José Luis Ambite ◽  
Jonathan Gordon ◽  
Lily Fierro ◽  
Gully Burns ◽  
Joel Mathew

The availability of massive datasets in genetics, neuroimaging, mobile health, and other subfields of biology and medicine promises new insights but also poses significant challenges. To realize the potential of big data in biomedicine, the National Institutes of Health launched the Big Data to Knowledge (BD2K) initiative, funding several centers of excellence in biomedical data analysis and a Training Coordinating Center (TCC) tasked with facilitating online and in-person training of biomedical researchers in data science. A major initiative of the BD2K TCC is to automatically identify, describe, and organize data science training resources available on the Web and provide personalized training paths for users. In this paper, we describe the construction of ERuDIte, the Educational Resource Discovery Index for Data Science, and its release as linked data. ERuDIte contains over 11,000 training resources including courses, video tutorials, conference talks, and other materials. The metadata for these resources is described uniformly using Schema.org. We use machine learning techniques to tag each resource with concepts from the Data Science Education Ontology, which we developed to further describe resource content. Finally, we map references to people and organizations in learning resources to entities in DBpedia, DBLP, and ORCID, embedding our collection in the web of linked data. We hope that ERuDIte will provide a framework to foster open linked educational resources on the Web.
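A minimal sketch of how a single training resource of this kind might be described with Schema.org vocabulary in JSON-LD; every name, URL, ontology term, and identifier below is an invented placeholder, not data from ERuDIte itself:

```python
import json

# Hypothetical record for one training resource, using Schema.org terms.
# All values are placeholders invented for illustration.
resource = {
    "@context": "https://schema.org",
    "@type": "LearningResource",
    "name": "Introduction to Machine Learning",
    "url": "https://example.org/courses/intro-ml",        # placeholder URL
    "learningResourceType": "course",
    "about": [
        # Concept tags drawn from a domain ontology (terms invented here).
        {"@type": "DefinedTerm", "name": "Supervised Learning"},
        {"@type": "DefinedTerm", "name": "Model Evaluation"},
    ],
    "author": {
        "@type": "Person",
        "name": "Jane Doe",                               # placeholder person
        # Linking a person to external identity hubs is what embeds the
        # record in the web of linked data (placeholder identifier).
        "sameAs": ["https://orcid.org/0000-0000-0000-0000"],
    },
}

print(json.dumps(resource, indent=2))
```

Because the record uses shared vocabulary and external identifiers, any linked-data consumer can merge it with descriptions of the same people and concepts published elsewhere.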

Author(s):  
Antonio Garrote ◽  
María N. Moreno García

In this chapter we describe a news trends detection system built with the aim of detecting daily trends in a big collection of news articles extracted from the web and exposing the computed trend data as open linked data that can be consumed by other components of the IT infrastructure. Due to the sheer amount of data being processed, the system relies on big data technologies to process raw news data and compute the trends that are later exposed as open linked data. Thanks to the open linked data interface, data can be easily consumed by other components of the application, like a JavaScript front-end, or re-used by different IT systems. The case is a good example of how open linked data can be used to provide a convenient interface to big data systems.
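One way such a computed trend could be serialized as linked data for consumption by a front-end; the vocabulary, URIs, and values below are invented for illustration and are not the chapter's actual schema:

```python
import json
from datetime import date

def trend_as_jsonld(term, score, day):
    """Serialize a detected daily trend as a JSON-LD document
    (placeholder vocabulary and URIs)."""
    return {
        "@context": {
            "schema": "https://schema.org/",
            "ex": "https://example.org/trends/vocab#",   # invented vocabulary
        },
        "@id": f"https://example.org/trends/{day.isoformat()}/{term}",
        "@type": "ex:NewsTrend",
        "schema:name": term,
        "ex:trendScore": score,
        "schema:datePublished": day.isoformat(),
    }

doc = trend_as_jsonld("elections", 0.87, date(2015, 3, 1))
print(json.dumps(doc, indent=2))
```

A JavaScript front-end can fetch such documents over HTTP and render them without any knowledge of the big data pipeline that produced them, which is the decoupling the abstract describes.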


Author(s):  
Dimitrios A. Koutsomitropoulos ◽  
Georgia D. Solomou ◽  
Aikaterini K. Kalou

2021 ◽  
pp. 1-11
Author(s):  
Helen MacGillivray

There has been increasing interest in recent years in training in official statistics with reference to the 2030 Agenda, big data, diversification of data types and sources, and data science. Backgrounds for work in official statistics are becoming more varied than ever. The official statistics community has also become progressively more aware of the importance of statistical literacy in education and trust in official statistics. Hence foundation and introductory levels are of as much interest to official statistics as more specialised training. At the same time, greater access to data and vast technological capabilities have seen much emphasis and discussion of the statistical and data sciences and education therein, including development of educational resources in contexts such as civic data and statistics. Data science provides opportunities to renew the decades-long push for authentic learning that reflects the practice of ‘greater statistics’ and ‘greater data science’, and to examine progress to date in implementing and sustaining the extensive work and advocacy of many. This article discusses what is needed at the foundation and introductory levels to realize this advocacy, with commentary relevant to official statistics.


Author(s):  
Sebastian Hellmann ◽  
Jens Lehmann ◽  
Sören Auer

The vision of the Semantic Web aims to make use of semantic representations on the largest possible scale - the Web. Large knowledge bases such as DBpedia, OpenCyc, and GovTrack are emerging and freely available as Linked Data and SPARQL endpoints. Exploring and analysing such knowledge bases is a significant hurdle for Semantic Web research and practice. As one possible direction for tackling this problem, the authors present an approach for obtaining complex class expressions from objects in knowledge bases by using Machine Learning techniques. The chapter describes in detail how to leverage existing techniques to achieve scalability on large knowledge bases available as SPARQL endpoints or Linked Data. The algorithms are made available in the open source DL-Learner project and this chapter presents several real-life scenarios in which they can be used by Semantic Web applications.
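A toy sketch of the core idea of class-expression learning, scoring candidate class expressions against positive and negative example individuals; this is an illustration of the general technique, not the DL-Learner algorithm itself, and all individuals and class names are invented:

```python
# Toy "knowledge base": the set of individuals each candidate
# class expression covers (all names invented for illustration).
candidates = {
    "Person":                {"ada", "alan", "kurt", "grace"},
    "Person AND Scientist":  {"ada", "alan", "grace"},
    "Person AND Logician":   {"kurt", "alan"},
}

positives = {"ada", "alan", "grace"}   # individuals the target class should cover
negatives = {"kurt"}                   # individuals it should exclude

def accuracy(covered, pos, neg):
    """Fraction of examples a candidate expression classifies correctly."""
    correct = len(covered & pos) + len(neg - covered)
    return correct / (len(pos) + len(neg))

best = max(candidates, key=lambda c: accuracy(candidates[c], positives, negatives))
print(best)  # → Person AND Scientist
```

A real learner such as DL-Learner generates candidates systematically with a refinement operator and retrieves coverage from a SPARQL endpoint rather than from an in-memory table, but the scoring step is analogous.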


F1000Research ◽  
2019 ◽  
Vol 8 ◽  
pp. 251
Author(s):  
John Van Horn ◽  
Sumiko Abe ◽  
José Luis Ambite ◽  
Teresa K. Attwood ◽  
Niall Beard ◽  
...  

The increasing richness and diversity of biomedical data types creates major organizational and analytical impediments to rapid translational impact in the context of training and education. As biomedical datasets increase in size, variety, and complexity, they challenge conventional methods for sharing, managing, and analyzing those data. In May 2017, we convened a two-day meeting between the BD2K Training Coordinating Center (TCC), ELIXIR Training/TeSS, GOBLET, H3ABioNet, EMBL-ABR, bioCADDIE and the CSIRO, in Huntington Beach, California, to compare and contrast our respective activities, and how these might be leveraged for wider impact on an international scale. Discussions focused on the role of i) training for biomedical data science; ii) the need to promote core competencies; and iii) the development of career paths. These led to specific conversations about i) the value of standardizing and sharing data science training resources; ii) challenges in encouraging adoption of training material standards; iii) strategies and best practices for the personalization and customization of learning experiences; iv) processes of identifying stakeholders and determining how they should be accommodated; and v) discussions of joint partnerships to lead the world on data science training in ways that benefit all stakeholders. Generally, international cooperation was viewed as essential for accommodating the widest possible participation in the modern bioscience enterprise, providing skills in a truly “FAIR” manner, and addressing the importance of data science understanding worldwide. Several recommendations for the exchange of educational frameworks are made, along with potential sources for support, and plans for further cooperative efforts are presented.


2020 ◽  
Vol 13 (1) ◽  
Author(s):  
Jason H. Moore

Data science has emerged as an important discipline in the era of big data and biological and biomedical data mining. As such, we have seen a rapid increase in the number of data science departments, research centers, and schools. We review here ten important leadership roles for a successful academic data science chair, director, or dean. These roles include the visionary, executive, cheerleader, manager, enforcer, subordinate, educator, entrepreneur, mentor, and communicator. Examples specific to leadership in data science are given for each role.


2015 ◽  
pp. 1633-1637
Author(s):  
Antonio Garrote ◽  
María N. Moreno García

In this chapter we describe a news trends detection system built with the aim of detecting daily trends in a big collection of news articles extracted from the web and exposing the computed trend data as open linked data that can be consumed by other components of the IT infrastructure. Due to the sheer amount of data being processed, the system relies on big data technologies to process raw news data and compute the trends that are later exposed as open linked data. Thanks to the open linked data interface, data can be easily consumed by other components of the application, like a JavaScript front-end, or re-used by different IT systems. The case is a good example of how open linked data can be used to provide a convenient interface to big data systems.


Author(s):  
Jacques R. J. Bughin ◽  
Michele Cincera ◽  
Dorota Reykowska ◽  
Rafał Ohme

Data science has been proven to be an important asset to support better decision making in a variety of settings, whether it is for a scientist to better predict climate change, for a company to better predict sales, or for a government to anticipate voting preferences. In this research, the authors leverage random forest (RF), one of the most effective machine learning techniques, using big data to predict vaccine intent in five European countries. The findings support the idea that, outside of vaccine features, building an adequate perception of the risk of contamination and securing institutional and peer trust are key nudges to convert skeptics to get vaccinated against COVID-19. What machine learning techniques add beyond traditional regression techniques is extra granularity in the factors affecting vaccine preferences (twice as many factors as logistic regression). Other factors that emerge as predictors of vaccine intent are appetite for compliance with non-pharmaceutical protective measures, as well as perception of the crisis duration.
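A hedged sketch of the kind of model the abstract describes: a random forest predicting a binary intent label from survey-style features. The data below is synthetic and the feature names are stand-ins for the predictors the abstract mentions (risk perception, institutional trust, peer trust), not the study's actual variables:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 400

# Synthetic stand-ins for survey predictors (invented toy data).
X = rng.random((n, 3))   # columns: risk perception, institutional trust, peer trust
# Toy rule: intent is more likely when perceived risk and trust are high.
y = (X[:, 0] + X[:, 1] + X[:, 2] + rng.normal(0, 0.3, n) > 1.5).astype(int)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Feature importances are one source of the "extra granularity" that
# tree ensembles offer over a single logistic regression.
print(dict(zip(["risk", "inst_trust", "peer_trust"], model.feature_importances_)))
```

Unlike a logistic regression coefficient, a forest's importance scores can surface non-linear and interaction effects, which is one plausible reason such models flag more relevant factors.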


Author(s):  
Hani Ragab ◽  
Mohamed Salama

Event management is a dynamic field that has always benefited from the latest advances in technology. In this chapter we review some of the newest and most promising fields in information technology and discuss how they could be used to support event managers. Data is at the heart of information technology; in particular, data science aims to extract knowledge from data using machine learning techniques. The sheer amount of data may make it impossible to process on personal computers, which leads to the field of big data. We explore the fields of data science and big data, as well as machine learning. Stored and in-transit data may hold high value that attracts cyber criminals; information security focuses on how to protect data from accidental release and tampering. Basic concepts of information security, particularly cryptography, made a major contribution to the creation of the new paradigm of blockchains.
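The cryptographic idea behind blockchains mentioned above can be sketched in a few lines: each block stores the hash of the previous block, so tampering with any earlier record breaks every later link. This is a minimal illustration of the hash-chain principle, not a production design, and the sample data is invented:

```python
import hashlib
import json

def block_hash(block):
    """Deterministic SHA-256 hash of a block's contents."""
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def add_block(chain, data):
    """Append a block that commits to the hash of the previous block."""
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"prev_hash": prev, "data": data})

def verify(chain):
    """Check every stored link against the actual previous block's hash."""
    return all(
        chain[i]["prev_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )

chain = []
add_block(chain, "ticket sale: 100 seats")   # invented example records
add_block(chain, "ticket sale: 80 seats")
print(verify(chain))                          # → True

chain[0]["data"] = "ticket sale: 9999 seats"  # tamper with history
print(verify(chain))                          # → False
```

Real blockchains add consensus, signatures, and proof-of-work on top, but the tamper-evidence shown here is the cryptographic core.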


2018 ◽  
Author(s):  
Ravi Madduri ◽  
Kyle Chard ◽  
Mike D’Arcy ◽  
Segun C. Jung ◽  
Alexis Rodriguez ◽  
...  

Big biomedical data create exciting opportunities for discovery, but make it difficult to capture analyses and outputs in forms that are findable, accessible, interoperable, and reusable (FAIR). In response, we describe tools that make it easy to capture, and assign identifiers to, data and code throughout the data lifecycle. We illustrate the use of these tools via a case study involving a multi-step analysis that creates an atlas of putative transcription factor binding sites from terabytes of ENCODE DNase I hypersensitive sites sequencing data. We show how the tools automate routine but complex tasks, capture analysis algorithms in understandable and reusable forms, and harness fast networks and powerful cloud computers to process data rapidly, all without sacrificing usability or reproducibility—thus ensuring that big data are not hard-to-(re)use data. We compare and contrast our approach with other approaches to big data analysis and reproducibility.
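One ingredient of the FAIR capture described above is assigning stable identifiers to data artifacts. A minimal sketch of a content-based scheme, where identical bytes always resolve to the identical identifier; the scheme name and sample data are invented and this is not the authors' actual identifier system:

```python
import hashlib

def content_id(data: bytes, scheme: str = "sha256") -> str:
    """Derive a content-based identifier from an artifact's bytes
    (illustrative scheme, truncated digest for readability)."""
    digest = hashlib.sha256(data).hexdigest()
    return f"{scheme}:{digest[:16]}"

# Invented stand-in for a slice of analysis output (BED-style record).
snapshot = b"chr1\t10468\t10469\tDNase_peak\n"
print(content_id(snapshot))

# The same bytes always yield the same identifier, so a published
# identifier lets anyone verify they hold the exact data analyzed.
assert content_id(snapshot) == content_id(snapshot)
```

Production systems typically pair such checksums with resolvable minimal identifiers and metadata, but content addressing is what makes the link between an identifier and the underlying bytes verifiable.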

