Final Session: The Fourth Paradigm: Data-Intensive Scientific Discovery: More Than 10 Years Later.

Author(s):  
Irina Sens
2014
Vol 22 (2)
pp. 173-185
Author(s):  
Eli Dart ◽  
Lauren Rotman ◽  
Brian Tierney ◽  
Mary Hester ◽  
Jason Zurawski

The ever-increasing scale of scientific data has become a significant challenge for researchers who rely on networks to interact with remote computing systems and transfer results to collaborators worldwide. Despite the availability of high-capacity connections, scientists struggle with inadequate cyberinfrastructure that cripples data transfer performance and impedes scientific progress. The Science DMZ paradigm comprises a proven set of network design patterns that collectively address these problems for scientists. We explain the Science DMZ model, including network architecture, system configuration, cybersecurity, and performance tools, which together create an optimized network environment for science. We describe use cases from universities, supercomputing centers, and research laboratories, highlighting the effectiveness of the Science DMZ model in diverse operational settings. In all, the Science DMZ model is a solid platform that supports any science workflow and flexibly accommodates emerging network technologies. As a result, the Science DMZ vastly improves collaboration, accelerating scientific discovery.
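To make the scale of the data transfer problem concrete, a back-of-the-envelope sketch (our illustration, not from the paper) shows how long a large dataset takes to move at a given sustained throughput, and how large a TCP window a long-distance path needs to stay full (the bandwidth-delay product):

```python
# Illustrative arithmetic only: quantifying why wide-area transfer
# performance matters for data-intensive science.

def transfer_time_hours(dataset_tb: float, throughput_gbps: float) -> float:
    """Hours needed to move dataset_tb terabytes at a sustained
    throughput of throughput_gbps gigabits per second."""
    bits = dataset_tb * 1e12 * 8              # terabytes -> bits
    return bits / (throughput_gbps * 1e9) / 3600

def bandwidth_delay_product_mb(bandwidth_gbps: float, rtt_ms: float) -> float:
    """TCP buffer size (megabytes) needed to keep a path of the given
    round-trip time fully utilized (bandwidth-delay product)."""
    return bandwidth_gbps * 1e9 * (rtt_ms / 1000) / 8 / 1e6

# A 100 TB dataset at a sustained 1 Gb/s takes roughly 222 hours;
# at 10 Gb/s, roughly 22 hours -- hence the emphasis on removing
# infrastructure bottlenecks rather than just buying bigger links.
```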


2011
Vol 15 (4)
pp. 199-201
Author(s):  
Roger Barga ◽  
Bill Howe ◽  
David Beck ◽  
Stuart Bowers ◽  
William Dobyns ◽  
...  

2021
Author(s):  
Chaolemen Borjigin ◽  
Chen Zhang

Abstract: Data Science is one of today’s most rapidly growing academic fields and has significant implications for all conventional scientific studies. However, most of the relevant studies so far have been limited to one or several facets of Data Science from a specific application domain perspective and fail to discuss its theoretical framework. Data Science is a novel science in that its research goals, perspectives, and body of knowledge are distinct from those of other sciences. The core theories of Data Science are the DIKW pyramid, data-intensive scientific discovery, the data science lifecycle, data wrangling or munging, big data analytics, data management and governance, data products development, and big data visualization. Several main trends characterize the recent theoretical studies on Data Science: the growing significance of DataOps, the rise of citizen data scientists, enabling augmented data science, the diversity of domain-specific data science, and implementing data stories as data products. The further development of Data Science should prioritize four ways of turning challenges into opportunities: accelerating theoretical studies of data science, the trade-off between explainability and performance, achieving data ethics, privacy, and trust, and aligning academic curricula to industrial needs.


2019
Vol 6 (1)
pp. 47-55
Author(s):  
Alexandra Paxton ◽  
Alexa Tullett

Today, researchers can collect, analyze, and share more data than ever before. Not only does increasing technological capacity open the door to new data-intensive perspectives in cognitive science and psychology (i.e., research that takes advantage of complex or large-scale data to understand human cognition and behavior), but increasing connectedness has sparked exponential increases in the ease and practice of scientific transparency. The growing open science movement encourages researchers to share data, materials, methods, and publications with other scientists and the wider public. Open science benefits data-intensive psychological science, the public, and public policy, and we present recommendations to improve the adoption of open science practices by changing the academic incentive structure and by improving the education pipeline. Despite ongoing questions about implementing open science guidelines, policy makers have an unprecedented opportunity to shape the next frontier of scientific discovery.


2018
Author(s):  
Alexandra Paxton ◽  
Alexa Mary Tullett

Today, researchers can collect, analyze, and share more data than ever before. Not only does increasing technological capacity open the door to new data-intensive perspectives in cognitive science and psychology (that is, research that takes advantage of complex or large-scale data to understand human cognition and behavior), but increasing connectedness has sparked exponential increases in the ease and practice of scientific transparency. The growing open science movement encourages researchers to share data, materials, methods, and publications with other scientists and the wider public. Open science benefits data-intensive psychological science, the public, and public policy, and we present recommendations to improve the adoption of open science practices by changing the academic incentive structure and by improving the education pipeline. Despite ongoing questions about implementing open-science guidelines, policymakers have an unprecedented opportunity to shape the next frontier of scientific discovery.


Author(s):  
Anna Lewis ◽  
George Matsumoto

Scientific discovery, problem-solving, and hypothesis testing require observation, data analysis, and the synthesis of new knowledge. In today's world, this process is highly dependent on computer-based exploration of high-volume, high-velocity, and high-variety data streams (3HV). Although the power of 3HV surpasses the amount of information gathered from more familiar lab experiences, data-intensive science has not yet achieved the same impact or prominence in public education. This chapter examines the Education And Research: Testing Hypotheses (EARTH) Science Educator Professional Development (PD) program, which was developed to bridge this gap, bringing data-intensive science into the classroom while supporting inquiry learning practices.


Author(s):  
Tevfik Kosar ◽  
Mehmet Balman ◽  
Esma Yildirim ◽  
Sivakumar Kulasekaran ◽  
Brandon Ross

In this paper, we present the Stork data scheduler as a solution for mitigating the data bottleneck in e-Science and data-intensive scientific discovery. Stork focuses on the planning, scheduling, monitoring, and management of data placement tasks, and on application-level end-to-end optimization of networked inputs/outputs for petascale distributed e-Science applications. Unlike existing approaches, Stork treats data resources and the tasks related to data access and movement as first-class entities, just like computational resources and compute tasks, rather than simply a side effect of computation. Stork provides unique features such as the aggregation of data transfer jobs based on their source and destination addresses, and an application-level throughput estimation and optimization service. We describe how these two features are implemented in Stork and their effects on end-to-end data transfer performance.
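The aggregation feature the abstract describes can be sketched in a few lines. This is our own illustration (the names and structure are hypothetical, not Stork's actual API): transfer jobs sharing a source and destination endpoint are grouped so they could, in principle, be serviced over a single connection:

```python
# Hypothetical sketch of endpoint-based job aggregation; not Stork code.
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class TransferJob:
    src_host: str    # source endpoint
    dst_host: str    # destination endpoint
    path: str        # file to move
    size_bytes: int

def aggregate_by_endpoints(jobs):
    """Group transfer jobs by their (source, destination) pair, so that
    jobs sharing endpoints can be batched onto one connection."""
    groups = defaultdict(list)
    for job in jobs:
        groups[(job.src_host, job.dst_host)].append(job)
    return dict(groups)
```

For example, two jobs from `siteA` to `siteB` and one from `siteC` to `siteD` would yield two groups, with the first pair batched together.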


2021
Author(s):  
Roshan Patel ◽  
Carlos Borca ◽  
Michael Webb

The emergence of data-intensive scientific discovery and machine learning has dramatically changed the way in which scientists and engineers approach materials design. Nevertheless, for designing macromolecules or polymers, one limitation is the lack of appropriate methods or standards for converting systems into chemically informed, machine-readable representations. This featurization process is critical to building predictive models that can guide polymer discovery. Although standard molecular featurization techniques have been deployed on homopolymers, such approaches capture neither the multiscale nature nor the topological complexity of copolymers, and they have limited application to systems that cannot be characterized by a single repeat unit. Herein, we present, evaluate, and analyze a series of featurization strategies suitable for copolymer systems. These strategies are systematically examined in diverse prediction tasks sourced from four distinct datasets that enable understanding of how featurization can impact copolymer property prediction. Based on this comparative analysis, we suggest directly encoding polymer size in polymer representations when possible, adopting topological descriptors or convolutional neural networks when the precise polymer sequence is known, and using simplified constitutional unit representations depending on the noise level of the underlying data. These results provide guidance and future directions regarding polymer featurization for copolymer design by machine learning.
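A minimal sketch (our construction, not the authors' code) of the kind of constitutional-unit representation the abstract mentions: a copolymer sequence is mapped to monomer fractions plus an explicit chain-length feature, echoing the suggestion to encode polymer size directly when it is known:

```python
# Illustrative copolymer featurization: composition fractions + length.
from collections import Counter

def featurize_copolymer(sequence: str, monomers=("A", "B")):
    """Map a monomer sequence such as 'AABAB' to a simple feature
    vector [frac_A, frac_B, chain_length]."""
    counts = Counter(sequence)
    n = len(sequence)
    return [counts.get(m, 0) / n for m in monomers] + [float(n)]

# featurize_copolymer("AABAB") -> [0.6, 0.4, 5.0]
```

A representation like this discards sequence order entirely, which is exactly the limitation that motivates the topological descriptors and convolutional models discussed in the abstract when the precise sequence is known.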

