A Virtual Dataspaces Model for large-scale materials scientific data access

Extra-label drug use in food animal medicine is authorized by the US Animal Medicinal Drug Use Clarification Act (AMDUCA), and estimated withdrawal intervals are based on published scientific pharmacokinetic data. Occasionally there is a paucity of scientific data on which to base a withdrawal interval or a large number of animals being treated, driving the need to test for drug residues. Rapid assay commercial farm-side tests are essential for monitoring drug residues in animal products to protect human health. Active ingredients, sensitivity, matrices, and species that have been evaluated for commercial rapid assay tests are typically reported on manufacturers' websites or in PDF documents that are available to consumers but may require a special access request. Additionally, this information is not always correlated with FDA-approved tolerances. Furthermore, parameter changes for these tests can be very challenging to identify regularly, especially those listed on websites or in documents that are not publicly available. Therefore, artificial intelligence plays a critical role in efficiently extracting the data and ensuring that the information is current. Extracting tables from PDF and HTML documents has been investigated by both academia and commercial tool builders. Research in text mining of such documents has become a widespread yet challenging arena in implementing natural language processing. However, techniques for extracting tables are still in their infancy and are being investigated and improved by researchers. In this study, we developed and evaluated a data-mining method for automatically extracting rapid assay data from electronic documents. Our automatic electronic data extraction method includes a software package module, a developed pattern recognition tool, and a data mining engine. Assay details were provided by several commercial entities that produce these rapid drug residue assay tests.
During this study, we developed a real-time conversion system and method for reflowing contents in these files for accessibility practice and research data mining. Embedded information was extracted using an AI technology for text extraction and text mining to convert to structured formats. These data were then made available to veterinarians and producers via an online interface, allowing interactive searching and also presenting the commercial test assay parameters in reference to FDA-approved tolerances.
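As a rough illustration of the table-extraction step this abstract describes, the sketch below pulls rows out of an HTML table with Python's standard-library `html.parser` and flags whether each assay's detection limit falls at or below a tolerance. It is not the study's method: the table layout, the column names, the analyte names, and every number are hypothetical placeholders chosen only to make the example self-contained.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being parsed
        self._cell = None   # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_rows(html):
    """Return all table rows found in an HTML string."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


def compare_to_tolerance(rows, tolerances):
    """Pair each assay row with a tolerance (first row is the header)."""
    header, *body = rows
    records = []
    for row in body:
        record = dict(zip(header, row))
        limit = float(record["Detection limit (ppb)"])
        tolerance = tolerances.get(record["Analyte"])
        records.append({
            "analyte": record["Analyte"],
            "limit_ppb": limit,
            # None means no tolerance is known for this analyte.
            "meets_tolerance": None if tolerance is None else limit <= tolerance,
        })
    return records


# Hypothetical input: not real assay data or real regulatory tolerances.
SAMPLE_HTML = """
<table>
  <tr><th>Analyte</th><th>Detection limit (ppb)</th></tr>
  <tr><td>analyte_a</td><td>5</td></tr>
  <tr><td>analyte_b</td><td>40</td></tr>
</table>
"""
SAMPLE_TOLERANCES = {"analyte_a": 10.0, "analyte_b": 30.0}

results = compare_to_tolerance(extract_rows(SAMPLE_HTML), SAMPLE_TOLERANCES)
```

Real manufacturer pages and PDF documents would need far more robust handling (merged cells, nested tables, unit normalization), which this sketch deliberately leaves out.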

A Python-oriented environment for climate experiments at scale in the frame of the European Open Science Cloud

10.5194/egusphere-egu2020-17031 ◽

2020 ◽

Author(s):

Donatello Elia ◽

Fabrizio Antonio ◽

Cosimo Palazzo ◽

Paola Nassisi ◽

Sofiane Bendoukha ◽

...

Keyword(s):

Data Analysis ◽

Data Analytics ◽

Large Scale ◽

Data Access ◽

Open Science ◽

Scientific Data ◽

Precipitation Trend ◽

Data Intensive ◽

Research Activities ◽

The Eu

Scientific data analysis experiments and applications require software capable of handling domain-specific and data-intensive workflows. The increasing volume of scientific data is further exacerbating these data management and analytics challenges, pushing the community towards the definition of novel programming environments for dealing efficiently with complex experiments, while abstracting from the underlying computing infrastructure. ECASLab provides a user-friendly data analytics environment to support scientists in their daily research activities, in particular in the climate change domain, by integrating analysis tools with scientific datasets (e.g., from the ESGF data archive) and computing resources (i.e., Cloud and HPC-based). It combines the features of the ENES Climate Analytics Service (ECAS) and the JupyterHub service, with a wide set of scientific libraries from the Python landscape for data manipulation, analysis and visualization. ECASLab is being set up in the frame of the European Open Science Cloud (EOSC) platform - in the EU H2020 EOSC-Hub project - by CMCC (https://ecaslab.cmcc.it/) and DKRZ (https://ecaslab.dkrz.de/), which host two major instances of the environment. ECAS, which lies at the heart of ECASLab, enables scientists to perform data analysis experiments on large volumes of multi-dimensional data by providing a workflow-oriented, PID-supported, server-side and distributed computing approach. ECAS consists of multiple components, centered around the Ophidia High Performance Data Analytics framework, which has been integrated with data access and sharing services (e.g., EUDAT B2DROP/B2SHARE, Onedata), along with the EGI federated cloud infrastructure. The integration with JupyterHub provides a convenient interface for scientists to access the ECAS features for the development and execution of experiments, as well as for sharing results (and the experiment/workflow definition itself).
ECAS parallel data analytics capabilities can be easily exploited in Jupyter Notebooks (by means of PyOphidia, the Ophidia Python bindings) together with well-known Python modules for processing and for plotting the results on charts and maps (e.g., Dask, Xarray, NumPy, Matplotlib, etc.). ECAS is also one of the compute services made available to climate scientists by the EU H2020 IS-ENES3 project. Hence, this integrated environment represents a complete software stack for the design and execution of interactive experiments as well as complex and data-intensive workflows. One class of such large-scale workflows, efficiently implemented through the environment resources, refers to multi-model data analysis in the context of both CMIP5 and CMIP6 (i.e., precipitation trend analysis orchestrated in parallel over multiple CMIP-based datasets).
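To make the idea of a trend analysis orchestrated in parallel over multiple datasets more concrete, here is a minimal sketch using only the Python standard library. It does not use ECAS, Ophidia, or PyOphidia, and it is not the workflow from the abstract: the function names, the `model_a`/`model_b` labels, and the short precipitation series are all hypothetical, and a thread pool stands in for the server-side distributed execution the environment provides.

```python
from concurrent.futures import ThreadPoolExecutor


def linear_trend(years, values):
    """Ordinary least-squares slope of values against years."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    var = sum((x - mean_x) ** 2 for x in years)
    return cov / var


def multi_model_trends(datasets, years, max_workers=4):
    """Compute one trend per dataset, submitting each to a worker pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            name: pool.submit(linear_trend, years, series)
            for name, series in datasets.items()
        }
        return {name: future.result() for name, future in futures.items()}


# Synthetic annual precipitation totals (mm); placeholders, not CMIP data.
YEARS = [2000, 2001, 2002, 2003, 2004]
DATASETS = {
    "model_a": [100.0, 102.0, 104.0, 106.0, 108.0],
    "model_b": [90.0, 89.0, 88.0, 87.0, 86.0],
}

trends = multi_model_trends(DATASETS, YEARS)
ensemble_mean_trend = sum(trends.values()) / len(trends)
```

In a real multi-model study each series would be a multi-dimensional field per grid cell rather than a five-element list, and the per-model work would typically be delegated to the analytics backend instead of local threads.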

Abstract: Visualizing Large Scale Scientific Data Provenance

2012 SC Companion: High Performance Computing, Networking Storage and Analysis ◽

10.1109/sc.companion.2012.205 ◽

2012 ◽

Author(s):

Peng Chen ◽

Beth Plale

Keyword(s):

Large Scale ◽

Scientific Data ◽

Data Provenance

EventDB: A Large-Scale Semi-structured Scientific Data Management System

Big Scientific Data Management - Lecture Notes in Computer Science ◽

10.1007/978-3-030-28061-1_12 ◽

2019 ◽

pp. 105-115

Author(s):

Wenjia Zhao ◽

Yong Qi ◽

Di Hou ◽

Peijian Wang ◽

Xin Gao ◽

...

Keyword(s):

Data Management ◽

Management System ◽

Large Scale ◽

Scientific Data ◽

Data Management System ◽

Scientific Data Management

A Query Processing Framework for Large-Scale Scientific Data Analysis

Lecture Notes in Computer Science - Transactions on Large-Scale Data- and Knowledge-Centered Systems XXXVIII ◽

10.1007/978-3-662-58384-5_5 ◽

2018 ◽

pp. 119-145

Author(s):

Leonidas Fegaras

Keyword(s):

Data Analysis ◽

Query Processing ◽

Large Scale ◽

Scientific Data ◽

Scientific Data Analysis ◽

Processing Framework

Parallel Tensor Compression for Large-Scale Scientific Data.

10.2172/1226255 ◽

2015 ◽

Cited By ~ 1

Author(s):

Tamara G. Kolda ◽

Grey Ballard ◽

Woody Nathan Austin

Keyword(s):

Large Scale ◽

Scientific Data

Visualization of large scale time-varying scientific data

Journal of Physics Conference Series ◽

10.1088/1742-6596/46/1/074 ◽

2006 ◽

Vol 46 ◽

pp. 535-544 ◽

Cited By ~ 4

Author(s):

Han-Wei Shen

Keyword(s):

Large Scale ◽

Scientific Data ◽

Time Varying

RADAR: Runtime Asymmetric Data-Access Driven Scientific Data Replication

Lecture Notes in Computer Science - Supercomputing ◽

10.1007/978-3-319-07518-1_19 ◽

2014 ◽

pp. 296-313 ◽

Cited By ~ 10

Author(s):

John Jenkins ◽

Xiaocheng Zou ◽

Houjun Tang ◽

Dries Kimpe ◽

Robert Ross ◽

...

Keyword(s):

Data Replication ◽

Data Access ◽

Scientific Data ◽

Asymmetric Data

Facilitating Design of Efficient Components by Bridging Gaps between Data Model and Business Process via Analysis of Service Traits of Data

Enterprise Information Systems ◽

10.4018/978-1-61692-852-0.ch214 ◽

2011 ◽

pp. 544-549

Author(s):

Ning Chen

Keyword(s):

Business Process ◽

Large Scale ◽

Data Modeling ◽

Data Access ◽

Enterprise Information System ◽

Enterprise Information ◽

Solution Quality ◽

Design Data ◽

Component Design

In many large-scale enterprise information system solutions, process design, data modeling and software component design are performed relatively independently by different people using various tools and methodologies. This usually leads to gaps among business process modeling, component design and data modeling. Currently, these functional or non-functional disconnections are fixed manually, which increases the complexity and decreases the efficiency and quality of development. In this chapter, a pattern-based approach is proposed to bridge the gaps with automatically generated data access components. Data access rules and patterns are applied to optimize these data access components. In addition, the authors present the design of a toolkit that automatically applies these patterns to bridge the gaps, reducing development time and improving solution quality.

