Use(less) Data: Discovery, COUNTER, and Music Databases

Bio-inspired algorithms implement solutions found in nature to tackle hard problems, such as NP-hard problems. A seismic hazard is the probability that an earthquake will occur in a given geographic area, within a given window of time, and with ground-motion intensity exceeding a given threshold. Seismic hazard prediction is one of the fields in which data mining plays an important role. This paper presents a new bio-inspired algorithm, motivated by the echolocation behavior of bats, for predicting seismic hazard states in coal mines from previously recorded data. The approach is based on distance calculations. The results were very satisfactory and encourage us to continue working on this approach. The implementation of the algorithm touches three fields of study: data discovery (so-called data mining), bio-inspired techniques, and seismic hazard prediction.
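
The abstract names the technique but not the authors' adaptation of it, so the following is a minimal sketch of the generic bat algorithm as commonly formulated by Yang (2010), not the paper's method. Each bat carries a position, a velocity, a frequency drawn from [f_min, f_max], a loudness A_i and a pulse rate r_i. Loud, infrequent pulses favor exploration; as a bat accepts better solutions its loudness falls and its pulse rate rises, shifting it toward local search around the best solution found so far. The function name `bat_algorithm`, all parameter values, and the box bounds are assumptions chosen for illustration.

```python
import math
import random
from typing import Callable, List, Tuple

Vector = List[float]


def bat_algorithm(
    objective: Callable[[Vector], float],
    dim: int,
    lower: float,
    upper: float,
    n_bats: int = 20,
    n_iter: int = 200,
    f_min: float = 0.0,
    f_max: float = 2.0,
    alpha: float = 0.9,  # loudness decay factor (assumed value)
    gamma: float = 0.9,  # pulse-rate growth constant (assumed value)
    r0: float = 0.5,     # asymptotic pulse emission rate (assumed value)
    seed: int = 0,
) -> Tuple[Vector, float]:
    """Minimize `objective` over the box [lower, upper]^dim with a basic bat algorithm."""
    rng = random.Random(seed)

    def clip(value: float) -> float:
        return min(max(value, lower), upper)

    x = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(n_bats)]
    v = [[0.0] * dim for _ in range(n_bats)]
    loud = [1.0] * n_bats  # loudness A_i, starts high
    rate = [0.0] * n_bats  # pulse rate r_i, starts at 0 and rises toward r0
    fit = [objective(xi) for xi in x]
    i_best = min(range(n_bats), key=fit.__getitem__)
    best, best_f = x[i_best][:], fit[i_best]

    for t in range(1, n_iter + 1):
        mean_loud = sum(loud) / n_bats
        for i in range(n_bats):
            # Global move: a random frequency scales the pull relative to the best bat.
            freq = f_min + (f_max - f_min) * rng.random()
            v[i] = [v[i][d] + (x[i][d] - best[d]) * freq for d in range(dim)]
            cand = [clip(x[i][d] + v[i][d]) for d in range(dim)]
            # Local random walk around the current best, scaled by the mean loudness.
            if rng.random() > rate[i]:
                cand = [clip(best[d] + rng.uniform(-1.0, 1.0) * mean_loud) for d in range(dim)]
            f_cand = objective(cand)
            # Accept an improving move with probability A_i, then get quieter and pulse more often.
            if f_cand < fit[i] and rng.random() < loud[i]:
                x[i], fit[i] = cand, f_cand
                loud[i] *= alpha
                rate[i] = r0 * (1.0 - math.exp(-gamma * t))
            # Keep the best solution seen so far.
            if f_cand < best_f:
                best, best_f = cand[:], f_cand
    return best, best_f
```

For example, `bat_algorithm(lambda p: sum(c * c for c in p), dim=2, lower=-5.0, upper=5.0)` minimizes a simple quadratic. Applying it to seismic hazard prediction would require an objective that scores candidate parameters (for instance, class prototypes in a distance-based classifier) against labelled historical records; that mapping is an assumption, since the abstract does not specify it.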

Seeping Semantics: Linking Datasets Using Word Embeddings for Data Discovery

Venue: 2018 IEEE 34th International Conference on Data Engineering (ICDE)
DOI: 10.1109/icde.2018.00093
Year: 2018
Cited by: ~9

Author(s): Raul Castro Fernandez, Essam Mansour, Abdulhakim A. Qahtan, Ahmed Elmagarmid, Ihab Ilyas, ...

Keyword(s): Word Embeddings, Data Discovery, Semantics Linking

FAIRness in Biomedical Data Discovery

Venue: Proceedings of the 12th International Joint Conference on Biomedical Engineering Systems and Technologies
DOI: 10.5220/0007576401590166
Year: 2019
Cited by: ~1

Author(s): Alina Trifan, José Oliveira

Keyword(s): Biomedical Data, Data Discovery

DNS-embedded service endpoint registry for distributed e-Infrastructures

Venue: Cluster Computing
DOI: 10.1007/s10586-021-03455-5
Year: 2021

Author(s): Andrii Salnikov, Balázs Kónya

Keyword(s): Data Model, Service Discovery, Record Management, Big Science, Information Discovery, Data Discovery, Distributed Information, Computing Grid, Client Side, Science Service

Abstract: Distributed e-Infrastructure is a key component of modern Big Science. Service discovery in e-Science environments, such as the Worldwide LHC Computing Grid (WLCG), is a crucial functionality that relies on a service registry. In this paper we reformulate the requirements for a service endpoint registry based on more than 10 years of experience with many systems designed or used within the WLCG e-Infrastructure. To satisfy those requirements, the paper proposes the novel idea of using the existing, well-established Domain Name System (DNS) infrastructure, together with a suitable data model, as a service endpoint registry. The presented ARC Hierarchical Endpoints Registry (ARCHERY) system consists of a minimalistic data model representing services and their endpoints within e-Infrastructures, a rendering of that data model embedded into DNS records, and a lightweight software layer for DNS-record management and client-side data discovery. Our approach required minimal software development and inherits all the benefits of one of the most reliable distributed information discovery sources on the internet, the DNS infrastructure. In particular, the deployment, management, and operation of ARCHERY rely fully on DNS. Results from ARCHERY deployment use cases are provided together with a performance analysis.
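
The abstract says ARCHERY renders its service and endpoint data model into DNS records and performs discovery on the client side. The sketch below illustrates that idea with plain strings: each endpoint becomes one TXT record carrying space-separated `key=value` pairs, and a client filters parsed records by interface type. The record layout (keys `u` and `t`), the owner name, and every function name are hypothetical, not ARCHERY's published schema. Real lookups would go through a DNS resolver library, which is omitted to keep the example self-contained. The one constraint taken from the DNS specification is that a single TXT character-string holds at most 255 octets (RFC 1035).

```python
from dataclasses import dataclass
from typing import Dict, List

MAX_TXT_STRING = 255  # RFC 1035: one <character-string> holds at most 255 octets


@dataclass(frozen=True)
class Endpoint:
    url: str    # service endpoint URL
    iface: str  # interface/type name (hypothetical field)


def render_txt(ep: Endpoint) -> str:
    """Render one endpoint as a space-separated key=value TXT payload (hypothetical format)."""
    payload = f"u={ep.url} t={ep.iface}"
    if len(payload.encode("utf-8")) > MAX_TXT_STRING:
        raise ValueError("payload exceeds a single TXT character-string")
    return payload


def parse_txt(payload: str) -> Endpoint:
    """Parse a payload produced by render_txt back into an Endpoint."""
    fields: Dict[str, str] = {}
    for token in payload.split():
        key, sep, value = token.partition("=")  # split on the first '=' only
        if not sep:
            raise ValueError(f"malformed token: {token!r}")
        fields[key] = value
    return Endpoint(url=fields["u"], iface=fields["t"])


def zone_records(owner: str, endpoints: List[Endpoint]) -> List[str]:
    """Emit zone-file lines: one TXT record per endpoint under a single owner name."""
    return [f'{owner} IN TXT "{render_txt(ep)}"' for ep in endpoints]


def discover(payloads: List[str], iface: str) -> List[str]:
    """Client-side discovery: keep the URLs of endpoints with the requested interface type."""
    return [ep.url for ep in map(parse_txt, payloads) if ep.iface == iface]
```

Keeping one endpoint per TXT record means a client needs only a single query per owner name and can filter locally, which mirrors the client-side discovery the abstract describes.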

Research-ready data for multi-cohort analyses: The Dementias Platform UK (DPUK) C-Surv data model

DOI: 10.21203/rs.3.rs-937113/v3
Year: 2021

Author(s): Sarah Bauermeister, Joshua R Bauermeister, R Bridgman, C Felici, M Newbury, ...

Keyword(s): Data Model, Data Access, Data Discovery, Standard Data, Technology Standard, Access Request, Nested Structure, Using Data, Model C, Cohort Analyses

Abstract: Research-ready data (data curated to a defined standard) increases scientific opportunity and rigour by integrating the data environment. The development of research platforms has highlighted the value of research-ready data, particularly for multi-cohort analyses. Following user consultation, a standard data model (C-Surv), optimised for data discovery, was developed using data from 12 Dementias Platform UK (DPUK) population and clinical cohort studies. The model uses a four-tier nested structure based on 18 data themes selected according to user behaviour or technology. Standard variable naming conventions are applied to uniquely identify variables within the context of longitudinal studies. The data model was used to develop a harmonised dataset for 11 cohorts. This dataset populated the Cohort Explorer data discovery tool, which is used to assess the feasibility of an analysis before making a data access request. We conclude that developing and applying a standard data model (C-Surv) for research cohort data is feasible and useful.
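
The abstract describes a four-tier nested structure and standard variable names that stay unique across the waves of a longitudinal study, without giving the convention itself. As a hypothetical illustration only, the sketch below derives a name from four nesting tiers plus a wave number and rejects collisions. The tier field names, the underscore-joined `_w<wave>` format, and the function names are invented placeholders, not the C-Surv specification.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Variable:
    # Four nesting tiers; these field names are hypothetical placeholders,
    # not the tier labels used by C-Surv.
    tier1: str
    tier2: str
    tier3: str
    tier4: str
    wave: int  # data-collection wave in a longitudinal study


def standard_name(v: Variable) -> str:
    """Build a lowercase name from the four tiers plus a wave suffix (hypothetical format)."""
    tiers = (v.tier1, v.tier2, v.tier3, v.tier4)
    stem = "_".join(t.strip().lower().replace(" ", "") for t in tiers)
    return f"{stem}_w{v.wave}"


def assign_names(variables: Iterable[Variable]) -> List[str]:
    """Name every variable, failing fast if two variables would receive the same name."""
    seen: Set[str] = set()
    names: List[str] = []
    for v in variables:
        name = standard_name(v)
        if name in seen:
            raise ValueError(f"duplicate variable name: {name}")
        seen.add(name)
        names.append(name)
    return names
```

Encoding the wave in the name keeps repeated measurements of the same item distinct, which is the property the abstract attributes to the naming conventions.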

Practical Application of a Data Stewardship Maturity Matrix for the NOAA OneStop Project

DOI: 10.31219/osf.io/fp3js
Year: 2018

Author(s): Ge Peng, Anna Milan, Nancy A. Ritchey, Robert P. Partee, Sonny Zinn, ...

Keyword(s): North Carolina, Best Practices, Data Quality, User Needs, Data Quality Control, Practical Application, Data Discovery, Data Quality Assessment, Data Stewardship, Do So

Assessing the stewardship maturity of individual datasets is an essential part of ensuring and improving the way datasets are documented, preserved, and disseminated to users. It is a critical step toward meeting U.S. federal regulations, organizational requirements, and user needs. However, doing so consistently and quantifiably is challenging. The Data Stewardship Maturity Matrix (DSMM), developed jointly by NOAA’s National Centers for Environmental Information (NCEI) and the Cooperative Institute for Climate and Satellites–North Carolina (CICS-NC), provides a uniform framework for consistently rating the stewardship maturity of individual datasets in nine key components: preservability, accessibility, usability, production sustainability, data quality assurance, data quality control/monitoring, data quality assessment, transparency/traceability, and data integrity. So far, the DSMM has been applied to over 900 individual datasets that are archived and/or managed by NCEI, in support of NOAA’s OneStop Data Discovery and Access Framework Project. As part of the OneStop-ready process, tools, implementation guidance, workflows, and best practices have been developed to assist in applying the DSMM; they are described in this paper. The DSMM ratings are also consistently captured in ISO standard-based, dataset-level quality metadata and in citable quality descriptive information documents, which serve as interoperable quality information for both machine and human end users. These DSMM implementation and integration workflows and best practices could be adopted by other data management and stewardship projects or adapted for applications of other maturity assessment models.
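
The nine components listed in the abstract lend themselves to a simple per-dataset rating record. The sketch below checks that a dataset has exactly one rating for every component and reports which components fall short of a target level, the kind of check a rating workflow might run before publishing quality metadata. The integer 1 to 5 scale and the function names are assumptions for illustration; the DSMM itself defines the actual levels and criteria.

```python
from typing import Dict, List

# The nine DSMM key components, as listed in the abstract.
COMPONENTS = (
    "preservability",
    "accessibility",
    "usability",
    "production sustainability",
    "data quality assurance",
    "data quality control/monitoring",
    "data quality assessment",
    "transparency/traceability",
    "data integrity",
)

# Assumption: integer maturity levels from 1 (lowest) to 5 (highest).
MIN_LEVEL, MAX_LEVEL = 1, 5


def validate_ratings(ratings: Dict[str, int]) -> None:
    """Raise ValueError unless there is exactly one in-range rating per component."""
    missing = set(COMPONENTS) - set(ratings)
    extra = set(ratings) - set(COMPONENTS)
    if missing or extra:
        raise ValueError(f"missing={sorted(missing)} unexpected={sorted(extra)}")
    for comp, level in ratings.items():
        if not MIN_LEVEL <= level <= MAX_LEVEL:
            raise ValueError(f"{comp}: level {level} out of range")


def below_target(ratings: Dict[str, int], target: int) -> List[str]:
    """Return the components rated below `target`, in the canonical component order."""
    validate_ratings(ratings)
    return [c for c in COMPONENTS if ratings[c] < target]
```

Returning components in a fixed order keeps reports for different datasets directly comparable.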