The Common Greenhouse Ontology: An Ontology Describing Components, Properties, and Measurements inside the Greenhouse

2021 ◽  
Vol 9 (1) ◽  
pp. 27
Author(s):  
Roos Bakker ◽  
Romy van Drie ◽  
Cornelis Bouter ◽  
Sander van Leeuwen ◽  
Lorijn van Rooijen ◽  
...  

Modern greenhouses have systems that continuously measure the properties of the greenhouse and its crops. These measurements cannot be queried together without linking the relevant data. In this paper, we introduce the Common Greenhouse Ontology, a standard for sharing data on greenhouses and their measurable components. The ontology was created with domain experts and incorporates the existing ontologies SOSA and OM. It was evaluated using competency questions and corresponding SPARQL queries. The results of the evaluation show that the Common Greenhouse Ontology is an innovative solution for data interoperability and standardization, and an enabler for advanced data science techniques over larger databases.
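
As a hedged illustration of the competency-question evaluation described above, the sketch below runs a SPARQL query over a hypothetical CGO graph with rdflib. The data file, the choice of om:Temperature, and the query shape are assumptions; only the SOSA and OM vocabularies come from the abstract.

```python
# A sketch of answering one competency question over a Common Greenhouse
# Ontology graph with rdflib. The data file, the choice of om:Temperature,
# and the query shape are illustrative assumptions; only the SOSA and OM
# namespaces come from the abstract.
from rdflib import Graph

g = Graph()
g.parse("greenhouse.ttl", format="turtle")  # hypothetical CGO dataset

# Competency question: which observations measure a temperature, and
# what results did they record?
QUERY = """
PREFIX sosa: <http://www.w3.org/ns/sosa/>
PREFIX om:   <http://www.ontology-of-units-of-measure.org/resource/om-2/>

SELECT ?obs ?result WHERE {
    ?obs a sosa:Observation ;
         sosa:observedProperty ?prop ;
         sosa:hasResult ?result .
    ?prop a om:Temperature .
}
"""

for obs, result in g.query(QUERY):
    print(obs, result)
```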

2020 ◽  
Vol 38 (15_suppl) ◽  
pp. e19283-e19283
Author(s):  
Yang Yi-Hsin ◽  
Li-Tzong Chen ◽  
Shiu-Feng Huang

e19283 Background: Taiwan has 32 biobanks under government governance. The Ministry of Health and Welfare has established a National Biobank Consortium of Taiwan to unify specimen quality and the medical record database; the total number of recruited participants exceeds 350,000. The National Health Research Institutes in Taiwan are responsible for establishing a common data model that aggregates data elements from the electronic health records (EHRs) of member institutes through direct feeds. The goals are to assemble a set of common oncology data elements and to facilitate cancer data interoperability for patient care and research across the institutes of the Biobank Consortium. Methods: We first conducted a thorough review of available EHR data elements covering patient characteristics, diagnosis/staging, treatments, laboratory results, vital signs, and outcomes. The data dictionary was organized based on HL7 FHIR and also included data elements from the Taiwan Cancer Registry (TCR) and the National Health Insurance (NHI) program, for which common definitions have already been established and implemented for years. Data elements suggested by ASCO CancerLinQ and the minimal Common Oncology Data Elements (mCODE) were also referenced during planning. The final common model was then reviewed by a panel of experts consisting of oncologists as well as data science specialists. Results: The final model comprises 9 data tables with 281 data elements, of which 248 come from data elements routinely uploaded to government agencies (TCR and NHI) and 33 are collected under partially common definitions among institutes. Of these, 164 data elements are collected as one observation per case, while 117 elements are accumulated periodically. Conclusions: A comprehensive understanding of genetics, phenotypes, disease variation, and treatment responses is crucial to fulfilling the needs of real-world studies, which could ultimately lead to personalized treatment and drug development. In the first stage of this project, we aim to accumulate available structured EHR data elements and to maintain sufficient cancer data quality. The resulting database can then provide real-world evidence to promote evidence-based, data-driven cancer care.
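
As a rough illustration of what one entry in such a FHIR-aligned data dictionary could look like, here is a minimal sketch; all field names and example values are hypothetical, since the abstract only states that the dictionary is organized around HL7 FHIR with elements sourced from TCR and NHI feeds.

```python
# A sketch of one entry in a FHIR-aligned common-data-model dictionary.
# All field names and the example values are hypothetical; the abstract
# only states that the dictionary is organized around HL7 FHIR and draws
# elements from TCR and NHI feeds.
from dataclasses import dataclass

@dataclass
class DataElement:
    table: str       # one of the 9 data tables in the common model
    name: str        # element name in the common model
    fhir_path: str   # FHIR resource element it maps onto
    source: str      # "TCR", "NHI", or institute-collected
    per_case: bool   # True: one observation per case; False: periodic

stage = DataElement(
    table="Diagnosis",
    name="ajcc_clinical_stage",
    fhir_path="Condition.stage.summary",
    source="TCR",
    per_case=True,
)
print(stage)
```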


Publications ◽  
2020 ◽  
Vol 8 (2) ◽  
pp. 21 ◽  
Author(s):  
Koenraad De Smedt ◽  
Dimitris Koureas ◽  
Peter Wittenburg

Data science is facing the following major challenges: (1) developing scalable cross-disciplinary capabilities, (2) dealing with the increasing data volumes and their inherent complexity, (3) building tools that help to build trust, (4) creating mechanisms to efficiently operate in the domain of scientific assertions, (5) turning data into actionable knowledge units and (6) promoting data interoperability. As a way to overcome these challenges, we further develop the proposals by early Internet pioneers for Digital Objects as encapsulations of data and metadata made accessible by persistent identifiers. In the past decade, this concept was revisited by various groups within the Research Data Alliance and put in the context of the FAIR Guiding Principles for findable, accessible, interoperable and reusable data. The basic components of a FAIR Digital Object (FDO) as a self-contained, typed, machine-actionable data package are explained. A survey of use cases has indicated the growing interest of research communities in FDO solutions. We conclude that the FDO concept has the potential to act as the interoperable federative core of a hyperinfrastructure initiative such as the European Open Science Cloud (EOSC).
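
As a rough sketch of the FDO concept as described here, a self-contained, typed, machine-actionable package of data and metadata behind a persistent identifier, one might model it as follows; the field names and Handle-style identifier are illustrative assumptions, not a normative FDO schema.

```python
# A sketch of a FAIR Digital Object as characterized in the abstract: a
# self-contained, typed, machine-actionable package of data and metadata
# resolvable through a persistent identifier. Field names and the
# Handle-style PID are illustrative assumptions, not a normative schema.
from dataclasses import dataclass, field

@dataclass
class FAIRDigitalObject:
    pid: str               # persistent identifier (e.g. a Handle)
    fdo_type: str          # registered type; drives machine-actionability
    metadata: dict         # descriptive metadata for findability
    bit_sequence_url: str  # location of the actual data bytes
    operations: list = field(default_factory=list)  # permitted actions

fdo = FAIRDigitalObject(
    pid="21.T11148/example",                  # hypothetical Handle
    fdo_type="microscopy-image",              # hypothetical type name
    metadata={"creator": "Example Lab", "license": "CC-BY-4.0"},
    bit_sequence_url="https://repo.example.org/objects/42",
    operations=["retrieve", "validate"],
)
print(fdo.pid, fdo.fdo_type)
```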


2014 ◽  
pp. 1401-1421
Author(s):  
Jaroslav Porubän ◽  
Ján Kollár ◽  
Miroslav Sabo

In general, designing a domain-specific language (DSL) is a complicated process that requires the cooperation of experts from both the application domain and computer language development. One problem that may occur is a communication gap between a domain expert and a language engineer. Since domain experts are usually non-technical people, it might be difficult for them to express requirements on a DSL notation in a technical manner. Another compelling problem is that even though the majority of DSLs share the same notation style for the common language constructs, a language engineer has to formulate the specification for these constructs repeatedly for each new DSL being designed. The authors propose the concept of computer language patterns to capture the well-known recurring notation styles seen in many computer languages. To address the communication problem, they let the domain expert propose a DSL notation by providing program examples written as they would appear in the desired DSL. Combining these two ideas, the chapter presents a method for example-driven DSL notation specification (EDNS), which uses computer language patterns for semi-automated inference of a DSL notation specification from the provided program examples.
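
As a toy illustration of the example-driven idea, the sketch below "infers" a handful of well-known notation patterns from two example programs; the pattern catalogue and heuristics are invented for illustration and are not the chapter's actual EDNS algorithm.

```python
# A toy sketch of example-driven notation inference in the spirit of EDNS.
# The "language patterns" below (key-value pair, block, separator) and the
# inference heuristics are illustrative assumptions; the chapter's actual
# pattern catalogue and inference algorithm are not given in the abstract.
import re

EXAMPLES = [
    "server { host = localhost; port = 8080; }",
    "client { retries = 3; timeout = 30; }",
]

def infer_patterns(examples):
    """Guess which well-known notation patterns the examples use."""
    patterns = {}
    text = "\n".join(examples)
    if re.search(r"\w+\s*=\s*\w+", text):
        patterns["key-value pair"] = "name '=' value"
    if "{" in text and "}" in text:
        patterns["block"] = "header '{' entries '}'"
    if ";" in text:
        patterns["entry separator"] = "';'"
    return patterns

for name, shape in infer_patterns(EXAMPLES).items():
    print(f"{name}: {shape}")
```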


Author(s):  
Thomas M. Powers ◽  
Jean-Gabriel Ganascia

This chapter discusses several challenges for doing the ethics of artificial intelligence (AI). The challenges fall into five major categories: conceptual ambiguities within philosophy and AI scholarship; the estimation of AI risks; implementing machine ethics; epistemic issues of scientific explanation and prediction in what can be called computational data science (CDS), which includes "big data" science; and oppositional versus systemic ethics approaches. The chapter then argues that these ethical problems are not likely to yield to the "common approaches" of applied ethics. Primarily due to the transformational nature of artificial intelligence within science, engineering, and human culture, novel approaches will be needed to address the ethics of AI in the future. Moreover, serious barriers to the formalization of ethics will need to be overcome in order to implement ethics in AI.


Author(s):  
Meike Klettke ◽  
Uta Störl

Data-driven methods and data science are important scientific methods in many research fields. All data science approaches require professional data engineering components. At the moment, computer science experts are needed to solve these data engineering tasks. At the same time, scientists from many fields (such as the natural sciences, medicine, environmental sciences, and engineering) want to analyse their data autonomously. The arising task for data engineering is the development of tools that support automated data curation and are utilisable by domain experts. In this article, we introduce four generations of data engineering approaches, classifying the data engineering technologies of the past and present. We show which data engineering tools are needed for the scientific landscape of the next decade.


F1000Research ◽  
2020 ◽  
Vol 9 ◽  
pp. 1478
Author(s):  
Jenna Oberstaller ◽  
Swamy Rakesh Adapa ◽  
Guy W. Dayhoff II ◽  
Justin Gibbons ◽  
Thomas E. Keller ◽  
...  

Microbiome data are undergoing exponential growth powered by rapid technological advancement. As the scope and depth of microbiome research increase, cross-disciplinary research is urgently needed for interpreting and harnessing the unprecedented data output. However, conventional research settings pose challenges to much-needed interdisciplinary research efforts due to barriers in scientific terminology, methodology, and research culture. To break down these barriers, our University of South Florida OneHealth Codeathon was designed to be an interactive, hands-on event that solves real-world data problems. The format brought together students, postdocs, faculty, researchers, and clinicians in a uniquely cross-disciplinary, team-focused setting. Teams were formed to encourage an equitable distribution of diverse domain experts and proficient programmers, with beginners to experts on each team. To unify the intellectual framework, we focused on microbiome interactions at different scales, from clinical to environmental sciences, leveraging local expertise in the fields of genetics, genomics, clinical data, and the social and geospatial sciences. As a result, teams developed working methods and pipelines to address major challenges in current microbiome research, including data integration, experimental power calculations, geospatial mapping, and machine-learning classifiers. This broad, transdisciplinary, and efficient workflow will serve as an example for future workshops aiming to deliver useful data-science products.


Author(s):  
Zailani Abdullah ◽  
Aggy Gusman ◽  
Tutut Herawan ◽  
Mustafa Mat Deris

One of the interesting and meaningful pieces of information hiding in a transactional database is the indirect association rule. It corresponds to a high dependency between two items that rarely occur together but are indirectly connected via other items. Since an indirect association rule is nontrivial information, it can implicitly offer a new perspective on relationships that cannot be directly observed from common rules. Therefore, we propose an algorithm for Mining Indirect Least Association Rules (MILAR) from real and benchmark datasets. MILAR is embedded with our scalable least measure, the Critical Relative Support (CRS). The experimental results show that MILAR can generate the desired rules in terms of least and indirect least association rules. In addition, the obtained results can be used by domain experts for further analysis, ultimately revealing more interesting findings.
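
For intuition, the sketch below mines indirect associations in the textbook sense: a rarely co-occurring item pair whose members each frequently co-occur with a shared mediator item. MILAR's actual CRS measure is not specified in the abstract, so this is not the authors' algorithm, just an illustration of the rule type.

```python
# A toy sketch of indirect association mining: find item pairs that
# rarely co-occur but where each item frequently co-occurs with a common
# "mediator" item. The thresholds and transactions are made up; MILAR's
# CRS measure is not given in the abstract, so this is not the authors'
# algorithm.
from itertools import combinations

TRANSACTIONS = [
    {"tea", "sugar"}, {"coffee", "sugar"}, {"tea", "sugar"},
    {"coffee", "sugar"}, {"coffee", "milk"}, {"tea", "biscuit"},
]

def support(itemset):
    return sum(itemset <= t for t in TRANSACTIONS) / len(TRANSACTIONS)

ITEMS = set().union(*TRANSACTIONS)
LOW, HIGH = 0.2, 0.3  # pair must be rare; each item+mediator must be common

for a, b in combinations(sorted(ITEMS), 2):
    if support({a, b}) < LOW:                 # a and b rarely co-occur
        for m in ITEMS - {a, b}:
            if support({a, m}) >= HIGH and support({b, m}) >= HIGH:
                print(f"indirect: {a} <-> {b} via mediator {m}")
```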


2021 ◽  
Vol 14 (10) ◽  
pp. 1743-1755
Author(s):  
Xinle Cao ◽  
Jian Liu ◽  
Hao Lu ◽  
Kui Ren

Encrypted databases are an innovative technology proposed to solve the data confidentiality issue in cloud-based DB systems. An encrypted database allows a data owner to encrypt its database before uploading it to the service provider, and allows the service provider to execute SQL queries over the encrypted data. Most existing encrypted databases (e.g., CryptDB in SOSP '11) do not support data interoperability: they are unable to process complex queries that require piping the output of one operation into another. To the best of our knowledge, SDB (SIGMOD '14) is the only encrypted database that achieves data interoperability. Unfortunately, we found that SDB is not secure. In this paper, we revisit the security of SDB and propose a ciphertext-only attack named the co-prime attack. It successfully attacks the common operations supported by SDB, including addition, comparison, sum, equi-join, and group-by. We evaluate our attack on three real-world benchmarks. For columns that support addition and comparison, we recover 84.9%--99.9% of plaintexts. For columns that support sum, equi-join, and group-by, we recover 100% of plaintexts. In addition, we provide potential countermeasures that can prevent the attacks against sum, equi-join, group-by, and addition; preventing the attack against comparison remains an open problem.
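
The abstract does not spell out the mechanics of the co-prime attack. As loose intuition for why shared multiplicative structure in ciphertexts can be fatal to a ciphertext-only adversary, consider the deliberately naive toy below; it is not SDB's actual encoding.

```python
# A toy illustration (not SDB's actual scheme) of the general intuition
# behind factor-based ciphertext-only attacks: if ciphertexts share a
# hidden multiplicative blinding factor, pairwise gcds expose it, and
# with it the plaintexts. The "encryption" below is a deliberately naive
# assumption for illustration only.
from math import gcd
from functools import reduce

SECRET = 982451653                                # hidden blinding factor
plaintexts = [42, 17, 99, 7]
ciphertexts = [p * SECRET for p in plaintexts]    # naive "encryption"

# Ciphertext-only adversary: the gcd of all ciphertexts equals the secret
# times the gcd of the plaintexts, which is often just 1.
candidate = reduce(gcd, ciphertexts)
recovered = [c // candidate for c in ciphertexts]
print(recovered)  # [42, 17, 99, 7]
```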

