Creating domain specific metadata for scientific data and knowledge bases

1991 ◽  
Vol 3 (4) ◽  
pp. 421-434 ◽  
Author(s):  
J. Diederich ◽  
J. Milton
Author(s):  
Aparna S. Varde ◽  
Shuhui Ma ◽  
Mohammed Maniruzzaman ◽  
David C. Brown ◽  
Elke A. Rundensteiner ◽  
...  

Abstract Scientific data is often analyzed in the context of domain-specific problems, for example, failure diagnostics, predictive analysis, and computational estimation. These problems can be solved using approaches such as mathematical models or heuristic methods. In this paper we compare a heuristic approach based on mining stored data with a mathematical approach based on applying state-of-the-art formulae to solve an estimation problem. The goal is to estimate results of scientific experiments given their input conditions. We present a comparative study based on sample space, time complexity, and data storage with respect to a real application in materials science. Performance evaluation with real materials science data is also presented, taking into account accuracy and efficiency. We find that both approaches have their pros and cons in computational estimation. Similar arguments can be applied to other scientific problems such as failure diagnostics and predictive analysis. In the estimation problem in this paper, heuristic methods outperform mathematical models.


2019 ◽  
Vol 21 (6) ◽  
pp. 1937-1953 ◽  
Author(s):  
Jussi Paananen ◽  
Vittorio Fortino

Abstract The drug discovery process starts with identification of a disease-modifying target. This critical step traditionally begins with manual investigation of scientific literature and biomedical databases to gather evidence linking molecular target to disease, and to evaluate the efficacy, safety and commercial potential of the target. The high-throughput and affordability of current omics technologies, allowing quantitative measurements of many putative targets (e.g. DNA, RNA, protein, metabolite), has exponentially increased the volume of scientific data available for this arduous task. Therefore, computational platforms identifying and ranking disease-relevant targets from existing biomedical data sources, including omics databases, are needed. To date, more than 30 drug target discovery (DTD) platforms exist. They provide information-rich databases and graphical user interfaces to help scientists identify putative targets and pre-evaluate their therapeutic efficacy and potential side effects. Here we survey and compare a set of popular DTD platforms that utilize multiple data sources and omics-driven knowledge bases (either directly or indirectly) for identifying drug targets. We also provide a description of omics technologies and related data repositories which are important for DTD tasks.


2007 ◽  
Vol 33 (1) ◽  
pp. 41-61 ◽  
Author(s):  
Diego Mollá ◽  
José Luis Vicedo

Automated question answering has been a topic of research and development since the earliest AI applications. Computing power has increased since the first such systems were developed, and the general methodology has changed from the use of hand-encoded knowledge bases about simple domains to the use of text collections as the main knowledge source over more complex domains. Still, many research issues remain. The focus of this article is on the use of restricted domains for automated question answering. The article contains a historical perspective on question answering over restricted domains and an overview of the current methods and applications used in restricted domains. A main characteristic of question answering in restricted domains is the integration of domain-specific information that is either developed for question answering or that has been developed for other purposes. We explore the main methods developed to leverage this domain-specific information.


Author(s):  
WILLIAM H. WOOD ◽  
HUI DONG ◽  
CLIVE L. DYM

Design couples synthesis and analysis in iterative cycles, alternately generating solutions and evaluating their validity. The accuracy and depth of evaluation have increased markedly because of the availability of powerful simulation tools and the development of domain-specific knowledge bases. Efforts to extend the state of the art in evaluation have unfortunately been carried out in stovepipe fashion, depending on domain-specific views both of function and of what constitutes “good” design. Although synthesis as practiced by humans is an intentional process that centers on the notion of function, computational synthesis often eschews such intention for sheer permutation. Rather than combining synthesis and analysis to form an integrated design environment, current methods focus on comprehensive search for solutions within highly circumscribed subdomains of design. This paper presents an overview of the progress made in representing design function across abstraction levels proven useful to human designers. Through an example application in the domain of mechatronics, these representations are integrated across domains and throughout the design process.


2015 ◽  
Vol 48 (1) ◽  
pp. 301-305 ◽  
Author(s):  
Mark Könnecke ◽  
Frederick A. Akeroyd ◽  
Herbert J. Bernstein ◽  
Aaron S. Brewster ◽  
Stuart I. Campbell ◽  
...  

NeXus is an effort by an international group of scientists to define a common data exchange and archival format for neutron, X-ray and muon experiments. NeXus is built on top of the scientific data format HDF5 and adds domain-specific rules for organizing data within HDF5 files, in addition to a dictionary of well defined domain-specific field names. The NeXus data format has two purposes. First, it defines a format that can serve as a container for all relevant data associated with a beamline. This is a very important use case. Second, it defines standards in the form of application definitions for the exchange of data between applications. NeXus provides structures for raw experimental data as well as for processed data.
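The idea of layering domain-specific rules on a generic container can be illustrated with a minimal sketch. It does not use HDF5 or any NeXus library: nested dictionaries stand in for HDF5 groups, and the field names (`counts`, `two_theta`, `name`) and the helper `groups_by_class` are hypothetical. The only convention borrowed from NeXus is that each group carries an `NX_class` attribute naming its base class (for example `NXentry` or `NXdata`).

```python
# Hypothetical in-memory stand-in for a NeXus-style hierarchy.
# Each dict plays the role of an HDF5 group and records its class in "NX_class".
entry = {
    "NX_class": "NXentry",
    "instrument": {
        "NX_class": "NXinstrument",
        "name": "example beamline",  # illustrative value only
    },
    "data": {
        "NX_class": "NXdata",
        "counts": [12, 15, 9],            # illustrative signal values
        "two_theta": [10.0, 10.5, 11.0],  # illustrative axis values
    },
}


def groups_by_class(group, path="/entry"):
    """Yield (path, NX_class) for a group and every nested group."""
    yield path, group["NX_class"]
    for key, value in group.items():
        if isinstance(value, dict):
            yield from groups_by_class(value, f"{path}/{key}")


# Map every group path to its declared class.
layout = dict(groups_by_class(entry))
for group_path, nx_class in layout.items():
    print(group_path, nx_class)
```

A reader that knows the class of each group can locate the plottable data or the instrument description without depending on how a particular facility named its groups, which is the kind of interoperability the shared rules and field dictionary are meant to provide.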


2004 ◽  
Vol 13 (03) ◽  
pp. 721-738 ◽  
Author(s):  
XIAOYING GAO ◽  
MENGJIE ZHANG

This paper describes a learning/adaptive approach to automatically building knowledge bases for information extraction from text-based web pages. A frame-based representation is introduced to represent domain knowledge as knowledge unit frames. A frame learning algorithm is developed to automatically learn knowledge unit frames from training examples. Some training examples can be obtained by automatically parsing a number of tabular web pages in the same domain, which greatly reduces the amount of time-consuming manual work. This approach was investigated on ten web sites of real estate advertisements and car advertisements, and nearly all the information was successfully extracted with very few false alarms. These results suggest that both the knowledge unit frame representation and the frame learning algorithm work well, that domain-specific knowledge bases can be learned from training examples, and that the domain-specific knowledge base can be used for information extraction from flexible text-based semi-structured Web pages on multiple Web sites. The investigation of the knowledge representation on five other domains suggests that this approach can be easily applied to other domains by simply changing the training examples.


2014 ◽  
Vol 9 (3) ◽  
Author(s):  
Jingjun Ge ◽  
Changjun Hu ◽  
Xin Liu ◽  
Wei Lin ◽  
Haolei Zuo

2020 ◽  
Author(s):  
Chad Trabant ◽  
Rick Benson ◽  
Rob Casey ◽  
Gillian Sharer ◽  
Jerry Carter

The data center of the National Science Foundation’s Seismological Facility for the Advancement of Geoscience (SAGE), operated by IRIS Data Services, has evolved over the past 30 years to address the data accessibility needs of the scientific research community. In recent years a broad call for adherence to FAIR data principles has prompted repositories to increase their activity in support of them. As these principles are well aligned with the needs of data users, many of the FAIR principles are already supported and actively promoted by IRIS. Standardized metadata and data identifiers support findability. Open and standardized web services enable a high degree of accessibility. Interoperability is ensured by offering data in a combination of rich, domain-specific formats in addition to simple, text-based formats. The use of open, rich (domain-specific) format standards enables a high degree of reuse. Further advancement towards these principles includes the introduction and dissemination of DOIs for data, and the introduction of Linked Data support, via JSON-LD, allowing scientific data brokers, catalogers and generic search systems to discover data. Naturally, some challenges remain, such as the granularity and mechanisms needed for persistent IDs for data; the reality that metadata is updated with corrections (having implications for FAIR data principles); and the complexity of data licensing in a repository with data contributed from individual PIs, national observatories, and international collaborations. In summary, IRIS Data Services is well along the path of adherence to FAIR data principles, with more work to do. We will present the current status of these efforts and describe the key challenges that remain.
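As a rough illustration of how JSON-LD can expose a dataset to generic search systems, the sketch below builds a small record with the Python standard library. The abstract does not say which vocabulary or fields IRIS uses, so the schema.org `Dataset` type, the chosen properties, the helper `dataset_jsonld`, and every value shown (including the DOI) are assumptions made for this example only.

```python
import json


def dataset_jsonld(name, description, doi, license_url):
    """Build a schema.org Dataset description as a JSON-LD dictionary.

    Hypothetical helper: the vocabulary and fields are assumptions,
    not the repository's actual metadata profile.
    """
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        # A DOI written as a resolvable URL acts as the persistent identifier.
        "identifier": f"https://doi.org/{doi}",
        "license": license_url,
    }


record = dataset_jsonld(
    name="Example seismic waveform collection",  # placeholder title
    description="Placeholder description of a contributed data set.",
    doi="10.0000/example",  # placeholder, not a registered DOI
    license_url="https://creativecommons.org/licenses/by/4.0/",
)

# Serialized form that a landing page could embed for harvesters.
print(json.dumps(record, indent=2))
```

Because the identifier and license travel with the record, a sketch like this also makes the remaining challenges concrete: the granularity at which a DOI is assigned and the license attached to each contribution both have to be decided before such a record can be published.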


2021 ◽  
Author(s):  
N.O. Dorodnykh ◽  
Y.V. Kotlov ◽  
O.A. Nikolaychuk ◽  
V.M. Popov ◽  
A.Y. Yurin

The complexity of creating artificial intelligence applications remains high. One factor behind this complexity is the high level of programming qualification required of developers. Development complexity can be reduced by using methods and tools based on a paradigm known as End-user development. One problem that calls for the methods of this paradigm is the development of intelligent systems that support fault search and troubleshooting on board aircraft. Some tasks connected with this problem are identified, including the dynamic formation of task cards for troubleshooting, in terms of forming a list of operations. This paper presents a solution to this problem based on some principles of End-user development: model-driven development, visual programming, and wizard form-filling. In particular, an extension of the Prototyping expert systems based on transformations technology, which implements End-user development, is proposed in the context of the problem to be solved for Sukhoi Superjet aircraft. The main contributions of the work are as follows: the main method of the technology is extended to support the event tree formalism (a popular expert method for formalizing scenarios for the development of problem situations and their localization); a domain-specific tool (namely, Extended event tree editor) is created for building standard and extended event trees, including for diagnostic tasks; and a module is developed for supporting transformations of the XML-like event tree representation format for the knowledge base prototyping system, Personal knowledge base designer. A description of the proposed extension and the means of its implementation, as well as an illustrative example, are provided.

