Domain-Specific Multi-Level IR Rewriting for GPU

2021 ◽  
Vol 18 (4) ◽  
pp. 1-23
Author(s):  
Tobias Gysi ◽  
Christoph Müller ◽  
Oleksandr Zinenko ◽  
Stephan Herhut ◽  
Eddie Davis ◽  
...  

Most compilers have a single core intermediate representation (IR) (e.g., LLVM) sometimes complemented with vaguely defined IR-like data structures. This IR is commonly low-level and close to machine instructions. As a result, optimizations relying on domain-specific information are either not possible or require complex analysis to recover the missing information. In contrast, multi-level rewriting instantiates a hierarchy of dialects (IRs), lowers programs level-by-level, and performs code transformations at the most suitable level. We demonstrate the effectiveness of this approach for the weather and climate domain. In particular, we develop a prototype compiler and design stencil- and GPU-specific dialects based on a set of newly introduced design principles. We find that two domain-specific optimizations (500 lines of code) realized on top of LLVM’s extensible MLIR compiler infrastructure suffice to outperform state-of-the-art solutions. In essence, multi-level rewriting promises to herald the age of specialized compilers composed from domain- and target-specific dialects implemented on top of a shared infrastructure.
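A minimal sketch of the multi-level idea in plain Python (not MLIR; the dialect names, toy stencil representation, and rewrite are illustrative assumptions, not the paper's code): a domain-specific rewrite runs at the stencil level, where neighbourhood structure is still explicit, and only then is the program lowered level-by-level towards a GPU-style mapping.

```python
# Illustrative only: three hypothetical "dialects" -- stencil -> loop -> gpu.
# Each transformation runs at the level where the needed information is explicit.

def stencil_apply(field, offsets, weights):
    """High-level 'stencil dialect' op: a weighted sum over neighbour offsets."""
    return {"op": "stencil.apply", "field": field,
            "offsets": offsets, "weights": weights}

def fuse_duplicate_offsets(op):
    """Domain-specific rewrite, valid only while neighbourhood structure is visible."""
    merged = {}
    for o, w in zip(op["offsets"], op["weights"]):
        merged[o] = merged.get(o, 0.0) + w
    return stencil_apply(op["field"], list(merged), list(merged.values()))

def lower_to_loops(op, size):
    """Lower stencil.apply to an explicit loop nest ('loop dialect')."""
    body = [(i, list(zip(op["offsets"], op["weights"]))) for i in range(1, size - 1)]
    return {"op": "loop.for", "field": op["field"], "body": body}

def lower_to_gpu(loop_op, block=128):
    """Map loop iterations onto GPU-style thread blocks ('gpu dialect')."""
    iters = [i for i, _ in loop_op["body"]]
    blocks = [iters[i:i + block] for i in range(0, len(iters), block)]
    return {"op": "gpu.launch", "field": loop_op["field"], "blocks": blocks}

high = stencil_apply("u", offsets=[-1, 0, 1, 0], weights=[0.25, 0.25, 0.25, 0.25])
high = fuse_duplicate_offsets(high)       # rewrite at the stencil level
loops = lower_to_loops(high, size=1024)   # then lower level-by-level
gpu = lower_to_gpu(loops)
print(gpu["op"], len(gpu["blocks"]), "thread blocks")
```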

Author(s):  
Shuangjia Zheng ◽  
Jiahua Rao ◽  
Ying Song ◽  
Jixian Zhang ◽  
Xianglu Xiao ◽  
...  

Abstract Biomedical knowledge graphs (KGs), which can help with the understanding of complex biological systems and pathologies, have begun to play a critical role in medical practice and research. However, challenges remain in their embedding and use due to their complex nature and the specific demands of their construction. Existing studies often suffer from problems such as sparse and noisy datasets, insufficient modeling methods and non-uniform evaluation metrics. In this work, we established a comprehensive KG system for the biomedical field in an attempt to bridge the gap. Here, we introduced PharmKG, a multi-relational, attributed biomedical KG, composed of more than 500 000 individual interconnections between genes, drugs and diseases, with 29 relation types over a vocabulary of ~8000 disambiguated entities. Each entity in PharmKG is annotated with heterogeneous, domain-specific information obtained from multi-omics data, i.e., gene expression, chemical structure and disease word embedding, while preserving the semantic and biomedical features. For baselines, we offered nine state-of-the-art KG embedding (KGE) approaches and a new, biologically intuitive, graph neural network-based KGE method that uses a combination of both global network structure and heterogeneous domain features. Based on the proposed benchmark, we conducted extensive experiments to assess these KGE models using multiple evaluation metrics. Finally, we discussed our observations across various downstream biological tasks and provided insights and guidelines for how to use a KG in biomedicine. We hope that the unprecedented quality and diversity of PharmKG will lead to advances in biomedical KG construction, embedding and application.
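As a concrete illustration of one classic KGE scorer of the kind benchmarked on PharmKG, the sketch below implements TransE scoring over a toy graph (entity and relation names are hypothetical examples, not PharmKG's vocabulary; embeddings are random rather than trained).

```python
# Minimal TransE sketch: score a (head, relation, tail) triple by || h + r - t ||_1.
import numpy as np

rng = np.random.default_rng(0)
dim = 64
entities = {"GeneA": 0, "DrugB": 1, "DiseaseC": 2}
relations = {"treats": 0, "associates_with": 1}

E = rng.normal(size=(len(entities), dim))   # entity embeddings
R = rng.normal(size=(len(relations), dim))  # relation embeddings

def transe_score(head, rel, tail):
    """Lower is better: L1 distance between (head + relation) and tail."""
    h, r, t = E[entities[head]], R[relations[rel]], E[entities[tail]]
    return float(np.abs(h + r - t).sum())

# Rank candidate tails for the query (DrugB, treats, ?)
scores = {name: transe_score("DrugB", "treats", name) for name in entities}
print(sorted(scores, key=scores.get))
```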


Sensors ◽  
2021 ◽  
Vol 21 (14) ◽  
pp. 4890
Author(s):  
Athanasios Dimitriadis ◽  
Christos Prassas ◽  
Jose Luis Flores ◽  
Boonserm Kulvatunyou ◽  
Nenad Ivezic ◽  
...  

Cyber threat information sharing is an imperative process for achieving collaborative security, but it poses several challenges. One crucial challenge is the sheer volume of shared threat information, so the filtering of such information needs to be advanced. While the state of the art in filtering relies primarily on keyword- and domain-based searching, these approaches require sizable human involvement and domain expertise that is rarely available. Recent research revealed the need to harvest business information to fill the gap in filtering, although the resulting filtering based on such information remained coarse-grained. This paper presents a novel contextualized filtering approach that exploits standardized, multi-level contextual information about business processes. The contextual information describes the conditions under which given threat information is actionable from an organization's perspective. Filtering can therefore be automated by measuring the equivalence between the context of the shared threat information and the context of the consuming organization. The paper contributes directly to the filtering challenge and indirectly to automated, customized threat information sharing. Moreover, it proposes the architecture of a cyber threat information sharing ecosystem that operates according to the proposed filtering approach and defines the characteristics that are advantageous to filtering approaches. Implementation of the proposed approach can support compliance with Special Publication 800-150 of the National Institute of Standards and Technology.
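A hedged sketch of the core idea, comparing the context attached to a shared threat report against the consuming organization's business context and keeping only reports that match; the field names, the set-overlap similarity, and the 0.5 threshold are illustrative assumptions, not the paper's schema.

```python
# Context-based filtering sketch: retain only threat reports whose context
# overlaps the consuming organisation's business-process context.

def context_match(threat_ctx: dict, org_ctx: dict) -> float:
    """Fraction of threat-context attributes that the organisation satisfies."""
    shared_keys = threat_ctx.keys() & org_ctx.keys()
    if not shared_keys:
        return 0.0
    hits = sum(1 for k in shared_keys if threat_ctx[k] & org_ctx[k])
    return hits / len(threat_ctx)

org_context = {"platform": {"windows", "linux"},
               "sector": {"manufacturing"},
               "process": {"order-fulfilment"}}

shared_reports = [
    {"id": "TI-1", "context": {"platform": {"windows"}, "sector": {"manufacturing"}}},
    {"id": "TI-2", "context": {"platform": {"macos"}, "sector": {"finance"}}},
]

actionable = [r["id"] for r in shared_reports
              if context_match(r["context"], org_context) >= 0.5]
print(actionable)   # -> ['TI-1']
```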


Author(s):  
Yufei Li ◽  
Xiaoyong Ma ◽  
Xiangyu Zhou ◽  
Pengzhen Cheng ◽  
Kai He ◽  
...  

Abstract Motivation Bio-entity Coreference Resolution focuses on identifying coreferential links in biomedical texts, which is crucial for completing bio-events' attributes and interconnecting events into bio-networks. Previously, deep neural network-based general-domain systems, among the most powerful tools available, have been applied to the biomedical domain with domain-specific information integrated. However, such methods may introduce considerable noise because they do not sufficiently combine context with complex domain-specific information. Results In this paper, we explore how to leverage an external knowledge base in a fine-grained way to better resolve coreference by introducing a knowledge-enhanced Long Short-Term Memory network (LSTM), which can flexibly encode knowledge information inside the LSTM. Moreover, we propose a knowledge attention module to extract informative knowledge effectively based on context. Experimental results on the BioNLP and CRAFT datasets achieve state-of-the-art performance, with gains of 7.5 F1 on BioNLP and 10.6 F1 on CRAFT. Additional experiments also demonstrate superior performance on cross-sentence coreferences. Supplementary information Supplementary data are available at Bioinformatics online.
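An illustrative sketch of a knowledge-attention step of the kind described: external knowledge-base embeddings are weighted by their relevance to the current context before being mixed into the representation. The dimensions, the dot-product scoring, and the concatenation back into the LSTM input are assumptions for illustration, not the authors' exact model.

```python
# Context-conditioned attention over candidate knowledge-base entries.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def knowledge_attention(context_vec, kb_embeddings):
    """Attend over KB entries (rows) according to their relevance to the context."""
    scores = kb_embeddings @ context_vec        # relevance of each KB entry
    weights = softmax(scores)
    return weights @ kb_embeddings              # attended knowledge vector

rng = np.random.default_rng(1)
dim, n_entries = 32, 5
context = rng.normal(size=dim)                  # e.g. an LSTM hidden state
kb = rng.normal(size=(n_entries, dim))          # candidate KB entry embeddings

knowledge = knowledge_attention(context, kb)
enhanced = np.concatenate([context, knowledge]) # fed back into the network
print(enhanced.shape)                           # (64,)
```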


2004 ◽  
Vol 02 (01) ◽  
pp. 215-239 ◽  
Author(s):  
TOLGA CAN ◽  
YUAN-FANG WANG

We present a new method for conducting protein structure similarity searches, which improves on the efficiency of some existing techniques. Our method is grounded in the theory of differential geometry on 3D space curve matching. We generate shape signatures for proteins that are invariant, localized, robust, compact, and biologically meaningful. The invariance of the shape signatures allows us to improve similarity searching efficiency by adopting a hierarchical coarse-to-fine strategy. We index the shape signatures using an efficient hashing-based technique. With the help of this technique we screen out unlikely candidates and perform detailed pairwise alignments only for the small number of candidates that survive the screening process. In contrast to other hashing-based techniques, ours employs domain-specific information (not just geometric information) in constructing the hash key and hence is better tuned to the domain of biology. Furthermore, the invariance, localization, and compactness of the shape signatures allow us to utilize a well-known local sequence alignment algorithm for aligning two protein structures. One measure of the efficacy of the proposed technique is that we were able to perform structure alignment queries 36 times faster (on average) than a well-known method while keeping the quality of the query results at an approximately similar level.
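A hedged sketch of the coarse-to-fine strategy: invariant signatures are bucketed with a hash for cheap screening, and a detailed comparison runs only on the survivors. The signature used here (quantized inter-residue distances) is a simple stand-in for the paper's differential-geometry descriptors, and the fine score is likewise illustrative.

```python
# Coarse-to-fine screening sketch: hash on an invariant signature, then refine.
import numpy as np

def signature(coords, step=0.5):
    """Quantised distances between residues three apart: a crude invariant code."""
    d = np.linalg.norm(coords[3:] - coords[:-3], axis=1)
    return tuple(np.round(d / step).astype(int))

def build_index(database):
    index = {}
    for name, coords in database.items():
        index.setdefault(signature(coords), []).append(name)
    return index

def query(index, database, coords):
    candidates = index.get(signature(coords), [])        # cheap hash screen
    def fine_score(name):                                 # detailed comparison
        a, b = np.array(signature(coords)), np.array(signature(database[name]))
        n = min(len(a), len(b))
        return float(np.sqrt(np.mean((a[:n] - b[:n]) ** 2)))
    return sorted(candidates, key=fine_score)

rng = np.random.default_rng(2)
db = {f"prot{i}": np.cumsum(rng.normal(size=(50, 3)), axis=0) for i in range(100)}
index = build_index(db)
print(query(index, db, db["prot7"])[:1])   # the query structure finds itself
```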


Author(s):  
Therese Rieckh ◽  
Jeremiah P. Sjoberg ◽  
Richard A. Anthes

Abstract We apply the three-cornered hat (3CH) method to estimate refractivity, bending angle, and specific humidity error variances for a number of data sets widely used in research and/or operations: radiosondes, radio occultation (COSMIC, COSMIC-2), NCEP global forecasts, and nine reanalyses. We use a large number and combinations of data sets to obtain insights into the impact of the error correlations among different data sets that affect 3CH estimates. Error correlations may be caused by actual correlations of errors, representativeness differences, or imperfect co-location of the data sets. We show that the 3CH method discriminates among the data sets and how error statistics of observations compare to state-of-the-art reanalyses and forecasts, as well as reanalyses that do not assimilate satellite data. We explore results for October and November 2006 and 2019 over different latitudinal regions and show error growth of the NCEP forecasts with time. Because of the importance of tropospheric water vapor to weather and climate, we compare error estimates of refractivity for dry and moist atmospheric conditions.
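The 3CH algebra itself is compact: for three co-located data sets with (ideally) uncorrelated errors, each set's error variance follows from the three pairwise difference variances. The sketch below shows that standard estimate on synthetic data; the data sets and noise levels are invented purely to illustrate the calculation.

```python
# Three-cornered hat (3CH) error-variance estimate from pairwise differences.
import numpy as np

def three_cornered_hat(x, y, z):
    """Error-variance estimates for x, y, z, assuming mutually uncorrelated errors."""
    vxy, vxz, vyz = np.var(x - y), np.var(x - z), np.var(y - z)
    return {"x": 0.5 * (vxy + vxz - vyz),
            "y": 0.5 * (vxy + vyz - vxz),
            "z": 0.5 * (vxz + vyz - vxy)}

rng = np.random.default_rng(3)
truth = rng.normal(size=100_000)                           # synthetic "true" profile values
obs_a = truth + rng.normal(scale=1.0, size=truth.size)     # e.g. radiosonde-like errors
obs_b = truth + rng.normal(scale=0.5, size=truth.size)     # e.g. radio-occultation-like errors
obs_c = truth + rng.normal(scale=2.0, size=truth.size)     # e.g. forecast-like errors

print(three_cornered_hat(obs_a, obs_b, obs_c))             # ~ {'x': 1.0, 'y': 0.25, 'z': 4.0}
```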


2018 ◽  
Vol 11 (9) ◽  
pp. 3647-3657 ◽  
Author(s):  
Nathan Luke Abraham ◽  
Alexander T. Archibald ◽  
Paul Cresswell ◽  
Sam Cusworth ◽  
Mohit Dalvi ◽  
...  

Abstract. The Met Office Unified Model (UM) is a state-of-the-art weather and climate model that is used operationally worldwide. UKCA is the chemistry and aerosol sub-model of the UM that enables interactive composition and physical atmosphere interactions, but which adds an additional 120 000 lines of code to the model. Ensuring that the UM code and UM-UKCA (the UM running with interactive chemistry and aerosols) are well tested is thus essential. While a comprehensive test harness is in place at the Met Office and partner sites to aid in development, this is not available to many UM users. Recently, the Met Office have made available a virtual machine environment that can be used to run the UM on a desktop or laptop PC. Here we describe the development of a UM-UKCA configuration that is able to run within this virtual machine while only needing 6 GB of memory, before discussing the applications of this system for model development, testing, and training.


2020 ◽  
Author(s):  
Geoffrey Schau ◽  
Erik Burlingame ◽  
Young Hwan Chang

Abstract Deep learning systems have emerged as powerful mechanisms for learning domain translation models. However, in many cases, complete information in one domain is assumed to be necessary for sufficient cross-domain prediction. In this work, we motivate a formal justification for domain-specific information separation in a simple linear case and illustrate that a self-supervised approach enables domain translation between data domains while filtering out domain-specific data features. We introduce a novel approach to identify domain-specific information from sets of unpaired measurements in complementary data domains by considering a deep learning cross-domain autoencoder architecture designed to learn shared latent representations of data while enabling domain translation. We introduce an orthogonal gate block designed to enforce orthogonality of input feature sets by explicitly removing non-sharable information specific to each domain and illustrate separability of domain-specific information on a toy dataset.
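A minimal sketch of the underlying principle (an assumption-laden stand-in, not the authors' gate block): split the latent code into a shared part and a domain-specific part and penalize their cross-covariance, so that domain-specific information is kept out of the shared representation used for translation.

```python
# Orthogonality penalty between a "shared" latent block and a "domain-specific" one.
import numpy as np

def orthogonality_penalty(shared, specific):
    """Squared Frobenius norm of the cross-covariance between the two latent blocks."""
    shared = shared - shared.mean(axis=0)
    specific = specific - specific.mean(axis=0)
    cross = shared.T @ specific / shared.shape[0]
    return float(np.sum(cross ** 2))

rng = np.random.default_rng(4)
n, d = 256, 16
shared = rng.normal(size=(n, d))
leaky = shared * 0.8 + rng.normal(size=(n, d)) * 0.2   # leaks shared information
clean = rng.normal(size=(n, d))                        # independent domain-specific code

# The penalty is large when the "specific" block still carries shared information,
# which is exactly what a training loss would push down.
print(orthogonality_penalty(shared, leaky) > orthogonality_penalty(shared, clean))  # True
```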


2020 ◽  
Author(s):  
Thijs Dhollander ◽  
Adam Clemente ◽  
Mervyn Singh ◽  
Frederique Boonstra ◽  
Oren Civier ◽  
...  

Diffusion MRI has provided the neuroimaging community with a powerful tool to acquire in-vivo data sensitive to microstructural features of white matter, up to 3 orders of magnitude smaller than typical voxel sizes. The key to extracting such valuable information lies in complex modelling techniques, which form the link between the rich diffusion MRI data and various metrics related to the microstructural organisation. Over time, increasingly advanced techniques have been developed, up to the point where some diffusion MRI models can now provide access to properties specific to individual fibre populations in each voxel in the presence of multiple "crossing" fibre pathways. While highly valuable, such fibre-specific information poses unique challenges for typical image processing pipelines and statistical analysis. In this work, we review the "fixel-based analysis" (FBA) framework that implements bespoke solutions to this end, and has recently seen a stark increase in adoption for studies of both typical (healthy) populations as well as a wide range of clinical populations. We describe the main concepts related to fixel-based analyses, as well as the methods and specific steps involved in a state-of-the-art FBA pipeline, with a focus on providing researchers with practical advice on how to interpret results. We also include an overview of the scope of current fixel-based analysis studies (until August 2020), categorised across a broad range of neuroscientific domains, listing key design choices and summarising their main results and conclusions. Finally, we critically discuss several aspects and challenges involved with the fixel-based analysis framework, and outline some directions and future opportunities.


Author(s):  
Martin Monperrus ◽  
Jean-Marc Jézéquel ◽  
Joël Champeau ◽  
Brigitte Hoeltzener

Model-Driven Engineering (MDE) is an approach to software development that uses models as primary artifacts, from which code, documentation and tests are derived. One way of supporting quality assurance in a given domain is to define domain metrics. We show that some of these metrics are supported by models. As text documents, models can be considered from a syntactic point of view, i.e., thought of as graphs. We can readily apply graph-based metrics to them, such as the number of nodes, the number of edges or the fan-in/fan-out distributions. However, these metrics cannot leverage the semantic structuring enforced by each specific metamodel to give domain-specific information. Contrary to graph-based metrics, more specific metrics do exist for given domains (such as LOC for programs), but they lack genericity. Our contribution is to propose one metric, called s, that is generic over metamodels and allows the easy specification of an open-ended, wide range of model metrics.
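A toy sketch of what "generic over metamodels" can mean in practice: a single counting function is specialised by predicates expressed against metamodel concepts rather than hard-coded per domain. The class names, the `generic_metric` function, and the example metrics below are invented for illustration and are not the paper's definition of the metric.

```python
# One generic counting metric, specialised into domain metrics via metamodel predicates.

class ModelElement:
    def __init__(self, name, **refs):
        self.name = name
        self.refs = refs          # named references to other model elements

class StateMachine(ModelElement): pass
class State(ModelElement): pass
class Transition(ModelElement): pass

def generic_metric(model, predicate):
    """Count model elements satisfying a metamodel-level predicate."""
    return sum(1 for e in model if predicate(e))

s1, s2 = State("idle"), State("busy")
t1 = Transition("go", source=s1, target=s2)
model = [StateMachine("m", states=[s1, s2]), s1, s2, t1]

# Two domain metrics obtained by specialising the same generic metric:
n_states = generic_metric(model, lambda e: isinstance(e, State))
fan_out_idle = generic_metric(model, lambda e: isinstance(e, Transition)
                              and e.refs.get("source") is s1)
print(n_states, fan_out_idle)    # 2 1
```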

