Domain-Specific Multi-Level IR Rewriting for GPU

2021 ◽  
Vol 18 (4) ◽  
pp. 1-23
Author(s):  
Tobias Gysi ◽  
Christoph Müller ◽  
Oleksandr Zinenko ◽  
Stephan Herhut ◽  
Eddie Davis ◽  
...  

Most compilers have a single core intermediate representation (IR) (e.g., LLVM) sometimes complemented with vaguely defined IR-like data structures. This IR is commonly low-level and close to machine instructions. As a result, optimizations relying on domain-specific information are either not possible or require complex analysis to recover the missing information. In contrast, multi-level rewriting instantiates a hierarchy of dialects (IRs), lowers programs level-by-level, and performs code transformations at the most suitable level. We demonstrate the effectiveness of this approach for the weather and climate domain. In particular, we develop a prototype compiler and design stencil- and GPU-specific dialects based on a set of newly introduced design principles. We find that two domain-specific optimizations (500 lines of code) realized on top of LLVM’s extensible MLIR compiler infrastructure suffice to outperform state-of-the-art solutions. In essence, multi-level rewriting promises to herald the age of specialized compilers composed from domain- and target-specific dialects implemented on top of a shared infrastructure.
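A minimal sketch of the multi-level idea in plain Python (not MLIR; the dialect names, toy stencil representation, and rewrite are illustrative assumptions, not the paper's code): a domain-specific rewrite runs at the stencil level, where neighbourhood structure is still explicit, and only then is the program lowered level-by-level towards a GPU-style mapping.

```python
# Illustrative only: three hypothetical "dialects" -- stencil -> loop -> gpu.
# Each transformation runs at the level where the needed information is explicit.

def stencil_apply(field, offsets, weights):
    """High-level 'stencil dialect' op: a weighted sum over neighbour offsets."""
    return {"op": "stencil.apply", "field": field,
            "offsets": offsets, "weights": weights}

def fuse_duplicate_offsets(op):
    """Domain-specific rewrite, valid only while neighbourhood structure is visible."""
    merged = {}
    for o, w in zip(op["offsets"], op["weights"]):
        merged[o] = merged.get(o, 0.0) + w
    return stencil_apply(op["field"], list(merged), list(merged.values()))

def lower_to_loops(op, size):
    """Lower stencil.apply to an explicit loop nest ('loop dialect')."""
    body = [(i, list(zip(op["offsets"], op["weights"]))) for i in range(1, size - 1)]
    return {"op": "loop.for", "field": op["field"], "body": body}

def lower_to_gpu(loop_op, block=128):
    """Map loop iterations onto GPU-style thread blocks ('gpu dialect')."""
    iters = [i for i, _ in loop_op["body"]]
    blocks = [iters[i:i + block] for i in range(0, len(iters), block)]
    return {"op": "gpu.launch", "field": loop_op["field"], "blocks": blocks}

high = stencil_apply("u", offsets=[-1, 0, 1, 0], weights=[0.25, 0.25, 0.25, 0.25])
high = fuse_duplicate_offsets(high)       # rewrite at the stencil level
loops = lower_to_loops(high, size=1024)   # then lower level-by-level
gpu = lower_to_gpu(loops)
print(gpu["op"], len(gpu["blocks"]), "thread blocks")
```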

Author(s):  
Shuangjia Zheng ◽  
Jiahua Rao ◽  
Ying Song ◽  
Jixian Zhang ◽  
Xianglu Xiao ◽  
...  

Abstract Biomedical knowledge graphs (KGs), which can help with the understanding of complex biological systems and pathologies, have begun to play a critical role in medical practice and research. However, challenges remain in their embedding and use due to their complex nature and the specific demands of their construction. Existing studies often suffer from problems such as sparse and noisy datasets, insufficient modeling methods and non-uniform evaluation metrics. In this work, we established a comprehensive KG system for the biomedical field in an attempt to bridge the gap. Here, we introduced PharmKG, a multi-relational, attributed biomedical KG, composed of more than 500 000 individual interconnections between genes, drugs and diseases, with 29 relation types over a vocabulary of ~8000 disambiguated entities. Each entity in PharmKG is annotated with heterogeneous, domain-specific information obtained from multi-omics data, i.e., gene expression, chemical structure and disease word embedding, while preserving the semantic and biomedical features. For baselines, we offered nine state-of-the-art KG embedding (KGE) approaches and a new, biologically intuitive, graph neural network-based KGE method that uses a combination of both global network structure and heterogeneous domain features. Based on the proposed benchmark, we conducted extensive experiments to assess these KGE models using multiple evaluation metrics. Finally, we discussed our observations across various downstream biological tasks and provided insights and guidelines for how to use a KG in biomedicine. We hope that the unprecedented quality and diversity of PharmKG will lead to advances in biomedical KG construction, embedding and application.
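As a concrete illustration of one classic KGE scorer of the kind benchmarked on PharmKG, the sketch below implements TransE scoring over a toy graph (entity and relation names are hypothetical examples, not PharmKG's vocabulary; embeddings are random rather than trained).

```python
# Minimal TransE sketch: score a (head, relation, tail) triple by || h + r - t ||_1.
import numpy as np

rng = np.random.default_rng(0)
dim = 64
entities = {"GeneA": 0, "DrugB": 1, "DiseaseC": 2}
relations = {"treats": 0, "associates_with": 1}

E = rng.normal(size=(len(entities), dim))   # entity embeddings
R = rng.normal(size=(len(relations), dim))  # relation embeddings

def transe_score(head, rel, tail):
    """Lower is better: L1 distance between (head + relation) and tail."""
    h, r, t = E[entities[head]], R[relations[rel]], E[entities[tail]]
    return float(np.abs(h + r - t).sum())

# Rank candidate tails for the query (DrugB, treats, ?)
scores = {name: transe_score("DrugB", "treats", name) for name in entities}
print(sorted(scores, key=scores.get))
```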


Sensors ◽  
2021 ◽  
Vol 21 (14) ◽  
pp. 4890
Author(s):  
Athanasios Dimitriadis ◽  
Christos Prassas ◽  
Jose Luis Flores ◽  
Boonserm Kulvatunyou ◽  
Nenad Ivezic ◽  
...  

Cyber threat information sharing is an imperative process for achieving collaborative security, but it poses several challenges. One crucial challenge is the sheer volume of shared threat information, so the filtering of such information needs to be advanced. While the state of the art in filtering relies primarily on keyword- and domain-based searching, these approaches require sizable human involvement and domain expertise that is rarely available. Recent research revealed the need to harvest business information to fill the gap in filtering, although the resulting filtering based on such information remained coarse-grained. This paper presents a novel contextualized filtering approach that exploits standardized, multi-level contextual information about business processes. The contextual information describes the conditions under which given threat information is actionable from an organization's perspective. Filtering can therefore be automated by measuring the equivalence between the context of the shared threat information and the context of the consuming organization. The paper contributes directly to the filtering challenge and indirectly to automated, customized threat information sharing. Moreover, it proposes the architecture of a cyber threat information sharing ecosystem that operates according to the proposed filtering approach and defines the characteristics that are advantageous to filtering approaches. Implementation of the proposed approach can support compliance with Special Publication 800-150 of the National Institute of Standards and Technology.
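A hedged sketch of the core idea, comparing the context attached to a shared threat report against the consuming organization's business context and keeping only reports that match; the field names, the set-overlap similarity, and the 0.5 threshold are illustrative assumptions, not the paper's schema.

```python
# Context-based filtering sketch: retain only threat reports whose context
# overlaps the consuming organisation's business-process context.

def context_match(threat_ctx: dict, org_ctx: dict) -> float:
    """Fraction of threat-context attributes that the organisation satisfies."""
    shared_keys = threat_ctx.keys() & org_ctx.keys()
    if not shared_keys:
        return 0.0
    hits = sum(1 for k in shared_keys if threat_ctx[k] & org_ctx[k])
    return hits / len(threat_ctx)

org_context = {"platform": {"windows", "linux"},
               "sector": {"manufacturing"},
               "process": {"order-fulfilment"}}

shared_reports = [
    {"id": "TI-1", "context": {"platform": {"windows"}, "sector": {"manufacturing"}}},
    {"id": "TI-2", "context": {"platform": {"macos"}, "sector": {"finance"}}},
]

actionable = [r["id"] for r in shared_reports
              if context_match(r["context"], org_context) >= 0.5]
print(actionable)   # -> ['TI-1']
```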


Author(s):  
Yufei Li ◽  
Xiaoyong Ma ◽  
Xiangyu Zhou ◽  
Pengzhen Cheng ◽  
Kai He ◽  
...  

Abstract Motivation Bio-entity Coreference Resolution focuses on identifying coreferential links in biomedical texts, which is crucial for completing bio-events' attributes and interconnecting events into bio-networks. Previously, deep neural network-based general-domain systems, among the most powerful tools available, have been applied to the biomedical domain with domain-specific information integrated. However, such methods may introduce considerable noise because they do not sufficiently combine context with complex domain-specific information. Results In this paper, we explore how to leverage an external knowledge base in a fine-grained way to better resolve coreference by introducing a knowledge-enhanced Long Short-Term Memory network (LSTM), which can flexibly encode knowledge information inside the LSTM. Moreover, we propose a knowledge attention module to extract informative knowledge effectively based on context. Experimental results on the BioNLP and CRAFT datasets achieve state-of-the-art performance, with gains of 7.5 F1 on BioNLP and 10.6 F1 on CRAFT. Additional experiments also demonstrate superior performance on cross-sentence coreferences. Supplementary information Supplementary data are available at Bioinformatics online.
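An illustrative sketch of a knowledge-attention step of the kind described: external knowledge-base embeddings are weighted by their relevance to the current context before being mixed into the representation. The dimensions, the dot-product scoring, and the concatenation back into the LSTM input are assumptions for illustration, not the authors' exact model.

```python
# Context-conditioned attention over candidate knowledge-base entries.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def knowledge_attention(context_vec, kb_embeddings):
    """Attend over KB entries (rows) according to their relevance to the context."""
    scores = kb_embeddings @ context_vec        # relevance of each KB entry
    weights = softmax(scores)
    return weights @ kb_embeddings              # attended knowledge vector

rng = np.random.default_rng(1)
dim, n_entries = 32, 5
context = rng.normal(size=dim)                  # e.g. an LSTM hidden state
kb = rng.normal(size=(n_entries, dim))          # candidate KB entry embeddings

knowledge = knowledge_attention(context, kb)
enhanced = np.concatenate([context, knowledge]) # fed back into the network
print(enhanced.shape)                           # (64,)
```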


2004 ◽  
Vol 02 (01) ◽  
pp. 215-239 ◽  
Author(s):  
TOLGA CAN ◽  
YUAN-FANG WANG

We present a new method for conducting protein structure similarity searches, which improves on the efficiency of some existing techniques. Our method is grounded in the theory of differential geometry on 3D space curve matching. We generate shape signatures for proteins that are invariant, localized, robust, compact, and biologically meaningful. The invariance of the shape signatures allows us to improve similarity searching efficiency by adopting a hierarchical coarse-to-fine strategy. We index the shape signatures using an efficient hashing-based technique. With the help of this technique we screen out unlikely candidates and perform detailed pairwise alignments only for the small number of candidates that survive the screening process. In contrast to other hashing-based techniques, ours employs domain-specific information (not just geometric information) in constructing the hash key and hence is better tuned to the domain of biology. Furthermore, the invariance, localization, and compactness of the shape signatures allow us to utilize a well-known local sequence alignment algorithm for aligning two protein structures. One measure of the efficacy of the proposed technique is that we were able to perform structure alignment queries 36 times faster (on average) than a well-known method while keeping the quality of the query results at an approximately similar level.
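A hedged sketch of the coarse-to-fine strategy: invariant signatures are bucketed with a hash for cheap screening, and a detailed comparison runs only on the survivors. The signature used here (quantized inter-residue distances) is a simple stand-in for the paper's differential-geometry descriptors, and the fine score is likewise illustrative.

```python
# Coarse-to-fine screening sketch: hash on an invariant signature, then refine.
import numpy as np

def signature(coords, step=0.5):
    """Quantised distances between residues three apart: a crude invariant code."""
    d = np.linalg.norm(coords[3:] - coords[:-3], axis=1)
    return tuple(np.round(d / step).astype(int))

def build_index(database):
    index = {}
    for name, coords in database.items():
        index.setdefault(signature(coords), []).append(name)
    return index

def query(index, database, coords):
    candidates = index.get(signature(coords), [])        # cheap hash screen
    def fine_score(name):                                 # detailed comparison
        a, b = np.array(signature(coords)), np.array(signature(database[name]))
        n = min(len(a), len(b))
        return float(np.sqrt(np.mean((a[:n] - b[:n]) ** 2)))
    return sorted(candidates, key=fine_score)

rng = np.random.default_rng(2)
db = {f"prot{i}": np.cumsum(rng.normal(size=(50, 3)), axis=0) for i in range(100)}
index = build_index(db)
print(query(index, db, db["prot7"])[:1])   # the query structure finds itself
```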


Author(s):  
Therese Rieckh ◽  
Jeremiah P. Sjoberg ◽  
Richard A. Anthes

Abstract We apply the three-cornered hat (3CH) method to estimate refractivity, bending angle, and specific humidity error variances for a number of data sets widely used in research and/or operations: radiosondes, radio occultation (COSMIC, COSMIC-2), NCEP global forecasts, and nine reanalyses. We use a large number and combinations of data sets to obtain insights into the impact of the error correlations among different data sets that affect 3CH estimates. Error correlations may be caused by actual correlations of errors, representativeness differences, or imperfect co-location of the data sets. We show that the 3CH method discriminates among the data sets and how error statistics of observations compare to state-of-the-art reanalyses and forecasts, as well as reanalyses that do not assimilate satellite data. We explore results for October and November 2006 and 2019 over different latitudinal regions and show error growth of the NCEP forecasts with time. Because of the importance of tropospheric water vapor to weather and climate, we compare error estimates of refractivity for dry and moist atmospheric conditions.
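The 3CH algebra itself is compact: for three co-located data sets with (ideally) uncorrelated errors, each set's error variance follows from the three pairwise difference variances. The sketch below shows that standard estimate on synthetic data; the data sets and noise levels are invented purely to illustrate the calculation.

```python
# Three-cornered hat (3CH) error-variance estimate from pairwise differences.
import numpy as np

def three_cornered_hat(x, y, z):
    """Error-variance estimates for x, y, z, assuming mutually uncorrelated errors."""
    vxy, vxz, vyz = np.var(x - y), np.var(x - z), np.var(y - z)
    return {"x": 0.5 * (vxy + vxz - vyz),
            "y": 0.5 * (vxy + vyz - vxz),
            "z": 0.5 * (vxz + vyz - vxy)}

rng = np.random.default_rng(3)
truth = rng.normal(size=100_000)                           # synthetic "true" profile values
obs_a = truth + rng.normal(scale=1.0, size=truth.size)     # e.g. radiosonde-like errors
obs_b = truth + rng.normal(scale=0.5, size=truth.size)     # e.g. radio-occultation-like errors
obs_c = truth + rng.normal(scale=2.0, size=truth.size)     # e.g. forecast-like errors

print(three_cornered_hat(obs_a, obs_b, obs_c))             # ~ {'x': 1.0, 'y': 0.25, 'z': 4.0}
```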


2018 ◽  
Vol 11 (9) ◽  
pp. 3647-3657 ◽  
Author(s):  
Nathan Luke Abraham ◽  
Alexander T. Archibald ◽  
Paul Cresswell ◽  
Sam Cusworth ◽  
Mohit Dalvi ◽  
...  

Abstract. The Met Office Unified Model (UM) is a state-of-the-art weather and climate model that is used operationally worldwide. UKCA is the chemistry and aerosol sub-model of the UM that enables interactive composition and physical atmosphere interactions, but which adds an additional 120 000 lines of code to the model. Ensuring that the UM code and UM-UKCA (the UM running with interactive chemistry and aerosols) are well tested is thus essential. While a comprehensive test harness is in place at the Met Office and partner sites to aid in development, this is not available to many UM users. Recently, the Met Office have made available a virtual machine environment that can be used to run the UM on a desktop or laptop PC. Here we describe the development of a UM-UKCA configuration that is able to run within this virtual machine while only needing 6 GB of memory, before discussing the applications of this system for model development, testing, and training.


2020 ◽  
Author(s):  
Geoffrey Schau ◽  
Erik Burlingame ◽  
Young Hwan Chang

Abstract Deep learning systems have emerged as powerful mechanisms for learning domain translation models. However, in many cases, complete information in one domain is assumed to be necessary for sufficient cross-domain prediction. In this work, we motivate a formal justification for domain-specific information separation in a simple linear case and illustrate that a self-supervised approach enables domain translation between data domains while filtering out domain-specific data features. We introduce a novel approach to identify domain-specific information from sets of unpaired measurements in complementary data domains by considering a deep learning cross-domain autoencoder architecture designed to learn shared latent representations of data while enabling domain translation. We introduce an orthogonal gate block designed to enforce orthogonality of input feature sets by explicitly removing non-sharable information specific to each domain and illustrate separability of domain-specific information on a toy dataset.
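A minimal sketch of the underlying principle (an assumption-laden stand-in, not the authors' gate block): split the latent code into a shared part and a domain-specific part and penalize their cross-covariance, so that domain-specific information is kept out of the shared representation used for translation.

```python
# Orthogonality penalty between a "shared" latent block and a "domain-specific" one.
import numpy as np

def orthogonality_penalty(shared, specific):
    """Squared Frobenius norm of the cross-covariance between the two latent blocks."""
    shared = shared - shared.mean(axis=0)
    specific = specific - specific.mean(axis=0)
    cross = shared.T @ specific / shared.shape[0]
    return float(np.sum(cross ** 2))

rng = np.random.default_rng(4)
n, d = 256, 16
shared = rng.normal(size=(n, d))
leaky = shared * 0.8 + rng.normal(size=(n, d)) * 0.2   # leaks shared information
clean = rng.normal(size=(n, d))                        # independent domain-specific code

# The penalty is large when the "specific" block still carries shared information,
# which is exactly what a training loss would push down.
print(orthogonality_penalty(shared, leaky) > orthogonality_penalty(shared, clean))  # True
```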


2020 ◽  
Author(s):  
Thijs Dhollander ◽  
Adam Clemente ◽  
Mervyn Singh ◽  
Frederique Boonstra ◽  
Oren Civier ◽  
...  

Diffusion MRI has provided the neuroimaging community with a powerful tool to acquire in-vivo data sensitive to microstructural features of white matter, up to 3 orders of magnitude smaller than typical voxel sizes. The key to extracting such valuable information lies in complex modelling techniques, which form the link between the rich diffusion MRI data and various metrics related to the microstructural organisation. Over time, increasingly advanced techniques have been developed, up to the point where some diffusion MRI models can now provide access to properties specific to individual fibre populations in each voxel in the presence of multiple "crossing" fibre pathways. While highly valuable, such fibre-specific information poses unique challenges for typical image processing pipelines and statistical analysis. In this work, we review the "fixel-based analysis" (FBA) framework that implements bespoke solutions to this end, and has recently seen a stark increase in adoption for studies of both typical (healthy) populations as well as a wide range of clinical populations. We describe the main concepts related to fixel-based analyses, as well as the methods and specific steps involved in a state-of-the-art FBA pipeline, with a focus on providing researchers with practical advice on how to interpret results. We also include an overview of the scope of current fixel-based analysis studies (until August 2020), categorised across a broad range of neuroscientific domains, listing key design choices and summarising their main results and conclusions. Finally, we critically discuss several aspects and challenges involved with the fixel-based analysis framework, and outline some directions and future opportunities.


Author(s):  
Martin Monperrus ◽  
Jean-Marc Jézéquel ◽  
Joël Champeau ◽  
Brigitte Hoeltzener

Model-Driven Engineering (MDE) is an approach to software development that uses models as primary artifacts, from which code, documentation and tests are derived. One way of supporting quality assurance in a given domain is to define domain metrics. We show that some of these metrics are supported by models. As text documents, models can be considered from a syntactic point of view, i.e., thought of as graphs. We can readily apply graph-based metrics to them, such as the number of nodes, the number of edges or the fan-in/fan-out distributions. However, these metrics cannot leverage the semantic structuring enforced by each specific metamodel to give domain-specific information. Contrary to graph-based metrics, more specific metrics do exist for given domains (such as LOC for programs), but they lack genericity. Our contribution is to propose one metric, called s, that is generic over metamodels and allows the easy specification of an open-ended, wide range of model metrics.
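A toy sketch of what "generic over metamodels" can mean in practice: a single counting function is specialised by predicates expressed against metamodel concepts rather than hard-coded per domain. The class names, the `generic_metric` function, and the example metrics below are invented for illustration and are not the paper's definition of the metric.

```python
# One generic counting metric, specialised into domain metrics via metamodel predicates.

class ModelElement:
    def __init__(self, name, **refs):
        self.name = name
        self.refs = refs          # named references to other model elements

class StateMachine(ModelElement): pass
class State(ModelElement): pass
class Transition(ModelElement): pass

def generic_metric(model, predicate):
    """Count model elements satisfying a metamodel-level predicate."""
    return sum(1 for e in model if predicate(e))

s1, s2 = State("idle"), State("busy")
t1 = Transition("go", source=s1, target=s2)
model = [StateMachine("m", states=[s1, s2]), s1, s2, t1]

# Two domain metrics obtained by specialising the same generic metric:
n_states = generic_metric(model, lambda e: isinstance(e, State))
fan_out_idle = generic_metric(model, lambda e: isinstance(e, Transition)
                              and e.refs.get("source") is s1)
print(n_states, fan_out_idle)    # 2 1
```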

