Planware-domain-specific synthesis of high-performance schedulers

Author(s):  
L. Blaine ◽  
L. Gilham ◽  
Junbo Liu ◽  
D.R. Smith ◽  
S. Westfold
2020 ◽  
Author(s):  
Jamie Buck ◽  
Rena Subotnik ◽  
Frank Worrell ◽  
Paula Olszewski-Kubilius ◽  
Chi Wang

2021 ◽  
Author(s):  
Nicolas Le Guillarme ◽  
Wilfried Thuiller

1. Given the biodiversity crisis, we need more than ever to access information on multiple taxa (e.g. distribution, traits, diet) in the scientific literature to understand, map and predict all-inclusive biodiversity. Tools are needed to automatically extract useful information from the ever-growing corpus of ecological texts and feed this information to open data repositories. A prerequisite is the ability to recognise mentions of taxa in text, a special case of named entity recognition (NER). In recent years, deep learning-based NER systems have become ubiquitous, yielding state-of-the-art results in the general and biomedical domains. However, no such tool is available to ecologists wishing to extract information from the biodiversity literature. 2. We propose a new tool called TaxoNERD that provides two deep neural network (DNN) models to recognise taxon mentions in ecological documents. To achieve high performance, DNN-based NER models usually need to be trained on a large corpus of manually annotated text. Creating such a gold standard corpus (GSC) is a laborious and costly process, with the result that GSCs in the ecological domain tend to be too small to learn an accurate DNN model from scratch. To address this issue, we leverage existing DNN models pretrained on large biomedical corpora using transfer learning. The performance of our models is evaluated on four GSCs and compared to the most popular taxonomic NER tools. 3. Our experiments suggest that existing taxonomic NER tools are not suited to the extraction of ecological information from text, as they performed poorly on ecologically oriented corpora: either they do not take account of the variability of taxon naming practices, or they do not generalise well to the ecological domain. Conversely, a domain-specific DNN-based tool like TaxoNERD outperformed the other approaches on an ecological information extraction task. 4. Efforts are needed to raise ecological information extraction to the same level of performance as its biomedical counterpart. One promising direction is to leverage the huge corpus of unlabelled ecological texts to learn a language representation model that could benefit downstream tasks. These efforts could be highly beneficial to ecologists in the long term.
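To make concrete why rule-based taxonomic NER struggles with the variability of taxon naming practices, here is a minimal, purely illustrative binomial matcher in Python. The regex and example sentence are our own, not part of TaxoNERD: it catches canonical Latin binomials and abbreviated genus forms, but misses vernacular names such as "oaks", which is the kind of gap the abstract says DNN-based models close.

```python
import re

# Naive pattern for Latin binomials: a capitalised genus (or its
# abbreviation "G.") followed by a lowercase specific epithet.
# Purely illustrative: vernacular names, higher taxa and misspellings
# all escape such hand-written rules.
BINOMIAL = re.compile(r"\b([A-Z][a-z]+|[A-Z]\.)\s([a-z]{3,})\b")

def find_taxa(text):
    """Return (genus, epithet) pairs matched by the naive rule."""
    return BINOMIAL.findall(text)

hits = find_taxa("Quercus robur and Q. petraea co-occur; oaks dominate.")
```

The rule recovers both scientific mentions but has no way to link the vernacular "oaks" to *Quercus*, illustrating why learned, context-aware models fare better on ecological text.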


2020 ◽  
Author(s):  
Bethany Growns ◽  
Kristy Martire

Forensic feature-comparison examiners in select disciplines are more accurate than novices when comparing visual evidence samples. This paper examines a key cognitive mechanism that may contribute to this superior visual comparison performance: the ability to learn how often stimuli occur in the environment (distributional statistical learning). We examined the relationship between distributional learning and visual comparison performance, and the impact of training about the diagnosticity of distributional information in visual comparison tasks. We compared performance between novices given no training (uninformed novices; n = 32), accurate training (informed novices; n = 32) or inaccurate training (misinformed novices; n = 32) in Experiment 1; and between forensic examiners (n = 26), informed novices (n = 29) and uninformed novices (n = 27) in Experiment 2. Across both experiments, forensic examiners and novices performed significantly above chance in a visual comparison task where distributional learning was required for high performance. However, informed novices outperformed all participants and only their visual comparison performance was significantly associated with their distributional learning. It is likely that forensic examiners’ expertise is domain-specific and does not generalise to novel visual comparison tasks. Nevertheless, diagnosticity training could be critical to the relationship between distributional learning and visual comparison performance.
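The core idea of distributional statistical learning and diagnosticity can be sketched in a few lines of Python. This toy simulation is not the authors' experimental task: the feature names and the diagnosticity rule below are invented for illustration. An observer tallies how often each feature occurs, and a shared feature that is rare in the environment counts as stronger evidence that two samples match than a common one.

```python
from collections import Counter

def learn_frequencies(stream):
    """Distributional learning: tally how often each feature occurs."""
    return Counter(stream)

def diagnosticity(feature, freqs):
    """Rarer shared features are stronger evidence of a match (toy rule)."""
    total = sum(freqs.values())
    return 1 - freqs[feature] / total

# Hypothetical environment: common loops, rarer whorls, rare arches.
freqs = learn_frequencies(["loop"] * 80 + ["whorl"] * 15 + ["arch"] * 5)
```

Under this rule, two samples sharing the rare "arch" feature are scored as a more diagnostic match than two sharing the common "loop", which is the kind of knowledge the diagnosticity training in the study conveys.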


2015 ◽  
Author(s):  
Pablo Pareja-Tobes ◽  
Raquel Tobes ◽  
Marina Manrique ◽  
Eduardo Pareja ◽  
Eduardo Pareja-Tobes

Background. Next Generation Sequencing and other high-throughput technologies have brought a revolution to the bioinformatics landscape, offering vast amounts of data about previously inaccessible domains in a cheap and scalable way. However, fast, reproducible, and cost-effective data analysis at such scale remains elusive. A key requirement for achieving it is the ability to access and query the vast amount of publicly available data, especially in the case of knowledge-intensive, semantically rich data: incredibly valuable information about proteins and their functions, genes, pathways, and all sorts of biological knowledge encoded in ontologies remains scattered, semantically and physically fragmented. Methods and Results. Guided by this, we have designed and developed Bio4j. It aims to offer a platform for the integration of semantically rich biological data using typed graph models. We have modeled and integrated most publicly available data linked with proteins into a set of interdependent graphs. Data querying is possible through a data-model-aware Domain Specific Language implemented in Java, letting the user write typed graph traversals over the integrated data. A ready-to-use cloud-based data distribution, based on the Titan graph database engine, is provided; generic data import code can also be used for in-house deployment. Conclusion. Bio4j represents a unique resource for the current bioinformatician, providing at once a solution to several key problems: data integration; expressive, high-performance data access; and a cost-effective, scalable cloud deployment model.
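As a rough sketch of what a typed graph traversal over protein-centric data looks like, here is a toy model in Python. Bio4j itself exposes a Java DSL over a Titan backend, so this is not its API; the node/edge types and the sample protein and GO annotations below are invented for illustration.

```python
from collections import defaultdict

# A toy typed property graph: every node has a type, every edge has a type,
# and traversals are filtered by both, mimicking a typed-graph-model query.
class TypedGraph:
    def __init__(self):
        self.nodes = {}               # id -> (node_type, properties)
        self.out = defaultdict(list)  # id -> [(edge_type, target_id)]

    def add_node(self, node_id, node_type, **props):
        self.nodes[node_id] = (node_type, props)

    def add_edge(self, src, edge_type, dst):
        self.out[src].append((edge_type, dst))

    def traverse(self, src, edge_type, target_type):
        """Follow edges of one type, keeping only targets of one node type."""
        return [dst for etype, dst in self.out[src]
                if etype == edge_type and self.nodes[dst][0] == target_type]

g = TypedGraph()
g.add_node("P12345", "Protein", name="Serum albumin")
g.add_node("GO:0005515", "GOTerm", name="protein binding")
g.add_node("GO:0005576", "GOTerm", name="extracellular region")
g.add_edge("P12345", "annotatedWith", "GO:0005515")
g.add_edge("P12345", "annotatedWith", "GO:0005576")

terms = g.traverse("P12345", "annotatedWith", "GOTerm")
```

The point of the typing is that a traversal such as "annotations of this protein that are GO terms" is expressed, and checked, at the level of the data model rather than as an untyped edge walk.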


2018 ◽  
Vol 11 (11) ◽  
pp. 4621-4635 ◽  
Author(s):  
Istvan Z. Reguly ◽  
Daniel Giles ◽  
Devaraj Gopinathan ◽  
Laure Quivy ◽  
Joakim H. Beck ◽  
...  

Abstract. In this paper, we present the VOLNA-OP2 tsunami model and implementation: a finite-volume non-linear shallow-water equation (NSWE) solver built on the OP2 domain-specific language (DSL) for unstructured mesh computations. VOLNA-OP2 is unique among tsunami solvers in its support for several high-performance computing platforms: central processing units (CPUs), the Intel Xeon Phi, and graphics processing units (GPUs). This is achieved by keeping the scientific code separate from the various parallel implementations, easing maintainability. The solver has already been used in production for several years; here we discuss how it can be integrated into various workflows, such as a statistical emulator. The scalability of the code is demonstrated on three supercomputers, built with classical Xeon CPUs, the Intel Xeon Phi, and NVIDIA P100 GPUs. VOLNA-OP2 shows an ability to deliver productivity as well as performance and portability to its users across a number of platforms.
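To give a flavour of the numerics such a solver applies, here is a minimal one-dimensional, first-order finite-volume step for the shallow-water equations using a Lax-Friedrichs flux. This is a simplified stand-in, not VOLNA-OP2's actual scheme (which works on unstructured 2-D meshes through the OP2 DSL); the grid, flux choice and periodic boundary handling below are our own simplifications.

```python
import numpy as np

g = 9.81  # gravitational acceleration

def swe_step(h, hu, dx, dt):
    """Advance water depth h and momentum hu by one Lax-Friedrichs step
    on a periodic 1-D grid (illustrative only)."""
    def flux(h, hu):
        # Physical flux of the 1-D shallow-water equations.
        return np.array([hu, hu**2 / h + 0.5 * g * h**2])

    u = np.array([h, hu])
    f = flux(h, hu)
    # Periodic neighbours via array rolls.
    up, um = np.roll(u, -1, axis=1), np.roll(u, 1, axis=1)
    fp, fm = np.roll(f, -1, axis=1), np.roll(f, 1, axis=1)
    # Lax-Friedrichs: average neighbours, correct with the flux difference.
    unew = 0.5 * (up + um) - dt / (2 * dx) * (fp - fm)
    return unew[0], unew[1]

# Small Gaussian hump on still water of unit depth.
x = np.linspace(0, 10, 100)
h = 1.0 + 0.1 * np.exp(-((x - 5) ** 2))
hu = np.zeros_like(h)
h1, hu1 = swe_step(h, hu, dx=0.1, dt=0.01)
```

With periodic boundaries the scheme conserves total water mass exactly, which is a quick sanity check on any finite-volume implementation.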


2002 ◽  
Vol 12 (02) ◽  
pp. 193-210 ◽  
Author(s):  
CHRISTOPH A. HERRMANN ◽  
CHRISTIAN LENGAUER

Metaprogramming is a paradigm for enhancing a general-purpose programming language with features catering for a special-purpose application domain, without reimplementing the language. In a staged compilation, the special-purpose features are translated and optimised by a domain-specific preprocessor, which hands over to the general-purpose compiler for translation of the domain-independent part of the program. The domain we work in is high-performance parallel computing. We use metaprogramming to enhance the functional language Haskell with features for the efficient, parallel implementation of certain computational patterns, called skeletons.
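A skeleton is essentially a reusable parallel pattern exposed as a higher-order function. The paper's skeletons are Haskell constructs compiled in stages; as a language-neutral illustration only, here is the classic "farm" (parallel map) skeleton sketched in Python, with a thread pool standing in for whatever parallel backend a staged compiler would target.

```python
from concurrent.futures import ThreadPoolExecutor

def farm(worker, inputs, workers=4):
    """Farm skeleton: apply `worker` to each input independently, in
    parallel, preserving input order in the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(worker, inputs))

squares = farm(lambda x: x * x, range(8))
```

The user supplies only the sequential `worker`; the skeleton owns the coordination, which is exactly the separation that lets a domain-specific preprocessor substitute an optimised parallel implementation.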

