Development of the Dataset Searcher Webapp for finding data on the Belle II computing grid

2020 ◽  
Vol 245 ◽  
pp. 04021
Author(s):  
Kim Smith ◽  
David Dossett ◽  
Martin Sevior

In any large-scale scientific experiment involving enormous quantities of data, it is crucial that everyone involved has quick and easy access to all the datasets relevant to their research. The Belle II experiment is no exception: by the end of its run time, it is projected to have collected 50 ab⁻¹ of integrated luminosity. Until now, the only method for locating data of interest was to look it up in hand-written tables that needed to be regularly updated. In this paper, a new webapp built on the DIRAC software framework is presented, which aims to be the new standard not only for locating data but also for storing all its associated metadata.

2021 ◽  
Vol 5 (1) ◽  
Author(s):  
Valentin Kuznetsov ◽  
Luca Giommi ◽  
Daniele Bonacorsi

Abstract: Machine Learning (ML) will play a significant role in the success of the upcoming High-Luminosity LHC (HL-LHC) program at CERN. An unprecedented amount of data at the exascale will be collected by LHC experiments in the next decade, and this effort will require novel approaches to train and use ML models. In this paper, we discuss a Machine Learning as a Service pipeline for HEP (MLaaS4HEP) which provides three independent layers: a data streaming layer to read High-Energy Physics (HEP) data in their native ROOT data format; a data training layer to train ML models using distributed ROOT files; and a data inference layer to serve predictions using pre-trained ML models via the HTTP protocol. Such a modular design opens up the possibility to train models at large scale by reading ROOT files from remote storage facilities, e.g., the World-Wide LHC Computing Grid (WLCG) infrastructure, and feeding the data to the user's favorite ML framework. The inference layer, implemented as TensorFlow as a Service (TFaaS), may provide easy access to pre-trained ML models in existing infrastructure and applications inside or outside of the HEP domain. In particular, we demonstrate the usage of the MLaaS4HEP architecture for a physics use-case, namely, the $t\bar{t}$ Higgs analysis in CMS, originally performed using custom-made Ntuples. We provide details on the training of the ML model using distributed ROOT files, discuss the performance of the MLaaS and TFaaS approaches for the selected physics analysis, and compare the results with traditional methods.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
L. Orr ◽  
S. C. Chapman ◽  
J. W. Gjerloev ◽  
W. Guo

Abstract: Geomagnetic substorms are a global magnetospheric reconfiguration, during which energy is abruptly transported to the ionosphere. Central to this are the auroral electrojets, large-scale ionospheric currents that are part of a larger three-dimensional system, the substorm current wedge. Many, often conflicting, magnetospheric reconfiguration scenarios have been proposed to describe the evolution and structure of the substorm current wedge. SuperMAG is a worldwide collaboration providing easy access to ground-based magnetometer data. Here we apply techniques from network science to analyze data from 137 SuperMAG ground-based magnetometers. We calculate a time-varying directed network and perform community detection on the network, identifying locally dense groups of connections. Analysis of 41 substorms exhibits a robust structural change from many small, uncorrelated current systems before substorm onset to a large, spatially extended coherent system approximately 10 minutes after onset. We interpret this as a strong indication that the auroral electrojet system during substorm expansions is inherently a large-scale phenomenon and is not solely due to many meso-scale wedgelets.


Author(s):  
Ismail Chabini

A solution is provided for what appears to be a 30-year-old problem dealing with the discovery of the most efficient algorithms possible to compute all-to-one shortest paths in discrete dynamic networks. This problem lies at the heart of efficient solution approaches to dynamic network models that arise in dynamic transportation systems, such as intelligent transportation systems (ITS) applications. The all-to-one dynamic shortest paths problem and the one-to-all fastest paths problems are studied. Early results are revisited and new properties are established. The complexity of these problems is established, and solution algorithms optimal for run time are developed. A new and simple solution algorithm is proposed for all-to-one, all departure time intervals, shortest paths problems. It is proved, theoretically, that the new solution algorithm has an optimal run time complexity that equals the complexity of the problem. Computer implementations and experimental evaluations of various solution algorithms support the theoretical findings and demonstrate the efficiency of the proposed solution algorithm. The findings should be of major benefit to research and development activities in the field of dynamic management, in particular real-time management, and to control of large-scale ITSs.
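
The abstract does not spell out the algorithm itself, so the following is only a minimal sketch of one standard way to compute all-to-one fastest paths for every departure time in a discrete dynamic network: process departure times in decreasing order, so that every label needed at time t has already been finalized. It should not be read as the paper's implementation. The function name `all_to_one_fastest`, the arc representation, and the assumptions that travel times are positive integers, that waiting at nodes is not allowed, and that conditions stay fixed after the last time interval are all illustrative choices.

```python
import heapq
from math import inf


def all_to_one_fastest(n, arcs, dest, horizon):
    """Fastest travel time from every node to `dest` for every departure time.

    n       -- number of nodes, labelled 0..n-1
    arcs    -- dict mapping (i, j) to a list of positive integer travel
               times, one entry per departure time 0..horizon-1
    dest    -- destination node
    horizon -- number of discrete time intervals

    Assumption (illustrative): travel times stay at their last value for
    departures at or after time horizon-1, and waiting is not allowed.
    """
    last = horizon - 1
    labels = [[inf] * horizon for _ in range(n)]

    # Step 1: static problem at the last interval, solved with Dijkstra
    # on the reversed graph.
    incoming = {}
    for (i, j), times in arcs.items():
        incoming.setdefault(j, []).append((i, times[last]))
    static = [inf] * n
    static[dest] = 0
    heap = [(0, dest)]
    while heap:
        d, j = heapq.heappop(heap)
        if d > static[j]:
            continue  # stale heap entry
        for i, w in incoming.get(j, []):
            if d + w < static[i]:
                static[i] = d + w
                heapq.heappush(heap, (static[i], i))
    for i in range(n):
        labels[i][last] = static[i]

    # Step 2: sweep departure times in decreasing order. Because travel
    # times are positive, the label at the arrival time is already final.
    for t in range(last - 1, -1, -1):
        labels[dest][t] = 0
        for (i, j), times in arcs.items():
            if i == dest:
                continue
            arrival = min(t + times[t], last)
            candidate = times[t] + labels[j][arrival]
            if candidate < labels[i][t]:
                labels[i][t] = candidate
    return labels


# Tiny made-up network: three nodes, destination 2, three time intervals.
example_arcs = {
    (0, 1): [1, 1, 1],
    (1, 2): [1, 3, 1],
    (0, 2): [4, 1, 3],
}
labels = all_to_one_fastest(3, example_arcs, dest=2, horizon=3)
print(labels[0])  # fastest times from node 0 for departures at t = 0, 1, 2
```

Each arc is examined once per time interval in the sweep, so the work in Step 2 grows with the number of arcs times the number of intervals, which is also the size of the input travel-time data.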


Author(s):  
Abdul Rachman Rasyid ◽  
Andi Lukman Irwan ◽  
Laode Muhammad Asfan Mujahid ◽  
Ihsan ◽  
Mimi Arifin ◽  
...  

Wajo Regency is one of the districts that play a role in the development and progress of South Sulawesi Province. Agricultural production facilities will therefore be developed, from processing mechanisms through to the creative industries. Irrigation efforts will be directed at developing large-scale and small-scale rural irrigation through artificial embankments and the revitalization of swamps and lakes. In urban areas, residential environments will be adjusted, especially near Lake Tempe in the Sengkang area, the capital of Wajo Regency. The purpose of this study is to find easy access for the community to drinking water and to provide accurate data on regional location conditions based on a Geographic Information System (GIS). The approach used in this activity is a field survey of the existing condition of the location, carried out by assisting the community, increasing knowledge through training or counseling aimed at solving existing problems in the villages and subdistricts of Tempe Subdistrict, Wajo Regency, and providing training in and use of digital databases on the profile and potential of the city. The study found that some districts face several problems, namely solid waste systems, road networks, inadequate buildings, and inadequate clean water, especially in Attakae, Maddukelleng, Pattirosompe, and Tempe. However, there is potential that can be developed to improve the regional economy, such as the silk industry and the wood industry.


2010 ◽  
Vol 13 (03) ◽  
pp. 383-390 ◽  
Author(s):  
R.P. Batycky ◽  
M. Förster ◽  
M.R. Thiele ◽  
K. Stüben

Summary: We present the parallelization of a commercial streamline simulator to multicore architectures based on the OpenMP programming model and its performance on various field examples. This work is a continuation of recent work by Gerritsen et al. (2009) in which a research streamline simulator was extended to parallel execution. We identified that the streamline-transport step represents approximately 40-80% of the total run time. It is exactly this step that is straightforward to parallelize owing to the independent solution of each streamline that is at the heart of streamline simulation. Because we are working with an existing large serial code, we used specialty software to quickly and easily identify variables that required particular handling for implementing the parallel extension. Minimal rewrite to existing code was required to extend the streamline-transport step to OpenMP. As part of this work, we also parallelized additional run-time code, including the gravity-line solver and some simple routines required for constructing the pressure matrix. Overall, the run-time fraction of code parallelized ranged from 0.50 to 0.83, depending on the transport physics being considered. We tested our parallel simulator on a variety of large models including SPE 10, Forties (a UK oil/water model), Judy Creek (a Canadian waterflood/water-alternating-gas (WAG) model), and a South American black-oil model. We noted overall speedup factors from 1.8 to 3.3x for eight threads. In terms of real time, this implies that large-scale streamline simulation models as tested here can be simulated in less than 4 hours. We found speedup results to be reasonable when compared with Amdahl's ideal scaling law. Beyond eight threads, we observed minimal speedups because of memory bandwidth limits on our test machine.
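
For context on the comparison with Amdahl's law, the ideal speedup for a parallelized run-time fraction p on n threads is 1 / ((1 - p) + p / n). The short sketch below evaluates that formula for the two extreme fractions quoted in the summary (0.50 and 0.83) at eight threads. The function name `amdahl_speedup` is illustrative, and the calculation ignores the memory bandwidth and overhead effects the summary mentions.

```python
def amdahl_speedup(parallel_fraction: float, threads: int) -> float:
    """Ideal speedup under Amdahl's law for a given parallel run-time fraction."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if threads < 1:
        raise ValueError("threads must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / threads)


# Parallelized run-time fractions quoted in the summary, at eight threads.
for p in (0.50, 0.83):
    print(f"p = {p:.2f}: ideal speedup on 8 threads = {amdahl_speedup(p, 8):.2f}x")
# p = 0.50: ideal speedup on 8 threads = 1.78x
# p = 0.83: ideal speedup on 8 threads = 3.65x
```

The ideal range of roughly 1.8x to 3.7x sits close to the reported 1.8x to 3.3x, which is consistent with the statement that the measured speedups are reasonable against Amdahl's scaling law.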


2017 ◽  
Vol 73 (6) ◽  
pp. 469-477 ◽  
Author(s):  
Tom Burnley ◽  
Colin M. Palmer ◽  
Martyn Winn

As part of its remit to provide computational support to the cryo-EM community, the Collaborative Computational Project for Electron cryo-Microscopy (CCP-EM) has produced a software framework which enables easy access to a range of programs and utilities. The resulting software suite incorporates contributions from different collaborators by encapsulating them in Python task wrappers, which are then made accessible via a user-friendly graphical user interface as well as a command-line interface suitable for scripting. The framework includes tools for project and data management. An overview of the design of the framework is given, together with a survey of the functionality at different levels. The current CCP-EM suite has particular strength in the building and refinement of atomic models into cryo-EM reconstructions, which is described in detail.


2015 ◽  
Vol 27 (10) ◽  
pp. 2039-2096 ◽  
Author(s):  
Frank-Michael Schleif ◽  
Peter Tino

Efficient learning of a data analysis task strongly depends on the data representation. Most methods rely on (symmetric) similarity or dissimilarity representations by means of metric inner products or distances, providing easy access to powerful mathematical formalisms like kernel or branch-and-bound approaches. Similarities and dissimilarities are, however, often naturally obtained by nonmetric proximity measures that cannot easily be handled by classical learning algorithms. Major efforts have been undertaken to provide approaches that can either directly be used for such data or to make standard methods available for these types of data. We provide a comprehensive survey for the field of learning with nonmetric proximities. First, we introduce the formalism used in nonmetric spaces and motivate specific treatments for nonmetric proximity data. Second, we provide a systematization of the various approaches. For each category of approaches, we provide a comparative discussion of the individual algorithms and address complexity issues and generalization properties. In a summarizing section, we provide a larger experimental study for the majority of the algorithms on standard data sets. We also address the problem of large-scale proximity learning, which is often overlooked in this context and of major importance to make the method relevant in practice. The algorithms we discuss are in general applicable for proximity-based clustering, one-class classification, classification, regression, and embedding approaches. In the experimental part, we focus on classification tasks.


2011 ◽  
Vol 21 (03) ◽  
pp. 279-299 ◽  
Author(s):  
I-HSIN CHUNG ◽  
CHE-RUNG LEE ◽  
JIAZHENG ZHOU ◽  
YEH-CHING CHUNG

As the high performance computing systems scale up, mapping the tasks of a parallel application onto physical processors to allow efficient communication becomes one of the critical performance issues. Existing algorithms were usually designed to map applications with regular communication patterns. Their mapping criterion usually overlooks the size of communicated messages, which is the primary factor of communication time. In addition, most of their time complexities are too high to process large scale problems. In this paper, we present a hierarchical mapping algorithm (HMA), which is capable of mapping applications with irregular communication patterns. It first partitions tasks according to their run-time communication information. The tasks that communicate with each other more frequently are regarded as strongly connected. Based on their connectivity strength, the tasks are partitioned into supernodes based on the algorithms in spectral graph theory. The hierarchical partitioning reduces the mapping algorithm complexity to achieve scalability. Finally, the run-time communication information will be used again in fine tuning to explore better mappings. With the experiments, we show how the mapping algorithm helps to reduce the point-to-point communication time for the PDGEMM, a ScaLAPACK matrix multiplication computation kernel, up to 20% and the AMG2006, a tier 1 application of the Sequoia benchmark, up to 7%.
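
To illustrate why message size matters in a mapping criterion, the sketch below scores a task-to-processor mapping by summing, over every communicating task pair, the bytes exchanged multiplied by the hop distance between the assigned processors. This is a common generic cost measure and an assumption here, not necessarily the criterion used by HMA. The names `hop_bytes` and `ring_distance`, the ring topology, and the traffic figures are all made up for illustration.

```python
def hop_bytes(traffic, mapping, distance):
    """Total communication cost: bytes sent times hops travelled.

    traffic  -- dict mapping (task_a, task_b) to bytes exchanged
    mapping  -- dict mapping each task to a processor id
    distance -- function (proc_p, proc_q) -> number of network hops
    """
    return sum(
        size * distance(mapping[a], mapping[b])
        for (a, b), size in traffic.items()
    )


def ring_distance(num_procs):
    """Hop distance on a bidirectional ring of `num_procs` processors."""
    def distance(p, q):
        k = abs(p - q)
        return min(k, num_procs - k)
    return distance


# Made-up traffic: two heavy pairs and two light pairs.
traffic = {
    ("t0", "t1"): 1000,
    ("t1", "t2"): 10,
    ("t2", "t3"): 1000,
    ("t0", "t3"): 10,
}
dist = ring_distance(4)

# Mapping that separates the heavy pairs by two hops.
scattered = {"t0": 0, "t1": 2, "t2": 1, "t3": 3}
# Mapping that keeps the heavy pairs on neighbouring processors.
adjacent = {"t0": 0, "t1": 1, "t2": 2, "t3": 3}

print(hop_bytes(traffic, scattered, dist))  # 4020
print(hop_bytes(traffic, adjacent, dist))   # 2020
```

Both mappings connect the same number of task pairs, so a criterion that only counts connections could not tell them apart, while weighting by message size clearly favours the second.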


2020 ◽  
Author(s):  
Debarati Roychowdhury ◽  
Samir Gupta ◽  
Xihan Qin ◽  
Cecilia N. Arighi ◽  
K. Vijay-Shanker

Abstract. Motivation: microRNAs (miRNAs) are essential gene regulators and their dysregulation often leads to diseases. Easy access to miRNA information is crucial for interpreting generated experimental data, connecting facts across publications, and developing new hypotheses built on previous knowledge. Here, we present emiRIT, a text mining-based resource, which presents miRNA information mined from the literature through a user-friendly interface. Results: We collected 149,233 miRNA-PubMed ID pairs from Medline between January 1997 and May 2020. emiRIT currently contains miRNA-gene regulation (60,491 relations); miRNA-disease (cancer) (12,300 relations); miRNA-biological process and pathways (23,390 relations); and circulatory miRNAs in extracellular locations (3,782 relations). Biological entities and their relations to miRNAs were extracted from Medline abstracts using publicly available and in-house developed text mining tools, and the entities were normalized to facilitate querying and integration. We built a database and an interface to store and access the integrated data, respectively. Conclusion: We provide an up-to-date and user-friendly resource to facilitate access to comprehensive miRNA information from the literature on a large scale, enabling users to navigate through different roles of miRNA and examine them in a context specific to their information needs. To assess our resource's information coverage, in the absence of gold standards, we have conducted two case studies focusing on the target and differential expression information of miRNAs in the context of diseases. Database URL: https://research.bioinformatics.udel.edu/emirit/

