Grafs: declarative graph analytics

Graph analytics elicits insights from large graphs to inform critical decisions for business, safety and security. Several large-scale graph processing frameworks feature efficient runtime systems; however, they often provide programming models that are low-level and subtly different from each other. Therefore, end users can find implementation and specially optimization of graph analytics error-prone and time-consuming. This paper regards the abstract interface of the graph processing frameworks as the instruction set for graph analytics, and presents Grafs, a high-level declarative specification language for graph analytics and a synthesizer that automatically generates efficient code for five high-performance graph processing frameworks. It features novel semantics-preserving fusion transformations that optimize the specifications and reduce them to three primitives: reduction over paths, mapping over vertices and reduction over vertices. Reductions over paths are commonly calculated based on push or pull models that iteratively apply kernel functions at the vertices. This paper presents conditions, parametric in terms of the kernel functions, for the correctness and termination of the iterative models, and uses these conditions as specifications to automatically synthesize the kernel functions. Experimental results show that the generated code matches or outperforms handwritten code, and that fusion accelerates execution.

Download Full-text

Performance Analysis of Specification Computer and Mobile with Implementation Tawaf Virtual Reality using A* Algorithm and RVO System

EMITTER International Journal of Engineering Technology ◽

10.24003/emitter.v7i1.321 ◽

2019 ◽

Vol 7 (1) ◽

pp. 55-70

Author(s):

Moh. Zikky ◽

M. Jainal Arifin ◽

Kholid Fathoni ◽

Agus Zainal Arifin

Keyword(s):

Virtual Reality ◽

High Performance ◽

Large Scale ◽

3D Models ◽

A Algorithm ◽

Virtual Reality Technology ◽

Performance Technology ◽

Outer Line ◽

High Performance Computer ◽

High Level

High-Performance Computer (HPC) is computer systems that are built to be able to solve computational loads. HPC can provide a high-performance technology and short the computing processes timing. This technology was often used in large-scale industries and several activities that require high-level computing, such as rendering virtual reality technology. In this research, we provide Tawafâ€™s Virtual Reality with 1000 of Pilgrims and realistic surroundings of Masjidil-Haram as the interactive and immersive simulation technology by imitating them with 3D models. Thus, the main purpose of this study is to calculate and to understand the processing time of its Virtual Reality with the implementation of tawaf activities using various platforms; such as computer and Android smartphone. The results showed that the outer-line or outer rotation of Kaaâ€™bah mostly consumes minimum times although he must pass the longer distance than the closer one. Â It happened because the agent with the closer area to Kaabah is facing the crowded peoples. It means an obstacle has the more impact than the distances in this case.

Download Full-text

Cloud Computing Cloud Computing in Remote Sensing : High Performance Remote Sensing Data Processing in a Big data Environment

International Journal of Computers Communications & Control ◽

10.15837/ijccc.2021.6.4236 ◽

2021 ◽

Vol 16 (6) ◽

Author(s):

Yassine Sabri ◽

Aouad Siham

Keyword(s):

Remote Sensing ◽

Cloud Computing ◽

Data Processing ◽

High Performance ◽

Large Scale ◽

Processing System ◽

Remote Sensing Data ◽

Cloud Service ◽

Intermediate Data ◽

High Level

Multi-area and multi-faceted remote sensing (SAR) datasets are widely used due to the increasing demand for accurate and up-to-date information on resources and the environment for regional and global monitoring. In general, the processing of RS data involves a complex multi-step processing sequence that includes several independent processing steps depending on the type of RS application. The processing of RS data for regional disaster and environmental monitoring is recognized as computationally and data demanding.Recently, by combining cloud computing and HPC technology, we propose a method to efficiently solve these problems by searching for a large-scale RS data processing system suitable for various applications. Real-time on-demand service. The ubiquitous, elastic, and high-level transparency of the cloud computing model makes it possible to run massive RS data management and data processing monitoring dynamic environments in any cloud. via the web interface. Hilbert-based data indexing methods are used to optimally query and access RS images, RS data products, and intermediate data. The core of the cloud service provides a parallel file system of large RS data and an interface for accessing RS data from time to time to improve localization of the data. It collects data and optimizes I/O performance. Our experimental analysis demonstrated the effectiveness of our method platform.

Download Full-text

Granular layEr Simulator: Design and Multi-GPU Simulation of the Cerebellar Granular Layer

Frontiers in Computational Neuroscience ◽

10.3389/fncom.2021.630795 ◽

2021 ◽

Vol 15 ◽

Author(s):

Giordana Florimbi ◽

Emanuele Torti ◽

Stefano Masoli ◽

Egidio D'Angelo ◽

Francesco Leporati

Keyword(s):

High Performance ◽

Large Scale ◽

Granular Layer ◽

Graphics Processing Unit ◽

Mossy Fibers ◽

Processing Unit ◽

Large Network ◽

Processing Times ◽

3D Space ◽

High Level

In modern computational modeling, neuroscientists need to reproduce long-lasting activity of large-scale networks, where neurons are described by highly complex mathematical models. These aspects strongly increase the computational load of the simulations, which can be efficiently performed by exploiting parallel systems to reduce the processing times. Graphics Processing Unit (GPU) devices meet this need providing on desktop High Performance Computing. In this work, authors describe a novel Granular layEr Simulator development implemented on a multi-GPU system capable of reconstructing the cerebellar granular layer in a 3D space and reproducing its neuronal activity. The reconstruction is characterized by a high level of novelty and realism considering axonal/dendritic field geometries, oriented in the 3D space, and following convergence/divergence rates provided in literature. Neurons are modeled using Hodgkin and Huxley representations. The network is validated by reproducing typical behaviors which are well-documented in the literature, such as the center-surround organization. The reconstruction of a network, whose volume is 600 × 150 × 1,200 μm3 with 432,000 granules, 972 Golgi cells, 32,399 glomeruli, and 4,051 mossy fibers, takes 235 s on an Intel i9 processor. The 10 s activity reproduction takes only 4.34 and 3.37 h exploiting a single and multi-GPU desktop system (with one or two NVIDIA RTX 2080 GPU, respectively). Moreover, the code takes only 3.52 and 2.44 h if run on one or two NVIDIA V100 GPU, respectively. The relevant speedups reached (up to ~38× in the single-GPU version, and ~55× in the multi-GPU) clearly demonstrate that the GPU technology is highly suitable for realistic large network simulations.

Download Full-text

Identifying Distinctive Features of Productive and Socially Efficient Schools

Standards and Monitoring in Education ◽

10.12737/article_5d2da1df971e12.57383007 ◽

2019 ◽

Vol 7 (4) ◽

pp. 15-23

Author(s):

Марина Матюшкина ◽

Marina Matyushkina ◽

Константин Белоусов ◽

Konstantin Belousov

Keyword(s):

High Performance ◽

Large Scale ◽

Empirical Studies ◽

Basic Research ◽

Successful Schools ◽

Distinctive Features ◽

Effi Ciency ◽

Level Of Use ◽

High Level ◽

State Examination

The article presents the results of a series of empirical studies devoted to the analysis of the relationship between school performance (according to the Unified State Examination criterion), its social efficiency (according to the criterion of the frequency of student circulation to tutors) and various social and pedagogical characteristics of the school. A correlation analysis was carried out on an array of data obtained over 5 years of regular comprehensive surveys in schools of St. Petersburg. The sets of signs that are most characteristic for schools with high performance and for schools with high social effi ciency are identified and described. Distinctive features of successful schools are associated with a high level of use of tutoring services by students, with good material and technical conditions, teachers’ competence in the use of design and research methods, etc. In socially eff ective schools, the achievement of students’ academic results is based on the use of their own school strengths — teachers’ potential, innovative technologies with large-scale attraction of the Internet and electronic resources. The study was carried out with the fi nancial support of the Russian Foundation for Basic Research in the framework of the scientifi c project “Signs of an eff ective school in conditions of the mass distribution of tutoring practices” No. 19-013-00455.

Download Full-text

From Physics Model to Results: An Optimizing Framework for Cross-Architecture Code Generation

Scientific Programming ◽

10.1155/2013/167841 ◽

2013 ◽

Vol 21 (1-2) ◽

pp. 1-16 ◽

Cited By ~ 6

Author(s):

Marek Blazewicz ◽

Ian Hinder ◽

David M. Koppelman ◽

Steven R. Brandt ◽

Milosz Ciznicki ◽

...

Keyword(s):

Code Generation ◽

High Performance ◽

Large Scale ◽

Einstein Equations ◽

Efficient Manner ◽

Code Transformations ◽

Wide Range ◽

Problem Description ◽

Physics Model ◽

High Level

Starting from a high-level problem description in terms of partial differential equations using abstract tensor notation, theChemoraframework discretizes, optimizes, and generates complete high performance codes for a wide range of compute architectures. Chemora extends the capabilities of Cactus, facilitating the usage of large-scale CPU/GPU systems in an efficient manner for complex applications, without low-level code tuning. Chemora achieves parallelism through MPI and multi-threading, combining OpenMP and CUDA. Optimizations include high-level code transformations, efficient loop traversal strategies, dynamically selected data and instruction cache usage strategies, and JIT compilation of GPU code tailored to the problem characteristics. The discretization is based on higher-order finite differences on multi-block domains. Chemora's capabilities are demonstrated by simulations of black hole collisions. This problem provides an acid test of the framework, as the Einstein equations contain hundreds of variables and thousands of terms.

Download Full-text

CuSP

ACM SIGOPS Operating Systems Review ◽

10.1145/3469379.3469385 ◽

2021 ◽

Vol 55 (1) ◽

pp. 47-60

Author(s):

Loc Hoang ◽

Roshan Dathathri ◽

Gurbinder Gill ◽

Keshav Pingali

Keyword(s):

Single Machine ◽

Distributed Memory ◽

The State ◽

Main Memory ◽

Input Graph ◽

Large Graphs ◽

Graph Analytics ◽

Graph Partitions ◽

High Level ◽

Edge Partitioning

Graph analytics systems must analyze graphs with billions of vertices and edges which require several terabytes of storage. Distributed-memory clusters are often used for analyzing such large graphs since the main memory of a single machine is usually restricted to a few hundreds of gigabytes. This requires partitioning the graph among the machines in the cluster. Existing graph analytics systems use a built-in partitioner that incorporates a particular partitioning policy, but the best policy is dependent on the algorithm, input graph, and platform. Therefore, built-in partitioners are not sufficiently flexible. Stand-alone graph partitioners are available, but they too implement only a few policies. CuSP is a fast streaming edge partitioning framework which permits users to specify the desired partitioning policy at a high level of abstraction and quickly generates highquality graph partitions. For example, it can partition wdc12, the largest publicly available web-crawl graph with 4 billion vertices and 129 billion edges, in under 2 minutes for clusters with 128 machines. Our experiments show that it can produce quality partitions 6× faster on average than the state-of-theart stand-alone partitioner in the literature while supporting a wider range of partitioning policies.

Download Full-text

Release the Power of Online-Training for Robust Visual Tracking

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i07.6956 ◽

2020 ◽

Vol 34 (07) ◽

pp. 12645-12652

Author(s):

Yifan Yang ◽

Guorong Li ◽

Yuankai Qi ◽

QIngming Huang

Keyword(s):

Visual Tracking ◽

High Performance ◽

Large Scale ◽

Feature Space ◽

Online Training ◽

Training Data ◽

Semantic Features ◽

Tracking Accuracy ◽

Tightly Coupled ◽

High Level

Convolutional neural networks (CNNs) have been widely adopted in the visual tracking community, significantly improving the state-of-the-art. However, most of them ignore the important cues lying in the distribution of training data and high-level features that are tightly coupled with the target/background classification. In this paper, we propose to improve the tracking accuracy via online training. On the one hand, we squeeze redundant training data by analyzing the dataset distribution in low-level feature space. On the other hand, we design statistic-based losses to increase the inter-class distance while decreasing the intra-class variance of high-level semantic features. We demonstrate the effectiveness on top of two high-performance tracking methods: MDNet and DAT. Experimental results on the challenging large-scale OTB2015 and UAVDT demonstrate the outstanding performance of our tracking method.

Download Full-text

USING PREGEL TO CALCULATE PAGE RANK OF A WEBPAGE A BULK SYNCHRONOUS PARALLEL PROGRAMMING APPROACH

International Journal of Computer and Communication Technology ◽

10.47893/ijcct.2015.1322 ◽

2015 ◽

pp. 289-293

Author(s):

PRASHANT RAGHAV

Keyword(s):

Large Scale ◽

Data Locality ◽

Programming Approach ◽

Graph Processing ◽

Distributed Environment ◽

Large Graphs ◽

Bulk Synchronous Parallel ◽

Speed Up ◽

Straightforward Solution ◽

Computation Speed

Although using graphs to represent networks and relationship is not new; the size of network has been dramatically increasing in the past decade so storing the whole graph in one place is almost impossible. Problems arise when processing very large graphs, when visiting billions of highly connected vertices. In such cases a graph can’t fit on a single machine, and the implementation resorts to a big batch distributed over a cluster of machines. The graph needs to be broken into multiple partitions and stored at various locations. This resulted in the need for a framework that can work in a Distributed Environment. Also, by breaking the graph into different partitions, we can manipulate the graph in parallel to speed up the processing. Google Pregel provides a simple straightforward solution to the large-scale graph processing problems. While it sounds similar to MapReduce, Pregel is optimized for graph operations by reducing I/O, ensuring data locality, but also preserving processing state between phases. The paper will give an insight of the Pregel approach for large scale graph processing. The paper will give an overview of PREGEL’s architecture and then will explore use of Pregel to solve real time applications such as finding PageRank of a Webpage. The paper will also give an insight on Bulk Synchronous Programming and will showcase how it increases computation speed with just few simple lines of code.

Download Full-text

HPGraph: High-Performance Graph Analytics with Productivity on the GPU

Scientific Programming ◽

10.1155/2018/9340697 ◽

2018 ◽

Vol 2018 ◽

pp. 1-11

Author(s):

Haoduo Yang ◽

Huayou Su ◽

Qiang Lan ◽

Mei Wen ◽

Chunyuan Zhang

Keyword(s):

Graph Algorithms ◽

High Performance ◽

Gpu Computing ◽

Programming Model ◽

Sparse Matrix ◽

Graph Analytics ◽

Matrix Operations ◽

High Level ◽

Broad Interest

The growing use of graph in many fields has sparked a broad interest in developing high-level graph analytics programs. Existing GPU implementations have limited performance with compromising on productivity. HPGraph, our high-performance bulk-synchronous graph analytics framework based on the GPU, provides an abstraction focused on mapping vertex programs to generalized sparse matrix operations on GPU as the backend. HPGraph strikes a balance between performance and productivity by coupling high-performance GPU computing primitives and optimization strategies with a high-level programming model for users to implement various graph algorithms with relatively little effort. We evaluate the performance of HPGraph for four graph primitives (BFS, SSSP, PageRank, and TC). Our experiments show that HPGraph matches or even exceeds the performance of high-performance GPU graph libraries such as MapGraph, nvGraph, and Gunrock. HPGraph also runs significantly faster than advanced CPU graph libraries.

Download Full-text

TDCMR: Triplet-Based Deep Cross-Modal Retrieval for Geo-Multimedia Data

Applied Sciences ◽

10.3390/app112210803 ◽

2021 ◽

Vol 11 (22) ◽

pp. 10803

Author(s):

Jiagang Song ◽

Yunwu Lin ◽

Jiayu Song ◽

Weiren Yu ◽

Leyuan Zhang

Keyword(s):

High Performance ◽

Large Scale ◽

Location Based Services ◽

Multimedia Data ◽

Multimedia Retrieval ◽

Geographical Information ◽

Superior Performance ◽

Hybrid Index ◽

High Level ◽

Hash Codes

Mass multimedia data with geographical information (geo-multimedia) are collected and stored on the Internet due to the wide application of location-based services (LBS). How to find the high-level semantic relationship between geo-multimedia data and construct efficient index is crucial for large-scale geo-multimedia retrieval. To combat this challenge, the paper proposes a deep cross-modal hashing framework for geo-multimedia retrieval, termed as Triplet-based Deep Cross-Modal Retrieval (TDCMR), which utilizes deep neural network and an enhanced triplet constraint to capture high-level semantics. Besides, a novel hybrid index, called TH-Quadtree, is developed by combining cross-modal binary hash codes and quadtree to support high-performance search. Extensive experiments are conducted on three common used benchmarks, and the results show the superior performance of the proposed method.

Download Full-text