optimized code
Recently Published Documents


TOTAL DOCUMENTS: 93 (last five years: 20)
H-INDEX: 14 (last five years: 2)

2021 ◽  
Author(s):  
Alexander T. Leighton ◽  
Yun William Yu

Electronic health records (EHR) are often siloed across a network of hospitals, but researchers may wish to perform aggregate count queries on those records in their entirety, e.g., how many patients have diabetes? Prior work has established a strong approach to answering these queries in the form of probabilistic sketching algorithms like LogLog and HyperLogLog; however, it has remained an open question how these algorithms should be made truly private. While many works in the computational biology community, as well as the computer science community at large, have attempted to solve this problem using differential privacy, these methods involve adding noise and still reveal a non-trivial amount of information. Here, we prototype a new protocol using fully homomorphic encryption that is trivially secure even in the setting of quantum-capable adversaries, as it reveals no information other than what can be trivially gained from the final numerical estimate. Simulating up to 16 parties on a single CPU thread takes no longer than 20 minutes to return an estimate with an expected 6% approximation error; furthermore, the protocol is parallelizable across both parties and cores, so, in practice, with optimized code, we might expect sub-minute processing time for each party.
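
As an illustration of the sketching side of such a protocol, the minimal Python sketch below builds plain (unencrypted) HyperLogLog registers per site and merges them by element-wise maximum; the paper's contribution is performing this kind of merge-and-estimate step under fully homomorphic encryption, which is not reproduced here. Parameter choices and function names are illustrative, not taken from the paper.

```python
import hashlib

def _hash64(item: str) -> int:
    """Stable 64-bit hash of an item (e.g., a pseudonymous patient identifier)."""
    return int.from_bytes(hashlib.sha256(item.encode()).digest()[:8], "big")

def hll_sketch(items, p=10):
    """Build HyperLogLog registers (m = 2**p of them) for a stream of items."""
    m = 1 << p
    registers = [0] * m
    for item in items:
        h = _hash64(item)
        idx = h >> (64 - p)                      # first p bits select the register
        rest = h & ((1 << (64 - p)) - 1)         # remaining bits
        rank = (64 - p) - rest.bit_length() + 1  # position of the leftmost 1-bit
        registers[idx] = max(registers[idx], rank)
    return registers

def hll_estimate(registers):
    """Standard HyperLogLog estimate (bias corrections omitted for brevity)."""
    m = len(registers)
    alpha = 0.7213 / (1 + 1.079 / m)
    return alpha * m * m / sum(2.0 ** -r for r in registers)

# Two hospitals sketch their own patient lists; registers merge by element-wise
# maximum, so the union count is estimated without pooling raw records.
site_a = hll_sketch(f"patient-{i}" for i in range(50_000))
site_b = hll_sketch(f"patient-{i}" for i in range(30_000, 80_000))
merged = [max(a, b) for a, b in zip(site_a, site_b)]
print(round(hll_estimate(merged)))  # roughly 80,000 distinct patients
```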


2021 ◽  
Vol 5 (OOPSLA) ◽  
pp. 1-30
Author(s):  
Michael D. Brown ◽  
Matthew Pruett ◽  
Robert Bigelow ◽  
Girish Mururu ◽  
Santosh Pande

Despite extensive testing and correctness certification of their functional semantics, a number of compiler optimizations have been shown to violate security guarantees implemented in source code. While prior work has shed light on how such optimizations may introduce semantic security weaknesses into programs, there remains a significant knowledge gap concerning the impacts of compiler optimizations on non-semantic properties with security implications. In particular, little is currently known about how code generation and optimization decisions made by the compiler affect the availability and utility of the reusable code segments, called gadgets, required for implementing code reuse attack methods such as return-oriented programming. In this paper, we bridge this gap through a study of the impacts of compiler optimization on code reuse gadget sets. We analyze and compare 1,187 variants of 20 different benchmark programs built with two production compilers (GCC and Clang) to determine how their optimization behaviors affect the code reuse gadget sets present in program variants, with respect to both quantitative and qualitative metrics. Our study exposes an important and unexpected problem: compiler optimizations introduce new gadgets at a high rate and produce code containing gadget sets that are generally more useful to an attacker than those in unoptimized code. Using differential binary analysis, we identify several undesirable behaviors at the root of this phenomenon. In turn, we propose and evaluate several strategies to mitigate these behaviors. In particular, we show that post-production binary recompilation can effectively mitigate these behaviors with negligible performance impact, resulting in optimized code with significantly smaller and less useful gadget sets.
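
To make the notion of a "gadget set" concrete, the deliberately naive Python sketch below counts ret-terminated byte windows in two builds of the same (synthetic) code and reports the patterns introduced by optimization. Real gadget finders disassemble the binary properly; the byte strings and constants here are placeholders, not data from the study.

```python
from collections import Counter

RET = 0xC3           # x86-64 'ret' opcode
MAX_GADGET_LEN = 5   # bytes kept before each ret

def naive_gadgets(code: bytes) -> Counter:
    """Count ret-terminated byte windows as stand-in 'gadgets' (no disassembly)."""
    gadgets = Counter()
    for i, byte in enumerate(code):
        if byte == RET:
            start = max(0, i - MAX_GADGET_LEN)
            gadgets[code[start:i + 1]] += 1
    return gadgets

# Tiny synthetic stand-ins for the .text sections of an unoptimized and an
# optimized build; in a real study these would be read from the compiled binaries.
unoptimized = bytes.fromhex("5548 89e5 8b45 fc5d c390 5548 89e5 5dc3")
optimized = bytes.fromhex("8b07 c366 90f3 0f1e fa5f c3")

introduced = set(naive_gadgets(optimized)) - set(naive_gadgets(unoptimized))
print(f"{len(introduced)} gadget byte pattern(s) appear only in the optimized build")
```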


2021 ◽  
Vol 15 ◽  
Author(s):  
Lea Steffen ◽  
Robin Koch ◽  
Stefan Ulbrich ◽  
Sven Nitzsche ◽  
Arne Roennau ◽  
...  

Animal brains still outperform even the most performant machines despite operating at significantly lower speeds. Nonetheless, impressive progress has been made in robotics in the areas of vision, motion planning, and path planning over the last decades. Brain-inspired Spiking Neural Networks (SNN) and the parallel hardware necessary to exploit their full potential have promising features for robotic applications. Besides brain-inspired neuromorphic hardware, the most obvious platform for deploying SNN, Graphics Processing Units (GPU) are also well suited to parallel computing. Libraries for generating CUDA-optimized code, such as GeNN, and affordable embedded systems make GPUs an attractive alternative due to their low price and availability. While a few performance tests exist, there has been a lack of benchmarks targeting robotic applications. We compare the performance of a neural Wavefront algorithm, as a representative of use cases in robotics, on different hardware suitable for running SNN simulations. The SNN used for this benchmark is modeled in the simulator-independent declarative language PyNN, which allows using the same model for different simulator backends. Our emphasis is the comparison between Nest, running on a serial CPU, SpiNNaker, as a representative of neuromorphic hardware, and an implementation in GeNN. Beyond that, we also investigate the differences between GeNN deployments on different hardware. The different simulators and hardware are compared with regard to total simulation time, average energy consumption per run, and the length of the resulting path. We hope that the insights gained about the performance details of parallel hardware solutions contribute to developing more efficient SNN implementations for robotics.
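
For readers unfamiliar with the benchmark workload, the sketch below implements the classical grid-based wavefront planner in plain Python; the paper's contribution is running a spiking-neuron analogue of this expansion on different SNN backends (PyNN/Nest/SpiNNaker/GeNN), which is not reproduced here.

```python
from collections import deque

MOVES = ((1, 0), (-1, 0), (0, 1), (0, -1))

def wavefront(grid, goal):
    """Classical wavefront expansion on an occupancy grid (0 = free, 1 = obstacle).
    Returns a cost map holding each free cell's step distance to the goal, the
    quantity a spiking 'wave' of activations propagates in the neural variant."""
    rows, cols = len(grid), len(grid[0])
    cost = [[None] * cols for _ in range(rows)]
    cost[goal[0]][goal[1]] = 0
    queue = deque([goal])
    while queue:
        r, c = queue.popleft()
        for dr, dc in MOVES:
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0 and cost[nr][nc] is None:
                cost[nr][nc] = cost[r][c] + 1
                queue.append((nr, nc))
    return cost

def extract_path(cost, start):
    """Greedily descend the cost map from the start cell to the goal."""
    path, (r, c) = [start], start
    while cost[r][c] != 0:
        candidates = [(r + dr, c + dc) for dr, dc in MOVES
                      if 0 <= r + dr < len(cost) and 0 <= c + dc < len(cost[0])
                      and cost[r + dr][c + dc] is not None]
        r, c = min(candidates, key=lambda p: cost[p[0]][p[1]])
        path.append((r, c))
    return path

grid = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0]]
cost = wavefront(grid, goal=(2, 3))
print(extract_path(cost, start=(0, 0)))  # shortest 4-connected path around the wall
```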


2021 ◽  
pp. 1-11
Author(s):  
Thanh-Trung DO ◽  
Viet-Hung Vu ◽  
Zhaoheng Liu

Abstract A new symbolic differentiation algorithm is proposed in this paper to automatically generate the inverse dynamics of flexible joint robots in symbolic form, and the results obtained can be used in real-time applications. The proposed method, with 𝒪(n) computational complexity, is developed based on the recursive Newton-Euler algorithm, the chain rule of differentiation, and a computer algebra system. The input of the proposed algorithm consists of symbolic matrices describing the kinematic and dynamic parameters of the robot. The output is the inverse dynamics solution written in portable and optimized code (C-code/Matlab-code). An exemplary numerical simulation of the inverse dynamics of the Kuka LWR4 robot with seven flexible joints is conducted in Matlab, in which the computational time per cycle of inverse dynamics is about 0.02 milliseconds. The numerical example matches existing methods very closely while requiring much less computation time and complexity.
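
The flavor of the approach can be conveyed with a one-joint toy model. The SymPy sketch below symbolically eliminates the motor coordinate of a single flexible joint (Spong's model), applies the chain rule of differentiation, and emits C code for the resulting torque; it is a hand-rolled illustration under those assumptions, not the authors' 𝒪(n) recursive Newton-Euler algorithm, and all symbol names are placeholders.

```python
import sympy as sp

t = sp.symbols("t")
Il, Im, K, m, g, l = sp.symbols("I_l I_m K m g l", positive=True)
q = sp.Function("q")(t)  # desired link trajectory

# link : I_l*q'' + m*g*l*sin(q) + K*(q - theta) = 0  -> eliminate theta symbolically
# motor: I_m*theta'' + K*(theta - q) = tau           -> differentiate via the chain rule
theta = q + (Il * sp.diff(q, t, 2) + m * g * l * sp.sin(q)) / K
tau = sp.simplify(Im * sp.diff(theta, t, 2) + K * (theta - q))

# Replace time derivatives of q with plain symbols (q, qd, ..., qdddd) so the
# torque can be emitted as C code taking the trajectory and its derivatives.
qs = sp.symbols("q qd qdd qddd qdddd")
replacements = [(sp.diff(q, t, k), qs[k]) for k in range(4, 0, -1)] + [(q, qs[0])]
tau_flat = tau.subs(replacements)

print(sp.ccode(tau_flat, assign_to="tau"))  # portable C expression for the joint torque
```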


2021 ◽  
Author(s):  
Pengcheng Guo ◽  
Fengfan Yang ◽  
Chunli Zhao ◽  
Waheed Ullah

Abstract This paper proposes a distributed RS coding scheme composed of two different Reed-Solomon (RS) codes over a fast Rayleigh fading channel. In practically any distributed coding scheme, an appropriate encoding strategy at the relay plays a vital role in achieving an optimized code at the destination. Therefore, the authors propose an efficient approach for the proper selection of information at the relay based on a subspace approach. Using this approach as a benchmark, another, more practical, low-complexity selection approach is also proposed. Monte Carlo simulations demonstrate that the distributed RS coding scheme under the two approaches achieves nearly the same bit error rate (BER) performance. Furthermore, to jointly decode the source and relay codes at the destination, two different decoding algorithms, referred to as the naive and smart algorithms, are proposed. The simulation results reveal the advantage of the smart algorithm compared to the naive one. The proposed distributed RS coding scheme with the smart algorithm outperforms its non-cooperative counterpart by a gain of 2.4-3.2 dB under identical conditions. Moreover, the proposed distributed RS coding scheme outperforms multiple existing distributed coding schemes, making it an excellent candidate for future distributed coding in wireless communications.
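
A purely structural toy of the scheme is sketched below using the reedsolo package: the source protects its message with one RS code, and the relay re-encodes a subset of the information symbols with a second RS code. The subset selection here is a trivial placeholder for the paper's subspace-based criterion, the channel is omitted, and the joint naive/smart decoding at the destination is not reproduced.

```python
from reedsolo import RSCodec

K1, NSYM1 = 16, 8   # source code C1: 16 information bytes + 8 parity bytes
K2, NSYM2 = 8, 8    # relay  code C2: 8 selected bytes    + 8 parity bytes

c1, c2 = RSCodec(NSYM1), RSCodec(NSYM2)

message = bytes(range(K1))            # source information symbols
source_codeword = c1.encode(message)  # broadcast to relay and destination

# The relay is assumed to have decoded correctly; it forwards a re-encoded subset
# of the information symbols. Taking the first half is only a placeholder for the
# paper's subspace-based selection criterion.
selected = message[:K2]
relay_codeword = c2.encode(selected)

# The destination would now jointly decode (source_codeword, relay_codeword);
# the paper's naive and smart algorithms differ in how they exploit this overlap.
print(len(source_codeword), len(relay_codeword))  # 24 and 16 bytes
```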


2021 ◽  
Vol 19 (3) ◽  
pp. 413-420
Author(s):  
Oscar Jesus Castro ◽  
Ines Fernando Vega

Author(s):  
Markus Holzer ◽  
Martin Bauer ◽  
Harald Köstler ◽  
Ulrich Rüde

A high-performance implementation of a multiphase lattice Boltzmann method based on the conservative Allen-Cahn model, supporting high density ratios and high Reynolds numbers, is presented. Meta-programming techniques are used to automatically generate optimized code for CPUs and GPUs. The coupled model is specified in a high-level symbolic description and optimized through automatic transformations. The memory footprint of the resulting algorithm is reduced through the fusion of compute kernels. A roofline analysis demonstrates the excellent efficiency of the generated code on a single GPU. The resulting single-GPU code has been integrated into the multiphysics framework waLBerla to run massively parallel simulations on large domains. Communication hiding and GPUDirect-enabled MPI yield near-perfect scaling behavior. Scaling experiments are conducted on the Piz Daint supercomputer with up to 2048 GPUs, simulating several hundred fully resolved bubbles. Further, validation of the implementation is shown in a physically relevant scenario: a three-dimensional rising air bubble in water.
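
The memory-footprint argument behind kernel fusion can be illustrated with a plain NumPy stencil on a simplified Allen-Cahn-style relaxation (not the paper's conservative, hydrodynamically coupled model): the unfused version materializes an intermediate field, while the fused version evaluates everything in one expression. NumPy itself still creates temporaries; the pay-off arises in generated GPU kernels, where the fused expression becomes a single pass over memory.

```python
import numpy as np

def laplacian(phi):
    """Second-order central-difference Laplacian with periodic boundaries."""
    return (np.roll(phi, 1, 0) + np.roll(phi, -1, 0) +
            np.roll(phi, 1, 1) + np.roll(phi, -1, 1) - 4.0 * phi)

def step_unfused(phi, dt=1e-3, eps=0.05):
    # Two logical kernels: the first materializes an intermediate field that the
    # second reads back, which is the extra memory traffic fusion removes.
    lap = laplacian(phi)
    dwell = 2.0 * phi * (1.0 - phi) * (1.0 - 2.0 * phi)  # double-well derivative
    return phi + dt * (eps * lap - dwell)

def step_fused(phi, dt=1e-3, eps=0.05):
    # One fused kernel: everything is evaluated in a single expression, so the
    # generated GPU code needs only one sweep over the field.
    return phi + dt * (eps * laplacian(phi)
                       - 2.0 * phi * (1.0 - phi) * (1.0 - 2.0 * phi))

phi = np.random.default_rng(0).random((256, 256))
assert np.allclose(step_unfused(phi), step_fused(phi))  # same result, fewer sweeps
```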


2021 ◽  
pp. 3-23
Author(s):  
Philip Munksgaard ◽  
Svend Lund Breddam ◽  
Troels Henriksen ◽  
Fabian Cristian Gieseke ◽  
Cosmin Oancea

Abstract Functional languages allow rewrite-rule systems that aggressively generate a multitude of semantically equivalent but differently optimized code versions. In the context of GPGPU execution, this paper addresses the important question of how to compose these code versions into a single program that (near-)optimally discriminates between them across different datasets. Rather than aiming at a general autotuning framework reliant on stochastic search, we argue that in some cases a more effective solution can be obtained by customizing the tuning strategy for the compiler transformation producing the code versions. We present a simple and highly composable strategy which requires that the (dynamic) program property used to discriminate between code versions conforms to a certain monotonicity assumption. When this assumption holds, our strategy guarantees that if an optimal solution exists, it will be found. If an optimal solution does not exist, our strategy produces human-tractable and deterministic results that provide insight into what went wrong and how it can be fixed. We apply our tuning strategy to the incremental-flattening transformation supported by the publicly available Futhark compiler and compare it with a previous black-box tuning solution that uses the popular OpenTuner library. We demonstrate the feasibility of our solution on a set of standard datasets of real-world applications and public benchmark suites, such as Rodinia and FinPar. We show that our approach shortens the tuning time by a factor of 6× on average and, more importantly, in five out of eleven cases it produces programs that are up to 10× faster than the ones produced by the OpenTuner-based technique.
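
The essence of tuning under a monotonicity assumption can be sketched generically: if "version A beats version B" flips from false to true at most once as the dataset grows, the crossover can be located by bisection over measured sizes. The Python below is a schematic illustration with made-up cost models, not the Futhark/incremental-flattening machinery.

```python
def pick_threshold(sizes, a_wins):
    """Smallest measured dataset size at which code version A wins, assuming the
    predicate a_wins(n) is monotone in n (False up to some point, then True)."""
    sizes = sorted(sizes)
    lo, hi = 0, len(sizes)
    while lo < hi:                      # classic bisection over the monotone predicate
        mid = (lo + hi) // 2
        if a_wins(sizes[mid]):
            hi = mid
        else:
            lo = mid + 1
    return sizes[lo] if lo < len(sizes) else None   # None: A never wins

# Stand-in cost models (hypothetical): version A has a high fixed overhead but
# better scaling; version B starts cheap but scales worse.
time_a = lambda n: 5_000 + 0.5 * n
time_b = lambda n: 2.0 * n

sizes = [1_000, 2_000, 4_000, 8_000, 16_000, 32_000]
threshold = pick_threshold(sizes, lambda n: time_a(n) <= time_b(n))
print(f"run version A for dataset sizes >= {threshold}")   # 4000 with these models
```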


Author(s):  
Jun Zhang ◽  
Tian Lu

The evaluation of the molecular electrostatic potential (ESP) is a performance bottleneck for many computational chemistry tasks such as RESP charge fitting or QM/MM simulations. In this paper, an efficient algorithm for...
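
Since the abstract is truncated here, the sketch below only illustrates the classical quantity involved: the ESP evaluated from point charges at a set of probe points (the expression RESP fitting reproduces), vectorized with NumPy broadcasting. It is not the paper's algorithm.

```python
import numpy as np

def esp_from_point_charges(probe_points, charge_positions, charges):
    """V(r) = sum_i q_i / |r - R_i| in atomic units, evaluated at all probe
    points at once via broadcasting (the quantity RESP fitting reproduces)."""
    diff = probe_points[:, None, :] - charge_positions[None, :, :]  # (P, N, 3)
    dist = np.linalg.norm(diff, axis=-1)                            # (P, N)
    return (charges[None, :] / dist).sum(axis=1)                    # (P,)

# Toy system: a crude dipole of two partial charges, probed at two points (Bohr).
positions = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
charges = np.array([+0.5, -0.5])
probes = np.array([[0.0, 0.0, 3.0], [2.0, 0.0, 0.5]])
print(esp_from_point_charges(probes, positions, charges))
```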

