benchmark suite
Recently Published Documents


TOTAL DOCUMENTS

354
(FIVE YEARS 114)

H-INDEX

39
(FIVE YEARS 6)

2022 ◽  
Vol 6 (POPL) ◽  
pp. 1-29
Author(s):  
Anders Miltner ◽  
Adrian Trejo Nuñez ◽  
Ana Brendel ◽  
Swarat Chaudhuri ◽  
Isil Dillig

We present a novel bottom-up method for the synthesis of functional recursive programs. While bottom-up synthesis techniques can work better than top-down methods in certain settings, there is no prior technique for synthesizing recursive programs from logical specifications in a purely bottom-up fashion. The main challenge is that effective bottom-up methods need to execute sub-expressions of the code being synthesized, but it is impossible to execute a recursive subexpression of a program that has not been fully constructed yet. In this paper, we address this challenge using the concept of angelic semantics. Specifically, our method finds a program that satisfies the specification under angelic semantics (we refer to this as angelic synthesis), analyzes the assumptions made during its angelic execution, uses this analysis to strengthen the specification, and finally reattempts synthesis with the strengthened specification. Our proposed angelic synthesis algorithm is based on version space learning and therefore deals effectively with many incremental synthesis calls made during the overall algorithm. We have implemented this approach in a prototype called Burst and evaluate it on synthesis problems from prior work. Our experiments show that Burst is able to synthesize a solution to 94% of the benchmarks in our benchmark suite, outperforming prior work.


Processes ◽  
2021 ◽  
Vol 9 (12) ◽  
pp. 2276
Author(s):  
Mohammad H. Nadimi-Shahraki ◽  
Ali Fatahi ◽  
Hoda Zamani ◽  
Seyedali Mirjalili ◽  
Laith Abualigah ◽  
...  

Moth–flame optimization (MFO) is a prominent swarm intelligence algorithm that demonstrates sufficient efficiency in tackling various optimization tasks. However, MFO cannot provide competitive results for complex optimization problems. The algorithm sinks into the local optimum due to the rapid dropping of population diversity and poor exploration. Hence, in this article, a migration-based moth–flame optimization (M-MFO) algorithm is proposed to address the mentioned issues. In M-MFO, the main focus is on improving the position of unlucky moths by migrating them stochastically in the early iterations using a random migration (RM) operator, maintaining the solution diversification by storing new qualified solutions separately in a guiding archive, and, finally, exploiting around the positions saved in the guiding archive using a guided migration (GM) operator. The dimensionally aware switch between these two operators guarantees the convergence of the population toward the promising zones. The proposed M-MFO was evaluated on the CEC 2018 benchmark suite on dimension 30 and compared against seven well-known variants of MFO, including LMFO, WCMFO, CMFO, CLSGMFO, LGCMFO, SMFO, and ODSFMFO. Then, the top four latest high-performing variants were considered for the main experiments with different dimensions, 30, 50, and 100. The experimental evaluations proved that the M-MFO provides sufficient exploration ability and population diversity maintenance by employing migration strategy and guiding archive. In addition, the statistical results analyzed by the Friedman test proved that the M-MFO demonstrates competitive performance compared to the contender algorithms used in the experiments.


2021 ◽  
Author(s):  
Henriette Capel ◽  
Robin Weiler ◽  
Maurits J.J. Dijkstra ◽  
Reinier Vleugels ◽  
Peter Bloem ◽  
...  

Self-supervised language modeling is a rapidly developing approach for the analysis of protein sequence data. However, work in this area is heterogeneous and diverse, making comparison of models and methods difficult. Moreover, models are often evaluated only on one or two downstream tasks, making it unclear whether the models capture generally useful properties. We introduce the ProteinGLUE benchmark for the evaluation of protein representations: a set of seven tasks for evaluating learned protein representations. We also offer reference code, and we provide two baseline models with hyperparameters specifically trained for these benchmarks. Pre-training was done on two tasks, masked symbol prediction and next sentence prediction. We show that pre-training yields higher performance on a variety of downstream tasks such as secondary structure and protein interaction interface prediction, compared to no pre-training. However, the larger base model does not outperform the smaller medium. We expect the ProteinGLUE benchmark dataset introduced here, together with the two baseline pre-trained models and their performance evaluations, to be of great value to the field of protein sequence-based property prediction. Availability: code and datasets from https://github.com/ibivu/protein-glue


2021 ◽  
Vol 5 (1) ◽  
Author(s):  
Domenico Giordano ◽  
Manfred Alef ◽  
Luca Atzori ◽  
Jean-Michel Barbet ◽  
Olga Datskova ◽  
...  

AbstractThe HEPiX Benchmarking Working Group has developed a framework to benchmark the performance of a computational server using the software applications of the High Energy Physics (HEP) community. This framework consists of two main components, named HEP-Workloads and HEPscore. HEP-Workloads is a collection of standalone production applications provided by a number of HEP experiments. HEPscore is designed to run HEP-Workloads and provide an overall measurement that is representative of the computing power of a system. HEPscore is able to measure the performance of systems with different processor architectures and accelerators. The framework is completed by the HEP Benchmark Suite that simplifies the process of executing HEPscore and other benchmarks such as HEP-SPEC06, SPEC CPU 2017, and DB12. This paper describes the motivation, the design choices, and the results achieved by the HEPiX Benchmarking Working group. A perspective on future plans is also presented.


2021 ◽  
Author(s):  
Steven Farrell ◽  
Murali Emani ◽  
Jacob Balma ◽  
Lukas Drescher ◽  
Aleksandr Drozd ◽  
...  

2021 ◽  
Vol 20 (5s) ◽  
pp. 1-22
Author(s):  
Biswadip Maity ◽  
Saehanseul Yi ◽  
Dongjoo Seo ◽  
Leming Cheng ◽  
Sung-Soo Lim ◽  
...  

Self-driving systems execute an ensemble of different self-driving workloads on embedded systems in an end-to-end manner, subject to functional and performance requirements. To enable exploration, optimization, and end-to-end evaluation on different embedded platforms, system designers critically need a benchmark suite that enables flexible and seamless configuration of self-driving scenarios, which realistically reflects real-world self-driving workloads’ unique characteristics. Existing CPU and GPU embedded benchmark suites typically (1) consider isolated applications, (2) are not sensor-driven, and (3) are unable to support emerging self-driving applications that simultaneously utilize CPUs and GPUs with stringent timing requirements. On the other hand, full-system self-driving simulators (e.g., AUTOWARE, APOLLO) focus on functional simulation, but lack the ability to evaluate the self-driving software stack on various embedded platforms. To address design needs, we present Chauffeur, the first open-source end-to-end benchmark suite for self-driving vehicles with configurable representative workloads. Chauffeur is easy to configure and run, enabling researchers to evaluate different platform configurations and explore alternative instantiations of the self-driving software pipeline. Chauffeur runs on diverse emerging platforms and exploits heterogeneous onboard resources. Our initial characterization of Chauffeur on different embedded platforms – NVIDIA Jetson TX2 and Drive PX2 – enables comparative evaluation of these GPU platforms in executing an end-to-end self-driving computational pipeline to assess the end-to-end response times on these emerging embedded platforms while also creating opportunities to create application gangs for better response times. Chauffeur enables researchers to benchmark representative self-driving workloads and flexibly compose them for different self-driving scenarios to explore end-to-end tradeoffs between design constraints, power budget, real-time performance requirements, and accuracy of applications.


2021 ◽  
Vol 27 (1) ◽  
Author(s):  
Linghui Luo ◽  
Felix Pauck ◽  
Goran Piskachev ◽  
Manuel Benz ◽  
Ivan Pashchenko ◽  
...  

AbstractDue to the lack of established real-world benchmark suites for static taint analyses of Android applications, evaluations of these analyses are often restricted and hard to compare. Even in evaluations that do use real-world apps, details about the ground truth in those apps are rarely documented, which makes it difficult to compare and reproduce the results. To push Android taint analysis research forward, this paper thus recommends criteria for constructing real-world benchmark suites for this specific domain, and presents TaintBench, the first real-world malware benchmark suite with documented taint flows. TaintBench benchmark apps include taint flows with complex structures, and addresses static challenges that are commonly agreed on by the community. Together with the TaintBench suite, we introduce the TaintBench framework, whose goal is to simplify real-world benchmarking of Android taint analyses. First, a usability test shows that the framework improves experts’ performance and perceived usability when documenting and inspecting taint flows. Second, experiments using TaintBench reveal new insights for the taint analysis tools Amandroid and FlowDroid: (i) They are less effective on real-world malware apps than on synthetic benchmark apps. (ii) Predefined lists of sources and sinks heavily impact the tools’ accuracy. (iii) Surprisingly, up-to-date versions of both tools are less accurate than their predecessors.


2021 ◽  
Author(s):  
Petros Voudouris ◽  
Per Stenström ◽  
Risat Pathan

AbstractHeterogeneous multiprocessors can offer high performance at low energy expenditures. However, to be able to use them in hard real-time systems, timing guarantees need to be provided, and the main challenge is to determine the worst-case schedule length (also known as makespan) of an application. Previous works that estimate the makespan focus mainly on the independent-task application model or the related multiprocessor model that limits the applicability of the makespan. On the other hand, the directed acyclic graph (DAG) application model and the unrelated multiprocessor model are general and can cover most of today’s platforms and applications. In this work, we propose a simple work-conserving scheduling method of the tasks in a DAG and two new approaches to finding the makespan. A set of representative OpenMP task-based parallel applications from the BOTS benchmark suite and synthetic DAGs are used to evaluate the proposed method. Based on the empirical results, the proposed approach calculates the makespan close to the exhaustive method and with low pessimism compared to a lower bound of the actual makespan calculation.


Sign in / Sign up

Export Citation Format

Share Document