ACM Transactions on Modeling and Performance Evaluation of Computing Systems
Latest Publications

Total documents: 118 (five years: 53)
H-index: 9 (five years: 2)
Published by: Association for Computing Machinery
ISSN: 2376-3639

Author(s): Caixiang Fan, Sara Ghaemi, Hamzeh Khazaei, Yuxiang Chen, Petr Musilek

Distributed ledgers (DLs) provide many advantages over centralized solutions in Internet of Things projects, including but not limited to improved security, transparency, and fault tolerance. To leverage DLs at scale, their well-known limitation, performance, must be adequately analyzed and addressed. Directed acyclic graph-based DLs have been proposed to tackle the performance and scalability issues by design. The first among them, IOTA, has shown promising signs in addressing these issues. IOTA is an open source DL designed for the Internet of Things; it stores transactions on its ledger as a directed acyclic graph to achieve potentially higher scalability than blockchain-based DLs. However, due to the uncertainty and centralization of the deployed consensus, the current IOTA implementation exhibits performance issues, making it less performant than the initial design. In this article, we first extend an existing simulator to support realistic IOTA simulations and investigate the impact of different design parameters on IOTA's performance. Then, we propose a layered model to help IOTA users determine the optimal waiting time before resending a previously submitted but not yet confirmed transaction. Our findings reveal the impact of the transaction arrival rate, tip selection algorithms, weighted tip selection algorithm randomness, and network delay on throughput. Using the proposed layered model, we shed light on the distribution of confirmed transactions, which is leveraged to calculate the optimal time for resending an unconfirmed transaction to the DL. The performance analysis results can be used by both system designers and users to support their decision making.
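
As a minimal illustration of how the layered model's output could be used, the sketch below picks a resend timeout as a high quantile of an empirical confirmation-time distribution: past that point, waiting longer adds little confirmation probability, so resending becomes worthwhile. This is our construction, not the authors' implementation; the gamma-distributed samples and the 95% quantile are purely illustrative assumptions.

```python
import numpy as np

def optimal_resend_time(confirmation_times, quantile=0.95):
    """Pick a resend timeout as a high quantile of the empirical
    confirmation-time distribution: beyond this point the marginal
    chance of confirmation is small, so resending is worthwhile."""
    return float(np.quantile(confirmation_times, quantile))

# Hypothetical data: confirmation delays (seconds) observed in simulation.
samples = np.random.gamma(shape=2.0, scale=30.0, size=10_000)
print(f"resend after ~{optimal_resend_time(samples):.1f} s")
```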


Author(s): Stefan Geissler, Stanislav Lange, Leonardo Linguaglossa, Dario Rossi, Thomas Zinner, ...

Network Functions Virtualization (NFV) is among the latest network revolutions, promising increased flexibility and avoiding network ossification. At the same time, all-software NFV implementations on commodity hardware raise performance issues when compared to ASIC solutions. To address these issues, numerous software acceleration frameworks for packet processing have been proposed in recent years. One central mechanism of many of these frameworks is the use of batching techniques, where packets are processed in groups as opposed to individually. This is required to provide high-speed capabilities by minimizing framework overhead, reducing interrupt pressure, and leveraging instruction-level cache hits. Several such system implementations have been proposed and experimentally benchmarked in the past. However, the scientific community has so far made only limited attempts to model the system dynamics of modern NFV routers exploiting batching acceleration. In this article, we propose a simple, generic model for this type of batching-based system that can be applied to predict all relevant key performance indicators. In particular, we extend our previous work and formulate the calculation of the queue size and waiting time distributions in addition to the batch size distribution and the packet loss probability. Furthermore, we introduce the waiting time distribution as a relevant QoS parameter and perform an in-depth parameter study, widening the set of investigated variables as well as the range of values. Finally, we contrast the model predictions with experimental results gathered in a high-speed testbed including an NFV router, showing that the model correctly captures system performance not only under simple conditions but also in more realistic scenarios in which traffic is processed by a mixture of functions.
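
To make the batching dynamics concrete, the following toy discrete-event sketch (ours, not the paper's analytical model) simulates a batch-service queue in which each service round picks up at most B already-arrived packets and pays a fixed framework overhead plus a per-packet cost; it yields empirical waiting-time and batch-size samples. All parameter values are illustrative assumptions.

```python
import random

def simulate_batch_queue(lam=5.0, t_over=0.1, t_pkt=0.01, B=16, n=50_000):
    """Toy batch-service queue: each service round collects up to B packets
    that have already arrived and costs a fixed overhead t_over plus t_pkt
    per packet, mimicking the amortized per-packet cost of batching."""
    random.seed(1)
    arrivals, t = [], 0.0
    for _ in range(n):                       # pre-generate Poisson arrivals
        t += random.expovariate(lam)
        arrivals.append(t)
    i, clock, waits, sizes = 0, 0.0, [], []
    while i < n:
        clock = max(clock, arrivals[i])      # idle until a packet is waiting
        b = 0
        while i < n and b < B and arrivals[i] <= clock:
            waits.append(clock - arrivals[i])  # queueing delay of this packet
            i += 1
            b += 1
        sizes.append(b)
        clock += t_over + t_pkt * b          # serve the whole batch
    return waits, sizes

waits, sizes = simulate_batch_queue()
print(sum(waits) / len(waits), sum(sizes) / len(sizes))
```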


Author(s): Anna Engelmann, Admela Jukan

In data center networks, the reliability of a Service Function Chain (SFC), an end-to-end service realized by a chain of virtual network functions (VNFs), is a complex and specific function of placement, configuration, and application requirements, in both hardware and software. Existing approaches to reliability analysis do not jointly consider multiple features of system components, including (i) heterogeneity, (ii) disjointness, (iii) sharing, (iv) redundancy, and (v) failure interdependency. To this end, we develop a novel analysis of the service reliability of the so-called generic SFC, consisting of n = k + r sub-SFCs, where k ≥ 1 and r ≥ 0 are the numbers of arbitrarily placed primary and backup (redundant) sub-SFCs, respectively. Our analysis is based on combinatorics and a reduced binomial theorem, resulting in a simple approach that can nonetheless be used to analyze rather complex SFC configurations. The analysis is practically applicable to various VNF placement strategies in arbitrary data center configurations and topologies, and can be effectively used for the evaluation and optimization of reliable SFC placements.
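
For intuition, consider the simplest special case in which sub-SFCs fail independently with a common reliability p (the paper's analysis goes further, covering heterogeneity, sharing, and failure interdependency). The service then survives if at least k of the n = k + r sub-SFCs are operational, which the binomial theorem captures directly; the sketch below is ours and all values are illustrative.

```python
from math import comb

def sfc_reliability(k, r, p):
    """Reliability of a generic SFC with k primary and r backup sub-SFCs
    under the simplifying assumption of independent, identically reliable
    sub-SFCs: the service survives if at least k of n = k + r work."""
    n = k + r
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

print(sfc_reliability(k=3, r=2, p=0.99))  # e.g. 3 primaries, 2 backups
```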


Author(s): Rossano Gaeta, Marco Grangetto

In coding-based distributed storage systems (DSSs), a set of storage nodes (SNs) hold coded fragments of a data unit that collectively allow one to recover the original information. It is well known that data modification (a.k.a. pollution attack) is the Achilles' heel of such coding systems; indeed, intentional modification of a single coded fragment can prevent the reconstruction of the original information because of the error propagation induced by the decoding algorithm. The challenge we take on in this work is to devise an algorithm that identifies polluted coded fragments within the set encoding a data unit, and to characterize its performance. To this end, we provide the following contributions: (i) we devise MIND (Malicious node IdeNtification in DSS), an algorithm that is general with respect to the encoding mechanism chosen for the DSS, able to cope with a heterogeneous allocation of coded fragments to SNs, and effective in identifying polluted coded fragments in low-redundancy scenarios; (ii) we formally prove both MIND termination and correctness; (iii) we derive an accurate analytical characterization of MIND performance (hit probability and complexity); (iv) we develop a C++ prototype that implements MIND to validate the performance predictions of the analytical model. Finally, to show the applicability of our work, we define performance and robustness metrics for an allocation of coded fragments to SNs and apply the results of the analytical characterization of MIND performance to select coded fragment allocations that yield robustness to collusion as well as the highest probability of identifying actual attackers.
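
The brute-force sketch below conveys only the cross-checking intuition behind pollution identification: decode multiple fragment subsets, trust the majority outcome, and flag fragments that never participate in a consistent decoding. It is our toy, not the MIND algorithm (which is efficient and avoids exhaustive enumeration); the `decode` interface and the replication demo are assumptions.

```python
from itertools import combinations

def identify_polluted(n, decode, k):
    """Brute-force cross-check: decode every k-subset of the n coded
    fragments, take the majority outcome as the true data unit, and flag
    fragments that never appear in a majority-consistent decoding."""
    results = {s: decode(s) for s in combinations(range(n), k)}
    vals = list(results.values())
    truth = max(set(vals), key=vals.count)           # majority outcome
    clean = set().union(*(s for s, v in results.items() if v == truth))
    return [i for i in range(n) if i not in clean]

# Demo with trivial replication (k = 1): each copy "decodes" to itself.
fragments = ["DATA", "DATA", "JUNK", "DATA"]         # one polluted copy
print(identify_polluted(len(fragments), lambda s: fragments[s[0]], k=1))  # [2]
```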


Author(s): Diogo Marques, Aleksandar Ilic, Leonel Sousa

Continuous enhancements and diversity in modern multi-core hardware, such as wider and deeper core pipelines and memory subsystems, bring a set of hard-to-solve challenges when modeling upper-bound capabilities and identifying the main application bottlenecks. Insightful roofline models are widely used for this purpose, but existing approaches overly abstract the micro-architecture complexity, thus providing unrealistic performance bounds that lead to a misleading characterization of real-world applications. To address this problem, the Mansard Roofline Model (MaRM), proposed in this work, uncovers a minimum set of architectural features that must be considered to provide insightful, yet accurate and realistic, modeling of performance upper bounds for modern processors. By encapsulating the retirement constraints due to the number of retirement slots and the Reorder Buffer and Physical Register File sizes, the proposed model accurately captures the capabilities of a real platform (average rRMSE of 5.4%) and characterizes 12 application kernels from standard benchmark suites. By following the MaRM interpretation methodology and guidelines proposed herein, speedups of up to 5× are obtained when optimizing a real-world bioinformatics application, as well as a super-linear speedup of 18.5× when it is parallelized.
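
For intuition, a roofline-style upper bound extended with a retirement roof can be computed as below. This is a simplified sketch under our own assumptions (the function, parameter names, and values are hypothetical), not the exact MaRM formulation.

```python
def marm_style_bound(ai, peak_flops, mem_bw, retire_slots, freq, uops_per_flop):
    """Roofline-style upper bound with a retirement roof (illustrative):
    performance is capped by the compute roof, by the memory roof
    (arithmetic intensity x bandwidth), and by an instruction-retirement
    roof derived from the retirement slots available per cycle."""
    compute_roof = peak_flops
    memory_roof = ai * mem_bw
    retire_roof = retire_slots * freq / uops_per_flop
    return min(compute_roof, memory_roof, retire_roof)

# Hypothetical core: 4 retirement slots at 3 GHz, 100 GFLOP/s peak, 40 GB/s.
print(marm_style_bound(ai=0.5, peak_flops=100e9, mem_bw=40e9,
                       retire_slots=4, freq=3e9, uops_per_flop=1.5))
```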


Author(s): Bo Jiang, Philippe Nain, Don Towsley

Consider a setting where Willie generates a Poisson stream of jobs and routes them to a single server that follows the first-in first-out discipline. Suppose there is an adversary, Alice, who desires to receive service without being detected. We ask the question: What is the number of jobs that she can receive covertly, i.e., without being detected by Willie? In the case where both Willie's and Alice's jobs have exponential service times with respective rates μ1 and μ2, we demonstrate a phase transition when Alice adopts the strategy of inserting a single job probabilistically when the server idles: over n busy periods, she can achieve a covert throughput, measured by the expected number of jobs covertly inserted, of O(√n) when μ1 < 2μ2, O(√(n log n)) when μ1 = 2μ2, and O(n^(μ2/μ1)) when μ1 > 2μ2. When both Willie's and Alice's jobs have general service times, we establish an upper bound on the number of jobs Alice can execute covertly. This bound is related to the Fisher information. More general insertion policies are also discussed.
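
The O(√n) regime can be illustrated numerically. Assuming, for illustration only, that Alice scales her per-idle-period insertion probability as p = c/√n (a choice consistent with the stated throughput, not necessarily the paper's exact policy), her insertions over n busy periods grow like c√n:

```python
import random

def covert_throughput(n, c=1.0, seed=7):
    """If Alice inserts a job at each idle with probability p = c/sqrt(n)
    (shrinking with n so that Willie's detector stays fooled), her
    expected insertions over n busy periods grow like c*sqrt(n)."""
    random.seed(seed)
    p = c / n ** 0.5
    return sum(random.random() < p for _ in range(n))

for n in (10_000, 40_000, 160_000):
    print(n, covert_throughput(n))   # roughly doubles as n quadruples
```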


Author(s): V.S. Ch Lakshmi Narayana, Sharayu Moharir, Nikhil Karamchandani

The rapid proliferation of shared edge computing platforms has enabled application service providers to deploy a wide variety of services with stringent latency and high bandwidth requirements. A key advantage of these platforms is that they provide pay-as-you-go flexibility by charging clients in proportion to their resource usage through short-term contracts. This affords clients significant cost-saving opportunities: a client can dynamically decide when to host its service on the platform, depending on the changing intensity of requests. A natural policy for our setting is the Time-To-Live (TTL) policy. We show that TTL performs poorly both in the adversarial arrival setting, i.e., in terms of the competitive ratio, and for i.i.d. stochastic arrivals with low arrival rates, irrespective of the value of the TTL timer. We propose an online policy called RetroRenting (RR) and characterize its performance in terms of the competitive ratio. Our results show that RR overcomes the limitations of TTL. In addition, we provide performance guarantees for RR under i.i.d. stochastic arrival processes coupled with negatively associated rent cost sequences and prove that it compares well with the optimal online policy. Further, we conduct simulations using both synthetic and real-world traces to compare the performance of RR with the optimal offline and online policies. The simulations show that the performance of RR is near optimal in all settings considered. Our results illustrate the universality of RR.
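
The following sketch shows an RR-flavored rule reconstructed from the description above (parameter names and the cost model are our assumptions): switch the hosting state as soon as hindsight over the recent past shows that the opposite decision would already have paid for the switching (fetch) cost.

```python
def retro_renting(requests, rent_cost, fetch_cost, serve_remote_cost):
    """RetroRenting-flavored rule: track the hindsight net saving of having
    taken the opposite hosting decision; switch once it covers the fetch
    (switching) cost, and reset the look-back when it turns negative."""
    hosted, saved, decisions = False, 0.0, []
    for x in requests:                    # x = requests arriving in this slot
        if hosted:
            saved += rent_cost - x * serve_remote_cost   # saving if evicted
        else:
            saved += x * serve_remote_cost - rent_cost   # saving if hosted
        if saved >= fetch_cost:           # hindsight profit pays for a switch
            hosted, saved = not hosted, 0.0
        saved = max(saved, 0.0)
        decisions.append(hosted)
    return decisions

print(retro_renting([0, 0, 5, 6, 7, 0, 0, 0],
                    rent_cost=1, fetch_cost=3, serve_remote_cost=1))
```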


Author(s): Nikki Sonenberg, Grzegorz Kielanski, Benny Van Houdt

Randomized work stealing is used in distributed systems to increase performance and improve resource utilization. In this article, we consider randomized work stealing in a large system of homogeneous processors where parent jobs spawn child jobs that can feasibly be executed in parallel with the parent job. We analyse the performance of two work stealing strategies: one where only child jobs can be transferred across servers and the other where parent jobs are transferred. We define a mean-field model to derive the response time distribution in a large-scale system with Poisson arrivals and exponential parent and child job durations. We prove that the model has a unique fixed point that corresponds to the steady state of a structured Markov chain, allowing us to use matrix analytic methods to compute the unique fixed point. The accuracy of the mean-field model is validated using simulation. Using numerical examples, we illustrate the effect of different probe rates, load, and different child job size distributions on performance with respect to the two stealing strategies, individually, and compared to each other.
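
The fixed-point computation can be illustrated generically. The paper characterizes the fixed point exactly via matrix-analytic methods on a structured Markov chain; the sketch below merely shows successive substitution on a toy scalar balance equation of our own devising for the fraction of idle servers.

```python
def mean_field_fixed_point(update, x0, tol=1e-12, max_iter=10_000):
    """Successive substitution: iterate x <- update(x) until the state
    vector stops changing (a generic stand-in for solving a mean-field
    model's fixed point)."""
    x = x0
    for _ in range(max_iter):
        nxt = update(x)
        if max(abs(a - b) for a, b in zip(nxt, x)) < tol:
            return nxt
        x = nxt
    raise RuntimeError("no convergence")

# Toy scalar balance (our assumption): fraction of idle servers i in a
# system where idle servers probe/steal at rate r from the busy fraction.
lam, mu, r = 0.7, 1.0, 0.5
print(mean_field_fixed_point(lambda x: (mu / (mu + lam + r * (1 - x[0])),),
                             x0=(0.5,)))
```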


Author(s): Hani Nemati, Seyed Vahid Azhari, Mahsa Shakeri, Michel Dagenais

Cloud computing is a fast-growing technology that provides on-demand access to a pool of shared resources. This type of distributed and complex environment requires advanced resource-management solutions that can model virtual machine (VM) behavior. Different workload measurements, such as CPU, memory, disk, and network usage, are usually derived from each VM to model resource utilization and group similar VMs. However, these coarse workload metrics require internal access to each VM with a suitable performance analysis toolkit, which is not feasible under the privacy policies of many cloud environments. In this article, we propose a non-intrusive, host-based virtual machine workload characterization using hypervisor tracing. VM blocking durations, along with virtual interrupt injection rates, are derived as features to reveal multiple levels of resource intensiveness. In addition, the VM exit reason is considered, as well as the resource contention rate due to the host and other VMs. Moreover, the process and thread preemption rates in each VM are extracted from the collected tracing logs. Our proposed approach further improves the selected features by exploiting a page-ranking-based algorithm to filter out non-important processes running on each VM. Once the metric features are defined, a two-stage VM clustering technique is employed to perform both coarse- and fine-grain workload characterization. The inter-cluster and intra-cluster similarity metrics of the silhouette score are used to reveal distinct VM workload groups, as well as the ones with significant overlap. The proposed framework can provide a detailed view of the underlying behavior of the running VMs. This can assist infrastructure administrators in efficient resource management, as well as in root cause analysis.
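
A minimal sketch of a two-stage clustering pipeline with silhouette scoring is shown below. It is our illustration using scikit-learn, not the authors' exact pipeline; the feature matrix is random placeholder data standing in for the trace-derived VM features.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def two_stage_clustering(features, k_coarse=3, k_fine=2, seed=0):
    """Illustrative two-stage VM workload clustering: a coarse k-means
    pass groups VMs, then each coarse group is re-clustered for fine-grain
    characterization. Silhouette scores expose well-separated groups
    versus groups with significant overlap."""
    coarse = KMeans(n_clusters=k_coarse, n_init=10, random_state=seed).fit(features)
    print("coarse silhouette:", silhouette_score(features, coarse.labels_))
    for c in range(k_coarse):
        group = features[coarse.labels_ == c]
        if len(group) > k_fine:          # enough members to split further
            fine = KMeans(n_clusters=k_fine, n_init=10, random_state=seed).fit(group)
            print(f"  group {c}: fine silhouette",
                  silhouette_score(group, fine.labels_))

# Hypothetical feature matrix: rows = VMs, cols = blocking durations,
# interrupt injection rates, exit reasons, preemption rates, etc.
two_stage_clustering(np.random.rand(60, 5))
```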


Author(s): Guilherme Domingues, Gabriel Mendonça, Edmundo De Souza E Silva, Rosa M. M. Leão, Daniel S. Menasché, ...

Caching has been a fundamental element of networking systems since the early days of the Internet. By filtering requests toward custodians, caches reduce the bandwidth required by the latter and the delay experienced by clients. The requests that are not served by a cache, in turn, comprise its miss stream. We refer to the dependence of the cache state and miss stream on its history as hysteresis. Although hysteresis is at the core of caching systems, a dimension that has not been systematically studied in previous works relates to its impact on the time elapsed between misses, evictions, and insertions. In this article, we propose novel mechanisms and models to leverage hysteresis in cache evictions and insertions. The proposed solutions extend TTL-like mechanisms and rely on two knobs to tune the time between insertions and evictions given a target hit rate. We show the general benefits of hysteresis and the particular improvement of the two-thresholds strategy in reducing download times, making the system more predictable, and accounting for different costs associated with object retrieval.
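
In the spirit of the two-knob idea, the sketch below implements a toy TTL-like cache with separate insertion and eviction thresholds: an object enters the cache only after two sufficiently close misses, and leaves a fixed time after its last hit. The class, knob names, and policy details are our assumptions, not the paper's exact mechanism.

```python
class TwoThresholdTTLCache:
    """Toy TTL-like cache with two hysteresis knobs (names ours): an
    object is inserted only when a miss follows a previous miss within
    `insert_window` seconds, and is evicted `evict_ttl` seconds after its
    last hit. Tuning the two knobs trades hit rate against churn."""
    def __init__(self, insert_window, evict_ttl):
        self.insert_window, self.evict_ttl = insert_window, evict_ttl
        self.cached = {}      # object -> expiry time
        self.last_miss = {}   # object -> time of most recent miss

    def request(self, obj, now):
        exp = self.cached.get(obj)
        if exp is not None and now < exp:
            self.cached[obj] = now + self.evict_ttl   # refresh eviction timer
            return True                               # hit
        # Miss: insert only if a previous miss happened recently enough.
        if now - self.last_miss.get(obj, float("-inf")) <= self.insert_window:
            self.cached[obj] = now + self.evict_ttl
        self.last_miss[obj] = now
        return False

cache = TwoThresholdTTLCache(insert_window=5.0, evict_ttl=10.0)
print([cache.request("a", t) for t in (0, 2, 4, 20)])  # [False, False, True, False]
```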

