DC Large-Scale Simulation of Nonlinear Circuits on Parallel Processors

2012 ◽  
Vol 58 (3) ◽  
pp. 285-295
Author(s):  
Diego Ernesto Cortés Udave ◽  
Jan Ogrodzki ◽  
Miguel Angel Gutiérrez De Anda

Abstract Newton-Raphson DC analysis of large-scale nonlinear circuits may be an extremely time-consuming process even if sparse matrix techniques and bypassing of nonlinear model calculations are used. On multi-core, multithreaded computers, a modest reduction in runtime can be obtained by calculating the mathematical models of the nonlinear elements and managing the stamps of the sparse matrix entries in concurrent processes. In this paper it is shown how the numerical complexity of this problem (and thus its solution time) can be further reduced via circuit decomposition and the parallel solution of blocks, taking the Bordered-Block Diagonal (BBD) matrix structure as a departure point. This BBD-parallel approach can yield a considerable gain, although the benefit depends strongly on the circuit topology. This paper presents the theoretical foundation of the algorithm, its implementation, and a numerical complexity analysis based on practical measurements of matrix operations.
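To make the decomposition concrete, the sketch below solves a BBD system by factoring each diagonal block concurrently and then forming the Schur complement on the border. The dense NumPy blocks, the thread pool, and all names are illustrative assumptions; the paper's implementation operates on sparse stamps and is not reproduced here.

```python
# A minimal sketch of a BBD parallel solve, assuming dense NumPy blocks
# instead of the paper's sparse stamp structures.
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def solve_bbd(diag_blocks, right_borders, bottom_borders, corner, b_blocks, b_border):
    """Solve a bordered-block-diagonal system:
        [A_i   ...  B_i] [x_i]   [b_i]
        [C_i   ...  D  ] [x_b] = [b_b]
    Each diagonal block A_i is processed independently, so the per-block
    work can run concurrently; only the border (Schur) system is serial."""
    def block_work(i):
        Ainv = np.linalg.inv(diag_blocks[i])   # stand-in for an LU factorization
        y = Ainv @ b_blocks[i]                 # A_i^{-1} b_i
        w = Ainv @ right_borders[i]            # A_i^{-1} B_i
        return y, w

    with ThreadPoolExecutor() as pool:
        results = list(pool.map(block_work, range(len(diag_blocks))))

    # Schur complement of the border: S = D - sum_i C_i A_i^{-1} B_i
    S = corner.copy()
    rhs = b_border.copy()
    for (y, w), C in zip(results, bottom_borders):
        S -= C @ w
        rhs -= C @ y
    x_border = np.linalg.solve(S, rhs)

    # Back-substitute the border solution into every block.
    x_blocks = [y - w @ x_border for (y, w) in results]
    return x_blocks, x_border
```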

2021 ◽  
Vol 2113 (1) ◽  
pp. 012083
Author(s):  
Xiaonan Liu ◽  
Lina Jing ◽  
Lin Han ◽  
Jie Gao

Abstract Solving large-scale systems of linear equations is of great significance in many engineering fields, such as weather forecasting and bioengineering. Whether a classical computer solves such a system by elimination or by Cramer's rule, the time required grows polynomially with the size of the system. With the advent of the big-data era, transistor integration densities keep rising; once transistor dimensions approach the order of the electron diameter, quantum tunneling occurs, Moore's law no longer holds, and the traditional computing model will be unable to meet demand. In this paper, through an in-depth study of the classic HHL algorithm, a small-scale quantum circuit model is proposed to solve a 2×2 system of linear equations, and the circuit is simulated and verified, via its circuit diagram and programming, on the Origin Quantum Platform. The fidelity exceeds 90% across the parameter values tested. When the matrix to be solved is sparse, the quantum algorithm offers an exponential speedup over the best known classical algorithm.
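The fidelity figure above can be cross-checked classically for the 2×2 case: HHL encodes the normalized solution state, so fidelity is the squared overlap between the ideal state and the circuit's output. The sketch below shows only that check; the matrix, vector, and the perturbed "simulated" state are illustrative stand-ins, not the paper's circuit or data.

```python
# Hedged classical cross-check: ideal HHL output is |x> = A^{-1}|b>, normalized;
# fidelity is |<x_ideal|x_sim>|^2. All values here are illustrative.
import numpy as np

A = np.array([[1.5, 0.5],
              [0.5, 1.5]])        # Hermitian, as HHL requires
b = np.array([1.0, 0.0])

x_ideal = np.linalg.solve(A, b)
x_ideal = x_ideal / np.linalg.norm(x_ideal)   # HHL encodes the normalized solution

# In the paper this state comes from running the circuit on the Origin Quantum
# Platform; here we perturb the ideal state as a stand-in for circuit noise.
x_sim = x_ideal + 0.05 * np.random.default_rng(0).standard_normal(2)
x_sim = x_sim / np.linalg.norm(x_sim)

fidelity = abs(np.vdot(x_ideal, x_sim)) ** 2
print(f"fidelity = {fidelity:.4f}")           # values above 0.9 match the reported range
```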


Author(s):  
Mark Endrei ◽  
Chao Jin ◽  
Minh Ngoc Dinh ◽  
David Abramson ◽  
Heidi Poxon ◽  
...  

Rising power costs and constraints are driving a growing focus on the energy efficiency of high performance computing systems. The unique characteristics of a particular system and workload, and their effect on performance and energy efficiency, are typically difficult for application users to assess and to control. Settings for optimum performance and energy efficiency can also diverge, so we need to identify trade-off options that guide a suitable balance between energy use and performance. We present statistical and machine learning models that require only a small number of runs to make accurate Pareto-optimal trade-off predictions using parameters that users can control. We study model training and validation using several parallel kernels and more complex workloads, including Algebraic Multigrid (AMG), the Large-scale Atomic/Molecular Massively Parallel Simulator, and Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics. We demonstrate that we can train the models using as few as 12 runs, with prediction error of less than 10%. Our AMG results identify trade-off options that provide up to 45% improvement in energy efficiency for around 10% performance loss. We reduce the sample measurement time required for AMG by 90%, from 13 h to 74 min.
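A Pareto-optimal trade-off set of the kind predicted above is simply the subset of configurations not dominated in both objectives. The sketch below extracts such a front from a handful of (runtime, energy) points; the sample data and the two-objective formulation are illustrative assumptions, not the authors' models.

```python
# Minimal sketch: keep the (runtime, energy) points not dominated in both
# objectives (lower is better). Sample values are invented for illustration.
import numpy as np

# Each row: (runtime_seconds, energy_joules) for one configuration,
# e.g. a particular thread count and CPU frequency.
runs = np.array([
    [100, 5000], [110, 4200], [125, 3900],
    [105, 4800], [140, 3850], [130, 4100],
])

def pareto_front(points):
    """A point is dominated if another point is no worse in both objectives
    and strictly better in at least one."""
    keep = []
    for i, p in enumerate(points):
        dominated = any(
            np.all(q <= p) and np.any(q < p)
            for j, q in enumerate(points) if j != i
        )
        if not dominated:
            keep.append(p)
    return np.array(sorted(keep, key=lambda r: r[0]))

for runtime, energy in pareto_front(runs):
    print(f"runtime {runtime:.0f} s  energy {energy:.0f} J")
```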


1987 ◽  
Vol 18 (6) ◽  
pp. 89-99 ◽  
Author(s):  
Hideki Asai ◽  
Mitsuo Asai ◽  
Mamoru Tanaka

Author(s):  
Bo Li ◽  
Ruihong Qiao ◽  
Zhizhi Wang ◽  
Weihong Zhou ◽  
Xin Li ◽  
...  

Telomere repeat factor 1 (TRF1) is a subunit of shelterin (also known as the telosome) and plays a critical role in inhibiting telomere elongation by telomerase. Tankyrase 1 (TNKS1) is a poly(ADP-ribose) polymerase that regulates the activity of TRF1 through poly(ADP-ribosyl)ation (PARylation). PARylation of TRF1 by TNKS1 leads to the release of TRF1 from telomeres and allows telomerase to access telomeres. The interaction between TRF1 and TNKS1 is thus important for telomere stability and the mitotic cell cycle. Here, the crystal structure of a complex between the N-terminal acidic domain of TRF1 (residues 1–55) and a fragment of TNKS1 covering the second and third ankyrin-repeat clusters (ARC2-3) is presented at 2.2 Å resolution. The TNKS1–TRF1 complex crystals were optimized using an 'oriented rescreening' strategy, in which the initial crystallization condition was used as a guide for a second round of large-scale sparse-matrix screening. This crystallographic and biochemical analysis provides a better understanding of the TRF1–TNKS1 interaction and the three-dimensional structure of the ankyrin-repeat domain of TNKS.


2022 ◽  
Vol 15 (2) ◽  
pp. 1-33
Author(s):  
Mikhail Asiatici ◽  
Paolo Ienne

Applications such as large-scale sparse linear algebra and graph analytics are challenging to accelerate on FPGAs because their short, irregular memory accesses result in low cache hit rates. Nonblocking caches reduce the bandwidth consumed by misses by requesting each cache line only once, even when multiple misses correspond to it. However, this reuse mechanism is traditionally implemented with an associative lookup, which limits the number of misses considered for reuse to a few tens at most. In this article, we present an efficient pipeline that can process and store thousands of outstanding misses in cuckoo hash tables in on-chip SRAM with minimal stalls. This brings the same bandwidth advantage as a larger cache for a fraction of the area budget, because outstanding misses do not need a data array, which can significantly speed up irregular, memory-bound, latency-insensitive applications. In addition, we extend nonblocking caches to generate variable-length bursts to memory, which increases the bandwidth delivered by DRAMs and their controllers. The resulting miss-optimized memory system provides up to 25% speedup with 24× area reduction on 15 large sparse matrix-vector multiplication benchmarks evaluated on an embedded and a datacenter FPGA system.
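The sketch below is a simplified software model of the miss-tracking idea: outstanding misses are keyed by cache-line address in a two-table cuckoo hash, so a secondary miss to an already-requested line is merged instead of issuing a new memory request. Table sizes, hash functions, the eviction bound, and all names are illustrative assumptions, not the article's FPGA pipeline.

```python
# Toy cuckoo-hashed miss tracker: thousands of entries without associative lookup.
class CuckooMSHR:
    def __init__(self, slots=1024, max_kicks=32):
        self.t0 = [None] * slots    # each slot: (line_addr, [waiting request ids])
        self.t1 = [None] * slots
        self.slots = slots
        self.max_kicks = max_kicks

    def _h0(self, addr):
        return hash(addr) % self.slots

    def _h1(self, addr):
        return hash(addr * 0x9E3779B1) % self.slots   # second, independent-ish hash

    def lookup(self, addr):
        for table, h in ((self.t0, self._h0), (self.t1, self._h1)):
            entry = table[h(addr)]
            if entry is not None and entry[0] == addr:
                return entry
        return None

    def record_miss(self, addr, request_id):
        """Return True for a secondary miss (line already in flight), so no
        new memory request is issued -- the bandwidth saving described above."""
        entry = self.lookup(addr)
        if entry is not None:
            entry[1].append(request_id)
            return True
        self._insert((addr, [request_id]))
        return False

    def _insert(self, entry):
        for _ in range(self.max_kicks):
            for table, h in ((self.t0, self._h0), (self.t1, self._h1)):
                i = h(entry[0])
                if table[i] is None:
                    table[i] = entry
                    return
                table[i], entry = entry, table[i]   # displace the occupant and retry
        raise RuntimeError("cuckoo path too long; a real design would stall here")
```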


PLoS ONE ◽  
2020 ◽  
Vol 15 (12) ◽  
pp. e0243475
Author(s):  
David Mödinger ◽  
Jan-Hendrik Lorenz ◽  
Rens W. van der Heijden ◽  
Franz J. Hauck

The cryptocurrency system Bitcoin uses a peer-to-peer network to distribute new transactions to all participants. For risk estimation and usability aspects of Bitcoin applications, it is necessary to know the time required to disseminate a transaction within the network. Unfortunately, this time is neither immediately observable nor easy to acquire. Measuring the dissemination latency requires many connections into the Bitcoin network, wasting network resources. Some third parties operate that way and publish large-scale measurements, but relying on them introduces a dependency and requires additional trust. This work describes how to unobtrusively acquire reliable estimates of transaction dissemination latencies without involving a third party. The dissemination latency is modelled with a lognormal distribution, whose parameters we estimate using a Bayesian model that can be updated dynamically. Our approach provides reliable estimates even when using only eight connections, the minimum number of connections used by the default Bitcoin client. We provide an implementation of our approach as well as datasets for modelling and evaluation. Our approach, while slightly underestimating the latency distribution, is largely congruent with observed dissemination latencies.
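One standard way to realize such a dynamically updatable Bayesian model, sketched below, exploits the fact that log-latencies of a lognormal are normally distributed, so a conjugate Normal-Gamma prior on the mean and precision admits closed-form batch updates. The prior values and latencies are illustrative assumptions, not necessarily the paper's choices.

```python
# Conjugate Normal-Gamma update for lognormal latency parameters; a sketch,
# assuming hyperparameters and observations invented for illustration.
import math

class LognormalPosterior:
    def __init__(self, mu0=0.0, kappa0=1.0, alpha0=1.0, beta0=1.0):
        self.mu, self.kappa, self.alpha, self.beta = mu0, kappa0, alpha0, beta0

    def update(self, latencies):
        """Fold in a batch of observed dissemination latencies (seconds)."""
        xs = [math.log(t) for t in latencies]   # lognormal data -> normal data
        n = len(xs)
        xbar = sum(xs) / n
        ss = sum((x - xbar) ** 2 for x in xs)
        kappa_n = self.kappa + n
        # Update beta first: it uses the *old* mu and kappa.
        self.beta += 0.5 * ss + self.kappa * n * (xbar - self.mu) ** 2 / (2 * kappa_n)
        self.mu = (self.kappa * self.mu + n * xbar) / kappa_n
        self.kappa = kappa_n
        self.alpha += n / 2

    def point_estimate(self):
        """Posterior-mean (mu, sigma) of the lognormal latency model."""
        sigma2 = self.beta / (self.alpha - 1)   # inverse-gamma mean, needs alpha > 1
        return self.mu, math.sqrt(sigma2)

post = LognormalPosterior()
post.update([0.8, 1.3, 2.1, 0.9, 1.7])          # e.g. latencies seen on a few peers
print(post.point_estimate())
```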

