DC Large-Scale Simulation of Nonlinear Circuits on Parallel Processors

2012 ◽  
Vol 58 (3) ◽  
pp. 285-295
Author(s):  
Diego Ernesto Cortés Udave ◽  
Jan Ogrodzki ◽  
Miguel Angel Gutiérrez De Anda

Abstract Newton-Raphson DC analysis of large-scale nonlinear circuits may be an extremely time-consuming process even if sparse matrix techniques and bypassing of nonlinear model calculations are used. On multi-core, multithreaded computers, a modest reduction in runtime can be obtained by calculating the mathematical models of the nonlinear elements and managing the stamps of the sparse matrix entries in concurrent processes. In this paper it is shown how the numerical complexity of this problem (and thus its solution time) can be further reduced via circuit decomposition and the parallel solution of blocks, taking the Bordered-Block Diagonal (BBD) matrix structure as a departure point. This BBD-parallel approach can yield a considerable gain, although the benefit depends strongly on the circuit topology. This paper presents the theoretical foundation of the algorithm, its implementation, and a numerical complexity analysis based on practical measurements of matrix operations.
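To make the decomposition concrete, the sketch below solves a BBD system by factoring each diagonal block concurrently and then forming the Schur complement on the border. The dense NumPy blocks, the thread pool, and all names are illustrative assumptions; the paper's implementation operates on sparse stamps and is not reproduced here.

```python
# A minimal sketch of a BBD parallel solve, assuming dense NumPy blocks
# instead of the paper's sparse stamp structures.
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def solve_bbd(diag_blocks, right_borders, bottom_borders, corner, b_blocks, b_border):
    """Solve a bordered-block-diagonal system:
        [A_i   ...  B_i] [x_i]   [b_i]
        [C_i   ...  D  ] [x_b] = [b_b]
    Each diagonal block A_i is processed independently, so the per-block
    work can run concurrently; only the border (Schur) system is serial."""
    def block_work(i):
        Ainv = np.linalg.inv(diag_blocks[i])   # stand-in for an LU factorization
        y = Ainv @ b_blocks[i]                 # A_i^{-1} b_i
        w = Ainv @ right_borders[i]            # A_i^{-1} B_i
        return y, w

    with ThreadPoolExecutor() as pool:
        results = list(pool.map(block_work, range(len(diag_blocks))))

    # Schur complement of the border: S = D - sum_i C_i A_i^{-1} B_i
    S = corner.copy()
    rhs = b_border.copy()
    for (y, w), C in zip(results, bottom_borders):
        S -= C @ w
        rhs -= C @ y
    x_border = np.linalg.solve(S, rhs)

    # Back-substitute the border solution into every block.
    x_blocks = [y - w @ x_border for (y, w) in results]
    return x_blocks, x_border
```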

2021 ◽  
Vol 2113 (1) ◽  
pp. 012083
Author(s):  
Xiaonan Liu ◽  
Lina Jing ◽  
Lin Han ◽  
Jie Gao

Abstract Solving large-scale systems of linear equations is of great significance in many engineering fields, such as weather forecasting and bioengineering. Whether a classical computer solves such a system by elimination or by Cramer's rule, the time required grows polynomially with the size of the system. With the advent of the big-data era, transistor integration densities keep rising; once transistor dimensions approach the order of the electron diameter, quantum tunneling occurs, Moore's law no longer holds, and the traditional computing model will be unable to meet demand. In this paper, through an in-depth study of the classic HHL algorithm, a small-scale quantum circuit model is proposed to solve a 2×2 system of linear equations, and the circuit is simulated and verified, via its circuit diagram and programming, on the Origin Quantum Platform. The fidelity exceeds 90% across the parameter values tested. When the matrix to be solved is sparse, the quantum algorithm offers an exponential speedup over the best known classical algorithm.
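The fidelity figure above can be cross-checked classically for the 2×2 case: HHL encodes the normalized solution state, so fidelity is the squared overlap between the ideal state and the circuit's output. The sketch below shows only that check; the matrix, vector, and the perturbed "simulated" state are illustrative stand-ins, not the paper's circuit or data.

```python
# Hedged classical cross-check: ideal HHL output is |x> = A^{-1}|b>, normalized;
# fidelity is |<x_ideal|x_sim>|^2. All values here are illustrative.
import numpy as np

A = np.array([[1.5, 0.5],
              [0.5, 1.5]])        # Hermitian, as HHL requires
b = np.array([1.0, 0.0])

x_ideal = np.linalg.solve(A, b)
x_ideal = x_ideal / np.linalg.norm(x_ideal)   # HHL encodes the normalized solution

# In the paper this state comes from running the circuit on the Origin Quantum
# Platform; here we perturb the ideal state as a stand-in for circuit noise.
x_sim = x_ideal + 0.05 * np.random.default_rng(0).standard_normal(2)
x_sim = x_sim / np.linalg.norm(x_sim)

fidelity = abs(np.vdot(x_ideal, x_sim)) ** 2
print(f"fidelity = {fidelity:.4f}")           # values above 0.9 match the reported range
```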


Author(s):  
Mark Endrei ◽  
Chao Jin ◽  
Minh Ngoc Dinh ◽  
David Abramson ◽  
Heidi Poxon ◽  
...  

Rising power costs and constraints are driving a growing focus on the energy efficiency of high performance computing systems. The unique characteristics of a particular system and workload, and their effect on performance and energy efficiency, are typically difficult for application users to assess and to control. Settings for optimum performance and energy efficiency can also diverge, so we need to identify trade-off options that guide a suitable balance between energy use and performance. We present statistical and machine learning models that require only a small number of runs to make accurate Pareto-optimal trade-off predictions using parameters that users can control. We study model training and validation using several parallel kernels and more complex workloads, including Algebraic Multigrid (AMG), the Large-scale Atomic/Molecular Massively Parallel Simulator, and Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics. We demonstrate that we can train the models using as few as 12 runs, with prediction error of less than 10%. Our AMG results identify trade-off options that provide up to 45% improvement in energy efficiency for around 10% performance loss. We reduce the sample measurement time required for AMG by 90%, from 13 h to 74 min.
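A Pareto-optimal trade-off set of the kind predicted above is simply the subset of configurations not dominated in both objectives. The sketch below extracts such a front from a handful of (runtime, energy) points; the sample data and the two-objective formulation are illustrative assumptions, not the authors' models.

```python
# Minimal sketch: keep the (runtime, energy) points not dominated in both
# objectives (lower is better). Sample values are invented for illustration.
import numpy as np

# Each row: (runtime_seconds, energy_joules) for one configuration,
# e.g. a particular thread count and CPU frequency.
runs = np.array([
    [100, 5000], [110, 4200], [125, 3900],
    [105, 4800], [140, 3850], [130, 4100],
])

def pareto_front(points):
    """A point is dominated if another point is no worse in both objectives
    and strictly better in at least one."""
    keep = []
    for i, p in enumerate(points):
        dominated = any(
            np.all(q <= p) and np.any(q < p)
            for j, q in enumerate(points) if j != i
        )
        if not dominated:
            keep.append(p)
    return np.array(sorted(keep, key=lambda r: r[0]))

for runtime, energy in pareto_front(runs):
    print(f"runtime {runtime:.0f} s  energy {energy:.0f} J")
```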


1987 ◽  
Vol 18 (6) ◽  
pp. 89-99 ◽  
Author(s):  
Hideki Asai ◽  
Mitsuo Asai ◽  
Mamoru Tanaka

Author(s):  
Bo Li ◽  
Ruihong Qiao ◽  
Zhizhi Wang ◽  
Weihong Zhou ◽  
Xin Li ◽  
...  

Telomere repeat factor 1 (TRF1) is a subunit of shelterin (also known as the telosome) and plays a critical role in inhibiting telomere elongation by telomerase. Tankyrase 1 (TNKS1) is a poly(ADP-ribose) polymerase that regulates the activity of TRF1 through poly(ADP-ribosyl)ation (PARylation). PARylation of TRF1 by TNKS1 leads to the release of TRF1 from telomeres and allows telomerase to access telomeres. The interaction between TRF1 and TNKS1 is thus important for telomere stability and the mitotic cell cycle. Here, the crystal structure of a complex between the N-terminal acidic domain of TRF1 (residues 1–55) and a fragment of TNKS1 covering the second and third ankyrin-repeat clusters (ARC2-3) is presented at 2.2 Å resolution. The TNKS1–TRF1 complex crystals were optimized using an 'oriented rescreening' strategy, in which the initial crystallization condition was used as a guide for a second round of large-scale sparse-matrix screening. This crystallographic and biochemical analysis provides a better understanding of the TRF1–TNKS1 interaction and the three-dimensional structure of the ankyrin-repeat domain of TNKS.


2022 ◽  
Vol 15 (2) ◽  
pp. 1-33
Author(s):  
Mikhail Asiatici ◽  
Paolo Ienne

Applications such as large-scale sparse linear algebra and graph analytics are challenging to accelerate on FPGAs because their short, irregular memory accesses result in low cache hit rates. Nonblocking caches reduce the bandwidth consumed by misses by requesting each cache line only once, even when multiple misses correspond to it. However, this reuse mechanism is traditionally implemented with an associative lookup, which limits the number of misses considered for reuse to a few tens at most. In this article, we present an efficient pipeline that can process and store thousands of outstanding misses in cuckoo hash tables in on-chip SRAM with minimal stalls. This brings the same bandwidth advantage as a larger cache for a fraction of the area budget, because outstanding misses do not need a data array, which can significantly speed up irregular, memory-bound, latency-insensitive applications. In addition, we extend nonblocking caches to generate variable-length bursts to memory, which increases the bandwidth delivered by DRAMs and their controllers. The resulting miss-optimized memory system provides up to 25% speedup with 24× area reduction on 15 large sparse matrix-vector multiplication benchmarks evaluated on an embedded and a datacenter FPGA system.
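The sketch below is a simplified software model of the miss-tracking idea: outstanding misses are keyed by cache-line address in a two-table cuckoo hash, so a secondary miss to an already-requested line is merged instead of issuing a new memory request. Table sizes, hash functions, the eviction bound, and all names are illustrative assumptions, not the article's FPGA pipeline.

```python
# Toy cuckoo-hashed miss tracker: thousands of entries without associative lookup.
class CuckooMSHR:
    def __init__(self, slots=1024, max_kicks=32):
        self.t0 = [None] * slots    # each slot: (line_addr, [waiting request ids])
        self.t1 = [None] * slots
        self.slots = slots
        self.max_kicks = max_kicks

    def _h0(self, addr):
        return hash(addr) % self.slots

    def _h1(self, addr):
        return hash(addr * 0x9E3779B1) % self.slots   # second, independent-ish hash

    def lookup(self, addr):
        for table, h in ((self.t0, self._h0), (self.t1, self._h1)):
            entry = table[h(addr)]
            if entry is not None and entry[0] == addr:
                return entry
        return None

    def record_miss(self, addr, request_id):
        """Return True for a secondary miss (line already in flight), so no
        new memory request is issued -- the bandwidth saving described above."""
        entry = self.lookup(addr)
        if entry is not None:
            entry[1].append(request_id)
            return True
        self._insert((addr, [request_id]))
        return False

    def _insert(self, entry):
        for _ in range(self.max_kicks):
            for table, h in ((self.t0, self._h0), (self.t1, self._h1)):
                i = h(entry[0])
                if table[i] is None:
                    table[i] = entry
                    return
                table[i], entry = entry, table[i]   # displace the occupant and retry
        raise RuntimeError("cuckoo path too long; a real design would stall here")
```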


PLoS ONE ◽  
2020 ◽  
Vol 15 (12) ◽  
pp. e0243475
Author(s):  
David Mödinger ◽  
Jan-Hendrik Lorenz ◽  
Rens W. van der Heijden ◽  
Franz J. Hauck

The cryptocurrency system Bitcoin uses a peer-to-peer network to distribute new transactions to all participants. For risk estimation and usability aspects of Bitcoin applications, it is necessary to know the time required to disseminate a transaction within the network. Unfortunately, this time is neither immediately observable nor easy to acquire. Measuring the dissemination latency requires many connections into the Bitcoin network, wasting network resources. Some third parties operate that way and publish large-scale measurements, but relying on them introduces a dependency and requires additional trust. This work describes how to unobtrusively acquire reliable estimates of transaction dissemination latencies without involving a third party. The dissemination latency is modelled with a lognormal distribution, whose parameters we estimate using a Bayesian model that can be updated dynamically. Our approach provides reliable estimates even when using only eight connections, the minimum number of connections used by the default Bitcoin client. We provide an implementation of our approach as well as datasets for modelling and evaluation. Our approach, while slightly underestimating the latency distribution, is largely congruent with observed dissemination latencies.
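One standard way to realize such a dynamically updatable Bayesian model, sketched below, exploits the fact that log-latencies of a lognormal are normally distributed, so a conjugate Normal-Gamma prior on the mean and precision admits closed-form batch updates. The prior values and latencies are illustrative assumptions, not necessarily the paper's choices.

```python
# Conjugate Normal-Gamma update for lognormal latency parameters; a sketch,
# assuming hyperparameters and observations invented for illustration.
import math

class LognormalPosterior:
    def __init__(self, mu0=0.0, kappa0=1.0, alpha0=1.0, beta0=1.0):
        self.mu, self.kappa, self.alpha, self.beta = mu0, kappa0, alpha0, beta0

    def update(self, latencies):
        """Fold in a batch of observed dissemination latencies (seconds)."""
        xs = [math.log(t) for t in latencies]   # lognormal data -> normal data
        n = len(xs)
        xbar = sum(xs) / n
        ss = sum((x - xbar) ** 2 for x in xs)
        kappa_n = self.kappa + n
        # Update beta first: it uses the *old* mu and kappa.
        self.beta += 0.5 * ss + self.kappa * n * (xbar - self.mu) ** 2 / (2 * kappa_n)
        self.mu = (self.kappa * self.mu + n * xbar) / kappa_n
        self.kappa = kappa_n
        self.alpha += n / 2

    def point_estimate(self):
        """Posterior-mean (mu, sigma) of the lognormal latency model."""
        sigma2 = self.beta / (self.alpha - 1)   # inverse-gamma mean, needs alpha > 1
        return self.mu, math.sqrt(sigma2)

post = LognormalPosterior()
post.update([0.8, 1.3, 2.1, 0.9, 1.7])          # e.g. latencies seen on a few peers
print(post.point_estimate())
```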

