Special parallel processor for LU decomposition of a large-scale sparse matrix

Telomere repeat factor 1 (TRF1) is a subunit of shelterin (also known as the telosome) and plays a critical role in inhibiting telomere elongation by telomerase. Tankyrase 1 (TNKS1) is a poly(ADP-ribose) polymerase that regulates the activity of TRF1 through poly(ADP-ribosyl)ation (PARylation). PARylation of TRF1 by TNKS1 leads to the release of TRF1 from telomeres and allows telomerase to access telomeres. The interaction between TRF1 and TNKS1 is thus important for telomere stability and the mitotic cell cycle. Here, the crystal structure of a complex between the N-terminal acidic domain of TRF1 (residues 1–55) and a fragment of TNKS1 covering the second and third ankyrin-repeat clusters (ARC2-3) is presented at 2.2 Å resolution. The TNKS1–TRF1 complex crystals were optimized using an 'oriented rescreening' strategy, in which the initial crystallization condition was used as a guide for a second round of large-scale sparse-matrix screening. This crystallographic and biochemical analysis provides a better understanding of the TRF1–TNKS1 interaction and the three-dimensional structure of the ankyrin-repeat domain of TNKS.

Request, Coalesce, Serve, and Forget: Miss-Optimized Memory Systems for Bandwidth-Bound Cache-Unfriendly Applications on FPGAs

ACM Transactions on Reconfigurable Technology and Systems · 10.1145/3466823 · 2022 · Vol 15 (2) · pp. 1-33

Author(s): Mikhail Asiatici, Paolo Ienne

Keyword(s): Large Scale, Sparse Matrix, Memory Systems, Graph Analytics, Matrix Vector Multiplication, Area Reduction, Cache Line, Speed Up, Memory Accesses, On Chip

Applications such as large-scale sparse linear algebra and graph analytics are challenging to accelerate on FPGAs because their short, irregular memory accesses result in low cache hit rates. Nonblocking caches reduce the bandwidth required by misses by requesting each cache line only once, even when multiple misses correspond to it. However, such a reuse mechanism is traditionally implemented using an associative lookup, which limits the number of misses considered for reuse to a few tens at most. In this article, we present an efficient pipeline that can process and store thousands of outstanding misses in cuckoo hash tables in on-chip SRAM with minimal stalls. Because outstanding misses do not need a data array, this brings the same bandwidth advantage as a larger cache for a fraction of the area budget, which can significantly speed up irregular, memory-bound, latency-insensitive applications. In addition, we extend nonblocking caches to generate variable-length bursts to memory, which increases the bandwidth delivered by DRAMs and their controllers. The resulting miss-optimized memory system provides up to 25% speedup with a 24× area reduction on 15 large sparse matrix-vector multiplication benchmarks evaluated on an embedded and a datacenter FPGA system.
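
The Request, Coalesce, Serve, and Forget sequence can be sketched in software. The Python model below is a minimal sketch, not the authors' FPGA pipeline: it keeps outstanding misses in a two-way cuckoo hash table keyed by cache-line address, sends only the first miss to each line to memory, and on the memory response serves every waiting request and deletes the entry. The 64-byte line size, the table size, the two hash functions, the kick limit and the overflow `stash` (where a hardware design might instead stall) are all illustrative assumptions; variable-length bursts are not modeled.

```python
from dataclasses import dataclass, field

LINE_BYTES = 64  # assumed cache-line size in bytes


@dataclass
class MissEntry:
    line: int                                    # cache-line address
    waiting: list = field(default_factory=list)  # (req_id, byte offset) pairs


class MissCoalescer:
    """Outstanding misses in a two-way cuckoo hash table keyed by line address."""

    def __init__(self, slots=16, max_kicks=8):
        self.slots = slots
        self.max_kicks = max_kicks
        self.tables = [[None] * slots, [None] * slots]
        self.stash = []         # assumed overflow for entries cuckooing could not place
        self.mem_requests = []  # line addresses actually sent to memory

    def _index(self, way, line):
        # Two hypothetical hash functions, one per table.
        if way == 0:
            return line % self.slots
        return ((line * 2654435761) >> 8) % self.slots

    def _find(self, line):
        for way in (0, 1):
            entry = self.tables[way][self._index(way, line)]
            if entry is not None and entry.line == line:
                return entry
        return next((e for e in self.stash if e.line == line), None)

    def _insert(self, entry):
        way = 0
        for _ in range(self.max_kicks):
            idx = self._index(way, entry.line)
            # Place the entry and pick up whatever occupied the slot.
            entry, self.tables[way][idx] = self.tables[way][idx], entry
            if entry is None:
                return
            way = 1 - way  # the evicted entry moves to its slot in the other table
        self.stash.append(entry)

    def _remove(self, line):
        for way in (0, 1):
            idx = self._index(way, line)
            entry = self.tables[way][idx]
            if entry is not None and entry.line == line:
                self.tables[way][idx] = None
                return
        self.stash = [e for e in self.stash if e.line != line]

    def miss(self, req_id, addr):
        """Record a miss; only the first miss to a given line goes to memory."""
        line, offset = divmod(addr, LINE_BYTES)
        entry = self._find(line)
        if entry is None:  # Request: first miss to this line
            entry = MissEntry(line)
            self._insert(entry)
            self.mem_requests.append(line)
        entry.waiting.append((req_id, offset))  # Coalesce

    def fill(self, line, data):
        """Memory response for `line`: serve all waiting requests, then forget."""
        entry = self._find(line)
        served = [(req_id, data[off]) for req_id, off in entry.waiting]  # Serve
        self._remove(line)  # Forget
        return served


mc = MissCoalescer()
for req_id, addr in enumerate([0, 8, 130, 16, 132]):
    mc.miss(req_id, addr)
print(mc.mem_requests)               # [0, 2]
print(mc.fill(0, bytes(range(64))))  # [(0, 0), (1, 8), (3, 16)]
```

Five misses to byte addresses 0, 8, 130, 16 and 132 produce only two memory requests (lines 0 and 2), and the response for line 0 serves requests 0, 1 and 3 at once. The table holds only line addresses and per-line waiting lists, with no data array, which is the property the abstract credits for the area savings.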

HotSpot Thermal Floorplan Solver Using Conjugate Gradient to Speed Up

Mobile Information Systems · 10.1155/2018/2921451 · 2018 · Vol 2018 · pp. 1-8 · Cited By ~ 2

Author(s): Zhonghua Jiang, Ning Xu

Keyword(s): Simulated Annealing, Conjugate Gradient, Decomposition Method, Sparse Matrix, LU Decomposition, Ratio Curve, Resistance Model, Speed Up, Iterative Framework, Conjugate Gradient Solver

We propose using the conjugate gradient method to effectively solve the thermal resistance model in the HotSpot thermal floorplan tool. The iterative conjugate gradient solver is well suited to traditional sparse-matrix linear systems. We also define a relative sparse matrix for the iterative thermal floorplanning performed within the simulated annealing framework; this relative-sparse-matrix iterative method could also be applied to other iterative framework algorithms. Experimental results show that our incremental iterative conjugate gradient solver runs approximately 11× faster than the LU decomposition method on the ami49 case, and the experimental ratio curve shows that the speedup of our solver grows as the number of modules increases.
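
To make the comparison with LU decomposition concrete, here is a minimal sketch of a plain (unpreconditioned) conjugate gradient solver in Python. It assumes the thermal conductance matrix is symmetric positive-definite, which CG requires, and stores it as a dict of rows; the function names, the storage format and the `x0` warm-start parameter are illustrative choices, and the paper's relative sparse matrix is not reproduced. Passing the previous solution as `x0` is one plausible reading of an "incremental" solver across simulated annealing moves, not necessarily the authors' scheme.

```python
import math


def matvec(A, x):
    """Multiply a sparse matrix stored as {row: {col: value}} by a dense vector."""
    return [sum(v * x[j] for j, v in A[i].items()) for i in range(len(x))]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def conjugate_gradient(A, b, x0=None, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive-definite A; return (x, iterations).

    x0 lets the caller warm-start from a previous solution (assumed usage).
    """
    x = list(x0) if x0 is not None else [0.0] * len(b)
    r = [bi - ai for bi, ai in zip(b, matvec(A, x))]  # initial residual
    p = list(r)                                       # first search direction
    rs = dot(r, r)
    iters = 0
    while math.sqrt(rs) > tol and iters < max_iter:
        Ap = matvec(A, p)
        alpha = rs / dot(p, Ap)                       # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs                            # keeps directions A-conjugate
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs = rs_new
        iters += 1
    return x, iters


# Toy 5-node resistive chain (not a HotSpot model): tridiagonal SPD matrix.
n = 5
A = {i: {i: 2.0} for i in range(n)}
for i in range(n - 1):
    A[i][i + 1] = -1.0
    A[i + 1][i] = -1.0
b = [1.0, 0.0, 0.0, 0.0, 1.0]
x, iters = conjugate_gradient(A, b)
print([round(v, 6) for v in x])  # [1.0, 1.0, 1.0, 1.0, 1.0]
```

Unlike LU factorization, CG touches the matrix only through matrix-vector products, so it introduces no fill-in, and a starting guess close to the answer directly reduces the number of iterations needed.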

The Local Singular Boundary Method for Solution of Two-Dimensional Advection–Diffusion Equation

International Journal of Computational Methods · 10.1142/s0219876221500419 · 2021 · pp. 2150041

Author(s): Karel Kovářík, Juraj Mužík

Keyword(s): Diffusion Equation, Large Scale, Sparse Matrix, Singular Boundary, Tracer Transport, Advection Diffusion Equation, Advection Diffusion, Boundary Method, Local Variant, Singular Boundary Method

This work focuses on deriving the local variant of the singular boundary method (SBM) for solving the advection-diffusion equation of tracer transport. Localization is based on combining SBM with finite collocation. Unlike the global variant, local SBM yields a sparse system matrix, which makes large-scale problems much more efficient to solve. It also makes it possible to solve problems with a variable velocity vector, which is problematic with global SBM. This paper compares results on several examples of the steady and unsteady variants of the advection-diffusion equation and examines how the accuracy of the solution depends on the density of the nodal grid and the size of the subdomain.
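
The sparsity argument can be illustrated without implementing the SBM itself. The sketch below only counts which unknowns each collocation equation couples to: in a global scheme every node couples to every other node, while in a local scheme each node couples only to the nodes in its subdomain. The uniform 11 × 11 grid, the circular subdomain and its radius are illustrative assumptions, not the paper's configuration, and no SBM fundamental solutions are evaluated.

```python
import math


def coupling_pattern(nodes, radius=None):
    """For each node, list the nodes its collocation equation couples to.

    radius=None models a global scheme (every node couples to every node);
    a finite radius models a local scheme in which each node only uses the
    nodes inside its own circular subdomain (assumed shape).
    """
    return [
        [j for j, (xj, yj) in enumerate(nodes)
         if radius is None or math.hypot(xi - xj, yi - yj) <= radius]
        for xi, yi in nodes
    ]


def nonzeros(pattern):
    """Number of nonzero entries in the resulting system matrix."""
    return sum(len(row) for row in pattern)


h = 0.1                                                         # grid spacing
nodes = [(i * h, j * h) for i in range(11) for j in range(11)]  # 121 nodes

global_nnz = nonzeros(coupling_pattern(nodes))                 # 121 * 121
local_nnz = nonzeros(coupling_pattern(nodes, radius=1.5 * h))  # up to 9 per row
print(global_nnz, local_nnz)  # 14641 961
```

With at most nine nonzeros per row, the local matrix's storage grows linearly with the number of nodes, whereas the global matrix is dense with N² entries; that difference is what lets sparse solvers handle large-scale tasks.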

The ensparsed LU decomposition method for large scale circuit transient analysis

Proceedings of 1998 Asia and South Pacific Design Automation Conference · 10.1109/aspdac.1998.669537 · 2002

Author(s): R. Suda, Y. Oyanagi

Keyword(s): Decomposition Method, Transient Analysis, Large Scale, LU Decomposition

The Massively Parallel Processor (MPP): A Large Scale SIMD Processor

10.1117/12.936455 · 1983 · Cited By ~ 2

Author(s): Paul A. Gilmore

Keyword(s): Large Scale, Massively Parallel, Parallel Processor, Massively Parallel Processor

Lessons from ten years of crystallization experiments at the SGC

Acta Crystallographica Section D Structural Biology · 10.1107/s2059798315024687 · 2016 · Vol 72 (2) · pp. 224-235 · Cited By ~ 20

Author(s): Jia Tsing Ng, Carien Dekker, Paul Reardon, Frank von Delft

Keyword(s): Large Scale, Sparse Matrix, Protein Crystallization, Data Sets, Mixing Ratios, Protein Sample, Screening Experiments, Crystallization Experiments, Practical Guidelines, Incubation Temperatures

Although protein crystallization is generally considered more art than science and remains largely a matter of trial and error, large-scale data sets hold the promise of yielding general lessons. Observations are presented here from retrospective analyses of the strategies actively deployed for the extensive crystallization experiments at the Oxford site of the Structural Genomics Consortium (SGC), where comprehensive annotations by SGC scientists were recorded on a customized database infrastructure. The results point to the importance of using redundancy in crystallizing conditions, specifically by varying the mixing ratios of protein sample and precipitant, as well as incubation temperatures. No meaningful difference in performance could be identified between the four most widely used sparse-matrix screens, judged by the yield of crystals leading to deposited structures; this suggests that, in general, any comparison of screens will be meaningless without extensive cross-testing. Where protein sample is limiting, exploring more conditions has a higher likelihood of being informative by yielding hits than does redundancy of either mixing ratio or temperature. Finally, on the logistical question of how long experiments should be stored, 98% of all crystals that led to deposited structures appeared within 30 days. Overall, these analyses serve as practical guidelines for the design of initial screening experiments for new crystallization targets.
