Optimization of the Multishift QR Algorithm with Coprocessors for Non-Hermitian Eigenvalue Problems

2011, Vol 1 (2), pp. 187-196
Author(s): Takafumi Miyata, Yusaku Yamamoto, Takashi Uneyama, Yoshimasa Nakamura, Shao-Liang Zhang

Abstract: The multishift QR algorithm is efficient for computing all the eigenvalues of a dense, large-scale, non-Hermitian matrix. The major part of this algorithm can be performed by matrix-matrix multiplications and is therefore suitable for modern processors with hierarchical memory. A variant of this algorithm was recently proposed which executes an even larger fraction of the computation as matrix-matrix multiplications. This variant is especially appropriate for recent coprocessors that contain many processing elements, such as the CSX600. However, the performance of the algorithm depends strongly on the setting of parameters such as the numbers of shifts and divisions. The optimal setting varies with the matrix size and the computational environment. In this paper, we construct a performance model that predicts the parameter setting which minimizes the execution time of the algorithm. Experimental results with the CSX600 coprocessor show that our model can be used to find the optimal setting.
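As a rough illustration of the parameter-tuning idea (not the authors' actual model), the sketch below assumes a purely hypothetical cost model predict_time and enumerates candidate settings of the numbers of shifts and divisions, returning the pair with the smallest predicted execution time; the model form and coefficients are placeholders.

import itertools

def predict_time(n, num_shifts, num_divisions,
                 c_gemm=1.0e-10, c_small=5.0e-9, c_sync=1.0e-4):
    # Hypothetical cost model (placeholder form and coefficients, not the
    # paper's model): a matrix-matrix multiplication term, a small-bulge
    # chasing term, and a per-division synchronization overhead.
    gemm_part = c_gemm * n**2 * num_shifts / num_divisions
    bulge_part = c_small * n * num_shifts**2
    overhead = c_sync * num_divisions * n
    return gemm_part + bulge_part + overhead

def best_setting(n, shift_candidates, division_candidates):
    # Return the (num_shifts, num_divisions) pair minimizing the modeled time.
    return min(itertools.product(shift_candidates, division_candidates),
               key=lambda p: predict_time(n, *p))

# Example: choose a setting for a 4000 x 4000 Hessenberg matrix.
print(best_setting(4000, shift_candidates=[32, 64, 128, 256],
                   division_candidates=[1, 2, 4, 8]))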

2008, Vol 8 (4), pp. 336-349
Author(s): L. GRASEDYCK, W. HACKBUSCH, R. KRIEMANN

Abstract: In this paper we review the technique of hierarchical matrices and put it into the context of black-box solvers for large linear systems. Numerical examples for several classes of medium- to large-scale problems illustrate the applicability and efficiency of this technique. We compare the results with those of several direct solvers (which typically scale quadratically in the matrix size) as well as an iterative solver (algebraic multigrid), which scales linearly (if it converges in O(1) steps).


Author(s): Lin Lin, Xiaojie Wu

The Hartree-Fock-Bogoliubov (HFB) theory is the starting point for treating superconducting systems. However, the computational cost for solving large-scale HFB equations can be much larger than that of the Hartree-Fock equations, particularly when the Hamiltonian matrix is sparse and the number of electrons $N$ is relatively small compared to the matrix size $N_{b}$. We first provide a concise and relatively self-contained review of the HFB theory for general finite-sized quantum systems, with special focus on the treatment of spin symmetries from a linear algebra perspective. We then demonstrate that the pole expansion and selected inversion (PEXSI) method can be particularly well suited for solving large-scale HFB equations. For a Hubbard-type Hamiltonian, the cost of PEXSI is at most $\mathcal{O}(N_b^2)$ for both gapped and gapless systems, which can be significantly faster than the standard cubic-scaling diagonalization methods. We show that PEXSI can solve a two-dimensional Hubbard-Hofstadter model with $N_b$ up to $2.88\times 10^6$, with a wall-clock time of less than $100$ s using $17280$ CPU cores. This enables the simulation of physical systems under experimentally realizable magnetic fields, which cannot otherwise be studied with smaller systems.
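For contrast with the PEXSI approach advocated in the paper, the sketch below shows the standard cubic-scaling baseline: form the dense Bogoliubov-de Gennes (BdG) matrix and diagonalize it, then build the normal and anomalous densities from the negative-energy quasiparticle eigenvectors. Sign and block conventions for the BdG matrix vary in the literature; this is one common convention, and the toy chain parameters at the end are illustrative only.

import numpy as np

def hfb_densities_dense(h, delta, mu=0.0):
    # Cubic-scaling reference solver for an HFB/BdG problem by dense
    # diagonalization. h is the (Nb x Nb) Hermitian single-particle
    # Hamiltonian, delta the antisymmetric pairing matrix (one common
    # convention; signs and block ordering differ across references).
    nb = h.shape[0]
    h_mu = h - mu * np.eye(nb)
    bdg = np.block([[h_mu, delta],
                    [delta.conj().T, -h_mu.conj()]])
    evals, evecs = np.linalg.eigh(bdg)        # O(Nb^3) dense diagonalization
    occ = evecs[:, evals < 0]                 # negative-energy quasiparticles
    top, bot = occ[:nb, :], occ[nb:, :]
    rho = top @ top.conj().T                  # normal density matrix
    kappa = top @ bot.conj().T                # pairing tensor (anomalous density)
    return rho, kappa

# Toy example: a 1D chain with nearest-neighbour hopping and p-wave-like pairing.
nb, t, d, mu = 200, 1.0, 0.5, 2.5
h = -t * (np.eye(nb, k=1) + np.eye(nb, k=-1))
delta = d * (np.eye(nb, k=1) - np.eye(nb, k=-1))   # antisymmetric
rho, kappa = hfb_densities_dense(h, delta, mu)
print(rho.trace().real)                             # particle number of the toy chain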


Author(s): Alice Cortinovis, Daniel Kressner

Abstract: Randomized trace estimation is a popular and well-studied technique that approximates the trace of a large-scale matrix B by computing the average of $x^T B x$ for many samples of a random vector x. Often, B is symmetric positive definite (SPD), but a number of applications give rise to indefinite B. Most notably, this is the case for log-determinant estimation, a task that features prominently in statistical learning, for instance in maximum likelihood estimation for Gaussian process regression. The analysis of randomized trace estimates, including tail bounds, has mostly focused on the SPD case. In this work, we derive new tail bounds for randomized trace estimates applied to indefinite B with Rademacher or Gaussian random vectors. These bounds significantly improve existing results for indefinite B, reducing the number of required samples by a factor of n or even more, where n is the size of B. Even for an SPD matrix, our work improves an existing result by Roosta-Khorasani and Ascher (Found Comput Math, 15(5):1187–1212, 2015) for Rademacher vectors. This work also analyzes the combination of randomized trace estimates with the Lanczos method for approximating the trace of f(B). Particular attention is paid to the matrix logarithm, which is needed for log-determinant estimation. We improve and extend an existing result to cover not only Rademacher but also Gaussian random vectors.
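As a concrete illustration of the estimator the abstract analyzes, here is a minimal stochastic (Hutchinson-type) trace estimator with Rademacher or Gaussian probe vectors; the sample count and the test matrix are illustrative, and the Lanczos-based approximation of the trace of f(B) discussed in the paper is not reproduced here.

import numpy as np

def randomized_trace(matvec, n, num_samples=500, dist="rademacher", seed=None):
    # Estimate trace(B) as the average of x^T B x over random probe vectors x.
    # matvec returns B @ x for a given vector x, so B never has to be formed.
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(num_samples):
        if dist == "rademacher":
            x = rng.integers(0, 2, size=n) * 2.0 - 1.0   # +/-1 entries
        else:
            x = rng.standard_normal(n)                   # Gaussian entries
        total += x @ matvec(x)
    return total / num_samples

# Demo on a symmetric, generally indefinite matrix with a known exact trace.
rng = np.random.default_rng(0)
A = rng.standard_normal((500, 500))
B = (A + A.T) / 2.0
print("estimate:", randomized_trace(lambda x: B @ x, 500),
      "exact:", np.trace(B))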


Materials, 2021, Vol 14 (9), pp. 2431
Author(s): Wen Zhang, Juanjuan Wang, Xue Han, Lele Li, Enping Liu, ...

In this paper, effective separation of oil from both immiscible oil–water mixtures and oil-in-water (O/W) emulsions is achieved by using poly(dimethylsiloxane)-based (PDMS-based) composite sponges. A modified hard-template method, using citric acid monohydrate as the hard template and dissolving it in ethanol, is proposed to prepare a PDMS sponge composited with carbon nanotubes (CNTs) both in the matrix and on the surface. The introduction of CNTs endows the composite sponge with better overall properties, including hydrophobicity, absorption capacity, and mechanical strength, than pure PDMS. We demonstrate the successful application of the CNT-PDMS composite in the efficient removal of oil from immiscible oil–water mixtures, not only in bath absorption but also in continuous separation under both static and turbulent flow conditions. This notable characteristic makes the CNT-PDMS sponge a potential candidate for large-scale industrial oil–water separation. Furthermore, a polydopamine (PDA) modified CNT-PDMS is developed here, which for the first time realizes the separation of O/W emulsions without continuous squeezing of the sponge. The combined superhydrophilic and superoleophilic properties of PDA/CNT-PDMS are assumed to be critical in the spontaneous demulsification process.


Author(s): Fayu Wang, Nicholas Kyriakides, Christis Chrysostomou, Eleftherios Eleftheriou, Renos Votsis, ...

Abstract: Fabric-reinforced cementitious matrix (FRCM) composites, also known as textile-reinforced mortars (TRM), which combine fibre fabrics with an inorganic cement-based mortar matrix, are becoming a widely used composite material in Europe for upgrading the seismic resistance of existing reinforced concrete (RC) frame buildings. One way of providing seismic resistance upgrading is to apply the proposed FRCM system to existing masonry infill walls to increase their stiffness and integrity. To examine the effectiveness of this application, the bond characteristics achieved between (a) the matrix and the masonry substrate and (b) the fabric and the matrix need to be determined. A series of experiments, including 23 material performance tests, 15 direct tensile tests of dry fabric and composites, and 30 shear bond tests between the matrix and brick masonry, was carried out to investigate the fabric-to-matrix and matrix-to-substrate bond behaviour. In addition, different arrangements of extruded polystyrene (XPS) plates were applied to the FRCM to test the shear bond capacity of this insulation system when used on a large-scale wall.


Author(s): Martin Schreiber, Pedro S Peixoto, Terry Haut, Beth Wingate

This paper presents, discusses and analyses a massively parallel-in-time solver for linear oscillatory partial differential equations, which is a key numerical component for evolving weather, ocean, climate and seismic models. The time parallelization in this solver allows us to significantly exceed the computing resources used by parallelization-in-space methods and results in a correspondingly significant reduction in wall-clock time. One of the major difficulties in achieving Exascale performance for weather prediction is that the strong scaling limit – the parallel performance for a fixed problem size with an increasing number of processors – saturates. A main avenue to circumvent this problem is to introduce new numerical techniques that take advantage of time parallelism. In this paper, we use a time-parallel approximation that retains the frequency information of oscillatory problems. This approximation is based on (a) reformulating the original problem into a large set of independent terms and (b) solving each of these terms independently of each other, which can now be accomplished on a large number of high-performance computing resources. Our experiments were conducted on up to 3586 cores for problem sizes whose parallelization-in-space scalability is already saturated on a single node. With the parallelization-in-time approach we obtain significant reductions in time-to-solution of 118.3× for spectral methods and 1503.0× for finite-difference methods. A calibrated performance model gives the scalability limitations of this new approach a priori and allows us to extrapolate its performance towards large-scale systems. This work has the potential to contribute as a basic building block of parallelization-in-time approaches, with possible major implications in applied areas modelling oscillatory-dominated problems.
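The "large set of independent terms" can be pictured as a rational-approximation (REXI-type) time step: the action of the matrix exponential on the state vector is replaced by a sum of mutually independent shifted linear solves, each of which can be assigned to its own group of processors. The sketch below is schematic only; the coefficients alphas and betas are placeholders for a genuine rational approximation of the exponential and are not the ones used in the paper.

import numpy as np

def rexi_like_step(L, u0, dt, alphas, betas):
    # One schematic time step of the form
    #   u(t + dt) ~ exp(dt * L) u(t) ~ sum_i beta_i * (dt * L + alpha_i * I)^{-1} u(t).
    # Every term in the sum is independent of the others, so in a real solver
    # the solves run in parallel; here they are executed sequentially for clarity.
    n = L.shape[0]
    acc = np.zeros(n, dtype=complex)
    for alpha, beta in zip(alphas, betas):
        acc += beta * np.linalg.solve(dt * L + alpha * np.eye(n), u0)
    return acc

# Usage sketch (placeholder coefficients, NOT a valid approximation of exp):
#   u1 = rexi_like_step(L, u0, dt, alphas=[1.0 + 2.0j, 1.0 - 2.0j],
#                       betas=[0.5 - 0.1j, 0.5 + 0.1j])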


2022, Vol 48 (1), pp. 1-36
Author(s): Mirko Myllykoski

The QR algorithm is one of the three phases in the process of computing the eigenvalues and the eigenvectors of a dense nonsymmetric matrix. This paper describes a task-based QR algorithm for reducing an upper Hessenberg matrix to real Schur form. The task-based algorithm also supports generalized eigenvalue problems (QZ algorithm) but this paper concentrates on the standard case. The task-based algorithm adopts previous algorithmic improvements, such as tightly-coupled multi-shifts and Aggressive Early Deflation (AED), and also incorporates several new ideas that significantly improve the performance. This includes, but is not limited to, the elimination of several synchronization points, the dynamic merging of previously separate computational steps, the shortening and the prioritization of the critical path, and experimental GPU support. The task-based implementation is demonstrated to be multiple times faster than multi-threaded LAPACK and ScaLAPACK in both single-node and multi-node configurations on two different machines based on Intel and AMD CPUs. The implementation is built on top of the StarPU runtime system and is part of the open-source StarNEig library.
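For orientation, the conventional LAPACK-backed pipeline that this QR phase fits into can be reproduced with standard routines; the sketch below uses SciPy as a stand-in baseline (it does not call the StarNEig API): reduce the dense matrix to Hessenberg form, then let the multishift QR algorithm with AED take it to real Schur form.

import numpy as np
from scipy.linalg import hessenberg, schur

rng = np.random.default_rng(0)
A = rng.standard_normal((500, 500))          # dense nonsymmetric matrix

# Phase 1: orthogonal reduction to upper Hessenberg form, A = Q H Q^T.
H, Q = hessenberg(A, calc_q=True)

# Phase 2: the QR algorithm (multishift + AED inside LAPACK) takes H to
# real Schur form, H = Z T Z^T; the eigenvalues sit in T's 1x1/2x2 blocks.
T, Z = schur(H, output="real")

# Check the overall decomposition A = (Q Z) T (Q Z)^T.
print(np.allclose(Q @ Z @ T @ Z.T @ Q.T, A))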


Author(s): Nikta Shayanfar, Heike Fassbender

The polynomial eigenvalue problem is to find an eigenpair $(\lambda,x) \in (\mathbb{C}\cup \{\infty\}) \times (\mathbb{C}^n \setminus \{0\})$ that satisfies $P(\lambda)x=0$, where $P(\lambda)=\sum_{i=0}^s P_i \lambda ^i$ is an $n\times n$ so-called matrix polynomial of degree $s$, the coefficients $P_i$, $i=0,\dots,s$, are $n\times n$ constant matrices, and $P_s$ is assumed to be nonzero. Such eigenvalue problems arise from a variety of physical applications, including acoustic structural coupled systems, fluid mechanics, multiple-input multiple-output systems in control theory, signal processing, and constrained least squares problems. Most numerical approaches to solving such eigenvalue problems proceed by linearizing the matrix polynomial into a matrix pencil of larger size. Such methods convert the eigenvalue problem into a well-studied linear eigenvalue problem while exploiting and preserving the structure and properties of the original problem. Linearizations have been extensively studied with respect to the basis in which the matrix polynomial is expressed. If the matrix polynomial is expressed in a special basis, then it is desirable that its linearization be expressed in the same basis, because changing the given basis ought to be avoided \cite{H1}. The authors in \cite{ACL} have constructed linearizations for different bases, such as degree-graded bases (including the monomial, Newton and Pochhammer bases) and the Bernstein and Lagrange bases. This contribution is concerned with polynomial eigenvalue problems in which the matrix polynomial is expressed in the Hermite basis. The Hermite basis is used for representing matrix polynomials designed to match a series of points and function derivatives at prescribed nodes. In the literature, the linearizations of matrix polynomials of degree $s$ expressed in the Hermite basis consist of matrix pencils with $s+2$ blocks of size $n \times n$; in other words, additional eigenvalues at infinity had to be introduced, see e.g. \cite{CSAG}. In this research, we try to overcome this difficulty by reducing the size of the linearization. The reduction scheme presented gradually reduces the linearization to its minimal size, making use of ideas from \cite{VMM1}. More precisely, for $n \times n$ matrix polynomials of degree $s$, we present linearizations of smaller size, consisting of $s+1$ and $s$ blocks of $n \times n$ matrices. The structure of the eigenvectors is also discussed.
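To make the notion of linearization concrete, the sketch below builds the standard first companion pencil for a matrix polynomial expressed in the monomial basis and solves the resulting generalized eigenvalue problem. This is only the textbook monomial-basis construction for contrast; the Hermite-basis linearizations of reduced size studied in this contribution are different.

import numpy as np
from scipy.linalg import eigvals

def companion_pencil(P):
    # First companion linearization of P(lam) = sum_i P[i] * lam**i (monomial
    # basis). P is a list [P_0, ..., P_s] of n x n coefficient matrices with
    # P_s nonzero. Returns (A, B) of size s*n such that the finite solutions
    # of det(A - lam * B) = 0 are the finite eigenvalues of P.
    s = len(P) - 1
    n = P[0].shape[0]
    A = np.zeros((s * n, s * n))
    B = np.eye(s * n)
    A[:n, :] = -np.hstack([P[i] for i in range(s - 1, -1, -1)])  # [-P_{s-1} ... -P_0]
    A[n:, :-n] = np.eye((s - 1) * n)                             # subdiagonal identities
    B[:n, :n] = P[s]                                             # leading coefficient block
    return A, B

# Example: a quadratic matrix polynomial P(lam) = P_0 + lam * P_1 + lam^2 * P_2.
rng = np.random.default_rng(0)
P = [rng.standard_normal((4, 4)) for _ in range(3)]
A, B = companion_pencil(P)
lams = eigvals(A, B)
lam = lams[0]
P_lam = sum(lam**i * Pi for i, Pi in enumerate(P))
print(np.linalg.svd(P_lam, compute_uv=False)[-1])   # ~0: lam is an eigenvalue of P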


2021, Vol 30 (3), pp. 59-75
Author(s): M. A. Golovchin

In 2016–2018, the Russian state adopted a package of policy documents that implies a transition of education towards the large-scale introduction of digital technologies. This phenomenon has been called the "digitalization of education". In the scientific literature, electronization and digitalization are increasingly described as one of the institutional traps for the development of Russian universities, since the corresponding institutional environment has not yet been formed due to the forced nature of these innovations. As a result, the processes of introducing new technologies into education are still not regulated. In line with the purpose of the study, the manifestations of the trap of electronization and digitalization in Russian higher education were analyzed on the basis of sociological data, and the process of adaptation of educational agents to the institution of digitalization was theoretically modeled. The study summarizes the approaches that have been developed in discussions on educational digitalization. The article presents the author's vision of the studied phenomenon as an institutional trap, as well as an understanding of the institutional features and characteristics of electronization and digitalization in education. The research method is the analysis of estimates obtained in an expert survey conducted by the Vologda Scientific Center of the Russian Academy of Sciences among representatives of the teaching staff of state universities in the Vologda region. This analysis clarified the indicators of educational digitalization as an effective innovation, such as increased accessibility of educational resources; simplified communication and knowledge transfer from teacher to student; increased opportunities for training specialists for the new (digital) economy; and improved quality of university education. Based on the results of the empirical study, it has been determined that the conditions for the development of digitalization in Russian universities are currently ambiguous, which is closely related to the level of competitiveness of the educational organization. The scientific novelty of the research consists in the presentation of an original matrix describing the process of university employees' adaptation to the conditions of the digital transformation of education. The matrix is proposed on the basis of a sociological analysis of the impact of the trap of electronization and digitalization on the activities of educational agents, and it can be taken into account in the practice of higher education management.

