BLAS IV: A BLAS for Rk Matrix Algebra

2021 ◽  
Vol 35 (11) ◽  
pp. 1266-1267
Author(s):  
John Shaeffer

Basic Linear Algebra Subroutines (BLAS) are well-known low-level workhorse subroutines for linear algebra vector-vector, matrix-vector and matrix-matrix operations on full-rank matrices. With the advent of block low-rank (Rk) full-wave direct solvers, where most blocks of the system matrix are Rk, an extension to the BLAS III matrix-matrix workhorse routine is needed because of the agony of Rk addition. This note outlines the problem that BLAS III poses for Rk LU and solve operations and then outlines an alternative approach, which we call BLAS IV. This approach exploits the thrill of Rk matrix-matrix multiplication and uses the Adaptive Cross Approximation (ACA) to evaluate sums of Rk terms, circumventing the agony of low-rank addition.
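The contrast between the "thrill" of Rk multiplication and the "agony" of Rk addition can be illustrated with a minimal sketch. It assumes a rank-k block is stored as two factors `(U, V)` with the block equal to U V^T. The helper names (`rk_multiply`, `rk_add`, `to_dense`) are hypothetical and are not taken from the note. The sketch shows only the exact operations and does not implement ACA.

```python
# Hypothetical helpers: a rank-k block A (m x n) is stored as factors (U, V)
# with A = U V^T, where U is m x k and V is n x k (both lists of rows).

def matmul(A, B):
    """Dense product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def to_dense(U, V):
    """Expand the factored form U V^T into a full matrix."""
    return matmul(U, transpose(V))

def rk_multiply(U1, V1, U2, V2):
    """(U1 V1^T)(U2 V2^T) = U1 (V1^T U2) V2^T, so the rank is at most k2."""
    core = matmul(transpose(V1), U2)   # small k1 x k2 matrix
    return matmul(U1, core), V2        # new factors: m x k2 and n x k2

def rk_add(U1, V1, U2, V2):
    """Exact sum by concatenating factor columns: the rank grows to k1 + k2."""
    U = [r1 + r2 for r1, r2 in zip(U1, U2)]
    V = [r1 + r2 for r1, r2 in zip(V1, V2)]
    return U, V

# Example data (assumed for illustration): a rank-1 and a rank-2 block, both 3 x 3.
U1, V1 = [[1], [2], [3]], [[1], [0], [1]]
U2, V2 = [[1, 0], [0, 1], [1, 1]], [[1, 1], [0, 1], [1, 0]]

Up, Vp = rk_multiply(U1, V1, U2, V2)   # product keeps a small rank
Us, Vs = rk_add(U1, V1, U2, V2)        # sum has rank k1 + k2 = 3
```

The product never needs more columns than one of its operands, whereas every exact addition enlarges the factors, which is why a sum of many Rk terms requires some recompression step such as the ACA mentioned in the abstract.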

Author(s):  
Б.М. Глинский ◽  
В.И. Костин ◽  
Н.В. Кучин ◽  
С.А. Соловьев ◽  
В.А. Чеверда

An algorithm for solving systems of linear algebraic equations (SLAEs) based on Gaussian elimination is proposed for the Helmholtz equation in 3D heterogeneous media. To solve the linear systems arising in geophysical applications, a parallel version of the algorithm was developed for heterogeneous high-performance computing systems containing nodes with MPP and SMP architectures. Low-rank approximation, the HSS format, and dynamic distribution of intermediate results among cluster nodes make it possible to solve problems several times larger than traditional direct methods that store the blocks of the $L$-factor in full rank (FR). The proposed algorithm also reduces computation time, which is important for 3D geophysical problems. Numerical experiments confirm these advantages of the proposed low-rank (LR) direct method over FR direct methods, and model geophysical problems demonstrate the viability of the implemented algorithm.
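The memory argument behind low-rank storage can be made concrete with simple arithmetic. The sketch below assumes a plain factored representation of one m x n block with rank k. It is not the HSS scheme of the paper, and the function names are hypothetical.

```python
def full_rank_entries(m, n):
    """Number of stored values for a dense m x n block."""
    return m * n

def low_rank_entries(m, n, k):
    """Number of stored values for factors U (m x k) and V (n x k)."""
    return k * (m + n)

def compression_ratio(m, n, k):
    """How many times smaller the factored block is than the dense one."""
    return full_rank_entries(m, n) / low_rank_entries(m, n, k)

# Assumed example sizes: a 1000 x 1000 block of rank 10.
ratio = compression_ratio(1000, 1000, 10)
```

The saving holds only while k stays well below m and n. Once k reaches mn / (m + n), the factored form is no smaller than the dense block.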


Behaviour ◽  
1987 ◽  
Vol 103 (4) ◽  
pp. 241-258 ◽  
Author(s):  
Anders Fernö

Territorial mosaics of A. burtoni were studied in the laboratory. A difference in rank between neighbouring territorial fish was usually found, with the male of higher rank exhibiting more offensive behaviour and the opponent resisting more passively. A role asymmetry in boundary disputes was found in both high- and low-intensity aggression. Linear rank orders were formed. High rank was associated with high aggressive and sexual activity towards non-territorial fish and high mating success. Territorial size was larger in superior males. A superior did not, however, generally expand his territory towards an inferior. This could be due to the involvement of escalated aggression with the reduction of territory. Most males of low rank did, however, eventually lose their territories. Establishing and losing territories were correlated with a low level of low-intensity aggression. Escalated fighting seldom occurred in spite of strong competition for females, and aggression was usually limited to Frontal display and low-intensity aggression. Frontal display also played a key role in the de-escalation of physical aggression. A. burtoni seems to follow the strategy "Honest", using an honestly graded display with few escalations.


2007 ◽  
Vol 29 (2) ◽  
pp. 496-529 ◽  
Author(s):  
Fernando De Terán ◽  
Froilán M. Dopico

2020 ◽  
Vol 34 (07) ◽  
pp. 10470-10477 ◽  
Author(s):  
Adrian Bulat ◽  
Jean Kossaifi ◽  
Georgios Tzimiropoulos ◽  
Maja Pantic

The prominence of deep learning, large amounts of annotated data and increasingly powerful hardware have made it possible to reach remarkable performance on supervised classification tasks, in many cases saturating the training sets. However, the resulting models are specialized to a single, very specific task and domain. Adapting the learned classification to new domains is a hard problem for at least three reasons: (1) the new domains and tasks might be drastically different; (2) there might be a very limited amount of annotated data in the new domain; and (3) fully training a new model for each new task is prohibitive in terms of computation and memory, due to the sheer number of parameters of deep CNNs. In this paper, we present a method to learn new domains and tasks incrementally, building on prior knowledge from already learned tasks and without catastrophic forgetting. We do so by jointly parametrizing weights across layers using a low-rank Tucker structure. The core is task-agnostic, while a set of task-specific factors is learnt on each new domain. We show that leveraging tensor structure enables better performance than simply using matrix operations. Joint tensor modelling also naturally leverages correlations across different layers. Compared with previous methods, which have focused on adapting each layer separately, our approach results in more compact representations for each new task/domain. We apply the proposed method to the 10 datasets of the Visual Decathlon Challenge and show that it offers on average about a 7.5× reduction in the number of parameters and competitive performance in terms of both classification accuracy and Decathlon score.
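Why a shared core with per-task factors is compact can be seen by counting parameters. The sketch below assumes a generic Tucker decomposition in which a weight tensor with mode sizes `dims` is represented by a core of size `ranks` plus one factor matrix per mode. The tensor shape and ranks are made-up values, not the configuration used in the paper.

```python
from math import prod

def tucker_param_counts(dims, ranks):
    """Return (core_params, factor_params) for a Tucker-structured tensor.

    dims  -- size of each mode of the full weight tensor
    ranks -- Tucker rank chosen for each mode (same length as dims)
    """
    core = prod(ranks)                                  # shared, task-agnostic
    factors = sum(d * r for d, r in zip(dims, ranks))   # learnt per task
    return core, factors

# Assumed example: a 64 x 64 x 3 x 3 convolution kernel with ranks (16, 16, 3, 3).
dims, ranks = (64, 64, 3, 3), (16, 16, 3, 3)
full_params = prod(dims)
core_params, per_task_params = tucker_param_counts(dims, ranks)
```

Under these assumptions each additional task stores only `per_task_params` values, because the core is reused. The actual savings reported in the abstract depend on the authors' network and rank choices.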


Biosemiotics ◽  
2020 ◽  
Author(s):  
Dan Faltýnek ◽  
Ľudmila Lacková

The concept of protosemiosis, or semiosis at the lower levels of the living, goes back to Giorgio Prodi, Thomas A. Sebeok and others. More recently, a typology of proto-signs was introduced by Sharov and Vehkavaara. Kull uses the term vegetative semiosis, defined by iconicity, when referring to the semiosis of plants and lower organisms. The criteria for the typology of proto-signs by Sharov and Vehkavaara are mostly based on two important presuppositions: agency and a lack of representation in low-level semiosis. We would like to focus on an alternative approach to protosign classification. In particular, we aim to provide a sign-typological characteristic of proteins (in analogy to Maran's classification of environmental signs). Our approach is focused on representation; that is, we only consider the relation between a sign and its object. We consider representation independently of the role of the interpretant and interpretation (which is an epiphenomenon of agency). Two hypotheses are investigated and evaluated in this paper: (I) Proteins are indexical protosigns. (II) Proteins are iconic protosigns. Our argumentation supports hypothesis (II).


2009 ◽  
Vol 19 (01) ◽  
pp. 159-174 ◽  
Author(s):  
Mostafa I. Soliman

Multi-core technology is a natural next step in delivering the benefits of Moore's law to computing platforms. On multi-core processors, the performance of many applications can be improved by processing threads of code in parallel using multi-threading techniques. This paper evaluates the performance of multi-core Intel Xeon processors on the widely used basic linear algebra subprograms (BLAS). On two dual-core Intel Xeon processors with Hyper-Threading technology, our results show that a performance of around 20 GFLOPS is achieved on Level-3 (matrix-matrix operations) BLAS using multi-threading, SIMD, matrix blocking, and loop unrolling techniques. However, for small problem sizes of Level-2 (matrix-vector operations) and Level-1 (vector operations) BLAS, multi-threading slows down execution because of thread creation overheads. Thus the Intel SIMD instruction set is the way to improve the performance of single-threaded Level-2 (6 GFLOPS) and Level-1 BLAS (3 GFLOPS). When the problem size becomes large (it cannot fit in the L2 cache), the performance of the four Xeon cores is less than 2 and 1 GFLOPS on Level-2 and Level-1 BLAS, respectively, even though eight threads are executed in parallel on eight logical processors.
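Matrix blocking, one of the Level-3 techniques named above, can be sketched as a loop nest over tiles. The version below is a pure-Python illustration of the loop structure only. It does not reproduce the paper's SIMD, multi-threaded implementation, and the function name and the `block` parameter are assumptions.

```python
def blocked_matmul(A, B, block=2):
    """Compute C = A * B by iterating over block x block tiles.

    A is n x m and B is m x p, both stored as lists of rows.
    """
    n, m, p = len(A), len(B), len(B[0])
    C = [[0] * p for _ in range(n)]
    # Outer loops walk over tiles so each tile of A and B is reused
    # while it would still be resident in cache on real hardware.
    for ii in range(0, n, block):
        for kk in range(0, m, block):
            for jj in range(0, p, block):
                # Inner loops perform the multiply-accumulate within one tile.
                for i in range(ii, min(ii + block, n)):
                    for k in range(kk, min(kk + block, m)):
                        a = A[i][k]
                        for j in range(jj, min(jj + block, p)):
                            C[i][j] += a * B[k][j]
    return C

# Assumed example matrices.
A = [[1, 2, 3], [4, 5, 6]]
B = [[7, 8], [9, 10], [11, 12]]
C = blocked_matmul(A, B)
```

In an optimized library the tile size would be tuned to the cache hierarchy and the innermost loop would be unrolled and vectorized. In Python the tiling changes only the order of operations, not the result.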


2018 ◽  
Vol 2018 ◽  
pp. 1-14 ◽  
Author(s):  
Xun Wang ◽  
Tao Luo ◽  
Jianfeng Li

Achieving both simplicity and efficiency in fully homomorphic encryption (FHE) schemes is important for practical applications. In the simple FHE scheme proposed by Ducas and Micciancio (DM), ciphertexts are refreshed after each homomorphic operation, and ciphertext refreshing has become a major bottleneck for the overall efficiency of the scheme. In this paper, we propose a more efficient FHE scheme with fewer ciphertext refreshings. Based on the DM scheme and another simple FHE scheme proposed by Gentry, Sahai, and Waters (GSW), our scheme applies both ciphertext matrix operations and ciphertext vector additions. Compared with the DM scheme, one more homomorphic NOT AND (NAND) operation can be performed on ciphertexts before ciphertext refreshing. Results show that, under the same security parameters, the computational cost of our scheme is clearly lower than that of the GSW and DM schemes for a depth-2 binary circuit with NAND gates, while the error rate of our scheme is kept at a sufficiently low level.

