Study of the vectorization efficiency of loop nests with an irregular number of iterations

Алексей Анатольевич Рыбаков; Сергей Сергеевич Шумилин

doi:10.25209/2079-3316-2019-10-4-77-96

Program systems theory and applications

2019

Vol 10 (4)

pp. 77-96

Keyword(s):

Loop Nests

Number Of Iterations

Vectorization of computations is an important low-level optimization used to produce highly efficient parallel code. Features of the AVX-512 instruction set make it possible to vectorize complex program contexts, in particular loop nests and loops with heavily branching control flow. When vector instructions are applied to a context whose execution profile is unknown, vectorization efficiency may turn out to be low. This is especially pronounced when vectorizing loop nests in which the inner loop has an irregular number of iterations. The paper considers a practical approach to vectorizing loop nests based on a predicated representation of the program. The example used is Shell sort: its compact implementation is a loop nest in which the number of iterations of the inner loop is irregular and depends on the iteration numbers of the outer loops. Such a context is extremely inconvenient for vectorization. The theoretical and practical efficiency of vectorizing Shell sort are compared, the features of this program context are examined, and their negative impact on the performance of the vectorized code is explained. The results can help researchers and software developers identify the causes of low vectorization efficiency in code with similar features.

A Note on the Decomposition Methods for Support Vector Regression

Neural Computation

10.1162/089976602753712936

2002

Vol 14 (6)

pp. 1267-1281

Cited By ~ 14

Author(s):

Shuo-Peng Liao

Hsuan-Tien Lin

Chih-Jen Lin

Keyword(s):

Support Vector Regression

Decomposition Method

Decomposition Methods

Support Vector

Dual Formulation

Working Set

Number Of Iterations

The dual formulation of support vector regression involves two closely related sets of variables. When the decomposition method is used, many existing approaches use pairs of indices from these two sets as the working set. Basically, they select a base set first and then expand it so all indices are pairs. This makes the implementation different from that for support vector classification. In addition, a larger optimization subproblem has to be solved in each iteration. We provide theoretical proofs and conduct experiments to show that using the base set as the working set leads to similar convergence (number of iterations). Therefore, by using a smaller working set while keeping a similar number of iterations, the program can be simpler and more efficient.
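For reference, the "two closely related sets of variables" are the vectors α and α* in the standard ε-SVR dual. It is written below in a common textbook form. The notation (l training points, kernel K, matrix Q, tube width ε, bound C, working sets B and B̄) is chosen for illustration and is not taken from the paper.

```latex
\min_{\alpha,\alpha^*}\;
  \tfrac12(\alpha-\alpha^*)^{\top}Q(\alpha-\alpha^*)
  + \varepsilon\sum_{i=1}^{l}(\alpha_i+\alpha_i^*)
  - \sum_{i=1}^{l} y_i(\alpha_i-\alpha_i^*)
\quad\text{s.t.}\quad
  \sum_{i=1}^{l}(\alpha_i-\alpha_i^*) = 0,\qquad
  0 \le \alpha_i,\ \alpha_i^* \le C,\quad i = 1,\dots,l,
\qquad Q_{ij} = K(x_i, x_j).

% Stack the variables as \beta = (\alpha;\ \alpha^*) \in \mathbb{R}^{2l};
% indices i and i + l form a pair. Given a base working set B \subset \{1,\dots,2l\},
% the paired working set is
\bar B = B \,\cup\, \{\, i + l : i \in B,\ i \le l \,\} \,\cup\, \{\, i - l : i \in B,\ i > l \,\}.
```

In this notation, the pairing approaches solve the subproblem over B̄. The abstract's proposal is to use the smaller set B directly.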

A practical tile size selection model for affine loop nests

Proceedings of the ACM International Conference on Supercomputing

10.1145/3447818.3462213

2021

Author(s):

Kumudha Narasimhan

Aravind Acharya

Abhinav Baid

Uday Bondhugula

Keyword(s):

Selection Model

Size Selection

Tile Size

Loop Nests

On the Locally Polynomial Complexity of the Projection-Gradient Method for Solving Piecewise Quadratic Optimisation Problems

Entropy

10.3390/e23040465

2021

Vol 23 (4)

pp. 465

Author(s):

Agnieszka Prusińska

Krzysztof Szkatuła

Alexey Tret’yakov

Keyword(s):

Computational Complexity

Polynomial Complexity

Initial Point

Quadratic Functions

Problem Dimension

Piecewise Quadratic Functions

Large Systems

Systems Of Linear Inequalities

Number Of Iterations

Finite Number Of Iterations

This paper proposes a method for solving optimisation problems involving piecewise quadratic functions. The method reaches a solution in a finite number of iterations. Its computational complexity is locally polynomial in the problem dimension, i.e., polynomial provided that the initial point lies in a sufficiently small neighbourhood of the solution set. The proposed method can be applied to solving large systems of linear inequalities.

A Comparative Study among New Hybrid Root Finding Algorithms and Traditional Methods

Mathematics

10.3390/math9111306

2021

Vol 9 (11)

pp. 1306

Author(s):

Elsayed Badr

Sultan Almotairi

Abdallah El Ghamry

Keyword(s):

Comparative Study

Numerical Results

Traditional Methods

Running Time

Root Finding

Average Running Time

False Position

Newton Raphson

Number Of Iterations

In this paper, we propose a novel blended algorithm that has the advantages of the trisection method and the false position method. Numerical results indicate that the proposed algorithm outperforms the secant, the trisection, the Newton–Raphson, the bisection and the regula falsi methods, as well as the hybrid of the last two methods proposed by Sabharwal, with regard to the number of iterations and the average running time.

A Termination Criterion for Probabilistic Point Clouds Registration

Signals

10.3390/signals2020013

2021

Vol 2 (2)

pp. 159-173

Author(s):

Simone Fontana

Domenico Giorgio Sorrenti

Keyword(s):

State Of The Art

Point Clouds

Computational Time

Termination Criteria

Practical Usefulness

Local Point

Termination Criterion

Probabilistic Point

Excessive Number

Number Of Iterations

Probabilistic Point Clouds Registration (PPCR) is an algorithm that, in its multi-iteration version, outperformed state-of-the-art algorithms for local point cloud registration. However, its performance has been tested with a fixed, high number of iterations. To be of practical use, we think the algorithm should decide by itself when to stop. On the one hand, this avoids an excessive number of iterations that wastes computational time. On the other hand, it avoids a sub-optimal registration. In this work, we compare different termination criteria on several datasets. We demonstrate that the chosen criterion produces very good results, comparable to those obtained with a very large number of iterations, while saving computational time.

Scanning integer points with lex-inequalities: a finite cutting plane algorithm for integer programming with linear objective

4OR

10.1007/s10288-020-00459-6

2020

Author(s):

Michele Conforti

Marianna De Santis

Marco Di Summa

Francesco Rinaldi

Keyword(s):

Integer Point

Cutting Plane

Integer Program

Compact Set

Cutting Plane Method

Integer Points

Gomory Cuts

The Family

Split Cuts

Number Of Iterations

We consider the integer points in a unimodular cone K ordered by a lexicographic rule defined by a lattice basis. To each integer point x in K we associate a family of inequalities (lex-inequalities) that define the convex hull of the integer points in K that are not lexicographically smaller than x. The family of lex-inequalities contains the Chvátal–Gomory cuts, but does not contain and is not contained in the family of split cuts. This provides a finite cutting plane method to solve the integer program $\min\{cx : x \in S \cap \mathbb{Z}^n\}$, where $S \subset \mathbb{R}^n$ is a compact set and $c \in \mathbb{Z}^n$. We analyze the number of iterations of our algorithm.

Converting MST to TSP Path by Branch Elimination

Applied Sciences

10.3390/app11010177

2020

Vol 11 (1)

pp. 177

Author(s):

Pasi Fränti

Teemu Nenonen

Mingchuan Yuan

Keyword(s):

Spanning Tree

Closed Loop

Minimum Spanning Tree

Travelling Salesman Problem

Open Loop

Travelling Salesman

Elimination Algorithm

Number Of Iterations

Number Of Branches

The travelling salesman problem (TSP) has been widely studied in its classical closed-loop variant, but less attention has been paid to the open-loop variant. An open-loop solution is also a spanning tree, although not necessarily the minimum spanning tree (MST). In this paper, we present a simple branch elimination algorithm. It removes the branches from the MST by cutting one link and then reconnecting the resulting subtrees via selected leaf nodes. The number of iterations equals the number of branches (b) in the MST. Typically, b ≪ n, where n is the number of nodes. On the O-Mopsi and Dots datasets, the algorithm reaches gaps of 1.69% and 0.61%, respectively. The algorithm is especially suitable for educational purposes because it shows the connection between the MST and the TSP. It can also serve as a quick approximation for more complex metaheuristics whose efficiency relies on the quality of the initial solution.
