Thallo – Scheduling for High-Performance Large-Scale Non-Linear Least-Squares Solvers

2021 ◽  
Vol 40 (5) ◽  
pp. 1-14
Author(s):  
Michael Mara ◽  
Felix Heide ◽  
Michael Zollhöfer ◽  
Matthias Nießner ◽  
Pat Hanrahan

Large-scale optimization problems at the core of many graphics, vision, and imaging applications are often implemented by hand in tedious and error-prone processes in order to achieve high performance (in particular on GPUs), despite recent developments in libraries and DSLs. At the same time, these hand-crafted solver implementations reveal that the key to high performance is a problem-specific schedule that enables efficient usage of the underlying hardware. In this work, we incorporate this insight into Thallo, a domain-specific language for large-scale non-linear least squares optimization problems. We observe various code reorganizations performed by implementers of high-performance solvers in the literature, and then define a set of basic operations that span these scheduling choices, thereby defining a large scheduling space. Users can either specify code transformations in a scheduling language or use an autoscheduler. Thallo takes as input a compact, shader-like representation of an energy function and a (potentially auto-generated) schedule, translating the combination into high-performance GPU solvers. Since Thallo can generate solvers from a large scheduling space, it can handle a large set of large-scale non-linear and non-smooth problems with various degrees of non-locality and compute-to-memory ratios, including diverse applications such as bundle adjustment, face blendshape fitting, and spatially-varying Poisson deconvolution, as seen in Figure 1. By abstracting the schedule from the optimization, we outperform state-of-the-art GPU-based optimization DSLs by an average of 16× across all applications introduced in this work, and even some published hand-written GPU solvers by more than 30%.
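Thallo's own energy and scheduling languages are not reproduced in the abstract, but the class of problems it targets is ordinary non-linear least squares. As a minimal, hedged illustration of the inner damped Gauss-Newton iteration that such solvers build on (a dense CPU sketch in NumPy with an illustrative curve-fitting energy; it captures none of Thallo's scheduling or GPU code generation), one might write:

```python
import numpy as np

def gauss_newton(residual, jacobian, x0, iters=20, damping=1e-6):
    """Minimize 0.5 * ||r(x)||^2 with damped Gauss-Newton steps."""
    x = x0.copy()
    for _ in range(iters):
        r = residual(x)              # residual vector, shape (m,)
        J = jacobian(x)              # Jacobian, shape (m, n)
        # Normal equations: (J^T J + damping * I) dx = -J^T r
        A = J.T @ J + damping * np.eye(x.size)
        dx = np.linalg.solve(A, -J.T @ r)
        x = x + dx
    return x

# Toy energy: fit y = a * exp(b * t) to noisy samples.
t = np.linspace(0.0, 1.0, 50)
y = 2.0 * np.exp(1.5 * t) + 0.01 * np.random.randn(t.size)

def residual(p):
    a, b = p
    return a * np.exp(b * t) - y

def jacobian(p):
    a, b = p
    e = np.exp(b * t)
    return np.stack([e, a * t * e], axis=1)

p_hat = gauss_newton(residual, jacobian, np.array([1.0, 1.0]))
```

The point of the paper is precisely that a naive dense normal-equations solve like the one above does not scale; the schedule determines how quantities such as J^T J are materialized, fused, or recomputed on the GPU.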

2014 ◽  
pp. 77-81
Author(s):  
Chefi Triki ◽  
Lucio Grandinetti

In this paper we discuss the use of computational grids to solve stochastic optimization problems. These problems are generally difficult to solve and are often characterized by a large number of variables and constraints. Furthermore, some applications require a real-time solution. Obtaining reasonable results is difficult without the use of high-performance computing. Here we present a grid-enabled path-following algorithm and discuss some experimental results.
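The grid-enabled path-following algorithm itself is not detailed in the abstract. As a rough sketch of the scenario-level parallelism that makes grids attractive for stochastic programs, the following uses Python's multiprocessing as a stand-in for a computational grid; the recourse objective and scenario data are purely illustrative assumptions:

```python
from multiprocessing import Pool
import numpy as np

def scenario_cost(args):
    """Second-stage cost for one scenario; a stand-in for the real recourse subproblem."""
    x, (demand, prob) = args
    shortfall = max(demand - x.sum(), 0.0)
    return prob * shortfall ** 2

def expected_recourse(x, scenarios, pool):
    """Expected second-stage cost, with scenarios farmed out to independent workers."""
    return sum(pool.map(scenario_cost, [(x, s) for s in scenarios]))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    scenarios = [(rng.uniform(5.0, 15.0), 1.0 / 100) for _ in range(100)]
    x = np.array([3.0, 4.0])        # first-stage decision
    with Pool() as pool:
        print(expected_recourse(x, scenarios, pool))
```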


Author(s):  
Nived Chebrolu ◽  
Thomas Labe ◽  
Olga Vysotska ◽  
Jens Behley ◽  
Cyrill Stachniss

2000 ◽  
Vol 77 (5) ◽  
pp. 669 ◽  
Author(s):  
Sidney Young ◽  
Andrzej Wierzbicki

1995 ◽  
Vol 117 (1) ◽  
pp. 155-157 ◽  
Author(s):  
F. C. Anderson ◽  
J. M. Ziegler ◽  
M. G. Pandy ◽  
R. T. Whalen

We have examined the feasibility of using massively-parallel and vector-processing supercomputers to solve large-scale optimization problems for human movement. Specifically, we compared the computational expense of determining the optimal controls for the single support phase of gait using a conventional serial machine (SGI Iris 4D25), a MIMD parallel machine (Intel iPSC/860), and a parallel-vector-processing machine (Cray Y-MP 8/864). With the human body modeled as a 14 degree-of-freedom linkage actuated by 46 musculotendinous units, computation of the optimal controls for gait could take up to 3 months of CPU time on the Iris. Both the Cray and the Intel are able to reduce this time to practical levels. The optimal solution for gait can be found with about 77 hours of CPU on the Cray and with about 88 hours of CPU on the Intel. Although the overall speeds of the Cray and the Intel were found to be similar, the unique capabilities of each machine are better suited to different portions of the computational algorithm used. The Intel was best suited to computing the derivatives of the performance criterion and the constraints whereas the Cray was best suited to parameter optimization of the controls. These results suggest that the ideal computer architecture for solving very large-scale optimal control problems is a hybrid system in which a vector-processing machine is integrated into the communication network of a MIMD parallel machine.
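The division of labour described above (derivatives on the MIMD machine, parameter optimization on the vector machine) hinges on the fact that each partial derivative of the performance criterion is an independent simulation. A hedged sketch of that embarrassingly parallel finite-difference gradient, using Python worker processes in place of iPSC/860 nodes and a placeholder performance criterion, might look like:

```python
from multiprocessing import Pool
import numpy as np

def performance(u):
    """Placeholder performance criterion over the control vector u."""
    return np.sum(np.sin(u) ** 2) + 0.5 * np.dot(u, u)

def _partial(args):
    """One forward-difference partial derivative; independent of all the others."""
    u, i, h = args
    u_step = u.copy()
    u_step[i] += h
    return (performance(u_step) - performance(u)) / h

def parallel_gradient(u, h=1e-6):
    """Each partial derivative is its own evaluation, so the loop maps cleanly onto MIMD workers."""
    with Pool() as pool:
        return np.array(pool.map(_partial, [(u, i, h) for i in range(u.size)]))

if __name__ == "__main__":
    u = np.linspace(0.0, 1.0, 46)   # e.g. one control value per musculotendinous unit
    print(parallel_gradient(u)[:5])
```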


Author(s):  
Jie Guo ◽  
Zhong Wan

A new spectral three-term conjugate gradient algorithm, based on the quasi-Newton equation, is developed for solving large-scale unconstrained optimization problems. It is proved that the search directions generated by this algorithm always satisfy a sufficient descent condition independent of any line search. Global convergence is established for general objective functions when the strong Wolfe line search is used. Numerical experiments demonstrate its high performance in solving large-scale optimization problems. In particular, the developed algorithm is applied to 100 benchmark test problems from CUTE with sizes ranging from 1,000 to 10,000, in comparison with similar methods from the literature. The numerical results show that our algorithm outperforms the state-of-the-art methods, requiring less CPU time, fewer iterations, and fewer function evaluations.
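The paper's specific spectral and three-term parameter formulas are not given in the abstract. The sketch below shows only the generic three-term conjugate gradient skeleton d_{k+1} = -g_{k+1} + β_k d_k - θ_k y_k, with illustrative (assumed) β and θ choices, a SciPy Wolfe line search, and a steepest-descent safeguard; it is not the authors' method:

```python
import numpy as np
from scipy.optimize import line_search

def three_term_cg(f, grad, x0, iters=200, tol=1e-6):
    """Generic three-term CG: d = -g + beta*d_prev - theta*y, with placeholder beta/theta."""
    x, g = x0.copy(), grad(x0)
    d = -g
    for _ in range(iters):
        if np.linalg.norm(g) < tol:
            break
        alpha = line_search(f, grad, x, d, gfk=g)[0]
        if alpha is None:                                   # line search failed: restart
            d, alpha = -g, 1e-4
        x_new = x + alpha * d
        g_new = grad(x_new)
        y = g_new - g
        # Illustrative parameter choices (NOT the paper's formulas):
        beta = max(g_new @ y / (d @ y + 1e-12), 0.0)        # HS-like beta, clipped at zero
        theta = g_new @ d / (d @ y + 1e-12)                 # weight on the correction term
        d = -g_new + beta * d - theta * y
        if g_new @ d >= 0:                                  # safeguard: keep a descent direction
            d = -g_new
        x, g = x_new, g_new
    return x

# Usage on an extended Rosenbrock-style test function with 1000 variables.
f = lambda x: np.sum(100 * (x[1:] - x[:-1] ** 2) ** 2 + (1 - x[:-1]) ** 2)
def grad(x):
    g = np.zeros_like(x)
    g[:-1] = -400 * x[:-1] * (x[1:] - x[:-1] ** 2) - 2 * (1 - x[:-1])
    g[1:] += 200 * (x[1:] - x[:-1] ** 2)
    return g

print(three_term_cg(f, grad, np.full(1000, -1.0))[:3])
```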

