
Talks - Lars Karlsson

  1. Concurrent Alternating Least Squares and Jackknife Resampling for Canonical Polyadic Decompositions
    ISC High Performance, Hamburg, Germany, June 2023.
    Speaker: Lars Karlsson.
    In the domains of matrix and tensor computations, the most common approach to speeding up a workflow is to optimize the underlying building blocks, i.e., operations such as the matrix product and the LU factorization (for matrices), or tensor contractions and the MTTKRP (for tensors). While undeniably useful and effective, this approach is inherently limited by the rigid interface and boundaries of each individual building block, which prevent multi-operation optimizations. Inspired by a workflow we observed in chemometrics, namely repeatedly fitting the same tensor to many different models, we consider the problem of concurrently computing multiple CP decompositions. We recently published CALS (Concurrent ALS), which simultaneously computes multiple CP decompositions of the same tensor using Alternating Least Squares. The arithmetic intensity of the computation is increased by fusing independent MTTKRP operations. When the rank is small, each individual ALS computation is inherently memory-bound, but CALS makes the whole set of computations compute-bound, thus enabling the use of efficient kernels, including offloading to accelerators. We also adapted the idea to support jackknife resampling, a technique used to estimate the uncertainties in the parameters of CP decompositions. In jackknife resampling, the underlying tensor is nearly, but not exactly, the same across fits. Nevertheless, the idea of concurrent ALS still applies, resulting in significant speedups for the entire workflow. (A minimal sketch of the MTTKRP fusion idea appears after the talk list below.)
  2. Building blocks: From matrix to tensor operations
    Dagstuhl Seminar 20111, Tensor Computations: Applications and Optimization, March 2020.
    The entire domain of matrix computations is tightly connected to the concept of building blocks, i.e., computational kernels that support one well-defined mathematical operation. For almost 50 years, the linear algebra community has been identifying, layering, implementing, and optimizing building blocks. Significant benefits include portable performance, self-documenting code, and robust implementations. Furthermore, standardization of interfaces and wide adoption paved the way towards automated approaches to code generation and performance tuning, and enabled abstraction (via high-level languages). Nowadays there exists a sophisticated and comprehensive hierarchy of libraries for dense and sparse linear algebra (e.g., BLAS, LAPACK, PETSc); these libraries provide invaluable support for a vast ecosystem of applications. We are convinced that the tensor community could benefit from similar ideas. The need for a better computational infrastructure was publicly recognized as early as 2009 at a workshop organized by Charles Van Loan. Despite many years of development, in 2020 the software landscape for tensor computations is still heavily fragmented and dominated by sophisticated MATLAB toolboxes and niche C++ libraries. Libraries similar to the BLAS and LAPACK are not even on the radar. We believe that it is (still) time for a major community effort to agree on the functionality and possibly the interface of a few low-level tensor operations, and then focus on high performance and parallel scalability.
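
Sketch of the MTTKRP fusion idea from talk 1 above. This is a minimal NumPy illustration, not the published CALS implementation: it assumes a dense 3-way tensor, covers only the mode-1 MTTKRP, and the function names (khatri_rao, fused_mttkrp_mode1) are invented for this example. The point is that stacking the per-model Khatri-Rao factors column-wise turns many small, memory-bound products into one large matrix product with higher arithmetic intensity.

    import numpy as np

    def khatri_rao(A, B):
        # Column-wise Khatri-Rao product: A is (J, R), B is (K, R); the result
        # is (J*K, R) with B's row index varying fastest.
        J, R = A.shape
        K, _ = B.shape
        return (A[:, None, :] * B[None, :, :]).reshape(J * K, R)

    def fused_mttkrp_mode1(X, models):
        # Mode-1 MTTKRPs of the SAME tensor for several CP models, fused into
        # one large matrix product.
        #   X      : dense tensor of shape (I, J, K)
        #   models : list of (B, C) factor pairs, B is (J, R_m), C is (K, R_m)
        # Returns one (I, R_m) MTTKRP result per model.
        I, J, K = X.shape
        X1 = X.reshape(I, J * K)  # mode-1 unfolding (row-major: k varies fastest)
        # Stack the per-model Khatri-Rao products side by side ...
        KR = np.hstack([khatri_rao(B, C) for (B, C) in models])  # (J*K, sum R_m)
        M = X1 @ KR               # ... so ONE large product serves all models.
        # Split the fused result back into per-model blocks.
        out, col = [], 0
        for (B, _) in models:
            R = B.shape[1]
            out.append(M[:, col:col + R])
            col += R
        return out

    # Example: three low-rank models of one 50 x 40 x 30 tensor.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((50, 40, 30))
    models = [(rng.standard_normal((40, r)), rng.standard_normal((30, r)))
              for r in (2, 3, 5)]
    results = fused_mttkrp_mode1(X, models)  # matrices of size 50x2, 50x3, 50x5

The per-model results are identical to what separate MTTKRP calls would produce; only the number of passes over X changes, which is what makes the fused computation compute-bound rather than memory-bound when the ranks are small.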