
Recent Talks

  1. Concurrent Alternating Least Squares and Jackknife Resampling for Canonical Polyadic Decompositions
    ISC High Performance, Hamburg, Germany, June 2023.
    Speaker: Lars Karlsson.
    In the domains of matrix and tensor computations, the most common approach to speeding up a workflow is to optimize the underlying building blocks, i.e., operations such as the matrix product and the LU factorization (for matrices), or tensor contractions and the MTTKRP (for tensors). While undeniably useful and effective, this approach is inherently limited by the rigid interface and boundaries of each individual building block, which prevent multi-operation optimizations. Inspired by a workflow we observed in chemometrics, namely that of repeatedly fitting one and the same tensor to many different models, we consider the problem of concurrently computing multiple CP decompositions. We recently published CALS (Concurrent ALS), a method that simultaneously computes multiple CP decompositions of the same tensor using Alternating Least Squares. The arithmetic intensity of the computation is increased by fusing independent MTTKRP operations. When the rank is small, each individual ALS computation is inherently memory-bound, but CALS makes the whole set of computations compute-bound, thus enabling the use of efficient kernels, including offloading to accelerators. We also adapted the idea to support jackknife resampling, a technique used to estimate the uncertainties in the parameters of CP decompositions. In jackknife, the underlying tensor is nearly, but not exactly, the same across fits. Nevertheless, the idea of concurrent ALS still applies, resulting in significant speedups for the entire workflow. (A minimal sketch of the fusion idea is given after this list.)
  2. Current state of programming languages for linear algebra computations
    TU Delft, DCSE High Performance Computing Symposium, Delft, The Netherlands, June 2023.
  3. High-Performance Matrix Computations: We Need More Than Fast Libraries
    SIAM Conference on Computational Science and Engineering.
    Amsterdam, NL, February 2023.
  4. Matrix computations: Going beyond libraries
    eSSENCE, Swedish e-Science Academy, Umeå, Sweden, October 2022.
  5. The fragmented landscape of tensor computations
    Chalmers University, 4th Workshop on Scientific Computing in Sweden (SwedComp22), Göteborg, Sweden, October 2022.
  6. High-performance matrix computations: It’s not all about libraries
    RWTH Aachen University, EU Regional School, Aachen, Germany, May 2022.
  7. Software for tensor computations: What is happening???
    Dagstuhl Seminar 22101, Tensor Computations: Applications and Optimization, Dagstuhl, Germany, March 2022.
  8. The MTTKRP, a closer look
    Dagstuhl Seminar 22101, Tensor Computations: Applications and Optimization.
    March 2022.
  9. Parallel Algorithms --- Introduction to High Performance Computing
    PDC summer school on High-Performance Computing, KTH, Stockholm, August 2021.
  10. High-Performance Tensor Computations: Where Do We Stand?
    SIAM Conference on Computational Science and Engineering.
    Dallas (via Zoom), March 2021.
    Since the introduction of the BLAS-1 library 40+ years ago, the entire domain of matrix computations has evolved around well-defined layers and a few "container" libraries that include state-of-the-art algorithms and implementations for a specific class of problems and/or a specific type of parallelism; these libraries have served, and are still serving, the needs of a vast ecosystem of applications. In stark contrast, the domain of tensor computations still lacks a set of building blocks, and many similar libraries are developed independently in different application domains. This situation inevitably leads to redundancy and to suboptimal results. Furthermore, the software landscape for tensor computations is fragmented in terms of features, programming languages, and computing platforms, to the point that comparisons between new and existing algorithms are excessively challenging. In this talk we survey the software for high-performance tensor computations and make suggestions for an initial set of computational building blocks.
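The following is a minimal NumPy sketch of the fusion idea behind CALS mentioned in talk 1 above: the mode-1 MTTKRPs of several CP models of the same tensor are concatenated into a single, larger matrix product, which raises the arithmetic intensity. The function names and the dense 3-way, mode-1 setup are illustrative assumptions, not the CALS API.

```python
import numpy as np

def khatri_rao(B, C):
    """Column-wise Khatri-Rao product of B (J x R) and C (K x R) -> (J*K) x R.
    Rows are ordered with the index of C varying fastest, matching the
    row-major mode-1 unfolding used below."""
    J, R = B.shape
    K, _ = C.shape
    return (B[:, None, :] * C[None, :, :]).reshape(J * K, R)

def fused_mode1_mttkrp(X, models):
    """Mode-1 MTTKRP for several CP models (B, C) of the same I x J x K tensor X,
    fused into one large GEMM instead of one small GEMM per model."""
    I, J, K = X.shape
    X1 = X.reshape(I, J * K)  # mode-1 unfolding (row-major)
    # Stack the Khatri-Rao factors of all models side by side ...
    KR = np.hstack([khatri_rao(B, C) for (B, C) in models])
    out = X1 @ KR             # ... so all MTTKRPs become a single matrix product
    ranks = [B.shape[1] for (B, _) in models]
    return np.split(out, np.cumsum(ranks)[:-1], axis=1)  # one result per model

# Example: three models of different (small) ranks fitted to the same tensor.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 40, 30))
models = [(rng.standard_normal((40, r)), rng.standard_normal((30, r))) for r in (3, 5, 8)]
results = fused_mode1_mttkrp(X, models)
print([m.shape for m in results])  # [(50, 3), (50, 5), (50, 8)]
```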