# Recent Talks

**High-Performance Tensor Computations: Where Do We Stand?**SIAM Conference on Computational Science and Engineering.

Dallas (via Zoom), March 2021.abstractPDFSince the introduction of the BLAS-1 library 40+ years ago, the entire domain of matrix computations has evolved around well defined layers, and a few "container" libraries that included state-of-the-art algorithms/implementations for a specific class of problems and/or a specific type of parallelism; these libraries served and are still serving the needs of a vast ecosystem of applications. In stark contrast, the domain of tensor computations still lacks a set of building blocks, and many similar libraries are developed in different application domains. This situation inevitably leads to redundancy and to suboptimal results. Furthermore, the software landscape for tensor computations is fragmented in terms of features, programming languages, and computing platforms, to the point that comparisons between new and existing algorithms are excessively challenging. In this talk we survey the software for high-performance tensor computations and make suggestions for an initial set of computational building blocks.**Tensor computations: A fragmented landscape**Huawei, January 2021.abstractPDFSince the 1970s, the domain of matrix computations has evolved around well established libraries (e.g., LINPACK, BLAS, LAPACK, PETSc) with clearly defined interfaces. Such libraries include state-of-the-art algorithms for specific problems and/or for specific types of parallelism. Written in Fortran and/or C, these libraries served and are still serving the needs of a vast ecosystem of applications. In stark contrast, the domain of tensor computations is evolving in a seemingly uncoordinated fashion. Countless libraries and packages are being developed in different application domains, resulting in redundancy and suboptimal performance. The software landscape is fragmented in terms of features, programming languages, and computing platforms, to the point of obstructing comparisons between new and existing algorithms. In this talk we first give an overview of different tensor operations as they arise in scientific computing and data science. We focus on software, and contrast the development of linear algebra and tensor libraries. Finally, we make suggestions for an initial set of computational building blocks for tensor operations, and discuss related challenges.**Achieving the compute bound with CP-CALS**BLIS Retreat 2020.

5 October 2020.**How fast can you drive your computer?**Kunskapsveckan 2020, Umeå, October 2020.abstractPDFComputationally, nowadays any computer is incredibly powerful. As a reference, the computational power of any of today's laptops surpasses that of the 1996 world's fastest supercomputer. By analogy with cars, it is as if everyone owned a Ferrari, a Lamborghini, or a Formula 1 car. And the same way only selected drivers can race such supercars at more than 300Kms/hour, exploiting the full potential of a computer is a challenging task that only few experts can perform. In most cases, users rely on well known programming languages and compilers, trusting them to "drive their computers fast enough". Is this really the case? Our investigation illustrates that while programming languages are excellent at computations with numbers, they still cannot compete with human solutions when it comes to more advanced mathematical operations that involve vectors and matrices. Our study aims at giving directions for the future development of programming languages.**Achieving the compute bound with Concurrect Alternating Least Squares for the Canonical Polyadic Decomposition (CP-CALS)**IRTG Annual Meeting.

July 2020.**Programming languages and linear algebra computations**Uppsala University, April 2020.abstractPDFMatrix computations appear in virtually every domain of science and engineering, and due to the seemingly unstoppable growth of data science, are now as widespread as ever. In response to such a massive demand, the numerical linear algebra community puts tremendous effort in the identification, analysis, and optimization of a reasonably small set of simple operations---such as those included in the BLAS and LAPACK libraries---that serve as building blocks for most users' target computations. However, in a recent investigation, we observed a noticeable disconnect between the developers and the end users of numerical linear algebra libraries: Low-level languages such as C and Fortran and progressively less likely to be used, in favor of high-level programming languages such as Matlab, Julia, and R, make it possible to code matrix computations at the same level of abstraction at which experts reason about them. Unfortunately, our experience suggests that this increase in productivity often comes together with significantly suboptimal algorithms. Under the cover, all such languages face the problem of expressing a target matrix computation in terms of available building blocks; we refer to this problem as the "Linear Algebra Mapping Problem" (LAMP). In this talk, we define the problem, present the challenges it poses, and carefully survey how it is currently solved by the state-of-the-art languages.**Building blocks: From matrix to tensor operations**Dagstuhl Seminar 20111, Tensor Computations: Applications and Optimization, March 2020.abstractPDFThe entire domain of matrix computations is tightly connected to the concept of building blocks, i.e., computational kernels that support one well defined mathematical operation. For almost 50 years, the linear algebra community has been identifying, layering, implementing, and optimizing building blocks. Significant benefits include portable performance, self-documenting quality of code, and robust implementations. Furthermore, standardization of interfaces and wide adoption paved the road towards automated approaches to code generation and performance tuning, and enabled abstraction (via high-level languages). Nowadays there exists a sophisticated and comprehensive hierarchy of libraries for dense and sparse linear algebra (e.g., BLAS, LAPACK, PETSc, etc.); these libraries provide invaluable support for a vast ecosystem of applications. We are convinced that the tensor community could benefit from similar ideas. The need for a better computational infrastructure was publicly recognized already in 2009 (if not earlier) at a workshop organized by Charles Van Loan. Despite many years of development, in 2020 the software landscape for tensor computations is still heavily fragmented and dominated by sophisticated MATLAB toolboxes and niche C++ libraries. Libraries similar to the BLAS and LAPACK are not even on the radar. We believe that it is (still) time for a major community effort to agree on the functionality and possibly the interface of a few low-level tensor operations, to then focus on high performance and parallel scalability.**How good (or bad) is your favorite language for linear algebra?**UMIT Research Lab, Umeå University, November 2019.abstractMatrix computations appear in virtually every domain of science and engineering, and due to the seemingly unstoppable growth of data science, are now as widespread as ever. Such a massive demand triggered two distinct developments. On the one hand, the numerical linear algebra community has put tremendous effort in the identification, analysis and optimization of a reasonably small set of simple operations---such as those included in the BLAS and LAPACK libraries---that serve as building blocks for most users' target computations. On the other hand, the computer science community delivered several high-level programming languages---such as Matlab, Julia, R---as well as libraries---such as Eigen, Armadillo, NumPy---that make it possible to code matrix computations at the same level of abstraction at which experts reason about them. In this talk we investigate how sophisticated these high-level tools are.**From Problem to Solution in one Cl1ck**Umeå University, Annual Celebration 2019, October 2019.abstractwebPDFAs computers become increasingly powerful, it becomes more and more challenging to take advantage of their potential. Scientists and engineers alike are constantly facing the dilemma of whether to invest months in the development of sophisticated and efficient code, or to opt for code which is quick to generate, but severely suboptimal in terms of performance. In the first case, computer efficiency comes at the cost of human productivity; in the second case, human productivity comes at the expense of time and energy wasted. Prof. Bientinesi's research focuses on mathematical computations that arise in disciplines such as biology, chemistry, data science and materials science, and aims at achieving simultaneously both human productivity and computer efficiency. The objective is to allow domain experts to easily express their problems with a computer language, and then have computers--and not humans--automatically generate efficient code. Automation lowers computing costs, and at the same time allows scientists to perform more science.**Efficiently Mapping Linear Algebra to High-Performance Code (Poster session)**TACCSTER 2019.

September 2019.