# Publications - Henrik Barthels

### Journal Articles

**The Linear Algebra Mapping Problem. Current state of linear algebra languages and libraries.**ACM Transactions on Mathematical Software, Volume 48(3), pp. 1-30, September 2022.@article{Psarras2022:618, author = "Christos Psarras and Henrik Barthels and Paolo Bientinesi", title = "The Linear Algebra Mapping Problem. Current state of linear algebra languages and libraries.", journal = "ACM Transactions on Mathematical Software", year = 2022, volume = 48, number = 3, pages = "1--30", month = sep, doi = "10.1145/3549935", url = "https://arxiv.org/pdf/1911.09421v2.pdf" }

abstractwebPDFbibtexWe observe a disconnect between the developers and the end users of linear algebra libraries. On the one hand, the numerical linear algebra and the high-performance communities invest significant effort in the development and optimization of highly sophisticated numerical kernels and libraries, aiming at the maximum exploitation of both the properties of the input matrices, and the architectural features of the target computing platform. On the other hand, end users are progressively less likely to go through the error-prone and time consuming process of directly using said libraries by writing their code in C or Fortran; instead, languages and libraries such as Matlab, Julia, Eigen and Armadillo, which offer a higher level of abstraction, are becoming more and more popular. Users are given the opportunity to code matrix computations with a syntax that closely resembles the mathematical description; it is then a compiler or an interpreter that internally maps the input program to lower level kernels, as provided by libraries such as BLAS and LAPACK. Unfortunately, our experience suggests that in terms of performance, this translation is typically vastly suboptimal. In this paper, we first introduce the Linear Algebra Mapping Problem, and then investigate how effectively a benchmark of test problems is solved by popular high-level programming languages. Specifically, we consider Matlab, Octave, Julia, R, Armadillo (C++), Eigen (C++), and NumPy (Python); the benchmark is meant to test both standard compiler optimizations such as common subexpression elimination and loop-invariant code motion, as well as linear algebra specific optimizations such as optimal parenthesization of a matrix product and kernel selection for matrices with properties. The aim of this study is to give concrete guidelines for the development of languages and libraries that support linear algebra computations.**Linnea: Automatic Generation of Efficient Linear Algebra Programs**ACM Transactions on Mathematical Software (TOMS), Volume 47(3), pp. 1-26, June 2021.@article{Barthels2021:688, author = "Henrik Barthels and Christos Psarras and Paolo Bientinesi", title = "Linnea: Automatic Generation of Efficient Linear Algebra Programs", journal = "ACM Transactions on Mathematical Software (TOMS)", year = 2021, volume = 47, number = 3, pages = "1--26", month = jun, doi = "10.1145/3446632", url = "https://arxiv.org/pdf/1912.12924.pdf" }

abstractwebPDFbibtexThe translation of linear algebra computations into efficient sequences of library calls is a non-trivial task that requires expertise in both linear algebra and high-performance computing. Almost all high-level languages and libraries for matrix computations (e.g., Matlab, Eigen) internally use optimized kernels such as those provided by BLAS and LAPACK; however, their translation algorithms are often too simplistic and thus lead to a suboptimal use of said kernels, resulting in significant performance losses. In order to combine the productivity offered by high-level languages, and the performance of low-level kernels, we are developing Linnea, a code generator for linear algebra problems. As input, Linnea takes a high-level description of a linear algebra problem; as output, it returns an efficient sequence of calls to high-performance kernels. Linnea uses a custom best-first search algorithm to find a first solution in less than a second, and increasingly better solutions when given more time. In 125 test problems, the code generated by Linnea almost always outperforms Matlab, Julia, Eigen and Armadillo, with speedups up to and exceeding 10x.**MatchPy: Pattern Matching in Python**Journal of Open Source Software, Volume 3(26), pp. 2, June 2018.

### Peer Reviewed Conference Publications

**MOM: Matrix Operations in MLIR**Proceedings of the 12th International Workshop on Polyhedral Compilation Techniques, May 2022.@inproceedings{Chiellini2022:920, author = "Lorenzo Chiellini and Henrik Barthels and Marcin Copik and {Tobias } Grosser and Paolo Bientinesi and {Daniele } Spampinato", title = "MOM: Matrix Operations in MLIR", year = 2022, address = "Budapest, Hungary", month = may }

abstractbibtexModern research in code generators for dense linear algebra computations has shown the ability to produce optimized code with a performance which compares and often exceeds the one of state-of-the-art implementations by domain experts. However, the underlying infrastructure is often developed in isolation making the interconnection of logically combinable systems complicated if not impossible. In this paper, we propose to leverage MLIR as a unifying compiler infrastructure for the optimization of dense linear algebra operations. We propose a new MLIR dialect for expressing linear algebraic computations including matrix properties to enable high-level algorithmic transformations. The integration of this new dialect in MLIR enables end-to-end compilation of matrix computations via conversion to existing lower-level dialects already provided by the framework.**Automatic Generation of Efficient Linear Algebra Programs**Proceedings of the Platform for Advanced Scientific Computing Conference (PASC20), June 2020.@inproceedings{Barthels2020:200, author = "Henrik Barthels and Christos Psarras and Paolo Bientinesi", title = "Automatic Generation of Efficient Linear Algebra Programs", booktitle = "Proceedings of the Platform for Advanced Scientific Computing Conference (PASC20)", year = 2020, month = jun, journal = "CoRR" }

abstractwebbibtexThe level of abstraction at which application experts reason about linear algebra computations and the level of abstraction used by developers of high-performance numerical linear algebra libraries do not match. The former is conveniently captured by high-level languages and libraries such as Matlab and Eigen, while the latter expresses the kernels included in the BLAS and LAPACK libraries. Unfortunately, the translation from a high-level computation to an efficient sequence of kernels is a task, far from trivial, that requires extensive knowledge of both linear algebra and high-performance computing. Internally, almost all high-level languages and libraries use efficient kernels; however, the translation algorithms are too simplistic and thus lead to a suboptimal use of said kernels, with significant performance losses. In order to both achieve the productivity that comes with high-level languages, and make use of the efficiency of low level kernels, we are developing Linnea, a code generator for linear algebra problems. As input, Linnea takes a high-level description of a linear algebra problem and produces as output an efficient sequence of calls to high-performance kernels. In 25 application problems, the code generated by Linnea always outperforms Matlab, Julia, Eigen and Armadillo, with speedups up to and exceeding 10x.**Code Generation in Linnea**Proceedings of the 6th International Workshop on Libraries, Languages, and Compilers for Array Programming (ARRAY 2019), 22 June 2019.

Extended abstract.@inproceedings{Barthels2019:800, author = "Henrik Barthels and Paolo Bientinesi", title = "Code Generation in Linnea", year = 2019, address = "Phoenix, Arizona", month = jun, note = "Extended abstract", url = "http://hpac.cs.umu.se/~barthels/publications/ARRAY_2019.pdf" }

abstractPDFbibtexLinnea is a code generator for the translation of high-level linear algebra problems to efficient code. Unlike other languages and libraries for linear algebra, Linnea heavily relies on domain-specific knowledge to rewrite expressions and infer matrix properties. Here we focus on two aspects related to code generation and matrix properties: 1) The automatic generation of code consisting of explicit calls to BLAS and LAPACK kernels, and the corresponding challenge with specialized storage formats. 2) A general notion of banded matrices can be used to simplify the inference of many matrix properties. While it is crucial to make use of matrix properties to achieve high performance, inferring those properties is challenging. We show how matrix bandwidth can be used as a unifying language to reason about many common matrix properties.**The Generalized Matrix Chain Algorithm**Proceedings of 2018 IEEE/ACM International Symposium on Code Generation and Optimization, pp. 11, 24 February 2018.@inproceedings{Barthels2018:130, author = "Henrik Barthels and Marcin Copik and Paolo Bientinesi", title = "The Generalized Matrix Chain Algorithm", booktitle = "Proceedings of 2018 IEEE/ACM International Symposium on Code Generation and Optimization", year = 2018, pages = 11, address = "Vienna, Austria", month = feb, url = "https://arxiv.org/pdf/1804.04021.pdf" }

abstractwebPDFbibtexIn this paper, we present a generalized version of the matrix chain algorithm to generate efficient code for linear algebra problems, a task for which human experts often invest days or even weeks of works. The standard matrix chain problem consists in finding the parenthesization of a matrix product $M := A_1 A_2 \cdots A_n$ that minimizes the number of scalar operations. In practical applications, however, one frequently encounters more complicated expressions, involving transposition, inversion, and matrix properties. Indeed, the computation of such expressions relies on a set of computational kernels that offer functionality well beyond the simple matrix product. The challenge then shifts from finding an optimal parenthesization to finding an optimal mapping of the input expression to the available kernels. Furthermore, it is often the case that a solution based on the minimization of scalar operations does not result in the optimal solution in terms of execution time. In our experiments, the generated code outperforms other libraries and languages on average by a factor of about 5. The motivation for this work comes from the fact that---despite great advances in the development of compilers---the task of mapping linear algebra problems to optimized kernels is still to be done manually. In order to relieve the user from this complex task, new techniques for the compilation of linear algebra expressions have to be developed.**Efficient Pattern Matching in Python**Proceedings of the 7th Workshop on Python for High-Performance and Scientific Computing, November 2017.

In conjunction with SC17: The International Conference for High Performance Computing, Networking, Storage and Analysis.@inproceedings{Krebber2017:404, author = "Manuel Krebber and Henrik Barthels and Paolo Bientinesi", title = "Efficient Pattern Matching in Python", booktitle = "Proceedings of the 7th Workshop on Python for High-Performance and Scientific Computing", year = 2017, address = "Denver, Colorado", month = nov, note = "In conjunction with SC17: The International Conference for High Performance Computing, Networking, Storage and Analysis", url = "https://arxiv.org/pdf/1710.00077.pdf" }

abstractwebPDFbibtexPattern matching is a powerful tool for symbolic computations. Applications include term rewriting systems, as well as the manipulation of symbolic expressions, abstract syntax trees, and XML and JSON data. It also allows for an intuitive description of algorithms in the form of rewrite rules. We present the open source Python module MatchPy, which offers functionality and expressiveness similar to the pattern matching in Mathematica. In particular, it includes syntactic pattern matching, as well as matching for commutative and/or associative functions, sequence variables, and matching with constraints. MatchPy uses new and improved algorithms to efficiently find matches for large pattern sets by exploiting similarities between patterns. The performance of MatchPy is investigated on several real-world problems.**Linnea: Compiling Linear Algebra Expressions to High-Performance Code**Proceedings of the 8th International Workshop on Parallel Symbolic Computation, July 2017.@inproceedings{Barthels2017:254, author = "Henrik Barthels and Paolo Bientinesi", title = "Linnea: Compiling Linear Algebra Expressions to High-Performance Code", booktitle = "Proceedings of the 8th International Workshop on Parallel Symbolic Computation", year = 2017, address = "Kaiserslautern, Germany", month = jul, doi = "10.1145/3115936.3115937" }

abstractwebbibtexLinear algebra expressions appear in fields as diverse as computational biology, signal processing, communication technology, finite element methods, and control theory. Libraries such as BLAS and LAPACK provide highly optimized building blocks for just about any linear algebra computation; thus, a linear algebra expression can be evaluated efficiently by breaking it down into those building blocks. However, this is a challenging problem, requiring knowledge in high-performance computing, compilers, and numerical linear algebra. In this paper we give an overview of existing solutions, and introduce Linnea, a compiler that solves this problem. As shown through a set of test cases, Linnea’s results are comparable with those obtained by human experts.**MatchPy: A Pattern Matching Library**Proceedings of the 15th Python in Science Conference, July 2017.@inproceedings{Krebber2017:550, author = "Manuel Krebber and Henrik Barthels and Paolo Bientinesi", title = "MatchPy: A Pattern Matching Library", booktitle = "Proceedings of the 15th Python in Science Conference", year = 2017, address = "Austin, Texas", month = jul, url = "https://arxiv.org/pdf/1710.06915.pdf" }

abstractwebPDFbibtexPattern matching is a powerful tool for symbolic computations, based on the well-defined theory of term rewriting systems. Application domains include algebraic expressions, abstract syntax trees, and XML and JSON data. Unfortunately, no lightweight implementation of pattern matching as general and flexible as Mathematica exists for Python Mathics,MacroPy,patterns,PyPatt. Therefore, we created the open source module MatchPy which offers similar pattern matching functionality in Python using a novel algorithm which finds matches for large pattern sets more efficiently by exploiting similarities between patterns.**A Compiler for Linear Algebra Operations**SPLASH '16 Companion, ACM, October 2016.

Student Research Competition.@inproceedings{Barthels2016:280, author = "Henrik Barthels", title = "A Compiler for Linear Algebra Operations", booktitle = "SPLASH '16 Companion", year = 2016, address = "Amsterdam, Netherlands", month = oct, publisher = "ACM", doi = "10.1145/2984043.2998539", note = "Student Research Competition" }

abstractwebbibtexIn this paper we present a compiler that translates arithmetic expressions containing matrices to efficient sequences of calls to basic linear algebra kernels.**Juggrnaut – An Abstract JVM**International Conference on Formal Verification of Object-Oriented Software, pp. 142-159, 2011.@inproceedings{Heinen2011:370, author = "Jonathan Heinen and Henrik Barthels and Christina Jansen", title = "Juggrnaut – An Abstract JVM", booktitle = "International Conference on Formal Verification of Object-Oriented Software", year = 2011, pages = "142--159", organization = "Springer", url = "http://hpac.cs.umu.se/~barthels/publications/FoVeOOS_2011.pdf" }

abstractPDFbibtexWe introduce a new kind of hypergraphs and hyperedge replacement grammars, where nodes are associated types. We use them to adapt the abstraction framework Juggrnaut presented by us in [7, 8] – for the verification of Java Bytecode programs. The framework is extended to handle additional concepts needed for the analysis of Java Bytecode like null pointers and method stacks as well as local and static variables. We define the abstract transition rules for a significant subset of opcodes and show how to compute the abstract state space. Finally we complete the paper with some experimental results.

### Theses

**Linnea: A Compiler for Mapping Linear Algebra Problems onto High-Performance Kernel Libraries**Dissertation, RWTH Aachen University, August 2021.PDFbibtex@phdthesis{Barthels2021:754, author = "Henrik Barthels", title = "Linnea: A Compiler for Mapping Linear Algebra Problems onto High-Performance Kernel Libraries", school = "RWTH Aachen University", year = 2021, type = "Dissertation", month = aug, url = "https://hpac.cs.umu.se/~barthels/publications/dissertation.pdf" }

**Systematic Generation of Algorithms for Iterative Methods**Master's Thesis, RWTH Aachen University, March 2015.webPDFbibtex@mastersthesis{Barthels2015:240, author = "Henrik Barthels", title = "Systematic Generation of Algorithms for Iterative Methods", school = "RWTH Aachen University", year = 2015, type = "Master's Thesis", address = "Aachen, Germany", month = mar, url = "https://arxiv.org/pdf/1703.00279" }

### Others

**Linnea: A Compiler for Linear Algebra Operations**May 2017.

ACM Student Research Competition Grand Finals Candidates, 2016 - 2017.@misc{Barthels2017:260, author = "Henrik Barthels", title = "Linnea: A Compiler for Linear Algebra Operations", howpublished = "ACM Student Research Competition Grand Finals Candidates, 2016 - 2017", month = may, year = 2017, note = "ACM Student Research Competition Grand Finals Candidates, 2016 - 2017", url = "http://hpac.cs.umu.se/~barthels/publications/ACM_SRC_2016_final_round.pdf" }

abstractwebPDFbibtexThe evaluation of linear algebra expressions is a central part of both languages for scientific computing such as Julia and Matlab, and libraries such as Eigen, Blaze, and NumPy. However, the existing strategies are still rather primitive. At present, the only way to achieve high performance is by handcoding algorithms using libraries such as BLAS and LAPACK, a task that requires extensive knowledge in linear algebra, numerical linear algebra and high-performance computing. We present Linnea, the prototype of a compiler that automates the translation of linear algebra expressions to an efficient sequence of calls to BLAS and LAPACK kernels. Initial results indicate that the algorithms generated by Linnea significantly outperform existing high-level languages by a factor of up to six.**A Compiler for Linear Algebra Operations**Poster, 2 November 2016.

ACM Student Research Competition at SPLASH 2016.PDFbibtex@misc{Barthels2016:948, author = "Henrik Barthels", title = "A Compiler for Linear Algebra Operations", howpublished = "ACM Student Research Competition at SPLASH 2016", month = nov, year = 2016, note = "ACM Student Research Competition at SPLASH 2016", address = "Amsterdam, Netherlands", type = "Poster", url = "http://hpac.cs.umu.se/~barthels/publications/ACM_SRC_2016_poster.pdf" }

**The Matrix Chain Algorithm to Compile Linear Algebra Expressions**Proceedings of the 4th Workshop on Domain Specific Language Design and Implementation (DSLDI), Extended abstract, 31 October 2016.**Automata-Based Detection of Hypergraph Embeddings**Bachelor's Thesis, RWTH Aachen University, September 2011.@misc{Barthels2011:804, author = "Henrik Barthels", title = "Automata-Based Detection of Hypergraph Embeddings", month = sep, year = 2011, type = "Bachelor's Thesis, RWTH Aachen University", url = "http://hpac.cs.umu.se/~barthels/publications/bachelor_thesis.pdf" }

abstractPDFbibtexThe verification of programs using pointers and dynamic data structures requires to deal with potentially infinite state spaces. Because of that, it is reasonable to use abstraction techniques capable of dealing with those potentially infinite structures. The Juggrnaut framework applies hyperedge replacement grammars to dynamically abstract and concretize parts of a heap. Abstraction is done by the backwards application of grammar rules, which is related to subgraph isomorphism and therefore NP-complete. In this thesis, after giving a formal definition to hypergraphs, hyperedge replacement and heap representation, an automata model is introduced which is able to detect embeddings of grammar rules with certain properties efficiently. We provide an algorithm to construct an automaton that is able to detect a given set of embeddings of grammar rules. Finally, proofs of the NP-completeness of subgraph isomorphism on hypergraphs and embedding detection in general are presented.