Publications - Marcin Copik

Journal Article

Work-stealing prefix scan: Addressing load imbalance in large-scale image registration
Marcin Copik, Tobias Grosser, Torsten Hoefler, Paolo Bientinesi and Benjamin Berkels
IEEE Transactions on Parallel and Distributed Systems, Volume 33(3), pp. 523-535, 2022.
```
@article{Copik2022:900,
    author  = "Marcin Copik and Tobias Grosser and Torsten Hoefler and Paolo Bientinesi and Benjamin Berkels",
    title   = "Work-stealing prefix scan: Addressing load imbalance in large-scale image registration",
    journal = "IEEE Transactions on Parallel and Distributed Systems",
    year    = 2022,
    volume  = 33,
    number  = 3,
    pages   = "523--535",
    url     = "https://arxiv.org/pdf/2010.12478.pdf"
}
```
Parallelism patterns (e.g., map or reduce) have proven to be effective tools for parallelizing high-performance applications. In this paper, we study the recursive registration of a series of electron microscopy images – a time-consuming and imbalanced computation necessary for nano-scale microscopy analysis. We show that by translating the image registration into a specific instance of the prefix scan, we can convert this seemingly sequential problem into a parallel computation that scales to over a thousand cores. We analyze a variety of scan algorithms that behave similarly for common low-compute operators and propose a novel work-stealing procedure for a hierarchical prefix scan. Our evaluation shows that by identifying a suitable and well-optimized prefix scan algorithm, we reduce time-to-solution on a series of 4,096 images spanning ten seconds of microscopy acquisition from over 10 hours to less than 3 minutes (using 1,024 Intel Haswell cores), enabling derivation of material properties at nanoscale for long microscopy image series.
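To illustrate the hierarchical prefix scan mentioned in the abstract, here is a minimal sequential sketch in Python. It is not the paper's implementation and contains no work stealing; the function name `hierarchical_scan` and the parameter `num_chunks` are hypothetical. It only assumes that the operator `op` is associative (it need not be commutative), which is what allows the chunks in the first phase to be scanned independently.

```python
from itertools import accumulate
from operator import add


def hierarchical_scan(items, op, num_chunks):
    """Inclusive prefix scan of `items` under an associative operator `op`.

    Hypothetical illustration: the three phases mirror a two-level
    (hierarchical) scan, but everything runs sequentially here.
    """
    items = list(items)
    if not items:
        return []
    size = -(-len(items) // num_chunks)  # ceiling division
    chunks = [items[i:i + size] for i in range(0, len(items), size)]

    # Phase 1: scan each chunk on its own (the parallelizable part).
    local = [list(accumulate(chunk, op)) for chunk in chunks]

    # Phase 2: scan the last element (the total) of every chunk.
    totals = list(accumulate((scanned[-1] for scanned in local), op))

    # Phase 3: combine each later chunk with the total of all preceding chunks.
    # The prefix goes on the left, so non-commutative operators stay correct.
    result = list(local[0])
    for prefix, scanned in zip(totals, local[1:]):
        result.extend(op(prefix, x) for x in scanned)
    return result


# String concatenation is associative but not commutative.
print(hierarchical_scan("abcdefg", add, 3))
```

With an expensive operator whose cost varies between elements, as the abstract describes for image registration, the chunks in the first phase take unequal time; that imbalance is what the paper's work-stealing procedure targets.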

Peer Reviewed Conference Publications

MOM: Matrix Operations in MLIR
Lorenzo Chiellini, Henrik Barthels, Marcin Copik, Tobias Grosser, Paolo Bientinesi and Daniele Spampinato
Proceedings of the 12th International Workshop on Polyhedral Compilation Techniques, May 2022.
```
@inproceedings{Chiellini2022:920,
    author    = "Lorenzo Chiellini and Henrik Barthels and Marcin Copik and Tobias Grosser and Paolo Bientinesi and Daniele Spampinato",
    title     = "MOM: Matrix Operations in MLIR",
    booktitle = "Proceedings of the 12th International Workshop on Polyhedral Compilation Techniques",
    year    = 2022,
    address = "Budapest, Hungary",
    month   = may
}
```
Modern research in code generators for dense linear algebra computations has shown the ability to produce optimized code whose performance matches and often exceeds that of state-of-the-art implementations by domain experts. However, the underlying infrastructure is often developed in isolation, making the interconnection of logically combinable systems complicated if not impossible. In this paper, we propose to leverage MLIR as a unifying compiler infrastructure for the optimization of dense linear algebra operations. We propose a new MLIR dialect for expressing linear algebraic computations, including matrix properties, to enable high-level algorithmic transformations. The integration of this new dialect in MLIR enables end-to-end compilation of matrix computations via conversion to existing lower-level dialects already provided by the framework.
The Generalized Matrix Chain Algorithm
Henrik Barthels, Marcin Copik and Paolo Bientinesi
Proceedings of 2018 IEEE/ACM International Symposium on Code Generation and Optimization, pp. 11, 24 February 2018.
```
@inproceedings{Barthels2018:130,
    author    = "Henrik Barthels and Marcin Copik and Paolo Bientinesi",
    title     = "The Generalized Matrix Chain Algorithm",
    booktitle = "Proceedings of 2018 IEEE/ACM International Symposium on Code Generation and Optimization",
    year      = 2018,
    pages     = 11,
    address   = "Vienna, Austria",
    month     = feb,
    url       = "https://arxiv.org/pdf/1804.04021.pdf"
}
```
In this paper, we present a generalized version of the matrix chain algorithm to generate efficient code for linear algebra problems, a task for which human experts often invest days or even weeks of work. The standard matrix chain problem consists of finding the parenthesization of a matrix product $M := A_1 A_2 \cdots A_n$ that minimizes the number of scalar operations. In practical applications, however, one frequently encounters more complicated expressions, involving transposition, inversion, and matrix properties. Indeed, the computation of such expressions relies on a set of computational kernels that offer functionality well beyond the simple matrix product. The challenge then shifts from finding an optimal parenthesization to finding an optimal mapping of the input expression to the available kernels. Furthermore, it is often the case that a solution based on the minimization of scalar operations does not result in the optimal solution in terms of execution time. In our experiments, the generated code outperforms other libraries and languages on average by a factor of about 5. The motivation for this work comes from the fact that, despite great advances in the development of compilers, the task of mapping linear algebra problems to optimized kernels must still be done manually. In order to relieve the user from this complex task, new techniques for the compilation of linear algebra expressions have to be developed.
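For reference, the standard matrix chain problem described in the abstract can be solved with the textbook dynamic program sketched below in Python. This is the classic algorithm, not the generalized version from the paper: it handles only plain products and assumes a cost model of $p \cdot q \cdot r$ scalar multiplications for multiplying a $p \times q$ matrix by a $q \times r$ matrix. The function name `matrix_chain_order` and the `dims` encoding are illustrative choices.

```python
def matrix_chain_order(dims):
    """Return (minimal cost, parenthesization) for the product A1 A2 ... An.

    `dims` has length n + 1 (n >= 1); matrix A_i has shape
    dims[i-1] x dims[i]. Cost is counted in scalar multiplications,
    which is an assumed cost model.
    """
    n = len(dims) - 1
    cost = [[0] * n for _ in range(n)]   # cost[i][j]: best cost for A_{i+1}..A_{j+1}
    split = [[0] * n for _ in range(n)]  # split[i][j]: where to cut that sub-chain

    for length in range(2, n + 1):       # sub-chain length
        for i in range(n - length + 1):
            j = i + length - 1
            cost[i][j] = float("inf")
            for k in range(i, j):        # try every split point
                c = (cost[i][k] + cost[k + 1][j]
                     + dims[i] * dims[k + 1] * dims[j + 1])
                if c < cost[i][j]:
                    cost[i][j] = c
                    split[i][j] = k

    def paren(i, j):
        # Rebuild the parenthesization from the recorded split points.
        if i == j:
            return f"A{i + 1}"
        k = split[i][j]
        return f"({paren(i, k)} {paren(k + 1, j)})"

    return cost[0][n - 1], paren(0, n - 1)


# A1 is 10x30, A2 is 30x5, A3 is 5x60.
print(matrix_chain_order([10, 30, 5, 60]))  # (4500, '((A1 A2) A3)')
```

In this example the other parenthesization, `(A1 (A2 A3))`, costs 27,000 scalar multiplications, six times more. As the abstract notes, minimizing this count does not always minimize execution time.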