Publications - Aravind Sankaran

Peer Reviewed Conference Publications

  1. Performance Comparison for Scientific Computations on the Edge via Relative Performance
    Aravind Sankaran and Paolo Bientinesi
    Proceedings of the 3rd Workshop on Parallel AI and Systems for the Edge (PAISE 2021), March 2021.
        author = "Aravind Sankaran and Paolo Bientinesi",
        title  = "Performance Comparison for Scientific Computations on the Edge via Relative Performance",
        year   = 2021,
        month  = mar,
        url    = "https://arxiv.org/pdf/2102.12740.pdf"
    In a typical Internet-of-Things setting that involves scientific applications, a target computation can be evaluated in many different ways depending on how the computations are split among the devices. On the one hand, different implementations (or algorithms) that are equivalent from a mathematical perspective might exhibit significant differences in performance. On the other hand, some of the implementations are likely to show similar performance characteristics. In this paper, we focus on analysing the performance of a given set of algorithms by clustering them into performance classes. To this end, we use a measurement-based approach to evaluate and score algorithms based on pair-wise comparisons; we refer to this approach as "relative performance analysis". Each comparison yields one of three outcomes: one algorithm can be "better" than, "worse" than, or "equivalent" to another; algorithms evaluated to have "equivalent" performance are merged into the same performance class. We show that our clustering methodology facilitates algorithm selection with respect to more than one metric; for instance, from the subset of equivalently fast algorithms, one could then select the algorithm that consumes the least energy on a certain device.
  2. TTC: A Tensor Transposition Compiler for Multiple Architectures
    Paul Springer, Aravind Sankaran and Paolo Bientinesi
    Proceedings of the 3rd International Workshop on Libraries, Languages and Compilers for Array Programming (ARRAY 2016), June 2016.
        author = "Paul Springer and Aravind Sankaran and Paolo Bientinesi",
        title  = "TTC: A Tensor Transposition Compiler for Multiple Architectures",
        year   = 2016,
        month  = jun,
        url    = "https://arxiv.org/pdf/1607.01249.pdf"
    We consider the problem of transposing tensors of arbitrary dimension and describe TTC, an open-source domain-specific parallel compiler. TTC generates optimized parallel C++/CUDA C code that achieves a significant fraction of the system's peak memory bandwidth. TTC exhibits high performance across multiple architectures, including modern AVX-based systems (e.g., Intel Haswell, AMD Steamroller), Intel's Knights Corner, as well as different CUDA-based GPUs such as NVIDIA's Kepler and Maxwell architectures. We report speedups of TTC over a meaningful baseline implementation generated by external C++ compilers; the results suggest that a domain-specific compiler can significantly outperform its general-purpose counterpart. For instance, compared with Intel's latest C++ compiler on the Haswell and Knights Corner architectures, TTC yields speedups of up to 8x and 32x, respectively. We also showcase TTC's support for multiple leading dimensions, making it a suitable candidate for the generation of performance-critical packing functions that are at the core of the ubiquitous BLAS 3 routines.
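
For readers unfamiliar with the operation the TTC abstract refers to, the following is a minimal sketch of what "transposing a tensor of arbitrary dimension" means. It is not TTC's generated code (which, per the abstract, is optimized parallel C++/CUDA C); it is only a naive Python reference that defines the result. The function name `transpose_reference` and the parameter `perm` are hypothetical names chosen for this illustration, and the convention that output axis `i` takes input axis `perm[i]` is an assumption borrowed from NumPy.

```python
import itertools

import numpy as np


def transpose_reference(a, perm):
    """Naive out-of-place tensor transposition (illustration only).

    Assumed convention (same as numpy.transpose): output axis i is
    input axis perm[i], so b.shape[i] == a.shape[perm[i]].
    """
    out_shape = tuple(a.shape[p] for p in perm)
    b = np.empty(out_shape, dtype=a.dtype)
    # Visit every index of the input tensor and copy the element to its
    # permuted position in the output tensor.
    for idx in itertools.product(*(range(s) for s in a.shape)):
        out_idx = tuple(idx[p] for p in perm)
        b[out_idx] = a[idx]
    return b


# Usage: permute the axes of a small 3-dimensional tensor and check the
# result against NumPy's built-in transpose.
a = np.arange(24).reshape(2, 3, 4)
perm = (2, 0, 1)
b = transpose_reference(a, perm)
assert np.array_equal(b, np.transpose(a, perm))
```

The loop nest above touches memory with no regard for cache lines or vector width, which is the kind of baseline behaviour that a specialized code generator aims to improve on.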