High-Performance Matrix Computations --- 2015
- Summer semester 2015.
- CAMPUS #: 15ss-24886.
- Lectures begin: Tuesday, April 14.
- Lectures & Exercises: Tuesdays and Thursdays, 5:15pm. Rogowski 115 - AICES seminar room (Schinkelstrasse 2)
- Office hours: Tuesdays, 11am-1pm. AICES R432 (Rogowski Building - Schinkelstrasse 2)
- 14.04 - Introduction. [Notes] [GER]
- 16.04 - Timers. Pipelining. Memory hierarchy, prefetching. [File]
- 21.04 - Locality. Time, performance, TPP, GEMM. [Notes]
- 23.04 - BLAS, scalability. [Notes]
- 28.04 - Storage by Rows & Cols. Caching & cache thrashing. [File] [File]
- 30.04 - Efficiency; turbo vs. heating. [BLAS reference] [File]
- 05.05 - BLAS interface. Tensors & GEMM. [Homework #1]; Due: Friday, May 15th, 1pm.
- 07.05 - Blocked vs. unblocked algorithms. Cholesky factorization. [File]
- 12.05 - Partitioned Matrix Expression, Cholesky variants. [Notes]
- 19.05 - How to optimize GEMM. [rvdgWIKI]
- 21.05 - #flops vs BLAS-level; multithreading (part 1) [File], [# FLOPS]
- 02.06 - review HW1; Least Squares
- 09.06 - ELAPS 1/2. [ELAPS on GitHub]
- 11.06 - ELAPS 2/2. [SandyBridge_MKL.cfg] [cluster batch system] [Homework #2]. Due: Saturday, June 20th, 11:59pm.
- 16.06 - Algorithms by blocks [Paper].
- 18.06 - Roofline Model [Paper]. Eigensolvers (intro).
- 23.06 - Bisection & Inverse Iteration [Section 2.3.1]
- 25.06 - The symmetric eigenproblem
- 30.06 - HW2 review [Archive].
- 02.07 - MRRR, sequential [Section 2.3.2]
- 07.07 - Final project [PDF] [file]
- 09.07 - MRRR, parallelism [Talk]
- 14.07 - Computing Petaflops over Teraflops of data [Paper]
- 16.07 - Semester review
- July 27, 28, 29, 30, 31
- August 3, 4, 5
- October 2, 5
Prerequisites
Basic knowledge of numerical linear algebra. Principles of algorithms and programming.
Familiarity with Matlab and C.
Overview
The course centers on developing efficient numerical algorithms through a synergy between mathematics and computer architectures. We will cover the following topics:
- processor architecture (CPU, memory system, interconnect)
- floating point operations
- roofline model
- vectorization
- matrix-matrix product, BLAS
- factorizations
- method of multiple relatively robust representations (MRRR, or MR3)
- blocked algorithms
- algorithms by blocks
- dynamic scheduling
- data parallelism
- shared memory vs. distributed memory paradigm
- synchronization vs. communication
Schedule
Exams
First come, first served. Slots available: