High-Performance Matrix Computations --- 2011
Prerequisites
Basic knowledge of numerical linear algebra.
Principles of algorithms and programming.
Familiarity with Matlab and C.
Overview
This course is research and programming oriented.
-
Summer semester 2011.
-
CAMPUS #: 11ss-24886
Prof. Paolo Bientinesi & Prof. Martin Buecker
-
Lectures
Mondays, 17.15-18.45, AICES R432 (Rogowski Building - Schinkelstrasse 2)
Tuesdays, 17.00-18.30, AICES R432 (Rogowski Building - Schinkelstrasse 2)
-
Office hours
Tuesdays, 11am-1pm, AICES R432 (Rogowski Building - Schinkelstrasse 2)
-
First meeting: Tuesday, April 5th, 5pm.
Lectures begin: Monday, April 11th, 5.15pm.
-
Classes
-
April 5th
Introduction (Paolo & Martin).
-
April 11th & 12th
ALU, memory hierarchy, pipelining, caching, cost metrics, performance.
Examples: DGER, DGEMM.
Reference: "Computer organization and design: the hardware/software interface", by
David A. Patterson and John L. Hennessy.
-
April 18th & 19th
BLAS, HPL.
-
April 26th
Parallel performance, scalability.
-
May 2nd, 3rd
LAPACK;
collective communication, 2D matrix distribution, analysis of MV; Strassen.
-
May 9th
Intro to iterative methods. (Martin)
-
May 10th
Linear systems vs. eigenproblems.
Question (BLAS 3): Can you use BLAS3 routines if your matrices are stored by rows?
Program (LAPACK): Write a program that computes the
eigenvalues W and eigenvectors Z of a symmetric matrix A
using the Divide & Conquer method. Check the correctness of
your program by computing the residual ||A Z - Z W|| and the
orthogonality ||Z^T Z - I||. Link your program against an
optimized BLAS (GotoBLAS, MKL, ...), and if possible compare
the timings with the reference (unoptimized) BLAS.
-
May 16th & 17th
Generalized, standard, tridiagonal eigenproblems;
Gerschgorin theorems;
reductions & backtransformations.
Program (Mathematica/Matlab): Write a program that mimics the proof of the second Gerschgorin's theorem.
-
May 23th & 24th
Blocked reduction to tridiagonal form;
bisection, Sturm's count.
-
May 30th & 31th
GPU programming. (Christian Lessig)
Material:
Lecture 1
Lecture 2
[PDF]
On the design of display processors.
T. H. Myer and I. E. Sutherland.
Commun. ACM 11(6) 1968.
-
June 6th & 7th
The algorithm of Multiple Relatively Robust Representations (MRRR).
Parallelization for distributed and shared memory architectures.
-
June 18th & 19th
June 25th & 26th
July 4th & 5th
Sparse matrix-vector multiplication. Graphs & hypergraphs. (Martin)
-
July 11th (final class)
Multithreaded BLAS vs. algorithms by blocks.