For each project, I provide a number of questions and key points to be answered/addressed in your reports.
Use the given links as starting points. Do some research and find more -good- references to deepen your knowledge on the matter.

Remember:
- Build a story
- Narrow the focus and dig into one or two details
- Give good examples / own experimentation
- Show you learnt/understood the topic
- Use your own words

---------------------------
Expression templates vs Smart expression templates:
- What are expression templates (ET)?
- What are smart expression templates (SET)?
- What is the basic idea behind them? (Give examples)
- How do they improve over ET?
- Which tools/libraries make use of them? (Give a general overview of these libraries)
* Keywords: Expression Templates, smart expression templates, lazy evaluation, memory bound, compute bound.
* Links:
    * Expression Templates by Todd Veldhuizen: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.43.248
    * Techniques for Scientific C++ by Todd Veldhuizen: www.cs.indiana.edu/pub/techreports/TR542.pdf
    * arxiv.org/pdf/1104.1729

---------------------------
Eigen / Armadillo
- Introduction: What is it? Why was the project born? Target optimizations / use cases?
- API
    + Operands and Datatypes
    + Classes of operations provided
    + Features. Matrix properties (triangularity, symmetry, etc.)
    + (Good set of interesting) Examples
- Technology behind it / which optimizations are applied?
- Uses own implementation of computational kernels? Relies on libraries like BLAS?
- Performance results. Comparison with other tools.
* Keywords: Eigen/Armadillo, (expression) templates, BLAS.
* Links:
    * http://eigen.tuxfamily.org/index.php?title=Main_Page
    * http://arma.sourceforge.net/

---------------------------
Build-to-order BLAS (BTO)
- Introduction, architecture and target operations
- API
    - Operands
    - Operators and datatypes
    - Properties, storage, ...
- Approach to loop fusion and tiling
- Memory-accesses models, architecture representation
- Experimental results, comparisons
* Keywords: Build-to-order, BLAS, loop fusion, loop tiling
* Links:
    * rintintin.colorado.edu/~karlini/pohll08.pdf

---------------------------
LLVM:
    a) Intermediate Representation (IR) and optimizations (up to, not including instruction selection)
    b) Code generation and auto-vectorization
- Introduction to LLVM.
- General overview of the LLVM framework/architecture: from program to machine code.
- Focus on the specific module(s) for each topic.
    a) Intermediate representation and the optimizations performed on this representation
    b) Code generation and autovectorization.
* Keywords: LLVM, intermediate representation, vectorization, SIMD.
* Links:
    * http://llvm.org/
    * http://llvm.org/docs/

---------------------------
Peephole optimization
- General introduction to code generation and compiler optimizations.
- What is the idea behind peephole optimization?
- Give examples of characteristic peephole (local) optimizations.
- Automatic generation of peephole optimizers.
* Keywords: Code generation, peephole optimizers.
* Links:
    * Compilers: Principles, Techniques, and Tools (Chapter 8). Available in the CS library.
    * Using Peephole Optimization on Intermediate Code: http://dspace.ubvu.vu.nl/bitstream/handle/1871/2606/11047.pdf
    * Automatic Generation of Peephole Superoptimizers: http://theory.stanford.edu/~aiken/publications/papers/asplos06.pdf

---------------------------
JIT
- Concept
- Use cases
- Pros/cons
- Examples: Java, llvm, web browsers, ...
- Examples of optimizations and performance numbers
* Keywords: just-in-time compilation, java virtual machine, hotspot, llvm, firefox
* Links:
    * http://www.oracle.com/technetwork/java/whitepaper-135217.html (chapter 3)
    * http://llvm.org/docs/tutorial/LangImpl4.html

---------------------------
Roofline model: a) memory-bound b) compute-bound operations
- What is the roofline model?
- Goals of the roofline model?
- Which architecture information/knowledge is required?
- How to get that information? (manufacturer, empirical, tools, ...).
- Concept of arithmetic intensity, compute-bound vs memory-bound, ...
- Give an idea and examples of at least 3 optimizations to improve a) memory-bound b) compute-bound code.
* Keywords: roofline model, arithmetic intensity, memory- vs compute-bound, stream.
* Links:
    * An Insightful Visual Performance Model for Multicore Architectures: http://www.eecs.berkeley.edu/~waterman/papers/roofline.pdf
    * http://www.spiral.net/software/roofline.html

---------------------------
ICC/GCC autovectorization:
- Concept of vectorization and SIMD extensions
- Compiler flags/pragmas related to the topic
- Limitations
- Keywords and coding techniques to help the compiler
- Try out and illustrate with several examples.
* Keywords: icc/gcc, compiler options, automatic vectorization, SIMD, aliasing.
* Links:
    - Upload the white paper

---------------------------
OpenUH (OpenMP: under the hood)
- Quick overview of basic OpenMP constructs: #pragma omp parallel for, scheduling, and variable scoping.
- Basic ideas of pthreads (thread creation, execution, and passing arguments to the threads) --> for presentation, not report
- How OpenUH translates the basic constructs to pthreads
- Runtime functionality required in addition
* Keywords: OpenMP, pthreads, shared/private clauses
* Links:
    * pacman.cs.tsinghua.edu.cn/papers_cwg/openuh.pdf
    * Using OpenMP by B. Chapman et al. chapter 8

---------------------------
Nanos Mercurium compiler
- General structure of the compiler
- Support for trying out new ideas/extensions for OpenMP
- Example of how to add support for a new OpenMP directive
* Keywords: Nanos, Mercurium, OpenMP, runtime
* Links:
    * personals.ac.upc.edu/aduran/papers/2004/mercurium_ewomp04.pdf
    * https://pm.bsc.es/