High Performance Computing by Martin Raum

After passing the assignments on development environment, optimization, OpenMP, and threads, you receive a passing grade, i.e., 3 or G. You can sign up for the exam to get a higher grade. Regardless of how you perform in the exam, you are guaranteed a grade of 3 or G.

The exam is a combined take-home and oral exam. You individually prepare a detailed analysis and benchmark of one aspect of one of the assignments on OpenMP, threads, OpenCL, or MPI. You also sign up for one time slot on Canvas.

Exam slots will be made available towards the middle of the course, after the Chalmers deadline for exam sign-up on 08 October 2023.

Oral Exam: Procedure

The oral exam will take place online. A time slot is 10 minutes long: 8 minutes are reserved for the exam and 2 minutes are available for you to settle in.

You are expected to give a five-minute presentation of your prepared benchmarks and conclusions (see below for details). Please prepare slides (usually no more than 4 or 5) to support your presentation. Slides should not show code for its own sake; distill its essence instead.

Following the presentation, I might ask further questions for up to three minutes. These questions may be connected to any material presented in the lectures and are not limited to the assignment that you decided to present on.

The presentation is graded according to a fixed grading scheme.

Exam: Content and preparation

In preparation for the oral exam, choose one assignment and one topic specific to that assignment. The table below lists topics that you may pick, but you are not limited to these.

Assignment	Topic / Aspect
openmp	efficient reading and parsing of the input file
	efficient computation of the distances
	efficient use of memory and cache
	SIMD instructions and/or intrinsics
threads	efficient evaluation of the formula for Newton iteration
	efficient writing to the files
	efficient assignment of computation to computation threads
	bottlenecks for a large number of threads or lines, or for high-degree polynomials
opencl	efficient data transfer between host and GPU
	impact of branch divergence
	reduce algorithms on the GPU
	efficiency balance between host and GPU computation
mpi	reduce algorithms in MPI
	efficient communication patterns

For your topic or aspect answer the following questions:

What is a naive approach to the topic? Implement and benchmark it. You may modify the code that you handed in.
Does your handed-in assignment go beyond a naive solution? If so, what does it do differently?
Benchmark the given aspect of your handed-in solution.
What theory presented in the course plays into the topic?
What approach does your understanding of the theory suggest could be fastest?
Implement at least one variant that goes beyond the naive approach. Benchmark it, too.
Provide interpretations for your benchmarks. Did your ideas work out? If not, what might be the reason?

TMA881/MMA620: Exam
