High Performance Computing by Martin Raum

In this assignment, we count distances between points in 3-dimensional space. One possibility to think of this is as cells, whose coordinates were obtained by diagnostics and which require further processing.

Distances are symmetric, i.e. distance between two cells c_1 and c_2 and the distance in reverse direction c_2 to c_1 should be counted once.

Implementation

Implement a program in C, called distances, that uses OpenMP for parallelization and that:

Reads coordinates of cells from a text file "cells" in the current working directory.
Computes the distances between cells counting how often each distance occurs, approximating, i.e. rounding, truncating, or similar, to 2 decimal places.
Outputs to stdout a sorted list of distances with associated frequencies (including 0 frequencies at your will).

Your program should accept command line arguments

./distances -t5

The argument to -t give the number of OpenMP threads that you may use. You may only use OpenMP parallelism.

Input

Coordinates are given in a file, in which each row corresponds to one cell. The three coordinates are separated by exactly one blank space. Each coordinate is given exactly as follows: A sign '+' or '-', two digits, a decimal point '.', and three digits. For instance, the following is a valid input of 10 cells.

+01.330 -09.035 +03.489
-03.718 +02.517 -05.995
+09.568 -03.464 +02.645
-09.620 +09.279 +08.828
+07.630 -02.290 +00.679
+04.113 -03.399 +05.299
-00.994 +07.313 -06.523
+03.376 -03.614 -06.657
+01.304 +09.381 -01.559
-04.238 -07.514 +08.942

Output

A valid output to the above input is

Implementation details: Simplifications

You can assume that coordinates are between -10 and 10.

Implementation details: Memory consumption

The program at no time may consume more than 5 MiB = 5 * 1024^2 bytes of (heap) memory. This will tested via Linux control groups.
You may not make any assumption on the number of cells except that there is less than 2^32.

Implementation details: Precision

Rounding and truncation errors are tolerated to a certain extend.

In case of doubt use the test script from below.

Implementation strategy

The following is an outline of aspects that play into designing an efficient implementation.

Splitting the assignment into smaller tasks

This assignment comprises four interdependent task, i.e. design choices influence each other.

Computing the distance of two points and incrementing the corresponding counts.
Reading and parsing the file.
Memory management.
Parallelization.

Computing the distance of two points

Here is a list of observations and questions that can help you designing a good solution to this subtask.

Given the fixed point format for input and output, floating point data types are not necessarily the optimal one. What other data types would be alternatives?
When using another data type to store input and output, it is necessary to convert to float to compute square roots. How large is the incurred performance loss?
The choice of output format impacts the way how distance counts can be stored. What is the most efficient way of recording distance counts that you can think of?

Most relevant videos are Algbraic Expressions, Operand Type, perf.

Parsing the file

When parsing the file or parts of it, it is a huge advantage to benefit from the fixed input format. The following observations and questions can guide your implementation.

How are strings encoded in C? In doubt rewatch the video on string encoding.
Is there a connection between the value of a digit (as a character) and the numerical value that is represents? How can that be used to parse efficiently a string of the form "+15.023"?

Most relevant videos are Binary vs. Text Files, String Encoding and Handling, Memory and File Access.

Memory management

There are three aspects of memory management that you have to balance. Recall that block iteration was a technique you saw in the videos.

File access is relatively time consuming. In particular, latency is relatively high. So you want to avoid reading from file frequently.
A large number of cells would not fit into 5 MiB of memory, so you have to reload at times. The functions fseek and ftell allow you to navigate inside of a file.
Balancing these first two aspects yields a read size that is bad for memory locality when iterating through pairs of cells. So you also have to employ an additional block iteration to improve locality in inner for-loops.

Most relevant videos are Locality, Locality and Block Access.

Parallization

Parallelization can enter at several points of the program. Parsing and counting distances are the two most beneficial ones.

Is there a synchronization issue with parsing?
Can you avoid explicit synchronization when counting distances in parallel?

Most relevant videos are Parallel For, Sections, and Single, Advanced Reduction.

Assignment 2: OpenMP