Exercises

Compile the mutex.c file from the lab skeleton and run the resulting program multiple times (you can use the test_mutex.sh script). You will notice that the result is not always as expected. Solve the synchronization problem using a mutex.
Starting from the barrier.c file in the lab skeleton, use a barrier to ensure that the output will always be "1\n2". Check tip 1 below for additional information.
Starting from the multiply_outer.c file in the lab skeleton, parallelize the program by dividing the outer iteration into multiple threads. Verify the correctness and scalability of the resulting program. Check tip 2 below for additional information.
Starting from the multiply_middle.c file in the lab skeleton, parallelize only the second iteration loop. Verify the correctness and scalability of the resulting program.
Starting from the multiply_inner.c file in the lab skeleton, parallelize only the inner loop. Verify the correctness and scalability of the resulting program. Check tip 3 below for additional information.
Starting from the strassen.c file in the lab skeleton, parallelize matrix multiplication using the Strassen algorithm with 7 threads (preferably in a separate file named strassen_par.c to test correctness using the test_strassen.sh script, which compares the serial version with the parallel one). Check tip 4 below for additional information.

tip

On MacOS systems, the Pthreads library does not contain an implementation for barriers. To perform this exercise, you will find a file named pthread_barrier_mac.h in the lab skeleton that you should include in your source file.

tip

To test the correctness of parallelization in exercises 3, 4, and 5, you will find a script named test_multiply.sh in the lab archive. It performs the following steps:

Checks for the existence of a binary named multiply_seq for the sequential matrix multiplication implementation, for which you have the source file multiply_seq.c in the lab archive (this will serve as a benchmark for the correctness of the parallel implementation).
Checks for the existence of binaries for your parallel implementations (multiply_outer for exercise 3, multiply_middle for exercise 4, multiply_inner for exercise 5).
Runs the sequential program.
Runs the three parallel programs.
Compares the results of the parallel runs with the sequential run using diff. If there are no differences, the script does not display anything. If the parallel implementation is incorrect, it will display differences between the sequential and parallel runs.

By default, the script runs on 1000×1000 matrices with two threads for the parallel implementation. If you want to change these values (and we recommend doing so for comprehensive testing), you can modify the values of the variables N and P in the script.

tip

You will notice in this exercise that if you parallelize the inner loop as you did in the previous two exercises, the results may not always be correct. Why is this? What do you need to do for a correct implementation?

tip

The Strassen algorithm is an algorithm for matrix multiplication that is faster than the standard method, with a complexity of O(N^2.8074). In this algorithm, at the first step, 7 additional matrices are defined by multiplying block matrices obtained from the initial matrices. In the second step, these 7 new matrices are used to calculate the block components of the final matrix (through addition and subtraction operations).

For exercise 6, you already have implemented the calculation of additional matrices and the final calculation, so you only need to parallelize these operations.