Exercises
- Compile and run the code from the example.c file in the lab skeleton, which you can find in this GitHub repository; clone it with the git clone command. Change the number of threads and observe how the program's behavior changes.
- Set the number of threads in the code to the number of cores on the machine you are running on, so that the thread count adjusts automatically when you run the code on a different computer. Check tip 1 below for additional information.
- Modify the function f so that each thread prints the "Hello World" message 100 times in a loop, along with the iteration index and the thread ID. Question: Are the messages displayed in the order you expect? Run the program multiple times. What do you observe about the order of the messages printed by the same thread? What about those printed by different threads?
- Modify the program to create two threads, each running its own function.
- Starting from the code in the add_serial.c file in the lab archive, parallelize the incrementing of elements in a vector by 100. This will involve dividing the addition iterations among all threads as evenly as possible. Check tip 2 below for additional information.
- Demonstrate that your program scales (i.e., it takes less time when run with more threads). Check tip 3 and tip 4 below for additional information.
- Measure the execution time of only the parallelized portion of the program, using a method that times a specific region of code. How does the speedup calculated from these times compare to the speedup calculated from the times obtained in the previous exercises? Check tip 5 below for additional information.
tip
1. To obtain the number of cores on a computer, you can use the sysconf function as follows:
#include <unistd.h>
long cores = sysconf(_SC_NPROCESSORS_CONF);
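A minimal sketch of how the value returned by sysconf could be turned into a thread count. The helper name thread_count is hypothetical and not part of the lab skeleton; the fallback to one thread is an assumption made here to handle the case where sysconf reports an error.

```c
#include <unistd.h>

/* Hypothetical helper (not from the lab skeleton): returns the number of
   threads to start, one per configured processor. */
long thread_count(void)
{
    long cores = sysconf(_SC_NPROCESSORS_CONF);

    /* sysconf returns -1 on error; fall back to a single thread
       (this fallback is an assumption of the sketch). */
    return cores > 0 ? cores : 1;
}
```

The result can then replace a hard-coded thread count when allocating the array of pthread_t values and in the loops that create and join the threads. On Linux with glibc, _SC_NPROCESSORS_ONLN can be used instead if only the processors that are currently online should be counted.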
tip
2. For exercise 5, we have a vector of N elements that we want to divide approximately equally among P threads, where each thread has an ID from 0 to P-1. Each thread iterates over its own section of the vector, so it does not interfere with the work of the other threads. Therefore, each thread needs a start index and an end index. One way to calculate these two values is the following:
int start = ID * (double)N / P;
int end = min((ID + 1) * (double)N / P, N);
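The formulas above can be sketched as a small set of functions, assuming an int vector and an increment of 100. Since the contents of add_serial.c are not shown here, all names (struct chunk, chunk_start, chunk_end, add_worker, parallel_add) are hypothetical. Standard C does not provide min, so the sketch defines its own min_int. The end index is treated as exclusive.

```c
#include <pthread.h>
#include <stdlib.h>

#define INCREMENT 100

/* Work description for one thread (illustrative names). */
struct chunk {
    int *v;     /* shared vector */
    int start;  /* first index owned by this thread (inclusive) */
    int end;    /* one past the last index owned by this thread */
};

static int min_int(int a, int b) { return a < b ? a : b; }

/* Start index of thread `id` out of `p` threads over `n` elements. */
int chunk_start(int id, int n, int p)
{
    return (int)(id * (double)n / p);
}

/* End index (exclusive) of thread `id`, clamped to n. */
int chunk_end(int id, int n, int p)
{
    return min_int((int)((id + 1) * (double)n / p), n);
}

/* Thread function: touches only the indices in [start, end). */
static void *add_worker(void *arg)
{
    struct chunk *c = arg;
    for (int i = c->start; i < c->end; i++)
        c->v[i] += INCREMENT;
    return NULL;
}

/* Adds INCREMENT to every element of v using p threads.
   Returns 0 on success, -1 on allocation or thread-creation failure. */
int parallel_add(int *v, int n, int p)
{
    pthread_t *threads = malloc(p * sizeof *threads);
    struct chunk *chunks = malloc(p * sizeof *chunks);
    if (!threads || !chunks) {
        free(threads);
        free(chunks);
        return -1;
    }

    int started = 0;
    for (int id = 0; id < p; id++) {
        chunks[id].v = v;
        chunks[id].start = chunk_start(id, n, p);
        chunks[id].end = chunk_end(id, n, p);
        if (pthread_create(&threads[id], NULL, add_worker, &chunks[id]) != 0)
            break;
        started++;
    }

    /* Wait only for the threads that were actually created. */
    for (int id = 0; id < started; id++)
        pthread_join(threads[id], NULL);

    free(threads);
    free(chunks);
    return started == p ? 0 : -1;
}
```

For N = 10 and P = 3, these functions assign the ranges [0, 3), [3, 6) and [6, 10), so every element is covered exactly once and the chunk sizes differ by at most one. Each thread receives its own struct chunk, which avoids sharing a single argument that changes while the threads are being created.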
tip
3. To better observe the scalability of a program, it should run for at least a few seconds. Otherwise, the initialization time, other programs running on the computer, and the overhead of thread scheduling can affect execution times enough that scalability is not visible when measuring only the total execution time. Furthermore, the serial initialization of the vector (in the main function) takes a comparable amount of time to the execution of the parallelized operation on one thread. Therefore, for exercise 5, it is recommended to increase the execution time of a thread by repeating the operations performed in the thread function several times. To verify that a program scales, you need to measure both its sequential execution time (with a single thread) and its parallel execution time (with multiple threads). For this purpose, you can use the time command in the command line, like this:
$ time ./program
real 0m6.958s
user 0m6.745s
sys 0m0.010s
tip
4. To check if a program scales, you need to:
   - choose a problem size (N) for which the sequential execution time is large enough that variations do not significantly impact the result (in this case, select N so that the execution time is at least a few seconds)
   - measure the execution time of the serial (non-parallelized) program
   - measure the execution times for a variable number of threads (2, 3, ..., as many as you have processors)
   - calculate the speedup for each configuration.
The measured execution times may vary (for the same values of N and P) from one run to another. In this case, it is recommended to perform multiple runs and use the average of the measured values (or other relevant statistical indicators).
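The last two steps can be sketched in code, assuming the usual definition of speedup as the serial execution time divided by the parallel execution time. The function names mean_time and speedup are hypothetical, and the times are expected in seconds, as reported by your measurements.

```c
#include <stddef.h>

/* Arithmetic mean of n measured times (in seconds).
   Returns 0.0 when n is 0 (a choice made for this sketch). */
double mean_time(const double *times, size_t n)
{
    if (n == 0)
        return 0.0;

    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += times[i];
    return sum / (double)n;
}

/* Speedup of a parallel run relative to the serial run:
   t_serial / t_parallel. Values above 1 mean the parallel run is faster. */
double speedup(double t_serial, double t_parallel)
{
    return t_serial / t_parallel;
}
```

A typical use is to average several runs for the serial program and for each thread count P, then call speedup with the two averages. A program scales well if the speedup keeps growing as P increases, ideally staying close to P.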
tip
5. The following is one way to measure the elapsed time between two points of a program:
#include <time.h>
struct timespec start, finish;
double elapsed;
clock_gettime(CLOCK_MONOTONIC, &start);
WORK();
clock_gettime(CLOCK_MONOTONIC, &finish);
elapsed = (finish.tv_sec - start.tv_sec);
elapsed += (finish.tv_nsec - start.tv_nsec) / 1000000000.0;
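The same measurement can be wrapped in small helpers so that it is easy to time only the parallel region (from the first pthread_create to the last pthread_join). This is a sketch; the names elapsed_seconds and now_seconds are hypothetical. The _POSIX_C_SOURCE definition is there to make sure clock_gettime is declared when compiling in a strict standard mode.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Seconds elapsed between two timestamps taken with the same clock. */
double elapsed_seconds(struct timespec start, struct timespec finish)
{
    return (double)(finish.tv_sec - start.tv_sec)
         + (finish.tv_nsec - start.tv_nsec) / 1000000000.0;
}

/* Current CLOCK_MONOTONIC reading in seconds, or -1.0 if the clock
   cannot be read. Only differences between two readings are meaningful. */
double now_seconds(void)
{
    struct timespec t;

    if (clock_gettime(CLOCK_MONOTONIC, &t) != 0)
        return -1.0;
    return (double)t.tv_sec + t.tv_nsec / 1000000000.0;
}
```

To time the parallel part, read now_seconds() right before creating the threads and again right after joining them, then subtract the two values. CLOCK_MONOTONIC is not affected by changes to the system clock, which makes it suitable for measuring intervals.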
tip
6. A good way to debug a multi-threaded C program is to use gdb. In addition to the gdb commands you already know, you should also know the info threads command (which displays information about the threads that exist at the current time) and the thread <N> command (which switches the execution context to thread N).