Wave Algorithms
A wave algorithm is a type of distributed algorithm used for propagating information within a distributed network of nodes.
A wave algorithm is characterized by the following features:
- There is an initiator process (or more) that triggers message sending within the network.
- Each process sends information to its neighbors, except to its parent, from whom it received the information to pass on.
- Each process receives information from its neighbors (except from its parent) and sends it to the parent.
- At the end, a decision is made based on the received information.
Leader Selection
Node clusters can be hierarchical or not. In clusters based on autonomous nodes, there is no well-established leader, but clear communication contracts between nodes. On the other hand, in clusters with leaders, one or more nodes perform elevated functions. Among the elevated functions is establishing and distributing the topology to other nodes.
The leader represents a node or a group of nodes chosen based on a heuristic. The heuristic can take into account the resources of the node, the number or nature of the processes running on it, or simply simple things like its identifier.
In this laboratory, we will explore leader selection based on the highest rank of a node. The leader's selection will be based on a heartbeat algorithm (the pulsation algorithm).
Leader Selection: Heartbeat
The operation of a heartbeat algorithm is similar to the beating of a heart:
- Systole → the "expansion" part. Information is transmitted to neighboring nodes.
- Diastole → the "contraction" part. Information is received back from neighboring nodes.
In the context of leader selection, the heartbeat algorithm will be applied as follows:
- Nodes will send their rank to neighbors.
- Each node will calculate its local leader based on the maximum between its own rank and the information received from neighbors.
- Neighbors will send their local leader back to the nodes from which they received messages.
- The nodes that sent the initial message will update their local leader value.
The algorithm is repeated a sufficient number of times to ensure the convergence of information.
Pseudocode:
leader = rank; // Initially, each node considers itself the leader
for (step = 0; step < convergence; step++) {
for (i = 0 to nr_neighbors) {
send leader to neighbor[i]
receive neighbor_leader from any neighbor
leader = max(neighbor_leader, leader);
}
}
An example of running this algorithm can be seen here.
Building the Covering Tree
After the leader is chosen, it will initiate the construction of the covering tree.
The purpose of the covering tree is for every node to know at any given time how to reach the leader.
The covering tree is most easily represented by a parent array. Building this tree is done through a wave - echo type of algorithm.
Building the Covering Tree: Wave - Echo
The wave algorithm has two parts:
- wave → all nodes, except the leader, wait to receive a message. The leader initiates the wave. Upon receiving the first probe, each node records its parent and sends a message to all neighbors, except the one from which it received (its parent).
- echo → wait to receive a message from all neighbors, except from the parent. The information received from neighbors is aggregated into a parent array, which is then sent to the parent.
In the end, the leader will have the final parent array, representing the covering tree. The covering tree is then distributed throughout the entire system.
Pseudocode:
if (rank != leader) {
// Receive a probe from any neighbor if not the leader
parent[rank] = neighbor_rank
}
// Forward the probe to all other neighbors
for (i = 0; i < num_neighbors; i++) {
// Do not send the probe back to the sender
if (neighbor[i] != parent[rank]) {
send probe to neighbor[i]
}
}
// Wait for an echo from each neighbor, except from the parent
for (i = 0; i < num_neighbors; i++) {
// Check if the current neighbor is not the parent of the node
if (neighbor[i] != parent[rank]) {
receive the array of parents (parentRecv[]) or the probe from neighbor[i]
if (received_probe) {
// Ignore probes from neighbors whose parent is not in the spanning tree
continue;
} else {
// Receive echo from one of my children
// Update local topology with new information from the echo
for (j = 0; j < num_processes; j++) {
// If node j has a parent
// (parent[j] == -1 means node j has no parent)
// update the array of parents (bottom to top, from leaves to root)
if (parentRecv[j] != -1) {
parent[j] = parentRecv[j];
}
}
}
}
}
// At this point, the current node has the topology formed by its child nodes
// up to the current level and needs to send it to its parent
if (rank != leader) {
// Each process that is not the leader sends the current topology
// to its parent, who will complete it
send array of parents (parents[]) to parent[rank]
}
/// ----------------------------------------------------------
// Here begins the (optional) stage of broadcasting the topology
// (covering tree) to all nodes.
// At this point, the root knows the covering tree.
/// ----------------------------------------------------------
if (rank != leader) {
// Each node (except the root, which already has all the necessary information)
// waits for the topology from its parent
receive array of parents (parents[]) from parent[rank]
}
// Forward the complete topology to the children of the current node
for (i = 0; i < num_processes; i++) {
if (parent[i] == rank) {
send array of parents (parents[]) to node i
}
}
An example of running this algorithm can be seen here.