commit 3b692627a5155b9deb2e7173d892bbc781ea6aa8
parent 6f1c630d1205273e506ecc2ca81c3d809f1c2fea
Author: Ivan Gankevich <igankevich@ya.ru>
Date: Mon, 15 May 2017 13:24:45 +0300
Add paragraph on load balancing.
Diffstat:
1 file changed, 35 insertions(+), 2 deletions(-)
diff --git a/src/body.tex b/src/body.tex
@@ -16,7 +16,7 @@ is derived from node's IP address. Detailed explanation of the algorithm is
provided in~\cite{gankevich2015subordination}. Its strengths are scalability to
a large number of nodes and low overhead, which are essential for large-scale
high-performance computations, and its weakness is in artificial dependence of
-node's position in the hierarchy on its IP address, which is not desirable in
+node's position in the hierarchy on its IP address, which may not desirable in
virtual environments, where nodes' IP addresses may change without a notice.
The only purpose of daemon hierarchy is to provide load balancing and
@@ -28,7 +28,40 @@ daemons exchange messages telling each other how many daemons are ``behind''
them in the hierarchy. This information is used to distribute the load evenly,
even if a parallel programme is launched on a slave node. In addition, this
topology reduces the number of simultaneous connections, thus preventing
-network overload.
+network overload.
+
+Load balancing is implemented as follows. When daemon $A$ tries to become a
+subordinate of daemon $B$, it sends a message to a corresponding IP address
+telling how many slave daemons are already connected to it (including itself).
+If there are no slaves, then it counts itself only. After all links between
+daemons in the cluster are established, every daemon has enough information to
+tell, how many nodes exist behind each link. If the link is between a slave and
+a master, and the slave wants to know, how many nodes are behind the master,
+then it simply subtracts the total number of nodes behind its slave nodes from
+the total number of nodes behind the master to get the correct count. To
+distribute kernels across nodes we use simple round-robin algorithm,
+i.e.~iterate over all links of the current daemon (including the link to its
+master) taking into account how many nodes are behind each link: the pointer
+advances to a next link, only when enough number of kernels are sent through
+the current link. That way even if an application is launched on a slave node
+in the bottom of the hierarchy, the kernels will be distributed evenly across
+all cluster nodes.
+
+The advantage of this approach is that it can be extended to include
+sophisticated logic into load distribution policy. Any metric, that is required
+to implement such policy, can be sent from the directly linked daemon during
+the link initialisation. As of now, only the number of nodes behind the link is
+sent. The disadvantage of the approach is that update of the metric happens
+only when a change in the hierarchy occurs: if a metric changes periodically,
+then periodically sending update messages is also required for implementing the
+policy, and too frequent updates may consume considerable amount of network
+bandwidth. The other disadvantage is that when reconfiguration of the hierarchy
+occurs due to a node failure or a new node joining the cluster, the kernels
+that are already executed on the nodes are not taken into account in the load
+distribution, so frequent updates to the hierarchy may cause uneven load
+distribution, which, however, balances over time. Uneven load distribution does
+not cause node overload, since there is a kernel pool on each node that queues
+the kernels prior to execution.
\paragraph{Kernel layer} Consists of kernels and hierarchical (parent/child)
logical links between them. The only purpose of kernel hierarchy is to provide