git clone https://git.igankevich.com/hpcs-17-subord.git

commit 3b692627a5155b9deb2e7173d892bbc781ea6aa8
parent 6f1c630d1205273e506ecc2ca81c3d809f1c2fea
Author: Ivan Gankevich <igankevich@ya.ru>
Date:   Mon, 15 May 2017 13:24:45 +0300

Add paragraph on load balancing.

src/body.tex | 37+++++++++++++++++++++++++++++++++++--
1 file changed, 35 insertions(+), 2 deletions(-)

diff --git a/src/body.tex b/src/body.tex
@@ -16,7 +16,7 @@
 is derived from node's IP address. Detailed explanation of the algorithm is
 provided in~\cite{gankevich2015subordination}. Its strengths are scalability
 to a large number of nodes and low overhead, which are essential for
 large-scale high-performance computations, and its weakness is in artificial
 dependence of
-node's position in the hierarchy on its IP address, which is not desirable in
+node's position in the hierarchy on its IP address, which may not be desirable in
 virtual environments, where nodes' IP addresses may change without a notice.
@@ -28,7 +28,40 @@
 daemons exchange messages telling each other how many daemons are ``behind''
 them in the hierarchy. This information is used to distribute the load evenly,
 even if a parallel programme is launched on a slave node. In addition, this
 topology reduces the number of simultaneous connections, thus preventing
-network overload.
+network overload.
+
+Load balancing is implemented as follows. When daemon $A$ tries to become a
+subordinate of daemon $B$, it sends a message to the corresponding IP address
+telling how many slave daemons are already connected to it (including itself);
+if it has no slaves, it counts only itself. After all links between daemons in
+the cluster are established, every daemon has enough information to tell how
+many nodes exist behind each link. If the link is between a slave and a
+master, and the slave wants to know how many nodes are behind the master, it
+simply subtracts the total number of nodes behind its slave nodes from the
+total number of nodes behind the master to get the correct count. To
+distribute kernels across nodes we use a simple round-robin algorithm,
+i.e.~we iterate over all links of the current daemon (including the link to
+its master), taking into account how many nodes are behind each link: the
+pointer advances to the next link only when a sufficient number of kernels has
+been sent through the current link. That way, even if an application is
+launched on a slave node at the bottom of the hierarchy, the kernels will be
+distributed evenly across all cluster nodes.
+
+The advantage of this approach is that it can be extended to incorporate
+sophisticated logic into the load distribution policy. Any metric that is
+required to implement such a policy can be sent from the directly linked
+daemon during link initialisation. As of now, only the number of nodes behind
+the link is sent. The disadvantage of the approach is that the metric is
+updated only when a change in the hierarchy occurs: if a metric changes
+periodically, then periodic update messages are also required to implement the
+policy, and too frequent updates may consume a considerable amount of network
+bandwidth. The other disadvantage is that when the hierarchy is reconfigured
+due to a node failure or a new node joining the cluster, the kernels that are
+already executing on the nodes are not taken into account in the load
+distribution, so frequent changes to the hierarchy may cause uneven load
+distribution, which, however, balances over time. Uneven load distribution
+does not cause node overload, since there is a kernel pool on each node that
+queues the kernels prior to execution.
 
 \paragraph{Kernel layer} Consists of kernels and hierarchical (parent/child)
 logical links between them. The only purpose of kernel hierarchy is to provide
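The node-counting scheme added by this commit can be sketched as follows. This is a hypothetical illustration in Python, not code from the repository (the actual framework is written in C++); class and method names are invented for the example. Each daemon reports its subtree size when it links to a master, and the count behind the master link is obtained by subtracting the daemon's own subtree from the cluster total.

```python
# Hypothetical sketch of the node-counting scheme: when a daemon links to a
# master it reports how many nodes are in its subtree (including itself);
# the count behind the master link is the cluster total minus own subtree.

class Daemon:
    def __init__(self, name):
        self.name = name
        self.master = None
        self.slaves = []   # directly connected subordinate daemons
        self.behind = {}   # link (daemon) -> number of nodes behind that link

    def subtree_size(self):
        """Number of nodes in this daemon's subtree, counting itself."""
        return 1 + sum(s.subtree_size() for s in self.slaves)

    def link_to(self, master):
        """Become a subordinate of `master`, reporting our subtree size."""
        self.master = master
        master.slaves.append(self)
        master.behind[self] = self.subtree_size()

    def update_master_count(self, cluster_total):
        """Nodes behind the master link = everything except our own subtree."""
        if self.master is not None:
            self.behind[self.master] = cluster_total - self.subtree_size()

# Build a small hierarchy: root <- a <- b, root <- c.
root, a, b, c = Daemon("root"), Daemon("a"), Daemon("b"), Daemon("c")
b.link_to(a)       # b reports 1 node to a
a.link_to(root)    # a reports 2 nodes (a and b) to root
c.link_to(root)    # c reports 1 node to root

total = root.subtree_size()   # 4 nodes in the whole cluster
a.update_master_count(total)
print(root.behind[a])         # 2 (a and b are behind root's link to a)
print(a.behind[root])         # 2 (root and c are behind a's master link)
```

In the real system the subtree sizes travel as messages over the links rather than being computed by recursion on shared objects, but the arithmetic is the same.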
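The weighted round-robin distribution described in the added paragraph can likewise be sketched as follows (again a hypothetical Python illustration, not the repository's code): the pointer advances to the next link only after a number of kernels proportional to the nodes behind the current link has been sent through it.

```python
# Hypothetical sketch of the round-robin kernel distribution: each link is
# weighted by the number of nodes behind it, and the pointer advances only
# when that many kernels have gone through the current link.

def distribute(kernels, links):
    """links: list of (name, nodes_behind) pairs; returns kernel -> link name."""
    assignment = {}
    i = 0      # pointer into the list of links
    sent = 0   # kernels sent through the current link so far
    for k in kernels:
        name, weight = links[i]
        assignment[k] = name
        sent += 1
        if sent >= weight:              # enough kernels went through this link
            i = (i + 1) % len(links)    # advance to the next link
            sent = 0
    return assignment

# A daemon with a master link (2 nodes behind it) and one slave link (1 node).
links = [("master", 2), ("slave", 1)]
result = distribute(range(6), links)
print([result[k] for k in range(6)])
# ['master', 'master', 'slave', 'master', 'master', 'slave']
```

The 2:1 split in the output matches the 2:1 ratio of nodes behind the two links, which is how a programme launched on a leaf node still spreads its kernels over the whole cluster.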