hpcs-17-subord

git clone https://git.igankevich.com/hpcs-17-subord.git

commit e7932b0622313f64c15e623ae065e732b921f46d
parent 5f27624c5dcf93e3854ef227ef247beff062cb12
Author: Ivan Gankevich <igankevich@ya.ru>
Date:   Fri, 24 Mar 2017 19:49:55 +0300

Revise the first failure scenario.

Diffstat:
src/body.tex | 135+++++++++++++++++++++++++++++++++++++++++++++----------------------------------
1 file changed, 77 insertions(+), 58 deletions(-)

diff --git a/src/body.tex b/src/body.tex
@@ -103,77 +103,96 @@ one node to another (local kernels are not considered).
 \subsection{Failure scenarios}
 \label{sec:failure-scenarios}

-Now we discuss failure scenarios and how scheduler can handle it. First, define
-clearly relations between sets of daemons and kernels. We named such relations
-in different manner to avoid misunderstanding, because order rule itself is the
-same. There are two intersection hierarchies, the horizontal one~--- daemons
-hierarchy, and vertical one~--- kernels hierarchy. In horizontal
-daemon-to-daemon hierarchy relations defined as master-slave. Thus, node (and,
-accordingly, its daemon) with the nearest IP address to gateway will be a
-master, and every other node will be a slave. This master-slave hierarchy
-introduced to scheduler for better kernels distribution. Vertical hierarchy of
-kernels organized in principal-to-subordinate order. Principal kernel produce
-subordinates and so provides task atomization to archive fault tolerance.
-
-The main purpose of scheduler is to continue or restore execution while failures
-occur in daemons hierarchy. There are three types of such failures.
+To disambiguate hierarchical links between daemon processes and kernels, we
+will use the following naming conventions throughout the text. If the link is
+between two daemon processes, the relationship is \emph{master-slave}. If the
+link is between two kernels, then the relationship is
+\emph{principal-subordinate} (or \emph{parent-child}). The two hierarchies are
+orthogonal to each other in the sense that no kernel may have a link to a
+daemon, and vice versa. Since the daemon hierarchy is used to distribute the
+load on the cluster, the kernel hierarchy is mapped onto it, and this mapping
+can be arbitrary. It is common to have a principal kernel on a slave node with
+its subordinate kernels distributed evenly between all cluster nodes
+(including the node where the principal is located). Both hierarchies can be
+arbitrarily deep, but ``shallow'' ones are preferred for highly parallel
+programmes, as fewer hops are needed when kernels are distributed between
+cluster nodes. Since there is a one-to-one correspondence between daemons and
+cluster nodes, the two terms are used interchangeably in the paper.
+
+The main purpose of the system is to provide continuous execution of kernels
+in the presence of daemon (and consequently node) failures. There are three
+types of such failures.
 \begin{enumerate}
-  \item Failure of at most one node.
-  \item Failure of more than one node but less than total number of nodes.
-  \item Failure of all nodes (electricity outage).
+  \item Simultaneous failure of at most one node.
+  \item Simultaneous failure of more than one node but less than the total
+    number of nodes.
+  \item Simultaneous failure of all nodes (electricity outage).
 \end{enumerate}
+For the sake of simplicity, it is assumed that the parallel programme runs on
+all cluster nodes.

-By dividing kernels on principals and subordinate we create restore points. Each
-principal is, mainly, a control unit, with a goal. To archive it, principal make
-portion of task and delegates parts to subordinates. With such delegation
-principal copies itself to each subordinate in order of appearance. To ensure
-correct restoration, when the new partition is ready to deploy as new
-subordinate, principal include in that kernel information about all previously
-generated subordinates, expressed as ordered list of daemons address where
-subordinates transferred. So, then we discuss about failures, we mean that
-daemon is gone, and all kernels of all types at this node break their tasks
-execution process. To resolve failed states scheduler restore kernels using
-existing or newly appeared daemons accordingly to each mentioned scenarios.
-
-Consider the first scenario. In accordance to principal-to-subordinate
-hierarchy, there are two variants of this failure: then principal is gone and
-then any subordinate is gone. Subordinate itself is not a valuable part of
-execution, it is a simple worker. Our scheduler not stored any subordinate
-states, but only principle state. Thus, to restore execution, scheduler finds
-last valid principle state and simply recreate~\ref{fig:sc12} failed
-subordinate on most appropriate daemon. When principle is gone~\ref{fig:sc1} we
-need to restore it only once and only on one node. To archive this limitation,
-each subordinate will try to find any available daemon from its addresses list
-in reverse order. If such daemon exists and available, finding process will
-stop, as current subordinate kernel will assume that the kernel found will take
-principal restoration process.
+By dividing kernels into principals and subordinates we create recovery
+points. Each principal is essentially a control unit with a goal to achieve:
+it divides the task into parts and creates a subordinate kernel to compute
+each of them. The principal copies itself to each subordinate in the order of
+their creation, and includes in each subordinate a list of all node IP
+addresses to which previously created subordinates were sent (a list of
+\emph{neighbours}). When a daemon fails or goes offline, all kernels which
+reside on the corresponding cluster node are considered failed, and the
+recovery process is triggered in accordance with the following scenarios.
+
+\paragraph*{Scenario~1} With respect to the kernel hierarchy, there are two
+possible variants of this failure: when a principal fails and when a
+subordinate fails; in both cases the two kernels may or may not reside on the
+same cluster node.
+
+Since a subordinate is a simple worker, rather than a valuable part of the
+execution, a copy of it (in its initial state) is simply restored from the
+node where its parent is located (Figure~\ref{fig:subordinate-fails}). Each
+daemon maintains a list of kernels that were sent to a particular subordinate
+node. When the corresponding network connection closes, all kernels from the
+list are automatically re-sent to an available node, or executed locally if
+no network connections are left.
+
+When a principal fails, every subordinate has a copy of it, but the principal
+must be restored only once and on only one node to correctly continue
+programme execution. To ensure this, each subordinate tries to find the first
+surviving node in its list of neighbours
+(Figure~\ref{fig:principal-fails}). If such a node is online, the search
+stops and the subordinate kernel is deleted. If no surviving neighbour is
+found, the subordinate restores the principal from its copy and deletes
+itself. Kernel deletion is necessary, because the whole computational step,
+modelled by the principal, is re-executed from the principal's initial state,
+and there is no simple and reliable way of taking into account the results
+which were produced so far by the subordinates.
 \begin{figure}
     \centering
-    \includegraphics{sc1}
-    \caption{First scenario of restoration after principle fails.}
-    \label{fig:sc1}
+    \includegraphics{sc12}
+    \caption{Restoration after only one subordinate fails.}
+    \label{fig:subordinate-fails}
 \end{figure}

 \begin{figure}
     \centering
-    \includegraphics{sc12}
-    \caption{Restoration after only one subordinate fails.}
-    \label{fig:sc12}
+    \includegraphics{sc1}
+    \caption{Restoration after the principal fails (first scenario).}
+    \label{fig:principal-fails}
 \end{figure}

-In comparison with first scenario, the second one is more complicate yet
-frequent. While on principal-to-subordinate layer scheduler act
-same~\ref{fig:sc12}, then we move to daemons layer one more variant added. In
-kernel hierarchy principal kernels mostly a dual kernel. For a higher level
-kernels it seems like a subordinate, for rest lower kernels it is a principal.
-Thus, we need to add to our restoration scope only a state of principals
-principle. As a result, we add to variants from first scenario situation,a one
-where principals principal also is gone. Since scheduler through daemons knew
-all kernels state before it begin a restoration process, first it will check
-state of principals principle. If it's gone~\ref{fig:sc12}, all subordinates
-will be started accordingly to hierarchy once again, despite their states.
+\paragraph*{Scenario~2} In comparison to the first scenario, the second one
+is more complicated, yet more frequent. On the principal-subordinate level
+recovery works in the same way as in the first scenario
+(Figure~\ref{fig:subordinate-fails}), but on the daemon level one more
+variant appears. In the kernel hierarchy most principal kernels play a dual
+role: for a higher-level kernel they act as a subordinate, while for the
+lower-level kernels they act as a principal. Therefore, the state of the
+principal's own principal must also be included in the recovery scope. As a
+result, one more situation is added to the variants of the first scenario:
+the one where the principal's principal is also gone. Since the daemons know
+the state of all kernels before the recovery process begins, the state of the
+principal's principal is checked first. If it is also gone
+(Figure~\ref{fig:subordinate-fails}), all subordinates are restarted
+according to the hierarchy once again, regardless of their states.

 \begin{figure}
     \centering
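
The mapping of the kernel hierarchy onto the daemon hierarchy described in the revised text can be illustrated with a short sketch. The C++ below is editorial and not taken from the repository; the types Node and Kernel and the round-robin placement are assumptions used only to show how a principal on a slave node might have its subordinates spread evenly over all cluster nodes, including its own.

    // Sketch: even distribution of subordinate kernels over cluster nodes.
    // All names here are hypothetical; they only mirror the terminology above.
    #include <cstddef>
    #include <iostream>
    #include <string>
    #include <vector>

    struct Node { std::string ip; };   // one daemon runs per node

    struct Kernel {
        int id;
        int parent;                    // principal-subordinate (parent-child) link
        std::size_t node;              // index of the node the kernel resides on
    };

    int main() {
        // Daemon (master-slave) hierarchy: the node with the lowest address,
        // 10.0.0.1 here, would act as the master.
        std::vector<Node> nodes{{"10.0.0.1"}, {"10.0.0.2"}, {"10.0.0.3"}};
        Kernel principal{0, -1, 1};    // principal kernel placed on a slave node
        std::vector<Kernel> subordinates;
        for (int i = 1; i <= 6; ++i) {
            // Round-robin placement over all nodes, including the principal's own.
            std::size_t target = (principal.node + i) % nodes.size();
            subordinates.push_back({i, principal.id, target});
        }
        for (const auto& k : subordinates)
            std::cout << "kernel " << k.id << " -> " << nodes[k.node].ip << '\n';
    }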
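
The bookkeeping that restores failed subordinates (each daemon remembers which kernels were sent over which connection and re-sends them when that connection closes, or runs them locally when no connections are left) might look roughly as follows. This is a toy model with assumed names (Daemon, on_connection_closed); it only prints the decision that would be taken.

    // Sketch of daemon-side bookkeeping for Scenario 1 (failed subordinate).
    // Hypothetical types; the real scheduler is not quoted here.
    #include <iostream>
    #include <map>
    #include <string>
    #include <vector>

    struct Kernel { int id; };

    class Daemon {
        // kernels in flight, grouped by the node they were sent to
        std::map<std::string, std::vector<Kernel>> in_flight_;
    public:
        void send(const Kernel& k, const std::string& node) {
            in_flight_[node].push_back(k);
            std::cout << "kernel " << k.id << " sent to " << node << '\n';
        }
        // Called when the connection to `node` closes: print, for every kernel
        // that was sent over it, whether it would be re-sent to an available
        // node or executed locally because no connection is left.
        void on_connection_closed(const std::string& node,
                                  const std::vector<std::string>& available) {
            auto it = in_flight_.find(node);
            if (it == in_flight_.end()) return;
            for (const auto& k : it->second) {
                if (available.empty())
                    std::cout << "kernel " << k.id << " executed locally\n";
                else
                    std::cout << "kernel " << k.id << " re-sent to "
                              << available.front() << '\n';
            }
            in_flight_.erase(it);
        }
    };

    int main() {
        Daemon d;
        d.send({1}, "10.0.0.3");
        d.send({2}, "10.0.0.3");
        d.on_connection_closed("10.0.0.3", {"10.0.0.2"}); // one node still reachable
    }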
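
The rule that guarantees the principal is restored exactly once (every surviving subordinate scans its list of neighbours and restores the principal only if none of them survived) can be written down directly. Again a sketch under assumed names, not the scheduler's actual code.

    // Sketch of the Scenario-1 decision made by each surviving subordinate
    // after the principal's node fails.  Illustrative names only.
    #include <iostream>
    #include <set>
    #include <string>
    #include <vector>

    struct Subordinate {
        std::string node;                     // node this subordinate resides on
        std::vector<std::string> neighbours;  // nodes of earlier-created subordinates
    };

    // True if this subordinate must restore the principal from its local copy.
    // The earliest surviving subordinate finds none of its neighbours alive,
    // so exactly one subordinate performs the restoration; the others delete
    // themselves.
    bool must_restore(const Subordinate& s, const std::set<std::string>& alive) {
        for (const auto& ip : s.neighbours)
            if (alive.count(ip))
                return false;   // an earlier subordinate survived: delete self
        return true;            // no earlier subordinate left: restore principal
    }

    int main() {
        // Principal was on 10.0.0.1, which failed; two subordinates survived.
        std::set<std::string> alive{"10.0.0.2", "10.0.0.4"};
        std::vector<Subordinate> subs{
            {"10.0.0.2", {}},            // created first, no neighbours recorded
            {"10.0.0.4", {"10.0.0.2"}},  // created second
        };
        for (const auto& s : subs)
            std::cout << s.node
                      << (must_restore(s, alive) ? ": restore principal\n"
                                                 : ": delete self\n");
    }

Running the example prints that 10.0.0.2 restores the principal while 10.0.0.4 deletes itself, which matches the requirement that the principal is restored only once and on only one node.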
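
Scenario 2 adds one more check on top of these rules: because a principal is usually itself a subordinate of a higher-level principal, the state of that higher-level kernel is examined first, and if it is gone as well the whole step is restarted. One possible, heavily simplified reading of that ordering:

    // Sketch of the additional Scenario-2 check.  The helper below is an
    // assumption about the order of checks, not code from the repository.
    #include <iostream>
    #include <set>
    #include <string>

    enum class Recovery { restart_all_subordinates, scenario_1_rules };

    // `grandparent_node` hosts the principal of the failed principal.
    Recovery choose_recovery(const std::string& grandparent_node,
                             const std::set<std::string>& alive) {
        // The state of the principal's principal is checked first: if it is
        // gone too, every subordinate is restarted from its initial state,
        // regardless of how far it has progressed.
        if (alive.count(grandparent_node) == 0)
            return Recovery::restart_all_subordinates;
        // Otherwise the first-scenario rules apply unchanged.
        return Recovery::scenario_1_rules;
    }

    int main() {
        std::set<std::string> alive{"10.0.0.4"};
        bool restart = choose_recovery("10.0.0.1", alive)
                       == Recovery::restart_all_subordinates;
        std::cout << (restart ? "restart all subordinates\n"
                              : "apply scenario-1 rules\n");
    }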