hpcs-17-subord

git clone https://git.igankevich.com/hpcs-17-subord.git

commit c96ea7fb3f162181a8584f250d4f01dd4b15f5b5
parent bfd5b5e81514bb8357f0071868845c2fcd0ed714
Author: Ivan Gankevich <igankevich@ya.ru>
Date:   Mon, 15 May 2017 16:47:33 +0300

Rewrite scenario 1. Comment out all figures.

Diffstat:
src/body.tex | 160++++++++++++++++++++++++++++++++++++++++++-------------------------------------
1 file changed, 85 insertions(+), 75 deletions(-)

diff --git a/src/body.tex b/src/body.tex
@@ -128,6 +128,22 @@ wait, and call correct kernel methods by analysing their internal state.
 
 \section{Resilience to multiple node failures}
 
+To disambiguate hierarchical links between daemon processes and kernels and to
+simplify the discussion, we will use the following naming conventions
+throughout the text. If the link is between two daemon processes, the
+relationship is \emph{master-slave}. If the link is between two kernels, then
+the relationship is \emph{principal-subordinate} (or \emph{parent-child}). Two
+hierarchies are orthogonal to each other in a sense that no kernel may have a
+link to a daemon, and vice versa. Since daemon hierarchy is used to distribute
+the load on the cluster, kernel hierarchy is mapped onto it, and this mapping
+can be arbitrary. It is common to have principal kernel on a slave node with
+its subordinate kernels distributed evenly between all cluster nodes (including
+the node where the principal is located). Both hierarchies can be arbitrarily
+deep, but ``shallow'' ones are preferred for highly parallel programmes, as
+there are less number of hops when kernels are distributed between cluster
+nodes. Since there is one-to-one correspondence between daemons and cluster
+nodes, they are used interchangeably in the paper.
+
 In our system a node is considered failed if the corresponding network
 connection is abruptly closed. Developing more sophisticated failure detection
 techniques is out of scope of this paper. For the purpose of studying recovery
@@ -145,33 +161,29 @@ and a copy of the parent is re-executed on a healthy node. If parent kernel
 fails, then its copy, which is sent along with every subordinate on other
 cluster nodes, is re-executed on the node where the first survived subordinate
 kernel resides. Kernel failure is detected only for kernels that are sent from
-one node to another (local kernels are not considered). Healthy node does not
+one node to another (local kernels are not considered). A healthy node does not
 need to be a new one, any already loaded node will do: recovery does not
 overload the node, because each node has its own pool of kernels in which they
 wait before being executed by a pipeline.
 
+When a kernel is sent to other node, its copy is saved in the outbound buffer
+(a list of kernels, that were sent to a particular node), from which it is
+removed only when the kernel returns to its parent. If the corresponding
+connection closes, all kernels from this buffer are retrieved and distributed
+across available nodes including the current node. The fail over algorithm is
+straightforward for a subordinate, but for a principal it is more involved.
+Whereas a subordinate is implicitly copied to another node as a consequence of
+load distribution, a principal is left on the one node. In order to implement
+resilience to a principal failure, one needs to copy it along with each of its
+subordinates to other nodes, and provide a rule to determine from which copy
+the principal is restored upon the node failure. The following paragraphs
+discuss this algorithm and the rule in detail.
+
 \subsection{Failure scenarios}
 \label{sec:failure-scenarios}
 
-To disambiguate hierarchical links between daemon processes and kernels and to
-simplify the discussion, we will use the following naming conventions
-throughout the text. If the link is between two daemon processes, the
-relationship is \emph{master-slave}. If the link is between two kernels, then
-the relationship is \emph{principal-subordinate} (or \emph{parent-child}). Two
-hierarchies are orthogonal to each other in a sense that no kernel may have a
-link to a daemon, and vice versa. Since daemon hierarchy is used to distribute
-the load on the cluster, kernel hierarchy is mapped onto it, and this mapping
-can be arbitrary. It is common to have principal kernel on a slave node with
-its subordinate kernels distributed evenly between all cluster nodes (including
-the node where the principal is located). Both hierarchies can be arbitrarily
-deep, but ``shallow'' ones are preferred for highly parallel programmes, as
-there are less number of hops when kernels are distributed between cluster
-nodes. Since there is one-to-one correspondence between daemons and cluster
-nodes, they are used interchangeably in the paper.
-
 The main purpose of the system is to provide continuous execution of kernels in
-the presence of daemon (and consequently node) failures. There are three types
-of such failures.
+the presence of node failures. There are three types of such failures.
 \begin{itemize}
 \item Simultaneous failure of at most one node.
 \item Simultaneous failure of more than one node but less than total number
@@ -179,7 +191,8 @@ of such failures.
 \item Simultaneous failure of all nodes (electricity outage).
 \end{itemize}
 For the sake of simplicity, it is assumed that parallel programme runs on all
-cluster nodes.
+cluster nodes. Our system provide resilience to node failures for the first and
+the second scenario.
 
 By dividing kernels into principals and subordinates we create recovery
 points. Each principal is, mainly, a control unit, with a goal. To achieve it,
@@ -187,57 +200,54 @@ principal divides the task into parts and creates a subordinate to compute each
 of them. The principal copies itself to each subordinate in the order of their
 creation, and includes in each subordinate a list of all node IP addresses to
 which previously created subordinates were sent (a list of \emph{neighbours}).
-When a connection from master node to slave node closes either as a results of
-node failure, or as a consequence of the daemon hierarchy change, all kernels
+When a connection from master node to slave node closes either as a result of
+a node failure, or as a consequence of the daemon hierarchy change, all kernels
 which reside on the corresponding cluster node are considered failed, and
 recovery process is triggered in accordance with the following scenarios.
 
 \paragraph*{Scenario~1} With respect to kernel hierarchy, there are two
 possible variants of this failure: when a principal fails and when a
-subordinate fails,~--- but both of them may or may not reside on the same
-cluster node.
+subordinate fails (and both of them may or may not reside on the same cluster
+node).
 
-Since a subordinate is a simple worker, rather than a valuable part of
-execution, a copy of it (in initial state) is simply restored from the node
-where its parent is located (fig.~\ref{fig:subordinate-fails}). Each daemon
-maintains a list of kernels, that were sent to a particular subordinate node.
-When the corresponding network connection closes all kernels from the list are
-automatically re-sent to available node, or executed locally of there no
-network connections.
+When a subordinate fails, its copy is simply restored from the outbound buffer
+on the node where its principal is located. When the corresponding network
+connection closes all kernels from the list are automatically distributed
+across available nodes, or executed locally if there are no network
+connections.
 
 When a principal fails every subordinate has its copy, but we need to restore it
 only once and only on one node to correctly continue programme execution. To
 ensure that the principal is restored only once, each subordinate tries to find
-the first surviving node from the IP address list of neighbours
-(fig.~\ref{fig:principal-fails}
-and~\ref{fig:subordinate-and-principal-fail}). If such node is online, the
-search stops and the subordinate kernel is deleted. If the node is not found,
-the subordinate restores the principal from the copy and deletes itself. Kernel
-deletion is necessary, because the whole computational step, modelled by the
-principal, is re-executed from the principal initial state, and there is no
-simple and reliable way of taking into account the results which was produced
-so far by the subordinates.
-
-\begin{figure}
-  \centering
-  \includegraphics{sc12}
-  \caption{First failure scenario. Recovery of a subordinate.}
-  \label{fig:subordinate-fails}
-\end{figure}
-
-\begin{figure}
-  \centering
-  \includegraphics{sc1}
-  \caption{First failure scenario. Recovery of a principal.}
-  \label{fig:principal-fails}
-\end{figure}
-
-\begin{figure}
-  \centering
-  \includegraphics{sc2}
-  \caption{Simultaneous failure of a subordinate and its principal.}
-  \label{fig:subordinate-and-principal-fail}
-\end{figure}
+the first surviving node from the IP address list of neighbours. If such node
+is online, the search stops and the subordinate is deleted. If the node is not
+found, the subordinate restores the principal from the copy on the current node
+and deletes itself. This algorithm is executed on every node, to which a copy
+of the principal was sent. Subordinate deletion is necessary, because the whole
+computational step, modelled by the principal, is re-executed from the initial
+state, and there is no simple and reliable way of taking into account partial
+results which were produced so far by the subordinates.
+
+%\begin{figure}
+%  \centering
+%  \includegraphics{sc12}
+%  \caption{First failure scenario. Recovery of a subordinate.}
+%  \label{fig:subordinate-fails}
+%\end{figure}
+%
+%\begin{figure}
+%  \centering
+%  \includegraphics{sc1}
+%  \caption{First failure scenario. Recovery of a principal.}
+%  \label{fig:principal-fails}
+%\end{figure}
+%
+%\begin{figure}
+%  \centering
+%  \includegraphics{sc2}
+%  \caption{Simultaneous failure of a subordinate and its principal.}
+%  \label{fig:subordinate-and-principal-fail}
+%\end{figure}
 
 \paragraph*{Scenario~2} In comparison to the first scenario, the second one is
 more complicated, but also more frequent. While on kernel hierarchy the system
@@ -262,12 +272,12 @@ to files by multiple subordinates is not reliable, and to avoid data loss
 programme logic should be changed so that only one (principal) kernel writes to
 a file, whereas subordinates only process separate dataset parts.
 
-\begin{figure}
-  \centering
-  \includegraphics{sc3}
-  \caption{Simultaneous failure of two principals.}
-  \label{fig:sc3}
-\end{figure}
+%\begin{figure}
+%  \centering
+%  \includegraphics{sc3}
+%  \caption{Simultaneous failure of two principals.}
+%  \label{fig:sc3}
+%\end{figure}
 
 \paragraph*{Scenario~3} Both failure scenarios are handled at runtime: the
 system will not stop execution of a programme, if some of its kernels are
@@ -302,14 +312,14 @@ principal kernels in the same hierarchy branch have failed. If a node fails in
 the middle of recovery process, the whole process is restarted from the
 beginning.
 
-\begin{figure}
-  \noindent%
-  \spbuInsertFigure{tex/cluster-0}~\spbuInsertFigure{tex/frame-0}\newline
-  \spbuInsertFigure{tex/frame-3}~\spbuInsertFigure{tex/frame-4}\newline
-  \spbuInsertFigure{tex/legend}%
-  \caption{An example of fail over algorithm in
-    action.\label{fig:failover-example}}
-\end{figure}
+%\begin{figure}
+%  \noindent%
+%  \spbuInsertFigure{tex/cluster-0}~\spbuInsertFigure{tex/frame-0}\newline
+%  \spbuInsertFigure{tex/frame-3}~\spbuInsertFigure{tex/frame-4}\newline
+%  \spbuInsertFigure{tex/legend}%
+%  \caption{An example of fail over algorithm in
+%    action.\label{fig:failover-example}}
+%\end{figure}
 
 \section{Evaluation}
 
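The outbound buffer introduced in this commit (a per-connection list of kernels that were sent to a node, cleared only when a kernel returns to its parent, and replayed to the remaining nodes when the connection closes) can be illustrated with a short C++ sketch. This is not code from this repository; the types Kernel, NodeId and OutboundBuffers, and the resend callback, are hypothetical stand-ins for whatever the real daemon uses.

// Illustrative sketch only: hypothetical types, not the repository's sources.
#include <cstddef>
#include <list>
#include <memory>
#include <unordered_map>
#include <vector>

struct Kernel {                 // a unit of work sent between nodes
    int id = 0;
    virtual ~Kernel() = default;
};

using NodeId = int;             // stands in for a daemon's IP address

class OutboundBuffers {
public:
    // Remember a copy of every kernel sent to a node; the copy is erased
    // only when the kernel returns to its parent.
    void on_send(NodeId node, std::shared_ptr<Kernel> k) {
        _buffers[node].push_back(std::move(k));
    }

    void on_return(NodeId node, int kernel_id) {
        auto& buf = _buffers[node];
        buf.remove_if([&](const auto& k) { return k->id == kernel_id; });
    }

    // When the connection to `failed` closes, every kernel that was sent
    // there is retrieved and redistributed across the remaining nodes
    // (or executed locally when no connections are left).
    template <class Resend>
    void on_connection_closed(NodeId failed, const std::vector<NodeId>& alive,
                              Resend resend) {
        auto it = _buffers.find(failed);
        if (it == _buffers.end()) return;
        std::size_t i = 0;
        for (auto& k : it->second) {
            NodeId target = alive.empty()
                ? NodeId{0}                     // 0 stands for the current node in this sketch
                : alive[i++ % alive.size()];
            resend(target, std::move(k));
        }
        _buffers.erase(it);
    }

private:
    std::unordered_map<NodeId, std::list<std::shared_ptr<Kernel>>> _buffers;
};

A daemon would call on_send whenever it writes a kernel to a connection, on_return when that kernel comes back to its parent, and on_connection_closed from its connection-error handler.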
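The restore rule for a failed principal in Scenario 1 (each subordinate scans its neighbour list in creation order; if any earlier copy of the principal survives, this subordinate only deletes itself, otherwise it restores the principal from its local copy) reduces to a small decision function. Again a hypothetical sketch under the same assumptions, not the project's actual code: decide_recovery, Action and the is_online predicate are illustrative names.

// Illustrative sketch only: hypothetical helper, not the repository's sources.
#include <cstdint>
#include <vector>

enum class Action : std::uint8_t {
    DeleteSubordinate,   // an earlier surviving copy will restore the principal
    RestorePrincipal     // no earlier copy survived: restore locally, then delete
};

// `neighbours` lists, in creation order, the nodes that received the
// previously created subordinates (each of which carries a copy of the
// principal); `is_online` reports whether a node is still reachable.
template <class IsOnline>
Action decide_recovery(const std::vector<int>& neighbours, IsOnline is_online) {
    for (int node : neighbours) {
        if (is_online(node)) {
            // The search stops at the first surviving neighbour; that copy
            // is responsible for the restore, so this subordinate only
            // deletes itself.
            return Action::DeleteSubordinate;
        }
    }
    // No earlier copy survived: restore the principal from the copy on the
    // current node and delete this subordinate afterwards.
    return Action::RestorePrincipal;
}

Because every surviving copy evaluates the same creation-ordered list, in the intended case exactly one of them (the earliest survivor) performs the restore, which is what gives the single-restore guarantee described in the text.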