hpcs-17-subord

git clone https://git.igankevich.com/hpcs-17-subord.git

commit 2f8badc0cbe5e2469cccc82851bfb98f2a404272
parent cac0d3e620473877830fed11f37de0ab6c6dcc2a
Author: Yuri Tipikin <yuriitipikin@gmail.com>
Date:   Wed, 22 Feb 2017 00:00:11 +0300

+ captions and refs

Diffstat:
src/body.tex | 42++++++++++++++++++++++++++++++------------
1 file changed, 30 insertions(+), 12 deletions(-)

diff --git a/src/body.tex b/src/body.tex
@@ -117,34 +117,52 @@ existing or newly appeared daemons according to each of the mentioned scenarios.
 Consider the first scenario. In accordance with the principal-to-subordinate
 hierarchy, there are two variants of this failure: when the principal is gone and
-then any subordinate is gone.
-\includegraphics[scale=0.33]{img/sc1}
-\includegraphics[scale=0.33]{img/sc12}
-Subordinate itself is not a valuable part of
+when any subordinate is gone. A subordinate itself is not a valuable part of
 the execution, it is a simple worker. Our scheduler does not store any subordinate
 state, only the principal state. Thus, to restore the execution, the scheduler finds
-last valid principle state and simply recreate failed subordinate on most
-appropriate daemon. When principle is gone we need to restore it only once and
+the last valid principal state and simply recreates the failed subordinate (fig.~\ref{fig:sc12}) on the most
+appropriate daemon. When the principal is gone (fig.~\ref{fig:sc1}), we need to restore it only once and
 only on one node. To achieve this, each subordinate will try to
 find any available daemon from its address list in reverse order. If such a
 daemon exists and is available, the search will stop, as the current subordinate
 kernel will assume that the found kernel will take over the principal restoration
 process.
+\begin{figure}
+  \caption{First scenario of restoration: the principal fails}
+  \includegraphics[scale=0.33]{img/sc1}
+  \label{fig:sc1}
+\end{figure}
+
+\begin{figure}
+  \caption{Restoration after only one subordinate fails}
+  \includegraphics[scale=0.33]{img/sc12}
+  \label{fig:sc12}
+\end{figure}
+
 In comparison with the first scenario, the second one is more complicated yet
-frequent. While on principal-to-subordinate layer scheduler act same, then we
+more frequent. While on the principal-to-subordinate layer the scheduler acts the same (fig.~\ref{fig:sc12}), when we
 move to the daemon layer one more variant is added. In the kernel hierarchy principal
 kernels are mostly dual kernels. For the higher-level kernels such a kernel looks like a
 subordinate, for the lower-level kernels it is a principal. Thus, we need to add to
 our restoration scope only the state of the principal's principal. As a result, we add
 to the variants from the first scenario the one where the principal's principal also
-is gone.
-\includegraphics[scale=0.33]{img/sc2}
-\includegraphics[scale=0.33]{img/sc3}
-Since scheduler through daemons knew all kernels state before it begin
+is gone. Since the scheduler, through the daemons, knows the state of all kernels before it begins
 a restoration process, it will first check the state of the principal's principal. If
-it's gone, all subordinates will be started accordingly to hierarchy once again,
+it is gone (fig.~\ref{fig:sc12}), all subordinates will be started according to the hierarchy once again,
 regardless of their states.
+\begin{figure}
+  \caption{Simultaneous failure of one of the subordinates in a subordinate batch and of its principal}
+  \includegraphics[scale=0.33]{img/sc2}
+  \label{fig:sc2}
+\end{figure}
+
+\begin{figure}
+  \caption{The condition for restarting an execution subtree}
+  \includegraphics[scale=0.33]{img/sc3}
+  \label{fig:sc3}
+\end{figure}
+
 These two scenarios cover failures at run time, which means the scheduler operates on
 kernels in memory and will not stop the execution of the whole task if some part of it
 was placed on a failed node. But occasionally, all nodes of the cluster may fail at the same
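
The reverse-order search described in the hunk above is, in effect, a deterministic agreement rule: every surviving subordinate scans the same daemon address list from the end and stops at the first daemon that responds, so all of them settle on the same node and the principal is restored exactly once. The C++ sketch below shows one possible reading of that rule; the names (daemon_address, is_available, restore_principal) are hypothetical stand-ins and do not correspond to the scheduler's actual interfaces.

    // Minimal illustrative sketch; not the scheduler's real API.
    #include <cstddef>
    #include <cstdint>
    #include <iostream>
    #include <string>
    #include <vector>

    struct daemon_address {
        std::string host;
        std::uint16_t port;
        bool alive;   // stub flag standing in for a real reachability probe
    };

    // Stub: in the real system this would be a network-level check.
    bool is_available(const daemon_address& a) { return a.alive; }

    // Stub: recreate the failed principal from its last valid saved state.
    void restore_principal(const daemon_address& a) {
        std::cout << "restoring principal on " << a.host << ':' << a.port << '\n';
    }

    // Run by every surviving subordinate once it detects that the principal
    // is gone.  All subordinates scan the same address list in reverse order
    // and stop at the first available daemon, so they agree on a single
    // restorer and the principal is recreated only once, on one node.
    void on_principal_failure(const std::vector<daemon_address>& addresses,
                              std::size_t self) {
        for (std::size_t i = addresses.size(); i-- > 0; ) {
            if (!is_available(addresses[i])) {
                continue;   // this daemon failed together with the principal
            }
            if (i == self) {
                // We are the first available daemon from the end of the
                // list: restore the principal ourselves.
                restore_principal(addresses[self]);
            }
            // Otherwise assume the kernel on the daemon we just found takes
            // over the restoration; either way the search stops here.
            return;
        }
    }

    int main() {
        std::vector<daemon_address> cluster{
            {"n1", 33333, true},    // surviving subordinate
            {"n2", 33333, false},   // node of the failed principal
            {"n3", 33333, true},    // surviving subordinate
        };
        on_principal_failure(cluster, 0);   // n1 finds n3 available: waits
        on_principal_failure(cluster, 2);   // n3 is the designated restorer
    }

Any fixed ordering shared by all subordinates would work equally well; scanning the address list from the end merely matches the description in the text above.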