hpcs-17-subord

git clone https://git.igankevich.com/hpcs-17-subord.git

commit 79068920b16525da87c0f55fcd3349e95311b03e
parent 2485815d4550e8446a35ae0904474207941a5757
Author: Ivan Gankevich <igankevich@ya.ru>
Date:   Fri, 24 Mar 2017 20:35:39 +0300

Revise discussion.

Diffstat:
src/body.tex | 29+++++++++++++----------------
src/tail.tex | 2++
2 files changed, 15 insertions(+), 16 deletions(-)

diff --git a/src/body.tex b/src/body.tex @@ -423,19 +423,16 @@ in the list. If such kernel is not found, execution proceeds on the current node. The sequence of IP addresses in the list implicitly forms linear hierarchy, which makes this optimisation equivalent to the transformation. -% Brainstorming results (various random thoughts). - -There are two scenarios of failures. Failure of more than one node at a time -and electricity outage. In the first scenario failure is handled by sending a -list previous IP addresses to the subsequent kernels in the batch. Then if -subordinate node and its master fail simultaneously, the surviving subordinate -nodes scan all of the IP addresses they received until they find alive node and -parent is revived on this node. - -We believe that kernel coordinates and inter dependencies is enough to mitigate -any type of failure: given that at least one node survives, all applications -continue their execution in possibly degraded state. However it requires -recursively duplicating parents on the subordinate node. - -Only electricity outage requires writing data to disk other failures can be -mitigated by duplicating kernels in memory. +There are essentially two scenarios of failures. Failure of more than one node +at a time and electricity outage. In the first scenario failure is handled by +sending a list previous IP addresses to the subsequent kernels in the batch. +Then if subordinate node and its master fail simultaneously, the surviving +subordinate nodes scan all of the IP addresses they received until they find +alive node and parent is revived on this node. + +We believe that having kernel state and their inter-dependencies is enough to +mitigate any combination of node failures: given that at least one node +survives, all programmes continue their execution in possibly degraded state. +However it requires recursively duplicating principals and sending the along +with the subordinates. 
Only electricity outage requires writing data to disk
+other failures can be mitigated by duplicating kernels in memory.
diff --git a/src/tail.tex b/src/tail.tex
@@ -51,6 +51,8 @@ kernels upon a failure.
\section{Conclusion}
+
+
\section*{Acknowledgment}
The research was carried out using computational resources of Resource Centre ``Computational Centre of Saint Petersburg State University'' (\mbox{T-EDGE96}
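The failure-handling scheme in the revised discussion works as follows. Each kernel in a batch carries the IP addresses of the nodes that received the kernels before it. When nodes fail, survivors scan those addresses for an alive node on which to revive the parent. Principals are duplicated recursively and sent along with subordinates. A small Python sketch can illustrate this, though it is not the authors' implementation. The `Kernel` record and the functions `send_batch`, `duplicate_principals` and `revival_node` are hypothetical, and `is_alive` stands in for whatever liveness check the scheduler actually uses. Two behaviours are assumptions because the text does not specify them: scanning from the nearest predecessor backwards, and falling back to the current node when no listed node is alive.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Kernel:
    """Hypothetical kernel record, for illustration only."""
    name: str
    principal: Optional["Kernel"] = None
    # IP addresses of the nodes that received earlier kernels of the same batch.
    previous_ips: List[str] = field(default_factory=list)


def send_batch(kernels: List[Kernel], node_ips: List[str]) -> None:
    """Kernel i is sent to node_ips[i] and carries node_ips[:i] with it."""
    for i, kernel in enumerate(kernels):
        kernel.previous_ips = list(node_ips[:i])


def duplicate_principals(kernel: Kernel) -> Kernel:
    """Copy a kernel together with its whole principal chain, so that every
    ancestor travels with the subordinate (recursive duplication)."""
    principal = (duplicate_principals(kernel.principal)
                 if kernel.principal is not None else None)
    return Kernel(kernel.name, principal, list(kernel.previous_ips))


def revival_node(kernel: Kernel, is_alive: Callable[[str], bool],
                 current_ip: str) -> str:
    """Pick the node on which a failed principal is revived.

    Assumption: addresses are scanned from the nearest predecessor backwards,
    and the current node is used if none of them is alive.
    """
    for ip in reversed(kernel.previous_ips):
        if is_alive(ip):
            return ip
    return current_ip


if __name__ == "__main__":
    root = Kernel("main")
    batch = [Kernel(f"k{i}", principal=root) for i in range(4)]
    ips = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]
    send_batch(batch, ips)
    failed = {"10.0.0.2", "10.0.0.3"}  # two nodes fail simultaneously
    print(revival_node(batch[3], lambda ip: ip not in failed, ips[3]))
    # prints 10.0.0.1
```

Sending copies of all principals with each subordinate trades message size for the ability to restart any ancestor from a single surviving node, which is what the claim that execution continues "given that at least one node survives" relies on.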