hpcs-17-subord

git clone https://git.igankevich.com/hpcs-17-subord.git

commit b6311a1cfd05d7195886b29110f3135ccb70e746
parent 2b0e9079abc5d283fac6274d7d4402a3ed68fa87
Author: Yuri Tipikin <yuriitipikin@gmail.com>
Date:   Mon, 13 Mar 2017 19:52:22 +0300

add abstract and introduction. No spell check.

Diffstat:
main.tex | 2+-
src/abstract.tex | 9++++++++-
src/body.tex | 13+++++++++++++
3 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/main.tex b/main.tex
@@ -13,7 +13,7 @@
 \begin{document}
-\title{TITLE}
+\title{Subordination: a framework for fault-tolerant systems}
 \author{%
 \IEEEauthorblockN{Yuri Tipikin \quad Ivan Gankevich \quad Vladimir Korkhov}
diff --git a/src/abstract.tex b/src/abstract.tex
@@ -1,3 +1,10 @@
 \begin{abstract}
- ABSTRACT
+ In this paper we describe a new framework for creating a reliable to hardware
+ errors distributed programs. Our main goal was to create a simply yet powerful
+ tool to archiving fault tolerance without creation of checkpoints, memory
+ dumps and other highly disk usage activities. To archive this we first
+ introduce a strong hierarchy of program components (or parts) and then discuss
+ about scenarios for continue computations. The programs parts hierarchy based
+ on Actor model by C. Hewitt, failure scenarios cover most common hardware
+ errors; software error handling are not covered by this article.
 \end{abstract}
diff --git a/src/body.tex b/src/body.tex
@@ -1,3 +1,16 @@
+\section{Introduction}
+
+In large scale cluster environments node faults are common. In general this do
+not lead to global cluster malfunction, but it have huge impact on job running
+on faulty resources. Classical MPI programs will fail if any one of used nodes
+will broke. Today existed solutions mainly focused on making node checkpoints,
+but with increasing speed of computations it became less efficient. Our approach
+to make cluster computations reliable and efficient again is to use special
+framework focused on structuring source algorithm in strong hierarchy of
+parallel and sequential parts. Using different fault tolerant scenarios based on
+hierarchy interactions framework can provide continuous computations in case of
+hardware errors or electricity outages.
+
 \section{Computational kernel hierarchy}

 The framework provides classes and methods to simplify development of