commit 5f06a229c6a651b5b0d6e8c1bd5d6c7bc6227e1e
parent 79068920b16525da87c0f55fcd3349e95311b03e
Author: Ivan Gankevich <igankevich@ya.ru>
Date: Fri, 24 Mar 2017 20:44:05 +0300
Revise abstract.
Diffstat:
1 file changed, 12 insertions(+), 8 deletions(-)
diff --git a/src/abstract.tex b/src/abstract.tex
@@ -1,10 +1,14 @@
\begin{abstract}
- In this paper we describe a new framework for creating a reliable to hardware
- errors distributed programs. Our main goal was to create a simply yet powerful
- tool to archiving fault tolerance without creation of checkpoints, memory
- dumps and other highly disk usage activities. To archive this we first
- introduce a strong hierarchy of program components (or parts) and then discuss
- about scenarios for continue computations. The programs parts hierarchy based
- on Actor model by C. Hewitt, failure scenarios cover most common hardware
- errors; software error handling are not covered by this article.
+
+In this paper we describe a new framework for creating distributed programmes
+which are resilient to cluster node failures. Our main goal is to create a
+simple and reliable model that ensures continuous execution of parallel
+programmes without creating checkpoints, memory dumps or other I/O-intensive
+activities. To achieve this, we introduce a multi-layered system
+architecture, each layer of which consists of unified entities organised into
+hierarchies, and then show how this system handles different node failure
+scenarios. We benchmark our system using a real-world HPC application
+on both physical and virtual clusters. The results of the experiments show that
+our approach has low overhead and scales to a large number of cluster nodes.
+
\end{abstract}