commit f34aebc87afa3544022b39ec92a66175588045fb
parent ecf4cdbb7d0193be272dbaa4fab6390f219998d1
Author: Ivan Gankevich <igankevich@ya.ru>
Date: Fri, 3 Mar 2017 18:45:21 +0300
Incorporate new abstract and title.
Diffstat:
2 files changed, 14 insertions(+), 14 deletions(-)
diff --git a/main.tex b/main.tex
@@ -28,7 +28,7 @@
\CLline
\subtitle{}
-\title{Factory: Master node high-availability for Big Data applications and beyond}
+\title{Master node fault tolerance in distributed big data processing clusters}
\authorA{I.~Gankevich\qquad{}Yu.~Tipikin\qquad{}V.~Korkhov\\V.~Gaiduchok\qquad{}A.~Degtyarev\qquad{}A.~Bogdanov}
diff --git a/src/abstract.tex b/src/abstract.tex
@@ -1,17 +1,17 @@
\begin{abstract}
-Master node fault-tolerance is the topic that is often dimmed in the discussion
-of big data processing technologies. Although failure of a master node can take
-down the whole data processing pipeline, this is considered either improbable or
-too difficult to encounter. The aim of the studies reported here is to propose
-rather simple technique to deal with master-node failures. This technique is
-based on temporary delegation of the master role to one of the slave nodes and
-transferring updated state back to the master when one step of computation is
-complete. That way the state is duplicated and computation can proceed to the
-next step regardless of a failure of a delegate or the master (but not both). We
-run benchmarks to show that a failure of a master is almost ``invisible'' to
-other nodes, and a failure of a delegate results in recomputation of only one step
-of data processing pipeline. We believe that the technique can be used not only
-in Big Data processing but in other types of applications.
+Distributed computing clusters are often built with commodity hardware, which
+leads to periodic failures of processing nodes due to the relatively low
+reliability of such hardware. While worker node fault tolerance is
+straightforward, fault tolerance of the master node poses a bigger challenge. In
+this paper, master node failure handling is based on the concept of master and
+worker roles that can be dynamically re-assigned to cluster nodes, along with
+maintaining a backup of the master node state on one of the worker nodes. In this
+case no special component is needed to monitor the health of the cluster, and
+master node failures can be resolved except in the case of simultaneous
+failure of master and backup. We present an experimental evaluation of the
+technique's implementation and show benchmarks demonstrating that a failure of the
+master does not affect the running job, and a failure of the backup results in
+re-computation of only the last job step.
\end{abstract}
\KEYWORD{parallel computing; Big Data processing; distributed computing; backup node; state transfer; delegation; cluster computing; fault-tolerance; high-availability; hierarchy}