iccsa-16-factory-extended

git clone https://git.igankevich.com/iccsa-16-factory-extended.git

commit f34aebc87afa3544022b39ec92a66175588045fb
parent ecf4cdbb7d0193be272dbaa4fab6390f219998d1
Author: Ivan Gankevich <igankevich@ya.ru>
Date:   Fri,  3 Mar 2017 18:45:21 +0300

Incorporate new abstract and title.

Diffstat:
main.tex         |  2 +-
src/abstract.tex | 26 +++++++++++++-------------
2 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/main.tex b/main.tex
@@ -28,7 +28,7 @@ \CLline
 \subtitle{}
-\title{Factory: Master node high-availability for Big Data applications and beyond}
+\title{Master node fault tolerance in distributed big data processing clusters}
 \authorA{I.~Gankevich\qquad{}Yu.~Tipikin\qquad{}V.~Korkhov\\V.~Gaiduchok\qquad{}A.~Degtyarev\qquad{}A.~Bogdanov}
diff --git a/src/abstract.tex b/src/abstract.tex
@@ -1,17 +1,17 @@
 \begin{abstract}
-Master node fault-tolerance is the topic that is often dimmed in the discussion
-of big data processing technologies. Although failure of a master node can take
-down the whole data processing pipeline, this is considered either improbable or
-too difficult to encounter. The aim of the studies reported here is to propose
-rather simple technique to deal with master-node failures. This technique is
-based on temporary delegation of the master role to one of the slave nodes and
-transferring updated state back to the master when one step of computation is
-complete. That way the state is duplicated and computation can proceed to the
-next step regardless of a failure of a delegate or the master (but not both). We
-run benchmarks to show that a failure of a master is almost ``invisible'' to
-other nodes, and a failure of a delegate results in recomputation of only one step
-of data processing pipeline. We believe that the technique can be used not only
-in Big Data processing but in other types of applications.
+Distributed computing clusters are often built with commodity hardware which
+leads to periodic failures of processing nodes due to relatively low
+reliability of such hardware. While worker node fault-tolerance is
+straightforward, fault tolerance of master node poses a bigger challenge. In
+this paper master node failure handling is based on the concept of master and
+worker roles that can be dynamically re-assigned to cluster nodes along with
+maintaining a backup of the master node state on one of worker nodes. In such
+case no special component is needed to monitor the health of the cluster while
+master node failures can be resolved except for the cases of simultaneous
+failure of master and backup. We present experimental evaluation of the
+technique implementation, show benchmarks demonstrating that a failure of a
+master does not affect running job, and a failure of a backup results in
+re-computation of only the last job step.
 \end{abstract}
 \KEYWORD{parallel computing; Big Data processing; distributed computing; backup node; state transfer; delegation; cluster computing; fault-tolerance; high-availability; hierarchy}