\section{Related work}

The feature that distinguishes our research from similar studies is the use of
hierarchy as the only possible way of defining dependencies between the
objects into which a programme is decomposed. The main advantage of hierarchy
is trivial handling of object failures.

In~\cite{zuckerman2011using} the authors describe the codelet model for
exascale machines. This model breaks a programme into small bits of
functionality, called codelets, and dependencies between them. The programme
dataflow is represented by a directed graph, which is called well-behaved if
forward progress of the programme is guaranteed. In contrast to our model, the
codelet model does not enforce hierarchical dependencies, and resilience to
failures is provided by object migration and relies on hardware fault
detection mechanisms. Furthermore, execution of kernel hierarchies in our
model resembles stack-based execution of ordinary programmes: the programme
finishes only when all subordinate kernels of the main kernel finish. So,
there is no need to define a well-behaved graph to guarantee programme
termination.

In~\cite{meneses2015using} the authors describe the migratable objects model
for parallel programmes. In the framework of this model a programme is
decomposed into objects that may communicate with each other by sending
messages, and can be migrated to any cluster node if desired. The authors
propose several ways in which this model may enhance fault-tolerance
techniques for Charm++/AMPI programmes: proactive fault detection,
checkpoint/restart and message logging. In contrast to our model, migratable
objects do not compose a hierarchy, but may exchange messages with any object
whose address is known to the sender. A spanning tree of nodes is used to
orchestrate collective operations between objects. This tree is similar to the
tree hierarchy of nodes, which is used in our work to distribute kernels
between available cluster nodes, but we use this hierarchy for any operations
that require distribution of work, not only for collective ones. Our model
does not use the techniques described in that paper to provide
fault-tolerance: upon a failure we re-execute subordinate kernels and copy
principal kernels to be able to re-execute them as well. Our approach blends
checkpoint/restart and message logging: each kernel which is sent to another
cluster node is saved (logged) in the outbound buffer of the sender, and
removed from the buffer upon return. Since subordinate kernels are allowed to
communicate only with their principals (all other communication may happen
only when the physical location of the kernel is known; if such communication
fails, the kernel also fails in order to trigger recovery by its principal),
the collection of all logs on the cluster nodes constitutes the current state
of programme execution, which is used to restart failed kernels on the
surviving nodes.

To summarise, the feature that distinguishes our model from other models
proposed for improving parallel programme fault-tolerance is the use of kernel
hierarchy~--- an abstraction which defines a strict total order on a set of
kernels (their execution order) and, consequently, defines for each kernel a
principal kernel, whose responsibility is to re-execute failed subordinate
kernels upon a failure.
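The following listing gives a minimal sketch of this mechanism. The names
(\texttt{Kernel}, \texttt{act}, \texttt{react}, \texttt{outbound\_log}) are
illustrative rather than taken from the actual implementation, and remote
execution is simulated by a direct call; the sketch only shows how logging
each sent kernel in the outbound buffer of the sender reduces recovery to
re-executing whatever remains in the log.

\begin{verbatim}
// Hypothetical sketch, not the actual implementation.
#include <algorithm>
#include <iostream>
#include <vector>

struct Kernel {
    Kernel* principal = nullptr;   // the only kernel a subordinate talks to
    virtual ~Kernel() = default;
    virtual void act() = 0;        // do the work, possibly spawn subordinates
    virtual void react(Kernel*) {} // consume a returning subordinate
};

std::vector<Kernel*> outbound_log; // per-node log of kernels sent away

// Send a subordinate to another node: log it first, then "transfer" it
// (remote execution is simulated here by a direct call to act()).
void send(Kernel* principal, Kernel* sub) {
    sub->principal = principal;
    outbound_log.push_back(sub);
    sub->act();
}

// A subordinate returns to its sender: remove it from the log and hand
// it to its principal.
void ret(Kernel* sub) {
    outbound_log.erase(
        std::remove(outbound_log.begin(), outbound_log.end(), sub),
        outbound_log.end());
    sub->principal->react(sub);
}

struct Square : Kernel {           // a subordinate kernel
    int value, result = 0;
    explicit Square(int v) : value(v) {}
    void act() override { result = value * value; }
};

struct Main : Kernel {             // the principal kernel
    int total = 0, pending = 3;
    void act() override {
        for (int i = 1; i <= pending; ++i) send(this, new Square(i));
    }
    void react(Kernel* k) override {
        total += static_cast<Square*>(k)->result;
        delete k;
        if (--pending == 0) std::cout << "total = " << total << '\n';
    }
};

int main() {
    Main m;
    m.act();              // three subordinates are now in the outbound log
    ret(outbound_log[0]); // the first subordinate returns normally
    ret(outbound_log[0]); // so does the second; the third is lost to a failure
    // Recovery: whatever is still logged is re-executed and returned.
    while (!outbound_log.empty()) {
        Kernel* k = outbound_log.front();
        k->act();         // re-execute the failed subordinate
        ret(k);
    }
}
\end{verbatim}

Because the log stores whole kernels rather than individual messages, no
global coordination is needed for recovery: each surviving node simply
replays its own outbound buffer.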
With respect to various high-availability cluster
projects~\cite{robertson2000linux,haddad2003ha,leangsuksun2005achieving} our
approach has the following advantages. First, it scales to a large number of
nodes, as only point-to-point communication between slave and master nodes is
used instead of broadcast messages (which has been shown in the previous
work~\cite{gankevich2015subordination}); hence, the use of several switches
and routers is possible within a single cluster. Second, our approach does not
require the use of standby servers to provide high availability of a master
node: we provide fault tolerance on the kernel layer instead. As the
computation progresses, kernels copy themselves to nodes that are logically
connected to the current one, and these can be any nodes of the cluster.
Finally, high-availability cluster projects do not deal with parallel
programme failures; they aim to provide high availability for services running
on the master node (NFS, SMB, DHCP, etc.), whereas our approach is
specifically targeted at providing continuous execution of parallel
applications.

\section{Conclusion}

In this paper we propose a system architecture consisting of two tree
hierarchies of entities, mapped onto each other, that simplifies the provision
of resilience to failures for parallel programmes. The resilience is provided
solely by the use of hierarchical dependencies between entities, and is
provided independently on each layer of the system. To optimise handling the
failure of multiple cluster nodes, we use the hierarchy implied by the order
of creation of subordinate entities. The hierarchical approach to fault
tolerance is efficient, scales to a large number of cluster nodes, and
requires slow I/O operations only in the most disastrous scenario~---
simultaneous failure of all cluster nodes.

Future work is to standardise the application programming interface of the
system and to investigate load-balancing techniques that are optimal for a
programme composed of many computational kernels.

\section*{Acknowledgement}
The research was carried out using computational resources of the Resource
Centre ``Computational Centre of Saint Petersburg State University''
(\mbox{T-EDGE96} \mbox{HPC-0011828-001}) within the framework of grants of the
Russian Foundation for Basic Research (projects no.~\mbox{16-07-01111},
\mbox{16-07-00886}, \mbox{16-07-01113}).