In this paper we propose a system architecture consisting of two tree
hierarchies of entities, mapped onto each other, that simplifies the provision
of resilience to failures for parallel programmes. Resilience is provided
solely by the hierarchical dependencies between entities, and is provided
independently at each layer of the system. To optimise the handling of
failures of multiple cluster nodes, we use the hierarchy implied by the order
in which subordinate entities are created. The hierarchical approach to fault
tolerance is efficient, scales to a large number of cluster nodes, and requires
slow I/O operations only in the most disastrous scenario~--- simultaneous
failure of all cluster nodes.
Future work is to standardise the application programming interface of the
system and to investigate load-balancing techniques that are optimal for a
programme composed of many computational kernels.