iccsa-16-factory-extended

git clone https://git.igankevich.com/iccsa-16-factory-extended.git

commit 96ce4a106de16ece475dc51188da9241cecde92f
parent 5747d019c9a3d412e09f83f8d1b67d918146862a
Author: Ivan Gankevich <igankevich@ya.ru>
Date:   Sat, 28 Jan 2017 18:44:25 +0300

Add governing principles.

Diffstat:
src/sections.tex | 126+++++++++++++++++++++++++++++++++++++++++++++++++++++++++----------------------
1 file changed, 92 insertions(+), 34 deletions(-)

diff --git a/src/sections.tex b/src/sections.tex
@@ -50,6 +50,64 @@
 Computational model with a pipeline can be seen as \emph{bulk-asynchronous
 model}, because of the parallel nature of otherwise sequential execution
 steps. This model is the basis of the fault-tolerance model developed here.
+\subsection{Programming model principles}
+
+The data processing pipeline model is based on the following principles;
+following them maximises the efficiency of a programme.
+\begin{itemize}
+
+\item There is no notion of a message in the model: a kernel is itself a
+  message that can be sent over the network to another node and can directly
+  access any kernel on the local node. Only programme logic may guarantee the
+  existence of the kernel.
+
+\item A kernel is a \emph{cooperative routine}, which is submitted to a kernel
+  pool upon the call and is executed asynchronously by a scheduler. There can
+  be any number of calls to other subroutines inside a routine body. Every
+  call submits the corresponding subroutine to the kernel pool and returns
+  immediately. Kernels in the pool can be executed in any order; this fact is
+  used by the scheduler to exploit parallelism offered by the computer by
+  distributing kernels from the pool across available cluster nodes and
+  processor cores.
+
+\item Asynchronous execution prevents the use of explicit synchronisation
+  after the call to a subroutine is made; the system scheduler returns control
+  flow to the routine each time one of its subroutines returns. Such
+  cooperation transforms each routine that calls subroutines into an event
+  handler, where each event is a subroutine and the handler is the routine
+  that called it.
+
+\item A routine may communicate with any number of local kernels whose
+  addresses it knows; communication with kernels that are not adjacent in the
+  call stack complexifies control flow, and the call stack loses its tree
+  shape. Only programme logic may guarantee the presence of communicating
+  kernels in memory. One way to ensure this is to perform communication
+  between subroutines that are called from the same routine. Since such
+  communication is possible within the hierarchy through the parent routine,
+  it may be treated as an optimisation that eliminates the overhead of
+  transferring data over an intermediate node. The situation is different for
+  interactive or event-based programmes (e.g. servers and programmes with a
+  graphical interface), in which this is the primary type of communication.
+
+\item In addition, communication that does not occur along hierarchical links
+  and is executed over the cluster network complexifies the design of
+  resiliency algorithms, since it is impossible to ensure that a kernel
+  resides in memory of a neighbour node: the node may fail in the middle of
+  the execution of the corresponding routine. As a result, upon failure of a
+  routine all of its subroutines must be restarted. This encourages a
+  programmer to construct
+  \begin{itemize}
+  \item deep tree hierarchies of tightly-coupled kernels (which communicate
+    on the same level of the hierarchy) to reduce the overhead of
+    recomputation;
+  \item fat tree hierarchies of loosely-coupled kernels, providing the
+    maximal degree of parallelism.
+  \end{itemize}
+  A deep hierarchy is not only a requirement of the technology: it helps
+  optimise communication between a large number of cluster nodes by reducing
+  it to communication between adjacent nodes.
+
+\end{itemize}
+So, control flow objects (or kernels) possess properties of both cooperative
+routines and event handlers.
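
The principles above can be made concrete with a small example. The following single-threaded C++ sketch is an illustration only, not the framework's code: the names Pool, Kernel, submit and finish are assumptions introduced here, and only the act/react pairing follows the terms used later in the text. It shows how a call merely submits a subordinate kernel to the pool and returns immediately, and how the calling kernel then behaves as an event handler, receiving each returning subordinate in react().

// Minimal single-threaded sketch of the kernel/pool idea; illustration only,
// with invented class and function names.
#include <deque>
#include <iostream>
#include <memory>
#include <vector>

struct Pool;

// A kernel is both a cooperative routine (act) and an event handler (react).
struct Kernel {
    Kernel* parent = nullptr;
    int pending = 0;                        // subordinates that have not returned yet
    virtual ~Kernel() = default;
    virtual void act(Pool& pool) = 0;       // routine body
    virtual void react(Pool&, Kernel*) {}   // called when a subordinate returns
};

// The pool may execute queued kernels in any order; here it is a plain FIFO.
struct Pool {
    std::deque<Kernel*> queue;                      // kernels waiting to run
    std::vector<std::unique_ptr<Kernel>> storage;   // owns every kernel

    // "Calling" a subroutine: submit the subordinate and return immediately.
    Kernel* submit(Kernel* parent, std::unique_ptr<Kernel> child) {
        child->parent = parent;
        if (parent) ++parent->pending;
        Kernel* raw = child.get();
        storage.push_back(std::move(child));
        queue.push_back(raw);
        return raw;
    }

    void run() {
        while (!queue.empty()) {
            Kernel* k = queue.front();
            queue.pop_front();
            k->act(*this);
            if (k->pending == 0) finish(k);  // spawned no subordinates: it is done
        }
    }

    // Return control flow to the parent each time one of its subordinates returns.
    void finish(Kernel* k) {
        Kernel* p = k->parent;
        if (!p) return;
        p->react(*this, k);
        if (--p->pending == 0) finish(p);    // parent is done once all children returned
    }
};

// Example: a master kernel fans out three slave kernels and collects them.
struct Slave : Kernel {
    int part;
    explicit Slave(int p) : part(p) {}
    void act(Pool&) override { std::cout << "processing part " << part << "\n"; }
};

struct Master : Kernel {
    void act(Pool& pool) override {          // every call returns immediately
        for (int i = 0; i < 3; ++i)
            pool.submit(this, std::make_unique<Slave>(i));
    }
    void react(Pool&, Kernel* child) override {   // event handler for returning slaves
        std::cout << "slave " << static_cast<Slave*>(child)->part << " returned\n";
    }
};

int main() {
    Pool pool;
    pool.submit(nullptr, std::make_unique<Master>());
    pool.run();
}

A real scheduler would distribute the pooled kernels across processor cores and cluster nodes and could run them in any order; the FIFO queue here only keeps the sketch short.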
+
 \subsection{Fail over model}

 Although, fault-tolerance and high-availability are different terms, in essence
@@ -61,16 +119,16 @@
 high-availability it is the ability to switch from a failed master node to a
 backup node with full restoration of execution state. These are the core
 abilities that constitute distributed system's ability to \emph{fail over}.
-The key feature that is missing in the current parallel programming and big data
-processing technologies is a possibility to specify hierarchical dependencies
-between parallel tasks. When one has such dependency, it is trivial to determine
-which task should be responsible for re-executing a failed task on a healthy
-node. To re-execute the root of the hierarchy, a backup root task is created and
-executed on a different node. There exists a number of engines that are capable
-of executing directed acyclic graphs of tasks in
-parallel~\citep{acun2014charmpp,islam2012oozie}, but graphs are not good to infer
-master-slave relationship between tasks, because a node in the graph may have
-multiple parent nodes.
+The key feature that is missing in the current parallel programming and big
+data processing technologies is a possibility to specify hierarchical
+dependencies between parallel tasks. When one has such dependency, it is
+trivial to determine which task should be responsible for re-executing a failed
+task on a healthy node. To re-execute the root of the hierarchy, a backup root
+task is created and executed on a different node. There exists a number of
+engines that are capable of executing directed acyclic graphs of tasks in
+parallel~\citep{acun2014charmpp,islam2012oozie}, but graphs are not good to
+infer master-slave relationship between tasks, because a node in the graph may
+have multiple parent nodes.

 \subsection{Programming model}

@@ -81,9 +139,9 @@
 cluster nodes. The sole purpose of this hierarchy is to make a cluster more
 fault-tolerant by introducing multiple master nodes. If a master node fails,
 then its subordinates try to connect to another node from the same or higher
 level of the hierarchy. If there is no such node, one of the subordinates
-becomes the master. In~\citep{gankevich2015spec} we developed a framework for big
-data processing without fault tolerance, and here this framework is combined
-with fault-tolerance techniques described in this paper.
+becomes the master. In~\citep{gankevich2015spec} we developed a framework for
+big data processing without fault tolerance, and here this framework is
+combined with fault-tolerance techniques described in this paper.

 Each programme that runs on top of the tree hierarchy is composed of
 computational kernels---objects that contain data and code to process it. To
@@ -101,12 +159,12 @@
 transfer is eliminated, and the file is always read from a local drive. This
 approach is more deterministic compared to existing ones, e.g. MapReduce
 framework runs jobs on nodes that are ``close'' to the file location, but not
 necessarily the exact node where the file is located~\citep{dean2008mapreduce}.
-However, this approach does not come without disadvantages: scalability of a big
-data application is limited by the strategy that was employed to distribute its
-input files across cluster nodes. The more nodes used to store input files, the
-more read performance is achieved. The advantage of our approach is that the I/O
-performance is more predictable, than one of hybrid approach with streaming
-files over the network.
+However, this approach does not come without disadvantages: scalability of a
+big data application is limited by the strategy that was employed to distribute
+its input files across cluster nodes. The more nodes used to store input files,
+the more read performance is achieved. The advantage of our approach is that
+the I/O performance is more predictable than that of a hybrid approach with
+streaming files over the network.

 The main purpose of the model is to simplify development of distributed batch
 processing applications and middleware. The main focus is to make application
@@ -115,8 +173,8 @@
 it transparently to a programmer. The implementation is divided into two
 layers: the lower layer consists of routines and classes for single node
 applications (with no network interactions), and the upper layer for
 applications that run on an arbitrary number of nodes. There are two kinds of
-tightly coupled entities in the model --- \emph{control flow objects} (or
-\emph{kernels}) and \emph{pipelines} --- which are used together to compose a
+tightly coupled entities in the model~--- \emph{control flow objects} (or
+\emph{kernels}) and \emph{pipelines}~--- which are used together to compose a
 programme.

 Kernels implement control flow logic in theirs \texttt{act} and \texttt{react}
@@ -156,9 +214,9 @@
 and call correct kernel methods by analysing their internal state.

 \subsection{Handling master node failures}

-A possible way of handling a failure of a node where the first kernel is located
-(a master node) is to replicate this kernel to a backup node, and make all
-updates to its state propagate to the backup node by means of a distributed
+A possible way of handling a failure of a node where the first kernel is
+located (a master node) is to replicate this kernel to a backup node, and make
+all updates to its state propagate to the backup node by means of a distributed
 transaction. This approach requires synchronisation between all nodes that
 execute subordinates of the first kernel and the node with the first kernel
 itself. When a node with the first kernel goes offline, the nodes with
@@ -168,22 +226,22 @@
 kernel, then it is impossible for this kernel to discover the next backup node
 to return to, because this kernel has not discovered the unavailability of the
 master node yet. One can think of a consensus-based algorithm to ensure that
 subordinate kernels always know where the backup node is, but distributed
-consensus algorithms do not scale well to the large number of nodes and they are
-not reliable~\citep{fischer1985impossibility}. So, consensus-based approach does
-not play well with asynchronous nature of computational kernels as it may
+consensus algorithms do not scale well to the large number of nodes and they
+are not reliable~\citep{fischer1985impossibility}. So, consensus-based approach
+does not play well with asynchronous nature of computational kernels as it may
 inhibit scalability of a parallel programme.

 Fortunately, the first kernel usually does not perform operations in parallel,
 it is rather sequentially launches execution steps one by one, so it has only
 one subordinate at a time. Such behaviour is described by bulk-synchronous
 parallel programming model, in the framework of which a programme consists of
-sequential supersteps which are internally parallel~\citep{valiant1990bridging}.
-Keeping this in mind, we can simplify synchronisation of its state: we can send
-the first kernel along with its subordinate to the subordinate node. When the
-node with the first kernel fails, its copy receives its subordinate, and no
-execution time is lost. When the node with its copy fails, its subordinate is
-rescheduled on some other node, and in the worst case a whole step of
-computation is lost.
+sequential supersteps which are internally
+parallel~\citep{valiant1990bridging}. Keeping this in mind, we can simplify
+synchronisation of its state: we can send the first kernel along with its
+subordinate to the subordinate node. When the node with the first kernel fails,
+its copy receives its subordinate, and no execution time is lost. When the node
+with its copy fails, its subordinate is rescheduled on some other node, and in
+the worst case a whole step of computation is lost.

 Described approach works only for kernels that do not have a parent and have
 only one subordinate at a time, and act similar to manually triggered
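
To make the described fail over scheme concrete, the following C++ sketch shows the bookkeeping it implies; it is an illustration with invented structure, field names and node numbers, not code from the framework. The subordinate kernel travels together with a copy of the principal (first) kernel, so when the principal's node fails the local copy resumes the next step with no execution time lost, and when the node holding the copy fails the current step is recomputed on another node.

// Toy illustration of the fail over bookkeeping; names and node ids are invented.
#include <iostream>
#include <set>
#include <string>

struct KernelState { std::string name; int step; };

// What the subordinate node holds: the subordinate kernel plus a copy of the
// principal (first) kernel that was sent along with it.
struct Delivery {
    KernelState subordinate;
    KernelState principal_copy;
    int principal_node;               // node where the original principal runs
};

// The subordinate finished its step: either return the result to the principal
// or, if the principal's node has failed, let the local copy take over.
void on_subordinate_done(const Delivery& d, const std::set<int>& alive, int local_node) {
    if (alive.count(d.principal_node)) {
        std::cout << "return result to principal on node " << d.principal_node << "\n";
    } else {
        std::cout << "node " << d.principal_node << " is down; copy of '"
                  << d.principal_copy.name << "' resumes step "
                  << d.principal_copy.step + 1 << " on node " << local_node << "\n";
    }
}

// The node holding the copy and the subordinate failed: restart the subordinate
// elsewhere; at worst one step of computation is recomputed.
void on_subordinate_node_failure(const Delivery& d, int new_node) {
    std::cout << "recompute step " << d.principal_copy.step << " ('"
              << d.subordinate.name << "') on node " << new_node << "\n";
}

int main() {
    Delivery d{{"solve-step", 0}, {"main", 3}, /*principal_node=*/1};
    std::set<int> alive{2, 3};        // suppose node 1 (the master) has failed
    on_subordinate_done(d, alive, /*local_node=*/2);
    on_subordinate_node_failure(d, /*new_node=*/3);
}

Running the example with node 1 marked as failed prints the two recovery decisions described in the text: the local copy takes over, or the step is recomputed elsewhere.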