commit 6006ab436f029ff19d3a7ad2d06398abec88ff92
parent 468a1f04d38ff033ab0790bbca325b8a4e98a685
Author: Ivan Gankevich <igankevich@ya.ru>
Date: Thu, 9 Feb 2017 11:34:52 +0300
Spell-check and revise the text.
Diffstat:
4 files changed, 68 insertions(+), 69 deletions(-)
diff --git a/src/abstract.tex b/src/abstract.tex
@@ -14,4 +14,4 @@ of data processing pipeline. We believe that the technique can be used not only
in Big Data processing but in other types of applications.
\end{abstract}
-\KEYWORD{parallel computing; Big Data processing; distributed computing; backup node; state transfer; delegation; cluster computing; fault-tolerance; high-availability}
+\KEYWORD{parallel computing; Big Data processing; distributed computing; backup node; state transfer; delegation; cluster computing; fault-tolerance; high-availability; hierarchy}
diff --git a/src/intro.tex b/src/intro.tex
@@ -22,11 +22,11 @@ either master or slave, rather than to think of a cluster as a whole with
master and slave roles being \emph{dynamically} assigned to a particular
physical machine.
-This evolution in thinking allows to implement middleware that manages master
+This evolution in thinking allows implementing middleware that manages master
and slave roles automatically and handles node failures in a generic way. This
software provides an API to distribute parallel tasks on the pool of available
-nodes and among them. Using this API one can write an application that runs on
-a cluster without knowing the exact number of online nodes. The middleware is
+nodes and among them. Using this API one can write an application that runs on a
+cluster without knowing the exact number of online nodes. The middleware is
implemented as a daemon running on each cluster node which acts as an
intermediate point of communication for distributed applications and
transparently routes application messages between operating system processes
@@ -67,7 +67,7 @@ assigned to available routers, this protocol works on top of the IPv4 and IPv6
protocols and is designed to be used by routers and reverse proxy servers. Such
servers lack the state that needs to be restored upon a failure (i.e.~there is
no job queue in web servers), so it is easier for them to provide
-high-availability. In Linux it is implemented in Keepalived routing
+high-availability. On Linux it is implemented in Keepalived routing
daemon~\citep{cassen2002keepalived}.
In contrast to web servers and HPC and Big Data job schedulers, some distributed
@@ -92,19 +92,19 @@ to propose such an environment and to describe its internal structure.
The programming model used in this work is partly based on the well-known actor
model of concurrent computation~\citep{agha1985actors,hewitt1973universal}. Our
-model borrows the concept of actor---an object that stores data and methods to
+model borrows the concept of actor --- an object that stores data and methods to
process it; this object can react to external events by either changing its
state or producing more actors. We call these objects \emph{computational
-kernels}. Their distinct feature is hierarchical dependence on parent kernel
-that created each of them, which allows to implement fault-tolerance based on
-simple restart of a failed subordinate kernel.
+ kernels}. Their distinct feature is hierarchical dependence on the parent kernel
+that created each of them, which allows implementing fault-tolerance by
+restarting a failed subordinate kernel.
However, using hierarchical dependence alone is not enough to develop
-high-availability of a master kernel---the first kernel in a parallel programme.
-To solve the problem the other part of our programming model is based on
-bulk-synchronous parallel model~\citep{valiant1990bridging}. It borrows the
-concept of superstep---a sequential step of a parallel programme; at any time a
-programme executes only one superstep, which allows to implement
+high-availability of a master kernel --- the first kernel in a parallel
+programme. To solve the problem, the other part of our programming model is based
+on the bulk-synchronous parallel model~\citep{valiant1990bridging}. It borrows the
+concept of superstep --- a sequential step of a parallel programme; at any time
+a programme executes only one superstep, which allows implementing
high-availability of the first kernel (under the assumption that it has only one
subordinate at a time) by sending it along with its subordinate to a different
cluster node, thus making a distributed copy of it. Since the first kernel has
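The kernel model revised in this hunk can be illustrated with a short C++ sketch. It is a toy under assumed names (Kernel, act, react, run_subordinate); the patch does not show the framework's real API, and a subordinate's failure is modelled here as an exception rather than a lost network connection.

    #include <exception>
    #include <iostream>

    // Toy version of a computational kernel: an object that stores data and
    // methods to process it, and that knows the kernel which created it.
    struct Kernel {
        Kernel* parent = nullptr;       // hierarchical link to the creating kernel
        virtual ~Kernel() = default;
        virtual void act() = 0;         // do local work, possibly creating subordinates
        virtual void react(Kernel&) {}  // consume the result of a returning subordinate
    };

    // Fault-tolerance by restart: the parent holds everything needed to
    // recreate a subordinate, so a failed subordinate is simply run again.
    void run_subordinate(Kernel& parent, Kernel& subordinate, int max_attempts = 3) {
        subordinate.parent = &parent;
        for (int attempt = 1; attempt <= max_attempts; ++attempt) {
            try {
                subordinate.act();          // may fail, e.g. if its node goes offline
                parent.react(subordinate);
                return;
            } catch (const std::exception& err) {
                std::cerr << "restarting subordinate after failure: " << err.what() << "\n";
            }
        }
    }

In the middleware itself the parent would learn about such a failure from the prematurely closed connection to the subordinate's node, as stated at the end of this file.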
@@ -116,7 +116,7 @@ master node failure per superstep.
To summarise, the framework developed in this paper protects a parallel
programme from failure of any number of subordinate nodes and from one failure
of a master node per superstep. The paper does not answer the question of how to
-determine if a node failed, it assumes a failure when the network connection to
-a node is prematurely closed. In general, the presented research goes in line
-with further development of the virtual supercomputer concept coined and
-evaluated in~\citep{vsc-csit2013,vsc-iccsa2014,vsc-nova}.
+determine if a node has failed; it assumes node failure when the network
+connection to a node is prematurely closed. In general, the presented research
+goes in line with further development of the virtual supercomputer concept
+coined and evaluated in~\citep{vsc-csit2013,vsc-iccsa2014,vsc-nova}.
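The failure-detection rule introduced by this hunk, that a node is assumed to have failed when its connection is closed prematurely, can be sketched as follows. The enum and function names are hypothetical; the real middleware may detect the condition differently.

    #include <cerrno>
    #include <unistd.h>

    enum class PeerState { Alive, Failed };

    // 'expected_eof' is set by the caller once the remote side has announced,
    // at the application level, that it is done sending (an assumption of this
    // sketch). Any other end-of-stream or I/O error is treated as node failure.
    PeerState check_peer(int socket_fd, bool expected_eof) {
        char buf[4096];
        ssize_t n = ::read(socket_fd, buf, sizeof(buf));
        if (n > 0) return PeerState::Alive;                       // normal traffic
        if (n == 0) return expected_eof ? PeerState::Alive         // orderly shutdown
                                        : PeerState::Failed;       // premature close
        if (errno == EINTR || errno == EAGAIN || errno == EWOULDBLOCK)
            return PeerState::Alive;                               // transient, not a failure
        return PeerState::Failed;                                  // I/O error
    }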
diff --git a/src/outro.tex b/src/outro.tex
@@ -11,7 +11,7 @@ cluster.
\subsubsection*{Acknowledgements}
The research was carried out using computational resources of Resource Centre
-``Computational Centre of Saint Petersburg State University'' (T-EDGE96
-HPC-0011828-001) within frameworks of grants of Russian Foundation for Basic
-Research (projects no. 16-07-01111, 16-07-00886, 16-07-01113) and Saint
-Petersburg State University (project no. 0.37.155.2014).
+``Computational Centre of Saint Petersburg State University'' (\mbox{T-EDGE96}
+\mbox{HPC-0011828-001}) within the framework of grants of the Russian Foundation for
+Basic Research (projects no.~\mbox{16-07-01111}, \mbox{16-07-00886},
+\mbox{16-07-01113}).
diff --git a/src/sections.tex b/src/sections.tex
@@ -19,13 +19,13 @@ but every step will (ideally) be internally parallel.
After that, the only possibility to speed up the programme is to overlap execution
of code blocks that work with different hardware devices. The most common
-pattern is to overlap computation with network I/O or disk I/O. This approach
-makes sense because all devices operate with little synchronisation, and issuing
+pattern is to overlap computation with network or disk I/O. This approach makes
+sense because all devices operate with little synchronisation, and issuing
commands in parallel makes the whole programme perform better. This behaviour
can be achieved by allocating a separate task queue for each device and
submitting tasks to these queues asynchronously with execution of the main
-thread. After this optimisation the programme will be composed of several
-steps chained into the pipeline, each step is implemented as a task queue for a
+thread. After this optimisation, the programme will be composed of several steps
+chained into a pipeline, where each step is implemented as a task queue for a
particular device.
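A minimal sketch of the per-device task queue pattern described in this hunk, assuming nothing about the paper's own scheduler: each device gets a queue served by one worker thread, and the main thread submits tasks without blocking.

    #include <condition_variable>
    #include <functional>
    #include <mutex>
    #include <queue>
    #include <thread>

    // One queue per device (disk, network, processor). The worker thread
    // executes tasks in submission order; the destructor drains the queue.
    class DeviceQueue {
    public:
        DeviceQueue() : worker_([this] { run(); }) {}
        ~DeviceQueue() {
            { std::lock_guard<std::mutex> lock(mutex_); done_ = true; }
            cv_.notify_one();
            worker_.join();
        }
        // The main thread submits a task and returns immediately.
        void submit(std::function<void()> task) {
            { std::lock_guard<std::mutex> lock(mutex_); tasks_.push(std::move(task)); }
            cv_.notify_one();
        }
    private:
        void run() {
            for (;;) {
                std::function<void()> task;
                {
                    std::unique_lock<std::mutex> lock(mutex_);
                    cv_.wait(lock, [this] { return done_ || !tasks_.empty(); });
                    if (tasks_.empty()) return;     // done and drained
                    task = std::move(tasks_.front());
                    tasks_.pop();
                }
                task();                             // runs while the main thread continues
            }
        }
        std::queue<std::function<void()>> tasks_;
        std::mutex mutex_;
        std::condition_variable cv_;
        bool done_ = false;
        std::thread worker_;                        // declared last, started after the state above
    };

A programme would create one such queue per device and chain them into the pipeline discussed in the next hunk, so that tasks generated from many input files travel through the stages in parallel.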
Pipelining of otherwise sequential steps is beneficial not only for the code
@@ -41,10 +41,10 @@ one input file (or a set of input parameters), it adds parallelism when the
programme can process multiple input files: each input generates tasks which
travel through the whole pipeline in parallel with tasks generated by other
inputs. With a pipeline an array of files is processed in parallel by the same
-set of resources allocated for a batch job. The pipeline is likely to deliver greater
-efficiency for busy HPC clusters compared to executing a separate job for each
-input file, because the time that each subsequent job spends in
-the queue is eliminated. A diagram of computational pipeline is presented in
+set of resources allocated for a batch job. The pipeline is likely to deliver
+greater efficiency for busy HPC clusters compared to executing a separate job
+for each input file, because the time that each subsequent job spends in the
+queue is eliminated. A diagram of a pipeline is presented in
fig.~\ref{fig:pipeline}.
\begin{figure}
@@ -70,39 +70,38 @@ maximise efficiency of a programme:
existence of the kernel.
\item A kernel is a \emph{cooperative routine}, which is submitted to the kernel
- pool upon the call and is executed asynchronously by the scheduler. There can
- be any number of calls to other subroutines inside routine body. Every call
- submits corresponding subroutine to the kernel pool and returns immediately.
- Kernels in the pool can be executed in any order; this fact is used by the
- scheduler to exploit parallelism offered by the computer by distributing
- kernels from the pool across available cluster nodes and processor cores.
+ pool upon the call and is executed asynchronously by the scheduler. There can
+ be any number of calls to other subroutines inside the routine body. Every call
+ submits the corresponding subroutine to the kernel pool and returns immediately.
+ Kernels in the pool can be executed in any order; this fact is used by the
+ scheduler to exploit parallelism offered by the computer by distributing
+ kernels from the pool across available cluster nodes and processor cores.
\item Asynchronous execution prevents the use of explicit synchronisation after
- the call to subroutine is made; the system scheduler returns the control flow to
- the routine each time one of its subroutine returns. Such cooperation
- transforms each routine which calls subroutines into an event handler, where
- each event is a subroutine and the handler is the routine that called them.
-
-\item The routine may communicate with any number of local kernels, whose addresses
- it knows; communication with kernels which are not adjacent in the
- call stack complexifies the control flow and the call stack looses its tree shape.
- Only the programme logic may guarantee the presence of communicating kernels in
- memory. One way to ensure this is to perform communication between
- subroutines which are called from the same routine. Since such
- communication is possible within the hierarchy through the parent routine, it may
- be treated as an optimisation that eliminates the overhead of transferring data
- over an intermediate node. The situation is different for interactive or
- event-based programmes (e.g. servers and programmes with graphical
- interface) in which this is primary type of communication.
-
-\item In addition to this, communication which does not occur along
- hierarchical links and is executed over the cluster network complexifies the design of
- resiliency algorithms. Since it is impossible to ensure that a kernel
- resides in memory of a neighbour node, a node may fail in the
- middle of its execution of the corresponding routine. As a result, upon
- a failure of a routine all of its subroutines must be restarted. This
- encourages a programmer to construct
- \begin{itemize}
+ the call to a subroutine is made; the system scheduler returns the control flow
+ to the routine each time one of its subroutines returns. Such cooperation
+ transforms each routine which calls subroutines into an event handler, where
+ each event is a subroutine and the handler is the routine that called them.
+
+\item The routine may communicate with any number of local kernels, whose
+ addresses it knows; communication with kernels which are not adjacent in the
+ call stack complicates the control flow and the call stack loses its tree
+ shape. Only the programme logic may guarantee the presence of communicating
+ kernels in memory. One way to ensure this is to perform communication between
+ subroutines which are called from the same routine. Since such communication
+ is possible within the hierarchy through the parent routine, it may be treated
+ as an optimisation that eliminates the overhead of transferring data over an
+ intermediate node. The situation is different for interactive or event-based
+ programmes (e.g. servers and programmes with a graphical interface) in which
+ this is the primary type of communication.
+
+\item In addition to this, communication which does not occur along hierarchical
+ links and is executed over the cluster network complicates the design of
+ resiliency algorithms. Since it is impossible to ensure that a kernel resides
+ in the memory of a neighbour node, a node may fail in the middle of its execution
+ of the corresponding routine. As a result, upon a failure of a routine all of
+ its subroutines must be restarted. This encourages a programmer to construct
+ \begin{itemize}
\item deep tree hierarchies of tightly-coupled kernels (which communicate
on the same level of hierarchy) to reduce overhead of recomputation;
\item fat tree hierarchies of loosely-coupled kernels, providing maximal
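The cooperative-routine behaviour listed in this hunk can be sketched with a toy single-threaded scheduler. The names Routine, Scheduler, call and react are placeholders; a real implementation would distribute the kernels in the pool across cluster nodes and processor cores rather than run them FIFO in one thread.

    #include <deque>
    #include <memory>

    // A routine reacts to each returning subroutine instead of waiting for it.
    struct Routine {
        Routine* parent = nullptr;
        virtual ~Routine() = default;
        virtual void act() = 0;          // body of the routine; may call subroutines
        virtual void react(Routine&) {}  // event handler for a returning subroutine
    };

    class Scheduler {
    public:
        // A call submits the subroutine to the pool and returns immediately;
        // no explicit synchronisation follows the call.
        void call(Routine* parent, std::unique_ptr<Routine> child) {
            child->parent = parent;
            pool_.push_back(std::move(child));
        }
        // Kernels in the pool may be executed in any order; this sketch simply
        // runs them in submission order and hands control back to the caller
        // through react(), turning the caller into an event handler.
        void run() {
            while (!pool_.empty()) {
                std::unique_ptr<Routine> r = std::move(pool_.front());
                pool_.pop_front();
                r->act();
                if (r->parent) r->parent->react(*r);
            }
        }
    private:
        std::deque<std::unique_ptr<Routine>> pool_;  // parents must outlive their children
    };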
@@ -119,13 +118,13 @@ routines and event handlers.
\subsection{Fail over model}
Although fault-tolerance and high-availability are different terms, in essence
-they describe the same property---an ability of a system to switch processing
+they describe the same property --- an ability of a system to switch processing
from a failed component to its live spare or backup component. In the case of
fault-tolerance, it is the ability to switch from a failed slave node to a spare
one, i.e. to repeat the computation step on a healthy slave node. In the case of
high-availability, it is the ability to switch from a failed master node to a
-backup node with full restoration of execution state. These are the core
-abilities that constitute distributed system's ability to \emph{fail over}.
+backup node with full recovery of the execution state. These are the core abilities
+that constitute a distributed system's ability to \emph{fail over}.
The key feature that is missing in the current parallel programming and big
data processing technologies is the possibility to specify hierarchical
@@ -244,7 +243,7 @@ it is rather sequentially launches execution steps one by one, so it has only
one subordinate at a time. Such behaviour is described by the bulk-synchronous
parallel programming model, in the framework of which a programme consists of
sequential supersteps which are internally
-parallel~\citep{valiant1990bridging}. Keeping this in mind, we can simplify
+parallel~\citep{valiant1990bridging}. Keeping this in mind, we can simplify
synchronisation of its state: we can send the first kernel along with its
subordinate to the subordinate node. When the node with the first kernel fails,
its copy receives its subordinate, and no execution time is lost. When the node
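The master-copy rule described in this hunk can be shown as a self-contained toy simulation. The types Node and FirstKernel are invented for the example and stand in for whatever state the middleware actually serialises and transfers.

    #include <cassert>
    #include <optional>
    #include <string>

    struct FirstKernel { int current_step; std::string state; };

    struct Node {
        std::optional<FirstKernel> master;  // present on the master node
        std::optional<FirstKernel> copy;    // present on the subordinate's node
        bool alive = true;
    };

    // The first kernel is sent along with its only subordinate, so the
    // subordinate's node always holds a distributed copy of the master state.
    void start_superstep(Node& master_node, Node& subordinate_node) {
        subordinate_node.copy = master_node.master;
    }

    // If the master node fails, the copy takes over and the superstep continues.
    void handle_failure(Node& master_node, Node& subordinate_node) {
        if (!master_node.alive && subordinate_node.copy) {
            subordinate_node.master = subordinate_node.copy;   // promotion
            subordinate_node.copy.reset();
        }
    }

    int main() {
        Node a, b;
        a.master = FirstKernel{3, "accumulated results"};
        start_superstep(a, b);   // subordinate sent to node b together with a master copy
        a.alive = false;         // master node fails mid-superstep
        handle_failure(a, b);
        assert(b.master && b.master->current_step == 3);  // no execution time is lost
    }

Since the first kernel has only one subordinate at a time, keeping a single copy in this way is enough to survive one master-node failure per superstep.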
@@ -289,11 +288,11 @@ listed in Table~\ref{tab:ndbc-dataset}.
\caption{NDBC dataset properties.}
\begin{tabular}{ll}
\toprule
- Dataset size & 144MB \\
+ Dataset size & 144MB \\
Dataset size (uncompressed) & 770MB \\
- No. of wave stations & 24 \\
- Time span & 3 years (2010--2012) \\
- Total no. of spectra & 445422 \\
+ No. of wave stations & 24 \\
+ Time span & 3 years (2010--2012) \\
+ Total no. of spectra & 445422 \\
\bottomrule
\end{tabular}
\label{tab:ndbc-dataset}