commit 316169750284c7a011268d3e7e4eb4d3d6335fbb
parent a44a05b991acd6e1ca6416e75acefc107f638ce1
Author: Vladimir Korkhov <vkorkhov@gmail.com>
Date: Fri, 3 Feb 2017 22:07:43 +0300
Minor changes
Diffstat:
2 files changed, 33 insertions(+), 33 deletions(-)
diff --git a/singlecol-new.cls b/singlecol-new.cls
@@ -324,7 +324,7 @@
\def\ISSUE#1{\gdef\theISSUE{#1}}%
\ISSUE{00}%
-\JOURNALNAME{\TEN{\it Int. J. Electric and Hybrid Vehicles, Vol.
+\JOURNALNAME{\TEN{\it Int. J. of Business Intelligence and Data Mining, Vol.
\theVOL, No. \theISSUE, \thePAGES}\hfill \thepage}%
\def\jtitlefont{\fontsize{16}{22}\selectfont\rm}
diff --git a/src/sections.tex b/src/sections.tex
@@ -24,12 +24,12 @@ makes sense because all devices operate with little synchronisation, and issuing
commands in parallel makes the whole programme perform better. This behaviour
can be achieved by allocating a separate task queue for each device and
submitting tasks to these queues asynchronously with execution of the main
-thread. So, after this optimisation, the programme will be composed of several
+thread. After this optimisation the programme will be composed of several
steps chained into the pipeline, each step is implemented as a task queue for a
particular device.
-Pipelining of otherwise sequential steps is beneficial not only for code
-accessing different devices, but for code different branches of which are
+Pipelining of otherwise sequential steps is beneficial not only for the code
+accessing different devices, but also for code whose different branches are
suitable for execution by multiple hardware threads of the same core, i.e.
branches accessing different regions of memory or performing mixed arithmetic
(floating point and integer). In other words, code branches which use different
@@ -41,10 +41,10 @@ one input file (or a set of input parameters), it adds parallelism when the
programme can process multiple input files: each input generates tasks which
travel through the whole pipeline in parallel with tasks generated by other
inputs. With a pipeline an array of files is processed in parallel by the same
-set of resources allocated for a batch job, and possibly with greater
+set of resources allocated for a batch job. The pipeline is likely to deliver greater
efficiency for busy HPC clusters compared to executing a separate job for each
-input file, because the time that each subsequent job after the first spends in
-a queue is eliminated. A diagram of computational pipeline is presented in
+input file, because the time that each subsequent job spends in
+the queue is eliminated. A diagram of the computational pipeline is presented in
fig.~\ref{fig:pipeline}.
\begin{figure}
@@ -60,8 +60,8 @@ This model is the basis of the fault-tolerance model developed here.
\subsection{Programming model principles}
-Data processing pipeline model is based on the following principles, following
-which maximises efficiency of a programme.
+The data processing pipeline model is based on the following principles that
+maximise efficiency of a programme:
\begin{itemize}
\item There is no notion of a message in the model, a kernel is itself a
@@ -69,38 +69,38 @@ which maximises efficiency of a programme.
any kernel on the local node. Only programme logic may guarantee the
existence of the kernel.
-\item A kernel is a \emph{cooperative routine}, which is submitted to kernel
- pool upon the call and is executed asynchronously by a scheduler. There can
+\item A kernel is a \emph{cooperative routine}, which is submitted to the kernel
+ pool upon the call and is executed asynchronously by the scheduler. There can
be any number of calls to other subroutines inside routine body. Every call
- submits corresponding subroutine to kernel pool and returns immediately.
- Kernels in the pool can be executed in any order; this fact is used by a
+ submits the corresponding subroutine to the kernel pool and returns immediately.
+ Kernels in the pool can be executed in any order; this fact is used by the
scheduler to exploit parallelism offered by the computer by distributing
kernels from the pool across available cluster nodes and processor cores.
\item Asynchronous execution prevents the use of explicit synchronisation after
- the call to subroutine is made; system scheduler returns control flow to
+ the call to a subroutine is made; the system scheduler returns control flow to
the routine each time one of its subroutine returns. Such cooperation
- transforms each routine which calls subroutines into event handler, where
+ transforms each routine which calls subroutines into an event handler, where
each event is a subroutine and the handler is the routine that called them.
-\item The routine may communicate with any number of local kernels, addresses
- of which it knows; communication with kernels which are not adjacent in the
- call stack complexifies control flow and call stack looses its tree shape.
- Only programme logic may guarantee presence of communicating kernels in
+\item The routine may communicate with any number of local kernels, whose addresses
+ it knows; communication with kernels which are not adjacent in the
+ call stack complexifies the control flow and the call stack loses its tree shape.
+ Only the programme logic may guarantee the presence of communicating kernels in
memory. One way to ensure this is to perform communication between
subroutines which are called from the same routine. Since such
- communication is possible within hierarchy through parent routine, it may
- treated as an optimisation that eliminates overhead of transferring data
- over intermediate node. The situation is different for interactive or
+ communication is possible within the hierarchy through the parent routine, it may
+ be treated as an optimisation that eliminates the overhead of transferring data
+ over an intermediate node. The situation is different for interactive or
event-based programmes (e.g. servers and programmes with graphical
interface) in which this is primary type of communication.
\item In addition to this, communication which does not occur along
- hierarchical links and executed over cluster network complexify design of
+ hierarchical links and is executed over the cluster network complexifies the design of
resiliency algorithms. Since it is impossible to ensure that a kernel
- resides in memory of a neighbour node, because a node may fail in the
+ resides in memory of a neighbour node, a node may fail in the
middle of its execution of the corresponding routine. As a result, upon
- failure of a routine all of its subroutines must be restarted. This
+ a failure of a routine all of its subroutines must be restarted. This
encourages a programmer to construct
\begin{itemize}
\item deep tree hierarchies of tightly-coupled kernels (which communicate
@@ -108,17 +108,17 @@ which maximises efficiency of a programme.
\item fat tree hierarchies of loosely-coupled kernels, providing maximal
degree of parallelism.
\end{itemize}
- Deep hierarchy is not only requirement of technology, it helps optimise
+ A deep hierarchy is not only a requirement of the technology; it also helps optimise
communication of large number of cluster nodes reducing it to
communication of adjacent nodes.
\end{itemize}
-So, control flow objects (or kernels) possess properties of both cooperative
+Thus, control flow objects (or kernels) possess properties of both cooperative
routines and event handlers.
\subsection{Fail over model}
-Although, fault-tolerance and high-availability are different terms, in essence
+Although fault-tolerance and high-availability are different terms, in essence
they describe the same property---an ability of a system to switch processing
from a failed component to its live spare or backup component. In case of
fault-tolerance it is the ability to switch from a failed slave node to a spare
@@ -171,11 +171,11 @@ However, this approach does not come without disadvantages: scalability of a
big data application is limited by the strategy that was employed to distribute
its input files across cluster nodes. The more nodes used to store input files,
the more read performance is achieved. The advantage of our approach is that
-the I/O performance is more predictable, than one of hybrid approach with
+the I/O performance is more predictable than that of the hybrid approach with
streaming files over the network.
-The main purpose of the model is to simplify development of distributed batch
-processing applications and middleware. The main focus is to make application
+The main purpose of the model is to simplify the development of distributed batch
+processing applications and middleware. The main focus is to make an application
resilient to failures, i.e. make it fault tolerant and highly available, and do
it transparently to a programmer. The implementation is divided into two
layers: the lower layer consists of routines and classes for single node
@@ -187,7 +187,7 @@ programme.
Kernels implement control flow logic in theirs \texttt{act} and \texttt{react}
methods and store the state of the current control flow branch. Both logic and
-state are implemented by a programmer. In \texttt{act} method some function is
+state are implemented by a programmer. In the \texttt{act} method some function is
either directly computed or decomposed into nested functions (represented by a
set of subordinate kernels) which are subsequently sent to a pipeline. In
\texttt{react} method subordinate kernels that returned from the pipeline are
@@ -251,7 +251,7 @@ its copy receives its subordinate, and no execution time is lost. When the node
with its copy fails, its subordinate is rescheduled on some other node, and in
the worst case a whole step of computation is lost.
-Described approach works only for kernels that do not have a parent and have
+The described approach works only for kernels that do not have a parent and have
only one subordinate at a time, and act similar to manually triggered
checkpoints. The advantage is that they
\begin{itemize}