hpcs-17-subord

git clone https://git.igankevich.com/hpcs-17-subord.git

commit 6d0d2acb699b161f08340d31adcc7dfb02cdf18b
parent 7094e26af262a736fef577a53dc56bdefc0bb4df
Author: Ivan Gankevich <igankevich@ya.ru>
Date:   Tue, 21 Feb 2017 18:34:44 +0300

Spell check.

Diffstat:
src/body.tex | 16 +++++++++-------
1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/src/body.tex b/src/body.tex
@@ -172,17 +172,17 @@ gap.
 
 Proposed node failure handling approach was evaluated on the example of
 real-world application. The application generates ocean wavy surface in
-parallel with speficied frequency-directional spectrum. There are two
+parallel with specified frequency-directional spectrum. There are two
 sequential steps in the programme. The first step is to compute model
 coefficients by solving system of linear algebraic equations. The system is
 solved in parallel on all cores of the principal node. The second step is to
 generate wavy surface, parts of which are generated in parallel on all cluster
 nodes including the principal one. All generated parts are written in parallel
 to individual files. So, from computational point of view the programme is
-embarassingly parallel with little synchronisation between concurrent
+embarrassingly parallel with little synchronisation between concurrent
 processes.
 
-All exprements were run on physical computer cluster consisting of 12 nodes.
+All experiments were run on physical computer cluster consisting of 12 nodes.
 Wavy ocean surface parts were written to local file system to eliminate
 overhead of parallel writing to NFS from the total application performance.
 The input data was read from Network File System (NFS) mounted on each node, as
@@ -206,14 +206,16 @@ Table~\ref{tab:platform-configuration}.
 
 The first failure scenario (see Section~\ref{sec:failure-scenarios}) was
 evaluated in the following experiment. At the beginning of the second
-sequential application step all parallel applciation processes except one were
+sequential application step all parallel application processes except one were
 shutdown with a small delay to give principal kernel time to distribute its
 subordinates between cluster nodes. The experiment was repeated 12 times with a
 different surviving process each time. For each run total application running
 time was measured and compared to each other. The result of the experiment is
-the overhead of failure of a specific kernel in the hierarchy (the overhead of
-recovering from failure of a principal kernel is different from the one of a
-subordinate kernel).
+the overhead of recovery from a failure of a specific kernel in the hierarchy
+(the overhead of recovering from failure of a principal kernel is different
+from the one of a subordinate kernel).
+
+
 \section{Discussion}