hpcs-17-subord

git clone https://git.igankevich.com/hpcs-17-subord.git

commit 2845ed6c74849095e3a0f84b7738d7350d409713
parent 4cc5ce8cc5dd1ba81758df94b1d5ecc17c049dc6
Author: Ivan Gankevich <igankevich@ya.ru>
Date:   Tue, 21 Feb 2017 18:29:36 +0300

Describe the experiment.

Diffstat:
 src/body.tex | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/src/body.tex b/src/body.tex
@@ -185,7 +185,7 @@ processes.
 All exprements were run on physical computer cluster consisting of 12 nodes.
 Wavy ocean surface parts were written to local file system to eliminate overhead
 of parallel writing to NFS from the total application performance.
-The input data was read from Network file system (NFS) mounted on each node, as
+The input data was read from Network File System (NFS) mounted on each node, as
 it is much smaller in size than the output data and does not cause big overhead.
 Platform configuration is presented in
 Table~\ref{tab:platform-configuration}.
@@ -204,7 +204,16 @@ Table~\ref{tab:platform-configuration}.
 \end{tabular}
 \end{table}
-
+The first failure scenario (see Section~\ref{sec:failure-scenarios}) was
+evaluated in the following experiment. At the beginning of the second
+sequential application step all parallel applciation processes except one were
+shutdown with a small delay to give principal kernel time to distribute its
+subordinates between cluster nodes. The experiment was repeated 12 times with a
+different surviving process each time. For each run total application running
+time was measured and compared to each other. The result of the experiment is
+the overhead of failure of a specific kernel in the hierarchy (the overhead of
+recovering from failure of a principal kernel is different from the one of a
+subordinate kernel).
 \section{Discussion}
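
Note on the experiment described in the added paragraph: the procedure (stop every process except one, repeat once per surviving process, compare the running times to each other) can be illustrated with a minimal Python sketch. It is not part of this commit or of the authors' code. The function names, the zero-based rank numbering, and the choice to compare each run against the fastest one are assumptions made only for illustration. The cluster size of 12 is taken from the text.

```python
from typing import Dict, List

NUM_NODES = 12  # cluster size stated in the text


def processes_to_stop(survivor: int, num_nodes: int = NUM_NODES) -> List[int]:
    """Hypothetical helper: ranks shut down when `survivor` is the only process left."""
    if not 0 <= survivor < num_nodes:
        raise ValueError("survivor rank out of range")
    return [rank for rank in range(num_nodes) if rank != survivor]


def relative_slowdown(run_times: Dict[int, float]) -> Dict[int, float]:
    """Hypothetical comparison: each run time divided by the fastest run time.

    `run_times` maps the surviving rank of a run to its measured total
    running time. The text only says the times are compared to each other;
    normalising by the fastest run is an assumption of this sketch.
    """
    if not run_times:
        raise ValueError("no run times given")
    fastest = min(run_times.values())
    return {survivor: t / fastest for survivor, t in run_times.items()}
```

One run per value of `survivor` in `range(NUM_NODES)` gives the 12 repetitions mentioned in the text. The measured times themselves have to come from the real runs and are not supplied here.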