hpcs-17-subord

git clone https://git.igankevich.com/hpcs-17-subord.git

commit 587fe910ffb26b2660be4d05dcde39b8e3d7c0db
parent 89ac3e3eab52ed93e86a271db9ff1ac4cf9494f9
Author: Ivan Gankevich <igankevich@ya.ru>
Date:   Wed, 22 Mar 2017 10:53:50 +0300

Clarify experiment description.

Diffstat:
src/body.tex | 31++++++++++++++-----------------
1 file changed, 14 insertions(+), 17 deletions(-)

diff --git a/src/body.tex b/src/body.tex
@@ -215,7 +215,11 @@ All experiments were run on physical computer cluster consisting of 12 nodes.
 Wavy ocean surface parts were written to Network File System (NFS) which is
 located on a dedicated server. The input data was read from NFS mounted on
 each node, as it is small in size and does not cause big overhead. Platform
-configuration is presented in Table~\ref{tab:platform-configuration}.
+configuration is presented in Table~\ref{tab:platform-configuration}. A dry run
+of each experiment~--- a run in which all expensive computations (wavy surface
+generation and coefficient computation) were disabled, but memory allocations
+and communication between processes were retained~--- was performed on the
+virtual cluster.
 
 \begin{table}
   \centering
@@ -246,21 +250,14 @@ the hierarchy, which should be different for principal and subordinate kernel.
 
 In the second experiment we compared the time to generate ocean wavy surface
 without process failures and with/without failure handling code in the
-programme. This test was repeated for different number of cluster nodes. Apart
-from physical cluster the test was run on virtual cluster with a large number
-of nodes all launched on the same physical node. Since only one physical node
-is used for the virtual cluster, only a dry run of the programme was performed:
-all expensive computations (wavy surface generation and coefficient
-computation) were disabled to reduce the load on the node, but memory
-allocations and communication between processes were retained. The purpose of
-the experiment is to investigate how failure handling overhead affects
-scalability of the application to a large number of nodes.
+programme. This test was repeated for different number of cluster nodes. The
+purpose of the experiment is to investigate how failure handling overhead
+affects scalability of the application to a large number of nodes.
 
 In the final experiment we benchmarked overhead of the multiple node failure
 handling code by instrumenting it with calls to time measuring routines. For
 this experiment all logging and output was disabled to exclude its time from the
-measurements. A dry run was performed on virtual cluster and real run on the
-physical cluster. The purpose of the experiment is to complement results of the
+measurements. The purpose of the experiment is to complement results of the
 previous one with precisely measured overhead of multiple node failure handling
 code.
 
@@ -294,7 +291,7 @@ executed on one of the remaining nodes.
     nodes.\label{fig:test-1}}
 \end{figure}
 
-
+% TODO insert virtual version of the first experiment
 
 \begin{figure}
   \centering
@@ -338,10 +335,10 @@ executed on one of the remaining nodes.
 \begin{figure}
   \centering
   \includegraphics{test-3-dryrun-virt-overhead-ndebug-226}
-  \caption{Application running time with failure handling code for
-    different number of virtual cluster nodes (dry
-    run, only overhead was measured, no debug
-    output, cluster 226).\label{fig:test-3-dryrun-virt-overhead-ndebug-226}}
+  \caption{Application running time with failure handling code for different
+    number of virtual cluster nodes (dry run, only overhead was measured, no
+    debug output, cluster
+    226).\label{fig:test-3-dryrun-virt-overhead-ndebug-226}}
 \end{figure}
 
 \section{Discussion}
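
The final-experiment paragraph in the hunk above refers to instrumenting the
failure handling code with calls to time measuring routines. The snippet below
is a minimal sketch of that kind of wall-clock instrumentation using standard
C++ <chrono>; the function name handle_node_failure is a hypothetical
placeholder and is not taken from this repository's sources.

    // Minimal sketch of wall-clock instrumentation around a code path,
    // assuming standard C++ <chrono>. The function below is a hypothetical
    // stand-in for the code path whose overhead is measured.
    #include <chrono>
    #include <iostream>

    void handle_node_failure() {
        // placeholder for the instrumented code path
    }

    int main() {
        using clock_type = std::chrono::steady_clock;
        const auto t0 = clock_type::now();
        handle_node_failure();
        const auto t1 = clock_type::now();
        const auto us =
            std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0)
                .count();
        // all other logging is disabled during the experiment, so only the
        // measurement itself is printed
        std::cout << "failure handling overhead: " << us << " us\n";
        return 0;
    }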