git clone https://git.igankevich.com/hpcs-17-subord.git

commit 6cbd9920b3f6a0c112e99881932f818fa7ddb19e
parent f0c69d7316432e0e71f5e9547ea280a952234c7a
Author: Ivan Gankevich <igankevich@ya.ru>
Date:   Tue, 21 Mar 2017 18:14:18 +0300

Describe results of the first experiment.

src/body.tex | 14 ++++++++++++++
1 file changed, 14 insertions(+), 0 deletions(-)

diff --git a/src/body.tex b/src/body.tex
@@ -227,6 +227,7 @@ configuration is presented in Table~\ref{tab:platform-configuration}.
     HDD & ST3250310NS, 7200rpm \\
     No. of nodes & 12 \\
     No. of CPU cores per node & 8 \\
+    Interconnect & 100Mbit ethernet \\
     \bottomrule
   \end{tabular}
 \end{table}
@@ -252,6 +253,19 @@ of the application to a large number of nodes.
 
 \section{Results}
 
+The first experiment showed that in terms of performance there are three
+possible outcomes when all nodes except one fail. The first case is failure of
+all kernels except the principal and its first subordinate. There is no
+communication with other nodes to find the survivor, so it takes the least time
+to recover from the failure. The second case is failure of all kernels except
+any subordinate kernel other than the first one. Here the survivor tries to
+communicate with all subordinates that were created before the survivor, so the
+overhead of recovery is larger. The third case is failure of all kernels except
+the last subordinate. Here performance is different only in the test
+environment, because this is the node where output data and logs are gathered.
+So, the overhead is smaller, because there is no communication over the network
+for storing output.
+
 \begin{figure}
   \centering
   \includegraphics{test-1}
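
Note: the three recovery cases described in the added paragraph can be illustrated with a small model. This is only a sketch and not code from the repository. The function names `peers_to_contact` and `stores_output_locally`, the 1-based numbering of subordinates in creation order, and the assumption that the survivor contacts each earlier subordinate exactly once are all hypothetical.

```python
def peers_to_contact(survivor: int, total: int) -> list[int]:
    """Return the subordinates a surviving subordinate would try to contact.

    Hypothetical model: subordinates are numbered 1..total in creation
    order, and the survivor probes every subordinate created before it.
    """
    if not 1 <= survivor <= total:
        raise ValueError("survivor must be in 1..total")
    # The first subordinate has no predecessors, so the list is empty.
    return list(range(1, survivor))


def stores_output_locally(survivor: int, total: int) -> bool:
    """Assumed test setup: the last subordinate's node gathers output and logs."""
    return survivor == total


if __name__ == "__main__":
    total = 12  # number of nodes in the platform table, one subordinate per node assumed
    for survivor in (1, 5, total):
        print(survivor,
              len(peers_to_contact(survivor, total)),
              stores_output_locally(survivor, total))
```

Under these assumptions the first subordinate contacts nobody (case one), a middle subordinate contacts every earlier one (case two), and the last subordinate contacts the most peers but writes its output without crossing the network (case three).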