commit 6cbd9920b3f6a0c112e99881932f818fa7ddb19e
parent f0c69d7316432e0e71f5e9547ea280a952234c7a
Author: Ivan Gankevich <igankevich@ya.ru>
Date: Tue, 21 Mar 2017 18:14:18 +0300
Describe results of the first experiment.
Diffstat:
1 file changed, 14 insertions(+), 0 deletions(-)
diff --git a/src/body.tex b/src/body.tex
@@ -227,6 +227,7 @@ configuration is presented in Table~\ref{tab:platform-configuration}.
HDD & ST3250310NS, 7200rpm \\
No. of nodes & 12 \\
No. of CPU cores per node & 8 \\
+ Interconnect & 100 Mbit/s Ethernet \\
\bottomrule
\end{tabular}
\end{table}
@@ -252,6 +253,19 @@ of the application to a large number of nodes.
\section{Results}
+The first experiment showed that in terms of performance there are three
+possible outcomes when all nodes except one fail. The first case is failure of
+all kernels except the principal and its first subordinate. There is no
+communication with other nodes to find the survivor, so it takes the least time
+to recover from the failure. The second case is failure of all kernels except
+any subordinate kernel other than the first one. Here the survivor tries to
+communicate with all subordinates that were created before the survivor, so the
+overhead of recovery is larger. The third case is failure of all kernels except
+the last subordinate. Here performance differs only in the test environment,
+because the survivor runs on the node where output data and logs are gathered.
+Hence the overhead is smaller: no output has to be sent over the network to be
+stored.
+
\begin{figure}
\centering
\includegraphics{test-1}