\section{RESULTS}

The Factory framework is evaluated on a physical cluster (Table~\ref{tab:cluster}) using the hydrodynamics HPC application developed in~\cite{autoreg-stab,autoreg2011csit,autoreg1,autoreg2} as an example. This programme generates a wavy ocean surface using the ARMA model; its output is a set of files representing different parts of the realisation. From a computer scientist's point of view the application consists of a series of filters, each applied to the result of the previous one. Some of the filters are parallel, so the programme is written as a sequence of big steps, some of which are internally parallel to get better performance. In the programme only the most compute-intensive step (the surface generation) is executed in parallel across all cluster nodes; the other steps are executed in parallel across all cores of the master node.

\begin{table}
\centering
\caption{Test platform configuration.}
\begin{tabular}{ll}
\toprule
CPU & Intel Xeon E5440, 2.83\,GHz \\
RAM & 4\,GB \\
HDD & ST3250310NS, 7200\,rpm \\
No. of nodes & 12 \\
No. of CPU cores per node & 8 \\
\bottomrule
\end{tabular}
\label{tab:cluster}
\end{table}

The application was rewritten for the new version of the framework, which required only slight modifications to handle the failure of the node running the first kernel: the kernel was flagged so that the framework makes a replica and sends it to some subordinate node. There were no code changes other than modifying some parts to match the new API. So, the tree hierarchy of kernels is a mostly non-intrusive model for providing fault tolerance; it demands only explicit marking of replicated kernels.
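A minimal sketch of this change is given below, assuming a simplified kernel API; the names \texttt{Kernel}, \texttt{carries\_parent} and \texttt{Main\_kernel} are illustrative and do not correspond to the framework's actual interface.
\begin{verbatim}
// Illustrative sketch only: Kernel, carries_parent and Main_kernel
// are placeholder names, not the framework's actual API.
struct Kernel {
    // When set, the framework creates a replica of this kernel and
    // sends it to a subordinate (backup) node.
    bool carries_parent = false;
    virtual ~Kernel() = default;
    virtual void act() {}
};

struct Main_kernel: public Kernel {
    Main_kernel() {
        // The only fault-tolerance-specific change to the
        // application: flag the first kernel for replication.
        carries_parent = true;
    }
    void act() override {
        // launch the surface generation subordinates as before
    }
};
\end{verbatim}
Keeping replication opt-in leaves the choice of which kernels to protect, and hence the storage and network overhead, to the programmer.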
In a series of experiments we benchmarked the performance of the new version of the application in the presence of different types of failures (the numbers correspond to the graphs in Figure~\ref{fig:benchmark}):
\begin{enumerate}
\item no failures,
\item failure of a slave node (a node where a part of the wavy surface is generated),
\item failure of a master node (the node where the first kernel is run),
\item failure of a backup node (a node where a copy of the first kernel is stored).
\end{enumerate}
A tree hierarchy with a fan-out value of 64 was chosen so that all cluster nodes connect directly to the first one. In each run the first kernel was launched on a different node to make the mapping of the kernel hierarchy to the tree hierarchy optimal. A victim node was taken offline a fixed amount of time after the programme start, equal to approximately $1/3$ of the total run time without failures on a single node. All relevant parameters are summarised in Table~\ref{tab:benchmark} (here ``root'' and ``leaf'' refer to a node's position in the tree hierarchy). The results of these runs were compared to the run without node failures (Figures~\ref{fig:benchmark} and~\ref{fig:slowdown}).

There is a considerable difference in net performance for different types of failures. Graphs 2 and 3 in Figure~\ref{fig:benchmark} show that performance in the case of a master or slave node failure is the same. In the case of a master node failure, the backup node stores a copy of the first kernel and uses this copy when it fails to connect to the master node. In the case of a slave node failure, the master node redistributes the load across the remaining slave nodes. In both cases the execution state is not lost and no time is spent restoring it, which is why the performance is the same. Graph 4 in Figure~\ref{fig:benchmark} shows that performance in the case of a backup node failure is much lower. This happens because the master node stores only the current step of the computation plus some additional fixed amount of data, whereas the backup node not only stores a copy of this information but also executes this step in parallel with the other subordinate nodes. So, when the backup node fails, the master node executes the whole step once again on an arbitrarily chosen healthy node.

\begin{table}
\centering
\caption{Benchmark parameters.}
\begin{tabular}{llll}
\toprule
Experiment no. & Master node & Victim node & Time to offline, s \\
\midrule
1 & root & & \\
2 & root & leaf & 10 \\
3 & leaf & leaf & 10 \\
4 & leaf & root & 10 \\
\bottomrule
\end{tabular}
\label{tab:benchmark}
\end{table}

Finally, to measure how much time is lost due to a failure, we divide the total execution time with a failure by the total execution time without the failure but with one node less:
\[
S(n) = \frac{T_{\mathrm{failure}}(n)}{T_{\mathrm{no\ failure}}(n-1)},
\]
where $n$ is the number of nodes. The results of this calculation are obtained from the same benchmark and are presented in Figure~\ref{fig:slowdown}. The difference in performance in the case of master and slave node failures lies within a 5\% margin, and in the case of a backup node failure within a 50\% margin for fewer than~6 nodes\footnote{Measuring this margin for a larger number of nodes does not make sense, since the time before the failure is greater than the total execution time on that many nodes, and the programme finishes before the failure occurs.}. A 50\% increase in execution time is more than the $1/3$ of the execution time at which the failure is injected, because a backup node failure takes some time to be discovered: it is detected only when the subordinate kernel carrying the copy of the first kernel finishes its execution and tries to reach its parent (a minimal sketch of this check is given at the end of the section). Instant detection would require abruptly stopping the subordinate kernel, which may be undesirable for programmes with complicated logic.

\begin{figure}
\centering
\includegraphics{factory-3000}
\caption{Performance of the hydrodynamics HPC application in the presence of node failures.}
\label{fig:benchmark}
\end{figure}

To summarise, the benchmark showed that \emph{no matter whether a master or a slave node fails, the resulting performance roughly equals the performance without failures on one node less}; however, when a backup node fails, the performance penalty is much higher.

\begin{figure}
\centering
\includegraphics{slowdown-3000}
\caption{Slowdown of the hydrodynamics HPC application in the presence of different types of node failures, compared to execution without failures on one node less.}
\label{fig:slowdown}
\end{figure}
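The delayed discovery of a backup node failure can be illustrated with the following sketch, which shows the point at which the failure becomes visible to a subordinate kernel; \texttt{try\_send\_to\_parent} and the other names are placeholders, not the framework's actual interface.
\begin{verbatim}
// Illustrative sketch only: all names are placeholders, not the
// framework's actual API.
#include <memory>

struct Kernel { virtual ~Kernel() = default; };

// Returns false when the parent node is unreachable.
bool try_send_to_parent(Kernel&) { return false; }

void on_finished(Kernel& subordinate,
                 std::unique_ptr<Kernel>& parent_copy) {
    // A failure of the parent node becomes visible only here, when
    // the finished subordinate kernel tries to reach its parent.
    if (!try_send_to_parent(subordinate) && parent_copy) {
        // The parent is unreachable: recover from the locally stored
        // copy of the first kernel, repeating the lost step on a
        // healthy node.
        // resume(std::move(parent_copy));
    }
}
\end{verbatim}
Until this point the failure goes unnoticed, which accounts for the extra slowdown beyond the $1/3$ of run time lost to repeating the step.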