arma-thesis

git clone https://git.igankevich.com/arma-thesis.git

commit 8a2545118d2fdaa8b5923f70ad35de621b5fc3b4
parent 337b35a34dbe3e9e1b744f7ab409113e385baef6
Author: Ivan Gankevich <igankevich@ya.ru>
Date:   Wed, 27 Sep 2017 12:27:12 +0300

Update description of the benchmark.

Diffstat:
arma-thesis.org | 57++++++++++++++++++++++++++++++++++++---------------------
1 file changed, 36 insertions(+), 21 deletions(-)

diff --git a/arma-thesis.org b/arma-thesis.org
@@ -2902,30 +2902,44 @@ address range, each node connects to its principal only, and inefficient scan
 of the whole network by each node does not occur.
 **** Evaluation results.
-Test platform consisted of several multi-core nodes, on top of which virtual
-clusters with varying number of nodes were deployed using Linux network
-namespaces. Similar approach is used
+To benchmark the performance of node discovery, several daemon processes were
+launched on each physical cluster node, each listening on its own IP address.
+The number of processes per physical core varied from 2 to 16. Each process
+was bound to a particular physical core to reduce the effect of process
+migration on the benchmark. The tree hierarchy traversal algorithm has low
+requirements for system resources (processor time and network throughput), so
+running multiple processes per physical core is feasible, in contrast to HPC
+codes, where oversubscribing generally leads to poor performance. The test
+platform configuration is shown in table\nbsp{}[[tab-cluster]].
+
+A similar approach was used
 in\nbsp{}cite:lantz2010network,handigol2012reproducible,heller2013reproducible
 where the authors reproduce various real-world experiments using virtual
-clusters and compare results to physical ones. The advantage of it is that the
-tests can be performed on a large virtual cluster using relatively small number
-of physical nodes. This approach was used to evaluate node discovery algorithm,
-because the algorithm has low requirement for system resources (processor time
-and network throughput).
-
-Performance of the algorithm was evaluated by measuring time needed to all nodes
-of the cluster to discover each other. Each change of the hierarchy (as seen by
-each node) was written to a file and after 30 seconds all the processes (each of
-which models cluster node) were forcibly terminated. Test runs showed that
-running more than 100 virtual nodes on one physical node simultaneously warp the
-results, thus additional physical nodes, each of which run 100 virtual nodes,
-were used for the experiment. The experiment showed that discovery of 100--400
-nodes each other takes 1.5 seconds on average, and the value increases only
-slightly with increase in the number of nodes (see
+clusters, based on Linux namespaces, and compare the results to physical ones.
+Its advantage is that the tests can be performed on a large virtual cluster
+using a relatively small number of physical nodes. The advantage of our
+approach, which does not use Linux namespaces, is that it is more lightweight
+and a larger number of daemon processes can be benchmarked on the same
+physical cluster.
+
+Node discovery performance was evaluated by measuring the time needed for all
+nodes of the cluster to discover each other, i.e. the time needed for the tree
+hierarchy of nodes to reach a stable state. Each change of the hierarchy (as
+seen by each node) was written to a log file, and after 30 seconds all daemon
+processes (each of which models a cluster node) were forcibly terminated. Each
+new daemon process was launched with a 100ms delay to ensure that master nodes
+always come online before slave nodes and the hierarchy does not change
+randomly as a result of the different start times of the processes. As a
+result, in the ideal case adding a daemon process to the hierarchy adds 100ms
+to the total discovery time.
+
+Test runs showed that running more than ??? virtual nodes on one physical node
+simultaneously warps the results, so additional physical nodes, each running
+??? virtual nodes, were used for the experiment. The experiment showed that it
+takes 1.5 seconds on average for 100--400 nodes to discover each other, and the
+value increases only slightly with the number of nodes (see
 fig.\nbsp{}[[fig-bootstrap-local]]).
 #+name: fig-bootstrap-local
-#+caption: Time to discover all nodes of the cluster in depending on number of nodes.
+#+caption: Time to discover all daemon processes running on the cluster depending on the number of daemon processes.
 [[file:graphics/discovery.eps]]
 **** Discussion.
@@ -3285,8 +3299,9 @@ time after the programme start which is equivalent approximately to \(1/3\) of
 the total run time without failures on a single node. The application
 immediately recognised node as offline, because the corresponding connection
 was closed; in real-world scenario, however, the failure is detected after a
-configurable time-out. All relevant parameters are summarised in table\nbsp{}[[tab-benchmark]]. The results of these runs were compared to the run without node
-failures (fig.\nbsp{}[[fig-benchmark]] and\nbsp{}[[fig-slowdown]]).
+configurable time-out. All relevant parameters are summarised in
+table\nbsp{}[[tab-benchmark]]. The results of these runs were compared to the run
+without node failures (fig.\nbsp{}[[fig-benchmark]] and\nbsp{}[[fig-slowdown]]).
 There is considerable difference in overall application performance for
 different types of failures. Graphs\nbsp{}2 and\nbsp{}3 in
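
The launch procedure described in the first hunk (several daemons per node,
each pinned to a physical core, a 100ms delay between launches, forced
termination after 30 seconds) could be sketched roughly as follows. This is a
minimal illustration under stated assumptions, not the harness used in the
thesis; the =discovery-daemon= binary name and the launcher's single
command-line argument are placeholders.

#+begin_src cpp
// Minimal sketch of the benchmark launcher described above (assumptions:
// Linux, g++; "discovery-daemon" is a placeholder for the real daemon binary).
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <sched.h>
#include <signal.h>
#include <sys/wait.h>
#include <thread>
#include <unistd.h>
#include <vector>

int main(int argc, char* argv[]) {
    const int nprocs = (argc > 1) ? std::atoi(argv[1]) : 8; // daemons per node
    const long ncores = sysconf(_SC_NPROCESSORS_ONLN);      // cores available
    std::vector<pid_t> children;
    for (int i = 0; i < nprocs; ++i) {
        pid_t pid = fork();
        if (pid == 0) {
            // Child: pin the process to one core to avoid migration overhead.
            cpu_set_t mask;
            CPU_ZERO(&mask);
            CPU_SET(i % ncores, &mask);
            sched_setaffinity(0, sizeof(mask), &mask);
            // Replace the child with the discovery daemon (placeholder name).
            execlp("discovery-daemon", "discovery-daemon", (char*)nullptr);
            std::perror("execlp");
            std::exit(1);
        }
        children.push_back(pid);
        // Stagger launches by 100ms so that masters come online before slaves.
        std::this_thread::sleep_for(std::chrono::milliseconds(100));
    }
    // Let the node hierarchy settle, then terminate every daemon.
    std::this_thread::sleep_for(std::chrono::seconds(30));
    for (pid_t pid : children) kill(pid, SIGTERM);
    for (pid_t pid : children) waitpid(pid, nullptr, 0);
    return 0;
}
#+end_src

Compiled with g++ and run as, for example, =./launcher 16=, this sketch would
start 16 staggered daemons on one node and stop them after 30 seconds; binding
each daemon to its own IP address is omitted here, since it depends on how
addresses are assigned to the node's interfaces.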
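The second hunk distinguishes two ways of noticing that a node went offline:
immediately, because the corresponding connection was closed, or after a
configurable time-out. A hedged sketch of that distinction using generic POSIX
sockets (not code from the thesis framework) is shown below.

#+begin_src cpp
// Hedged sketch: telling a closed connection apart from a configurable
// time-out when deciding that a peer node is offline (generic POSIX sockets).
#include <poll.h>
#include <sys/socket.h>

// 'fd' is a connected socket to the peer, 'timeout_ms' is the configurable
// time-out. Returns true if the peer should be considered offline.
bool peer_offline(int fd, int timeout_ms) {
    struct pollfd p{fd, POLLIN, 0};
    int rc = poll(&p, 1, timeout_ms);
    if (rc == 0) {
        // Nothing arrived within the time-out: assume the node failed.
        return true;
    }
    if (rc > 0 && (p.revents & (POLLIN | POLLHUP))) {
        char buf[1];
        // MSG_PEEK does not consume data; recv() returning 0 means the peer
        // closed the connection: the "immediate" detection path above.
        return recv(fd, buf, sizeof(buf), MSG_PEEK) == 0;
    }
    return rc < 0; // poll error: conservatively treat the peer as offline
}
#+end_src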