commit 5eb8dd923a98be56c2137899ce529b863c02add9
parent 947cec5dbd92d004c979fff1687b38ab9e5b7e96
Author: Ivan Gankevich <igankevich@ya.ru>
Date: Mon, 27 Feb 2017 18:28:09 +0300
Add comparison of Factory to PBS.
Diffstat:
2 files changed, 85 insertions(+), 8 deletions(-)
diff --git a/phd-diss-ru.org b/phd-diss-ru.org
@@ -3141,6 +3141,9 @@ Keepalived\nbsp{}cite:cassen2002keepalived.
сбой\nbsp{}cite:fischer1985impossibility и невозможность надежной передачи
данных в случае сбоя одного из узлов\nbsp{}cite:fekete1993impossibility.
+** Сравнение предложенного подхода с современными подходами
+
+
* Заключение
**** Итоги исследования.
В изучении возможностей математического аппарата для имитационного моделирования
diff --git a/phd-diss.org b/phd-diss.org
@@ -2275,7 +2275,6 @@ performance of applications that read or write large volumes of data to disk,
but may be used in other cases too. The main idea of the algorithm is to
classify the load and find the suitable device to route the load to. So, any
devices other than disks may be used as well.
-
** MPP implementation
*** Cluster node discovery algorithm
:PROPERTIES:
@@ -2450,13 +2449,14 @@ principal only, and inefficient scan of all network by each node does not occur.
**** Evaluation results.
Test platform consisted of several multi-core nodes, on top of which virtual
clusters with varying number of nodes were deployed using Linux network
-namespaces. Similar approach is used in\nbsp{}cite:lantz2010network,handigol2012reproducible,heller2013reproducible where the
-authors reproduce various real-world experiments using virtual clusters and
-compare results to physical ones. The advantage of it is that the tests can be
-performed on a large virtual cluster using relatively small number of physical
-nodes. This approach was used to evaluate node discovery algorithm, because the
-algorithm has low requirement for system resources (processor time and network
-throughput).
+namespaces. Similar approach is used
+in\nbsp{}cite:lantz2010network,handigol2012reproducible,heller2013reproducible
+where the authors reproduce various real-world experiments using virtual
+clusters and compare results to physical ones. The advantage of it is that the
+tests can be performed on a large virtual cluster using relatively small number
+of physical nodes. This approach was used to evaluate node discovery algorithm,
+because the algorithm has low requirement for system resources (processor time
+and network throughput).
Performance of the algorithm was evaluated by measuring time needed to all nodes
of the cluster to discover each other. Each change of the hierarchy (as seen by
@@ -2938,6 +2938,80 @@ proved the impossibility of the distributed consensus with one faulty
process\nbsp{}cite:fischer1985impossibility and impossibility of reliable
communication in the presence of node
failures\nbsp{}cite:fekete1993impossibility.
+** Comparison of the proposed approach to the current approaches
+Current state-of-the-art approach to developing and running parallel programmes
+on the cluster is the use of message passing library (MPI) and job scheduler, and,
+although, this approach is highly efficient in terms of parallel execution, it
+is not flexible enough to accommodate dynamic load balancing and automatic
+fault-tolerance. Programmes written with MPI typically assume
+- equal load on each processor,
+- non-interruptible and reliable execution of the batch job,
+- constant number of parallel processes/threads throughout the execution which
+ is equal to the total number of processors.
+The first is not true for ocean wave simulation programme because AR model
+requires dynamic load balancing between compute devices to compute each part of
+the surface only when all dependent parts are computed. The last point is also
+false because for the sake of efficiency each part is written to a file
+asynchronously by a separate thread. The second point is not very important for
+the programme itself, but is generally not true for very large computer clusters
+in which node failures occur regularly. So, the idea of the proposed approach is
+to give parallel programmes more flexibility:
+- provide dynamic load balancing via pipelined execution of sequential,
+ internally parallel programme steps,
+- restart only affected processes upon node failure, and
+- execute the programme on as many compute nodes as are available in the
+ cluster.
+In this section advantages and disadvantages of this approach are discussed.
+
+In comparison to portable batch systems (PBS) the proposed approach uses
+lightweight control flow objects instead of heavy-weight parallel jobs to
+distribute the load on cluster nodes. First, this allows to have node object
+queues instead of several cluster-wide job queues. The granularity of control
+flow objects is much higher than the jobs, and, although, their execution time
+cannot be reliably predicted (as is execution time of batch jobs), objects from
+multiple parallel programmes can be dynamically distributed between the same set
+of cluster nodes, thus making the load more even. The disadvantage is that this
+requires more RAM to execute many programmes on the same set of nodes, and
+execution of each programme may be longer because of the shared node queues.
+Second, the proposed approach dynamic distribution of principal and subordinate
+node roles instead of static assignment to the particular physical nodes. So, by
+introducing execution of multiple parallel programmes on the same set of cluster
+node, the proposed approach may increase throughput of the cluster, but decrease
+performance of parallel programmes taken separately.
+
+In comparison to MPI the proposed approach uses lightweight control flow objects
+instead of heavy-weight processes to decompose the programme into parallel
+entities. First, this allows to determine the number of entities computed in
+parallel by the problem being solved, not the computer. A programme is
+encourages to create as many objects as needed, guided by the algorithm
+structure or restrictions on minimal size of data structures. In ocean wave
+simulation programme the minimal size of each wavy surface part depends on the
+number of coefficients along each dimension and at the same time the number of
+parts should be larger than the number of processors to make the load on each
+processor more even. Considering these limits the optimal part size is
+determined at runtime and is often not equal to the number of parallel
+processes. The disadvantage is that the more control flow objects there are in
+the programme, the more shared structures are copied to the same node with
+subordinate objects; this problem is solved by introducing another layer of
+hierarchy, which in turn adds another layer of complexity to the programme.
+Second, hierarchy of control flow objects together with hierarchy of nodes
+allows for automatic restart of failed objects on surviving nodes in an event of
+hardware failures. It is possible because the state of the programme execution
+is stored in each object and not in global variables like in MPI programme. By
+duplicating the state to a subordinate node, the system recomputes only failed
+objects instead of the whole programme. So, by introducing control flow objects
+instead of processes, the proposed approach may increase performance of a
+parallel programme via dynamic load balancing, but inhibit its scalability for a
+large number of nodes due to duplication of execution state.
+
+To summarise, the control flow object approach to writing and running parallel
+programmes on the cluster makes parallel programmes more flexible: it balances
+the decrease in the performance due to shared object queues with the increase
+due to dynamic load balancing, but requires more memory to achieve this due to
+using the same set of nodes to simultaneously execute all parallel programmes.
+There is no cluster-wide job queue, and the cluster acts as a unified computer
+system which makes best effort to execute distributed programmes without
+interruption.
* Conclusion
* Acknowledgements