arma-thesis

git clone https://git.igankevich.com/arma-thesis.git

commit 330f0659ebd50513558bde664a4275d348553ffa
parent d49441d8a9cdadddd0483ae68a1e1faa69ba5514
Author: Ivan Gankevich <igankevich@ya.ru>
Date:   Fri, 20 Jan 2017 14:52:25 +0300

Translate governing principles. Copy overview.

Diffstat:
phd-diss-ru.org | 49++++++++++++++++++++++++++++++++++++++++++++++++-
phd-diss.org | 142+++++++++++++++++++++++++++++++++++++++++++++++++++----------------------------
2 files changed, 140 insertions(+), 51 deletions(-)

diff --git a/phd-diss-ru.org b/phd-diss-ru.org
@@ -1780,7 +1780,6 @@ eqref:eq:old-sol-2d, are comparable for waves of small amplitude
 * High-performance software implementation of ocean wave simulation
 ** Computational model
-*** Governing principles
 *** Mapping wavy surface generation algorithm on computational model
 The ARMA model is implemented in a software system that works on the principle of a computational pipeline, in which each link applies a filter to the output
@@ -1935,6 +1934,54 @@ cite:malewicz2010pregel,seo2010hama. The advantage of the pipeline
 This minimises idle time of the processor and other devices of the computer and increases the overall throughput of the cluster.
+*** Governing principles
+The data processing pipeline model is built on the following principles, which, when followed, ensure maximum efficiency of the programme.
+- There is no notion of a message in the model; the role of a message is played by the kernel itself: it can be sent over the network to another node and access the fields of any other kernel on that node. Only the programme logic can guarantee that such an object exists.
+- A kernel is a /cooperative routine/ which, upon a call, is submitted to the task pool and is then executed asynchronously by the scheduler. The body of a routine may contain any number of calls to other routines. Each call submits the corresponding routine to the task pool and returns immediately. Kernels in the pool may be processed in any order; the scheduler uses this to extract maximum parallelism from the computing system by distributing objects from the pool across available cluster nodes and processor cores.
+- Asynchronous execution of kernels makes explicit synchronisation after a routine call (submission of an object to the queue) unnecessary; the scheduler returns control flow to the parent kernel each time one of its child objects finishes execution. Such /cooperation/ turns a routine into a kind of event handler, in which the event is a child object and the handler is its parent.
+- The event handling mechanism can be used to transfer control flow not only to the parent but to any other object whose address is known; communication with objects that goes against the hierarchy greatly complicates control flow, and the call stack of routines loses its tree shape. Only the programme logic can guarantee that both communicating objects are present in the memory of the machine. One way to provide such a guarantee is communication between children of the same parent, which is typical of parallel batch data processing. Since this kind of communication can also be carried out within the hierarchy through the parent object, it can be regarded as an optimisation that removes the overhead of passing data through an intermediate node. For programmes whose logic is entirely event-based (e.g. servers and programmes with a graphical interface) the situation is different, and such communications are the primary ones.
+- Communications that go against the hierarchy also complicate the fault tolerance algorithms used on clusters of machines. It is impossible to guarantee that a particular kernel resides in the memory of a neighbouring node, because the node may fail right in the middle of executing the corresponding routine. As a result, when a kernel terminates abnormally, all of its child objects have to be executed anew. This encourages the programmer to construct
+  - deep hierarchies of tightly coupled kernels (which communicate with each other on the same level of the hierarchy), reducing the overhead of re-executing routines;
+  - fat hierarchies of loosely coupled kernels, providing the maximum degree of parallelism.
+
 ** SMP implementation
 *** Load balancing algorithm
 The simplest and most widely used approach to load balancing on
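The governing principles above describe kernels as cooperative routines: a call submits a subordinate kernel to a task pool and returns immediately, and the scheduler later hands each finished subordinate back to its parent. The following single-node sketch illustrates this calling convention; the names Kernel, Scheduler, Partial_sum and Total_sum are hypothetical, and a sequential run loop stands in for the asynchronous scheduler of the actual programme.

#+begin_src cpp
#include <iostream>
#include <memory>
#include <queue>
#include <vector>

class Scheduler;

// A kernel is a cooperative routine: act() is its body, react() is the
// event handler that receives each finished subordinate kernel.
struct Kernel {
    Kernel* parent = nullptr;
    virtual ~Kernel() = default;
    virtual void act(Scheduler& sched) = 0;
    virtual void react(Scheduler&, Kernel*) {}
};

// Sequential stand-in for the scheduler: a call submits a kernel to the
// pool and returns immediately; completion is delivered to the parent.
class Scheduler {
    std::queue<Kernel*> pool;
    std::vector<std::unique_ptr<Kernel>> owned; // keeps kernels alive
public:
    Kernel* submit(std::unique_ptr<Kernel> k, Kernel* parent = nullptr) {
        k->parent = parent;
        Kernel* raw = k.get();
        owned.push_back(std::move(k));
        pool.push(raw);
        return raw;
    }
    void run() {
        while (!pool.empty()) {
            Kernel* k = pool.front();
            pool.pop();
            k->act(*this);
            if (k->parent) k->parent->react(*this, k); // no explicit wait
        }
    }
};

// Subordinate kernel: computes one partial sum.
struct Partial_sum : Kernel {
    int first, last, result = 0;
    Partial_sum(int a, int b): first(a), last(b) {}
    void act(Scheduler&) override {
        for (int i = first; i < last; ++i) result += i;
    }
};

// Parent kernel: act() decomposes the task, react() combines the results.
struct Total_sum : Kernel {
    int total = 0, pending = 0;
    void act(Scheduler& sched) override {
        for (int i = 0; i < 4; ++i) { // four subordinates
            sched.submit(std::make_unique<Partial_sum>(i * 25, (i + 1) * 25), this);
            ++pending;
        }
    }
    void react(Scheduler&, Kernel* child) override {
        total += static_cast<Partial_sum*>(child)->result;
        if (--pending == 0)
            std::cout << "sum of 0..99 = " << total << '\n'; // prints 4950
    }
};

int main() {
    Scheduler sched;
    sched.submit(std::make_unique<Total_sum>());
    sched.run();
    return 0;
}
#+end_src

Note that Total_sum::act never waits for its subordinates; the combining step lives entirely in react, which is the "no explicit synchronisation" principle in practice.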
diff --git a/phd-diss.org b/phd-diss.org
@@ -1713,55 +1713,6 @@ work.
 * High-performance software implementation of ocean wave simulation
 ** Computational model
-*** Governing principles
-- There are no messages in the model, a kernel is itself a message that can be sent over the network to another node and directly access any kernel on the local node. It is the responsibility of a programmer to ensure that such a kernel exists.
-- A kernel is best viewed as a cooperative routine, which is submitted to the execution queue upon the call and is executed asynchronously by the system scheduler. There can be any number of calls to other subroutines inside the routine body. Every call submits the corresponding subroutine to the execution queue and returns immediately. Kernels in the queue can be executed in any order; this fact is used by the system scheduler to exploit parallelism offered by the computer by distributing kernels from the queue across available cluster nodes and processor cores.
-- Asynchronous execution prevents the use of explicit synchronisation after the call to a subroutine is made; the system scheduler returns control flow to the routine each time one of its subroutines returns. Such *cooperation* transforms each routine which calls subroutines into an event handler, where each event is a subroutine and the handler is the routine that called them. In many batch processing programmes control flow enters each routine which calls subroutines at least two times: the first time upon the call to the routine, and the second time when control flow returns to the caller after completion of a subroutine.
-- The routine may communicate with any number of local kernels whose addresses it knows; communication with routines which are not adjacent in the call stack complicates control flow, and the call stack loses its tree shape. It is the responsibility of a programmer to ensure that communicating kernels are present in memory. One way to ensure this is to perform communication between subroutines which are called from the same routine. Incidentally, this is the usual way of writing batch parallel programmes: each such communication creates a cycle in the call stack graph, and a cycle between different (possibly non-adjacent) layers of the kernel hierarchy is redundant, because there are other edges that can be used instead. The situation may be different when the programme is interactive or event-based.
-- The other disadvantage of communication which does not occur along hierarchical links is that it complicates resiliency algorithms when executed across the cluster network: it is difficult to ensure that a kernel resides in the memory of a neighbour node, because the node may fail in the middle of its execution. Thus, upon failure of a node all of the subroutines which are called from the same routine must be restarted. This encourages a programmer to construct
-  - deep hierarchies of tightly-coupled kernels (which require communication on the same level of hierarchy) to reduce the overhead of recomputation,
-  - fat hierarchies of loosely-coupled kernels.
-  Deep hierarchy is not only a requirement of the technology, it helps optimise communication of cluster nodes by limiting it to adjacent nodes.
-- No explicit synchronisation.
-- Local communications between adjacent nodes in the hierarchy.
-
 *** Mapping wavy surface generation algorithm on computational model
 #+name: fig:pipeline
 #+begin_src dot :exports results :file build/pipeline.pdf
@@ -1869,6 +1820,98 @@ digraph {
 #+RESULTS: fig:pipeline
 [[file:build/pipeline.pdf]]
+
+*** Computational model overview
+The core provides classes and methods that simplify the development of distributed applications and middleware. The main focus of this package is to make a distributed application resilient to failures, i.e. to make it fault tolerant and highly available, and to do so transparently to a programmer. All classes are divided into two layers: the lower layer consists of classes for single-node applications, and the upper layer consists of classes for applications that run on an arbitrary number of nodes. There are two kinds of tightly coupled entities in the package --- kernels and pipelines --- which are used together to compose a programme.
+
+Kernels implement control flow logic in their act and react methods and store the state of the current control flow branch. Both logic and state are implemented by a programmer. In the act method some function is either computed sequentially or decomposed into subtasks (represented by another set of kernels) which are subsequently sent to a pipeline. In the react method subordinate kernels that return from the pipeline are processed by their parent. Calls to act and react methods are asynchronous and are made within threads spawned by a pipeline. For each kernel act is called only once, and for multiple kernels the calls are made in parallel to each other, whereas the react method is called once for each subordinate kernel, and all these calls are made in the same thread to prevent race conditions (for different parent kernels different threads may be used).
+
+Pipelines implement asynchronous calls to act and react, and try to make as many parallel calls as possible considering the concurrency of the platform (number of cores per node and number of nodes in a cluster). A pipeline consists of a kernel pool, which contains all the subordinate kernels sent by their parents, and a thread pool that processes kernels in accordance with the rules outlined in the previous paragraph. A separate pipeline exists for each compute device: there are pipelines for parallel processing, schedule-based processing (periodic and delayed tasks), and a proxy pipeline for processing kernels on other cluster nodes.
+
+In principle, the kernel and pipeline machinery reflects that of procedures and call stacks, with the advantage that kernel methods are called asynchronously and in parallel to each other. The stack, which ordinarily stores local variables, is modelled by the fields of a kernel. The sequence of processor instructions before nested procedure calls is modelled by the act method, and the sequence of processor instructions after the calls is modelled by the react method. The procedure calls themselves are modelled by constructing and sending subordinate kernels to the pipeline. Two methods are necessary because the calls are asynchronous and one must wait until subordinate kernels complete their work. Pipelines allow circumventing active waiting and call the correct kernel methods by analysing their internal state.
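The overview above describes a pipeline as a kernel pool plus a thread pool that calls act asynchronously and routes every finished subordinate back to its parent's react. The sketch below illustrates that structure; all names are hypothetical, and a per-kernel mutex stands in for the "all react calls of one parent happen in the same thread" rule mentioned in the text.

#+begin_src cpp
#include <condition_variable>
#include <deque>
#include <iostream>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

class Pipeline;

// act() computes or decomposes the task, react() handles returned subordinates.
struct Kernel {
    Kernel* parent = nullptr;
    std::mutex state_mutex; // serialises act()/react() calls on one kernel
    virtual ~Kernel() = default;
    virtual void act(Pipeline&) = 0;
    virtual void react(Pipeline&, Kernel*) {}
};

// A pipeline couples a kernel pool with a thread pool: worker threads pop
// kernels, call act() in parallel, and deliver each finished kernel to its
// parent's react().
class Pipeline {
public:
    Kernel* submit(std::unique_ptr<Kernel> k, Kernel* parent = nullptr) {
        std::lock_guard<std::mutex> lock(_mutex);
        k->parent = parent;
        Kernel* raw = k.get();
        _owned.push_back(std::move(k)); // keep kernels alive until shutdown
        _pool.push_back(raw);
        _cv.notify_all();
        return raw;
    }
    void start(unsigned nthreads) {
        for (unsigned i = 0; i < nthreads; ++i)
            _threads.emplace_back([this] { loop(); });
    }
    // Block until the pool is drained and no kernel is running, then stop.
    void wait() {
        std::unique_lock<std::mutex> lock(_mutex);
        _cv.wait(lock, [this] { return _pool.empty() && _running == 0; });
        _stopped = true;
        _cv.notify_all();
        lock.unlock();
        for (auto& t : _threads) t.join();
    }
private:
    void loop() {
        std::unique_lock<std::mutex> lock(_mutex);
        while (true) {
            _cv.wait(lock, [this] { return _stopped || !_pool.empty(); });
            if (_pool.empty()) break; // stopped and nothing left to do
            Kernel* k = _pool.front();
            _pool.pop_front();
            ++_running;
            lock.unlock();
            { // per-kernel mutex stands in for the "same thread" guarantee
                std::lock_guard<std::mutex> g(k->state_mutex);
                k->act(*this);
            }
            if (k->parent) { // return the finished kernel to its parent
                std::lock_guard<std::mutex> g(k->parent->state_mutex);
                k->parent->react(*this, k);
            }
            lock.lock();
            --_running;
            if (_pool.empty() && _running == 0) _cv.notify_all();
        }
    }
    std::deque<Kernel*> _pool;
    std::vector<std::unique_ptr<Kernel>> _owned;
    std::vector<std::thread> _threads;
    std::mutex _mutex;
    std::condition_variable _cv;
    int _running = 0;
    bool _stopped = false;
};

// Demo kernel: report which thread executed it.
struct Hello : Kernel {
    void act(Pipeline&) override {
        std::cout << "kernel executed by thread "
                  << std::this_thread::get_id() << '\n';
    }
};

int main() {
    Pipeline ppl;
    ppl.start(2);
    for (int i = 0; i < 4; ++i) ppl.submit(std::make_unique<Hello>());
    ppl.wait();
    return 0;
}
#+end_src

This sketch covers only the parallel-processing case; as the text notes, the package also provides pipelines for schedule-based processing and a proxy pipeline for kernels destined to other cluster nodes.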
+*** Governing principles
+- There are no messages in the model, a kernel is itself a message that can be sent over the network to another node and directly access any kernel on the local node. It is the responsibility of a programmer to ensure that such a kernel exists.
+- A kernel is a /cooperative routine/, which is submitted to the task pool upon the call and is executed asynchronously by a scheduler. There can be any number of calls to other subroutines inside the routine body. Every call submits the corresponding subroutine to the task pool and returns immediately. Kernels in the pool can be executed in any order; this fact is used by the scheduler to exploit parallelism offered by the computer by distributing kernels from the pool across available cluster nodes and processor cores.
+- Asynchronous execution prevents the use of explicit synchronisation after the call to a subroutine is made; the scheduler returns control flow to the routine each time one of its subroutines returns. Such *cooperation* transforms each routine which calls subroutines into an event handler, where each event is a subroutine and the handler is the routine that called them. In many batch processing programmes control flow enters each routine which calls subroutines at least twice: the first time upon the call to the routine, and the second time when control flow returns to the caller after completion of a subroutine.
+- The routine may communicate with any number of local kernels whose addresses it knows; communication with routines which are not adjacent in the call stack complicates control flow, and the call stack loses its tree shape. It is the responsibility of a programmer to ensure that communicating kernels are present in memory. One way to ensure this is to perform communication between subroutines which are called from the same routine. Incidentally, this is the usual way of writing batch parallel programmes: each such communication creates a cycle in the call stack graph, and a cycle between different (possibly non-adjacent) layers of the kernel hierarchy is redundant, because there are other edges that can be used instead. The situation may be different when the programme is interactive or event-based.
+- The other disadvantage of communication which does not occur along hierarchical links is that it complicates resiliency algorithms when executed across the cluster network: it is difficult to ensure that a kernel resides in the memory of a neighbour node, because the node may fail in the middle of its execution. Thus, upon failure of a node all of the subroutines which are called from the same routine must be restarted. This encourages a programmer to construct
+  - deep hierarchies of tightly-coupled kernels (which require communication on the same level of hierarchy) to reduce the overhead of recomputation,
+  - fat hierarchies of loosely-coupled kernels.
+  Deep hierarchy is not only a requirement of the technology, it helps optimise communication of cluster nodes by limiting it to adjacent nodes.
+- No explicit synchronisation.
+- Local communications between adjacent nodes in the hierarchy.
+
 ** SMP implementation
 *** Load balancing algorithm
 *** Evaluation
@@ -1905,7 +1948,6 @@ arma.plot_factory_vs_openmp_overlap(
 #+RESULTS: fig:factory-overlap
 [[file:build/factory-vs-openmp-overlap.pdf]]
-
 ** MPP implementation
 *** Cluster node discovery algorithm
 *** Fail over algorithm
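The first governing principle states that there are no messages: the kernel itself is the message, travelling over the network and carrying the state of its control flow branch in its fields. A minimal sketch of what this means in code is given below; Spectrum_kernel and its fields are hypothetical, and a stringstream stands in for the network connection between two cluster nodes.

#+begin_src cpp
#include <iostream>
#include <sstream>

// Hypothetical kernel that carries its own state across the network:
// the same object is both the unit of control flow and the message.
struct Spectrum_kernel {
    int part = 0;           // index of the wavy surface part (example field)
    double variance = 0.0;  // accumulated variance (example field)

    // Serialise fields into a stream before sending the kernel to another node.
    void write(std::ostream& out) const {
        out << part << ' ' << variance;
    }
    // Restore fields after the kernel has been received on the other node.
    void read(std::istream& in) {
        in >> part >> variance;
    }
};

int main() {
    Spectrum_kernel k{3, 1.5};
    std::stringstream wire;   // stands in for a network connection
    k.write(wire);            // "send" the kernel

    Spectrum_kernel received;
    received.read(wire);      // "receive" it on the destination node
    std::cout << "part=" << received.part
              << " variance=" << received.variance << '\n';
    return 0;
}
#+end_src

Because the object that is sent is the same object that later resumes execution on the destination node, no separate message format is needed; serialising the kernel's fields is enough.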