git clone https://git.igankevich.com/hpcs-17-subord.git
Log | Files | Refs

commit 0eb4ded2937570066d7a5c97b6dea929372e76df
parent fb4da6ebb5619edadbe9aee53e65ef6dbb3f306f
Author: Ivan Gankevich <igankevich@ya.ru>
Date:   Thu, 16 Feb 2017 11:23:51 +0300

Add text from the recent report. (We need to rewrite the text to be nice.)

main.tex | 6+++---
src/body.tex | 94+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
src/tail.tex | 6++++++
3 files changed, 103 insertions(+), 3 deletions(-)

diff --git a/main.tex b/main.tex @@ -13,16 +13,16 @@ \title{TITLE} \author{% - \IEEEauthorblockN{Yuri Tipikin, Ivan Gankevich, Vladimir Korkhov} + \IEEEauthorblockN{Yuri Tipikin \quad Ivan Gankevich \quad Vladimir Korkhov} \IEEEauthorblockA{% Dept. of Computer Modeling and Multiprocessor Systems\\ Saint Petersburg State University\\ Saint-Petersburg, Russia\\ - Email: y.tipikin@spbu.ru, i.gankevich@spbu.ru% + Email: y.tipikin@spbu.ru, i.gankevich@spbu.ru, v.korkhov@spbu.ru% }% }% -%\IEEEspecialpapernotice{(Poster Paper)} +\IEEEspecialpapernotice{(Poster Paper)} diff --git a/src/body.tex b/src/body.tex @@ -1,4 +1,98 @@ +\section{Computational kernel hierarchy} + +The core provides classes and methods to simplify development of distributed +applications and middleware. The main focus of this package is to make +distributed application resilient to failures, i.e. make it fault tolerant and +highly available, and do it transparently to a programmer. All classes are +divided into two layers: the lower layer consists of classes for single node +applications, and the upper layer consists of classes for applications that run +on an arbitrary number of nodes. There are two kinds of tightly coupled +entities in the package~--- kernels and pipelines~--- which are used together +to compose a programme. + +Kernels implement control flow logic in theirs act and react methods and store +the state of the current control flow branch. Both logic and state are +implemented by a programmer. In act method some function is either sequentially +computed or decomposed into subtasks (represented by another set of kernels) +which are subsequently sent to a pipeline. In react method subordinate kernels +that returned from the pipeline are processed by their parent. Calls to act and +react methods are asynchronous and are made within threads spawned by a +pipeline. For each kernel act is called only once, and for multiple kernels the +calls are done in parallel to each other, whereas react method is called once +for each subordinate kernel, and all the calls are made in the same thread to +prevent race conditions (for different parent kernels different threads may be +used). + +Pipelines implement asynchronous calls to act and react, and try to make as +many parallel calls as possible considering concurrency of the platform (no. of +cores per node and no. of nodes in a cluster). A pipeline consists of a kernel +pool, which contains all the subordinate kernels sent by their parents, and a +thread pool that processes kernels in accordance with rules outlined in the +previous paragraph. A separate pipeline exists for each compute device: There +are pipelines for parallel processing, schedule-based processing (periodic and +delayed tasks), and a proxy pipeline for processing of kernels on other cluster +nodes. + +In principle, kernels and pipelines machinery reflect the one of procedures and +call stacks, with the advantage that kernel methods are called asynchronously +and in parallel to each other. The stack, which ordinarily stores local +variables, is modelled by fields of a kernel. The sequence of processor +instructions before nested procedure calls is modelled by act method, and +sequence of processor instructions after the calls is modelled by react method. +The procedure calls themselves are modelled by constructing and sending +subordinate kernels to the pipeline. Two methods are necessary because calls +are asynchronous and one must wait before subordinate kernels complete their +work. Pipelines allow circumventing active wait, and call correct kernel +methods by analysing their internal state. + +\section{Cluster scheduler architecture} + +\subsection{Overview} +This scheduler has layered architecture: +\begin{itemize} + + \item \textit{Physical layer.} Consists of nodes and direct/routed network + links. + + \item \textit{Daemon layer.} Consists of daemon processes residing on + cluster nodes and hierarchical (master/slave) links between them. + + \item \textit{Kernel layer.} Consists of kernels and hierarchical + (parent/child) links between them. + +\end{itemize} + +Master and slave roles are dynamically assigned to daemon processes, any +physical cluster node may become master or slave. Dynamic reassignment uses +leader election algorithm that does not require periodic broadcasting of +messages, and the role is derived from node's IP address. Detailed explanation +of the algorithm is provided in [hpcs-2015-paper]. + +\subseciton{Fault tolerance and high availability} + +The scheduler has fault tolerance and high availability built into its +low-level core API. Every failed kernel is restarted on healthy node or on its +parent node, however, failure is detected only for kernels that are sent from +one node to another (local kernels are not considered). High availability is +provided by replicating master kernel to a subordinate node. When any of the +replicas fails, another one is used in place. Detailed explanation of the fail +over algorithm is provided in Section~\ref{sec:failure-scenoarios}. + +\subsection{Security} + +Scheduler driver is able to communicate with scheduler daemons in local area +network. Inter-daemon messaging is not encrypted or signed in any way, assuming +that local area network is secure. There is also no protection from Internet +``noise''. Submission of the task to a remote cluster can be done via SSH +(Secure Shell) connection/tunnel which is de facto standard way of +communication between Linux/UNIX servers. So, scheduler security is based on +the assumption that it is deployed in secure local area network. Every job is +run from the same user, as there is no portable way to switch process owner in +Java. + + \section{Failure scenarios} +\label{sec:failure-scenoarios} \begin{itemize} \item Failure of at most one node. diff --git a/src/tail.tex b/src/tail.tex @@ -1,2 +1,8 @@ \section{Conclusion} + \section*{Acknowledgment} +The research was carried out using computational resources of Resource Centre +``Computational Centre of Saint Petersburg State University'' (\mbox{T-EDGE96} +\mbox{HPC-0011828-001}) within frameworks of grants of Russian Foundation for +Basic Research (projects no.~\mbox{16-07-01111}, \mbox{16-07-00886}, +\mbox{16-07-01113}).