iccsa-16-factory-extended

git clone https://git.igankevich.com/iccsa-16-factory-extended.git

commit 6db2e8a6fd6b39b9ca34d2ec9b0bc510ace88bb2
parent 316169750284c7a011268d3e7e4eb4d3d6335fbb
Author: Vladimir Korkhov <vkorkhov@gmail.com>
Date:   Fri,  3 Feb 2017 22:39:12 +0300

More minor changes

Diffstat:
src/abstract.tex | 4++--
src/intro.tex | 14+++++++-------
src/sections.tex | 8++++----
3 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/src/abstract.tex b/src/abstract.tex
@@ -4,12 +4,12 @@ of big data processing technologies. Although failure of a master node can take
 down the whole data processing pipeline, this is considered either improbable
 or too difficult to encounter. The aim of the studies reported here is to
 propose rather simple technique to deal with master-node failures. This technique is
-based on temporary delegation of master role to one of the slave nodes and
+based on temporary delegation of the master role to one of the slave nodes and
 transferring updated state back to the master when one step of computation is
 complete. That way the state is duplicated and computation can proceed to the
 next step regardless of a failure of a delegate or the master (but not both).
 We run benchmarks to show that a failure of a master is almost ``invisible'' to
-other nodes, and failure of a delegate results in recomputation of only one step
+other nodes, and a failure of a delegate results in recomputation of only one step
 of data processing pipeline. We believe that the technique can be used not only
 in Big Data processing but in other types of applications.
 \end{abstract}
diff --git a/src/intro.tex b/src/intro.tex
@@ -14,7 +14,7 @@ probability of a machine failure resulting in a whole system failure, they
 increase probability of a human error.
 
 From such point of view it seems more practical to implement master node fault
-tolerance at application level, however, there is no generic implementation.
+tolerance at the application level, however, there is no generic implementation.
 Most implementations are too tied to a particular application to become
 universally acceptable. We believe that this happens due to people's habit to
 think of a cluster as a collection of individual machines each of which can be
@@ -46,15 +46,15 @@ Zookeeper service which itself uses dynamic role assignment to ensure its
 fault-tolerance~\citep{okorafor2012zookeeper}. So, the whole setup is complicated
 due to Hadoop scheduler lacking dynamic roles: if dynamic roles were available,
 Zookeeper would be redundant in this setup. Moreover, this setup does not
-guarantee continuous operation of master node because standby server needs time
+guarantee continuous operation of the master node because the standby server needs time
 to recover current state after a failure.
 
-The same problem occurs in high-performance computing where master node of a job
+The same problem occurs in high-performance computing where the master node of a job
 scheduler is the single point of failure.
 In~\citep{uhlemann2006joshua,engelmann2006symmetric} the authors use replication
-to make the master node highly-available, but backup server role is assigned
+to make the master node highly-available, but the backup server role is assigned
 statically and cannot be delegated to a healthy worker node. This solution is
-closer to fully dynamic role assignment than high-availability solution for big
+closer to fully dynamic role assignment than the high-availability solution for big
 data schedulers, because it does not involve using external service to store
 configuration which should also be highly-available, however, it is far from
 ideal solution where roles are completely decoupled from physical servers.
@@ -79,7 +79,7 @@ This design decision simplifies management and interaction with a distributed
 system. From system administrator point of view it is much simpler to install
 the same software stack on each node than to manually configure master and
 slave nodes. Additionally, it is much easier to bootstrap new nodes into the cluster
-and decommission old ones. From user point of view, it is much simpler to
+and decommission old ones. From a user point of view, it is much simpler to
 provide web service high-availability and load-balancing when you have multiple
 backup nodes to connect to.
 
@@ -90,7 +90,7 @@ is no general solution to this problem is that there is no generic programming
 environment to write and execute distributed programmes. The aim of this work
 is to propose such an environment and to describe its internal structure.
 
-The programming model used in this work is partly based on well-known actor
+The programming model used in this work is partly based on the well-known actor
 model of concurrent computation~\citep{agha1985actors,hewitt1973universal}. Our
 model borrows the concept of actor---an object that stores data and methods to
 process it; this object can react to external events by either changing its
diff --git a/src/sections.tex b/src/sections.tex
@@ -305,7 +305,7 @@ cluster nodes and each buoy's data processing is distributed across processor
 cores of a node. Processing begins with joining corresponding measurements for
 each spectrum variables into a tuple, then for each tuple frequency-directional
 spectrum is reconstructed and its variance is computed. Results are gradually
-copied back to the machine where application was executed and when the
+copied back to the machine where the application was executed and when the
 processing is complete the programme terminates.
 
 \begin{table}
@@ -361,7 +361,7 @@ failures (Figure~\ref{fig:benchmark-bigdata}).
 The benchmark showed that only a backup node failure results in significant
 performance penalty, in all other cases the performance is roughly equals to
 the one without failures but with the number of nodes minus one. It happens because
-a backup node not only stores the copy of the state of the current computation
+the backup node not only stores the copy of the state of the current computation
 step but executes this step in parallel with other subordinate nodes. So, when
 a backup node fails, the master node executes the whole step once again on
 arbitrarily chosen healthy subordinate node.
@@ -389,10 +389,10 @@ and backup nodes fail, there is no chance for an application to survive. In this
 case the state of the current computation step is lost, and the only way to
 restore it is to restart the application.
 
-Computational kernels are means of abstraction that decouple distributed
+Computational kernels are means of abstraction that decouple a distributed
 application from physical hardware: it does not matter how many nodes are online
 for an application to run successfully. Computational kernels eliminate the need
-to allocate a physical backup node to make master node highly-available, with
+to allocate a physical backup node to make the master node highly-available, with
 computational kernels approach any node can act as a backup one. Finally,
 computational kernels can handle subordinate node failures in a way that is
 transparent to a programmer.
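
The programming model touched by the src/intro.tex hunks above is the actor-like
"computational kernel": an object that stores data together with the methods
that process it and reacts to external events. Below is a minimal
single-process C++ sketch of that act/react control flow; the class and method
names (Kernel, act, react, SumOfSquares) are illustrative assumptions, not the
project's actual API.

#include <iostream>
#include <memory>
#include <vector>

// A kernel stores data and the methods to process it (cf. the actor analogy).
struct Kernel {
    virtual ~Kernel() = default;
    virtual void act() = 0;                   // run this kernel's own computation
    virtual void react(Kernel& /*child*/) {}  // collect a finished child kernel
};

// Leaf kernel: squares a single value.
struct Square : Kernel {
    double value;
    explicit Square(double v) : value(v) {}
    void act() override { value *= value; }
};

// Parent kernel: decomposes work into children and sums their results.
struct SumOfSquares : Kernel {
    double total = 0;
    std::vector<std::unique_ptr<Square>> children;
    void act() override {
        double inputs[] = {1.0, 2.0, 3.0};
        for (double v : inputs)
            children.push_back(std::make_unique<Square>(v));
    }
    void react(Kernel& child) override {
        total += static_cast<Square&>(child).value;
    }
};

int main() {
    // A real scheduler would run child kernels on other cores or cluster nodes
    // and call react() asynchronously; here everything runs in one thread.
    SumOfSquares parent;
    parent.act();
    for (auto& c : parent.children) { c->act(); parent.react(*c); }
    std::cout << "sum of squares = " << parent.total << "\n";  // prints 14
}

The sketch only shows how a parent kernel decomposes work and collects results;
distribution of the children across nodes is the scheduler's job.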
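
The recovery behaviour described in the src/abstract.tex and src/sections.tex
hunks reduces to a simple rule: the current step survives as long as either
holder of its state (the master or the subordinate temporarily promoted to
backup) is alive; a backup failure costs one recomputed step, and losing both
forces an application restart. The following C++ sketch encodes only that
decision logic; the function name and messages are hypothetical.

#include <iostream>
#include <string>

// What happens to the current computation step when nodes fail. The step's
// state is held by the master and by one subordinate temporarily promoted to
// backup, so it survives a failure of either of them, but not of both.
std::string recover(bool master_alive, bool backup_alive) {
    if (master_alive && backup_alive)
        return "continue: both copies of the step's state are intact";
    if (!master_alive && backup_alive)
        return "backup takes over the master role and finishes the step";
    if (master_alive && !backup_alive)
        return "master re-runs the whole step on a healthy subordinate";
    return "state lost: restart the application";
}

int main() {
    std::cout << recover(false, true)  << "\n";  // master failure: almost invisible
    std::cout << recover(true,  false) << "\n";  // backup failure: one step recomputed
    std::cout << recover(false, false) << "\n";  // both fail: no surviving copy
}

The three calls in main mirror the benchmark scenarios discussed in
src/sections.tex: a master failure is nearly invisible to other nodes, a backup
failure costs one recomputed step, and a simultaneous failure of both leaves no
copy of the state.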