arma-thesis

git clone https://git.igankevich.com/arma-thesis.git

commit 0d571bba049d0510562ad9133b5f56d22d3d49df
parent d5f4f99678fbe1b59b8b8100bd6727aeee08dcf6
Author: Ivan Gankevich <igankevich@ya.ru>
Date:   Mon, 27 Feb 2017 10:23:35 +0300

Replace "allow doing" with "allow to do".

Diffstat:
phd-diss.org | 65+++++++++++++++++++++++++++++++++--------------------------------
1 file changed, 33 insertions(+), 32 deletions(-)

diff --git a/phd-diss.org b/phd-diss.org
@@ -328,7 +328,7 @@ was still much work to be done to make it useful in practice.
 4. Finally, verify wavy surface integral characteristics to match the ones of
    real ocean waves.
 5. In the final stage, develop software programme that implements ARMA model and
-   pressure calculation method, and allows running simulations on both shared
+   pressure calculation method, and allows to run simulations on both shared
    memory (SMP) and distributed memory (MPP) computer systems.
 
 **** Scientific novelty.
@@ -351,7 +351,7 @@ software.
 1. Since pressure field formula is derived for discrete wavy surface and without
    assumptions about wave amplitudes, it is applicable to any wavy surface of
    incompressible inviscid fluid (in particular, it is applicable to wavy
-   surface generated by LH model). This allows using pressure field formula
+   surface generated by LH model). This allows to use pressure field formula
    without being tied to ARMA model.
 2. From computational point of view this formula is more efficient than the
    corresponding formula for LH model, because integrals in it are reduced to
@@ -370,13 +370,13 @@ Software implementation of ARMA model and pressure field formula was created
 incrementally: a prototype written in high-level engineering
 language\nbsp{}cite:mathematica10,octave2015 was rewritten in lower level
 language (C++). Implementation of the same algorithm and formulae in languages
 of varying levels (which involves usage of different abstractions and language primitives)
-allows correcting errors, which would left unnoticed otherwise. Wavy surface,
+allows to correct errors, which would left unnoticed otherwise. Wavy surface,
 generated by ARMA model, as well as all input parameters (ACF, distribution of
 wave elevation etc.) were inspected via graphical means built into the
 programming language allowing visual control of programme correctness.
 
 **** Theses for the defence.
-- Wind wave model which allows generating wavy surface realisations with large
+- Wind wave model which allows to generate wavy surface realisations with large
   period and consisting of wave of arbitrary amplitudes;
 - Pressure field formulae derived for this model without assumptions of linear
   wave theory;
@@ -528,7 +528,7 @@ is mostly a different problem.
 profiles. So, model verification includes distributions of various parameters
 of generated waves (lengths, heights, periods etc.). Multi-dimensionality of
 investigated model not only complexifies the task, but
-also allows carrying out visual validation of generated wavy surface. It is the
+also allows to carry out visual validation of generated wavy surface. It is the
 opportunity to visualise output of the programme that allowed to ensure that
 generated surface is compatible with real ocean surface, and is not abstract
 multi-dimensional stochastic process that is real only statistically.
@@ -553,7 +553,7 @@ equation of motion is solely used to determine pressures for calculated velocity
 potential derivatives. The assumption of small amplitudes means the slow decay
 of wind wave coherence function, i.e. small change of local wave number in time
 and space compared to the wavy surface elevation (\(z\) coordinate). This
-assumption allows calculating elevation \(z\) derivative as \(\zeta_z=k\zeta\),
+assumption allows to calculate elevation \(z\) derivative as \(\zeta_z=k\zeta\),
 where \(k\) is wave number. In two-dimensional case the solution is written
 explicitly as
 \begin{align}
@@ -596,7 +596,7 @@ arbitrary-amplitude waves.
 :CUSTOM_ID: linearisation
 :END:
-LH model allows deriving an explicit formula for velocity field by linearising
+LH model allows to derive an explicit formula for velocity field by linearising
 kinematic boundary condition. Velocity potential formula is written as
 \begin{equation*}
 \phi(x,y,z,t) = \sum_n \frac{c_n g}{\omega_n}
@@ -841,7 +841,7 @@ process might increase model precision, which is one of the objectives of the
 future research.
 
 ** Modelling non-linearity of ocean waves
-ARMA model allows modelling asymmetry of wave elevation distribution, i.e.
+ARMA model allows to model asymmetry of wave elevation distribution, i.e.
 generate ocean waves, distribution of z-coordinate of which has non-nought
 kurtosis and asymmetry. Such distribution is inherent to real ocean
 waves\nbsp{}cite:longuet1963nonlinear.
@@ -1106,7 +1106,7 @@ Check the validity of derived formulae by substituting \(\zeta(x,t)\) with known
 analytic formula for plain waves. Symbolic computation of Fourier transforms in
 this section were performed in Mathematica\nbsp{}cite:mathematica10. In the
 framework of linear wave theory assume that waves have small amplitude compared to their
-lengths, which allows us simplifying initial system of equations
+lengths, which allows us to simplify initial system of equations
 eqref:eq:problem-2d to
 \begin{align*}
 & \phi_{xx}+\phi_{zz}=0,\\
@@ -1482,7 +1482,7 @@ transform ACF; relative error without interpolation is \(10^{-5}\).
 In order to eliminate periodicity from generated wavy surface, it is imperative
 to use PRNG with sufficiently large period to generate white noise. Parallel
 Mersenne Twister\nbsp{}cite:matsumoto1998mersenne with a period of \(2^{19937}-1\) is
-used as a generator in this work. It allows producing aperiodic ocean wavy
+used as a generator in this work. It allows to produce aperiodic ocean wavy
 surface realisations in any practical usage scenarios.
 
 There is no guarantee that multiple Mersenne Twisters executed in parallel
@@ -1551,7 +1551,7 @@ this approach does not work here, because applying inverse Fourier transform to
 this representation does not produce exponent, which severely warp resulting
 velocity field. In order to get unique analytic definition normalisation factor
 \(1/\Sinh{2\pi{u}{h}}\) (which is also included in formula for \(E(u)\)) may be
-used. Despite the fact that normalisation allows obtaining adequate velocity
+used. Despite the fact that normalisation allows to obtain adequate velocity
 potential field, numerical experiments show that there is little difference
 between this field and the one produced by formulae from linear wave theory, in
 which terms with \(\zeta\) are omitted.
@@ -1667,7 +1667,7 @@ for (i in seq(0, 4)) {
 
 Comparing obtained generic formulae eqref:eq:solution-2d and
 eqref:eq:solution-2d-full to the known formulae from linear wave theory allows
-seeing the difference between velocity fields for both large and small amplitude
+to see the difference between velocity fields for both large and small amplitude
 waves. In general analytic formula for velocity potential in not known, even for
 plain waves, so comparison is done numerically. Taking into account conclusions
 of [[#sec:pressure-2d]], only finite depth formulae are compared.
@@ -1720,7 +1720,7 @@ the ones of real ocean waves.
 
 Theoretically, ocean waves themselves can be chosen as ACFs, the only
 pre-processing step is to make them decay exponentially. This may allow
-generating waves of arbitrary profiles, and is one of the directions of future
+to generate waves of arbitrary profiles, and is one of the directions of future
 work.
 
 * High-performance software implementation of ocean wave simulation
@@ -1859,7 +1859,7 @@ synchronisation occurs after each step.
 
 Object pipeline speeds up the programme by parallel execution of code blocks
 that work with different compute devices: while the current part of wavy
 surface is generated by a processor, the previous part is written to a disk. This
-approach allows getting speed-up because compute devices operate asynchronously,
+approach allows to get speed-up because compute devices operate asynchronously,
 and their parallel usage increases the whole programme performance.
 
 Since data transfer between pipeline joints is done in parallel to computations,
@@ -1868,7 +1868,7 @@ different parameters (generate several ocean wavy surfaces having different
 characteristics). In practise, high-performance applications do not always
 consume 100% of processor time spending a portion of time on synchronisation of
 parallel processes and writing data to disk. Using pipeline in this case allows
-running several computations on the same set of processes, and use all of the
+to run several computations on the same set of processes, and use all of the
 computer devices at maximal efficiency. For example, when one object writes data
 to a file, the other do computations on the processor in parallel. This
 minimises downtime of the processor and other computer devices and increases
@@ -1946,7 +1946,7 @@ procedure call, and ~react~ method is a sequence of processor instructions after
 the call. Constructing and sending subordinate kernels to the pipeline is nested
 procedure call. Two methods are necessary to make calls asynchronous, and
 replace active wait for completion of subordinate kernels with passive one.
-Pipelines, in turn, allow implementing passive wait, and call correct kernel
+Pipelines, in turn, allow to implement passive wait, and call correct kernel
 methods by analysing their internal state.
 
 #+name: fig:subord-ppl
@@ -2119,7 +2119,7 @@ efficient from the computer system point of view: the number of parts is either
 too large compared to the number of processors working in parallel, which
 increases data transfer overhead, or too small, which prevents using all
 available processor cores. Second, restrictions of problem being solved may not
-allow splitting input data into even parts which may result in load imbalance
+allow to split input data into even parts which may result in load imbalance
 across processor cores. Third, there are multiple components in the system aside
 from the processor that take part in the computation (such as vector
 co-processors and storage devices), and the problem solution time depends on the
@@ -2160,7 +2160,7 @@ slices, and computations are performed with only a few transcendental functions.
 In case of MA model, performance is also increased by doing convolution with FFT
 transforms. So, high performance of ARMA model is due to scarce use of
 transcendental functions and heavy use of FFT, not to mention that high
-convergence rate and non-existence of periodicity allows using far fewer
+convergence rate and non-existence of periodicity allows to use far fewer
 coefficients compared to LH model.
 
 ARMA implementation uses several libraries of reusable mathematical functions
@@ -2221,7 +2221,7 @@ of overlap of computation phase and data output phase (fig.\nbsp{}[[fig:factory-
 when computation is over, whereas load balancing algorithm makes both phases end
 almost simultaneously. So, /pipelined execution of internally parallel
 sequential phases is more efficient than their sequential execution/, and this
-allows balancing the load across different devices involved in computation.
+allows to balance the load across different devices involved in computation.
 
 #+name: fig:factory-performance
 #+begin_src R :results output graphics :exports results :file build/factory-vs-openmp.pdf
@@ -2253,7 +2253,7 @@ arma.plot_factory_vs_openmp_overlap(
 #+RESULTS: fig:factory-overlap
 [[file:build/factory-vs-openmp-overlap.pdf]]
 
-Proposed load balancing method for multi-core systems allows increasing
+Proposed load balancing method for multi-core systems allows to increase
 performance of applications that read or write large volumes of data to disk,
 but may be used in other cases too. The main idea of the algorithm is to
 classify the load and find the suitable device to route the load to. So, any
@@ -2318,7 +2318,7 @@ existing proposals\nbsp{}cite:brunekreef1996design,aguilera2001stable,romano2014
   principal each node sends a message to the old principal and to the new one.
 - *Completely event-based.* The messages are sent only when some node fails, so
   there is no constant load on the network. Since the algorithm allows
-  tolerating failure of sending any message, there is no need in heartbeat
+  to tolerate failure of sending any message, there is no need in heartbeat
   packets indicating presence of a node in the network; instead, all messages
   play role of heartbeats and packet send time-out is adjusted.
 - *No manual configuration.* A node does not require any prior knowledge to find
@@ -2614,13 +2614,13 @@ section [[#sec:node-discovery]]), and the load is distributed between direct
 neighbours: when one runs the kernel on the subordinate node, the principal node
 also receive some of its subordinate kernels. This makes the system symmetrical
 and easy to maintain: each node have the same set of software that allows
-replacing one node with another in case of failure of the former. Similar
+to replace one node with another in case of failure of the former. Similar
 architectural solution used in key-value
 stores\nbsp{}cite:anderson2010couchdb,lakshman2010cassandra to provide fault
 tolerance, but author does not know any task schedulers that use this approach.
 Unlike ~main~ function in programmes based on message passing library, the first
 (the main) kernel is initially run only on one node, and remote nodes are used
-on-demand. This design choice allows having arbitrary number of nodes throughout
+on-demand. This design choice allows to have arbitrary number of nodes throughout
 execution of a programme, and use more nodes for highly parallel parts of the
 code. Similar choice is made in the design of big data
 frameworks\nbsp{}cite:dean2008mapreduce,vavilapalli2013yarn\nbsp{}--- a user
@@ -2660,16 +2660,17 @@ collect the resulting data from it.
 **** Handling nodes failures.
 Basic strategy to overcome a failure of a subordinate node is to restart
 corresponding kernels on a healthy node\nbsp{}--- a strategy employed by Erlang
-language to restart failed subordinate processes\nbsp{}cite:armstrong2003thesis. In
-order to implement this method in the framework of kernel hierarchy, sender node
-saves every kernel that is sent to remote cluster nodes, and in an event of a
-failure of any number of nodes, where kernels were sent, their copies are
+language to restart failed subordinate processes\nbsp{}cite:armstrong2003thesis.
+In order to implement this method in the framework of kernel hierarchy, sender
+node saves every kernel that is sent to remote cluster nodes, and in an event of
+a failure of any number of nodes, where kernels were sent, their copies are
 redistributed between the remaining nodes without custom handling by a
 programmer. If there are no nodes to sent kernels to, they are executed locally.
 So, in contrast to "heavy-weight" checkpoint/restart machinery employed by HPC
 cluster job schedulers, tree hierarchy of nodes coupled with hierarchy of
-kernels allow automatic and transparent handling of any number of subordinate
-node failures without restarting any processes of a parallel programme.
+kernels allow to automatically and transparently handle of any number of
+subordinate node failures without restarting any processes of a parallel
+programme.
 
 A possible way of handling failure of the main node (a node where the main
 kernel is executed) is to replicate the main kernel to a backup node, and make
@@ -2695,7 +2696,7 @@ checkpoint mechanism. The advantage of this approach is that it is low,
 - saves only relevant data, and
 - uses memory of a subordinate node rather than disk storage.
-This simple approach allows tolerating at most one failure of /any/ cluster node
+This simple approach allows to tolerate at most one failure of /any/ cluster node
 per computational step or arbitrary number of subordinate nodes at any time
 during programme execution.
@@ -2848,7 +2849,7 @@ inapplicable for programmes with complicated logic.
 #+caption: Performance of hydrodynamics HPC application in the presence of node failures.
 #+RESULTS: fig:benchmark
-The results of the benchmark allows concluding that /no matter a principal or a
+The results of the benchmark allows to conclude that /no matter a principal or a
 subordinate node fails, the overall performance of a parallel programme roughly
 equals to the one without failures with the number of nodes minus one/, however,
 when a backup node fails performance penalty is much higher.
@@ -2875,7 +2876,7 @@ physical hardware: it does not matter how many cluster nodes are currently
 available for a programme to run without interruption. Kernels eliminate the
 need to allocate a physical backup node to tolerate principal node failures: in
 the framework of kernel hierarchy any physical node (except the principal one)
-can act as a backup one. Finally, kernels allow to handle failures in a way that
+can act as a backup one. Finally, kernels allow to handle failures in a way that
 is transparent to a programmer, deriving the order of actions from the internal
 state of a kernel.