commit 0d571bba049d0510562ad9133b5f56d22d3d49df
parent d5f4f99678fbe1b59b8b8100bd6727aeee08dcf6
Author: Ivan Gankevich <igankevich@ya.ru>
Date: Mon, 27 Feb 2017 10:23:35 +0300
Replace "allow doing" with "allow to do".
Diffstat:
phd-diss.org | 65 +++++++++++++++++++++++++++++++++--------------------------------
1 file changed, 33 insertions(+), 32 deletions(-)
diff --git a/phd-diss.org b/phd-diss.org
@@ -328,7 +328,7 @@ was still much work to be done to make it useful in practice.
4. Finally, verify that integral characteristics of the generated wavy surface
   match those of real ocean waves.
5. In the final stage, develop a software programme that implements ARMA model and
- pressure calculation method, and allows running simulations on both shared
+ pressure calculation method, and allows to run simulations on both shared
memory (SMP) and distributed memory (MPP) computer systems.
**** Scientific novelty.
@@ -351,7 +351,7 @@ software.
1. Since pressure field formula is derived for discrete wavy surface and without
assumptions about wave amplitudes, it is applicable to any wavy surface of
incompressible inviscid fluid (in particular, it is applicable to wavy
- surface generated by LH model). This allows using pressure field formula
+ surface generated by LH model). This allows to use pressure field formula
without being tied to ARMA model.
2. From computational point of view this formula is more efficient than the
corresponding formula for LH model, because integrals in it are reduced to
@@ -370,13 +370,13 @@ Software implementation of ARMA model and pressure field formula was created
incrementally: a prototype written in high-level engineering language\nbsp{}cite:mathematica10,octave2015 was rewritten in lower level language (C++).
Implementation of the same algorithm and formulae in languages of varying
levels (which involves usage of different abstractions and language primitives)
-allows correcting errors, which would left unnoticed otherwise. Wavy surface,
+allows to correct errors, which would be left unnoticed otherwise. Wavy surface,
generated by ARMA model, as well as all input parameters (ACF, distribution of
wave elevation etc.) were inspected via graphical means built into the
programming language allowing visual control of programme correctness.
**** Theses for the defence.
-- Wind wave model which allows generating wavy surface realisations with large
+- Wind wave model which allows to generate wavy surface realisations with large
period and consisting of waves of arbitrary amplitudes;
- Pressure field formulae derived for this model without assumptions of linear
wave theory;
@@ -528,7 +528,7 @@ is mostly a different problem.
profiles. So, model verification includes distributions of various parameters
of generated waves (lengths, heights, periods etc.).
Multi-dimensionality of the investigated model not only complicates the task, but
-also allows carrying out visual validation of generated wavy surface. It is the
+also allows to carry out visual validation of generated wavy surface. It is the
opportunity to visualise output of the programme that allowed to ensure that
generated surface is compatible with real ocean surface, and is not abstract
multi-dimensional stochastic process that is real only statistically.
@@ -553,7 +553,7 @@ equation of motion is solely used to determine pressures for calculated velocity
potential derivatives. The assumption of small amplitudes means the slow decay
of wind wave coherence function, i.e. small change of local wave number in time
and space compared to the wavy surface elevation (\(z\) coordinate). This
-assumption allows calculating elevation \(z\) derivative as \(\zeta_z=k\zeta\),
+assumption allows to calculate elevation \(z\) derivative as \(\zeta_z=k\zeta\),
where \(k\) is wave number. In two-dimensional case the solution is written
explicitly as
\begin{align}
@@ -596,7 +596,7 @@ arbitrary-amplitude waves.
:CUSTOM_ID: linearisation
:END:
-LH model allows deriving an explicit formula for velocity field by linearising
+LH model allows to derive an explicit formula for velocity field by linearising
kinematic boundary condition. Velocity potential formula is written as
\begin{equation*}
\phi(x,y,z,t) = \sum_n \frac{c_n g}{\omega_n}
@@ -841,7 +841,7 @@ process might increase model precision, which is one of the objectives of the
future research.
** Modelling non-linearity of ocean waves
-ARMA model allows modelling asymmetry of wave elevation distribution, i.e.
+ARMA model allows to model asymmetry of wave elevation distribution, i.e.
generate ocean waves, the distribution of z-coordinate of which has non-zero
kurtosis and asymmetry. Such distribution is inherent to real ocean waves\nbsp{}cite:longuet1963nonlinear.
@@ -1106,7 +1106,7 @@ Check the validity of derived formulae by substituting \(\zeta(x,t)\) with known
analytic formula for plain waves. Symbolic computation of Fourier transforms in
this section were performed in Mathematica\nbsp{}cite:mathematica10. In the framework
of linear wave theory assume that waves have small amplitude compared to their
-lengths, which allows us simplifying initial system of equations
+lengths, which allows us to simplify initial system of equations
eqref:eq:problem-2d to
\begin{align*}
& \phi_{xx}+\phi_{zz}=0,\\
@@ -1482,7 +1482,7 @@ transform ACF; relative error without interpolation is \(10^{-5}\).
In order to eliminate periodicity from generated wavy surface, it is imperative
to use PRNG with sufficiently large period to generate white noise. Parallel
Mersenne Twister\nbsp{}cite:matsumoto1998mersenne with a period of \(2^{19937}-1\) is
-used as a generator in this work. It allows producing aperiodic ocean wavy
+used as a generator in this work. It allows to produce aperiodic ocean wavy
surface realisations in any practical usage scenarios.
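For illustration only, a minimal C++ sketch of this white noise generation step
(~std::mt19937_64~ stands in here for the parallel Mersenne Twister cited above,
and per-realisation seeding is simplified):

#+begin_src cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Gaussian white noise that drives the ARMA model; the period of the
// underlying generator must greatly exceed the realisation size to avoid
// introducing periodicity into the generated wavy surface.
std::vector<double> white_noise(std::size_t n, std::uint64_t seed,
                                double variance) {
    std::mt19937_64 rng(seed);
    std::normal_distribution<double> normal(0.0, std::sqrt(variance));
    std::vector<double> eps(n);
    for (auto& x : eps) { x = normal(rng); }
    return eps;
}
#+end_src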
There is no guarantee that multiple Mersenne Twisters executed in parallel
@@ -1551,7 +1551,7 @@ this approach does not work here, because applying inverse Fourier transform to
this representation does not produce the exponent, which severely warps the resulting
velocity field. In order to get a unique analytic definition, the normalisation factor
\(1/\Sinh{2\pi{u}{h}}\) (which is also included in formula for \(E(u)\)) may be
-used. Despite the fact that normalisation allows obtaining adequate velocity
+used. Despite the fact that normalisation allows to obtain adequate velocity
potential field, numerical experiments show that there is little difference
between this field and the one produced by formulae from linear wave theory, in
which terms with \(\zeta\) are omitted.
@@ -1667,7 +1667,7 @@ for (i in seq(0, 4)) {
Comparing obtained generic formulae eqref:eq:solution-2d and
eqref:eq:solution-2d-full to the known formulae from linear wave theory allows
-seeing the difference between velocity fields for both large and small amplitude
+to see the difference between velocity fields for both large and small amplitude
waves. In general, analytic formula for velocity potential is not known, even for
plain waves, so comparison is done numerically. Taking into account conclusions
of [[#sec:pressure-2d]], only finite depth formulae are compared.
@@ -1720,7 +1720,7 @@ the ones of real ocean waves.
Theoretically, ocean waves themselves can be chosen as ACFs; the only
pre-processing step is to make them decay exponentially. This may allow
-generating waves of arbitrary profiles, and is one of the directions of future
+to generate waves of arbitrary profiles, and is one of the directions of future
work.
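A minimal sketch of this pre-processing step (the window shape and the decay
rate \(\gamma\) are illustrative, not taken from the dissertation):

#+begin_src cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Multiply a measured wave record by a decaying exponential so that it can
// serve as an ACF: the profile near the origin is preserved, while the tail
// decays to zero at large lags.
std::vector<double> record_to_acf(const std::vector<double>& record,
                                  double dt, double gamma) {
    std::vector<double> acf(record.size());
    for (std::size_t i = 0; i < record.size(); ++i) {
        acf[i] = record[i] * std::exp(-gamma * static_cast<double>(i) * dt);
    }
    return acf;
}
#+end_src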
* High-performance software implementation of ocean wave simulation
@@ -1859,7 +1859,7 @@ synchronisation occurs after each step.
Object pipeline speeds up the programme by parallel execution of code blocks
that work with different compute devices: while the current part of wavy surface
is generated by a processor, the previous part is written to a disk. This
-approach allows getting speed-up because compute devices operate asynchronously,
+approach allows to get speed-up because compute devices operate asynchronously,
and their parallel usage increases the whole programme performance.
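A minimal sketch of such overlap with two threads and a shared queue (names and
structure are illustrative; the actual pipeline implementation is more general):

#+begin_src cpp
#include <condition_variable>
#include <fstream>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Sketch: while the processor generates part i of the wavy surface,
// the writer thread stores part i-1 on disk, so the two devices overlap.
int main() {
    std::queue<std::vector<double>> parts;
    std::mutex m;
    std::condition_variable cv;
    bool done = false;

    std::thread writer([&] {
        std::ofstream out("surface.bin", std::ios::binary);
        for (;;) {
            std::unique_lock<std::mutex> lock(m);
            cv.wait(lock, [&] { return !parts.empty() || done; });
            if (parts.empty() && done) break;
            auto part = std::move(parts.front());
            parts.pop();
            lock.unlock();
            out.write(reinterpret_cast<const char*>(part.data()),
                      static_cast<std::streamsize>(part.size() * sizeof(double)));
        }
    });

    for (int i = 0; i < 100; ++i) {
        std::vector<double> part(4096, double(i)); // stands in for generation
        {
            std::lock_guard<std::mutex> lock(m);
            parts.push(std::move(part));
        }
        cv.notify_one();
    }
    {
        std::lock_guard<std::mutex> lock(m);
        done = true;
    }
    cv.notify_one();
    writer.join();
}
#+end_src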
Since data transfer between pipeline joints is done in parallel to computations,
@@ -1868,7 +1868,7 @@ different parameters (generate several ocean wavy surfaces having different
characteristics). In practice, high-performance applications do not always
consume 100% of processor time spending a portion of time on synchronisation of
parallel processes and writing data to disk. Using pipeline in this case allows
-running several computations on the same set of processes, and use all of the
+to run several computations on the same set of processes, and use all of the
computer devices at maximal efficiency. For example, when one object writes data
to a file, the others do computations on the processor in parallel. This
minimises downtime of the processor and other computer devices and increases
@@ -1946,7 +1946,7 @@ procedure call, and ~react~ method is a sequence of processor instructions after
the call. Constructing and sending subordinate kernels to the pipeline is nested
procedure call. Two methods are necessary to make calls asynchronous, and
replace active wait for completion of subordinate kernels with passive one.
-Pipelines, in turn, allow implementing passive wait, and call correct kernel
+Pipelines, in turn, allow to implement passive wait, and call correct kernel
methods by analysing their internal state.
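A schematic C++ rendering of this idea (the class and method names are
illustrative and do not reproduce the actual framework API):

#+begin_src cpp
#include <cstddef>
#include <memory>
#include <vector>

// Schematic kernel interface: act() corresponds to the procedure call and may
// produce subordinate kernels; react() is called once for every completed
// subordinate, so the parent waits passively instead of blocking.
struct Kernel {
    virtual ~Kernel() = default;
    virtual void act() = 0;
    virtual void react(Kernel& /*subordinate*/) {}
    std::vector<std::unique_ptr<Kernel>> subordinates;
};

struct Part : Kernel {
    void act() override { /* compute one part of the wavy surface */ }
};

struct Surface : Kernel {
    std::size_t remaining = 0;
    void act() override {
        remaining = 4;
        for (std::size_t i = 0; i < remaining; ++i)
            subordinates.push_back(std::make_unique<Part>());
    }
    void react(Kernel&) override {
        if (--remaining == 0) { /* all parts are done: continue */ }
    }
};

// A toy single-threaded "pipeline": run a kernel, then its subordinates,
// notifying the parent through react() after each one completes.
void run(Kernel& k) {
    k.act();
    for (auto& s : k.subordinates) { run(*s); k.react(*s); }
}
#+end_src

In the actual framework the subordinates are executed asynchronously and may be
routed to other cluster nodes, but the act/react contract stays the same.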
#+name: fig:subord-ppl
@@ -2119,7 +2119,7 @@ efficient from the computer system point of view: the number of parts is either
too large compared to the number of processors working in parallel, which
increases data transfer overhead, or too small, which prevents using all
available processor cores. Second, restrictions of problem being solved may not
-allow splitting input data into even parts which may result in load imbalance
+allow to split input data into even parts which may result in load imbalance
across processor cores. Third, there are multiple components in the system aside
from the processor that take part in the computation (such as vector
co-processors and storage devices), and the problem solution time depends on the
@@ -2160,7 +2160,7 @@ slices, and computations are performed with only a few transcendental functions.
In case of MA model, performance is also increased by doing convolution with FFT
transforms. So, high performance of ARMA model is due to scarce use of
transcendental functions and heavy use of FFT, not to mention that high
-convergence rate and non-existence of periodicity allows using far fewer
+convergence rate and non-existence of periodicity allows to use far fewer
coefficients compared to LH model.
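To illustrate the point about transcendental functions, a one-dimensional sketch
of the AR part of the model (the actual model is three-dimensional and the MA
part is computed via FFT-based convolution):

#+begin_src cpp
#include <cstddef>
#include <vector>

// One-dimensional AR(p) recursion: each new point of the wavy surface is a
// weighted sum of p previous points plus white noise --- the inner loop uses
// only multiplications and additions, no transcendental functions.
std::vector<double> generate_ar(const std::vector<double>& phi,
                                const std::vector<double>& eps) {
    const std::size_t p = phi.size();
    std::vector<double> zeta(eps.size(), 0.0);
    for (std::size_t i = 0; i < zeta.size(); ++i) {
        double value = eps[i];
        for (std::size_t j = 1; j <= p && j <= i; ++j)
            value += phi[j - 1] * zeta[i - j];
        zeta[i] = value;
    }
    return zeta;
}
#+end_src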
ARMA implementation uses several libraries of reusable mathematical functions
@@ -2221,7 +2221,7 @@ of overlap of computation phase and data output phase (fig.\nbsp{}[[fig:factory-
when computation is over, whereas load balancing algorithm makes both phases end
almost simultaneously. So, /pipelined execution of internally parallel
sequential phases is more efficient than their sequential execution/, and this
-allows balancing the load across different devices involved in computation.
+allows to balance the load across different devices involved in computation.
#+name: fig:factory-performance
#+begin_src R :results output graphics :exports results :file build/factory-vs-openmp.pdf
@@ -2253,7 +2253,7 @@ arma.plot_factory_vs_openmp_overlap(
#+RESULTS: fig:factory-overlap
[[file:build/factory-vs-openmp-overlap.pdf]]
-Proposed load balancing method for multi-core systems allows increasing
+Proposed load balancing method for multi-core systems allows to increase
performance of applications that read or write large volumes of data to disk,
but may be used in other cases too. The main idea of the algorithm is to
classify the load and find the suitable device to route the load to. So, any
@@ -2318,7 +2318,7 @@ existing proposals\nbsp{}cite:brunekreef1996design,aguilera2001stable,romano2014
principal each node sends a message to the old principal and to the new one.
- *Completely event-based.* The messages are sent only when some node fails, so
there is no constant load on the network. Since the algorithm allows
- tolerating failure of sending any message, there is no need in heartbeat
+ to tolerate failure of sending any message, there is no need for heartbeat
packets indicating presence of a node in the network; instead, all messages
play the role of heartbeats and the packet send time-out is adjusted.
- *No manual configuration.* A node does not require any prior knowledge to find
@@ -2614,13 +2614,13 @@ section [[#sec:node-discovery]]), and the load is distributed between direct
neighbours: when one runs the kernel on the subordinate node, the principal node
also receives some of its subordinate kernels. This makes the system symmetrical
and easy to maintain: each node has the same set of software that allows
-replacing one node with another in case of failure of the former. Similar
+to replace one node with another in case of failure of the former. Similar
architectural solution is used in key-value stores\nbsp{}cite:anderson2010couchdb,lakshman2010cassandra to provide fault tolerance, but
the author does not know of any task schedulers that use this approach.
Unlike ~main~ function in programmes based on message passing library, the first
(the main) kernel is initially run only on one node, and remote nodes are used
-on-demand. This design choice allows having arbitrary number of nodes throughout
+on-demand. This design choice allows to have arbitrary number of nodes throughout
execution of a programme, and use more nodes for highly parallel parts of the
code. Similar choice is made in the design of big data
frameworks\nbsp{}cite:dean2008mapreduce,vavilapalli2013yarn\nbsp{}--- a user
@@ -2660,16 +2660,17 @@ collect the resulting data from it.
**** Handling nodes failures.
Basic strategy to overcome a failure of a subordinate node is to restart
corresponding kernels on a healthy node\nbsp{}--- a strategy employed by Erlang
-language to restart failed subordinate processes\nbsp{}cite:armstrong2003thesis. In
-order to implement this method in the framework of kernel hierarchy, sender node
-saves every kernel that is sent to remote cluster nodes, and in an event of a
-failure of any number of nodes, where kernels were sent, their copies are
+language to restart failed subordinate processes\nbsp{}cite:armstrong2003thesis.
+In order to implement this method in the framework of kernel hierarchy, sender
+node saves every kernel that is sent to remote cluster nodes, and in an event of
+a failure of any number of nodes, where kernels were sent, their copies are
redistributed between the remaining nodes without custom handling by a
programmer. If there are no nodes to send kernels to, they are executed locally.
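A simplified sketch of this bookkeeping (types and names are illustrative; the
real sender also serialises kernels and tracks acknowledgements):

#+begin_src cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <utility>
#include <vector>

struct Kernel { virtual ~Kernel() = default; virtual void act() = 0; };

// The sender keeps a copy of every kernel routed to a remote node; when that
// node fails, the saved copies are redistributed between the remaining nodes,
// or executed locally if no nodes are left.
class Sender {
public:
    void send(std::uint32_t node, std::unique_ptr<Kernel> k) {
        // ... transmit a serialised copy of *k to the node here ...
        saved_[node].push_back(std::move(k));
    }
    void on_node_failure(std::uint32_t failed,
                         const std::vector<std::uint32_t>& alive) {
        auto copies = std::move(saved_[failed]);
        saved_.erase(failed);
        std::size_t i = 0;
        for (auto& k : copies) {
            if (alive.empty()) { k->act(); }
            else { send(alive[i++ % alive.size()], std::move(k)); }
        }
    }
private:
    std::map<std::uint32_t, std::vector<std::unique_ptr<Kernel>>> saved_;
};
#+end_src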
So, in contrast to "heavy-weight" checkpoint/restart machinery employed by HPC
cluster job schedulers, tree hierarchy of nodes coupled with hierarchy of
-kernels allow automatic and transparent handling of any number of subordinate
-node failures without restarting any processes of a parallel programme.
+kernels allow to automatically and transparently handle any number of
+subordinate node failures without restarting any processes of a parallel
+programme.
A possible way of handling failure of the main node (a node where the main
kernel is executed) is to replicate the main kernel to a backup node, and make
@@ -2695,7 +2696,7 @@ checkpoint mechanism. The advantage of this approach is that it
is low,
- saves only relevant data, and
- uses memory of a subordinate node rather than disk storage.
-This simple approach allows tolerating at most one failure of /any/ cluster node
+This simple approach allows to tolerate at most one failure of /any/ cluster node
per computational step or arbitrary number of subordinate nodes at any time
during programme execution.
@@ -2848,7 +2849,7 @@ inapplicable for programmes with complicated logic.
#+caption: Performance of hydrodynamics HPC application in the presence of node failures.
#+RESULTS: fig:benchmark
-The results of the benchmark allows concluding that /no matter a principal or a
+The results of the benchmark allow to conclude that /no matter whether a principal or a
subordinate node fails, the overall performance of a parallel programme roughly
equals the one without failures with the number of nodes minus one/; however,
when a backup node fails, the performance penalty is much higher.
@@ -2875,7 +2876,7 @@ physical hardware: it does not matter how many cluster nodes are currently
available for a programme to run without interruption. Kernels eliminate the
need to allocate a physical backup node to tolerate principal node failures: in
the framework of kernel hierarchy any physical node (except the principal one)
-can act as a backup one. Finally, kernels allow handling failures in a way that
+can act as a backup one. Finally, kernels allow to handle failures in a way that
is transparent to a programmer, deriving the order of actions from the internal
state of a kernel.