commit f3f78b3000479f6010baa99e3a83d9382fe4cbbe
parent 4e6e34a3aca251969f2f10a274ce58939215fa5e
Author: Ivan Gankevich <igankevich@ya.ru>
Date: Mon, 7 Aug 2017 17:11:30 +0300
Describe LH model parallel algorithm. Summarise. Write intro.
Diffstat:
1 file changed, 50 insertions(+), 0 deletions(-)
diff --git a/arma-thesis.org b/arma-thesis.org
@@ -3481,10 +3481,60 @@ scheduler, in which
- and jobs are executed by a simple thread pool in which each thread removes
  the first job whose dependencies have all completed and executes it (see
  the sketch below).
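+
+A minimal sketch of such a pool is given below (the ~Job~ structure, the
+ready-job scan and the condition variable signalling are illustrative
+assumptions rather than the thesis implementation; the dependency graph is
+assumed to be acyclic):
+#+begin_src cpp
+#include <condition_variable>
+#include <functional>
+#include <mutex>
+#include <thread>
+#include <vector>
+
+// One job per wavy surface part; `deps` lists the indices of the jobs
+// that must complete before this one may run.
+struct Job {
+    std::function<void()> run;
+    std::vector<std::size_t> deps;
+    bool taken = false;
+    bool done = false;
+};
+
+// Each worker repeatedly takes the first job whose dependencies have
+// all completed, runs it, and wakes the other workers.
+void run_jobs(std::vector<Job>& jobs, unsigned nthreads) {
+    std::mutex m;
+    std::condition_variable cv;
+    std::size_t ndone = 0;
+    auto worker = [&]() {
+        std::unique_lock<std::mutex> lock(m);
+        while (ndone != jobs.size()) {
+            std::size_t idx = jobs.size(); // find the first ready job
+            for (std::size_t i = 0; i < jobs.size() && idx == jobs.size(); ++i) {
+                if (jobs[i].taken) { continue; }
+                bool ready = true;
+                for (std::size_t d : jobs[i].deps) { ready &= jobs[d].done; }
+                if (ready) { idx = i; }
+            }
+            if (idx == jobs.size()) { cv.wait(lock); continue; }
+            jobs[idx].taken = true;
+            lock.unlock();
+            jobs[idx].run();  // execute outside the lock
+            lock.lock();
+            jobs[idx].done = true;
+            ++ndone;
+            cv.notify_all(); // some waiting job may now be ready
+        }
+    };
+    std::vector<std::thread> threads;
+    for (unsigned i = 0; i < nthreads; ++i) { threads.emplace_back(worker); }
+    for (auto& t : threads) { t.join(); }
+}
+#+end_src
+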
+In contrast to the AR model, the MA model has no autoregressive dependencies
+between points; instead, each surface point depends on the white noise values
+that precede it in time and space. The MA model's formula can be rewritten as
+a convolution of the white noise with the coefficients acting as the kernel.
+Using the convolution theorem, the convolution is in turn rewritten as the
+inverse Fourier transform of the product of the Fourier transforms of the
+white noise and the coefficients. Since the number of MA coefficients is much
+smaller than the number of wavy surface points, a parallel FFT implementation
+is not suitable here, as it would require padding the coefficients with
+noughts to match the size of the surface. Instead, the surface is divided
+into parts along each dimension, and each part is padded with noughts so that
+its size along each dimension equals twice the number of coefficients along
+that dimension. Then the Fourier transform of each part is computed in
+parallel, multiplied by the previously computed Fourier transform of the
+coefficients, and the inverse Fourier transform of the result is computed.
+After that, each part is written to the output array, with overlapping points
+(which arise from the padding) added together. This algorithm is commonly
+known in signal processing as
+"overlap-add"\nbsp{}cite:svoboda2011efficient. Padding with noughts is needed
+to prevent aliasing errors: without it the result would be a circular
+convolution.
+
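+In symbols, with \(K\) the kernel of MA coefficients and \(\epsilon\) the
+white noise (the notation here is illustrative rather than the thesis's
+own):
+\begin{equation*}
+  \zeta = K * \epsilon
+        = \mathcal{F}^{-1}\left[ \mathcal{F}[K] \cdot \mathcal{F}[\epsilon] \right].
+\end{equation*}
+The following one-dimensional sketch shows the overlap-add partitioning (the
+real algorithm is three-dimensional and transforms the parts in parallel;
+the radix-2 FFT and all names below are assumptions made for the sake of a
+self-contained example):
+#+begin_src cpp
+#include <algorithm>
+#include <cmath>
+#include <complex>
+#include <vector>
+
+using cd = std::complex<double>;
+
+// Recursive radix-2 Cooley-Tukey FFT; the size of `a` must be a power
+// of two. The inverse transform is left unnormalised.
+void fft(std::vector<cd>& a, bool inverse) {
+    const std::size_t n = a.size();
+    if (n == 1) { return; }
+    std::vector<cd> even(n/2), odd(n/2);
+    for (std::size_t i = 0; i < n/2; ++i) {
+        even[i] = a[2*i];
+        odd[i] = a[2*i + 1];
+    }
+    fft(even, inverse);
+    fft(odd, inverse);
+    const double ang = (inverse ? 2 : -2) * std::acos(-1.0) / double(n);
+    for (std::size_t k = 0; k < n/2; ++k) {
+        const cd w = std::polar(1.0, ang*double(k)) * odd[k];
+        a[k] = even[k] + w;
+        a[k + n/2] = even[k] - w;
+    }
+}
+
+// Convolve `signal` with `kernel` part by part: every part is padded
+// with noughts, convolved via the FFT, and the overlapping padded
+// tails are added to the output ("overlap-add").
+std::vector<double>
+overlap_add(const std::vector<double>& signal,
+            const std::vector<double>& kernel,
+            std::size_t part_size) {
+    std::size_t n = 1; // padded part size, a power of two
+    while (n < part_size + kernel.size() - 1) { n *= 2; }
+    // transform the kernel once and reuse it for every part
+    std::vector<cd> kf(n);
+    for (std::size_t i = 0; i < kernel.size(); ++i) { kf[i] = kernel[i]; }
+    fft(kf, false);
+    std::vector<double> out(signal.size() + kernel.size() - 1, 0.0);
+    for (std::size_t off = 0; off < signal.size(); off += part_size) {
+        std::vector<cd> part(n); // zero-initialised, i.e. padded
+        const std::size_t m = std::min(part_size, signal.size() - off);
+        for (std::size_t i = 0; i < m; ++i) { part[i] = signal[off + i]; }
+        fft(part, false);
+        for (std::size_t i = 0; i < n; ++i) { part[i] *= kf[i]; }
+        fft(part, true);
+        for (std::size_t i = 0; i < n && off + i < out.size(); ++i) {
+            out[off + i] += part[i].real() / double(n); // normalise here
+        }
+    }
+    return out;
+}
+#+end_src
+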
+Even though the MA model's algorithm partitions the surface into parts in
+the same way as the AR model's algorithm (though the parts may be of
+different sizes), the absence of autoregressive dependencies between the
+parts makes it possible to compute them in parallel without a specialised
+job scheduler. However, it requires padding the parts with noughts so that
+the result corresponds to the original MA model's formula. So, the MA
+model's algorithm scales better to a large number of nodes, as it has fewer
+dependencies between parts computed in parallel, but the parts are larger
+than in the AR model and are therefore slower to compute.
+
+The distinctive feature of the LH model's algorithm is its simplicity: to
+make it parallel, the surface is partitioned into parts of equal size and
+each part is computed in parallel. There are no dependencies between the
+parts, which makes this algorithm particularly suitable for computation on a
+GPU: each hardware thread simply computes its own point. In addition, the
+sine and cosine functions in the model's formula, which are slow to compute
+on a CPU, make a GPU an even more favourable choice.
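+
+A sketch of the point-wise computation is given below, assuming a grid of
+\(n_x \times n_y\) points and a set of harmonics with amplitude, wave
+numbers, frequency and random phase (the ~Harmonic~ structure and all names
+are illustrative, not the thesis code):
+#+begin_src cpp
+#include <cmath>
+#include <vector>
+
+// One harmonic of the LH model; the field names are assumptions.
+struct Harmonic {
+    double c;      // amplitude
+    double u, v;   // wave numbers along x and y
+    double omega;  // frequency
+    double phi;    // random phase
+};
+
+// Every point is an independent sum of cosines, so the loop body can
+// be executed by any number of threads (or GPU work items) at once.
+std::vector<double>
+lh_surface(const std::vector<Harmonic>& harmonics,
+           std::size_t nx, std::size_t ny,
+           double dx, double dy, double t) {
+    std::vector<double> zeta(nx*ny);
+    #pragma omp parallel for
+    for (long idx = 0; idx < long(nx*ny); ++idx) {
+        const double x = dx * double(std::size_t(idx) / ny);
+        const double y = dy * double(std::size_t(idx) % ny);
+        double sum = 0;
+        for (const auto& h : harmonics) {
+            sum += h.c * std::cos(h.u*x + h.v*y - h.omega*t + h.phi);
+        }
+        zeta[std::size_t(idx)] = sum;
+    }
+    return zeta;
+}
+#+end_src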
+
+To summarise, even though the AR and MA models are parts of the mixed ARMA
+model, their parallel algorithms are fundamentally different and are more
+complicated than the trivial parallel algorithm of the LH model. An
+efficient AR algorithm requires a specialised job scheduler to manage
+autoregressive dependencies between wavy surface parts, whereas the MA
+algorithm requires padding the parts with noughts to be able to compute them
+in parallel. In contrast to these models, the LH model has no dependencies
+between parts computed in parallel, but requires more computational power
+(floating point operations per second).
+
**** Performance of OpenMP and OpenCL implementations.
:PROPERTIES:
:header-args:R: :results output org
:END:
+
+Differences in the models' parallel algorithms make them efficient on
+different processor architectures, and to find the most efficient one, all
+the models were benchmarked on both CPU and GPU.
+
ARMA model does not require a highly optimised software implementation to be
efficient: its performance is high even without the use of co-processors.
There are two main causes of that. First, the ARMA model itself does not use
transcendental