commit 0c452d629abdbd7d08d7c30c658ebbe817cec0a9
parent 2be05f7f4becfabacb39a33bdc60c112d5cf400d
Author: Ivan Gankevich <igankevich@ya.ru>
Date: Fri, 27 Jan 2017 17:16:07 +0300
Justify high performance.
Diffstat:
1 file changed, 18 insertions(+), 2 deletions(-)
diff --git a/phd-diss.org b/phd-diss.org
@@ -1704,7 +1704,6 @@ satisfactory results without restriction on wave amplitudes.
[[file:build/low-amp-nocolor.eps]]
[[file:build/high-amp-nocolor.eps]]
#+end_figure
-
*** Non-physical nature of ARMA model
ARMA model, owing to its non-physical nature, does not have the notion of ocean
wave; it simulates wavy surface as a whole instead. Motions of individual waves
@@ -1996,7 +1995,23 @@ such kernels are not present in ARMA model implementation.
*** Evaluation
**** Performance of MPI, OpenMP, OpenCL implementations.
-**** Performance of load balancing method.
+The ARMA model does not require a highly optimised software implementation to
+be efficient: its performance is high even without co-processors, for two main
+reasons. First, the ARMA model itself does not use transcendental functions
+(sines, cosines and exponentials), as opposed to the LH model. All calculations
+(except computation of model coefficients) are done via polynomials, which can be
+efficiently computed on modern processors using a series of FMA instructions.
+Second, pressure computation is done via an explicit analytic formula using
+nested FFTs. Since a two-dimensional FFT of the same size is applied repeatedly
+to every time slice, its coefficients (complex exponentials) are pre-computed
+once for all slices, so the computations need only a few transcendental
+function evaluations. In the case of the MA model, performance is also
+increased by doing convolution via FFT. Thus, the high performance of the ARMA
+model is due to scarce use of transcendental functions and heavy use of FFT,
+not to mention that its high convergence rate and lack of periodicity allow
+using far fewer coefficients compared to the LH model.
+
+**** Performance of load balancing algorithm.
Software implementation of wavy surface generation is balanced in terms of the
load on processor cores; however, as shown by tests, it has high load on storage
device. Before testing, wavy surface generation was implemented using OpenMP for
@@ -2593,6 +2608,7 @@ communication in the presence of node failures cite:fekete1993impossibility.
| <<<OpenMP>>> | Open Multi-Processing |
| <<<MPI>>> | Message Passing Interface |
| <<<POSIX>>> | Portable Operating System Interface |
+| <<<FMA>>> | Fused multiply-add |
#+begin_export latex
\input{postamble}