iccsa-19-vtestbed

git clone https://git.igankevich.com/iccsa-19-vtestbed.git

commit 99f0c70205c00ec15c28e43c8a793e76f2069cff
parent ba1023e37e7aad060ff3507ddb0efa6bea9624d0
Author: Ivan Gankevich <igankevich@ya.ru>
Date:   Sat, 23 Mar 2019 17:23:16 +0300

Results.

Diffstat:
main.tex | 53 +++++++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 51 insertions(+), 2 deletions(-)

diff --git a/main.tex b/main.tex
@@ -1,6 +1,7 @@
 \documentclass[runningheads]{llncs}
 \usepackage{amsmath}
+\usepackage{booktabs}
 \usepackage{graphicx}
 \usepackage{tikz}
 \usetikzlibrary{arrows.meta}
@@ -300,7 +301,7 @@ local memory of the accelerator. Using this algorithm allowed us to store
 arrays of derivatives entirely in graphical accelerator's main memory and
 eliminate data transfer altogether.

-\subsection{Translational and angular motion computation}
+\subsection{Translational and angular ship motion computation}

 In order to compute ship position, translational velocity, angular displacement
 and angular velocity each time step we solve equations motion (adapted
@@ -337,9 +338,53 @@ processor.

 \section{Results}

+Virtual testbed performance was benchmarked in a number of tests. Since we use
+both OpenMP and OpenCL technologies for parallel computing, we wanted to know
+how performance scales with the number of processor cores and with and without
+graphical accelerator.
+
+Graphical accelerators are divided into two broad categories: for general
+purpose computations and for visualisation. Accelerators from the first
+category typically have more double precision arithmetic units and accelerators
+from the second category are typically optimised for single precision. The
+ratio of single to double precision performance can be as high as 32. We ran
+all tests on a node with Quadro P5000 (tab.~\ref{tab:setup}) which falls into
+the second category, so we choose single precision in all benchmarks.
+
+\begin{table}
+  \centering
+  \caption{Hardware configuration and compiler options for
+  benchmarks.\label{tab:setup}}
+  \begin{tabular}{ll}
+    \toprule
+    Graphical accelerator & NVIDIA Quadro P5000 \\
+    Processor & Intel Xeon CPU E5-2630 v4 \\
+    Compiler & GCC 8.1.1 \\
+    Compiler options & \texttt{-O3 -march=native} \\
+    \bottomrule
+  \end{tabular}
+\end{table}
+
+Double precision was used only for computing autoregressive model coefficients,
+because roundoff and truncation numerical errors make covariance matrices (from
+which coefficients are computed) non-positive definite. These matrices
+typically have very large condition numbers, and linear system which they
+represent cannot be solved by Gaussian elimination or \(LDLT\) Cholesky
+decomposition, as these methods are numerically unstable.
+
+Since Virtual testbed does both visualisation and computation in real-time, we
+measured performance of each stage of the main loop (fig.~\ref{fig:loop})
+synchronously with parameters that affect it. To assess computational
+performance we measured execution time of each stage in microseconds (wall
+clock time) together with the number of wetted panels, and wavy surface size.
+To assess visualisation performance we measured the execution time of each
+visualisation frame (one iteration of the visualisation main loop) and
+execution time of computational frame (one iteration of the computational
+loop), from which it is easy to compute the usual frames-per-second metric.
+
 \begin{figure}
   \centering
-  \begin{tikzpicture}[x=2.2cm,y=-1.5cm]
+  \begin{tikzpicture}[x=2.2cm,y=-1.4cm]
     \node[Block] (s1) at (0,0) {\strut{}Wavy surface};
     \node[Block] (s2) at (1,0) {\strut{}Autoreg. model};
     \node[Block] (s3) at (2,0) {\strut{}Wave numbers};
@@ -362,6 +407,10 @@ processor.
   \caption{Virtual testbed main loop.\label{fig:loop}}
 \end{figure}

+We ran all tests on the same node for increasing number of processor cores and
+with and without graphical accelerator. The code was compiled with maximum
+optimisation level including processor-specific optimisations which enabled
+auto-vectorisation for further performance improvement.

 \section{Discussion}
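
Note on the frames-per-second metric mentioned in the added Results text. The commit gives no formula, so the following is only a sketch. It assumes a hypothetical symbol \(t_{\text{frame}}\) (not in the commit) for the measured wall clock execution time of one frame in microseconds, for either the visualisation loop or the computational loop:

\[
  \text{FPS} = \frac{10^{6}}{t_{\text{frame}}}
\]

As an illustration with a made-up value, not a measured one: a frame that takes 20\,000 microseconds would correspond to \(10^{6} / 20\,000 = 50\) frames per second.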