Results. - iccsa-19-vtestbed

commit 99f0c70205c00ec15c28e43c8a793e76f2069cff
parent ba1023e37e7aad060ff3507ddb0efa6bea9624d0
Author: Ivan Gankevich <igankevich@ya.ru>
Date:   Sat, 23 Mar 2019 17:23:16 +0300

Results.

Diffstat:
main.tex  | 53 +++++++++++++++++++++++++++++++++++++++++++++++++++--

1 file changed, 51 insertions(+), 2 deletions(-)
diff --git a/main.tex b/main.tex
@@ -1,6 +1,7 @@
 \documentclass[runningheads]{llncs}
 
 \usepackage{amsmath}
+\usepackage{booktabs}
 \usepackage{graphicx}
 \usepackage{tikz}
 \usetikzlibrary{arrows.meta}
@@ -300,7 +301,7 @@ local memory of the accelerator. Using this algorithm allowed us to store
 arrays of derivatives entirely in graphical accelerator's main memory and
 eliminate data transfer altogether.
 
-\subsection{Translational and angular motion computation}
+\subsection{Translational and angular ship motion computation}
 
 In order to compute ship position, translational velocity, angular displacement
 and angular velocity each time step we solve equations of motion (adapted
@@ -337,9 +338,53 @@ processor.
 
 \section{Results}
 
+Virtual testbed performance was benchmarked in a number of tests. Since we use
+both OpenMP and OpenCL for parallel computing, we wanted to know how
+performance scales with the number of processor cores, both with and without
+a graphical accelerator.
+
+Graphical accelerators fall into two broad categories: those for general
+purpose computations and those for visualisation. Accelerators from the first
+category typically have more double precision arithmetic units, whereas those
+from the second category are typically optimised for single precision. The
+ratio of single to double precision performance can be as high as 32. We ran
+all tests on a node with a Quadro P5000 (tab.~\ref{tab:setup}), which falls
+into the second category, so we chose single precision in all benchmarks.
+
+\begin{table}
+	\centering
+	\caption{Hardware configuration and compiler options for
+	benchmarks.\label{tab:setup}}
+	\begin{tabular}{ll}
+		\toprule
+		Graphical accelerator   & NVIDIA Quadro P5000 \\
+		Processor               & Intel Xeon CPU E5-2630 v4 \\
+		Compiler                & GCC 8.1.1 \\
+		Compiler options        & \texttt{-O3 -march=native} \\
+		\bottomrule
+	\end{tabular}
+\end{table}
+
+Double precision was used only for computing autoregressive model coefficients,
+because otherwise roundoff and truncation errors make the covariance matrices
+(from which the coefficients are computed) non-positive definite. These
+matrices typically have very large condition numbers, and the linear system
+which they represent cannot be solved by Gaussian elimination or \(LDL^{T}\)
+Cholesky decomposition, as these methods are numerically unstable.
+
+Since Virtual testbed does both visualisation and computation in real time, we
+measured the performance of each stage of the main loop (fig.~\ref{fig:loop})
+synchronously with the parameters that affect it. To assess computational
+performance we measured the execution time of each stage in microseconds (wall
+clock time) together with the number of wetted panels and the wavy surface
+size. To assess visualisation performance we measured the execution time of
+each visualisation frame (one iteration of the visualisation main loop) and
+the execution time of each computational frame (one iteration of the
+computational loop), from which the usual frames-per-second metric follows.
+
 \begin{figure}
 	\centering
-	\begin{tikzpicture}[x=2.2cm,y=-1.5cm]
+	\begin{tikzpicture}[x=2.2cm,y=-1.4cm]
 		\node[Block] (s1) at (0,0) {\strut{}Wavy surface};
 		\node[Block] (s2) at (1,0) {\strut{}Autoreg. model};
 		\node[Block] (s3) at (2,0) {\strut{}Wave numbers};
@@ -362,6 +407,10 @@ processor.
 	\caption{Virtual testbed main loop.\label{fig:loop}}
 \end{figure}
 
+We ran all tests on the same node for an increasing number of processor cores,
+both with and without a graphical accelerator. The code was compiled with the
+maximum optimisation level and processor-specific optimisations, which enabled
+auto-vectorisation for further performance improvement.
 
 \section{Discussion}
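
The per-stage timing and the frames-per-second conversion described in the added Results text could look roughly like the sketch below. It is only an illustration: the helper names `time_stage_us` and `frames_per_second` are hypothetical and are not taken from the Virtual testbed sources, and the choice of `std::chrono::steady_clock` as the wall-clock source is an assumption.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <utility>

// Hypothetical helper (not from the paper's code base): runs one stage of
// the main loop once and returns its wall-clock time in microseconds.
template <class Stage>
std::int64_t time_stage_us(Stage&& stage) {
    using clock = std::chrono::steady_clock;  // monotonic clock for intervals
    const auto t0 = clock::now();
    std::forward<Stage>(stage)();             // execute the stage exactly once
    const auto t1 = clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count();
}

// Converts the execution time of one frame (in microseconds) to the usual
// frames-per-second metric: FPS = 10^6 / t.
inline double frames_per_second(std::int64_t frame_time_us) {
    assert(frame_time_us > 0);                // a frame must take positive time
    return 1e6 / static_cast<double>(frame_time_us);
}
```

For instance, a frame that takes 20000 microseconds corresponds to 50 frames per second. In a real measurement the stage passed to `time_stage_us` would be one iteration of the visualisation or computational loop.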

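The precision argument in the added text (double precision only for the autoregressive model coefficients) can be illustrated with the standard rule-of-thumb error bound for solving a linear system \(Ax=b\). The symbols \(A\), \(x\), \(\kappa(A)\) and \(\varepsilon\) are introduced here for illustration, and the condition number \(\kappa(A)=10^{8}\) is an assumed example value, not a figure from the paper.

```latex
% Rule of thumb for a backward-stable solver: the relative error of the
% solution is bounded by roughly the condition number times the unit roundoff.
\begin{align*}
\frac{\|\delta x\|}{\|x\|} &\lesssim \kappa(A)\,\varepsilon, \\
\varepsilon_{\text{single}} &= 2^{-24} \approx 6.0\times10^{-8}, &
\kappa(A)\,\varepsilon_{\text{single}} &\approx 6, \\
\varepsilon_{\text{double}} &= 2^{-53} \approx 1.1\times10^{-16}, &
\kappa(A)\,\varepsilon_{\text{double}} &\approx 1.1\times10^{-8}.
\end{align*}
```

Under this assumed condition number the single precision bound exceeds one, so no digit of the solution can be trusted, whereas double precision still leaves about eight significant digits.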
git clone https://git.igankevich.com/iccsa-19-vtestbed.git