Edit p1. - arma-thesis

commit 66bc238d84598cacaec371a8aea4833d5d0c467c
parent 8670c09717bb2fd3dfd6a2e0f0c60b6545533e13
Author: Ivan Gankevich <igankevich@ya.ru>
Date:   Wed,  1 Nov 2017 15:37:48 +0300

Edit p1.

Diffstat:
arma-thesis-ru.org  | 45 ++++++++++++++++++++++++++++++++-------------
arma-thesis.org  | 25 +++++++++++++------------

2 files changed, 45 insertions(+), 25 deletions(-)
diff --git a/arma-thesis-ru.org b/arma-thesis-ru.org
@@ -1902,13 +1902,13 @@ MPP части, от которых зависит данная, должны б
 #+name: tab-gpulab
 #+caption: Конфигурация системы "Gpulab".
 #+attr_latex: :booktabs t
-| Процессор                    | AMD FX-8370                      |
-| Память                       | 16ГБ                             |
-| Видеокарта                   | GeForce GTX 1060                 |
-| Память видеокарты            | 6ГБ                              |
-| Жесткий диск                 | WDC WD40EZRZ-00WN9B0, 5400об/мин |
-| Количество процессорных ядер | 4                                |
-| Количество потоков на ядро   | 2                                |
+| Процессор                    | AMD FX-8370                        |
+| Память процессора            | 16ГБ                               |
+| Видеокарта                   | GeForce GTX 1060                   |
+| Память видеокарты            | 6ГБ                                |
+| Жесткий диск                 | WDC WD40EZRZ-00WN9B0, 5400об./мин. |
+| Количество процессорных ядер | 4                                  |
+| Количество потоков на ядро   | 2                                  |
 
 Программа АРСС использует несколько библиотек математических функций, численных
 алгоритмов и примитивов визуализации (перечисленных в таб.\nbsp{}[[tab-arma-libs]]),
@@ -2194,15 +2194,32 @@ arma.plot_io_events(names)
 :header-args:R: :results output raw :exports results
 :END:
 
+Эксперименты показали, что реализация OpenCL превосходит OpenMP по
+производительности в 10--15 раз (рис.\nbsp{}[[fig-arma-realtime-graph]]), однако,
+распределение времени работы между подпрограммами отличается
+(таб.\nbsp{}[[tab-arma-realtime]]). В случае процессора больше всего времени
+тратится на вычисление \(g_1\), а в случае видеокарты время вычисления \(g_1\)
+сопоставимо с \(g_2\). Копирование результирующего поля потенциала скорости
+между процессором и видеокартой занимает \(\approx{}20\%\) общего времени
+вычисления этого поля. Вычисление \(g_2\) занимает больше всего времени в случае
+OpenCL и меньше всего времени в случае OpenMP. В обоих реализациях \(g_2\)
+вычислятется на центральном процессоре, поскольку готовая подпрограмма
+вычисления производной на OpenCL не была найдена. В случае OpenCL результат
+вычислений дублируется для каждой точки сетки по оси \(z\), для того чтобы
+переменожить все точки одного временного среза в одной подпрограмме OpenCL, а,
+затем, копируется в память видеокарты, что негативно сказывается на
+производительности. Все тесты запускались на машине с видеокартой,
+характеристики которой просуммированы в таб.\nbsp{}[[tab-storm]].
+
 #+name: tab-storm
 #+caption: Конфигурация вычислительной системы "Storm".
 #+attr_latex: :booktabs t
-| CPU              | Intel Core 2 Quad Q9550     |
-| RAM              | 8Gb                         |
-| GPU              | AMD Radeon R7 360           |
-| GPU memory       | 2GB                         |
-| HDD              | Seagate Barracuda, 7200 rpm |
-| No. of CPU cores | 4                           |
+| Процессор                    | Intel Core 2 Quad Q9550          |
+| Память процессора            | 8ГБ                              |
+| Видеокарта                   | AMD Radeon R7 360                |
+| Память видеокарты            | 2ГБ                              |
+| Жесткий диск                 | Seagate Barracuda, 7200 об./мин. |
+| Количество процессорных ядер | 4                                |
 
 #+name: fig-arma-realtime-graph
 #+header: :results output graphics
@@ -2219,6 +2236,8 @@ title(xlab="Размер взволнованной поверхности по 
 #+RESULTS: fig-arma-realtime-graph
 [[file:build/realtime-performance-ru.pdf]]
 
+
+
 #+name: tab-arma-realtime
 #+begin_src R
 source(file.path("R", "benchmarks.R"))
diff --git a/arma-thesis.org b/arma-thesis.org
@@ -2147,18 +2147,19 @@ found.
 :END:
 
 The experiments showed that OpenCL outperforms OpenMP implementation by a factor
-of 10--15 (fig.\nbsp{}[[fig-arma-realtime-graph]]), however, distribution of time
-between computation stages is different for each implementation
-(table\nbsp{}[[tab-arma-realtime]]). The major time consumer on CPU is \(g_1\),
-whereas in GPU its running time is comparable to \(g_2\). Copying the resulting
-velocity potential field between CPU and GPU consumes \(\approx{}20\%\) of
-solver execution time. \(g_2\) consumes the most of the execution time for
-OpenCL solver, and \(g_1\) for OpenMP solver. In both implementations \(g_2\) is
-computed on CPU, but for GPU implementation the result is duplicated for each
-\(z\) grid point in order to perform multiplication of all \(XYZ\) planes along
-\(z\) dimension in single OpenCL kernel, and, subsequently copied to GPU memory
-which severely hinders the overall performance. All benchmarks were run on a
-machine equipped with a GPU, characteristics of which are summarised in
+of 10--15 (fig.\nbsp{}[[fig-arma-realtime-graph]]), however, distribution of running
+time between subroutines is different (table\nbsp{}[[tab-arma-realtime]]). For CPU
+the most of the time is spent to compute \(g_1\), whereas for GPU time spent to
+compute \(g_1\) is comparable to \(g_2\). Copying the resulting velocity
+potential field between CPU and GPU consumes \(\approx{}20\%\) of the total
+field computation time. Computing \(g_2\) consumes the most of the execution
+time for OpenCL and the least of the time for OpenMP. In both implementations
+\(g_2\) is computed on CPU, because subroutine for derivative computation in
+OpenCL was not found. For OpenCL the result is duplicated for each \(z\) grid
+point in order to perform multiplication of all \(XYZ\) planes along \(z\)
+dimension in single OpenCL kernel, and, subsequently copied to GPU memory which
+has negative impact on performance. All benchmarks were run on a machine
+equipped with a GPU, characteristics of which are summarised in
 table\nbsp{}[[tab-storm]].
 
 #+name: tab-storm

	arma-thesis
	git clone https://git.igankevich.com/arma-thesis.git
