iccsa-20-waves

git clone https://git.igankevich.com/iccsa-20-waves.git

commit 41a10483db6ab7ae5514d28b1354a1db7764bc55
parent 3983a887d6b3503214d3e8b24f01375856a5a9ea
Author: Петряков Иван <franceskoizump@gmail.com>
Date:   Sun,  8 Mar 2020 23:08:30 +0300

first page

Diffstat:
main.tex | 54++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 54 insertions(+), 0 deletions(-)

diff --git a/main.tex b/main.tex
@@ -49,6 +49,60 @@ TODO
 \section{Introduction}
 \section{Methods}
+
+Virtual testbed is a program for personal computers.
+Its main feature is to perform all calculations in real time,
+  while maintaining high accuracy of calculations.
+This is achieved by using a graphics accelerator.
+Generating Gerstner waves is no exception.
+We implement the algorithm for GPU, using the OpenCL framework,
+  and for regular CPU, with parallelization based on the OpenMP framework.
+
+This algorithm consists of several parts.
+First, we calculate the wavy surface according to our approach.
+Then, we compute the wetted panels, which are located under the calculated surface.
+Finally, we find the buoyancy force acting on the ship.
+These steps are repeated in an infinite loop, and this is how the simulation runs.
+
+Let us consider the process of computing the wavy surface in more detail.
+Since the surface has an irregular structure
+  (that is, we store a set of points describing the surface),
+  we just need to apply the same formulas to each point of the surface.
+It is easy to do with C++ for CPU computation, but it takes some effort
+  to run this algorithm efficiently with GPU acceleration.
+Our first implementations were quite slow, running about five iterations of the global loop,
+  but now the number is much higher.
+
+Storage order is very important for GPU architecture.
+Algorithms that access memory sequentially are the efficient ones.
+For this reason, we store our set of points sequentially, one after another.
+This is an obvious statement, but it needs to be kept in mind.
+The next feature that we use to increase performance is built-in vector functions.
+We do not need to implement custom vector functions to work with our large set of vectors,
+  which reduces both the size of the code and the number of possible mistakes.
+Besides, these functions are very fast, which gives us additional acceleration.
+The third feature is cache management.
+Unlike CPU, GPU allows programmers to control its own kind of L3 cache
+  (more precisely -- part of L3 cache), which is called ``shared memory''.
+Moreover, for most algorithms, we have to manage shared memory to accelerate them.
+A distinctive property of this kind of memory is that it has the lowest latency
+  while still being shared between several computing units.
+As long as memory bandwidth remains a bottleneck, this kind of optimization would fit any situation.
+In our case, summation occurs over the surface of the ship,
+  so we copy small pieces of it to shared memory.
+This reduces the number of accesses to global memory, which has a much higher latency.
+Following these simple rules, we can easily implement an efficient algorithm.
+All we have to do is:
+  check storage order;
+  use vector operations as much as possible;
+  and finally, manage shared memory.
+
+
+
+
+
+
+
 \section{Results}
 \section{Discussion}
 \section{Conclusion}
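
The CPU variant of the surface step described in the added Methods text (the same formulas applied to every stored point, with the points kept contiguously in memory and the loop parallelized with OpenMP) might look roughly like the sketch below. It is not code from this repository. The names Point3, Wave, gerstner_point and gerstner_surface are hypothetical, and the formulas are the textbook Gerstner wave with the deep-water dispersion relation, which may differ from the approach used in the paper.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical types for illustration; not taken from the repository.
struct Point3 { float x, y, z; };

struct Wave {
    float amplitude; // wave amplitude
    float kx, ky;    // horizontal wave vector components
    float phase;     // initial phase
};

// Displace one rest-position point by a sum of textbook Gerstner waves.
// Assumption: deep-water dispersion relation omega = sqrt(g * k).
Point3 gerstner_point(const Point3& rest, const std::vector<Wave>& waves, float t) {
    const float g = 9.81f;
    Point3 p = rest;
    for (const Wave& w : waves) {
        const float k = std::hypot(w.kx, w.ky);
        assert(k > 0.0f); // a wave needs a non-zero wave number
        const float omega = std::sqrt(g * k);
        const float theta = w.kx * rest.x + w.ky * rest.y - omega * t + w.phase;
        const float s = std::sin(theta);
        p.x -= (w.kx / k) * w.amplitude * s; // horizontal displacement
        p.y -= (w.ky / k) * w.amplitude * s;
        p.z += w.amplitude * std::cos(theta); // vertical displacement
    }
    return p;
}

// Apply the same formulas to every point. The points are stored one after
// another in a contiguous array, so each thread reads memory sequentially.
void gerstner_surface(const std::vector<Point3>& rest,
                      const std::vector<Wave>& waves,
                      float t,
                      std::vector<Point3>& out) {
    out.resize(rest.size());
    const long n = static_cast<long>(rest.size());
    // The pragma is ignored when OpenMP is not enabled at compile time.
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        out[static_cast<std::size_t>(i)] =
            gerstner_point(rest[static_cast<std::size_t>(i)], waves, t);
    }
}
```

Compiled with OpenMP enabled (for example, -fopenmp with GCC or Clang), the loop in gerstner_surface is split between threads. Without it, the same code runs serially. Each iteration writes only its own output element, so no synchronization is needed.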