commit 41a10483db6ab7ae5514d28b1354a1db7764bc55
parent 3983a887d6b3503214d3e8b24f01375856a5a9ea
Author: Петряков Иван <franceskoizump@gmail.com>
Date: Sun, 8 Mar 2020 23:08:30 +0300
first page
Diffstat:
main.tex | 54 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 54 insertions(+), 0 deletions(-)
diff --git a/main.tex b/main.tex
@@ -49,6 +49,60 @@ TODO
\section{Introduction}
\section{Methods}
+
+Virtual testbed is a program for personal computers.
+Its main feature is that all computations are performed in real time
+ while maintaining high accuracy.
+This is achieved by using a graphical accelerator,
+ and the generation of Gerstner waves is no exception.
+We implemented the algorithm for the GPU using the OpenCL framework,
+ and for the CPU, with the ability to parallelise, using the OpenMP framework.
+
+This algorithm consists of several parts.
+First, we compute the wavy surface according to our approach.
+Then we determine the wetted panels, i.e.\ the panels located under the computed surface.
+Finally, we find the buoyancy force acting on the ship.
+These steps are repeated in an infinite loop, and this is how the simulation runs.
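The loop described above can be sketched in C++ as follows; all types and function names here are illustrative stand-ins, not the actual Virtual testbed code:

```cpp
#include <vector>

// Illustrative stand-ins for the program's real data structures.
struct Point { float x, y, z; };
struct Panel { Point centre; float area; };

// Hypothetical stage 1: compute the wavy surface for time t.
std::vector<Point> compute_wavy_surface(float t) {
    (void)t; // real code evaluates the wave formulae here
    return std::vector<Point>(16, Point{0.f, 0.f, 0.f});
}

// Hypothetical stage 2: keep only the panels below the surface.
std::vector<Panel> find_wetted_panels(const std::vector<Point>& surface) {
    (void)surface; // the real test is "panel lies under the surface"
    return std::vector<Panel>(8, Panel{Point{0.f, 0.f, -1.f}, 1.f});
}

// Hypothetical stage 3: integrate over the wetted panels.
float buoyancy_force(const std::vector<Panel>& wetted) {
    float f = 0.f;
    for (const Panel& p : wetted) f += p.area; // pressure factor omitted
    return f;
}

// One iteration of the otherwise infinite simulation loop.
float simulation_step(float t) {
    const auto surface = compute_wavy_surface(t);
    const auto wetted  = find_wetted_panels(surface);
    return buoyancy_force(wetted);
}
```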
+
+Let us consider the process of computing the wavy surface in more detail.
+Since the surface has an irregular structure
+ (that is, we store a set of points describing the surface),
+ we simply apply the same formulae to each point of the surface.
+This is easy to do in C++ for CPU computation, but it takes some effort
+ to run this algorithm efficiently with GPU acceleration.
+Our first implementation was quite slow, performing only about five iterations
+ of the global loop, but now it performs many more.
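As a sketch of the per-point computation, the following applies the textbook deep-water Gerstner formula to every point of a grid; the exact formulae of our approach may differ in detail, and the OpenMP pragma shows how the CPU version parallelises the loop:

```cpp
#include <cmath>
#include <vector>

struct Point { float x, y, z; };

// Textbook deep-water Gerstner displacement for a single wave component
// with amplitude a, wavenumber k and angular frequency w (illustrative;
// the formulae of our actual approach may differ).
Point gerstner_point(float x0, float z0, float a, float k, float w, float t) {
    const float phase = k * x0 - w * t;
    return Point{
        x0 - a * std::sin(phase), // points move on circles horizontally...
        a * std::cos(phase),      // ...and vertically
        z0
    };
}

// The same formula is applied to every stored point independently,
// which is exactly the loop that OpenMP (or an OpenCL kernel) parallelises.
void compute_surface(std::vector<Point>& pts, float a, float k, float w, float t) {
    #pragma omp parallel for
    for (long i = 0; i < static_cast<long>(pts.size()); ++i)
        pts[i] = gerstner_point(pts[i].x, pts[i].z, a, k, w, t);
}
```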
+
+Storage order is very important for the GPU architecture:
+ the efficient algorithms are those with sequential memory access.
+Accordingly, we store our set of points sequentially, one after another.
+This statement is obvious, but we need to keep it in mind.
+The next feature we use to increase performance is built-in vector functions.
+With them we do not need to implement custom vector functions to work with our large set of vectors,
+ which reduces both the size of the code and the number of possible mistakes.
+Besides, these functions are very fast, which is another source of acceleration.
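For illustration, this is how built-in vector types look in an OpenCL C kernel (a sketch, not the actual Virtual testbed kernel): `float4` arithmetic and the built-in geometric functions replace a hand-written vector library.

```c
/* OpenCL C kernel fragment (illustrative). */
kernel void displace(global float4* points, const float4 shift) {
    const size_t i = get_global_id(0); /* one point per work-item,
                                          read sequentially from memory */
    float4 p = points[i];
    p += shift;              /* built-in component-wise arithmetic */
    p.w = length(p.xyz);     /* built-in geometric function */
    points[i] = p;
}
```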
+The third feature is cache management.
+Unlike the CPU, the GPU allows the programmer to control its own kind of L3 cache
+ (more precisely, a part of the L3 cache), which is called ``shared memory''.
+Moreover, in most cases, for almost any algorithm, we have to manage shared memory to accelerate it.
+A distinctive property of this kind of memory is that it has the smallest latency
+ while its data is shared between several computing units.
+Since memory bandwidth remains a bottleneck, this kind of optimisation fits almost any situation.
+In our case the summation occurs over the surface of the ship,
+ so we copy small pieces of it to shared memory.
+This reduces the number of accesses to global memory, which has a much higher latency.
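A typical shape for such a summation kernel (again an illustrative sketch, not our actual code, assuming a power-of-two work-group size) first copies a tile of panel values into shared memory and then reduces it there:

```c
/* OpenCL C kernel fragment (illustrative): tree reduction in local
   ("shared") memory; assumes a power-of-two work-group size. */
kernel void sum_panels(global const float* values,
                       local float* tile,
                       global float* partial_sums) {
    const size_t lid = get_local_id(0);
    tile[lid] = values[get_global_id(0)]; /* one global read per item */
    barrier(CLK_LOCAL_MEM_FENCE);
    for (size_t s = get_local_size(0) / 2; s > 0; s /= 2) {
        if (lid < s) tile[lid] += tile[lid + s]; /* shared-memory adds */
        barrier(CLK_LOCAL_MEM_FENCE);
    }
    if (lid == 0) partial_sums[get_group_id(0)] = tile[0];
}
```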
+Following these simple rules, we can easily implement an efficient algorithm.
+All we have to do is:
+ check the storage order;
+ use vector operations as much as possible;
+ and, finally, manage shared memory.
+
+
+
+
+
+
+
\section{Results}
\section{Discussion}
\section{Conclusion}